01:25:10 pabs: denada :)
02:23:38 hi there, is --level 2 a good option for crawling a wordpress site from 2014
06:47:49 Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52976&oldid=52975
06:56:09 lemuria: which program are you using? #archivebot might be a good choice instead to get it in the wayback machine (and you can grab the warcs after)
08:10:24 grab-site, i forgot to say, fireonlive
08:11:09 the archive was kinda OK-ish but fonts were missing and one of the images too, is that normal when grabbing sites?
08:23:05 Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52977&oldid=52976
08:25:06 Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52978&oldid=52977
08:42:09 Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52979&oldid=52978
08:46:09 Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52980&oldid=52979
08:47:10 Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52981&oldid=52980
08:58:11 Exorcism edited Bugzilla (+0, /* Status */ aborted): https://wiki.archiveteam.org/?diff=52982&oldid=52981
09:00:12 Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52983&oldid=52982
09:02:55 asie, nullpeta, c3manu: as expected, my bruteforce did not turn up any sites not found in https://asie.pl/files/hp_vector_urls_20161012_plus.txt (except 403s)
09:03:12 Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52984&oldid=52983
09:03:14 i can produce a list of the 403s if that's wanted but idk how useful it would be
09:04:32 also, c3manu, how did you run that list? `!ao <` or `!a <`?
09:08:21 !ao and !a <
09:08:45 ty!
09:08:49 np :)
09:09:10 depends on what causes the 403s. if AB can somehow get around that, it would be pretty useful ^^
09:10:20 i doubt it.
my (100% speculative) guess is that they're sites set to private in some way by their authors
09:10:37 what site are we investigating the 403s for
09:10:41 multiple domains?
09:10:45 hp.vector.co.jp
09:14:12 personally i'd really like to know more about the non-'VA\d{6}' authors. given that there were only two in the 2016 directory and that that directory was 99.9% complete, we're probably not missing much in that regard, but maybe somebody can grep CC/IA CDX and see if anything turns up?
09:14:14 Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52985&oldid=52984
09:38:19 Exorcism edited Bugzilla (+28, /* Status */): https://wiki.archiveteam.org/?diff=52986&oldid=52985
09:46:20 Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52987&oldid=52986
09:46:21 Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52988&oldid=52987
09:50:21 Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52989&oldid=52988
09:52:21 Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52990&oldid=52989
09:54:21 Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52991&oldid=52990
10:00:22 Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52992&oldid=52991
10:03:23 Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52993&oldid=52992
10:03:24 Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52994&oldid=52993
10:05:23 Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52995&oldid=52994
10:21:26 Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52996&oldid=52995
10:22:26 Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52997&oldid=52996
10:22:27 Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52998&oldid=52997
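(A rough sketch of the "grep for non-'VA\d{6}' authors" idea from 09:14:12. It assumes author pages live under an `/authors/<id>/` path on hp.vector.co.jp, which is an assumption about the site layout, and the URLs in `EXAMPLE_URLS` are made up for illustration. In practice the input would be URLs pulled from a CDX dump.)

```python
import re
from urllib.parse import urlsplit

# The author-ID shape mentioned in the chat: "VA" followed by six digits.
VA_PATTERN = re.compile(r"VA\d{6}")


def author_id(url):
    """Return the path segment after /authors/, or None if the URL has none.

    The /authors/<id>/ layout is an assumption, not confirmed by the log.
    """
    parts = [p for p in urlsplit(url).path.split("/") if p]
    if len(parts) >= 2 and parts[0] == "authors":
        return parts[1]
    return None


def unusual_authors(urls):
    """Collect author IDs that do NOT fully match VA\\d{6}, sorted and deduplicated."""
    found = set()
    for url in urls:
        ident = author_id(url)
        if ident is not None and VA_PATTERN.fullmatch(ident) is None:
            found.add(ident)
    return sorted(found)


# Hypothetical input; none of these URLs come from the real list.
EXAMPLE_URLS = [
    "http://hp.vector.co.jp/authors/VA012345/",
    "http://hp.vector.co.jp/authors/VA012345/index.html",
    "http://hp.vector.co.jp/authors/example-author/",
    "http://hp.vector.co.jp/robots.txt",
]

if __name__ == "__main__":
    for ident in unusual_authors(EXAMPLE_URLS):
        print(ident)
```

`fullmatch` is used so that IDs which merely contain a VA-style substring still count as unusual.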
10:23:26 Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=52999&oldid=52998
10:31:27 Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=53000&oldid=52999
10:53:31 Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=53001&oldid=53000
13:04:59 Exorcism edited Bugzilla (+18, /* Status */): https://wiki.archiveteam.org/?diff=53002&oldid=53001
13:05:59 PaulWise edited SmolNet (+345, add link to mercury protocl doc, mention…): https://wiki.archiveteam.org/?diff=53003&oldid=52708
13:17:01 Exorcism edited Bugzilla (+27, /* Archived */): https://wiki.archiveteam.org/?diff=53004&oldid=53002
13:46:15 Could someone archive https://hlelo101.github.io/ with archivebot please (no coverage
13:54:07 Exorcism edited Bugzilla (+38, /* Archived */): https://wiki.archiveteam.org/?diff=53005&oldid=53004
14:04:53 (That's been handled in #archivebot since.)
14:10:10 Exorcism edited Bugzilla (+34, /* Archived */): https://wiki.archiveteam.org/?diff=53006&oldid=53005
14:14:11 Exorcism edited Bugzilla (+36, /* Archived */): https://wiki.archiveteam.org/?diff=53007&oldid=53006
14:18:11 Exorcism edited Bugzilla (+39, /* Archived */): https://wiki.archiveteam.org/?diff=53008&oldid=53007
14:23:12 Exorcism edited Bugzilla (+40, /* Archived */): https://wiki.archiveteam.org/?diff=53009&oldid=53008
14:33:14 Exorcism edited Bugzilla (+45, /* Archived */): https://wiki.archiveteam.org/?diff=53010&oldid=53009
14:38:15 Exorcism edited Bugzilla (+34, /* Archived */): https://wiki.archiveteam.org/?diff=53011&oldid=53010
16:20:45 JAA: what is sense even :P
16:35:35 Exorcism edited Bugzilla (+0, /* Status */): https://wiki.archiveteam.org/?diff=53012&oldid=53011
17:01:40 Bzc6p edited Demotivalo.net (-28, /* Sister sites */ kommenthuszar.com restored): https://wiki.archiveteam.org/?diff=53013&oldid=50595
18:29:29 HELP VER
18:29:36 (sorry forgot the /)
18:45:07 /)
23:24:24 hi friends!
i am wondering if anyone knows specifics about how the "length" field from wayback machine's cdx api is calculated
23:24:53 specifically, I'm seeing a bunch of cases where the digest matches, but the length is different, for example
23:24:57 it,tip)/zenit/natale-6.jpg 20010725190547 http://www.tip.it:80/zenit/natale-6.JPG image/jpeg 200 PO53TOR6WEL4F3CFWDXP3OEUEY7F25NW 29785
23:24:58 it,tip)/zenit/natale-6.jpg 20011230210923 http://www.tip.it:80/zenit/natale-6.JPG image/jpeg 200 PO53TOR6WEL4F3CFWDXP3OEUEY7F25NW 29773
23:26:30 it's not obvious to me why the length would vary if the hash is the same. is the length maybe including some http headers that varied between responses, even though the response payloads were otherwise identical?
23:32:06 ah, I see! the headers are preserved under X-Archive-Orig, and they are indeed different. so I think the length must be the compressed (gzip maybe?) size of the original entire network request, including headers
23:43:15 Wish more people could just answer their questions by looking at the list of users in the channel
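(For anyone wanting to find more cases like the two CDX records quoted at 23:24:57 and 23:24:58, here is a minimal sketch that groups records by urlkey and digest and reports groups whose lengths differ. It assumes the seven space-separated fields seen in those lines: urlkey, timestamp, original URL, MIME type, status code, digest, length. The names `CdxRecord`, `parse_cdx_line` and `lengths_by_digest` are made up for this example. It only detects the mismatch; it does not confirm what the length field measures.)

```python
from collections import defaultdict
from typing import NamedTuple


class CdxRecord(NamedTuple):
    urlkey: str
    timestamp: str
    original: str
    mimetype: str
    statuscode: str
    digest: str
    length: int


def parse_cdx_line(line):
    """Split one space-separated CDX line into its seven assumed fields."""
    parts = line.split()
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}: {line!r}")
    return CdxRecord(*parts[:6], int(parts[6]))


def lengths_by_digest(records):
    """Map (urlkey, digest) to sorted lengths, keeping only groups with more than one distinct length."""
    grouped = defaultdict(set)
    for rec in records:
        grouped[(rec.urlkey, rec.digest)].add(rec.length)
    return {key: sorted(vals) for key, vals in grouped.items() if len(vals) > 1}


# The two records quoted in the chat above.
SAMPLE = [
    "it,tip)/zenit/natale-6.jpg 20010725190547 http://www.tip.it:80/zenit/natale-6.JPG "
    "image/jpeg 200 PO53TOR6WEL4F3CFWDXP3OEUEY7F25NW 29785",
    "it,tip)/zenit/natale-6.jpg 20011230210923 http://www.tip.it:80/zenit/natale-6.JPG "
    "image/jpeg 200 PO53TOR6WEL4F3CFWDXP3OEUEY7F25NW 29773",
]

records = [parse_cdx_line(line) for line in SAMPLE]
varying = lengths_by_digest(records)

if __name__ == "__main__":
    for (urlkey, digest), lengths in varying.items():
        print(urlkey, digest, lengths)
```

A 12-byte difference between captures with the same payload digest is at least consistent with the guess that the length covers the compressed record, headers included, rather than the payload alone.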