-
fireonlivepabs: denada :)
-
lemuriahi there, is --level 2 a good option for crawling a wordpress site from 2014
-
h2ibotExorcism edited Bugzilla (+0, /* Status */): wiki.archiveteam.org/?diff=52976&oldid=52975
-
fireonlivelemuria: which program are you using? #archivebot might be a good choice instead to get it in the wayback machine (and you can grab the warcs after)
-
lemuriagrab-site, i forgot to say, fireonlive
-
lemuriathe archive was kinda OK-ish but fonts were missing and one of the images too, is that normal when grabbing sites?
-
h2ibotExorcism edited Bugzilla (+0, /* Status */): wiki.archiveteam.org/?diff=52977&oldid=52976
-
h2ibotExorcism edited Bugzilla (+0, /* Status */): wiki.archiveteam.org/?diff=52978&oldid=52977
-
h2ibotExorcism edited Bugzilla (+0, /* Status */): wiki.archiveteam.org/?diff=52979&oldid=52978
-
h2ibotExorcism edited Bugzilla (+0, /* Status */): wiki.archiveteam.org/?diff=52980&oldid=52979
-
h2ibotExorcism edited Bugzilla (+0, /* Status */): wiki.archiveteam.org/?diff=52981&oldid=52980
-
h2ibotExorcism edited Bugzilla (+0, /* Status */ aborted): wiki.archiveteam.org/?diff=52982&oldid=52981
-
h2ibotExorcism edited Bugzilla (+0, /* Status */): wiki.archiveteam.org/?diff=52983&oldid=52982
-
thubanasie, nullpeta, c3manu: as expected, my bruteforce did not turn up any sites not found in asie.pl/files/hp_vector_urls_20161012_plus.txt (except 403s)
-
h2ibotExorcism edited Bugzilla (+0, /* Status */): wiki.archiveteam.org/?diff=52984&oldid=52983
-
thubani can produce a list of the 403s if that's wanted but idk how useful it would be
-
thubanalso, c3manu, how did you run that list? `!ao <` or `!a <`?
-
c3manu!ao and !a <
-
thubanty!
-
c3manunp :)
-
c3manudepends on what causes the 403s. if AB can somehow get around that, it would be pretty useful ^^
-
thubani doubt it. my (100% speculative) guess is that they're sites set to private in some way by their authors
-
lemuriawhat site are we investigating the 403s for
-
lemuriamultiple domains?
-
thubanhp.vector.co.jp
-
thubanpersonally i'd really like to know more about the non-'VA\d{6}' authors. given that there were only two in the 2016 directory and that that directory was 99.9% complete, we're probably not missing much in that regard, but maybe somebody can grep CC/IA CDX and see if anything turns up?
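A minimal sketch of the check thuban describes, for anyone who wants to try it on a URL list or a CDX export: it pulls the author ID out of each URL and reports the ones that do not match `VA\d{6}`. The `/authors/<id>/` path layout and the helper names `author_id` and `non_va_authors` are assumptions made for illustration, not something confirmed in the log.

```python
import re
from urllib.parse import urlsplit

# Pattern quoted in the chat: "VA" followed by exactly six digits.
VA_PATTERN = re.compile(r"VA\d{6}")


def author_id(url):
    """Return the path segment after 'authors', or None if the URL has none.

    Assumes URLs look like hp.vector.co.jp/authors/<id>/... (hypothetical layout).
    """
    # urlsplit only recognises the host when a scheme is present,
    # so add one for bare entries such as "hp.vector.co.jp/authors/...".
    if "://" not in url:
        url = "http://" + url
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(segments) >= 2 and segments[0] == "authors":
        return segments[1]
    return None


def non_va_authors(urls):
    """Return the sorted, de-duplicated author IDs that are not VA + six digits."""
    found = set()
    for url in urls:
        ident = author_id(url.strip())
        if ident is not None and not VA_PATTERN.fullmatch(ident):
            found.add(ident)
    return sorted(found)
```

Feeding it one URL per line from a text file (for example `non_va_authors(open("urls.txt"))`, where `urls.txt` is a placeholder name) would list any candidate non-VA authors worth a closer look.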
-
h2ibotExorcism edited Bugzilla (+0, /* Status */): wiki.archiveteam.org/?diff=52985&oldid=52984
-
h2ibotExorcism edited Bugzilla (+28, /* Status */): wiki.archiveteam.org/?diff=52986&oldid=52985
-
h2ibotExorcism edited Bugzilla (+0, /* Status */): wiki.archiveteam.org/?diff=52987&oldid=52986
-
h2ibotExorcism edited Bugzilla (+0, /* Status */): wiki.archiveteam.org/?diff=52988&oldid=52987
-
h2ibotExorcism edited Bugzilla (+0, /* Status */): wiki.archiveteam.org/?diff=52989&oldid=52988
-
h2ibotExorcism edited Bugzilla (+0, /* Status */): wiki.archiveteam.org/?diff=52990&oldid=52989
-
h2ibotExorcism edited Bugzilla (+0, /* Status */): wiki.archiveteam.org/?diff=52991&oldid=52990
-
h2ibotExorcism edited Bugzilla (+0, /* Status */): wiki.archiveteam.org/?diff=52992&oldid=52991
-
h2ibotExorcism edited Bugzilla (+0, /* Status */): wiki.archiveteam.org/?diff=52993&oldid=52992
-
h2ibotExorcism edited Bugzilla (+0, /* Status */): wiki.archiveteam.org/?diff=52994&oldid=52993
-
h2ibotExorcism edited Bugzilla (+0, /* Status */): wiki.archiveteam.org/?diff=52995&oldid=52994
-
h2ibotExorcism edited Bugzilla (+0, /* Status */): wiki.archiveteam.org/?diff=52996&oldid=52995
-
h2ibotExorcism edited Bugzilla (+0, /* Status */): wiki.archiveteam.org/?diff=52997&oldid=52996
-
h2ibotExorcism edited Bugzilla (+0, /* Status */): wiki.archiveteam.org/?diff=52998&oldid=52997
-
h2ibotExorcism edited Bugzilla (+0, /* Status */): wiki.archiveteam.org/?diff=52999&oldid=52998
-
h2ibotExorcism edited Bugzilla (+0, /* Status */): wiki.archiveteam.org/?diff=53000&oldid=52999
-
h2ibotExorcism edited Bugzilla (+0, /* Status */): wiki.archiveteam.org/?diff=53001&oldid=53000
-
h2ibotExorcism edited Bugzilla (+18, /* Status */): wiki.archiveteam.org/?diff=53002&oldid=53001
-
h2ibotPaulWise edited SmolNet (+345, add link to mercury protocl doc, mention…): wiki.archiveteam.org/?diff=53003&oldid=52708
-
h2ibotExorcism edited Bugzilla (+27, /* Archived */): wiki.archiveteam.org/?diff=53004&oldid=53002
-
JaffaCakes118Could someone archive hlelo101.github.io with archivebot please (no coverage)
-
h2ibotExorcism edited Bugzilla (+38, /* Archived */): wiki.archiveteam.org/?diff=53005&oldid=53004
-
JAA(That's been handled in #archivebot since.)
-
h2ibotExorcism edited Bugzilla (+34, /* Archived */): wiki.archiveteam.org/?diff=53006&oldid=53005
-
h2ibotExorcism edited Bugzilla (+36, /* Archived */): wiki.archiveteam.org/?diff=53007&oldid=53006
-
h2ibotExorcism edited Bugzilla (+39, /* Archived */): wiki.archiveteam.org/?diff=53008&oldid=53007
-
h2ibotExorcism edited Bugzilla (+40, /* Archived */): wiki.archiveteam.org/?diff=53009&oldid=53008
-
h2ibotExorcism edited Bugzilla (+45, /* Archived */): wiki.archiveteam.org/?diff=53010&oldid=53009
-
h2ibotExorcism edited Bugzilla (+34, /* Archived */): wiki.archiveteam.org/?diff=53011&oldid=53010
-
arkiverJAA: what is sense even :P
-
h2ibotExorcism edited Bugzilla (+0, /* Status */): wiki.archiveteam.org/?diff=53012&oldid=53011
-
h2ibotBzc6p edited Demotivalo.net (-28, /* Sister sites */ kommenthuszar.com restored): wiki.archiveteam.org/?diff=53013&oldid=50595
-
lemuriaHELP VER
-
lemuria(sorry forgot the /)
-
katia/)
-
CookMePloxhi friends! i am wondering if anyone knows specifics about how the "length" field from wayback machine's cdx api is calculated
-
CookMePloxspecifically, I'm seeing a bunch of cases where the digest matches, but the length is different, for example
-
CookMePloxit,tip)/zenit/natale-6.jpg 20010725190547 tip.it:80/zenit/natale-6.JPG image/jpeg 200 PO53TOR6WEL4F3CFWDXP3OEUEY7F25NW 29785
-
CookMePloxit,tip)/zenit/natale-6.jpg 20011230210923 tip.it:80/zenit/natale-6.JPG image/jpeg 200 PO53TOR6WEL4F3CFWDXP3OEUEY7F25NW 29773
-
CookMePloxit's not obvious to me why the length would vary if the hash is the same. is the length maybe including some http headers that varied between responses, even though the response payloads were otherwise identical?
-
CookMePloxah, I see! the headers are preserved under X-Archive-Orig, and they are indeed different. so I think the length must be the compressed (gzip maybe?) size of the original entire network request, including headers
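To find more cases like the two lines CookMePlox pasted, a short sketch along these lines could group CDX rows by URL key and digest and report the groups whose `length` values differ. It assumes the seven space-separated fields seen in the pasted lines (urlkey, timestamp, original, mimetype, statuscode, digest, length) and a numeric length; the function names are made up for illustration.

```python
from collections import defaultdict

# Field order as it appears in the two pasted CDX lines (assumed, not verified
# against every CDX server output format).
CDX_FIELDS = (
    "urlkey", "timestamp", "original", "mimetype",
    "statuscode", "digest", "length",
)


def parse_cdx_line(line):
    """Split one space-separated CDX line into a dict, with length as an int."""
    parts = line.split()
    if len(parts) != len(CDX_FIELDS):
        raise ValueError(f"expected {len(CDX_FIELDS)} fields, got {len(parts)}")
    row = dict(zip(CDX_FIELDS, parts))
    row["length"] = int(row["length"])
    return row


def lengths_by_digest(lines):
    """Map (urlkey, digest) to its sorted lengths, keeping only groups that vary."""
    groups = defaultdict(set)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        row = parse_cdx_line(line)
        groups[(row["urlkey"], row["digest"])].add(row["length"])
    return {key: sorted(vals) for key, vals in groups.items() if len(vals) > 1}


SAMPLE = [
    "it,tip)/zenit/natale-6.jpg 20010725190547 tip.it:80/zenit/natale-6.JPG "
    "image/jpeg 200 PO53TOR6WEL4F3CFWDXP3OEUEY7F25NW 29785",
    "it,tip)/zenit/natale-6.jpg 20011230210923 tip.it:80/zenit/natale-6.JPG "
    "image/jpeg 200 PO53TOR6WEL4F3CFWDXP3OEUEY7F25NW 29773",
]

if __name__ == "__main__":
    for key, lengths in lengths_by_digest(SAMPLE).items():
        print(key, lengths)
```

Identical digests with different lengths are what one would expect if the digest covers only the payload while the length covers the whole stored, compressed record including headers, which matches the conclusion reached above.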
-
OrIdow6Wish more people could just answer their questions by looking at the list of users in the channel