02:46:00 Xaft edited List of websites excluded from the Wayback Machine (+19, added http://yanbe.net): https://wiki.archiveteam.org/?diff=52337&oldid=52324
03:01:02 JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=52338&oldid=52337
03:36:08 PaulWise edited SmolNet (+117, add 2007 gopherspace mirror link): https://wiki.archiveteam.org/?diff=52339&oldid=52334
03:42:06 -rss/#hackernews-firehose- WARC-GPT: An Open-Source Tool for Exploring Web Archives Using AI: https://lil.law.harvard.edu/blog/2024/02/12/warc-gpt-an-open-source-tool-for-exploring-web-archives-with-ai/ https://news.ycombinator.com/item?id=40614308
03:42:54 facepalm
10:58:07 Besides archive.org, has anyone attempted to host the Yahoo Answers archive as a navigable website? I recall seeing one or two examples of this a while ago, but I wasn't able to find them recently.
11:29:37 Are there any projects that need bandwidth that are not being rate limited right now?
11:54:11 aninternettroll-xmpp: nothing running currently as far as I know
11:54:23 Cool, thanks!
17:01:47 yarrow: There was at least one site, yeah. I don't recall the name either though.
17:19:28 pabs: Kinda disappointed that the HN thread doesn't have a billion comments cause that seems like peak HN to me :P
20:08:17 Does anyone happen to know if the Wayback Machine also records the IP address a page was retrieved from and if so, where that is visible?
20:09:02 I don't think so
20:09:57 in the case of pages archived via archiveteam's distributed grabber, the grabber's IP address is *not* recorded
20:10:16 and in rare cases where it is recorded because the *server* sends the IP address back in some header, we warn about it so that people who don't want to expose their IP address refrain from running the project
20:10:52 in the case of pages archived by Internet Archive's own crawlers, I don't *think* the client's IP is recorded but I'm not sure
20:11:18 The IP address of the server sending the data is recorded in the WARC as WARC-IP-Address, see http://bibnum.bnf.fr/WARC/WARC_ISO_28500_version1_latestdraft.pdf - but WARCs from IA's own crawls are generally not downloadable so I don't think you can see those
20:13:03 Yeah, I meant from the webserver, not from the crawler
20:18:06 oh the webserver's IP?
20:29:58 Yes
20:31:37 I don't think that's exposed anywhere other than in the WARC data.
20:36:05 took a quick look at the headers and don't see it there
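
For WARCs that are downloadable, the WARC-IP-Address field mentioned above can be read straight from the record headers. The following is a minimal sketch, not a full WARC parser: it assumes an uncompressed WARC stream, ignores folded (continuation) header lines, and uses only the standard library. The function names, the sample record, the example.com URI, and the 192.0.2.1 address are all hypothetical and exist only for illustration.

```python
import io


def iter_warc_headers(stream):
    """Yield a dict of header fields for each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank lines separate consecutive records
        if not line.startswith(b"WARC/"):
            raise ValueError("expected a WARC version line, got %r" % line)
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # an empty line ends the header section
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip().lower()] = value.strip()
        # Skip the record block so the next read starts at the next record.
        stream.read(int(headers.get("content-length", 0)))
        yield headers


def server_ips(stream):
    """Return (target URI, server IP) pairs for records carrying WARC-IP-Address."""
    return [
        (h.get("warc-target-uri"), h["warc-ip-address"])
        for h in iter_warc_headers(stream)
        if "warc-ip-address" in h
    ]


# Hypothetical sample record, built by hand for illustration only.
BLOCK = b"HTTP/1.1 200 OK\r\n\r\n"
SAMPLE = (
    b"WARC/1.0\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Target-URI: http://example.com/\r\n"
    b"WARC-IP-Address: 192.0.2.1\r\n"
    b"Content-Length: " + str(len(BLOCK)).encode("ascii") + b"\r\n"
    b"\r\n" + BLOCK + b"\r\n\r\n"
)

if __name__ == "__main__":
    for uri, ip in server_ips(io.BytesIO(SAMPLE)):
        print(uri, ip)
```

Real WARC files are usually gzip-compressed per record, so in practice a dedicated WARC library is likely the more robust choice; the sketch only shows where the field lives in a record.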