-
h2ibotXaft edited List of websites excluded from the Wayback Machine (+19, added yanbe.net): wiki.archiveteam.org/?diff=52337&oldid=52324
-
h2ibotJAABot edited List of websites excluded from the Wayback Machine (+0): wiki.archiveteam.org/?diff=52338&oldid=52337
-
h2ibotPaulWise edited SmolNet (+117, add 2007 gopherspace mirror link): wiki.archiveteam.org/?diff=52339&oldid=52334
-
pabs-rss/#hackernews-firehose- WARC-GPT: An Open-Source Tool for Exploring Web Archives Using AI: lil.law.harvard.edu/blog/2024/02/12…-for-exploring-web-archives-with-ai news.ycombinator.com/item?id=40614308
-
nicolas17facepalm
-
yarrowBesides archive.org, has anyone attempted to host the Yahoo Answers archive as a navigable website? I recall seeing one or two examples of this a while ago, but I wasn't able to find them recently.
-
aninternettroll-xmppAre there any projects that need bandwidth that are not being rate limited right now?
-
imeraninternettroll-xmpp: nothing running currently as far as I know
-
aninternettroll-xmppCool, thanks!
-
JAAyarrow: There was at least one site, yeah. I don't recall the name either though.
-
OrIdow6pabs: Kinda disappointed that the HN thread doesn't have a billion comments cause that seems like peak HN to me :P
-
wb9688Does anyone happen to know if the Wayback Machine also records the IP address a page was retrieved from and if so, where that is visible?
-
nicolas17I don't think so
-
nicolas17in the case of pages archived via archiveteam's distributed grabber, the grabber's IP address is *not* recorded
-
nicolas17and in rare cases where it is recorded because the *server* sends the IP address back in some header, we warn about it so that people who don't want to expose their IP address refrain from running the project
-
nicolas17in the case of pages archived by Internet Archive's own crawlers, I don't *think* the client's IP is recorded but I'm not sure
-
pokechu22The IP address of the server sending the data is recorded in the WARC as WARC-IP-Address, see bibnum.bnf.fr/WARC/WARC_ISO_28500_version1_latestdraft.pdf - but WARCs from IA's own crawls are generally not downloadable so I don't think you can see those
-
wb9688Yeah, I meant from the webserver, not from the crawler
-
nicolas17oh the webserver's IP?
-
wb9688Yes
-
JAAI don't think that's exposed anywhere other than in the WARC data.
-
fireonlivetook a quick look at the headers and don't see it there
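For anyone who does have a WARC file to hand, the WARC-IP-Address field mentioned above can be read straight out of the record headers. The following is a minimal sketch, not how the Wayback Machine or any archiveteam tool does it: the function names `iter_warc_headers` and `server_ips` are hypothetical, the sample record is abridged (a real record also carries fields such as WARC-Record-ID and WARC-Date), and the parser assumes an uncompressed stream with no folded header lines.

```python
import io


def iter_warc_headers(stream):
    """Yield a dict of WARC header fields for each record in an uncompressed WARC stream.

    Hypothetical helper: field names are lowercased because they are
    case-insensitive, and folded (continuation) header lines are not handled.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank lines that separate records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers = {}
        while True:
            line = stream.readline()
            if not line.strip():
                break  # blank line ends the header section
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip().lower()] = value.strip()
        # Skip the content block so the next read starts at the record separator.
        stream.read(int(headers.get("content-length", "0")))
        yield headers


def server_ips(stream):
    """Yield (target URI, server IP) for response records that carry WARC-IP-Address."""
    for headers in iter_warc_headers(stream):
        if headers.get("warc-type") == "response" and "warc-ip-address" in headers:
            yield headers.get("warc-target-uri"), headers["warc-ip-address"]


# Abridged, made-up record for illustration; 192.0.2.1 is a documentation address.
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Target-URI: http://example.com/\r\n"
    b"WARC-IP-Address: 192.0.2.1\r\n"
    b"Content-Length: 19\r\n"
    b"\r\n"
    b"HTTP/1.1 200 OK\r\n\r\n"
    b"\r\n\r\n"
)

print(list(server_ips(io.BytesIO(sample))))
# prints [('http://example.com/', '192.0.2.1')]
```

Most WARCs in the wild are gzip-compressed (.warc.gz); opening the file with Python's `gzip.open(path, "rb")` and passing the result to `server_ips` should work with this sketch, since the gzip module reads concatenated members as one stream, though that path is untested here.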