-
thetechrobo_
Yay, r/antiwork went private...
-
thetechrobo_
They say they'll be back soon tho
-
thetechrobo_
...guess what I'm downloading once it comes back up?
-
OrIdow6
That's fine wessel1512
-
JAA
TheTechRobo: You're aware that we're continuously archiving all of Reddit, right?
-
TheTechRobo
Right...I thought of that, but dismissed it for some reason.
-
pabs
JAA: outlinks too? is HN+outlinks also archived?
-
JAA
pabs: Yes, Reddit outlinks go into #//. Don't think we're covering HN anywhere at the moment, but it's small enough to probably not need a distributed project.
-
JAA
It's been on my list to investigate that for a while.
-
JAA
But you know how that goes...
-
pabs
:)
-
OrIdow6
I do believe I remember seeing an IA non-continuous crawl of HN and its outlinks somewhere
-
arkiver
aiming for a start of the pinger.pl project tomorrow
-
DopefishJustin
-
OrIdow6
Respectively: "An important in-game announcement regarding future plans for the game has been published. We ask that users login to confirm the contents of the announcement. #a3game"
-
OrIdow6
" PLEASE BOOST THIS WE R ARCHIVING A3 EN Guys !! I already have an archives discord up so if u wanna help me archive stuff please do join
discord.gg/tGEnmadD"
-
OrIdow6
-
OrIdow6
*on
-
OrIdow6
Any more info DopefishJustin?
-
DopefishJustin
nope
-
OrIdow6
Oh
-
OrIdow6
Yet another forum
-
OrIdow6
Looks like www. and straight domain are slightly different
-
OrIdow6
If someone wants me in particular to look at this I will do so tomorrow
-
OrIdow6
Same goes for A3, will investigate and add to DW tomorrow if no one else does
-
ohyes
OrIdow6, fyi, there is also 207.148.109.12 besides the www. and straight domain
-
Hifihedgehog
Hello. I have a question regarding the TechnologyGuide data, which has been posted on Archive.org. Will this also be back-populated into the Wayback Machine?
-
arkiver
yes
-
Hifihedgehog
Cool. Thanks for confirming and thanks for your work!
-
arkiver
JAA: those thanks are for you ^
-
JAA
:-)
-
JAA
Regarding the TechnologyGuide websites: technologyguide.com is done, notebookreview.com and brighthand.com are running in AB currently, digitalcamerareview.com and tabletpcreview.com have not been covered at all yet.
-
JAA
The rate limiting on the sites works differently than on the forums.
-
daxxy
hi, is there a list of the technologyguide forum URLs broken by the WAF?
-
daxxy
I'm thinking of grabbing the contents over the tapatalk API - no use for WBM, but at least they'd be *somewhere*
-
JAA
Hmm, good idea. Maybe it can be put into the WBM in some form even. Do you have more details on that API?
-
JAA
There isn't a list of the broken threads currently, but I can get one later.
-
daxxy
honestly I'm not even sure you can get posts via GET requests from it, internally it's almost all POSTs with parameters in bodies
-
daxxy
used to be xmlrpc, they added JSON later on
-
JAA
I see.
-
JAA
Well, POST can still be saved to WARC and go into the WBM. Maybe we can add a URL parameter that gets ignored by the API and serves as the topic identifier in the API, like we did for the YouTube dislikes data.
-
daxxy
oh, neat
-
JAA
Is the API open or does it require auth?
-
daxxy
I think everything we'd need is open
-
JAA
Nice
-
daxxy
the serverside implementation is freely available and all unobfuscated PHP, that's my reference:
tapatalk.com/download_xenforo
-
daxxy
(note, xenforo 1)
-
daxxy
tl;dr: e.g. /mobiquo/mobiquo.php?method_name=get_forum (for xmlrpc) and /mobiquo/tapatalk.php?method_name=get_forum (for json) are handled by mobiquo/mbqAction/MbqActGetForum.php, etc.
-
daxxy
one very neat thing about it is, there's no limit on the results/page parameter, you can request as much as you want at once until you hit httpd/php side timeouts/size limits
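-
(editor's sketch)
A minimal sketch of how the two endpoint forms daxxy quotes could be built, with room for the extra "ignored" URL parameter JAA suggests as a topic marker. Only the two paths and `method_name=get_forum` come from the chat. The base URL, the `tapatalk_url` helper and the `at_topic` parameter name are hypothetical, and whether the API really ignores unknown query parameters is an assumption that would need checking against the PHP source.

```python
from urllib.parse import urlencode, urljoin

# Endpoint paths as quoted in the chat; everything else here is illustrative.
ENDPOINTS = {
    "xmlrpc": "/mobiquo/mobiquo.php",
    "json": "/mobiquo/tapatalk.php",
}


def tapatalk_url(base, method_name, fmt="json", **extra):
    """Build a request URL for the given method.

    `extra` holds additional query parameters, e.g. a hypothetical marker
    that the API is assumed to ignore but that makes each archived
    request URL unique per topic.
    """
    params = {"method_name": method_name, **extra}
    return urljoin(base, ENDPOINTS[fmt]) + "?" + urlencode(params)


if __name__ == "__main__":
    base = "https://forum.example"  # placeholder host, not a real forum
    print(tapatalk_url(base, "get_forum", fmt="xmlrpc"))
    print(tapatalk_url(base, "get_forum", at_topic="12345"))
```

The marker goes in the query string rather than the POST body so that each saved request gets its own URL, which is the property JAA describes wanting for the WBM.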
-
cadence
hi there! I had a question about the 2009 geocities crawl - I thought it could be fun to download some whole sites from the collection so I can quickly navigate through them, offline, using a text mode browser, rather than having to go via the wayback machine interface. however, seems as though download is disabled for the warc files in the collection?
-
cadence
if there's an older collection of geocities, I'm totally okay going back to it, since I'm looking to browse rather than to have a complete archive.
-
cadence
if this is infeasible and I am being a fool, please let me know :^)
-
OrIdow6
cadence: What is this collection you are looking at?
-
cadence
-
daxxy
that's IA's crawl, *Archive Team's* crawl is openly available, see
wiki.archiveteam.org/index.php/GeoCities
-
OrIdow6
I believe those are archive.org crawls, not ArchiveTeam crawls
-
OrIdow6
Archive.org does not usually release its raw WARCs, and I don't think ArchiveTeam used WARCs for the geocities project
-
cadence
my bad! I'll check out the link you posted, thanks
-
OrIdow6
yeah
-
h2ibot
Adrmcr edited Game Jolt (+137, added tracker and grab links):
wiki.archiveteam.org/?diff=48222&oldid=48157
-
h2ibot
Fidel edited List of websites excluded from the Wayback Machine (+24):
wiki.archiveteam.org/?diff=48223&oldid=48213
-
cadence
I see, thanks so much for this link. so there's the patched torrent, which I can pretty easily figure out how to download. the page also mentions an art project which is apparently "far superior", though this appears to be a series of screenshots rather than a data dump. am I correct?
-
cadence
checking blog.geocities.institute/about -- afaict, the patched torrent is the one to get. thanks for the help!
-
cadence
awesome, thanks everyone <3