-
pabs
Peroniko: ArchiveBot doesn't use IA IPs, so might be feasible to save via that if the site isn't massive
-
pabs
although if they are blocking IA, they would probably just request removal of the site from IA too
-
Peroniko
I would say it's at least 5 million links
-
pabs
probably doable, what about in GB?
-
Peroniko
because they add releases of published music, labels, artist...
-
Peroniko
I would say pictures would be largest part, they don't allow cover pictures above 10MB, other parts of the site are light, because site still has the spirit of 2005 with some modern tech
-
Peroniko
I think I undersold it, checking now and releases alone are coming up to 10 million pages, so 20 million should be closer to the final number of pages
-
pabs
there are AB jobs running right now that got more requests than that, 47 million is the largest
-
TheTechRobo
page requisites might be a problem though?
-
TheTechRobo
assuming 20 million doesn't include them
-
Peroniko
no, but it seems that the only third-party ones are googletagmanager and their image CDN. Some of their CSS links are on the same CDN, but they seem to use inline styles, so that should reduce the number of links.
-
nicolas17
fireonlive arkiver: I found two more, YV-003900000-003999999.tar YV-009900000-009999999.tar YV-003200000-003213887.tar are all friendster stuff rather than yahoo videos
-
h2ibot
TheTechRobo edited Current Projects (-121, Remove Ownlog from upcoming projects as it is dead):
wiki.archiveteam.org/?diff=50777&oldid=50759
-
thuban
did we ever get anything for ownlog, or should the status be changed to "lost"?
-
h2ibot
VoynichCr edited Svalbard Global Seed Vault (+50):
wiki.archiveteam.org/?diff=50778&oldid=47903
-
plcp
pokechu22: thanks
-
flashfire42
OI SOMEONE GIVE THE GITHUB PROJECT A TARGET
-
flashfire42
Could not find a target.
-
flashfire42
Could not get rsync target.
-
flashfire42
Failed to upload, retrying...
-
Rootliam
nicolas17: Do you have any idea when all the file lists will be done and will you be uploading them to IA?
-
nicolas17
I finished all that I could do with the fast script that skips data
-
nicolas17
now I still have to do a few .tar.bz2, and one of the .tar that actually has friendster data
-
nicolas17
(the files inside are small so there's almost no benefit to doing range requests)
-
Rootliam
Alright, do any of them have videos from user id 375869?
-
nicolas17
not seeing any
-
nicolas17
but those friendster tars make me think something got mixed up when uploading, and maybe the "correct" yahoo videos tar with that name still exists somewhere?
-
Rootliam
Who would be the person to ask about those though?
-
thuban
probably the people listed here, if you can track them down
wiki.archiveteam.org/index.php/Yahoo!_Video_Warroom
-
JAA
Have you checked whether there's perhaps something in the Friendster data?
-
arkiver
what do we know about FOIAonline? i see in the PDF that people need to contact the "Partners" to see where documents will be accessible
-
arkiver
what do we know about that? is there a new central data repository?
-
Rootliam
JAA: My program didn't find a single video file in the one of those that I checked, idk about the other ones
-
Rootliam
SketchCow: You're the one who packaged/uploaded the Yahoo Video archives right?