-
OrIdow6
Will see about doing estimates on Google Drive
-
OrIdow6
Will be naive, but may give an idea of how big the problem is
-
OrIdow6
Confusing this with Mediafire
-
arkiver
OrIdow6: so you were actually talking about mediafire or?
-
OrIdow6
arkiver: No
-
OrIdow6
Just remarking that it's structurally similar
-
arkiver
on google drive - not sure how much we can do on that
-
arkiver
right
-
arkiver
do we have URLs?
-
OrIdow6
Yeah I suspect it will be big
-
OrIdow6
All I've done so far is download some WBM CDX listings
-
OrIdow6
So sort of, but not really, since there's a lot of processing to do on those
-
arkiver
OrIdow6: yeah we're likely looking at 100s of TBs
-
arkiver
if we find only a select number of URLs, we could give it a try
-
OrIdow6
At least 100s
-
arkiver
but i'm not sure about spending 100s of TBs on this
-
arkiver
at least youtube will be nicely playable through the wayback machine
-
Sanqui
interesting...
-
OrIdow6
Anyhow, on Google Drive, what I plan to do is get a little prototype for the grab process written first, then run that on a sample of URLs
-
arkiver
sure
-
arkiver
OrIdow6: do you have a sample of URLs available? do we know what type of data this is
-
arkiver
and well, i think we'll not get 100s of TBs of google drive to be honest - unless we find very interesting data sources
-
arkiver
OrIdow6: any rough estimate btw when you might have something to check? no worries if not yet
-
OrIdow6
arkiver: Here are some random lines from WBM CDX, this is only from 1 URL format of at least 2
transfer.archivete.am/WSejD/some_gdrive_from_cdx
-
OrIdow6
No rough estimate yet
-
OrIdow6
Got my attention dragged away from it fairly quickly
-
h2ibot
Usernam edited List of websites excluded from the Wayback Machine (+37):
wiki.archiveteam.org/?diff=47028&oldid=46979
-
h2ibot
TheTechRobo edited Deathwatch (+176, add Enderman):
wiki.archiveteam.org/?diff=47029&oldid=47019
-
h2ibot
Beano edited YouTube (+1328, Archival incentives):
wiki.archiveteam.org/?diff=47032&oldid=47024
-
h2ibot
Gridkr edited Coronavirus (+106, /* Thailand */):
wiki.archiveteam.org/?diff=47033&oldid=47005
-
JensRex
The Danish version of The Daily Stormer is shutting down -->
dagensblaeser.net
-
JensRex
"Turning off saturday evening 7th august."
-
JensRex
I gotta head out for a bit, but I just came across this.
-
rewby
Oh that's not good
-
JensRex
They had a podcast as well with 156 episodes as far as I can tell.
-
JensRex
Really gone this time, I'm super duper late...
-
arkiver
JensRex: thanks, it's in archivebot
-
TheTechRobo
I'm running grab-site, and my wpull process seems to have crashed
-
TheTechRobo
Or at least frozen
-
TheTechRobo
I made my computer fall asleep, forgetting that it was running.
-
TheTechRobo
I tried disabling the Internet for a few seconds to try to get a Broken pipe error, to no avail. I also tried pausing the process for a few seconds to get a timeout, also to no avail.
-
TheTechRobo
Is there any way to fix this?
-
TheTechRobo
Ah, nevermind...it seems to have continued just as I asked!
-
JensRex
rewby: I don't think they'll be missed.
-
JensRex
From what I can gather, they're shutting down because they aren't getting very much traffic. At least one of the authors hinted at that being an issue previously.