-
pabs
death sentence for twitter/youtube stuff
-
nicolas17
youtube project is weird, my VPS downloaded 33GB and uploaded 6GB
-
nicolas17
is it downloading videos and throwing them away based on some criteria instead of uploading?
-
audrooku|m
I was under the impression it only scraped metadata; if it's grabbing the YouTube webpage, for example, it can throw away all of the HTML and just upload a JSON zst
-
JAA
→ #down-the-tube
-
JAA
pabs: Hmm, can't find him, but it might all be in Arabic of course.
-
nicolas17
audrooku|m: it's certainly making 100-1000MB downloads
-
fireonlive
"Looks like the Japanese forum is going offline too:
keisan.casio.jp/keisan/user_forum"
-
pokechu22
Larsenv: looks like the forum in question is discussions.apple.com/browse - that's probably worth saving, but we might want to wait a few days for the Gabon stuff to settle down (and the other urgent stuff too). It doesn't sound like they'll be immediately deleting the forum on October 1, just stopping employee posting on it
-
pokechu22
I don't recognize the forum software in use so it might be a bit of a mess for archivebot
-
nicolas17
probably custom?
-
Larsenv
What is Gabon
-
pokechu22
Do we actually know how big it is?
-
Larsenv
It's probably 20 years of history!
-
cas
oh? what is?
-
fireonlive
damn it
-
fireonlive
but yeah, it's been around forever it feels
-
fireonlive
hmmmm
-
fireonlive
not encouraging...
-
fireonlive
latest activity: oldest is something from 5 months ago
-
fireonlive
did they swap software/turn on auto-prune at some point?
-
fireonlive
so it exists somewhere
-
nicolas17
if I sort by "recently created", it only lets me get to page 103
-
nicolas17
which has posts from *one day ago* so the forum seems to get 2000 threads per day?
-
cptcobalt
I’m sad that the 10000 or whatever post is not from someone ID hunting
-
nicolas17
anyway if threads have sequential IDs, why bother with /browse
-
fireonlive
yeah seems to be
-
fireonlive
there's some threads that ask me for login
-
fireonlive
and some just 404
-
fireonlive
but
-
fireonlive
randomly playing with numbers
-
fireonlive
so many unanswered questions :p
-
pokechu22
Looking at the recent ones, they don't seem to be fully sequential; in fact, discussions.apple.com/thread/255097976 is listed before discussions.apple.com/thread/255097973 - but there might be some kind of spam queue that needs manual approval to escape or something?
-
fireonlive
also 200001's reply links have 'answerid' params.. 987106022 for the first reply
-
fireonlive
but can be ignored i think
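Since thread IDs look roughly sequential (with gaps, login walls and 404s, as noted above), a job could be seeded from an ID range instead of paging through /browse. A minimal sketch, assuming the `https://discussions.apple.com/thread/<id>` pattern seen above holds across the whole range and that `answerid` is an ordinary query parameter on reply links (the exact reply-link format wasn't pasted here); `thread_urls` and `strip_answerid` are hypothetical helper names:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: every thread lives at this path with a numeric ID.
BASE = "https://discussions.apple.com/thread/"


def thread_urls(start_id: int, end_id: int):
    """Yield a candidate thread URL for every ID in [start_id, end_id], inclusive."""
    for thread_id in range(start_id, end_id + 1):
        yield f"{BASE}{thread_id}"


def strip_answerid(url: str) -> str:
    """Drop the 'answerid' query parameter so reply links collapse onto the thread URL.

    Any other query parameters are kept in their original order.
    """
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "answerid"]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment))
```

Dead or login-only IDs will still be requested, so the size of the ID range is an upper bound on the work, not a thread count.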
-
h2ibot
FireonLive edited Xuite (+4, not so "Xuite" news, Xuite is offline):
wiki.archiveteam.org/?diff=50630&oldid=50629
-
h2ibot
Yts98 edited Xuite (+77, Xuite goes offline):
wiki.archiveteam.org/?diff=50631&oldid=50630
-
h2ibot
FireonLive edited Current Projects (+30, move Xuite to done):
wiki.archiveteam.org/?diff=50632&oldid=50617
-
fireonlive
auto project is Xuite still, dunno if there's a better choice
-
flashfire42
are any other projects currently active and feeding items? telegram maybe?
-
h2ibot
FireonLive edited Current Projects (+18, linkify YouTube's 'selected videos'):
wiki.archiveteam.org/?diff=50633&oldid=50632
-
imer
telegram is empty again
-
imer
reddit once arkiver gets time to look at the image stuff I guess
-
fireonlive
imgur is going to have a temporary startup as well
-
fireonlive
but yeah reddit will need quite the backlog crunch
-
flashfire42
/tableflip I guess I will find something to feed into something then
-
flashfire42
do you want more youtube videos? Mediafire chews it up way too quickly and I have exhausted my telegram stuff
-
fireonlive
if they fit within scope i don't see the harm, but i assume we'll chew through them quite quickly
-
nicolas17
there's still some youtube reclaims
-
nicolas17
and I have a ton of free bandwidth left on my VPS so I'm getting my money's worth :P
-
fireonlive
:D
-
flashfire42
so youtube is the only one running? plus the dregs of gfycat?
-
imer
telegram is also running, but empty currently
-
imer
but that's about it yeah
-
cas
littlewitchacademia.jp/tv1st Requesting to queue this site for archival, if possible. Not necessarily urgent, but Bandai Namco deleted their page pertaining to the anime a while back (web.archive.org/web/20180119174059/…itch-academia/little-witch-academia), so I hope the site as-is can be archived to preserve its content.
-
flashfire42
JAA do you have the way to turn off the rearchive everything thing on telegram?
-
JAA
flashfire42: No, also wrong channel.
-
flashfire42
you and I both know things get buried in #telegrab, so I wanted to be sure it was seen
-
JAA
I read my pings. :-)
-
JAA
(And I expect others to do so, too, regardless of other noise.)
-
rewby
nstrom|m: I don't see an announcement on dedipath's site about the shutdown
-
JAA
Apparently just an email, and people on LET are reporting sender verification failures:
lowendtalk.com/discussion/188358/dedipath-closure-of-business
-
nstrom|m
was just about to post that over here. yeah I got the email from them because I have a server as well, can verify the contents of that post
-
flashfire42
if we need more work for warriors, poke me when I am not working or sleeping and I will toss some more into whatever you want
-
bladem
Dedipath confirmed via email to LES moderator that they are shutting down entirely today:
lowendspirit.com/discussion/comment/148585/#Comment_148585
-
JAA
VirMach is using DediPath for a lot of their locations. I wonder what other providers either have their stuff colo'd there or are just reselling.
-
nstrom|m
I *think* dedipath didn't own the colocation facilities, just had ASN (as35913) & hardware
-
nstrom|m
I could be wrong on that though
-
nstrom|m
I know there were definitely resellers of dedipath VPSes. .ethernetservers.com was reselling dedipath in NJ but recently switched providers there after a datacenter fire
-
nstrom|m
they had space in 10 datacenters and their ASN announces a pretty big chunk of addresses so I'm sure there will be lots of affected customers in any case
-
nstrom|m
(they = dedipath in above)
-
JAA
nstrom|m: They mention colo stuff in the email though...?
-
JAA
'In regards to our colocation customers if you are in the following locations please send a ticket to ...'
-
nstrom|m
yeah I think the companies they tell colo customers to contact are the companies that actually own the datacenters
-
nstrom|m
so it's probably something like "move your stuff off of the dedipath racks onto some other rack in the same facility if you want to stay"
-
nstrom|m
if I had to guess
-
nstrom|m
I'm not a colo customer there
-
JAA
Ah
-
rewby
DCs usually only like to sell in units of 1 or more racks
-
rewby
As in, entire racks
-
rewby
So there's an ecosystem of companies that basically sublease racks
-
rewby
They get a rack in a DC (or even an entire cage)
-
rewby
And then rent out individual RU with power and network
-
rewby
So if dedipath had colo customers, this is likely what they did
-
rewby
The DC would just drop power and maybe one or two fiber feeds into the rack
-
rewby
And then the company puts in a top-of-rack switch, a PDU and brings more IPs
-
JAA
Oh yeah, as someone pointed out, LET still has a DediPath ad at the bottom. Beautiful.
-
immibis
"Summer Sale - 20% off select dedicated servers from just $36/m + save 20% on all VPS and web hosting! Click Here to Save Now!"
-
immibis
save because they will never deliver any service or bill you?
-
qyxojzh|m
<immibis> "Summer Sale - 20% off select..." <- Is that a scam email or a genuine sale?
-
immibis
that's the banner at the top of dedipath.com
-
immibis
which is apparently going out of business today
-
immibis
according to the discussion above
-
arkiver
44 TB of items with WARCs in the archiveteam collections have mediatype 'data' - this will make them unavailable in the Wayback Machine
-
arkiver
i'm moving these to mediatype=web
-
arkiver
after this i'll do a check to see if all WARCs have actually been derived - i already see quite a few that have not been derived
-
arkiver
now rederiving some 113 TB of items with WARCs that may not be correctly indexed yet
-
arkiver
(that is 30k items)
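A rough sketch of the two checks described above (WARC items with the wrong mediatype, and WARC items that look underived), assuming item records shaped like the JSON from IA's metadata endpoint (`https://archive.org/metadata/<identifier>`, with `metadata` and `files` keys). Treating a missing `.cdx.gz` file as "not derived" is an assumption for illustration, not how IA itself tracks derive status, and the helper names are hypothetical:

```python
# Assumption: WARCs are identified purely by file name suffix.
WARC_SUFFIXES = (".warc.gz", ".warc")


def warc_files(item: dict) -> list:
    """Return the names of all WARC files listed in an item record."""
    return [f["name"] for f in item.get("files", []) if f.get("name", "").endswith(WARC_SUFFIXES)]


def needs_mediatype_fix(item: dict) -> bool:
    """True if the item holds WARCs but is not marked mediatype=web."""
    return bool(warc_files(item)) and item.get("metadata", {}).get("mediatype") != "web"


def looks_underived(item: dict) -> bool:
    """True if the item holds WARCs but no .cdx.gz index (assumed derive output)."""
    names = [f.get("name", "") for f in item.get("files", [])]
    return bool(warc_files(item)) and not any(n.endswith(".cdx.gz") for n in names)
```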
-
JAA
Yay, more work for IA's overloaded systems. :-)
-
arkiver
or... more records in the Wayback Machine without having to upload new data :P
-
arkiver
more completeness yay
-
JAA
But great to fix those. I hope they weren't intentionally marked with a different mediatype. Probably not many of those in the AT collection though.
-
arkiver
most of these were 2014 or 2015 items from 'before AB'
-
arkiver
usually uploaded by a single user
-
JAA
Ah, the dark ages.
-
fireonlive
😰
-
arkiver
they were initially uploaded as mediatype=data
-
arkiver
(there were also some cases in which other items seemed to have been accidentally uploaded as mediatype=data)
-
arkiver
but those are indeed pretty old, ~9 years or so
-
arkiver
at this point in time, WARCs that should not go into the Wayback Machine are in archive.org/details/warczone
-
arkiver
meanwhile we also still have archive.org/details/archiveteam-mobileme-hero, which contains 282 TB of WARCs... inside tar files :/ so also not in the Wayback Machine
-
arkiver
basically that entire project is invisible to most users
-
fireonlive
hey my warcs are in the zone t_t
-
fireonlive
and i thought i was special! :P
-
fireonlive
but hmm interesting
-
fireonlive
“This massive collection represents one of the largest projects Archive Team may ever do: Over 272 terabytes…”
-
fireonlive
:)
-
TheTechRobo
lmao
-
TheTechRobo
urls project: 5.03PiB
-
JAA
I've uploaded WARCs in TARs before, precisely so they don't accidentally get derived and put in the WBM.
-
arkiver
but i think we want the mobileme collection to be indexed?
-
JAA
Assuming there were no auth shenanigans involved, probably.
-
HCross
oh arkiver I see you found the rest of XS4ALL :P
-
HCross
is there a way to unpack those TAR files inside the IA, or do we need something to pull them down and reprocess them?
-
arkiver
HCross: might have to pull them down :/
-
arkiver
there's more than just WARCs. it might be best to pull them down, just pull out the WARCs and put those WARCs back in together with tars containing leftovers
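A sketch of the "pull out the WARCs, repack the leftovers" step, assuming the tars have already been downloaded locally and that members ending in `.warc` or `.warc.gz` are the WARCs. `split_tar` is a hypothetical helper, not an existing ArchiveTeam tool:

```python
import os
import shutil
import tarfile

# Assumption: WARCs are identified purely by file name suffix.
WARC_SUFFIXES = (".warc.gz", ".warc")


def split_tar(src_path: str, warc_dir: str, leftovers_path: str) -> list:
    """Extract WARC members into warc_dir and repack everything else into leftovers_path.

    Returns the names of the extracted WARC files.
    """
    os.makedirs(warc_dir, exist_ok=True)
    extracted = []
    with tarfile.open(src_path, "r:*") as src, tarfile.open(leftovers_path, "w") as rest:
        for member in src:
            if member.isfile() and member.name.endswith(WARC_SUFFIXES):
                # Flatten to the basename so a crafted member path cannot escape warc_dir.
                out_name = os.path.basename(member.name)
                out_path = os.path.join(warc_dir, out_name)
                if os.path.exists(out_path):
                    raise FileExistsError(out_path)
                with src.extractfile(member) as fin, open(out_path, "wb") as fout:
                    shutil.copyfileobj(fin, fout)
                extracted.append(out_name)
            else:
                # Everything else (other files, directories, links) goes into the leftovers tar.
                fileobj = src.extractfile(member) if member.isfile() else None
                rest.addfile(member, fileobj)
    return extracted
```

Members are streamed one at a time, so memory use stays flat even for large tars; a duplicate WARC basename raises instead of silently overwriting an earlier file.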
-
fireonlive
xs4all now that’s a name i haven’t heard in ages
-
rewby
Me and HCross made an effort to archive their ISP hosting
-
rewby
Was a bit of a pain
-
rewby
And completeness is ???
-
rewby
But it's something
-
arkiver
it's a very good "something" :)
-
rewby
?
-
rewby
It was me having a go at heritrix
-
arkiver
how was the experience?
-
rewby
Good and bad?
-
rewby
It needed some code tweaks to do what I wanted it to do
-
rewby
And I ended up doing some cursed sharding
-
rewby
To make it go faster
-
rewby
Took a while to get my head around
-
rewby
So really not any worse or better than most tools
-
rewby
I ended up modifying some of the code around frontier management
-
rewby
Because I needed it to do some cursed things to deal with xs4all
-
rewby
One thing that was a pain was that at some point xs4all converted from xs4all.net/~user (or something similar) to user.xs4all.net (or something) (I forget the exact subdomains involved, but they went from ~user to subdomain)
-
rewby
And while the redirect existed
-
rewby
I had to hack a few checks out of the frontier code
-
rewby
Because I didn't want to grab many outlinks
-
rewby
But I needed it to go through to the ~user links on a different subdomain
-
rewby
And follow the redirects
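The scope rule described above (stay narrow, but still follow ~user links through to the per-user subdomains) can be sketched in plain Python. This is not Heritrix code, and the host forms are hypothetical, taken from the "or something similar" description above since the exact xs4all hostnames aren't given; `legacy_to_subdomain` and `in_scope` are illustrative names:

```python
import re
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Hypothetical host forms; the real xs4all hostnames may differ.
LEGACY_HOST = "xs4all.net"
USER_SUFFIX = ".xs4all.net"

# Matches "/~user" optionally followed by a path.
_TILDE_PATH = re.compile(r"^/~([A-Za-z0-9_.-]+)(/.*)?$")


def legacy_to_subdomain(url: str) -> Optional[str]:
    """Map http://xs4all.net/~user/page to http://user.xs4all.net/page; None if not a ~user URL."""
    parts = urlsplit(url)
    if parts.hostname != LEGACY_HOST:
        return None
    m = _TILDE_PATH.match(parts.path)
    if not m:
        return None
    user, rest = m.group(1), m.group(2) or "/"
    return urlunsplit((parts.scheme, user + USER_SUFFIX, rest, parts.query, parts.fragment))


def in_scope(url: str) -> bool:
    """Allow legacy ~user pages and any *.xs4all.net subdomain; reject other outlinks."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host == LEGACY_HOST:
        return _TILDE_PATH.match(parts.path) is not None
    return host.endswith(USER_SUFFIX)
```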
-
rewby
It ended up working reasonably well
-
rewby
Also iframes did some cursed stuff I don't remember
-
JAA
Are those patches/hacks available somewhere?
-
rewby
No, I don't even know if I still have them
-
JAA
Oof
-
rewby
I mean, it's not that hard
-
rewby
The source code is on github
-
rewby
Find the frontier code and just fuck around and find out
-
rewby
I will point out I never touched the actual retrieval and warc writing
-
rewby
I just fucked around with link discovery and frontier
-
JAA
Yeah, I just like this ideal world where the code for the archival is also freely available for anyone to figure out why the data is the way it is.
-
rewby
Yeah I get that
-
rewby
But genuinely, it was a copy paste hack job in a bunch of places
-
fireonlive
ahh :)
-
rewby
also, I can't actually share the seed url list
-
fireonlive
ah yes a .nl isp :3. probably saw it on irc way back when
-
h2ibot
0KepOnline edited VHS on YouTube (+253, Added Ukrainian VHS on YT):
wiki.archiveteam.org/?diff=50635&oldid=37007
-
h2ibot
0KepOnline edited Local TV News (+2215, /* Europe */ Ukraine):
wiki.archiveteam.org/?diff=50636&oldid=48641
-
h2ibot
Rexma edited List of websites excluded from the Wayback Machine/Partial exclusions/Twitter accounts (+38, add account):
wiki.archiveteam.org/?diff=50637&oldid=49984
-
h2ibot
Rob Kam edited WikiTeam (+142, The MediaWiki comparison of wiki farms is more…):
wiki.archiveteam.org/?diff=50638&oldid=50537
-
fireonlive
"Volition staff on Twitter are reporting that parent company Embracer has just closed the 30-year-old studio behind Saint's Row and Red Faction, with mass layoffs #VolitionJobs"
-
nicolas17
ran telegram at high concurrency... pop went the modem
-
arkiver
:P
-
arkiver
well at least you're getting the maximum out of your modem :)
-
nulldata
fireonlive - that's so fucking sad
-
fireonlive
yeah :(
-
nulldata
Even sadder that the Saints Row 2 PC patch IdolNinja was so passionate about and organized before he died of cancer very likely won't see the light of day either
-
fireonlive
goddamn
-
nulldata
It's almost as if one big company spending billions in debt buying up all the other publishers and developers isn't such a great thing for the stability of everyone involved 🤔
-
flashfire42|m
Just gave telegram a bit more work. Will queue more soon