-
pabs
"Bandcamp workers say they are unable to do their jobs after being locked out of critical systems. They’re also expecting layoffs."
-
pabs
/cc arkiver JAA :)
-
audrooku|m
So when are we grabbing bandcamp boys
-
audrooku|m
don't let em off easy like soundcloud
-
fireonlive
didn't like soundcloud threaten archiveteam
-
audrooku|m
IA, so basically yes
-
audrooku|m
a 128k mp3 BC grab would certainly be less than 1PB
-
audrooku|m
nevermind probably more like 1.5
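
For scale, a rough sketch of what that bitrate implies. It assumes constant 128 kbit/s MP3 and decimal petabytes (10^15 bytes); the function name is made up for illustration, and no figure for Bandcamp's actual catalogue size is assumed:

```python
def audio_hours_per_storage(storage_bytes: float, bitrate_kbps: float = 128) -> float:
    """Hours of constant-bitrate audio that fit in storage_bytes."""
    # 128 kbit/s = 128,000 bits/s = 16,000 bytes/s
    bytes_per_second = bitrate_kbps * 1000 / 8
    return storage_bytes / bytes_per_second / 3600


PB = 10**15  # decimal petabyte (assumption; a binary PiB would be 2**50)

for size_pb in (1, 1.5):
    hours = audio_hours_per_storage(size_pb * PB)
    print(f"{size_pb} PB holds about {hours / 1e6:.1f} million hours of 128 kbit/s audio")
```

Dividing the real total runtime of the catalogue (unknown here) by these figures would show which of the two estimates is closer.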
-
mgrandi
re Xentax: is there a way to get the files if the main page is not working? that XML dump that i requested, which they thankfully posted, doesn't have the files
-
pokechu22
ia802609.us.archive.org/view_archiv…e=wikixentaxcom-20230513-images.txt) but there isn't an easy way to list files with the index not working (and the api not supporting json)... but it looks like it supports JSON now so hmm
-
pokechu22
mgrandi: ok, I got an image only dump:
archive.org/details/wiki-wiki.xentax.com-20231008 (it doesn't seem like wikibot wants to dump the non-image content now, so that's fun)
-
thuban
bandcamp band and item ids appear to be nonsequential 10-digit numbers, but there's a "full artist index":
bandcamp.com/artist_index
-
thuban
(other potential discovery sources include the "discover" endpoint at
bandcamp.com/api/discover/3/get_web, although each query is limited to ~4.3k results, and the in-html recommendations on each item page--but i doubt either would include anything somehow absent from the index)
-
mgrandi
@pokechu22 that's good that at least we got something from this year, but i meant that the main page of the wiki apparently fails to render, and i think they said that the PHP version is out of date or something, so i'm not sure we can get anything from the latest version
-
mgrandi
huh, i guess if that api.php page works then i guess the wikibot tools still work, neat!
-
pokechu22
Well, kinda - I couldn't get it to export page history, only images, but we already got a separate page history dump so good enough
-
mgrandi
14mb seems low, maybe most of the files are on the forum?
-
pokechu22
That sounds possible at least
-
audrooku|m
thuban: re: bandcamp: band, album, and track ids are random 32 bit uints, if you want to get a list of tracks to grab I'd definitely suggest crawling the artists listed in the index
-
JAA
Eh, what's 4.3 billion requests between friends? :-)
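
The 4.3 billion figure is the size of the 32-bit unsigned ID space (2^32). A back-of-the-envelope sketch of how long one full enumeration pass would take; the request rates are hypothetical, and band, album, and track IDs would each need their own pass:

```python
ID_SPACE = 2**32  # 4,294,967,296 possible 32-bit unsigned IDs


def days_to_enumerate(requests_per_second: float, id_space: int = ID_SPACE) -> float:
    """Days needed to request every ID once at a sustained rate."""
    return id_space / requests_per_second / 86400  # 86,400 seconds per day


# Hypothetical sustained rates, chosen only to show the scaling
for rate in (100, 1_000, 10_000):
    print(f"{rate:>6} req/s: {days_to_enumerate(rate):.1f} days per pass")
```

This is why crawling the artist index first is attractive: it narrows the set of IDs worth requesting before any brute-force pass.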
-
kiryu
Not sure where to ask this, but should I try to archive a Cloudflared site with Selenium or Playwright?
-
kiryu
Or is that a very *tough* process?
-
kiryu
I found the origin IP but they seem to block every way of archiving (accessing it returns a 302 to the cloudflared main domain)
-
kiryu
CDN links seem to load only once, then they get 403'd
-
JAA
You could try something browser-based with warcprox, yeah. With the origin IP, perhaps you could also send the relevant headers so the origin thinks the request comes from Buttflare. But if it's implemented by a half-competent sysadmin, that shouldn't work.
developers.cloudflare.com/fundamentals/reference/http-request-headers
-
audrooku|m
JAA: I agree that 4.3BN isn't that bad, I've done nearly double that with soundcloud.. I just think crawling the artist index and WARCing all the pages would be useful for discovering the content
-
JAA
audrooku|m: No disagreement there. At least it'd be a good first pass.
-
arkiver
pabs: ouch, thanks :/
-
wrnines
Hey, so I've never participated in Archive Team and have more just been admiring it from afar for a long while, but I figured I should pop into the IRC because that's what the FAQ says to do to let the team know about sites that are dying
-
joepie91|m
👋
-
wrnines
It just got announced today that the online writing/literature magazine/writing workshop site LitReactor is shutting its doors, and after December 31 2023 the site is going to be gone
-
wrnines
I'm not sure if the site is small enough for the ArchiveBot since the site has been running since 2011, but I thought it was probably worth informing archive team about. I guess from here I should go to the archivebot IRC channel to let the folks there know about running it for LitReactor??
-
arkiver
while we track the situation, let's make a bandcamp channel
-
arkiver
any ideas for a channel name?
-
kpcyrd
#tapecamp
-
that_lurker
#concen.... nevermind
-
JAA
lol, my brain just took the same turn. :-)
-
project10
#bandaid
-
flashfire42
#flute
-
JAA
#bandgulag
-
flashfire42
cause you know this one time. at bandcamp
-
project10
#bandcramp
-
JAA
Hah, nice one.
-
arkiver
bandcramp is a nice one
-
arkiver
yeah
-
arkiver
#bandcramp i guess :P
-
that_lurker
sounds good
-
project10
never been at the ground floor for a channel christening :P
-
FireFly
"so that's how it's done huh"
-
HCross
I believe that was a witnessing of democracy
-
magmaus3
yeah
-
magmaus3
:3
-
arkiver
the Telegram project has been restarted in #telegrab
-
audrooku|m
good stuff :*)
-
h2ibot
JustAnotherArchivist edited Bandcamp (-11, Add IRC channel):
wiki.archiveteam.org/?diff=50959&oldid=50294