02:36:37 https://www.wired.com/story/epic-games-sale-bandcamp-music-platform-limbo/
02:36:50 "Bandcamp workers say they are unable to do their jobs after being locked out of critical systems. They’re also expecting layoffs."
02:37:04 /cc arkiver JAA :)
03:02:37 So when are we grabbing bandcamp boys
03:02:48 don't let em off easy like soundcloud
03:03:02 didn't soundcloud threaten archiveteam
03:03:18 IA, so basically yes
03:03:29 a 128k mp3 BC grab would certainly be less than 1 PB
03:24:04 nevermind, probably more like 1.5 PB
03:48:41 re Xentax: is there a way to get the files if the main page is not working? that XML dump that I requested, which they thankfully posted, doesn't have the files
04:20:37 mgrandi: https://archive.org/details/wiki-wikixentaxcom_202305 and https://archive.org/details/wiki-wikixentaxcom-20230811 both contain files. Looks like https://wiki.xentax.com/images/8/83/File_stripper_01.png etc. still works (from
04:20:40 https://ia802609.us.archive.org/view_archive.php?archive=/16/items/wiki-wikixentaxcom_202305/wikixentaxcom-20230513-wikidump.7z&file=wikixentaxcom-20230513-images.txt), but there isn't an easy way to list files with the index not working (and the API not supporting JSON)... but it looks like it supports JSON now, so hmm
04:41:46 mgrandi: ok, I got an image-only dump: https://archive.org/details/wiki-wiki.xentax.com-20231008 (it doesn't seem like wikibot wants to dump the non-image content now, so that's fun)
04:46:13 bandcamp band and item ids appear to be nonsequential 10-digit numbers, but there's a "full artist index": https://bandcamp.com/artist_index
04:49:13 (other potential discovery sources include the "discover" endpoint at https://bandcamp.com/api/discover/3/get_web, although each query is limited to ~4.3k results, and the in-HTML recommendations on each item page -- but I doubt either would include anything somehow absent from the index)
05:46:45 @pokechu22 that's good that at least we got something from this year, but I meant that the main page of the wiki apparently fails to render, and I think they said the PHP version is out of date or something, so I'm not sure we can get anything from the latest version
05:47:08 mgrandi: https://archive.org/details/wiki-wiki.xentax.com-20231008 is from today
05:47:25 it was done using https://wiki.xentax.com/api.php
05:47:39 err, http://wiki.xentax.com/api.php
05:47:44 huh, I guess if that api.php page works then the wikibot tools still work, neat!
05:48:06 Well, kinda - I couldn't get it to export page history, only images, but we already got a separate page history dump, so good enough
05:48:09 14 MB seems low, maybe most of the files are on the forum?
05:49:06 That sounds possible at least
12:07:01 thuban: re: bandcamp: band, album, and track ids are random 32-bit uints; if you want to get a list of tracks to grab, I'd definitely suggest crawling the artists listed in the index
12:30:19 Eh, what's 4.3 billion requests between friends? :-)
12:41:11 Not sure where to ask this, but should I try to archive a Cloudflared site with Selenium or Playwright?
12:42:37 Or is that a very *tough* process?
12:46:04 I found the origin IP, but they seem to block every way of archiving (accessing it returns a 302 to the Cloudflared main domain)
12:48:31 CDN links seem to load only once, then they get 403'd
12:52:05 You could try something browser-based with warcprox, yeah. With the origin IP, perhaps you could also send the relevant headers so the origin thinks the request comes from Buttflare. But if it's implemented by a half-competent sysadmin, that shouldn't work. https://developers.cloudflare.com/fundamentals/reference/http-request-headers/
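
A minimal sketch of the header idea above, assuming Python + requests: talk to the origin IP directly while presenting the real hostname and the request headers Cloudflare normally adds when proxying (per the developers.cloudflare.com link in that message). The IP, hostname, and header values below are placeholders, and as noted, an origin that actually validates Cloudflare's IP ranges or uses authenticated origin pulls will still refuse this.

import requests

# Hypothetical values -- substitute the origin IP you found and the real hostname.
ORIGIN_IP = "203.0.113.7"
SITE_HOST = "example.com"

session = requests.Session()
session.headers.update({
    # Present the Cloudflare-fronted hostname to the origin.
    "Host": SITE_HOST,
    "User-Agent": "Mozilla/5.0 (compatible; archival test)",
    # Headers Cloudflare normally adds when proxying to the origin
    # (see the developers.cloudflare.com link above); values are made up.
    "X-Forwarded-For": "198.51.100.23",
    "X-Forwarded-Proto": "https",
    "CF-Connecting-IP": "198.51.100.23",
    "CF-IPCountry": "NL",
    "CF-Visitor": '{"scheme":"https"}',
})

resp = session.get(
    f"https://{ORIGIN_IP}/",
    verify=False,           # the origin's TLS cert likely won't match a bare IP
    allow_redirects=False,  # so the 302-to-main-domain behaviour stays visible
    # To record the traffic as WARC, route the same requests through warcprox, e.g.:
    # proxies={"http": "http://localhost:8000", "https": "http://localhost:8000"},
)
print(resp.status_code, resp.headers.get("Location"))
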
13:14:58 JAA: I agree that 4.3BN isn't that bad, I've done nearly double that with soundcloud... I just think crawling the artist index and WARCing all the pages would be useful for discovering the content
13:15:43 audrooku|m: No disagreement there. At least it'd be a good first pass.
14:19:04 pabs: ouch, thanks :/
15:13:00 Hey, so I've never participated in Archive Team and have more just been admiring it from afar for a long while, but I figured I should pop into the IRC because that's what the FAQ says to do to let the team know about sites that are dying
15:13:51 👋
15:13:56 It just got announced today that the online writing/literature magazine and writing workshop site LitReactor is shutting its doors, and after December 31, 2023 the site is going to be gone
15:14:08 https://litreactor.com/news/litreactor-the-end-of-an-era
15:18:51 I'm not sure if the site is small enough for ArchiveBot since it has been running since 2011, but I thought it was probably worth informing Archive Team about. I guess from here I should go to the ArchiveBot IRC channel to let the folks there know about running it for LitReactor?
21:08:58 while we track the situation, let's make a bandcamp channel
21:09:03 any ideas for a channel name?
21:10:45 #tapecamp
21:14:58 #concen.... nevermind
21:15:16 lol, my brain just took the same turn. :-)
21:15:30 #bandaid
21:16:08 #flute
21:16:10 #bandgulag
21:16:25 'cause you know, this one time, at bandcamp
21:16:57 #bandcramp
21:17:13 Hah, nice one.
21:17:16 bandcramp is a nice one
21:17:17 yeah
21:17:24 #bandcramp i guess :P
21:17:51 sounds good
21:18:17 never been on the ground floor for a channel christening :P
21:37:23 "so that's how it's done huh"
21:57:35 I believe that was a witnessing of democracy
22:21:56 yeah
22:21:58 :3
22:36:33 the Telegram project has been restarted in #telegrab
22:37:03 good stuff :*)
23:00:15 JAABot edited CurrentWarriorProject (-1): https://wiki.archiveteam.org/?diff=50958&oldid=50938
23:30:21 JustAnotherArchivist edited Bandcamp (-11, Add IRC channel): https://wiki.archiveteam.org/?diff=50959&oldid=50294
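
Returning to the Bandcamp discovery discussion earlier in the log: a rough sketch, assuming Python + requests, of pulling artist subdomains out of https://bandcamp.com/artist_index as a first discovery pass. The ?page= pagination parameter, the idea that artist links can be regex-matched as *.bandcamp.com subdomains, and the list of non-artist subdomains to skip are all assumptions about the page's HTML, not confirmed here; a real run would go through the usual WARC tooling rather than plain requests.

import re
import time
import requests

INDEX_URL = "https://bandcamp.com/artist_index"
# Assumption: artist pages live on *.bandcamp.com subdomains and the index is
# paginated with a ?page=N parameter -- check the real HTML before relying on this.
ARTIST_RE = re.compile(r"https?://[a-z0-9-]+\.bandcamp\.com", re.I)
NOT_ARTISTS = {"www", "bandcamp", "blog", "daily"}  # guesses, extend as needed

def crawl_index(max_pages=10000, delay=1.0):
    """Walk the index pages and collect candidate artist URLs."""
    seen = set()
    for page in range(1, max_pages + 1):
        resp = requests.get(INDEX_URL, params={"page": page}, timeout=30)
        if resp.status_code != 200:
            break
        found = {
            url for url in ARTIST_RE.findall(resp.text)
            if url.split("//")[1].split(".")[0] not in NOT_ARTISTS
        } - seen
        if not found:
            break  # probably past the last page
        seen |= found
        time.sleep(delay)  # be polite
    return seen

if __name__ == "__main__":
    for url in sorted(crawl_index()):
        print(url)
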