02:36:37 https://www.wired.com/story/epic-games-sale-bandcamp-music-platform-limbo/
02:36:50 "Bandcamp workers say they are unable to do their jobs after being locked out of critical systems. They’re also expecting layoffs."
02:37:04 /cc arkiver JAA :)
03:02:37 So when are we grabbing bandcamp boys
03:02:48 don't let em off easy like soundcloud
03:03:02 didn't soundcloud threaten archiveteam
03:03:18 IA, so basically yes
03:03:29 a 128k mp3 BC grab would certainly be less than 1 PB
03:24:04 nevermind, probably more like 1.5 PB
03:48:41 re Xentax: is there a way to get the files if the main page is not working? that XML dump that I requested, which they thankfully posted, doesn't have the files
04:20:37 mgrandi: https://archive.org/details/wiki-wikixentaxcom_202305 and https://archive.org/details/wiki-wikixentaxcom-20230811 both contain files. Looks like https://wiki.xentax.com/images/8/83/File_stripper_01.png etc. still works (from
04:20:40 https://ia802609.us.archive.org/view_archive.php?archive=/16/items/wiki-wikixentaxcom_202305/wikixentaxcom-20230513-wikidump.7z&file=wikixentaxcom-20230513-images.txt), but there isn't an easy way to list files with the index not working (and the API not supporting JSON)... but it looks like it supports JSON now, so hmm
04:41:46 mgrandi: ok, I got an image-only dump: https://archive.org/details/wiki-wiki.xentax.com-20231008 (it doesn't seem like wikibot wants to dump the non-image content now, so that's fun)
04:46:13 bandcamp band and item ids appear to be nonsequential 10-digit numbers, but there's a "full artist index": https://bandcamp.com/artist_index
04:49:13 (other potential discovery sources include the "discover" endpoint at https://bandcamp.com/api/discover/3/get_web, although each query is limited to ~4.3k results, and the in-HTML recommendations on each item page -- but I doubt either would include anything somehow absent from the index)
05:46:45 @pokechu22 that's good that at least we got something from this year, but I meant that the main page of the wiki apparently fails to render, and I think they said the PHP version is out of date or something, so I'm not sure we can get anything from the latest version
05:47:08 mgrandi: https://archive.org/details/wiki-wiki.xentax.com-20231008 is from today
05:47:25 it was done using https://wiki.xentax.com/api.php
05:47:39 err, http://wiki.xentax.com/api.php
05:47:44 huh, I guess if that api.php page works then the wikibot tools still work, neat!
05:48:06 Well, kinda - I couldn't get it to export page history, only images, but we already got a separate page history dump, so good enough
05:48:09 14 MB seems low, maybe most of the files are on the forum?
05:49:06 That sounds possible at least
12:07:01 thuban: re: bandcamp: band, album, and track ids are random 32-bit uints; if you want to get a list of tracks to grab, I'd definitely suggest crawling the artists listed in the index
12:30:19 Eh, what's 4.3 billion requests between friends? :-)
12:41:11 Not sure where to ask this, but should I try to archive a Cloudflared site with Selenium or Playwright?
12:42:37 Or is that a very *tough* process?
12:46:04 I found the origin IP, but they seem to block every way of archiving (accessing it returns a 302 to the Cloudflared main domain)
12:48:31 CDN links seem to load only once, then they get 403'd
12:52:05 You could try something browser-based with warcprox, yeah. With the origin IP, perhaps you could also send the relevant headers so the origin thinks the request comes from Buttflare. But if it's implemented by a half-competent sysadmin, that shouldn't work. https://developers.cloudflare.com/fundamentals/reference/http-request-headers/
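
A minimal sketch of the header idea above, assuming Python + requests: talk to the origin IP directly while presenting the real hostname and the request headers Cloudflare normally adds when proxying (per the developers.cloudflare.com link in that message). The IP, hostname, and header values below are placeholders, and as noted, an origin that actually validates Cloudflare's IP ranges or uses authenticated origin pulls will still refuse this.

import requests

# Hypothetical values -- substitute the origin IP you found and the real hostname.
ORIGIN_IP = "203.0.113.7"
SITE_HOST = "example.com"

session = requests.Session()
session.headers.update({
    # Present the Cloudflare-fronted hostname to the origin.
    "Host": SITE_HOST,
    "User-Agent": "Mozilla/5.0 (compatible; archival test)",
    # Headers Cloudflare normally adds when proxying to the origin
    # (see the developers.cloudflare.com link above); values are made up.
    "X-Forwarded-For": "198.51.100.23",
    "X-Forwarded-Proto": "https",
    "CF-Connecting-IP": "198.51.100.23",
    "CF-IPCountry": "NL",
    "CF-Visitor": '{"scheme":"https"}',
})

resp = session.get(
    f"https://{ORIGIN_IP}/",
    verify=False,           # the origin's TLS cert likely won't match a bare IP
    allow_redirects=False,  # so the 302-to-main-domain behaviour stays visible
    # To record the traffic as WARC, route the same requests through warcprox, e.g.:
    # proxies={"http": "http://localhost:8000", "https": "http://localhost:8000"},
)
print(resp.status_code, resp.headers.get("Location"))
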
13:14:58 JAA: I agree that 4.3BN isn't that bad, I've done nearly double that with soundcloud... I just think crawling the artist index and WARCing all the pages would be useful for discovering the content
13:15:43 audrooku|m: No disagreement there. At least it'd be a good first pass.
14:19:04 pabs: ouch, thanks :/
15:13:00 Hey, so I've never participated in Archive Team and have more just been admiring it from afar for a long while, but I figured I should pop into the IRC because that's what the FAQ says to do to let the team know about sites that are dying
15:13:51 👋
15:13:56 It just got announced today that the online writing/literature magazine and writing workshop site LitReactor is shutting its doors, and after December 31, 2023 the site is going to be gone
15:14:08 https://litreactor.com/news/litreactor-the-end-of-an-era
15:18:51 I'm not sure if the site is small enough for ArchiveBot since it has been running since 2011, but I thought it was probably worth informing Archive Team about. I guess from here I should go to the ArchiveBot IRC channel to let the folks there know about running it for LitReactor?
21:08:58 while we track the situation, let's make a bandcamp channel
21:09:03 any ideas for a channel name?
21:10:45 #tapecamp
21:14:58 #concen.... nevermind
21:15:16 lol, my brain just took the same turn. :-)
21:15:30 #bandaid
21:16:08 #flute
21:16:10 #bandgulag
21:16:25 'cause you know, this one time, at bandcamp
21:16:57 #bandcramp
21:17:13 Hah, nice one.
21:17:16 bandcramp is a nice one
21:17:17 yeah
21:17:24 #bandcramp i guess :P
21:17:51 sounds good
21:18:17 never been on the ground floor for a channel christening :P
21:37:23 "so that's how it's done huh"
21:57:35 I believe that was a witnessing of democracy
22:21:56 yeah
22:21:58 :3
22:36:33 the Telegram project has been restarted in #telegrab
22:37:03 good stuff :*)
23:00:15 JAABot edited CurrentWarriorProject (-1): https://wiki.archiveteam.org/?diff=50958&oldid=50938
23:30:21 JustAnotherArchivist edited Bandcamp (-11, Add IRC channel): https://wiki.archiveteam.org/?diff=50959&oldid=50294
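
Returning to the Bandcamp discovery discussion earlier in the log: a rough sketch, assuming Python + requests, of pulling artist subdomains out of https://bandcamp.com/artist_index as a first discovery pass. The ?page= pagination parameter, the idea that artist links can be regex-matched as *.bandcamp.com subdomains, and the list of non-artist subdomains to skip are all assumptions about the page's HTML, not confirmed here; a real run would go through the usual WARC tooling rather than plain requests.

import re
import time
import requests

INDEX_URL = "https://bandcamp.com/artist_index"
# Assumption: artist pages live on *.bandcamp.com subdomains and the index is
# paginated with a ?page=N parameter -- check the real HTML before relying on this.
ARTIST_RE = re.compile(r"https?://[a-z0-9-]+\.bandcamp\.com", re.I)
NOT_ARTISTS = {"www", "bandcamp", "blog", "daily"}  # guesses, extend as needed

def crawl_index(max_pages=10000, delay=1.0):
    """Walk the index pages and collect candidate artist URLs."""
    seen = set()
    for page in range(1, max_pages + 1):
        resp = requests.get(INDEX_URL, params={"page": page}, timeout=30)
        if resp.status_code != 200:
            break
        found = {
            url for url in ARTIST_RE.findall(resp.text)
            if url.split("//")[1].split(".")[0] not in NOT_ARTISTS
        } - seen
        if not found:
            break  # probably past the last page
        seen |= found
        time.sleep(delay)  # be polite
    return seen

if __name__ == "__main__":
    for url in sorted(crawl_index()):
        print(url)
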