01:06:25 !help
01:08:22 is this the room for archive bot?
01:09:06 !archive
01:09:35 !archive https://github.com
01:37:08 JC|m: the bot is in #archivebot, but only voiced/ops have access.
01:37:26 github.com is also way too big to be archived using archivebot
01:37:54 github.com is also not the right way to archive individual GitHub repos/orgs, we have #gitgud for doing that
01:38:31 if you have any other sites to save via ArchiveBot, state the URLs here and the reasons for archiving them
02:10:41 computer, please archive the entire internet
02:51:32 PaulWise edited Mailman2 (+964, add more, one done): https://wiki.archiveteam.org/?diff=50948&oldid=50886
03:17:16 "JC: the bot is in #archivebot..." <- Is there a difference between archiving here or directly from the website?
03:17:50 in https://archive.org/web/
03:18:45 Two different systems, yes. archiveteam is not the internet archive
03:19:48 https://subscene.com
03:20:09 I was looking into archiving this site.
03:21:46 imer:
03:22:04 pabs:
03:24:44 JC|m: what is the reason for archiving subscene?
03:25:27 They have a big directory of subtitles for movies and TV shows.
03:26:01 The website has had some downtime recently.
03:26:11 hmm, lots of filenames indicating pirate sites.
03:26:26 I wanted it to be archived in case anything happened
03:26:26 * pabs wonders what the policy is for that sort of stuff
03:27:28 The subtitles are not pirated.
03:28:09 A lot of famous streaming sites upload their subs on subscene
03:28:54 streaming sites like Disney+ or?
03:29:03 nah
03:29:24 Netflix?
03:29:51 30Nama
03:29:53 filimo
03:29:56 namava
03:31:08 And independent people who translate subtitles
03:44:13 looks like it was last saved in 2018 https://archive.fart.website/archivebot/viewer/domain/subscene.com
03:44:24 so time for an update I guess
03:45:22 interesting, Google deleted one of its results for this site because of https://www.lumendatabase.org/notices/18635190
03:49:59 overzealous court. SRT files (the actual subtitles) are just timestamps and text.
03:50:22 presumably the dialog/script is copyrighted though :)
03:51:04 and the movie cover/posters
03:51:09 anyway, running now
03:51:28 JC|m: see http://archivebot.com/ if you want to follow the job
03:51:41 oh, it got a 403 error
03:55:11 JC|m: no dice, all my attempts got 403 errors for the front page
03:59:02 maybe something for qwarc from JAA
03:59:24 pabs: is it the robots.txt?
03:59:59 no, AB ignores robots.txt, the first request AB sent (for the front page) got a 403 error
04:00:27 same goes for the subdomains, including the forum
04:01:36 You guys also archive forums, right?
04:02:07 I archived all of subscene last year. Not in the WARC format though, just an archive of all subtitle files.
04:02:12 The main problem was URL discovery, as there is no easy way to get a list of all shows. Also had some problems with cloudflare.
04:07:38 Is v3.2 the latest version of Warrior?
04:07:50 from 2021
04:08:32 from https://warriorhq.archiveteam.org/downloads/warrior3/ ? those are virtual machine images
04:09:36 if you're familiar with docker, you can also run a container and get the latest that way. I think the VM images would keep themselves up to date, but I've never used them
04:10:13 there is a dedicated #warrior channel if you need support for either option
04:13:37 JC|m: yeah, we often safe forums
04:13:41 er save forums
04:16:26 https://discuss.privacyguides.net/
04:16:30 https://linustechtips.com/
04:18:30 Some forums that you may want to archive
04:20:06 I tried to archive forum.cdm.me before, but there are many ignores that I had to apply
04:20:33 All the pages with this were behind the login screen: ^(?:(?!private\.php\?|register\.php\?|sendmessage\.php\?|itrader_feedback\.php\?|newreply\.php\?|usercp\.php\?|subscription\.php\?).)*$
04:22:49 seems like a very aggressive cloudflare configuration on subscene
06:29:11 https://twitter.com/PretendoNetwork/status/1710896499700150390
06:29:12 nitter: https://nitter.net/PretendoNetwork/status/1710896499700150390
06:42:54 hmm, I feel like the AB websocket is not passing on all URL requests/responses
06:54:25 pabs: There's a known bug near the end of jobs, where the last couple lines might get swallowed. Other than that, it should only drop messages when it can't keep up. It currently looks like there are two clients that are too slow and get messages dropped regularly.
06:55:25 hhmm, I have two non-browser clients attached, that must be me
07:11:18 JAA: an example: I just redid upload.systems, but the job doesn't show up at all in the browser, id y6rqls8zrwd2r7hc1fa2znok
11:47:03 hi
11:47:10 could I have this whole website archived on the internet archive? It's a very small website.
11:47:13 https://oowmun.org/
11:47:18 !archive https://oowmun.org/
13:13:38 PinstripedProspects.com, a blog covering the New York Yankees minor league system, has announced it's shutting down in 2 weeks. https://www.pinstripedprospects.com/pinstriped-prospects-website-shutting-down-65038/
14:12:44 PaulWise edited Software Heritage (+424, add some more info and related projects): https://wiki.archiveteam.org/?diff=50949&oldid=28671
14:14:44 TheTechRobo edited URLTeam (-564, Improve tiny.cc entry): https://wiki.archiveteam.org/?diff=50950&oldid=50889
14:15:45 TheTechRobo edited URLTeam (+19, Add another t.ly link): https://wiki.archiveteam.org/?diff=50951&oldid=50950
14:17:45 TheTechRobo edited URLTeam (+44, T.ly is non-incremental): https://wiki.archiveteam.org/?diff=50952&oldid=50951
14:27:47 PaulWise created Trac (+2072, create Trac project page): https://wiki.archiveteam.org/?title=Trac
14:29:47 PaulWise edited Bugzilla (+0, redhat bugzilla crashed): https://wiki.archiveteam.org/?diff=50954&oldid=50756
14:30:47 PaulWise edited Mailman2 (+157, more): https://wiki.archiveteam.org/?diff=50955&oldid=50948
14:32:48 PaulWise edited GitHub (+19, Category:Code): https://wiki.archiveteam.org/?diff=50956&oldid=50737
14:41:49 PaulWise edited IRC/Logs (+31, wordpress logs): https://wiki.archiveteam.org/?diff=50957&oldid=50917
15:15:34 what's the forecast for AT spinning back up? is it a storage or cpu/net saturation issue on IA's end?
16:29:04 audrooku|m: "soon" I believe. #shreddit is going straight to IA and #// is due to restart (in some capacity) as well I think?
16:29:22 from #archiveteam: "The problems at IA that prevented us from uploading large amounts of data are getting better. We will now start uploading (part of) the offloaded data to IA, and probably resume projects after. The situation is not completely 'back to normal' yet, but will likely be in about a month."
16:30:08 that's good to hear, thanks for answering my question as I missed that message
16:30:19 no worries
19:45:45 One week after the supposed shutdown, the Canucks forum is still going. I'm running a continuous thing that fetches new posts as they're being made until it does shut down.
19:51:36 Also, for the record, the community-chosen successor seems to be https://www.canucksfanforum.com/ (which somehow already has 35k posts since mid-Sept).
20:05:02 JAA: did you see mountainbladder's message about TaleWorlds whitelisting AB pipeline IPs?
20:05:07 it was a few days ago I think
20:05:33 pokechu22: Yes, I replied as well, just didn't have time to act on it yet.
23:13:54 blast from the past: Archiverse – Archive Team's dump of Nintendo's Miiverse (2012–2017): https://archiverse.guide/ :)
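
Editor's note on the pattern quoted at 04:20:33: it is a negative-lookahead regex, so it matches only URLs that contain none of the listed login-gated script names. The sketch below checks that behaviour with Python's `re` module. The `forum.example` URLs and the `pattern_matches` helper are hypothetical, made up for illustration, and nothing here is meant to show how ArchiveBot itself applies the pattern.

```python
import re

# Pattern copied verbatim from the 04:20:33 message.
PATTERN = re.compile(
    r"^(?:(?!private\.php\?|register\.php\?|sendmessage\.php\?"
    r"|itrader_feedback\.php\?|newreply\.php\?|usercp\.php\?"
    r"|subscription\.php\?).)*$"
)


def pattern_matches(url: str) -> bool:
    """Return True if the whole URL matches the quoted pattern.

    Hypothetical helper: the lookahead is re-checked before every
    character, so one occurrence of a listed name anywhere in the URL
    prevents a match.
    """
    return PATTERN.search(url) is not None


if __name__ == "__main__":
    # Hypothetical URLs on a made-up host, for illustration only.
    samples = [
        "https://forum.example/showthread.php?t=1",
        "https://forum.example/private.php?do=newpm",
        "https://forum.example/usercp.php",
    ]
    for url in samples:
        print(url, pattern_matches(url))
```

The third sample matches because the pattern lists `usercp.php?` with the trailing question mark, so the bare script name without a query string is not excluded.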