01:47:52 arkiver: there are definitely times when AB is at or above the job limit + 5 pending, especially when flashfire42 is around doing ISP stuff. I tend to avoid doing proactive stuff a fair bit, unless we are more than a few jobs below the limit and things seem quiet
01:48:29 and when/if snscrape comes back, there will be a big backlog of twitter archiving to do
01:49:02 Sorry bout that hahaha
01:50:23 and there will be other situations where higher peak capacity is useful; e.g. the Adelaide University merger is going to be tons of jobs due to the many subdomains
01:51:53 no, I think you're doing good stuff flashfire42 :)
01:52:25 anyway, I'm sure we will always be able to reach whatever the job limit is :)
02:30:59 What's the easiest way to archive a mid-sized portion of a website? (ex. example.com/stuff-to-archive/*)
02:36:40 owen: archivebot
02:37:02 just pass us the URL and we will run it, then everything happens automatically after that
02:37:36 that needs a directory index or some other link-based mechanism that lists all the subcontent, though
03:06:10 The alternative technique is saving the entire site :P
03:12:39 AB job 8k26biu6lro5cb6vi3awnu3z8 is a chonky one
03:20:06 #shreddit is restarted
03:20:42 pabs: so, we're holding back now?
03:21:21 some folks are occasionally, yeah
03:27:52 Reminder: inactive user content on Google (excluding YouTube) may start being deleted starting 2023 December S:
03:33:29 when IA is all fine again with taking data, got great plans for expanding our archiving - or well, especially plans for #//
03:33:48 we'll significantly increase our coverage of 'important stuff'
03:36:55 awesome possum
04:00:18 JAABot edited CurrentWarriorProject (-4): https://wiki.archiveteam.org/?diff=50742&oldid=50671
04:00:36 Ryz: does that include public blogspot/blogger/etc stuff?
04:11:32 pabs, yes...S:
04:13:40 The problem with Blogger user number IDs is that they give 429s pretty easily, at least when run through ArchiveBot, which is why I would want this to get off the ground as soon as possible...
04:13:46 arkiver?
04:27:40 fuck
04:29:36 * pabs . o O 0 ( #Y )
04:31:42 * pabs has 1323 URLs in his blogspot archive TODO...
04:31:50 x_x
04:33:27 ISTR with blogspot it is easy to enumerate lots of blogspot blogs starting from one blog: see what other blogs that author has, and the same for all the commenters
04:33:59 * pabs checks shell history for some terrible one-liners
04:34:19 also there's tons of spammers on blogspot
04:35:45 yeah, one of the sites I want to get archived eventually is just 99% overrun with spam (it's also js-hell-frontend-on-top-of-phpBB2) :/
04:35:48 sad to see
04:38:48 https://transfer.archivete.am/gJyh0/blogspot-profile-enumerator.sh
04:40:59 https://transfer.archivete.am/sKHm2/pabs-archive-blogspot-todo.txt
05:30:50 I got a couple of tasks that keep getting stuck at "Lua runtime error: reddit.lua:286: attempt to call global 'unicode_codepoint_as_utf8' (a nil value)"? They are reddit project tasks.
05:47:14 shinji257: that's known, I think; #shreddit is the project channel :)
05:47:33 Just waiting for a fix, should be sorted later today
09:15:55 does anyone know if AB looks at links inside HTML comments?
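The 09:15:55 question is answered at 19:02:05 below: AB parses the HTML and walks the element tree, so it shouldn't see anything in comments. A minimal illustration of why tree- or event-based link extraction skips commented-out markup; this uses Python's html.parser purely for demonstration, not AB's actual parser:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href attributes from <a> start tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

page = '<a href="/visible">x</a> <!-- <a href="/hidden">y</a> -->'
parser = LinkExtractor()
parser.feed(page)
print(parser.links)  # ['/visible'] - the commented-out tag never fires a start-tag event
```

The parser emits the whole comment as a single comment token, so markup inside it never reaches the start-tag handler that link extraction hangs off.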
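Likewise, the blogspot enumeration pabs sketches at 04:33:27 (one blog leads to author and commenter profiles, which lead to more blogs) can be illustrated roughly as follows. This is a hypothetical Python sketch of the idea, not a port of blogspot-profile-enumerator.sh; the URL patterns, seed blog, and fetch cap are assumptions:

```python
# Hypothetical sketch, not blogspot-profile-enumerator.sh itself.
# Start from one blog, collect blogger.com profile links (author and
# commenters), fetch each profile, collect the blogs it lists, repeat.
import re
import time
from urllib.request import urlopen

PROFILE_RE = re.compile(r"https?://www\.blogger\.com/profile/\d+")
BLOG_RE = re.compile(r"https?://[a-z0-9-]+\.blogspot\.com")

def fetch(url):
    """Fetch a page as text, returning '' on any network/HTTP error."""
    try:
        with urlopen(url, timeout=30) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except OSError:
        return ""

def enumerate_blogs(seed_blog, max_fetches=200):
    """Breadth-first crawl alternating between blogs and profiles."""
    seen = {seed_blog}
    queue = [seed_blog]
    fetches = 0
    while queue and fetches < max_fetches:
        page = fetch(queue.pop(0))
        fetches += 1
        for match in PROFILE_RE.findall(page) + BLOG_RE.findall(page):
            if match not in seen:
                seen.add(match)
                queue.append(match)
        time.sleep(1)  # be gentle; Blogger 429s easily, as noted at 04:13:40
    return sorted(u for u in seen if "blogspot.com" in u)

# Seed URL is a placeholder:
print("\n".join(enumerate_blogs("https://someblog.blogspot.com/")))
```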
09:43:01 https://www.msn.com/en-us/news/technology/atari-pulls-nostalgia-power-move-and-buys-homebrew-community-forum/ar-AA1grqaA, I've heard rumblings that they are going to purge boards on https://forums.atariage.com, dunno how easy it is to archive, it's an Invision forum board
09:52:28 there is an AB job in progress
09:52:46 and the forum has been saved before, 2021 or 2019 IIRC
09:53:08 unfortunately we had to restart the job a couple of times and slow it down a fair bit
09:57:57 Awesome
09:59:17 got the main website too, and some other subdomains
14:09:05 Hey, do you have the video link, it's called https://www.youtube.com/watch?v=fUVrK6089fs
14:50:47 imer: acknowledged
15:37:34 Bzc6p edited ArchiveTeam Domains (+37, /* archiveteam.hu */ Lecsű is discontinued): https://wiki.archiveteam.org/?diff=50743&oldid=50703
15:42:34 Bzc6p edited Deathwatch (-3, /* 2023 */ fix grammar): https://wiki.archiveteam.org/?diff=50744&oldid=50741
15:43:35 Bzc6p edited Valhalla (+0, /* Physical Options */ typo): https://wiki.archiveteam.org/?diff=50745&oldid=50740
16:01:03 hello
16:01:25 is this like an archiving project?
16:02:43 This is the team that does the projects. You can find info about current and old archiving projects in the wiki: https://wiki.archiveteam.org/index.php/Main_Page
16:03:06 On the page of every project you can also find the corresponding IRC channel.
16:04:14 there's no Everyplay archive, right?
16:07:06 https://wiki.archiveteam.org/index.php/Everyplay doesn't look like it
16:07:25 that's so sad
16:07:33 i lost all my videos
16:12:41 yeah, I unfortunately haven't been able to find anyone who archived it
18:24:11 Exorcism edited DokuWiki (+92): https://wiki.archiveteam.org/?diff=50746&oldid=50527
18:26:11 Exorcism edited Wordpress.com (+106): https://wiki.archiveteam.org/?diff=50747&oldid=28940
19:02:05 pabs: AB parses the HTML and then walks the element tree. It shouldn't see anything in comments.
19:18:53 thuban: is there any update on the orange sites coming back?
19:19:21 Myusernameisanything edited University Web Hosting (-7, Changing not saved yet tag to lost.): https://wiki.archiveteam.org/?diff=50748&oldid=47676
19:19:22 Myusernameisanything edited List of websites excluded from the Wayback Machine (+57, Added 2 links): https://wiki.archiveteam.org/?diff=50749&oldid=50702
19:19:23 Myusernameisanything edited BluWiki (+10, If there are about 20 dumps, it is partially…): https://wiki.archiveteam.org/?diff=50750&oldid=27576
19:19:24 Gridkr edited List of websites excluded from the Wayback Machine (+20, Add https://nexo.com/): https://wiki.archiveteam.org/?diff=50751&oldid=50749
20:00:29 JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=50754&oldid=50751
20:04:33 JustAnotherArchivist edited SoundCloud (+236, Datetimeify, add 2019 projectn't, add…): https://wiki.archiveteam.org/?diff=50755&oldid=48897
21:15:43 #archivebot jobs submit discovered things into the backfeed system, yes?
21:28:51 i don't think so, assuming you mean e.g. queuing imgur URLs in #imgone
21:30:36 project10: No, there's zero interaction between AB and DPoS projects.
21:36:34 well, the genesis of my question was seeing #telegrab items submitted via AB (job 1ty54jgyh2n6iv2ri6o0gbbbp)
21:44:18 That's just me archiving URLs shared in AT channels so our logs aren't full of dead links in the future.
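A rough sketch of the kind of log-to-URL extraction JAA describes at 21:44:18; this is hypothetical code, not the actual tooling, and the log format (plain text, one message per line) is an assumption:

```python
# Hypothetical sketch: pull every URL out of plain-text IRC logs so the
# links can be queued for archiving before they go dead.
import re
import sys

URL_RE = re.compile(r"https?://\S+")

def urls_from_log(path):
    """Yield each distinct URL found in the given log file."""
    seen = set()
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            for url in URL_RE.findall(line):
                url = url.rstrip(".,!?')\"]>")  # trim trailing prose punctuation
                if url not in seen:
                    seen.add(url)
                    yield url

for logfile in sys.argv[1:]:
    for url in urls_from_log(logfile):
        print(url)
```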
21:44:36 oh :)
21:46:32 JAA++
21:47:10 we need a commode points system here as well :P
22:23:26 JAA++
22:23:27 -eggdrop- karma for 'JAA' is now 1
22:23:30 lol
22:31:05 2 files remaining and I'll finish getting the listing of all yahoo-videos .tar.bz2 files
22:32:13 my intention was to get *.tar.bz2 first while I wrote a more efficient script to get the .tar lists, which of course I haven't actually started yet, so I'll have to continue the .tar files the slow way
22:38:50 ++fireonlive
22:39:03 f
22:39:13 Pff, doesn't even understand pre-incrementing.
22:39:29 :p
22:44:58 eggdrop—
22:45:26 Oh thanks The Lounge, i really needed that transformation
22:45:56 The Lounge--
22:45:57 -eggdrop- [karma] 'The Lounge' is now at -1
22:46:33 The Lounge--
22:46:35 -eggdrop- [karma] 'The Lounge' is now at -1
22:46:43 Ah, works with a normal space, too. :-)
22:46:51 The Lounge++
22:46:51 -eggdrop- [karma] 'The Lounge' is now at 0
22:46:54 TheTechRobo: i do believe that was iOS
22:46:56 :P
22:47:05 Oh thanks apple then
22:47:10 iPhone--\
22:47:12 iPhone--
22:47:13 -eggdrop- [karma] 'iPhone' is now at -1
22:47:18 !
22:47:26 Dictating how I type letters, thanks Timmy
22:48:28 >not knowing how to configure text replacement
22:49:39 This can go in -ot now. :-)
22:49:49 :)
22:49:51 Apparently my FuzzyMemories.TV crawl is nearly done.
22:50:40 It has a bit of pagination to hunt down, but has already retrieved most /watch/ pages and the accompanying videos (those that aren't 404s).
22:52:57 Specifically, video IDs go up to 4794, and my crawl has retrieved 4668 as of a couple minutes ago.
22:53:16 ~100 GiB so far
22:55:35 4054 actual videos as of just now, based on some crude log grepping.
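For reference, the "crude log grepping" at 22:55:35 could look something like the following sketch; the log format (fetched URL and HTTP status on the same line) and the video file extensions are assumptions, not details from the actual crawl:

```python
# Hypothetical sketch of the "crude log grepping": count distinct video
# URLs that were fetched successfully from a crawl log.
import re
import sys

VIDEO_RE = re.compile(r"(https?://\S*fuzzymemories\.tv/\S+\.(?:mp4|flv|mov))")

videos = set()
with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
    for line in f:
        m = VIDEO_RE.search(line)
        if m and " 200 " in line:  # keep only successful (HTTP 200) fetches
            videos.add(m.group(1))

print(f"{len(videos)} videos retrieved")
```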