00:00:52 skeletor wins
00:05:52 I'm not sure where the number comes from, but one source stated 2,398,412 posts
00:18:21 At the bottom on https://www.he-man.org/forums/boards/forum.php
00:32:50 (And 'the source' is https://old.reddit.com/r/Archiveteam/comments/17rps3j/hemanorg_forums_shutting_down_after_over_20_years/ I guess.)
00:38:50 ahhhhhh. thanks
01:51:04 Anyone know what's going on with HikariNoAkari? I heard there was some drama and it was shutting down
01:52:09 it's been discussed, but not with any particular insight
01:52:12 Yeah, apparently: https://i.imgur.com/jeJSEu6.jpg
04:12:05 So the AB job for Hikari No Akari finished, but it timed out on almost all sitemaps, i.e. I'm not sure it's complete. It does appear to have gone through the pagination, but new posts being added could still have led to some things getting missed.
07:56:05 PaulWise edited Bugzilla (+78, updates): https://wiki.archiveteam.org/?diff=51119&oldid=50954
13:18:13 high chance it's already been done, but Minnesota is / has run a flag-design contest, all* 2123 designs are on the site: https://serc.mnhs.org/flags
13:18:37 *all => I'm pretty sure that they're no longer accepting submissions. not 100%, though
13:19:22 I'm not going to put it in AB because I have only glanced at the site and am not sure if extra work is needed to capture the "click each submission to view the larger-size image"
16:13:43 hello.
16:13:54 like using archive bot
16:18:27 i’m glad you do!
16:18:39 it’s a nice bit
16:18:42 bot*
16:27:10 Bit Bot
16:33:34 beep boop
16:35:12 betamax: Looks like the large images work just fine without JS. They're standard links. I've thrown it into AB.
17:20:26 https://transfer.archivete.am/l5gdO/static.spore.com-ids-2016.txt.zst - Full (?) list of Spore.com creation IDs including INVALID/PURGED/BANNED statuses, as of 2016.
17:21:21 20741764 entries ^
17:54:48 ScenarioPlanet: what speed and concurrency can that be ran at?
17:55:18 oh, wait, those are just numeric IDs, so it can't be ran directly
17:55:44 17:20 https://transfer.archivete.am/l5gdO/static.spore.com-ids-2016.txt.zst - Full (?) list of Spore.com creation IDs including INVALID/PURGED/BANNED statuses, as of 2016.
17:55:47 17:21 20741764 entries ^
17:56:00 Pedrosso: might find that interesting
17:56:11 Hey uh pokechu22, I've been looking over archivebot's logs of "https://davoonline.com/phpBB3?archiveteam". It's been archiving a lot of the same login page with different "?*" things
17:56:23 Also yes, I find it very interesting
17:56:36 Yeah, that doesn't look great :|
17:56:52 well, it makes sense for viewtopic, but https://davoonline.com/phpBB3/ucp.php?style=17&mode=login&redirect=search.php%3Fauthor_id%3D6234%26sd%3Dd%26sk%3Dt%26sr%3Dposts%26st%3D0%26start%3D40%26style%3D17 isn't useful
17:57:11 idk too much about the archivebot. I've seen ignorelists used. Idk how they work but can a "just ignore all mode=login lol" work?
17:58:02 Yeah, mode=login would ignore any URLs with the text mode=login in it, while ^https://davoonline.com/phpBB3/ucp\.php.*[?&]mode=login ignores anything starting with https://davoonline.com/phpBB3/ucp.php and containing either ?mode=login or &mode=login
17:58:37 Now, https://davoonline.com/phpBB3/viewtopic.php?style=17&p=18852 is a bit weird too since I'm not sure where the style=17 is coming from - I haven't seen other styles though so maybe it's fine?
17:59:49 hmm, no, URLs from https://davoonline.com/phpBB3/ don't have style=17 but once a URL with style=17 is retrieved that same parameter is added to everything else... and it looks identical to without it
18:00:18 pokechu22: https://transfer.archivete.am/inline/68759/2y5iu7ey1kzbspqay7vlkbkuq-trace
18:01:04 ... ah, it came from one link that has style=17 on it in https://davoonline.com/phpBB3/viewtopic.php?p=41535#p41535 it looks like
18:01:12 Probably best to just nuke that then
18:01:18 we don't need to save everything twice
18:01:47 Is it able to be modified live with these ignores?
18:02:05 Oh, nice
18:02:10 Yep, you can add and remove ignores as needed (and adjust the speed and concurrency at which it runs too)
18:03:39 As for the spore.com archive, I didn't know it would be able to archive users, but I'm glad it found its way, hah
18:04:10 Looking at http://archivebot.com/ignores/2y5iu7ey1kzbspqay7vlkbkuq we have /ucp\.php\?mode=(login|delete_cookies|pm) in the forums ignoreset - which doesn't work with the random style=17 in the middle
18:07:08 Yeah, that should probably be /ucp\.php\?(.*&)?mode=(login|delete_cookies|pm)(&|$) instead.
18:52:52 Since AB grabs a lot more than just the urls on a site, is there a way to determine whether a job will finish in time or not? Based on the current rate, he-man.org could grab ~3.3m if all goes well, but those aren't necessarily all the urls on the site and probably external links
18:56:44 The easiest approach is to use --no-offsite if it seems like it'll be close to not finishing, and then manually run the offsite links afterwards (but that requires having the job's database manually saved since links skipped by --no-offsite don't end up in the log)
19:16:33 What's the deal with all the 600,001-600,001 ms delays?
19:18:23 most of those are in a pipeline that has been offline for about a year
19:19:09 s/most of //
19:19:51 JAA: Didn't you plan to remove the pipeline? Or is it on the todo list
19:20:04 Yeah, the latter.
21:41:13 https://t.me/zlibrary_official/41 "Sad news! Yesterday a large number of our domains were seized again. We should highlight that the majority of the seized domains were not mirrors of the Z-Library website, but they were separate sub-projects, containing only books in rare languages of the world, and their blocking is confusing. For instance, these
21:41:13 domains included books in Tamil, Mongolian, Catalan, Urdu, Pashto, and other languages."
21:55:42 Lost connection, did I miss anything?
21:57:27 Thanks to whoever added esporo to the archivebot :]
23:59:34 Vokunal edited Deathwatch (+11): https://wiki.archiveteam.org/?diff=51120&oldid=51118
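A minimal Python sketch of the davoonline.com ignore discussion (17:58:02, 18:04:10 and 18:07:08), assuming ArchiveBot applies an ignore pattern the way Python's `re.search` does, i.e. a match anywhere in the URL counts. That assumption fits the remark that a bare `mode=login` ignores any URL containing that text. It compares the existing forums-ignoreset pattern, the proposed replacement and the site-specific pattern against a few URLs; the helper names `ignored` and `compare`, and the `login_plain` and `register` URLs, are hypothetical and exist only for illustration.

```python
import re

# Patterns quoted in the log. Assumption: an ignore behaves like re.search,
# so the pattern may match anywhere in the URL.
OLD_IGNORE = r"/ucp\.php\?mode=(login|delete_cookies|pm)"
NEW_IGNORE = r"/ucp\.php\?(.*&)?mode=(login|delete_cookies|pm)(&|$)"
SITE_IGNORE = r"^https://davoonline.com/phpBB3/ucp\.php.*[?&]mode=login"

# The first and last URLs appear in the log; the middle two are
# hypothetical variants added for comparison.
URLS = {
    "login_with_style": (
        "https://davoonline.com/phpBB3/ucp.php?style=17&mode=login"
        "&redirect=search.php%3Fauthor_id%3D6234%26sd%3Dd%26sk%3Dt"
        "%26sr%3Dposts%26st%3D0%26start%3D40%26style%3D17"
    ),
    "login_plain": "https://davoonline.com/phpBB3/ucp.php?mode=login",
    "register": "https://davoonline.com/phpBB3/ucp.php?mode=register",
    "topic_with_style": "https://davoonline.com/phpBB3/viewtopic.php?style=17&p=18852",
}


def ignored(pattern: str, url: str) -> bool:
    """Hypothetical helper: True if the pattern matches anywhere in the URL."""
    return re.search(pattern, url) is not None


def compare() -> dict[str, tuple[bool, bool, bool]]:
    """For each sample URL, report (old, new, site-specific) ignore results."""
    return {
        name: (
            ignored(OLD_IGNORE, url),
            ignored(NEW_IGNORE, url),
            ignored(SITE_IGNORE, url),
        )
        for name, url in URLS.items()
    }


if __name__ == "__main__":
    for name, (old, new, site) in compare().items():
        print(f"{name:18} old={old!s:5} new={new!s:5} site={site}")
```

Under that assumption, the old pattern misses the login URL with `style=17` in front of `mode=login`, while the proposed one catches it. The optional `(.*&)?` group allows any earlier parameters, and the `(&|$)` tail stops it from matching longer values that merely start with `login`, `delete_cookies` or `pm`.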