05:44:51 i've seen this 888 spam before
05:45:29 thanks for the ping datechnoman and for pausing JAA, that *.top spam stuff needs some special filtering
08:33:59 All good mate, anytime :)
08:34:18 Will need some special regex to filter that rubbish spam out :o
09:16:15 thuban: at https://github.com/archiveteam/urls-sources
09:17:33 arkiver: i mean where in the repo
09:18:55 huh
09:19:08 did i forget to `git add` the file? hmm
09:20:59 thuban: it's included now
09:21:13 900_bbc_mediaguide.txt
09:25:34 arkiver: cool, ty!
09:28:28 are all of those really duplicates? some of them look spurious. also, do you want the script i wrote to generate the list? (i see there's some similar stuff in /other)
09:33:19 thuban: yeah, feel free to put the script in /other !
09:33:47 thuban: yeah, those should be duplicates. they're deduplicated from other lists, using the deduplicate_lists.py script
09:35:54 ah, so they are. sorry, i checked with github's code search but github's code search is trash
09:36:39 no need for sorry :)
09:36:41 but yeah
09:37:39 datechnoman++
09:37:40 -eggdrop- [karma] 'datechnoman' now has 8 karma!
09:37:42 JAA++
09:37:43 -eggdrop- [karma] 'JAA' now has 33 karma!
09:37:59 is there negative karma
09:38:11 discord--
09:38:12 -eggdrop- [karma] 'discord' now has -8 karma!
09:38:14 yes :P
09:38:18 arkiver--
09:38:18 -eggdrop- [karma] self karma is a selfish pursuit.
09:38:18 LOL
09:38:25 so selfish of me!
09:38:28 arkiver++
09:38:28 -eggdrop- [karma] 'arkiver' now has 21 karma!
09:38:31 arkiver++
09:38:31 -eggdrop- [karma] 'arkiver' now has 22 karma!
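
The 09:33:47 message says the new list was deduplicated against the other lists. Below is a minimal sketch of that idea. It is not the actual deduplicate_lists.py from urls-sources, whose behaviour is not shown in this log. The function name `deduplicate` and the one-URL-per-line input format are assumptions made for illustration.

```python
def deduplicate(new_urls, existing_lists):
    """Return entries of new_urls that appear in none of existing_lists.

    Hypothetical helper: assumes each list is an iterable of lines with
    one URL per line. Also drops repeats within new_urls itself.
    """
    seen = set()
    for lst in existing_lists:
        # Collect every non-empty, whitespace-stripped entry already known.
        seen.update(line.strip() for line in lst if line.strip())

    unique = []
    for line in new_urls:
        url = line.strip()
        if url and url not in seen:
            seen.add(url)  # remember it so later repeats are skipped too
            unique.append(url)
    return unique
```

Matching here is by exact string after stripping whitespace, so two spellings of the same URL (for example with and without a trailing slash) would both be kept.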
09:42:28 arkiver: the list generation is a bit awkward because afaict the bbc doesn't have an index of these anywhere--i manually assembled a list of pages from search results (that's actually the bbc_mediaguides.tsv i accidentally uploaded earlier) and fed them to the scraper
09:51:44 https://transfer.archivete.am/ojjT8/bbc_mediaguides.tsv (corrected), https://transfer.archivete.am/na5xh/bbc-media-lister.py
09:51:44 inline (for browser viewing): https://transfer.archivete.am/inline/ojjT8/bbc_mediaguides.tsv https://transfer.archivete.am/inline/na5xh/bbc-media-lister.py
09:52:01 can clean up for format etc later
12:39:16 Just about to get some rest arkiver. Assuming we will be on hold for a while longer? No biggie if so. Just checking if I should wait around for a moment or not
12:42:26 datechnoman: probably
12:42:38 can ping you when we resume if you'd like
12:43:32 Ack cheers mate. I shall head off then. Night!
12:44:55 cheers mate
15:17:55 the user agents list is updated
15:31:13 the spam loop should be fixed
15:31:16 resuming the project
15:34:03 the fake nasa one is filtered out now simply
15:35:41 nice
15:36:23 whats being moved to todo?
15:36:49 the nasa stuff
15:36:53 so it is filtered out fast
15:37:35 because if the filtering is done while the filtered out items are in :todo:backfeed, items will also be taken out of :todo:secondary, which will cause more items to be queued back to :todo:backfeed
15:37:37 makes sense
15:38:02 and then :todo:backfeed goes down slower, which decreases our ability to spot loops or reasons why it may not go down
15:38:21 however i'll now also move :todo:backfeed to :todo:secondary to start with a fresh empty :todo:backfeed
15:38:53 that is happening now
15:40:42 20240417.03 is now the minimum version
15:40:58 so the spam loop was a new 'version' of an old one https://github.com/ArchiveTeam/urls-grab/commit/e926de30fd2f134a94c54282a78bdf6adf1f12ad
15:44:28 paused for a bit as items are being moved around
16:41:14 datechnoman: fyi for when you spin back up ^ (project is still paused)
17:29:29 looks unpaused now, we're back in business
18:25:52 datechnoman: ^
22:34:38 Thanks for the ping. Will get my fleet up and running shortly
22:41:09 brrrrrrrrrrr
22:47:15 Slap the tracker and target around again xD
23:00:04 :D
23:21:38 * nyany slaps the tracker around a bit with a large elver
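
The log mentions that the *.top spam "needs some special filtering" and "some special regex". Below is a minimal sketch of what a host-based filter could look like. It is not the filter committed to urls-grab, which this log does not show. The pattern and the names `SPAM_HOST`, `is_spam_url` and `filter_urls` are assumptions made for illustration.

```python
import re
from urllib.parse import urlsplit

# Hypothetical pattern: any hostname whose top-level domain is "top".
SPAM_HOST = re.compile(r"\.top$", re.IGNORECASE)


def is_spam_url(url: str) -> bool:
    """Return True if the URL's host ends in .top (assumed spam rule)."""
    # urlsplit only yields a hostname for URLs with a scheme or a leading "//".
    host = urlsplit(url).hostname or ""
    # Drop a trailing root dot so "example.top." is matched as well.
    return bool(SPAM_HOST.search(host.rstrip(".")))


def filter_urls(urls):
    """Keep only the URLs that the assumed spam rule does not match."""
    return [u for u in urls if not is_spam_url(u)]
```

Matching on the parsed hostname rather than the whole URL avoids dropping unrelated URLs that merely contain ".top" in their path or query string. A real filter would probably need to be narrower than the whole TLD.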