00:32:38 datechnoman: Deduplicated and queued 24992054 items. (Wx5KNGEC)
00:48:17 datechnoman: Skipped 9091 invalid URLs: https://transfer.archivete.am/zJbOo/edu_urls_batch_0002_001.txt.bad-urls.txt (gqnV5nbc)
00:48:18 datechnoman: Skipped 155 very long URLs: https://transfer.archivete.am/HMTTs/edu_urls_batch_0002_001.txt.skipped.txt (gqnV5nbc)
00:48:19 datechnoman: Deduplicating and queuing 24990754 items. (gqnV5nbc)
00:49:26 !a https://transfer.archivete.am/Qj1l/edu_urls_batch_0002_002.txt
00:49:26 datechnoman: Registering EOZxQYNK for '!a https://transfer.archivete.am/Qj1l/edu_urls_batch_0002_002.txt'
01:19:42 datechnoman: Skipped 4389 invalid URLs: https://transfer.archivete.am/9oo6B/edu_urls_batch_0002_002.txt.bad-urls.txt (EOZxQYNK)
01:19:45 datechnoman: Skipped 62 very long URLs: https://transfer.archivete.am/t5IT3/edu_urls_batch_0002_002.txt.skipped.txt (EOZxQYNK)
01:19:46 datechnoman: Deduplicating and queuing 13724758 items. (EOZxQYNK)
01:31:55 datechnoman: Deduplicated and queued 24990755 items. (gqnV5nbc)
01:32:13 !status
01:32:14 datechnoman: Jobs running: 1, jobs waiting for a slot: 0.
01:37:08 datechnoman: Deduplicated and queued 13724759 items. (EOZxQYNK)
02:51:00 !a https://transfer.archivete.am/ats1x/edu_urls_batch_0003_000.txt
02:51:00 datechnoman: Registering BVq3IfHU for '!a https://transfer.archivete.am/ats1x/edu_urls_batch_0003_000.txt'
02:56:28 !a https://transfer.archivete.am/X54jb/burnthetwitch-urls
02:56:29 TheTechRobo: Registering rjtwSINt for '!a https://transfer.archivete.am/X54jb/burnthetwitch-urls'
02:56:34 TheTechRobo: Deduplicating and queuing 5489 items. (rjtwSINt)
02:56:43 TheTechRobo: Deduplicated and queued 5489 items. (rjtwSINt)
03:29:24 datechnoman: Skipped 5216 invalid URLs: https://transfer.archivete.am/gNhdR/edu_urls_batch_0003_000.txt.bad-urls.txt (BVq3IfHU)
03:29:26 datechnoman: Skipped 56 very long URLs: https://transfer.archivete.am/HA1cN/edu_urls_batch_0003_000.txt.skipped.txt (BVq3IfHU)
03:29:27 datechnoman: Deduplicating and queuing 24994727 items. (BVq3IfHU)
03:34:07 !a https://transfer.archivete.am/bInf9/edu_urls_batch_0003_001.txt
03:34:08 datechnoman: Registering dBUwCpLn for '!a https://transfer.archivete.am/bInf9/edu_urls_batch_0003_001.txt'
04:16:42 datechnoman: Deduplicated and queued 24994730 items. (BVq3IfHU)
04:24:25 datechnoman: Skipped 4999 invalid URLs: https://transfer.archivete.am/FCXVl/edu_urls_batch_0003_001.txt.bad-urls.txt (dBUwCpLn)
04:24:27 datechnoman: Skipped 110 very long URLs: https://transfer.archivete.am/TI2sa/edu_urls_batch_0003_001.txt.skipped.txt (dBUwCpLn)
04:24:28 datechnoman: Deduplicating and queuing 24994891 items. (dBUwCpLn)
04:53:04 datechnoman: Deduplicated and queued 24994892 items. (dBUwCpLn)
05:28:03 !a https://transfer.archivete.am/rIAoC/edu_urls_batch_0003_002.txt
05:28:07 datechnoman: Registering rMLCrn7k for '!a https://transfer.archivete.am/rIAoC/edu_urls_batch_0003_002.txt'
05:51:35 datechnoman: Skipped 3540 invalid URLs: https://transfer.archivete.am/12Q1HC/edu_urls_batch_0003_002.txt.bad-urls.txt (rMLCrn7k)
05:51:36 datechnoman: Skipped 36 very long URLs: https://transfer.archivete.am/48jtQ/edu_urls_batch_0003_002.txt.skipped.txt (rMLCrn7k)
05:51:37 datechnoman: Deduplicating and queuing 14343597 items. (rMLCrn7k)
06:08:18 datechnoman: Deduplicated and queued 14343600 items. (rMLCrn7k)
07:04:53 nyuuzyou: can you add that to Deathwatch? also ensure you add a reference link https://wiki.archiveteam.org/index.php/Deathwatch
08:30:23 JAA: hey, just checking in if you found the time to add xrel to the list of scheduled captures.
08:31:11 i'm not sure if the AT captures arrive in batches in the wayback machine
12:17:30 https://transfer.archivete.am/PJutJ/flagella.crbs.ucsd.edu.log this one has been around for a while, http://flagella.crbs.ucsd.edu/images?image_search_parms[cellular_component]=polytene chromosome&advanced_search=advanced+search&per_page=10&page=3&per_page=10&page=3[...] - probably filter out more than one (per_)page entry?
12:17:32 inline (for browser viewing): https://transfer.archivete.am/inline/PJutJ/flagella.crbs.ucsd.edu.log
17:18:17 knecht4: Not yet, no. If you could send a PR to https://github.com/ArchiveTeam/urls-sources , that'd be great. You can model it after 60_tech_link_forums.txt, I think.
17:18:57 The first number is the request interval in seconds.
17:22:34 I'll look into it. As a separate file or into 60_tech_link_forums.txt?
17:24:44 possibly you want to add something like https://www.xrel.to/comments2-latest.html?tab=releases-all&page=1
17:32:38 nice find. would it be beneficial to use this instead of the home page?
17:34:46 imo yes, would reduce noise
17:36:52 ah yeah these also exist for board and reviews
17:38:36 yeah nice, could grab all three and cut out the fluff
17:46:43 this is just an iframe on the main page afaict
17:47:10 so not sure it'd even get grabbed if you just gave the homepage to urls
17:50:39 yeah and it still contains links to the actual comment pages
17:51:12 redirects are followed? because the links are shortened
17:52:48 yeah redirects should be followed with a depth of 1 i think, but i'm not sure
17:53:25 alright, thank you! PR is incoming
17:53:46 Separate file, I think.
17:54:13 the hacker news file was renamed to tech forums
17:54:16 Let's add it and check in a few days what ends up in the WBM. :-)
17:54:35 Yeah, but xREL isn't a tech forum.
17:55:22 i was thinking to add a generic file named "warez_related" or whatever but now its just "60_xrel.txt"
17:55:27 right :p
17:55:58 i guess it could be renamed in the future if fitting stuff gets added.
17:56:19 Yep, 60_xrel.txt sounds good.
17:57:29 alright, it's live :) https://github.com/ArchiveTeam/urls-sources/pull/36
17:57:47 arkiver: ^
19:55:44 new pdfs if someone wants to plop them in: https://transfer.archivete.am/dG2Cz/pdf_CC-MAIN-2024-30.txt.zst https://transfer.archivete.am/wSXMu/pdf_CC-MAIN-2024-33.txt.zst
20:58:51 does h2ibot take .zst?
21:02:17 No, but transfer can auto-decompress when you remove .zst.
21:09:41 TIL!
21:15:13 Originally added for socialbot, but turned out to be useful for qubert, too.
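
Note on the .zst tip at 21:02:17: going only by what is said in the channel, dropping the trailing `.zst` from a transfer URL yields a link that is served decompressed. A minimal Python sketch of that rewrite follows. The helper name `decompressed_url` is hypothetical, and the code only edits the URL string; it does not fetch anything or verify the server's behaviour.

```python
from urllib.parse import urlsplit, urlunsplit


def decompressed_url(url: str, suffix: str = ".zst") -> str:
    """Return url with a trailing suffix removed from its path.

    Hypothetical helper: it assumes, per the chat above, that the
    transfer host serves decompressed content at the shortened URL.
    """
    parts = urlsplit(url)
    path = parts.path
    if path.endswith(suffix):
        path = path[: -len(suffix)]
    # Query string and fragment, if any, are kept unchanged.
    return urlunsplit(parts._replace(path=path))


if __name__ == "__main__":
    # The two lists posted at 19:55:44.
    for link in (
        "https://transfer.archivete.am/dG2Cz/pdf_CC-MAIN-2024-30.txt.zst",
        "https://transfer.archivete.am/wSXMu/pdf_CC-MAIN-2024-33.txt.zst",
    ):
        print("!a " + decompressed_url(link))
```

The printed lines follow the `!a <url>` form used earlier in this log; whether a given list should actually be queued is left to whoever runs the command.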