03:01:57 !a help
03:01:58 datechnoman: Not a transfer.archivete.am URL.
03:02:03 !a https://transfer.archivete.am/qM02g/goo-gl.2023-02-05-14-17-02.txt.zst
03:05:38 datechnoman: I don't think the bot supports decompressing.
03:06:02 (But transfer does.)
03:06:12 I submitted the link and was like oops....
03:06:20 Was waiting for it to yell at me
03:06:51 !a https://transfer.archivete.am/qM02g/goo-gl.2023-02-05-14-17-02.txt
03:06:52 arkiver: Maybe throw an error on \.zst$ URLs?
03:07:06 Sorry all. Just end user testing :P
03:10:28 datechnoman: Skipped 14319277 bad URLs: https://transfer.archivete.am/ncIXz/goo-gl.2023-02-05-14-17-02.txt.zst.bad-urls.txt
03:11:08 I ... think I'll skip throwing these into AB.
03:11:38 datechnoman: Fixed 11493254 unprintable URLs: https://transfer.archivete.am/nHCvu/goo-gl.2023-02-05-14-17-02.txt.zst.not-printable.txt
03:11:39 datechnoman: Deduplicating and queuing 0 items.
03:11:40 datechnoman: Deduplicated and queued 0 items.
03:14:42 JAA This is data that was taken from our WARC's on Archive.org so no point keeping them as we will be double copying them
03:15:28 datechnoman: 'Was taken from' is vague enough that I like to still keep the actual list. :-)
03:15:34 Or is it from the URLTeam files?
03:15:49 Yup straight from URLTeam files
03:16:15 I grab the warc, process it (grab google links for example) compress and tell #// to queue basically
03:16:30 There are no WARCs on URLTeam though.
03:17:00 zip files sorry
03:17:10 Im out of practice. Been a month >.<
03:17:57 Right, yeah, then I guess it's unnecessary.
03:21:32 !a https://transfer.archivete.am/eYvIO/goo-gl.2023-02-07-20-17-02.txt
03:26:36 The previous one is still ingesting.
03:26:44 Those messages were for the .zst.
03:28:14 datechnoman: Skipped 26220 bad URLs: https://transfer.archivete.am/13mVPF/goo-gl.2023-02-07-20-17-02.txt.bad-urls.txt
03:28:18 Magic thanks for that. Was downloading it and trying to figure out why it failed. Makes sense
03:28:48 datechnoman: Deduplicating and queuing 8676692 items.
03:31:44 datechnoman: Deduplicated and queued 8676692 items.
17:40:41 something is going on here
18:04:04 I am seeing a bunch of Server returned 0 (HEOF).
18:04:14 but maybe it's my VM. the URLs seem to load from home
18:05:21 well we have another annoying loop... blegh
18:06:03 pausing for a few hours until I can fix this
21:08:38 Stupid loops! >:(
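
A rough sketch of the check suggested at 03:06:52 (throwing an error on \.zst$ URLs), placed next to the host check the bot already reports at 03:01:58. This is not the bot's actual code: the function name check_submission and the wording of the .zst error are made up for illustration, and only the "Not a transfer.archivete.am URL." message comes from the log.

```python
import re
from urllib.parse import urlsplit

# Host taken from the log; the bot's real validation logic is unknown.
ALLOWED_HOST = "transfer.archivete.am"
# The pattern proposed in the chat: \.zst$
ZST_SUFFIX = re.compile(r"\.zst$")


def check_submission(url):
    """Return an error string for a rejected URL, or None if it looks acceptable.

    Hypothetical helper: the name and the .zst message are assumptions.
    """
    parts = urlsplit(url)
    # A bare word such as "help" has no hostname, so it is rejected here too.
    if parts.hostname != ALLOWED_HOST:
        return "Not a transfer.archivete.am URL."
    # Reject compressed lists up front instead of ingesting them as text.
    if ZST_SUFFIX.search(parts.path):
        return "Compressed .zst lists are not supported; drop the .zst suffix."
    return None
```

With a check like this, the 03:02:03 submission would presumably have been refused immediately, while the 03:06:51 one (same path without .zst) would pass.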