00:17:05 -guybrush.hackint.org- *** Notice -- TS for #// changed from 1655942823 to 1604184663
07:26:18 arkiver poke me when the next batch/bulk load is loaded. Will spin back up :)
07:29:24 Been running great since the last url source update
07:52:44 I think I have an addiction to data hoarding...
13:28:10 datechnoman: haha, will do :)
13:28:26 working on an update to the code, and after that we'll resume getting sitemaps
13:32:11 No stress or pressure at all. I know you're always very busy! :)
13:32:47 Anything exciting being updated?
13:33:29 we'll be pushing the periodical stuff into a separate backfeed shard
13:33:51 so it doesn't add to the main backfeed shard we use for general queuing of URLs
13:34:16 and sitemaps will be awesome :)
13:34:32 once per month we'll get the sitemaps of every site we come across
13:34:46 since those contain very valuable information and are not always available anymore when needed
13:40:44 That is genius and also very effective with the sitemaps
13:40:54 I like it
13:41:47 The extra backfeeds will hopefully allow for better control of incoming items
19:45:12 !a https://transfer.archivete.am/5QGP1/techrights.org-46lme-outlinks-wo-tweets
19:45:38 arkiver: Invalid command message.
19:46:20 !a https://transfer.archivete.am/5QGP1/techrights.org-46lme-outlinks-wo-tweets
19:46:44 arkiver: Deduplicated and queued 0 items.
19:47:54 !a https://transfer.archivete.am/5QGP1/techrights.org-46lme-outlinks-wo-tweets
19:48:08 arkiver: Invalid command message.
19:48:38 !a https://transfer.archivete.am/5QGP1/techrights.org-46lme-outlinks-wo-tweets
19:49:46 !a https://transfer.archivete.am/5QGP1/techrights.org-46lme-outlinks-wo-tweets
19:50:04 arkiver: Deduplicating and queuing 3536804 items.
19:50:54 !a https://transfer.archivete.am/MMnDx/discord-linux-emoji-hub
19:51:00 !a https://transfer.archivete.am/MMnDx/discord-linux-emoji-hub
19:51:04 arkiver: Deduplicating and queuing 125 items.
19:51:05 arkiver: Deduplicated and queued 0 items.
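The plan described above can be sketched roughly as follows: periodical work such as the monthly sitemap fetches goes to its own backfeed shard, separate from the main shard used for general URL queuing. This is a hypothetical illustration, not the URLs project's code. The `Backfeed` class, the 30-day interval standing in for "once per month", and fetching `/sitemap.xml` at the site root are all assumptions.

```python
from datetime import datetime, timedelta
from urllib.parse import urlsplit

# Assumption: "once per month" approximated as a fixed 30-day interval.
SITEMAP_INTERVAL = timedelta(days=30)


class Backfeed:
    """Hypothetical stand-in for two backfeed shards (not the real service)."""

    def __init__(self):
        self.main = []          # general queuing of discovered URLs
        self.periodic = []      # periodical work, e.g. monthly sitemap fetches
        self.last_sitemap = {}  # host -> when its sitemap was last queued

    def queue_url(self, url, now):
        """Queue a URL on the main shard and, at most once per interval per
        host, queue that host's sitemap on the periodic shard."""
        self.main.append(url)
        parts = urlsplit(url)
        host = parts.netloc.lower()
        last = self.last_sitemap.get(host)
        if last is None or now - last >= SITEMAP_INTERVAL:
            # Assumption: sitemap at the site root; sites may also declare
            # sitemaps elsewhere via "Sitemap:" lines in robots.txt.
            self.periodic.append(f"{parts.scheme}://{host}/sitemap.xml")
            self.last_sitemap[host] = now


if __name__ == "__main__":
    bf = Backfeed()
    start = datetime(2022, 7, 1)
    bf.queue_url("https://example.com/a", start)
    bf.queue_url("https://example.com/b", start + timedelta(days=3))
    bf.queue_url("https://example.com/c", start + timedelta(days=31))
    print(len(bf.main), bf.periodic)
```

Keeping sitemap fetches in their own list means a burst of newly seen hosts does not crowd out ordinary URLs in the main queue. That separation is the kind of control over incoming items mentioned in the log.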
19:51:36 !a https://transfer.archivete.am/MMnDx/discord-linux-emoji-hub
19:51:38 arkiver: Deduplicating and queuing 125 items.
19:51:40 arkiver: Deduplicated and queued 125 items.
19:51:44 !a https://transfer.archivete.am/5QGP1/techrights.org-46lme-outlinks-wo-tweets
19:52:01 arkiver: Deduplicating and queuing 3536804 items.
19:54:02 !a https://transfer.archivete.am/5QGP1/techrights.org-46lme-outlinks-wo-tweets
19:54:23 arkiver, thoughts on the Google Docs discussion above? oo;
19:54:32 arkiver: Deduplicating and queuing 3536804 items.
19:56:09 !a https://transfer.archivete.am/5QGP1/techrights.org-46lme-outlinks-wo-tweets
19:56:17 Ryz: first fixing this
19:56:22 arkiver: Deduplicating and queuing 3536804 items.
19:57:40 arkiver: Invalid command message.
19:58:45 !a https://transfer.archivete.am/5QGP1/techrights.org-46lme-outlinks-wo-tweets
19:58:57 arkiver: Deduplicating and queuing 3536804 items.
20:00:54 Hmm oo;
20:06:28 !a https://transfer.archivete.am/5QGP1/techrights.org-46lme-outlinks-wo-tweets
20:06:43 arkiver: Skipped 1 very long URLs - https://transfer.archivete.am/LVI7Q/techrights.org-46lme-outlinks-wo-tweets.skipped.txt
20:06:44 arkiver: Deduplicating and queuing 3536803 items.
20:07:20 that is better
20:08:01 arkiver: Deduplicated and queued 3536803 items.
20:08:45 JAA: systwi: ^
20:09:04 :-)
20:09:16 there was a huge URL in there, we'll now skip those and give data about what was skipped
20:09:22 Whew, nice long URL, hehe.
20:12:18 arkiver: Did you queue https://transfer.archivete.am/15uux3/discord-nevergrind ?
20:16:27 !a https://transfer.archivete.am/15uux3/discord-nevergrind
20:16:30 JAA: Deduplicating and queuing 13608 items.
20:22:00 Well...? :-)
20:22:20 Is it supposed to have a "finished" message?
20:25:04 Yeah, see above.
20:25:08 arkiver: ?
20:30:41 Wasn't sure if that finished message was for testing or not.
20:42:19 Thanks arkiver! :D
20:44:50 So, when queueing jobs here, are they specifically run on Warriors as opposed to ArchiveBot, where jobs are run on pipelines?
20:45:13 !help
20:45:15 systwi: The following commands are available:
20:45:16 systwi: !help: Print this help message.
20:45:17 systwi: !a: Deduplicate and archive a list of URLs hosted on transfer.archivete.am. CAREFUL, DDOS.
20:45:49 I'm guessing that to be the case, judging from the last statement.
20:46:10 Yes, it runs on the URLs project. h2i just provides an interface to more easily queue into the project.
20:47:40 Ahh, gotcha, thanks.
20:48:20 Is there a dashboard to view the progress of ongoing jobs?
20:48:59 https://tracker.archiveteam.org/urls/
20:49:00 :-)
20:49:27 Lists queued through the bots aren't treated specially, so there's no real good way to track the progress of a specific list as far as I'm aware.
20:50:26 Thanks for the info, that makes sense.
21:06:01 JAA: whoops, invalid character, let me encode those
21:36:48 !a https://transfer.archivete.am/JBPBV/test-url-list
21:37:58 I probably would need voice/op for that (just checking).
23:08:10 !a https://transfer.archivete.am/15uux3/discord-nevergrind
23:08:15 arkiver: Deduplicating and queuing 13608 items.
23:08:17 arkiver: Deduplicated and queued 13608 items.
23:08:25 JAA: TheTechRobo: ^
23:08:44 we're percent-encoding characters not supported by backfeed now
23:09:03 !a https://transfer.archivete.am/JBPBV/test-url-list
23:09:05 arkiver: Deduplicating and queuing 2 items.
23:09:06 arkiver: Deduplicated and queued 2 items.
23:09:13 systwi: ^
23:09:17 and yeah admin is needed
23:09:50 !a https://transfer.archivete.am/JBPBV/test-url-list
23:09:51 arkiver: Deduplicating and queuing 2 items.
23:09:52 arkiver: Deduplicated and queued 2 items.
23:12:41 nom nom nom
23:12:48 Amazing how fast we process URLs lol
23:12:53 hah :)
23:12:57 and yeah!
23:16:15 Am I correct in saying we are only going max-depth 1 (one hop) for links found in a URL page we process?
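The list handling that the bot reports in the log might look something like the sketch below. It skips very long URLs and lists them separately, percent-encodes characters that backfeed does not accept, then deduplicates before queuing. The `prepare` function, the 2048-character limit, and the set of characters left unescaped are assumptions for illustration; the bot's real limits are not stated here. This sketch only deduplicates within one submitted list. Whether the bot also deduplicates against earlier submissions is not shown.

```python
from urllib.parse import quote

# Hypothetical cutoff; the bot's real limit for "very long URLs" is not stated.
MAX_URL_LENGTH = 2048

# Characters left as-is: RFC 3986 reserved characters plus "%" (so existing
# escapes are not double-encoded). quote() never escapes letters, digits or
# "_.-~". Everything else (spaces, non-ASCII, ...) is percent-encoded as UTF-8.
# Assumption: this approximates "characters not supported by backfeed".
SAFE_CHARS = ":/?#[]@!$&'()*+,;=%"


def prepare(lines):
    """Split raw URL lines into (to_queue, skipped), deduplicating within the list."""
    seen = set()
    to_queue, skipped = [], []
    for line in lines:
        url = line.strip()
        if not url:
            continue  # ignore blank lines
        encoded = quote(url, safe=SAFE_CHARS)
        if len(encoded) > MAX_URL_LENGTH:
            skipped.append(url)  # reported separately instead of queued
            continue
        if encoded not in seen:
            seen.add(encoded)
            to_queue.append(encoded)
    return to_queue, skipped
```

For example, `prepare(["https://example.com/a b", "https://example.com/a b"])` queues the single URL `https://example.com/a%20b` and skips nothing.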
23:17:14 yes
23:17:33 or
23:18:21 two actually
23:18:50 so 2 hops, but the first hop is only followed if it was not found before
23:20:33 Ahhh yup gotcha! Thanks. We technically aren't crawling then
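One reading of the answer above is a depth-limited breadth-first traversal with a shared set of already-seen URLs: pages are followed at most two hops from a processed page, and a first-hop URL is only followed if it had not been found before. The `crawl` function, the `get_links` callback, and treating "found before" as membership in a `seen` set are assumptions for illustration, not the project's implementation.

```python
from collections import deque


def crawl(seed, get_links, seen=None, max_hops=2):
    """Depth-limited BFS: fetch pages up to max_hops links away from seed.

    `seen` is the set of URLs already found (e.g. shared across jobs); a link
    already in it is neither queued nor followed again. The set is updated
    in place.
    """
    if seen is None:
        seen = set()
    seen.add(seed)
    queue = deque([(seed, 0)])
    fetched = []
    while queue:
        url, hop = queue.popleft()
        fetched.append(url)  # stands in for archiving the page
        if hop >= max_hops:
            continue  # last-hop pages are archived but their links are not followed
        for link in get_links(url):
            if link in seen:
                continue  # found before: not followed again
            seen.add(link)
            queue.append((link, hop + 1))
    return fetched
```

With a toy link graph `{"s": ["a", "b"], "a": ["c"], "b": ["d"], "c": ["e"]}` and `"b"` already in `seen`, `crawl` visits `s`, `a` and `c`. It never reaches `e`, because `c` is on the second hop, and it skips `b`, because `b` was found before.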