00:21:57 arkiver or anyone, when checking https://archive.org/download/archiveteam_urls_20220622000554_038a19a3 - why do the latter half of the 3 links have locks on them? Are they in the middle of uploading right now?
00:22:27 The message before it being "this item is currently being modified/updated by the task: derive"
00:25:06 arkiver, checking from https://archive.org/download/archiveteam_urls_20220621134222_a603ce80/urls_20220621134222_a603ce80.1650286618.megawarc.warc.os.cdx.gz - is there a way that when https://docs.google.com/document/d/14onFbq-440wOViookx7VXCtu9P_MSmizMbKlWk9-5Ks/edit is being archived, also include archiving
00:25:14 https://docs.google.com/document/d/14onFbq-440wOViookx7VXCtu9P_MSmizMbKlWk9-5Ks/preview and https://docs.google.com/document/d/14onFbq-440wOViookx7VXCtu9P_MSmizMbKlWk9-5Ks/export
00:25:39 Ryz: Because archiveteam_urls_20220622000554_038a19a3 only got moved over from the inbox, which is access-restricted, a few minutes ago, and the corresponding task didn't get processed yet. It should become accessible soon.
00:25:59 I know that when ArchiveBot accesses Google Drive links, it always gives 429s, I'm not sure if Google Docs links are the same dealio
00:27:13 This could also apply with spreadsheets ( https://docs.google.com/spreadsheets/d/1cUen0-fDAba-vpz9WDZvfpKoz3q4CEn8eOYQDOEyyxw/edit ) with
00:27:15 https://docs.google.com/spreadsheets/d/1cUen0-fDAba-vpz9WDZvfpKoz3q4CEn8eOYQDOEyyxw/preview and https://docs.google.com/spreadsheets/d/1cUen0-fDAba-vpz9WDZvfpKoz3q4CEn8eOYQDOEyyxw/export
00:29:51 (That item seems to be fully accessible now.)
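The request above (queue the /preview and /export forms whenever an /edit link is archived) could be sketched roughly as follows. This is only an illustration: the function name `doc_variants` is made up here, the endpoint list is limited to the three names mentioned in the messages above, and nothing guarantees that every endpoint responds usefully for every document type.

```python
from urllib.parse import urlsplit, urlunsplit

# Endpoint names taken from the chat messages above. Whether each one works
# for a given document type is NOT guaranteed; this is an assumption.
VARIANTS = ("edit", "preview", "export")


def doc_variants(url, variants=VARIANTS):
    """Return sibling URLs for a docs.google.com/<kind>/d/<id>/... link.

    Hypothetical helper: returns an empty list when the URL does not look
    like a Google Docs item link. Query string and fragment are dropped.
    """
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expected path shape: <kind>/d/<id>[/<endpoint>]
    if parts.netloc != "docs.google.com" or len(segments) < 3 or segments[1] != "d":
        return []
    base = "/" + "/".join(segments[:3])
    return [
        urlunsplit((parts.scheme, parts.netloc, f"{base}/{name}", "", ""))
        for name in variants
    ]


if __name__ == "__main__":
    seed = "https://docs.google.com/spreadsheets/d/1cUen0-fDAba-vpz9WDZvfpKoz3q4CEn8eOYQDOEyyxw/edit"
    for candidate in doc_variants(seed):
        print(candidate)
```

The `variants` parameter is left open so that other endpoint names can be tried per document type without changing the helper.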
00:31:43 Unfortunately with something like https://docs.google.com/presentation/d/1IvfYbYKyTRT9da16vlsLLqzpsOqxSHf_Y8tY1lmGA_w/edit - previewing it only shows the slide pieces, but not the extra stuff
00:31:56 https://docs.google.com/presentation/d/1IvfYbYKyTRT9da16vlsLLqzpsOqxSHf_Y8tY1lmGA_w/edit#slide=id.g1080cddb5a_2_95 having the extra text and https://docs.google.com/presentation/d/1IvfYbYKyTRT9da16vlsLLqzpsOqxSHf_Y8tY1lmGA_w/edit#slide=id.g1080cddb5a_2_84 not
00:32:19 Doing the https://docs.google.com/presentation/d/1IvfYbYKyTRT9da16vlsLLqzpsOqxSHf_Y8tY1lmGA_w/export trick unfortunately also just fetches a .PNG of the first slide
00:34:48 Something like https://docs.google.com/drawings/d/1ZimNRIi0HIFO8GUOuU9We13lOrFR1-s9m-0Ft_hnb8U/edit - https://docs.google.com/drawings/d/1ZimNRIi0HIFO8GUOuU9We13lOrFR1-s9m-0Ft_hnb8U/preview bears similar problems with how content is rendered,
00:35:07 With https://docs.google.com/drawings/d/1ZimNRIi0HIFO8GUOuU9We13lOrFR1-s9m-0Ft_hnb8U/export basically importing what's seen in preview as a .PNG
00:46:15 I hope this isn't too much of an overload, because I feel further research should be needed, especially beyond the documents and spreadsheets, when archiving them through ArchiveBot too, not just for this channel
01:23:55 Ryz: Don't forget https://docs.google.com/document/d/14onFbq-440wOViookx7VXCtu9P_MSmizMbKlWk9-5Ks/mobilebasic
01:28:42 Huh, I somehow forgot about that~
01:29:05 The real curiosity is to what extent it would work with the other filetypes TheTechRobo
01:29:23 Ryz: I know for a fact it doesn't seem to work for Google Sheets
01:29:28 not sure about any others
01:29:30 Yeah, that doesn't work...
01:29:59 Presentation files and Drawing files don't work
01:30:06 Unless it's a different name that can be used...
01:31:17 There's also .../htmlview for (some?) spreadsheets.
01:31:44 https://docs.google.com/spreadsheets/d/1cUen0-fDAba-vpz9WDZvfpKoz3q4CEn8eOYQDOEyyxw/htmlview
01:33:12 .../pubhtml also exists sometimes: https://docs.google.com/spreadsheets/d/1ncxe8ngEVQcis1_oSbidcxN0XNGUd-MTRC7D-YxrpTo/pubhtml (slow and large!)
01:35:13 https://docs.google.com/spreadsheets/d/10lbPnDYJXhbtlA0ls0cGjjX_osFSG559IDrTbhgPHvc/pubhtml -> "We're sorry. This document is not published."
01:36:17 Yeah, Google Docs is a mess.
01:36:29 Other times, you get a login form.
01:36:33 I can just imagine some tired engineer on a Friday. :P
05:36:47 Ready for the next wave of new URLs if there are any :)
05:38:16 Did the queueing of that list finish? The bot never replied to the !a.
05:38:32 But maybe it's just slow. :-)
05:40:18 I did see a bulk import of about 1 million items/URLs a few hours ago but they were chewed up real fast
05:40:47 Been rebuilding some of my nodes to be more efficient and process much higher IOPS that are ripping through things now
05:43:47 (I thought it was just slow and we were waiting on a finished message. Arki ver said earlier it was going at 500 URLs per request, so I can't imagine it was going _that_ fast?)
05:47:43 The input list was 3.5M URLs, but I guess there was some duplication. Not sure I'd expect over 70% dupes though.
05:49:04 Mind you as they were importing/queuing we were processing thousands per second already
05:49:12 Soooooo mayyyyyybe? haha
05:49:38 But I do agree. No completed import message
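For reference, the "over 70% dupes" figure follows from the two rough numbers quoted above: a 3.5M-URL input list and about 1 million items observed. The sketch below just redoes that arithmetic and also works out how many requests a 500-URL batch size would imply. It assumes, as the chat does only tentatively, that the roughly 1 million items all came from that one list; the function names are illustrative.

```python
import math


def duplicate_fraction(submitted, queued):
    """Share of submitted URLs that did not turn into distinct queued items."""
    return 1 - queued / submitted


def requests_needed(total_urls, batch_size):
    """Number of batched requests needed to submit total_urls URLs."""
    return math.ceil(total_urls / batch_size)


if __name__ == "__main__":
    # Figures quoted in the chat; "about 1 million" is approximate.
    print(f"{duplicate_fraction(3_500_000, 1_000_000):.1%}")  # 71.4%
    print(requests_needed(3_500_000, 500))  # 7000
```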