-
Ryzarkiver or anyone, when checking archive.org/download/archiveteam_urls_20220622000554_038a19a3 - why do the latter half of the 3 links have locks on them? Are they in the middle of uploading right now?
-
RyzThe message before it being "this item is currently being modified/updated by the task: derive"
-
Ryzarkiver, checking from archive.org/download/archiveteam_ur….1650286618.megawarc.warc.os.cdx.gz - is there a way that when docs.google.com/document/d/14onFbq-…iookx7VXCtu9P_MSmizMbKlWk9-5Ks/edit is being archived, also include archiving
-
Ryz
-
JAARyz: Because archiveteam_urls_20220622000554_038a19a3 only got moved over from the inbox, which is access-restricted, a few minutes ago, and the corresponding task didn't get processed yet. It should become accessible soon.
-
RyzI know that when ArchiveBot accesses Google Drive links, it always gives 429s, I'm not sure if Google Docs links are the same dealio
-
RyzThis could also apply with spreadsheets ( docs.google.com/spreadsheets/d/1cUe…pz9WDZvfpKoz3q4CEn8eOYQDOEyyxw/edit ) with
-
Ryz
-
Jake(That item seems to be fully accessible now.)
-
RyzUnfortunately with something like docs.google.com/presentation/d/1Ivf…a16vlsLLqzpsOqxSHf_Y8tY1lmGA_w/edit - previewing it only shows the slide pieces, but not the extra stuff
-
Ryz
-
RyzDoing the docs.google.com/presentation/d/1Ivf…6vlsLLqzpsOqxSHf_Y8tY1lmGA_w/export trick unfortunately also just fetches a .PNG of the first slide
-
RyzSomething like docs.google.com/drawings/d/1ZimNRIi…GUOuU9We13lOrFR1-s9m-0Ft_hnb8U/edit - docs.google.com/drawings/d/1ZimNRIi…uU9We13lOrFR1-s9m-0Ft_hnb8U/preview bears similar problems with how content is rendered,
-
RyzWith docs.google.com/drawings/d/1ZimNRIi…OuU9We13lOrFR1-s9m-0Ft_hnb8U/export basically importing what's seen in preview as a .PNG
-
RyzI hope this isn't too much of an overload, because I feel further research is needed, especially beyond the documents and spreadsheets, when archiving them through ArchiveBot too, not just for this channel
-
TheTechRoboRyz: Don't forget docs.google.com/document/d/14onFbq-…XCtu9P_MSmizMbKlWk9-5Ks/mobilebasic
-
RyzHuh, I somehow forgot about that~
-
RyzThe real curiosity is to what extent it would work with the other filetypes TheTechRobo
-
TheTechRoboRyz: I know for a fact it doesn't seem to work for Google Sheets
-
TheTechRobonot sure about any others
-
RyzYeah, that doesn't work...
-
RyzPresentation files and Drawing files don't work
-
RyzUnless it's a different name that can be used...
-
JAAThere's also .../htmlview for (some?) spreadsheets.
-
JAA
-
JAA.../pubhtml also exists sometimes: docs.google.com/spreadsheets/d/1ncx…SbidcxN0XNGUd-MTRC7D-YxrpTo/pubhtml (slow and large!)
-
TheTechRobodocs.google.com/spreadsheets/d/10lb…s0cGjjX_osFSG559IDrTbhgPHvc/pubhtml -> "We're sorry. This document is not published."
-
JAAYeah, Google Docs is a mess.
-
JAAOther times, you get a login form.
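
The URL variants mentioned above can be collected into a small helper that, given one Google Docs link, lists the other endpoints worth queueing alongside it. This is only a sketch: the suffix-per-type mapping is an assumption taken from this conversation, not a documented Google API, and `EXAMPLE_ID` in the usage note is a hypothetical placeholder. Any of these URLs may still return a login form, a "not published" page, or a 429.

```python
import re

# Suffixes mentioned in the discussion, keyed by the document type in the URL.
# Assumption: this mapping reflects what people reported in chat, nothing more.
CANDIDATE_SUFFIXES = {
    "document": ["edit", "mobilebasic"],
    "spreadsheets": ["edit", "htmlview", "pubhtml"],
    "presentation": ["edit", "preview", "export"],
    "drawings": ["edit", "preview", "export"],
}

# Matches docs.google.com/<kind>/d/<id>, with or without a scheme.
DOCS_URL = re.compile(
    r"^(?:https?://)?docs\.google\.com/(?P<kind>[a-z]+)/d/(?P<doc_id>[^/]+)"
)


def candidate_urls(url):
    """Return the variant URLs to try for one Google Docs link."""
    match = DOCS_URL.match(url)
    if match is None:
        return []  # not a recognizable Google Docs link
    kind = match.group("kind")
    doc_id = match.group("doc_id")
    return [
        f"https://docs.google.com/{kind}/d/{doc_id}/{suffix}"
        for suffix in CANDIDATE_SUFFIXES.get(kind, [])
    ]
```

For example, `candidate_urls("docs.google.com/document/d/EXAMPLE_ID/edit")` yields the `/edit` and `/mobilebasic` forms of that document; unknown document types yield an empty list rather than a guess.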
-
TheTechRoboI can just imagine some tired engineer on a Friday. :P
-
datechnomanReady for the next wave of new URLs if there are any :)
-
JAADid the queueing of that list finish? The bot never replied to the !a.
-
JAABut maybe it's just slow. :-)
-
datechnomanI did see a bulk import of about 1 million items/urls a few hours ago but they were chewed up real fast
-
datechnomanBeen rebuilding some of my nodes to be more efficient and process much higher IOPS that are ripping through things now
-
Jake(I thought it was just slow and we were waiting on a finished message. arkiver said earlier it was going at 500 URLs per request, so I can't imagine it was going _that_ fast?)
-
JAAThe input list was 3.5M URLs, but I guess there was some duplication. Not sure I'd expect over 70% dupes though.
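
The "over 70%" figure follows from the numbers quoted here. A quick check, assuming the roughly 1 million items seen in the bulk import were the entire deduplicated result of the 3.5M-URL list (an approximation from the chat, not a measured count), and using the 500 URLs per request mentioned earlier:

```python
import math


def duplicate_fraction(input_urls, queued_items):
    """Share of the input list that did not turn into queued items."""
    return 1 - queued_items / input_urls


def requests_needed(input_urls, urls_per_request):
    """Number of queueing requests needed at a fixed batch size."""
    return math.ceil(input_urls / urls_per_request)


# Figures as quoted in the conversation; the 1M value is approximate.
print(f"{duplicate_fraction(3_500_000, 1_000_000):.1%}")  # 71.4%
print(requests_needed(3_500_000, 500))  # 7000
```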
-
datechnomanMind you as they were importing/queuing we were processing thousands per second already
-
datechnomanSoooooo mayyyyyybe? haha
-
datechnomanBut I do agree. No completed import message