00:08:38 aww, one site I wanted to crawl has some links only in comments :(
00:16:33 project10: I definitely think AT needs a system for feeding links discovered by #archivebot, #wikibot, #// and other places to different projects. imgur links, for example, usually 429 in AB. the mailman/bugzilla/codearchiver/SWH/wikibot/etc projects could also use those auto-discovery services
00:19:02 especially with (seemingly?) longer-running projects now like imgur, reddit, telegram. I assume things like imgur discovered through #// are sent via backfeed to the imgur queue in that case?
00:19:27 I *think* #// and other projects do feed into imgur, but not archivebot
00:20:07 https://sh.itjust.works/post/4842435
00:20:55 so what, it was based on the metro novels? or just a metro-like idea?
01:26:38 PaulWise edited Bugzilla (+785, add bugzilla-url-list by JAA strategy): https://wiki.archiveteam.org/?diff=50756&oldid=50599
02:42:29 arkiver: i've checked periodically, but i still just get redirects to the shutdown notice. plcp might know more
02:58:56 DigitalDragon edited NewsGrabber (+18): https://wiki.archiveteam.org/?diff=50757&oldid=50706
02:59:38 that works
03:02:46 does AB have a max size per fetched URL? I see the debian.org/releases job fetching netinst ISOs but no others, I assume size limit at play?
03:04:43 hmm, didn't mean to fetch those
03:34:04 archivebot jobs for katapult are all done; i will grab the meta files and extract srcset components when they get uploaded (probably tomorrow)
03:50:58 what does ab do with 429 errors?
03:53:10 i'm noticing on the empire minecraft job that it's not getting imgur links and some of them aren't in the wbm, the rest were grabbed by the imgur project already
04:07:31 It retries them twice and then dismisses them, but imgur will never succeed with AB - you'll need to download the meta-warc and send a list of them to the imgur project
04:11:21 not really sure how to do that lol
04:11:53 kinda sounds like something that could be automated (not that i know how to do that either)
14:17:10 JustAnotherArchivist edited The WARC Ecosystem (+304, /* Tools */ Add ArchiveBox): https://wiki.archiveteam.org/?diff=50758&oldid=50711
14:44:57 not sure if this is -ot but we might need a watch on bandcamp https://teddydd.me/2023/backup-your-bandcamp-music/
15:18:51 Wonder if archivebox could use wget-AT
15:42:22 not sure if this is -ot but we might need a watch on bandcamp https://teddydd.me/2023/backup-your-bandcamp-music/
15:42:31 Reminds me of Amazon Prime Video's bs
15:43:41 TheTechRobo: if it did i would be so happy
16:17:09 how can I mirror a site with wget-lua and include all page requisites?
16:17:23 --recursive, --mirror, and --page-requisites don't seem to work
16:20:40 qq44|m: are you using a lua script? (I don't know the solution, I assume wget-lua behaves like regular wget, but with more scripting)
16:21:28 imer: i'm not using a script, just plain old wget-lua
16:22:40 grab-site seems to work properly with page requisites, but wget doesn't seem to pull them with recursive downloads
18:00:57 FireonLive edited Current Projects (-163, Remove NG -- superseded by URLs): https://wiki.archiveteam.org/?diff=50759&oldid=50685
18:27:40 Looking at https://web.archive.org/web/20230000000000*/https://e.orange.fr/error404.html some captures show in blue and some show in orange - I'm pretty sure https://e.orange.fr/error404.html always returns 404, so is there a reason for them being blue?
(that page has a ton of captures because any personal page that had a 404 or didn't exist would *redirect* there, and
18:27:42 archivebot doesn't deduplicate redirect targets)
18:28:53 pokechu22: perhaps the server didn't return a 404 status code in the headers, and instead returned 200 but said 404 on the page?
18:30:02 Picking a snapshot from April 2 that shows as blue (https://web.archive.org/web/20230402090744/https://e.orange.fr/error404.html) still shows a 404 in my developer tools when loading the page
18:46:30 I've found the colours in the calendar to be wildly inaccurate all the time.
18:50:01 calendars, the bane of our existence
18:51:15 In other news, my FuzzyMemories.TV grab-site crawl finished.
18:52:08 Three /watch/ URLs failed, otherwise it looks fine.
18:52:26 nice
18:53:02 4232 video files from 4761 attempted IDs
18:54:50 Random example of a video where the file is a 404: http://www.fuzzymemories.tv/watch/2276/kiddieland-amusement-park-commercial-1-1990/
18:55:18 awesome :)
18:55:46 Total WARC size is 107 GiB. It'll be on its slow way to IA soon.
19:56:18 JustAnotherArchivist created The Museum of Classic Chicago Television (+611, Created page with "{{Infobox project | URL =…): https://wiki.archiveteam.org/?title=The%20Museum%20of%20Classic%20Chicago%20Television
19:57:18 JustAnotherArchivist created FuzzyMemories.TV (+54, Redirected page to [[The Museum of Classic…): https://wiki.archiveteam.org/?title=FuzzyMemories.TV
19:57:19 JustAnotherArchivist created FuzzyMemoriesTV (+54, Redirected page to [[The Museum of Classic…): https://wiki.archiveteam.org/?title=FuzzyMemoriesTV
21:12:37 FireonLive edited Reddit (+129, wording fix?): https://wiki.archiveteam.org/?diff=50763&oldid=50722
21:13:18 https://wiki.archiveteam.org/index.php/ArchiveBot/2019_Australian_federal_election a question: do pages like this still work as originally intended?
21:13:42 https://wiki.archiveteam.org/index.php/Special:Contributions/HadeanEon makes me think no
21:13:52 No
21:14:31 Alright then. I will have to use the viewer to work out what to put in and what not then. All good. Still good sources of things to throw in.
22:11:52 JustAnotherArchivist edited The Museum of Classic Chicago Television (+597, Add known archives): https://wiki.archiveteam.org/?diff=50764&oldid=50760
22:57:17 -+rss:#hackernews- Microsoft to kill off third-party printer drivers in Windows: https://www.theregister.com/2023/09/11/go_native_or_go_home/ https://news.ycombinator.com/item?id=37473628
22:57:19 "To be clear, the end of servicing applies to drivers provided via Windows Update. Manufacturers will, according to Microsoft, "need to provide customers with an alternative means to download and install those printer drivers." Legacy v3 and v4 Windows printer drivers are facing the end of servicing ax."
23:25:08 I have never seen 3rd party drivers updating via Windows Update
23:26:47 looks like this 'Mopria' has existed for a while and more newer printers are using it?
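The 04:07:31 suggestion (download the job's meta-WARC and send the imgur URLs in it to the imgur project) is the kind of step that could be scripted. Below is a minimal Python sketch. It assumes the meta-WARC (a `.warc.gz` file) has already been downloaded locally and that scanning it as loosely decoded text is good enough to find the URLs. The function names `imgur_urls` and `imgur_urls_from_warc_gz` are hypothetical, and submitting the resulting list to the imgur project is not covered.

```python
import gzip
import re

# Hypothetical pattern: http(s) URLs on imgur.com or any subdomain of it
# (i.imgur.com, i.stack.imgur.com, ...). The path stops at whitespace,
# quotes, angle brackets, parentheses or square brackets.
IMGUR_RE = re.compile(
    r"https?://(?:[a-z0-9-]+\.)*imgur\.com/[^\s\"'<>()\[\]]+",
    re.IGNORECASE,
)


def imgur_urls(lines):
    """Return imgur URLs found in an iterable of text lines,
    deduplicated and kept in first-seen order."""
    seen = {}  # dicts preserve insertion order, so this doubles as an ordered set
    for line in lines:
        for match in IMGUR_RE.finditer(line):
            seen.setdefault(match.group(0), None)
    return list(seen)


def imgur_urls_from_warc_gz(path):
    """Scan a .warc.gz file line by line as text (assumption: a plain regex
    scan is acceptable instead of parsing individual WARC records).

    gzip.open reads concatenated gzip members transparently, which is how
    per-record-compressed WARC files are laid out. Undecodable bytes are
    replaced rather than raising an error.
    """
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        return imgur_urls(f)
```

For example, `print("\n".join(imgur_urls_from_warc_gz("job-meta.warc.gz")))` would print one URL per line, where the file name is a placeholder. A regex scan is cruder than walking the records with a dedicated WARC library, but it needs nothing beyond the standard library.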