00:08:38 <pabs> aww, one site I wanted to crawl has some links only in comments :(
00:16:33 <pabs> project10: I definitely think AT needs a system for feeding links discovered by #archivebot, #wikibot, #// and other places to different projects. imgur links, e.g., usually 429 in AB. the mailman/bugzilla/codearchiver/SWH/wikibot/etc projects could also use those auto-discovery services
00:19:02 <project10> especially with (seemingly?) longer-running projects now like imgur, reddit, telegram. I assume things like imgur discovered through #// are sent via backfeed to the imgur queue in that case?
00:19:27 <nicolas17> I *think* #// and other projects do feed into imgur, but not archivebot
00:20:07 <nulldata> https://sh.itjust.works/post/4842435
00:20:55 <project10> so what, it was based on the metro novels? or just a metro-like idea?
01:26:38 <h2ibot> PaulWise edited Bugzilla (+785, add bugzilla-url-list by JAA strategy): https://wiki.archiveteam.org/?diff=50756&oldid=50599
02:42:29 <thuban> arkiver: i've checked periodically, but i still just get redirects to the shutdown notice. plcp might know more
02:58:56 <h2ibot> DigitalDragon edited NewsGrabber (+18): https://wiki.archiveteam.org/?diff=50757&oldid=50706
02:59:38 <fireonlive> that works
03:02:46 <project10> does AB have a max size per fetched URL? I see the debian.org/releases job fetching netinst ISOs but no others, I assume size limit at play?
03:04:43 <pabs> hmm, didn't mean to fetch those
03:34:04 <thuban> archivebot jobs for katapult are all done; i will grab the meta files and extract srcset components when they get uploaded (probably tomorrow)
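A minimal sketch of what "extracting srcset components" could look like, assuming the goal is to pull the image URLs out of an HTML `srcset` attribute value so they can be queued separately. The function name `srcset_urls` and the `base_url` parameter are illustrative and not taken from any tool mentioned in the log. It splits candidates on a comma followed by whitespace, which is a simplification of the full HTML parsing algorithm.

```python
import re
from urllib.parse import urljoin


def srcset_urls(srcset, base_url=""):
    """Return the URLs listed in a srcset attribute value.

    Each candidate is "URL [descriptor]", e.g. "a.jpg 480w" or "b.jpg 2x".
    Candidates are split on a comma followed by whitespace, so a comma
    inside a URL (with no whitespace after it) stays part of that URL.
    """
    urls = []
    for candidate in re.split(r",\s+", srcset.strip()):
        # Drop stray commas, then take the first token as the URL;
        # anything after it is a width or density descriptor.
        parts = candidate.strip().strip(",").split()
        if parts:
            urls.append(urljoin(base_url, parts[0]))
    return urls
```

For example, `srcset_urls("a.jpg 1x, b.jpg 2x", "https://example.org/img/")` resolves both candidates against the hypothetical base URL and discards the descriptors.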
03:50:58 <DogsRNice> what does ab do with 429 errors?
03:53:10 <DogsRNice> im noticing on the empire minecraft job that its not getting imgur links and some of them arent in the wbm, the rest were grabbed by the imgur project already
04:07:31 <pokechu22> It retries them twice and then dismisses them, but imgur will never succeed with AB - you'll need to download the meta-warc and send a list of them to the imgur project
04:11:21 <DogsRNice> not really sure how to do that lol
04:11:53 <DogsRNice> kinda sounds like something that could be automated (not that i know how to do that either)
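A rough sketch of how the first half of that could be automated, assuming the meta-WARC is a gzip-compressed file whose log text mentions the imgur URLs the job attempted. The function names and the regular expression are illustrative. The exact log format is an assumption, and handing the resulting list to the imgur project is not covered here.

```python
import gzip
import re

# Matches http(s) URLs on imgur.com or any subdomain (e.g. i.imgur.com).
IMGUR_URL = re.compile(
    r"https?://(?:[a-z0-9-]+\.)*imgur\.com/[^\s\"'<>)]+",
    re.IGNORECASE,
)


def find_imgur_urls(text):
    """Return the sorted, de-duplicated imgur URLs found in text."""
    return sorted(set(IMGUR_URL.findall(text)))


def imgur_urls_from_meta_warc(path):
    """Scan a gzip-compressed meta-WARC line by line for imgur URLs.

    Assumption: the URLs appear as plain text in the log records.
    Undecodable bytes are replaced rather than raising an error.
    """
    found = set()
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            found.update(IMGUR_URL.findall(line))
    return sorted(found)
```

Calling `imgur_urls_from_meta_warc("job-meta.warc.gz")` (a hypothetical file name) would give a list that can be written out one URL per line.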
14:17:10 <h2ibot> JustAnotherArchivist edited The WARC Ecosystem (+304, /* Tools */ Add ArchiveBox): https://wiki.archiveteam.org/?diff=50758&oldid=50711
14:44:57 <anarcat> not sure if this is -ot but we might need a watch on bandcamp https://teddydd.me/2023/backup-your-bandcamp-music/
15:18:51 <TheTechRobo> Wonder if archivebox could use wget-AT
15:42:22 <icedice> <anarcat> not sure if this is -ot but we might need a watch on bandcamp https://teddydd.me/2023/backup-your-bandcamp-music/
15:42:31 <icedice> Reminds me of Amazon Prime Video's bs
15:43:41 <fireonlive> TheTechRobo: if it did i would be so happy
16:17:09 <qq44|m> how can I mirror a site with wget-lua and include all page requisites?
16:17:23 <qq44|m> --recursive, --mirror, and --page-requisites don't seem to work
16:20:40 <imer> qq44|m: are you using a lua script? (I dont know the solution, I assume wget-lua behaved like regular wget, but with more scripting)
16:21:28 <qq44|m> imer: im not using a script, just plain old wget-lua
16:22:40 <qq44|m> grab-site seems to work properly with page requisites, but wget doesn't seem to pull them with recursive downloads
18:00:57 <h2ibot> FireonLive edited Current Projects (-163, Remove NG -- superseded by URLs): https://wiki.archiveteam.org/?diff=50759&oldid=50685
18:27:40 <pokechu22> Looking at https://web.archive.org/web/20230000000000*/https://e.orange.fr/error404.html some captures show in blue and some show in orange - I'm pretty sure https://e.orange.fr/error404.html always returns 404, so is there a reason for them being blue? (that page has a ton of captures because any personal page that had a 404 or didn't exist would *redirect* there, and
18:27:42 <pokechu22> archivebot doesn't deduplicate redirect targets)
18:28:53 <qq44|m> pokechu22: perhaps the server didn't return 404 error code in the headers, and instead returned 200 but said 404 on the page?
18:30:02 <pokechu22> Picking a snapshot from april 2 that shows as blue (https://web.archive.org/web/20230402090744/https://e.orange.fr/error404.html) still shows a 404 in my developer tools when loading the page
18:46:30 <JAA> I've found the colours in the calendar to be wildly inaccurate all the time.
18:50:01 <fireonlive> calendars, the bane of our existence
18:51:15 <JAA> In other news, my FuzzyMemories.TV grab-site crawl finished.
18:52:08 <JAA> Three /watch/ URLs failed, otherwise it looks fine.
18:52:26 <imer> nice
18:53:02 <JAA> 4232 video files from 4761 attempted IDs
18:54:50 <JAA> Random example of a video where the file is a 404: http://www.fuzzymemories.tv/watch/2276/kiddieland-amusement-park-commercial-1-1990/
18:55:18 <fireonlive> awesome :)
18:55:46 <JAA> Total WARC size is 107 GiB. It'll be on its slow way to IA soon.
19:56:18 <h2ibot> JustAnotherArchivist created The Museum of Classic Chicago Television (+611, Created page with "{{Infobox project | URL =…): https://wiki.archiveteam.org/?title=The%20Museum%20of%20Classic%20Chicago%20Television
19:57:18 <h2ibot> JustAnotherArchivist created FuzzyMemories.TV (+54, Redirected page to [[The Museum of Classic…): https://wiki.archiveteam.org/?title=FuzzyMemories.TV
19:57:19 <h2ibot> JustAnotherArchivist created FuzzyMemoriesTV (+54, Redirected page to [[The Museum of Classic…): https://wiki.archiveteam.org/?title=FuzzyMemoriesTV
21:12:37 <h2ibot> FireonLive edited Reddit (+129, wording fix?): https://wiki.archiveteam.org/?diff=50763&oldid=50722
21:13:18 <flashfire42> https://wiki.archiveteam.org/index.php/ArchiveBot/2019_Australian_federal_election a question do pages like this still work as originally intended?
21:13:42 <pokechu22> https://wiki.archiveteam.org/index.php/Special:Contributions/HadeanEon makes me think no
21:13:52 <JAA> No
21:14:31 <flashfire42> Alright then. I will have to use the viewer to work out what to put in and what not then. All good. Still good sources of things to throw in.
22:11:52 <h2ibot> JustAnotherArchivist edited The Museum of Classic Chicago Television (+597, Add known archives): https://wiki.archiveteam.org/?diff=50764&oldid=50760
22:57:17 <fireonlive> -+rss:#hackernews- Microsoft to kill off third-party printer drivers in Windows: https://www.theregister.com/2023/09/11/go_native_or_go_home/ https://news.ycombinator.com/item?id=37473628
22:57:19 <fireonlive> "To be clear, the end of servicing applies to drivers provided via Windows Update. Manufacturers will, according to Microsoft, "need to provide customers with an alternative means to download and install those printer drivers." Legacy v3 and v4 Windows printer drivers are facing the end of servicing ax."
23:25:08 <nicolas17> I have never seen 3rd party drivers updating via Windows Update
23:26:47 <fireonlive> looks like this 'Mopria' has existed for a while and more newer printers are using it?