-
pabs
aww, one site I wanted to crawl has some links only in comments :(
-
pabs
project10: I definitely think AT needs a system for feeding links discovered by #archivebot, #wikibot, #// and other places to different projects. imgur links, for example, usually get a 429 in AB. the mailman/bugzilla/codearchiver/SWH/wikibot/etc projects could also use those auto-discovery services
-
project10
especially with (seemingly?) longer-running projects now like imgur, reddit, telegram. I assume things like imgur discovered through #// are sent via backfeed to the imgur queue in that case?
-
nicolas17
I *think* #// and other projects do feed into imgur, but not archivebot
-
project10
so what, it was based on the metro novels? or just a metro-like idea?
-
h2ibot
PaulWise edited Bugzilla (+785, add bugzilla-url-list by JAA strategy):
wiki.archiveteam.org/?diff=50756&oldid=50599
-
thuban
arkiver: i've checked periodically, but i still just get redirects to the shutdown notice. plcp might know more
-
fireonlive
that works
-
project10
does AB have a max size per fetched URL? I see the debian.org/releases job fetching netinst ISOs but no others, I assume size limit at play?
-
pabs
hmm, didn't mean to fetch those
-
thuban
archivebot jobs for katapult are all done; i will grab the meta files and extract srcset components when they get uploaded (probably tomorrow)
-
DogsRNice
what does ab do with 429 errors?
-
DogsRNice
im noticing on the empire minecraft job that its not getting imgur links and some of them arent in the wbm, the rest were grabbed by the imgur project already
-
pokechu22
It retries them twice and then dismisses them, but imgur will never succeed with AB - you'll need to download the meta-warc and send a list of them to the imgur project
-
DogsRNice
not really sure how to do that lol
-
DogsRNice
kinda sounds like something that could be automated (not that i know how to do that either)
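
A minimal sketch of the automation suggested above, assuming the meta-WARC has already been downloaded as a gzip file and that the dismissed imgur URLs appear in it as plain text. The regex, the function names, and the idea of scanning the file line by line are illustrative assumptions, not how ArchiveBot or the imgur project actually work; how the resulting list gets handed to the imgur project is left out because the log does not say.

```python
import gzip
import re
from typing import Iterable, List

# Assumed pattern: http(s) URLs on imgur.com or any of its subdomains.
IMGUR_URL = re.compile(
    r"https?://(?:[a-z0-9-]+\.)*imgur\.com/[^\s\"'<>)\]]+",
    re.IGNORECASE,
)


def extract_imgur_urls(lines: Iterable[str]) -> List[str]:
    """Return unique imgur URLs in first-seen order."""
    seen = set()
    urls = []
    for line in lines:
        for match in IMGUR_URL.finditer(line):
            # Drop punctuation that often trails a URL in log text.
            url = match.group(0).rstrip(".,;")
            if url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


def extract_from_gzip(path: str) -> List[str]:
    """Scan a gzip-compressed file (hypothetically a meta-WARC) as text."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return extract_imgur_urls(handle)
```

Calling `extract_from_gzip("job-meta.warc.gz")` (a hypothetical file name) would give a deduplicated list that could be written out one URL per line.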
-
h2ibot
JustAnotherArchivist edited The WARC Ecosystem (+304, /* Tools */ Add ArchiveBox):
wiki.archiveteam.org/?diff=50758&oldid=50711
-
anarcat
not sure if this is -ot but we might need a watch on bandcamp
teddydd.me/2023/backup-your-bandcamp-music
-
TheTechRobo
Wonder if archivebox could use wget-AT
-
icedice
<anarcat> not sure if this is -ot but we might need a watch on bandcamp
teddydd.me/2023/backup-your-bandcamp-music
-
icedice
Reminds me of Amazon Prime Video's bs
-
fireonlive
TheTechRobo: if it did i would be so happy
-
qq44|m
how can I mirror a site with wget-lua and include all page requisites?
-
qq44|m
--recursive, --mirror, and --page-requisites don't seem to work
-
imer
qq44|m: are you using a lua script? (I dont know the solution, I assume wget-lua behaved like regular wget, but with more scripting)
-
qq44|m
imer: im not using a script, just plain old wget-lua
-
qq44|m
grab-site seems to work properly with page requisites, but wget doesn't seem to pull them with recursive downloads
-
h2ibot
FireonLive edited Current Projects (-163, Remove NG -- superseded by URLs):
wiki.archiveteam.org/?diff=50759&oldid=50685
-
pokechu22
Looking at
web.archive.org/web/20230000000000*/https://e.orange.fr/error404.html some captures show in blue and some show in orange - I'm pretty sure
e.orange.fr/error404.html always returns 404, so is there a reason for them being blue? (that page has a ton of captures because any personal page that had a 404 or didn't exist would *redirect* there, and archivebot doesn't deduplicate redirect targets)
-
qq44|m
pokechu22: perhaps the server didn't return 404 error code in the headers, and instead returned 200 but said 404 on the page?
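
A rough sketch of the distinction being suggested here: a "hard" 404 carries the status code in the HTTP response, while a "soft" 404 answers 200 and only says "not found" in the page body. The phrase list and the function name are assumptions made for illustration; real soft-404 detection is fuzzier than a substring match.

```python
# Assumed marker phrases; a real page could word its error differently.
SOFT_404_PHRASES = ("404", "not found", "page introuvable")


def classify_response(status: int, body: str) -> str:
    """Label a response as a hard 404, a soft 404, or something else."""
    if status == 404:
        return "hard-404"
    lowered = body.lower()
    if status == 200 and any(phrase in lowered for phrase in SOFT_404_PHRASES):
        return "soft-404"
    return "other"
```

The status and body would come from whatever HTTP client or WARC reader is in use; the function itself makes no network requests.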
-
pokechu22
Picking a snapshot from april 2 that shows as blue (
web.archive.org/web/20230402090744/https://e.orange.fr/error404.html) still shows a 404 in my developer tools when loading the page
-
JAA
I've found the colours in the calendar to be wildly inaccurate all the time.
-
fireonlive
calendars, the bane of our existence
-
JAA
In other news, my FuzzyMemories.TV grab-site crawl finished.
-
JAA
Three /watch/ URLs failed, otherwise it looks fine.
-
imer
nice
-
JAA
4232 video files from 4761 attempted IDs
-
fireonlive
awesome :)
-
JAA
Total WARC size is 107 GiB. It'll be on its slow way to IA soon.
-
h2ibot
JustAnotherArchivist created The Museum of Classic Chicago Television (+611, Created page with "{{Infobox project | URL =…):
wiki.archiveteam.org/?title=The%20M…of%20Classic%20Chicago%20Television
-
h2ibot
JustAnotherArchivist created FuzzyMemories.TV (+54, Redirected page to [[The Museum of Classic…):
wiki.archiveteam.org/?title=FuzzyMemories.TV
-
h2ibot
JustAnotherArchivist created FuzzyMemoriesTV (+54, Redirected page to [[The Museum of Classic…):
wiki.archiveteam.org/?title=FuzzyMemoriesTV
-
h2ibot
FireonLive edited Reddit (+129, wording fix?):
wiki.archiveteam.org/?diff=50763&oldid=50722
-
JAA
No
-
flashfire42
Alright then. I will have to use the viewer to work out what to put in and what not then. All good. Still good sources of things to throw in.
-
h2ibot
JustAnotherArchivist edited The Museum of Classic Chicago Television (+597, Add known archives):
wiki.archiveteam.org/?diff=50764&oldid=50760
-
fireonlive
"To be clear, the end of servicing applies to drivers provided via Windows Update. Manufacturers will, according to Microsoft, "need to provide customers with an alternative means to download and install those printer drivers." Legacy v3 and v4 Windows printer drivers are facing the end of servicing ax."
-
nicolas17
I have never seen 3rd party drivers updating via Windows Update
-
fireonlive
looks like this 'Mopria' has existed for a while and more newer printers are using it?