#archiveteam-bs

00:08

pabs

aww, one site I wanted to crawl has some links only in comments :(
00:16

pabs

project10: I definitely think AT needs a system for feeding links discovered by #archivebot, #wikibot, #// and other places to different projects. imgur links, for example, usually 429 in AB. the mailman/bugzilla/codearchiver/SWH/wikibot/etc projects could also use those auto-discovery services
00:19

project10

especially with (seemingly?) longer-running projects now like imgur, reddit, telegram. I assume things like imgur discovered through #// are sent via backfeed to the imgur queue in that case?
00:19

nicolas17

I *think* #// and other projects do feed into imgur, but not archivebot
00:20

nulldata

sh.itjust.works/post/4842435
00:20

project10

so what, it was based on the metro novels? or just a metro-like idea?
01:26

h2ibot

PaulWise edited Bugzilla (+785, add bugzilla-url-list by JAA strategy): wiki.archiveteam.org/?diff=50756&oldid=50599
02:42

thuban

arkiver: i've checked periodically, but i still just get redirects to the shutdown notice. plcp might know more
02:58

h2ibot

DigitalDragon edited NewsGrabber (+18): wiki.archiveteam.org/?diff=50757&oldid=50706
02:59

fireonlive

that works
03:02

project10

does AB have a max size per fetched URL? I see the debian.org/releases job fetching netinst ISOs but no others, I assume size limit at play?
03:04

pabs

hmm, didn't mean to fetch those
03:34

thuban

archivebot jobs for katapult are all done; i will grab the meta files and extract srcset components when they get uploaded (probably tomorrow)
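
A minimal sketch of the srcset extraction thuban mentions, assuming the `srcset` attribute values have already been pulled out of the HTML as plain strings. The function name `srcset_urls` is hypothetical, and the comma split is deliberately naive: a URL that itself contains a comma would be split incorrectly.

```python
def srcset_urls(srcset):
    """Return the URL part of each candidate in a srcset attribute value.

    Assumption: candidates are separated by commas and each one is
    "URL [descriptor]", e.g. "img-640.png 640w" or "img@2x.png 2x".
    """
    urls = []
    for candidate in srcset.split(","):
        parts = candidate.split()
        if parts:  # skip empty candidates such as a trailing comma
            urls.append(parts[0])
    return urls
```

The resulting URLs are still relative to the page they came from, so they would need resolving against that page's URL before being queued anywhere.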
03:50

DogsRNice

what does ab do with 429 errors?
03:53

DogsRNice

i'm noticing on the empire minecraft job that it's not getting imgur links, and some of them aren't in the wbm; the rest were grabbed by the imgur project already
04:07

pokechu22

It retries them twice and then dismisses them, but imgur will never succeed with AB - you'll need to download the meta-warc and send a list of them to the imgur project
04:11

DogsRNice

not really sure how to do that lol
04:11

DogsRNice

kinda sounds like something that could be automated (not that i know how to do that either)
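
A rough sketch of the list-building half of what pokechu22 describes, assuming the job's meta-WARC log (or any other URL dump) has already been decompressed to plain text. The pattern and the name `extract_imgur_urls` are hypothetical, and how the resulting list is handed to the imgur project is not covered here.

```python
import re

# Assumption: any host ending in imgur.com (imgur.com, i.imgur.com, ...) counts.
IMGUR_URL = re.compile(
    r"https?://(?:[a-z0-9-]+\.)*imgur\.com/[^\s\"'<>]+",
    re.IGNORECASE,
)

def extract_imgur_urls(text):
    """Return unique imgur URLs found in text, in order of first appearance."""
    seen = set()
    urls = []
    for match in IMGUR_URL.finditer(text):
        url = match.group(0)
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

Writing the result out one URL per line, for example with `"\n".join(extract_imgur_urls(log_text))`, gives a list that can be passed along by hand.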
14:17

h2ibot

JustAnotherArchivist edited The WARC Ecosystem (+304, /* Tools */ Add ArchiveBox): wiki.archiveteam.org/?diff=50758&oldid=50711
14:44

anarcat

not sure if this is -ot but we might need a watch on bandcamp teddydd.me/2023/backup-your-bandcamp-music
15:18

TheTechRobo

Wonder if archivebox could use wget-AT
15:42

icedice

<anarcat> not sure if this is -ot but we might need a watch on bandcamp teddydd.me/2023/backup-your-bandcamp-music
15:42

icedice

Reminds me of Amazon Prime Video's bs
15:43

fireonlive

TheTechRobo: if it did i would be so happy
16:17

qq44|m

how can I mirror a site with wget-lua and include all page requisites?
16:17

qq44|m

--recursive, --mirror, and --page-requisites don't seem to work
16:20

imer

qq44|m: are you using a lua script? (I dont know the solution, I assume wget-lua behaved like regular wget, but with more scripting)
16:21

qq44|m

imer: i'm not using a script, just plain old wget-lua
16:22

qq44|m

grab-site seems to work properly with page requisites, but wget doesn't seem to pull them with recursive downloads
18:00

h2ibot

FireonLive edited Current Projects (-163, Remove NG -- superseded by URLs): wiki.archiveteam.org/?diff=50759&oldid=50685
18:27

pokechu22

Looking at web.archive.org/web/20230000000000*/https://e.orange.fr/error404.html some captures show in blue and some show in orange - I'm pretty sure e.orange.fr/error404.html always returns 404, so is there a reason for them being blue? (that page has a ton of captures because any personal page that had a 404 or didn't exist would *redirect* there, and
18:27

pokechu22

archivebot doesn't deduplicate redirect targets)
18:28

qq44|m

pokechu22: perhaps the server didn't return 404 error code in the headers, and instead returned 200 but said 404 on the page?
18:30

pokechu22

Picking a snapshot from april 2 that shows as blue (web.archive.org/web/20230402090744/https://e.orange.fr/error404.html) still shows a 404 in my developer tools when loading the page
18:46

JAA

I've found the colours in the calendar to be wildly inaccurate all the time.
18:50

fireonlive

calendars, the bane of our existence
18:51

JAA

In other news, my FuzzyMemories.TV grab-site crawl finished.
18:52

JAA

Three /watch/ URLs failed, otherwise it looks fine.
18:52

imer

nice
18:53

JAA

4232 video files from 4761 attempted IDs
18:54

JAA

Random example of a video where the file is a 404: fuzzymemories.tv/watch/2276/kiddiel…nd-amusement-park-commercial-1-1990
18:55

fireonlive

awesome :)
18:55

JAA

Total WARC size is 107 GiB. It'll be on its slow way to IA soon.
19:56

h2ibot

JustAnotherArchivist created The Museum of Classic Chicago Television (+611, Created page with "{{Infobox project | URL =…): wiki.archiveteam.org/?title=The%20M…of%20Classic%20Chicago%20Television
19:57

h2ibot

JustAnotherArchivist created FuzzyMemories.TV (+54, Redirected page to [[The Museum of Classic…): wiki.archiveteam.org/?title=FuzzyMemories.TV
19:57

h2ibot

JustAnotherArchivist created FuzzyMemoriesTV (+54, Redirected page to [[The Museum of Classic…): wiki.archiveteam.org/?title=FuzzyMemoriesTV
21:12

h2ibot

FireonLive edited Reddit (+129, wording fix?): wiki.archiveteam.org/?diff=50763&oldid=50722
21:13

flashfire42

wiki.archiveteam.org/index.php/Arch…ot/2019_Australian_federal_election a question do pages like this still work as originally intended?
21:13

pokechu22

wiki.archiveteam.org/index.php/Special:Contributions/HadeanEon makes me think no
21:13

JAA

No
21:14

flashfire42

Alright then. I will have to use the viewer to work out what to put in and what not then. All good. Still good sources of things to throw in.
22:11

h2ibot

JustAnotherArchivist edited The Museum of Classic Chicago Television (+597, Add known archives): wiki.archiveteam.org/?diff=50764&oldid=50760
22:57

fireonlive

-+rss:#hackernews- Microsoft to kill off third-party printer drivers in Windows: theregister.com/2023/09/11/go_native_or_go_home news.ycombinator.com/item?id=37473628
22:57

fireonlive

"To be clear, the end of servicing applies to drivers provided via Windows Update. Manufacturers will, according to Microsoft, "need to provide customers with an alternative means to download and install those printer drivers." Legacy v3 and v4 Windows printer drivers are facing the end of servicing ax."
23:25

nicolas17

I have never seen 3rd party drivers updating via Windows Update
23:26

fireonlive

looks like this 'Mopria' has existed for a while and more newer printers are using it?
