00:57:37 "Replit permanently moves to paid hosting after 7 years of free service" https://noreplit.com https://news.ycombinator.com/item?id=37950534
01:00:24
01:00:40 the discord server mentioned on noreplit seems to be gone: https://replit.com/discord
01:00:52 but i have like; this weird dejavu that we archived their discord
01:01:57 grep of irc logs says no at least
01:06:34 https://irc.project10.net/uploads/40ed40d9ca8e16ba/no.jpg
01:07:22 :3
01:17:45 wtf is happening
01:17:54 was there a netsplit?
01:18:07 looks like a bunch of people just all quit at once
01:18:24 God damn netsplits
01:18:26 Network weather
01:18:38 some citing a 'ping timeout' or 'no ping reply' (but not the typical ircd ping timeout?)
01:18:47 my ping time went to infinite, so I reconnected
01:18:50 Yeah, my client got a ping timeout.
01:18:59 ping timeout here as well, was on chaostal
01:19:03 Some have proper quits though, so... ¯\_(ツ)_/¯
01:19:09 pabs: welcome back bonedaddy :)
01:19:41 My chaostal client is fine.
01:19:51 hmm weird
01:20:27 looks like firebot and eggdrop are both on ing.. as am i
01:20:38 I got a couple of EHOSTUNREACH while reconnecting, but I'm not sure if that's related
01:21:55 ah there's the proper ircd ping timeout :)
01:43:01 https://transfer.archivete.am/2nkrC/audrooku-urls-1.txt
01:43:01 could someone kindly run this through archivebot for me? :-)
01:44:27 Awful filename for an AB job, but we could just grab all of https://developers.soundcloud.com/ again. It's been a few years since the last run.
01:46:08 does https://www.thoughtworks.com/insights/blog/bff-soundcloud link from developers?
01:50:20 If not, it was archived at least once by AB and a couple dozen times in total.
01:53:45 ah oki
01:53:50 =]
02:36:51 > Awful filename for an AB job
02:36:51 What would be the preferred naming schema?
03:08:10 audrooku|m: generally it's useful to mention the target site (in this case developers.soundcloud.com) so that searching for that on https://archive.fart.website/archivebot/viewer/ also brings up the list
03:08:32 e.g. I've done several like this: https://archive.fart.website/archivebot/viewer/?q=libraries.minecraft.net
03:08:59 Gotcha, wasn't aware that viewer existed
10:21:58 i know there's a github project, but I don't see the channel. You may archive https://github.com/immibis/ before I ask github to delete it under GDPR.
10:23:25 #codearchiver i think
10:23:40 or #gitgud immibis not sure
14:08:32 https://www.theverge.com/2023/10/19/23924549/jon-stewart-apple-ai-china-cancel
14:08:50 Might be good to grab socials and the podcast for the show
16:12:37 Nano412510 edited Alive... OR ARE THEY (+0, /* Endangered */): https://wiki.archiveteam.org/?diff=51014&oldid=51012
16:32:23 Hello. May I ask, how can I create a WARC from a list of file links with wget on Windows? (private WARC, not for uploading to the IA) What options should I use?
16:39:26 ScenarioPlanet: Firstly, upstream wget writes WARCs that most tools can't read correctly. I'd recommend using either wget-at or an older version of upstream wget (1.19.x or older IIRC). As for options, it depends. --input-file and --warc-file are the obvious ones. If these links are HTML pages, you probably want to use --page-requisites as well. Beyond that, you might need a different user agent,
16:39:32 cookies, etc., which strongly depends on what you're retrieving.
16:40:45 The links are (Discord) files. Also, does wget-at work on Windows?
16:41:05 Maybe, depending on your pain tolerance.
16:41:30 Understandable
16:42:48 Also, I remember there was an option to prevent it from downloading junk (-o NUL?), does wget-at need this too?
16:44:04 Right, --delete-after
16:46:19 You might also want to consider using grab-site instead.
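(Editor's sketch, not part of the log.) The options named above (--input-file, --warc-file, --page-requisites, --delete-after) can be assembled into a single wget invocation roughly as follows. The function name, the file names urls.txt and discord-files, and the defaults are assumptions made for illustration; nobody in the channel posted this. The script only builds and prints the command, it does not run wget.

```python
import shlex


def build_wget_warc_command(url_list_path, warc_name,
                            delete_after=True, page_requisites=False,
                            user_agent=None):
    """Build a wget argument list that writes a WARC from a list of URLs.

    Hypothetical helper: only the wget flags come from the discussion,
    everything else here is an illustrative assumption.
    """
    cmd = [
        "wget",
        f"--input-file={url_list_path}",  # read the URLs from a text file
        f"--warc-file={warc_name}",       # WARC output name (wget adds the extension)
    ]
    if delete_after:
        # Remove each downloaded file once fetched, so only the WARC stays on disk.
        cmd.append("--delete-after")
    if page_requisites:
        # Only useful when the links are HTML pages rather than plain files.
        cmd.append("--page-requisites")
    if user_agent:
        cmd.append(f"--user-agent={user_agent}")
    return cmd


if __name__ == "__main__":
    # Print a copy-pasteable command line instead of executing it.
    print(shlex.join(build_wget_warc_command("urls.txt", "discord-files")))
```

Keeping the command as a list rather than a string avoids quoting problems if it is later passed to subprocess.run.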
16:48:28 The problem here: I can't put both the .warc and the junk on my hdd, I don't have enough space.
17:09:21 Also, is the reason for the unreadable WARCs known?
17:10:04 Yes, wget is the only software that inserts angle brackets around the WARC-Target-URI value.
17:10:30 It's been reported to them years ago, and the cleanest fix (bump to WARC/1.1) was proposed a long time ago, too.
17:11:06 Technically, the WARC/1.0 spec requires those angle brackets in its grammar, but the examples in the spec don't have them, and no other implementation is known to use them.
17:11:26 WARC/1.1 was modified to no longer require them because that's what everyone except wget 1.20+ was doing anyway.
17:11:49 Also: https://wiki.archiveteam.org/index.php/The_WARC_Ecosystem#Tools
17:40:12 Another problem with wget 1.19.1: it doesn't write binary files into .warc properly. Downloaded files are OK, but in .warc they seem to be stripped (like, only "‰PNG" for pngs).
17:40:35 That doesn't sound right.
17:40:59 wget -U "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36 OPR/103.0.0.0" --warc-cdx --warc-file="warcname" --wait 0.2 --waitretry 5 --timeout 60 --tries 3 -i *list* --no-cookies --restrict-file-names=windows
17:42:24 Can you run that on https://transfer.archivete.am/inline/bG4mu/aatt.png and upload the resulting WARC to https://transfer.archivete.am/?
17:43:25 I suspect it's an issue in whatever software you're using to look at the WARC, not wget or the WARC itself.
17:43:51 But I only have a much older wget version handy right now.
17:44:00 (Or newer)
17:45:40 https://transfer.archivete.am/d89Tw/test%20aatt.warc.gz
17:45:50 I mean, it's literally 3kb
17:46:15 Weird
17:46:27 And it has the correct 'Content-Length: 15896' header, too...
17:46:40 `mingw32` ?
17:46:54 software: Wget/1.19.1 (mingw32)
17:46:55 what is that
17:47:28 MinGW aka GCC & Co. for Windows
17:49:16 ScenarioPlanet: that indeed does not look good
17:49:23 seems like an odd problem for 1.19.1
17:49:26 Where did you get wget from?
17:49:47 I suspect it is something with this build or the OS/context
17:52:47 arkiver: https://eternallybored.org/misc/wget/
17:53:24 JAA: ^
17:54:13 Same for 1.19.2
17:54:30 And that was for 1.17 too
17:54:52 ScenarioPlanet: what if you don't use --restrict-file-names=windows ?
17:55:05 Same
17:55:56 https://eternallybored.org/misc/wget/src/wget-1.21-2gb-win.patch is a patch i see
17:56:23 and https://eternallybored.org/misc/wget/src/wget-openssl-init.patch
17:57:01 Could you try with the most recent version, 1.21.4?
17:58:48 Same again
17:58:55 Hmm
17:59:19 Are there any other builds for Winx64?
18:00:17 i would advise to start experimenting with Debian (linux) if you're serious about all this
18:00:21 Not sure. Those are the ones usually recommended, although they're unofficial.
18:00:33 GNU doesn't officially support Windows for what should be fairly obvious reasons.
18:01:05 I'll mention this in their IRC channel though just so they're aware of it.
18:01:12 (That's #wget on Libera.)
18:04:57 ScenarioPlanet: Which Windows version?
18:05:16 10 22h2
18:06:10 What about using Heritrix3 for that? Is that a good idea?
18:07:57 If you're on Windows 10 why not just use WSL
18:22:24 ^ was going to mention WSL as well, way less painful
19:01:30 Thank you appledash and imer, WSL is the tool I was looking for
19:02:07 And JAA & arkiver for trying to help me with that wget build, thank you too
19:06:13 JustAnotherArchivist edited The WARC Ecosystem (+265, /* Tools */ Add Windows wget build bugs): https://wiki.archiveteam.org/?diff=51015&oldid=50758
19:13:42 May I also ask about building wget-at? automake gives me some "build-aux/git-version-gen: not found" errors and exits with "automake: error: cannot open < lib/gnulib.mk: No such file or directory"
19:36:46 * JAA hasn't actually built wget-at before.
19:36:58 I just used the container image when I needed it.
19:37:50 I'd start by looking at the Dockerfile.
20:02:31 https://odysee.com/@lbry:3f/theendoflbryinc:d
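(Editor's sketch, not part of the log.) The angle-bracket issue described at 17:10 can be spotted by scanning a WARC for WARC-Target-URI header lines whose value is wrapped in < and >. This is a rough illustration only: the function names are made up here, and the scan is line-based rather than a real WARC parser, so a payload that happens to contain a matching line would be reported as well.

```python
import gzip


def find_bracketed_target_uris(lines):
    """Return WARC-Target-URI values that are wrapped in angle brackets.

    `lines` is any iterable of bytes lines, e.g. an open binary file.
    Hypothetical helper; a naive line scan, not a full WARC parser.
    """
    bracketed = []
    for raw in lines:
        if raw.lower().startswith(b"warc-target-uri:"):
            # Split only on the first colon so the URI's own colons survive.
            value = raw.split(b":", 1)[1].strip()
            if value.startswith(b"<") and value.endswith(b">"):
                bracketed.append(value.decode("utf-8", "replace"))
    return bracketed


def scan_warc(path):
    """Scan a .warc or .warc.gz file; gzip.open reads multi-member gzip files."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as fh:
        return find_bracketed_target_uris(fh)


if __name__ == "__main__":
    sample = [
        b"WARC/1.0\r\n",
        b"WARC-Type: response\r\n",
        b"WARC-Target-URI: <https://example.org/a.png>\r\n",
    ]
    print(find_bracketed_target_uris(sample))
```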