-
fireonlive
"Replit permanently moves to paid hosting after 7 years of free service"
noreplit.com news.ycombinator.com/item?id=37950534
-
JAA
<surprised_pikachu.png>
-
fireonlive
the discord server mentioned on noreplit seems to be gone:
replit.com/discord
-
fireonlive
but i have like, this weird déjà vu that we archived their discord
-
fireonlive
grep of irc logs says no at least
-
project10
-
fireonlive
:3
-
nicolas17
wtf is happening
-
nicolas17
was there a netsplit?
-
fireonlive
looks like a bunch of people just all quit at once
-
flashfire42
God damn netsplits
-
JAA
Network weather
-
fireonlive
some citing a 'ping timeout' or 'no ping reply' (but not the typical ircd ping timeout?)
-
pabs
my ping time went to infinite, so I reconnected
-
TheTechRobo
Yeah, my client got a ping timeout.
-
project10
ping timeout here as well, was on chaostal
-
JAA
Some have proper quits though, so... ¯\_(ツ)_/¯
-
fireonlive
pabs: welcome back bonedaddy :)
-
JAA
My chaostal client is fine.
-
fireonlive
hmm weird
-
fireonlive
looks like firebot and eggdrop are both on ing.. as am i
-
TheTechRobo
I got a couple of EHOSTUNREACH while reconnecting, but I'm not sure if that's related
-
fireonlive
ah there's the proper ircd ping timeout :)
-
audrooku|m
-
audrooku|m
could someone kindly run this through archivebot for me? :-)
-
JAA
Awful filename for an AB job, but we could just grab all of developers.soundcloud.com again. It's been a few years since the last run.
-
fireonlive
-
JAA
If not, it was archived at least once by AB and a couple dozen times in total.
-
fireonlive
ah oki
-
fireonlive
=]
-
audrooku|m
> Awful filename for an AB job
-
audrooku|m
What would be the preferred naming schema?
-
pokechu22
audrooku|m: generally it's useful to mention the target site (in this case developers.soundcloud.com) so that searching for that on archive.fart.website/archivebot/viewer also brings up the list
-
pokechu22
-
audrooku|m
Gotcha, wasn't aware that viewer existed
-
immibis
i know there's a github project, but I don't see the channel. You may archive github.com/immibis before I ask github to delete it under GDPR.
-
katia
#codearchiver i think
-
katia
or #gitgud immibis not sure
-
nulldata
-
nulldata
Might be good to grab socials and the podcast for the show
-
h2ibot
Nano412510 edited Alive... OR ARE THEY (+0, /* Endangered */):
wiki.archiveteam.org/?diff=51014&oldid=51012
-
ScenarioPlanet
Hello. May I ask, how can I create a WARC from a list of file links with wget on Windows? (private WARC, not for uploading to the IA) What options should I use?
-
JAA
ScenarioPlanet: Firstly, upstream wget writes WARCs that most tools can't read correctly. I'd recommend using either wget-at or an older version of upstream wget (1.19.x or older IIRC). As for options, it depends. --input-file and --warc-file are the obvious ones. If these links are HTML pages, you probably want to use --page-requisites as well. Beyond that, you might need a different user agent,
-
JAA
cookies, etc., which strongly depends on what you're retrieving.
-
ScenarioPlanet
The links are (Discord) files. Also, does wget-at work on Windows?
-
JAA
Maybe, depending on your pain tolerance.
-
ScenarioPlanet
Understandable
-
ScenarioPlanet
Also, I remember there was an option to prevent it from downloading junk (-o NUL?), does wget-at need this too?
-
JAA
Right, --delete-after
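
A minimal sketch of how the options mentioned so far could be assembled into one command line. It only uses flags named in this discussion (--input-file, --warc-file, --page-requisites, --delete-after) plus wget's --user-agent; the function name, its parameters, and the file names are hypothetical, and it builds the argument list without running anything.

```python
from typing import List, Optional


def build_wget_warc_command(
    url_list: str,
    warc_name: str,
    html_pages: bool = False,
    user_agent: Optional[str] = None,
    keep_files: bool = False,
) -> List[str]:
    """Build a wget argument list that writes a WARC from a list of URLs.

    All names here are illustrative; only the wget flags come from the chat.
    """
    cmd = ["wget", f"--input-file={url_list}", f"--warc-file={warc_name}"]
    if html_pages:
        # Only useful when the links are HTML pages rather than plain files.
        cmd.append("--page-requisites")
    if user_agent:
        cmd.append(f"--user-agent={user_agent}")
    if not keep_files:
        # Remove each downloaded file after it has been written to the WARC.
        cmd.append("--delete-after")
    return cmd


if __name__ == "__main__":
    # "links.txt" and "discord-files" are placeholder names.
    print(" ".join(build_wget_warc_command("links.txt", "discord-files")))
```

The list can be handed to subprocess.run() as-is, which avoids shell quoting problems with long user agent strings.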
-
JAA
You might also want to consider using grab-site instead.
-
ScenarioPlanet
The problem here is that I can't put both the .warc and the junk on my hdd; I don't have enough space.
-
ScenarioPlanet
Also, is the reason for generating unreadable WARCs known?
-
JAA
Yes, wget is the only software that inserts angle brackets around the WARC-Target-URI value.
-
JAA
It's been reported to them years ago, and the cleanest fix (bump to WARC/1.1) was proposed a long time ago, too.
-
JAA
Technically, the WARC/1.0 spec requires those angle brackets in its grammar, but the examples in the spec don't have them, and no other implementation is known to use them.
-
JAA
WARC/1.1 was modified to no longer require them because that's what everyone except wget 1.20+ was doing anyway.
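
For illustration, a reader-side workaround could look like the sketch below: it takes one WARC header line and drops the angle brackets around the WARC-Target-URI value if they are present. This is an assumption about how a tolerant parser might handle such files, not how any particular tool does it; the function name is hypothetical.

```python
import re

# Matches "WARC-Target-URI: <value>" and captures the header name part
# and the value without its surrounding angle brackets.
_TARGET_URI = re.compile(rb"^(WARC-Target-URI:[ \t]*)<(.*)>[ \t]*$", re.IGNORECASE)


def normalize_target_uri(header_line: bytes) -> bytes:
    """Return the header line (without line ending) with brackets removed.

    Lines that are not a bracketed WARC-Target-URI header are returned
    unchanged apart from the stripped line ending.
    """
    line = header_line.rstrip(b"\r\n")
    match = _TARGET_URI.match(line)
    if match is None:
        return line
    return match.group(1) + match.group(2)


if __name__ == "__main__":
    print(normalize_target_uri(b"WARC-Target-URI: <https://example.org/a.png>\r\n"))
```

This only normalizes the value while reading. Rewriting the header inside an existing file would shift record offsets, so any CDX index written alongside it would no longer match.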
-
JAA
-
ScenarioPlanet
Another problem with wget 1.19.1: it doesn't write binary files into .warc properly. Downloaded files are OK, but in .warc they seem to be stripped (like, only "‰PNG" for pngs).
-
JAA
That doesn't sound right.
-
ScenarioPlanet
wget -U "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36 OPR/103.0.0.0" --warc-cdx --warc-file="warcname" --wait 0.2 --waitretry 5 --timeout 60 --tries 3 -i *list* --no-cookies --restrict-file-names=windows
-
JAA
-
JAA
I suspect it's an issue in whatever software you're using to look at the WARC, not wget or the WARC itself.
-
JAA
But I only have a much older wget version handy right now.
-
JAA
(Or newer)
-
ScenarioPlanet
-
ScenarioPlanet
I mean, it's literally 3kb
-
JAA
Weird
-
JAA
And it has the correct 'Content-Length: 15896' header, too...
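
One way to confirm this kind of mismatch without relying on a WARC viewer is to compare the HTTP Content-Length header stored in a response record with the number of payload bytes that actually follow it. The sketch below assumes the record block (the raw HTTP response, headers plus body) has already been extracted as bytes and that no transfer or content encoding is involved; the function name is hypothetical.

```python
from typing import Optional, Tuple


def check_http_payload(block: bytes) -> Tuple[Optional[int], int]:
    """Return (declared Content-Length, actual payload size in bytes).

    `block` is the HTTP response as stored in a WARC response record:
    status line, headers, a blank line, then the payload.
    """
    head, _, body = block.partition(b"\r\n\r\n")
    declared: Optional[int] = None
    # Skip the status line, then look for the Content-Length header.
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            declared = int(value.strip())
    return declared, len(body)


if __name__ == "__main__":
    # Mimics the symptom described above: a full-size header, a stub payload.
    sample = (
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Type: image/png\r\n"
        b"Content-Length: 15896\r\n"
        b"\r\n"
        b"\x89PNG"
    )
    declared, actual = check_http_payload(sample)
    print(f"declared={declared} actual={actual} truncated={declared != actual}")
```

If the two numbers differ for a plain binary download, the payload was cut short when the record was written, which points at the writer rather than at the tool used to view the WARC.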
-
arkiver
`mingw32` ?
-
arkiver
software: Wget/1.19.1 (mingw32)
-
arkiver
what is that
-
JAA
MinGW aka GCC & Co. for Windows
-
arkiver
ScenarioPlanet: that indeed does not look good
-
arkiver
seems like an odd problem for 1.19.1
-
JAA
Where did you get wget from?
-
arkiver
I suspect it is something with this build or the OS/context
-
ScenarioPlanet
-
arkiver
JAA: ^
-
ScenarioPlanet
Same for 1.19.2
-
ScenarioPlanet
And that was for 1.17 too
-
arkiver
ScenarioPlanet: what if you don't use --restrict-file-names=windows ?
-
ScenarioPlanet
Same
-
arkiver
-
JAA
Could you try with the most recent version, 1.21.4?
-
ScenarioPlanet
Same again
-
JAA
Hmm
-
ScenarioPlanet
Are there any other builds for Winx64?
-
arkiver
i would advise to start experimenting with Debian (linux) if you're serious about all this
-
JAA
Not sure. Those are the ones usually recommended, although they're unofficial.
-
JAA
GNU doesn't officially support Windows for what should be fairly obvious reasons.
-
JAA
I'll mention this in their IRC channel though just so they're aware of it.
-
JAA
(That's #wget on Libera.)
-
JAA
ScenarioPlanet: Which Windows version?
-
ScenarioPlanet
10 22h2
-
ScenarioPlanet
What about using Heritrix3 for that? Is that a good idea?
-
appledash
If you're on Windows 10 why not just use WSL
-
imer
^ was going to mention WSL as well, way less painful
-
ScenarioPlanet
Thank you appledash and imer, WSL is the tool I was looking for
-
ScenarioPlanet
And JAA & arkiver for trying to help me with that wget build, thank you too
-
h2ibot
JustAnotherArchivist edited The WARC Ecosystem (+265, /* Tools */ Add Windows wget build bugs):
wiki.archiveteam.org/?diff=51015&oldid=50758
-
ScenarioPlanet
May I also ask about building wget-at? automake gives me some "build-aux/git-version-gen: not found" errors and exits with "automake: error: cannot open < lib/gnulib.mk: No such file or directory"
-
» JAA hasn't actually built wget-at before.
-
JAA
I just used the container image when I needed it.
-
JAA
I'd start by looking at the Dockerfile.
-
nulldata