-
datechnoman
-
h2ibot
datechnoman: Deduplicating and queuing 200925 items. (for 'transfer.archivete.am/WMRvV/discord_urls.txt')
-
h2ibot
datechnoman: Deduplicated and queued 200925 items. (for 'transfer.archivete.am/WMRvV/discord_urls.txt')
-
arkiver
AK: checking
-
arkiver
interesting one on the whitespaces
-
arkiver
i would think they're probably in the URL as given in the HTML, and normally browsers trim them off, but Wget-AT may not
-
nicolas17
arkiver: I have just subjected myself to the hell that is modern web specifications and yes it seems browsers trim leading and trailing whitespace from href attributes
-
JAA
I wonder what wget does with '<a href="\nhttps://example.org/">'.
-
JAA
Or other whitespace before the protocol.
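
A minimal sketch of the trimming behaviour discussed here, assuming the preprocessing steps of the WHATWG URL Standard as I understand them: strip leading and trailing C0 control characters and spaces, then remove tabs and newlines anywhere in the string. The helper name `clean_href` is hypothetical, and nothing here reflects what Wget-AT actually does.

```python
# C0 control characters (U+0000 to U+001F) plus space (U+0020).
C0_CONTROL_OR_SPACE = "".join(chr(c) for c in range(0x21))


def clean_href(raw: str) -> str:
    """Hypothetical helper: approximate how a browser preprocesses an href value."""
    # Trim leading and trailing C0 control characters and spaces.
    trimmed = raw.strip(C0_CONTROL_OR_SPACE)
    # Remove tabs and newlines wherever they occur in the remaining string.
    return trimmed.translate({ord("\t"): None, ord("\n"): None, ord("\r"): None})


if __name__ == "__main__":
    print(repr(clean_href("\nhttps://example.org/")))
    print(repr(clean_href(" foo ")))
```

Under these assumptions, both the leading newline case and the `" foo "` case reduce to the bare value before any further URL parsing happens.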
-
arkiver
nicolas17: yeah i think Wget-AT may not
-
arkiver
i'll queue URLs in #// with trailing whitespace both with and without that trailing whitespace
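
A rough sketch of that queuing idea, assuming a plain list of URL strings as input. The function name `queue_variants` is hypothetical and not part of any ArchiveTeam tooling; it only illustrates emitting the URL as found plus a copy without the trailing whitespace.

```python
def queue_variants(url: str) -> list[str]:
    """Hypothetical helper: return the URL as found, plus a copy
    with trailing whitespace removed when the two differ."""
    stripped = url.rstrip()
    if stripped == url:
        # No trailing whitespace, so there is only one variant to queue.
        return [url]
    return [url, stripped]


if __name__ == "__main__":
    for candidate in ["https://example.org/ ", "https://example.org/"]:
        print(queue_variants(candidate))
```

Queuing both forms avoids having to decide up front which one the server expects.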
-
arkiver
JAA: no idea...
-
JAA
-
arkiver
hah :)
-
nicolas17
eugh
-
arkiver
it could probably be handled better yeah
-
JAA
I don't think that's what the specs say, but I don't feel like wading through that right now.
-
arkiver
i'll create an issue and get it fixed soon
-
arkiver
but there is probably a reason they do that in Wget-AT
-
nicolas17
yeah, this would be a different part of the spec altogether
-
arkiver
so we might run into further problems if we start treating this differently
-
nicolas17
I looked at the part that says how to handle a click to <a href=" foo ">
-
JAA
-
nicolas17
once you already have " foo " parsed as the value of the attribute
-
nicolas17
urghh
-
nicolas17
I get ANGRY every time I look at whatwg specs
-
JAA
arkiver: My suspicion would be 'because that's how we've always been doing it' with a dash of 'the early internet was hell'.
-
nicolas17
this is not a spec, this is a browser implementation, written in the programming language known as "English"
-
arkiver
maybe yeah
-
arkiver
do we know how usual it is for this to occur today?
-
JAA
There are so many unspecified things in earlier HTML it isn't even funny.
-
arkiver
the leading \n
-
JAA
I feel like the relevant question should probably be: what do browsers do?
-
nicolas17
nowadays, browsers do what the spec says
-
JAA
Yeah
-
JAA
Except when they don't. :-P
-
nicolas17
because if major browsers agree to handle broken code in the same way, they change the spec to match as well
-
arkiver
browsers probably do a lot more complex parsing than Wget-AT does
-
arkiver
which means we could run into problems with Wget-AT
-
JAA
Browsers implement the entire HTML standard, yes.
-
arkiver
but will make an issue anyway and see
-
nicolas17
add a unit test >.>
-
JAA
-
nicolas17
oh integration tests are written in perl
-
nicolas17
pain
-
fireonlive
whatwg loves you :3
-
fireonlive
w3c was way too slow
-
arkiver
AK: i don't see namesuppressed.com in recent WARCs
-
fireonlive
those mentioned IPs don't do pipelines do they
-
AK
Interesting, I wonder what it was then