-
Ryz
Hmm, I'm pondering on doing more proactive archiving on anything Blogspot related, I stumbled upon
fieldsofhether.blogspot.com/2016/04…pend-way-too-much-time-digging.html during internet searching, but it doesn't exist anymore,
-
Ryz
web.archive.org/web/20181226201600/…pend-way-too-much-time-digging.html exists - I was initially wondering if the website got cybersquatted or something, but nope, the website still holds up more or less
-
Ryz
The big issue that caught my eye is at the bottom of the page, and of the website in general:
-
Ryz
"404 Errors & Missing Links"
-
Ryz
Followed by,
-
Ryz
"Please note that some of my posts are currently being removed by blogger. They are then reviewed, and put back - so the post may be there one day, missing the next, and back a week later. I have no control over this unfortunately. For all the latest free svgs, be sure to check out the facebook group for this page - and if you have any suggestions
-
Ryz
on other website hosts, I am currently looking at options."
-
Ryz
...So yeah, I'm not sure if those removals were automated or what, and the person had to try and retrieve it
-
fireonlive
:|
-
fireonlive
i could see google throwing some ill advised ML at 'spam detection'
-
flashfire42
Ryz so can Blogspot be hit as hard as tumblr?
-
Ryz
flashfire42, eeeeeh, I'm not too sure, I think the individual posts are fine, but the pagination navigation will suffer if hit too harshly S:
-
Ryz
Also, there's some funkiness in archiving Blogger profiles, as they're quite strict about that, giving 429s if you check too many of them too often
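A minimal sketch of how a client might back off when it gets those 429s. Everything here is an assumption for illustration: `fetch_with_backoff` and the injected `fetch` callable are hypothetical names, not part of any ArchiveTeam tool, and the real rate limits on Blogger profiles are not known from this log.

```python
import time

def fetch_with_backoff(fetch, url, max_tries=5, base_delay=1.0, sleep=time.sleep):
    """Retry a request while the server answers 429 (Too Many Requests).

    `fetch` is a hypothetical callable returning a tuple
    (status_code, retry_after_seconds_or_None, body).
    """
    delay = base_delay
    for attempt in range(max_tries):
        status, retry_after, body = fetch(url)
        if status != 429:
            return status, body
        if attempt == max_tries - 1:
            break  # out of attempts, give up
        # Prefer the server's Retry-After hint; otherwise use our own delay.
        sleep(retry_after if retry_after is not None else delay)
        delay *= 2  # exponential backoff for the next round
    return 429, None
```

Passing `sleep` and `fetch` in as parameters keeps the sketch testable without touching the network.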
-
fireonlive
krebsonsecurity.com used to be blogspot... but i think that changed during the big ddosing, can't find a source article anymore
-
Ryz
...And other stuff that I found out
-
pabs
Barto: ISTR you often do company acquisitions archiving
-
pabs
superkuh, JAA: re mastodon, either append /embed to the URL to get plain HTML of the single post, or use zygolophodon in a terminal to get the thread
github.com/jwilk/zygolophodon
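A small sketch of the /embed trick described above, assuming the post URL has the usual form of instance plus post path. The function name `mastodon_embed_url` and the example host are made up for illustration; only the "append /embed" step comes from the tip itself.

```python
from urllib.parse import urlsplit, urlunsplit

def mastodon_embed_url(post_url: str) -> str:
    """Return the post URL with /embed appended to its path.

    The query string and fragment are dropped; that is a choice of this
    sketch, not something the tip specifies.
    """
    parts = urlsplit(post_url)
    path = parts.path.rstrip("/")
    if not path.endswith("/embed"):
        path += "/embed"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))

# Hypothetical post URL on a made-up instance.
print(mastodon_embed_url("https://example.social/@user/123456"))
```

As noted in the tip, this only gives the single post; the rest of the thread still needs something like zygolophodon.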
-
pabs
re attachment content disposition, there is a browser extension that lets you override any content type/disposition for any request and set your own
-
JAA
fireonlive: If true, it must've been much longer ago I think. On the big DDoS a few years ago (the one that was a record at the time), it was Akamai dropping him. But I believe he was already using a self-hosted Wordpress blog for years prior to that.
-
JAA
pabs: Good to know re /embed, shame that it's only the single post though.
-
JAA
And yeah, there are several extensions like that.
-
fireonlive
ahh
-
pabs
yeah, I usually do /embed to determine if I want to read the thread, then go zygolophodon in a terminal to read it
-
fireonlive
kk
-
fireonlive
my krebs timelines are super fuzzy
-
JAA
zygolophodon looks interesting, hadn't seen it before, thanks!
-
h2ibot
Flashfire42 edited URLTeam/Warrior (+54, /* Warrior projects */):
wiki.archiveteam.org/?diff=50265&oldid=48605
-
h2ibot
Flashfire42 edited URLTeam/Warrior (+67, /* Warrior projects */):
wiki.archiveteam.org/?diff=50266&oldid=50265
-
pabs
JAA: IIRC it uses the same APIs used by the JS frontend, because they work without being logged in
-
superkuh
pabs, thanks for the /embed tip. I'll try that. zygolophodon is okay, but it's quite a hassle to leave the browser.
-
h2ibot
Entartet edited List of websites excluded from the Wayback Machine (+24, Added zainamro.com.):
wiki.archiveteam.org/?diff=50267&oldid=50192
-
pabs
is there a Vimeo project? manu|m said on #archivebot this person died
vimeo.com/channels/suemarxfilms
-
ShakespeareFan00
Hi.
-
ShakespeareFan00
Has there been any progress on this :-
wiki.archiveteam.org/index.php/Usenet ?
-
ShakespeareFan00
Google Groups is for most groups effectively unusable
-
ShakespeareFan00
And Google certainly has removed entire groups like uk.railway and comp.lang.oberon rather than actually clean out spam.
-
nstrom|m
that's so sad
-
ShakespeareFan00
I'm especially annoyed in respect of uk.railway, because I was flagging spam in that group in the hope that someone reasonable would actually tackle the issue.
-
ShakespeareFan00
(But this is the wrong forum for calling out Google's behaviour.)
-
ShakespeareFan00
Running an NNTP server to collate current postings to various NNTP groups is technically feasible.
-
ShakespeareFan00
That way 'new' content to those groups will not be lost.
-
ShakespeareFan00
However, there is the issue of archives Google and other servers clearly hold, but which can't be accessed.
-
ShakespeareFan00
BTW Is there an effort to 'archive' sites like Discord?
-
ShakespeareFan00
(Aside: I can retrieve material I posted to Discord, but I can't retrieve the replies from others.)
-
nstrom|m
discord is very archival unfriendly, they seem to be trying really hard to keep their stuff a walled garden
-
ShakespeareFan00
Well, some of it's almost certainly GDPR...
-
ShakespeareFan00
Or equivalent...
-
ShakespeareFan00
One other suggestion I had for future archival work is archiving the responses from Bing/OpenAI etc...
-
ShakespeareFan00
(Yes those responses are effectively random fictions right now, but it helps to have evidence of the mistakes they are making)
-
ShakespeareFan00
Not sure how you'd archive them though,
-
ShakespeareFan00
BTW I can't currently suggest things like Discord archival on the Wiki, as I had a disagreement with a wiki mod several years ago.
-
albertlarsan68
FWIW, I am currently building an Edge extension to collect all interesting links (Imgur, Mediafire). Let me know if you want more URLs or IDs.
-
albertlarsan68
I plan on submitting its findings, are there any problems with that?
-
rewby
We have IRC bots you can use to submit imgur and mediafire links I think
-
fireonlive
we do
-
fireonlive
and submit away they take anything
-
fireonlive
only limit i’m aware of is #down-the-tube, where there’s criteria (explained in the wiki, exceptions granted on a case-by-case basis)
-
fireonlive
(dtt is for youtube)
-
Barto
pabs: ah, that is true
-
Barto
gotta do it then :)
-
albertlarsan68
Great! Once I have around 100 matches, I will dump them. I use the (very wide) regexes in the wiki pages, so there may be many false positives. Hope it works, and that I will be useful!
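A rough sketch of what collecting links with a deliberately wide regex could look like. The pattern below is an assumption written for illustration; it is not the regex from the wiki pages, and `extract_candidates` is a hypothetical name. It keeps anything that looks like an Imgur or MediaFire URL and leaves stricter filtering to whatever consumes the list.

```python
import re

# Hypothetical wide pattern: optional scheme, any subdomains, then one of
# the two hosts followed by a path. NOT the pattern from the wiki.
WIDE_PATTERN = re.compile(
    r"(?<![\w.-])(?:https?://)?(?:[\w-]+\.)*(?:imgur\.com|mediafire\.com)/[^\s\"'<>]+",
    re.IGNORECASE,
)

def extract_candidates(text):
    """Return candidate links in order of first appearance, without duplicates."""
    seen = set()
    found = []
    for match in WIDE_PATTERN.finditer(text):
        # Trim punctuation that usually belongs to the sentence, not the URL.
        url = match.group(0).rstrip(".,;:!?)")
        if url not in seen:
            seen.add(url)
            found.append(url)
    return found

sample = (
    "see https://imgur.com/a/AbCdE, and i.imgur.com/xyz123.png plus "
    "https://imgur.com/a/AbCdE again. Also www.mediafire.com/file/abc123/doc.zip/file"
)
for link in extract_candidates(sample):
    print(link)
```

A wide pattern like this will still let through paths that are not real items, which is why the false positives get filtered later rather than at collection time.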
-
fireonlive
:) no worries about fps, bot can filter those out
-
VickoSaviour
did we collect all of the Wysp data?
-
fireonlive
lol