00:37:37 Hmm, I'm pondering on doing more proactive archiving on anything Blogspot related, I stumbled upon https://fieldsofhether.blogspot.com/2016/04/we-spend-way-too-much-time-digging.html during internet searching, but it doesn't exist anymore,
00:37:58 https://web.archive.org/web/20181226201600/http://fieldsofhether.blogspot.com/2016/04/we-spend-way-too-much-time-digging.html exists - I was initially wondering if the website got cybersquatted or something, but nope, the website still holds up more or less
00:38:15 The big trouble that made me open my eyes a bit wider is at the bottom of the page, and on the website in general:
00:38:26 "404 Errors & Missing Links"
00:38:29 Followed by,
00:38:44 "Please note that some of my posts are currently being removed by blogger. They are then reviewed, and put back - so the post may be there one day, missing the next, and back a week later. I have no control over this unfortunately. For all the latest free svgs, be sure to check out the facebook group for this page - and if you have any suggestions
00:38:44 on other website hosts, I am currently looking at options."
00:40:44 ...So yeah, I'm not sure if those removals were automated or what, and the person had to try and retrieve it
00:40:48 :|
00:41:15 i could see google throwing some ill-advised ML at 'spam detection'
00:41:58 Ryz so can Blogspot be hit as hard as tumblr?
00:42:59 flashfire42, eeeeeh, I'm not too sure, I think the individual posts are fine, but the pagination navigation will suffer if hit too harshly S:
00:43:44 Also, there's some funkiness on archiving Blogger profiles, as they're quite strict about that, giving 429s for checking too much too many times
00:43:48 https://krebsonsecurity.com/ used to be blogspot... but i think that changed during the big ddosing, can't find a source article anymore
00:43:50 ...And other stuff that I found out
00:45:16 Barto: ISTR you often do company acquisitions archiving
00:46:52 superkuh, JAA: re mastodon, either append /embed to the URL to get plain HTML of the single post, or use zygolophodon in a terminal to get the thread https://github.com/jwilk/zygolophodon
00:47:58 re attachment content disposition, there is a browser extension that lets you override any content type/disposition for any request and set your own
00:51:07 fireonlive: If true, it must've been much longer ago I think. On the big DDoS a few years ago (the one that was a record at the time), it was Akamai dropping him. But I believe he was already using a self-hosted WordPress blog for years prior to that.
00:51:26 pabs: Good to know re /embed, shame that it's only the single post though.
00:51:47 And yeah, there are several extensions like that.
00:52:08 ahh
00:52:10 yeah, I usually do /embed to determine if I want to read the thread, then go zygolophodon in a terminal to read it
00:52:13 kk
00:52:20 my krabs timelines are super fuzzy
00:52:41 zygolophodon looks interesting, hadn't seen it before, thanks!
01:01:28 Flashfire42 edited URLTeam/Warrior (+54, /* Warrior projects */): https://wiki.archiveteam.org/?diff=50265&oldid=48605
01:11:30 Flashfire42 edited URLTeam/Warrior (+67, /* Warrior projects */): https://wiki.archiveteam.org/?diff=50266&oldid=50265
01:40:14 JAA: IIRC it uses the same APIs used by the JS frontend, because they work without being logged in
02:12:57 pabs, thanks for the /embed tip. I'll try that. zygolophodon is okay but it's quite a hassle to leave the browser.
06:38:24 Entartet edited List of websites excluded from the Wayback Machine (+24, Added zainamro.com.): https://wiki.archiveteam.org/?diff=50267&oldid=50192
13:00:19 is there a Vimeo project? manu|m said on #archivebot this person died https://vimeo.com/channels/suemarxfilms
14:03:12 Hi.
14:03:40 Has there been any progress on this: https://wiki.archiveteam.org/index.php/Usenet ?
14:03:54 Google Groups is for most groups effectively unusable
14:04:31 And Google certainly has removed entire groups like uk.railway and comp.lang.oberon rather than actually clean out spam.
14:06:11 that's so sad
14:10:40 I'm especially annoyed in respect of uk.railway, because I was flagging spam in that group in the hope that someone reasonable would actually tackle the issue.
14:10:57 (But this is the wrong forum for calling out Google's behaviour.)
14:11:33 Running an NNTP server to collate current postings to various NNTP groups is technically feasible.
14:11:49 That way 'new' content to those groups will not be lost.
14:12:25 However, there is the issue of archives that Google and other servers clearly hold, but which can't be accessed.
14:13:21 BTW Is there an effort to 'archive' sites like Discord?
14:13:53 (Aside: I can retrieve material I posted to Discord, but I can't retrieve the replies from others.)
14:15:03 discord is very archival unfriendly, they seem to be trying really hard to keep their stuff a walled garden
14:18:50 Well, some of it's almost certainly GDPR...
14:19:02 Or equivalent...
14:20:16 One other suggestion I had for future archival work is archiving the responses from Bing/OpenAI etc...
14:20:52 (Yes those responses are effectively random fictions right now, but it helps to have evidence of the mistakes they are making)
14:21:03 Not sure how you'd archive them though,
14:22:06 BTW I can't currently suggest things like Discord archival on the Wiki, as I had a disagreement with a wiki mod several years ago.
16:25:43 FWIW, I am currently building an Edge extension to collect all interesting links (Imgur, Mediafire). Let me know if you want more URLs or IDs.
16:47:07 I plan on submitting its findings, are there any problems with that?
16:48:06 We have IRC bots you can use to submit imgur and mediafire links I think
16:58:38 we do
16:58:47 and submit away they take anything
16:59:14 only limits i'm aware of is #down-the-tube where there's criteria (explained in the wiki, exceptions granted on case by case basis)
17:00:45 (dtt is for youtube)
18:09:58 pabs: ah, that is true
18:10:17 gotta do it then :)
18:13:26 pabs: https://wiki.archiveteam.org/index.php/Vimeo seems no
18:36:34 Great! Once I have around 100 matches, I will dump them. I use the (very wide) regexes in the wiki pages, so there may be many false positives. Hope it works, and that I will be useful!
18:38:43 :) no worries about fps, bot can filter those out
18:44:49 did we collect all of the Wysp data?
18:54:33 lol