00:26:55 there's no bugmenot login for the forums?
02:00:55 https://www.merklemap.com/
02:07:51 how does that work? certificate transparency?
02:23:15 No idea
02:24:20 Saw it posted on HN
02:40:21 https://users.rust-lang.org/t/merklemap-ct-subdomain-search-engine/117223 looks like it
03:00:02 So yes basically
03:00:20 Someone should suggest they add AB log monitoring lol
03:03:04 from ##programming It seems like spotify is killing particle detector :(
03:05:32 Imgsrc.ru
03:20:42 heres a ghetto-style client for the merkelmap API: curl -s 'https://api.merklemap.com/search?query=*.google.com&stream=true' | sed 's/^data: //;/^$/d' | jq -r '.domain , .subject_common_name' | sed 's/^\*\.//' | sort -u
03:21:12 what does stream=true do?
03:22:03 prevents the need for pagination, found it in https://github.com/Barre/merklemap-cli/blob/master/src/lib.rs
03:23:55 * pabs not sure how this is different to crt.sh too, since it seems to be based on CT logs indeed
03:25:49 re Particle Detector, https://docs.google.com/spreadsheets/d/1gyAs28Z-5FDHv3VF18o6SMQzpi0_DnKku8Vop5N6UyM/pubhtml#
03:29:45 Imgsrc
03:30:32 Hi Nicker8, how can we help you?
03:30:49 How do I get to imgsrc.ru?
03:31:26 Probably by opening it in a browser, not an IRC client.
03:31:52 It showed in a wiki that it was suppose to be here
03:32:52 This is where we would (at least initially) discuss about potentially archiving it.
03:33:05 You haven't archived it?
03:33:52 Not to my knowledge.
03:35:33 (Which the wiki says, too.)
03:41:36 Why is there a wiki of imgsrc.ru if it don't work on here
03:41:57 "this seems to be at risk of disappearing and maybe we should archive it" goes to the wiki too
03:42:22 the wiki is for internal collaboration, not a catalog of archived data for people to use
03:43:09 Nicker8: was there an announcement of it shutting down soon or something like that?
03:44:14 https://wiki.archiveteam.org/index.php/Imgsrc.ru and I don't get why it makes me think there was archived data when it was at top search
03:44:35 it says "not saved yet"
03:44:46 Darnit.
03:45:39 JAA: huh looks like imgsrc.ru is excluded from the WBM
03:46:20 Because there is pornography on it and wayback machine refuses certain sites that contain pornography and its also a russian site
03:47:09 I gues if archiveteam had archived it, the data would be on archive.org, but inaccessible because the site is in WBM exclusions :P
03:47:42 Wouldn't there be another way to archive the data?
03:48:05 The WBM doesn't have an issue with including pornography. Those exclusions are normally manually triggered by a request from the site owners.
03:48:25 It's a russian site so maybe that's why
03:53:31 I dont know if there is another way to archive it
03:53:57 Archiving it isn't necessarily the problem. But making the archive accessible is.
03:54:36 You can't if wbm doesn't let you and there should be another way
03:56:27 Wbm shouldn't be the only archive site
03:56:43 Being Russian isn't a reason IA excludes for either lol
03:57:17 The site is russian not a person
03:58:38 It's expensive to run something like the IA/WBM. Yes, it'd be nice if all the eggs weren't in one basket, but there simply is no other institution currently. And I don't see that changing either.
03:59:45 There's other types of archives sites I think when you search but I don't know if it would work for imgsrc.ru
04:00:01 Nothing is comparable in scale to IA.
04:01:16 What about archive.su?
04:01:37 Archive.fo*
04:03:34 Yeah, it's cute. It's over 100 times smaller than IA, IIRC.
04:04:11 are
04:04:26 archive.today could work right?
04:06:01 It's slow, not automatable, has no data exports (and never supported WARC), etc. I don't consider it a proper archive.
04:06:16 It's also entirely unclear who runs it etc.
04:06:24 What else is nothing?
04:10:34 Nicker8: is imgsrc.ru at risk of shutdown?
04:11:14 And I'm pretty sure it would depend
04:12:03 It's ran since 2006 and surprisingly has not been archived like wow its been around since 2006 like youtube been around since 2005 already was archived and imgsrc.ru hasn't? Wow
04:12:32 lol surely you don't think we have the entire youtube archived
04:13:17 Atleast you have youtube on wbm still
04:14:18 Only an extremely small portion compared to what is available lol
04:14:47 I mean so what you expect everything to have everything on it?
04:15:42 we don't know why imgsrc is blocked on WBM, we're not the Internet Archive
04:16:01 Pretty sure because it doesn't accept .ru url
04:17:14 Thats not how any of this works anymore. Since Archive.org started ignoring robots.txt it is only really excluded if its CSAM or if the website owner requests its removal
04:17:36 Oh maybe both
04:18:00 Well actually there's still CSAM on archive.org possibly depends on what your type is
04:18:11 https://web.archive.org/web/20240907062326/https://cctld.ru/
04:18:18 If it were me, I would not be hosting anything that came from imgsrc.ru
04:18:20 dunno where you get the idea that all of .ru is blocked
04:18:50 I don't know where he gets the idea of how CSAM content is all blocked
04:20:00 I meant on the wayback machine its only manual exclusions these days. If the website is requested to be removed by the site owner or if someone emails the archive.org info email adress and goes hey t.me/DEFSNOTILLEGAL/8887 has CSAM maybe you should block that from being viewed
04:20:11 Thats what I meant
04:20:30 Things dont generally get excluded from wayback
04:20:37 Yeah I get it it's not fully accurate of what it does I know
04:21:48 There arent blanket blocks on tld tho
04:22:58 What do you mean blanket blocks lol
04:24:03 I meant they wont just block a whole TLD.
the individual site owner has to say hey exclude my website from archive.org or an admin at IA has to go Well fuck keeping this available to the public is not worth it right now lets dark it
04:24:45 Pretty sure the owner probably didn't want it on there
04:25:06 what do you want us to do?
04:25:06 Then yeah that is what has happened then they would have manually requested exclusion
04:26:04 Find another site like wbm if it's possible or find an archiving accessible data site that either gets created or already is there
04:31:46 You're welcome to create your own
04:32:08 I have to do all the work?
04:32:42 Build it and they will come - maybe
04:33:00 I just signed up to this and this is what I'm getting right now
04:33:02 we have to do all the work for you?
04:33:15 I thought that's what your job is
04:33:25 our job is to archive the data
04:33:27 not to store it
04:33:27 The "archiveteam"
04:33:34 Storing data is expensive
04:33:54 The Mildom project is 240TiB so far. That's roughly ten 24TB hard drives
04:34:00 There's plenty of existinf open source AT projects to bootstrap from
04:34:15 why should we work on archiving imgsrc.ru with so many other sites to archive? (some of which already announced they will shut down soon so they are obviously a priority)
04:34:39 Isn't imgsrc.ru one of them?
04:34:53 did they announce they are shutting down soon?
04:35:18 You don't know when it can shut down sometimes they don't tell you and it gets shut down
04:35:52 Funny, I feel like that's been asked a few times in the past hour with no answer lol
04:36:22 sure... meanwhile there's another site that announced it will shut down in September 1st, and it did, but some videos are still in the CDN so we're saving as much as we can
04:36:41 It's September 7th what are you talking about
04:36:43 and we're having trouble with that because our intermediate servers are absolutely saturated with network and disk
04:37:32 Do you remember? The 21 first night of September.
04:37:32 exactly, the website shut down 7 days ago and we're saving the data that is still in their file server, if we got the links from the website before the shutdown
04:39:05 This has to be a circus right now this is looking like a stand up comedy show that didn't get many claps right now and apparently this isn't what I expected to sign up
04:39:29 yeah you thought you could come here and give a request and everyone would drop what they are doing and work on your request for free
04:39:54 It wasn't a request it was to see if imgsrc.ru was already archived
04:40:17 maybe it is, maybe it isn't, we can't know because archive.org blocked it, maybe go ask archive.org why it's blocked
04:40:43 We already know apparently you didn't see that
04:40:54 Nicolas17 is being a bit of a dick but he is saying exactly what we are all thinking. We have to prioritise what we can. It would be great to grab every video every uploaded to youtube but we dont have the space or time to do that. We only archive we cant control if something is blocked from the archive. It fucking sucks but they are well within
04:40:54 their rights to block it
04:41:08 And I commend him for saying what we are all thinking
04:41:35 Flashfire42: I guess you were still typing when Nicker8 successfully out-dicked me :P
04:41:50 yeah I was XD
04:42:07 Yeah do you think you can control everything that happens on imgsrc.ru like removed accounts or removed posts hm?
04:42:31 Exactly that's what could've been archived
04:43:58 I don't think nulldata knows his days very well lol
04:44:02 maybe go ask archive.org why it's blocked
04:44:19 Bro we already know did you not see?
04:47:27 Nicker8 my advice. If you want to help? Run a warrior. Donate to the archive.org fundraisers. Stick around for a bit. recommend a few sites to run in archivebot. Get to know how we work and then you can start to mkae polite requests
04:47:39 Can't you see? Woha can't you see.
04:48:02 I have to pay money now?
04:48:56 I mean you dont have to donate money directly to the archive but if you come in here making demands of a site that hasnt announced a shutdown then you should be contributing something
04:48:59 This isn't what I thought it was about requests
04:49:36 This ain't joquinit, this seems overly antagonistic
04:49:52 ArchiveTeam OnlyFans
04:49:56 I mean assuming archivebot isnt still on fire it might be able to be run in archivebot but we cant do anything about making it Publically accessible if the site is manually excluded from the wayback machine
04:50:13 Nicker8: So from my understanding, imgsrc.ru is important to the Russian Internet, but not curreently at immediate heightened risk?
04:50:14 Flashfire42:
04:50:36 Flashfire42: nsfw content seems to have a click-through that sets a cookie, so at least *that* won't work with archivebot
04:50:38 Nope it is
04:50:48 Nicker8: How so?
04:51:04 * steering passed around popcorn
04:51:10 It's very rare?
04:52:05 Nicker8: When you say "nope it is", are you replying to "imgsrc.ru is important to the Russian Internet", or to "but not curreently at immediate heightened risk"?
04:52:15 *currently
04:52:42 :|
04:53:29 * DigitalDragons comes back from grabbing snacks
04:53:34 aw they left
04:53:38 DigitalDragons well they left but I still want snacks
04:53:50 I remembered this recent conversation https://cdn.discordapp.com/attachments/779844089196576809/1281668489425322184/image0.jpg?ex=66dddfcc&is=66dc8e4c&hm=4e3d1745791353bfa6a487f310c78284161931ed4ac5eb53ee37a9d8a41a29b8&
04:54:12 And here I was actually trying to communicate...
04:54:28 nicolas17: lmao
04:54:36 * steering thinks back to 13 on irc
04:54:38 you're not wrong
04:55:15 OrIdow6: I don't think it's at any more risk than any other semi-legal free image host that allows porn
04:55:30 "it could shut down at any moment you never know" is the default state of every website
04:55:34 ^
04:55:40 steering: When you say "semi-legal" do you mean it hosts a lot of child porn?
04:55:52 OrIdow6: it certainly used to have a reputation for such.
04:56:26 it has "passworded" galleries that are, or were, basically unmoderated
04:56:49 I browsed around the nudity category and didn't even find actual nudity, only very risque teasing
04:57:26 If it's indeed a Russian site I guess that means the Russian political/media environment would be what determines if it's at heightened risk for that?
04:57:43 CSAM not child porn
04:57:56 I'm not familiar with Russian politics
04:58:23 did a search for "young" and yup I didn't want to see any of that, even if it is AI
04:58:37 glad I used tor *washes hands*
04:58:51 oh yeah I didn't try that kind of thing
04:59:30 Could still be a nice proactive project to do I guess? Image hosts to tend to die rather dramatically
04:59:42 But that makes it priority number... ca 70
05:00:06 what about WBM's block though
05:01:10 I can't give you a super well-thought-out answer to that, but in short even if the WBM blocks it now it still may be useful in 50 years
05:01:36 Certainly does make it less attractive though
05:05:49 WBM excluded sites have been done before (zippyshare)
05:06:52 if it were in the US I'd say it's certain to at least go the way of imgur, but yeah I don't know about RU
05:08:35 I'm guessing it's excluded due to the quantity of "depictions of minors" which are illegal in the US (but maybe not RU?)
05:10:44 ("And here I was actually trying to communicate" was not meant to be casting shade BTW, implicit "in my laborious manner" at the end)
05:11:25 If they're not imminently shutting down, I don't see much of a reason for an organized grab
05:12:25 Especially if they're well known for hosting "depictions of minors"
07:17:02 VoynichCr created Talk:INTERNETARCHIVE.BAK/torrents implementation (+1257, Created page with "== We can simply the…): https://wiki.archiveteam.org/?title=Talk%3AINTERNETARCHIVE.BAK/torrents%20implementation
12:21:04 Greetings, does anyone is operator (@) or voice (+), can help me archive a link in #archivebot, thank you!
16:11:33 TIL my job description here.
21:50:15 I have been privately informed that "toucharcade.com is likely heading for a shutdown"
21:56:01 seems like it should be ok to ab, i'll start a job
21:57:19 looks like it's wordpress
23:09:23 thuban: toucharcade AB queue keeps growing a lot (210k URLs), any idea what that could be?
23:10:20 I guess we won't know until it finishes with the current recursion depth level and goes to the next ones
23:33:08 nicolas17: article pages + related resources (images) + forum threads
23:50:51 oh, and external links.
23:53:04 which is not to say that there's nothing that should be ignored, but i did a bunch of ignores during the first few levels which should cover most of it