-
myself
there's no bugmenot login for the forums?
-
nulldata
-
nicolas17
how does that work? certificate transparency?
-
nulldata
No idea
-
nulldata
Saw it posted on HN
-
OrIdow6
-
nulldata
So yes basically
-
nulldata
Someone should suggest they add AB log monitoring lol
-
pabs
from ##programming <dostoyevsky2> It seems like spotify is killing particle detector :(
-
Nicker8
Imgsrc.ru
-
pabs
here's a ghetto-style client for the merklemap API: curl -s 'api.merklemap.com/search?query=*.google.com&stream=true' | sed 's/^data: //;/^$/d' | jq -r '.domain , .subject_common_name' | sed 's/^\*\.//' | sort -u
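For anyone who'd rather not chain sed and jq, here is a rough Python equivalent of that pipeline. It's a sketch, not a merklemap client. It works on response text you've already downloaded. It assumes the streamed response is a series of lines prefixed with `data: `, each carrying one JSON object, which is what the `sed` step implies. The `domain` and `subject_common_name` field names come from the jq filter. The function name `hostnames_from_stream` and the sample records are made up for illustration.

```python
import json

def hostnames_from_stream(text: str) -> list[str]:
    """Rough Python equivalent of the sed | jq | sed | sort -u pipeline (hypothetical helper)."""
    names = set()
    for line in text.splitlines():
        # sed 's/^data: //' : drop the assumed event-stream style prefix
        if line.startswith("data: "):
            line = line[len("data: "):]
        # sed '/^$/d' : skip lines that are empty
        if not line.strip():
            continue
        record = json.loads(line)
        # jq -r '.domain , .subject_common_name' : pull both fields from each record
        for key in ("domain", "subject_common_name"):
            value = record.get(key)
            if value is None:
                # jq -r would print the literal string "null" here; this sketch skips it instead
                continue
            # sed 's/^\*\.//' : turn wildcard names like *.google.com into google.com
            names.add(value.removeprefix("*."))
    # sort -u : unique, sorted output
    return sorted(names)

# Made-up sample in the assumed response format
sample = (
    'data: {"domain": "*.google.com", "subject_common_name": "google.com"}\n'
    "\n"
    'data: {"domain": "mail.google.com", "subject_common_name": "*.google.com"}\n'
)
print("\n".join(hostnames_from_stream(sample)))  # prints google.com, then mail.google.com
```

Unlike the shell version, this drops missing fields rather than emitting `null`, and `sorted()` uses plain code-point order rather than the locale-dependent order of `sort`.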
-
nicolas17
what does stream=true do?
-
pabs
-
» pabs not sure how this is different to crt.sh too, since it seems to be based on CT logs indeed
-
pabs
-
Nicker8
Imgsrc
-
JAA
Hi Nicker8, how can we help you?
-
Nicker8
How do I get to imgsrc.ru?
-
JAA
Probably by opening it in a browser, not an IRC client.
-
Nicker8
It showed in a wiki that it was supposed to be here
-
JAA
This is where we would (at least initially) discuss potentially archiving it.
-
Nicker8
You haven't archived it?
-
JAA
Not to my knowledge.
-
JAA
(Which the wiki says, too.)
-
Nicker8
Why is there a wiki page for imgsrc.ru if it doesn't work on here?
-
nicolas17
"this seems to be at risk of disappearing and maybe we should archive it" goes to the wiki too
-
nicolas17
the wiki is for internal collaboration, not a catalog of archived data for people to use
-
nicolas17
Nicker8: was there an announcement of it shutting down soon or something like that?
-
Nicker8
wiki.archiveteam.org/index.php/Imgsrc.ru. I don't get why it made me think there was archived data when it was the top search result
-
nicolas17
it says "not saved yet"
-
Nicker8
Darnit.
-
nicolas17
JAA: huh looks like imgsrc.ru is excluded from the WBM
-
Nicker8
Because there is pornography on it, and the Wayback Machine refuses certain sites that contain pornography, and it's also a Russian site
-
nicolas17
I guess if archiveteam had archived it, the data would be on archive.org, but inaccessible because the site is in WBM exclusions :P
-
Nicker8
Wouldn't there be another way to archive the data?
-
JAA
The WBM doesn't have an issue with including pornography. Those exclusions are normally manually triggered by a request from the site owners.
-
Nicker8
It's a russian site so maybe that's why
-
Nicker8
I dont know if there is another way to archive it
-
JAA
Archiving it isn't necessarily the problem. But making the archive accessible is.
-
Nicker8
You can't if the WBM doesn't let you, and there should be another way
-
Nicker8
Wbm shouldn't be the only archive site
-
nulldata
Being Russian isn't a reason IA excludes for either lol
-
Nicker8
The site is russian not a person
-
JAA
It's expensive to run something like the IA/WBM. Yes, it'd be nice if all the eggs weren't in one basket, but there simply is no other institution currently. And I don't see that changing either.
-
Nicker8
There are other types of archive sites, I think, when you search, but I don't know if they would work for imgsrc.ru
-
JAA
Nothing is comparable in scale to IA.
-
Nicker8
What about archive.su?
-
Nicker8
Archive.fo*
-
JAA
Yeah, it's cute. It's over 100 times smaller than IA, IIRC.
-
Nicker8
are
-
Nicker8
archive.today could work right?
-
JAA
It's slow, not automatable, has no data exports (and never supported WARC), etc. I don't consider it a proper archive.
-
JAA
It's also entirely unclear who runs it etc.
-
Nicker8
What else is nothing?
-
nicolas17
Nicker8: is imgsrc.ru at risk of shutdown?
-
Nicker8
And I'm pretty sure it would depend
-
Nicker8
It's been running since 2006 and surprisingly hasn't been archived. Like wow, it's been around since 2006. YouTube has been around since 2005 and was already archived, and imgsrc.ru hasn't been? Wow
-
nicolas17
lol surely you don't think we have the entire youtube archived
-
Nicker8
At least you still have YouTube on the WBM
-
nulldata
Only an extremely small portion compared to what is available lol
-
Nicker8
I mean so what you expect everything to have everything on it?
-
nicolas17
we don't know why imgsrc is blocked on WBM, we're not the Internet Archive
-
Nicker8
Pretty sure it's because it doesn't accept .ru URLs
-
Flashfire42
That's not how any of this works anymore. Since Archive.org started ignoring robots.txt, a site is only really excluded if it's CSAM or if the website owner requests its removal
-
Nicker8
Oh maybe both
-
Nicker8
Well actually there's still CSAM on archive.org possibly depends on what your type is
-
nicolas17
-
steering
If it were me, I would not be hosting anything that came from imgsrc.ru
-
nicolas17
dunno where you get the idea that all of .ru is blocked
-
Nicker8
I don't know where he gets the idea of how CSAM content is all blocked
-
Flashfire42
I meant on the Wayback Machine it's only manual exclusions these days: if the site owner requests removal, or if someone emails the archive.org info email address and goes "hey, t.me/DEFSNOTILLEGAL/8887 has CSAM, maybe you should block that from being viewed"
-
Flashfire42
That's what I meant
-
Flashfire42
Things don't generally get excluded from Wayback
-
Nicker8
Yeah I get it it's not fully accurate of what it does I know
-
Flashfire42
There aren't blanket blocks on TLDs, though
-
Nicker8
What do you mean blanket blocks lol
-
Flashfire42
I meant they won't just block a whole TLD. The individual site owner has to say "hey, exclude my website from archive.org", or an admin at IA has to go "Well fuck, keeping this available to the public is not worth it right now, let's dark it"
-
Nicker8
Pretty sure the owner probably didn't want it on there
-
nicolas17
what do you want us to do?
-
Flashfire42
Then yeah, that is what happened: they would have manually requested exclusion
-
Nicker8
Find another site like the WBM if possible, or find an archive site with accessible data, whether it gets created or already exists
-
nulldata
You're welcome to create your own
-
Nicker8
I have to do all the work?
-
nulldata
Build it and they will come - maybe
-
Nicker8
I just signed up to this and this is what I'm getting right now
-
nicolas17
we have to do all the work for you?
-
Nicker8
I thought that's what your job is
-
TheTechRobo
our job is to archive the data
-
TheTechRobo
not to store it
-
Nicker8
The "archiveteam"
-
TheTechRobo
Storing data is expensive
-
TheTechRobo
The Mildom project is 240TiB so far. That's roughly ten 24TB hard drives
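As a quick check on that drive count, here is a back-of-envelope calculation. It assumes the project size is in binary tebibytes (TiB, 2^40 bytes) and that drive capacity is in the decimal terabytes (TB, 10^12 bytes) that drive vendors use. It ignores filesystem overhead and any redundancy, so it's a lower bound on real hardware, not a storage plan.

```python
import math

TIB = 2**40   # bytes in one tebibyte (binary prefix, as in "240TiB")
TB = 10**12   # bytes in one terabyte (decimal prefix, as drive vendors use)

def drives_needed(size_tib: float, drive_tb: float) -> float:
    """Raw drive count for size_tib TiB on drive_tb TB disks, ignoring overhead and redundancy."""
    return size_tib * TIB / (drive_tb * TB)

exact = drives_needed(240, 24)
print(f"{exact:.3f} drives, so {math.ceil(exact)} in practice")
# prints: 10.995 drives, so 11 in practice
```

The TiB/TB mismatch adds about 10%, so "roughly ten" works out to just under eleven full drives before any RAID or parity.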
-
nulldata
There's plenty of existing open source AT projects to bootstrap from
-
nicolas17
why should we work on archiving imgsrc.ru with so many other sites to archive? (some of which already announced they will shut down soon so they are obviously a priority)
-
Nicker8
Isn't imgsrc.ru one of them?
-
nicolas17
did they announce they are shutting down soon?
-
Nicker8
You don't know when it could shut down. Sometimes they don't tell you and it just gets shut down
-
nulldata
Funny, I feel like that's been asked a few times in the past hour with no answer lol
-
nicolas17
sure... meanwhile there's another site that announced it would shut down on September 1st, and it did, but some videos are still in the CDN so we're saving as much as we can
-
Nicker8
It's September 7th what are you talking about
-
nicolas17
and we're having trouble with that because our intermediate servers are absolutely saturated with network and disk
-
nulldata
Do you remember? The 21 first night of September.
-
nicolas17
exactly, the website shut down 7 days ago and we're saving the data that is still in their file server, if we got the links from the website before the shutdown
-
Nicker8
This has to be a circus right now. This is looking like a stand-up comedy show that didn't get many claps, and apparently this isn't what I expected when I signed up
-
nicolas17
yeah you thought you could come here and give a request and everyone would drop what they are doing and work on your request for free
-
Nicker8
It wasn't a request it was to see if imgsrc.ru was already archived
-
nicolas17
maybe it is, maybe it isn't, we can't know because archive.org blocked it, maybe go ask archive.org why it's blocked
-
Nicker8
We already know apparently you didn't see that
-
Flashfire42
Nicolas17 is being a bit of a dick, but he is saying exactly what we are all thinking. We have to prioritise what we can. It would be great to grab every video ever uploaded to YouTube, but we don't have the space or time to do that. We only archive; we can't control whether something is blocked from the archive. It fucking sucks, but they are well within
-
Flashfire42
their rights to block it
-
Flashfire42
And I commend him for saying what we are all thinking
-
nicolas17
Flashfire42: I guess you were still typing when Nicker8 successfully out-dicked me :P
-
Flashfire42
yeah I was XD
-
Nicker8
Yeah do you think you can control everything that happens on imgsrc.ru like removed accounts or removed posts hm?
-
Nicker8
Exactly that's what could've been archived
-
Nicker8
I don't think nulldata knows his days very well lol
-
nicolas17
maybe go ask archive.org why it's blocked
-
Nicker8
Bro we already know did you not see?
-
Flashfire42
Nicker8, my advice: if you want to help, run a Warrior. Donate to the archive.org fundraisers. Stick around for a bit, recommend a few sites to run in ArchiveBot, get to know how we work, and then you can start to make polite requests
-
nulldata
Can't you see? Woha can't you see.
-
Nicker8
I have to pay money now?
-
Flashfire42
I mean, you don't have to donate money directly to the archive, but if you come in here making demands about a site that hasn't announced a shutdown, then you should be contributing something
-
Nicker8
This isn't what I thought it was about requests
-
OrIdow6
This ain't joquinit, this seems overly antagonistic
-
nulldata
ArchiveTeam OnlyFans
-
Flashfire42
I mean, assuming ArchiveBot isn't still on fire, it might be possible to run it in ArchiveBot, but we can't do anything about making it publicly accessible if the site is manually excluded from the Wayback Machine
-
OrIdow6
Nicker8: So from my understanding, imgsrc.ru is important to the Russian Internet, but not curreently at immediate heightened risk?
-
nicolas17
Flashfire42:
-
nicolas17
Flashfire42: nsfw content seems to have a click-through that sets a cookie, so at least *that* won't work with archivebot
-
Nicker8
Nope it is
-
OrIdow6
Nicker8: How so?
-
» steering passed around popcorn
-
Nicker8
It's very rare?
-
OrIdow6
Nicker8: When you say "nope it is", are you replying to "imgsrc.ru is important to the Russian Internet", or to "but not curreently at immediate heightened risk"?
-
OrIdow6
*currently
-
OrIdow6
:|
-
» DigitalDragons comes back from grabbing snacks
-
DigitalDragons
aw they left
-
Flashfire42
DigitalDragons well they left but I still want snacks
-
nicolas17
-
OrIdow6
And here I was actually trying to communicate...
-
steering
nicolas17: lmao
-
» steering thinks back to 13 on irc
-
steering
you're not wrong
-
steering
OrIdow6: I don't think it's at any more risk than any other semi-legal free image host that allows porn
-
nicolas17
"it could shut down at any moment you never know" is the default state of every website
-
steering
^
-
OrIdow6
steering: When you say "semi-legal" do you mean it hosts a lot of child porn?
-
steering
OrIdow6: it certainly used to have a reputation for such.
-
steering
it has "passworded" galleries that are, or were, basically unmoderated
-
nicolas17
I browsed around the nudity category and didn't even find actual nudity, only very risque teasing
-
OrIdow6
If it's indeed a Russian site I guess that means the Russian political/media environment would be what determines if it's at heightened risk for that?
-
Flashfire42
CSAM not child porn
-
OrIdow6
I'm not familiar with Russian politics
-
steering
did a search for "young" and yup I didn't want to see any of that, even if it is AI
-
steering
glad I used tor *washes hands*
-
nicolas17
oh yeah I didn't try that kind of thing
-
OrIdow6
Could still be a nice proactive project to do, I guess? Image hosts do tend to die rather dramatically
-
OrIdow6
But that makes it priority number... ca 70
-
nicolas17
what about WBM's block though
-
OrIdow6
I can't give you a super well-thought-out answer to that, but in short even if the WBM blocks it now it still may be useful in 50 years
-
OrIdow6
Certainly does make it less attractive though
-
DigitalDragons
WBM excluded sites have been done before (zippyshare)
-
steering
if it were in the US I'd say it's certain to at least go the way of imgur, but yeah I don't know about RU
-
steering
I'm guessing it's excluded due to the quantity of "depictions of minors" which are illegal in the US (but maybe not RU?)
-
OrIdow6
("And here I was actually trying to communicate" was not meant to be casting shade BTW, implicit "in my laborious manner" at the end)
-
DigitalDragons
If they're not imminently shutting down, I don't see much of a reason for an organized grab
-
DigitalDragons
Especially if they're well known for hosting "depictions of minors"
-
h2ibot
VoynichCr created Talk:INTERNETARCHIVE.BAK/torrents implementation (+1257, Created page with "== We can simply the…):
wiki.archiveteam.org/?title=Talk%3A…CHIVE.BAK/torrents%20implementation
-
ixitUIIRX
Greetings, can anyone who is an operator (@) or has voice (+) help me archive a link in #archivebot? Thank you!
-
JAA
TIL my job description here.
-
nicolas17
I have been privately informed that "toucharcade.com is likely heading for a shutdown"
-
thuban
seems like it should be ok to ab, i'll start a job
-
nicolas17
looks like it's wordpress
-
nicolas17
thuban: toucharcade AB queue keeps growing a lot (210k URLs), any idea what that could be?
-
nicolas17
I guess we won't know until it finishes with the current recursion depth level and goes to the next ones
-
thuban
nicolas17: article pages + related resources (images) + forum threads
-
thuban
oh, and external links.
-
thuban
which is not to say that there's nothing that should be ignored, but i did a bunch of ignores during the first few levels which should cover most of it
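To illustrate why a handful of ignores can rein in a growing queue like this, here is a minimal Python sketch. My understanding is that ArchiveBot ignore patterns are regular expressions checked against each URL, and a matching URL is not fetched. The patterns below are hypothetical examples of common WordPress crawl traps, not the ignores actually set on the toucharcade job. The example.com URLs are made up.

```python
import re

# Hypothetical ignore patterns for common WordPress crawl traps (illustrative only)
IGNORE_PATTERNS = [
    r"[?&]replytocom=",   # per-comment reply links that duplicate the whole article page
    r"/feed/?$",          # RSS/Atom feeds generated for every post and category
    r"[?&]share=",        # social share button endpoints
]

def filter_queue(urls: list[str], patterns: list[str]) -> list[str]:
    """Keep only URLs that match none of the ignore regexes (re.search, so a match anywhere counts)."""
    compiled = [re.compile(p) for p in patterns]
    return [u for u in urls if not any(c.search(u) for c in compiled)]

queue = [
    "https://example.com/2024/01/some-article/",
    "https://example.com/2024/01/some-article/?replytocom=123",
    "https://example.com/2024/01/some-article/feed/",
    "https://example.com/2024/01/some-article/?share=twitter",
]
print(filter_queue(queue, IGNORE_PATTERNS))  # only the article URL survives
```

Each article on a WordPress site can fan out into dozens of such variants, so one pattern added early can prevent a large share of the queue from ever being discovered.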