-
OrIdow6
docs.framasoft.org/fr/framasite/rattachement-nom-de-domaine.html is their page on custom domains, might be able to get a few from the Rapid7 data - I will run this myself when I get back, if no one else has
-
OrIdow6
JAA (or anyone else with a means of queueing them to AB): Can you run those lists I put in above?
-
JAA
OrIdow6: Yep, will set it up shortly.
-
JAA
Also checking Rapid7 now.
-
JAA
framasite_subdomains_from_wbm_cdx is running through queueh2ibot now.
-
arkiver
JAA: this is nice
-
arkiver
looks very good on the dashboard as well
-
» arkiver didn't know we have auto queuing for AB now
-
JAA
:-) Thanks.
-
JAA
Yeah, only really for big lists of stuff that can be done mostly unsupervised though.
-
JAA
It has been deployed three times before: US elections, UK elections, and pttk.pl.
-
JAA
It has a configurable limit and checks against dashboard.at.ninjawedding.org/status constantly to not build up a huge queue.
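A minimal sketch of the throttling idea described here, not the actual queueh2ibot code: submit one item at a time and wait whenever the number of pending jobs reaches a configurable limit. The names `feed_queue`, `pending_count`, and `submit` are hypothetical; how the pending count is derived from the dashboard status is left to the caller.

```python
import time
from typing import Callable, Iterable


def feed_queue(
    items: Iterable[str],
    pending_count: Callable[[], int],
    submit: Callable[[str], None],
    limit: int = 10,
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Submit items one by one, pausing while `limit` or more jobs are pending.

    `pending_count` and `submit` are placeholders for whatever reads the
    status page and sends the queue command; both are assumptions here.
    """
    submitted = 0
    for item in items:
        # Poll until the pending count drops below the limit.
        while pending_count() >= limit:
            sleep(poll_seconds)
        submit(item)
        submitted += 1
    return submitted
```

Passing the status check and the submit action in as callables keeps the loop independent of any particular bot or dashboard format.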
-
JAA
OrIdow6: 118 frama.site subdomains in Project Sonar FDNS (2021-06-25-1624579787-fdns_a.json.gz): transfer.archivete.am/5vETi/sonar_fdns_a_frama.site
-
JAA
Yeah, I saw somewhere that a date for the static site transformation was supposed to be announced last month.
-
JAA
(That's on GNOME Bugzilla.)
-
pabs
JAA: good to hear that GNOME Bugzilla will become a static site, the comments on bug closing make it sound like they will just shut down the server
-
JAA
Yeah, they want to shut down the Bugzilla-specific infrastructure by the sound of it. I.e. just throw a static copy on something they're running anyway, then get rid of that.
-
JAA
But let's archive it anyway because such migrations are always a nice source of errors and issues.
-
pabs
agreed
-
JAA
OrIdow6: The 48 sites that appear in Rapid7's list but not in the CDX one are running through now. Will deal with the wikis afterwards, but that's messier due to DokuWiki requiring extra ignores.
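Picking out the hosts that appear in one list but not the other amounts to a set difference. A small sketch, assuming both lists are plain hostnames, one per entry; `only_in_first` is a hypothetical helper name, not something from the tooling used here.

```python
def only_in_first(first, second):
    """Return hostnames found in `first` but not in `second`, sorted.

    Comparison ignores case and surrounding whitespace; blank entries are skipped.
    """
    seen = {h.strip().lower() for h in second if h.strip()}
    candidates = {h.strip().lower() for h in first if h.strip()}
    return sorted(candidates - seen)
```

For example, `only_in_first(rapid7_hosts, cdx_hosts)` would give the hosts still to be queued, where both argument names are placeholders for the two lists.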
-
JAA
And now the 83 Framawikis from CDX + Rapid7 are running through AB.
-
OrIdow6
Thanks
-
OrIdow6
JAA: Did you do A and AAAA in addition to CNAME?
-
OrIdow6
Since that's how they instruct people to do custom domains
-
JAA
OrIdow6: I did A, not CNAME. See specific filename above.
-
JAA
I'm guessing Rapid7 does an A lookup and puts all answer records into the A file regardless of whether they're actually A records.
-
JAA
Also, this was essentially a `grep .frama.site`, so I'm guessing it should've caught custom domains unless they weren't using a CNAME.
-
JAA
Oh, actually, no.
-
JAA
Running another scan without the leading dot, i.e. frama.site and frama.wiki; that's how the CNAMEs show up.
-
JAA
(Sorry for not wanting to parse 2+ billion JSON objects and run the scan properly. lol)
-
OrIdow6
JAA: I mean, A to 144.76.131.210 and AAAA to 2a01:4f8:141:3421::210
-
JAA
Ah
-
JAA
No, did not.
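The three ways a record can point at Framasoft hosting that come up here (a subdomain name, a CNAME target, or an A/AAAA address) can be checked per record rather than with a bare grep. This is a sketch only: it assumes each FDNS line is a JSON object with `name`, `type`, and `value` fields, and that CNAME answers carry the type `cname`, neither of which is confirmed in this log. The addresses are the two quoted above.

```python
import json

SUFFIXES = ("frama.site", "frama.wiki")
ADDRESSES = {"144.76.131.210", "2a01:4f8:141:3421::210"}


def _under(host: str, suffix: str) -> bool:
    """True if host equals suffix or is a subdomain of it."""
    host = host.rstrip(".").lower()
    return host == suffix or host.endswith("." + suffix)


def match_record(line: str):
    """Return why a record looks Framasoft-hosted, or None if it does not."""
    try:
        rec = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(rec, dict):
        return None
    # Field names below are assumed, see the note above.
    name = str(rec.get("name", ""))
    rtype = str(rec.get("type", "")).lower()
    value = str(rec.get("value", ""))
    if any(_under(name, s) for s in SUFFIXES):
        return "subdomain"
    if rtype == "cname" and any(_under(value, s) for s in SUFFIXES):
        return "cname"
    # Plain string comparison; differently formatted IPv6 text would not match.
    if rtype in ("a", "aaaa") and value.lower() in ADDRESSES:
        return "address"
    return None
```

Unlike a substring grep, the suffix check does not match names such as `xframa.site`, and it handles several suffixes in one pass. Parsing every line as JSON is of course much slower than grep on a file of this size.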
-
OrIdow6
Mine for that may have finished
-
OrIdow6
Yes it did
-
Barto
OrIdow6: i guess we just don't live on the same side of the world, do we? :-)
-
JAA
But I did find a few custom domains with frama.site.
-
OrIdow6
Barto: So it would seem, haha
-
JAA
Apparently zstdgrep doesn't support multiple patterns, so it didn't search for wikis. :-|
-
Barto
OrIdow6: their page to describe which service they'll keep and which they'll drop sounds complete. Anything else needed from my end?
-
JAA
-
OrIdow6
-
OrIdow6
Thanks for this new one
-
JAA
Guess we need to filter out their own sites first. Can you prepare a combined list of these two so I can throw them into queueh2ibot?
-
OrIdow6
Barto: What we would like to have is a list of user-created domains, links, etc.
-
OrIdow6
Within the services
-
OrIdow6
JAA: Alright; do you have something automatically doing dedup on your end?
-
JAA
I can handle dedup, yeah.
-
OrIdow6
OK
-
JAA
I just realised I'll also have to detect DokuWikis on these though. :-|
-
JAA
Shame that they don't use distinct IPs for sites and wikis.
-
OrIdow6
Yeah
-
OrIdow6
Would make this a bit easier as well
-
OrIdow6
Going to strip out non-www if www is present
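The filtering step mentioned here, combined with deduplication, could look roughly like the following. It is a sketch under the assumption that the input is a flat list of hostnames; `prefer_www` is a hypothetical name.

```python
def prefer_www(hosts):
    """Deduplicate hostnames and drop a bare name when its www. variant is listed."""
    # Normalise case, whitespace, and trailing dots before comparing.
    unique = {h.strip().rstrip(".").lower() for h in hosts if h.strip()}
    return sorted(h for h in unique if "www." + h not in unique)
```

A bare name is kept when no `www.` variant exists, so nothing is lost for sites that only resolve without the prefix.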
-
OrIdow6
There's at least one (mta-sts.b0c.asso.st) that seems to be misconfigured in some way, so all it gives is a cert error and then (if you bypass it) a site not found error
-
OrIdow6
But do not think that should cause problems
-
JAA
Yeah, there are 25 of those.
-
Barto
OrIdow6: i see. Shall I try to ask them kindly on libera or is it a lost cause?
-
JAA
Or well, 25 certificate issues, didn't check further.
-
OrIdow6
Barto: I think there's still a chance
-
OrIdow6
Since they do very much seem to be a "community" organization - the thing I can see shifting it the other way would be privacy concerns, hence my focus on Framasite, Framawiki, and Framalink, which seem to be the most "public" of the ones going down
-
OrIdow6
Be aware that hook54321 asked ca. 35 minutes ago in that channel, and there has been no response nor any other activity sense
-
OrIdow6
*since
-
JAA
List of AB commands is ready, just waiting for the current batch to finish.
-
Barto
their irc channel is unofficial, will try another way
-
OrIdow6
Oh
-
OrIdow6
Do what you think is best
-
Barto
i'm just seeing there's no ops in this channel and the title explicitly mentions it's "unofficial" in their description
-
Barto
i'll try to throw them a message via contact.framasoft.org asking how they do their framasite counter, and if they're allowed to prove this number
-
Barto
and i'll ask if i can check that none of my friends are affected by this closure, how can i find that ;)
-
OrIdow6
I'd rather not be deceitful about it
-
Barto
do you want me to be direct and ask them if they have a list of sites?
-
Barto
just so we do throw archivebot on it and slam their bandwidth?
-
JAA
I'd suggest something along the lines of: 'Hi, we heard you're shutting down some services and would like to preserve them indefinitely at the Internet Archive. Would you be willing to work with us to make this possible?' (But clear enough that we aren't IA etc.)
-
OrIdow6
Yeah
-
OrIdow6
"just so we do throw archivebot on it and slam their bandwidth?" - I don't think it's using up very much of their BW at present, if that's what this is asking?
-
Barto
alright, i'll try something like that
-
Barto
i was worried they'd close the door shut if i were too explicit
-
JAA
Yeah, I haven't noticed any slowdown even with 40+ AB jobs in parallel. But that's also part of the 'working with us' basically. Acceptable request rate limits etc.
-
OrIdow6
Even if lying is more effective, I'd rather not do it
-
JAA
Yeah, fully agreed.
-
JAA
OrIdow6: That last list is running through now.
-
FalconK
whois 64.71.160.46
-
FalconK
argh
-
thuban
is there a way to confirm (on current infra) whether an archivebot job finished normally?
-
thuban
oh nvm, it's in the json metadata
-
thuban
it looks like no new ab jobs have been submitted for the hong kong media sites, even though several of the first round of jobs have finished and there are more sites on the list.
-
thuban
i've just updated wiki.archiveteam.org/index.php/Hong_Kong_media ; can someone put in a new round? (we're not waiting on pipeline capacity, are we?)
-
thuban
have also just added the twitter links nuroten dug up for potential snscrape jobs; thanks again nuroten
-
thuban
(are you sure about the social media for hkpeanut and the twitter for memehk? they seem unrelated to me)
-
HCross
arkiver: you were right
-
thuban
i've also just updated the youtube and youtubearchive video counts (diff: wiki.archiveteam.org/index.php?titl…ype=revision&diff=46929&oldid=46928).
-
thuban
if someone with yta privileges could have a look, that would be really helpful--some channels we didn't and still don't have complete copies of, some channels we did have complete copies of but have since published more news, and some channels' video counts have dropped precipitously
-
thuban
last group is tvmost and d100, which we seem to have had most of, and i-cable, which we definitely didn't
-
AK
thuban, good point, I need to go through and add them in
-
AK
I'll get the rest of the HK media stuff added in today
-
thuban
thank you! i'm taking a quick break before i add the political parties and other stuff from the rest of the etherpad
-
thuban
let me know whether you start submitting jobs for stuff that isn't on the wiki page yet, and if so i'll make sure i link them
-
Iki
Highly speculative, but maybe worth proactively covering Taiwan media in the next couple years?
-
Iki
Given the recent HK project
-
lunik1
what about Macau too, I don't know what the situation is there
-
nuroten
thuban: thanks for updating the wiki page with the links. some of the associations from the parties section have disbanded or since have deactivated their FB pages, so feel free to remove in that case. not sure if it will be useful to note in a separate section the dead ones, mostly if someone asks if they have been saved
-
nuroten
* since disbanded and deactivated
-
nuroten
they're falling off at a quicker pace now, unfortunately I can't keep track of them all daily
-
Barto
OrIdow6: no answers yet, we'll see if someone will answer this evening
-
Ryz
Don't think you should remove those Facebook links, just because of future reference stuff; just mark them as dead and not delete the entries
-
nuroten
okay. a few are crossed out in place as they might have other accounts or website that are still up, some only have Facebook, not much to archive otherwise?
-
nuroten
1-2 I moved to a dead section, but yeah, whatever makes most sense to people
-
thuban
nuroten: do the neo democrats still have a facebook page? i guess it doesn't matter that much if we can't archive it anyway, but the news story about the disbandment mentions it
-
JAA
AK: Your wiki account is automoderated now.
-
AK
Uh oh
-
AK
That sounds dangerous
-
JAA
No more manual approval of your edits. :-)
-
thuban
is there a way to get the full archivebot job id for a finished job? the json file doesn't include it, and archive.fart.website/archivebot/viewer only displays a truncated version
-
JAA
Not easily, no.
-
JAA
It's in the -meta.warc.gz in the command line arguments and log messages.
-
thuban
good enough for me. thanks!
-
Iki
tai
-
Iki
oops
-
nuroten
thuban: probably not anymore, it's not coming up for me in searches
-
nuroten
individual party members may still have Facebook accounts (likely managed by friends/relatives), also tried looking at the WBM snapshots of their website, didn't see one
-
nuroten
thuban: the wiki links to hkpeanut.com which is a different website. hkpeanuts.com is down
-
thuban
that would explain it, thanks
-
nuroten
the domain redirects to simcast ... facebook link updated, I haven't found twitter yet (the wrong links were initially grabbed from that other website)
-
thuban
fixed on etherpad & wiki page
-
thuban
(job cyko5axvbovcbec1vahgyz2xm was started on the wrong site, but it looks like it's finished already)
-
nuroten
I can't check if their Facebook is still active, site keeps prompting me to log in, but twitter is gone
-
nuroten
ah well, hkparenting might be eventually useful haha
-
nuroten
they have an updated tumblr though
-
thuban
thanks. i'm gonna add the last of the stuff from the etherpad sometime in the next few hours, so if you haven't already i'll edit the link in then
-
nuroten
great, thanks :)
-
nuroten
regarding Taiwan and Macau, I'm unfamiliar with the situation there, at this time I would probably put Taiwanese media in the "healthy" category. last I heard Macau already passed a version of NSL a few years ago, so protests there are rare now
-
nuroten
(but I agree it might not be a bad idea to start covering Taiwan eventually, if only to have a head start when things go south fast)
-
OrIdow6
Thanks Barto
-
AK
At the very least, a list of possible Taiwanese sites would make the job of capturing them easier if/when it's needed
-
OrIdow6
In the event of an invasion, it is unlikely that only the Taiwanese media would be threatened
-
OrIdow6
We have (the old) domains-grab
-
arkiver
yeah
-
AK
Good point
-
AK
It'd be a big one
-
arkiver
hopefully #Y will be ready before that happens
-
arkiver
(i think it will)
-
AK
eu-domains was one of the first projects I did