-
<thuban> on that note, what about world of tanks?
-
<h2ibot> Manu edited Mailman/2 (+50, /* calypso.tux.org/pipermail lost */): wiki.archiveteam.org/?diff=52173&oldid=52170
-
<tourist> thuban: should I repost those messages here?
-
<thuban> tourist: for the benefit of log-readers, sure
-
<tourist> [reposting from #archiveteam]
-
<tourist> Hi, just want to discuss before editing Deathwatch because it's a bit vague:
-
<tourist> booru.org is a site which allowed people to host their own tag-based 'booru' imageboards. Some are basically archives themselves for fandoms or special interests.
-
<tourist> There are about 3000 boorus hosted, eighty have over 10,000 images, ten have over 100,000 images and two exceptional boorus have 1.5 and 1.7 million images respectively.
-
<tourist> I propose it be placed on deathwatch due to this post a couple of weeks ago from the site admin: forum.booru.org/viewtopic.php?t=14193
-
<tourist> >The project is closed and winding down, resources for search functionality etc. at peak times will get tapped out.
-
<tourist> Does it seem like this site would require a dedicated project or should it be added to the Deathwatch page as normal, with 'Unknown' date.
-
<tourist> [/end repost]
-
<thuban> tourist: in theory sites with troubling vital signs but no clear shutdown announcement should go on 'fire drill' rather than deathwatch, but that page is a bit of a mess and i've been meaning to clean it up for some time, so i think deathwatch is ok for now
-
<thuban> and we do add sites to deathwatch even if they get their own wiki pages/dedicated projects (although my guess is that it won't be necessary in this case)
-
<tourist> Alright, I'll add it to the list now. Thanks :)
-
<thuban> tourist: you're welcome! do you know whether there's a way to get a list of all the boorus, and/or whether booru creation/activity has been disabled?
-
<tourist> Booru creation is closed. Boorus are still active.
-
<tourist> List of boorus can be found at booru.org/top but you can only grab up to 200 per page.
-
<thuban> that's fine if it's a complete list; let's see
-
<thuban> yep, looks like it
-
<thuban> thanks!
-
<arkiver> should we do something for opensubtitles.org ? they have been restricting access greatly lately
-
<arkiver> Ryz: no updates, and no reply from them
-
<arkiver> perhaps at this point best is just gathering lists of abload.de URLs and pushing them through AB if it's not an extreme number of URLs
-
<arkiver> huh is opensubtitles.org completely behind login now?
-
<arkiver> what... looks like it, same for others?
-
<arkiver> :(
-
<thuban> arkiver: no, not for me
-
<arkiver> thuban: do you have an example of a subtitle URL?
-
<arkiver> that is not behind a login for you
-
<thuban>
-
<arkiver> sends me to a login form
-
<arkiver> let me VPN this
-
<arkiver> thuban: hmm from a different location i get no login scen
-
<arkiver> screen*
-
<arkiver> i feel like opensubtitles.org is becoming more shitty fast though
-
<thuban> arkiver: X-Forwarded-For trick work?
-
<arkiver> thuban: no i think
-
<steering> arkiver: yes, very much becoming more shitty fast
-
<steering> i think at one point (some months ago) it also tried to make me login but doesn't do it now
-
<arkiver> i think we'll launch a project for them
-
<arkiver> better archive it before it's too late
-
<arkiver> now they apparently have a forced login for some IPs
-
<arkiver> ---
-
<eggdrop> [karma] '-' now has -2 karma!
-
<arkiver> WHAT
-
<arkiver> ---
-
<eggdrop> [karma] '-' now has -3 karma!
-
<arkiver> - --
-
<eggdrop> [karma] '-' now has -4 karma!
-
<arkiver> what magic is this
-
<steering> sourcery
-
<thuban> strip trailing '--', trim remainder of message
-
<steering> -- -
-
<steering> no pre-decrement
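
A minimal sketch of the parsing rule thuban describes above, assuming the bot only checks for a trailing '--' and treats the trimmed remainder as the karma target. The function name `karma_decrement_target` is hypothetical, and eggdrop's actual karma script may well differ:

```python
from typing import Optional


def karma_decrement_target(message: str) -> Optional[str]:
    """Return the karma target if the message ends in '--', else None.

    Assumption: no whitespace trimming happens before the suffix check.
    """
    if not message.endswith("--"):
        # e.g. "-- -": the '--' is leading, not trailing, so nothing happens
        return None
    # Strip the trailing '--', then trim whatever is left of the message.
    target = message[:-2].strip()
    # A bare "--" leaves nothing to decrement.
    return target or None
```

Under these assumptions both "---" and "- --" reduce to the target '-', which matches the eggdrop replies above, while "-- -" yields no target.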
-
<arkiver> anyway
-
<arkiver> so something i came across
-
<arkiver> this one is apparently going away in June bedfordregiment.org.uk - clearly some simple site made by probably a single enthusiastic person
-
<arkiver> (i put it in AB)
-
<arkiver> they have a list of sources/links to similar sites at bedfordregiment.org.uk/links.html
-
<arkiver> near the bottom of the page they have
-
<arkiver> > A Northamptonshire family history site worth knowing, which carries a wide array of [...]
-
<arkiver> with a link to familyhistorynorthants.co.uk , which is a blog about gambling.
-
<arkiver> but looking the front page up in the wayback machine, one finds a beautiful simple little site rich with information... gone and taken over by some gambling/scam business recently
-
<arkiver> i wonder if we can find these sites easily somehow and get them all archived, it's sad to see how some of these end up. i bet many of these are maintained by old enthusiastic people, who may pass away in the coming years, after which their sites go down and tons of information gets lost
-
<thuban> arkiver: i was just thinking along the same lines (i love sites like this and run them trough ab whenever i come across them)
-
<arkiver> thuban: yeah!
-
<thuban> marginalia.nu is not a bad source for these; i think the index is available somewhere
-
<arkiver> we should get them all
-
<arkiver> perhaps we can also contact several of these sites and let them know that we can archive these types of sites
-
<arkiver> perhaps they could spread the word, and people behind these sites could submit lists of sites like these that they know about
-
<arkiver> there's maybe forums with enthusiasts around these kinds of subjects?
-
<arkiver> thuban: did you ever contact marginalia? maybe we should contact them?
-
<thuban> arkiver: downloads.marginalia.nu/exports ! i think 'domains' is what we would want?
-
<thuban> or 'urls', depending on how we handle it
-
<thuban> ^ *through
-
<arkiver> thuban: love it, yeah! i don't know much about marginalia, do they only collect these type of little home made sites?
-
<thuban> i don't know that much more, but there's a fair amount of writing about the project(s) and philosophy on the site
-
<thuban>
-
<arkiver> thuban: i love it
-
<arkiver> just looking at search.marginalia.nu
-
<arkiver> i need to get #Y up and running really
-
<arkiver> so we can get all these domains
-
<h2ibot> Manu edited Mailman/2 (+46, /* datacast.hu/mailman/listinfo saved */): wiki.archiveteam.org/?diff=52174&oldid=52173
-
<arkiver> bedfordregiment.org.uk is in the list!
-
<arkiver>
-
<arkiver> yeah we need to get this archived, amazing!
-
<c3manu> arkiver: are you looking for a crawled index of individual pages, or seed URLs?
-
<arkiver> c3manu: any
-
<c3manu>
-
<c3manu> this is where people can submit urls :)
-
<c3manu> ..or what people submitted
-
<arkiver> lovely!
-
<arkiver> yeah we should get that too
-
<c3manu> feel free to extend the wiki page ;)
-
<c3manu>
-
<c3manu> i also think webrings would be good for indices. in the indie corners of the internet those are getting popular again
-
<c3manu> just look at this: webring.xxiivv.com
-
<thuban> in theory yes; in practice it might be difficult to identify webring to/from links (since they can be formatted arbitrarily)
-
<thuban> ah, a central index :)
-
<c3manu> yeah, that's definitely not going to be fun ^^
-
<arkiver> perhaps it's more something for marginalia to find these sites through those ^ and list them online?
-
<arkiver> i will send marginalia.nu an email about this awesomeness
-
<arkiver> do we have a pipeline on AB that can handle a 180 GB file?
-
<arkiver> i want to throw downloads.marginalia.nu into it
-
<thuban> ^^ sounds good, i'm not sure whether the index is really curated or the search engine is doing the heavy lifting
-
<c3manu> i approve re awesomeness email :)
-
<arkiver> thuban: i guess they do some checks on the website front page to see if it is "old style" and include it only if it is
-
<kiska> I feel like this is going to be a problem server8.kiska.pw/uploads/0bc070b5366d602c/image.png
-
<kiska> Limited to 10 per day...
-
<arkiver> kiska: yeah it would be a very long term effort
-
<nyany> arkiver: depends on how it's done
-
<nyany> if it's ip based, sure we're metaphorically screwed
-
<nyany> but if it's SESSION based... (JAA's favorite)
-
<nyany> i.e. store session with 24h expiry as cookie object, thus enabling easy bypassing if one were to simply ignore cookies
-
<that_lurker> arkiver: Space wise the new pipelines like firepipe should fit 100+ gig files easily
-
<arkiver> that_lurker: thanks! i put it on firepipe-f
-
<AK> Interesting
-
<AK> Never heard of marginalia before now
-
<kiska> nyany: looks to be IP based
-
<nstrom|m> does look like www.opensubtitles.org supports ipv6 though
-
<ThreeHM> That 10/day limit only appears on their new "beta" site for me, I can still download as much as I want through the regular/older one
-
<ThreeHM> Yet another reason to archive it before they change that I guess
-
<JaffaCakes118> My friend has a file analysis site and would like all his current reports archived, I have a list of almost 400k links and was wondering if someone could start the archivebot archive of them please - transfer.archivete.am/ffsKw/neiki%20analytics%20links.txt
-
<eggdrop> inline (for browser viewing): transfer.archivete.am/inline/ffsKw/neiki%20analytics%20links.txt
-
<JaffaCakes118> site's currently running cloudflare with the "essentially off" setting enabled, but I can get them to disable cloudflare completely if needed, but don't think it will be needed
-
<thuban> JaffaCakes118: your friend's reports depend on javascript-initiated requests; archivebot will be useless
-
<thuban> i suppose it might work if we generated the corresponding api url for every page
-
<JaffaCakes118> thuban the links are able to be archived perfectly through the save now page
-
<JaffaCakes118> is it not the same for archivebot?
-
<thuban> save page now is not archivebot
-
<thuban> no
-
<katia> save page now runs a browser, archivebot doesn't
-
<JaffaCakes118> ah ok
-
<JaffaCakes118> is there any way we can still archive it? My friend of course will be willing to make changes
-
<katia> but seems we'd just need api.neiki.dev/analyze/reports?sha256=...
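
A rough sketch of how the API URL could be generated for each page, as thuban and katia suggest. It assumes that every report link in the list contains the file's SHA-256 as a 64-character hex string, which has not been checked against the actual list. The `https://` scheme and the names `API_BASE`, `SHA256_RE`, and `api_url_for` are likewise assumptions for illustration:

```python
import re
from typing import Optional
from urllib.parse import urlencode

# Endpoint as quoted in the chat; the https:// scheme is an assumption.
API_BASE = "https://api.neiki.dev/analyze/reports"

# Exactly 64 hex characters, not embedded in a longer hex run.
SHA256_RE = re.compile(r"(?<![0-9a-fA-F])[0-9a-fA-F]{64}(?![0-9a-fA-F])")


def api_url_for(report_link: str) -> Optional[str]:
    """Build the API URL for a report link, or None if no hash is found."""
    match = SHA256_RE.search(report_link)
    if match is None:
        return None
    return API_BASE + "?" + urlencode({"sha256": match.group(0)})
```

Each report link would then be queued alongside its API counterpart, with links that yield None set aside for manual review.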
-
<JaffaCakes118> yeah he said save the api instead
-
<JaffaCakes118> and it will return the data of it
-
<katia> well, alongside
-
<JaffaCakes118> I will get a list of links now for the api.neiki.dev
-
<thuban> no need
-
<JaffaCakes118> oh ok