-
h2ibot
KamafaDelgato edited Deathwatch (+133, /* 2023 */):
wiki.archiveteam.org/?diff=49939&oldid=49926
-
h2ibot
JustAnotherArchivist edited Deathwatch (+2, /* 2023 */ Fix Miraheze date):
wiki.archiveteam.org/?diff=49940&oldid=49939
-
h2ibot
JacksonChen666 edited URLTeam (-2, utilize example URL column, reorder words,…):
wiki.archiveteam.org/?diff=49941&oldid=49886
-
h2ibot
KamafaDelgato edited Current Projects (+75, /* Upcoming & proposed projects */):
wiki.archiveteam.org/?diff=49942&oldid=49933
-
pabs
mikolaj|m: sure. IIRC you don't have an account, so I'll add it later today
-
pabs
fireonlive: Google does career-driven product development, so basically all their stuff could die at any moment
-
pabs
BTW everyone, I welcome help doing Mailman2 AB archiving, some tips and the list of sites here:
wiki.archiveteam.org/index.php/Mailman2
-
nicolas17
Google Domains officially exited beta in 2022
-
nicolas17
I'm sure the person who successfully brought it past the "no longer beta" finishing line got their promotion
-
nicolas17
and then had no reason to give a shit about the product anymore
-
pabs
still. probably the one person who cared about it got fired or moved on or something :)
-
nicolas17
nobody gets promoted for maintenance
-
pabs
:(
-
pabs
the world needs more people paid to do FOSS maintenance
-
pabs
hmm, I should also make wiki pages for Bugzilla and MoinMoin instance archiving
-
nicolas17
pabs: you linked to mail.kde.org/pipermail but that doesn't exist as is
-
nicolas17
I guess it makes sense for consistency though
-
nicolas17
mail.kde.org/pipermail/kde-www archives are indeed under that URL, but you can't get a list of mailing lists from /pipermail/
-
nicolas17
also, I have server access to mail.kde.org, but I can't just get you the archives from there because there's private mailing lists :P
-
pabs
nicolas17: ack, that part of the list was just a raw extract of URLs from browser history, with the list names stripped off
-
pabs
and yes, not going to archive private stuff obviously :)
-
pabs
please add more stuff to Mailman2 if you have them
-
nicolas17
I did (ab)use my server access to grep for imgur URLs on all the forums and all the mailing list archives
-
fireonlive
;)
-
nicolas17
-
nicolas17
I sent several minor doc fixes via pull requests :(
-
nicolas17
I don't think there's anything for us to archive here since the repos will stay on github as "archived", just thought you'd find it interesting
-
nicolas17
JAA: what's the best way to archive a small number of URLs... which are ~12GB each? archivebot? I assume the url project is not suitable for those sizes
-
nicolas17
or perhaps qwarc and manual upload, since there's often two URLs for each file and we'd want the deduplication
-
JAA
nicolas17: For dedupe, yeah, qwarc or wget-at I'd say. AB has no deduping.
-
nicolas17
usually they release "developer beta" and "public beta" under different URLs but the files are identical
-
nicolas17
and they release both one day after the beta is available in other forms (ipsw and incremental update)
-
nicolas17
oddly this time they released the developer beta the same day (public beta *might* appear tomorrow), so at least for now there is only that one URL, not two
-
JAA
As mentioned, qwarc only dedupes within the same process, so grabbing that now and the other one whenever it becomes available would result in no deduplication.
-
JAA
wget-at might be able to do it, but I've never tried it.
-
nicolas17
wget-at can load CDXs from other runs yeah
-
JAA
-
nicolas17
ffs, is -at affected by that too?
-
JAA
Maybe, maybe not. Haven't tried it.
-
nicolas17
also, there's a user who has uploaded many of these files (some of them after they were already gone from Apple, so he probably has a local hoarded stash)
-
nicolas17
-
nicolas17
but how would either mine or his stuff appear on WBM?
-
fireonlive
maybe with apple it’s easier because everything’s signed? but someone would have to verify that with a well-known root i guess
-
fireonlive
if we’re not accepting the LTT-coined “trust me bro”
-
nicolas17
culpir's WARCs are also not deduplicated, there's even http and https copies x_x
-
drallamsia
meta.miraheze.org/wiki/Board/Policies/20230615-Statement this was posted today, TL;DR at the end of August Miraheze will shut down for good
-
aismallard
rip backup nick
-
h2ibot
Yts98 created Banciyuan (+3504, Created page with "{{Infobox project | title =…):
wiki.archiveteam.org/?title=Banciyuan
-
leo60228
miraheze's own dumps are mediawiki xml, right? so they wouldn't be able to be imported into the wayback machine?
-
pokechu22
Yeah, but could be imported into other wikis
-
pokechu22
(and would contain all revisions, though I'm not 100% sure if they'd include images)
-
» pabs wonders if an AB without history and off-site-links (chuck them in #//) is feasible for all of Miraheze
-
h2ibot
Yts98 edited Current Projects (+182, Add LINE BLOG to upcoming):
wiki.archiveteam.org/?diff=49946&oldid=49942
-
h2ibot
PaulWise edited Mailman2 (+0, lists.linuxcontainers.org AB jobs running):
wiki.archiveteam.org/?diff=49947&oldid=49937
-
h2ibot
Yts98 edited Deathwatch (-9, gimplearn.net was dead; adjust links and…):
wiki.archiveteam.org/?diff=49948&oldid=49940
-
h2ibot
PaulWise edited Mailman2 (+27, AB job for lists.man.lodz.pl for mikolaj|m running):
wiki.archiveteam.org/?diff=49949&oldid=49947
-
h2ibot
MasterX244 edited Reddit (+698, Newest developments):
wiki.archiveteam.org/?diff=49950&oldid=49932
-
pabs
mikolaj|m: wow, some mails from 1996 on lists.man.lodz.pl, didn't know mailman existed then :)
-
mikolaj|m
pabs: thanks!
-
JAA
pabs: Wikipedia says a first version of it existed in 1998. Version 1.0 is from 1999. So those were imported from something else.
-
yasomi
it looks like cohost is starting to have funding issues:
cohost.org/staff/post/1690393-h1-2023-financial-up
-
nicolas17
how long has it been running? lol
-
yasomi
February 2022
-
yasomi
their main problem is they don't want to run ads lol
-
yasomi
with only 15 months of usage and a long waitqueue at the beginning of its run, it's probably a smaller scale project
-
nicolas17
JAA: ka.cdn has new files Content/DWAPromos/en-US/SoD-061523_LightFurySwordstealer_NewToCampus.jpg Content/DWAPromos/en-US/SoD-061523_LightFurySwordstealer_NewToCampus_822x640.jpg
-
JAA
nicolas17: I'll rerun the bucket listing when the download finishes and will then download anything that's new.
-
nicolas17
yasomi: that article mentions how most ad-supported social networks aren't profitable either
-
nicolas17
so is it really "their main problem is they don't want to run ads"?
-
JAA
About a quarter of the Knowledge Adventure bucket is done. ETA is still fine.
-
arkiver
miraheze going down
-
JAA
transfer broke, will fix shortly.
-
nicolas17
I have 3 zip files to save, largest 170MB, do I ask in #archivebot?
-
pokechu22
Sure, archivebot can do that
-
pokechu22
probably fine to !ao each individually (for a larger list !ao < list for a file on transfer would be better, but that's overkill here and transfer is broken)
-
JAA
transfer is fixed.
-
nstrom|m
I'm still getting 502s from some locations. maybe the CDN needs to catch up
-
JAA
Yeah, something's still wrong, continuing to look into it.
-
nicolas17
is transfer back up?
-
nicolas17
nope, connection refused... will try later
-
JAA
Might be fine now, we'll see.
-
nicolas17
yes it works now
-
JAA
Yeah, the question is whether it'll break again. :-P
-
fireonlive
what broke :p
-
JAA
The front fell off.
-
that_lurker
-
that_lurker
This might be better :P
i.imgur.com/AA3K8Bd.gif
-
JAA
It wasn't me who kicked it though. :-P
-
that_lurker
damn who did it?
-
fireonlive
haha
-
h2ibot
Yts98 edited LINE BLOG (+682, Describe layout and 301 redirects):
wiki.archiveteam.org/?diff=49952&oldid=49920