-
mgrandi
Even if it has ads, the original video is still there for archival purposes so it's not the end of the world at that end
-
immibis
i am banned from the-archive and distributed youtube archive, or else i might see what they have going for "important channel lists"
-
immibis
the petty politics is not good for archiving without duplication
-
arkiver
immibis: what do you mean?
-
immibis
if the channels can't all be archived, having a list of channels that should be archived but currently aren't is still useful for when someone can do them
-
arkiver
immibis: what channels are these and do you have descriptions of them? we also have #down-the-tube , but there are _very_ strict rules for that on the wiki
-
immibis
arkiver: that's what I meant - I'm not aware of any watchlist of channels worth archiving in advance of actually archiving them
-
immibis
my own system has a prioritized list of channels, and works through them at its own rate, with *months* of backlog
-
audrooku|m
> i am banned from the-archive and distributed youtube archive, or else i might see what they have going for "important channel lists"
-
audrooku|m
Nothing
-
immibis
i know that DYA has a spreadsheet covering stuff that is already archived
-
immibis
i'm pretty sure neither has a "want to have" list
-
immibis
probably because they just archive it, instead of putting it on a list
-
audrooku|m
Yeah
-
Reece
-
fireonlive
lol
-
audrooku|m
"my work here is done"
-
that_lurker
at least it's the correctish channel :P
-
fireonlive
well, they did try in #archiveteam after
-
fireonlive
xP
-
that_lurker
oh yeah :P
-
arkiver
JAA: who runs archivebot pipelines?
-
arkiver
we should really archive some hamas (and likely related) sites - but this may have negative implications for whatever IP this is run on
-
JAA
arkiver: Yours truly.
-
arkiver
JAA: ah :)
-
JAA
Two machines are my own, the rest are rented by others, and I run everything from there on.
-
JAA
s/rented // (not all are rented servers, actually)
-
arkiver
so basically looking for someone who might want to run a temporary archivebot which we can use to archive hamas and related content
-
mgrandi
Is there an existing project that handles a phpbb forum? Xentax is closing at the end of the year and has a lot of attachments that aren't hosted anywhere else
-
thuban
mgrandi: not specifically, plus (while there's been an ab job for the forums) i'm given to understand that attachments are login-walled
-
mgrandi
Yeah, that makes it hard for AB right
-
mgrandi
But I was seeing if maybe I could look at the seesaw project code if one exists for a phpbb forum
-
thuban
sort of--archivebot is technically capable of doing logged-in crawls, but the interface is designed not to allow them to be configured, because as a matter of policy we don't send them to the wbm
-
thuban
grab-site is basically the same internals and would work well for an 'unofficial' crawl if given login cookies (just use the forums igset)
-
mgrandi
I know past seesaw scrapes do login crawls, dunno if policy has changed
-
thuban
the only one i'm aware of was yahoo groups, and that was agreed on as a special case
-
mgrandi
I know a few art site scrapes were cause you needed to be logged in to see nsfw art
-
thuban
til, i must not have been around for those
-
thuban
anyway, a grab-site run would be a good start--could dump it on ia as an item if nothing else
-
pokechu22
I think there's also JAA's qwarc - it *probably* could do logged in stuff if needed
-
mgrandi
-
mgrandi
Was one of them
-
mgrandi
I can see if I can write a script to grab urls, or see if one exists
-
thuban
2015, wow
-
JAA
Yeah, we did a couple projects with accounts, but the most recent one was over 5 years ago I believe.
-
JAA
And generally, such data won't go into the WBM these days.
-
thuban
JAA: yahoo groups was 2019-2020. but that one was... special
-
JAA
Well, yeah, but we didn't create WARCs with accounts there, I believe.
-
JAA
Maybe I'm misremembering, but I think it was only for GMD exports.
-
arkiver
i believe so yes
-
arkiver
(but same disclaimer here)
-
thuban
-
JAA
> from warcio import WARCWriter
-
JAA
*twitch*
-
thuban
:(
-
JAA
'Special' indeed...
-
imer
oof. >"1/3 is in australia, 1/3 with me, and 1/3 on IA". As of September 2022 neither of the first two parts have been uploaded.
-
thuban
just the flowchart makes me wanna cry
-
arkiver
oh
-
arkiver
was this that alternative project?
-
JAA
Yeah
-
thuban
(the state of affairs depicted by the flowchart, not the flowchart itself, it's a very nice flowchart, thank you Doranwen)
-
arkiver
sigh
-
arkiver
where did those WARCs end up?
-
arkiver
that was annoying from what i remember
-
arkiver
ah the data was never uploaded?
-
arkiver
it should be, but not in the wayback machine due to warcio and login
-
thuban
some of it's been uploaded (but not in wbm afaik), some of it's floating around in limbo
-
thuban
ask marked and/or lennier1 (lennier2?)
-
arkiver
they'll upload when they upload, there's been plenty of time
-
thuban
anyway
-
thuban
mgrandi: writing your own script(s) seems like overkill; something wrong with grab-site?
-
arkiver
mgrandi: are the links you get to through login actually then downloadable without login?
-
thuban
oh, good question
-
mgrandi
No I think it needs a cookie but I can find out later
-
mgrandi
I dunno how grab site works or if I can run that heh
-
thuban
it's basically local archivebot, dashboard and all
-
thuban
-
h2ibot
Kevidryon2 created "osu!" (+2283, Version 2):
wiki.archiveteam.org/?title=%22osu%21%22
-
h2ibot
JustAnotherArchivist moved "osu!" to Osu!:
wiki.archiveteam.org/?title=Osu%21
-
lennier2
I never did figure out for sure what that "1/3 in Australia" was referring to. Did datechnoman run any targets? I was meaning to ask them.
-
h2ibot
JustAnotherArchivist edited Osu! (-59, Cleanup):
wiki.archiveteam.org/?diff=50978&oldid=50977
-
mgrandi
-
mgrandi
Looks like you need to be logged in, maybe it does a redirect to the raw file that might not check
-
thuban
oh dammit, they moved the date up
-
h2ibot
Switchnode edited Deathwatch (-1, /* 2023 */ update xentax with new deadline):
wiki.archiveteam.org/?diff=50979&oldid=50973
-
thuban
(dec 1 now)
-
fireonlive
:/
-
Doranwen
thuban: I didn't create the flowchart, lol - I did contribute to some of the *data* being acquired, and to sorting it all out (still working on that!) - but I'd have to search through logs to see who created the flowchart
-
audrooku|m
I appreciate your dedication <3
-
thuban
oh, my mistake--looks like it was OrIdow6. i hereby redirect my thanks
-
thuban
ditto, though