-
Suika
Hm, more or less. There are patreon_data, patreon_inline, shared_data, and fantia_data.
-
Suika
patreon_data is basically `[Patreon ID]/[Post ID]/[data or another folder]`
-
Suika
patreon_inline and shared_data are loose data.
-
Suika
I think the patreon_inline uses [Post ID] for its folder naming, let me check
-
Suika
And shared_data is user uploaded content.
-
Suika
Everything can be grouped by simply processing the HTML files to extract the links and then moving the data into its own containers.
-
Suika
If you are interested, here are the html/json files.
mega.nz/folder/Ik5yxIDR#Ah-5yJwBNd2ae_uBuCJuUA
-
Suika
Any links to local content found in the HTML files will correspond to the location on disk.
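A minimal sketch of the grouping step described above, assuming the HTML files reference local content through `href`/`src` attributes with no scheme or host. The `grouped` container root and the `plan_moves` helper are hypothetical names used for illustration, not part of the actual dump.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


class LocalLinkParser(HTMLParser):
    """Collect href/src values that point at local (scheme-less) paths."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                parsed = urlparse(value)
                # No scheme and no host means the link refers to a local path.
                if not parsed.scheme and not parsed.netloc and parsed.path:
                    self.links.append(unquote(parsed.path))


def local_links(html_text):
    """Return every local link found in one HTML document, in order."""
    parser = LocalLinkParser()
    parser.feed(html_text)
    return parser.links


def plan_moves(post_id, html_text, container_root="grouped"):
    """Map each local file to a destination under container_root/post_id/.

    The container layout is an assumption; adjust it to the real structure.
    """
    moves = {}
    for link in local_links(html_text):
        src = PurePosixPath(link.lstrip("/"))
        moves[str(src)] = str(PurePosixPath(container_root) / post_id / src.name)
    return moves
```

This only builds a plan of source-to-destination paths, so the mapping can be reviewed before any files are actually moved.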
-
mgrandi
I'll take a look
-
Suika
Mind you, json was always a broken mess.
-
mgrandi
Is that from an API on the site or something?
-
Suika
JSON was served by the site and used ages ago for something internal, but never removed. It has broken links. The rest of the data is... fine, I think.
-
OrIdow6
So I'll agree that, if it's found permissible, the best way to do it would be to split it up one user per item (unless there are 100 million users or something), then split the file such that the original is the union of all the per-user items (and perhaps some miscellaneous data)
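A rough sketch of that split, assuming paths follow the `patreon_data/[Patreon ID]/[Post ID]/...` layout mentioned earlier. The function name and the miscellaneous bucket are hypothetical; the point is only that the per-user buckets plus the leftovers add back up to the original set.

```python
from collections import defaultdict


def split_by_user(paths, root="patreon_data"):
    """Partition relative paths into one bucket per Patreon ID plus a misc bucket."""
    per_user = defaultdict(list)
    misc = []
    for path in paths:
        parts = path.strip("/").split("/")
        # Expect root/<Patreon ID>/<Post ID>/...; anything else is miscellaneous.
        if len(parts) >= 3 and parts[0] == root:
            per_user[parts[1]].append(path)
        else:
            misc.append(path)
    return dict(per_user), misc
```

Every input path lands in exactly one bucket, so the union of all buckets reproduces the original listing.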
-
OrIdow6
But that's dependent on that condition, and (apparently unlike some of the other people here), I don't seem to have too good of an understanding of what this data is
-
Suika
Basically you pay on patreon for whatever you use, then you enter your session id into the site and it scrapes the content you have access to.
-
OrIdow6
But I have other things to worry about, anyhow (#chromeweblore now has ops and is open for business)
-
mgrandi
Yeah, so it basically is like having one person pay for netflix, giving this site your session cookie and then the site scrapes and hosts for free all of netflix's content
-
JAA
SketchTheCow: Yeah, we've been pushing a lot of stuff since FalconK's recent return and the addition of his pipeline. I'll try to divert some of the data directly to S3.
-
SketchTheCow
It's too much. It just is.
-
SketchTheCow
At least make him space it out
-
SketchTheCow
FOS does a lot of things. It should really stop being a pinchpoint
-
JAA
Aye
-
kyndigs
I archived the entire iTunesU as I read rumors it was being closed, not sure of the best way to preserve it online now, it needs a bit of organizing first, I grabbed as much metadata as possible for each institution and course, including artwork etc, any suggestions?
-
kyndigs
I was thinking archive.org, per institution/course uploads
-
JAA
kyndigs: Just to clarify, it's not a rumour, iTunes U is indeed shutting down at the end of 2021:
support.apple.com/guide/itunesu/welcome/web
-
kyndigs
looks like i got it just in time then
-
kyndigs
:|
-
JAA
2021
-
JAA
In a bit over a year.
-
kyndigs
is anyone else working on archiving it?
-
kyndigs
it was a bit brute force as the APIs are not great
-
kyndigs
The only thing is the organisation is not as good as it is in the iTunesU app, as you essentially need to enroll in the course, then you can spoof the cookie to download the course structure data, but for 1000s of courses it was not feasible for me to do alone.
-
kyndigs
so only have the itunesu store data instead
-
kyndigs
I think I should share all the meta information with download links etc included and get suggestions on the best way to organize, or someone can make a better effort at archiving it than I did.
-
Suika
Either way, I'm off to sleep. I'm on a bouncer, so poke me if you decide something. I'll read it in the coming days.
-
arkiver
EggplantN: anything on fixing the discord thing?
-
tech234a
List of all iTunes versions since 7.3.0 which added support for the original iPhone:
theiphonewiki.com/wiki/ITunes. Not sure if iTunes 12.6.5.3 would be helpful for archiving iTunes U courses (it was the last version before they removed a bunch of features, perhaps iTunes U was one of the removed features).
-
kyndigs
I already found how to archive it, and have done so, I have all the meta with the direct links to download the media, if someone else wants to archive it with better organization, it would be a matter of reading the meta and links and using that data to organize a bit better.
-
kyndigs
Institution like this
-
kyndigs
Then course like this
-
OrIdow6
kyndigs: By doing this a year in advance, you are about 11 months ahead of when ArchiveTeam will do it
-
kyndigs
hehe
-
kyndigs
headstart so no last minute panic :D
-
kiska
We are the masters of doing last minute projects :P
-
OrIdow6
Anyhow, yeah, I'd say organize by course
-
kyndigs
But i will point out there is some 3rd party hosted content on itunesu
-
OrIdow6
This is actually contrary to what ArchiveTeam did with the UC Berkeley videos
-
kyndigs
which is lost, so my worry was waiting longer might lose more
-
OrIdow6
Where there was 1 video per item, and an IA collection for each course
-
kyndigs
Was that manually organized do you know?
-
OrIdow6
What do you mean, manually?
-
kyndigs
As to get the course structure you need to subscribe to it and use the auth cookie to grab the structure
-
kyndigs
Allowing you to identify "Lecture 1" etc
-
OrIdow6
However, there was presumably some manual effort to create/coordinate the creation of the collections on IA
-
OrIdow6
As those can't be made by normal users
-
OrIdow6
(Sourced from Youtube playlists, apparently)
-
OrIdow6
Oh, rereading, I see what your problem is
-
kyndigs
yes those are the playlist, those are easy to get i was talking about this
-
kyndigs
you see how the app shows it organized, it's purely metadata on how to display it but would be quite useful in organizing
-
kyndigs
we can get that info, but it's a bit manual and tedious
-
kyndigs
and you can pass the authentication cookie in the request
-
kyndigs
but you need to subscribe to the course first to get that cookie which for 1000s of courses is a pain!
-
OrIdow6
I'm not familiar with Mac & all that too much, so if there is some way to automate it, I wouldn't know it
-
OrIdow6
And if there's no easy solution, there's no easy solution
-
kyndigs
yeh i even contemplated jailbreaking the ipad and writing something to do it but no idea how to do iOS stuff either!
-
kyndigs
worst comes to worst i'll subscribe to all the courses then get the data, main thing was the materials first which we got.
-
OrIdow6
Yeah, it sounds like the recordings are the most important thing
-
OrIdow6
If nothing else, you can piece them together after
-
OrIdow6
"You" not being you specifically
-
kyndigs
yeh what i'll do now is upload all the meta and people can have a look also
-
kyndigs
the upload of materials will take some time for sure
-
kyndigs
kind of regretted starting it half way through :D
-
OrIdow6
So as it stands now, you just have a soup of videos, with nothing to indicate automatically that two are from the same course or school?
-
kyndigs
i can organize files to institution and course
-
OrIdow6
Oh, good
-
kyndigs
but not deeper
-
kyndigs
like lecture 1, assignment, quiz
-
OrIdow6
Well, it means that if or when you have all the metadata, you won't need to reupload everything into a different structure
-
kyndigs
exactly with the meta it can be used to organize how one likes
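A small sketch of organizing by institution and course from metadata, as discussed above. The record fields (`institution`, `course`, `filename`) are hypothetical stand-ins; the real metadata format is not shown in this log.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath


def safe_name(text):
    """Replace characters that are awkward in file paths with underscores."""
    cleaned = re.sub(r"[^\w.\- ]+", "_", text).strip(" ._")
    return cleaned or "unknown"


def group_by_course(records):
    """Group media file names under institution/course folders."""
    tree = defaultdict(list)
    for record in records:
        folder = PurePosixPath(safe_name(record["institution"])) / safe_name(record["course"])
        tree[str(folder)].append(record["filename"])
    return dict(tree)
```

Deeper levels such as "Lecture 1" could be added as another path component once the app-specific course structure is available.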
-
kyndigs
still a lot of meta atm but just not the app specific structure of the course which is much deeper and organized
-
kyndigs
i contacted IA also for some guidance
-
kyndigs
as i dont want to upload just a big dump of files :D
-
OrIdow6
If you upload more than 50? (I think that's the threshold) items of the same theme, you can email them and they'll put them into a collection for you
-
OrIdow6
But I suppose you've already done that, in a way, anyhow
-
OrIdow6
But again, one course per item is the way I would do it, but evidently it's been done differently in the past
-
kyndigs
yes makes most sense but will see what they say, "send us hdd we will do" ideally :D
-
OrIdow6
They've done it in the past, though I don't know how common it is
-
OrIdow6
But good luck with it
-
mgrandi
Any work you do should be added to the wiki so even if you aren't able to finish it, others can take it up. If it's not already added to deathwatch, you should do that too with a link to the wiki page
-
JAA
It's on Deathwatch.
-
kyndigs
will do
-
FalconK
SketchTheCow: sorry for filling your disk lol
-
FalconK
I have enough disk to hang onto WARCs for a while if needs must; I could also put them directly into IAS3 again if that's desirable
-
FalconK
... I'm glad I'm still here. I just had the absolute worst flight round trip from seattle to portland.
-
» FalconK breathes
-
fionera
Fusl: how far are you from dusseldorf?
-
fionera
I feel offended
-
fionera
even tho there is no reason. I just need smth to be angery
-
Fusl_
fionera?
-
fionera
Just ignore me im bored
-
Fusl_
uhhh, okayy?
-
OrIdow6
arkiver: Anything on the .eu domains?
-
SketchTheCow
It's fine, just pace it.
-
SketchTheCow
Do a... I don't know, 1 minute sleep between WARCs
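One way the pacing could look, as a sketch only. The `upload` callable is a hypothetical placeholder for whatever the pipeline actually uses to push a WARC, and the 60-second default just mirrors the "1 minute" suggestion.

```python
import time


def upload_paced(warc_paths, upload, pause_seconds=60, sleep=time.sleep):
    """Upload WARCs one at a time, pausing between consecutive uploads."""
    uploaded = []
    for index, path in enumerate(warc_paths):
        if index > 0:
            # Pause only between uploads, not before the first one.
            sleep(pause_seconds)
        upload(path)
        uploaded.append(path)
    return uploaded
```

Passing `sleep` as a parameter keeps the loop easy to dry-run without actually waiting.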
-
arkiver
OrIdow6: not yet
-
SketchTheCow
Hey, fucksticks.
-
EggplantN
<Northern#2000> Ah shit Jason is losing it in lockdown
-
SketchTheCow
So, look. The new IRC is all nice and secure and all that but in a shocking twist, the guy who refuses to run https on his website keeps getting shit for his IRC client not running SASL or whatever
-
SketchTheCow
Now, this is fine. The only issue is Archivebot gives me guff when I make requests.
-
SketchTheCow
That's fine.
-
SketchTheCow
As well.
-
SketchTheCow
What I'd like to know is who is in that channel all the time, or here, who can go over to archivebot and do a save when I ask.
-
SketchTheCow
I ask, if you look back, every 3-10 days. Mostly because I noticed something or I was asked.
-
SketchTheCow
Speak up
-
SketchTheCow
Or I can make a phrase here that does it.
-
SketchTheCow
Otherwise, I'm just fine with things. I'm busy elsewhere with shit none of you can do because *makes jazz hands* and you wouldn't understand
-
SketchTheCow
I was trapped near the inner circle of thought.
-
JAA
Ryz and/or I are usually around, so if you ping us in #archivebot, that'd probably work. Or just mention it here (so it doesn't get buried in the bot traffic) and whoever sees it first can throw it in.
-
Ryz
Loot!
-
SketchTheCow
OK, that works.
-
SketchTheCow
I'm just not on IRC much and my communications are mostly elsewhere.
-
SketchTheCow
I stay on here to help with major project issues or if someone needs a completely redundant old guy to back up their ridiculous petty shitfight
-
SketchTheCow
And that is fine. But Archivebot wants specific things and I'm not interested.
-
SketchTheCow
SPEAKING of raining on everyone's parade, I've been informed FOS has to go down for a hardware upgrade on the 24th
-
JAA
Oof
-
SketchTheCow
Hopefully just a few hours, we'll see
-
SketchTheCow
I told you! FOS as a pinchpoint is bad!
-
JAA
I know!
-
SketchTheCow
It's not doing anything special, either
-
SketchTheCow
I mean, in theory, I can have teamarchive1 do some of this work
-
SketchTheCow
I've never, actually, you know, actually evaluated the pipeline I use
-
SketchTheCow
I bet I could do that experiment.
-
SketchTheCow
Maybe it'll work better. It has less space than FOS
-
OrIdow6
(From #archiveteam)
-
OrIdow6
This is the sort of non-game Flash site that I think is going to be somewhat overlooked
-
SketchTheCow
I just checked. teamarchive1 has about 3 terabytes.
-
SketchTheCow
I would be able to get an archivebot pipeline to it, but that's not a massive amount of headroom
-
JAA
Well, we're more limited by the upload to IA than the pipelines at the moment.
-
JAA
Or do you mean for uploads?
-
SketchTheCow
So two possibilities.
-
SketchTheCow
TEAMARCHIVE1 just got mega upgraded, so maybe the network performs better.
-
SketchTheCow
I am on Off The Hook tonight, but I'll see if I can mirror the archivebot pipeline on teamarchive1, and then we can aim some archivebot output at it.
-
SketchTheCow
If this works, we can do a swap during crushtimes.
-
SketchTheCow
Like when FOS has to go down
-
JAA
Sounds good.
-
SketchTheCow
Maybe later tonight we can try it.
-
SketchTheCow
Actually, let's try it now. Is it hard to swap over?
-
kiska
SketchTheCow: Could we get FalconK added to the archivebot collection for uploads?
-
JAA
Every pipeline uploader needs to be swapped individually, so kind of but not very.
-
SketchTheCow
Well, just do one pipeline, we'll see how teamarchive1 keeps up
-
kiska
JAA: Could do the instagram pipeline
-
SketchTheCow
We'll know pretty quickly if this is a good idea.
-
SketchTheCow
Whether it's a replacement or a spare
-
JAA
kiska: Nah, I'd rather not wait a half century before the first file is uploaded. :-P
-
kiska
:P
-
kiska
Or kiskaJD
-
kiska
kiskaJDC*
-
SketchTheCow
Downtime for FOS moved to December 1st.
-
SketchTheCow
And waiting on network guy to make the backup pipeline work