-
FalconK
SketchTheCow: I've added a 60 second delay after each WARC upload from ananiel for the moment, let me know if that's not enough.
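
A minimal sketch of what a fixed pause after each upload could look like. The real uploader on ananiel is not shown in this log, so `upload` here is a hypothetical callable supplied by the caller, and only the 60 second default comes from the message above.

```python
import time
from typing import Callable, Iterable


def upload_with_delay(
    warcs: Iterable[str],
    upload: Callable[[str], None],
    delay_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Upload each WARC and pause after every upload. Returns the number sent."""
    count = 0
    for path in warcs:
        upload(path)          # hypothetical uploader, not the real pipeline code
        sleep(delay_seconds)  # fixed pause after each upload
        count += 1
    return count
```

Passing `sleep` as a parameter keeps the pause easy to shorten or stub out when trying this without waiting a minute per file.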
-
OrIdow6
JAA: What channel was this message you saw on EFNet? Don't see it in my logs
-
kiska
#archiveteam :D
-
kiska
I do
-
OrIdow6
Oh, looks like I was disconnected for 2 days
-
OrIdow6
Whoops
-
kiska
-
OrIdow6
Thanks
-
mgrandi
Is there a way to see if archivebot or similar has backed up a twitter account before I request it?
-
JAA
Not easily (yet).
-
kiska
Just queue it™
-
JAA
Or grep -Fi your logs.
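
For anyone whose client has no full-history search, a rough Python equivalent of `grep -Fi` (fixed string, case-insensitive) over a directory of exported logs might look like this. The `*.log` file pattern and the directory layout are assumptions; adjust them to however your client stores logs.

```python
from pathlib import Path
from typing import Iterator, Tuple


def grep_fixed_ci(needle: str, log_dir: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (file name, line number, line) for lines containing needle, ignoring case."""
    wanted = needle.lower()
    # Assumption: logs are plain-text files ending in .log somewhere under log_dir
    for path in sorted(Path(log_dir).rglob("*.log")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                # Plain substring match, no regex, like grep -F
                if wanted in line.lower():
                    yield path.name, number, line.rstrip("\n")
```

Something like `list(grep_fixed_ci("twitter.com/Manga4Congress", "logs/"))` would then list every line mentioning the account, regardless of capitalisation.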
-
mgrandi
I'm assuming not everyone has access to queue stuff?
-
mgrandi
twitter.com/Manga4Congress is the account I was looking at saving
-
jodizzle
mgrandi: According to my logs, that was saved just recently
-
mgrandi
Ok thanks
-
mgrandi
Hence why I wanted to check rather than spam
-
mgrandi
Do we post in #archivebot to request? Or here
-
jodizzle
Probably here is better, since #archivebot is cluttered with machine messages
-
jodizzle
But it looks like you're in #archivebot. Can you search your logs yourself?
-
mgrandi
i can, although irccloud doesn't have a search all history feature yet sadly >.>
-
mgrandi
also, i asked this question before, i'm uploading a twitch stream and it has the file extension .ts, should i rename it to be .mp4? i think it is technically .mp4 video
-
mgrandi
"Codec: H264 - MPEG-4 AVC (part 10) (h264)" for video, "Codec: ADTS" for audio
-
JAA
.ts is normally MPEG-TS, not MP4.
-
mgrandi
hmm, what i posted is what vlc says, i'll leave it as a .ts file extension then
-
JAA
You posted the video and audio codecs, not the container type.
-
JAA
H.264 can be stored in MP4, MTS, M2TS, MKV, etc.
-
mgrandi
i just used ffprobe and it says `mpegts`, so you were right
-
mgrandi
makes sense since twitch stores VODs in like 30 second clips and then this program concatenates them together and other traditional formats probably wouldn't like that
-
mgrandi
/s/formats/containers
-
JAA
Yep
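
To illustrate the container-versus-codec point: the container can usually be guessed from the first bytes of the file, independent of what codecs are inside. The sketch below is a simplified heuristic, not a replacement for ffprobe. It relies on two format facts: MPEG-TS packets are 188 bytes long and each starts with the sync byte 0x47, and MP4 files normally open with an `ftyp` box. The function name and the number of packets checked are arbitrary choices for this example.

```python
TS_PACKET_SIZE = 188  # MPEG-TS packets are 188 bytes long
TS_SYNC_BYTE = 0x47   # and each packet starts with this sync byte


def sniff_container(data: bytes, packets_to_check: int = 5) -> str:
    """Guess the container from leading bytes: 'mpegts', 'mp4', or 'unknown'."""
    needed = TS_PACKET_SIZE * packets_to_check
    # A sync byte at every 188-byte boundary strongly suggests MPEG-TS
    if len(data) >= needed and all(
        data[i] == TS_SYNC_BYTE for i in range(0, needed, TS_PACKET_SIZE)
    ):
        return "mpegts"
    # MP4 (ISO base media) normally begins with a box whose type is 'ftyp'
    if len(data) >= 8 and data[4:8] == b"ftyp":
        return "mp4"
    return "unknown"
```

Reading the first `188 * 5` bytes of the file in binary mode and passing them in is enough for this check.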
-
JAA
It would probably be a good idea to archive InfoWars. tv.infowars.com lists ~11k entries with direct links to video and audio files. We had an AB job for this site (under another domain but serving the same content, tv.us-west-1c.infowars.com) which retrieved 1.5k URLs from pravda.infowars.com, which serves the video/audio files, for a total of a bit over 2 TB, and almost 33k such
-
JAA
URLs were ignored later. So as a rough estimate, this is 40-50 TB. However, there are duplicates in here, namely URLs with and without some timestamp (cache buster?) parameter and higher/lower resolution. Still probably very roughly 15-20 TB in total. Unfortunately, the site is fairly low on metadata; for example, the description is 'to be added...' for many episodes from the past year or so.
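
The rough estimate can be reproduced with a linear extrapolation: about 2 TB for 1.5k retrieved URLs, scaled up to include the roughly 33k ignored ones, gives about 46 TB, which sits inside the 40-50 TB range. The second function sketches one way to collapse the cache-buster duplicates. The parameter name `t` is purely hypothetical, since the log does not say what the timestamp parameter is called, and this does nothing about the higher/lower resolution duplicates.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def extrapolate_tb(sample_urls: int, sample_tb: float, ignored_urls: int) -> float:
    """Scale the sampled size linearly to cover sampled plus ignored URLs."""
    return sample_tb * (sample_urls + ignored_urls) / sample_urls


def strip_param(url: str, param: str = "t") -> str:
    """Drop one query parameter so cache-busted variants of a URL compare equal.

    The default name "t" is a placeholder, not the site's real parameter.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

`extrapolate_tb(1_500, 2.0, 33_000)` uses the figures from the messages above; putting every URL through `strip_param` and collecting the results in a set would remove the timestamp-only duplicates before re-estimating.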
-
OrIdow6
You're saying that the AB job didn't get them because of ignores?
-
JAA
Well, because of the size, we added the ignore.
-
OrIdow6
Oh
-
JAA
Not going to grab 40-50 TB through AB, *especially* not on the pipeline it was running on, which has had upload issues to everywhere for the past few weeks.
-
JAA
Also, this should probably be in IA items, not in random WARCs.
-
JAA
Perhaps there's better metadata somewhere else.
-
OrIdow6
What's the risk? Is this something imminent?
-
JAA
Not that I'm aware of.
-
JAA
More of a 'we should probably keep a copy of this somewhere' thing.
-
mgrandi
infowars is 50 tb? what the hecko
-
OrIdow6
Well, looks simple enough, notwithstanding the problem of getting more metadata
-
OrIdow6
Hours and hours of hd video
-
mgrandi
the guy has been banned from so many platforms i dont know what he is on anymore
-
mgrandi
that might have more metadata
-
mgrandi
are you not able to have folders when you upload something to archive.org?
-
mgrandi
guess you can't, whoopsie
-
JAA
You can.
-
mgrandi
i'm using the ia command line with --spreadsheet, let me see what column i need
-
JAA
Never used --spreadsheet, but basically what you do is set the --remote-name to a relative path. The directories get created automatically as you upload something into them.
-
mgrandi
not sure if i should attempt to fix this or if it can be deleted and i start again, i apparently also didn't specify the collection right and it's in the 'opensource' collection rather than something more appropriate
-
JAA
You can also `ia upload` an entire directory, which will then end up as a directory in the item. (Internally, it really just uploads each file in the directory with the corresponding path, and the directory itself is recreated.)
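
As a sketch of the "each file gets uploaded with its corresponding path" idea, the helper below walks a local directory and computes a remote name for every file, keeping the relative path and prefixing the directory's own name so it reappears inside the item. The function name is made up for this example. Feeding the resulting mapping to an uploader, for instance as a remote-name-to-local-path dict for the `internetarchive` Python library, is an assumption worth checking against its documentation.

```python
from pathlib import Path
from typing import Dict


def remote_names(local_dir: str) -> Dict[str, str]:
    """Map remote name (relative POSIX path) to local path for each file under local_dir."""
    root = Path(local_dir)
    mapping: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Keep the directory's own name as the top-level folder in the item
            remote = (Path(root.name) / path.relative_to(root)).as_posix()
            mapping[remote] = str(path)
    return mapping
```

Two files with the same base name in different folders end up with different remote names here, which avoids the accidental overwrite described a few messages further down.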
-
mgrandi
hmm, the documentation sorta implies that it's only really available with the --spreadsheet option which is what i tried to use
-
mgrandi
archive.org/details/aoc_among_us_twitch_stream_2020_10_20 is what i created, i guess if i could edit it i would just get rid of the 'history' folder and then just have 1 copy of the chat log (the 776770697.* files), and then edit the hashes.json/readme.md to fix the two copies of the chat log not being there
-
mgrandi
first upload, i'm bad at this :)
-
JAA
Ah, the history directory, always fun.
-
JAA
`ia delete identifier history/files/foobar -H x-archive-keep-old-version:0`
-
JAA
List each of the files you want to delete there (can be multiple in the same command).
-
JAA
(When you upload a file and there is already a file with the same name/path, the existing file gets moved to history/files to prevent accidental overwrites. This happens even when you delete a file. The special header disables that.)
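
When several history files need to go, the command above can be assembled programmatically. This sketch only builds the argument list, mirroring the `ia delete` invocation quoted above. The function name is made up for the example, and nothing is executed unless the result is handed to something like `subprocess.run`.

```python
import shlex
from typing import List, Sequence

# Header from the command above; it stops the old copy being kept in history/files
KEEP_OLD_VERSION_OFF = "x-archive-keep-old-version:0"


def build_delete_command(identifier: str, files: Sequence[str]) -> List[str]:
    """Build an `ia delete` argument list for one item and one or more files."""
    if not files:
        raise ValueError("list at least one file to delete")
    return ["ia", "delete", identifier, *files, "-H", KEEP_OLD_VERSION_OFF]
```

`shlex.join(build_delete_command("identifier", ["history/files/foobar"]))` gives back the same one-liner as above, which is handy for a dry run before actually deleting anything.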
-
mgrandi
yeah, i had two files with the same name but in different folders, but i guess didn't specify the remote name so it just overwrote it
-
mgrandi
i tried deleting them but i think it's freaking out cause it's still deriving it, it said no such directory
-
JAA
Right, that was actually an issue I reported a while ago, but Jake decided not to read properly:
jjjake/internetarchive #266
-
mgrandi
whoops
-
JAA
I assume the same thing applies on --spreadsheet.
-
mgrandi
yeah
-
mgrandi
i'll add this to my list of things to fix, as well as the documentation that leaves out that you can just upload a folder directly
-
mgrandi
unrelated: now that youtube-dl is back up, did we download all the issues and all that?
-
SketchTheCow
JAA: It worked
-
SketchTheCow
Right now there's a small set sitting in the inbox due to the way the system worked. I am probably going to force them over into the last item JUST to keep it so something doesn't suffer in limbo.
-
SketchTheCow
But it worked.
-
SketchTheCow
So feel free to pop something more aggressive over there so we see how it holds up
-
SketchTheCow
Forcing the waiting 40gb just so we're clean for the next batch
-
JAA
SketchTheCow: Cool, thanks! Will switch the fastest pipeline to it for now. There's no /pipeline.html or Telegraf on it though, so can't really monitor for out of disk space until it goes boom.
-
SketchTheCow
We can approach getting telegraf on it.
-
SketchTheCow
Send instructions, I'll try
-
JAA
I have no idea how it works.
-
JAA
Fusl_: ^
-
Fusl
SketchTheCow: what distro and version is it?
-
SketchTheCow
Oh, was it fusl
-
SketchTheCow
You all look the exact same
-
Fusl
hi
-
SketchTheCow
Ubuntu 14.04.6 LTS (Trusty Tahr)
-
Fusl
ancient technology
-
Fusl
:P
-
Fusl
let me see what i can dig out
-
SketchTheCow
Let me see what they said to me
-
SketchTheCow
Maybe it's not that
-
SketchTheCow
focal/20.04
-
SketchTheCow
It WAS Trusty, now Focal
-
Fusl
nice
-
Fusl
easy
-
Fusl
does it have docker?
-
purplebot
ITunesU created by Kyndigs (+1413, Creation) just now --
archiveteam.org/?diff=45766&oldid=0
-
Fusl
SketchTheCow: just in case you havent noticed, i dm'd you stuff
-
SketchTheCow
Saw
-
SketchTheCow
Off it goes
-
Fusl
SketchTheCow: isnt this teamarchive2.fnf.archive.org?
-
SketchTheCow
There is teamarchive2 and teamarchive1
-
Fusl
ah
-
SketchTheCow
I just installed on 1
-
SketchTheCow
2's been going for a while
-
SketchTheCow
But then JAA got all "blah blah blah uncomfortable shooting hundreds of gigabytes into a potential black hole"
-
SketchTheCow
So here we are
-
JAA
:-)
-
JAA
mgrandi: I don't think anyone looked into it yet. I could grab it with qwarc probably, but I don't have Flash, so someone would have to tell me what needs to be grabbed.
-
mgrandi
Doesn't chrome still have it built in? Or did it get removed
-
mgrandi
Regardless I'll take a look
-
JAA
I don't know, I don't use Chrome. :-)
-
mgrandi
Neither do I, but I have it installed at least
-
mgrandi
Flash however is bugging me and wanting to uninstall itself, its time is near. o7
-
JAA
SketchTheCow, Fusl: So uh, what's the status? Doesn't show up on atdash as far as I can see.
-
SketchTheCow
I mean, my thing thinks it's going
-
kiska
systemctl restart telegraf?