00:30:49 SketchTheCow: I've added a 60 second delay after each WARC upload from ananiel for the moment, let me know if that's not enough.
00:48:54 JAA: What channel was this message you saw on EFNet? Don't see it in my logs
00:50:17 #archiveteam :D
00:50:18 I do
00:53:39 Oh, looks like I was disconnected for 2 days
00:54:25 Whoops
00:55:19 https://server8.kiska.pw/uploads/0c620cb9dc6420d3/image.png Here
01:02:45 Thanks
02:25:40 Is there a way to see if archivebot or similar has backed up a twitter account before I request itm
02:29:59 Not easily (yet).
02:30:37 Just queue it™
02:30:46 Or grep -Fi your logs.
02:38:45 I'm assuming not everyone has access to queue stuff?
02:38:53 https://twitter.com/Manga4Congress is the account I was looking at saving
02:44:23 mgrandi: According to my logs, that was saved just recently
02:44:52 Ok thanks
02:45:06 Hence why I wanted to check rather than spam
02:45:21 Do we post in #archivebot to request? Or here
02:45:36 Probably here is better, since #archivebot is cluttered with machine messages
02:46:03 But it looks like you're in #archivebot. Can you search your logs yourself?
02:52:18 i can, although irccloud doesn't have a search all history feature yet sadly >.>
02:53:03 also, i asked this question before, i'm uploading a twitch stream and it has the file extension .ts, should i rename it to be .mp4? i think it is technically .mp4 video
02:55:01 "Codec: H264 - MPEG-4 AVC (part 10) (h264)" for video, "Codec: ADTS" for audio
03:00:11 .ts is normally MPEG-TS, not MP4.
03:01:01 hmm, what i posted is what vlc says, i'll leave it as a .ts file extension then
03:01:52 You posted the video and audio codecs, not the container type.
03:02:37 H.264 can be stored in MP4, MTS, M2TS, MKV, etc.
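The codec versus container distinction above can be checked from the first bytes of a file. The following is a minimal sketch, not what VLC or ffprobe actually do: it assumes plain MPEG-TS with 188-byte packets that each begin with the sync byte 0x47, and MP4 files that begin with an `ftyp` box. The function name `guess_container` is made up for illustration, and variants such as M2TS (192-byte packets) would come back as "unknown".

```python
TS_PACKET_SIZE = 188  # plain MPEG-TS uses fixed-size 188-byte packets
TS_SYNC_BYTE = 0x47   # each packet starts with this sync byte


def guess_container(header: bytes) -> str:
    """Guess the container from the first bytes of a file (heuristic only)."""
    # MP4 / ISO BMFF files normally start with a box whose type,
    # stored at byte offset 4, is 'ftyp'.
    if header[4:8] == b"ftyp":
        return "mp4"
    # MPEG-TS: require the sync byte at the start of every complete
    # 188-byte packet in the sample, and at least two packets.
    packets = len(header) // TS_PACKET_SIZE
    if packets >= 2 and all(
        header[i * TS_PACKET_SIZE] == TS_SYNC_BYTE for i in range(packets)
    ):
        return "mpegts"
    return "unknown"
```

Reading the first few packets of the file in binary mode (for example `f.read(188 * 5)`) and passing them to `guess_container` is enough for the check. A tool like ffprobe remains the more reliable answer, since this only looks at magic bytes.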
03:05:38 i just used ffprobe and it says `mpegts`, so you were right
03:06:18 makes sense since twitch stores VODs in like 30 second clips and then this program concatenates them together and other traditional formats probably wouldn't like that
03:06:26 /s/formats/containers
03:06:45 Yep
03:47:26 It would probably be a good idea to archive InfoWars. https://tv.infowars.com/ lists ~11k entries with direct links to video and audio files. We had an AB job for this site (under another domain but serving the same content, http://tv.us-west-1c.infowars.com/ ) which retrieved 1.5k URLs from pravda.infowars.com, which serves the video/audio files, for a total of a bit over 2 TB, and almost 33k such
03:47:32 URLs were ignored later. So as a rough estimate, this is 40-50 TB. However, there are duplicates in here, namely URLs with and without some timestamp (cache buster?) parameter and higher/lower resolution. Still probably very roughly 15-20 TB in total. Unfortunately, the site is fairly low on metadata; for example, the description is 'to be added...' for many episodes from the past year or so.
03:58:07 You're saying that the AB job didn't get them because of ignores?
03:58:26 Well, because of the size, we added the ignore.
03:58:46 Oh
03:59:08 Not going to grab 40-50 TB through AB, *especially* not on the pipeline it was running on, which has had upload issues to everywhere for the past few weeks.
03:59:34 Also, this should probably be in IA items, not in random WARCs.
03:59:55 Perhaps there's better metadata somewhere else.
04:05:29 What's the risk? Is this something imminent?
04:08:22 Not that I'm aware of.
04:08:39 More of a 'we should probably keep a copy of this somewhere' thing.
04:12:44 infowars is 50 tb? what the hecko
04:12:56 Well, looks simple enough, notwithstanding the problem of getting more metadata
04:13:02 Hours and hours of hd video
04:13:18 the guy has been banned from so many platforms i dont know what he is on anymore
04:13:24 that might have more metadata
04:17:32 are you not able to have folders when you upload something to archive.org?
04:34:28 guess you can't, whoopsie
04:35:08 You can.
04:38:49 i'm using the ia command line with --spreadsheet, let me see what column i need
04:40:34 Never used --spreadsheet, but basically what you do is set the --remote-name to a relative path. The directories get created automatically as you upload something into them.
04:41:08 not sure if i should attempt to fix this or if i can be deleted and i start again, i apparently also didn't specify the collection right and its in the 'opensource' collection rather than something more appropriate
04:41:28 You can also `ia upload` an entire directory, which will then end up as a directory in the item. (Internally, it really just uploads each file in the directory with the corresponding path, and the directory itself is recreated.)
04:43:33 hmm , the documentation sorta implies that its only really available with the --spreadsheet option which is what i tried to use
04:43:41 https://archive.org/services/docs/api/internetarchive/cli.html#bulk-uploading
04:47:10 https://archive.org/details/aoc_among_us_twitch_stream_2020_10_20 is what i created, i guess if i could edit it i would just get rid of the 'history' folder and then just have 1 copy of the chat log (the 776770697.* files), and then edit the hashes.json/readme.md to fix the two copies of the chat log not being there)
04:48:38 first upload, i'm bad at this :)
04:54:16 Ah, the history directory, always fun.
04:56:44 `ia delete identifier history/files/foobar -H x-archive-keep-old-version:0`
04:57:05 List each of the files you want to delete there (can be multiple in the same command).
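For several files at once, the `ia delete` invocation quoted above could be assembled programmatically. This is only a sketch that builds the argument list and does not run anything. The helper name `build_ia_delete` and the example paths are hypothetical, and the only flag used is the `-H x-archive-keep-old-version:0` header from the message above.

```python
import shlex


def build_ia_delete(identifier, paths, keep_old_version=False):
    """Build (but do not run) an `ia delete` command for several files."""
    if not paths:
        raise ValueError("at least one file path is required")
    # Same shape as the command in the log: ia delete <identifier> <file>...
    cmd = ["ia", "delete", identifier, *paths]
    if not keep_old_version:
        # Header taken from the log above, passed via -H.
        cmd += ["-H", "x-archive-keep-old-version:0"]
    return cmd


# Hypothetical item identifier and file names, for illustration only.
example = build_ia_delete(
    "identifier",
    ["history/files/foobar", "history/files/foobaz"],
)
print(shlex.join(example))
```

Printing the command before running it (for instance with `subprocess.run(example, check=True)`) gives a chance to review exactly which paths would be removed.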
04:58:46 (When you upload a file and there is already a file with the same name/path, the existing file gets moved to history/files to prevent accidental overwrites. This happens even when you delete a file. The special header disables that.)
05:00:16 yeah, i had two files with the same name but in different folders, but i guess didn't specify the remote name so it just overwrote it
05:00:30 i tried deleting them but i think its freaking out cause its still deriving it, it said no such directory
05:03:38 Right, that was actually an issue I reported a while ago, but Jake decided not to read properly: https://github.com/jjjake/internetarchive/issues/266
05:04:21 whoops
05:04:40 I assume the same thing applies on --spreadsheet.
05:04:51 yeah
05:05:13 i'll add this to my list of things to fix, as well as the documentation that leaves out that you can just upload a folder directly
05:13:28 unrelated: now that youtube-dl is back up, did we download all the issues and all that?
05:18:32 JAA: It worked
05:19:10 Right now there's a small set sitting in the inbox due to the way the system worked. I am probably going to force them over into the last item JUST to keep it so something doesn't suffer in limbo.
05:19:14 But it worked.
05:19:25 So feel free to pop something more aggressive over there so we see how it holds up
05:21:08 Forcing the waiting 40gb just so we're clean for the next batch
16:58:51 SketchTheCow: Cool, thanks! Will switch the fastest pipeline to it for now. There's no /pipeline.html or Telegraf on it though, so can't really monitor for out of disk space until it goes boom.
17:10:31 We can approach getting telegraf on it.
17:10:49 Send instructions, I'll try
17:11:09 I have no idea how it works.
17:11:17 Fusl_: ^
17:12:34 SketchTheCow: what distro and version is it?
17:33:29 Oh, was it fusl
17:33:33 You all look the exact same
17:33:37 hi
17:34:27 Ubuntu 14.04.6 LTS (Trusty Tahr)
17:34:34 ancient technology
17:34:35 :P
17:34:40 let me see what i can dig out
17:34:47 Let me see what they said to me
17:34:50 Maybe it's not that
17:35:17 focal/20.04
17:35:24 It WAS Trusty, now Focal
17:36:17 nice
17:36:20 easy
17:38:41 does it have docker?
18:10:18 -purplebot- ITunesU created by Kyndigs (+1413, Creation) just now -- https://www.archiveteam.org/?diff=45766&oldid=0
19:59:58 SketchTheCow: just in case you havent noticed, i dm'd you stuff
20:14:57 Saw
20:14:59 Off it goes
20:18:05 SketchTheCow: isnt this teamarchive2.fnf.archive.org?
20:18:12 https://atdash.meo.ws/d/000000058/telegraf-detail?orgId=1&refresh=30s&var-includeall=false&var-user=sketchcow&var-host=teamarchive2.fnf.archive.org&var-ival=1s&from=now-2d&to=now
20:19:30 There is teamarchive2 and teamarchive1
20:19:34 ah
20:19:38 I just installed on 1
20:19:42 2's been going for a while
20:20:09 But then JAA got all "blah blah blah uncomfortable shooting hundreds of gigabytes into a potential black hold"
20:20:16 So here we are
20:24:34 :-)
22:42:34 mgrandi: I don't think anyone looked into it yet. I could grab it with qwarc probably, but I don't have Flash, so someone would have to tell me what needs to be grabbed.
22:43:08 Doesn't chrome still have it built in? Or did it get removed
22:43:13 Regardless I'll take a look
22:43:22 I don't know, I don't use Chrome. :-)
22:43:33 Neither do I, but I have it installed at least
22:44:59 Flash however is bugging me and wanting to uninstall itself, it's time is near. o7
22:46:32 SketchTheCow, Fusl: So uh, what's the status? Doesn't show up on atdash as far as I can see.
22:50:27 I mean, my thing thinks it's going
23:28:41 systemctl restart telegraf?