00:49:18 -purplebot- Elections/2020 September Swiss votes edited by JustAnotherArchivist (+881, Add some cantonal and local votes) 13 minutes ago -- https://www.archiveteam.org/?diff=45540&oldid=45539
01:06:18 -purplebot- Political parties/Switzerland edited by JustAnotherArchivist (+1105, /* Other political entities */ …) just now -- https://www.archiveteam.org/?diff=45541&oldid=45478
02:04:18 -purplebot- Elections/2020 September Swiss votes edited by JustAnotherArchivist (+25, /* Zürich: Hardturm-Stadion */ …) 11 minutes ago -- https://www.archiveteam.org/?diff=45542&oldid=45540
02:20:03 Re: XR I just added my target
02:45:16 arkiver: Does the XR grab do anything when downloadableVideo is available?
04:16:35 XR going to be a last minute project? Looks like itnl
04:16:46 s/itnl/it
09:06:01 Hi, I archived some websites and want to move it to archiveteam. https://archive.org/details/web-wwwmeconomicsnet https://archive.org/details/web-juscinet https://archive.org/details/web-oldweboverclockzonecom
14:57:48 THat won't happen, you'll be in warczone
15:17:49 OrIdow6^2: yes it gets the downloadable video
15:18:06 kiska: no, finishing up now since tracker is working again
15:23:54 Oh, I see, it's done through the generic URL extraction
15:25:28 Hmmm
15:25:55 I'm going to make another Archive Team "Miscellaneous" collection.
15:26:09 archiveteam-fire is a mess and should be dealt with
15:26:33 update coming up for samsung xr
15:26:47 SketchTheCow: I assume all warrior project (also small ones) will still get their own collection?
15:26:56 In general, yes
15:26:58 https://archive.org/details/archiveteam_soupio&reCache=1
15:27:05 arkiver: Video IDs can have underscores, e.g. https://samsungvr.com/view/Wv_0tcndBOG and https://samsungvr.com/view/hq_vozffUc6 , don't know if you've noticed this
15:27:13 I'm the only one (besides you) who can make collection sets and populate and fix them
15:27:21 And as we've already established, you're lazy and work at half my speed
15:27:41 Just idlin' around, learning some crap degree
15:27:47 Who even USES physics
15:29:40 What's a physic?
15:29:47 no physics would mean no archiveteam :)
15:29:56 It's this high-falutin' way of saying "shit falls over"
15:35:42 its challenging
15:41:51 https://archive.org/details/archiveteam_misc
15:54:24 So much stuff is blowing through the inbox due to the projects, it's sometimes hard to notice these 20-50 items that have been flopping along the bottom of the tank in the torrents
16:09:39 So the mess that is now -fire will become a new mess in _misc?
16:15:47 The mess that is in -fire will split out into collections, or end up in misc.
16:16:35 I can't do much about 1,000 items with one single representative of itself.
16:16:48 But fire was essentially replaced with -inbox
16:17:42 And there's definitely collection-sized sets in there.
16:18:17 Ah, right.
16:18:59 But like https://archive.org/details/archiveteam_singstar_20200207201500_aaaaaaaa
16:19:02 That needs a description
16:21:10 heh, _aaaaaaaa is usually me
16:21:27 i don't remember what that was, tbh
16:22:21 Oh, I know it's you
16:22:31 Always has been.
16:23:33 That was from #singstop https://archiveteam.org/index.php?title=Singstar
16:23:47 'A DPoS project was launched on 2020-01-22 but almost immediately crippled the servers. No further attempts were made after that.'
16:23:55 So that's all we grabbed from that I think, maybe?
16:29:56 I can do a lot of this. It'd be nice to get some help
16:30:16 Obviously I get pulled away a lot.
16:30:25 But I can get things at least somewhat synced.
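The 15:27:05 note that Samsung VR video IDs can contain underscores matters for any URL-extraction pattern: a character class such as `[A-Za-z0-9]+` would cut `Wv_0tcndBOG` down to `Wv`. The Python sketch below is illustrative only and is not the samsung-xr-grab code. The function names are hypothetical, and rather than guessing the full ID alphabet it takes the whole path segment after `/view/`. The download endpoint is copied from the example URL posted at 17:42:27.

```python
import re

# Captures the whole path segment after /view/. The log only shows that IDs
# may contain underscores, so rather than guessing the full alphabet this
# stops at the next "/", "?", "#" or whitespace.
VIEW_URL = re.compile(r"https?://samsungvr\.com/view/([^/?#\s]+)")


def video_ids(text: str) -> list[str]:
    """Return every Samsung VR video ID found in view URLs inside text."""
    return VIEW_URL.findall(text)


def download_url(video_id: str) -> str:
    """Build the downloadable-video URL (pattern from the 17:42:27 example)."""
    return f"https://samsungvr.com/resource/item/download/{video_id}"


if __name__ == "__main__":
    msg = ("e.g. https://samsungvr.com/view/Wv_0tcndBOG and "
           "https://samsungvr.com/view/hq_vozffUc6 , don't know")
    for vid in video_ids(msg):
        print(vid, download_url(vid))
```

Run on the 15:27:05 message, `video_ids` returns both IDs intact, whereas the letters-and-digits pattern would return only `Wv` and `hq`.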
16:32:09 I THINK I cleared the inbox of everything that isn't an active pipe (twitter and weibo, where the amount coming in means a few dozen are always sitting there, being derived before moving)
16:32:19 how does one create a collection
16:32:23 Be me
16:32:29 well then
16:32:34 i see
16:32:36 how does one become the mighty cow of sketch
16:32:53 Give a talk at an event at the Internet Archive, then walk down into the office and demand a job
16:33:37 that involves being in murica
16:33:41 wanting to avoid that for now
16:33:53 Well, yes
16:34:07 Whatever happened to the location in Canada?
16:34:19 It exists
16:34:23 We don't talk about it
16:34:28 It exists though
16:34:29 Ah :-)
16:34:32 ah one of those
16:34:38 :D
16:34:43 couple of raspberry pi's on a DSL line
16:34:55 Yes, that's how we do things at the archive
16:34:56 s/DSL/dial-up
16:35:04 :P
16:35:05 #bringbackmicrowave
16:35:09 s/dial-up/ham radio with a modem/
16:35:21 Connected to a pile of 1 GB HDDs.
16:35:36 There are a lot of USB splitters involved.
16:36:00 Connected to 5.25 inch floppies :D
16:36:45 With a human sorting "machine"
16:40:32 samsung xr has just under 6000 videos
16:40:50 perfect lets go
16:41:36 Ok, inbox "fixed"
16:42:02 If it's in there (for the moment), IA is processing it and it'll move once done
16:42:17 arkiver: Are we doing discovery as we go?
16:44:39 no
16:44:41 project is online
16:44:49 So, archiveteam-fire has 62,000 items.
16:44:49 I see EggplantN_d already grabbed a bunch of items
16:44:54 This should be fucking delightful
16:45:08 whoops
16:45:09 lol
16:45:16 doing 2Gbit atm over 2 boxes
16:46:00 For samsung 6k videos seem... small?
16:46:57 odd sizes
16:47:07 EggplantN_d: did you already update scripts to 20200928.01?
16:47:13 yes
16:47:38 ['GNU Wget 1.20.3-at.20200919.01'],
16:47:38 VERSION = '20200928.01'
16:47:38 root@kvm1:/storage/samsung-xr-grab# cat pipeline.py | grep 2020
16:47:38 root@kvm1:/storage/samsung-xr-grab#
16:49:27 scripts updated again
16:49:49 do i need to update?
16:50:00 yeah, and you can abort everything
16:50:20 let's soo how it performs now, I don't think this update will fix the possible problem
16:51:39 requeued
16:51:48 the 10 from the start need readding arkiver if you can
16:51:59 yeah I requeued everything
16:52:20 we're at a nice 10 items/min :)
16:54:08 surely we can go faster than 10/min?
16:54:53 btw we sure there are 6k videos?
17:00:04 EggplantN_d: excuse me, another update
17:00:12 aaaaaaaaaa
17:00:13 kiska: well seems like it yeah
17:00:13 ok
17:00:18 feel free to abort
17:00:34 should be final one
17:01:06 hahahahah I broke the internet archive queue
17:01:10 With photos
17:01:34 also some items might be very large for samsung
17:01:41 SketchTheCow: how did you do that
17:03:04 Well, shoving in 9,000 photos into objects forces the entire copy of the item over while it checks it
17:03:11 So 9000 * 25gb an item
17:03:27 Yeah, looked liked videos had several resolutions, plus some had the downloadable option that let you get a stereoscopic video.
17:03:52 SketchTheCow: the image needs to be derived?
17:04:58 Speaking of breaking IA... I have 15.4k/16.8 GiB WARCs from Joe Rogan's YouTube video comments (two per video). All in one item will be terrible, and one item per video will be terrible. Merging them together is also terrible for accessibility. What's the least bad way to upload these?
17:06:13 JAA: what is 15.4k/16.8GiB
17:06:24 15.4k files totalling 16.8 GiB
17:06:27 ah
17:06:45 So ~2 MB per video on average.
17:06:56 I'd cat them together and upload to single item
17:07:06 maybe keep the originals until deriving went well
17:07:16 Yeah, but then accessing a single video's comments will be horrible.
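A minimal sketch of where this thread ends up over the next few minutes: concatenate the WARCs into one file and keep a metadata file mapping video IDs to their data. It relies on two facts about the format. Gzip files can be byte-concatenated and remain a valid (multi-member) gzip stream, and `.warc.gz` files are conventionally written as one gzip member per record, so each original file stays a contiguous, independently decompressible byte range. Recording each file's offset and length, keyed by video ID, then lets a client fetch one video's comments with an HTTP `Range` request. The function names and the JSON index layout are assumptions for illustration; this is not the real megawarc tool, which has its own format.

```python
import gzip
import json
import os
import tempfile


def concat_warcs(parts, out_path, index_path):
    """Append each .warc.gz blob to out_path and index it by video ID.

    parts: iterable of (video_id, gzip_bytes). In practice the bytes would be
    read from the per-video .warc.gz files; a video with two WARCs would pass
    both as one concatenated blob so its data stays in a single byte range.
    """
    index = {}
    offset = 0
    with open(out_path, "wb") as out:
        for video_id, blob in parts:
            out.write(blob)  # gzip members concatenate into a valid stream
            index[video_id] = {"offset": offset, "length": len(blob)}
            offset += len(blob)
    with open(index_path, "w", encoding="utf-8") as f:
        json.dump(index, f, indent=2)
    return index


def range_header(entry):
    """HTTP Range header for one index entry (the end byte is inclusive)."""
    start = entry["offset"]
    end = start + entry["length"] - 1
    return {"Range": f"bytes={start}-{end}"}


if __name__ == "__main__":
    # Stand-in records; real input would be the comment WARCs.
    parts = [("videoA", gzip.compress(b"WARC/1.0 ... comments for A")),
             ("videoB", gzip.compress(b"WARC/1.0 ... comments for B"))]
    tmp = tempfile.mkdtemp()
    out = os.path.join(tmp, "comments.warc.gz")
    idx = concat_warcs(parts, out, os.path.join(tmp, "comments.index.json"))
    entry = idx["videoB"]
    with open(out, "rb") as f:
        f.seek(entry["offset"])
        print(gzip.decompress(f.read(entry["length"])))
    print(range_header(entry))
```

The index is what makes the CDX objection at 17:08:13 moot: the video ID never has to be recovered from the opaque continuation-token URLs, because it is recorded at packing time.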
17:07:22 I don't expect this to work well in the WBM at all.
17:07:30 This all sounds terrible
17:07:33 you can do a range request on the WARC
17:07:41 What have you do
17:07:42 using the data from the CDX
17:07:43 ne
17:07:52 you fool, you've killed us all
17:07:57 yeah he did
17:08:13 CDX won't help because the video ID is not in the comment API URLs, only an opaque continuation token.
17:08:19 YouTube is fun.
17:08:41 keep some metadata file with references between URLs and video IDs?
17:08:51 Yeah, I guess so. Basically megawarc then.
17:08:56 15.4k files in an items is really not a good idea
17:08:58 yeah megawarc
17:10:12 Megawarc it, and if you're concerned about the conversion, make a second item with a simple cat like a .tar or something, with a "use sometime" introduction and note.
17:13:40 Alright, sounds good.
17:19:29 EggplantN_d: crap another update coming up
17:19:44 looks like samsungvr sometimes gives 404 for m3u8 files :/
17:19:53 will need requeue?
17:20:12 yeah
17:20:16 this site sucks as well
17:20:18 even items that are done?
17:20:19 😦
17:20:31 yeah
17:20:40 set 03 to min version anyway on tracker
17:20:46 all status codes except 200 and 302 will yield an error now
17:20:50 *043
17:20:53 **04
17:24:26 So, I've gone ahead and made a choice wrt the archiveteam-fire stuff
17:24:40 Anything with identifier warc-* is going into a collection.
17:24:54 https://archive.org/details/archiveteam_earlywarcs
17:25:09 and I assume in the archiveteam collection?
17:25:58 EggplantN_d: all updated *again*
17:26:07 non 200 or 302 codes are bad now
17:26:48 have you requeued?
17:26:53 also do the 66 done need redoing?
17:27:20 i've requeued out
17:28:20 everything is requeued again
17:28:30 what is it lately with sites that suck
17:28:34 tencent as well
17:28:37 not sure :9
17:28:44 hows google sites lol
17:28:46 is that as bad
17:35:16 JAA: What's the correct sever for the Efnet channel?
17:35:22 server*
17:35:51 paraphysics of choopa?
17:35:55 or* choopa
17:36:32 I'm trying to duplicate the hexchat configuration I have on one machine, to another one. and for some reason I can't connect
17:39:00 arkiver: Does this download the stereoscopic video for 3D videos when downloadableVideo is enabled? We were discussing a few days ago that the web viewer doesn't offer stereoscopic.
17:39:23 lennier1: do you have an example URL?
17:39:54 I can find one in the logs.
17:42:27 https://samsungvr.com/resource/item/download/hq_vozffUc6
17:42:52 lennier1: Yes
17:43:25 Wait, misread that
17:43:56 HP_Archivist: None because we're moving soon. :-P Pick any from http://www.efnet.org/?module=servers or use the generic irc.efnet.org if you're not in too many channels.
17:44:24 Still yes
17:44:35 lennier1: yes
17:44:43 or OrIdow6^2 answered yeah
17:44:57 (I read the "misread that" only :P)
17:45:02 JAA: Yeah, I PM'ed you with an error I keep getting
17:45:15 OK, good to know. :)
17:52:28 final items for naver have been fixed, and queued
18:06:38 https://archive.org/details/archiveteam_earlywarcs
18:11:41 That'll be 13,000 items or so
18:22:18 -purplebot- Wikipedia edited by M.Barry (+95, /* External links */ Adding link …) just now -- https://www.archiveteam.org/?diff=45543&oldid=45537
18:39:41 Is youtube deleting the community captions today?
18:40:36 kinda
18:52:08 Community captions that were submitted, but never approved by the video owner, I believe.
19:27:52 There was a non warrior project for it, should I turn my workers onto that I guess? Since tencent weibo is dead now
19:40:58 themadpro was looking for help on it earlier. https://github.com/Data-Horde/ytcc-archive
19:41:20 no need to spin up more they are capped at 2k/min
19:41:42 Current project is https://tracker.archiveteam.org/samsung-xr
20:09:46 https://archive.org/details/archiveteam_samsungxr
20:25:03 more is coming up, no worries :P
20:25:22 more what >_> arkiver
20:26:41 any ip limit on samsungxr?
20:26:51 no, just slow due to file sizes
20:27:00 so dont go insane
20:27:04 k thanks
20:27:06 20/min limit set via tracker also
20:27:18 ah
20:27:33 targets are a bit toasty but even with that eta is ~5 hours to finish samsung XR so I'm not fussed about doing anything
22:07:45 From EFnet #warrior: 21:23:27 < n00b__> Hi, there's this Estonian image host that will be closing very very very soon, I am certain it will come with data loss. What can I do to help archive it? Here's the site: http://fotoalbum.ee/
22:09:58 1st of october
22:10:47 aaaaaaaaa
22:10:50 another juan
22:13:17 Too much for AB?
22:13:46 found a way to get all images
22:14:45 for URLs like https://fotoalbum.ee/photos/MagusKiisu/112474281/ you need both username and ID
22:15:19 but URL http://fotoalbum.ee/popup.php?type=share&pic=112474281 will give you the link with user while only having photo ID in the URL
22:15:26 IDs are sequential
22:15:41 so I guess we'll go through 100+ million IDs
22:16:07 Nice
22:16:30 Also seeing some links in the upper-left, like album.ee. Seems like they're all closing
22:18:00 nicely noticed
22:18:00 yep
22:18:27 projects coming up
22:18:42 I'm gonna throw the social medias and such in AB
22:19:14 full script project arkiver?
22:19:29 EggplantN_d: not sure if I get what your question, but "yes"
22:19:41 jodizzle: thanks, please also throw in the main websites, so we get the main pages
22:19:52 Got it
22:20:22 you can go !a and cancel after some time (or ignore photo pages/URLs)
22:20:34 it's so we get any FAQs, announcement, etc. ('special pages')
22:21:52 ah great
22:22:07 Yeah, I'll ignore the photo pages if we successfully get them by warrior
22:23:18 -purplebot- Deathwatch edited by JustAnotherArchivist (+137, /* 2020 */ Add Fotoalbum) just now -- https://www.archiveteam.org/?diff=45544&oldid=45538
22:24:45 I guess because it's different website it should be different projects
22:24:58 also in structure different website
22:24:59 channel?
22:25:03 and different names/logos/etc.
22:25:10 no idea Kaz :P
22:25:51 fotoalbum-eek
22:26:01 fotooff
22:26:18 fotoff seems like we've had something similar in the past
22:26:26 same with any sort of fotofail-type name
22:26:48 whats the length limit on hackint
22:27:19 ah yes, I'm allowed #lookatthisfotoalbum, only problem is i hate it
22:27:51 #lookatthisfotograph surely
22:28:11 https://www.youtube.com/watch?v=sIlNIVXpIns
22:28:24 i can hear it in my head. why would you do this to us
22:28:29 hahahahaha
22:28:34 well yeah, but the site _is_ called fotoalbum
22:29:09 but for the meme kaz
22:29:14 yes ik
22:29:18 sod it, lets go for #lookatthisfotograph
22:29:23 navergonnagiveyouup
22:31:03 https://www.youtube.com/watch?v=aANF2OOVX40 youtube recommended takes you to strange places
22:34:11 lennier1, Doranwen: have you guys finished downloading my torrent? (i want to move the data back to cold storage)
22:36:47 thuban: what is in the torrent?
22:37:52 arkiver: gmds from the yahoo groups project that were sitting inaccessible on marked1's target
22:38:09 We have a channel for that, no?
22:39:18 thuban: you can upload the torrents to IA, if it's not too many individual files
22:39:28 and yeah JAA
22:39:35 #yahoosucks
22:39:39 oh they are both in it. i assumed it would be deserted at this point
22:41:08 Speaking of channels, do we want one for Samsung XR? I suggested #sandsung yesterday, and even though nobody responded to that, there are 8 people in it now (including myself).
22:43:26 it'll be over in a few hours JAA
22:48:04 arkiver: i'll pass for now due to the unresolved privacy issues--iirc betamax had some plans for sorting/processing that data, but i don't know whether any progress was ever made. (i guess further discussion can go back to #yahoosucks)
22:48:19 i see
23:03:45 so #lookatthisfotograph for all the photo/video sites
23:03:50 the .ee sites
23:08:57 as long as we're talking about new projects: is the warrior still going to be supported in the future, or are we going over solely to docker containers?
23:09:00 i've upgraded from -3 to the very recent -3.1 and run the warrior-extras-installer, but my warrior still complains on e.g. samsung-xr of "No usable Wget+At found"
23:09:58 yes and no
23:10:06 we'd like to get the warrior to a place where it's a little more.. stable
23:10:16 but nobody really has the time to look at it properly
23:15:05 ahh, i see
23:15:11 i wish i could help more--i'm more of a programmer than a sysadmin, but if anyone has scutwork that needs done (documentation?) hmu
23:15:36 (i realize that individual machines are a drop in the (bit)bucket compared to the mass cloud deployments people have been doing, but when a project comes along we can't finish, every drop matters, right?)
23:15:52 exactly!
23:15:58 every drop absolutely matters
23:16:20 if we can't finish something in time, every URL you archived is saved by you for eternity :)
23:19:12 it'd be super nice to have the warriors in a workable state again
23:19:28 we're a lot better at managing uploads too, which was a pain point in the past
23:20:15 we're finally getting around to having a variety of targets setup ready 24/7
23:20:27 then the warrior is next likely
23:23:37 sounds good :)
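Back to the fotoalbum.ee approach described between 22:14:45 and 22:15:41: photo page URLs need both a username and a photo ID, but `popup.php?type=share&pic=<ID>` works from the ID alone, and IDs are sequential. The work can therefore be seeded by enumerating IDs instead of discovering usernames first. A minimal Python sketch under that assumption follows. The helper names and the batch size are hypothetical, and parsing the username link out of the popup response is left out because the log does not show what that response looks like.

```python
from typing import Iterator, Tuple

# Share-popup URL from the 22:15:19 message; only the photo ID varies.
POPUP_URL = "http://fotoalbum.ee/popup.php?type=share&pic={pic_id}"


def popup_urls(first_id: int, last_id: int) -> Iterator[str]:
    """Yield one share-popup URL per photo ID, both ends inclusive."""
    for pic_id in range(first_id, last_id + 1):
        yield POPUP_URL.format(pic_id=pic_id)


def id_batches(first_id: int, last_id: int,
               batch_size: int) -> Iterator[Tuple[int, int]]:
    """Split the ID space into inclusive (start, end) ranges.

    Hypothetical: one range per tracker item. The log does not say how the
    real project sized its items.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    start = first_id
    while start <= last_id:
        end = min(start + batch_size - 1, last_id)
        yield start, end
        start = end + 1


if __name__ == "__main__":
    for start, end in id_batches(112474281, 112474285, 2):
        print(start, end, list(popup_urls(start, end)))
```

At a hypothetical 1,000 IDs per item, the "100+ million IDs" estimate from 22:15:41 works out to at least 100,000 items.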