00:08:33 potentially. There are only two lines in my MV-Errors.log file.
00:08:33 gm26791 (MV): 'animation1Name'
00:08:33 gm25516 (MV): Unterminated string starting at: line 24 column 1489 (char 94827)
00:09:38 ah ok
00:10:05 26791 looks the same as my 17614
00:10:10 not sure about 25516
00:17:23 K, I've split it into a lot (136) of smaller jobs and am running them in parallel. It's still slower, but it is vastly improved.
00:17:44 nice
00:29:31 Wickedplayer494 edited Surrender at 20 (+367, SS is paying attention to the looming expiry…): https://wiki.archiveteam.org/?diff=50041&oldid=49232
01:28:00 thuban: commit f14a67e fixes 8298. I'm doing it.
01:32:38 Looking for someone to put https://transfer.archivete.am/OrLXQ/api_requests.txt.zst (900K Game Atsumaru comment / scoreboard API requests) quickly before 3AM UTC.
01:32:54 i.e. in 1.5 hours.
01:33:56 you say 'requests'; are these just URLs that can go in archivebot?
01:34:57 Yes, they're GET request URLs with no session requirement.
01:35:58 i see you've asked in #archivebot, so someone will get to it shortly
01:39:10 job id: askbyh40bwvdmpjqb5v0l3por
01:55:52 yzqzss|m: mv 22500-24999 ok?
01:57:49 Summary of #archivebot: AB is far too slow for this, but I'm grabbing them with qwarc instead, which does more brrrr.
01:58:27 is that wbm certified or nah?
01:58:27 Almost all responses are empty, but I guess that's expected.
01:59:08 Where empty = {"meta":{"status":200},"data":{}}
02:02:09 I just ran into a couple thousand 400s, random example: https://api.game.nicovideo.jp/v1/rpgtkool/games/gm10857/scoreboards/25.json
02:03:14 "boardIdが範囲外です" == "boardId is out of range"
02:05:34 "yzqzss: mv 22500-24999 ok?" <- it's ok
02:11:01 Those invalid URLs slow me down significantly due to how qwarc works on errors, but it should still finish in time at the current rate.
02:46:20 It's done.
02:49:56 FireonLive edited Egloos (+172, Egloos is offline. Data been partially saved.…): https://wiki.archiveteam.org/?diff=50042&oldid=49938
03:00:31 let's see if minor edits show up too
03:00:58 FireonLive edited Egloos (+50, tweaks, add lead): https://wiki.archiveteam.org/?diff=50043&oldid=50042
03:01:00 so it do
03:01:12 can't get anything past h2ibot
03:01:43 Edits in the User and User_talk namespaces get ignored, but otherwise, no.
03:01:53 ahh ok :)
03:02:13 also, curse you, MediaWiki, for making every first letter capital :p
03:08:37 yts98: did you get 9445? it still errors out for me
03:08:44 (after pulling, i mean)
03:09:23 thuban: not able to get 9445 yet
03:09:48 ok. sorry, wasn't sure what strikeout meant
03:13:37 The DDL (deadline) was reached and the front-end pages are HTTP 30X to
03:22:34 strikeout means I got it myself
03:23:59 ? 9445 was struck out, did it work or no
03:24:22 re IRL: it's already past 12pm PDT
03:24:27 hm
03:24:34 (from #archiveteam)
03:25:33 nulldata: seeing 500s for profiles and 404s for post links: https://www.irl.com/meg-myers/bTduA5Al5N
03:26:12 where's that Sketch quote about sites losing as much data as possible
03:28:31 I like this Foone quote: https://twitter.com/Foone/status/1146136479573299200
03:29:18 ah! yes, it was Foone, not Sketch :)
03:29:21 Game Atsumaru's API endpoint has also been redirected.
03:29:21 To prevent wget from accidentally overwriting the previous WARC files, don't run any step6-* script now.
03:30:06 * thuban nods
03:30:11 where are we sending files?
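A minimal sketch of the "split into 136 smaller jobs and run them in parallel" approach from 00:17:23, assuming the work is just a list of IDs; process_chunk and the item list are hypothetical stand-ins for whatever each job actually does.

    # Sketch of the split-and-parallelize approach from 00:17:23.
    # `process_chunk` is a hypothetical stand-in for the real per-job work.
    from concurrent.futures import ProcessPoolExecutor

    def process_chunk(chunk):
        # stand-in: each job handles its own slice of the work
        return len(chunk)

    def split(items, n_jobs=136):
        # ceiling division so every item lands in exactly one chunk
        size = -(-len(items) // n_jobs)
        return [items[i:i + size] for i in range(0, len(items), size)]

    if __name__ == '__main__':
        items = list(range(100_000))  # stand-in for the real ID list
        with ProcessPoolExecutor() as pool:
            done = sum(pool.map(process_chunk, split(items)))
        print(f'{done} items processed')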
03:34:53 I don't know if we should create an IA project and each upload to it separately, or transfer all the data to one place to generate megawarcs first.
03:51:37 So I'm looking for help from volunteers experienced in running staging servers.
04:01:17 yts98 - you would want to speak with JAA_ and arkiver_ regarding this. They can assist in getting a formal project put together.
04:02:49 datechnoman: the time for that is over, the site is already dead. we're just talking about how to get manually collected data onto IA in a convenient format
04:06:09 Yts98 edited Niconico (+228, Game Atsumaru was down): https://wiki.archiveteam.org/?diff=50044&oldid=50015
04:06:10 FireonLive edited Current Projects (-447, move Egloos to recently finished, remove…): https://wiki.archiveteam.org/?diff=50045&oldid=50024
04:09:10 Yts98 edited Current Projects (-156, Game Atsumaru finished): https://wiki.archiveteam.org/?diff=50046&oldid=50045
04:13:21 Ahh, gotcha, no worries at all. Apologies!
04:13:55 Should we add whatever Android app textfiles is referring to to the Deathwatch list? :P https://digipres.club/@textfiles/110619602655711553
04:14:54 datechnoman: none needed! that's probably whom i'd ask about how to format an IA item anyway
04:19:40 For sure. The experts for sure :)
07:03:19 company acquisition: https://www.networkworld.com/article/3700616/cisco-to-buy-network-monitoring-firm-samknows-for-better-last-mile-visibility.html
07:03:49 two, actually
07:09:56 https://techcrunch.com/2023/06/26/decentralized-social-networking-app-damus-to-be-removed-from-app-store-will-appeal-decision/
07:43:49 https://www.hollywoodreporter.com/business/business-news/siriusxm-to-shut-down-stitcher-podcast-app-1235524707/
13:15:00 yts98: how many WARCs is this?
13:19:53 Jack Thompson edited Deathwatch (+440, Added Showbuzz Daily and IRL): https://wiki.archiveteam.org/?diff=50047&oldid=50035
14:01:22 arkiver: each game produces 1-4 WARCs. I have 12556 files in 62 GiB, but other operators including thuban_ and 3 STWP members have claimed more IDs, so they will have more.
14:06:02 We also keep the files besides the WARCs. Since the script overwrites WARCs generated by previous runs, we cannot guarantee that every crawled file will appear in a WARC.
14:06:11 yts98: feel free to upload to IA as items with 1000 WARCs each
14:06:22 note that this is only okay in this case; in other cases, maybe not
14:29:43 yts98: actually, about each of these games, since you also get their files outside of the WARCs: are they easily playable with the files you archived?
14:29:56 for example, an index.html you can open and then the game plays, or similar?
14:35:15 I'm sad to find out that some major browsers reject localhost CORS, so some games have to be served from an HTTP server (e.g. WAMP) to be playable.
14:39:16 got it
14:39:43 arkiver: I got almost 3000 WARCs, ~60 GB, plus the extracted assets yts98 mentioned. The assets are ~67 GB and 260,000 files.
14:41:34 alright
14:42:14 yts98: let's do the following
14:42:39 each individual game can be uploaded in its own item with mediatype=software
14:43:41 perhaps identifier game-atsumaru-ID, or atsumaru-ID, whatever is more appropriate
14:43:53 and with proper metadata
14:44:30 and the WARCs can be uploaded separately into items with 1000 WARCs in each item
14:44:33 does that sound good?
14:47:28 STWP also proposed game-atsumaru-warc-{game_type}-{range_start}-{range_end}.
14:47:29 Should we schedule each item to contain nearly 1000 WARCs, or use a predictable range as the index?
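A sketch of the per-game item plan from 14:42:39-14:43:53, using the internetarchive Python library; the title, the example file list, and any extra metadata here are illustrative assumptions, not settled project values.

    # Sketch of the per-game upload plan from 14:42:39-14:43:53, using the
    # `internetarchive` library (pip install internetarchive; run
    # `ia configure` once to store your S3 keys).
    from internetarchive import upload

    def upload_game(game_id, files, extra_metadata=None):
        identifier = f'game-atsumaru-{game_id}'  # identifier pattern per 14:43:41
        metadata = {
            'mediatype': 'software',             # per 14:42:39
            'title': f'Game Atsumaru {game_id}', # illustrative; real titles TBD
        }
        metadata.update(extra_metadata or {})
        return upload(identifier, files=files, metadata=metadata)

    # e.g. upload_game('gm10857', ['gm10857/index.html', 'gm10857/game.json'])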
15:20:39 yts98: let's say max 1000 WARCs per item; how many items with WARCs would you create in your 'predictable range'?
15:47:46 arkiver: Oh, I see: if each game generates an average of 3 WARCs, then 30 items with WARCs will be created; if I conservatively map 250 game IDs to an item, there will be 119 items.
15:49:05 yts98: ah, that is no problem. let's say max 1000 items with WARCs, each with max 1000 WARCs
15:49:28 and the game files for each individual game can be uploaded to a separate item
15:54:15 Sounds good.
15:54:26 I wonder if it's possible for two IA accounts to upload files to the same item? If not, is it safe to share S3 access keys with others?
15:54:56 why do you need two people uploading to the same item?
15:55:00 the WARCs?
15:55:10 IA does not recommend sharing keys
15:58:38 Got it, so each operator should decide on their identifiers separately.
15:58:56 hm
15:59:15 does IA use AWS-like authentication and signing?
16:06:50 nicolas17:
16:14:20 afterwards, Flashpoint should be cross-checked too (Flashpoint also does HTML games, so it's worth getting these games into it). Having them already grouped at IA makes that work easier, since no file hunt is needed anymore.
17:00:43 The initial Knowledge Adventure CDN download finished just over 12 hours ago, as predicted. I'm now relisting the bucket and will grab anything that was added in the past couple of weeks.
17:01:28 32.23 TiB downloaded into 3.22 TiB of WARCs :-)
17:06:31 There were 65907 403s, i.e. files which appear to be inaccessible.
17:08:34 Also 14662 404s, which I need to deal with later. Many of those are 'directories', which aren't accessible through the media* domains. Some are because I forgor to encode stuff correctly.
17:11:38 Is there a better way to see how long your worker has been at the same task than viewing the folder creation time in the data folder?
17:17:02 On the warrior web page, there's an elapsed time at the bottom right of each task (below the log)
18:06:53 thanks, I should have specified: in the Docker images
18:07:24 Container logs would be another option, but otherwise, not really.
18:07:54 Well, I guess you could also check when the wget-at process was started.
18:14:25 ok, had a long-running task and was curious how far back it went... thanks for the proc runtime suggestion. Hadn't considered doing it outside of the container.
18:32:40 https://archive.org/details/archiveteam_tracker_2020
18:32:44 i can't say i'm not surprised
18:32:47 but nice
18:33:50 ah, it's not official-dicial
18:45:23 hello, world!
18:45:53 has digitize.archiveteam.org been discontinued? "This server could not prove that it is digitize.archiveteam.org; its security certificate is from internetarchive.archiveteam.org. This may be caused by a misconfiguration or an attacker intercepting your connection."
18:46:08 (ref: https://wiki.archiveteam.org/index.php/Storage_Media)
18:52:25 I've never heard of that project before, hmm
18:53:05 It might have become http://fileformats.archiveteam.org/wiki/Main_Page ?
18:53:08 if you bypass the security warning, it shows "Hello archive team.org!"
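One way to act on the "check when the wget-at process was started" suggestion from 18:07:54 (the 'proc runtime' route acknowledged at 18:14:25): a Linux-only sketch that reads the start time from /proc; finding the PID (e.g. with pgrep) is left out.

    # Sketch of the /proc-based runtime check from 18:07:54/18:14:25:
    # how long a process (e.g. wget-at) has been running, measured from
    # the host rather than inside the container. Linux-only.
    import os
    import sys

    def process_age_seconds(pid):
        # Field 22 of /proc/<pid>/stat is starttime, in clock ticks since
        # boot. Split after the ')' so spaces in the comm field can't
        # shift the indices.
        with open(f'/proc/{pid}/stat') as f:
            fields = f.read().rsplit(')', 1)[1].split()
        starttime_ticks = int(fields[19])
        with open('/proc/uptime') as f:
            uptime = float(f.read().split()[0])
        return uptime - starttime_ticks / os.sysconf('SC_CLK_TCK')

    if __name__ == '__main__':
        print(f'{process_age_seconds(int(sys.argv[1])):.0f} seconds')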
18:53:29 hmmm, i don't think so
18:53:36 this one was "a wiki dedicated to digitizing many types of storage media including paper media, CDs, tapes, video, slides, and floppies"
18:53:45 more on the ingest side
18:54:09 hmm, some fairly dubious recent captures with captchas and stuff: https://web.archive.org/web/20230301000000*/https://digitize.archiveteam.org/
18:54:11 though there is a 'file creation software' on there
18:54:18 but https://web.archive.org/web/20190212144505/http://digitize.archiveteam.org/index.php/Main_Page does look like a different project
18:54:49 http://succeed-project.eu/wiki/index.php/Main_Page seems to be dead too :|
18:55:01 DNS points to 213.184.85.58, which has a PTR of archiveteam.org (which can be set by anyone with no validation, but it's a hint at least)
18:55:19 run on a 'Hosting4Real' server
18:55:19 hm
18:55:40 not that it is an emergency or anything, i was just curious
18:55:46 archiveteam.org resolves to the same IP
18:55:54 i saw it and was like, ooh, ingest methods
18:56:01 nicolas17: ah! thanks
18:56:05 There is a somewhat old export at https://archive.org/details/wiki-digitizearchiveteamorg (which is incorrectly in warczone when it should be in wikiteam, it seems)
18:56:26 based on https://web.archive.org/web/20210329052052/http://digitize.archiveteam.org/index.php/Main_Page I don't think it ever really got much content
18:57:38 as does wiki.archiveteam.org; so maybe something jrwr ran?
18:58:04 ooh, that logo on Main_Page is great though
18:59:22 ye, there are a few interesting pages on there
18:59:26 but ye
18:59:32 neat though :)
19:05:29 i pinged jason about it
19:12:35 ah! thanks
19:12:51 it's been off since last year
19:36:55 yts98: it might be easier to keep things consistent if one person consolidates + uploads
20:19:10 FireonLive edited Tiki (+782, The torches have been blown out): https://wiki.archiveteam.org/?diff=50048&oldid=50023
20:20:10 FireonLive edited Tiki (-27, fix infobox syntax): https://wiki.archiveteam.org/?diff=50049&oldid=50048
20:50:32 I've no issues with sending everything on to someone else for upload.
23:29:20 good afternoon, good afternoon
23:29:29 more fires for this hellscape we call earth
23:29:35 National Geographic lays off its last remaining staff writers: https://www.washingtonpost.com/media/2023/06/28/national-geographic-staff-writers-laid-off/
23:31:17 earth? life.
23:33:18 seems like their online edition needs an account to 'see more': https://www.nationalgeographic.com/
23:40:38 sigh. and of course the WaPo article uncritically repeats the narrative that it has "fallen on hard times" in the same breath as stating it's the most-read magazine in the US.
23:40:48 anything to not call out what it really is, I guess
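The PTR check described at 18:55:01 can be reproduced with a short sketch; as noted there, reverse DNS is set by whoever controls the reverse zone, so treat the result as a hint, not proof.

    # Reproduces the reverse-DNS (PTR) check from 18:55:01.
    import socket

    ip = '213.184.85.58'
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
        print(f'{ip} -> {hostname}')
    except socket.herror:
        print(f'{ip} has no PTR record')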