00:00:22 looks like http://www.random.fora.pl/ itself isn't listed on there, but that's reasonable since it's not "supposed to" be an actual forum presumably
00:00:56 http://www.icewarriors.fora.pl/ well the random forum isnt very good
00:02:24 it looks like the catalog in the sidebar orders them by the number of posts, which is also good for getting an idea of how big some of them are.
00:03:40 This is probably a situation where !a < list would be nice if it weren't completely broken :|
00:03:41 the list of the forums should be also obtainable by scraping the "katalog"
04:14:54 I'm looking for an operator with ArchiveBot access to save the list of games I provide for Game Atsumaru.
04:14:54 The lists I want to archive are data/public.txt, data/public_payment.txt, data/key_valid.txt in https://github.com/yts98/game-atsumaru-discovery.
04:14:54 This site will close in 47 hours.
04:18:32 https://ch.nicovideo.jp/indies-game will be taken down together definitely, and https://atsumaru.github.io/api-references/ https://github.com/atsumaru are possibly in danger.
04:19:25 yts98 3tt38940863qg137qhybq2xbe see this job on the dashboard It is already running
04:20:58 and yts98 #gitgud for Github stuff
04:21:12 Got it, thanks for your help!
04:21:29 I dont know how to throw in those lists you provided or if I need ops to do it
04:22:12 Granted its at concurrency 1 because I cant actively babysit it. but it should run through a lot
04:22:57 Also, I'm sweeping its API to discover dynamically loaded danmaku comments and scoreboards. I would have an API url list later.
04:23:09 I am gonna try doing the lists you gave us too
04:24:22 data/key_valid.txt are unlisted games found on Google.
04:24:22 And also, I'm doing static analysis for the game to discover lazily loaded resources,
04:25:24 but they're guarded by the session cookie: https://wiki.archiveteam.org/index.php/Niconico#Game_loading_mechanism
04:26:27 so when I have the resource lists, they cannot be digested by Archive Bot or WBM SPN directly.
04:28:51 so jobs e83a218c9a0dff1eac9a5c1f641eee49 9301dfd206dc1b81e97b8593564d63e9 will be all 404
04:47:11 Ever since I posted like 100 links in the telegrab chat this happened https://imgur.com/a/ePYCcur
04:47:39 I completely broke the chat. It's on matrix too, so leaving the room didn't fix it
04:55:34 oh no
04:59:09 there's a 'clear cache and reload' button in the advanced section of settings (bottom) but i'm not entirely sure what it does other than what it sounds like (clear the entire cache/room states for all rooms on all servers)
05:01:40 this never happened, but a comment on https://github.com/vector-im/element-web/issues/5800 says "initial sync (same thing the clear cache & reload does)" so i mean it could take a long time depending how you use matrix (and depending on hackint retention you could lose older messages) so perhaps look for a solution from elsewhere before touching
05:01:40 the big red button
05:03:32 yts98: you may want to use grab-site (https://github.com/ArchiveTeam/grab-site/) or just wget to handle the resources gated by the session cookie--both can accept cookies and output warcs
05:03:37 said warcs won't be whitelisted and can't go in the wbm, but you can still upload them to the internet archive for safekeeping
05:05:22 (the wbm can't play back post requests, so the games wouldn't work through it anyway)
05:06:37 thuban: I'm currently using my scripts to let wget produce warcs, but my bandwidth is relatively limited :/
05:13:41 i have to go afk for now, but if (it's necessary and) you upload the details of your wget process to your github repo, i will help save stuff on my fat pipe tomorrow
05:14:28 :3
05:15:13 I'll look into it. hackint.org works fine, and I never really browsed telegrab
05:24:31 someone get these for me thanks https://transfer.archivete.am/BXpPB/honkaiwiki-newlinks.txt https://transfer.archivete.am/S8TgV/honkaiwiki-newfiles.txt
05:29:36 vokunal|m: kk
05:30:07 might even be able to have fun with sqlite ™ or something (but ya know be sure to backup)
06:34:11 tartarus needs to put his stuff on downthetube, not mediafire. I've been silently crying since I saw them join it, knowing they'd eventually cement a spot out of the top list forever, and now they've finally passed me by .01TiB
10:24:12 Yts98 edited Niconico (+708, Saving Game Atsumaru): https://wiki.archiveteam.org/?diff=50014&oldid=50006
15:17:08 Switchnode edited Niconico (+13, /* Game Atsumaru */ correct archiving type): https://wiki.archiveteam.org/?diff=50015&oldid=50014
15:17:45 yts98: back, ping me if you want me to run something
15:31:17 thuban: you can run step6-MV_iterate_.py on github.com/yts98/game-atsumaru-discovery at first.
15:31:17 you can specify the gameId range, and it will find game resources for 61% of the games.
15:45:41 yts98: is there an id range i should choose to avoid duplicating your work, or have you not run this step yet?
15:47:23 I only ran a little (3~190), so you could archive all the RPGMaker MV games.
15:50:14 I'm going to analyze RPGMakerMZ and EasyRPG, and I'm still looking for many volunteers to analyze other game frameworks, or find scraping tools off the shelf
15:50:52 off the shelf scraping tools for this type of stuff can be difficult
15:51:12 especially when it comes to archiving work where one really needs to make sure all required URLs are preserved
15:52:08 Agree. other games use Akashic Engine, TyranoBuilder, Unity and others. I guess there may be some tools for Unity?
15:57:23 yts98: ok, running
16:14:13 yts98: i'm seeing a lot of 404s on some games (eg 191); is this correct or is there a problem with the script?
16:15:28 it's correct because some creators removed resources but did not remove their references from the script
16:16:59 ok, good
16:18:56 the script you're running is also doing plugin statistics across the games, in order to find out rarely used plugins that may point to more resources.
18:52:11 yts98: i am working on ids 191-6500, but my provider is having some storage issues i need to work around, so if you or anyone else wants to grab another id range, that would be helpful
18:53:14 (unfortunately i think the bottleneck here is not client-side bandwidth; 12x parallelized i've gotten ~230 games down so far)
19:11:42 in other words - site is too slow?
19:12:35 not sure, suspect that python/wget could also be more efficient
19:43:22 yts98: error on game 262: https://transfer.archivete.am/qVG89/atsumaru_262.log
19:45:52 ditto 4053
19:46:18 and 5047
19:47:40 5533 has a different error due to a bad start byte: https://transfer.archivete.am/DclPO/atsumaru_5533.log
19:59:01 ditto 1695
19:59:27 (just dumping the ids here so i know what i've skipped)
20:03:39 4078 has the dict error
20:11:33 hello is anyone on
20:25:02 1116 has a different error: https://transfer.archivete.am/2uLFe/atsumaru_1116.log
21:07:25 tiki.video is shutting down tomorrow https://techcrunch.com/2023/06/11/tiki-india/
21:09:08 what
21:09:38 how many hours do we have?
21:09:51 eef this article was posted June 11 and we didn't know
21:10:37 Oof
21:10:39 and we didn't know about this either https://techcrunch.com/2023/02/24/xiaomi-zili-app-shutdown/
21:11:39 rewby: i hope you are still around
21:11:43 we have a deadline for tomorrow
21:11:52 can someone please figure out how many hours we have left given timezones?
21:12:07 21 hours
21:12:08 > We regret to inform you that Tiki will be shutting down its operations. As of 11.59 PM India time, June 27, 2023, all Tiki functions and services will cease
21:12:10 oh crap
21:12:12 alright
21:12:33 rewby: we need an urgent target
21:12:38 archiveteam_tiki_
21:12:39 tiki_
21:12:42 Archive Team Tiki:
21:13:50 let's make a channel
21:13:54 i have no ideas
21:14:02 we'll get at least metadata
21:14:31 tzt: how did you find out about this? i wonder how we can better monitor for this in the future
21:15:54 arkiver: shutting down OR "closing down" AND site OR service OR server after:2023-05-30
21:16:21 tiki'd off ?
21:16:56 Ah, India Time, one of those lovely half-hour-offset-because-fuck-you-that's-why time zones.
21:17:21 2023-06-27 18:29 UTC
21:17:27 oh boy
21:17:34 tikingbomb
21:20:53 * arkiver is rushing something together
21:21:12 blegh much of this is POST requests
21:21:17 i've thrown some stuffs into ab
21:21:34 it aint much, but it's honest work
21:22:51 I have a German feces and football pun, but that's not a good idea. :-P
21:23:24 I like tikingbomb.
21:23:43 JAA: please do tell
21:23:49 tiki-kacka
21:23:55 google says lei'dback
21:23:57 lol
21:24:08 JAA: i'm for tiki-kacka given this SHITTY timing
21:24:14 lol
21:24:15 ticking shit bomb
21:24:16 arkiver: interesting thing that ab spotted: a load of videos can be found via sitemap
21:24:29 Barto: very nice!
21:24:31 I mean, I'm fine with that. :-P
21:24:33 wait what’s the football part in this?
21:24:39 #tiki-kacka
21:24:58 manu|m: Tiki Taka is a football play strategy thingy. The Spanish national team is well-known for it.
21:25:23 also titicaca lake is a thing
21:25:31 almost sounds the same
21:27:00 JAA: oh okay i just don’t know enough about football then, i can live with that. thought for a moment I was stupid or sth ;)
21:27:34 🏈 this kind, right? ;)
21:27:41 :D
21:28:00 No, football, not handegg. :-)
21:28:28 :3
21:28:35 ⚽ there ya go
21:46:24 datechnoman: we might need you at #tiki-kacka
21:53:48 arkiver - Joining channel now
21:56:25 thuban: my new commit b2cf0ff fixes game 262 1116 5533. going to deal with 4078.
21:59:49 yts98: thanks! 4078 is the same error, the other one was 5533, 1695, and now 2755
22:00:27 (did you mean 5047?)
22:03:14 thuban: 1695 5047 also work now
22:03:27 JustAnotherArchivist created Tiki (+471, Created page with "{{Infobox project | URL =…): https://wiki.archiveteam.org/?title=Tiki
22:03:28 JustAnotherArchivist edited Deathwatch (+20, /* 2023 */ Add Tiki): https://wiki.archiveteam.org/?diff=50017&oldid=50005
22:05:48 yts98: i assume i should keep my local copy of data/MV_plugins.json, but what about data/iterate/urls_gm*.txt ?
22:08:06 nvm, appear to be identical
22:08:08 I thought the url list data/iterate/urls_gm*.txt could be sent to operators trusted by IA, but you can freely delete them because urls can be derived form warcs
22:08:33 do those urls work without the session cookie?
22:08:36 s/form/from
22:09:09 no, they must be fetched with session cookie
22:09:38 and I wonder whether the session cookie is binding with IP
22:52:58 yts98: https://transfer.archivete.am/L5IEb/atsumaru_703.log
23:08:13 thuban: commit 2e4fc6c fixes 703.
23:10:03 there is new step6-ER_iterate_.py that can be run in parallel for 299 RM2000/RM2003 games.
23:15:52 oh no, data/tmp conflicts. I'll fix it.
23:45:48 wget url list file conflicts fixed (for Atsumaru).
23:47:38 i'm elbow deep in the storage thing, so you probably want to run ER yourself
23:58:04 thuban: got it. start running ER and start analyzing RMMZ.
23:59:48 Ryz edited Deathwatch (+104, /* 2023 */ Add Microsoft Language Portal): https://wiki.archiveteam.org/?diff=50018&oldid=50017
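
The cookie-gated fetching discussed above (a URL list that only resolves with the session cookie, fetched by wget into a WARC) could look roughly like the sketch below. It is not the actual logic of the game-atsumaru-discovery scripts, which the log does not show. The function name build_wget_command, the file names urls_gm191.txt and cookies.txt, and the WARC prefix are hypothetical; only the wget options themselves are standard.

```python
import shlex
from pathlib import Path
from typing import List


def build_wget_command(url_list: Path, cookie_file: Path, warc_prefix: str,
                       wait_seconds: int = 1) -> List[str]:
    """Build a wget argument list that fetches every URL in url_list using
    the cookies in cookie_file and records all traffic in a WARC file.

    Assumption: cookie_file is a Netscape-format cookies.txt that already
    contains a valid session cookie; how to obtain one is out of scope here.
    """
    return [
        "wget",
        f"--input-file={url_list}",       # one URL per line
        f"--load-cookies={cookie_file}",  # send the session cookie with each request
        f"--warc-file={warc_prefix}",     # wget appends the .warc.gz extension
        "--warc-cdx",                     # also write a CDX index next to the WARC
        f"--wait={wait_seconds}",         # pause between requests
        "--tries=3",
        "--delete-after",                 # keep only the WARC, not the loose files
        "--no-verbose",
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_wget_command(Path("urls_gm191.txt"), Path("cookies.txt"), "atsumaru_gm191")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=False)
```

Building the command as a list keeps it easy to run one wget process per game id in parallel. A non-zero wget exit status is expected when some resources return 404, as noted for game 191, so it should not be treated as fatal.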