01:45:17 If anyone has an account at the Supercell forums, please get in touch with me. There is a lot of content that is apparently available only to registered users (but without further restrictions like manual approval or whatever). I'd like to archive that as well, although it probably won't go into the WBM.
07:16:02 @thuban on the rthk podcasts: a better source for metadata than podchaser is the wayback machine copies of the RSS, it has more links than the podchaser dump I posted earlier.
07:16:47 I think I saved away the RSS files somewhere, but I can't find them now. They should be pretty easy to get.
07:19:40 I threw all the links into jdownloader for open line open view and backchat from the RSS and my podchaser dumps, and have already downloaded it all, though I'm not sure what to do with it. I suspect there's still gaps for open line open view, but I've yet to check, and I don't think they're too big.
07:23:11 I was able to download stuff back to Jan 2014. It seems like they implemented their 1000 link limit only a couple years ago, so once you get there the RSS will span back to 2014.
07:23:17 There's RSS from before 2014, but it looks like they reorganized things and the links no longer work as-is.
07:24:34 The files may still be out there somewhere on a different server, but none of my guesses yielded anything.
07:35:47 my one reservation with getting metadata directly from the rss feed (i've been getting it from the rthk page for each episode) is that sometimes the descriptions are truncated--will have to poke around and see whether that's a problem in this case / whether i can get it from other sources (wbm copies, podcast scrapers)
07:35:57 got a link for some of the pre-2014 urls?
09:22:22 https://atdash.meo.ws/ requires a login now?
09:22:33 Uh... how do I sign up?
09:32:57 I guess it's just for core team members (sorry if you are one :) ). It got overloaded at some last project.
10:08:50 I've been told it's broken
12:22:45 "Now more than ever we need surveillance camera man. ... Unfortunately youtube has repeatedly deleted his videos over the years and we are currently in such a period" https://news.ycombinator.com/item?id=27904820
12:23:50 aw, none are available on youtube, only on vimeo
12:24:57 his channel is up though, could they be unlisted?
12:25:09 perhaps
12:25:25 I did read about them making lots of videos unlisted recently
12:26:47 something something "a fate worse than death" er, deletion
13:55:20 thuban: https://web.archive.org/web/20130822103939/https://podcast.rthk.hk/podcast/radio1_openline_openview.xml, http://podcast.rthk.org.hk/podcast/media/radio1_openline_openview/radio1_openline_openview_2013072217_1.mp3
13:59:15 Wouldn't the metadata from something like podchaser be derived from the rthk RSS, so it'd suffer the same truncation issues? Going to the article pages in the wayback machine seems like a good idea, if they exist.
14:17:53 oh and fyi I have 125GB of open line open view and 43GB of backchat mp3s/m4as. I don't have the most upload bandwidth/data cap, and there may be (not large) gaps. If it's needed I can upload somewhere. The RTHK archive server is not the fastest to download from.
15:49:53 Hah, we tried to archive SketchFab's URL shortener through URLTeam a long while ago. They weren't very happy at the time.
17:06:46 JAA: no, they were not...
17:09:11 also, OrIdow6 sure is broken
20:16:03 how would one accurately make WARC(s) of an entire site assuming it needs a login cookie?
20:16:13 since i posted in last channel before, ty JAA
20:16:30 https://github.com/ArchiveTeam/grab-site#website-requiring-login--cookies
20:16:46 accuracy, though... YMMV
20:26:05 ty; what's the best way to open a WARC, fwiw?
20:28:39 I find that https://replayweb.page/ works great, but there's quite a lot of tools: https://wiki.archiveteam.org/index.php/The_WARC_Ecosystem
20:31:42 pywb works pretty well for local playback.
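[Editor's note] For a quick look inside a WARC without setting up a replay tool, the record structure can be walked with the Python standard library alone. The following is only a minimal sketch, not how replayweb.page or pywb work internally: the function names (`iter_warc_records`, `open_warc`, `list_captured_urls`) are hypothetical, it assumes simple unfolded record headers, and it does not parse the HTTP response inside each record. For anything beyond listing what was captured, the tools linked in the log are the better choice.

```python
import gzip
import io


def iter_warc_records(stream):
    """Yield (headers, payload) for each record in an uncompressed WARC byte stream.

    Assumption: headers are plain "Name: value" lines without continuation lines.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank lines separate one record from the next
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # an empty line ends the record header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        # The payload is exactly Content-Length bytes long.
        length = int(headers.get("content-length", "0"))
        payload = stream.read(length)
        yield headers, payload


def open_warc(path):
    """Open a .warc or .warc.gz file as a binary stream.

    gzip.open reads multi-member gzip files transparently, which covers the
    common one-gzip-member-per-record layout.
    """
    if path.endswith(".gz"):
        return gzip.open(path, "rb")
    return open(path, "rb")


def list_captured_urls(stream):
    """Return the target URIs of all 'response' records, in file order."""
    return [
        headers["warc-target-uri"]
        for headers, _ in iter_warc_records(stream)
        if headers.get("warc-type") == "response" and "warc-target-uri" in headers
    ]
```

With a real file this would be used as `list_captured_urls(open_warc("example.warc.gz"))`, where the filename is a placeholder. Some writers wrap the target URI in angle brackets, which this sketch leaves untouched.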
20:34:54 AK edited Hong Kong media (+62, Jobs in progress and done): https://wiki.archiveteam.org/?diff=47000&oldid=46993
21:06:23 somerando3: tried some guesses of my own, no luck either :(
21:06:33 as for the truncation issue, probably, yes, but it's worth checking out. unfortunately i don't think we're going to have much luck with the episode pages--rthk's current setup seems to take them down as soon as they scroll off the 1000-ep backlog, wbm coverage is super spotty (~50 results for this podcast), and pre-2017 they may not have existed at all
21:08:48 i have also downloaded stuff and will get it on ia eventually, but i can ping you if i find any issues
21:14:17 did viewing archived tweets on Wayback break?
21:18:29 I believe some old tweets on twitter had bad captures?