00:00:18 JAABot edited CurrentWarriorProject (-2): https://wiki.archiveteam.org/?diff=50433&oldid=50306
00:45:32 fireonlive you still wanted me to hit the polycom domains hard?
00:45:53 pls :)
00:49:39 fireonlive looks like some of them 403 with archivebot. If you wanna babysit the dashboard I can throw them in as fast as possible or I can put them on the backburner til I have time to better monitor it all
00:54:18 have the x over atm sadly so it’ll have to burner at the back
00:54:20 but thanks :)
01:35:35 Ufarwisan edited Discord (+660, Rename /* Software */ to /* Self-archival */): https://wiki.archiveteam.org/?diff=50434&oldid=48752
02:00:39 TheTechRobo edited Discord (+11, /* Self-archival */ We should probably make the…): https://wiki.archiveteam.org/?diff=50435&oldid=50434
02:05:39 TheTechRobo edited Discord (+28, /* Self-archival */ More details about tools): https://wiki.archiveteam.org/?diff=50436&oldid=50435
02:06:39 TheTechRobo edited Discord (+125, Link my URL extractor): https://wiki.archiveteam.org/?diff=50437&oldid=50436
02:11:41 TheTechRobo edited Vanillo (+117, Appears to be back up, with content dating back…): https://wiki.archiveteam.org/?diff=50438&oldid=41059
02:15:41 TheTechRobo edited Wysp (-1, It is now offline): https://wiki.archiveteam.org/?diff=50439&oldid=50417
02:19:07 thuban: looks like I have to archive all 4 video qualities for the DASH .mpd to work
02:25:20 interesting
02:26:02 at least ffmpeg/mpv/etc try to read the first segment of *every available alt quality* before they even start playing
02:26:59 if the low quality segment 1 returns 404 then it says the video is corrupted and dies, even if you told it to play 1080p
02:48:08 unfortunately archiving all qualities means 10GB per episode ugh
02:52:17 do I archivebot?
02:53:02 what are you wanting to archivebot?
02:54:07 flashfire42: https://www.rtve.es/play/videos/grand-prix/ spanish TV game show
02:55:58 nicolas17: ArchiveBot can't save full television programmes (typically), if that is what you were hoping for.
02:56:25 systwi_: what exactly do you mean by "can't"? file size limit?
02:57:06 ArchiveBot's purpose is to save web pages and eventually make them available in https://web.archive.org/
02:57:11 https://wiki.archiveteam.org/index.php/ArchiveBot
02:58:00 earlier I asked "video is in DASH format, should I remux it to .mp4 and upload it as an item, or archive the .mpd and video segments in a WARC, or give archivebot a URL list and let it do that for me?" and thuban said "1 and 3 imho"
02:58:10 But if the URL to which you had linked were to be saved with ArchiveBot, it would try its best to save any web pages it can find.
02:59:39 All three sound good, but I think thuban has a good point, so I second it.
02:59:46 1 & 3.
03:00:06 https://transfer.archivete.am/inline/8x4IQ/6939444.txt this is what I planned to give to archivebot, not the web player :)
03:02:34 Looks good to me. Thank you for the list. I'll save it with ArchiveBot for you.
03:03:21 note having multiple video qualities it adds up to 10GB
03:04:32 ~10GB shouldn't be too problematic.
03:04:34 someone uploaded most or all of the old seasons (1996-2007) to YouTube, probably from personal VHS
03:05:20 Going the extra mile is nice. :-)
03:06:09 in fact, I searched for it on youtube to show someone, and that's where I discovered they were about to reboot it this year
03:28:40 systwi_: also, the web player loads 6939444_drm.mpd and gets a FairPlay or Widevine license to decrypt it
03:29:28 I asked a friend if he knew how to break widevine nowadays, and then I realized I could just ... remove the "_drm" part of the URL >.>
03:30:18 Haha, they store a decrypted version too? Lovely. :-P
03:32:07 I *hope* their paid content for RTVE Play+ subscribers is protected better than that
05:47:27 i hope it isn't
05:47:28 :D
05:47:34 🏴‍☠️
05:57:39 https://old.reddit.com/r/DataHoarder/comments/15k2fa4/what_data_do_you_think_is_at_risk_of_being/jv3sk36/
05:57:42 if only….
05:57:51 someone should mention AT there too :)
06:09:21 Hm
06:10:36 I'm kinda surprised that AI types don't toss around AT data as much as they seem to, like, pushshift
06:10:54 If that happens could put us at risk of being more aggressively blocked
06:26:37 maybe warcs are too difficult for them lol
07:18:35 JAA, did you ever hear back from uktrainsim?
08:44:43 close
08:55:59 Exorcism uploaded File:Isitnormal-logo.png: https://wiki.archiveteam.org/?title=File%3AIsitnormal-logo.png
08:56:00 Exorcism uploaded File:Isitnormal-screenshot.png: https://wiki.archiveteam.org/?title=File%3AIsitnormal-screenshot.png
08:56:59 Exorcism edited Is It Normal? (+65): https://wiki.archiveteam.org/?diff=50442&oldid=50429
10:13:13 Exorcism edited Enjin (+37): https://wiki.archiveteam.org/?diff=50443&oldid=49748
12:11:55 Canadian file host filegenie.com will shut down for undisclosed reasons on August 31; most of the links in its sitemap, including FAQ, are dead.
12:15:13 Filegenie's file URL format is http://wl.filegenie.com/~/ . Websites that contain still-active wl.filegenie.com links should be archived too.
12:23:45 that sounds difficult to do a comprehensive grab on :|
12:28:44 needs some search engine queries I guess
12:29:22 oh, no directory listings :(
12:30:17 flashfire42 seems to be on it already
12:35:06 Thats everything from bing anyway
12:42:38 does bing have a results limit like google/ddg do?
12:42:56 Yes
12:43:40 Alright I am looping on myself I am going back to be
12:44:03 ah, did you try the adding keywords trick from https://wiki.archiveteam.org/index.php/Site_exploration ?
12:44:56 No because my usual checked urls trick doesnt work on those pdfs because it tries to download them straight away instead of opening them in a web browser
12:49:15 Google/DDG don't find many URLs
15:42:14 TheTechRobo edited The WARC Ecosystem (+713, Add section for people who just want to view…): https://wiki.archiveteam.org/?diff=50444&oldid=50100
15:45:14 Farrukhali6177 edited CNET Forums (+33, /* Shutdown notice */): https://wiki.archiveteam.org/?diff=50445&oldid=48231
15:45:15 Ersatzteilehome edited Discourse (+64): https://wiki.archiveteam.org/?diff=50446&oldid=50234
15:45:16 Ufarwisan edited Discord (+9): https://wiki.archiveteam.org/?diff=50447&oldid=50437
15:45:17 Exorcism edited Deathwatch (+96): https://wiki.archiveteam.org/?diff=50448&oldid=50318
15:52:15 TheTechRobo edited Discord (+178, /* Self-archival */ Add source code licences): https://wiki.archiveteam.org/?diff=50449&oldid=50447
15:57:48 i love all the changes to the wiki lately :)
16:39:17 hi, i intend to shutdown my warrior for a system upgrade. however, it seems like it's stuck doing nothing useful (server returned bad response & nearly 16 elapsed job). could i force stop the warrior right now?
16:39:41 *nearly 16 hours elapsed job
16:40:26 jacksonchen666: It's fine
16:42:00 I think you have already got banned and the failed items in the warrior project should return to the backfeed
16:48:55 doesn
16:48:57 doesn
16:49:00 oops again
16:49:50 seems like my warrior is still trying for some reason, switched it to another project manually
17:13:31 TheTechRobo edited Twitch.tv (+642, #burnthetwitch: Add directory structure and caveat): https://wiki.archiveteam.org/?diff=50450&oldid=50418
20:46:56 thuban: I didn't even remember sending that email, but no, I didn't.
20:47:21 ouch
20:52:21 Any ideas for what to do with a site like http://www.ericbrasseur.org/? It does a JS challenge of some sort that sets a cookie, and then redirects to a different page. But the challenge seems to fail randomly sometimes too. It seems like useful content at least
23:39:10 pokechu22: IIRC JAA had a way to archive stuff that needs a cookie
23:39:29 JAA: did you end up getting the opensource.com cookie-requiring stuff btw?
23:47:02 any requests for archivebot focus today or just me going on with my ISP hosting stuff?
23:48:15 I'm doing some greek university stuff (for a school that I think was merged into a different one in 2019) but it's not super high priority
23:49:13 I can add site:teithessaly.gr to my tabs
23:50:06 Don't worry about it - the stuff I did was the only relevant cached stuff (the other domains are live)
23:50:18 Ah ok
23:50:20 all good
23:50:58 there's also some jank with teithessaly.gr and teilar.gr being the same site (I've already handled teilar.gr for the most part, currently checking subdomains)
23:56:22 JAA: it seems s3://origin.ka.cdn/ is entirely inaccessible now?
23:56:46 its CDNs too