00:39:52 TheTechRobo edited Post News (+146): https://wiki.archiveteam.org/?diff=53484&oldid=52312
03:03:01 ah, if it was a private beta then that's fair
05:11:50 Entire Independent Board of Directors of 23andMe Resigns https://investors.23andme.com/news-releases/news-release-details/independent-directors-23andme-resign-board/ https://news.ycombinator.com/item?id=41573034
06:42:08 Barto JAA - re mastodon JS, a wrapper around zygolophodon can do small parts of it (but not a whole site I think) https://github.com/jwilk/zygolophodon https://paste.debian.net/hidden/622ccf51/
06:43:18 pabs: Embeds are beginning to require JS as well.
06:43:37 hmm, got an example?
06:43:43 mastodon.social
06:43:50 It's this PR, I think: https://github.com/mastodon/mastodon/pull/31766
06:43:57 this works without JS https://mozilla.social/@mozilla/113153943609185249/embed
06:44:13 Yes, they're probably not running the bleeding edge.
06:45:06 The PR was only merged 6 days ago and isn't in a release yet. But mastodon.social runs it already, it seems.
06:45:45 It looks like there'll be a new release soon, and then it'll spread to most instances quickly.
06:46:29 crap
06:47:17 hmm, zygolophodon does still work with mastodon.social. maybe I can modify it to output API URLs instead
06:48:54 Rewriting the URLs should be trivial.
06:56:13 aha, it has --debug-http already
06:57:42 does 2 requests for individual posts: /api/v1/statuses/113082066860765988 /api/v1/statuses/113082066860765988/context
06:59:14 and 3 for users: /api/v1/accounts/lookup?acct=mozilla /api/v1/accounts/110306602663312748/statuses?pinned=true /api/v1/accounts/110306602663312748/statuses?exclude_replies=true&limit=40
06:59:24 (plus pagination I guess)
08:32:02 I wonder if it would be worth writing a thing that scrapes the raw ap apis instead of trying to go through the js ui
08:32:26 it would violate the "preserve original content" rules though
08:42:48 monoxane: one potential problem is that some instances require authorized fetches, which would require the scraper to have an instance. (btw, that also means that it would be possible to prevent scraping, which is both a good and a bad thing)
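The request sequence logged above via --debug-http maps onto a small script. A minimal sketch, not zygolophodon's own code: the instance, status ID, and endpoints are the ones quoted in the log, while the function names and the use of the requests library are assumptions made here for illustration.

```python
#!/usr/bin/env python3
# Sketch of the API request sequence quoted above: two requests for a single
# post (status + context) and three-plus for an account (lookup, pinned,
# statuses with pagination). Instance and IDs are taken from the log.
import requests

INSTANCE = "https://mozilla.social"  # example instance from the log

def fetch_status(status_id):
    """One post and its thread (ancestors/descendants): two requests."""
    status = requests.get(f"{INSTANCE}/api/v1/statuses/{status_id}").json()
    context = requests.get(f"{INSTANCE}/api/v1/statuses/{status_id}/context").json()
    return status, context

def fetch_account(acct):
    """Account lookup, pinned posts, then the timeline with pagination."""
    account = requests.get(f"{INSTANCE}/api/v1/accounts/lookup",
                           params={"acct": acct}).json()
    pinned = requests.get(f"{INSTANCE}/api/v1/accounts/{account['id']}/statuses",
                          params={"pinned": "true"}).json()
    statuses = []
    url = f"{INSTANCE}/api/v1/accounts/{account['id']}/statuses"
    params = {"exclude_replies": "true", "limit": 40}
    while url:
        resp = requests.get(url, params=params)
        statuses.extend(resp.json())
        # Mastodon paginates via the Link header (rel="next").
        url = resp.links.get("next", {}).get("url")
        params = None  # the next URL already carries max_id etc.
    return account, pinned, statuses

if __name__ == "__main__":
    status, context = fetch_status("113082066860765988")
    print(status["url"], "-", len(context["descendants"]), "replies")
```

The "(plus pagination I guess)" part is the Link header loop: each statuses response links to the next page until none is returned.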
14:42:44 Would it be okay to make a "List of websites not captured correctly by the Wayback Machine" page on the wiki, like the exclusions page? Don't have many examples right now, but there are a few. Although I guess it can be somewhat worked around by using archivebot.
14:43:49 The one I was thinking of was this: https://www.electricsheep.co.jp/blog.php?id=431
14:43:58 HiccupJul: that sounds nice, i'm guessing final call would be with JAA ^
14:44:18 https://wiki.archiveteam.org/index.php/How_to_use_our_wiki this says to be bold but yeah I was wondering about his opinion
14:44:55 webpage is the blog of the Gimmick! (famous NES game) developer, has behind the scenes info and such. blog pages only load if you navigate to the main page first.
14:45:54 doesn't work through Save Page Now at the very least
14:47:15 i'm asking on #archivebot if someone can try it through archivebot
14:48:06 MihaiArchive1 edited WikiTeam (+3, /* Wiki dumps */): https://wiki.archiveteam.org/?diff=53486&oldid=53483
14:48:07 MihaiArchive1 edited Wikimedia Commons (+57): https://wiki.archiveteam.org/?diff=53487&oldid=49964
14:49:06 Awauwa edited Deathwatch (+198, added mozilla.social): https://wiki.archiveteam.org/?diff=53488&oldid=53463
15:14:37 HiccupJul_: How would you define 'correctly'?
15:15:15 good question
15:15:36 but ones that don't have any of the content, like in this case, should probably be recorded
15:16:02 The content not being displayed doesn't necessarily mean it wasn't captured though.
15:17:10 I know there are sites that can be captured, all the relevant data is captured, but then something breaks on playback. If you know the API URL, you can still get the content back.
15:18:46 The SPN 'just' does a MITM proxy to capture the network traffic. The WBM dynamically rewrites things, which sometimes breaks due to how the target site's JS is written.
15:19:16 huh
15:19:41 how can i check that for myself?
15:19:55 There's no generic way. It depends on the individual site.
15:20:22 You might be able to see something in the SPN output when using the submission form (rather than /save/URL).
15:21:04 I see that https://www.electricsheep.co.jp/blog.php?id=431 returns a message about requiring cookies, so that's different, I guess.
15:21:46 yeah i think its a server-side thing
15:22:03 ah i thought you meant the wayback machine api
15:22:13 POST requests frequently break, but the failure mode varies. For example, it might only generate one capture per hour, and the playback then doesn't load the correct data.
15:22:33 Ah, sorry, no, I mean the target site's.
15:23:31 yeah looking in chrome devtools network log, loading the page in incognito, i don't see the page content
15:23:39 so i think it is a server-side check of some kind
15:24:21 Yeah
15:24:36 maybe the wiki page should just list things like that which save page now can't handle, e.g. navigating to home page first. bit of an obscure requirement though
15:27:10 Yeah, I feel like there are too many different failure modes here to document them in a sensible manner. Maybe a list of those failure modes could be useful though.
15:32:12 And then we can add a couple examples to each failure mode.
15:41:06 side question: is there a way to view the metadata of IA items (like https://archive.org/metadata/whatever) after the item is taken down?
15:49:40 HiccupJul_: no
15:52:33 ah, didn't think so. do you know if there's any third party backup of that metadata being made?
16:09:04 i dont think so
16:55:18 monoxane - Grabbing the API results via AB wouldn't violate a "preserve original content" rule. It's not ideal and wouldn't be easy to browse, but it's not making up or modifying content
17:57:16 (Faking HTML pages using the API data would however be bad.)
19:39:21 JAA: im assuming that adding additional js to make the contents readable would still violate the rule, right?
19:40:46 Naturally
19:40:52 Any modification at all does.
19:41:18 However, you could have an external page that fetches the API response from the WBM and renders it however you like.
19:41:47 ^ and then capture that in the WBM! :P
19:45:51 Why yes, I've done that before (due to CORS). :-D
19:46:23 Well, not quite that, but same principle: https://web.archive.org/web/20211001003631id_/https://ia801403.us.archive.org/33/items/picosong.com_finder/index.html
22:31:50 magmaus3: Depends on whether it's in the WARC or not. If you're modifying the WARC record, not allowed. But the Wayback Machine adding special code to fix the page would be fine.
22:32:25 Right, yes, but we have no influence over that.
22:36:46 TheTechRobo: good to know :3
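The 19:41 messages describe rendering an archived API response outside the Wayback Machine rather than modifying the capture. A minimal sketch of that idea as a local script rather than a web page: the Mastodon API URL below is illustrative (it may not actually be archived), the CDX lookup is a generic way to find the newest capture, and the "id_" flag serves the record unmodified.

```python
#!/usr/bin/env python3
# Find the newest Wayback Machine capture of an API URL via the CDX API,
# fetch it raw ("id_" = no WBM rewriting), and render it however you like,
# without touching the archived record itself.
import requests

API_URL = "https://mozilla.social/api/v1/statuses/113082066860765988"  # illustrative

# CDX API: ask for the most recent 200 capture of the URL (limit=-1 = last one).
cdx = requests.get("https://web.archive.org/cdx/search/cdx", params={
    "url": API_URL, "output": "json", "limit": "-1", "filter": "statuscode:200",
}).json()
if len(cdx) < 2:  # first row is the header; no data rows means no capture
    raise SystemExit("no capture of that URL in the Wayback Machine")
timestamp = cdx[-1][1]  # rows are [urlkey, timestamp, original, mimetype, ...]

# "id_" returns the capture byte-for-byte, so the JSON parses as-is.
raw = requests.get(f"https://web.archive.org/web/{timestamp}id_/{API_URL}").json()
print(raw["account"]["acct"], "-", raw["created_at"])
print(raw["content"])  # the post's HTML, to be rendered by the external page
```

Done from a static page instead of a script, this is the same principle as the picosong finder linked at 19:46, with the rendering layer itself capturable in the WBM.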