00:39:52 TheTechRobo edited Post News (+146): https://wiki.archiveteam.org/?diff=53484&oldid=52312
03:03:01 ah, if it was a private beta then that's fair
05:11:50 Entire Independent Board of Directors of 23andMe Resigns https://investors.23andme.com/news-releases/news-release-details/independent-directors-23andme-resign-board/ https://news.ycombinator.com/item?id=41573034
06:42:08 Barto JAA - re mastodon JS, a wrapper around zygolophodon can do small parts of it (but not a whole site I think) https://github.com/jwilk/zygolophodon https://paste.debian.net/hidden/622ccf51/
06:43:18 pabs: Embeds are beginning to require JS as well.
06:43:37 hmm, got an example?
06:43:43 mastodon.social
06:43:50 It's this PR, I think: https://github.com/mastodon/mastodon/pull/31766
06:43:57 this works without JS https://mozilla.social/@mozilla/113153943609185249/embed
06:44:13 Yes, they're probably not running the bleeding edge.
06:45:06 The PR was only merged 6 days ago and isn't in a release yet. But mastodon.social runs it already, it seems.
06:45:45 It looks like there'll be a new release soon, and then it'll spread to most instances quickly.
06:46:29 crap
06:47:17 hmm, zygolophodon does still work with mastodon.social. maybe I can modify it to output API URLs instead
06:48:54 Rewriting the URLs should be trivial.
06:56:13 aha, it has --debug-http already
06:57:42 does 2 requests for individual posts: /api/v1/statuses/113082066860765988 /api/v1/statuses/113082066860765988/context
06:59:14 and 3 for users: /api/v1/accounts/lookup?acct=mozilla /api/v1/accounts/110306602663312748/statuses?pinned=true /api/v1/accounts/110306602663312748/statuses?exclude_replies=true&limit=40
06:59:24 (plus pagination I guess)
08:32:02 I wonder if it would be worth writing a thing that scrapes the raw ap apis instead of trying to go through the js ui
08:32:26 it would violate the "preserve original content" rules though
08:42:48 monoxane: one potential problem is that some instances require authorized fetches, which would require the scraper to have an instance. (btw, that also means that it would be possible to prevent scraping, which is both a good and a bad thing)
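The request sequence logged above via --debug-http maps onto a small script. A minimal sketch, not zygolophodon's own code: the instance, status ID, and endpoints are the ones quoted in the log, while the function names and the use of the requests library are assumptions made here for illustration.

```python
#!/usr/bin/env python3
# Sketch of the API request sequence quoted above: two requests for a single
# post (status + context) and three-plus for an account (lookup, pinned,
# statuses with pagination). Instance and IDs are taken from the log.
import requests

INSTANCE = "https://mozilla.social"  # example instance from the log

def fetch_status(status_id):
    """One post and its thread (ancestors/descendants): two requests."""
    status = requests.get(f"{INSTANCE}/api/v1/statuses/{status_id}").json()
    context = requests.get(f"{INSTANCE}/api/v1/statuses/{status_id}/context").json()
    return status, context

def fetch_account(acct):
    """Account lookup, pinned posts, then the timeline with pagination."""
    account = requests.get(f"{INSTANCE}/api/v1/accounts/lookup",
                           params={"acct": acct}).json()
    pinned = requests.get(f"{INSTANCE}/api/v1/accounts/{account['id']}/statuses",
                          params={"pinned": "true"}).json()
    statuses = []
    url = f"{INSTANCE}/api/v1/accounts/{account['id']}/statuses"
    params = {"exclude_replies": "true", "limit": 40}
    while url:
        resp = requests.get(url, params=params)
        statuses.extend(resp.json())
        # Mastodon paginates via the Link header (rel="next").
        url = resp.links.get("next", {}).get("url")
        params = None  # the next URL already carries max_id etc.
    return account, pinned, statuses

if __name__ == "__main__":
    status, context = fetch_status("113082066860765988")
    print(status["url"], "-", len(context["descendants"]), "replies")
```

The "(plus pagination I guess)" part is the Link header loop: each statuses response links to the next page until none is returned.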
14:42:44 Would it be okay to make a "List of websites not captured correctly by the Wayback Machine" page on the wiki, like the exclusions page? Don't have many examples right now, but there are a few. Although I guess it can be somewhat worked around by using archivebot.
14:43:49 The one I was thinking of was this: https://www.electricsheep.co.jp/blog.php?id=431
14:43:58 HiccupJul: that sounds nice, i'm guessing final call would be with JAA ^
14:44:18 https://wiki.archiveteam.org/index.php/How_to_use_our_wiki this says to be bold but yeah I was wondering about his opinion
14:44:55 webpage is the blog of the Gimmick! (famous NES game) developer, has behind the scenes info and such. blog pages only load if you navigate to the main page first.
14:45:54 doesn't work through Save Page Now at the very least
14:47:15 i'm asking on #archivebot if someone can try it through archivebot
14:48:06 MihaiArchive1 edited WikiTeam (+3, /* Wiki dumps */): https://wiki.archiveteam.org/?diff=53486&oldid=53483
14:48:07 MihaiArchive1 edited Wikimedia Commons (+57): https://wiki.archiveteam.org/?diff=53487&oldid=49964
14:49:06 Awauwa edited Deathwatch (+198, added mozilla.social): https://wiki.archiveteam.org/?diff=53488&oldid=53463
15:14:37 HiccupJul_: How would you define 'correctly'?
15:15:15 good question
15:15:36 but ones that don't have any of the content, like in this case, should probably be recorded
15:16:02 The content not being displayed doesn't necessarily mean it wasn't captured though.
15:17:10 I know there are sites that can be captured, all the relevant data is captured, but then something breaks on playback. If you know the API URL, you can still get the content back.
15:18:46 The SPN 'just' does a MITM proxy to capture the network traffic. The WBM dynamically rewrites things, which sometimes breaks due to how the target site's JS is written.
15:19:16 huh
15:19:41 how can i check that for myself?
15:19:55 There's no generic way. It depends on the individual site.
15:20:22 You might be able to see something in the SPN output when using the submission form (rather than /save/URL).
15:21:04 I see that https://www.electricsheep.co.jp/blog.php?id=431 returns a message about requiring cookies, so that's different, I guess.
15:21:46 yeah i think its a server-side thing
15:22:03 ah i thought you meant the wayback machine api
15:22:13 POST requests frequently break, but the failure mode varies. For example, it might only generate one capture per hour, and the playback then doesn't load the correct data.
15:22:33 Ah, sorry, no, I mean the target site's.
15:23:31 yeah looking in chrome devtools network log, loading the page in incognito, i don't see the page content
15:23:39 so i think it is a server-side check of some kind
15:24:21 Yeah
15:24:36 maybe the wiki page should just list things like that which save page now can't handle, e.g. navigating to home page first. bit of an obscure requirement though
15:27:10 Yeah, I feel like there are too many different failure modes here to document them in a sensible manner. Maybe a list of those failure modes could be useful though.
15:32:12 And then we can add a couple examples to each failure mode.
15:41:06 side question: is there a way to view the metadata of IA items (like https://archive.org/metadata/whatever) after the item is taken down?
15:49:40 HiccupJul_: no
15:52:33 ah, didn't think so. do you know if there's any third party backup of that metadata being made?
16:09:04 i dont think so
16:55:18 monoxane - Grabbing the API results via AB wouldn't violate a "preserve original content" rule. It's not ideal and wouldn't be easy to browse, but it's not making up or modifying content
17:57:16 (Faking HTML pages using the API data would however be bad.)
19:39:21 JAA: im assuming that adding additional js to make the contents readable would still violate the rule, right?
19:40:46 Naturally
19:40:52 Any modification at all does.
19:41:18 However, you could have an external page that fetches the API response from the WBM and renders it however you like.
19:41:47 ^ and then capture that in the WBM! :P
19:45:51 Why yes, I've done that before (due to CORS). :-D
19:46:23 Well, not quite that, but same principle: https://web.archive.org/web/20211001003631id_/https://ia801403.us.archive.org/33/items/picosong.com_finder/index.html
22:31:50 magmaus3: Depends on whether it's in the WARC or not. If you're modifying the WARC record, not allowed. But the Wayback Machine adding special code to fix the page would be fine.
22:32:25 Right, yes, but we have no influence over that.
22:36:46 TheTechRobo: good to know :3
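The 19:41 messages describe rendering an archived API response outside the Wayback Machine rather than modifying the capture. A minimal sketch of that idea as a local script rather than a web page: the Mastodon API URL below is illustrative (it may not actually be archived), the CDX lookup is a generic way to find the newest capture, and the "id_" flag serves the record unmodified.

```python
#!/usr/bin/env python3
# Find the newest Wayback Machine capture of an API URL via the CDX API,
# fetch it raw ("id_" = no WBM rewriting), and render it however you like,
# without touching the archived record itself.
import requests

API_URL = "https://mozilla.social/api/v1/statuses/113082066860765988"  # illustrative

# CDX API: ask for the most recent 200 capture of the URL (limit=-1 = last one).
cdx = requests.get("https://web.archive.org/cdx/search/cdx", params={
    "url": API_URL, "output": "json", "limit": "-1", "filter": "statuscode:200",
}).json()
if len(cdx) < 2:  # first row is the header; no data rows means no capture
    raise SystemExit("no capture of that URL in the Wayback Machine")
timestamp = cdx[-1][1]  # rows are [urlkey, timestamp, original, mimetype, ...]

# "id_" returns the capture byte-for-byte, so the JSON parses as-is.
raw = requests.get(f"https://web.archive.org/web/{timestamp}id_/{API_URL}").json()
print(raw["account"]["acct"], "-", raw["created_at"])
print(raw["content"])  # the post's HTML, to be rendered by the external page
```

Done from a static page instead of a script, this is the same principle as the picosong finder linked at 19:46, with the rendering layer itself capturable in the WBM.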