04:29:36 .
04:32:32 MariaDB acquired https://www.prnewswire.com/news-releases/k1-acquires-mariadb-a-leading-database-software-company-and-appoints-new-ceo-302243508.html
04:42:43 immibis: there are several mirrors for sourceforge file releases, they redirect requests to those mirrors
04:44:00 immibis: rolling hashes, like modern backup systems such as restic/borg use, might be good for dealing with the redundancy stuff
06:43:22 nicolas17: you're trying to save on space archiving apple releases?
06:43:43 are you storing these at IA? what are you doing to save storage?
06:44:08 wondering because if it's stored at IA, maybe in this case (on this specifically) one doesn't have to save on space
07:19:34 pabs: good idea but it's not that simple, since compression such as .gz has a cascading effect: one different uncompressed bit changes the rest of the file
07:20:54 Or even the same input might result in different compression output.
07:21:19 i did some experiments like this and compressed ~400GB of minecraft mod-packs (a hopefully complete set from FTB) down to ~4GB. But you want to get the original files back at the end, especially if they are signed, and this means writing a reversible decompressor which compresses the file the same way it was compressed before.
07:24:38 original bit-for-bit identical files
08:28:42 Recent Tor Exit Node Operator Raids and Legal Harassment in Germany https://forum.torproject.org/t/tor-relays-artikel-5-e-v-another-police-raid-in-germany-general-assembly-on-sep-21st-2024/14533 https://news.ycombinator.com/item?id=41505009
08:30:18 "Artikel 5 e.V. is now calling for a general assembly on Sep 21st 2024. We are looking for new board members (who take over and organize a new registered address and keep running exits) or discuss ALL alternative options.
08:30:20 These options include "just stop running exits" or even the most drastic step of liquidating the entire organization and distribution of the remaining budget to other German organizations (that would have to qualify under our non-profit by-laws)."
10:09:16 "Recent Tor Exit Node Operator Raids and Legal Harassment in Germany" ← shouldn't the tor exit operators be counted as not responsible for the traffic? (like ISPs etc)
10:09:33 yes but most law enforcement are violent criminals, especially in germany
10:09:49 that's like everywhere
10:09:59 it's especially in germany
10:10:04 denazification never happened
10:10:10 This discussion stops now.
10:10:23 sure
10:10:27 IMO the biggest risk isn't that they'll raid your tor node, since that will be cleared by the court - it's that they could discover something else, illegal or not, when they go to raid your tor node.
10:10:37 yeah
10:10:54 why is it illegal to talk about legal threats to tor nodes here?
10:11:04 that was for a diff reason
10:11:04 it should be in -ot?
10:11:13 maybe
10:11:24 i think the issue was with the political discussion
10:11:29 Correct
10:11:40 tor node raids are politics
10:12:43 I've archived the public web stuff of Artikel 5 e.V. that I could find with AB.
10:12:57 cool :3
10:15:56 Artikel 5 e.V. is also a highly political organization
10:16:32 Which is completely irrelevant for this channel.
10:24:23 it seems A5eV would be well served by renting a clubhouse so that the business premises do not default to being at the chairman's home.
10:36:17 immibis: it's the "denazification never happened" stuff. that comment is not welcome here
10:36:48 that has been brought up before.
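The restic/borg-style rolling-hash idea and the gzip cascade mentioned above can both be shown in a few lines. A minimal illustrative sketch, not restic's or borg's actual chunker (those use buzhash/Rabin fingerprints); the data, window, and mask parameters here are made up:

```python
import hashlib
import zlib

def make_data() -> bytes:
    # varied but compressible stand-in for an uncompressed release file
    return b"".join(b"line %d: some log-ish content\n" % i for i in range(50_000))

def cdc_chunks(data: bytes, window: int = 48, mask: int = (1 << 13) - 1,
               min_size: int = 2_048) -> list[bytes]:
    """Cut data into content-defined chunks with a polynomial rolling hash.
    A boundary is declared wherever the hash over the last `window` bytes
    matches `mask`, so boundaries follow the content, not file offsets."""
    prime, mod = 31, (1 << 61) - 1
    pow_w = pow(prime, window, mod)
    out, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = (h * prime + byte) % mod
        if i >= window:
            h = (h - data[i - window] * pow_w) % mod  # drop byte leaving window
        if (h & mask) == mask and (i + 1 - start) >= min_size:
            out.append(data[start:i + 1])
            start = i + 1
    out.append(data[start:])
    return out

data_a = make_data()
flipped = bytearray(data_a)
flipped[100] ^= 0x01                       # flip a single bit near the start
data_b = bytes(flipped)

# (1) whole-file gzip: the compressed streams diverge early and stay different
za, zb = zlib.compress(data_a), zlib.compress(data_b)
first_diff = next((i for i, (x, y) in enumerate(zip(za, zb)) if x != y),
                  min(len(za), len(zb)))
print(f"gzip streams diverge at byte {first_diff} of {len(za)}")

# (2) content-defined chunks: only the chunk containing the flipped bit changes
ha = {hashlib.sha256(c).digest() for c in cdc_chunks(data_a)}
hb = {hashlib.sha256(c).digest() for c in cdc_chunks(data_b)}
print(f"{len(ha & hb)} of {len(hb)} chunks unchanged and dedupable")
```

This is also why getting bit-for-bit identical files back out, as described above, is the hard part: chunk-level dedup is lossless, but recompressing to the original .gz requires replaying the exact original compressor settings.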
10:38:24 it's part of labeling entire perceived "groups" of people as x, where x may be some word like "nazi"/"communist"/"fascist"/etc.
10:39:59 also when it is not meant literally, but rather in some symbolical way, it's not something Archive Team is the right place for.
11:26:30 immibis: it looks like they do have an office location, there are multiple companies registered at that same address (including ImmobilienScout24)
11:27:26 but they can still figure out home addresses of people involved in an e.V. (and apparently they did)
11:32:21 well the search warrant says business premises so whoever executed it belongs in prison for burglary then
11:33:33 a search warrant for location X doesn't give you a pass to burglarize location Y
11:34:04 *the search warrant published by Artikel 5 e.V.
11:34:16 there may be more
11:36:05 https://artikel5ev.de/home/hausdurchsuchung-am-fr-16-08-2024/
11:36:11 idk, this is all very confusing
11:37:35 on their website they write "the club doesn't have any dedicated space" and that's why their homes got searched, but they very clearly do have a dedicated space at Hatzper Str. 172B
11:38:40 make it make sense
11:43:40 one of the chairmen runs a company at that address, so maybe the e.V. was using a shared space
11:44:02 immibis: i'm pinging again though on what i wrote (and will leave it at that) - this has now happened a few times, i believe you have seen (or maybe not? let me know if not) the longer message i posted back then on this
11:45:30 i gather you want to ban me cause of what i said about search warrants. Fine. I'll stop all containers I may be running, delete their data without uploading it and leave IRC for good.
11:45:38 AT IRC, that is
11:46:03 Ot'
11:46:08 It's not about the warrants.
11:46:20 It's about the nazi comments.
11:47:33 you already wrote about me saying denazification never happened
11:47:53 that is about the symbolic nazi remarks - not search warrants
11:49:04 i do not have as a goal to ban you - i am hoping you would somewhat understand what i wrote. again, it's not about search warrant discussion, it's about symbolically labeling groups as "nazi"/"fascists"/etc.
11:49:27 for completeness, i will post the message i wrote some time ago about this https://transfer.archivete.am/inline/NzLSU/message.txt
11:53:43 (note the message contains some references to the context at the time, but i believe it is still clear and i stand behind it)
11:54:35 "especially now with everything going on" is a very timeless thing to say
11:57:01 for context see https://hackint.logs.kiska.pw/archiveteam-ot/20230520#c345976 and the previous day(s?)
12:03:50 "As a consequence, I am personally no longer willing to provide my personal address&office-space as registered address for our non-profit/NGO[...]" written by the chairman who has a company at that address
12:04:00 (at https://forum.torproject.org/t/tor-relays-artikel-5-e-v-another-police-raid-in-germany-general-assembly-on-sep-21st-2024/14533)
12:04:07 denazification is not identifying specific people as nazis, it is the removal of nazi ideology from general perception in the entire country of germany
12:04:27 so it seems the theory of "the e.V. only had a postbox at that address" tracks
12:05:25 i see that you want no politics not directly related to archive team, so the whole exit node raiding thing is not allowed, except for the statement that it happened and therefore this e.V.'s site could be at risk.
12:38:07 arkiver: I know of two people with a giant NAS at home with apple releases (including many that apple already deleted from their servers)
12:38:37 and there's so much redundant data...
12:39:39 and the files keep getting bigger and more numerous https://theapplewiki.com/wiki/Beta_Firmware/iPhone/17.x
13:42:19 nicolas17: are we actively archiving those apple CDN URLs into the wayback machine?
13:42:32 please feel free to, at least with ArchiveBot (CC JAA )
14:28:00 arkiver: I grabbed them all
14:32:20 corentin: as in, is that happening periodically?
14:46:14 arkiver: no no sorry, I mean I grabbed the ones in the URLs shared. There must be something "bigger" to do though
14:46:56 corentin: as in the ones on https://theapplewiki.com/wiki/Beta_Firmware/iPhone/17.x ?
14:47:05 nicolas17: were you archiving those for the long term?
15:01:59 what was that TLD again that got a warning?
15:02:16 for hosting too many spam or phishing addresses or something
15:02:58 probably one of the freenom ones
15:03:02 .tk et al
15:06:06 hmm yeah
15:06:22 maybe
15:30:30 arkiver: yes, sorry I should have been more clear!
15:53:14 alright!
15:56:02 well so to be clear, feel free to continue archiving this with ArchiveBot, it is well worth the size i think
16:07:55 corentin: note the wiki may have duplicates (XS and XS Max use the same files but have separate tables on the wiki)
16:08:27 if you just grabbed the URLs from the wiki and fed them to AB, I'm not sure if AB dedups
16:10:55 arkiver: I don't have the disk space to archive this long term myself :P but I'm helping people who do
16:11:04 and yeah I was planning on discussing how to archive this properly
16:11:15 how much space we talkin'?
16:11:28 an admin of theapplewiki started feeding some URLs to savepagenow and I told him that was probably not the best way
16:12:47 nicolas17: you can just feed lists into ArchiveBot!
16:12:50 IA definitely has the space for this
16:13:05 individual items on IA _next to that_ are also welcome, i can create a collection for you and others
16:13:24 https://archive.org/details/apple-ipsws already done that for some files that were already deleted from Apple's servers
16:14:01 yeah feel free to put everything in that collection and in ArchiveBot!
16:15:07 if any help is needed, don't hesitate to ping me :)
16:16:22 hmmmm I imported info from appledb into a SQLite database, and "select sum(file_size) from sourcefile where type='ipsw'" returns 48TB, which seems low, I wonder if my import was excluding something important... I last looked into it in June or so
16:17:12 nicolas17: i see on an item like https://archive.org/details/xcode-16.1-beta1 you included a `source` metadata field, is it possible to include that for ipsw items as well?
16:17:41 with the original URL? yes
16:17:57 yeah!
16:18:33 though there's some where the source will be "someone sent it to me and it seemed to be the right file but it has been gone from apple-cdn for 2 years now"
16:19:12 nicolas17: let's add a note to that, but feel free to include at your discretion
16:21:39 what does IPSW stand for? is the official name iPSW ? i can't actually easily find this info online...
16:22:40 afaik it was originally iPod Software Update, then iPhone Software Update with a different format
16:23:18 nicolas17: AB only dedupes identical URLs.
16:23:22 nowadays even macOS uses iPhone-like IPSW files
16:23:32 JAA: oh that's fine for this case
16:23:45 And even then, not on redirects.
16:23:53 It does no content dedupe at all.
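For reference, the appledb size check quoted above ("select sum(file_size) from sourcefile where type='ipsw'") is easy to script. A small sketch assuming that same schema (a sourcefile table with type and file_size-in-bytes columns); the database filename is hypothetical, and the per-type breakdown is one way to hunt for whatever the "48TB seems low" import might have excluded:

```python
import sqlite3

conn = sqlite3.connect("appledb.sqlite")  # hypothetical filename
(total,) = conn.execute(
    "SELECT SUM(file_size) FROM sourcefile WHERE type = 'ipsw'"
).fetchone()
print(f"total IPSW payload: {total / 1e12:.1f} TB")

# Break the sum down by type to spot a dropped category in the import.
for file_type, count, size in conn.execute(
    "SELECT type, COUNT(*), SUM(file_size) FROM sourcefile GROUP BY type"
):
    print(f"{file_type:>12}: {count:6d} files, {size / 1e12:.2f} TB")
```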
16:24:02 the wiki page has some URLs multiple times and I didn't know if corentin had dedup'd them, if AB dedups them that's enough
16:24:15 Ah
16:24:34 Last time we spoke about this, I think you said there were like 4 different URLs for each file.
16:24:59 for macOS InstallAssistant.pkg files, there's often 2 different URLs for each
16:26:01 someone uploaded WARCs of them as items to archive.org containing 4 copies, because they archived those two URLs, each on both http+https, with no content dedup :/
16:26:16 Ah right, that's the one you mentioned, yeah.
16:26:49 afaik WBM doesn't care about http+https so we would only need to archive 2 anyway
16:27:05 Correct, the WBM doesn't care about the scheme in general.
16:27:21 nicolas17: so would you say the official name nowadays is just IPSW? even apple or wikipedia don't clearly mention anything else
16:27:31 i guess it has so many meanings now, that it's just IPSW
16:27:45 arkiver: I think macOS Finder shows .ipsw files as "Apple software update" nowadays :P
16:30:38 that is annoying
16:31:11 Apple has many misnomers due to scope growth tbh
16:31:57 the sharingd daemon used to deal with AirDrop (sharing files wirelessly), now it handles most of the Continuity features, many of which have nothing to do with sharing
16:35:06 mail on mac: "Mail.app"; mail on iPhone: "MobileMail.app"; many MobileSomething names refer to iOS... then some features like MobileAssets get ported to macOS and nothing makes sense anymore
16:38:08 too bad that sometimes device manufacturers DMCA those archives off the net even though others are glad that those archives exist
16:38:57 (got a strike due to that crap already, luckily the IA version was just a secondary location, my personal copy where the state is kept of what i got already is not visible on the open web for obvious reasons)
16:40:00 betas used to be restricted to members of the developer program
16:40:11 later only beta 1 was restricted that way
16:41:02 last year I got a DMCA takedown not for re-hosting the 17.0b1 ipsws, but for *tweeting a link* to someone else's website that re-hosted them
16:41:49 the lawyers later withdrew the claim but I had already deleted the tweet myself by then *shrug*
16:43:16 mine was for the archival of the sena (motorcycle intercom) firmware files. Might have gotten a few files that they didn't want out (their update server is a bit of a leaky pipe, got a dumb monitoring of a few files into a git and that sometimes leaks stuff before release)
16:43:18 masterx244|m: yes :/
16:44:00 got a few 0.X.X versions that way, too
16:44:04 i do advise to keep your own copies of very important data, next to storing on IA, but in case of very large amounts of data that may not be practical
16:44:59 luckily it's pretty small, 20GB or so, and for coldstorage a 7zip solid compression gets it down to 300MB or so due to quite a few cross-file redundancies
16:45:37 I can't avoid feeling bad about duplication
16:45:39 and yeah, got my local copy still since i have some automatic crawling, the IA copy was generated by that tooling too, with some CSV upload magic
16:46:17 sometimes they had different language versions where the code part was identical, or different versions of code but the audio snippets for the menu were the same
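Since AB only dedupes exact URL matches and the WBM ignores the scheme, a wiki-derived list (the same file repeated across device tables, and under both http and https) can be collapsed before submission. A sketch; the URLs are made-up placeholders and preferring https is an editorial choice, not AB policy:

```python
from urllib.parse import urlsplit, urlunsplit

def dedupe_for_ab(urls: list[str]) -> list[str]:
    """Drop exact repeats and, because the Wayback Machine treats http and
    https as the same capture, keep only one scheme per URL."""
    seen, out = set(), []
    for url in urls:
        parts = urlsplit(url.strip())
        # scheme- and fragment-agnostic key: host + path + query only
        key = urlunsplit(("", parts.netloc.lower(), parts.path,
                          parts.query, ""))
        if key and key not in seen:
            seen.add(key)
            out.append(url.strip())
    return out

wiki_urls = [
    "http://updates.cdn-apple.com/example/iPhone_17.0.ipsw",   # hypothetical
    "https://updates.cdn-apple.com/example/iPhone_17.0.ipsw",  # same file
    "https://updates.cdn-apple.com/example/iPhone_17.0.ipsw",  # exact repeat
]
print(dedupe_for_ab(wiki_urls))  # -> a single entry
```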
16:46:28 not like "me and Siguza and qwerty and archive.org having the same file", that's good for redundancy
16:46:37 but about "archive.org having the same file 3 times in separate captures/items"
16:46:57 waaaasteful >_<
16:47:28 "the file is only 10GB it's no big deal" yeah but wasteful >_<
16:48:30 nicolas17: if it makes 50 TB go times 4, that is a big problem
16:48:37 URL agnostic deduplication would help
16:49:18 yeah, splitting the stuff into "headers" and payload and if a payload segment is == with another one, just storing a pointer would be enough
16:50:55 i believe AB deduplicates (?)
16:51:12 Wget-AT can be run with the 4 URLs as input and URL agnostic deduplication turned on and it will handle it
16:59:57 AB does not dedupe.
17:03:56 ah
17:04:33 JAA: i do remember some messages about that streaming over archivebot.com in the past about duplicate content - was it a feature in the past?
17:08:47 arkiver: There is a dupe detection, but that's for stopping recursion on identical responses. It doesn't do anything about the WARC writing. It's also been broken for, uh, over 8 years, since before I arrived here.
17:09:10 ah
17:09:24 thanks for clearing that up
17:14:05 How do I know when a blog has been archived by frogger?
17:16:39 I believe the bot has dedupe so you can probably just put it in and let the bot deal. That said, the only good way is to check if the Wayback machine contains the blog
17:18:54 masterx244|m: the WARC format supports that kind of deduplication (storing the request and response headers, and only a pointer to the previous response body), but archivebot doesn't use it
17:28:03 rewby, gotcha thanks
17:37:27 corentin: what did you do with theapplewiki? archivebot? I don't see the job running and I doubt it finished
17:47:46 nicolas17: no I did it myself
17:48:15 what does that mean
17:48:17 savepagenow?
17:48:39 or downloaded to your own disk? :P
17:49:22 Nulldata edited Deathwatch (+270, /* 2024 */ Added SteamRep (thanks PredatorIWD2)): https://wiki.archiveteam.org/?diff=53445&oldid=53444
17:49:40 I work at the Internet Archive, I write and maintain crawlers & crawls, I captured it with Zeno and I'll upload it at some point (when the upload process kicks off)
17:50:42 Neat, I've not checked up on how zeno's been coming along
17:50:52 corentin++
17:50:52 -eggdrop- [karma] 'corentin' now has 1 karma!
17:51:05 I remember trying to get heritrix to do stuff a few years ago and that was pain and suffering
17:51:09 Mostly because java
17:51:10 It is
17:51:19 It's why I wrote Zeno hahaha
17:51:27 Very understandable
17:52:09 Are there any docs on zeno or is it just "look at the code and figure it out"?
17:52:50 We've had some huge work done on it recently to try and address long-standing stability issues, because for a couple of years I was the only dev on it and so I was mostly using it for "experimental" crawls. Note: the WARC writing itself is very well tested and stable, I'm just talking about the crawling part. Anyway, a lot of work from me and a couple of colleagues on it recently to get it way more stable, and more expandable / solid for the future features we'll add.
17:53:24 Sadly for now, no documentation. :) But --help will help you. ./Zeno get url https://google.com, ./Zeno get list list.txt... and -h for all the options
17:53:26 Neat. I might have a go at it Later TM and see how it works
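The revisit mechanism described above (keep the new response's headers, point the payload at an earlier capture) looks roughly like this with the warcio library. A sketch only, assuming warcio's create_revisit_record API; every URL, date, and digest below is a made-up placeholder:

```python
from warcio.warcwriter import WARCWriter
from warcio.statusandheaders import StatusAndHeaders

with open("dedup.warc.gz", "wb") as fh:
    writer = WARCWriter(fh, gzip=True)
    http_headers = StatusAndHeaders(
        "200 OK",
        [("Content-Type", "application/octet-stream")],
        protocol="HTTP/1.1",
    )
    # A revisit record stores the headers of the new response but, instead
    # of the payload, refers to an earlier capture with the same digest.
    record = writer.create_revisit_record(
        "https://swcdn.apple.com/content/example/InstallAssistant.pkg",
        digest="sha1:AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHH",
        refers_to_uri="https://swcdn.apple.com/content/example/InstallAssistant.pkg",
        refers_to_date="2024-09-01T00:00:00Z",
        http_headers=http_headers,
    )
    writer.write_record(record)
```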
17:53:39 I hope it will, if you see any weird behavior, any bug, please open an issue
17:54:01 same. might even be useful for grabs for one's own sanity. currently using grab-site for those sanity grabs
17:54:03 Just from looking at the readme, is crawl hq that couch db thing I recall from eons ago?
17:54:28 Not at all, Crawl HQ is an internal queuing system. Internal as in IA-internal.
17:55:03 At some point I'll write in the README that even if Zeno is fully OSS, it still has very IA-specific features sometimes, optional of course
17:55:14 Yeah makes sense
17:55:33 ooo it has an api
17:55:35 Interesting
17:55:37 It's also opinionated, there are choices that are made so that it fits our usage more than anything else
17:55:48 That's very fair
17:55:49 Well.. the API is mostly reading, nothing more yet
17:55:52 Ah
17:56:00 But yeah of course I thought about like, adding URLs via the API etc
17:56:06 so many possibilities
17:56:18 Is there a spec for crawlhq's api somewhere? Might be interesting to do an alt implementation of that to coordinate a small set of zeno instances.
17:56:20 any PR is welcome btw, there is so much to do
17:56:35 https://git.archive.org/wb/gocrawlhq this should be public
17:56:43 It's not an API doc per se
17:56:51 but it should be enough for a smart man to understand the endpoints
17:56:54 Oh cool it supports headless browsers
17:57:11 (I'm having a read through your cmd/ directory)
17:57:29 Well... no, it's very experimental. There is actually a PR open for that. (idk why the --headless option made it to the --help)
17:57:42 Goal with that PR is to bring the capability of doing mixed crawls
17:57:48 where headless is only used on some domains
17:57:53 it's like 80% done
17:58:00 I'll get back to it at some point haha
17:58:52 about HQ, there is actually someone who wrote his own HQ-compliant system that uses like MongoDB or whatever, just to interact with Zeno haha
17:58:59 How do you deal with TLS in the headless browser case? Since MITM proxying is required to get a correct WARC capture, and I've heard that TLS config on headless browsers is a mild pain.
17:59:02 it's not open source tho I think
17:59:41 can we move this to #archiveteam-dev or #archiveteam-ot ?
17:59:47 corentin: what's the state of deduplication in zeno? :P
17:59:59 I'll answer you both in ot or dev
18:00:06 -dev sounds fine.
18:00:12 arkiver: Sorry <3
18:00:19 no worries, thanks :)
18:03:51 JAA: https://transfer.archivete.am/inline/SYFTj/Screenshot_20240911_150109.png
18:04:10 nobody expects archiveteam scale :D
18:05:17 lol
18:06:14 you should share the telegram numbers :-P
18:06:33 Or reddit or #//
18:06:52 Enjoy me some 8PiB of urls
18:38:34 https://www.pcmag.com/news/wix-to-block-russian-users-take-down-their-sites-in-wake-of-us-sanctions
18:52:06 Hi, the deadline would be sept 12
18:52:07 https://www.bleepingcomputer.com/news/legal/wix-to-block-russian-users-starting-september-12/
18:53:15 oof
18:53:21 today?
18:53:33 or no, tomorrow, but yeah
18:53:34 how do we even find affected sites?
18:54:06 Some discussion on this in #archivebot too
18:54:26 "02:46 PM <@JAA> nyuuzyou shared a list of 167 presumably Russian Wix sites earlier. Needs a bit of cleanup. I was going to run it as !a <, but maybe separate jobs are better, not sure."
18:54:30 nicolas17: Дизайн этого сайта создан в конструкторе site:*.wixsite.com (Russian for "The design of this site was created in the builder", used here as a search query)
19:25:24 I'm running https://transfer.archivete.am/inline/JRZXk/wixsite.com_russian_sites.txt which was obtained that way (though I also did a -site:wixsite.com search, which found a few results), apart from https://transfer.archivete.am/inline/55K2g/wix.txt which was sent by someone else and I don't know how they generated it
19:26:24 Finding more sites would be difficult because wix free sites are deliberately annoying, and while https://woodland64.wixsite.com/mysite works, https://woodland64.wixsite.com/ and https://woodland64.wixsite.com/sitemap.xml are 404s
19:27:45 JAA: Slight warning, GamersNexus just posted a news video calling for "datahoarders to pull down some of that" (anandtech) so we may see a bunch of people joining and asking.
19:30:10 nicolas17: :-)
19:30:15 rewby: Ack. No specific mention of us?
19:30:30 No, but you know how this goes
19:30:36 Yeah
19:30:59 How is that AB job doing anyway?
19:31:10 Put it in front in the #archiveteam topic, maybe it'll help.
19:31:14 I don't see it on the dashboard but the wiki says in progress
19:31:47 I do see a forum job going
19:31:48 The main site job finished, the forums are still going.
19:32:00 I'll update the wiki
19:32:06 No idea whether it's complete or there are complications.
19:32:13 Ah
19:32:21 We should probably check that first then
19:33:26 JAA: How do we check if the job was successful?
19:34:11 Browsing in the WBM, I guess, but not sure it's there yet. IA's upload processing is slow.
19:34:32 Or poking around on the site to find things that are problematic with JS disabled etc.
19:35:03 someone could maybe contact GamersNexus and let them know
19:35:35 Lemme see if we uploaded that WARC yet
19:36:44 The WARCs are all uploaded.
19:38:35 I think the last item(s) might still be deriving.
19:38:54 And then there's another slight delay between derives and it showing up in the WBM, at least sometimes.
19:39:23 It's starting to show up on wbm
19:40:17 No surprises there, the first WARCs were uploaded over a week ago.
19:40:27 The IA's search is just useless as per usual
19:41:57 https://archive.fart.website/archivebot/viewer/job/20240901213047bvqa8
19:42:37 Images (that were linked rather than embedded) were run in a separate job.
19:43:12 https://archive.fart.website/archivebot/viewer/job/202409092003491pjfi
19:43:19 That's definitely not in the WBM yet.
19:45:34 Did we ever reenable auto upload?
19:46:53 If not, we should
20:04:40 okay...
20:04:58 (Not yet)
20:07:10 https://transfer.archivete.am/inline/pXoXh/swcdn.apple.com-missing.txt these files still exist on Apple's CDN, and are not on WBM; Safari is ~180MB, BridgeOS is ~500MB, InstallAssistant is ~12GB
20:07:20 I assume this is a bad time to AB due to the upload problems
20:08:29 (and emergency Russian Wix jobs)
20:18:28 I'm now looking at those that *are* on WBM to see whether they are actually failed captures
20:20:56 nicolas17: the upload problems still exist?
20:21:07 idk, #archivebot topic says so
20:24:40 It'll be fine tomorrow or so
20:24:45 AB uploads are still being done by JAA, our mechanical turk, at the moment
20:24:54 01:49 PM <@eggdrop> <@JAA> Normal operations wrt uploads should probably resume either tonight or tomorrow anyway.
20:25:08 Yes.
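Checks like "are these swcdn URLs on the WBM yet" can be scripted against the Wayback Machine availability endpoint. A sketch; the filename matches the transfer.archivete.am paste above, and a miss may just mean a fresh capture hasn't been indexed yet:

```python
import json
import urllib.request
from urllib.parse import quote

def wbm_has(url: str) -> bool:
    """Ask the Wayback Machine availability API whether any capture of
    `url` exists. Coarse: it only reports the closest snapshot and can lag
    recent captures, so treat a negative as 'not yet', not 'never'."""
    api = "https://archive.org/wayback/available?url=" + quote(url, safe="")
    with urllib.request.urlopen(api, timeout=30) as resp:
        data = json.load(resp)
    return bool(data.get("archived_snapshots"))

# e.g. feed it the swcdn list from the paste above
for line in open("swcdn.apple.com-missing.txt"):
    url = line.strip()
    if url:
        print(("OK  " if wbm_has(url) else "MISS"), url)
```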
20:25:42 We've nearly cleared out 6T of backlog on atr3 so we have the iops again for AB
20:50:33 uhh ok I have an urgent one
20:51:02 https://swcdn.apple.com/content/downloads/02/18/052-96238-A_V534Q7DYXO/lj721dkb4wvu0l3ucuhqfjk7i5uwq1s8tz/InstallAssistant.pkg this URL works intermittently depending on which CDN server I hit, I guess it was deleted from their origin
20:51:24 not sure how to deal with that; AB it, and if it gets 403, try again?
20:51:44 Yeah, that'll probably work
20:54:03 (I probably have the file content but that won't get us a WARC)
20:54:22 nicolas17: I'm grabbing it with grab-site, working from there.
20:58:07 I guess there's also a chance that the cdn cache has a partial file, and then it will die halfway
20:59:40 Nope, download just finished without issues.
20:59:57 12407486945 bytes
21:00:19 repeatedly trying locally, in between a lot of 403s, I got some 200s that failed after ~15MB
21:03:43 "nobody expects archiveteam scale..." <- when AT goes to eleven it's really big hoses for pumping out the data....
21:04:12 (and the fun starts once the banhammers are flying and get dodged)
21:05:15 > Date: Wed, 11 Sep 2024 13:06:10 GMT
21:05:17 Interesting
21:05:33 I guess they cache that header, too.
21:05:46 lol
21:05:58 isn't that against spec?
21:06:31 masterx244|m: I remember the time we accidentally'd hetzner cloud's backbone
21:14:36 nicolas17: I'm actually not sure. It's the 'date and time at which the message was originated' per RFC 9110. It then references RFC 5322 about 'Internet Message Format (IMF)', whatever that is. Sounds email-like. And that specifically mentions:
21:14:45 > it is specifically not intended to convey the time that the message is actually transported, but rather the time at which the human or other creator of the message has put the message into its final form, ready for transport.
21:15:19 But is that even the HTTP response's final form if the CDN then updates the Age header and a bunch of other things?
21:15:29 ¯\_(ツ)_/¯
21:16:22 it doesn't matter, it's over 9000
21:16:31 :-)
21:16:38 also yeah it's the email Date header that it's referencing.
21:17:14 I would say that it should be the time of the original response, since that's the "message"; both email and HTTP assume the headers will have been modified along the way (i.e. adding Received, Return-Path)
21:17:18 TIL the official name of an email.
21:17:42 Hmm, yeah, that makes sense.
21:18:13 HTTP's analogies to MIME/email were wrong all along
21:18:14 of course there could be another argument that the spec is saying it should be the same as Last-Modified or perhaps file-birthtime :P
21:18:27 nicolas17: indeed
21:20:06 And then WARC was heavily based on HTTP, including all its flaws in the old RFC.
21:20:26 I wonder how many caches do it which way
21:20:49 JAA: imagine chunked encoding at the WARC-record level /o\
21:21:00 nicolas17: You mean segmented records?
21:21:16 Although at least they're not terminated by an empty record.
21:21:17 I regret my comment
21:21:28 Also I'm not sure any software out there properly supports them.
21:21:44 But yes, it is a thing. D:
21:22:41 There are some nice use cases, actually. Like splitting up huge responses into multiple WARCs. Or streaming to WARC without first buffering the full response.
21:23:00 It's just that nobody seems to support reading such data, so it's not used on writing either.
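For the intermittently-403ing InstallAssistant.pkg case, the "try again until a full 200 comes back" approach agreed above could look like this locally. Only an illustration: a real capture should go through AB/grab-site so a WARC is produced, and a 12 GB body should be streamed to disk rather than buffered as here:

```python
import http.client
import time
import urllib.request
from urllib.error import HTTPError

def fetch_flaky_cdn(url: str, attempts: int = 30, delay: float = 5.0) -> bytes:
    """Retry a URL that 403s or truncates depending on which CDN edge
    answers, until one complete 200 response comes back."""
    for i in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=60) as resp:
                expected = int(resp.headers.get("Content-Length", -1))
                body = resp.read()
            if expected in (-1, len(body)):
                return body  # complete (or no length to verify against)
            print(f"attempt {i}: short read, {len(body)}/{expected} bytes")
        except http.client.IncompleteRead as e:
            # the "200s that failed after ~15MB" case above
            print(f"attempt {i}: connection died mid-body ({len(e.partial)} bytes)")
        except HTTPError as e:
            if e.code != 403:
                raise  # only the 403s are expected to be transient here
            print(f"attempt {i}: 403 from this edge, retrying")
        time.sleep(delay)
    raise RuntimeError("no complete 200 response after all attempts")
```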
21:43:52 aiui Last-Modified is about the body, while Date is about generation of the message itself, regardless of what transformations a proxy/cache might further apply; e.g. at a different date there might be a different set of representations available at the origin, and so the content negotiation might play out differently; the Last-Modified value of any particular representation is independent of that
21:44:40 after all, other kinds of Internet messages would also get a Received header prepended or their Path header updated etc
21:45:32 I always thought Date was simply the current time of the server, used to compensate for clock skew when looking at Last-Modified and Expires and such
21:46:19 and so did the authors of httpsdate and probably most people in general
21:50:47 the question is how that works with caches (and reverse proxies in general) :)
21:51:43 the cache might not have the same clock skew as the origin after all
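The httpsdate-style use of Date for clock-skew estimation, together with the cache caveat just raised, in sketch form. The Age correction is one way to hedge when the Date header was generated at the origin some time ago, as in the cached swcdn response above:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
import urllib.request

def clock_skew(url: str) -> float:
    """Estimate the local clock's skew in seconds from a server's Date
    header. As discussed above, a cached Date reflects the origin's clock
    at generation time, not this hop's clock now, so subtract the
    response's Age (when the cache sends one) to compensate."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=30) as resp:
        server_then = parsedate_to_datetime(resp.headers["Date"])
        age = int(resp.headers.get("Age", 0))
    local_now = datetime.now(timezone.utc)
    # server's clock "now" is roughly Date + Age
    return (local_now - server_then).total_seconds() - age

print(f"skew vs example.org: {clock_skew('https://example.org/'):+.1f}s")
```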