-
Hans5958
Freenom is a shitter
-
HP_Archivist
How would one go about archiving this page, which hosts what appear to be scanned magazine or article pages but only allows downloading one file at a time rather than all at once?
-
joepie91|m
I don't suppose someone here knows a way to stash 500TB of historical package builds for cheap? ref.
discourse.nixos.org/t/the-nixos-fou…sts-require-community-support/28672
-
icedice
joepie91|m: 4 x 150TB Hetzner dedis resold by hostingby.design (formerly walkerservers/seedbox.io):
hostingby.design/dedi-hetz
-
joepie91|m
hm, what are they offering beyond what hetzner offers?
-
icedice
4 x 164,95€/month = 659,80€/month
-
icedice
Cheaper prices
-
joepie91|m
how's that work o_O
-
icedice
Volume discounts, I think
-
icedice
The owner is solid though
-
icedice
walkerservers/Seedbox.io has a great reputation
-
joepie91|m
mm
-
» joepie91|m will keep it in mind
-
icedice
-
icedice
^ For comparison
-
icedice
Hetzner charges 247,28€/month for 160TB
-
joepie91|m
right
-
icedice
And 404,36€/month for 224TB
-
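For a rough per-terabyte comparison of the figures quoted above (a quick sketch over the prices mentioned in this conversation, nothing more):

```python
# Monthly price in EUR and raw capacity in TB, as quoted in the chat.
offers = {
    "hostingby.design 150TB": (164.95, 150),
    "Hetzner 160TB": (247.28, 160),
    "Hetzner 224TB": (404.36, 224),
}

for name, (eur_month, tb) in offers.items():
    print(f"{name}: {eur_month / tb:.2f} EUR/TB/month")
```

which works out to roughly 1.10, 1.55, and 1.81 EUR/TB/month respectively.
-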
icedice
Hmm
-
icedice
Seeing as the specs don't match, 150TB vs 160TB, I guess Hostingby.design co-locates with Hetzner and builds their own servers?
-
icedice
Not sure about volume discounts at Hetzner btw, but I know they at the very least have volume discounts at Leaseweb NL
-
icedice
And that's why their Leaseweb dedis have that good pricing
-
icedice
They're also pretty privacy-friendly
-
joepie91|m
that's less of a concern for this case :)
-
icedice
Yeah, I know
-
icedice
I'm just saying
-
joepie91|m
right
-
icedice
They seem pretty nice
-
icedice
Normally with Hetzner I would recommend tunneling it through a reverse proxy at some more chill hosting provider like BuyVM, but I guess NixOS doesn't have any major trolls in their community
-
icedice
Since Hetzner will respond to bogus complaints made by anyone and take servers down for it
-
joepie91|m
I wouldn't expect trouble with that in this case yeah
-
icedice
Yeah, it'll probably be all right
-
icedice
Do GitHub and GitLab have any limits?
-
icedice
Since it's open source software it should be possible to host it there
-
icedice
Would be a pain in the ass to upload 500TB of releases there (assuming they'd allow it), but there might be some way to automate it?
-
icedice
There's also SourceForge
-
spirit
google drive *ducks and hides*
-
icedice
Anything over 100TB will require manual approval from Google
-
imer
spirit: not anymore unfortunately
-
imer
5TB/user
-
spirit
i know, i was kidding
-
spirit
sorry
-
imer
you're fine :)
-
joepie91|m
icedice: anything that might violate ToS is pretty much off the table unfortunately
-
imer
just dealing with that migration currently lol they send me two emails a day
-
joepie91|m
this includes things that are reliant on goodwill of interpretation :)
-
imer
"hey did you know you're over the storage limit?"
-
joepie91|m
(as that's what I'd like to get away from)
-
imer
joepie91|m: it really depends on how valuable the data is, i'd say colo a box (or two for redundancy) with a bunch of drives (raw you'd need like 28x18tb, so like 35x18tb with 8+2 raidz2 vdevs? - thats easily doable) if you can tolerate data => gone in the unlikely event of the dc exploding
-
imer
higher up-front cost, but then you're only paying for colo/the odd replacement drive
-
imer
of course only if there's sysadmin knowledge, probably a bad idea otherwise
-
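imer's back-of-the-envelope sizing can be sketched like this (rough math only: it ignores ZFS slop space, TB-vs-TiB, and filesystem overhead, and the helper is hypothetical, not from the original discussion):

```python
import math

def drives_for(raw_tb: float, drive_tb: int = 18,
               data_per_vdev: int = 8, parity: int = 2):
    """Drives needed to hold raw_tb using raidz2-style (data+parity) vdevs."""
    raw_drives = math.ceil(raw_tb / drive_tb)        # no redundancy at all
    usable_per_vdev = data_per_vdev * drive_tb       # e.g. 8 x 18 = 144 TB
    vdevs = math.ceil(raw_tb / usable_per_vdev)      # whole vdevs only
    total = vdevs * (data_per_vdev + parity)
    return raw_drives, vdevs, total

raw, vdevs, total = drives_for(500)
print(raw, vdevs, total)  # 28 raw drives; 4 vdevs of 10 -> 40 drives total
```

The chat's 35-drive figure corresponds to ~3.5 vdevs; rounding up to whole 8+2 vdevs pushes it to 40.
-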
joepie91|m
colo is not really an option for organizational constraint reasons
-
joepie91|m
(I wouldn't be the one managing it)
-
imer
yeah, makes sense
-
joepie91|m
it'll probably be possible to argue for some software-level maintenance but once it requires someone to show up somewhere physically in person it'll very likely not be considered as an option :p
-
hexa-
I have not read nixpkgs on my screen yet, but
-
hexa-
ah, there it is :D
-
FireFly
hehe
-
hexa-
should we at some point go the self-host route I would hope to scale horizontally, not vertically
-
hexa-
being able to do maintenance without really affecting anyone
-
hexa-
losing a machine, and not having to spend the night to fix shit
-
hexa-
and I remember that when I started using nixpkgs in 2019 it was said that the cache was between 200 and 250 TB
-
hexa-
so growth is another issue that hasn't been talked about much yet
-
icedice
<joepie91|m> icedice: anything that might violate ToS is pretty much off the table unfortunately
-
icedice
Yeah, it would have to be according to the ToS obviously, otherwise you put the data at risk
-
icedice
But I was wondering if GitHub/GitLab/SourceForge actually have any storage limits in place
-
icedice
Most projects probably don't take up that much storage space
-
hexa-
we only host the nixpkgs.git with github, and that is much smaller
-
hexa-
the 450TB is source tarballs that we likely can't get back anymore, and build results from the last 10y or so
-
nicolas17
hexa- joepie91|m: how do I download some files from there?
-
joepie91|m
nicolas17: huh?
-
nicolas17
I want to get some package builds to see what the data is like, but there's no file listing or anything like that :)
-
joepie91|m
ah
-
joepie91|m
that's uh, slightly nontrivial
-
joepie91|m
easiest way is to pick out something from hydra.nixos.org, look up the narinfo hash which I believe is in there, and go from there
-
joepie91|m
then cache.nixos.org/<narhash>.nar or .nar.xz I believe
-
joepie91|m
it contains references to other stuff
-
joepie91|m
generally speaking each build has a .nar (package metadata), .ls (internal file listing), and then a tarball containing the actual package
-
hexa-
actually .narinfo I think
-
joepie91|m
oh
-
joepie91|m
hm, right
-
joepie91|m
yeah please apply a "from memory" disclaimer to all of the above :p
-
hexa-
-
hexa-
that's firefox 113.0.2 from nixos-23.05
-
hexa-
it references URL: nar/17v5jakk2aj09njy4w5v5lmwqsnd17hqv6wyvphkwywj16w17b0b.nar.xz
-
hexa-
and that is the url below cache.nixos.org again
-
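The lookup joepie91 and hexa- describe — store-path hash, then `.narinfo`, then the nar URL it points at — can be sketched like this. The narinfo is a plain `Key: value` text file; the sample below is illustrative (the StorePath is a placeholder, the URL value is the one quoted in the chat above), and the actual HTTP fetch is left out:

```python
def parse_narinfo(text: str) -> dict:
    """Parse the simple 'Key: value' format served at cache.nixos.org/<hash>.narinfo."""
    fields = {}
    for line in text.splitlines():
        if ": " in line:
            key, value = line.split(": ", 1)
            fields[key] = value
    return fields

# Illustrative snippet only; a real narinfo has more fields (NarHash, References, Sig, ...).
sample = """\
StorePath: /nix/store/0000000000000000000000000000000a-example-1.0
URL: nar/17v5jakk2aj09njy4w5v5lmwqsnd17hqv6wyvphkwywj16w17b0b.nar.xz
Compression: xz
"""

info = parse_narinfo(sample)
print("https://cache.nixos.org/" + info["URL"])
```
-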
nicolas17
urgh
-
nicolas17
hexa-: that one is actually firefox debug symbols
-
nicolas17
which are internally compressed :/
-
» nicolas17 hunts for the actual binary
-
hexa-
oh, sorry :D
-
nicolas17
there we go, 08709hdrbqqba4l9zgsi0kqmlklznym1fl3n82s8i9rwga3bggnm.nar 11a4ymjgl6kpmw1yxggr4mvk3835md3wsdh7yhskxgbafczayhmb.nar
-
hexa-
95w0f7cvwrf195j2d83fplzcyykrnq9i
-
hexa-
is the one for firefox
-
JAA
500 TB, wow. All of Debian package history is like 150.
-
joepie91|m
-
joepie91|m
:p
-
JAA
Yeah, fair, lol
-
JAA
Are there any similar statistics for data size somewhere?
-
nicolas17
I think some deltaing or deduplication would be possible, but so far it seems it will need significant amounts of CPU and may not save enough disk :/
-
ehmry
I'd like to know if there is a way to drop the builds and keep the cached sources
-
ehmry
I suppose that might involve traversing the entire nixpkgs git history to find the tarballs
-
nicolas17
I can take the firefox-unwrapped-113.0.2 .nar.xz (60MB), apply a delta to it (15MB xdelta3, or 3.5MB bsdiff), compress it back with xz -6, and get a bit-identical result to the original firefox-unwrapped-113.0.1 .nar.xz
-
nicolas17
but the recompression takes 2 minutes so if you do it on the fly the user gets a 500KB/s download while the server is fully using a CPU core
-
nicolas17
missed a step while editing my message :P of course I decompress the .xz to apply the delta
-
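The bit-identical recompression nicolas17 describes hinges on the compressor being deterministic for fixed settings. A minimal sketch of that idea with Python's `lzma` (the binary-delta step itself — xdelta3/bsdiff — is an external tool and elided here; matching the `xz` CLI's output additionally requires matching its exact version and filter settings):

```python
import lzma

def compress_xz6(data: bytes) -> bytes:
    # Deterministic for a fixed liblzma version and fixed preset.
    return lzma.compress(data, format=lzma.FORMAT_XZ, preset=6)

original_nar = b"stand-in nar contents, not a real firefox build\n" * 4096
stored = compress_xz6(original_nar)

# Later: decompress, apply the binary delta (xdelta3/bsdiff, not shown),
# then recompress with the *same* settings to get the bit-identical .xz back.
roundtrip = compress_xz6(lzma.decompress(stored))
assert roundtrip == stored
```
-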
ehmry
I've done some experiments and EROFS images are good at being deduped at a block level. in EROFS runs of compression don't go longer than a fixed block size
-
JAA
icedice, joepie91|m: One note on Hetzner, you can usually get cheaper deals through the auction servers. As it happens, I just went over that the other day. At the time, they had a server with 15x10TB at €1.22/TB/mo with a raidz3-type setup.
-
ehmry
EROFS isn't designed as an archive format though
-
nicolas17
ehmry: block deduplication would be great if there's a tar or nar with some changed files and some identical ones
-
nicolas17
but firefox is mostly large amounts of code, and recompiling a binary with small code changes can cause lots of tiny changes all over the file
-
nicolas17
the two .nar.xz I tried add up to 121MB, decompressing them and storing them in borg (block deduplication + compressing the blocks with lzma) takes 112MB
-
nicolas17
storing one .nar.xz + the bsdiff takes 64MB, at significant CPU cost :/
-
nicolas17
ehmry: actually even with erofs it'd eat CPU since you have to recompress the .nar to get the bit-identical .nar.xz back
-
nicolas17
and *nothing* will be able to dedup/delta (erofs, borg, restic, xdelta3, bsdiff...) if you don't unxz first
-
nicolas17
hexa-: "we lose the historical data because AWS egress fees are enormous, have we considered using AWS Snowmobile?" lol did this guy not see the specs?
-
nicolas17
snowmobile is for transferring 100PB
-
nicolas17
AWS tells you "that will cost you $1M" and you say "wow for this much data that's so much cheaper than my alternatives"
-
immibis
then you use snowball which is smaller
-
nicolas17
exactly
-
immibis
(Snow Family is a stupid name, CMV)
-
immibis
(also what the fuck is the point of the products that put EC2 instances at your site)
-
nicolas17
snowmobile is absolutely overkill for this data, it fits in 2 snowballs
-
fireonlive
when they released the aws snowball i laughed for a solid 5 minutes
-
fireonlive
no reason of course
-
nicolas17
fireonlive: did you see the snowmobile announcement video?
-
immibis
snowball used to be just a big hard drive but now they let you use it for compute and they sell different mixes of compute and storage... why??
-
fireonlive
oh is that the semi trailer lmao
-
JAA
Shipping container of hard drives, yup.
-
nicolas17
fireonlive: yep
-
immibis
snowmobile is the semi trailer, snowball is a big briefcase, i think they added a mini one that is just an external hard drive
-
JAA
'Snowcone' for the small one.
-
nicolas17
immibis: the mini one has mini EC2 instances too
-
fireonlive
ah yes!
-
immibis
nicolas17: whyyyyyyyy
-
fireonlive
protip to execs: when naming things, pop it and close variants into Urban Dictionary
-
nicolas17
fireonlive:
youtube.com/watch?v=8vQmTZTq7nw when they rolled said semitrailer into the stage
-
fireonlive
'successful startups, given the way they collect data
-
fireonlive
<_<
-
fireonlive
ahhh right this lmao
-
nicolas17
unproven startups paying for data storage with investor money*
-
fireonlive
yeee
-
fireonlive
i do wonder the engineering behind the stage to support a semi
-
fireonlive
oh it's not on the stage
-
nicolas17
also I'm sure it's empty
-
fireonlive
ah ye true
-
nicolas17
that's an empty container, not full of 100PB disks :P
-
fireonlive
if there's one line that will always go up no matter what it's AWS' pricing :3
-
nicolas17
I actually think it's very rare that they raise prices
-
immibis
i feel like it would've been more entertaining if they'd had the truck come onto the stage without any fanfare, like just sneaking up behind the presenter while he's talking
-
JAA
Or if the container had randomly collapsed while given a pat or something. Oh wait, wrong company.
-
nicolas17
-
immibis
nicolas17: do you understand the point of the AWS on-premises stuff?
-
nicolas17
afaik most of cloud stuff is about moving things from the cap-ex budget line to the op-ex budget line
-
immibis
so basically corporate bullshit
-
immibis
can I get rich quick if I just buy servers for companies and have them pay monthly?
-
nicolas17
"it costs more in the long run" lol imagine caring about the long run
-
nicolas17
immibis: the number of VPS companies in existence seems to imply yes
-
joepie91|m
lol
-
joepie91|m
yeah that's just "being a hosting provider" really
-
joepie91|m
it's profitable if it's managed
-
nicolas17
but what do I know
-
nicolas17
they say running a company is like raising a child
-
nicolas17
and I want a vasectomy
-
joepie91|m
margins on unmanaged are razor thin though
-
immibis
nicolas17: they're providing flexibility, though - I mean literally just buy-now-pay-later for servers. Is the industry really that stupid?
-
immibis
(later = every month for the depreciation life of the server)
-
FireFly
the snow- naming doesn't make much sense, but.. I always found it a lil cute tbh
-
immibis
it's a snowball because it's cold storage, I guess
-
FireFly
I guess
-
immibis
or maybe they were thinking of continuously changing data and you are uploading a frozen snapshot
-
nicolas17
Backblaze's box is called Fireball
-
immibis
deezballs
-
immibis
nicolas17: what was the point of this link
youtube.com/watch?v=8_Xs8Ik0h1w&t=3225s
-
FireFly
that collage of icons just reminds me of the aws logo quiz
-
fireonlive
*hard drives just spill out of the back of the snowmobile*
-
fireonlive
hm, so much 'direct to inbox' phishing from "trustwallet" for me lately
-
fireonlive
cashin' google slippin'
-
hexa-
nicolas17: at this point in time a lot of weird ideas are being floated by random people
-
hexa-
people I have never heard of in the nixpkgs context
-
hexa-
imo the most viable paths are going with b2 or r2 and their solution to kill the egress fees
-
hexa-
b2 would mean we could keep fastly around
-
hexa-
not sure how that would work with cloudflare in the picture
-
hexa-
ultimately we should selfhost that shit for cost reasons
-
nicolas17
immibis: werner saying "we have a lot of services but it's your fault"
-
hexa-
hydra already sits at hetzner, so why shouldn't the cache as well?
-
hexa-
instead it sits at us-east-1 and we're pushing traffic over the fucking atlantic
-
hexa-
after receiving it from builders who are also located in locations like dallas or washington
-
nicolas17
I'm working on that deduplication stuff anyway, for Apple archival reasons :P
-
nicolas17
I know someone with upwards of 50TB of Apple updates stored in a NAS and it should be possible to shrink that number a *lot*
-
hexa-
the primary thing to find out is … who has tons of experience with an s3 compatible object store in that weight class
-
hexa-
-
hexa-
this guy is a lot smarter than me and always worth a read tbh
-
hexa-
think he talked about fastcdc
-
hexa-
> FastCDC: A Fast and Efficient Content-Defined Chunking Approach for Data Deduplication
-
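For reference, chunkers in the FastCDC family cut wherever a rolling "gear" hash hits a mask, so boundaries survive insertions far better than fixed-size blocks do. A toy sketch of the core loop (not the real FastCDC normalization logic):

```python
import random

rng = random.Random(0)
GEAR = [rng.getrandbits(32) for _ in range(256)]  # per-byte random table
MASK = (1 << 13) - 1                              # ~8 KiB average chunk size

def chunk(data: bytes, min_size: int = 2048, max_size: int = 65536):
    out, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= min_size and (h & MASK) == 0) or size >= max_size:
            out.append(data[start:i + 1])  # cut at a content-defined boundary
            start, h = i + 1, 0
    if start < len(data):
        out.append(data[start:])
    return out

data = bytes(rng.getrandbits(8) for _ in range(200_000))
parts = chunk(data)
```

As nicolas17 points out, this only pays off on decompressed data; chunking `.xz` streams directly finds almost nothing in common.
-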
nicolas17
yeah but again it depends on having decompressed files
-
joepie91|m
they definitely know what they're talking about, but they also definitely have a very... CDN sysop perspective :p
-
joepie91|m
like I mentioned in my reply, it's all good and well that you need specific infrastructure characteristics for optimal performance, but if we can't afford it then we can't afford it
-
joepie91|m
this is something that I often run into with people who work a lot with AWS-y stuff - their entire way of reasoning about infrastructure is super strongly focused on the exact technical properties provided by those specific systems, with kind of the assumption of unlimited budget and no apparent experience with how to get the most out of a shoestring budget
-
FireFly
the approach flokli mentioned experimenting with is interesting too (git-style tree/blob storage with deduplication, synthesizing nar files on demand)
-
nicolas17
and taking Firefox out of the deduplicated data store and compressing it to produce a file identical to the current .nar.xz has a 500KB/s output on my laptop...
-
joepie91|m
often presenting the 'perfect approach' as if it's the only acceptable approach
-
joepie91|m
(this is not specific to that user; it's something I encounter a lot)
-
hexa-
nicolas17: good point
-
hexa-
FireFly: tvix when
-
hexa-
sorry :D
-
nicolas17
for archival that may be good enough
-
hexa-
and they'd likely want something offering an s3 api again
-
nicolas17
but if you want decent latency then no
-
hexa-
so minio, garage, ceph, etc.
-
FireFly
hexa-: I mean yeah, that's the problem :p not as a replacement today, but maybe in the medium-longer term
-
FireFly
(tvix-store-based substituter I mean)
-
hexa-
yeah, I said what I think about what should happen next
-
hexa-
r2 or b2
-
hexa-
we need to leave s3
-
nicolas17
eg. my friend with the Apple archive often has his NAS turned off, so if he needs to wait for it to boot up and spin up the disks etc, he's already not particularly caring about TTFB :P
-
hexa-
even paying 32k USD once beats staying for 4 months
-
hexa-
hah, yeah :D
-
joepie91|m
hexa-: I think we can make the egress much lower
-
joepie91|m
by exploiting the lightsail loophole
-
joepie91|m
does require a bit of elbow grease
-
nicolas17
joepie91|m: that's explicitly in the ToS as "don't", but also doesn't it give you like 1TB free?
-
joepie91|m
oh, it is?
-
hexa-
both r2 and b2 sound like they waive the egress fees some way
-
joepie91|m
and well I was thinking of the $5/mo plan actually
-
joepie91|m
times N
-
hexa-
yeah, fcking aws like that would agree with me
-
nicolas17
"51.3. You may not use Amazon Lightsail in a manner intended to avoid incurring data fees from other Services"
-
joepie91|m
bleh
-
hexa-
which means they know their data fees are fucked up
-
joepie91|m
oh of course
-
hexa-
q.e.d.
-
joepie91|m
it's literally their business
-
joepie91|m
egress is where AWS/Azure/GCP/etc. turn the juicy profits, by design
-
immibis
nicolas17: i did an experiment with deduplicating minecraft modpacks and they deduplicate very nicely. The sum total of all modpacks in that collection deduplicated from about 400GB down to about 5GB, just with per-file deduplication and no diffing similar files. (It helps that every Java class is a separate file)
-
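The per-file deduplication immibis describes — store each distinct file once, keyed by its content hash — is the simplest form of the idea:

```python
import hashlib

def dedup(files: dict[str, bytes]):
    """Map each filename to a content hash; store each distinct blob once."""
    store: dict[str, bytes] = {}   # content hash -> blob
    index: dict[str, str] = {}     # filename -> content hash
    for name, blob in files.items():
        digest = hashlib.sha256(blob).hexdigest()
        store.setdefault(digest, blob)
        index[name] = digest
    return store, index

# Two modpacks sharing a class file: only two distinct blobs get stored.
store, index = dedup({
    "pack-a/Foo.class": b"cafebabe...",
    "pack-b/Foo.class": b"cafebabe...",
    "pack-b/Bar.class": b"different",
})
print(len(store))  # 2
```
-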
joepie91|m
because it's the one metric that a) almost nobody can accurately estimate, b) overages are a thing for, and c) nobody thinks about until it appears on the invoice
-
fireonlive
egress fees are... egre....gious ... :D
-
hexa-
thanks :D
-
immibis
chunking might not be the best strategy. If you have a version order to go by, it may make more sense to diff the same files from adjacent versions
-
hexa-
I agree that conjuring up the .narinfo file would sound very sweet
-
immibis
(binary diff obviously)
-
immibis
PSA for small AWS users: as well as the 100GB free egress per month, you have an additional free 1000GB if you use cloudfront, even if you don't actually need cloudfront
-
nicolas17
immibis: yeah, I could do diffing or I could do content-defined chunking... but that's not the hard part
-
nicolas17
the hard part is reproducible recompression
-
nicolas17
need to extract zip files such that I can put the bitwise-identical zip back together later, and do the deltas or cdc on the extracted files
-
immibis
I did that. My implementation is really horrible and slow, though.
-
immibis
I noticed that most files in my sample are LZ-compressed the obvious way using the most recent match, but some have "sub-optimal" matches all over the place, which I guess could be from a fancy algorithm designed to optimize the LZ+huffman together instead of each one separately, or a compressor with memory limitations
-
nicolas17
oh that's awful
-
nicolas17
in my case I can just use zlib
-
immibis
deflate is LZ77+huffman
-
nicolas17
yeah I mean, zlib produces an identical result, vs having to match the exact behavior of a different deflate implementation
-
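nicolas17's "just use zlib" point — that the same library at the same level reproduces the byte stream exactly — in miniature (real tooling would have to detect and record the level and zlib version it found):

```python
import zlib

payload = b"member contents inside a zip\n" * 512
compressed = zlib.compress(payload, 6)   # what the archive actually contains

# Store the *decompressed* payload (diffable/deduplicable) plus a tiny recipe...
recipe = {"codec": "zlib", "level": 6}

# ...and reproduce the original compressed bytes exactly on demand:
rebuilt = zlib.compress(payload, recipe["level"])
assert rebuilt == compressed
```
-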
immibis
my implementation of reproducible LZ was to see how many of the next bytes could be encoded as one symbol (literal or back-reference), then note all possible ways to encode that symbol, then write which one appears in the compressed data. Usually this just gives you a long string of 0s; sometimes it doesn't.
-
immibis
(0 because of course you order them so the most optimal one is number 0)
-
immibis
nicolas17: if you know they were compressed by a certain version of zlib with certain settings, you just embed that
-
nicolas17
yep
-
nicolas17
...and then deal with the dmg inside the zip (which in most versions is stored uncompressed in the zip because it has its own block-wise zlib)
-
nicolas17
or nowadays lzfse
-
nicolas17
and make up a file format to store those layers and the instructions on how to recompress them
-
immibis
zip container is easy, you just extract all the parts that are not the file data, and store them separately, and then recurse into the file data, since you assume the file size is large compared to the metadata size
-
icedice
JAA: Right, I forgot about that
-
icedice
Yeah, true
-
icedice
I'd rather deal with hostingby.design than Hetzner though
-
icedice
But whatever gets the best pricing
-
icedice
joepie91|m: I'm pretty sure you don't want to delve into the decentralized hosting rabbithole, but I figured I'd send this anyway because why not:
-
icedice
-
icedice
-
icedice
^ File hosting site using Sia and their requirements from hosters
-
joepie91|m
I'm interested in decentralized storage mechanisms, but not in cryptogrifts :)
-
nicolas17
was Sia the one where the admin defended proof-of-work and "incompatible hashing algorithm to ensure mining hardware for other blockchains can't be reused"?
-
nicolas17
"i wonder if Apple is having to pull icons from popular reddit clients from WWDC slides -- apollo has found its way in them frequently in the past years" oh no
-
fireonlive
i miss live apple keynotes
-
nicolas17
gonna be a busy monday for me
-
nicolas17
there's already 2 things I found in server-side config files that I won't know the *meaning* of until I dig into the iOS 17 beta
-
fireonlive
oooh
-
fireonlive
too bad one of them isn't <iMessageLessShit>true</...
-
fireonlive
:p
-
nicolas17
I swear they're using obscure abbreviations on purpose
-
nicolas17
home-rmvfsumbom=10.3.1
-
nicolas17
home-rmvfomdmwosu-internal=10.6
-
immibis
the only good proof-of-work is RandomX
-
imer
"rmvfomdmwosu" that's just someone's cat walking across the keyboard
-
nicolas17
one of my theories is "required minimum version for something something something without software update"
-
immibis
(it's designed so that a CPU *is* essentially a RandomX ASIC, and only small gains are possible by specializing it for RandomX)
-
fireonlive
hm
-
immibis
mdm = mobile device management?
-
nicolas17
unlikely because there's rmvfoodmwosu and rmvfordmwosu too (single letter changed)
-
nicolas17
-
fireonlive
my first thought was 'matter' because of home(kit) but eh probably not
-
nicolas17
home- likely refers to Apple Home stuff yeah
-
fireonlive
can i zoom forward to when everything uses this already:
csa-iot.org/all-solutions/matter lol
-
nicolas17
the other is in Apple Maps, there's a config file in protobuf format, and to get names for the protobuf fields I have to decompile the code, new fields already appeared that iOS 16 doesn't have names for
-
fireonlive
sadly i don't think they're collabing on 'cast <video/audio/screen> to device Y'
-
nicolas17
95 { 1: 1684690460826; 2: 0; }
-
nicolas17
-
fireonlive
:o
-
nicolas17
gspe35-ssl.ls.apple.com/geo_manifes…namic/config?os=ios&os_version=17.0 who's up for archiving this daily for every x.y version into WARCs? :)