04:22:51 https://gitorious.org/ is back online.
04:30:02 lol when someone posted gitorious dot org in here like fifteen clients fetched the page from the server in the same minute
04:30:14 (idk if it was here exactly but who else)
04:30:40 Yeah, it was here, and it's the people who use The Lounge and have automatic fetching enabled. Causes trouble in #archivebot all the time with big !ao < URLs.
04:31:08 Or well, it caused trouble in the past and is now prevented.
04:31:47 Also, hi :-)
04:31:57 hi
04:32:17 so yeah, uh, it's back online, should be a bit less fiddly than before
04:32:41 used to be some special snowflake bullshit, now it's just Another Ceph Volume
04:33:03 Lovely, thanks a lot!
04:33:24 bundle per repo is the long term plan, just ... that's a fuckload of bundling
04:34:07 bundle per repo, ia item per username (so each item would have somewhere between one repo, and a few gigabytes of bundles in it)
04:34:18 Yeah, and it has to be done right to avoid missing certain edge cases of branches.
04:34:25 I.e. --all
04:34:38 if you want i can give you an ssh login to the server so you can start experimenting
04:34:43 the thing is
04:34:56 the volume is extremely hardlink-dense
04:35:13 because when you forked a repo it would do on the serverside "git clone" but not copy the data, because no need
04:35:31 Yeah, makes sense.
04:35:33 so heavily-forked repos have a shitload of hardlinks
04:36:06 have also been considering tarfile-per-username, which would not need weird fuckery and would preserve the hardlinks
04:36:36 because e.g., if 'astrid' forked 'jaa/repo.git' the newly-forked repo would be called 'jaa/astrid-repo.git'
04:36:36 How would that work when the forks are under different usernames?
04:36:42 Oh
04:36:43 lol
04:36:45 it's Weird
04:37:52 also um
04:37:54 Any idea what the size would roughly be without the hardlink dedupe?
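The size question above can be estimated by walking the tree and counting each inode once (what the hardlinked volume stores) versus once per path (what it would take if every hardlink became an independent copy). A minimal Python sketch, assuming a POSIX filesystem where hardlinks share the same (st_dev, st_ino) pair; the function name `hardlink_sizes` is hypothetical, and it sums apparent sizes (st_size) rather than allocated blocks, so it only approximates what `du` would report:

```python
import os
import stat


def hardlink_sizes(root):
    """Return (on_disk, expanded) byte totals for regular files under root.

    on_disk counts each inode once, approximating what the hardlinked
    volume actually stores. expanded counts every path separately,
    approximating the size if each hardlink became an independent copy.
    Sizes are apparent sizes (st_size), not allocated blocks.
    """
    seen = set()  # (st_dev, st_ino) of multiply-linked files already counted
    on_disk = 0
    expanded = 0
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, FIFOs, etc.
            expanded += st.st_size
            if st.st_nlink == 1:
                on_disk += st.st_size  # single link: cannot be a duplicate
                continue
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                on_disk += st.st_size
    return on_disk, expanded
```

Only files with more than one link are remembered in `seen`, which keeps memory bounded on a large volume where most files are not hardlinked. The ratio `expanded / on_disk` for a sample directory gives an expansion factor; it measures raw copies, not the size of any resulting bundles.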
04:38:12 gitorious stored everything in random hex-named directories
04:39:24 before handover they tried to rename everything to usable names following the scheme i mentioned earlier
04:39:33 but ran into some issues
04:39:35 so there's a mix
04:39:44 What a surprise.
04:40:05 anyway i made a directory on a different filesystem with symlinks named like what they were hoping to accomplish
04:41:35 e.g.:
04:41:37 $ ls -ld z3:z3.git geeqie:g2p-geeqie.git
04:41:39 lrwxrwxrwx 1 root root 49 Jun 30 2015 geeqie:g2p-geeqie.git -> /mnt/gitorious/repositories/geeqie/g2p-geeqie.git
04:41:41 lrwxrwxrwx 1 root root 74 Jun 30 2015 z3:z3.git -> /mnt/gitorious/repositories/b37/715/d301bef079361b5dfa4d90392c50465bd1.git
04:43:00 and then there's an apache rule to transform "/z3/z3" to whatever makes cgit look inside of "/srv/gitorious/repositories/z3:z3.git/"
04:43:28 just a bunch of dumb unix shit
04:43:29 So regarding bundling it all up, I don't have any intentions of tackling this in the near term, but maybe sometime early next year would be nice. I want to get that general project up and running anyway. Still needs a fair bit of work though, and I keep distracting myself with stuff like collecting billions of YouTube video IDs...
04:43:37 sure lmk whenever
04:43:44 shit isn't going anywhere
04:43:48 :-)
04:45:54 i have zero, zero idea how big this would be after expansion
04:46:41 i did a randomly chosen trial on geeqie/* earlier today, which was 75 megs on disk and became about 180 megs after duplication, the pile of bundles was about 120 megs
04:48:46 the whole filesystem is 4.4T
04:49:14 i've been estimating (pessimistically) a 4x expansion
04:51:49 Alright, not too horrible, but quite a bit, yeah.
04:53:01 Simple per-repo bundles are probably not the best option though. My thing will handle deduping at the commit level, but that obviously means that things are spread across multiple bundles.
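The symlink directory and the Apache rewrite together amount to a small path mapping: a web path like "/z3/z3" becomes the symlink "z3:z3.git". A minimal Python sketch of that mapping, assuming the `/<project>/<repo>` URL shape from the "/z3/z3" example and the /srv/gitorious/repositories directory quoted for cgit; the actual Apache rule was not posted, so `symlink_for_url_path` and its regex are hypothetical:

```python
import posixpath
import re

# Assumption: the symlink directory cgit reads, as quoted in the log.
SYMLINK_ROOT = "/srv/gitorious/repositories"

# Hypothetical stand-in for the Apache rewrite (the real rule is not shown):
# "/<project>/<repo>", with an optional ".git" suffix and trailing slash.
_URL_PATH = re.compile(r"^/([^/:]+)/([^/:]+?)(?:\.git)?/?$")


def symlink_for_url_path(url_path):
    """Map a web path like "/z3/z3" to its "<project>:<repo>.git" symlink."""
    match = _URL_PATH.match(url_path)
    if match is None:
        raise ValueError(f"not a /<project>/<repo> path: {url_path!r}")
    project, repo = match.groups()
    return posixpath.join(SYMLINK_ROOT, f"{project}:{repo}.git")
```

Resolving the returned path (for example with `os.path.realpath`) would then follow the symlink either to a readable name such as /mnt/gitorious/repositories/geeqie/g2p-geeqie.git or to a hex-named directory like the z3 one, which is the mix described above.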
04:53:45 fwiw it is on owned hardware in a datacenter and bandwidth charges are not an issue
04:54:19 That sounds good. Maybe we can just slowly run it through that then once it exists.
04:58:43 ha shit, it's a whole 3ms from www.archive.org
04:59:58 Nice
17:20:43 I'm sorry, i don't know if this is the right place but what is the image address for youtube-dislike? I want to run a docker image.
17:25:23 PyryV: The image you're looking for is atdr.meo.ws/archiveteam/youtube-dislikes-grab:latest and it's better to ask in #down-the-tube for questions on that project.
17:51:30 Bob Dole died https://www.politico.com/news/2021/12/05/bob-dole-republican-presidential-nominee-advance-obit-033611