-
JAA
gitorious.org is back online.
-
astrid
lol when someone posted gitorious dot org in here like fifteen clients fetched the page from the server in the same minute
-
astrid
(idk if it was here exactly but who else)
-
JAA
Yeah, it was here, and it's the people who use The Lounge and have automatic fetching enabled. Causes trouble in #archivebot all the time with big !ao < URLs.
-
JAA
Or well, it caused trouble in the past and is now prevented.
-
JAA
Also, hi :-)
-
astrid
hi
-
astrid
so yeah, uh, it's back online, should be a bit less fiddly than before
-
astrid
used to be some special snowflake bullshit, now it's just Another Ceph Volume
-
JAA
Lovely, thanks a lot!
-
astrid
bundle per repo is the long term plan, just ... that's a fuckload of bundling
-
astrid
bundle per repo, ia item per username (so each item would have somewhere between one repo, and a few gigabytes of bundles in it)
-
JAA
Yeah, and it has to be done right to avoid missing certain edge cases of branches.
-
JAA
I.e. --all
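
[Editor's sketch] The point about `--all` can be illustrated with a small Python wrapper around `git bundle create`. This is only a sketch of the idea discussed here, not the tooling either participant actually uses; the function names and the example paths are hypothetical. Passing `--all` asks git to include every ref rather than only the checked-out branch.

```python
import subprocess
from pathlib import Path


def bundle_command(repo: Path, out_file: Path) -> list[str]:
    """Build the git invocation that bundles every ref in `repo`.

    `out_file` should be absolute, because `git -C` changes directory
    before the bundle path is resolved.
    """
    # --all selects all refs (branches, tags, and so on), not just HEAD.
    return ["git", "-C", str(repo), "bundle", "create", str(out_file), "--all"]


def bundle_repo(repo: Path, out_file: Path) -> None:
    """Run the bundle command; raises CalledProcessError if git fails."""
    subprocess.run(bundle_command(repo, out_file), check=True)


if __name__ == "__main__":
    # Hypothetical paths, shown only to print the command that would run.
    print(bundle_command(Path("/repos/example.git"), Path("/out/example.bundle")))
```
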
-
astrid
if you want i can give you a ssh login to the server so you can start experimenting
-
astrid
the thing is
-
astrid
the volume is extremely hardlink-dense
-
astrid
because when you forked a repo it would do on the serverside "git clone" but not copy the data, because no need
-
JAA
Yeah, makes sense.
-
astrid
so heavily-forked repos have a shitload of hardlinks
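
[Editor's sketch] One rough way to see how hardlink-dense a tree is: walk it and compare the bytes counted once per inode with the bytes counted once per path. This is a minimal sketch under assumptions: it measures apparent file sizes (not disk blocks), treats every hardlink as if it became a full copy, and the function name is hypothetical. It says nothing about how large git bundles of the same data would be.

```python
import os
import stat


def hardlink_sizes(root: str) -> tuple[int, int]:
    """Return (deduplicated_bytes, expanded_bytes) for regular files under root.

    deduplicated_bytes counts each inode once; expanded_bytes counts every
    path, which is what the tree would occupy if hardlinks became copies.
    """
    seen = set()
    deduplicated = 0
    expanded = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, and other non-regular files
            expanded += st.st_size
            inode_key = (st.st_dev, st.st_ino)
            if inode_key not in seen:
                seen.add(inode_key)
                deduplicated += st.st_size
    return deduplicated, expanded
```

Calling `hardlink_sizes("/some/tree")` on a tree with many forks would be expected to return a second value much larger than the first.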
-
astrid
have also been considering tarfile-per-username, which would not need weird fuckery and would preserve the hardlinks
-
astrid
because e.g., if 'astrid' forked 'jaa/repo.git' the newly-forked repo would be called 'jaa/astrid-repo.git'
-
JAA
How would that work when the forks are under different usernames?
-
JAA
Oh
-
JAA
lol
-
astrid
it's Weird
-
astrid
also um
-
JAA
Any idea what the size would roughly be without the hardlink dedupe?
-
astrid
gitorious stored everything in random hex named directories
-
astrid
before handover they tried to rename everything to usable names following the scheme i mentioned earlier
-
astrid
but ran into some issues
-
astrid
so there's a mix
-
JAA
What a surprise.
-
astrid
anyway i made a directory on a different filesystem with symlinks named like what they were hoping to accomplish
-
astrid
e.g.:
-
astrid
$ ls -ld z3:z3.git geeqie:g2p-geeqie.git
-
astrid
lrwxrwxrwx 1 root root 49 Jun 30 2015 geeqie:g2p-geeqie.git -> /mnt/gitorious/repositories/geeqie/g2p-geeqie.git
-
astrid
lrwxrwxrwx 1 root root 74 Jun 30 2015 z3:z3.git -> /mnt/gitorious/repositories/b37/715/d301bef079361b5dfa4d90392c50465bd1.git
-
astrid
and then there's an Apache rule to transform "/z3/z3" to whatever makes cgit look inside of "/srv/gitorious/repositories/z3:z3.git/"
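
[Editor's sketch] The actual Apache rule is not shown in the log, so the following only illustrates the name mapping described above in Python: a two-segment URL path becomes a `first:second.git` symlink name under the directory cgit reads. The function names are hypothetical, and the two-segment restriction is an assumption based on the two examples given.

```python
CGIT_ROOT = "/srv/gitorious/repositories"  # directory named in the log


def symlink_name(url_path: str) -> str:
    """Map a URL path like '/z3/z3' to a symlink name like 'z3:z3.git'."""
    parts = [p for p in url_path.split("/") if p]
    if len(parts) != 2:
        raise ValueError(f"expected two path segments, got {url_path!r}")
    first, second = parts
    if second.endswith(".git"):
        second = second[: -len(".git")]  # tolerate an explicit .git suffix
    return f"{first}:{second}.git"


def cgit_directory(url_path: str) -> str:
    """Return the directory cgit would be pointed at for this URL path."""
    return f"{CGIT_ROOT}/{symlink_name(url_path)}/"
```
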
-
astrid
just a bunch of dumb unix shit
-
JAA
So regarding bundling it all up, I don't have any intentions of tackling this in the near term, but maybe sometime early next year would be nice. I want to get that general project up and running anyway. Still needs a fair bit of work though, and I keep distracting myself with stuff like collecting billions of YouTube video IDs...
-
astrid
sure lmk whenever
-
astrid
shit isn't going anywhere
-
JAA
:-)
-
astrid
i have zero, zero idea how big this would be after expansion
-
astrid
i did a randomly chosen trial on geeqie/* earlier today, which was 75 megs on disk and became about 180 megs after duplication, the pile of bundles was about 120 megs
-
astrid
the whole filesystem is 4.4T
-
astrid
i've been estimating (pessimistically) a 4x expansion
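
[Editor's sketch] The numbers quoted above can be turned into ratios and a rough projection. This is plain arithmetic on the figures from the log (75 megs on disk, 180 after duplication, 120 as bundles, a 4.4T filesystem, a pessimistic 4x factor). Whether the geeqie trial is representative of the whole volume is unknown, so the projections are illustrative only; the variable and function names are hypothetical.

```python
ON_DISK_MB = 75        # geeqie/* as stored, hardlinks intact
DUPLICATED_MB = 180    # same repos with hardlinks expanded into copies
BUNDLES_MB = 120       # same repos as a pile of bundles
FILESYSTEM_TB = 4.4    # whole volume
PESSIMISTIC_FACTOR = 4


def projected_tb(total_tb: float, ratio: float) -> float:
    """Scale the whole-filesystem size by an expansion ratio."""
    return total_tb * ratio


duplication_ratio = DUPLICATED_MB / ON_DISK_MB  # 2.4x in the trial
bundle_ratio = BUNDLES_MB / ON_DISK_MB          # 1.6x in the trial

if __name__ == "__main__":
    for label, ratio in [
        ("trial duplication ratio", duplication_ratio),
        ("trial bundle ratio", bundle_ratio),
        ("pessimistic estimate", PESSIMISTIC_FACTOR),
    ]:
        print(f"{label}: {ratio:.1f}x -> {projected_tb(FILESYSTEM_TB, ratio):.2f} TB")
```
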
-
JAA
Alright, not too horrible, but quite a bit, yeah.
-
JAA
Simple per-repo bundles are probably not the best option though. My thing will handle deduping at the commit level, but that obviously means that things are spread across multiple bundles.
-
astrid
fwiw it is on owned hardware in a datacenter and bandwidth charges are not an issue
-
JAA
That sounds good. Maybe we can just slowly run it through that then once it exists.
-
astrid
ha shit, it's a whole 3ms from www.archive.org
-
JAA
Nice
-
PyryV
I'm sorry, i don't know if this is the right place but what is the image address for youtube-dislike? I want to run a docker image.
-
rewby
PyryV: The image you're looking for is atdr.meo.ws/archiveteam/youtube-dislikes-grab:latest and it's better to ask in #down-the-tube for questions on that project.
-
OrIdow6