#archiveteam-bs

01:26

xarph_

ah yes the wayback machine the thing that google doesn't link to in search results
02:12

nicolas17

how to archive a github repo?
02:13

thuban

nicolas17: ask in #gitgud
03:23

pabs

nicolas17: and save it to Software Heritage using their Save Code Now form, or API
03:24

nicolas17

seems SH has it and it's up to date
03:27

pabs

curl -D /dev/tty -X POST "archive.softwareheritage.org/api/1/origin/save/git/url/$url" | jq
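For anyone scripting this rather than using curl, the request above could be sketched in Python roughly as follows. This is only an illustration: the `https://` scheme, the helper names `save_code_now_url` and `request_save`, and passing the origin URL unescaped are assumptions not stated in the chat, so check the Software Heritage API documentation before relying on it.

```python
import json
import urllib.request

# Assumption: the API is served over https at this base path.
API_BASE = "https://archive.softwareheritage.org/api/1"


def save_code_now_url(origin_url: str, visit_type: str = "git") -> str:
    """Build the endpoint in the same shape as the curl command in the chat."""
    return f"{API_BASE}/origin/save/{visit_type}/url/{origin_url}"


def request_save(origin_url: str, visit_type: str = "git") -> dict:
    """POST a save request and return the decoded JSON reply (needs network)."""
    req = urllib.request.Request(
        save_code_now_url(origin_url, visit_type), method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Only `save_code_now_url` is pure; `request_save` performs a real HTTP request and may be rate limited.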
03:28

pabs

ah good, they auto-update all of GitHub and other things on archive.softwareheritage.org/coverage
08:14

Girish

Hi archiveteam, I am trying to decompress a warc.zst file which I have downloaded to my local machine. May I know where I can find the DICT for it? Thanks
08:15

kpcyrd

I wish softwareheritage had a version of snapshot.debian.org that actually works
08:17

pabs

they imported all of snapshot.debian.org already
08:18

kpcyrd

how do I access it? see eg lists.debian.org/debian-devel/2023/08/msg00014.html
08:20

thuban

Girish: it's in a skippable frame at the beginning of the file iipc.github.io/warc-specifications/specifications/warc-zstd
08:21

thuban

you will probably find gitea.arpa.li/JustAnotherArchivist/…hings/src/branch/master/zstdwarccat helpful
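As a rough illustration of the skippable frame thuban mentions, the sketch below pulls the payload out of a leading Zstandard skippable frame using only the standard library. The function names are hypothetical. It relies on the general Zstandard frame layout (a 4-byte little-endian magic in the range 0x184D2A50 to 0x184D2A5F, then a 4-byte little-endian size, then the payload). It does not decompress the dictionary if the payload is itself a zstd frame; it only reports that case.

```python
import struct

# Zstandard reserves this magic range for skippable frames.
SKIPPABLE_MAGIC_MIN = 0x184D2A50
SKIPPABLE_MAGIC_MAX = 0x184D2A5F
# Byte form of the regular zstd frame magic (0xFD2FB528, little-endian).
ZSTD_FRAME_MAGIC = b"\x28\xb5\x2f\xfd"


def read_leading_skippable_frame(data: bytes) -> bytes:
    """Return the payload of a skippable frame at the start of `data`."""
    if len(data) < 8:
        raise ValueError("too short to contain a skippable frame header")
    magic, size = struct.unpack_from("<II", data, 0)
    if not SKIPPABLE_MAGIC_MIN <= magic <= SKIPPABLE_MAGIC_MAX:
        raise ValueError("file does not start with a skippable frame")
    payload = data[8:8 + size]
    if len(payload) != size:
        raise ValueError("skippable frame is truncated")
    return payload


def payload_is_zstd_compressed(payload: bytes) -> bool:
    """True if the payload looks like a zstd frame that still needs decompressing."""
    return payload.startswith(ZSTD_FRAME_MAGIC)
```

For real files, reading the first 8 bytes, then `size` more bytes, avoids loading a whole megawarc into memory; zstdwarccat already handles all of this.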
08:25

pabs

kpcyrd: I think via the usual archive.softwareheritage.org site
08:25

pabs

it of course doesn't contain any binary packages
08:27

Girish

Thanks thuban, I tried zstdwarccat. It writes to stdout. I was looking for organized folders for each URL ...
08:28

Girish

The actual archive file is a megawarc.warc.zst
08:30

thuban

the output of zstdwarccat is the uncompressed warc file. you can then use another tool to extract the contents
08:32

thuban

(wiki.archiveteam.org/index.php/The_WARC_Ecosystem)
08:38

Girish

thuban: Does zstdwarccat create any warc files or is it just stdout?
08:39

thuban

it's just stdout, so you can use a shell redirect: `zstdwarccat input.warc.zst > output.warc`
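Once the redirect has produced an uncompressed output.warc, the records inside can be walked one by one. The following is a minimal sketch, not a replacement for the tools on the WARC Ecosystem page: `iter_warc_records` is a hypothetical helper that assumes well-formed records with a `Content-Length` header and no folded header lines.

```python
import io
from typing import BinaryIO, Dict, Iterator, Tuple


def iter_warc_records(stream: BinaryIO) -> Iterator[Tuple[Dict[str, str], bytes]]:
    """Yield (headers, body) for each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if line in (b"\r\n", b"\n"):
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers: Dict[str, str] = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip()] = value.strip()
        # The body is read by length, so it may itself contain blank lines.
        body = stream.read(int(headers["Content-Length"]))
        yield headers, body


if __name__ == "__main__":
    # Assumes output.warc was produced by the shell redirect above.
    with open("output.warc", "rb") as fh:
        for headers, body in iter_warc_records(fh):
            print(headers.get("WARC-Type"), headers.get("WARC-Target-URI"), len(body))
```

Writing each body into a folder named after its `WARC-Target-URI` would be the next step towards the per-URL layout Girish asked about; response records still carry the HTTP headers at the start of the body.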
09:43

kpcyrd

pabs: idk, seems incomplete 🤷 archive.softwareheritage.org/browse…h_content=true&search_metadata=true
09:43

pabs

<pabs> it of course doesn't contain any binary packages
09:44

kpcyrd

what did they import then?
09:44

pabs

source packages
09:49

kpcyrd

not a snapshot.debian.org replacement then. There's also snapshot-cloudflare.debian.org but unless you're lucky enough to get a cache hit you're still stuck with the 504-prone snapshot service
09:49

kpcyrd

essentially if you can't pull the file from snapshot.debian.org yourself, cloudflare won't be able to either
09:57

pabs

right
10:05

pabs

IIRC the service needs these:
10:05

pabs

1) people to care about improving it instead of working around its current limitations
10:05

pabs

2) the Debian sysadmins to have time to complete the in-progress migration of the primary replica to newer hardware
10:05

pabs

3) a Debian team for the service, so the Debian sysadmins can just do hardware/OS
10:05

pabs

4) more replicas (152TB + growth) to meet the demand on the service
10:05

pabs

5) probably architecture and hardware upgrades
10:11

pabs

oh, and 6) the Debian sysadmins need to fix the failing proprietary backup system the primary replica uses
10:53

h2ibot

Yts98 edited Xuite (+524, Add smallpaint): wiki.archiveteam.org/?diff=50452&oldid=50431
12:41

pabs

"The Future of the Vim Project" groups.google.com/g/vim_dev/c/dq9Wu5jqVTw news.ycombinator.com/item?id=37074452
13:04

Barto

+1 for transparency
16:37

that_lurker

Could maybe be a good idea to grab all the vim mailing lists if possible.
16:44

that_lurker

vim.org/maillist.php
16:46

pabs

already done
16:46

pabs

the google based ones at least
16:46

that_lurker

well that's awesome
18:00

arkiver

rewby: since I only see rewby|backup in #deadcat i'll post this here
18:00

arkiver

can we please have a target for gfycat?
18:00

arkiver

this would be
18:00

arkiver

archiveteam_gfycat_
18:00

arkiver

gfycat_
18:00

arkiver

Archive Team Gfycat:
19:36

h2ibot

TheTechRobo edited Periscope (-3, There are still items, but "Tracker rate…): wiki.archiveteam.org/?diff=50453&oldid=50291
19:44

h2ibot

TheTechRobo edited ArchiveTeam Warrior (-130, /* Warrior architecture and alternatives */…): wiki.archiveteam.org/?diff=50454&oldid=50367
19:45

h2ibot

TheTechRobo edited ArchiveTeam Warrior (-591, /* Can I use whatever internet access for the…): wiki.archiveteam.org/?diff=50455&oldid=50454
19:45

» fireonlive edits the TheTechRobo
19:46

TheTechRobo

maybe we should also note that if your ISP blocks custom DNS, the projects won't work, but I'm not sure what the error message is so idk what to add to FAQ
19:46

fireonlive

in my fuzzy memory it isn't to descript
19:47

fireonlive

too*
23:52

flashfire42

any chance of a warrior project for Webs? or is there too much stuff for a warrior project for that atm
23:53

TheTechRobo

flashfire42: wiki.archiveteam.org/index.php/Webs
23:53

TheTechRobo

we already did a bunch
23:53

TheTechRobo

i'm not sure how much
23:53

flashfire42

I mean it's definite again
