-
pabs
FTR, I threw vger.kernel.org into AB; it is small (no archives).
-
mrcave
Hi everyone, just wondering if anyone knows whether the uploads in the "telenor home" collection are all different versions of the same set of website crawls, or whether each large item in the collection on archive.org is different?
archive.org/details/archiveteam_telenor
wiki.archiveteam.org/index.php/Telenor
-
pokechu22
mrcave: I'm pretty sure that each item in that collection contains different data, split so that each item is ~20 GB
-
pokechu22
the home.online.no/~joeolavl/ and similar ones are a bit weird but it sounds like those were for individual users that weren't found by the main grab
-
pokechu22
So if you wanted to download all of the site, you'd need to download the WARCs from all of the items
-
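To gather the WARCs from every item in a collection, one option is archive.org's public search and metadata endpoints. This is a minimal sketch, assuming the `advancedsearch.php` JSON output and the `files` list returned by `/metadata/<identifier>` behave as they do for other public items, and assuming the WARCs end in `.warc.gz` or `.warc.zst`. The helper names (`search_query_url`, `list_items`, `warc_urls`) are hypothetical.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://archive.org/advancedsearch.php"
METADATA_URL = "https://archive.org/metadata/"
DOWNLOAD_URL = "https://archive.org/download/"
# Assumption: typical extensions for ArchiveTeam WARC files.
WARC_SUFFIXES = (".warc.gz", ".warc.zst")


def search_query_url(collection: str, rows: int = 10000) -> str:
    """Build an advancedsearch URL that lists the identifiers in a collection."""
    params = [
        ("q", f"collection:{collection}"),
        ("fl[]", "identifier"),
        ("rows", str(rows)),
        ("output", "json"),
    ]
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)


def fetch_json(url: str) -> dict:
    """Download a URL and parse the body as JSON."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def list_items(collection: str) -> list:
    """Return every item identifier in the collection (network call)."""
    data = fetch_json(search_query_url(collection))
    return [doc["identifier"] for doc in data["response"]["docs"]]


def warc_urls(identifier: str, metadata: dict) -> list:
    """Pick the WARC files out of an item's metadata JSON and build download URLs."""
    return [
        DOWNLOAD_URL + identifier + "/" + urllib.parse.quote(f["name"])
        for f in metadata.get("files", [])
        if f["name"].endswith(WARC_SUFFIXES)
    ]
```

Usage would be to loop over `list_items("archiveteam_telenor")`, call `fetch_json(METADATA_URL + ident)` for each identifier, and pass the result to `warc_urls`. For a collection this size, the `internetarchive` Python package or its `ia` command-line tool is probably the more robust choice.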
mrcave
hey, thanks for the info
-
mrcave
I checked the individual home.online.no/~joeolavl uploads. I helped out on the home.no grab but haven't looked at the files since. Now I'm trying to look for band pages, but I'm finding myself interested in writing a summary of home.no and its content. Compared to GeoCities, the users were the owners of the Internet subscription, so the pages were often made by adults and
-
mrcave
with a close family vibe: personal start pages for families, and so on. I've only found one band page.
-
OrIdow6
Have there been any major changes in the last ~6 months? Anything I can help with in the end-of-year rush?
-
arkiver
OrIdow6: hi :)
-
arkiver
it's not very busy at the moment
-
arkiver
mostly we're working on #frogger (Blogger) still, which is almost finished
-
OrIdow6
That's good
-
OrIdow6
And hi
-
fireonlive
hii
-
fireonlive
"André Braugher Dies: Star Of ‘Homicide: Life On The Street’, ‘Brooklyn Nine-Nine’ & Other Series And Films Was 61"
-
nicolas17
...that URL sounds like he died by homicide
-
fireonlive
oh it does
-
JAA
I feel like they edited the headline after publication due to the same issue, but their system doesn't regenerate the slug in that case.
-
JAA
Oh wait no, the <title> isn't the article headline...
-
JAA
And that's where the slug is derived from.
-
fireonlive
ahh
-
fireonlive
at least they have a unique ID in the URL so they can redirect later
-
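The behaviour JAA and fireonlive describe (a slug generated once from the page title, plus a numeric ID that actually identifies the article) can be sketched like this: because the ID is authoritative, a stale slug can be answered with a redirect to the current one. The URL layout, the `slugify` rules, and the `ARTICLES` table are hypothetical and not that site's actual implementation.

```python
import re
import unicodedata

# Hypothetical article store: numeric ID -> current <title> text.
ARTICLES = {1234: "André Braugher Dies"}


def slugify(title: str) -> str:
    """Lowercase ASCII slug: drop accents, collapse runs of other characters to '-'."""
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


def resolve(path: str):
    """Return (status, location) for a hypothetical path like '/<slug>-<id>/'.

    The trailing ID is authoritative and the slug is cosmetic. If the slug no
    longer matches the current title, answer with a 301 to the canonical URL.
    """
    match = re.fullmatch(r"/([a-z0-9-]+)-(\d+)/?", path)
    if not match or int(match.group(2)) not in ARTICLES:
        return 404, None
    article_id = int(match.group(2))
    canonical = f"/{slugify(ARTICLES[article_id])}-{article_id}/"
    return (200 if path == canonical else 301), canonical
```

If the title is later edited, only the stored title changes; old links keep working because the lookup uses the ID and redirects to the regenerated slug.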
DJ
Ello
-
DJ
I would like to help archive ponychan. Is there anything I can help with, or do I just have to download the Warrior?
-
flashfire42
DJ: God forbid I ask. Is something happening to ponychan, or would this be proactive archival?
-
DJ
It's apparently shutting down on Jan 7th
-
flashfire42
Do you have a source for that at all?
-
DJ
Yep, here you go
ponychan.net/chat/res/112453.html, it's on Deathwatch as well.
-
DJ
Sorry, just remove the comma.
-
flashfire42
Oh, it is too. OK, so depending on the rate limit, it may just be an ArchiveBot job. More Warrior runners are always great, but I'm not sure this would be a Warrior project. cc arkiver maybe?
-
fireonlive
hmm depends how many posts it has i suppose
-
fireonlive
+ media per post
-
flashfire42
I don't know a lot about chans, or MLP for that matter; I tend to avoid both, so
-
fireonlive
though we do have until the 7th
-
JAA
How far back do the posts go anyway? Many image boards continuously purge old posts.
-
flashfire42
I was thinking that too, JAA; that's the way those boards often operate
-
fireonlive
ah yes
-
fireonlive
/pony/ shows 2023-07-24
-
JAA
/oat/ goes back to 2021.
-
fireonlive
/chat/ has one from 2023-08-29
-
JAA
/fan/ 2015...
-
JAA
So I guess it's not very consistent. lol
-
fireonlive
hmm.. 11 pages on /oat/
-
fireonlive
wonder if it's not very active
-
fireonlive
i think imageboards are usually purged based on new threads instead of a timer
-
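The count-based pruning fireonlive mentions can be illustrated with a toy board model: creating or bumping a thread moves it to the front, and once the board holds more than pages × threads-per-page threads, the least recently bumped ones fall off, regardless of their age. The page and per-page limits are made-up parameters, not ponychan's actual settings.

```python
from collections import OrderedDict


class Board:
    """Toy imageboard index that prunes by thread count, not by a timer."""

    def __init__(self, pages: int, threads_per_page: int):
        self.capacity = pages * threads_per_page
        # Front = most recently bumped thread, back = least recently bumped.
        self.threads = OrderedDict()

    def bump(self, thread_id: int) -> list:
        """Create or bump a thread and return the IDs pruned as a result."""
        self.threads[thread_id] = True                   # no-op if it already exists
        self.threads.move_to_end(thread_id, last=False)  # move it to the front
        pruned = []
        while len(self.threads) > self.capacity:
            old_id, _ = self.threads.popitem(last=True)  # drop the least recently bumped
            pruned.append(old_id)
        return pruned
```

Under this model, old threads on a slow board can survive for years simply because few new threads push them out.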
JAA
Older posts still exist. Random example from /pony/:
ponychan.net/pony/res/36833460.html
-
fireonlive
oh interesting
-
fireonlive
hm that one shows up in catalog still
-
fireonlive
so not necessarily pruned yet
-
DJ
flashfire42: Alright, do I ask about something specific or just go for it?
-
flashfire42
If you look above they are discussing possible ways of doing it
-
flashfire42
I also just threw it into archivebot just to see
-
DJ
Ah okay then, thanks.
-
JAA
Yeah, checking some more, those are all 404s. I just happened to check one of the very few that's in the catalog.html. lol
-
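Since only a few older threads are still listed in catalog.html, one quick check of what a board currently exposes is to pull the thread links out of its catalog page. This sketch assumes catalog pages live at `https://ponychan.net/<board>/catalog.html` and that thread links follow the `/<board>/res/<id>.html` pattern seen in the URLs above. Regex matching is a simplification of real HTML parsing, and the helper names are hypothetical.

```python
import re
import urllib.request


def catalog_url(board: str) -> str:
    # Assumption: the catalog path mirrors the thread URL layout quoted in the chat.
    return f"https://ponychan.net/{board}/catalog.html"


def thread_ids(html: str, board: str) -> list:
    """Return the unique thread IDs linked from a board page, sorted ascending."""
    pattern = re.compile(rf"/{re.escape(board)}/res/(\d+)\.html")
    return sorted({int(m) for m in pattern.findall(html)})


def fetch(url: str) -> str:
    """Download a page as text (network call)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

For example, `thread_ids(fetch(catalog_url("pony")), "pony")[0]` would give the lowest thread number still linked from that catalog.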
fireonlive
ah good luck haha
-
fireonlive
AB job seems to be going well so far
-
angenieux
Hello
-
angenieux
Would it be a good idea to rearrange the order of wget-at's command-line arguments so that the "--lua-script foo.lua" part is closer to the front? That would make it easier to see in htop which project a particular wget-at process is running.
-
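Regardless of argument order, the project can already be read from each process's full argument list. On Linux, `/proc/<pid>/cmdline` holds the arguments separated by NUL bytes. The sketch below scans it for processes whose executable name contains `wget-at` and extracts the value after `--lua-script`. The name-matching rule is an assumption, and the example script names in the test are hypothetical.

```python
import os


def lua_script_from_cmdline(raw: bytes):
    """Return the value of --lua-script (or None) from raw /proc cmdline bytes."""
    args = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
    for i, arg in enumerate(args):
        if arg == "--lua-script" and i + 1 < len(args):
            return args[i + 1]
        if arg.startswith("--lua-script="):
            return arg.split("=", 1)[1]
    return None


def wget_at_projects() -> dict:
    """Map PID -> Lua script for running wget-at processes (Linux only)."""
    found = {}
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/cmdline", "rb") as fh:
                raw = fh.read()
        except OSError:  # process exited or is not readable
            continue
        argv0 = raw.split(b"\0", 1)[0]
        if b"wget-at" in os.path.basename(argv0):
            found[int(pid)] = lua_script_from_cmdline(raw)
    return found
```

This only reads process information, so it works without changing how the project pipelines launch wget-at.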
foaf
hello guys
-
foaf
I'm trying to decompress a megawarc.warc.zst file, but it says that I need the dictionary. I tried the script on the WARC page, but it gives me some errors:
  File "C:\jk.py", line 46, in <module>
    d = get_dict(fp)
  File "C:\jk.py", line 30, in get_dict
    p = subprocess.Popen(['unzstd'], stdin = subprocess.PIPE, stdout = subprocess.PIPE, stderr = subprocess.PIPE)
  File "C:\Program Files\Python311\Lib\subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Program Files\Python311\Lib\subprocess.py", line 1538, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2]
-
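For context on the "needs the dictionary" error: as I understand the WARC zstd convention used for these files, the dictionary is stored at the very start of the file in a zstd skippable frame (magic number 0x184D2A5D, then a 4-byte little-endian length), and the dictionary bytes may themselves be zstd-compressed. The stdlib-only sketch below extracts those bytes and reports whether they still need decompressing. `read_warc_zst_dict` is a hypothetical standalone helper, not the `get_dict` function from the wiki script.

```python
import struct

SKIPPABLE_DICT_MAGIC = 0x184D2A5D  # skippable frame that holds the dictionary
ZSTD_FRAME_MAGIC = 0xFD2FB528      # regular zstd frame magic (bytes 28 B5 2F FD)


def read_warc_zst_dict(fp):
    """Read the dictionary frame at the start of a .warc.zst stream.

    Returns (dict_bytes, is_compressed). is_compressed is True when the
    dictionary is itself a zstd frame and must be decompressed before use.
    Raises ValueError if the stream does not start with the dictionary frame.
    """
    header = fp.read(8)
    if len(header) != 8:
        raise ValueError("file too short")
    magic, size = struct.unpack("<II", header)
    if magic != SKIPPABLE_DICT_MAGIC:
        raise ValueError(f"no dictionary frame (magic {magic:#010x})")
    data = fp.read(size)
    if len(data) != size:
        raise ValueError("truncated dictionary frame")
    is_compressed = len(data) >= 4 and struct.unpack("<I", data[:4])[0] == ZSTD_FRAME_MAGIC
    return data, is_compressed
```

With the third-party `zstandard` package, the (decompressed) dictionary could then be loaded roughly as `zstandard.ZstdDecompressor(dict_data=zstandard.ZstdCompressionDict(raw))`, which avoids calling an external `unzstd` executable at all.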
foaf
I have tried changing unzstd to zstd, with no results
-
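`FileNotFoundError: [WinError 2]` raised from `subprocess.Popen` means Windows could not find an executable named `unzstd`; it does not indicate a problem with the WARC itself. `unzstd` is equivalent to `zstd -d`, so renaming the command alone is not enough: without `-d`, zstd compresses its input instead of decompressing it. The Windows release archives typically contain only `zstd.exe`. A minimal sketch that picks whichever command is available (the helper names are hypothetical):

```python
import shutil
import subprocess


def zstd_decompress_command():
    """Return an argv list that decompresses stdin to stdout, or raise if none is found.

    Prefers `unzstd`; falls back to `zstd -d`, which does the same thing.
    shutil.which searches PATH (and applies PATHEXT on Windows, so 'zstd' finds zstd.exe).
    """
    unzstd = shutil.which("unzstd")
    if unzstd:
        return [unzstd]
    zstd = shutil.which("zstd")
    if zstd:
        return [zstd, "-d"]
    raise FileNotFoundError("neither unzstd nor zstd found on PATH")


def decompress_bytes(data: bytes) -> bytes:
    """Pipe data through the external decompressor and return its output."""
    proc = subprocess.run(zstd_decompress_command(), input=data,
                          capture_output=True, check=True)
    return proc.stdout
```

In the wiki script, the `['unzstd']` argument to `subprocess.Popen` could be swapped for `zstd_decompress_command()`, as long as `zstd.exe` is somewhere Windows can find it.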
foaf
thanks in advance
-
TheTechRobo
Is that script compatible with Windows?
-
foaf
I don't know; what I changed is unzstd to zstd
-
TheTechRobo
Make sure it's in the same folder as the script
-
flashfire42
apo.org.au: were we able to do anything about this?
-
JAA
Nope, all attempts got banned very quickly.