-
threedeeitguy
potentially. There are only two lines in my MV-Errors.log file.
-
threedeeitguy
gm26791 (MV): 'animation1Name'
-
threedeeitguy
gm25516 (MV): Unterminated string starting at: line 24 column 1489 (char 94827)
-
thuban
ah ok
-
thuban
26791 looks the same as my 17614
-
thuban
not sure about 25516
-
threedeeitguy
K, I've split it into a lot (136) of smaller jobs and am running them in parallel. It's still slower, but it is vastly improved.
-
thuban
nice
-
h2ibot
Wickedplayer494 edited Surrender at 20 (+367, SS is paying attention to the looming expiry…):
wiki.archiveteam.org/?diff=50041&oldid=49232
-
yts98
thuban: commit f14a67e fixes 8298. I'm doing it.
-
yts98
Looking for someone to put transfer.archivete.am/OrLXQ/api_requests.txt.zst (900 Game Atsumaru comment / scoreboard API requests) quickly before 3AM UTC.
-
yts98
i.e. in 1.5 hours.
-
yts98
s/900/900K/
-
thuban
you say 'requests'; are these just urls that can go in archivebot?
-
yts98
Yes. they're GET request URLs without session requirement.
-
thuban
i see you've asked in #archivebot, so someone will get to it shortly
-
thuban
job id: askbyh40bwvdmpjqb5v0l3por
-
thuban
yzqzss|m: mv 22500-24999 ok?
-
JAA
Summary of #archivebot: AB is far too slow for this, but I'm grabbing them with qwarc instead, which does more brrrr.
-
thuban
is that wbm certified or nah?
-
JAA
Almost all responses are empty, but I guess that's expected.
-
JAA
Where empty = {"meta":{"status":200},"data":{}}
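A minimal sketch of how one might count these empty responses when checking a grab, assuming every body is JSON with a top-level "data" key shaped like the example above. The function name `is_empty_response` is hypothetical and not part of qwarc.

```python
import json


def is_empty_response(body: str) -> bool:
    """Return True for bodies like {"meta":{"status":200},"data":{}}.

    Assumes the API wraps its payload in a top-level "data" key. Bodies that
    fail to parse are treated as non-empty so they can be inspected separately.
    """
    try:
        doc = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(doc, dict) and doc.get("data") == {}


bodies = [
    '{"meta":{"status":200},"data":{}}',
    '{"meta":{"status":200},"data":{"rankings":[1]}}',
]
empty = sum(is_empty_response(b) for b in bodies)
print(f"{empty}/{len(bodies)} responses empty")
```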
-
Jake
"boardIdが範囲外です" == "boardId is out of range"
-
yzqzss|m
<thuban> "yzqzss: mv 22500-24999 ok?" <- it's ok
-
JAA
Those invalid URLs slow me down significantly due to how qwarc works on errors, but it should still finish in time at the current rate.
-
JAA
It's done.
-
h2ibot
FireonLive edited Egloos (+172, Egloos is offline. Data been partially saved.…):
wiki.archiveteam.org/?diff=50042&oldid=49938
-
fireonlive
let's see if minor edits show up too
-
h2ibot
FireonLive edited Egloos (+50, tweaks, add lead):
wiki.archiveteam.org/?diff=50043&oldid=50042
-
fireonlive
so it do
-
fireonlive
can't get anything past h2ibot
-
JAA
Edits in the User and User_talk namespaces get ignored, but otherwise, no.
-
fireonlive
ahh ok :)
-
fireonlive
also, curse you mediawiki for making every first letter capital :p
-
thuban
yts98: did you get 9445? it still errors out for me
-
thuban
(after pulling, i mean)
-
yts98
thuban: not able to get 9445 yet
-
thuban
ok. sorry, wasn't sure what strikeout meant
-
yzqzss|m
DDL was reached and the front-end pages now return HTTP 30X redirects to <blog.nicovideo.jp/niconews/194994.html>
-
yts98
strikeout means I got it myself
-
thuban
? 9445 was struck out, did it work or no
-
fireonlive
re IRL; it's already past 12pm PDT
-
fireonlive
hm
-
fireonlive
(from #archiveteam)
-
fireonlive
seeing 500s for profiles and 404s for post links, nulldata: irl.com/meg-myers/bTduA5Al5N
-
fireonlive
where's what sketch quote about sites losing as much data as possible
-
fireonlive
what→that
-
fireonlive
ah! yes it was foone not sketch :)
-
yzqzss|m
Game Atsumaru's API endpoint has also been redirected.
-
yzqzss|m
To prevent wget from accidentally overwriting the previous warc files, don't run any step6-* script now.
-
» thuban nods
-
thuban
where are we sending files?
-
yts98
I don't know if we should create an IA project and each upload to it separately, or transfer all data to one place to generate megawarcs first.
-
yts98
So I'm looking for help from volunteers experienced in running staging servers.
-
datechnoman
yts98 - you would want to speak with JAA_ and arkiver_ regarding this. They can assist in getting a formal project put together
-
thuban
datechnoman: the time for that is over, site is already dead. we're just talking about how to get manually collected data onto ia in a convenient format
-
h2ibot
Yts98 edited Niconico (+228, Game Atsumaru was down):
wiki.archiveteam.org/?diff=50044&oldid=50015
-
h2ibot
FireonLive edited Current Projects (-447, move Egloos to recently finished, remove…):
wiki.archiveteam.org/?diff=50045&oldid=50024
-
h2ibot
Yts98 edited Current Projects (-156, Game Atsumaru finished):
wiki.archiveteam.org/?diff=50046&oldid=50045
-
datechnoman
Ahh gotcha no worries at all. Apologies!
-
nulldata
Should we add whatever Android app textfiles is referring to to the Deathwatch list? :P
digipres.club/@textfiles/110619602655711553
-
thuban
datechnoman: none needed! that's probably whom i'd ask about how to format an ia item anyway
-
datechnoman
For sure. The experts for sure :)
-
pabs
two actually
-
arkiver
yts98: how many WARCs is this?
-
h2ibot
Jack Thompson edited Deathwatch (+440, Added Showbuzz Daily and IRL):
wiki.archiveteam.org/?diff=50047&oldid=50035
-
yts98
arkiver: each game produces 1~4 WARCs. I have 12556 files in 62GiB, but other operators including thuban_ and 3 STWP members have claimed more IDs, so they will have more.
-
yts98
We also keep the files outside of the WARCs. Since the script overwrites WARCs generated by previous runs, we cannot guarantee that every crawled file appears in a WARC.
-
arkiver
yts98: feel free to upload to IA as items with 1000 WARCs each
-
arkiver
note that this is only okay in this case, other cases maybe not
-
arkiver
yts98: actually, about each of these games, since you also have them outside of WARCs: are they easily playable with the files you archived?
-
arkiver
for example like an index.html you can run and then the game plays, or similar?
-
yts98
I'm sad to find out that some major browsers reject localhost CORS, so some games need to be served from an HTTP server like WAMP to be playable.
-
arkiver
got it
-
threedeeitguy
arkiver: I've got almost 3000 WARCs, ~60 GB, plus the extracted assets yts98 mentioned. The assets are ~67 GB across 260,000 files.
-
arkiver
alright
-
arkiver
yts98: let's do the following
-
arkiver
each individual game can be uploaded in its own item with mediatype=software
-
arkiver
perhaps identifier game-atsumaru-ID, or atsumaru-ID, whatever is more appropriate
-
arkiver
and with proper metadata
-
arkiver
and, the WARC can be uploaded separately into items with 1000 WARCs in each item
-
arkiver
does that sound good?
-
yts98
STWP also proposed game-atsumaru-warc-{game_type}-{range_start}-{range_end} .
-
yts98
Should we schedule each item to contain nearly 1000 WARCs, or use a predictable range as the index?
-
arkiver
yts98: let's say max 1000 WARCs per item, how many items with WARCs would you create in your 'predictable range'?
-
yts98
arkiver: Oh, I see: if each game generates an average of 3 WARCs, then 30 items with WARCs will be created; if I conservatively map 250 game IDs to an item, there will be 119 items.
-
arkiver
yts98: ah that is no problem. let's say max 1000 items with WARCs, each max 1000 WARCs
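A minimal sketch of splitting a local set of WARCs into upload batches that respect the "max 1000 WARCs per item" limit. The identifier pattern `{prefix}-{batch_no}` and the file names are hypothetical; STWP's proposal keyed items by game type and ID range instead, and the actual upload step is not shown.

```python
from typing import Iterator

MAX_WARCS_PER_ITEM = 1000  # per-item limit agreed above


def plan_items(warc_names: list[str], prefix: str) -> Iterator[tuple[str, list[str]]]:
    """Yield (identifier, files) pairs with at most MAX_WARCS_PER_ITEM files each.

    Sorting first keeps each batch deterministic, so re-running the plan
    yields the same grouping. The identifier scheme is illustrative only.
    """
    ordered = sorted(warc_names)
    for batch_no, start in enumerate(range(0, len(ordered), MAX_WARCS_PER_ITEM)):
        batch = ordered[start:start + MAX_WARCS_PER_ITEM]
        yield f"{prefix}-{batch_no:04d}", batch


names = [f"atsumaru-{i:05d}.warc.gz" for i in range(2500)]
for identifier, files in plan_items(names, "game-atsumaru-warc-example"):
    print(identifier, len(files))
```

With 2500 example files this plans three items holding 1000, 1000 and 500 WARCs.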
-
arkiver
and the games files for each individual game can be uploaded to a separate item
-
yts98
Sounds good.
-
yts98
I wonder if it's possible for two IA accounts to upload files to the same item? If not, is it safe to share S3 access keys with others?
-
arkiver
why do you need two people uploading to the same item?
-
arkiver
the WARCs?
-
arkiver
IA does not recommend sharing keys
-
yts98
Got it, so each operator should decide on the identifier separately.
-
nicolas17
hm
-
nicolas17
does IA use AWS-like authentication and signing?
-
masterX244
afterwards, Flashpoint should be cross-checked too (Flashpoint does HTML games as well, so it's worth getting the games into it). having them already grouped at the IA makes that work easier since no file hunt is needed anymore
-
JAA
Initial Knowledge Adventure CDN download finished just over 12 hours ago, as predicted. I'm now relisting the bucket and will grab anything that was added in the past couple weeks.
-
JAA
32.23 TiB downloaded into 3.22 TiB of WARCs :-)
-
JAA
There were 65907 403s, i.e. files which appear to be inaccessible.
-
JAA
Also 14662 404s which I need to deal with later. Many of those are 'directories', which aren't accessible through the media* domains. Some are because I forgor to encode stuff correctly.
-
phaeton
Is there a better way to see how long your worker has been at the same task than viewing the folder creation time in the data folder?
-
pokechu22
On the warrior web page there's an elapsed time at the bottom right of each task (below the log)
-
phaeton
thanks, I should have specified that I'm running the Docker images
-
JAA
Container logs would be another option, but otherwise, not really.
-
JAA
Well, I guess you could check when the wget-at process was started, also.
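A minimal sketch of that idea on Linux, assuming procfs is readable from wherever you run it (host or container) and that you already have the wget-at PID, for example from `pgrep -f wget-at`. It reads the `starttime` field (field 22, in clock ticks since boot) from `/proc/<pid>/stat` and compares it with `/proc/uptime`. The helper names are hypothetical.

```python
import os


def elapsed_from_stat(stat_text: str, uptime_s: float, clk_tck: int) -> float:
    """Seconds since the process described by a /proc/<pid>/stat line started.

    The command name is wrapped in parentheses and may contain spaces, so we
    split after the last ')'. The remainder starts at field 3 (state), which
    puts field 22 (starttime) at index 19.
    """
    rest = stat_text.rsplit(")", 1)[1].split()
    start_ticks = int(rest[19])
    return uptime_s - start_ticks / clk_tck


def process_elapsed(pid: int) -> float:
    # Linux-only: reads procfs directly.
    with open(f"/proc/{pid}/stat") as f:
        stat_text = f.read()
    with open("/proc/uptime") as f:
        uptime_s = float(f.read().split()[0])
    return elapsed_from_stat(stat_text, uptime_s, os.sysconf("SC_CLK_TCK"))
```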
-
phaeton
ok, had a long running task and was curious how far back it went... thanks for the proc runtime suggestion. Hadn't considered doing it outside of the container
-
fireonlive
i can't say i'm not surprised
-
fireonlive
but nice
-
fireonlive
ah it's not official-dicial
-
fireonlive
hello, world!
-
fireonlive
has digitize.archiveteam.org been discontinued? "This server could not prove that it is digitize.archiveteam.org; its security certificate is from internetarchive.archiveteam.org. This may be caused by a misconfiguration or an attacker intercepting your connection."
-
pokechu22
I've never heard of that project before, hmm
-
fireonlive
if you bypass the security warning it shows "Hello archive team.org!"
-
fireonlive
hmmm i don't think so
-
fireonlive
this one was "a wiki dedicated to digitizing many types of storage media including paper media, CDs, tapes, video, slides, and floppies"
-
fireonlive
more on the ingest side
-
pokechu22
hmm, some fairly dubious recent captures having captchas and stuff:
web.archive.org/web/20230301000000*/https://digitize.archiveteam.org
-
fireonlive
though there is a 'file creation software' on there
-
fireonlive
DNS points to 213.184.85.58 which has a PTR of archiveteam.org (which can be set by anyone with no validation but it's a hint at least)
-
fireonlive
run on a 'Hosting4Real' server
-
fireonlive
hm
-
fireonlive
not that it is an emergency or anything i was just curious
-
nicolas17
archiveteam.org resolves to the same IP
-
fireonlive
i saw it and was like ooh ingest methods
-
fireonlive
nicolas17: ah! thanks
-
pokechu22
There is a somewhat old export at archive.org/details/wiki-digitizearchiveteamorg (which is incorrectly in warczone when it should be in wikiteam, it seems)
-
fireonlive
as does wiki.archiveteam.org; so maybe something jrwr ran?
-
fireonlive
ooh that logo on Main_Page is great though
-
fireonlive
ye there's a few interesting pages on there
-
fireonlive
but ye
-
fireonlive
neat though :)
-
arkiver
i pinged jason about it
-
fireonlive
ah! thanks
-
arkiver
it's been off since last year
-
thuban
yts98: it might be easier to keep things consistent if one person consolidates + uploads
-
h2ibot
FireonLive edited Tiki (+782, The torches have been blown out):
wiki.archiveteam.org/?diff=50048&oldid=50023
-
h2ibot
FireonLive edited Tiki (-27, fix infobox syntax):
wiki.archiveteam.org/?diff=50049&oldid=50048
-
threedeeitguy
I've no issues with sending everything onto someone else for upload.
-
fireonlive
good afternoon good afternoon
-
fireonlive
more fires for this hellscape we call earth
-
fireonlive
earth? life.
-
fireonlive
seems like their online edition needs an account to 'see more'
nationalgeographic.com
-
joepie91|m
sigh. and of course the wapo article uncritically repeats the narrative that it has "fallen on hard times" in the same breath as stating it's the most-read magazine in the US.
-
joepie91|m
anything to not call out what it really is, I guess