-
JAA
After over 40 million requests and 29.7 TiB downloaded, the qwarc process finally reached the memory limit and stopped. This is the normal intended behaviour due to a 'memory leak' (probably not an actual leak but heap fragmentation or similar), but I didn't expect it to take that long.
-
pokechu22
qwarc process on what?
-
JAA
Knowledge Adventure CDN
-
h2ibot
Yts98 edited Current Projects (+156, Propose Game Atsumaru):
wiki.archiveteam.org/?diff=50020&oldid=49956
-
arkiver
yts98: on game atsumaru - since it's a limited number of games, could we simulate the browser and capture the URLs?
-
yts98
sure, we can stay on the game homepage longer to capture the URLs. but frameworks like RMMV and RMMZ only load resources on demand, and a few indie games have anti-debugging measures that detect headless browsers
-
arkiver
yts98: feel free to post a request in #archiveteam; that has a better chance of succeeding than the wiki
-
arkiver
though there's not much time left, so I would not get my hopes up
-
thuban
arkiver: yeah, the resource url extraction is pretty bespoke, so idk if it's realistic to duplicate in a warrior project
-
thuban
*but* yts98's existing scripts also output url lists. can't just toss them in ab because of the session cookie thing, but definitely possible as dpos, right?
-
yts98
not sure if the current wget-at can make multipart/form-data POST requests properly.
-
arkiver
no
-
thuban
remind me, is that required to get the session cookie?
-
arkiver
or well
-
yts98
yes
-
arkiver
if you have the exact data and headers to send I guess you could do it
-
arkiver
but is it actually needed?
-
arkiver
not everything the browser does is actually needed
-
yts98
i did not try with application/x-www-form-urlencoded
-
yts98
a possible approach is to make the request in Python and then pass the cookies to wget-at as environment variables
-
arkiver
what percentage of games does this affect?
-
yts98
every game is guarded by the "ticket" session cookie
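A minimal sketch of the approach yts98 describes (make the POST in Python, then hand the cookies to wget-at through an environment variable), assuming the endpoint accepts a plain multipart/form-data POST with text fields only. The URL, the form field names and the `GAME_COOKIES` variable name are hypothetical; how wget-at would consume the variable (for example via wget's standard `--header` option, or a project Lua script) is not shown.

```python
from __future__ import annotations

import http.cookiejar
import os
import urllib.request
import uuid


def build_multipart(fields: dict[str, str], boundary: str | None = None) -> tuple[bytes, str]:
    """Encode simple text fields as a multipart/form-data body (RFC 7578)."""
    boundary = boundary or uuid.uuid4().hex
    parts = [
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{name}"\r\n'
        "\r\n"
        f"{value}\r\n"
        for name, value in fields.items()
    ]
    parts.append(f"--{boundary}--\r\n")
    return "".join(parts).encode("utf-8"), f"multipart/form-data; boundary={boundary}"


def fetch_cookies(url: str, fields: dict[str, str]) -> dict[str, str]:
    """POST the form and return whatever cookies the server set (e.g. a "ticket" cookie)."""
    body, content_type = build_multipart(fields)
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with opener.open(request, timeout=30):
        pass  # only the Set-Cookie headers matter here
    return {cookie.name: cookie.value for cookie in jar}


def cookie_env(cookies: dict[str, str], var: str = "GAME_COOKIES") -> dict[str, str]:
    """Copy the current environment and add the cookies as a Cookie-header-style string."""
    env = dict(os.environ)
    env[var] = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return env
```

The dict returned by `cookie_env` can be passed as `env=` to `subprocess.run` when launching wget-at, so each item run gets a fresh session cookie without wget-at having to build the multipart request itself.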
-
thuban
lol oops didn't reload. 62-78
-
JAA
(Press y to get a permalink to the exact revision rather than the mutable branch name.)
-
vokunal|m
Figures my second wiki edit would be fixing a problem in my first one. I see I have a bright future
-
fireonlive
vokunal|m: brighter than mine!
-
vokunal|m
Thanks!
-
vokunal|m
Wait
-
fireonlive
:)
-
h2ibot
FireonLive edited Current Projects (+135, add Tiki):
wiki.archiveteam.org/?diff=50024&oldid=50020
-
fireonlive
(not sure if the <!-- Urgent projects section is supposed to be alphabetical or in order of urgency there, so feel free to slap the line around)
-
fireonlive
(or chronological but it doesn't appear to be)
-
h2ibot
Systwi created Tjournal (+22, Created an article redirecting "Tjournal" to…):
wiki.archiveteam.org/?title=Tjournal
-
h2ibot
Systwi edited Tjournal (-6, Corrected redirect target from "Tjournal" to "TJ."):
wiki.archiveteam.org/?diff=50026&oldid=50025
-
systwi_
Bah, I meant "TJournal" in that last message...in case those messages _really_ matter. :-P
-
systwi_
Hehe, even I make mistakes. No need to worry, fire and vokunal. :-)
-
systwi_
It's all a learning experience, and thankfully corrections can be made a click or two away.
-
yts98
Script for 1332 RMMZ games is ready. Still looking for operators to claim gameId ranges on
pad.notkiska.pw/p/game-atsumaru in 19 hours!
-
h2ibot
Systwi edited Topics of Archiving Interest (+44, /* Multimedia Franchises */):
wiki.archiveteam.org/?diff=50027&oldid=49224
-
h2ibot
Systwi edited Topics of Archiving Interest (+10, /* Multimedia Franchises */ Added 1 entry.):
wiki.archiveteam.org/?diff=50028&oldid=50027
-
systwi_
Made another erratum, hehe. Right Shift turned into Right Shift + Enter.
-
h2ibot
Systwi edited Topics of Archiving Interest (+54, /* Work/Occupations */ Added 1 entry.):
wiki.archiveteam.org/?diff=50029&oldid=50028
-
h2ibot
Systwi edited Topics of Archiving Interest (+60, /* Celebrities/Famous Individuals */ Added 1…):
wiki.archiveteam.org/?diff=50030&oldid=50029
-
h2ibot
Systwi edited Topics of Archiving Interest (+65, /* Leisure */ Added 1 entry.):
wiki.archiveteam.org/?diff=50031&oldid=50030
-
h2ibot
Systwi edited Topics of Archiving Interest (+642, /* Celebrities/Famous Individuals */ Added 9…):
wiki.archiveteam.org/?diff=50032&oldid=50031
-
yts98
Misty|m: not decided yet. all previous discussions happened here.
-
yts98
thuban: now I'm asking STWP for help, so the currently unclaimed gameids may be saved later.
-
VickoSaviour
is blog.harvard.edu saved?
-
yts98
Script for 1019 Akashic Engine games is ready. Still looking for operators to claim gameId ranges on
pad.notkiska.pw/p/game-atsumaru in 14 hours.
-
rktk
yts98, does it require a JP IP address for proper discovery?
-
rktk
VPN JP IP
-
yts98
rktk: probably not. I am using a residential IP outside of Japan.
-
rktk
Oh that's good
-
rktk
I was going to offer help, but I have only 500GB free of 110TB, so I need to free up some space
-
immibis
just buy more hard drives :)
-
Hans5958
Someone should move "Google Drive" on the main page from active to hiatus
-
Hans5958
*warrior-based projects
-
h2ibot
PaulWise edited Mailman2 (+28, lua-l kind of looks like a pipermail archive…):
wiki.archiveteam.org/?diff=50034&oldid=50007
-
rktk
immibis, it's a rented dedicated server :(
-
rktk
no upgrades
-
rktk
I will probably buy a set of HDDs for a new build soon, it's just... so expensive
-
immibis
you rent a server with 110TB of storage?
-
yts98
thuban: commit 35954f2 fixes 8573
-
thuban
ty
-
rktk
immibis, yes
-
rktk
it was the best option when I was travelling
-
rktk
cheaper than glacier s3 too
-
fireonlive
from where?
-
arkiver
Stitcher has sitemaps
-
arkiver
we might do a project for this one
-
thuban
yts98: by the way, what is the plan for storage? i assume this is going to go to the internet archive, but do you have a way to transfer the data and enough space to consolidate it before upload?
-
rktk
fireonlive, it's one of the SX line of servers from hetzner
-
fireonlive
ahh :)
-
fireonlive
tks
-
rktk
I will stop renting the server soon since I will self host at home, and instead shift that monthly cost towards another glacier s3 compatible provider
-
rktk
so then I have 2 copies and 1 is offsite, so I am missing the 3rd copy.
-
rktk
but, 3rd copy doesn't have to be for everything like that...
-
rktk
in the cloud i mean
-
arkiver
yeah i see signs sitemaps are being updated
-
arkiver
thuban: yts98: how much is this?
-
thuban
arkiver: hundreds of gigs (uncompressed and with duplication--bare files + warcs)
-
arkiver
nice, yeah that can be just stored on IA
-
h2ibot
Manu edited Deathwatch (+277, Adding Oryx, a news blog that will shut down…):
wiki.archiveteam.org/?diff=50035&oldid=50018
-
rktk
if an entire website is saved with wget, is it better to upload the .warc.gz file to archive.org, or the plain HTML files themselves?
-
rktk
HTML + associated files
-
pokechu22
The .warc.gz file is useful (though if you're a random user it probably won't be added to web.archive.org but instead only to
archive.org/details/warczone). If the size isn't too big, you could upload both of them
-
rktk
it's close ish to 50GB total
-
rktk
but I'm not the one that archived it
-
masterX244
pokechu22: warcs are more convenient for grepping through. back when we hunted imgur links, the warczone was swept too
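As a rough illustration of why WARCs are convenient to search, this sketch pulls the `WARC-Target-URI` header out of each record in a .warc.gz using only the standard library. It is deliberately naive: it assumes no payload line happens to start with that header name, so a real tool should use a proper WARC parser such as warcio.

```python
import gzip
import sys
from typing import Iterable, Iterator


def target_uris(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield WARC-Target-URI values from raw WARC lines (naive line scan)."""
    for line in lines:
        if line.startswith(b"WARC-Target-URI:"):
            # Some writers wrap the URI in <...>; strip that as well as whitespace.
            value = line.split(b":", 1)[1].strip().strip(b"<>")
            yield value.decode("utf-8", "replace")


def uris_in_file(path: str) -> Iterator[str]:
    # gzip.open reads the concatenated per-record gzip members of a .warc.gz as one stream.
    with gzip.open(path, "rb") as f:
        yield from target_uris(f)


if __name__ == "__main__" and len(sys.argv) > 1:
    for uri in uris_in_file(sys.argv[1]):
        print(uri)
```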
-
h2ibot
Entartet edited List of websites excluded from the Wayback Machine/Partial exclusions (+43):
wiki.archiveteam.org/?diff=50036&oldid=49650
-
immibis
really, hetzner? I pay $60 for a server from hetzner with only 1TB of storage, but it was CPU-optimized
-
immibis
Glacier Deep Archive is about $1/TB/month and I see Hetzner SX prices are not much higher than that
-
Maakuth|m
Azure too has a cheap cold storage tier
-
immibis
$97/mo for 64TB is surely better than any hot cloud storage out there. Of course, that's not accounting for replication
-
threedeeitguy
yts98: how many instances is it safe to run per IP? I'm up to 5 and still have spare bandwidth, but don't wanna get banned.
-
thuban
i've done a dozen and haven't seen anything like rate limiting
-
threedeeitguy
cool. guess I'll power on another server then. wget can really hammer a CPU.
-
yzqzss|m
thuban: threedeeitguy: I changed the MV_iterate_.py script to use multiple threads (40 by default).
-
yzqzss|m
run `git pull` to update :)
-
imer
server auction has some decent storage ones as well:
hetzner.com/sb?drives_count_from=15&drives_size_to=16000 15x10TB for 150€ (excl. tax)
-
imer
mh, that link doesn't work. just filter by drive size 10tb^
-
fireonlive
ooh
-
fireonlive
if only EUR to CAD wasn't such a ***** **** lol
-
masterX244
immibis: my main archival crap runs on the 64TB one, too. 48 usable only, one HDD for parity
-
threedeeitguy
cool. I'll update once this lot finishes.
-
upintheairsheep
The following Discord server will be shutting down and moving to Telegram:
discord.gg/cEH6cqMgBd
-
fireonlive
hi/bye
-
masterX244
doesn't even say what the server is about. another issue: you can't check out invites in a private window while the app is open; it pokes a connection to localhost
-
fireonlive
'hang out with 122 other members'
-
fireonlive
logo is the "windows 10" logo
-
fireonlive
o_O
-
fireonlive
'kazlandia // new beginning'
-
threedeeitguy
yts98: looks like all of MV has been claimed. I take it we just move on to the next one?
-
TNN
Hey guys - probably stupid question. What do each of these mean?
-
TNN
claims: 17361101, done: 7471389, todo: 105762705, todo:backfeed: 3782832, todo:redo: 0, todo:secondary: 0, unretrievable: 0
-
threedeeitguy
yts98 also nice update, going even quicker now.
-
JAA
Looks like the Knowledge Adventure CDN download should finish at around 05:00.
-
threedeeitguy
TNN: Claims: workers have claimed the item but have not finished it yet. This can include items that have been claimed but will never be finished (maybe the worker was switched off); these are eventually moved to another queue to be reclaimed. Done: done :) todo: items to be done. IIRC this queue has the highest priority. Backfeed: items discovered
-
threedeeitguy
(or, you could say, fed back) by the workers. todo:redo: items already done once that need doing again, probably because they temporarily failed. todo:secondary: low-priority to-do queue. jaa feel free to correct anything :)
-
JAA
todo:secondary has a higher priority than :redo, it appears in the wrong order on the tracker page. Otherwise, correct.
-
TNN
Thanks very much, super helpful
-
JAA
The meaning of the queues might vary by project. We sometimes shuffle items around and just use them as priority queues, i.e. to process certain things before others.
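To make the queue explanation above concrete, here is a toy model of how claims could flow through the queues, using the corrected order (todo, then todo:secondary, then todo:redo). It is a hypothetical sketch of the discussion, not the tracker's actual code; which queue abandoned claims go back to, and how backfed items are deduplicated, are assumptions, and individual projects may use the queues differently.

```python
from collections import deque

# Claim order as described in the discussion: todo, then todo:secondary, then todo:redo.
PRIORITY = ("todo", "todo:secondary", "todo:redo")


class ToyTracker:
    def __init__(self) -> None:
        self.queues = {name: deque() for name in PRIORITY}
        self.backfeed = deque()  # items discovered by workers, not yet queued
        self.claims = {}         # item -> worker that claimed it
        self.done = set()

    def claim(self, worker: str):
        """Hand out the next item from the highest-priority non-empty queue."""
        for name in PRIORITY:
            if self.queues[name]:
                item = self.queues[name].popleft()
                self.claims[item] = worker
                return item
        return None

    def finish(self, item, discovered=()):
        """Mark a claimed item done and feed newly discovered items back."""
        del self.claims[item]
        self.done.add(item)
        self.backfeed.extend(discovered)

    def requeue_stale(self, item, queue: str = "todo:redo"):
        """Return an abandoned claim to a queue (the default queue is an assumption)."""
        del self.claims[item]
        self.queues[queue].append(item)

    def drain_backfeed(self):
        """Move backfed items that are not done, claimed or queued into todo (assumed rule)."""
        queued = {item for q in self.queues.values() for item in q}
        while self.backfeed:
            item = self.backfeed.popleft()
            if item not in self.done and item not in self.claims and item not in queued:
                self.queues["todo"].append(item)
                queued.add(item)
```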
-
threedeeitguy
Worth adding an explanation to the wiki? It feels like there should be a dedicated page for the leaderboard, not everyone wants to go digging through the tracker docs.
-
thuban
that sounds like a good idea (especially as "the tracker docs" is a rather aspirational phrase)
-
JAA
++
-
thuban
JAA: where should it go? obvious answer is 'the tracker wiki page' but a lot of that stuff is wrong at present and i don't have enough details to correct it
-
JAA
thuban: Yeah, not sure, maybe just the Warrior FAQ?
-
JAA
I feel like we should restructure that at some point though. A lot of the stuff there isn't really related to the warrior but to DPoS projects more generally.
-
threedeeitguy
I hate the wiki editor already :(
-
thuban
JAA: you're technically correct, but from a usability perspective i'm kind of reluctant to direct warrior users to a secondary page for troubleshooting
-
thuban
mediawiki magic to transclude relevant items on both the 'warrior' and 'running with docker' pages?
-
nicolas17
immibis: Backblaze is really cheap compared to competition... and it would be $320/mo for 64TB
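Putting the prices quoted in this thread on the same per-TB-per-month footing takes one division. This sketch uses only the figures mentioned here (not independently checked), in USD, and also shows the effect of losing one of four drives to parity, as masterX244 describes for the 64TB box.

```python
def per_tb_month(monthly_price: float, capacity_tb: float) -> float:
    """Monthly price divided by capacity, i.e. cost per TB per month."""
    return monthly_price / capacity_tb


# (monthly price in USD, capacity in TB), as quoted in the discussion
quotes = {
    "Hetzner SX, 64 TB raw": (97.0, 64),
    "Hetzner SX, 48 TB usable after one parity drive": (97.0, 48),
    "Backblaze estimate for 64 TB": (320.0, 64),
}

for name, (price, tb) in quotes.items():
    print(f"{name}: ${per_tb_month(price, tb):.2f}/TB/month")
```

This gives roughly $1.52, $2.02 and $5.00 per TB per month respectively, to set against the ~$1/TB/month quoted for Glacier Deep Archive. As immibis notes, none of this accounts for keeping additional replicas.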
-
JAA
Even just splitting everything warrior-specific into a separate section would do, I think.
-
thuban
not sure what you mean; specify?
-
thuban
(also, i will repeat my request for a toclimit)
-
nicolas17
what's DPoS?
-
thuban
nicolas17: Distributed Preservation of Service
-
thuban
(what we often refer to as a 'project' or 'warrior project')
-
threedeeitguy
yts98: it seems to hang on completion now. Last line:
-
threedeeitguy
2023-06-27 22:57:06 URL:https://resource.game.nicovideo.jp/games/gm27293/26/img/tilesets/Dungeon_C.png [649991/649991] -> "games/gm27293/26/img/tilesets/Dungeon_C.png" [1]
-
threedeeitguy
Is it safe to close the terminal at that point?
-
JAA
thuban: E.g. 4.4 to 4.8 on
wiki.archiveteam.org/index.php/ArchiveTeam_Warrior are not really warrior-specific.
-
JAA
And people ask about them all the time, too.
-
thuban
JAA: i understand that, just not sure what you mean by 'splitting into a separate section'. it wouldn't solve the duplication with the 'running with docker' page, and it wouldn't really be more helpful, would it?
-
JAA
We could have a 'DPoS FAQ' with sections 'warrior' and 'not warrior'.
-
JAA
This would depend on people actually using the names consistently though. I try to only use 'warrior' when I actually mean the warrior VM or container (or, technically, the seesaw parts driving that, although that's basically never discussed).
-
thuban
we could, but i think that would be less usable for newbies
-
nicolas17
oh I didn't know the tracker etc was called that
-
thuban
JAA: do you have rights to edit
wiki.archiveteam.org/index.php/MediaWiki:Common.css ? i think last time we talked about this you said you didn't, but mediawiki sez you're an admin and should
-
JAA
I don't remember us talking about this, which suggests it was quite some time ago, possibly before I had admin rights, yeah.
-
JAA
I should be able to.
-
thuban
i actually checked and it was after :P but yeah
-
JAA
thuban: Ah, I was probably confused by 'Add the following code *in the file* ...', thinking it required editing an actual file in the MW config dir etc. That I don't have access to.
-
thuban
i figured it was something like that
-
JAA
> You do not have permission to edit this CSS page because it may affect all visitors.
-
JAA
(╯°□°)╯︵ ┻━┻
-
thuban
ugh! i guess you have to make yourself an "interface administrator" as well?
wiki.archiveteam.org/index.php/Special:ListGroupRights
-
JAA
Yeah, looks like it, editsitecss permission.
-
JAA
There we go.
-
h2ibot
JustAnotherArchivist changed the user rights of User:JustAnotherArchivist (Need to edit [[MediaWiki:Common.css]])
-
h2ibot
JustAnotherArchivist edited MediaWiki:Common.css (+474, Add TOC limit rules from…):
wiki.archiveteam.org/?diff=50037&oldid=24389
-
h2ibot
Switchnode created Template:TOClimit (+57, Created page with "<div…):
wiki.archiveteam.org/?title=Template%3ATOClimit
-
h2ibot
Switchnode edited ArchiveTeam Warrior (+7, limit toc depth for better readability):
wiki.archiveteam.org/?diff=50039&oldid=50013
-
thuban
thanks! <3
-
imer
wouldn't hurt to have an explainer for the components of the dpos stack then? i made a (bad) flowchart-thingy
transfer.archivete.am/Jqui7/2023-06-28_01-01-08_UKocGw5qlp.png probably needs some naming corrections
-
imer
yeah, saw that - less technical though which was the point, right?
-
arkiver
:P
-
imer
way better graphically of course :)
-
arkiver
i just like the image
-
imer
yeah its great
-
arkiver
yes we could create a more technically correct image
-
h2ibot
Switchnode edited ArchiveTeam Warrior (+668, /* Warrior FAQ */ add counter explanation here…):
wiki.archiveteam.org/?diff=50040&oldid=50039
-
thuban
wiki.archiveteam.org/index.php/Arch…er_internet_access_for_the_Warrior? have the recent quad9 updates made the item about dns here unnecessary?
-
datechnoman
Correct. To my knowledge, all projects now use Quad9 DNS by default, set within the container. arkiver?
-
yts98
thuban: I'll inspect 8298 17614 19117 later.
-
yts98
threedeeitguy: you can kill 27293 and redo it later.
-
yts98
thuban: got it. I'm investigating other frameworks, so debugging woulg be at lower priority.
-
thuban
understood
-
yts98
if anyone is able to resolve exceptions, feel free to make a pull request.
-
yts98
s/woulg/would :p
-
threedeeitguy
yts98: I've set 4 large jobs going with the rest of RMMZ. There's 750GB of space left on that disk, so I guess we're gonna find out how large it is. I'll let you know how it went in the morning. It is going a lot slower; looks like it's only single-threaded? If it needs an update and then a restart, lmk. I'll be back in 8 hrs or so.
-
yts98
the deadline is in 3 hours, so I would also do some duplicate ranges
-
threedeeitguy
ah, I thought it was later on today. np, I'll see about speeding things up.
-
flashfire42
3tt38940863qg137qhybq2xbe there is also this AB job
-
thuban
threedeeitguy: why are some of your ranges listed as "done?" with question marks?
-
threedeeitguy
Those jobs hung on a few items. I'm just checking the WARCs to make sure they did get everything.
-
threedeeitguy
If there are any missing, I'll shout shortly.
-
thuban
depending on the error(s) the script may have downloaded some files but not all
-
threedeeitguy
Looks like the jobs completed fine; all files match up (I just re-ran all 4 affected items in a new folder to compare). Since the update a few hours ago, my terminal window just hangs once the script finishes. I guess it's related to the multithreading changes. It does not appear to have affected the data in any case.
-
thuban
i'm a little concerned that i encountered half a dozen errors (some of which have not yet been fixed) whereas you have apparently seen none over a similar number of games.
-
thuban
is it possible error messages got lost in the parallel output?
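One way to keep errors from getting lost in interleaved output from many threads is to collect failures per URL and print a summary, with a non-zero exit status, once everything has finished. This is a hypothetical sketch, not the actual MV_iterate_.py: `fetch` stands in for whatever downloads a single resource, and the default of 40 workers just mirrors the thread count yzqzss mentioned.

```python
from __future__ import annotations

import sys
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable


def download_all(
    urls: Iterable[str], fetch: Callable[[str], object], workers: int = 40
) -> tuple[int, dict[str, str]]:
    """Run fetch(url) on a thread pool; return (success count, {url: error})."""
    ok = 0
    failures: dict[str, str] = {}
    # Leaving the with-block waits for every submitted task to finish.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                future.result()
                ok += 1
            except Exception as exc:  # record the error instead of letting it scroll past
                failures[url] = repr(exc)
    return ok, failures


def report(ok: int, failures: dict[str, str]) -> int:
    """Print a summary after all worker output is done; return an exit status."""
    print(f"{ok} succeeded, {len(failures)} failed", file=sys.stderr)
    for url, err in sorted(failures.items()):
        print(f"FAILED {url}: {err}", file=sys.stderr)
    return 1 if failures else 0
```

A script would end with something like `sys.exit(report(*download_all(urls, fetch)))`, so a range that looks "done?" can be checked by its exit status and the failure list rather than by scrolling back through the terminal.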