-
Pedrosso
The wiki says this about the Docker option: "--concurrent 1: Process 1 item at a time per container. Although this varies for each project, the maximum recommended value is 5, and the maximum allowed value is 20. Leave this at 1, or check with us on IRC if you are unsure". I don't really understand this. Does it vary between projects, or is it always worse to have more or something?
-
JAA
If you set it too high, the target site might ban you. Ideally, that just means you no longer do meaningful work. In bad cases, it can pollute the archive and lead to missed data if we don't catch it.
-
pokechu22
It varies between projects
-
JAA
So the ideal value is whatever lets you grab as much as possible without getting banned. Rate limits vary by target site, so ideal concurrency varies by project.
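To make the trade-off concrete, here is a minimal Python sketch of what a setting like `--concurrent N` amounts to: at most N items are in flight at once, so the load on the target site grows roughly with N. This is an illustration under assumptions, not the warrior's actual code. `fetch_item` is a hypothetical stand-in that only sleeps, and the clamp to 1..20 simply mirrors the limits quoted from the wiki above.

```python
import asyncio

# Limit quoted from the wiki text above (the recommended max is lower, 5,
# and the right value still depends on the project's target site).
ALLOWED_MAX = 20


def clamp_concurrency(requested: int) -> int:
    """Keep a requested concurrency within 1..ALLOWED_MAX."""
    return max(1, min(requested, ALLOWED_MAX))


async def fetch_item(item: str, delay: float) -> str:
    # Hypothetical stand-in: a real worker would make HTTP requests here.
    await asyncio.sleep(delay)
    return item


async def process_all(items, concurrency: int, delay: float = 0.01):
    """Process items with at most `concurrency` in flight; also report the peak."""
    sem = asyncio.Semaphore(clamp_concurrency(concurrency))
    in_flight = 0
    peak = 0

    async def worker(item):
        nonlocal in_flight, peak
        async with sem:  # blocks once `concurrency` items are already running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fetch_item(item, delay)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(worker(i) for i in items))
    return results, peak


if __name__ == "__main__":
    done, peak = asyncio.run(process_all([f"item:{n}" for n in range(10)], concurrency=3))
    print(len(done), peak)  # all items processed, never more than 3 at once
```

Raising `concurrency` shortens the total time but multiplies the simultaneous requests hitting the site, which is exactly what trips rate limits and bans.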
-
Pedrosso
Is there anywhere to check the recommended values, or is asking in their respective chats the most efficient?
-
JAA
We (often|usually|sometimes|occasionally) put it in the channel topic.
-
Pedrosso
I see, thanks
-
Pedrosso
With docker, is there any reason to not run multiple different projects at once?
-
JAA
Nope, many people do exactly that. :-)
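A sketch of what running several projects side by side can look like: one container per project, each with its own name and its own `--concurrent` value. The `docker run` flags used here (`-d`, `--name`, `--restart=unless-stopped`) are standard Docker options, but `IMAGE_TEMPLATE`, the project names, and the position of the nickname argument are placeholders and assumptions; each project's wiki page has the real command to copy.

```python
import shlex

# Placeholder: "registry.example" is NOT a real registry. Use the image
# name given on each project's wiki page instead.
IMAGE_TEMPLATE = "registry.example/archiveteam/{project}-grab"


def docker_run_args(project: str, nick: str, concurrent: int = 1) -> list[str]:
    """Build a `docker run` command for one project's grab container."""
    return [
        "docker", "run", "-d",
        "--name", f"archiveteam-{project}",  # unique container name per project
        "--restart=unless-stopped",
        IMAGE_TEMPLATE.format(project=project),
        "--concurrent", str(concurrent),     # per-container setting, per project
        nick,                                # assumed: nickname as last argument
    ]


if __name__ == "__main__":
    # Hypothetical project names and nickname: one container per project.
    for project in ("projecta", "projectb"):
        print(shlex.join(docker_run_args(project, "YOURNICK")))
```

Because each container has its own `--concurrent`, you can keep a cautious value for a fragile site while using a higher one where the channel topic says it is fine.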
-
Pedrosso
(y) awesome. I just want to maximize what I can do (at least temporarily), is all
-
Pedrosso
Oh, as for Spore, I've found a tool that may help with finding the items, so we wouldn't have to check through tons of empty URLs. However, I don't know any JavaScript, so I'll just post the tool here if anyone wants to check it out:
github.com/Spore-Community/SporeTools/blob/main/SporeDwrApiClient.ts
-
JAA
Basically, there are three types of people running workers: a) casual users that just set up the warrior in auto mode once and forget about it; b) powerusers that run multiple projects in parallel to get the most out of existing machinery; c) insane people with large clusters.
-
JAA
(Exceptions that don't fall into these categories confirm the rule.)
-
Pedrosso
(I've no clue what that saying "exceptions confirm the rule" is supposed to mean but okay)
-
Pedrosso
idk what a cluster is; is it someone getting, like, a "cloud" service to do it for them?
-
Pedrosso
(also someone named idk just got tagged randomly lol)
-
JAA
'Exceptions confirm the rule' is a lighthearted way of saying there are counterexamples to a rule but they're few and far between, so the rule's still valid.
-
JAA
Cluster ~ significant amount of distributed computing resources. 'Cloud' stuff can be used for that, yeah, but it's not a requirement.
-
JAA
It also implies some degree of coordination across the resources, e.g. orchestration.
-
JAA
Not sure github.com/Spore-Community/SporeTools is of much use for reducing the 1.1 billion number. It's an implementation of Spore's API, but I don't see anything for checking many asset IDs at once.
-
JAA
Damn
-
fireonlive
yeah :c
-
Pedrosso
JAA: I'd believe what it would do is considerably lower the number of empty URLs checked. I've been told that all assets use the same IDs but with different formats, so there will be many, many gaps. Considering the server's stability, it may be safer to do something slower with a list in mind. However, I've no clue how to get a list of items out of that
-
JAA
Pedrosso: If we make the same number of requests, it probably doesn't matter much. And API requests likely take more resources on the Spore server side than serving static files.
-
h2ibot
FireonLive edited Deathwatch (+307, add Omegle):
wiki.archiveteam.org/?diff=51110&oldid=51097
-
Pedrosso
I think I understand. I'll get to checking out the qwarc software then. (Hopefully I can understand how to use it lol)
-
JAA
logs.omegle.com URLs collected from anywhere would be interesting. I already ran the ones from Reddit through AB.
-
Pedrosso
I can't find any documentation for qwarc; I hope I'm not expected to be big-brained enough to figure it all out on my own
-
fireonlive
they seem to have one indexed (on google) redirect:
waw1.omegle.com/redir/gj2016
-
fireonlive
to some youtube video?
-
fireonlive
log.omegle.com exists too and is indexed, though it seems to serve the same content
-
fireonlive
NSFW: they seem to be running a whitelabel version of chaturbate too:
lady.omegle.com (though the fact it says 'Whitelabel powered by Chaturbate.com' isn't very whitelabel.. but lol)
-
fireonlive
i assume (but didn't check) it's just chaturbate with a different logo
-
JAA
Pedrosso: I mentioned yesterday that there is no documentation.
-
JAA
fireonlive: Yeah, I noticed the same about lady.omegle.com. It's already running through AB to get a sample.
-
fireonlive
ah :)
-
Pedrosso
About omegle, nice that you've got something to run on
-
fireonlive
noticed chatserv.omegle.com as well, which redirects to omegle.. but if you append a ?from= parameter you get omegle?
-
JAA
Or even just an empty param works.
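For anyone wanting to reproduce this or queue such variants, a small Python helper that appends a `from` query parameter (empty by default) to a URL using only `urllib.parse`. The parameter name comes from the messages above; what the server does with it is only what was observed here, and `with_from_param` is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_from_param(url: str, value: str = "") -> str:
    """Return `url` with a `from` query parameter appended (empty by default)."""
    parts = urlsplit(url)
    # keep_blank_values so existing empty params survive the round trip
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("from", value))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_from_param("https://chatserv.omegle.com/"))
# → https://chatserv.omegle.com/?from=
```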
-
fireonlive
ah! :)
-
fireonlive
trying to use it just gives an error to reload though
-
fireonlive
antinudeservers: ["waw1.omegle.com", "waw2.omegle.com", "waw3.omegle.com", "waw4.omegle.com"]
-
fireonlive
haha
-
fireonlive
(in the start json response)
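If that host list is wanted as seed URLs, a sketch of pulling it out with Python's `json` module. Only the `antinudeservers` key and its values come from the message above; the real start response presumably has other fields, which this minimal sample omits, and turning each host into an `https://host/` seed is an assumption.

```python
import json

# Minimal stand-in for the start response: only this key and its values
# are from the chat; any other fields in the real response are omitted.
SAMPLE_START_RESPONSE = """
{"antinudeservers": ["waw1.omegle.com", "waw2.omegle.com",
                     "waw3.omegle.com", "waw4.omegle.com"]}
"""


def seed_urls(body: str, key: str = "antinudeservers") -> list[str]:
    """Turn the list of hostnames under `key` into https:// seed URLs."""
    data = json.loads(body)
    return [f"https://{host}/" for host in data.get(key, [])]


for url in seed_urls(SAMPLE_START_RESPONSE):
    print(url)
```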
-
JAA
Pedrosso: I can't really recommend trying to use qwarc currently. It works, and it's very powerful, but the lack of documentation just makes it a non-starter unless you enjoy reading through my code and figuring out all the quirks yourself.
-
fireonlive
it links to (NSFW) cameglelive.com (well, a sub page of that) if you want an adult site instead, but i'm not sure of the relation to omegle itself (it doesn't say right away, and it seems to be different from chaturbate)
-
fireonlive
i clicked men and now see a man jackin' it live so i guess it's unaffected for now
-
Pedrosso
JAA: Every programmer is a masochist, hah. I don't know of any resources other than the code that was mentioned, and the one I wrote sure is slow
-
fireonlive
ah ok, it seems it's a whitelabel version of streamate.com
-
fireonlive
(omegle itself is Omegle.com LLC)
-
arkiver
RIP omegle
-
» fireonlive pours one out
-
h2ibot
Petchea edited Tumblr (+430, /* History */):
wiki.archiveteam.org/?diff=51111&oldid=51050
-
h2ibot
0KepOnline edited Spore (+633, Add tools (Spore PNG Downloader & sporeget) and…):
wiki.archiveteam.org/?diff=51112&oldid=51106
-
h2ibot
Petchea edited Tumblr (-47, /* History */ reblog with more context):
wiki.archiveteam.org/?diff=51113&oldid=51111
-
h2ibot
JustAnotherArchivist edited Deathwatch (+4, Fix order):
wiki.archiveteam.org/?diff=51114&oldid=51110
-
fireonlive
am i... dumb?
-
fireonlive
i swear i stared at that for a minute
-
fireonlive
lol
-
fireonlive
..and not the fun kind :(
-
that_lurker
Omegle has some interesting subdomains like lady.omegle.com that you should not open at work like i did
-
Exorcism
LMAOOOO
-
JAA
that_lurker: Yeah, discussed above, it's a whitelabel version of Chaturbate.
-
thuban
(best source i could find, sorry)
-
h2ibot
Megame edited Deathwatch (+167, /* 2023 */ apo.org.au - 15 Dec):
wiki.archiveteam.org/?diff=51115&oldid=51114
-
thuban
no on-site announcement yet
-
thuban
although a lot of pages depend on js (even the articles, which have 'continue reading' buttons), i _think_ that the relevant data is in-source and playback from archivebot captures would work
-
thuban
(except comments, which are an ugly-looking in-house js thing)
-
h2ibot
Manu edited Political parties/Germany/Hamburg (+4138):
wiki.archiveteam.org/?diff=51116&oldid=51108
-
Pedrosso
Would sporepedia2.foroactivo.com be considered a small enough forum for ArchiveBot to deal with? If so, I've got a small list of similar forums that don't seem to have been archived
-
nicolas17
we got a filesystem dump of the preinstalled macOS 14.1 on M3 Pro but still looking for 13.5 on M3 ._.
-
nicolas17
gonna get increasingly hard to find people who didn't update yet
-
fireonlive
nicolas17: jason seems to be missing from here but maybe you could tweet at him and he might boost it? or email uh whatever fucking cutesy email he left here
-
fireonlive
ended in textfiles.com i think
-
fireonlive
jesuschristmorearchiveteamcrap⊙tc
-
fireonlive
he does have that big follower base ™
-
JAA
Jason is here, but he doesn't check IRC all that often.
-
flashfire42|m
sighs, time to grab more tumblr
-
JAA
→ #tumbledown
-
Pedrosso
Would anyone mind answering the question I sent above? They claim to have 456,901 messages
-
JAA
Pedrosso: That would be appropriate for AB, yeah.
-
Pedrosso
(y) great
-
fireonlive
JAA: oh maybe under another nick; i tried the two i remember
-
fireonlive
didn't check hostnames though :)
-
fireonlive
but yeah true i do remember him saying just use email
-
JAA
fireonlive: He's S.ketchCow here normally, currently S.ketchCo1 due to a netsplit or similar.
-
fireonlive
oh!
-
Pedrosso
I noticed in the ArchiveBot job for sporepedia2 that i.imgur.com/YHS5Omo.png returned a 429 code. Do those links get sent to #// or #imgone, or are they just logged somewhere?
-
pokechu22
They're just logged. It's possible to extract them from the meta-warc (which is a gz-compressed log file) and throw them into #imgone manually, but nothing happens automatically currently
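A rough Python sketch of that manual extraction: read the gzip-compressed meta-WARC line by line and collect the unique i.imgur.com URLs from lines that mention a 429. The log line format is an assumption (the sample lines in the test are made up), so the regular expressions may need adjusting against a real file; a bare `429` can also match unrelated numbers such as response lengths.

```python
import gzip
import re
from typing import Iterable

# Assumed patterns; check them against a real meta-WARC before trusting output.
IMGUR_URL = re.compile(r"https?://i\.imgur\.com/[A-Za-z0-9]+\.[A-Za-z0-9]+")
STATUS_429 = re.compile(r"\b429\b")


def imgur_429_urls_from_lines(lines: Iterable[str]) -> list[str]:
    """Unique i.imgur.com URLs, in first-seen order, from lines mentioning 429."""
    seen: dict[str, None] = {}  # dict preserves insertion order: an ordered set
    for line in lines:
        if STATUS_429.search(line):
            for url in IMGUR_URL.findall(line):
                seen.setdefault(url, None)
    return list(seen)


def imgur_429_urls(path: str) -> list[str]:
    """Same, reading a gzip file; gzip.open also handles multi-member .gz files."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        return imgur_429_urls_from_lines(fh)
```

Usage would be something like `imgur_429_urls("job.meta.warc.gz")` (hypothetical filename), with the resulting list pasted into #imgone.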
-
Pedrosso
Considering the nature of those images (them being spore creations) that may be a good thing to do