-
fireonlive
skeletor wins
-
vokunal|m
I'm not sure where the number comes from, but one source stated 2,398,412 posts
-
JAA
-
JAA
-
vokunal|m
ahhhhhh. thanks
-
Arcorann
Anyone know what's going on with HikariNoAkari? I heard there was some drama and it was shutting down
-
thuban
it's been discussed, but not with any particular insight
-
JAA
-
JAA
So the AB job for Hikari No Akari finished, but it timed out on almost all sitemaps, i.e. I'm not sure it's complete. It does appear to have gone through the pagination, but new posts being added could still have led to some things getting missed.
-
h2ibot
-
betamax
high chance it's already been done, but Minnesota is / has run a flag-design contest, all* 2123 designs are on the site: serc.mnhs.org/flags
-
betamax
*all => I'm pretty sure that they're no longer accepting submissions. not 100%, though
-
betamax
I'm not going to put it in AB because I have only glanced at the site and am not sure if extra work is needed to capture the "click each submission to view the larger-size image"
-
guest
hello.
-
guest
like using archive bot
-
fireonlive
i’m glad you do!
-
fireonlive
it’s a nice bit
-
fireonlive
bot*
-
nulldata
Bit Bot
-
DogsRNice
beep boop
-
JAA
betamax: Looks like the large images work just fine without JS. They're standard links. I've thrown it into AB.
-
ScenarioPlanet
transfer.archivete.am/l5gdO/static.spore.com-ids-2016.txt.zst - Full (?) list of Spore.com creation IDs including INVALID/PURGED/BANNED statuses, as of 2016.
-
ScenarioPlanet
20741764 entries ^
-
pokechu22
ScenarioPlanet: what speed and concurrency can that be run at?
-
pokechu22
oh, wait, those are just numeric IDs, so it can't be run directly
-
pokechu22
17:20 <ScenarioPlanet> transfer.archivete.am/l5gdO/static.spore.com-ids-2016.txt.zst - Full (?) list of Spore.com creation IDs including INVALID/PURGED/BANNED statuses, as of 2016.
-
pokechu22
17:21 <ScenarioPlanet> 20741764 entries ^
-
pokechu22
Pedrosso: might find that interesting
-
Pedrosso
Hey uh pokechu22, I've been looking over ArchiveBot's logs of "davoonline.com/phpBB3?archiveteam". It's been archiving a lot of the same login page with different "?*" things
-
Pedrosso
Also yes, I find it very interesting
-
pokechu22
Yeah, that doesn't look great :|
-
pokechu22
-
Pedrosso
idk too much about the archivebot. I've seen ignorelists used. Idk how they work but can a "just ignore all mode=login lol" work?
-
pokechu22
Yeah, mode=login would ignore any URLs with the text mode=login in it, while ^davoonline.com/phpBB3/ucp\.php.*[?&]mode=login ignores anything starting with davoonline.com/phpBB3/ucp.php and containing either ?mode=login or &mode=login
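
A minimal sketch of the difference between the two ignores described above. It assumes an ignore behaves like a regular expression searched anywhere in the full URL. The `https?://` prefix, the `is_ignored` helper and the sample URLs are illustrative assumptions, not taken from the job itself.

```python
import re

# Assumption: an ignore is a regex that may match anywhere in the full URL.
BROAD = re.compile(r"mode=login")
# Assumption: the "https?://" scheme prefix is added here for illustration only.
NARROW = re.compile(r"^https?://davoonline.com/phpBB3/ucp\.php.*[?&]mode=login")


def is_ignored(pattern, url):
    """Return True if the pattern matches somewhere in the URL."""
    return pattern.search(url) is not None


# Hypothetical sample URLs, made up to show where the two patterns differ.
SAMPLES = [
    "http://davoonline.com/phpBB3/ucp.php?mode=login",
    "http://davoonline.com/phpBB3/ucp.php?style=17&mode=login",
    "http://davoonline.com/phpBB3/viewtopic.php?t=1&note=mode=login",
    "http://davoonline.com/phpBB3/viewtopic.php?p=41535",
]

for url in SAMPLES:
    print(url, "broad:", is_ignored(BROAD, url), "narrow:", is_ignored(NARROW, url))
```

The broad pattern also catches the third URL, which is not a login page, while the narrow one only fires on ucp.php.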
-
pokechu22
Now, davoonline.com/phpBB3/viewtopic.php?style=17&p=18852 is a bit weird too since I'm not sure where the style=17 is coming from - I haven't seen other styles though so maybe it's fine?
-
pokechu22
hmm, no, URLs from davoonline.com/phpBB3 don't have style=17 but once a URL with style=17 is retrieved that same parameter is added to everything else... and it looks identical to without it
-
JAA
-
pokechu22
... ah, it came from one link that has style=17 on it in davoonline.com/phpBB3/viewtopic.php?p=41535#p41535 it looks like
-
pokechu22
Probably best to just nuke that then
-
pokechu22
we don't need to save everything twice
-
Pedrosso
Is it able to be modified live with these ignores?
-
Pedrosso
Oh, nice
-
pokechu22
Yep, you can add and remove ignores as needed (and adjust the speed and concurrency at which it runs too)
-
Pedrosso
As for the spore.com archive, I didn't know it would be able to archive users, but I'm glad it found its way, hah
-
pokechu22
Looking at archivebot.com/ignores/2y5iu7ey1kzbspqay7vlkbkuq we have /ucp\.php\?mode=(login|delete_cookies|pm) in the forums ignoreset - which doesn't work with the random style=17 in the middle
-
JAA
Yeah, that should probably be /ucp\.php\?(.*&)?mode=(login|delete_cookies|pm)(&|$) instead.
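
A rough comparison of the existing ignoreset pattern and the suggested replacement, again assuming ignores are applied as unanchored regex searches over the URL. The `example.org` URLs and the `mode=pmx` value are made up for illustration.

```python
import re

# The two patterns as quoted in the discussion above.
OLD = re.compile(r"/ucp\.php\?mode=(login|delete_cookies|pm)")
NEW = re.compile(r"/ucp\.php\?(.*&)?mode=(login|delete_cookies|pm)(&|$)")


def matches(pattern, url):
    """Assumption: an ignore applies if the regex matches anywhere in the URL."""
    return pattern.search(url) is not None


# Hypothetical URLs; the host and the "pmx" mode are invented for this sketch.
CASES = [
    "http://example.org/forum/ucp.php?mode=login",
    "http://example.org/forum/ucp.php?mode=login&sid=abc",
    "http://example.org/forum/ucp.php?style=17&mode=login",
    "http://example.org/forum/ucp.php?style=17&mode=delete_cookies",
    "http://example.org/forum/ucp.php?mode=pmx",
    "http://example.org/forum/ucp.php?mode=register",
]

for url in CASES:
    print(url, "old:", matches(OLD, url), "new:", matches(NEW, url))
```

The optional `(.*&)?` group lets other parameters such as style=17 come before mode=, and the trailing `(&|$)` stops the pattern from matching longer mode values that merely start with one of the listed names.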
-
vokunal|m
Since AB grabs a lot more than just the urls on a site, is there a way to determine whether a job will finish in time or not? Based on the current rate, he-man.org could grab ~3.3m if all goes well, but those aren't necessarily all the urls on the site and probably include external links
-
pokechu22
The easiest approach is to use --no-offsite if it seems like it'll be close to not finishing, and then manually run the offsite links afterwards (but that requires having the job's database manually saved since links skipped by --no-offsite don't end up in the log)
-
Pedrosso
What's the deal with all the 600,001-600,001 ms delays?
-
that_lurker
most of those are in a pipeline that has been offline for about a year
-
JAA
s/most of //
-
that_lurker
JAA: Didn't you plan to remove the pipeline? Or is it on the todo list
-
JAA
Yeah, the latter.
-
fireonlive
t.me/zlibrary_official/41 "Sad news! Yesterday a large number of our domains were seized again. We should highlight that the majority of the seized domains were not mirrors of the Z-Library website, but they were separate sub-projects, containing only books in rare languages of the world, and their blocking is confusing. For instance, these
-
fireonlive
domains included books in Tamil, Mongolian, Catalan, Urdu, Pashto, and other languages."
-
Pedrosso
Lost connection, did I miss anything?
-
Pedrosso
Thanks to whoever added esporo to the archivebot :]
-
h2ibot