-
guybrush.hackint.org
*** Notice -- TS for #// changed from 1655942823 to 1604184663
-
datechnoman
arkiver poke me when the next batch/bulk load is loaded. Will spin back up :)
-
datechnoman
Been running great since the last url source update
-
datechnoman
I think I have an addiction to data hoarding...
-
arkiver
datechnoman: haha, will do :)
-
arkiver
working on an update to the code, and after that we'll resume getting sitemaps
-
datechnoman
No stress or pressure at all. I know you're always very busy! :)
-
datechnoman
Anything exciting being updated?
-
arkiver
we'll be pushing the periodical stuff into a separate backfeed shard
-
arkiver
so it doesn't add to the main backfeed shard we use for general queuing of URLs
-
arkiver
and sitemaps will be awesome :)
-
arkiver
once per month we'll get the sitemaps of every site we come across
-
arkiver
since those contain very valuable information and are not always available anymore when needed
-
datechnoman
That is genius and also very effective with the sitemaps
-
datechnoman
I like it
-
datechnoman
The extra backfeeds will hopefully allow for better control of incoming items
-
arkiver
-
h2ibot
arkiver: Invalid command message.
-
arkiver
-
h2ibot
arkiver: Deduplicated and queued 0 items.
-
arkiver
-
h2ibot
arkiver: Invalid command message.
-
arkiver
-
arkiver
-
h2ibot
arkiver: Deduplicating and queuing 3536804 items.
-
arkiver
-
arkiver
-
h2ibot
arkiver: Deduplicating and queuing 125 items.
-
h2ibot
arkiver: Deduplicated and queued 0 items.
-
arkiver
-
h2ibot
arkiver: Deduplicating and queuing 125 items.
-
h2ibot
arkiver: Deduplicated and queued 125 items.
-
arkiver
-
h2ibot
arkiver: Deduplicating and queuing 3536804 items.
-
arkiver
-
Ryz
arkiver, thoughts on the Google Docs discussion above? oo;
-
h2ibot
arkiver: Deduplicating and queuing 3536804 items.
-
arkiver
-
arkiver
Ryz: first fixing this
-
h2ibot
arkiver: Deduplicating and queuing 3536804 items.
-
h2ibot
arkiver: Invalid command message.
-
arkiver
-
h2ibot
arkiver: Deduplicating and queuing 3536804 items.
-
Ryz
Hmm oo;
-
arkiver
-
h2ibot
-
h2ibot
arkiver: Deduplicating and queuing 3536803 items.
-
arkiver
that is better
-
h2ibot
arkiver: Deduplicated and queued 3536803 items.
-
arkiver
JAA: systwi: ^
-
JAA
:-)
-
arkiver
there was a huge URL in there, we'll now skip those and give data about what was skipped
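An illustrative sketch of the skip behaviour described here, assuming a simple length cutoff (the constant and function names are hypothetical; the real backfeed limit may differ):

```python
# Hypothetical sketch: drop URLs over an assumed length limit and
# report what was skipped, as described in the log above.
MAX_URL_LENGTH = 2000  # assumed limit, not the actual backfeed value


def filter_urls(urls):
    """Split URLs into (kept, skipped) based on length."""
    kept, skipped = [], []
    for url in urls:
        (kept if len(url) <= MAX_URL_LENGTH else skipped).append(url)
    return kept, skipped


kept, skipped = filter_urls([
    "https://example.com/",
    "https://example.com/" + "a" * 5000,  # a "huge" URL
])
print(len(kept), len(skipped))  # 1 1
```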
-
JAA
Whew, nice long URL, hehe.
-
TheTechRobo
-
JAA
-
h2ibot
JAA: Deduplicating and queuing 13608 items.
-
JAA
Well...? :-)
-
TheTechRobo
Is it supposed to have a "finished" message?
-
JAA
Yeah, see above.
-
JAA
arkiver: ?
-
TheTechRobo
Wasn't sure if that finished message was for testing or not.
-
systwi
Thanks arkiver! :D
-
systwi
So, when queueing jobs here, are they specifically run on Warriors as opposed to ArchiveBot, where jobs are run on pipelines?
-
systwi
!help
-
h2ibot
systwi: The following commands are available:
-
h2ibot
systwi: !help: Print this help message.
-
h2ibot
systwi: !a: Deduplicate and archive a list of URLs hosted on transfer.archivete.am. CAREFUL, DDOS.
-
systwi
I'm guessing that to be the case, judging from the last statement.
-
Jake
Yes, it runs on the URLs project. h2i just provides an interface to more easily queue into the project.
-
systwi
Ahh, gotcha, thanks.
-
systwi
Is there a dashboard to view the progress of ongoing jobs?
-
Jake
-
Jake
:-)
-
Jake
Lists queued through the bots aren't treated specially, so there's no real good way to track the progress of a specific list as far as I'm aware.
-
systwi
Thanks for the info, that makes sense.
-
arkiver
JAA: whoops, invalid character, let me encode those
-
systwi
-
systwi
I probably would need voice/op for that (just checking).
-
arkiver
-
h2ibot
arkiver: Deduplicating and queuing 13608 items.
-
h2ibot
arkiver: Deduplicated and queued 13608 items.
-
arkiver
JAA: TheTechRobo: ^
-
arkiver
we're percent encoding characters not supported by backfeed now
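A minimal sketch of this kind of percent-encoding, using the standard library's `urllib.parse.quote`. The `safe` set below is an assumption; the actual characters the backfeed accepts are not stated in the log:

```python
from urllib.parse import quote

# Assumed set of characters left as-is: standard URL delimiters and
# unreserved characters, plus "%" so already-encoded input isn't
# double-encoded here.
SAFE = ":/?#[]@!$&'()*+,;=-._~%"


def encode_for_backfeed(url):
    """Percent-encode characters presumed unsupported by the backfeed."""
    return quote(url, safe=SAFE)


print(encode_for_backfeed("https://example.com/path with spaces/ümlaut"))
# https://example.com/path%20with%20spaces/%C3%BCmlaut
```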
-
arkiver
-
h2ibot
arkiver: Deduplicating and queuing 2 items.
-
h2ibot
arkiver: Deduplicated and queued 2 items.
-
arkiver
systwi: ^
-
arkiver
and yeah admin is needed
-
arkiver
-
h2ibot
arkiver: Deduplicating and queuing 2 items.
-
h2ibot
arkiver: Deduplicated and queued 2 items.
-
datechnoman
nom nom nom
-
datechnoman
Amazing how fast we process URLs lol
-
arkiver
hah :)
-
arkiver
and yeah!
-
datechnoman
Am I correct in saying we are only going max-depth 1 (one hop) for links found in a URL page we process?
-
arkiver
yes
-
arkiver
or
-
arkiver
two actually
-
arkiver
so we hop 2 hops, but the first hop is only followed if it was not found before
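The rule described here can be sketched roughly as follows (all names are hypothetical; `get_links` stands in for whatever extracts outlinks from a fetched page). First-hop links are followed only if never seen before; second-hop links are recorded but not followed further:

```python
# Sketch of the two-hop rule: follow a first-hop link only if it is new,
# and record (but never follow) the links found on that page.
def process(seed_urls, get_links, seen):
    for url in seed_urls:
        for hop1 in get_links(url):       # hop 1: links on the queued page
            if hop1 in seen:
                continue                  # already known: do not follow
            seen.add(hop1)
            for hop2 in get_links(hop1):  # hop 2: recorded, never followed
                seen.add(hop2)


links = {"a": ["b", "c"], "b": ["d"], "c": [], "d": ["e"]}
seen = {"c"}                              # "c" was found before
process(["a"], lambda u: links.get(u, []), seen)
print(sorted(seen))  # ['b', 'c', 'd']  ("e" is never reached: "d" is hop 2)
```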
-
datechnoman
Ahhh yup gotcha! Thanks. We technically aren't crawling then