-
nicolas17
2023-09-09 01:52:51 (1.36 MB/s) - Read error at byte 290689826387/484259014419 (Error decoding the received TLS packet.). Retrying.
-
nicolas17
using wget instead of curl just saved me
-
nicolas17
it resumed from where it failed
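
For illustration, here is a minimal sketch of what resuming from the failed byte looks like at the HTTP level. It assumes the server honours `Range` requests. The function names are hypothetical, and this is not a description of how wget itself is implemented.

```python
import http.client
import os
import urllib.error
import urllib.request


def resume_request(url, dest):
    """Build a request that asks only for the bytes not yet saved in dest."""
    offset = os.path.getsize(dest) if os.path.exists(dest) else 0
    req = urllib.request.Request(url)
    if offset:
        # Ask the server to start at the first byte we do not have yet.
        req.add_header("Range", f"bytes={offset}-")
    return req


def download_with_resume(url, dest, retries=20, chunk_size=1 << 20):
    """Retry after read errors, continuing from the current size of dest."""
    for _ in range(retries):
        try:
            with urllib.request.urlopen(resume_request(url, dest)) as resp:
                # 206 means the range was honoured; anything else is a full
                # response, so the partial file has to be overwritten.
                mode = "ab" if resp.status == 206 else "wb"
                with open(dest, mode) as out:
                    while block := resp.read(chunk_size):
                        out.write(block)
            return
        except urllib.error.HTTPError as err:
            if err.code == 416:
                # Assumption: an unsatisfiable range means dest is complete.
                return
            raise
        except (OSError, http.client.HTTPException):
            # TLS or connection errors: loop and resume from the new offset.
            continue
    raise RuntimeError(f"giving up on {url} after {retries} attempts")
```

Bytes already written stay on disk when a read fails, so each retry only requests the remainder.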
-
nicolas17
2023-09-09 01:52:39 (940 KB/s) - Read error at byte 38657556051/377308233499 (Error decoding the received TLS packet.). Retrying.
-
nicolas17
interesting that both downloads failed at similar times
-
TheTechRobo
interesting, I've only ever seen TLS errors from wayback machine
-
nicolas17
yeah this was from archive.org items, not wbm
-
pabs
does anyone know if AB/IA/WBM/something can save magnet links? this band publishes raw video files, audio stem files and fan-made bootleg videos via magnet: links
kinggizzardandthelizardwizard.com/automation kinggizzardandthelizardwizard.com/bootlegger
-
pabs
not in danger (except if seeder counts go down), but stem files are so rarely published that it would be good to have them on IA
-
pabs
-
nicolas17
pabs: I was going to say no but it seems actually yes
-
nicolas17
"If a valid .torrent file is uploaded (e.g. through our Uploader) into an item, when that item is derived, we will instantiate a BitTorrent client (Transmission) and attempt to retrieve the Torrent. If the Torrent is successfully retrieved, its contents will be added to the item. ‘Valid’ in this case means, well-formed and seeded."
-
nicolas17
"Bonus feature: if you have only a magnet link, and not a Torrent file, you can create a dummy .torrent file by pasting that magnet link into a text file and naming it foo.torrent."
-
nicolas17
not sure if you can do multiple .torrent files per item this way
-
pabs
nice, so just write to a file and ia upload
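
A rough sketch of the "write to a file" step, going only by the quoted description above. The helper name, the placeholder info hash, and the choice to write the link with no trailing newline are all assumptions, not documented IA requirements.

```python
from pathlib import Path


def write_dummy_torrent(magnet, path):
    """Save a magnet link as the sole content of a .torrent-named text file."""
    magnet = magnet.strip()
    if not magnet.startswith("magnet:?"):
        raise ValueError("expected a magnet: link")
    out = Path(path)
    if out.suffix != ".torrent":
        raise ValueError("file name should end in .torrent")
    # Assumption: the file holds just the link, with no trailing newline.
    out.write_text(magnet, encoding="utf-8")
    return out


if __name__ == "__main__":
    # Placeholder info hash for illustration only, not a real torrent.
    link = "magnet:?xt=urn:btih:0123456789abcdef0123456789abcdef01234567"
    print(write_dummy_torrent(link, "foo.torrent"))
    # The upload itself would then be something like:
    #   ia upload <identifier> foo.torrent
    # where <identifier> is whatever item name you choose.
```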
-
nicolas17
worth noting this will not seed the *original* torrents
-
» pabs not sure about doing this with the current IA upload bandwidth issues hmm
-
nicolas17
the end result is equivalent to you downloading the torrent and then uploading the files
-
pabs
hmm, guess I should check the torrents are fully seeded before doing this...
-
nicolas17
it saves *you* a ton of bandwidth
-
pabs
yeah
-
pabs
where did you read that btw?
-
nicolas17
-
nicolas17
the first half of this page is all about IA-generated torrents for items
-
nicolas17
later it talks about using torrents to upload, which is an entirely different feature
-
pabs
thanks, adding some TODO notes
-
» pabs goes to AB atariage stuff
-
thuban
i had no idea about torrent uploads, that's awesome
-
TheTechRobo
-
TheTechRobo
Used to use it a lot before I got fibre
-
manu|m
hi, so there is this publisher that recently posted about possibly being insolvent, depending on how many donations they receive in the next two weeks. they have a magazine, a shop, etc.: 6 domains in total (that i'm aware of). would you be willing to run that through archivebot? (reposting here so it doesn't get lost in #archivebot)
-
manu|m
-
thuban
manu|m: thanks for the report! that looks archivebot-able. can you list the relevant domains?
-
manu|m
-
manu|m
katapult-magazin.de has a bunch of subdomains, but they're all either an internal service with a login in front or not reachable, and two were one-pagers that I sent to IA via browser extension
-
manu|m
I don't think katapult.link needs to process external links, looks like they're only linking to their own stuff on other domains
-
thuban
having looked at the link structure between those domains, i think it'll be best to run a separate archivebot job for each
-
thuban
(possibly with --no-offsite-links on the katapult.link job to avoid duplication)
-
thuban
someone with bot privileges should queue them up shortly, i believe
-
manu|m
thanks, i appreciate it :)
-
h2ibot
That lurker edited GitHub (-44, ghtorrent.org domain is used for casino…):
wiki.archiveteam.org/?diff=50737&oldid=50426
-
arkiver
wondering - how are we on resources for #archivebot nowadays? seems like all jobs are starting immediately. or is there something else that shows we would need more resources?
-
pokechu22
Right now we seem to be OK - it looks like all of the stuff I queued just now has filled all pipelines (I know because !status says 205 in progress, and around 200 is the limit - the exact number varies because some pipelines only accept specific jobs though), but that stuff should finish shortly. It was struggling a bit when there was a ton of gabon stuff going on but I think it's OK at the moment
-
Barto
as my operating systems teacher used to say, disks are more often full than empty, and are meant to be filled up :-) I think we're okay too. If I want to put in a lot of subdomains they will definitely be queued, and if socialbot comes back for twitter, the same will apply.
-
pokechu22
(though also it's worth noting that the 200 in progress is misleading as that includes stuff on the cybercontrol pipelines, which have been offline for over a year (see
archivebot.com/pipelines))
-
Barto
certainly, also how about those jobs there?
-
pokechu22
I'm not entirely sure what their status is and if they could be resumed if/when the pipelines return
-
icedice
-
eggdrop
-
icedice
"Come end of Oct, Tokyo Lab, one of the largest holders of film material in Japan, will be discarding all film left unclaimed by rights holders. It's a foregone conclusion that anything stuck in "licensing hell" is done-for. This is bad."
-
icedice
-
eggdrop
-
icedice
"> @nappasan @NFAJ_PR The negatives will be looked into by Councillor Ken Akamatsu. We have not received a commitment to resolve the issue, so please keep an eye on the progress. We would also like to continue to ask for effective approaches from all parties."
-
Sanqui
arkiver: re: archivebot, we have gotten pretty good at estimating the limit and backing off when we hit it, but that doesn't mean we couldn't use more resources, especially with a more intelligent queuing system, or a proper ability to suspend jobs
-
nicolas17
arkiver: note that there may be some low-priority stuff that people *aren't* adding to archivebot because they know of the issues uploading to IA
-
arkiver
Sanqui: pokechu22: thank you! i do not have anything to offer right now, but wanted to know what the current situation is
-
arkiver
nicolas17: i think it's fine for people to queue stuff anyway, we can always start using the offload space if really needed
-
arkiver
also JAA ^
-
nicolas17
I mean where is archivebot data going now?
-
nicolas17
are we uploading to IA, maybe at reduced speed?
-
pokechu22
I guess I should also mention that there have been issues where one of the pipelines (I can't remember if it was hel3 or hel4) had issues uploading to IA, while the other one of the two was fine (and it also was able to transfer data to the other pipeline and upload it quickly). I'm not actually sure who's responsible for those pipelines (probably AK because "ak-was-here-hel4") but it's something weird that's happening
-
AK
(can't remember which one it is either). Iirc JA_A did some digging and couldn't see an obvious reason. It also didn't seem to happen all the time
-
AK
(I pay the bill, JAA does all the managing of the AB stuff on the top because I don't know how and they're super nice <3)
-
arkiver
it's being uploaded
-
arkiver
i believe not at reduced speeds
-
nicolas17
ah
-
arkiver
it's going in, and it's anyway a small portion of what we normally put through to IA so not a huge problem
-
nicolas17
I'm more familiar with how warrior stuff works and largely blind to archivebot :)
-
pokechu22
It's worth noting that the issue with hel3 or hel4 is a longstanding one that wasn't caused by the more recent IA issues; it's just mysterious
-
arkiver
i bet rewby would have an idea about that, rewby is the expert when it comes to pushing data to IA
-
h2ibot
Myusernameisanything edited ISP Hosting (+3, I archived the scraped URLS for Claranet…):
wiki.archiveteam.org/?diff=50738&oldid=50486
-
h2ibot
Myusernameisanything edited Claranet Netherlands Personal Web Pages (+3, I archived the scraped URLS and now they are in…):
wiki.archiveteam.org/?diff=50739&oldid=47495
-
h2ibot
That lurker edited Valhalla (+483, Cerabytes product is like ment for archiving so…):
wiki.archiveteam.org/?diff=50740&oldid=48930
-
JAA
arkiver: The hel4 upload issues are totally unrelated to anything at IA etc. That machine just has irregular transient issues towards the rsync target for some unknown reason. Nothing weird shows up in mtr etc. when it happens, but rsync transfers run at a couple hundred kB/s. Then it's fine again a few hours later. I asked a few people and nobody had any ideas what I could try to diagnose the issue either. When I'm around, I route the rsync traffic through another machine at Hetzner Helsinki, and everything works perfectly fine there.
-
arkiver
could it be a hardware issue?
-
JAA
On cybercontrol, I mentioned this in #archivebot recently, but it obviously got buried, so repeating for visibility: those pipelines might be a loss since the relevant person has been MIA for a long time now. If nothing happens by the end of the month, they'll be thrown out of the system.
-
JAA
Cc pokechu22 ^
-
arkiver
yes anything mentioned in #archivebot will probably be missed by a ton of people, please post it here too or we should have a special AB discussion channel
-
project10
#archivebot-bot
-
Barto
damn, never fun to have someone go MIA :(
-
JAA
I mark important things with [PSA], and I advise people to set up highlights for that. A separate discussion channel has been brought up several times, there even is one, but it's completely unused, and it could kind of lead to a split brain.
-
arkiver
which one?
-
arkiver
let's start using it
-
JAA
#archivebot-bs
-
Barto
for the interested: weechat.look.highlight_regex "^\[PSA\]"
-
Barto
that's what i use
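
To see what that pattern does and does not catch, here is a small check in Python. Two assumptions apply: Python's `re` is a different engine from the one WeeChat uses, and `re.IGNORECASE` is added on the understanding that WeeChat's highlight regex is case-insensitive by default. The helper name `is_psa` is made up for illustration.

```python
import re

# Same pattern as in the WeeChat setting above; ^ anchors it to the
# start of the message, so a "[PSA]" later in a line does not match.
PSA_PATTERN = re.compile(r"^\[PSA\]", re.IGNORECASE)


def is_psa(message):
    """Return True if the message text begins with the [PSA] tag."""
    return PSA_PATTERN.search(message) is not None


if __name__ == "__main__":
    for line in (
        "[PSA] example announcement",
        "see the [PSA] from earlier",
    ):
        print(is_psa(line), line)
```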
-
fireonlive
i could do that with TL; though it works everywhere
-
fireonlive
add that to your reasons to hate TL JAA :p
-
thuban
is #archivebot-bs official, then?
-
arkiver
no
-
JAA
There's just been discussion about that in there. :-P
-
thuban
so are we going to start using it or no ?_?
-
JAA
My point of view is: discussions in #archivebot are mostly about immediate action, e.g. ignores or starting jobs, and dev stuff can already happen in #archiveteam-dev. PSAs like the above are rare enough that I'm not sure a separate channel makes sense, and I can repeat them here in the future for visibility.
-
JAA
I'll also note that the above was merely a 'this might happen' comment, not a real PSA. If it comes to that, I'll of course explicitly ping everyone with affected jobs, too.
-
JAA
More important PSAs are also usually added to the channel topic.
-
JAA
(Like the 'don't touch the 600001 jobs' one)
-
arkiver
i hope long term discussions/planning can be outside of #archivebot
-
arkiver
talking about the cybercontrol stuff for example
-
JAA
Well, not much to plan there, sadly.
-
arkiver
very job-specific stuff can be in #archivebot i think?
-
thuban
agreed, but i think JAA is correct that it can happen here or in -dev (as appropriate)
-
arkiver
JAA: so archiveteam-bs/dev instead of archivebot-bs ?
-
JAA
I think so, at least. If others disagree, I won't oppose a separate channel.
-
arkiver
we could decide tomorrow
-
that_lurker
JAA: Are PSAs posted on the #archiveteam channel?
-
JAA
that_lurker: If it's something sufficiently important, I can do that, yeah.
-
h2ibot
JustAnotherArchivist edited Deathwatch (+193, /* 2023 */ Add Squat the Planet):
wiki.archiveteam.org/?diff=50741&oldid=50713
-
that_lurker
Yeah, then I would say the current channels and practices cover the need for a dedicated channel. But I just make the occasional meme, so the decision is up to someone else :P
-
fireonlive
archivebot meme, go!
-
fireonlive
:D
-
that_lurker
-
fireonlive
x3
-
JAA
Alternatively: right label 'ArchiveBot', left label 'Buttflare'
-
that_lurker
-
JAA
:-)