-
nicolas17
88755 URLs, I estimate 1.7GB total (but with such small files I'm sure header overhead will take over)
-
nicolas17
this is the way they are linked from the API and website, but they all redirect to another subdomain
-
h2ibot
Nicolas17v2 edited ARGENTeaM (+20, /* Archival */ webpages now being archived):
wiki.archiveteam.org/?diff=51263&oldid=51258
-
flashfire42
power out at IA one minute ago
-
nicolas17
yikes
-
» nicolas17 wonders if we should pause things
-
flashfire42
at least pause queuebot probably cc JAA
-
JAA
Why?
-
flashfire42
IA outage?
-
JAA
So?
-
flashfire42
/shrug
-
nicolas17
oh dictionary API is down, guess things will stall even if we don't fill targets...
-
JAA
Means the rsync target will accumulate data for now, and then the pipelines will accumulate data, and *then* AB will be in trouble.
-
nicolas17
do archivebot pipelines still upload to "fos"? (that's what the wiki said but it may be outdated)
-
JAA
No, not for years.
-
JAA
And even if they did, they wouldn't crash immediately.
-
JAA
We have 1-2 days worth of storage between the pipelines.
-
nicolas17
oh dat's a lot
-
JAA
AB isn't producing a lot of data in the grand scheme of things.
-
JAA
It's only 2 TB/d or so.
-
JAA
One of my uploads still hasn't noticed that IA is dead.
-
JAA
But it seems to be returning now, slowly.
-
fireonlive
power pls
-
JAA
Also, no, the dict API isn't immediately a problem.
-
fireonlive
✨caching✨
-
JAA
:-)
-
fireonlive
=]
-
thuban
well, i broke my grab-site :/
-
fireonlive
oh no
-
thuban
installation via pyenv+pip doesn't work because fb-re2 sets `extra_compile_args=['-std=c++11']` but current re2 actually requires c++14
-
thuban
can't work around with CFLAGS because setuptools puts extra_compile_args at the end, which takes precedence
-
thuban
is there a particular reason we're still using pypi.org/project/fb-re2 instead of pypi.org/project/google-re2/?
-
nicolas17
in the blogger project we got a sudden spike in completed items and then it died again huh
-
nicolas17
I wonder what's the problem with network cost in korea
-
nicolas17
ingress or egress?
-
nulldata
08:29 PM <nulldata> Twitch is shutting down in Korea on February 27, 2024 KST --
blog.twitch.tv/en/2023/12/05/an-update-on-twitch-in-korea
-
nicolas17
will they block streamers in korea or viewers in korea or both?
-
nulldata
For those not watching main channel
-
flashfire42
Oh dear. #burnthetwitch will be getting a workout
-
nulldata
I wonder what the bandwidth costs are for those using an AWS Korea POP compared to other regions?
-
nicolas17
"though SK has laws that makes the content provider pay the bandwidth cost iirc"
-
nicolas17
oh
-
nicolas17
nulldata: "Hong Kong, Indonesia, Philippines, Singapore, South Korea, Taiwan, Thailand, Malaysia, and Vietnam" is the most expensive category in AWS CloudFront
-
missaustraliana
hey! fullpwnmedia here. long time no see. It has come to my attention that an australian breakfast tv network will cease in 20ish days. their network is known to delete everything that is tied to a show if it no longer exists. the show that is ending is Studio 10 and is run by the 10 Network.
-
flashfire42
Wait studio 10 is being axed?
-
missaustraliana
not sure if you will be able to access this link but this is their page
-
missaustraliana
yeah on december 22
-
missaustraliana
it will cease to exist
-
flashfire42
Just threw the youtube channel into down the tube
-
flashfire42
10play will be behind geoblock tho I spose if you wanted to incur the wrath of IA you could use tubeup
-
missaustraliana
I need all hands on deck to get this all archived. Especially the YouTube as it contains full length episodes
-
flashfire42
Youtube will be done via downthetube missaustraliana as soon as the bot decides to wake up. Instagram you will have to do manually. Twitter is a pain in the ass
-
missaustraliana
I'm not so worried about 10play episodes as they also need an account. But I would like for the page to be at least archived
-
missaustraliana
Thanks flashfire42
-
missaustraliana
I'm manually archiving each show until cut off through OTA (DTV)
-
missaustraliana
I have today's show getting uploaded to my IA account
-
flashfire42
missaustraliana how are you uploading that? Tubeup?
-
missaustraliana
What do you mean? I recorded today's show with my set-top box and am now uploading it to IA
-
flashfire42
AH now I get you
-
missaustraliana
now having struggles uploading as IA is throwing a fit and spitting out "there was a network problem"
-
flashfire42
oh yeah they have a power outage
-
missaustraliana
It could be that OR i just realised that im throwing it the files that are not stored locally but on my intranet
-
missaustraliana
Okay yeah nah its the power outage
-
missaustraliana
flashfire42 is there another way i can upload or do i just have to wait
-
flashfire42
Not to IA
-
missaustraliana
Shiz..
-
cas
TheTechRobo about this video
web.archive.org/web/20220103091055/…www.youtube.com/watch?v=OQktVBtbygI So does that mean I can't extract the video in wayback? Are there any other ways I can extract the video's content?
-
arkiver
(me and Sanqui are talking further over PM)
-
h2ibot
Tech234a edited Deathwatch (+121, /* 2023 */ Add Discord attachment links date):
wiki.archiveteam.org/?diff=51265&oldid=51261
-
nicolas17
cas: it's likely the video wasn't archived
-
cas
rip, that would be unfortunate to hear
-
h2ibot
Nicolas17v2 edited ARGENTeaM (+46, Add archivebot job ID for subtitles):
wiki.archiveteam.org/?diff=51266&oldid=51263
-
cas
It's a tall order, but I can't help but wonder if somewhere like common crawl might have it archived
-
JAA
I don't think Common Crawl is doing anything with YouTube videos. And they have long needed special treatment to archive.
-
nicolas17
well generic crawlers won't get youtube videos for sure
-
arkiver
i don't think common crawl is doing video at all?
-
nicolas17
you need complex purpose-specific code to get the URL of the actual video file (like what youtube-dl does)
-
h2ibot
Nulldata edited Deathwatch (+434, Added 'In The Know' and Twitch Korea.):
wiki.archiveteam.org/?diff=51267&oldid=51265
-
h2ibot
Inti83 edited Argentina (+282, /* List of At-Risk Websites */ - updating with…):
wiki.archiveteam.org/?diff=51268&oldid=51252
-
nicolas17
arkiver: even if it did, it would only work with like plain <video src="foo.mp4">, not youtube's rube goldberg.js machine
-
h2ibot
House edited Talk:Main Page (+406, /* Instructions please */ new section):
wiki.archiveteam.org/?diff=51269&oldid=50704
-
nicolas17
omfg
-
nicolas17
JAA: check that edit and tell me if it sounds familiar
-
JAA
nicolas17: I just approved it, so yes.
-
arkiver
nicolas17: yeah
-
cas
saddening, guess that means I can't get the video then
-
missaustraliana
can someone add Studio 10 December 22 AEDT
-
cas
thanks for answering my questions folks
-
JAA
missaustraliana: Anyone can edit the wiki. :-)
-
missaustraliana
Without or with an account? JAA
-
JAA
missaustraliana: With
-
missaustraliana
mk
-
nicolas17
missaustraliana: I can add it for you but I won't write the site description for you, give me something I can blindly copypaste :P
-
nulldata
If you sign up right now, the account is free!
-
» fireonlive 👀 Talk:
-
arkiver
let's see what shutdowns we have coming up
-
JAA
We should probably have a warning that nobody really pays attention to Talk:*.
-
missaustraliana
I'm signing up now as I'll probably be making changes in the future
-
arkiver
JAA: yeah :P
-
arkiver
missaustraliana: sounds good :)
-
arkiver
welcome to the wiki
-
missaustraliana
tsym
-
fireonlive
wikipals!
-
missaustraliana
okay but what in the world is that captcha
-
missaustraliana
convert to ascii?
-
nicolas17
yeah
-
missaustraliana
incorrect captcha..
-
nicolas17
the captcha used to be "ask for the secret word in #archiveteam-bs"
-
nicolas17
guess they changed it to something more automated :P
-
nulldata
hunter2
-
missaustraliana
can someone convert this to ascii? the page doesnt seem to like what i input
-
missaustraliana
4D6F757365
-
nicolas17
maybe you're a robot >.>
-
fireonlive
Mouse
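
The captcha string above is hex-encoded ASCII. A minimal Python sketch of the conversion, assuming the input is plain hex digit pairs with no separators (`hex_to_ascii` is a hypothetical helper name, not part of the wiki):

```python
def hex_to_ascii(hex_string: str) -> str:
    """Decode a string of hex digit pairs into ASCII text."""
    return bytes.fromhex(hex_string).decode("ascii")


print(hex_to_ascii("4D6F757365"))  # Mouse
```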
-
missaustraliana
oh..
-
nulldata
How do we know you're not a bot?! :P
-
fireonlive
oh no!
-
missaustraliana
Jokes on you lot i am a robot8-|
-
nicolas17
oh no we just helped chatgpt sign up on the wiki
-
missaustraliana
i will now fill the whole wiki with false information<3
-
nulldata
fireonlive what have you done?!
-
fireonlive
oh nooooo
-
arkiver
looks like 3dtotal is done with archivebot
-
arkiver
will post on discord in #discard
-
missaustraliana
wait theres a discord too?
-
missaustraliana
i thought there was only irc
-
nulldata
Only IRC
-
missaustraliana
oh:'(
-
missaustraliana
my robot ass rlly thought
-
nulldata
arkive.r is referring to the Discord archive project
-
missaustraliana
ohh
-
h2ibot
FireonLive edited Talk:Main Page (+165, beware the talk pages):
wiki.archiveteam.org/?diff=51270&oldid=51269
-
arkiver
larm.fm is pretty huge
-
arkiver
JAA: do you know if we did anything for larm.fm?
-
flashfire42
So what you are saying we better start now?
-
JAA
arkiver: Nothing yet I believe.
-
arkiver
i wonder if they have dumps - a dump would be appropriate for this kind of data
-
missaustraliana
do we have any updates on studio 10? someone said that they chucked their youtube into down the tube
-
flashfire42
I tried. When the bot is back it will go through
-
arkiver
apo.org.au seems to be saved with AB
-
missaustraliana
ah
-
JAA
arkiver: Negative, APO banned all attempts.
-
arkiver
ah
-
arkiver
fun
-
arkiver
i see sequential identifiers
-
nicolas17
missaustraliana: the bot that handles the "add to queue" commands is also hosted at archive.org, and it hasn't recovered after the power outage yet
-
arkiver
nicolas17: i'm working on it
-
nicolas17
arkiver: oh I know :) just giving status info
-
arkiver
JAA: does look like they have nice sequential identifiers everywhere
-
missaustraliana
oh shit there was an actual outage at IA? flash did mention that but i thought that was just speculation. i didnt get any sort of notification on mastodon that it was down.
-
JAA
arkiver: Yeah, regular Drupal.
-
arkiver
missaustraliana: it's not very rare unfortunately
-
arkiver
:P IA mastodong *was down too*
-
nicolas17
x_x
-
JAA
> mastodong
-
arkiver
mastodon*
-
JAA
:-)
-
JAA
(preemptive bonk)
-
missaustraliana
judging by the comments it seems like they didnt have enough funds for power
-
arkiver
hahaha looks like i made up a good one :P
-
JAA
missaustraliana: Rule 1 about Twitter: do not believe anything in comments.
-
arkiver
missaustraliana: which comments?
-
JAA
99% of it is rubbish.
-
arkiver
oh
-
missaustraliana
real. bring back the old twitter
-
arkiver
no IA pays their power bill
-
missaustraliana
old twitter was the best
-
JAA
Meh
-
arkiver
but IA does make maximum use of their hardware, which could leave room for an extra failure every now and then compared to other services
-
JAA
It wasn't the hellhole it is now at least, I guess.
-
missaustraliana
when there was no blue checkmark, no priority in comments, no rate limits
-
nicolas17
missaustraliana: that's just textfiles telling the whining people to donate
-
missaustraliana
is textfiles a part of ia
-
fireonlive
he's a staffer there yeah
-
missaustraliana
ah
-
nicolas17
telegram and blogger projects are starting to upload now
-
missaustraliana
im waiting for any updates before continuing to upload as i dont want to flood the server
-
nicolas17
either our target-master-ewby added some miraculous amount of buffer space, or IA started taking uploads again
-
fireonlive
it's probably getting rammed pretty hard as it works its way back on its feet
-
missaustraliana
ill try once but if it fails im waiting
-
missaustraliana
i can tell you now the upload is a lot slower
-
missaustraliana
which could indicate that it might be working again
-
fireonlive
that would be a fun rule of tumb
-
fireonlive
thumb
-
missaustraliana
on my end uploads are working
-
fireonlive
nice
-
Vokun
Just looking at the 20k+ items in the item deriver queue is haunting
-
arkiver
the bot to queue to projects is back!
-
Vokun
🎉
-
JAA
Vokun: Those built up earlier in the day though.
-
JAA
Starting around 10 hours ago.
-
Vokun
Oh
-
missaustraliana
ohh so it was restored from a backup
-
JAA
Uh, what?
-
missaustraliana
dw i might just be dumb
-
nicolas17
(since I keep forgetting + docs universally suck) does archivebot do body deduplication across different URLs?
-
JAA
No
-
nicolas17
so if I give archivebot a list with URLs A and B, and A redirects to B, that will unfortunately request B twice (issue 431)... but it will also *store* the body of B twice? that's annoying :/
-
nicolas17
okay that's the same URL but
-
JAA
Yes
-
fireonlive
won't the .gz help out a bit with that
-
JAA
No, each record is compressed separately.
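
A rough illustration of why per-record compression does not help with duplicate bodies. This is only a sketch using Python's `gzip` module on made-up data, not ArchiveBot's actual WARC writer; `make_body` and the sizes are hypothetical, and real records also carry headers that differ between the two copies.

```python
import gzip
import random


def make_body(size: int = 8000, seed: int = 0) -> bytes:
    """Hypothetical response body: pseudo-random bytes that gzip cannot shrink on their own."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(size))


body = make_body()

# Each record compressed on its own: the second copy cannot refer back to the first.
per_record = gzip.compress(body) + gzip.compress(body)

# Both copies in one gzip stream: the second copy can be encoded as back-references.
single_stream = gzip.compress(body + body)

print(len(per_record), len(single_stream))
```

For this input the per-record output comes out at roughly twice the size of the single stream, which is the cost of storing the body of B twice.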
-
fireonlive
ahh
-
nicolas17
argenteam added info and subtitles for like 5 more TV show episodes since I scraped everything... I thought after announcing they were shutting down, they would stop adding anything :|
-
fireonlive
you'd think...
-
h2ibot
JustAnotherArchivist edited ARGENTeaM (-10, Use Job template):
wiki.archiveteam.org/?diff=51272&oldid=51266
-
nicolas17
oh neat
-
fireonlive
<@arkiver> :P IA mastodong *was down too* < something on your mind, arkiver? :3
-
JAA
<@JAA> (preemptive bonk)
-
fireonlive
:D
-
h2ibot
JustAnotherArchivist edited Argentina (+83, Fix URLs):
wiki.archiveteam.org/?diff=51273&oldid=51268
-
h2ibot
JustAnotherArchivist edited Argentina (+49, More URL fixes):
wiki.archiveteam.org/?diff=51274&oldid=51273
-
JAA
Everything from [[Argentina]] has been thrown into AB now (except for the two that are geofenced).
-
JAA
Nearly everything in the 'Gender and Equality' section is JS hell, so no idea how well that will work.
-
h2ibot
Switchnode edited Argentina (+2069, /* List of At-Risk Websites */ add archivebot +…):
wiki.archiveteam.org/?diff=51275&oldid=51274
-
thuban
!tell Inti83 all of the sites on the argentina wiki page have been submitted to archivebot; you can monitor running jobs at archivebot.com and retrieve finished ones from archive.fart.website/archivebot/viewer
-
eggdrop
[tell] ok, I'll tell Inti83 when they join next
-
thuban
!tell Inti83 note that a job succeeding does not necessarily mean the site was adequately captured (if eg there is heavy use of javascript)
-
eggdrop
[tell] ok, I'll tell Inti83 when they join next
-
h2ibot
Switchnode edited Argentina (+36, /* Memory and Human Rights */ add archivebot…):
wiki.archiveteam.org/?diff=51276&oldid=51275
-
fireonlive
unfortunately the account is protected
twitter.com/w3c
-
JAA
It was accessible 13 hours ago. :-|
-
fireonlive
:\
-
JAA
> have directed all our followers here to Mastodon
-
JAA
Well, except nobody can read the tweets anymore...
-
fireonlive
indeed, no one will see the 'please follow us over here' unless they specifically visit the profile which.. they wont lol
-
JAA
I've been informed that their existing followers should still be able to see the tweets.
-
fireonlive
oh! right
-
fireonlive
i'm a bit OOTL on twitter it seems
-
fireonlive
sadly... i don't seem to follow them
-
h2ibot
Tech234a edited Chrome Web Store (+470, Update manifest v2 timeline, app deprecation,…):
wiki.archiveteam.org/?diff=51277&oldid=49044
-
JAA
intheknow.com is being annoying with a consent redirect. I'm grabbing it with grab-site from an unaffected IP.
-
that_lurker
"We'll be right back..." nice site :P
-
JAA
Yeah, I got that one as well with curl.
-
JAA
< HTTP/1.1 400 Direct self loop detected