-
pabs
arkiver: there are definitely times when AB is at or above the job limit + 5 pending, especially when flashfire42 is around doing ISP stuff. I tend to avoid doing proactive stuff a fair bit, unless we are more than a few jobs below the limit and things seem quiet
-
pabs
and when/if snscrape comes back, there will be a big backlog of twitter archiving to do
-
flashfire42
Sorry bout that hahaha
-
pabs
and there will be other situations where higher peak capacity is useful; for eg the adelaide university merger is going to be tons of jobs due to the many subdomains
-
pabs
no, I think you're doing good stuff flashfire42 :)
-
pabs
anyway, I'm sure we will always be able to reach whatever the job limit is :)
-
owen
What's the easiest way to archive a mid-sized portion of a website? (ex. example.com/stuff-to-archive/*)
-
pabs
owen: archivebot
-
pabs
just pass us the URL and we will run it, then everything happens automatically after that
-
pabs
that needs a directory index or some other link based mechanism that lists all subcontents though
-
pokechu22
The alternative technique is saving the entire site :P
-
project10
AB job 8k26biu6lro5cb6vi3awnu3z8 is a chonky one
-
arkiver
#shreddit is restarted
-
arkiver
pabs: so, we're holding back now?
-
pabs
some folks are occasionally yeah
-
Ryz
Reminder, inactive user content excluding YouTube on Google may start being deleted starting on 2023 December S:
-
arkiver
when IA is all fine again with taking data, got great plans for expanding our archiving - or well especially plans for #//
-
arkiver
we'll significantly increase our coverage of 'important stuff'
-
fireonlive
awesome possum
-
pabs
Ryz: does that include public blogspot/blogger/etc stuff?
-
Ryz
pabs, yes...S:
-
Ryz
The problem with Blogger user number IDs is that it gives 429s pretty easily at least on running ArchiveBot, which is why I would want this to take off the ground as soon as possible...
-
Ryz
arkiver?
-
pabs
fuck
-
» pabs . o O 0 ( #Y )
-
» pabs has 1323 URLs in his blogspot archive TODO...
-
fireonlive
x_x
-
pabs
ISTR with blogspot it is easy to enumerate lots of blogspot starting with one blog, see what other blogs that author has, and same for all the commenters
-
» pabs checks shell history for some terrible oneliners
-
pabs
also there's tons of spammers on blogspot
-
fireonlive
yeah one of the sites i want to get archived eventually is just 99% overrun with spam (it's also js-hell-frontend-on-top-of-phpBB2) :/
-
fireonlive
sad to se
-
fireonlive
see
-
shinji257
I got a couple of tasks that keep getting stuck at "Lua runtime error: reddit.lua:286: attempt to call global 'unicode_codepoint_as_utf8' (a nil value)"? They are reddit project tasks.
-
imer
shinji257: thats known i think, #shreddit is the project channel :)
-
imer
Just waiting for a fix, should be sorted later today
-
pabs
does anyone know if AB looks at <a href> links inside HTML comments?
-
mgrandi
msn.com/en-us/news/technology/atari…omebrew-community-forum/ar-AA1grqaA, I've heard rumblings that they are going to purge boards on forums.atariage.com, dunno how easy it is to archive, it's an Invision forum board
-
pabs
there is an AB job in progress
-
pabs
and the forum has been saved before, 2021 or 2019 IIRC
-
pabs
unfortunately we had to restart the job a couple of times and slow it down a fair bit
-
mgrandi
Awesome
-
pabs
got the main website too and some other subdomains
-
Webuser693
Hey, do you have the video link, it's called
youtube.com/watch?v=fUVrK6089fs
-
shinji257
imer: acknowledged
-
h2ibot
Bzc6p edited ArchiveTeam Domains (+37, /* archiveteam.hu */ Lecsű is discontinued):
wiki.archiveteam.org/?diff=50743&oldid=50703
-
h2ibot
Bzc6p edited Deathwatch (-3, /* 2023 */ fix grammar):
wiki.archiveteam.org/?diff=50744&oldid=50741
-
h2ibot
Bzc6p edited Valhalla (+0, /* Physical Options */ typo):
wiki.archiveteam.org/?diff=50745&oldid=50740
-
fede
hello
-
fede
is this like an archiving project?
-
that_lurker
This is the team that does the projects. You can find info about current and old archiving projects in the wiki
wiki.archiveteam.org/index.php/Main_Page
-
that_lurker
On the page of every project you can also find the corresponding IRC channel.
-
fede
there's no everyplay archive right?
-
fede
thats so sad
-
fede
i lost all my videos
-
TheTechRobo
yeah, I unfortunately haven't been able to find anyone who archived it
-
JAA
pabs: AB parses the HTML and then walks the element tree. It shouldn't see anything in comments.
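
A minimal sketch of the behaviour JAA describes, using Python's standard-library `html.parser`. This is not ArchiveBot's actual code; the class `LinkCollector`, the function `extract_links`, and the `SAMPLE` markup are hypothetical names made up for illustration. It only shows the general idea: once a document is parsed, a comment is a single opaque node, so an `<a href>` written inside it never shows up as an element.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> start tags (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.comments = []

    def handle_starttag(self, tag, attrs):
        # Only real elements reach this handler.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_comment(self, data):
        # Comment bodies arrive here as plain text and are not parsed as HTML.
        self.comments.append(data)


def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical markup: one live link, one commented-out link.
SAMPLE = '<p><a href="/visible">x</a></p><!-- <a href="/hidden">y</a> -->'

if __name__ == "__main__":
    print(extract_links(SAMPLE))
```

A regex-based extractor scanning the raw bytes would behave differently and could pick up the commented-out link; whether any given crawler does that depends on its implementation.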
-
arkiver
thuban: is there any update on the orange sites coming back?
-
h2ibot
Myusernameisanything edited University Web Hosting (-7, Changing not saved yet tag to lost.):
wiki.archiveteam.org/?diff=50748&oldid=47676
-
h2ibot
Myusernameisanything edited List of websites excluded from the Wayback Machine (+57, Added 2 links):
wiki.archiveteam.org/?diff=50749&oldid=50702
-
h2ibot
Myusernameisanything edited BluWiki (+10, If there are about 20 dumps, it is partially…):
wiki.archiveteam.org/?diff=50750&oldid=27576
-
h2ibot
Gridkr edited List of websites excluded from the Wayback Machine (+20, Add nexo.com/):
wiki.archiveteam.org/?diff=50751&oldid=50749
-
h2ibot
JAABot edited List of websites excluded from the Wayback Machine (+0):
wiki.archiveteam.org/?diff=50754&oldid=50751
-
h2ibot
JustAnotherArchivist edited SoundCloud (+236, Datetimeify, add 2019 projectn't, add…):
wiki.archiveteam.org/?diff=50755&oldid=48897
-
project10
#archivebot jobs submit discovered things into the backfeed system, yes?
-
TheTechRobo
i don’t think so, assuming you mean e.g. queuing imgur URLs in #imgone
-
JAA
project10: No, there's zero interaction between AB and DPoS projects.
-
project10
well the genesis of my question was seeing #telegrab items submitted via AB (job 1ty54jgyh2n6iv2ri6o0gbbbp)
-
JAA
That's just me archiving URLs shared in AT channels so our logs aren't full of dead links in the future.
-
project10
oh :)
-
fireonlive
JAA++
-
that_lurker
we need commode points system here as well :P
-
fireonlive
JAA++
-
eggdrop
karma for 'JAA' is now 1
-
fireonlive
lol
-
nicolas17
2 files remaining and I'll finish getting the listing of all yahoo-videos .tar.bz2 files
-
nicolas17
my intention was to get *.tar.bz2 first while I wrote a more efficient script to get the .tar lists, which of course I haven't actually started yet so I'll have to continue the .tar files the slow way
-
JAA
++fireonlive
-
fireonlive
f
-
JAA
Pff, doesn't even understand pre-incrementing.
-
fireonlive
:p
-
TheTechRobo
eggdrop—
-
TheTechRobo
Oh thanks the lounge i really needed that transformation
-
JAA
The Lounge--
-
eggdrop
[karma] 'The Lounge' is now at -1
-
JAA
The Lounge--
-
eggdrop
[karma] 'The Lounge' is now at -1
-
JAA
Ah, works with a normal space, too. :-)
-
Terbium
The Lounge++
-
eggdrop
[karma] 'The Lounge' is now at 0
-
fireonlive
TheTechRobo: i do believe that was iOS
-
fireonlive
:P
-
TheTechRobo
Oh thanks apple then
-
Terbium
iPhone--\
-
Terbium
iPhone--
-
eggdrop
[karma] 'iPhone' is now at -1
-
fireonlive
!
-
TheTechRobo
Dictating how I type letters, thanks Timmy
-
fireonlive
>not knowing how to configure text replacement
-
JAA
This can go in -ot now. :-)
-
fireonlive
:)
-
JAA
Apparently my FuzzyMemories.TV crawl is nearly done.
-
JAA
It has a bit of pagination to hunt down but has already retrieved most /watch/ pages and the accompanying videos (that aren't 404s).
-
JAA
Specifically, video IDs go to 4794, and my crawl has retrieved 4668 as of a couple minutes ago.
-
JAA
~100 GiB so far
-
JAA
4054 actual videos as of just now based on some crude log grepping.