-
purplebot
Elections/2020 September Swiss votes edited by JustAnotherArchivist (+881, Add some cantonal and local votes) 13 minutes ago --
archiveteam.org/?diff=45540&oldid=45539
-
purplebot
Political parties/Switzerland edited by JustAnotherArchivist (+1105, /* Other political entities */ …) just now --
archiveteam.org/?diff=45541&oldid=45478
-
purplebot
Elections/2020 September Swiss votes edited by JustAnotherArchivist (+25, /* Zürich: Hardturm-Stadion */ …) 11 minutes ago --
archiveteam.org/?diff=45542&oldid=45540
-
kiska
Re: XR I just added my target
-
OrIdow6^2
arkiver: Does the XR grab do anything when downloadableVideo is available?
-
kiska
XR going to be a last minute project? Looks like itnl
-
kiska
s/itnl/it
-
SketchTheCow
That won't happen, you'll be in warczone
-
arkiver
OrIdow6^2: yes it gets the downloadable video
-
arkiver
kiska: no, finishing up now since tracker is working again
-
OrIdow6^2
Oh, I see, it's done through the generic URL extraction
-
OrIdow6^2
Hmmm
-
SketchTheCow
I'm going to make another Archive Team "Miscellaneous" collection.
-
SketchTheCow
archiveteam-fire is a mess and should be dealt with
-
arkiver
update coming up for samsung xr
-
arkiver
SketchTheCow: I assume all warrior project (also small ones) will still get their own collection?
-
SketchTheCow
In general, yes
-
OrIdow6^2
arkiver: Video IDs can have underscores, e.g.
samsungvr.com/view/Wv_0tcndBOG and
samsungvr.com/view/hq_vozffUc6 , don't know if you've noticed this
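
A minimal sketch of an ID pattern that accepts underscores, checked against the two URLs above. The character class (letters, digits, underscore) and the helper name `extract_video_id` are assumptions for illustration, not the pattern used in the actual grab scripts.

```python
import re

# Assumed ID alphabet: ASCII letters, digits and underscore.
# Other characters may be valid too; only these appear in the examples above.
VIEW_URL = re.compile(r"samsungvr\.com/view/([0-9A-Za-z_]+)")


def extract_video_id(url):
    """Return the video ID from a samsungvr.com view URL, or None if absent."""
    match = VIEW_URL.search(url)
    return match.group(1) if match else None
```

A pattern limited to `[0-9A-Za-z]` would stop at the underscore and return only `Wv` for the first URL, which is the failure mode being pointed out.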
-
SketchTheCow
I'm the only one (besides you) who can make collection sets and populate and fix them
-
SketchTheCow
And as we've already established, you're lazy and work at half my speed
-
SketchTheCow
Just idlin' around, learning some crap degree
-
SketchTheCow
Who even USES physics
-
EggplantN_d
What's a physic?
-
arkiver
no physics would mean no archiveteam :)
-
SketchTheCow
It's this high-falutin' way of saying "shit falls over"
-
arkiver
it's challenging
-
SketchTheCow
So much stuff is blowing through the inbox due to the projects, it's sometimes hard to notice these 20-50 items that have been flopping along the bottom of the tank in the torrents
-
JAA
So the mess that is now -fire will become a new mess in _misc?
-
SketchTheCow
The mess that is in -fire will split out into collections, or end up in misc.
-
SketchTheCow
I can't do much about 1,000 items with one single representative of itself.
-
SketchTheCow
But fire was essentially replaced with -inbox
-
SketchTheCow
And there's definitely collection-sized sets in there.
-
JAA
Ah, right.
-
SketchTheCow
-
SketchTheCow
That needs a description
-
Kaz
heh, _aaaaaaaa is usually me
-
Kaz
i don't remember what that was, tbh
-
SketchTheCow
Oh, I know it's you
-
JAA
Always has been.
-
JAA
-
JAA
'A DPoS project was launched on 2020-01-22 but almost immediately crippled the servers. No further attempts were made after that.'
-
JAA
So that's all we grabbed from that I think, maybe?
-
SketchTheCow
I can do a lot of this. It'd be nice to get some help
-
SketchTheCow
Obviously I get pulled away a lot.
-
SketchTheCow
But I can get things at least somewhat synced.
-
SketchTheCow
I THINK I cleared the inbox of everything that isn't an active pipe (twitter and weibo, where the amount coming in means a few dozen are always sitting there, being derived before moving)
-
Kaz
how does one create a collection
-
SketchTheCow
Be me
-
Kaz
well then
-
Kaz
i see
-
EggplantN_d
how does one become the mighty cow of sketch
-
SketchTheCow
Give a talk at an event at the Internet Archive, then walk down into the office and demand a job
-
EggplantN_d
that involves being in murica
-
EggplantN_d
wanting to avoid that for now
-
SketchTheCow
Well, yes
-
JAA
Whatever happened to the location in Canada?
-
SketchTheCow
It exists
-
SketchTheCow
We don't talk about it
-
SketchTheCow
It exists though
-
JAA
Ah :-)
-
EggplantN_d
ah one of those
-
kiska
:D
-
EggplantN_d
couple of raspberry pi's on a DSL line
-
SketchTheCow
Yes, that's how we do things at the archive
-
kiska
s/DSL/dial-up
-
kiska
:P
-
EggplantN_d
#bringbackmicrowave
-
SketchTheCow
s/dial-up/ham radio with a modem/
-
JAA
Connected to a pile of 1 GB HDDs.
-
JAA
There are a lot of USB splitters involved.
-
kiska
Connected to 5.25 inch floppies :D
-
kiska
With a human sorting "machine"
-
arkiver
samsung xr has just under 6000 videos
-
EggplantN_d
perfect lets go
-
SketchTheCow
Ok, inbox "fixed"
-
SketchTheCow
If it's in there (for the moment), IA is processing it and it'll move once done
-
kiska
arkiver: Are we doing discovery as we go?
-
arkiver
no
-
arkiver
project is online
-
SketchTheCow
So, archiveteam-fire has 62,000 items.
-
arkiver
I see EggplantN_d already grabbed a bunch of items
-
SketchTheCow
This should be fucking delightful
-
EggplantN_d
whoops
-
EggplantN_d
lol
-
EggplantN_d
doing 2Gbit atm over 2 boxes
-
kiska
For samsung 6k videos seem... small?
-
arkiver
odd sizes
-
arkiver
EggplantN_d: did you already update scripts to 20200928.01?
-
EggplantN_d
yes
-
EggplantN_d
['GNU Wget 1.20.3-at.20200919.01'],
-
EggplantN_d
VERSION = '20200928.01'
-
EggplantN_d
root@kvm1:/storage/samsung-xr-grab# cat pipeline.py | grep 2020
-
EggplantN_d
root@kvm1:/storage/samsung-xr-grab#
-
arkiver
scripts updated again
-
EggplantN_d
do i need to update?
-
arkiver
yeah, and you can abort everything
-
arkiver
let's see how it performs now, I don't think this update will fix the possible problem
-
EggplantN_d
requeued
-
EggplantN_d
the 10 from the start need re-adding, arkiver, if you can
-
arkiver
yeah I requeued everything
-
arkiver
we're at a nice 10 items/min :)
-
EggplantN_d
surely we can go faster than 10/min?
-
kiska
btw we sure there are 6k videos?
-
arkiver
EggplantN_d: excuse me, another update
-
EggplantN_d
aaaaaaaaaa
-
arkiver
kiska: well seems like it yeah
-
EggplantN_d
ok
-
arkiver
feel free to abort
-
arkiver
should be final one
-
SketchTheCow
hahahahah I broke the internet archive queue
-
SketchTheCow
With photos
-
arkiver
also some items might be very large for samsung
-
arkiver
SketchTheCow: how did you do that
-
SketchTheCow
Well, shoving in 9,000 photos into objects forces the entire copy of the item over while it checks it
-
SketchTheCow
So 9000 * 25gb an item
-
lennier1
Yeah, looked like videos had several resolutions, plus some had the downloadable option that let you get a stereoscopic video.
-
arkiver
SketchTheCow: the image needs to be derived?
-
JAA
Speaking of breaking IA... I have 15.4k/16.8 GiB WARCs from Joe Rogan's YouTube video comments (two per video). All in one item will be terrible, and one item per video will be terrible. Merging them together is also terrible for accessibility. What's the least bad way to upload these?
-
arkiver
JAA: what is 15.4k/16.8GiB
-
JAA
15.4k files totalling 16.8 GiB
-
arkiver
ah
-
JAA
So ~2 MB per video on average.
-
arkiver
I'd cat them together and upload to single item
-
arkiver
maybe keep the originals until deriving went well
-
JAA
Yeah, but then accessing a single video's comments will be horrible.
-
JAA
I don't expect this to work well in the WBM at all.
-
SketchTheCow
This all sounds terrible
-
arkiver
you can do a range request on the WARC
-
SketchTheCow
What have you do
-
arkiver
using the data from the CDX
-
SketchTheCow
ne
-
SketchTheCow
you fool, you've killed us all
-
arkiver
yeah he did
-
JAA
CDX won't help because the video ID is not in the comment API URLs, only an opaque continuation token.
-
JAA
YouTube is fun.
-
arkiver
keep some metadata file with references between URLs and video IDs?
-
JAA
Yeah, I guess so. Basically megawarc then.
-
arkiver
15.4k files in an item is really not a good idea
-
arkiver
yeah megawarc
-
SketchTheCow
Megawarc it, and if you're concerned about the conversion, make a second item with a simple cat like a .tar or something, with a "use sometime" introduction and note.
-
JAA
Alright, sounds good.
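
The range-request idea mentioned above can be sketched as follows, assuming a side index (the CDX or a separate metadata file) supplies each record's byte offset and length inside the combined WARC. The function names and the example URL are hypothetical; only the `Range` header arithmetic is standard HTTP.

```python
import urllib.request


def range_header(offset, length):
    """Build an HTTP Range value; byte ranges are inclusive at both ends."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return f"bytes={offset}-{offset + length - 1}"


def build_record_request(warc_url, offset, length):
    """Prepare (but do not send) a request for one record of a large WARC."""
    request = urllib.request.Request(warc_url)
    request.add_header("Range", range_header(offset, length))
    return request


# Hypothetical usage: offset and length would come from the index file.
example = build_record_request("https://example.org/combined.warc.gz", 1000, 500)
```

This only helps if the index can be keyed by video ID, which is why a separate mapping between URLs and video IDs would be needed when the comment API URLs carry only a continuation token.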
-
arkiver
EggplantN_d: crap another update coming up
-
arkiver
looks like samsungvr sometimes gives 404 for m3u8 files :/
-
EggplantN_d
will need requeue?
-
arkiver
yeah
-
arkiver
this site sucks as well
-
EggplantN_d
even items that are done?
-
EggplantN_d
😦
-
arkiver
yeah
-
EggplantN_d
set 03 to min version anyway on tracker
-
arkiver
all status codes except 200 and 302 will yield an error now
-
EggplantN_d
*043
-
EggplantN_d
**04
-
SketchTheCow
So, I've gone ahead and made a choice wrt the archiveteam-fire stuff
-
SketchTheCow
Anything with identifier warc-* is going into a collection.
-
SketchTheCow
-
arkiver
and I assume in the archiveteam collection?
-
arkiver
EggplantN_d: all updated *again*
-
arkiver
non 200 or 302 codes are bad now
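
As a rough illustration of the rule just described, assuming the check is a simple allow-list on the HTTP status code. The names below are hypothetical and this is not the project's actual script logic.

```python
# Status codes treated as acceptable, per the message above.
ACCEPTED_STATUS_CODES = frozenset({200, 302})


def is_bad_response(status_code):
    """Return True when a response should count as an error for the item."""
    return status_code not in ACCEPTED_STATUS_CODES


def failed_urls(responses):
    """Given (url, status_code) pairs, list the URLs that would fail the item."""
    return [url for url, status in responses if is_bad_response(status)]
```

Under this rule an intermittent 404 on an m3u8 file fails the whole item so it can be retried, instead of being recorded as a finished grab.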
-
EggplantN_d
have you requeued?
-
EggplantN_d
also do the 66 done need redoing?
-
EggplantN_d
i've requeued out
-
arkiver
everything is requeued again
-
arkiver
what is it lately with sites that suck
-
arkiver
tencent as well
-
EggplantN_d
not sure :9
-
EggplantN_d
hows google sites lol
-
EggplantN_d
is that as bad
-
HP_Archivist
JAA: What's the correct sever for the Efnet channel?
-
HP_Archivist
server*
-
HP_Archivist
paraphysics of choopa?
-
HP_Archivist
or* choopa
-
HP_Archivist
I'm trying to duplicate the hexchat configuration I have on one machine, to another one. and for some reason I can't connect
-
lennier1
arkiver: Does this download the stereoscopic video for 3D videos when downloadableVideo is enabled? We were discussing a few days ago that the web viewer doesn't offer stereoscopic.
-
arkiver
lennier1: do you have an example URL?
-
lennier1
I can find one in the logs.
-
lennier1
-
OrIdow6^2
lennier1: Yes
-
OrIdow6^2
Wait, misread that
-
JAA
HP_Archivist: None because we're moving soon. :-P Pick any from
efnet.org/?module=servers or use the generic irc.efnet.org if you're not in too many channels.
-
OrIdow6^2
Still yes
-
arkiver
lennier1: yes
-
arkiver
or OrIdow6^2 answered yeah
-
arkiver
(I read the "misread that" only :P)
-
HP_Archivist
JAA: Yeah, I PM'ed you with an error I keep getting
-
lennier1
OK, good to know. :)
-
arkiver
final items for naver have been fixed, and queued
-
SketchTheCow
-
SketchTheCow
That'll be 13,000 items or so
-
purplebot
Wikipedia edited by M.Barry (+95, /* External links */ Adding link …) just now --
archiveteam.org/?diff=45543&oldid=45537
-
mgrandi
Is youtube deleting the community captions today?
-
EggplantN_d
kinda
-
lennier1
Community captions that were submitted, but never approved by the video owner, I believe.
-
mgrandi
There was a non warrior project for it, should I turn my workers onto that I guess? Since tencent weibo is dead now
-
lennier1
themadpro was looking for help on it earlier.
github.com/Data-Horde/ytcc-archive
-
EggplantN_d
no need to spin up more they are capped at 2k/min
-
arkiver
more is coming up, no worries :P
-
EggplantN_d
more what >_> arkiver
-
fuzzy8021
any ip limit on samsungxr?
-
EggplantN_d
no, just slow due to file sizes
-
EggplantN_d
so dont go insane
-
fuzzy8021
k thanks
-
EggplantN_d
20/min limit set via tracker also
-
fuzzy8021
ah
-
EggplantN_d
targets are a bit toasty but even with that eta is ~5 hours to finish samsung XR so I'm not fussed about doing anything
-
JAA
From EFnet #warrior: 21:23:27 < n00b__> Hi, there's this Estonian image host that will be closing very very very soon, I am certain it will come with data loss. What can I do to help archive it? Here's the site:
fotoalbum.ee
-
arkiver
1st of october
-
EggplantN_d
aaaaaaaaa
-
EggplantN_d
another juan
-
jodizzle
Too much for AB?
-
arkiver
found a way to get all images
-
arkiver
for URLs like
fotoalbum.ee/photos/MagusKiisu/112474281 you need both username and ID
-
arkiver
but URL
fotoalbum.ee/popup.php?type=share&pic=112474281 will give you the link with user while only having photo ID in the URL
-
arkiver
IDs are sequential
-
arkiver
so I guess we'll go through 100+ million IDs
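
A minimal sketch of the enumeration described above, assuming photo IDs are plain sequential integers. The `https://` scheme, the batch size, and the function names are assumptions for illustration; the real project would define its own item format.

```python
# URL shape taken from the message above; the scheme is an assumption.
POPUP_URL = "https://fotoalbum.ee/popup.php?type=share&pic={photo_id}"


def popup_urls(first_id, last_id):
    """Yield the share-popup URL for every photo ID in [first_id, last_id]."""
    for photo_id in range(first_id, last_id + 1):
        yield POPUP_URL.format(photo_id=photo_id)


def id_batches(first_id, last_id, size):
    """Split an inclusive ID range into (start, end) work items of at most `size` IDs."""
    for start in range(first_id, last_id + 1, size):
        yield start, min(start + size - 1, last_id)
```

Each popup page would then be parsed for the link that includes the username, giving the full `/photos/<user>/<id>` URL without knowing the user in advance.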
-
jodizzle
Nice
-
jodizzle
Also seeing some links in the upper-left, like album.ee. Seems like they're all closing
-
arkiver
nicely noticed
-
arkiver
yep
-
arkiver
projects coming up
-
jodizzle
I'm gonna throw the social medias and such in AB
-
EggplantN_d
full script project arkiver?
-
arkiver
EggplantN_d: not sure if I get your question, but "yes"
-
arkiver
jodizzle: thanks, please also throw in the main websites, so we get the main pages
-
jodizzle
Got it
-
arkiver
you can go !a and cancel after some time (or ignore photo pages/URLs)
-
arkiver
it's so we get any FAQs, announcement, etc. ('special pages')
-
EggplantN_d
ah great
-
jodizzle
Yeah, I'll ignore the photo pages if we successfully get them by warrior
-
purplebot
Deathwatch edited by JustAnotherArchivist (+137, /* 2020 */ Add Fotoalbum) just now --
archiveteam.org/?diff=45544&oldid=45538
-
arkiver
I guess because they're different websites, they should be different projects
-
arkiver
the websites also differ in structure
-
Kaz
channel?
-
arkiver
and different names/logos/etc.
-
arkiver
no idea Kaz :P
-
thuban
fotoalbum-eek
-
EggplantN_d
fotooff
-
Kaz
fotoff seems like we've had something similar in the past
-
Kaz
same with any sort of fotofail-type name
-
Kaz
whats the length limit on hackint
-
Kaz
ah yes, I'm allowed #lookatthisfotoalbum, only problem is i hate it
-
EggplantN_d
#lookatthisfotograph surely
-
thuban
i can hear it in my head. why would you do this to us
-
EggplantN_d
hahahahaha
-
Kaz
well yeah, but the site _is_ called fotoalbum
-
EggplantN_d
but for the meme kaz
-
Kaz
yes ik
-
Kaz
sod it, lets go for #lookatthisfotograph
-
EggplantN_d
navergonnagiveyouup
-
EggplantN_d
youtube.com/watch?v=aANF2OOVX40 youtube recommended takes you to strange places
-
thuban
lennier1, Doranwen: have you guys finished downloading my torrent? (i want to move the data back to cold storage)
-
arkiver
thuban: what is in the torrent?
-
thuban
arkiver: gmds from the yahoo groups project that were sitting inaccessible on marked1's target
-
JAA
We have a channel for that, no?
-
arkiver
thuban: you can upload the torrents to IA, if it's not too many individual files
-
arkiver
and yeah JAA
-
JAA
#yahoosucks
-
thuban
oh they are both in it. i assumed it would be deserted at this point
-
JAA
Speaking of channels, do we want one for Samsung XR? I suggested #sandsung yesterday, and even though nobody responded to that, there are 8 people in it now (including myself).
-
EggplantN_d
it'll be over in a few hours JAA
-
thuban
arkiver: i'll pass for now due to the unresolved privacy issues--iirc betamax had some plans for sorting/processing that data, but i don't know whether any progress was ever made. (i guess further discussion can go back to #yahoosucks)
-
arkiver
i see
-
arkiver
so #lookatthisfotograph for all the photo/video sites
-
arkiver
the .ee sites
-
thuban
as long as we're talking about new projects: is the warrior still going to be supported in the future, or are we going over solely to docker containers?
-
thuban
i've upgraded from -3 to the very recent -3.1 and run the warrior-extras-installer, but my warrior still complains on e.g. samsung-xr of "No usable Wget+At found"
-
Kaz
yes and no
-
Kaz
we'd like to get the warrior to a place where it's a little more.. stable
-
Kaz
but nobody really has the time to look at it properly
-
thuban
ahh, i see
-
thuban
i wish i could help more--i'm more of a programmer than a sysadmin, but if anyone has scutwork that needs done (documentation?) hmu
-
thuban
(i realize that individual machines are a drop in the (bit)bucket compared to the mass cloud deployments people have been doing, but when a project comes along we can't finish, every drop matters, right?)
-
arkiver
exactly!
-
arkiver
every drop absolutely matters
-
arkiver
if we can't finish something in time, every URL you archived is saved by you for eternity :)
-
Kaz
it'd be super nice to have the warriors in a workable state again
-
Kaz
we're a lot better at managing uploads too, which was a pain point in the past
-
EggplantN_d
we're finally getting around to having a variety of targets setup ready 24/7
-
EggplantN_d
then the warrior is next likely
-
thuban
sounds good :)