-
arkiver
fireonlive: d*****d is discord?
-
arkiver
how many are we talking about?
-
arkiver
if it's not a huge number (check with me if you think it's a huge number) then yes
-
fireonlive
ye, was looking at the LTT discord dump someone posted and it was around 1TB >_<
-
arkiver
fireonlive: like 1 TB of URLs?
-
arkiver
or of data?
-
fireonlive
raw data
-
arkiver
what is LTT?
-
fireonlive
linus tech tips
-
arkiver
oh
-
arkiver
nice
-
arkiver
yes sure, dump that in
-
fireonlive
oki
-
fireonlive
may i have the voice here
-
arkiver
yes
-
fireonlive
thanks :)
-
arkiver
i forgot how to make that permanent...
-
arkiver
but this will work for now
-
fireonlive
no worries
-
nicolas17
arkiver: the notion of "1 TB of URLs" is freaking me out
-
arkiver
wait
-
arkiver
fireonlive: so like the data coming in through the URLs is 1 TB right?
-
arkiver
or is the list of URLs themselves 1 TB?
-
fireonlive
data downloaded will be about 600-1000GB; size of list lower
-
JAA
Let's see if this works...
-
JAA
.vop add fireonlive
-
ChanServ
JAA added fireonlive to the VOP list.
-
arkiver
right so data coming through URLs
-
JAA
Yup :-)
-
arkiver
should be fine then
-
fireonlive
thanks JAA :)
-
arkiver
nicolas17: so 1 TB of data after downloading content through URLs
-
nicolas17
I assume they meant the whole Discord dump, which includes messages, metadata, image URLs, and the data in those URLs (in a format not usable for WBM, hence the need to archive it separately)
-
arkiver
JAA: what is that?
-
fireonlive
'voice-op'
-
arkiver
nice
-
JAA
arkiver: Auto-voice, equivalent to '/msg ChanServ VOP #// ADD fireonlive'
-
fireonlive
wasn't sure what AT's standard flags were for voicing someone
-
arkiver
that makes it easier to remember
-
nicolas17
oh because ChanServ is present here
-
nicolas17
.help
-
fireonlive
ChanServ_3830423842304823408420blazeit: devoice
-
fireonlive
.voice
-
fireonlive
:)
-
arkiver
i guess that is possible because you're in some permanent list now
-
nicolas17
yes
-
nicolas17
.voice
-
fireonlive
indeed: /msg chanserv flags #//
-
nicolas17
"you are not authorized"
-
arkiver
well TIL
-
arkiver
.voice
-
fireonlive
you can see everyone with access there
-
arkiver
:P i have voice and ops now?
-
fireonlive
yep! :p
-
arkiver
yay collect them all!
-
JAA
Yes, and h2ibot even supports that correctly, too. :-)
-
nicolas17
I think the +v is not visible, but if you deop yourself you'll be left with voice instead of nothing
-
fireonlive
arkiver: type this: /mode +oooo arkiver arkiver arkiver arkiver
-
fireonlive
:3
-
JAA
nicolas17: Most clients don't display it, but it is visible at the protocol level.
-
arkiver
i'll go trust fireonlive here
-
arkiver
oh wait
-
arkiver
i know that command :P
-
nicolas17
JAA: I thought the raw user list would have @ alone rather than @ and +
-
fireonlive
:D
-
fireonlive
charybdis gives no fucks
-
JAA
nicolas17: Hmm, I'd have to check how exactly I retrieve it in http2irc. It's definitely available though.
-
fireonlive
,, isvoice arkiver
-
eggdrop
ok: 1 -4ms-
-
fireonlive
,, isop arkiver
-
eggdrop
ok: 1 -0ms-
-
fireonlive
,, isvoice nicolas17
-
eggdrop
ok: 0 -0ms-
-
fireonlive
hm
-
arkiver
,, isvoice JAA
-
eggdrop
no.
-
arkiver
you're behind JAA
-
arkiver
,, isops JAA
-
eggdrop
no.
-
arkiver
,, isop JAA
-
eggdrop
no.
-
JAA
lol
-
fireonlive
that's just it refusing to serve you
-
arkiver
uh
-
fireonlive
lol
-
fireonlive
,, isop JAA
-
eggdrop
ok: 1 -1ms-
-
JAA
Computer says no...
-
fireonlive
,, isvoice JAA
-
eggdrop
ok: 0 -0ms-
-
fireonlive
",," is raw TCL
-
arkiver
:P and i actually don't care a whole lot
-
fireonlive
which also gives people...
-
arkiver
eggdrop can serve itself
-
fireonlive
,, exec free -m
-
eggdrop
ok: (3 lines) -28ms-
-
eggdrop
1/3: total used free shared buff/cache available
-
eggdrop
2/3: Mem: 457 327 12 3 132 130
-
eggdrop
3/3: Swap: 511 342 169
-
fireonlive
raw shell access :P
-
JAA
WCGW?
-
fireonlive
so i have to gate it
-
arkiver
yes of course
-
fireonlive
atm it just checks if it's me
-
» nicolas17 waves the "off-topic" flag
-
» fireonlive waves nicolas17
-
arkiver
but yes #archiveteam-ot
-
JAA
,, :(){:|:&};:
-
eggdrop
no.
-
fireonlive
xP
-
arkiver
i'll say that from now on too if I don't want to reply
-
arkiver
"no."
-
fireonlive
:D
-
JAA
,, Will you answer with 'no.'?
-
eggdrop
no.
-
fireonlive
just discordcdn links for LTT ye?
-
arkiver
something is blowing up in the tracker
-
fireonlive
there's some other ones: /screenshots [375444977033150494].txt:http://puu.sh/I3tvo/b7bafeac63.jpg
-
arkiver
fireonlive: yes, what else?
-
fireonlive
or ltx-2023 [758027280420307005].txt:http://lttstore.com
-
fireonlive
lol
-
arkiver
oh
-
fireonlive
someone's linkedin
-
arkiver
do you have a list for me to check?
-
JAA
Huh, secondary's still 9.1M‽
-
arkiver
i assume you mean a big bunch of outlinks
-
fireonlive
mm
-
fireonlive
i can grab just discord cdn or also just grep for http(s)://
-
fireonlive
one sec
-
arkiver
JAA: yeah because i took out some overly aggressive filter patterns earlier... but should go down soon
-
JAA
That's still only the linktr.ee backlog, right?
-
arkiver
JAA: kind of, i also moved a bunch of .de stuff to there
-
JAA
Ah
-
fireonlive
-
fireonlive
(does bot extract links from text?)
-
arkiver
fireonlive: not sure if it extracts links from text
-
arkiver
i'd have to check the code
-
fireonlive
!help
-
h2ibot
fireonlive: The following commands are available: (for '')
-
h2ibot
fireonlive: !help: Print this help message. (for '')
-
h2ibot
fireonlive: !a: Deduplicate and archive a list of URLs hosted on transfer.archivete.am. CAREFUL, DDOS. (for '')
-
fireonlive
i could use a better regex too
-
arkiver
fireonlive: no
-
fireonlive
oh ok
-
JAA
http\S* good enough?
-
fireonlive
i used 'https?://' for this list
-
arkiver
i could look into making it extract URLs from text but that will not be today
-
fireonlive
ok :)
-
JAA
-o is your friend.
-
fireonlive
ye
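For reference, the extraction discussed above (an `https?://` pattern, run with `-o` so only the matched part is printed) can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the command fireonlive actually ran: the `\S+` tail, the order-preserving deduplication, and the `cdn.discordapp.com` filter are assumptions added for illustration, and the helper names are hypothetical.

```python
import re

# Assumed pattern: "https?://" followed by any run of non-whitespace,
# i.e. JAA's "http\S*" idea tightened to require the "://".
URL_RE = re.compile(r"https?://\S+")


def extract_urls(text: str) -> list[str]:
    """Return every URL-looking substring in order, like `grep -o`."""
    return URL_RE.findall(text)


def dedupe(urls: list[str]) -> list[str]:
    """Drop repeated URLs while keeping first-seen order."""
    seen: set[str] = set()
    out: list[str] = []
    for u in urls:
        if u not in seen:
            seen.add(u)
            out.append(u)
    return out


def discord_cdn_only(urls: list[str]) -> list[str]:
    """Hypothetical filter: keep only Discord CDN attachment links."""
    return [u for u in urls if "cdn.discordapp.com/" in u]


if __name__ == "__main__":
    # The first two lines are taken from the grep output quoted in this log.
    sample = (
        "/screenshots [375444977033150494].txt:http://puu.sh/I3tvo/b7bafeac63.jpg\n"
        "ltx-2023 [758027280420307005].txt:http://lttstore.com\n"
    )
    for url in dedupe(extract_urls(sample)):
        print(url)
```

Because `\S+` stops only at whitespace, trailing punctuation such as `)` or `.` stays attached to a URL; a stricter pattern would trim it, at the cost of occasionally cutting off part of a legitimate URL.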
-
fireonlive
i wouldn't be against just providing links
-
fireonlive
but i think arkiver wants to do extraction bot side
-
vokunal|m
Yeah I did the LTT discord dump. I still have ~700 gigs to upload. Two zip files around 300GB each and another several hundred thousand files I haven't zipped yet. Lots of images sent on discord
-
arkiver
feel free to just provide a list of URLs
-
arkiver
vokunal|m: where are these uploads going?
-
arkiver
as in which items?
-
fireonlive
whatever is best :)
-
fireonlive
am flexible boi
-
vokunal|m
-
fireonlive
-
fireonlive
but if you're already uploading media then i guess we're good!
-
vokunal|m
ahh i didn't do that one
-
vokunal|m
All of the json and html files are uploaded already if someone wants to grab the links from them. All that's left is three channels' media files
-
fireonlive
ah yeah yours is from the 21st vs the 6th
-
vokunal|m
Yeah I saw someone grab it and they said it kept crashing so I started grabbing it. It took a long time
-
vokunal|m
Are we grabbing urls from specific discord servers only or more broad?
-
fireonlive
am a little confused as to what i'm to do next so i'll pause for now :D
-
arkiver
i should get some sleep now
-
arkiver
fireonlive: if you have a list of URLs extracted from that, those can be queued
-
arkiver
if the discord CDN URLs are easy to extract, those can be done first, but all is welcome
-
fireonlive
sounds good :)
-
fireonlive
have a good sleep arkiver
-
arkiver
thanks :)
-
vokunal|m
Here's a few from their respective discord servers.... (full message at <matrix.hackint.org/_matrix/media/v3…ackint.org/jcejbUumCphuJtYZfGdqCmWq>)
-
vokunal|m
cdn links only as far as I can tell
-
vokunal|m
Here's the LTT one as well
-
vokunal|m
-
JAA
-
JAA
-
fireonlive
(still downloading json)
-
vokunal|m
I don't have a script to rip the json so i grabbed the html. I think they should have the same links
-
fireonlive
ah ye i'd imagine so
-
fireonlive
-
h2ibot
fireonlive: Deduplicating and queuing 157410 items. (for 'transfer.archivete.am/8tbaR/urls-Linus-Tech-Tips-Discord.txt')
-
h2ibot
-
fireonlive
:)
-
fireonlive
the rest don't seem to have a lot
-
fireonlive
-
fireonlive
-
fireonlive
-
fireonlive
-
fireonlive
-
h2ibot
-
fireonlive
-
fireonlive
-
h2ibot
-
fireonlive
-
fireonlive
-
vokunal|m
yeah they're much smaller servers
-
h2ibot
-
h2ibot
fireonlive: Deduplicating and queuing 2262 items. (for 'transfer.archivete.am/n2pro/urls-rslash-MadeInAbyss.txt')
-
h2ibot
fireonlive: Deduplicating and queuing 389 items. (for 'transfer.archivete.am/T3Ymc/urls-AI-Hub-Discord.txt')
-
h2ibot
-
h2ibot
fireonlive: Deduplicating and queuing 1039 items. (for 'transfer.archivete.am/k1J4R/urls-Melvor-Idle-Discord.txt')
-
h2ibot
-
h2ibot
-
h2ibot
-
h2ibot
fireonlive: Deduplicating and queuing 6538 items. (for 'transfer.archivete.am/e3RR7/urls-Soda-Dungeon-Discord.txt')
-
h2ibot
-
h2ibot
fireonlive: Deduplicated and queued 2262 items. (for 'transfer.archivete.am/n2pro/urls-rslash-MadeInAbyss.txt')
-
h2ibot
fireonlive: Deduplicated and queued 389 items. (for 'transfer.archivete.am/T3Ymc/urls-AI-Hub-Discord.txt')
-
h2ibot
fireonlive: Deduplicated and queued 1039 items. (for 'transfer.archivete.am/k1J4R/urls-Melvor-Idle-Discord.txt')
-
h2ibot
-
h2ibot
-
h2ibot
-
TheTechRobo
is that only CDN urls?
-
TheTechRobo
s/CDN/image/
-
vokunal|m
yeah
-
» fireonlive pets h2ibot
-
fireonlive
you're my little archivebot aren't you
-
fireonlive
yes you are
-
fireonlive
yes you are my little archivebot
-
fireonlive
:3
-
vokunal|m
Here's some msc discord cdn urls i have from other url dumps i have
transfer.archivete.am/W89Ed/urls-discord-msc.txt
-
fireonlive
hmmm 71k not sure what qualifies as huge (LTT was huuuuge)
-
fireonlive
cc arkiver :3
-
fireonlive
it'd be just like me to get and lose access in less than a day
-
TheTechRobo
fireonlive: hmm let me check my logs, ark.iver did give me a limit when I got voice
-
TheTechRobo
I think he said that >1m URLs would be an "ask me first"
-
vokunal|m
based on KiB/u, 1M would be close to 100GiB. makes sense
-
vokunal|m
on avg
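The back-of-the-envelope estimate above works out as follows: 100 GiB spread over 1M URLs is about 105 KiB per URL. The sketch below, with hypothetical helper names, just does that conversion (1 GiB = 1024 × 1024 KiB) and applies the implied average to the ~71k-URL msc list mentioned above; treat the result as a rough projection, not a measurement.

```python
KIB_PER_GIB = 1024 * 1024  # 1 GiB = 1024 MiB = 1024 * 1024 KiB


def implied_kib_per_url(total_gib: float, url_count: int) -> float:
    """Average KiB per URL implied by total_gib spread over url_count URLs."""
    return total_gib * KIB_PER_GIB / url_count


def estimated_gib(url_count: int, avg_kib_per_url: float) -> float:
    """Projected download size in GiB for url_count URLs at a given average."""
    return url_count * avg_kib_per_url / KIB_PER_GIB


if __name__ == "__main__":
    per_url = implied_kib_per_url(100, 1_000_000)
    print(f"{per_url:.1f} KiB/URL")  # 104.9 KiB/URL
    print(f"{estimated_gib(71_000, per_url):.1f} GiB")  # 7.1 GiB
```

A single average is only a rule of thumb: if the 600-1000 GB figure quoted earlier applies to the ~157k LTT URLs, that list works out to several MB per URL, far above ~105 KiB.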
-
fireonlive
hmm that would make sense
-
fireonlive
-
h2ibot
fireonlive: Deduplicating and queuing 1 items. (for 'transfer.archivete.am/6k1XA/test.txt')
-
h2ibot
fireonlive: Deduplicated and queued 1 items. (for 'transfer.archivete.am/6k1XA/test.txt')
-
fireonlive
:3
-
arkiver
-
h2ibot
arkiver: Deduplicating and queuing 70595 items. (for 'transfer.archivete.am/W89Ed/urls-discord-msc.txt')
-
arkiver
vokunal|m: ^
-
h2ibot
arkiver: Deduplicated and queued 70595 items. (for 'transfer.archivete.am/W89Ed/urls-discord-msc.txt')
-
fireonlive
:)
-
arkiver
-
h2ibot
arkiver: Deduplicating and queuing 383 items. (for 'transfer.archivete.am/j4seB/rss_urls.txt')
-
h2ibot
arkiver: Deduplicated and queued 383 items. (for 'transfer.archivete.am/j4seB/rss_urls.txt')