01:00:07 fireonlive: d*****d is discord?
01:00:17 how many are we talking about?
01:00:38 if it's not a huge number (check with me if you think it's a huge number) then yes
01:00:53 ye, was looking at the LTT discord dump someone posted and it was around 1TB >_<
01:01:10 fireonlive: like 1 TB of URLs?
01:01:16 or of data?
01:01:17 raw data
01:01:24 what is LTT?
01:01:31 linus tech tips
01:01:34 oh
01:01:35 nice
01:01:38 yes sure, dump that in
01:01:42 oki
01:01:54 may i have the voice here
01:01:58 yes
01:02:04 thanks :)
01:02:11 i forgot how to make that permanent...
01:02:16 but this will work for now
01:02:43 no worries
01:04:33 arkiver: the notion of "1 TB of URLs" is freaking me out
01:04:39 wait
01:04:55 fireonlive: so like the data coming in through the URLs is 1 TB right?
01:05:03 or is the list of URLs themselves 1 TB?
01:05:29 data downloaded will be about 600-1000GB; size of list lower
01:05:46 Let's see if this works...
01:05:48 .vop add fireonlive
01:05:48 -ChanServ- JAA added fireonlive to the VOP list.
01:05:49 right so data coming through URLs
01:05:54 Yup :-)
01:05:55 should be fine then
01:05:56 thanks JAA :)
01:06:09 nicolas17: so 1 TB of data after downloading content through URLs
01:06:11 I assume they meant the whole Discord dump, which includes messages, metadata, image URLs, and the data in those URLs (in a format not usable for WBM, hence need to archive it separately)
01:06:15 JAA: what is thst?
01:06:17 that*
01:06:22 'voice-op'
01:06:36 nice
01:06:37 arkiver: Auto-voice, equivalent to '/msg ChanServ VOP #// ADD fireonlive'
01:06:37 wasn't sure what AT's standard flags were for voicing someone
01:06:43 that makes it easier to remember
01:06:49 oh because ChanServ is present here
01:06:55 .help
01:07:10 ChanServ_3830423842304823408420blazeit: devoice
01:07:13 .voice
01:07:15 :)
01:07:44 i guess that is possible because you're in some permanent list now
01:07:51 yes
01:07:52 .voice
01:07:54 indeed: /msg chanserv flags #//
01:07:58 "you are not authorized"
01:07:59 well TIL
01:08:04 .voice
01:08:07 you can see everyone with access there
01:08:15 :P i have voice and ops now?
01:08:20 yep! :p
01:08:28 yay collect them all!
01:08:39 Yes, and h2ibot even supports that correctly, too. :-)
01:08:40 I think the +v is not visible, but if you deop yourself you'll be left with voice instead of nothing
01:08:43 arkiver: type this: /mode +oooo arkiver arkiver arkiver arkiver
01:08:47 :3
01:08:56 nicolas17: Most clients don't display it, but it is visible at the protocol level.
01:09:06 i'll go trust fireonlive here
01:09:14 oh wait
01:09:17 i know that command :P
01:09:32 JAA: I thought the raw user list would have @ alone rather than @ and +
01:09:39 :D
01:09:59 charybdis gives no fucks
01:10:17 nicolas17: Hmm, I'd have to check how exactly I retrieve it in http2irc. It's definitely available though.
01:10:45 ,, isvoice arkiver
01:10:45 ok: 1 -4ms-
01:10:48 ,, isop arkiver
01:10:49 ok: 1 -0ms-
01:10:54 ,, isvoice nicolas17
01:10:54 ok: 0 -0ms-
01:10:56 hm
01:11:22 ,, isvoice JAA
01:11:22 no.
01:11:34 you're behind JAA
01:11:43 ,, isops JAA
01:11:43 no.
01:11:49 ,, isop JAA
01:11:49 no.
01:11:50 lol
01:11:53 that's just it refusing to serve you
01:11:53 uh
01:11:56 lol
01:11:59 ,, isop JAA
01:11:59 ok: 1 -1ms-
01:12:03 Computer says no...
01:12:04 ,, isvoice JAA
01:12:05 ok: 0 -0ms-
01:12:19 ",," is raw TCL
01:12:21 :P and i actually don't care a whole lot
01:12:22 which also gives people...
01:12:24 eggdrop can serve itself
01:12:25 ,, exec free -m
01:12:25 ok: (3 lines) -28ms-
01:12:25 1/3: total used free shared buff/cache available
01:12:26 2/3: Mem: 457 327 12 3 132 130
01:12:27 3/3: Swap: 511 342 169
01:12:31 raw shell access :P
01:12:36 WCGW?
01:12:36 so i have to gate it
01:12:38 yes of course
01:12:50 atm it just checks if it's me
01:12:52 * nicolas17 waves the "off-topic" flag
01:12:57 * fireonlive waves nicolas17
01:13:18 but yes #archiveteam-ot
01:13:20 ,, :(){:|:&};:
01:13:20 no.
01:14:09 xP
01:14:14 i'll say that from now on too if I don't want to reply
01:14:15 "no."
01:14:30 :D
01:14:37 ,, Will you answer with 'no.'?
01:14:37 no.
01:14:40 just discordcdn links for LTT ye?
01:14:46 something is blowing up through in the tracker
01:14:52 there's some other ones: /screenshots [375444977033150494].txt:http://puu.sh/I3tvo/b7bafeac63.jpg
01:14:55 fireonlive: yes, what else?
01:14:59 or ltx-2023 [758027280420307005].txt:http://lttstore.com
01:15:01 lol
01:15:15 oh
01:15:19 someone's linkedin
01:15:20 do you have a list for me to check?
01:15:29 Huh, secondary's still 9.1M‽
01:15:32 i assume you mean a big bunch of outlinks
01:15:37 mm
01:15:49 i can grab just discord cdn or also just grep for http(s)://
01:15:52 one sec
01:15:55 JAA: yeah because i took out some overly aggressive filter patterns earlier... but should go down soon
01:16:02 That's still only the linktr.ee backlog, right?
01:16:22 JAA: kind of, i also moved a bunch of .de stuff to there
01:16:26 Ah
01:21:00 arkiver: here's the very raw grep https://transfer.archivete.am/qalfN/ltt-links-raw.txt
01:21:27 (does bot extract links from text?)
01:22:20 fireonlive: not sure if it extract links from text
01:22:26 i'd have to check the code
01:22:32 !help
01:22:33 fireonlive: The following commands are available: (for '')
01:22:34 fireonlive: !help: Print this help message. (for '')
01:22:35 fireonlive: !a: Deduplicate and archive a list of URLs hosted on transfer.archivete.am. CAREFUL, DDOS. (for '')
01:22:53 i could use a better regex too
01:22:59 fireonlive: no
01:23:03 oh ok
01:23:22 http\S* good enough?
01:23:55 i used 'https?://' for this list
01:24:02 i could look into making it extract URLs from text but that will not be in tofay
01:24:04 today*
01:24:22 ok :)
01:24:28 -o is your friend.
01:24:54 ye
01:25:00 i wouldn't be against just providing links
01:25:10 but i think arkiver wants to do extraction bot side
01:26:40 Yeah I did the LTT discord dump. I still have ~700 gigs to upload. Two zip files around 300GB each and another several hundred thousand files I haven't zipped yet. Lots of images sent on discord
01:26:53 feel free to just provide a list of URLs
01:27:19 vokunal|m: where are these uploads going?
01:27:22 as in which items?
01:27:24 whatever is best :)
01:27:52 am flexible boi
01:28:25 https://archive.org/details/Discord-Linus-Tech-Tips
01:28:27 i used the LTT discord dump from https://archive.org/details/discord-375436620578684930
01:28:44 but if you're already uploading media then i guess we're good!
01:28:50 ahh i didn't do that one
01:30:32 All of the json and html files are uploaded already if someone wants to grab the links from them. All that's left is three channel's media files
01:31:10 ah yeah yours is from the 21st vs the 6th
01:32:03 Yeah I saw someone grab it and they said it kept crashing so I started grabbing it. It took a long time
01:34:35 Are we grabbing urls from specific discord servers only or more broad?
01:36:28 am a little confused as to what i'm to do next so i'll pause for now :D
01:37:53 i should get some sleep now
01:38:04 fireonlive: if you have a list of URLs extracted from that, those can be queued
01:38:24 if the discord CDN URLs are easy to extract, those can be done first, but all is welcome
01:41:48 sounds good :)
01:41:52 have a good sleep arkiver
01:42:39 thanks :)
02:03:18 Here's a few from their respective discord servers.... (full message at )
02:03:39 cdn links only as far as I can tell
02:09:25 Here's the LTT one as well
02:09:26 https://transfer.archivete.am/8tbaR/urls-Linus-Tech-Tips-Discord.txt
02:10:37 Reposting the links because Matrix is being annoying: https://transfer.archivete.am/HnwYp/urls-Made-In-rslash-place-Discord.txt https://transfer.archivete.am/k1J4R/urls-Melvor-Idle-Discord.txt https://transfer.archivete.am/i96Cj/urls-Ragtag-Archive-Discord.txt https://transfer.archivete.am/n2pro/urls-rslash-MadeInAbyss.txt https://transfer.archivete.am/stUD1/urls-SheepIt-Renderfarm-Discord.txt
02:10:43 https://transfer.archivete.am/e3RR7/urls-Soda-Dungeon-Discord.txt https://transfer.archivete.am/T3Ymc/urls-AI-Hub-Discord.txt https://transfer.archivete.am/DisCh/urls-Arkham-Network-Discord.txt https://transfer.archivete.am/mMSZ8/urls-Edgeworld-Revival-Discord.txt
02:18:40 (still downloading json)
02:19:19 I don't have a script to rip the json so i grabbed the html. I think they should have the same links
02:22:11 ah ye i'd imagine so
02:23:00 !a https://transfer.archivete.am/8tbaR/urls-Linus-Tech-Tips-Discord.txt
02:23:05 fireonlive: Deduplicating and queuing 157410 items. (for 'https://transfer.archivete.am/8tbaR/urls-Linus-Tech-Tips-Discord.txt')
02:23:14 fireonlive: Deduplicated and queued 157410 items.
(for 'https://transfer.archivete.am/8tbaR/urls-Linus-Tech-Tips-Discord.txt')
02:23:29 :)
02:27:10 the rest don't seem to have a lot
02:28:24 !a https://transfer.archivete.am/HnwYp/urls-Made-In-rslash-place-Discord.txt
02:28:25 !a https://transfer.archivete.am/k1J4R/urls-Melvor-Idle-Discord.txt
02:28:25 !a https://transfer.archivete.am/i96Cj/urls-Ragtag-Archive-Discord.txt
02:28:25 !a https://transfer.archivete.am/n2pro/urls-rslash-MadeInAbyss.txt
02:28:25 !a https://transfer.archivete.am/stUD1/urls-SheepIt-Renderfarm-Discord.txt
02:28:25 fireonlive: Deduplicating and queuing 10 items. (for 'https://transfer.archivete.am/HnwYp/urls-Made-In-rslash-place-Discord.txt')
02:28:25 !a https://transfer.archivete.am/e3RR7/urls-Soda-Dungeon-Discord.txt
02:28:25 !a https://transfer.archivete.am/T3Ymc/urls-AI-Hub-Discord.txt
02:28:26 fireonlive: Deduplicating and queuing 203 items. (for 'https://transfer.archivete.am/i96Cj/urls-Ragtag-Archive-Discord.txt')
02:28:26 !a https://transfer.archivete.am/DisCh/urls-Arkham-Network-Discord.txt
02:28:26 !a https://transfer.archivete.am/mMSZ8/urls-Edgeworld-Revival-Discord.txt
02:28:27 yeah they're much smaller servers
02:28:27 fireonlive: Deduplicating and queuing 50 items. (for 'https://transfer.archivete.am/stUD1/urls-SheepIt-Renderfarm-Discord.txt')
02:28:28 fireonlive: Deduplicating and queuing 2262 items. (for 'https://transfer.archivete.am/n2pro/urls-rslash-MadeInAbyss.txt')
02:28:29 fireonlive: Deduplicating and queuing 389 items. (for 'https://transfer.archivete.am/T3Ymc/urls-AI-Hub-Discord.txt')
02:28:30 fireonlive: Deduplicated and queued 10 items. (for 'https://transfer.archivete.am/HnwYp/urls-Made-In-rslash-place-Discord.txt')
02:28:31 fireonlive: Deduplicating and queuing 1039 items. (for 'https://transfer.archivete.am/k1J4R/urls-Melvor-Idle-Discord.txt')
02:28:32 fireonlive: Deduplicated and queued 203 items.
(for 'https://transfer.archivete.am/i96Cj/urls-Ragtag-Archive-Discord.txt')
02:28:33 fireonlive: Deduplicated and queued 50 items. (for 'https://transfer.archivete.am/stUD1/urls-SheepIt-Renderfarm-Discord.txt')
02:28:34 fireonlive: Deduplicating and queuing 426 items. (for 'https://transfer.archivete.am/DisCh/urls-Arkham-Network-Discord.txt')
02:28:35 fireonlive: Deduplicating and queuing 6538 items. (for 'https://transfer.archivete.am/e3RR7/urls-Soda-Dungeon-Discord.txt')
02:28:36 fireonlive: Deduplicating and queuing 539 items. (for 'https://transfer.archivete.am/mMSZ8/urls-Edgeworld-Revival-Discord.txt')
02:28:37 fireonlive: Deduplicated and queued 2262 items. (for 'https://transfer.archivete.am/n2pro/urls-rslash-MadeInAbyss.txt')
02:28:38 fireonlive: Deduplicated and queued 389 items. (for 'https://transfer.archivete.am/T3Ymc/urls-AI-Hub-Discord.txt')
02:28:39 fireonlive: Deduplicated and queued 1039 items. (for 'https://transfer.archivete.am/k1J4R/urls-Melvor-Idle-Discord.txt')
02:28:40 fireonlive: Deduplicated and queued 426 items. (for 'https://transfer.archivete.am/DisCh/urls-Arkham-Network-Discord.txt')
02:28:41 fireonlive: Deduplicated and queued 539 items. (for 'https://transfer.archivete.am/mMSZ8/urls-Edgeworld-Revival-Discord.txt')
02:28:42 fireonlive: Deduplicated and queued 6538 items. (for 'https://transfer.archivete.am/e3RR7/urls-Soda-Dungeon-Discord.txt')
02:34:05 is that only CDN urls?
02:34:23 s/CDN/image/
02:39:04 yeah
02:47:17 * fireonlive pets h2ibot
02:47:22 you're my little archivebot aren't you
02:47:25 yes you are
02:47:30 yes you are my little archivebot
02:47:33 :3
02:51:07 Here's some msc discord cdn urls i have from other url dumps i have https://transfer.archivete.am/W89Ed/urls-discord-msc.txt
02:53:19 hmmm 71k not sure what qualifies as huge (LTT was huuuuge)
02:53:23 cc arkiver :3
02:53:43 it'd be just like me to get and lose access in less than a day
02:53:57 fireonlive: hmm let me check my logs, ark.iver did give me a limit when I got voice
02:56:02 I think he said that >1m URLs would be an "ask me first"
02:57:07 based on KiB/u, 1M would be close to 100GiB. makes sense
02:57:12 on avg
02:57:54 hmm that would make sense
02:58:03 * TheTechRobo points to https://github.com/TheTechRobo/discord-urls-extractor if you want to cost brewster even more money
03:09:11 !a https://transfer.archivete.am/6k1XA/test.txt
03:09:12 fireonlive: Deduplicating and queuing 1 items. (for 'https://transfer.archivete.am/6k1XA/test.txt')
03:09:13 fireonlive: Deduplicated and queued 1 items. (for 'https://transfer.archivete.am/6k1XA/test.txt')
03:09:16 :3
14:44:27 !a https://transfer.archivete.am/W89Ed/urls-discord-msc.txt
14:44:31 arkiver: Deduplicating and queuing 70595 items. (for 'https://transfer.archivete.am/W89Ed/urls-discord-msc.txt')
14:44:31 vokunal|m: ^
14:44:35 arkiver: Deduplicated and queued 70595 items. (for 'https://transfer.archivete.am/W89Ed/urls-discord-msc.txt')
16:08:53 :)
23:32:40 !a https://transfer.archivete.am/j4seB/rss_urls.txt
23:32:41 arkiver: Deduplicating and queuing 383 items. (for 'https://transfer.archivete.am/j4seB/rss_urls.txt')
23:32:42 arkiver: Deduplicated and queued 383 items. (for 'https://transfer.archivete.am/j4seB/rss_urls.txt')
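
Editor's note: the log describes pulling URLs out of a text dump with a grep for 'https?://' plus -o, deduplicating them, and estimating download size per URL. The following is a minimal sketch of those two steps, not the code h2ibot or anyone in the channel actually ran. The function names `extract_urls` and `estimate_gib` are hypothetical, and the default of 100 KiB per URL is an assumption inferred from the "1M would be close to 100GiB" remark.

```python
import re

# Roughly what `grep -oE 'https?://\S+'` does: keep only the matched
# part of each line, not the whole line. This is an assumed pattern
# built from the 'https?://' and 'http\S*' suggestions in the log.
URL_RE = re.compile(r"https?://\S+")


def extract_urls(text: str) -> list[str]:
    """Return the unique URLs found in text, in first-seen order."""
    seen = set()
    urls = []
    for match in URL_RE.finditer(text):
        url = match.group(0)
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def estimate_gib(url_count: int, avg_kib_per_url: float = 100.0) -> float:
    """Estimate total download size in GiB.

    The 100 KiB average is an assumption, not a measured figure.
    """
    return url_count * avg_kib_per_url / (1024 * 1024)


if __name__ == "__main__":
    # Sample lines in the shape quoted in the log (filename:url).
    sample = (
        "/screenshots [375444977033150494].txt:http://puu.sh/I3tvo/b7bafeac63.jpg\n"
        "ltx-2023 [758027280420307005].txt:http://lttstore.com\n"
        "ltx-2023 [758027280420307005].txt:http://lttstore.com\n"
    )
    for url in extract_urls(sample):
        print(url)
    print(f"{estimate_gib(1_000_000):.1f} GiB for 1M URLs")
```

A pattern this loose will also swallow trailing punctuation such as a closing parenthesis or comma, so a list built this way may need a cleanup pass before it is queued.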