00:27:25 Hi, I'm looking to upload a WARC crawl of a small site I did to the Internet Archive, and I came across the FAQ at https://wiki.archiveteam.org/index.php/Frequently_Asked_Questions#halp_pls_halp
00:27:44 My upload form currently looks like this: https://files.elomatreb.eu/f/c72afcd85fd7bd8a5026428d596288d7.png - is this fine?
00:47:01 Someone will probably suggest adding additional metadata of some sort or another, but 1) you probably have the minimal info to upload and 2) you can add metadata after your warc gets uploaded
00:47:14 So go ahead, imo
01:05:54 Iki1: What sort of additional metadata do you suggest?
01:06:05 Also, thanks!
01:49:19 Yahoo!知恵袋 seems to be still open
01:49:52 Or would a better channel for this be the yahoo answers one?
01:50:09 Yeah, I'm moving to #noanswers.
02:46:32 youtube's rss feeds (eg https://www.youtube.com/feeds/videos.xml?channel_id=UCrTNhL_yO3tPTdQ5XgmmWjA) all seem to be 404ing for me, even though they're still linked in the page source. anyone else?
02:48:42 seems to be broken.
02:49:17 an ill omen
08:30:39 hi, how can I find a specific group in the yahoo groups archive? there's a bunch of different files in the collection
08:32:30 roxfan: we're still organizing that data; of you tell us in #yahoosucks which group it is, someone should be able to help you find it
08:32:34 *if
08:33:30 thx
12:54:00 as far as sci-hub/libgen (see post in #archiveteam) most of it is available via bittorrent, https://phillm.net/libgen-stats-table-raw.php
13:25:39 I guess they are talking about https://torrentfreak.com/fbi-has-gained-access-to-sci-hub-founders-apple-account-email-claims-210513/
13:26:52 also note the actual legal request there was dated Feb 2019, just Apple was unable to reveal it until just now
20:37:47 https://aaa.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.com/ from https://news.ycombinator.com/item?id=27156106
22:04:57 JAA: do you have any idea how rate-limiting twitter is currently? I've just added in the next two twitter lists (part 3 and 4 of 17), and am wondering if I can up the concurrency / reduce the delay on the job for part 4 since it's on just twitter.com URLs now
22:06:56 betamax: I haven't seen issues with it, but it's been a while since a job ran faster than default settings because it's usually mixed with outlinks that often can't be run as quickly. In the past, there were no rate limiting issues in twitter.com at all.
22:08:06 I'll give it a try and see how it goes.
22:10:23 I've set it to 9 workers and [0 200] delay. If you (or others) think that's excessive, feel free to reduce. (Whether or not I use similar settings for later parts of the list will depend upon if the parts are running on separate pipelines)
22:11:51 So that's actually 6 with 0-200 because there's a hard limit of 6 connections per host. But yeah, we'll see. :-)
22:59:10 betamax: By the way, 70 % done of the websites.