-
elomatreb: Hi, I'm looking to upload a WARC crawl of a small site I did to the Internet Archive, and I came across the FAQ at wiki.archiveteam.org/index.php/Freq…ently_Asked_Questions#halp_pls_halp
-
elomatreb: My upload form currently looks like this: files.elomatreb.eu/f/c72afcd85fd7bd8a5026428d596288d7.png - is this fine?
-
Iki1: Someone will probably suggest adding additional metadata of some sort or another, but 1) you probably have the minimal info to upload and 2) you can add metadata after your warc gets uploaded
-
Iki1: So go ahead, imo
-
elomatreb: Iki1: What sort of additional metadata do you suggest?
-
elomatreb: Also, thanks!
-
TheTechRobo: Yahoo!知恵袋 seems to be still open
-
TheTechRobo: Or would a better channel for this be the yahoo answers one?
-
TheTechRobo: Yeah, I'm moving to #noanswers.
-
thuban: youtube's rss feeds (e.g. youtube.com/feeds/videos.xml?channel_id=UCrTNhL_yO3tPTdQ5XgmmWjA) all seem to be 404ing for me, even though they're still linked in the page source. anyone else?
-
Jake: seems to be broken.
-
thuban: an ill omen
-
roxfan: hi, how can I find a specific group in the yahoo groups archive? there's a bunch of different files in the collection
-
thuban: roxfan: we're still organizing that data; if you tell us in #yahoosucks which group it is, someone should be able to help you find it
-
roxfan: thx
-
yano: as far as sci-hub/libgen goes (see post in #archiveteam), most of it is available via bittorrent: phillm.net/libgen-stats-table-raw.php
-
VerifiedJ: I guess they are talking about torrentfreak.com/fbi-has-gained-acc…s-apple-account-email-claims-210513
-
russss: also note the actual legal request there was dated Feb 2019; Apple just wasn't able to reveal it until now
-
marked
-
betamax: JAA: do you have any idea how strict twitter's rate limiting is currently? I've just added the next two twitter lists (parts 3 and 4 of 17), and am wondering if I can up the concurrency / reduce the delay on the job for part 4, since it's on just twitter.com URLs now
-
JAA: betamax: I haven't seen issues with it, but it's been a while since a job ran faster than default settings because it's usually mixed with outlinks that often can't be run as quickly. In the past, there were no rate limiting issues on twitter.com at all.
-
betamax: I'll give it a try and see how it goes.
-
betamax: I've set it to 9 workers and [0 200] delay. If you (or others) think that's excessive, feel free to reduce. (Whether or not I use similar settings for later parts of the list will depend on whether the parts are running on separate pipelines)
-
JAA: So that's actually 6 with 0-200 because there's a hard limit of 6 connections per host. But yeah, we'll see. :-)
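
(Editor's sketch of the arithmetic in the exchange above, not the crawler's actual code. It assumes the job targets a single host, that the [0 200] delay is a uniform random wait in milliseconds, and that the cap of 6 is the per-host figure quoted in the chat. The function names are hypothetical.)

```python
import random

# Assumption: the per-host cap is the figure quoted in the chat.
PER_HOST_CONNECTION_LIMIT = 6


def effective_workers(requested_workers, per_host_limit=PER_HOST_CONNECTION_LIMIT):
    """Workers that can be busy at once when every URL is on one host."""
    return min(requested_workers, per_host_limit)


def sample_delay_ms(delay_range, rng=random):
    """Draw one inter-request delay, assuming a uniform range in milliseconds."""
    low, high = delay_range
    return rng.uniform(low, high)


def mean_delay_ms(delay_range):
    """Average wait implied by a [low, high] delay range."""
    low, high = delay_range
    return (low + high) / 2


workers = effective_workers(9)          # 9 requested, capped by the per-host limit
average = mean_delay_ms((0, 200))       # midpoint of the [0 200] range
print(workers, average)                 # prints: 6 100.0
```

Under these assumptions, asking for 9 workers on a twitter.com-only job behaves like asking for 6, each waiting about 100 ms on average between requests.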
-
JAA: betamax: By the way, the websites are 70 % done.