00:00:53 https://old.reddit.com/r/povertyfinancecanada/comments/11ytjf4/rogers_data_overage_bill_will_make_you_homeless/ continues to this day
00:39:10 oof
01:48:47 some epic crawl, journalmetro.com (d9mh44xbsx92ie1iwf88mk2pn) - i didn't expect it to be so big (and still growing)
01:49:13 things seem to be running smoothly though, and might finish in time to keep that thing in IA before the damn thing falls apart like everything else
01:50:40 (immature giggles from the back row)
01:52:07 They didn't say anything about how long the site would stay up, did they?
01:54:42 i haven't followed closely
03:22:18 Heya folks, besides the default Warrior project selection, any other Warrior projects that might need attention?
03:23:26 atm telegram and reddit are the 2 with items but they are right now clogged by targets. if you wanna try your luck at Zowa then we could test if its a ban or if the items that its trying to push out are indeed bad at this point Ryz
03:31:50 yeah everything seems stalled atm
03:36:13 Was pondering on Imgur but hmm o.o;
03:36:33 Wouldn't mind running more of the bruteforcer if it needs attention
03:37:55 I have 135 million IDs from the bruteforcer that I still didn't submit into the queue and probably never will
03:38:00 imgur got too large
03:38:54 Oof, too much data? :c
03:42:00 we archived 654TB
03:43:15 The problem is the data size. We already went well past the initial estimate we gave IA.
03:43:17 we're at 650TiB
03:43:18 Yes, which is more than double what we told IA.
03:44:35 ...Oo;
03:44:39 Aaaaah <#>;
03:44:48 I feel like the best option going forward that we have is keeping this running continuously MediaFire-style so that we can queue lists of images collected from other crawls
03:44:49 But I don't see archiving all of Imgur happening anytime soon. Well, not until they're shutting down or doing a severe policy change like deleting images after X days or whatever.
03:46:28 Hmm, would it be best to just run the bruteforcer on the remote chance that Imgur may actually shut down or the severe policy change? Collect more of the stuff
03:46:44 → #imgone
04:23:28 ok I have to ask. What the fuck is actually connecting to the rsync servers if nobody is actually seeming to connect. If we are all complaining what the fuck is the clog?
04:27:32 flashfire42: if you get -1 it means disks are full
04:27:46 and the server is set to maximum 0 concurrent connections and nobody is connecting
04:28:21 So the bottleneck is moving that data to temp storage? or did we already fill that
04:29:26 Yes, that is the bottleneck. No, it isn't full, but its capacity is reduced compared to at the beginning.
04:30:19 I have seen continuous rsync errors for hours so it looks stuck full rather than "slow to free up"
04:30:46 unless it's so slow that the hysteresis is making it look stuck
04:47:52 JAA: should we pause (or greatly rate-limit) projects while targets are full?
04:48:37 especially telegram where people would get reclaims of items that took too long *because* they're stuck uploading
05:07:12 poor optane9 rewby
14:28:15 So someone mentioned archiving Doomworld yesterday. Since it's Invision and I only had to replace three lines in my Canucks forums script, I gave it a quick try. Turns out that site is very broken. Quite a lot of topics return 500s: https://www.doomworld.com/forum/topic/721-x/
14:30:14 There hasn't been any official announcement in five years, and the sole admin I could see is rarely active. So it could use an archival.
14:44:30 Cisco appears to have bought Splunk
14:44:53 https://www.splunk.com/en_us/blog/leadership/splunk-and-cisco-unite-to-accelerate-digital-resilience-as-one-of-the-leading-global-software-companies.html
14:53:51 that_lurker: im not sure they could make it any more expensive but i’m sure they’re going to try
14:54:48 "Somebody: Splunk has exorbitant prices and locked-in enterprise customers!
14:54:48 Cisco: Oh these guys are just like us. Better buy them up. We know this business."
14:55:28 That and many more fun takes are on the HN https://news.ycombinator.com/item?id=37596497
14:56:11 :3
15:05:36 Would maybe be a good idea to grab the splunk documentation site https://docs.splunk.com/Documentation
15:14:29 https://www.fanforum.com/ is anything from this site archived? there seems to be a LOT of older content there
15:14:35 https://archive.org/search?query=originalurl%3A%28*www.fanforum.com*%29 nothing on archive
15:14:38 at least, not as a warc
15:14:46 im sure it's in web.archive
15:15:30 rktk: no results in the AB job viewer https://archive.fart.website/archivebot/viewer/?q=fanforum.com
15:15:43 possibly worthy as a new project?
15:15:56 I notice the site loads verrrrry slow. takes a while depending on how old the INDEX of a subforum is
15:16:04 talking minutes, not seconds
15:16:32 seems like it would be impossible to archive - would kill the site?
15:16:34 That IA search is only really useful for wiki dumps.
15:16:45 oh, the front page eventually loaded
15:16:47 Most other items don't have an 'originalurl' metadata field.
15:19:09 pabs exactly what I'm talking about
15:19:15 Jaa ah only for wiki sites
15:19:24 didn't know that was a metadata entry
15:19:46 pabs just trying to sus out thoughts on this but yeah it seems the site is basically in some kind of maintenance mode or like, skeleton life support...
15:19:54 and yet people still actively post
15:21:43 Oh wow, that's huge.
15:21:54 Threads: 495,274 | Posts: 107,495,875 | Members: 406,871 | Currently Active Users: 2969 (36 members and 2933 guests)
15:21:55
15:22:11 vBulletin
15:22:37 probably too big for ArchiveBot?
15:22:38 Topic IDs are around 63 million, so enumerating that is out of question.
15:22:56 https://github.com/EFForg/apkeep <= I thought this could be helpful for the archive team if people want to download apk to archive them
15:23:30 holy crap
15:24:38 Neat, thanks!
15:26:04 pabs: Yeah, with all the extra links everywhere, probably too big. And also, the slow responses would run into the timeout all the time I bet.
15:28:04 2933 guests, hmm I wonder if they are getting hit hard by spidering
16:03:11 Oh: https://www.fanforum.com/f443/possible-board-closure-discussion-please-read-respond-63272309/
16:03:45 Just that individual subforum I think, but yeah.
16:03:55 > Fan Forum requires that we average at least 12 posts per day, with lower numbers than that leading to warnings and then possible closure of the board.
16:06:28 huh.
16:08:07 interesting
17:59:03 that_lurker: I saw several people wondering if the $28B Cisco paid Splunk was to acquire them or just renewing their license for the year