01:42:03 JAA: can we get a filter on www.code-brew.com/.+\?var=[0-9]+ ? It seems to include URLs with a random number for cache bypassing, for example: https://transfer.archivete.am/z5fVi/2024-03-25_01-40-26.txt (seeing 25/s on my end)
01:42:03 inline (for browser viewing): https://transfer.archivete.am/inline/z5fVi/2024-03-25_01-40-26.txt
01:42:37 same for portmillennium.pl, but with ?d=[0-9]+ https://transfer.archivete.am/10T1jO/2024-03-25_01-42-32.txt
01:42:38 inline (for browser viewing): https://transfer.archivete.am/inline/10T1jO/2024-03-25_01-42-32.txt
01:43:00 portmillennium has 20 req/s on my end
01:44:35 ^ those seem to be present on HTML pages: for example, http://portmillennium.pl/?d=1978618008 queued http://portmillennium.pl/img/noclegi/slidery/27_serdecznie-zapraszamy_8.jpg?d=1265490834 and Queuing URL http://portmillennium.pl/?d=2094395448
01:47:02 https://www.code-brew.com/best-marketplace-builder-usa/?var=928496706 -> Queuing URL https://www.code-brew.com/best-marketplace-builder-usa/?var=1010145979
01:48:09 guessing there are a few million of those in backfeed by now since it's been blowing up
01:54:08 i take back the cache bypassing bit btw
01:56:48 oh?
01:59:41 no clue why you’d want to do this aside from spam
02:09:41 imer: Ew. In my experience, it's usually done to 'fix' caching. By people who don't know how to configure a web server correctly, naturally.
02:12:21 Good pickup
02:12:25 Lots of ew
02:13:00 ^https?://www%.code-brew%.com/.*%?var=[0-9]+ and ^https?://portmillennium%.pl/.*%?d=[0-9]+ added.
02:13:20 I noticed portmillennium but figured it was just a large site
02:13:25 Was hitting it really hard >.<
02:13:27 Lolz
02:13:30 Thanks JAA
02:14:35 Oh
02:15:19 I will never get Lua patterns right on the first try.
02:15:25 code%-brew, not code-brew
02:15:48 was gonna say the portthing is gone but the other one is still there
02:15:51 thanks!
02:16:30 that seems to have worked
02:16:34 do they rewrite to the base URL?
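The 02:15:25 correction comes down to `-` being a quantifier in Lua patterns (zero or more repetitions, shortest match), so `code-brew` means "cod", then as few `e` as possible, then "brew", and it never matches a literal hyphen. Python's `re` does not understand Lua patterns, so this minimal sketch uses hand-translated regex equivalents (assumed mapping: Lua `%.` to `\.`, `%?` to `\?`, `%-` to `-`, and Lua `e-` to regex `e*?`). The `matches` helper is hypothetical and is not the project's actual filter code.

```python
import re

# Hand-translated regex equivalents of the Lua patterns from the log (assumption:
# Lua "%." -> "\.", "%?" -> "\?", "%-" -> "-", and Lua "x-" -> regex "x*?").
BROKEN = r"^https?://www\.code*?brew\.com/.*\?var=[0-9]+"  # from Lua "code-brew"
FIXED = r"^https?://www\.code-brew\.com/.*\?var=[0-9]+"    # from Lua "code%-brew"


def matches(pattern: str, url: str) -> bool:
    """Hypothetical helper: True if the anchored pattern matches the URL."""
    return re.search(pattern, url) is not None


url = "https://www.code-brew.com/best-marketplace-builder-usa/?var=928496706"
print(matches(BROKEN, url))  # False: "-" acted as a lazy quantifier on "e"
print(matches(FIXED, url))   # True: "%-" is an escaped, literal hyphen
# The broken pattern would instead match a hyphen-less host name:
print(matches(BROKEN, "https://www.codebrew.com/x?var=1"))  # True
```

This is why the code-brew filter only started working after the escape was added, while the portmillennium pattern (no hyphen) worked on the first try.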
02:16:50 todo should start diving now
02:16:52 hopefully
02:17:01 No, they get filtered.
02:17:33 Yeah but they are dropped as processed
02:18:06 What I meant to say is that todo will stop growing now and start decreasing instead
02:18:46 Looking much cleaner now
02:18:53 the base URL without the query is queued to backfeed by the grab code, though
02:19:01 And I was replying to immibis. :-)
02:19:05 at least for assets
02:19:29 imer: Ah right, so it won't be a concern there except for replayability maybe. Oh well.
02:19:59 Ohhh ooops my bad lol!
02:20:02 think we grabbed enough copies that one of them ought to work xD
02:20:36 Well yeah, the data's there, but if something references one of the params dropped by the filter now, it'll be broken in the WBM.
02:20:52 Not that it matters. Silly sites.
02:22:03 mmmh, 60+% filter rate :)
02:22:24 brrrrr
02:25:03 Safe to say they were getting archived pretty hardcore lol
02:25:42 Don't think I've ever seen the filter rate that high before lol :O
02:35:32 JAA: https://gitspartv.github.io/lua-patterns/ is an amazing tool
02:35:44 It's like regexr/regex101 for Lua patterns
02:35:56 Though it doesn't seem to support actually matching yet
02:36:00 Nice, thanks!
02:36:36 who even invented lua :p
02:36:37 TIL Fengari 'The Lua VM written in JS ES6'
02:46:19 why does lua have its own pattern language? D:
03:46:25 because the developers didn't want to re-implement regex
03:47:02 they do not have the choice operator |; they just match left-to-right pretty blindly. Lua is a pretty small self-contained library.
17:35:02 arkiver: due to run out at the current speed again in about 4 hours, how about unstashing some things into secondary? :)
17:38:08 imer: we could look into that yes :P
17:38:15 i think we have those PDFs stashed away
17:38:18 i can put them in secondary
17:39:32 imer: moving in!
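Since Lua patterns have no `|` (03:47:02), one pattern cannot cover both sites, which fits the two separate patterns added at 02:13:00: a URL is dropped if any one of them matches. A minimal sketch of that "check a list of patterns" approach, again using hand-translated Python regex equivalents (with the code%-brew fix applied) and a hypothetical `is_filtered` helper rather than the project's real filter code:

```python
import re

# Hand-translated regex equivalents of the two Lua patterns from 02:13:00.
# Lua has no "|", so each site gets its own entry instead of one combined pattern.
FILTERS = [
    r"^https?://www\.code-brew\.com/.*\?var=[0-9]+",
    r"^https?://portmillennium\.pl/.*\?d=[0-9]+",
]


def is_filtered(url: str) -> bool:
    """Hypothetical helper: True if any filter pattern matches the URL."""
    return any(re.search(p, url) for p in FILTERS)


for url in [
    "http://portmillennium.pl/?d=2094395448",
    "http://portmillennium.pl/img/noclegi/slidery/27_serdecznie-zapraszamy_8.jpg?d=1265490834",
    "https://www.code-brew.com/best-marketplace-builder-usa/?var=1010145979",
    "http://portmillennium.pl/img/noclegi/slidery/27_serdecznie-zapraszamy_8.jpg",
]:
    print(is_filtered(url), url)
```

The last URL, with the query stripped, is not filtered, which is consistent with the 02:18:53 remark that the base URL still ends up in backfeed.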
17:39:42 thanks
18:56:22 RIP CPUs
18:58:00 i've had enough hardware failure today to last a lifetime
18:58:06 so maybe i'll stay out of that lol
20:51:48 JAA: remember when you said RIP CPUs?
20:51:55 and i said i would stay out of it?
20:52:16 https://usercontent.irccloud-cdn.com/file/hCqGgfGD/image.png
20:53:10 :-P
20:53:26 The load average is only in the double digits. That's fine.
20:55:16 meanwhile, on another planet
20:55:18 https://usercontent.irccloud-cdn.com/file/6YiCGS1m/image.png
20:56:03 That's more like it.
21:03:24 How is that server even still operational lol?
21:11:05 datechnoman: it died
21:11:13 that was its dying breath
21:11:38 Haha well that makes more sense :P
21:11:49 Lost but not forgotten....
21:12:15 OVH had to replace one of the disks earlier because I went full nyany on mataringa and accidentally the RAID array