15:12:02 I find that firefox can display this article, but if I use wget with the same user agent I get 403: https://www.axios.com/2023/07/21/zients-biden-ai-authority
15:12:18 how might they be detecting that it's wget? any workaround short of running a browser?
15:34:58 this loads in Firefox
15:35:08 if I open the developer tools, Network tab, I can "copy as cURL"
15:36:20 which usually works, and then I can start trimming headers until it stops working
15:36:22 oh
15:36:34 cm: they have cloudflare protection, you're screwed
15:38:23 opaque server-side checks, if they get suspicious of you, you get complex javascript which tries to see if it's a legit browser, if *that* fails you get a captcha
15:38:50 so even "running a browser" may not be enough
15:38:51 nicolas17: but with cloudflare protection I would have thought wget would give me a cloudflare page?
15:38:55 rather than 403
15:39:14 I'm getting a cloudflare page
15:39:23 Just a moment...
15:39:24
15:39:26 Enable JavaScript and cookies to continue
15:39:27
15:39:40 what is your wget command?
15:39:54 * nicolas17 starts trimming headers
15:39:56 just plain wget with no special user agent?
15:41:00 ah seems the cloudflare challenge page is a 403
15:41:27 use wget --content-on-error if you really want it :P
15:41:32 doubting this would work for your usecase but what about flaresolverr? any way to integrate wget with that?
15:41:38 ah I see
15:41:49 --content-on-error is how you got the "Just a moment..." page?
15:42:26 well I was using curl instead of wget, which basically behaves like --content-on-error by default
15:42:33 ah ok
15:42:36 but yes, with wget --content-on-error I get the "Just a moment" page saved into a file
15:45:10 you sure this is cloudflare? doesn't mention them on the page
15:46:26 __cf in many places
15:46:28 :)
15:46:57 also headers:
15:46:58 < cf-mitigated: challenge
15:47:00 < server: cloudflare
15:47:01 < cf-ray: 7ea492992c1d08ed-EZE
15:49:52 ahh cloudflare. both a protector and the bane of our existence :p
15:52:21 cool cool
15:52:33 thanks this has been really helpful
15:52:47 that copy-as-curl trick is good to know
23:18:32 JustAnotherArchivist edited Current Projects (-89, Shuffle recruiting section, add donation link): https://wiki.archiveteam.org/?diff=50274&oldid=50254
23:20:33 JustAnotherArchivist edited Frequently Asked Questions (+12, Add donation page link): https://wiki.archiveteam.org/?diff=50275&oldid=49477
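For anyone scripting around this, here is a rough sketch of how the check discussed in the log could be automated: given a status code and response headers, decide whether the response looks like a Cloudflare challenge rather than an ordinary 403. The function name `is_cloudflare_challenge` is made up for illustration, and the rule it applies (status 403 plus `server: cloudflare` plus `cf-mitigated: challenge`) is only what was observed in this one conversation, not a documented guarantee of how Cloudflare always responds.

```python
from typing import Mapping


def is_cloudflare_challenge(status: int, headers: Mapping[str, str]) -> bool:
    """Guess whether a response is a Cloudflare challenge page.

    Assumption: the rule is based only on the response seen in the log
    (HTTP 403 with `server: cloudflare` and `cf-mitigated: challenge`).
    """
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value.strip().lower() for name, value in headers.items()}
    served_by_cloudflare = lowered.get("server") == "cloudflare"
    challenged = lowered.get("cf-mitigated") == "challenge"
    return status == 403 and served_by_cloudflare and challenged


# Headers quoted in the log.
observed = {
    "cf-mitigated": "challenge",
    "server": "cloudflare",
    "cf-ray": "7ea492992c1d08ed-EZE",
}

print(is_cloudflare_challenge(403, observed))            # challenge response from the log
print(is_cloudflare_challenge(200, {"Server": "nginx"}))  # ordinary response from another server
```

The function takes plain values rather than making a request itself, so it can be fed from whatever client is in use (for example headers saved from curl or wget output).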