-
cmI find that Firefox can display this article, but if I use wget with the same user agent I get a 403: axios.com/2023/07/21/zients-biden-ai-authority
-
cmhow might they be detecting that it's wget? any workaround short of running a browser?
-
nicolas17this loads in Firefox
-
nicolas17if I open the developer tools, Network tab, I can "copy as cURL"
-
nicolas17which usually works, and then I can start trimming headers until it stops working
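The header-trimming step described above can be sketched roughly as follows. This is only an illustration: `trim_headers` and `still_works` are hypothetical names, and in practice `still_works` would replay the copied request with the given headers and check the response. Here it is a stand-in predicate so the sketch runs offline.

```python
from typing import Callable, Dict


def trim_headers(
    headers: Dict[str, str],
    still_works: Callable[[Dict[str, str]], bool],
) -> Dict[str, str]:
    """Greedily drop headers one at a time, keeping each removal
    only if the request still works without that header."""
    if not still_works(headers):
        raise ValueError("the full header set does not work to begin with")
    kept = dict(headers)
    for name in list(headers):
        trial = {k: v for k, v in kept.items() if k != name}
        if still_works(trial):
            kept = trial  # this header was not needed
    return kept


# Stand-in predicate (an assumption for the demo): pretend the server
# only cares about User-Agent and Cookie.
def fake_still_works(h: Dict[str, str]) -> bool:
    return "User-Agent" in h and "Cookie" in h


copied = {
    "Accept": "text/html",
    "User-Agent": "Mozilla/5.0",
    "Accept-Language": "en-US",
    "Cookie": "session=abc",
}
needed = trim_headers(copied, fake_still_works)
print(sorted(needed))
```

A single greedy pass like this is cheap but is not guaranteed to find the smallest set if headers interact with each other, and it cannot help when the server's decision does not depend on headers alone.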
-
nicolas17oh
-
nicolas17cm: they have cloudflare protection, you're screwed
-
nicolas17opaque server-side checks: if they get suspicious of you, you get complex JavaScript that tries to tell whether it's a legit browser, and if *that* fails you get a captcha
-
nicolas17so even "running a browser" may not be enough
-
cmnicolas17: but with cloudflare protection I would have thought wget would give me a cloudflare page?
-
cmrather than 403
-
nicolas17I'm getting a cloudflare page
-
nicolas17<title>Just a moment...</title>
-
nicolas17<span id="challenge-error-text">
-
nicolas17Enable JavaScript and cookies to continue
-
nicolas17</span>
-
cmwhat is your wget command?
-
» nicolas17 starts trimming headers
-
cmjust plain wget with no special user agent?
-
nicolas17ah seems the cloudflare challenge page is a 403
-
nicolas17use wget --content-on-error if you really want it :P
-
dumbgoydoubting this would work for your usecase but what about flaresolverr? any way to integrate wget with that?
-
cmah I see
-
cm--content-on-error is how you got the "Just a moment..." page?
-
nicolas17well I was using curl instead of wget; curl basically behaves like --content-on-error by default
-
cmah ok
-
nicolas17but yes, with wget --content-on-error I get the "Just a moment" page saved into a file
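For comparison, here is a minimal Python sketch of the same idea: keep the response body even when the status is an error such as 403. `fetch_keep_error_body` is a hypothetical helper name; it relies only on the standard library, where `urllib.error.HTTPError` doubles as a readable response object.

```python
import urllib.error
import urllib.request
from typing import Dict, Optional, Tuple


def fetch_keep_error_body(
    url: str, headers: Optional[Dict[str, str]] = None
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body), including for 4xx/5xx responses."""
    req = urllib.request.Request(url, headers=headers or {})
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status, dict(resp.headers.items()), resp.read()
    except urllib.error.HTTPError as err:
        # urllib raises on error statuses, but the exception still
        # carries the status code, headers, and body of the response.
        try:
            return err.code, dict(err.headers.items()), err.read()
        finally:
            err.close()


# Example call (not executed here):
#   status, hdrs, body = fetch_keep_error_body("https://example.com/")
```

This only saves the challenge page for inspection; it does nothing to get past the challenge.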
-
cmyou sure this is cloudflare? doesn't mention them on the page
-
nicolas17__cf shows up in many places in the page
-
nicolas17:)
-
nicolas17also headers:
-
nicolas17< cf-mitigated: challenge
-
nicolas17< server: cloudflare
-
nicolas17< cf-ray: 7ea492992c1d08ed-EZE
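The header check above can be sketched as a small classifier. The function names are hypothetical, and the logic is a heuristic based only on the three headers quoted in this log: `cf-mitigated: challenge` is treated as the challenge signal, while `server: cloudflare` or a `cf-ray` header merely suggests the site sits behind Cloudflare.

```python
from typing import Dict, Mapping


def _normalize(headers: Mapping[str, str]) -> Dict[str, str]:
    # Header names are case-insensitive, so compare in lowercase.
    return {k.lower(): v.strip() for k, v in headers.items()}


def is_behind_cloudflare(headers: Mapping[str, str]) -> bool:
    """Heuristic: the response passed through Cloudflare."""
    h = _normalize(headers)
    return h.get("server", "").lower() == "cloudflare" or "cf-ray" in h


def is_cloudflare_challenge(headers: Mapping[str, str]) -> bool:
    """Heuristic: the response is a challenge page, not the real content."""
    h = _normalize(headers)
    return h.get("cf-mitigated", "").lower() == "challenge"


# Headers as quoted in the log above.
seen = {
    "cf-mitigated": "challenge",
    "server": "cloudflare",
    "cf-ray": "7ea492992c1d08ed-EZE",
}
print(is_behind_cloudflare(seen), is_cloudflare_challenge(seen))
```

A plain 403 with `server: cloudflare` but no `cf-mitigated` header could just as well be the origin's own 403 passed through, which is why the two checks are kept separate.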
-
fireonliveahh cloudflare. both a protector and the bane of our existence :p
-
cmcool cool
-
cmthanks this has been really helpful
-
cmthat copy-as-curl trick is good to know
-
h2ibotJustAnotherArchivist edited Current Projects (-89, Shuffle recruiting section, add donation link): wiki.archiveteam.org/?diff=50274&oldid=50254
-
h2ibotJustAnotherArchivist edited Frequently Asked Questions (+12, Add donation page link): wiki.archiveteam.org/?diff=50275&oldid=49477