You could try using curl_cffi in Python with the impersonate option enabled. It often works because it mimics a real browser. In my case, I use Rust with a library called wreq, which does something similar. These tools usually get around that issue unless the site relies heavily on JavaScript or more advanced browser behavior.
That's so nice of you to help. But can you help me understand the technicalities behind it? Why does it work in the browser but fail in Postman? Any reason behind it? Thanks again.
It’s most likely that the server is checking whether the request comes from a real browser. That’s why I mentioned those options earlier. Tools like Postman or simple fetch calls don’t fully mimic a browser environment, so the server might block or delay the response.
Thanks. But if you have any technical details to provide, please do. Otherwise, thanks a lot.
Sure! From a technical perspective, websites often use mechanisms like bot detection, fingerprinting, or JavaScript challenges to verify that a request is coming from a real browser.
That’s why tools like curl_cffi with impersonation or Rust libraries like wreq (which I’m using) are helpful: they try to mimic real browser behavior more accurately.
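For reference, here is a minimal sketch of what the curl_cffi suggestion could look like. It assumes `pip install curl_cffi`; the helper names `impersonation_options` and `fetch` are just illustrative, and the `"chrome"` target depends on which impersonation targets your installed version ships with.

```python
def impersonation_options(target="chrome", timeout=30.0):
    """Keyword arguments to pass to curl_cffi's requests.get().

    `target` names the browser profile to imitate. Which names are
    accepted depends on the installed curl_cffi version (assumption).
    """
    return {"impersonate": target, "timeout": timeout}


def fetch(url, target="chrome"):
    """Fetch `url` with a browser-like TLS and HTTP fingerprint."""
    # Imported lazily so the helper above works without curl_cffi installed.
    from curl_cffi import requests

    response = requests.get(url, **impersonation_options(target))
    return response.status_code, response.text


# Usage (needs network access; the URL is a placeholder):
#   status, body = fetch("https://example.com/")
```

If the same URL fails with a plain HTTP client but succeeds here, that points toward fingerprint checks rather than a JavaScript challenge.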
hmm. Have you created an actor on apify?
I’m creating my first actor in Rust, but I already have experience with web scraping for custom projects.
cool
So basically… there are many types of fingerprinting the server might be using. Some of the most common include checking the User-Agent and the exact order of HTTP headers, as bots often send them differently than real browsers. The server might also use Canvas and WebGL fingerprinting to see how your graphics card renders visual content, or inspect your timezone, language, and operating system to verify consistency with typical browser environments. Font and plugin detection is another method, since real browsers expose this information while bots usually don’t. Additionally, details like touch support, screen size, and device memory all help build a unique fingerprint to distinguish bots from real users.
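To make the header part concrete, here is a rough sketch of the kind of check a server could run on header order and presence. It is only an assumption about how such a check might look: `EXPECTED_ORDER` is a made-up reference order, not a capture from a real browser, and real bot-detection systems are far more involved.

```python
# Hypothetical reference order, for illustration only.
EXPECTED_ORDER = ["host", "user-agent", "accept", "accept-language", "accept-encoding"]


def order_matches(received_headers, expected=EXPECTED_ORDER):
    """True if the expected headers that are present keep their relative order."""
    names = [name.lower() for name, _ in received_headers]
    positions = [names.index(h) for h in expected if h in names]
    return positions == sorted(positions)


def looks_like_browser(received_headers):
    """Very crude heuristic: core headers present and in the expected order."""
    names = {name.lower() for name, _ in received_headers}
    has_core = {"user-agent", "accept", "accept-language"} <= names
    return has_core and order_matches(received_headers)


browser_like = [
    ("Host", "example.com"),
    ("User-Agent", "Mozilla/5.0"),
    ("Accept", "text/html"),
    ("Accept-Language", "en-US"),
    ("Accept-Encoding", "gzip"),
]
script_like = [
    ("Accept-Encoding", "gzip"),
    ("Accept", "*/*"),
    ("User-Agent", "python-requests/2.31"),
    ("Host", "example.com"),
]
print(looks_like_browser(browser_like), looks_like_browser(script_like))
```

Canvas, WebGL, font, and screen checks work differently: they need JavaScript running in the page, so they only come into play once a real or headless browser is involved.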
So, these tools try to fool the server by spoofing fingerprints to make them look like real browsers. Some of the ones I know are:
https://github.com/apify/impit,
https://github.com/lwthiker/curl-impersonate,
https://github.com/lexiforest/curl_cffi,
and https://github.com/0x676e67/wreq, among others.
I tried a stealth package as well as Camoufox.
Look, this site for example shows you your browser’s fingerprint: https://amiunique.org/fingerprint
Yeah, that one I have tried. I have managed to scrape using a headless browser with a proxy. But hidden still didn't respond. Your help is much appreciated. But if you are free, can you tell me how I can learn all this advanced stuff? I can't find a clear path. And what's your experience? No worries if you can't reply.
Take a look at this video: https://www.youtube.com/watch?v=ji8F8ppY8bs
John Watson Rooney (YouTube): "This is How I Scrape 99% of Sites"
Let me know if it works for you. That video should be enough.
That's most likely TLS fingerprinting, which happens during the TLS handshake (cipher suites, extensions, the HTTP version offered via ALPN, etc.), assuming you're already replicating the same working HTTP headers.
Your best bet, as @aciku mentioned, is to experiment with the browser impersonation libraries available in your preferred programming language. Even then, some websites might still block you because they require solving JS challenges (probably to set tokens/cookies that are short-lived, rate-limited, and hard to reproduce or reuse), which raw HTTP clients cannot execute.
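As an illustration of how a handshake becomes a fingerprint, here is a small sketch of the JA3 scheme, one well-known way to hash ClientHello fields. Whether this particular site uses JA3 or something else is unknown, and the numeric values in the usage lines are arbitrary examples, not a real browser's handshake.

```python
import hashlib

# GREASE values (RFC 8701) are placeholder IDs some clients insert; JA3 skips them.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 text: five comma-separated fields, values joined by '-'."""
    def join(values):
        return "-".join(str(v) for v in values if v not in GREASE)

    return ",".join(
        [str(version), join(ciphers), join(extensions), join(curves), join(point_formats)]
    )


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """MD5 hex digest of the JA3 string, which is what servers typically compare."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()


# Same cipher suites in a different order give a different fingerprint.
client_a = ja3_hash(771, [4865, 4866], [0, 23], [29, 23], [0])
client_b = ja3_hash(771, [4866, 4865], [0, 23], [29, 23], [0])
print(client_a != client_b)
```

This is why matching headers alone is not enough: an HTTP library's TLS stack offers its own cipher and extension list, so its fingerprint differs from a browser's even when the headers are identical.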
@azzouzana @aciku Thanks for the help man. Much appreciated