aciku•2mo ago

You could try using curl_cffi in Python

You could try using curl_cffi in Python with the impersonate option enabled. It often works because it makes the request look like it comes from a real browser. In my case, I use Rust with a library called wreq, which does something similar. These tools usually get past that issue unless the site relies heavily on JavaScript or more advanced browser behavior.
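A minimal sketch of that suggestion, assuming curl_cffi is installed (`pip install curl_cffi`) and that the `"chrome"` impersonation target is available in your version. The helper names `build_request_kwargs` and `fetch_as_browser` and the 30-second timeout are made up for illustration; they are not part of the library.

```python
def build_request_kwargs(target="chrome", timeout=30):
    """Keyword arguments handed to curl_cffi's requests.get (hypothetical helper)."""
    return {"impersonate": target, "timeout": timeout}


def fetch_as_browser(url, target="chrome"):
    """Fetch url while presenting the chosen browser's TLS/HTTP fingerprint."""
    # Imported lazily so this sketch still loads where curl_cffi is not installed.
    from curl_cffi import requests

    response = requests.get(url, **build_request_kwargs(target))
    return response.status_code, response.text
```

Something like `status, body = fetch_as_browser("https://example.com")` would then be the call to compare against what Postman returns for the same URL.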

18 Replies

thenetaji•2mo ago

That's so nice of you to help. But can you help me understand the technicalities behind it? Why does it work in the browser but fail in Postman? Is there any reason behind it? Thanks again.

acikuOP•2mo ago

Most likely the server is checking whether the request is coming from a real browser or not; that's why I mentioned those options earlier. Tools like Postman or simple fetch calls don't fully mimic a browser environment, so the server might block or delay the response.

thenetaji•2mo ago

Thanks. But if you have any technical details to share, please do. Otherwise, thanks a lot.

acikuOP•2mo ago

Sure! From a technical perspective, websites often use mechanisms like bot detection, fingerprinting, or JavaScript challenges to verify that a request is coming from a real browser. That's why tools like curl_cffi with impersonation or Rust libraries like wreq (which I'm using) are helpful: they try to mimic real browser behavior more accurately.

thenetaji•2mo ago

Hmm. Have you created an Actor on Apify?

acikuOP•2mo ago

I’m creating my first actor in Rust, but I already have experience with web scraping for custom projects.

thenetaji•2mo ago

cool

acikuOP•2mo ago

So basically, there are many types of fingerprinting the server might be using. Some of the most common include checking the User-Agent and the exact order of HTTP headers, as bots often send them differently than real browsers. If the page runs JavaScript, the site might also use Canvas and WebGL fingerprinting to see how your graphics card renders visual content, or inspect your timezone, language, and operating system to verify consistency with typical browser environments. Font and plugin detection is another method, since real browsers expose this information while bots usually don't. Additionally, details like touch support, screen size, and device memory all help build a unique fingerprint to distinguish bots from real users.
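To make the header-order point concrete, here is a rough sketch of the kind of check a server could run. The `BROWSER_LIKE_ORDER` list and the function name are assumptions for illustration only; they are not the real order sent by any particular browser, and real detection systems combine many more signals.

```python
# Illustrative order only (an assumption), not the real order of any browser.
BROWSER_LIKE_ORDER = [
    "host",
    "connection",
    "user-agent",
    "accept",
    "accept-encoding",
    "accept-language",
]


def header_order_matches(headers, expected=BROWSER_LIKE_ORDER):
    """Return True if the known headers arrive in the expected relative order.

    headers is a list of (name, value) pairs in the order they were received.
    Headers that are not in the expected list are ignored.
    """
    seen = [name.lower() for name, _ in headers if name.lower() in expected]
    wanted = [name for name in expected if name in seen]
    return seen == wanted


browser_like = [("Host", "example.com"), ("User-Agent", "Mozilla/5.0"), ("Accept", "*/*")]
script_like = [("User-Agent", "Mozilla/5.0"), ("Accept", "*/*"), ("Host", "example.com")]
```

Both requests carry the same User-Agent, yet only `browser_like` passes the check, which is why copying headers into Postman is not always enough.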

acikuOP•2mo ago

So, these tools try to fool the server by spoofing fingerprints to make them look like real browsers. Some of the ones I know are: https://github.com/apify/impit, https://github.com/lwthiker/curl-impersonate, https://github.com/lexiforest/curl_cffi, and https://github.com/0x676e67/wreq, among others.

thenetaji•2mo ago

I tried the stealth package as well as camouflax.

acikuOP•2mo ago

Look, this site for example shows you your browser’s fingerprint: https://amiunique.org/fingerprint

thenetaji•2mo ago

Yeah, I have tried that one. I have managed to scrape using a headless browser with a proxy. But hidden still didn't respond. Your help is much appreciated. If you are free, can you tell me how I can learn all this advanced stuff? I can't find a clear path. And what's your experience? No worries if you can't reply.

acikuOP•2mo ago

Take a look at this video: https://www.youtube.com/watch?v=ji8F8ppY8bs

John Watson Rooney (YouTube): This is How I Scrape 99% of Sites

acikuOP•2mo ago

Let me know if it works for you. That video should be enough.

azzouzana•2mo ago

That's most likely TLS fingerprinting, which basically happens during the TLS handshake (HTTP version, ciphers, etc.), assuming you're already replicating the same working HTTP headers. Your best bet, as mentioned by @aciku, is to experiment with the browser impersonation libraries available in your preferred programming language. But even if you do, some websites might still block you, as they require solving JS challenges (probably setting tokens/cookies that are short-lived, rate-limited, and hard to reproduce or reuse) that raw HTTP clients cannot execute.
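One well-known way to turn the handshake into a fingerprint is a JA3-style hash: the TLS version, cipher suites, extensions, curves, and point formats from the ClientHello are joined into one string and hashed with MD5. The sketch below only shows that computation; the numeric values are illustrative assumptions, not a capture from a real browser, and whether a given site uses JA3 or something else is not known here.

```python
import hashlib


def ja3_style_hash(tls_version, ciphers, extensions, curves, point_formats):
    """Build a JA3-style string from ClientHello fields and return (string, md5 hex)."""
    groups = (ciphers, extensions, curves, point_formats)
    # Values inside a field are joined with "-", the fields themselves with ",".
    fields = [str(tls_version)] + ["-".join(str(v) for v in group) for group in groups]
    ja3_string = ",".join(fields)
    return ja3_string, hashlib.md5(ja3_string.encode("ascii")).hexdigest()


# Two hypothetical clients offering the same ciphers in a different order.
client_a = ja3_style_hash(771, [4865, 4866, 4867], [0, 11, 10], [29, 23], [0])
client_b = ja3_style_hash(771, [4867, 4866, 4865], [0, 11, 10], [29, 23], [0])
```

The two clients end up with different hashes even though their HTTP headers could be identical, which is the gap impersonation libraries try to close by reproducing a browser's handshake.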

thenetaji•2mo ago

@azzouzana @aciku Thanks for the help man. Much appreciated
