noble-gold 3y ago

Waiting for CF bot check

I'm trying to pass Cloudflare's (CF) bot check using Firefox, without much luck. I found the thread about using Firefox to get cookies for Cheerio, but I need to use Firefox all the way. The problem is that the CF bot-check page returns a 403, which Crawlee treats as a failed request. I was able to use errorHandler to wait out the bot check, but now I can't find a way to keep the CF cookies for the session. If I call session.setCookies(...) inside errorHandler, nothing gets stored and the retry uses a new session. I also tried session.markGood(), but that didn't help. Any ideas?
2 Replies
Lukas Krivka 3y ago
You can skip the status check by changing blockedStatusCodes in the SessionPool options (https://crawlee.dev/api/core/interface/SessionPoolOptions#blockedStatusCodes); perhaps just pass an empty array. That should allow you to do all the things you described. Session rotation depends on the SessionPool config, so you can first test the workflow with maxPoolSize: 1 to be sure the session stays the same.
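A rough sketch of what that configuration could look like. The helper name `buildSessionSettings` and the `CrawlerSessionSettings` type are made up for illustration; `blockedStatusCodes` and `maxPoolSize` are the options mentioned above, while `useSessionPool` and `persistCookiesPerSession` are crawler options that are not mentioned in this thread and are included here as an assumption about what you would want alongside them.

```typescript
// Sketch only: a plain object carrying the session-related crawler options.
// `CrawlerSessionSettings` and `buildSessionSettings` are hypothetical names,
// not part of Crawlee.
interface CrawlerSessionSettings {
    useSessionPool: boolean;
    persistCookiesPerSession: boolean;
    sessionPoolOptions: {
        blockedStatusCodes: number[];
        maxPoolSize?: number;
    };
}

function buildSessionSettings(singleSessionForTesting: boolean): CrawlerSessionSettings {
    const settings: CrawlerSessionSettings = {
        // Assumption: keep sessions and their cookies enabled so the CF cookies persist.
        useSessionPool: true,
        persistCookiesPerSession: true,
        sessionPoolOptions: {
            // Empty array: no status code, 403 included, is treated as "blocked".
            blockedStatusCodes: [],
        },
    };
    if (singleSessionForTesting) {
        // One session in the pool, so every request and retry reuses it.
        settings.sessionPoolOptions.maxPoolSize = 1;
    }
    return settings;
}

const settings = buildSessionSettings(true);
console.log(JSON.stringify(settings.sessionPoolOptions));

// Assumed usage with Crawlee and Playwright's Firefox (not executed here):
//
// import { PlaywrightCrawler } from 'crawlee';
// import { firefox } from 'playwright';
//
// const crawler = new PlaywrightCrawler({
//     ...settings,
//     launchContext: { launcher: firefox },
//     requestHandler: async ({ page }) => {
//         // Wait out the bot check here instead of in errorHandler.
//     },
// });
```

With the 403 no longer counted as blocked, the wait for the bot check can presumably move from errorHandler into the normal request handler, where the session is still the one that loaded the page. Once the flow works with a single session, drop the `maxPoolSize: 1` limit.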
noble-gold (OP) 3y ago
Interesting, thanks! I'll try that.