Apify Discord Mirror

Sessions and proxies?

At a glance

The community member is having trouble understanding how sessions and proxies work in their Puppeteer crawler setup. They have configured a crawler with a session pool, persistent cookies, and multiple proxies, but find that only one session (and therefore one proxy) is used across concurrent requests unless they set useIncognitoPages: true.

The comments explain that the session logic is to stick with the same IP (proxy) until an error occurs and to keep cookies for as long as the session is "alive". If the community member wants a random IP per request, they should not use a session pool. Rather than handling cookies manually, it may be easier to set useIncognitoPages: true so that each page gets its own proxy and cookies are handled automatically.

One community member suggests that disabling the session pool (useSessionPool: false) and persistent cookies (persistCookiesPerSession: false) should allow random proxies per request, though they are not sure how the SDK will handle cookies in that case.
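As a rough illustration of that session logic, here is a minimal sketch (not from the thread) of retiring a session on a suspected block so that the session pool hands out a new session, and with it a different proxy, on the next request. The proxy URLs and the blocking check are placeholders.

Plain Text
import { PuppeteerCrawler, ProxyConfiguration } from 'crawlee';

// Placeholder proxies; in the thread this is the user's own proxyConfiguration.
const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: ['http://proxy-1.example.com:8000', 'http://proxy-2.example.com:8000'],
});

const crawler = new PuppeteerCrawler({
    useSessionPool: true,
    persistCookiesPerSession: true,
    proxyConfiguration,
    maxConcurrency: 10,
    requestHandler: async ({ page, session, proxyInfo, log }) => {
        log.info(`session ${session.id} via proxy ${proxyInfo?.url}`);
        // Hypothetical blocking check: retire the session so the pool
        // stops reusing this session/proxy pair for future requests.
        if ((await page.title()).includes('Access Denied')) {
            session.retire();
            throw new Error('Blocked, retiring session');
        }
    },
});

await crawler.run(['https://example.com']);
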

I am having a hard time understanding sessions and proxies. I have the following crawler setup:

Plain Text
const crawler = new PuppeteerCrawler({
    requestList,
    useSessionPool: true,
    persistCookiesPerSession: true,
    proxyConfiguration,
    requestHandler: router,
    requestHandlerTimeoutSecs: 100,
    headless: false,
    minConcurrency: 20,
    maxConcurrency: 30,
    launchContext: {
        launcher: PuppeteerExtra,
        useIncognitoPages: true
    },
})


Basically I want to run the same task concurrently with different proxies. Unless I set useIncognitoPages: true, only one session is used concurrently with one proxy. Is this how it should work? What is the point of having a session pool if only one is used?
5 comments
The session logic is to stick with the same IP (proxy) until an error occurs and to keep cookies for as long as the session is "alive". So if you want a random IP for each request, do not use a session pool; if you need cookies, handle them with your own logic.
So with concurrency, Crawlee uses the same session in parallel if I use a session pool?
Regarding manually handling cookies and so on, it is probably easier to set useIncognitoPages: true. That way each page has its own proxy and everything is handled automatically.
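For the "handle cookies with your own logic" route mentioned a couple of messages up, a minimal sketch (assuming no session pool and a hypothetical in-memory cookie store keyed by hostname) could use Puppeteer's page.setCookie / page.cookies directly:

Plain Text
import { PuppeteerCrawler } from 'crawlee';

// Hypothetical cookie store; key it however your use case requires.
const cookieJar = new Map();

const crawler = new PuppeteerCrawler({
    useSessionPool: false,
    persistCookiesPerSession: false,
    proxyConfiguration, // same proxyConfiguration as in the original setup
    preNavigationHooks: [
        async ({ page, request }) => {
            // Restore previously saved cookies for this host before navigation.
            const saved = cookieJar.get(new URL(request.url).hostname);
            if (saved) await page.setCookie(...saved);
        },
    ],
    requestHandler: async ({ page, request }) => {
        // ... scrape the page ...
        // Save cookies so later requests to the same host can reuse them.
        cookieJar.set(new URL(request.url).hostname, await page.cookies());
    },
});
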
How would I use random proxies without useSessionPool? With the config below, Puppeteer keeps running on the same proxy.
Plain Text
const crawler = new PuppeteerCrawler({
    // useSessionPool: true,
    requestHandler: router,
    maxConcurrency: 2,
    headless: false,
    proxyConfiguration,
    requestList,
})

await crawler.run()


And pages also share cookies.
If you need random access, then the expected way is useSessionPool: false and persistCookiesPerSession: false. Otherwise I'm not sure exactly how it will end up with other session settings plus incognito pages; maybe the SDK will enforce cookies, maybe not, I never actually tried it that way 😉 To check in more detail, you can add some log output based on context.session.
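A minimal sketch of that kind of diagnostic logging (again assuming the same proxyConfiguration as above): with the session pool disabled, session should come back undefined and the proxy reported by proxyInfo can differ between requests.

Plain Text
const crawler = new PuppeteerCrawler({
    useSessionPool: false,
    persistCookiesPerSession: false,
    proxyConfiguration,
    requestHandler: async ({ request, session, proxyInfo, log }) => {
        // With the session pool off, `session` is undefined and requests
        // are not pinned to a single proxy.
        log.info(`${request.url} -> session: ${session?.id ?? 'none'}, proxy: ${proxyInfo?.url}`);
    },
    maxConcurrency: 2,
    headless: false,
});

await crawler.run(['https://example.com/a', 'https://example.com/b']);
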