Apify Discord Mirror

Sessions and proxies?

At a glance

The community member is having trouble understanding how sessions and proxies work in their Puppeteer crawler setup. They have configured a crawler with a session pool, persistent cookies, and multiple proxies, but find that only one session (and therefore one proxy) is used across concurrent requests unless they set useIncognitoPages: true.

The comments explain that the session logic is to stick with the same IP (proxy) until an error occurs and to keep cookies for as long as the session is "alive". If the community member wants a random IP per request, they should not use a session pool. Rather than handling cookies manually, it may be easier to set useIncognitoPages: true so that each page gets its own proxy and cookies are handled automatically.

One community member suggests that disabling the session pool (useSessionPool: false) and persistent cookies (persistCookiesPerSession: false) should allow random proxies per request, though they are not sure how the SDK will handle cookies in that case.
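As a rough illustration of that session logic, here is a minimal sketch (not from the thread) of retiring a session on a suspected block so that the session pool hands out a new session, and with it a different proxy, on the next request. The proxy URLs and the blocking check are placeholders.

Plain Text
import { PuppeteerCrawler, ProxyConfiguration } from 'crawlee';

// Placeholder proxies; in the thread this is the user's own proxyConfiguration.
const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: ['http://proxy-1.example.com:8000', 'http://proxy-2.example.com:8000'],
});

const crawler = new PuppeteerCrawler({
    useSessionPool: true,
    persistCookiesPerSession: true,
    proxyConfiguration,
    maxConcurrency: 10,
    requestHandler: async ({ page, session, proxyInfo, log }) => {
        log.info(`session ${session.id} via proxy ${proxyInfo?.url}`);
        // Hypothetical blocking check: retire the session so the pool
        // stops reusing this session/proxy pair for future requests.
        if ((await page.title()).includes('Access Denied')) {
            session.retire();
            throw new Error('Blocked, retiring session');
        }
    },
});

await crawler.run(['https://example.com']);
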

I am having a hard time understanding sessions and proxies. I have the following crawler setup:

Plain Text
const crawler = new PuppeteerCrawler({
    requestList,
    useSessionPool: true,
    persistCookiesPerSession: true,
    proxyConfiguration,
    requestHandler: router,
    requestHandlerTimeoutSecs: 100,
    headless: false,
    minConcurrency: 20,
    maxConcurrency: 30,
    launchContext: {
        launcher: PuppeteerExtra,
        useIncognitoPages: true
    },
})


Basically I want to run the same task concurrently with different proxies. Unless I set useIncognitoPages: true, only one session is used concurrently with one proxy. Is this how it should work? What is the point of having a session pool if only one is used?
5 comments
The session logic is to stick with the same IP (proxy) until an error occurs and to keep cookies for as long as the session is "alive". So if you want a random IP for each request, do not use a session pool; if you need cookies, handle them with your own logic.
So with concurrency, Crawlee uses the same session in parallel if I use a session pool?
Regarding manually handling cookies and so on, it is probably easier to set useIncognitoPages: true. That way each page has its own proxy and everything is handled automatically.
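For the "handle cookies with your own logic" route mentioned a couple of messages up, a minimal sketch (assuming no session pool and a hypothetical in-memory cookie store keyed by hostname) could use Puppeteer's page.setCookie / page.cookies directly:

Plain Text
import { PuppeteerCrawler } from 'crawlee';

// Hypothetical cookie store; key it however your use case requires.
const cookieJar = new Map();

const crawler = new PuppeteerCrawler({
    useSessionPool: false,
    persistCookiesPerSession: false,
    proxyConfiguration, // same proxyConfiguration as in the original setup
    preNavigationHooks: [
        async ({ page, request }) => {
            // Restore previously saved cookies for this host before navigation.
            const saved = cookieJar.get(new URL(request.url).hostname);
            if (saved) await page.setCookie(...saved);
        },
    ],
    requestHandler: async ({ page, request }) => {
        // ... scrape the page ...
        // Save cookies so later requests to the same host can reuse them.
        cookieJar.set(new URL(request.url).hostname, await page.cookies());
    },
});
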
How would I use random proxies without useSessionPool? With the config below, Puppeteer keeps running on the same proxy.
Plain Text
const crawler = new PuppeteerCrawler({
    // useSessionPool: true,
    requestHandler: router,
    maxConcurrency: 2,
    headless: false,
    proxyConfiguration,
    requestList,
})

await crawler.run()


And pages also share cookies.
If you need random access, then the expected way is useSessionPool: false and persistCookiesPerSession: false. Otherwise I'm not sure exactly how it will end up with other session settings plus incognito pages; maybe the SDK will enforce cookies, maybe not, I never actually tried it that way 😉 To check in more detail, you can add some log output based on context.session.
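A minimal sketch of that kind of diagnostic logging (again assuming the same proxyConfiguration as above): with the session pool disabled, session should come back undefined and the proxy reported by proxyInfo can differ between requests.

Plain Text
const crawler = new PuppeteerCrawler({
    useSessionPool: false,
    persistCookiesPerSession: false,
    proxyConfiguration,
    requestHandler: async ({ request, session, proxyInfo, log }) => {
        // With the session pool off, `session` is undefined and requests
        // are not pinned to a single proxy.
        log.info(`${request.url} -> session: ${session?.id ?? 'none'}, proxy: ${proxyInfo?.url}`);
    },
    maxConcurrency: 2,
    headless: false,
});

await crawler.run(['https://example.com/a', 'https://example.com/b']);
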