Matze, 16mo ago

Bind session and proxy together

Hi, I have a small problem: my sessions and proxies don't stay together, which I expected to be the default behavior.
new PlaywrightCrawler({
    // ...
    useSessionPool: true,
    sessionPoolOptions: {
        blockedStatusCodes: [403],
        sessionOptions: {
            maxErrorScore: 1,
            maxUsageCount: 7,
        },
    },
    // proxyList() returns a list of 250 proxies
    proxyConfiguration: new ProxyConfiguration({ proxyUrls: proxyList() }),
    // ...
});
When I log the session ID and the proxy port in my router, proxyInfo.sessionId does not match session.id.
log.info(session?.id)
log.info(proxyInfo?.port)
log.info(proxyInfo?.sessionId)
Results look like:
INFO PlaywrightCrawler: session_AlZoomLhQU
INFO PlaywrightCrawler: 10209
INFO PlaywrightCrawler: session_Dnha2MhDeX
....
INFO PlaywrightCrawler: session_AlZoomLhQU
INFO PlaywrightCrawler: 10208
INFO PlaywrightCrawler: session_6jOviCJSHt
...
I don't know whether the session can change after the proxy has been assigned:
* https://github.com/apify/crawlee/blob/master/packages/browser-crawler/src/internals/browser-crawler.ts#L504
* https://github.com/apify/crawlee/blob/master/packages/browser-crawler/src/internals/browser-crawler.ts#L534
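One way to make the mismatch easier to spot than by reading raw log lines is to record what each session was first seen with and flag any change. The following is only a minimal sketch: `BindingTracker`, `Observation`, and `Drift` are hypothetical names, not part of Crawlee, and the only assumption about Crawlee is that the handler context exposes `session.id`, `proxyInfo.port`, and `proxyInfo.sessionId` as logged above.

```typescript
// Hypothetical helper, NOT part of Crawlee: it only compares plain values
// that were read from the crawling context (session.id, proxyInfo.port,
// proxyInfo.sessionId) and reports when they stop matching.
interface Observation {
    sessionId: string;            // value of session.id
    proxyPort?: number | string;  // value of proxyInfo.port
    proxySessionId?: string;      // value of proxyInfo.sessionId
}

interface Drift {
    sessionId: string;
    kind: 'session-id-mismatch' | 'proxy-port-changed';
    expected: string;
    actual: string;
}

class BindingTracker {
    // First proxy port observed for each session ID.
    private readonly firstPort = new Map<string, string>();

    observe(obs: Observation): Drift[] {
        const drifts: Drift[] = [];
        const port = String(obs.proxyPort ?? '');
        const proxySessionId = obs.proxySessionId ?? '';

        // The proxy should have been requested for this very session.
        if (proxySessionId !== obs.sessionId) {
            drifts.push({
                sessionId: obs.sessionId,
                kind: 'session-id-mismatch',
                expected: obs.sessionId,
                actual: proxySessionId,
            });
        }

        // The same session should keep the port it was first seen with.
        const known = this.firstPort.get(obs.sessionId);
        if (known === undefined) {
            this.firstPort.set(obs.sessionId, port);
        } else if (known !== port) {
            drifts.push({
                sessionId: obs.sessionId,
                kind: 'proxy-port-changed',
                expected: known,
                actual: port,
            });
        }
        return drifts;
    }
}

// Replaying the two observations from the log output above.
const tracker = new BindingTracker();
const first = tracker.observe({
    sessionId: 'session_AlZoomLhQU',
    proxyPort: 10209,
    proxySessionId: 'session_Dnha2MhDeX',
});
const second = tracker.observe({
    sessionId: 'session_AlZoomLhQU',
    proxyPort: 10208,
    proxySessionId: 'session_6jOviCJSHt',
});
console.log(first.map((d) => d.kind));
console.log(second.map((d) => d.kind));
```

In a request handler, the same tracker could be fed with `session.id` and the `proxyInfo` fields on every request (guarding against an undefined session), and each returned drift passed to the logger as a warning instead of printing three separate info lines.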
4 Replies
Matze (OP), 16mo ago
GitHub
Proxy changes for same session · Issue #2503 · apify/crawlee
Which package is this bug report for? If unsure which one to select, leave blank @crawlee/browser (BrowserCrawler) Issue description According to the documentation the proxies and sessions are boun...
Lukas Krivka, 15mo ago
They should stay together until the session has been discarded.
Matze (OP), 15mo ago
Is there an option to log when a session is created or discarded? The first session is persistent and correct, but the session associated with the proxy is a random new one. As described in the issue, I believe https://github.com/apify/crawlee/blob/master/packages/browser-crawler/src/internals/browser-crawler.ts#L531 is not working correctly, because the proxy is loaded above it, and hence too early: https://github.com/apify/crawlee/blob/master/packages/browser-crawler/src/internals/browser-crawler.ts#L505
INFO PlaywrightCrawler: session_AlZoomLhQU <- Session of window/session object
INFO PlaywrightCrawler: 10209 <- Proxy port
INFO PlaywrightCrawler: session_Dnha2MhDeX <- Session of proxy
....
INFO PlaywrightCrawler: session_AlZoomLhQU <- Persisted & correct session of window/session object
INFO PlaywrightCrawler: 10208 <- Proxy port
INFO PlaywrightCrawler: session_6jOviCJSHt <- Random new session of proxy
Lukas Krivka, 15mo ago
Log | API | Crawlee
The log instance enables level aware logging of messages and we advise to use it instead of console.log() and its aliases in most development scenarios. A very useful use case for log is using log.debug liberally throughout the codebase to get useful logging messages only when appropriate log level is set and keeping the console tidy in p...
