optimistic-gold · 5mo ago

Proxy settings appear to be cached

Hi, I'm trying to use residential proxies with a Playwright crawler, but even when I comment out the proxyConfiguration there still appears to be an attempt to use a proxy. I created a fresh project with a minimal reproduction to debug this, and it worked fine until I hit a proxy failure, after which the same thing happened again. The error is:

WARN PlaywrightCrawler: Reclaiming failed request back to the list or queue. Detected a session error, rotating session... goto: net::ERR_TUNNEL_CONNECTION_FAILED

so clearly it's trying to use a proxy. I verified this by looking at the browser process arguments, which include --proxy-bypass-list=<-loopback> --proxy-server=http://127.0.0.1:63572. Any ideas? It's driving me insane. Code as follows:
import { PlaywrightCrawler } from 'crawlee'

// const proxyConfiguration = new ProxyConfiguration({
//     proxyUrls: [
//         '...'
//     ],
// })

const crawler: PlaywrightCrawler = new PlaywrightCrawler({
    launchContext: {
        launchOptions: {
            headless: false,
            // channel: 'chrome',
            // viewport: null,
        },
    },
    // proxyConfiguration,
    maxRequestRetries: 0,
    maxRequestsPerCrawl: 5,
    sessionPoolOptions: {
        blockedStatusCodes: [],
    },
    async requestHandler({ request, page, log }) {
        log.info(`Processing ${request.url}...`)
        await page.waitForTimeout(100000)
    },
    failedRequestHandler({ request, log }) {
        log.info(`Request ${request.url} failed too many times.`)
    },
    // browserPoolOptions: {
    //     useFingerprints: false,
    // },
})

await crawler.addRequests([
    'https://abrahamjuliot.github.io/creepjs/'
])

await crawler.run()

console.log('Crawler finished.')
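
For debugging this kind of issue, a minimal sketch along these lines can show whether any proxy is actually being applied (assuming a recent Crawlee version): proxyInfo in the request handler is only populated when a ProxyConfiguration is passed to the crawler, and browser-pool's preLaunchHooks expose the proxy URL resolved for each browser launch. As far as I know, the 127.0.0.1:<port> value seen in --proxy-server is Crawlee routing an authenticated upstream proxy through a local server, so it should only appear when some proxy configuration is in play.

import { PlaywrightCrawler } from 'crawlee'

const diagnosticCrawler = new PlaywrightCrawler({
    // proxyConfiguration deliberately omitted
    browserPoolOptions: {
        preLaunchHooks: [
            async (_pageId, launchContext) => {
                // Should log `undefined` when no ProxyConfiguration is in play
                console.log('Launching browser with proxyUrl:', launchContext.proxyUrl)
            },
        ],
    },
    async requestHandler({ request, proxyInfo, log }) {
        // proxyInfo is only set when the crawler was given a ProxyConfiguration
        log.info(`Processing ${request.url}, proxyInfo: ${JSON.stringify(proxyInfo)}`)
    },
})

await diagnosticCrawler.run(['https://abrahamjuliot.github.io/creepjs/'])

If proxyUrl still shows a local 127.0.0.1 address with no ProxyConfiguration set, something outside this file is injecting it, which in this thread turned out to be stale code being run by bun.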
4 Replies
Hall · 5mo ago
Someone will reply to you shortly. This post was marked as solved by Matous.
optimistic-gold (OP) · 5mo ago
After some frenetic debugging, trying everything I could think of (removing node_modules, the user data dir, and the browsers, then reinstalling everything), it appears the issue was with bun. I'm not sure what in particular was causing it, but it must have somehow been running cached code.
NeoNomade · 5mo ago
From what I remember, bun still throws errors when it's combined with Crawlee; some internal packages complain. Is there any particular reason you want to use bun?
optimistic-gold (OP) · 5mo ago
Just that it's fast and generally works well. The issues seem to have resolved, but if they come back I'll probably jump to pnpm.
