optimistic-gold · 5mo ago

Proxy settings appear to be cached

Hi, I'm trying to use residential proxies with a Playwright crawler, but even when I comment out the proxyConfiguration there still appears to be an attempt to use a proxy. I created a fresh project with a minimal reproduction to debug this, and it worked fine until I hit a proxy failure, after which the same thing happened again. The error is:

WARN PlaywrightCrawler: Reclaiming failed request back to the list or queue. Detected a session error, rotating session... goto: net::ERR_TUNNEL_CONNECTION_FAILED

so clearly it's trying to use a proxy. I verified this by looking at the browser process arguments, which include --proxy-bypass-list=<-loopback> --proxy-server=http://127.0.0.1:63572. Any ideas? It's driving me insane. Code as follows:
import { PlaywrightCrawler } from 'crawlee'

// const proxyConfiguration = new ProxyConfiguration({
//     proxyUrls: [
//         '...'
//     ],
// })

const crawler: PlaywrightCrawler = new PlaywrightCrawler({
    launchContext: {
        launchOptions: {
            headless: false,
            // channel: 'chrome',
            // viewport: null,
        },
    },
    // proxyConfiguration,
    maxRequestRetries: 0,
    maxRequestsPerCrawl: 5,
    sessionPoolOptions: {
        blockedStatusCodes: [],
    },
    async requestHandler({ request, page, log }) {
        log.info(`Processing ${request.url}...`)
        await page.waitForTimeout(100000)
    },
    failedRequestHandler({ request, log }) {
        log.info(`Request ${request.url} failed too many times.`)
    },
    // browserPoolOptions: {
    //     useFingerprints: false,
    // },
})

await crawler.addRequests([
    'https://abrahamjuliot.github.io/creepjs/'
])

await crawler.run()

console.log('Crawler finished.')
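
For debugging this kind of issue, a minimal sketch along these lines can show whether any proxy is actually being applied (assuming a recent Crawlee version): proxyInfo in the request handler is only populated when a ProxyConfiguration is passed to the crawler, and browser-pool's preLaunchHooks expose the proxy URL resolved for each browser launch. As far as I know, the 127.0.0.1:<port> value seen in --proxy-server is Crawlee routing an authenticated upstream proxy through a local server, so it should only appear when some proxy configuration is in play.

import { PlaywrightCrawler } from 'crawlee'

const diagnosticCrawler = new PlaywrightCrawler({
    // proxyConfiguration deliberately omitted
    browserPoolOptions: {
        preLaunchHooks: [
            async (_pageId, launchContext) => {
                // Should log `undefined` when no ProxyConfiguration is in play
                console.log('Launching browser with proxyUrl:', launchContext.proxyUrl)
            },
        ],
    },
    async requestHandler({ request, proxyInfo, log }) {
        // proxyInfo is only set when the crawler was given a ProxyConfiguration
        log.info(`Processing ${request.url}, proxyInfo: ${JSON.stringify(proxyInfo)}`)
    },
})

await diagnosticCrawler.run(['https://abrahamjuliot.github.io/creepjs/'])

If proxyUrl still shows a local 127.0.0.1 address with no ProxyConfiguration set, something outside this file is injecting it, which in this thread turned out to be stale code being run by bun.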
4 Replies
Hall · 5mo ago
Someone will reply to you shortly. This post was marked as solved by Matous.
optimistic-gold (OP) · 5mo ago
After some frenetic debugging, trying everything I could think of (removing node_modules, the user data dir, and the browsers, then reinstalling everything), it appears the issue was with bun. I'm not sure what in particular was causing it, but it must have somehow been running cached code.
NeoNomade · 5mo ago
From what I remember, bun still throws errors when it's combined with Crawlee; some internal packages complain. Is there any particular reason you want to use bun?
optimistic-gold (OP) · 5mo ago
Just that it's fast and generally works well. The issues seem to have resolved, but if they come back I'll probably jump to pnpm.
