variable-lime
variable-lime•16mo ago

TargetClosedError

👋 I'm always getting this error after about 5 minutes of scraping (it works fine until then):
import { PlaywrightCrawler, ProxyConfiguration } from "crawlee"

const crawler = new PlaywrightCrawler({
    proxyConfiguration: new ProxyConfiguration({
        proxyUrls: [
            ...
        ],
    }),
    async requestHandler({ request, page, enqueueLinks, pushData, log }) {
        const title = await page.title()
        const content = await page.content()
        log.info(`URL: ${request.loadedUrl} || TITLE: '${title}'`)
        const links = await page.$$eval(
            "a.button.button-join.is-discord",
            (links) => links.map((link) => link.getAttribute("href"))
        )

        await pushData({
            title,
            url: request.loadedUrl,
            content,
            links,
        })

        await enqueueLinks()
    },
})

await crawler.run(["....."])
12 Replies
variable-lime
variable-limeOP•16mo ago
error:
C:\Users\kamil\Documents\discordlists-scraper\node_modules\playwright-core\lib\server\chromium\crPage.js:500
this._firstNonInitialNavigationCommittedReject(new _errors.TargetClosedError());
^

TargetClosedError: Target page, context or browser has been closed
at FrameSession.dispose (C:\Users\kamil\Documents\discordlists-scraper\node_modules\playwright-core\lib\server\chromium\crPage.js:500:52)
at CRPage.didClose (C:\Users\kamil\Documents\discordlists-scraper\node_modules\playwright-core\lib\server\chromium\crPage.js:162:60)
at CRBrowser._onDetachedFromTarget (C:\Users\kamil\Documents\discordlists-scraper\node_modules\playwright-core\lib\server\chromium\crBrowser.js:200:14)
at CRSession.emit (node:events:518:28)
at CRSession.emit (node:domain:488:12)
at C:\Users\kamil\Documents\discordlists-scraper\node_modules\playwright-core\lib\server\chromium\crConnection.js:160:14
at runNextTicks (node:internal/process/task_queues:60:5)
at process.processImmediate (node:internal/timers:449:9)
at process.topLevelDomainCallback (node:domain:160:15)
at process.callbackTrampoline (node:internal/async_hooks:128:24)
5 minutes of crawling -> this error. (I'm not doing anything else while the crawler runs, because CPU usage is high.)
Fixed. For others facing this issue: it's an autoscaling issue. The best solution I found is using pm2 to auto-restart on error; the request queue must be persisted, so on restart the crawler continues from the moment it crashed. A sketch of that setup follows below.
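A minimal sketch of that pm2 setup (the app name and entry point are placeholders; CRAWLEE_PURGE_ON_START=0 tells Crawlee not to wipe its on-disk storage on startup, so the request queue survives the restart):

// ecosystem.config.js
module.exports = {
    apps: [
        {
            name: "discordlists-scraper", // placeholder app name
            script: "./main.js", // placeholder entry point
            autorestart: true, // pm2 restarts the process after a crash
            env: {
                // keep the default storage (and request queue) across restarts
                CRAWLEE_PURGE_ON_START: "0",
            },
        },
    ],
}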
lemurio
lemurio•16mo ago
hey, which version of crawlee are you using?
HonzaS
HonzaS•16mo ago
I think I have a similar problem: after some time (not always 5 minutes) the Playwright crawler throws an error that kills the whole run. I have set restartOnError, as you can see in the log, but it is far from ideal.
(screenshot of the error log attached)
lemurio
lemurio•16mo ago
hey, the team knows about this issue, in the meantime you could downgrade to 3.8.2
HonzaS
HonzaS•16mo ago
@lemurio I currently have "crawlee": "^3.5.4" in package.json. When I tried "crawlee": "3.8.2", I got this in the log and concurrency dropped to 1:

WARN PlaywrightCrawler:AutoscaledPool:Snapshotter: Memory is critically overloaded. Using 2646 MB of 977 MB (271%). Consider increasing available memory.
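For context: that WARN comes from Crawlee's memory ceiling, which by default is only a fraction of total system RAM rather than all of it. If the machine actually has headroom, the ceiling can be raised; a minimal sketch, with 4096 MB as an arbitrary example value:

import { PlaywrightCrawler, Configuration } from "crawlee"

// The second constructor argument overrides the global configuration.
// memoryMbytes is the ceiling the Snapshotter measures usage against.
const crawler = new PlaywrightCrawler(
    { /* ...crawler options as in the snippet above... */ },
    new Configuration({ memoryMbytes: 4096 })
)

The same ceiling can also be set with the CRAWLEE_MEMORY_MBYTES environment variable.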
NeoNomade
NeoNomade•16mo ago
This can also be related to the fact that if you don't close pages manually, the handlers keep them open for a very long time.
HonzaS
HonzaS•16mo ago
@NeoNomade so you have await page.close() at the end of every handler function?
NeoNomade
NeoNomade•16mo ago
Absolutely
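A minimal sketch of that workaround, applied to a trimmed version of the handler from the original snippet (the try/finally is an addition, so the page gets closed even when the handler throws):

async requestHandler({ request, page, enqueueLinks, pushData, log }) {
    try {
        const title = await page.title()
        log.info(`URL: ${request.loadedUrl} || TITLE: '${title}'`)
        await pushData({ title, url: request.loadedUrl })
        await enqueueLinks()
    } finally {
        // close the page explicitly instead of waiting for the pool to retire it
        await page.close()
    }
},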
HonzaS
HonzaS•16mo ago
Thanks a lot! This really solved the problem. Not a single TargetClosedError anymore. So it looks like it is a bug in the crawlee library: it does not close pages fast enough.
NeoNomade
NeoNomade•16mo ago
It seems like the crawlee dev team doesn't accept many suggestions. I also proposed basing the images on Alpine, since Alpine images are safer and a lot smaller, but they keep using Debian.
Saurav Jain
Saurav Jain•16mo ago
Hey, sorry for the bug. Can you please open a bug report with an explanation here: https://github.com/apify/crawlee/issues/new?assignees=&labels=bug&projects=&template=bug_report.yml
I will take it to the team, thanks!
Lukas Krivka
Lukas Krivka•15mo ago
Hello, the this._firstNonInitialNavigationCommittedReject(new _errors.TargetClosedError()) error is finally fixed in the latest Crawlee; see https://github.com/apify/crawlee/releases/tag/v3.10.3
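For anyone landing on this thread later, upgrading is enough to pick up the fix, e.g.

npm install crawlee@latest

or pinning "crawlee": "^3.10.3" in package.json.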