variable-lime
variable-lime•16mo ago

TargetClosedError

👋 I'm always getting this error after about 5 minutes of scraping (it works fine until then):
import { PlaywrightCrawler, ProxyConfiguration } from "crawlee"

const crawler = new PlaywrightCrawler({
    proxyConfiguration: new ProxyConfiguration({
        proxyUrls: [
            ...
        ],
    }),
    async requestHandler({ request, page, enqueueLinks, pushData, log }) {
        const title = await page.title()
        const content = await page.content()
        log.info(`URL: ${request.loadedUrl} || TITLE: '${title}'`)
        const links = await page.$$eval(
            "a.button.button-join.is-discord",
            (links) => links.map((link) => link.getAttribute("href"))
        )

        await pushData({
            title,
            url: request.loadedUrl,
            content,
            links,
        })

        await enqueueLinks()
    },
})

await crawler.run(["....."])
12 Replies
variable-lime
variable-limeOP•16mo ago
error:
C:\Users\kamil\Documents\discordlists-scraper\node_modules\playwright-core\lib\server\chromium\crPage.js:500
this._firstNonInitialNavigationCommittedReject(new _errors.TargetClosedError());
^

TargetClosedError: Target page, context or browser has been closed
at FrameSession.dispose (C:\Users\kamil\Documents\discordlists-scraper\node_modules\playwright-core\lib\server\chromium\crPage.js:500:52)
at CRPage.didClose (C:\Users\kamil\Documents\discordlists-scraper\node_modules\playwright-core\lib\server\chromium\crPage.js:162:60)
at CRBrowser._onDetachedFromTarget (C:\Users\kamil\Documents\discordlists-scraper\node_modules\playwright-core\lib\server\chromium\crBrowser.js:200:14)
at CRSession.emit (node:events:518:28)
at CRSession.emit (node:domain:488:12)
at C:\Users\kamil\Documents\discordlists-scraper\node_modules\playwright-core\lib\server\chromium\crConnection.js:160:14
at runNextTicks (node:internal/process/task_queues:60:5)
at process.processImmediate (node:internal/timers:449:9)
at process.topLevelDomainCallback (node:domain:160:15)
at process.callbackTrampoline (node:internal/async_hooks:128:24)
5 minutes of crawling -> this error. (I'm not doing anything else while the crawler runs, because CPU usage is high.)
Fixed. For others facing this issue: it's an autoscaling issue. The best solution I found is using pm2 to auto-restart on error; the request queue must be persisted, so on restart the crawler continues from the moment it crashed. A sketch of that setup follows below.
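A minimal sketch of that pm2 setup (the app name and entry point are placeholders; CRAWLEE_PURGE_ON_START=0 tells Crawlee not to wipe its on-disk storage on startup, so the request queue survives the restart):

// ecosystem.config.js
module.exports = {
    apps: [
        {
            name: "discordlists-scraper", // placeholder app name
            script: "./main.js", // placeholder entry point
            autorestart: true, // pm2 restarts the process after a crash
            env: {
                // keep the default storage (and request queue) across restarts
                CRAWLEE_PURGE_ON_START: "0",
            },
        },
    ],
}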
lemurio
lemurio•16mo ago
hey, which version of crawlee are you using?
HonzaS
HonzaS•16mo ago
I think I have a similar problem: after some time (not always 5 minutes) the Playwright crawler throws an error that kills the whole run. I have set restartOnError, as you can see in the log, but it is far from ideal.
(screenshot of the error log attached)
lemurio
lemurio•16mo ago
hey, the team knows about this issue, in the meantime you could downgrade to 3.8.2
HonzaS
HonzaS•16mo ago
@lemurio I currently have "crawlee": "^3.5.4" in package.json. When I tried "crawlee": "3.8.2", I got this in the log and concurrency dropped to 1:

WARN PlaywrightCrawler:AutoscaledPool:Snapshotter: Memory is critically overloaded. Using 2646 MB of 977 MB (271%). Consider increasing available memory.
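For context: that WARN comes from Crawlee's memory ceiling, which by default is only a fraction of total system RAM rather than all of it. If the machine actually has headroom, the ceiling can be raised; a minimal sketch, with 4096 MB as an arbitrary example value:

import { PlaywrightCrawler, Configuration } from "crawlee"

// The second constructor argument overrides the global configuration.
// memoryMbytes is the ceiling the Snapshotter measures usage against.
const crawler = new PlaywrightCrawler(
    { /* ...crawler options as in the snippet above... */ },
    new Configuration({ memoryMbytes: 4096 })
)

The same ceiling can also be set with the CRAWLEE_MEMORY_MBYTES environment variable.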
NeoNomade
NeoNomade•16mo ago
This can also be related to the fact that if you don't close pages manually, the handlers keep them open for a very long time.
HonzaS
HonzaS•16mo ago
@NeoNomade so you have await page.close() at the end of every handler function?
NeoNomade
NeoNomade•16mo ago
Absolutely
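A minimal sketch of that workaround, applied to a trimmed version of the handler from the original snippet (the try/finally is an addition, so the page gets closed even when the handler throws):

async requestHandler({ request, page, enqueueLinks, pushData, log }) {
    try {
        const title = await page.title()
        log.info(`URL: ${request.loadedUrl} || TITLE: '${title}'`)
        await pushData({ title, url: request.loadedUrl })
        await enqueueLinks()
    } finally {
        // close the page explicitly instead of waiting for the pool to retire it
        await page.close()
    }
},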
HonzaS
HonzaS•16mo ago
Thanks a lot! This really solved the problem. Not a single TargetClosedError anymore. So it looks like it is a bug in the crawlee library: it does not close pages fast enough.
NeoNomade
NeoNomade•16mo ago
It seems like the crawlee dev team doesn't accept many suggestions. I also proposed basing the images on Alpine, since Alpine images are safer and a lot smaller, but they keep using Debian.
Saurav Jain
Saurav Jain•16mo ago
Hey, sorry for the bug. Can you please open a bug report with an explanation here: https://github.com/apify/crawlee/issues/new?assignees=&labels=bug&projects=&template=bug_report.yml
I will take it to the team, thanks!
Lukas Krivka
Lukas Krivka•15mo ago
Hello, the this._firstNonInitialNavigationCommittedReject(new _errors.TargetClosedError()) error is finally fixed in the latest Crawlee; see https://github.com/apify/crawlee/releases/tag/v3.10.3
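For anyone landing on this thread later, upgrading is enough to pick up the fix, e.g.

npm install crawlee@latest

or pinning "crawlee": "^3.10.3" in package.json.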