Apify Discord Mirror

ThePhantom
Joined August 30, 2024
Hey,

I'm facing this issue:
Error: Function newContext() is not available in incognito mode
    at PlaywrightBrowser.newContext (xxxxx\node_modules\@crawlee\browser-pool\playwright\playwright-browser.js:69:15)

Here's the code that triggers it:
const browser = await launchPlaywright();

try {
    const context = await browser.newContext({
    ...


As per Playwright docs:
Playwright allows creating "incognito" browser contexts with browser.newContext() method. "Incognito" browser contexts don't write any browsing data to disk.


ref: https://playwright.dev/docs/api/class-browsercontext

Why would it not be allowed in Crawlee if Playwright supports it?
1 comment
Hey,

I'm running into captchas, reCAPTCHA v2 in this case. Once I solve the captcha manually, the site 'marks' me as 'safe' and I can continue scraping. What I'm not sure about is how to handle this programmatically: get the captcha, solve it, and send back the required response. That way I could run on a server and 'whitelist' its IP, or do the same for proxies (it keeps throwing captchas on proxies too!).

Just not sure how to do all this in code.
7 comments
Hey,

I'm playing around with a CheerioCrawler and I've noticed requests failing due to the errors in the title. I'm wondering whether it has something to do with my setup (pretty straightforward and tested before without issues), the source (I had no issues with it before either), or something else that I'm missing.
Has anyone faced one or both of these errors before?
2 comments
I'm trying to grab the next page link from: https://www.haskovo.net/news with:
await enqueueLinks({
    selector: '.pagination li:last-child > a',
    label: 'LIST',
});


But it won't work. I've checked this (and other selectors) in DevTools, and it grabs the element fine.

What am I missing?

PS: I'm just messing around, trying to get a grasp of things. I'm aware that I can grab the whole thing with Cheerio, but I want a 'proof of concept' with PlaywrightCrawler.
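
One thing that may be worth ruling out here (an assumption, since nothing above confirms the cause): enqueueLinks filters the matched links by a strategy, which to my knowledge defaults to same-hostname in Crawlee 3. A next-page href that resolves to a different hostname than the page, for example with or without the www prefix, could then be skipped even though the selector matches in DevTools. The sketch below uses only Node's built-in URL class to check this. The helper name resolveAndCompare and the sample hrefs are hypothetical; substitute the real href of the matched element.

```javascript
// Hypothetical helper: resolve an href against the page URL and report
// whether the result stays on the same hostname as the page.
function resolveAndCompare(pageUrl, href) {
    const base = new URL(pageUrl);
    const resolved = new URL(href, base); // accepts relative and absolute hrefs
    return {
        url: resolved.href,
        sameHostname: resolved.hostname === base.hostname,
    };
}

const pageUrl = 'https://www.haskovo.net/news';

// Sample hrefs are made up for illustration only.
const relative = resolveAndCompare(pageUrl, '/news?page=2');
const withoutWww = resolveAndCompare(pageUrl, 'https://haskovo.net/news?page=2');

console.log(relative);
console.log(withoutWww);
```

If the resolved hostname turns out to differ, passing a looser strategy to enqueueLinks (such as 'same-domain') would be one thing to try.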
9 comments
Hey!

I'm trying different proxy providers and I've noticed the issue in the title.

I'm setting the proxy in proxyUrls in the following format:

http://user:pass@host:port

as I usually do. But with the current providers I'm testing, the request will fail with either 407 (Proxy Authentication Required) or 422 responses.

Strangely enough, it works when I try it with curl:

curl -x 'proxy string from the same providers, in the same format' https://example.com

Any idea what could be causing it?
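
One possible cause worth checking (an assumption, not something confirmed above): credentials containing reserved characters such as @, : or / need to be percent-encoded inside a proxy URL. A client that parses the URL differently from curl may otherwise send a different username or password, which a proxy would answer with 407. The sketch below uses only Node built-ins to build an encoded proxy URL and to show the Basic Proxy-Authorization value that corresponds to it. The helper names, host, port, and credentials are all hypothetical.

```javascript
// Hypothetical helper: build a proxy URL with percent-encoded credentials.
function buildProxyUrl({ username, password, host, port }) {
    const user = encodeURIComponent(username);
    const pass = encodeURIComponent(password);
    return `http://${user}:${pass}@${host}:${port}`;
}

// Hypothetical helper: derive the Basic Proxy-Authorization header value
// from a proxy URL. The URL class keeps credentials percent-encoded,
// so they are decoded before being base64-encoded.
function basicProxyAuth(proxyUrl) {
    const { username, password } = new URL(proxyUrl);
    const raw = `${decodeURIComponent(username)}:${decodeURIComponent(password)}`;
    return `Basic ${Buffer.from(raw).toString('base64')}`;
}

// Made-up credentials with reserved characters in the password.
const proxyUrl = buildProxyUrl({
    username: 'user',
    password: 'p@ss:word',
    host: 'proxy.example.com',
    port: 8000,
});

console.log(proxyUrl);
console.log(basicProxyAuth(proxyUrl));
```

Comparing the printed header value with what curl sends (visible with curl -v) may help narrow down whether the credentials are being mangled on the way to the proxy.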
10 comments
Hey,

I've had a Cheerio crawler running for a couple of hours, but it crashed. I'm wondering if it's possible to resume the crawl from the place where it stopped. I can see there are some files left in the key_value_stores dir:
3 comments