Apify Discord Mirror

Updated 4 months ago

How can I use the Playwright Crawler and BeautifulSoup Crawler in the same Actor?

At a glance

The post describes a challenge where the community member wants to use Playwright to fill in and submit a website search page with dynamic JavaScript, and then use BeautifulSoup to open and parse the information from each product page. However, they mention that using Playwright to open each product page takes a long time, and they cannot run both crawlers at the same time.

In the comments, another community member asks about the reverse flow for their own Actor: first send an HTTP request to get the HTML, parse it with BeautifulSoup, and then use Playwright to open the links extracted from the parsed data. A reply notes that the original discussion concerns crawlee-python and that an HTTP client + BeautifulSoup and Playwright can be combined either within a single Actor or as a bundle of two Actors.

While there is no explicitly marked answer, the community members provide some guidance on how to approach this problem, suggesting that the integration of HTTP client, BeautifulSoup, and Playwright within an actor should not be too difficult, and that the community member should refer to the official documentation for Playwright instantiation in an actor.

This is so that Playwright can fill in and submit a website search page that uses dynamic JavaScript. When the results are shown, I want to use the BeautifulSoup crawler to open each product page and parse the information. If I use Playwright to open each product page, it takes a very long time. I cannot seem to run both crawlers at the same time.
7 comments
The link contains the answer
I want to build my own Actor with Playwright and BeautifulSoup.
I am looking for exactly this solution. First I want to send an HTTP request, get the HTML, and use BeautifulSoup to parse the data, and then open the links (obtained from parsing the data) using Playwright.

Correct me if I am wrong.
First use Python with BeautifulSoup to get the results, and then use those results with Playwright.
So do we have to create and build two different Actors for this?
Hi @Abdul

The discussion above concerns the use of crawlee-python.

In an Actor, you can use an HTTP client + BeautifulSoup together with Playwright, either within a single Actor or using a bundle of two Actors.
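For the flow asked about above (HTTP request and BeautifulSoup first, then Playwright for the extracted links), here is a minimal sketch of how both stages could live in one script. It does not use crawlee-python; it assumes plain `httpx`, `beautifulsoup4`, and `playwright` are installed, and the URL `https://example.com/products` and the selector `a.product-link` are placeholders, not taken from the thread.

```python
import asyncio
from urllib.parse import urljoin


def absolute_links(base_url: str, hrefs: list[str]) -> list[str]:
    """Resolve hrefs against base_url and drop duplicates, keeping first-seen order."""
    seen: set[str] = set()
    result: list[str] = []
    for href in hrefs:
        url = urljoin(base_url, href)
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


async def collect_links(listing_url: str, link_selector: str) -> list[str]:
    """Stage 1: plain HTTP request parsed with BeautifulSoup (no browser involved)."""
    # Third-party imports are local so the pure helper above works without them.
    import httpx
    from bs4 import BeautifulSoup

    async with httpx.AsyncClient(follow_redirects=True, timeout=30) as client:
        response = await client.get(listing_url)
        response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    hrefs = [a["href"] for a in soup.select(link_selector) if a.has_attr("href")]
    return absolute_links(listing_url, hrefs)


async def open_in_browser(urls: list[str]) -> list[dict[str, str]]:
    """Stage 2: open every extracted link in a single shared Playwright browser."""
    from playwright.async_api import async_playwright

    results: list[dict[str, str]] = []
    async with async_playwright() as pw:
        browser = await pw.chromium.launch(headless=True)
        page = await browser.new_page()
        for url in urls:
            await page.goto(url)
            results.append({"url": url, "title": await page.title()})
        await browser.close()
    return results


async def main() -> None:
    # Placeholder URL and selector: replace with the real listing page.
    links = await collect_links("https://example.com/products", "a.product-link")
    for item in await open_in_browser(links):
        print(item)
```

Calling `asyncio.run(main())` runs the two stages one after the other in the same process, so no second Actor is needed for this variant. Splitting the stages into two Actors would mainly make sense if they need to be scheduled or scaled separately.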
Thanks for clarifying. Do you have anything that would help me get started on an Actor with HTTP client + BeautifulSoup + Playwright?
No, I don't have any code samples like that, since I don't usually use Playwright or browser automation.

But writing such an Actor is not much different from just writing a scraper using such a bundle.

Refer to the official documentation to see how Playwright is instantiated in an Actor - https://docs.apify.com/sdk/python/docs/guides/playwright

Adding the HTTP client + BeautifulSoup integration on top of that will not be a problem.
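As a rough illustration of the original question's flow inside a single Actor (Playwright submits the search, then an HTTP client + BeautifulSoup parses each product page), something like the following might work. This is only a sketch under assumptions: the input keys `start_url` and `query`, the selectors `input[name='q']`, `a.product-link`, and `h1`, and the concurrency limit of 10 are all made up for illustration, and `httpx` is just one possible HTTP client.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def map_limited(func: Callable[[T], Awaitable[R]], items: list[T], limit: int) -> list[R]:
    """Call func on every item with at most `limit` calls in flight; results keep input order."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item: T) -> R:
        async with semaphore:
            return await func(item)

    return await asyncio.gather(*(guarded(item) for item in items))


async def search_product_urls(start_url: str, query: str) -> list[str]:
    """Stage 1: let Playwright fill in and submit the JavaScript-driven search form."""
    from playwright.async_api import async_playwright

    async with async_playwright() as pw:
        browser = await pw.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(start_url)
        await page.fill("input[name='q']", query)  # hypothetical selector
        await page.press("input[name='q']", "Enter")
        await page.wait_for_selector("a.product-link")  # hypothetical selector
        # The DOM `href` property is already resolved to an absolute URL.
        urls = await page.eval_on_selector_all("a.product-link", "els => els.map(e => e.href)")
        await browser.close()
    return urls


async def main() -> None:
    import httpx
    from apify import Actor
    from bs4 import BeautifulSoup

    async with Actor:
        actor_input = await Actor.get_input() or {}
        start_url = actor_input.get("start_url", "https://example.com/search")  # hypothetical key
        query = actor_input.get("query", "laptop")  # hypothetical key
        product_urls = await search_product_urls(start_url, query)

        async with httpx.AsyncClient(follow_redirects=True, timeout=30) as client:

            async def scrape(url: str) -> dict:
                """Stage 2: fetch one product page over HTTP and parse it without a browser."""
                response = await client.get(url)
                response.raise_for_status()
                soup = BeautifulSoup(response.text, "html.parser")
                heading = soup.select_one("h1")  # hypothetical selector
                return {"url": url, "name": heading.get_text(strip=True) if heading else None}

            items = await map_limited(scrape, product_urls, limit=10)

        await Actor.push_data(items)
```

The entry point would be `asyncio.run(main())`. The browser is closed before the product pages are fetched, so only the search step pays the cost of browser automation, and the product pages are requested concurrently over plain HTTP.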