Addressing playwright memory limitations in crawlee
I am currently using crawlee on a medium-sized project and I am generally happy with it. I am targeting e-commerce websites and I am interested in how various products are presented on the website, so I opted for a browser automation solution to be able to "see" the page.
I am using playwright as the browser automation tool. Recently I noticed that some of my scraping instances fail with the following error:
While handling this request, the container instance was found to be using too much memory and was terminated.

I did some digging around the web and I found the following:
https://stackoverflow.com/questions/72954376/python-playwright-memory-overlad
It seems that the playwright context simply grows over time. This is a known issue, but playwright itself will not address it, since it is primarily a web testing tool, not a scraping tool.
The suggested solution is to save the state of the context to disk and restart the context every once in a while. I was wondering whether crawlee has any out-of-the-box functionality that applies this solution. If not, has anyone else run into this problem, and how did you fix it?
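For reference, here is a minimal sketch of the recycling approach I mean, using plain Playwright for Python (outside of crawlee). The names `STATE_FILE`, `PAGES_PER_CONTEXT`, and `should_recycle` are my own; the actual threshold would need tuning against the container's memory limit:

```python
STATE_FILE = "state.json"  # cookies/localStorage are persisted here between contexts
PAGES_PER_CONTEXT = 50     # assumed recycle threshold; tune to your memory budget


def should_recycle(pages_handled: int, limit: int = PAGES_PER_CONTEXT) -> bool:
    """True once the current context has served `limit` pages."""
    return pages_handled > 0 and pages_handled % limit == 0


def crawl(urls):
    # Import deferred: running this requires Playwright and its browser binaries.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        context = browser.new_context()
        for pages_handled, url in enumerate(urls, start=1):
            page = context.new_page()
            page.goto(url)
            # ... extract product data here ...
            page.close()
            if should_recycle(pages_handled):
                # Persist session state, drop the bloated context, start fresh.
                context.storage_state(path=STATE_FILE)
                context.close()
                context = browser.new_context(storage_state=STATE_FILE)
        browser.close()
```

Doing this by hand works, but it is exactly the kind of lifecycle management I would expect the crawler framework to own, hence the question.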
<--- Last few GCs --->
[17744:00000270608DE2C0] 16122001 ms: Scavenge 2023.5 (2082.0) ->
2017.3...
