Apify Discord Mirror

Updated 5 months ago

chromium.launchPersistentContext with Crawlee

At a glance

The community member is trying to combine the use of persistent browser contexts from the Playwright library with the Crawlee framework. They want to create a pool of logged-in Instagram user sessions that can be reused across script runs, to avoid having to log in again if the cookies are still valid, and only create new accounts when necessary.

Other community members suggest using Crawlee's KeyValueStore to store and retrieve the valid cookies, and setting launchContext: { useIncognitoPages: true } and browserPoolOptions: { maxOpenPagesPerBrowser: 1 } to manage the browser contexts. An alternative approach is to use the BasicCrawler and handle the Playwright contexts manually within the request handlers.

Hi everyone, this doc: https://docs.apify.com/academy/puppeteer-playwright/browser-contexts
shows how to use a persistent context when working with pure Playwright. But how can I combine this with Crawlee? Is there a configuration option for this when calling new PlaywrightCrawler(...), or a way to get similar behaviour?
4 comments
Hi,
The persistent context is just a tool. Whether it fits depends on what you want to use it for, and there might be a better solution for your needs. Can you describe in a bit more detail what you are trying to achieve?
Hi, thank you for responding.
I'm essentially trying to create a pool of logged-in users for Instagram scraping. I have a script that creates a user and a script that logs in, and I want to maintain a pool of browser contexts that are saved to and loaded from a file when I rerun the script, so that the script only logs in again if a cookie has expired and only creates an account when one has been blocked.
Thanks for the description.

So the flow could be:
  • check the KeyValueStore for valid cookies
    • if there are any, check that they have not expired and use them
    • otherwise do the login again and save the new cookies to the KeyValueStore.
  • I also suggest using:
launchContext: {
    useIncognitoPages: true
},
browserPoolOptions: {
    maxOpenPagesPerBrowser: 1
}
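
The cookie flow above could be sketched roughly as follows. This is only an illustration under some assumptions: `StoredCookie` is a subset of the cookie shape Playwright returns from `context.cookies()`, `CookieStore` is a hypothetical stand-in for anything with async `getValue`/`setValue` (a Crawlee KeyValueStore should fit), and `authCookieName`, `loadOrLogin` and the `login` callback are made-up names for your own login script, not part of Crawlee.

```typescript
// Subset of the cookie shape Playwright returns from context.cookies().
interface StoredCookie {
  name: string;
  value: string;
  domain: string;
  path: string;
  expires: number; // Unix time in seconds; -1 marks a session cookie
}

// Hypothetical stand-in for a key-value store with async getValue/setValue.
interface CookieStore {
  getValue(key: string): Promise<unknown>;
  setValue(key: string, value: unknown): Promise<void>;
}

// Assumption: the login is reusable if the named auth cookie exists and
// expires in the future. Session cookies (expires === -1) count as expired.
function hasLiveCookie(
  cookies: StoredCookie[],
  authCookieName: string,
  nowSeconds: number,
): boolean {
  return cookies.some((c) => c.name === authCookieName && c.expires > nowSeconds);
}

// Returns the stored cookies if they are still usable; otherwise calls the
// caller-supplied login function and saves whatever it returns.
async function loadOrLogin(
  store: CookieStore,
  key: string,
  authCookieName: string,
  login: () => Promise<StoredCookie[]>,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): Promise<StoredCookie[]> {
  const saved = await store.getValue(key);
  if (Array.isArray(saved) && hasLiveCookie(saved as StoredCookie[], authCookieName, nowSeconds)) {
    return saved as StoredCookie[]; // reuse the stored session
  }
  const fresh = await login(); // your own login script goes here
  await store.setValue(key, fresh);
  return fresh;
}
```

With Crawlee, `store` could be the result of `await KeyValueStore.open()`, and the returned cookies could be applied in a `preNavigationHooks` entry via `page.context().addCookies(cookies)`. Since `useIncognitoPages: true` gives each page its own context, the cookies would need to be added for every page.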


Another approach would be to use BasicCrawler and manage the Playwright contexts entirely by yourself inside the handlers.
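
If you manage contexts yourself, one thing to watch is that browsers do not allow two instances to share the same user data directory, so each account needs its own directory and jobs for one account have to run one at a time. A minimal sketch of that bookkeeping, assuming a hypothetical helper `withPersistentContext` (not a Crawlee or Playwright API) and a `launch` function that you supply:

```typescript
// Minimal shape needed from a browser context; Playwright's BrowserContext has close().
interface ClosableContext {
  close(): Promise<void>;
}

// Tail of the queue of pending jobs for each profile directory.
const profileQueues = new Map<string, Promise<unknown>>();

// Hypothetical helper: runs `fn` with a context opened by `launch`, one job at
// a time per `userDataDir`, and always closes the context afterwards.
async function withPersistentContext<C extends ClosableContext, T>(
  userDataDir: string,
  launch: (userDataDir: string) => Promise<C>,
  fn: (context: C) => Promise<T>,
): Promise<T> {
  const previous = profileQueues.get(userDataDir) ?? Promise.resolve();
  const job = previous
    .catch(() => undefined) // a failed earlier job must not block later ones
    .then(async () => {
      const context = await launch(userDataDir);
      try {
        return await fn(context);
      } finally {
        await context.close(); // shut the browser down cleanly before the next job
      }
    });
  profileQueues.set(userDataDir, job);
  return job;
}
```

Inside a BasicCrawler `requestHandler`, `launch` could be something like `(dir) => chromium.launchPersistentContext(dir, { headless: true })` with `chromium` imported from Playwright, and `fn` would open a page in the context and do the scraping. The directory layout (for example one folder per account) is up to you.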
Thank you! I'll try doing that 🙂