Drogbata, 2mo ago

Session Cookies

When I use the bs4 (BeautifulSoup) crawler, I would expect the session to store received cookies and reuse them in subsequent requests, but SessionCookies is empty. Any idea why, and how to make it work?
3 Replies
Mantisus, 2mo ago
Hey @Drogbata, SessionCookies are tied to a specific Session. I think you are checking SessionCookies in a different Session from the one that received them. You can either limit the SessionPool to 1 session, or store the cookies in an external variable and pass them to each Session, for example in a pre_navigation_hook. Documentation that may be useful to you:
https://crawlee.dev/python/docs/guides/logging-in-with-a-crawler
https://crawlee.dev/python/docs/guides/session-management
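To make the point above concrete, here is a minimal, library-free sketch of why cookies can look empty and how the "external variable plus hook" workaround behaves. It does not use Crawlee at all: `ToySession`, `ToyPool`, `shared_cookies`, and `pre_navigation` are hypothetical names invented for illustration, and the round-robin pool is an assumption chosen only to make the example deterministic, not a description of how Crawlee's SessionPool selects sessions.

```python
from dataclasses import dataclass, field


@dataclass
class ToySession:
    """Hypothetical stand-in for a crawler session: an id plus its own cookie jar."""
    id: str
    cookies: dict[str, str] = field(default_factory=dict)


class ToyPool:
    """Hypothetical pool that hands out sessions round-robin (an assumption)."""

    def __init__(self, size: int) -> None:
        self.sessions = [ToySession(id=f'session_{i}') for i in range(size)]
        self._next = 0

    def get_session(self) -> ToySession:
        session = self.sessions[self._next % len(self.sessions)]
        self._next += 1
        return session


# Cookies kept outside any single session (the "external variable").
shared_cookies: dict[str, str] = {}


def pre_navigation(session: ToySession) -> None:
    """Copy the shared cookies into whichever session handles the request."""
    session.cookies.update(shared_cookies)


pool = ToyPool(size=2)

# The first request sets a cookie on the session that served it.
first = pool.get_session()
first.cookies['a'] = '1'
shared_cookies.update(first.cookies)

# The next request gets a different session, whose jar is still empty.
second = pool.get_session()
cookies_before_hook = dict(second.cookies)

# Running the hook first makes the cookie available in that session too.
pre_navigation(second)
cookies_after_hook = dict(second.cookies)

print(cookies_before_hook, cookies_after_hook)
```

With a pool of size 1, `get_session()` returns the same object every time, so the cookie set by the first request is already present for the second one and no hook is needed. That mirrors the two options suggested above.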
Drogbata (OP), 2mo ago
@Mantisus I am using 1 session, and my requests are also tied to a specific session_id, but it doesn't work. So pre_navigation_hook would be my solution, but I had to ask because this seems like a core feature and not something I should have to work around.
Mantisus, 2mo ago
@Drogbata can you provide an example of your code? With the following test code, it works as expected.
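# Imports assumed for a recent Crawlee for Python release; the original post omitted them.
import asyncio

from crawlee import Request
from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


async def test():
    crawler = BeautifulSoupCrawler()

    @crawler.router.default_handler
    async def set_cookies(ctx: BeautifulSoupCrawlingContext):
        # The start URL asks httpbin to set cookie a=1 on the session that handled it.
        ctx.log.info(f'Cookies set to {ctx.session.id}')

        # Pin the follow-up request to the same session via session_id.
        await ctx.add_requests([Request.from_url('https://httpbin.org/cookies', label='GET', session_id=ctx.session.id)])

    @crawler.router.handler('GET')
    async def get_cookies(ctx: BeautifulSoupCrawlingContext):
        ctx.log.info(f'Cookies retrieved from {ctx.session.id}')
        print(ctx.session.cookies)
        print(ctx.http_response.read())

    await crawler.run(['https://httpbin.org/cookies/set/a/1'])


if __name__ == '__main__':
    asyncio.run(test())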
