
Hey, why do I get the web scraping results of the first URL when I have passed another URL?

I have implemented a Playwright crawler to parse a URL. I made a single request to the crawler with the first URL, and while that request was still processing, I passed another URL to the crawler and sent a second request. The crawler processes content from the first URL both times, instead of the second URL. Can you please help?

async def run_crawler(url, domain_name, save_path=None):
    print("doc url inside crawler file====================================>", url)
    crawler = PlaywrightCrawler(
        max_requests_per_crawl=10,
        browser_type='firefox',
    )

    @crawler.router.default_handler
    async def request_handler(context: PlaywrightCrawlingContext) -> None:
        context.log.info(f'Processing {url} ...')

        links = await context.page.evaluate(f'''() => {{
            return Array.from(document.querySelectorAll('a[href*="{domain_name}"]'))
                .map(a => a.href);
        }}''')

        await context.enqueue_links(urls=links)

        elements = await context.page.evaluate(PW_SCRAPING_CODE)

        data = {
            'url': url,
            'title': await context.page.title(),
            'content': elements
        }
        print("datat =================>", data)

        await context.push_data(data)

    await crawler.run([url])
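
One thing that stands out in the handler above: it logs and stores the outer `url` argument rather than the URL of the request currently being handled, so every page reached through `enqueue_links` would be recorded under the start URL. The sketch below reproduces just that effect with the standard library only. `FakeRequest`, `FakeContext`, and `make_handlers` are hypothetical stand-ins, not Crawlee classes, and the example.com URLs are placeholders. It assumes the real context exposes the current URL as `context.request.url`, which I believe is the case in Crawlee for Python.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeRequest:
    # Hypothetical stand-in for the crawler's request object.
    url: str


@dataclass
class FakeContext:
    # Hypothetical stand-in for PlaywrightCrawlingContext; only `request.url` is modelled.
    request: FakeRequest


def make_handlers(start_url: str):
    async def closure_handler(context: FakeContext) -> dict:
        # Mirrors the posted handler: reads the outer variable, ignores the request.
        return {'url': start_url}

    async def per_request_handler(context: FakeContext) -> dict:
        # Reads the URL from the request that is actually being processed.
        return {'url': context.request.url}

    return closure_handler, per_request_handler


async def demo() -> tuple[list[str], list[str]]:
    closure_handler, per_request_handler = make_handlers('https://example.com/first')
    # Two pages handled by the same handlers, as happens after links are enqueued.
    pages = [
        FakeContext(FakeRequest('https://example.com/first')),
        FakeContext(FakeRequest('https://example.com/second')),
    ]
    from_closure = [(await closure_handler(page))['url'] for page in pages]
    from_request = [(await per_request_handler(page))['url'] for page in pages]
    return from_closure, from_request


from_closure, from_request = asyncio.run(demo())
print(from_closure)
print(from_request)
```

In the real handler, the equivalent change would be to use `context.request.url` in both the log line and the `data` dict. This may not be the only cause of the behaviour described, so treat it as a first thing to check rather than a confirmed fix.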

I am calling the crawler using

