

When you run the crawler, Google quickly starts blocking the requests with HTTP 429 (Too Many Requests), and `CheerioCrawler` keeps reclaiming them for retry:

```
CheerioCrawler: Reclaiming failed request back to the list or queue. Request blocked - received 429 status code.
{"id":"lbvAGmHKVGPGH6n","url":"https://google.com/search?q=restaurants","retryCount":2}
```

To get around the blocking, you can route the requests through a Google SERP proxy. Requests made through the proxy are automatically routed through a proxy server from the selected country, and the pure HTML code of the search result page is returned.

Important: Only HTTP requests are allowed, and the Google hostname needs to start with the `www.` prefix.
For code examples on how to connect to Google SERP proxies, see the examples page.

One more detail: the request queue deduplicates requests by their `uniqueKey`, which is derived from the URL by default. If you need to process the same search URL more than once, give each request its own key, e.g. `{ url, uniqueKey: [GENERATE_RANDOM_KEY_OR_USE_COUNTER] }`, for example by appending a `#COUNTER` suffix. Note also that the start URL is `'https://www.google.com/search?q=restaurants'` rather than `'https://google.com/search?q=restaurants'`, to satisfy the `www.` requirement above.

The full crawler then looks like this:

```javascript
import { CheerioCrawler, createCheerioRouter, EnqueueStrategy } from 'crawlee';

const startUrls = ['https://www.google.com/search?q=restaurants'];

// Selector for the pagination links at the bottom of the search page.
const searchPageNavUrlSelector = 'div[role="navigation"] table a';
// Selector for the organic search result links.
const searchResultsUrlSelector = 'div[id="search"] div[data-sokoban-container] a[data-ved]';

export const router = createCheerioRouter();

router.addDefaultHandler(async ({ enqueueLinks, log, request }) => {
    log.info(`Search page`, { url: request.loadedUrl });

    // Pagination links stay on google.com.
    await enqueueLinks({
        strategy: EnqueueStrategy.SameDomain,
        selector: searchPageNavUrlSelector,
    });

    // Result links point to external sites, so allow any domain.
    await enqueueLinks({
        strategy: EnqueueStrategy.All,
        selector: searchResultsUrlSelector,
        label: 'SEARCH_RESULT_URL',
    });
});

router.addHandler('SEARCH_RESULT_URL', async ({ request, log }) => {
    log.info(`Search result url:`, { url: request.loadedUrl });
});

const crawler = new CheerioCrawler({
    requestHandler: router,
    // This is still only a safeguard in this implementation.
    maxRequestsPerCrawl: 30,
});

await crawler.run(startUrls);
```
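To build on the `uniqueKey` point above: since the queue skips requests whose key it has already seen, repeating the same search means generating the keys yourself. A minimal sketch using a counter suffix (the counter scheme is an assumption; any unique string works):

```typescript
// Build request objects for the same URL that won't be deduplicated,
// by giving each one a distinct uniqueKey (here: a counter suffix).
export function repeatedSearchRequests(url: string, times: number) {
  return Array.from({ length: times }, (_, i) => ({
    url,
    // Any unique string works; a counter keeps the keys readable.
    uniqueKey: `${url}#${i}`,
  }));
}
```

The resulting objects can be passed straight to the crawler, e.g. `await crawler.run(repeatedSearchRequests(startUrls[0], 3))`, and all three copies will be processed.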