rival-black•3y ago

Bet 365 crawler

I'm having some issues scraping the bet365 website. Does anyone know how to get past the bet365 security?
5 Replies
Lukas Krivka•3y ago
Betting sites are heavily protected. The go-to standard is Playwright + Firefox with residential proxies.
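As a rough illustration of the proxy part: Crawlee's `ProxyConfiguration` takes a `proxyUrls` array, and one way to avoid hardcoding addresses is to load that array from an environment variable. This is only a sketch. The helper name `parseProxyUrls`, the variable name `PROXY_URLS`, and the choice to accept only `http`/`https` URLs are assumptions made for the example, not anything Crawlee defines.

```javascript
// Hypothetical helper: turn a comma-separated string (for example from an
// environment variable) into a validated list suitable for `proxyUrls`.
function parseProxyUrls(raw) {
  return (raw ?? "")
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0)
    .map((entry) => {
      // `new URL` throws a TypeError when the entry is not a parseable URL.
      const url = new URL(entry);
      // Assumption of this sketch: only http and https proxies are accepted.
      if (url.protocol !== "http:" && url.protocol !== "https:") {
        throw new Error(`Unsupported proxy protocol: ${url.protocol}`);
      }
      return entry;
    });
}

// PROXY_URLS is an assumed variable name; Crawlee does not read it by itself.
const proxyUrls = parseProxyUrls(process.env.PROXY_URLS);
console.log(`Loaded ${proxyUrls.length} proxy URL(s)`);
```

The resulting array could then be passed as `new ProxyConfiguration({ proxyUrls })`, provided it is not empty.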
rival-blackOP•3y ago
I already tried that, but without success.
// Import PlaywrightCrawler and its helpers from Crawlee.
import { RequestQueue, PlaywrightCrawler, ProxyConfiguration } from "crawlee";
import { firefox } from "playwright";

const requestQueue = await RequestQueue.open();
await requestQueue.addRequest({ url: "https://www.bet365.com/#/HO/" });

// Create the crawler and add the queue with our URL
// and a request handler to process the page.
const crawler = new PlaywrightCrawler({
  requestQueue,
  // Run with a visible browser window.
  headless: false,
  launchContext: {
    // Set the Firefox browser to be used by the crawler.
    // If the launcher option is not specified here,
    // the default Chromium browser will be used.
    launcher: firefox,
  },
  proxyConfiguration: new ProxyConfiguration({
    proxyUrls: ["http://168.195.252.87:8080"],
  }),
  async requestHandler({ request, page }) {
    // Read the <title> text from the live page.
    const title = await page.title();
    await page.waitForTimeout(10000);
    await page.screenshot({ path: "example.png" });
    await page.waitForSelector(
      "body > div:nth-child(1) > div > div.wc-WebConsoleModule_SiteContainer > div.wc-PageView > div > div > div.wcl-CommonElementStyle_WebNav > div > div.wn-Menu.wn-Menu-mousemode.wn-Menu_SiteSearch > div > div.wn-FrequentContainer > div.wn-PreMatchFrequentItem"
    );
    console.log(`The title of "${request.url}" is: ${title}.`);
  },
});

// Start the crawler and wait for it to finish.
await crawler.run();
I'm using this configuration.
Lukas Krivka•3y ago
And how are you getting blocked?
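One possible way to answer that precisely is to record what the crawler actually observed on each request and put it into a short description. The sketch below is purely illustrative: the function name `describeOutcome`, its input fields, and the categories it returns are all made up for this example, and treating 403 and 429 as "blocked" is an assumption rather than a rule.

```javascript
// Hypothetical classifier: every field name and category here is this
// sketch's own invention, not part of Crawlee or Playwright.
function describeOutcome({ status, selectorFound, error }) {
  // A thrown navigation error (timeout, proxy refusal, ...) comes first.
  if (error) return `navigation failed: ${error}`;
  // Assumption: 403 and 429 are the statuses most likely to mean a block.
  if (status === 403 || status === 429) return `blocked with HTTP ${status}`;
  if (status >= 400) return `HTTP error ${status}`;
  // The page responded, but the element we wait for never appeared.
  if (!selectorFound) return "page loaded but expected content never rendered";
  return "ok";
}

// Example with made-up observations:
console.log(describeOutcome({ status: 200, selectorFound: false }));
```

Inside a Playwright request handler, `status` could come from the navigation response's `status()` method, and `selectorFound` from whether `page.waitForSelector` resolved or timed out.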
rival-blackOP•3y ago
Yep
