like-gold, 2y ago

Scraping Lazada.

Hi everyone, has anyone had success scraping the retailer Lazada (e.g. https://www.lazada.co.th/products/2511-2-5-i5008341785-s21162506684.html)? This is just one country as an example; they are active in multiple markets (th, my, vn, sg, id).

I've had success crawling the store pages by adding &ajax=true to the store URL and retrying the request until it works, e.g. https://www.lazada.co.th/junkins/?q=All-Products&from=wangpu&langFlag=th&pageTypeId=2&ajax=true . However, I haven't had any success accessing the product page itself: it always gives me a captcha that is not solvable and just asks you to refresh the page.

I've tried Puppeteer with live cookies generated by a 3rd-party service, residential proxies, and different Puppeteer configs. I tried to mimic a real user as closely as possible, but I'm still getting blocked. I'm using puppeteer on version-3. Can anyone help me find a way to access their product pages with a bot? Thanks in advance!
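For readers who want to reproduce the store-page approach described above, here is a minimal sketch of "add ajax=true and retry until it works". It assumes Node 18+ (global fetch) and that the endpoint answers with JSON when it is not blocking. The names withAjaxFlag, retry and fetchStoreListing, and the attempt count and delays, are hypothetical and not from the original post.

```javascript
// Hypothetical helpers; names and defaults are illustrative only.

// Return the store URL with ajax=true set, keeping the other query parameters.
function withAjaxFlag(storeUrl) {
  const url = new URL(storeUrl);
  url.searchParams.set("ajax", "true");
  return url.toString();
}

// Call `attempt` until it resolves, waiting a little longer after each failure.
async function retry(attempt, { maxAttempts = 5, baseDelayMs = 1000 } = {}) {
  let lastError;
  for (let i = 1; i <= maxAttempts; i++) {
    try {
      return await attempt(i);
    } catch (err) {
      lastError = err;
      if (i < maxAttempts) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * i));
      }
    }
  }
  throw lastError;
}

// Fetch the store listing; non-2xx or non-JSON responses count as failures
// and trigger another attempt.
async function fetchStoreListing(storeUrl, fetchImpl = fetch) {
  return retry(async () => {
    const response = await fetchImpl(withAjaxFlag(storeUrl));
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    return response.json();
  });
}
```

In a real crawl you would probably also rotate the proxy between attempts; that part is left out here because it depends on the proxy provider.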
1 Reply
automatic-azure, 2y ago
Hi @Teodor, are you still blocked when you start Chrome manually on your local computer and then connect Puppeteer to it via the Chrome DevTools Protocol (CDP)? I use this method when troubleshooting, because the browser environment is very close to a non-automated one, which helps circumvent browser fingerprinting techniques. For example, you can start Chrome like this:
"path/to/Google Chrome" --remote-debugging-port=9222 --proxy-server=http://localhost:8080 --disable-web-security --user-data-dir
I use gost as an intermediate proxy server, to handle authentication to my residential proxy provider:
gost -L=http://:8080 -F=http://username:password@my-proxy-provider.com:12345
Then in your code you can start Puppeteer like this:
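import puppeteer from "puppeteer";
import axios from "axios";

// Ask the manually started Chrome for its DevTools WebSocket endpoint.
const browserConfigResponse = await axios.get('http://127.0.0.1:9222/json/version');
// Attach Puppeteer to that running browser instead of launching a new one.
const browser = await puppeteer.connect({
  browserWSEndpoint: browserConfigResponse.data['webSocketDebuggerUrl']
});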
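As a follow-up to the snippet above, here is a rough sketch of how the connected browser could then be used to open a product page. It assumes Node 18+ (global fetch, used here in place of axios) and a Chrome already running with --remote-debugging-port. The function names getWebSocketDebuggerUrl and openProductPage are hypothetical, and the Puppeteer module is passed in as a parameter so the sketch does not hard-code an import.

```javascript
// Hypothetical helper: look up the DevTools WebSocket URL of a running Chrome.
// Assumes Chrome was started with --remote-debugging-port=<port>.
async function getWebSocketDebuggerUrl(port = 9222, fetchImpl = fetch) {
  const response = await fetchImpl(`http://127.0.0.1:${port}/json/version`);
  if (!response.ok) {
    throw new Error(`DevTools endpoint answered HTTP ${response.status}`);
  }
  const info = await response.json();
  if (!info.webSocketDebuggerUrl) {
    throw new Error("No webSocketDebuggerUrl in /json/version response");
  }
  return info.webSocketDebuggerUrl;
}

// Open one page in the connected browser and return its title.
// `puppeteerLib` is the imported puppeteer module.
async function openProductPage(puppeteerLib, productUrl, port = 9222) {
  const browser = await puppeteerLib.connect({
    browserWSEndpoint: await getWebSocketDebuggerUrl(port),
  });
  try {
    const page = await browser.newPage();
    await page.goto(productUrl, { waitUntil: "domcontentloaded" });
    return await page.title();
  } finally {
    // disconnect() detaches Puppeteer but leaves the manually started Chrome open.
    await browser.disconnect();
  }
}
```

Calling openProductPage(puppeteer, url) and checking whether the returned title belongs to the product or to the captcha page is a quick way to see if the manual-Chrome setup gets through.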
gost (GO Simple Tunnel, a simple tunnel written in golang): gost/README_en.md at master · ginuerzh/gost on GitHub
