A community member is looking for someone familiar with web crawling to help with a paid project. The requirements include scraping millions of Chinese magazine articles, each with more than 500 words and more than one image, converting any formulas or equations to LaTeX format, and delivering the content as a JSON file along with the images. One community member suggests the topic would be better suited to the #💻hire-freelancers channel, as this channel is for support with Python SDKs. Another community member asks whether the poster has a target website for scraping.
Hello, everyone! Is anyone here familiar with web crawling? I have a paid project that I can't do myself. Is anyone confident they can take it on? Budget: $500
Requirements: Millions of Chinese magazine articles from publication-level magazines (or e-zine articles published by magazines), not news articles from websites. Each article must have more than 500 words and be accompanied by more than one picture (JPG format only, no less than 128 pixels per side). The final delivery is a JSON file plus the images. The text should contain only the body content, with no annotations, author information, editing date, publication date, etc. Any formulas or equations in an article must be converted to LaTeX format. Only the body content and the accompanying images should be captured, and the placeholder <image> should be inserted at each position where an accompanying image appears in the article. The images in each article are named 0000, 0001, 0002, ... in the order they appear.
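To make the delivery format concrete, here is a rough Python sketch of how one article could be filtered and packaged under these rules. It is only an illustration under several assumptions that the post does not specify: the `Image` class, the `build_record` and `to_json_line` helpers, the JSON field names `text` and `images`, the `.jpg` file extension, and counting "words" as characters of Chinese text are all hypothetical choices. It also assumes that crawling, body extraction, and formula-to-LaTeX conversion have already happened upstream.

```python
import json
from dataclasses import dataclass

MIN_CHARS = 500   # assumption: "500 words" is counted as characters of body text
MIN_IMAGES = 2    # "more than one picture"
MIN_SIDE = 128    # minimum pixels per side


@dataclass
class Image:
    """Hypothetical container for an image already downloaded and measured."""
    data: bytes
    width: int
    height: int
    fmt: str  # e.g. "jpg"


def image_ok(img: Image) -> bool:
    """Keep only JPG images with at least MIN_SIDE pixels on each side."""
    return img.fmt.lower() in ("jpg", "jpeg") and min(img.width, img.height) >= MIN_SIDE


def build_record(blocks):
    """blocks: body paragraphs (str) and Image objects in reading order.

    Returns (record, files), or None if the article fails the filters.
    """
    parts, files = [], {}
    for block in blocks:
        if isinstance(block, Image):
            if not image_ok(block):
                continue  # assumption: drop unqualified images, keep the article
            name = f"{len(files):04d}.jpg"  # 0000, 0001, ... in order of appearance
            files[name] = block.data
            parts.append("<image>")  # placeholder at the image's position
        else:
            parts.append(block)
    text = "\n".join(parts)
    # Measure body length without placeholders and line breaks.
    body_len = len(text.replace("<image>", "").replace("\n", ""))
    if body_len <= MIN_CHARS or len(files) < MIN_IMAGES:
        return None
    return {"text": text, "images": list(files)}, files


def to_json_line(record) -> str:
    """Serialize one record; ensure_ascii=False keeps Chinese text readable."""
    return json.dumps(record, ensure_ascii=False)
```

For example, `build_record(["字" * 300, Image(b"...", 640, 480, "jpg"), "字" * 300, Image(b"...", 256, 256, "jpg")])` would give a record whose `images` list is `["0000.jpg", "0001.jpg"]`, while an article with only one qualifying image would be rejected. Whether undersized or non-JPG images should be skipped or should disqualify the whole article is something to confirm with the client.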