Casper
Casper4y ago

Download PDF file from URL?

Does someone know of a simple npm library to download files from a URL in Javascript/TypeScript?
7 Replies
fascinating-indigo
fascinating-indigo4y ago
I'd like to know this as well. I'm getting an error saying that only certain formats are supported and that application/pdf is not one of them.
Casper
CasperOP4y ago
I tried using axios, http.get, and clicking on the download button. Nothing works.
Casper
CasperOP4y ago
Basic crawler | Crawlee
This is the most bare-bones example of using Crawlee, which demonstrates some of its building blocks such as the BasicCrawler. You probably don't need to go this deep though, and it would be better to start with one of the full-featured crawlers
fascinating-indigo
fascinating-indigo4y ago
router.addDefaultHandler(async ({ request }) => {
    const file = fs.createWriteStream('filename.pdf');
    const response = await fetch(request.url);
    response.body.pipe(file);
});
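A possible reason the handler above fails to compile, assuming it runs on Node 18+ with the built-in fetch rather than node-fetch: response.body is then a web ReadableStream, which has no pipe() method. A minimal sketch of one way around that, using only Node's standard library. The helper name saveUrlToFile is hypothetical and is not part of Crawlee or this thread.

```typescript
import { createWriteStream } from "node:fs";
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";

// Hypothetical helper: streams the response body of `url` into `targetFile`.
// Assumes Node 18+ where `fetch` is a global; fetch follows redirects by default.
async function saveUrlToFile(url: string, targetFile: string): Promise<void> {
  const response = await fetch(url);

  if (!response.ok || response.body === null) {
    throw new Error(`Download failed: ${response.status} ${response.statusText}`);
  }

  // Convert the web ReadableStream into a Node.js Readable so it can be piped.
  // The cast only bridges the DOM and Node stream typings.
  const body = Readable.fromWeb(
    response.body as unknown as Parameters<typeof Readable.fromWeb>[0],
  );

  // pipeline() resolves once the file is fully written and rejects on any stream error.
  await pipeline(body, createWriteStream(targetFile));
}
```

Inside the handler this could be called as await saveUrlToFile(request.url, 'filename.pdf').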
Casper
CasperOP4y ago
Thanks, I tried your suggestion, but unfortunately it won't compile. I fixed it with this code:
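import * as Fs from "fs";
import * as Https from "https";

async function downloadFile(url: string, targetFile: string): Promise<void> {
    return await new Promise((resolve, reject) => {
        Https.get(url, (response) => {
            const code = response.statusCode ?? 0;

            if (code >= 400) {
                response.resume(); // discard the body so the socket is released
                return reject(new Error(response.statusMessage));
            }

            // handle redirects: settle this promise with the result of the
            // redirected download, otherwise the caller would wait forever
            if (code > 300 && code < 400 && !!response.headers.location) {
                response.resume();
                return resolve(downloadFile(response.headers.location, targetFile));
            }

            // save the file to disk
            const fileWriter = Fs.createWriteStream(targetFile)
                .on("finish", () => resolve())
                .on("error", reject);

            response.pipe(fileWriter);
        }).on("error", (error: Error) => {
            reject(error);
        });
    });
}

// link is the URL of the PDF to download
await downloadFile(link, "file.pdf");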
Alexey Udovydchenko
Don't forget to add additionalMimeTypes to the crawler options; then you can handle files with CheerioCrawler.
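A rough sketch of what that suggestion might look like, assuming Crawlee v3 with the crawlee package installed. The URL and the output file name are placeholders, and saving the body with writeFile is one possible choice rather than something prescribed in this thread.

```typescript
import { writeFile } from "node:fs/promises";
import { CheerioCrawler } from "crawlee";

const crawler = new CheerioCrawler({
  // By default only HTML-like content types are accepted;
  // listing application/pdf here lets PDF responses reach the handler.
  additionalMimeTypes: ["application/pdf"],

  async requestHandler({ request, body, contentType }) {
    if (contentType.type === "application/pdf") {
      // For non-text content types, body is expected to be a Buffer.
      await writeFile("file.pdf", body);
      console.log(`Saved ${request.url}`);
    }
  },
});

// Placeholder URL, replace with the real PDF link.
await crawler.run(["https://example.com/file.pdf"]);
```

This keeps the download inside the crawler, so its retry and session handling also cover the file request.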
Casper
CasperOP4y ago
Thanks
