robust-apricot · 3y ago

Download PDF file from URL?

Does someone know of a simple npm library to download files from a URL in Javascript/TypeScript?
7 Replies
ambitious-aqua · 3y ago
I actually want to know this as well. It seems to give me an error saying that only certain formats are supported and that application/pdf is not one of them.
robust-apricot (OP) · 3y ago
I tried using axios, http.get, and clicking on the download button. Nothing works.
robust-apricot (OP) · 3y ago
Basic crawler | Crawlee
This is the most bare-bones example of using Crawlee, which demonstrates some of its building blocks such as the BasicCrawler. You probably don't need to go this deep though, and it would be better to start with one of the full-featured crawlers
ambitious-aqua · 3y ago
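import fs from 'fs';
// Assumption: fetch comes from node-fetch, whose response.body is a Node.js stream.
// The built-in fetch in Node 18+ returns a web stream that has no .pipe().
import fetch from 'node-fetch';

router.addDefaultHandler(async ({ request }) => {
    const file = fs.createWriteStream('filename.pdf');
    const response = await fetch(request.url);
    response.body.pipe(file);
});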
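A note on the handler above: response.body.pipe() exists on node-fetch responses, but the fetch built into Node 18+ returns a web ReadableStream, which has no .pipe(). The following is a minimal sketch of the same idea for the built-in fetch, assuming Node 18 or newer. The helper name downloadWithFetch is hypothetical and does not come from this thread or from Crawlee.

```typescript
import { createWriteStream } from 'node:fs';
import { Readable } from 'node:stream';
import { pipeline } from 'node:stream/promises';
import type { ReadableStream as NodeWebStream } from 'node:stream/web';

// Hypothetical helper (not from the thread). Assumes Node 18+ with the built-in fetch.
async function downloadWithFetch(url: string, targetFile: string): Promise<void> {
    // The built-in fetch follows redirects by default.
    const response = await fetch(url);
    if (!response.ok || response.body === null) {
        throw new Error(`Download failed: ${response.status} ${response.statusText}`);
    }
    // Convert the web stream into a Node.js stream so it can be written to disk.
    const source = Readable.fromWeb(response.body as unknown as NodeWebStream);
    // pipeline() resolves once the file is fully written and rejects on any stream error.
    await pipeline(source, createWriteStream(targetFile));
}
```

Inside a request handler this could be called as await downloadWithFetch(request.url, 'filename.pdf'). Awaiting the pipeline matters here: without it the handler may return before the file has been flushed to disk.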
robust-apricot (OP) · 3y ago
Thanks, I tried your suggestion, but unfortunately it won't compile. I fixed it with this code:
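import * as Fs from "node:fs";
import * as Https from "node:https";

async function downloadFile(url: string, targetFile: string): Promise<void> {
    return await new Promise<void>((resolve, reject) => {
        Https.get(url, (response) => {
            const code = response.statusCode ?? 0;

            if (code >= 400) {
                response.resume(); // discard the body so the socket is released
                return reject(new Error(response.statusMessage));
            }

            // handle redirects: resolve with the nested download so the outer
            // promise settles, and resolve relative Location headers against url
            if (code > 300 && code < 400 && !!response.headers.location) {
                response.resume();
                const next = new URL(response.headers.location, url).toString();
                return resolve(downloadFile(next, targetFile));
            }

            // save the file to disk
            const fileWriter = Fs.createWriteStream(targetFile)
                .on("finish", () => resolve())
                .on("error", reject);

            response.pipe(fileWriter);
        }).on("error", (error: Error) => {
            reject(error);
        });
    });
}

await downloadFile(link, "file.pdf");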
Alexey Udovydchenko
Do not forget to add additionalMimeTypes to the crawler options; then you can handle files with the Cheerio crawler.
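As a rough sketch of that suggestion, assuming Crawlee's CheerioCrawler and its additionalMimeTypes option: once application/pdf is allowed, the request handler receives the raw response in body and can write it to disk. The start URL and the output file name below are placeholders, not values from this thread.

```typescript
import { writeFile } from 'node:fs/promises';
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    // Allow PDF responses in addition to the HTML/XML types accepted by default.
    additionalMimeTypes: ['application/pdf'],
    async requestHandler({ request, body, contentType }) {
        if (contentType.type === 'application/pdf') {
            // Placeholder file name; in practice derive it from request.url.
            await writeFile('file.pdf', body);
            return;
        }
        // Otherwise treat the response as a normal HTML page.
        console.log(`Loaded HTML page ${request.url}`);
    },
});

// Placeholder URL; replace it with the page or PDF link you are crawling.
await crawler.run(['https://example.com/file.pdf']);
```

This keeps the download inside the crawler, so it goes through the same request queue, retries, and proxy settings as the rest of the crawl.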
robust-apricot (OP) · 3y ago
Thanks
