rival-black (12mo ago)

Save a webpage to a PDF file using Actor.setValue()

Hi, I'm new to PuppeteerCrawler. I'm trying to create a simple script to save a webpage as a PDF. For this purpose, I created a new Actor from the Crawlee - Puppeteer - TypeScript template in Apify. This is my main.ts code:
import { Actor } from 'apify';
import { PuppeteerCrawler, Request } from 'crawlee';

await Actor.init();

interface Input {
    urls: Request[];
}

const { urls = ['https://www.google.com/'] } = await Actor.getInput<Input>() ?? {};

const crawler = new PuppeteerCrawler({
    async requestHandler({ page }) {
        const pdfFileName = 'testFile';
        const pdfBuffer = await page.pdf({ format: 'A4', printBackground: true });

        console.log('pdfFileName: ', pdfFileName);
        console.log('pdfBuffer: ', pdfBuffer);

        await Actor.setValue(pdfFileName, pdfBuffer, { contentType: 'application/pdf' });
    },
});

await crawler.addRequests(urls);
await crawler.run();

await Actor.exit();
It seems that Actor.setValue doesn't want to consume the sent PDF buffer. What am I doing wrong? Thanks
lemurio (12mo ago)
Hey, the pdf() method returns a Uint8Array, so you need to convert it to a Buffer before passing it to Actor.setValue(). Try this:
const pdfFileName = 'testFile';
const pdf = await page.pdf({ format: 'A4', printBackground: true });
const pdfBuffer = Buffer.from(pdf);

console.log('pdfFileName: ', pdfFileName);
console.log('pdfBuffer: ', pdfBuffer);

await Actor.setValue(pdfFileName, pdfBuffer, { contentType: 'application/pdf' });
Page.pdf() method | Puppeteer
Generates a PDF of the page with the print CSS media type.
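
If you want to see the difference between the two types without launching a browser, here is a minimal sketch. It assumes only Node.js; toNodeBuffer is a hypothetical helper name (not part of Puppeteer or the Apify SDK), and fakePdf is a stand-in for the bytes that page.pdf() would return:

```typescript
import { Buffer } from 'node:buffer';

// Hypothetical helper (not part of Puppeteer or the Apify SDK):
// returns a Node.js Buffer for any Uint8Array input.
function toNodeBuffer(data: Uint8Array): Buffer {
    // Buffer is a subclass of Uint8Array, so an existing Buffer passes through as-is.
    if (Buffer.isBuffer(data)) return data;
    // Buffer.from(typedArray) copies the bytes into a new Buffer.
    return Buffer.from(data);
}

// Stand-in for the output of page.pdf(): the ASCII bytes of "%PDF-".
const fakePdf = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d]);
const pdfBuffer = toNodeBuffer(fakePdf);

console.log(Buffer.isBuffer(fakePdf));      // false
console.log(Buffer.isBuffer(pdfBuffer));    // true
console.log(pdfBuffer.toString('latin1')); // %PDF-
```

Inside the request handler, the same idea would be used as toNodeBuffer(await page.pdf({ format: 'A4', printBackground: true })), with the result passed to Actor.setValue() as in the snippet above.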
rival-black (OP, 12mo ago)
Great, that works. Thanks
