wispy-olive
Apify & Crawlee · 2y ago

How to cron with Playwright

A few days ago I was trying to run my scraper on a cron schedule, but on the second run the crawler hit 0 URLs. I figured out how to fix this so that every re-run hits all the URLs.

I created a file called start.ts at the same level as main.ts.

Then I changed the scripts in package.json so that they start that file, like this:

"scripts": {
    "start": "npm run start:dev",
    "start:prod": "node dist/start.js",
    "start:dev": "tsx src/start.ts",
    "build": "tsc",
    "test": "echo \"Error: oops, the actor has no tests yet, sad!\" && exit 1",
    "postinstall": "npx crawlee install-playwright-browsers"
},

And in start.ts I put this code:
import cron from 'node-cron';
import { exec } from 'child_process';

// Every 2 minutes, run the compiled crawler in a fresh child process.
cron.schedule('*/2 * * * *', () => {
  exec('node dist/main.js', (error, stdout, stderr) => {
    if (error) {
      console.error(`Error running the script: ${error}`);
      return;
    }
    // stderr may contain only warnings, so log it without discarding stdout.
    if (stderr) {
      console.error(`Script stderr: ${stderr}`);
    }
    console.log(`Script output: ${stdout}`);
  });
});

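With this code the cron job manages the runs, and every execution processes all the URLs, avoiding the error where the crawler reaches 0 URLs. This presumably works because each run starts in a fresh Node process, so no crawler state from the previous run is reused.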
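
One thing the schedule above does not cover is a crawl that takes longer than the 2-minute interval, in which case two runs could overlap. Below is a minimal sketch of one way to guard against that, assuming a single scheduler process. The names runOnce and runIfIdle are hypothetical helpers, not part of node-cron or Crawlee, and execFile is used instead of exec only so the command and its arguments are passed separately.

```typescript
import { execFile } from 'node:child_process';

// Hypothetical helper (not from the original post): runs a command in a
// fresh child process and resolves with its captured output.
export function runOnce(
  command: string,
  args: string[],
): Promise<{ stdout: string; stderr: string }> {
  return new Promise((resolve, reject) => {
    execFile(command, args, (error, stdout, stderr) => {
      if (error) {
        reject(error); // spawn failure or non-zero exit code
        return;
      }
      resolve({ stdout, stderr });
    });
  });
}

let running = false; // true while a run started by this process is active

// Hypothetical guard: skips the tick if the previous run has not finished.
// Resolves to true if a run was started, false if the tick was skipped.
export async function runIfIdle(
  command: string,
  args: string[],
): Promise<boolean> {
  if (running) return false;
  running = true;
  try {
    const { stdout, stderr } = await runOnce(command, args);
    if (stderr) console.error(`Script stderr: ${stderr}`);
    console.log(`Script output: ${stdout}`);
  } catch (error) {
    console.error(`Error running the script: ${error}`);
  } finally {
    running = false; // allow the next tick to start a run
  }
  return true;
}
```

Inside the cron.schedule callback in start.ts, the exec call could then be swapped for runIfIdle('node', ['dist/main.js']), so a tick that fires while a crawl is still running is simply skipped.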