other-emerald•2y ago

How to cron with Playwright

A few days ago I was trying to add a cron job to my scraper, but on the second run the crawler hit 0 URLs. I figured out how to make every re-run hit all the URLs: I created a file called start.ts at the same level as main.ts, then changed the scripts in package.json to start that file, like this:

"scripts": {
        "start": "npm run start:dev",
        "start:prod": "node dist/start.js",
        "start:dev": "tsx src/start.ts",
        "build": "tsc",
        "test": "echo \"Error: oops, the actor has no tests yet, sad!\" && exit 1",
        "postinstall": "npx crawlee install-playwright-browsers"
    },

And in start.ts I put this code:

import cron from 'node-cron';
import { exec } from 'child_process';

// Every 2 minutes, run the compiled crawler in a separate Node process.
cron.schedule('*/2 * * * *', () => {
  exec('node dist/main.js', (error, stdout, stderr) => {
    // The process could not be started or exited with a non-zero code.
    if (error) {
      console.error(`Error running the script: ${error}`);
      return;
    }
    // The script wrote something to stderr.
    if (stderr) {
      console.error(`Error in the script: ${stderr}`);
      return;
    }
    console.log(`Script output: ${stdout}`);
  });
});

With this code the cron is managed from start.ts and each scheduled run starts main.js in a new Node process, so every execution goes through all the URLs and avoids the error where the crawler reaches 0 URLs.

1 Reply

Alexey Udovydchenko•2y ago

The expected usage is apify run -p, so that the SDK client deletes the existing dataset and request queue. Otherwise the run is a restart over already crawled requests, and unless you manage that with some logic of your own, the requests are considered already resolved and the handler function is skipped.
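
To make that concrete, here is a rough sketch of the deduplication idea. It is a toy model only: ToyRequestQueue, byUrl, byRun and the example.com URLs are made up for illustration and are not the Crawlee or Apify SDK API. It just shows why a second run over the same queue handles 0 requests, and how a purge (or a per-run unique key, as one possible "some logic") changes that.

```typescript
// Toy model only: NOT the real request queue, just the deduplication idea.
type QueuedRequest = { url: string; uniqueKey: string };

class ToyRequestQueue {
    // Keys of requests that were already handled; kept between runs until purged.
    private handled = new Set<string>();

    // Calls the handler for every request not handled before; returns how many ran.
    run(requests: QueuedRequest[], handler: (req: QueuedRequest) => void): number {
        let count = 0;
        for (const req of requests) {
            if (this.handled.has(req.uniqueKey)) continue; // considered already resolved
            handler(req);
            this.handled.add(req.uniqueKey);
            count++;
        }
        return count;
    }

    // Stand-in for deleting the stored request queue before a run.
    purge(): void {
        this.handled.clear();
    }
}

// Hypothetical start URLs.
const urls = ['https://example.com/a', 'https://example.com/b'];

// Same key on every run (the URL itself).
const byUrl = (): QueuedRequest[] => urls.map((url) => ({ url, uniqueKey: url }));

// Different key on every run (assumed workaround: prefix the key with a run id).
const byRun = (runId: string): QueuedRequest[] =>
    urls.map((url) => ({ url, uniqueKey: `${runId}:${url}` }));

const queue = new ToyRequestQueue();
const noop = (): void => {};

const firstRun = queue.run(byUrl(), noop); // all URLs are new
const secondRun = queue.run(byUrl(), noop); // all URLs already resolved, handler skipped
queue.purge();
const afterPurge = queue.run(byUrl(), noop); // queue was emptied, all URLs run again
const keyedRun = queue.run(byRun('run-2'), noop); // new keys, so nothing is skipped

console.log({ firstRun, secondRun, afterPurge, keyedRun });
```

In this model the second run is the "0 url" case from the question, and the purge plays the role of the -p flag.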
