multiple-amethyst•2y ago

Batch PDF text extraction

Hello, I'm new to Apify and tried your Website Content Crawler, which worked great. In the process I downloaded several PDFs, which are now stored in Apify storage. I can extract the text manually with the PDF Text Extractor by giving it each PDF's key-value store link, but that is not efficient for many PDFs. If I instead provide a dataset link or a key-value store link covering all the PDFs, the PDF Text Extractor reports an invalid file format. Is there a way to batch process all these PDFs? Thank you very much 🙂
Saurav Jain•2y ago
We will get back to you soon!
Oleg V.•2y ago
The input can be an array of URLs, so you can process multiple PDFs in a single run. See the input section of the readme: https://console.apify.com/actors/QbKEOrw6PkLcy4Xms/information/latest/readme#input

However, there's no direct way to retrieve all values from a key-value store at once. You can either use the Apify API: https://docs.apify.com/api/v2/#/reference/key-value-stores

or access the key-value store from your code:
https://docs.apify.com/sdk/js/reference/class/Actor#openKeyValueStore
https://docs.apify.com/sdk/js/docs/next/guides/result-storage#key-value-store

Simply loop over the keys and pass the resulting record URLs to the extractor as an array of URLs.
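For illustration, here is a minimal sketch of that key loop in TypeScript using the Apify API v2 directly. It assumes Node.js 18+ (for the global `fetch`), your own store ID and API token, and that the PDFs are stored under keys ending in `.pdf`. That key naming is an assumption, so pass `() => true` as the filter to keep every key. The helper names `recordUrl` and `listRecordUrls` are hypothetical. The sketch pages through the "Get list of keys" endpoint, following `nextExclusiveStartKey` while the listing is truncated, and turns each key into a "Get record" URL.

```typescript
// Sketch: collect record URLs for the keys of an Apify key-value store
// through the Apify API v2. Assumes Node.js 18+ (global fetch).

const API_BASE = "https://api.apify.com/v2";

// Relevant part of the "Get list of keys" response body.
interface KeysPage {
  data: {
    items: { key: string; size: number }[];
    isTruncated: boolean;
    nextExclusiveStartKey: string | null;
  };
}

// Minimal fetch-compatible signature, so a stub can be injected for testing.
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical helper: URL of one record ("Get record" endpoint).
function recordUrl(storeId: string, key: string): string {
  return `${API_BASE}/key-value-stores/${encodeURIComponent(storeId)}/records/${encodeURIComponent(key)}`;
}

// Hypothetical helper: requests up to 1000 keys per call, follows
// nextExclusiveStartKey while the listing is truncated, and returns
// record URLs for the keys that pass keyFilter.
// ASSUMPTION: PDFs are stored under keys ending in ".pdf".
async function listRecordUrls(
  storeId: string,
  token: string,
  fetchFn: FetchLike = fetch,
  keyFilter: (key: string) => boolean = (key) => key.toLowerCase().endsWith(".pdf"),
): Promise<string[]> {
  const urls: string[] = [];
  let startKey: string | null = null;
  do {
    const params = new URLSearchParams({ limit: "1000" });
    if (startKey !== null) params.set("exclusiveStartKey", startKey);
    const res = await fetchFn(
      `${API_BASE}/key-value-stores/${encodeURIComponent(storeId)}/keys?${params}`,
      { headers: { Authorization: `Bearer ${token}` } },
    );
    if (!res.ok) throw new Error(`Listing keys failed with HTTP ${res.status}`);
    const page = (await res.json()) as KeysPage;
    for (const { key } of page.data.items) {
      if (keyFilter(key)) urls.push(recordUrl(storeId, key));
    }
    // Stop when the API reports no further pages.
    startKey = page.data.isTruncated ? page.data.nextExclusiveStartKey : null;
  } while (startKey !== null);
  return urls;
}
```

Usage would look like `listRecordUrls("<STORE_ID>", "<APIFY_TOKEN>").then((urls) => console.log(urls))`. You could then supply the resulting array as the PDF Text Extractor's URL input. Check the exact field name in its input schema (linked above) rather than relying on this sketch. If the store is not readable without a token, the extractor may not be able to download the records, and this sketch does not handle that. The SDK route (`Actor.openKeyValueStore()`) would follow the same loop-over-keys pattern.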
