Apify Discord Mirror

Updated 5 months ago

Pagination works locally in Crawlee, but in the same actor on Apify the pagination does not work correctly

At a glance

The community member has implemented pagination to scrape data from multiple pages, but the actor running on Apify.com does not start from the intended page 2 and does not finish at page 5 as expected. The community members suggest various troubleshooting steps, such as verifying the latest commit, checking the actor input, and ensuring the state data does not interfere with the scraping. They also discuss potential issues with parsing the page number input and using global variables. Finally, the community member states that the issue was a bug in the implementation of the query params, and they have now solved the problem.

I have implemented pagination that can start from, e.g., page 2 and end at page 5 (inclusive), scraping all the data from each page. It works correctly on my local machine. I have pushed the newest working code (newest commit id) to GitHub and from there to Apify via webhook. However, when I run the actor on Apify.com, it starts at the first page instead of page 2 and does not finish at page 5. Any suggestions on what might be wrong?
17 comments
You might try to use:
npx apify-cli login
npx apify-cli push

from your local repo and then test that build on the Apify platform; that way you can at least make sure your latest version is built.
Or it might be an actor input issue, if you made it configurable.
I have verified that the commit id in the latest build I run the scraper with is the latest commit in the GitHub master branch, so I would not suspect this to be the issue, and I can also see that the latest change (test logging) was in the latest run as well. But I guess it never hurts to try it out. Input also seems to work; however, locally it is a string and on Apify it is a number, though this does not explain why the pagination still just cuts off at page 4?
Hmm, I can't come up with more ideas without seeing at least something :) I once had an issue where a selector was missing when I ran the crawler on the Apify platform, although it worked perfectly locally. I can't recall what was causing that.
Copy the input from the Apify cloud run to the localhost kvstore and see if the issue is related to input parsing.
Also make sure to run locally as apify run -p, otherwise state data from earlier runs might interfere with scraping and cause side effects.
yeah, that could be an issue. However, I change the url to navigate to the page I want, and once I have started on the intended page I find the next page button link
thanks I will look into that
in my actor I am specifying the input param startFromPageNumber as a string and parse it to an int locally, would this fail on apify.com?
that would explain some issues then
or could using global variable cause problems?
I can see that the start url, which uses "url.searchParams.append("page", startFromPageNumber.toString())", does not add the page number to the query
I suspect there is an issue with parsing this: "const startFromPageNumber: number = input.startFromPageNumber;"
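A note on the line above: a TypeScript annotation such as ": number" is erased at compile time and does not convert anything at runtime, so a value that arrives as a string locally and as a number on the platform keeps whatever type it came in with. A minimal sketch of normalizing it, assuming a hypothetical helper named toPageNumber (not part of Crawlee or the Apify SDK) and a fallback of page 1:

```typescript
// Hypothetical helper: accepts an input value that may arrive as a string
// (e.g. a local INPUT.json) or as a number (e.g. the platform input form)
// and returns a positive integer page number.
function toPageNumber(value: unknown, fallback = 1): number {
    // Strings are converted explicitly; everything else is checked as-is.
    const parsed = typeof value === "string" ? Number(value.trim()) : value;
    // Anything that is not an integer >= 1 (NaN, 2.5, 0, undefined) falls back.
    if (typeof parsed === "number" && Number.isInteger(parsed) && parsed >= 1) {
        return parsed;
    }
    return fallback;
}
```

With this, toPageNumber(input.startFromPageNumber) would behave the same for "2" and 2, and an empty or missing value would start at page 1 instead of producing NaN.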
it also seems that, even though I push the actor using "npx apify-cli push", the actor does not get updated, because I cannot see the console.log messages I have added locally
I have a trigger to execute new build when I add code to github main branch
I have solved the issue now. It was a bug in my implementation of query params. It would be really nice if such functionality would be added to crawlee and apify in the future to reduce the risk of bugs 🙂
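The thread does not say what the query params bug actually was. As a rough sketch of one way to build the start url so that the page number always ends up in the query, assuming a hypothetical helper named buildPageUrl and a query parameter called "page" as in the snippet quoted earlier:

```typescript
// Hypothetical helper: returns the listing url for a given page number.
function buildPageUrl(baseUrl: string, page: number): string {
    const url = new URL(baseUrl);
    // set() replaces an existing "page" value; append() would add a second
    // "page" entry if the base url already contained one.
    url.searchParams.set("page", String(page));
    // The mutated URL object must be serialized again; the original
    // baseUrl string is not changed by searchParams calls.
    return url.toString();
}
```

The two points worth checking in an implementation like this are that the url passed to the crawler is the serialized URL object rather than the original string, and that an existing "page" value is replaced rather than duplicated.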