other-emerald · 11mo ago

Simple POST example

There seems to be a flaw in the tutorial on basic POST functionality: https://crawlee.dev/python/docs/examples/fill-and-submit-web-form. The crawler makes an actual POST request, but the data never reaches the server; I have tried various endpoints. Two questions: 1) What is broken here, and how do I fix it? 2) My biggest concern with Crawlee is that I have no clue how to troubleshoot this kind of bug. Where can one check what goes wrong? For example, how do I check under the hood whether the library that makes the actual request is populating the payload correctly? The framework has many benefits, but with all the abstractions it is very hard to troubleshoot. This is probably my mistake and inexperience with the framework, but any guidance on troubleshooting would be great, because simple things failing without any way to debug them makes using Crawlee quite cumbersome.
import asyncio
import json

from crawlee import Request
from crawlee.http_crawler import HttpCrawler, HttpCrawlingContext


async def main() -> None:
    crawler = HttpCrawler()

    # Define the default request handler, which will be called for every request.
    @crawler.router.default_handler
    async def request_handler(context: HttpCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url} ...')
        response = context.http_response.read().decode('utf-8')
        context.log.info(f'Response: {response}')  # To see the response in the logs.

    # Prepare a POST request to the form endpoint.
    request = Request.from_url(
        url='https://httpbin.org/post',
        method='POST',
        payload=json.dumps(
            {
                'custname': 'John Doe',
            }
        ).encode(),
    )

    # Run the crawler with the initial list of requests.
    await crawler.run([request])


if __name__ == '__main__':
    asyncio.run(main())
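On the troubleshooting question: one way to check whether an HTTP client really transmits the request body, without relying on a remote endpoint, is to stand up a tiny local echo server and POST to it. The sketch below uses only the Python standard library; the server mimics httpbin.org/post by echoing the raw body back in a `data` field, and the helper names (`EchoHandler`, `post_and_echo`) are mine, not part of Crawlee.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request as UrlRequest, urlopen


class EchoHandler(BaseHTTPRequestHandler):
    """Echo the raw POST body back as JSON, similar to httpbin.org/post."""

    def do_POST(self):
        body = self.rfile.read(int(self.headers.get('Content-Length', 0)))
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.end_headers()
        self.wfile.write(json.dumps({'data': body.decode('utf-8')}).encode())

    def log_message(self, *args):  # silence per-request logging
        pass


def post_and_echo(payload: bytes) -> str:
    """POST the payload to a throwaway local server and return what it received."""
    server = HTTPServer(('127.0.0.1', 0), EchoHandler)  # port 0 = pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f'http://127.0.0.1:{server.server_port}/post'
        req = UrlRequest(url, data=payload, method='POST')
        with urlopen(req) as resp:
            return json.loads(resp.read())['data']
    finally:
        server.shutdown()


if __name__ == '__main__':
    sent = json.dumps({'custname': 'John Doe'}).encode()
    print(post_and_echo(sent))
```

Pointing the Crawlee snippet at such a local URL instead of httpbin.org makes it easy to see whether the body arrives empty, which isolates a framework bug from an endpoint or network problem.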
Mantisus · 11mo ago
Hi @crawleexl, you may notice that what the tutorial sets as `payload` is not actually forwarded to the HTTP client as its `data`. Right now `payload` is effectively useless, since it is never passed to the HTTP client. There are known problems with POST requests; this should be fixed in the next release by this fix: https://github.com/apify/crawlee-python/pull/542.
optimistic-gold · 11mo ago
Hi @Mantisus, I'm having the same issue trying to send data via the payload in the same way. When will the Crawlee team release version 0.4.0? That version seems to work in my tests, and I'm trying to resolve this payload issue in my project.
Mantisus · 11mo ago
I'm not an Apify employee, so I don't know when the next release will be.
Oleg V. · 10mo ago
Version 0.4.3 was released recently, so the issue should be fixed.
