Apify Discord Mirror

Updated 5 months ago

POST request with JSON data to get cookies, then use those cookies to scrape further URLs

At a glance

The community member has a website that responds differently based on the IP address's location. They want to scrape URLs once they have the necessary cookies, which are obtained by calling an endpoint. Two approaches are discussed: 1) make a single request to the endpoint, parse the set-cookie header, and set the cookie header on subsequent requests; 2) call the endpoint in the createSessionFunction of the session pool and set the cookies there. Sample code is provided for the first approach, but the original poster is concerned about doubling the number of requests. They are currently trying to implement the second approach via createSessionFunction, but cannot get hold of the response needed to set the cookies.

Useful resources
Hello all, I have a special situation: the website's response depends on the location of the IP address, but there is a possibility to change the address. The way it works is by calling an endpoint which returns the cookies. I want to scrape the URLs once I have the cookies. How can I do that with Crawlee? And how will those cookies be managed with sessions? It's a bit complicated to explain, but I hope you get the idea of what I want. Thank you for reading this long post.
6 comments
Hey, the simplest solution would be to do one request to this endpoint, parse the set-cookie header, and set the cookie header while enqueueing other requests.

An alternative option would be to call the endpoint in the createSessionFunction of the session pool and set cookies there.
Thank you, I will try it out. But I also need to send JSON data to get the cookies. Would you be kind enough to give me some sample code if you have any? Thanks a lot for your help.
Something like this could work:
JSON POST requests are the same as GET requests; you just have to specify the payload and method:
Plain Text
const request = {
  url: 'https://example.com',
  label: 'cookies', // routes the request to the 'cookies' handler below
  method: 'POST',
  headers: { 'content-type': 'application/json' },
  payload: JSON.stringify({ foo: 'bar' }),
};

then in the handler of this request, you can access the response set-cookie headers:
Plain Text
router.addHandler('cookies', async ({ crawler, response }) => {
    const { headers } = response;

    // parse necessary cookies from headers['set-cookie']
    // ...

    // enqueue new requests with parsed cookies
    const request = {
      url: 'https://example.com',
      headers: {
        cookie: parsedCookie,
      },
    };
    await crawler.requestQueue.addRequest(request);
});
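The handler above leaves the parsing step as a comment. A minimal, dependency-free way to fold an array of set-cookie headers into a single cookie header value (discarding attributes such as Path and HttpOnly) could look like this; the helper name and sample cookies are illustrative:

```javascript
// Turn an array of set-cookie headers into a "cookie" request-header value.
// Each set-cookie entry looks like "name=value; Path=/; HttpOnly"; only the
// leading "name=value" pair is needed for the outgoing cookie header.
const buildCookieHeader = (setCookieHeaders = []) =>
    setCookieHeaders
        .map((header) => header.split(';')[0].trim())
        .join('; ');

const parsedCookie = buildCookieHeader([
    'session_id=abc123; Path=/; HttpOnly',
    'region=eu-west; Max-Age=3600',
]);
// parsedCookie === 'session_id=abc123; region=eu-west'
```

Note this deliberately ignores cookie attributes, which is fine for replaying cookies in your own requests but not for honoring expiry or scoping rules.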
Thank you very much, but by doing that I would double the requests. Cookies won't change for each request, so a one-time request for the cookies is fine. I am currently trying to do it via createSessionFunction, but the docs are not helping that much. Do you have some guidance for that?
You may look at my code in the chat. Thank you very much.
This is how I am trying to do it:
Plain Text
createSessionFunction: async (sessionPool, options) => {
    const new_session = new Session({
        ...options,
        sessionPool,
    });

    new_session.setCookiesFromResponse({}); // <- How to get the response here to set the cookies?

    return new_session;
}