wet-aqua • Apify & Crawlee • 4y ago

Extract URL from HTML code

Hello all,

I would like to extract a URL from HTML code with the Apify scraper.

Here is the HTML code. The URL I want to extract is the link's href:

<a class="app-aware-link profile-rail-card__profile-link t-16 t-black t-bold tap-target" href="https://www.linkedin.com/in/benjaminejzenberg?miniProfileUrn=urn%3Ali%3Afs_miniProfile%3AACoAAAj58zYBTN8loEzvrFJhh-16iFZ8gnfPSGU" data-test-app-aware-link="">
<div class="single-line-truncate t-16 t-black t-bold mt2">
Voir le profil complet
</div>
</a>


Here is my input (the page function):

async function pageFunction(context) {
    // context.jQuery is available when "Inject jQuery" is enabled in the scraper input.
    const $ = context.jQuery;
    const pageTitle = $('title').first().text();
    const h1 = $('h1').first().text();
    const first_h2 = $('h2').first().text();
    const random_text_from_the_page = $('p').first().text();
    // Select the link by one of its own classes and read its href attribute.
    // .text() would only return the visible text ("Voir le profil complet"),
    // and the old selector ended in "a ... a[href]" (a link nested in a link), which never matches.
    const author_profile_link = $('a.profile-rail-card__profile-link').first().attr('href');

    // Template literal: the string must be wrapped in backticks.
    context.log.info(`URL: ${context.request.url}, TITLE: ${pageTitle}`);

    await context.enqueueRequest({ url: 'http://www.example.com' });

    return {
        url: context.request.url,
        pageTitle,
        h1,
        first_h2,
        random_text_from_the_page,
        author_profile_link,
    };
}
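For reference: in jQuery, .text() returns an element's visible text (for the link above, "Voir le profil complet"), whereas .attr('href') returns the link target itself. The href in the HTML above also carries a miniProfileUrn query parameter. A minimal sketch, assuming only the bare profile URL is wanted, strips it with the standard URL class. The helper name toProfileUrl and the base URL used for relative links are assumptions for illustration, not part of the Apify API.

```javascript
// Hypothetical helper: turn the raw href of the profile link into a bare
// profile URL by dropping the query string (e.g. miniProfileUrn) and any hash.
function toProfileUrl(href, base = 'https://www.linkedin.com') {
  // .attr('href') returns undefined when the selector matched nothing.
  if (!href) return null;
  // Relative hrefs are resolved against the assumed base; absolute ones ignore it.
  const url = new URL(href, base);
  url.search = '';
  url.hash = '';
  return url.toString();
}

// Inside pageFunction the input would come from something like:
//   const href = $('a.profile-rail-card__profile-link').first().attr('href');
const href =
  'https://www.linkedin.com/in/benjaminejzenberg?miniProfileUrn=urn%3Ali%3Afs_miniProfile%3AACoAAAj58zYBTN8loEzvrFJhh-16iFZ8gnfPSGU';
console.log(toProfileUrl(href));
// → https://www.linkedin.com/in/benjaminejzenberg
```

Returning null for a missing href keeps an unmatched selector from silently producing a bogus URL such as https://www.linkedin.com/undefined.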

Thanks for your help 🙂