foreign-sapphire
foreign-sapphire•17mo ago

Hi I am using Smart Article Extractor

Hi, I am using the Smart Article Extractor actor to extract information as JSON from an article URL. The actor runs flawlessly in the Apify Console, but when I call it from Postman I get a 201 status with no response body. How can I get the response in Postman? Please help.
14 Replies
HonzaS
HonzaS•17mo ago
I guess you got defaultDatasetId in the response, so you can make another request to get the data from that dataset.
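The follow-up request suggested here could be built roughly as follows. This is a minimal sketch, assuming you already hold a run object that carries `defaultDatasetId` under a `data` key and that dataset items are served from `/v2/datasets/{id}/items`. The helper name `dataset_items_url`, the sample response, and the placeholder IDs are made up for illustration.

```python
from urllib.parse import quote, urlencode

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(run_response: dict, token: str) -> str:
    """Build the URL for fetching items of the run's default dataset."""
    # Assumption: the run object is wrapped in a top-level "data" key.
    dataset_id = run_response["data"]["defaultDatasetId"]
    query = urlencode({"token": token, "format": "json"})
    return f"{API_BASE}/datasets/{quote(dataset_id, safe='')}/items?{query}"


# Hypothetical run response; real IDs come from the API.
sample_response = {
    "data": {
        "id": "RUN_ID_PLACEHOLDER",
        "defaultDatasetId": "DATASET_ID_PLACEHOLDER",
    }
}

url = dataset_items_url(sample_response, "MY_TOKEN")
print(url)
```

Sending a GET request to the printed URL (in Postman or elsewhere) would then return the stored items, if the run produced any.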
metropolitan-bronze
metropolitan-bronze•17mo ago
{
"articleUrls": [
{
"url": "https://www.livemint.com/science/serum-institute-faces-lawsuit-over-covishield-side-effects-deaths-from-blood-clots-says-father-of-deceased-girl-11714617925438.html"
}
],
"crawlWholeSubdomain": false,
"enqueueFromArticles": false,
"extendOutputFunction": "($) => {\n const result = {};\n // Uncomment to add a title to the output\n // result.pageTitle = $('title').text().trim();\n\n return result;\n}",
"isUrlArticleDefinition": {
"minDashes": 4,
"hasDate": true,
"linkIncludes": [
"article",
"storyid",
"?p=",
"id=",
"/fpss/track",
".html",
"/content/"
]
},
"mustHaveDate": true,
"onlyInsideArticles": true,
"onlyNewArticles": false,
"onlyNewArticlesPerDomain": false,
"onlySubdomainArticles": false,
"proxyConfiguration": {
"useApifyProxy": true
},
"saveHtml": false,
"saveHtmlAsLink": false,
"saveSnapshots": false,
"scanSitemaps": false,
"scrollToBottom": false,
"useBrowser": false,
"useGoogleBotHeaders": false
}
This is my body, and the result should show up in Postman itself. Am I doing something wrong here?
This is my POST request URL: https://api.apify.com/v2/acts/lukaskrivka~article-extractor-smart/run-sync?token=*mytoken* @HonzaS
HonzaS
HonzaS•17mo ago
and what is the response?
metropolitan-bronze
metropolitan-bronze•17mo ago
nothing, blank on postman, but it works and fetches data as per my logs
HonzaS
HonzaS•17mo ago
it should not be blank I think, let me try
metropolitan-bronze
metropolitan-bronze•17mo ago
yes
metropolitan-bronze
metropolitan-bronze•17mo ago
Please do. The actor's ID is hy5TYiCBwQ9o8uRKG
metropolitan-bronze
metropolitan-bronze•17mo ago
that job that i ran
HonzaS
HonzaS•17mo ago
Can you see the log? I have this in the log of my run:
2024-05-03T13:12:09.418Z WARN No text found on article page: https://www.livemint.com/science/serum-institute-faces-lawsuit-over-covishield-side-effects-deaths-from-blood-clots-says-father-of-deceased-girl-11714617925438.html
2024-05-03T13:12:09.453Z WARN IS NOT VALID ARTICLE --- Reasons: [Article has no date], [Article has too few words: 1 (should be at least 150)]
So the request does return a response, but I think that run produced no data.
Oh, now I see you have results in the console. Do I have a different input, since I have no results? https://console.apify.com/view/runs/6p8nlrbV7oe6GtxwC
I have changed the input, and now it returns results both on the web and from the request.
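The two rejection reasons in that log can be illustrated with a small sketch. The actor's real validation code is not shown in this thread, so the function below is a hypothetical illustration only: the threshold of 150 words is taken from the log message, and the parameter names simply mirror the `minWords` and `mustHaveDate` input options that appear in the request bodies in this thread.

```python
def rejection_reasons(word_count, has_date, min_words=150, must_have_date=True):
    """Return the reasons a page would be rejected (hypothetical logic)."""
    reasons = []
    # A missing date only matters when the date is required.
    if must_have_date and not has_date:
        reasons.append("Article has no date")
    # The page needs at least `min_words` words of extracted text.
    if word_count < min_words:
        reasons.append(
            f"Article has too few words: {word_count} "
            f"(should be at least {min_words})"
        )
    return reasons


# Strict settings, as in the logged run: 1 word found, no date found.
strict = rejection_reasons(word_count=1, has_date=False)

# Relaxed settings: the date is optional and one word is enough.
relaxed = rejection_reasons(word_count=1, has_date=False,
                            min_words=1, must_have_date=False)

print(strict)
print(relaxed)
```

Under these assumptions, the strict call reports both reasons from the log, while the relaxed call reports none, which is consistent with the changed input returning results.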
metropolitan-bronze
metropolitan-bronze•17mo ago
What did you change? Could you send the curl, please, minus the token? I'll try running it.
Thanks in advance. Honza, you are a true saviour.
Hi
HonzaS
HonzaS•17mo ago
There is no curl export, but the URL is https://api.apify.com/v2/acts/lukaskrivka~article-extractor-smart/run-sync-get-dataset-items?token=<token> and the body is:
{
"articleUrls": [
{
"url": "https://www.livemint.com/science/serum-institute-faces-lawsuit-over-covishield-side-effects-deaths-from-blood-clots-says-father-of-deceased-girl-11714617925438.html"
}
],
"crawlWholeSubdomain": false,
"enqueueFromArticles": false,
"extendOutputFunction": "($) => {\n const result = {};\n // Uncomment to add a title to the output\n // result.pageTitle = $('title').text().trim();\n\n return result;\n}",
"isUrlArticleDefinition": {
"minDashes": 4,
"hasDate": true,
"linkIncludes": [
"article",
"storyid",
"?p=",
"id=",
"/fpss/track",
".html",
"/content/"
]
},
"mustHaveDate": false,
"onlyInsideArticles": true,
"onlyNewArticles": false,
"onlyNewArticlesPerDomain": false,
"onlySubdomainArticles": false,
"proxyConfiguration": {
"useApifyProxy": true
},
"saveHtml": false,
"saveHtmlAsLink": false,
"saveSnapshots": false,
"scanSitemaps": false,
"scrollToBottom": false,
"useBrowser": false,
"minWords":1,
"useGoogleBotHeaders": false
}
And the only header is Content-Type: application/json
I have set "minWords": 1 and "mustHaveDate": false.
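For anyone who wants the same request outside Postman, here is a rough Python equivalent using only the standard library. It is a sketch under stated assumptions: the input is trimmed to a few fields from the body above and relies on the actor's defaults for the rest, the helper name `build_request` and the example article URL are made up, and `MY_TOKEN` stands in for a real API token.

```python
import json
import urllib.request
from urllib.parse import urlencode

ACTOR = "lukaskrivka~article-extractor-smart"
ENDPOINT = f"https://api.apify.com/v2/acts/{ACTOR}/run-sync-get-dataset-items"


def build_request(token: str, article_url: str) -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    # Trimmed input (assumption: omitted fields fall back to actor defaults).
    actor_input = {
        "articleUrls": [{"url": article_url}],
        "mustHaveDate": False,  # changed from true in the original body
        "minWords": 1,          # added in the working body
        "proxyConfiguration": {"useApifyProxy": True},
        "useBrowser": False,
    }
    return urllib.request.Request(
        f"{ENDPOINT}?{urlencode({'token': token})}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request("MY_TOKEN", "https://example.com/some-article.html")
print(req.get_method(), req.full_url)

# To actually send it (needs a valid token and network access):
# with urllib.request.urlopen(req) as resp:
#     items = json.load(resp)
```

Building the request separately from sending it makes it easy to inspect the URL, header, and body before spending a run.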
