exotic-emerald•2y ago

Hi, I am using Smart Article Extractor

Hi, I am using the Smart Article Extractor actor to extract information as JSON from an article URL. The actor runs flawlessly on the Apify Console, but when I run it from Postman I get a 201 status and no response body. How can I get the response there? Please help.
HonzaS•2y ago
I guess you got defaultDatasetId in the response, so you can make another request to get the data from that dataset.
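A minimal sketch of that second request in Python, using only the standard library. It assumes the response you got is a run object shaped like {"data": {"defaultDatasetId": ...}} and uses Apify's dataset items endpoint (https://api.apify.com/v2/datasets/<datasetId>/items). The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(run_response: dict, token: str) -> str:
    """Build the URL that lists the items of a run's default dataset."""
    # Assumption: the run object is wrapped in a top-level "data" key.
    dataset_id = run_response["data"]["defaultDatasetId"]
    query = urllib.parse.urlencode({"token": token})
    return f"{API_BASE}/datasets/{dataset_id}/items?{query}"


def fetch_dataset_items(run_response: dict, token: str) -> list:
    """GET the dataset items and decode the JSON array they come back as."""
    url = dataset_items_url(run_response, token)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

For example, fetch_dataset_items(run, "<token>") would return the list of extracted articles, or an empty list if the run stored nothing in its dataset.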
frail-apricot•2y ago
{
  "articleUrls": [
    {
      "url": "https://www.livemint.com/science/serum-institute-faces-lawsuit-over-covishield-side-effects-deaths-from-blood-clots-says-father-of-deceased-girl-11714617925438.html"
    }
  ],
  "crawlWholeSubdomain": false,
  "enqueueFromArticles": false,
  "extendOutputFunction": "($) => {\n const result = {};\n // Uncomment to add a title to the output\n // result.pageTitle = $('title').text().trim();\n\n return result;\n}",
  "isUrlArticleDefinition": {
    "minDashes": 4,
    "hasDate": true,
    "linkIncludes": [
      "article",
      "storyid",
      "?p=",
      "id=",
      "/fpss/track",
      ".html",
      "/content/"
    ]
  },
  "mustHaveDate": true,
  "onlyInsideArticles": true,
  "onlyNewArticles": false,
  "onlyNewArticlesPerDomain": false,
  "onlySubdomainArticles": false,
  "proxyConfiguration": {
    "useApifyProxy": true
  },
  "saveHtml": false,
  "saveHtmlAsLink": false,
  "saveSnapshots": false,
  "scanSitemaps": false,
  "scrollToBottom": false,
  "useBrowser": false,
  "useGoogleBotHeaders": false
}
This is my body, and the results should show up in Postman. Am I doing something wrong here? This is my POST request URL: https://api.apify.com/v2/acts/lukaskrivka~article-extractor-smart/run-sync?token=*mytoken* @HonzaS
HonzaS•2y ago
And what is the response?
frail-apricot•2y ago
Nothing, it's blank in Postman, but the run works and fetches data according to my logs.
HonzaS•2y ago
I don't think it should be blank. Let me try.
frail-apricot•2y ago
Yes.
frail-apricot•2y ago
Please do. The actor's ID is hy5TYiCBwQ9o8uRKG.
frail-apricot•2y ago
[image attachment not preserved]
frail-apricot•2y ago
That's the job that I ran.
HonzaS•2y ago
Can you see the log? I have this in the log of the run:
2024-05-03T13:12:09.418Z WARN No text found on article page: https://www.livemint.com/science/serum-institute-faces-lawsuit-over-covishield-side-effects-deaths-from-blood-clots-says-father-of-deceased-girl-11714617925438.html
2024-05-03T13:12:09.453Z WARN IS NOT VALID ARTICLE --- Reasons: [Article has no date], [Article has too few words: 1 (should be at least 150)]
So the request does return the run's data as the response, but I think that run produced no data.
Oh, now I see you have results in the console. Do I have a different input, since I get no results? https://console.apify.com/view/runs/6p8nlrbV7oe6GtxwC
I have changed the input, and now it returns results both on the web and from the request.
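The two reasons in that log line suggest the page failed a date check and a word-count check. Below is a hypothetical Python reconstruction of those checks, not the actor's actual source: the parameter names mirror the mustHaveDate and minWords input fields, the default of 150 is taken from the log message, and it assumes each reason is gated by the corresponding field. It shows why relaxing those two fields would let this page through.

```python
def rejection_reasons(has_date: bool, word_count: int,
                      must_have_date: bool = True, min_words: int = 150) -> list:
    """Hypothetical reconstruction of the validity checks named in the log."""
    reasons = []
    # Date check: only enforced when the input requires a date.
    if must_have_date and not has_date:
        reasons.append("Article has no date")
    # Length check: the log reports "should be at least 150" for this run.
    if word_count < min_words:
        reasons.append(
            f"Article has too few words: {word_count} (should be at least {min_words})"
        )
    return reasons


# The page from the log: no date found, only 1 word extracted.
print(rejection_reasons(has_date=False, word_count=1))
# → ['Article has no date', 'Article has too few words: 1 (should be at least 150)']

# Same page with mustHaveDate false and minWords 1.
print(rejection_reasons(has_date=False, word_count=1, must_have_date=False, min_words=1))
# → []
```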
frail-apricot•2y ago
What did you change? Could you send the curl command, minus the token? I'll try running it. Thanks in advance, Honza, you are a true saviour!
HonzaS•2y ago
[image attachment not preserved]
HonzaS•2y ago
There is no curl export, but the URL is https://api.apify.com/v2/acts/lukaskrivka~article-extractor-smart/run-sync-get-dataset-items?token=<token> and the body is:
{
  "articleUrls": [
    {
      "url": "https://www.livemint.com/science/serum-institute-faces-lawsuit-over-covishield-side-effects-deaths-from-blood-clots-says-father-of-deceased-girl-11714617925438.html"
    }
  ],
  "crawlWholeSubdomain": false,
  "enqueueFromArticles": false,
  "extendOutputFunction": "($) => {\n const result = {};\n // Uncomment to add a title to the output\n // result.pageTitle = $('title').text().trim();\n\n return result;\n}",
  "isUrlArticleDefinition": {
    "minDashes": 4,
    "hasDate": true,
    "linkIncludes": [
      "article",
      "storyid",
      "?p=",
      "id=",
      "/fpss/track",
      ".html",
      "/content/"
    ]
  },
  "mustHaveDate": false,
  "onlyInsideArticles": true,
  "onlyNewArticles": false,
  "onlyNewArticlesPerDomain": false,
  "onlySubdomainArticles": false,
  "proxyConfiguration": {
    "useApifyProxy": true
  },
  "saveHtml": false,
  "saveHtmlAsLink": false,
  "saveSnapshots": false,
  "scanSitemaps": false,
  "scrollToBottom": false,
  "useBrowser": false,
  "minWords": 1,
  "useGoogleBotHeaders": false
}
and the only header is Content-Type: application/json. I have set "minWords": 1 and "mustHaveDate": false.
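For reference, the same request can be sketched in Python with the standard library. It posts to the run-sync-get-dataset-items URL from the message above with the Content-Type: application/json header. Note that this URL differs from the plain run-sync URL used earlier in the thread; the run-sync endpoint returns the run's OUTPUT record from its default key-value store rather than the dataset items. To keep the sketch short, the body includes only some of the fields above and assumes the actor's defaults are acceptable for the rest; paste the full body if you need identical behaviour. The function names are illustrative.

```python
import json
import urllib.parse
import urllib.request

ACTOR_ID = "lukaskrivka~article-extractor-smart"
ENDPOINT = f"https://api.apify.com/v2/acts/{ACTOR_ID}/run-sync-get-dataset-items"


def build_run_request(token: str, article_url: str) -> urllib.request.Request:
    """Build a POST request that runs the actor and returns its dataset items."""
    # Subset of the input above; omitted fields fall back to the actor's defaults.
    body = {
        "articleUrls": [{"url": article_url}],
        "crawlWholeSubdomain": False,
        "enqueueFromArticles": False,
        "mustHaveDate": False,
        "minWords": 1,
        "useBrowser": False,
        "proxyConfiguration": {"useApifyProxy": True},
    }
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        f"{ENDPOINT}?{query}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_actor(token: str, article_url: str) -> list:
    """Send the request and decode the JSON array of dataset items."""
    with urllib.request.urlopen(build_run_request(token, article_url)) as resp:
        return json.load(resp)
```

Calling run_actor("<token>", url) blocks until the run finishes and returns the extracted items as a list.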
