Apify Discord Mirror

Updated 5 months ago

'undefined' in Dataset is keeping me from exporting data

At a glance

The community member finds an 'undefined' value among the items returned by Dataset.getData(), which prevents them from exporting the data to a CSV file. The error message shows that the CSV library expects an array or an object but receives 'undefined' instead.

The community member shared an example of the output from Dataset.getData(), which shows that the 'items' array contains both valid objects and an 'undefined' value. The root cause appears to be a corrupted dataset item whose value is 'undefined' instead of a valid JSON object.

One community member suggested trying to use the clean option when getting the dataset items, which may help to remove the corrupted item. Another community member noted that this is a bug in the Crawlee library and will be fixed.

There is no explicitly marked answer, but the community members are working together to try to understand and resolve the issue.

When I run Dataset.getData(), I keep finding that one of the values is 'undefined'.
This prevents me from calling Dataset.exportToCSV(), because it complains about the undefined value.
Is there a way to clean the Dataset so that the undefined value does not appear?
Or to find out why this is happening?
10 comments
Is it in your own data items? Can you share the full error message?
/home/user/compras_publicas_scrapper/node_modules/csv-stringify/dist/cjs/sync.cjs:322
        return Error(`Invalid Record: expect an array or an object, got ${JSON.stringify(chunk)}`);
               ^

Error: Invalid Record: expect an array or an object, got undefined
    at Object.__transform (/home/user/compras_publicas_scrapper/node_modules/csv-stringify/dist/cjs/sync.cjs:322:16)
    at stringify (/home/user/compras_publicas_scrapper/node_modules/csv-stringify/dist/cjs/sync.cjs:553:21)
    at Dataset.exportTo (/home/telix/compras_publicas_scrapper/node_modules/@crawlee/core/storages/dataset.js:253:48)
    at async Dataset.exportToCSV (/home/user/compras_publicas_scrapper/node_modules/@crawlee/core/storages/dataset.js:286:9)
    at async Dataset.exportToCSV (/home/user/compras_publicas_scrapper/node_modules/@crawlee/core/storages/dataset.js:306:9)
    at async file:///home/user/compras_publicas_scrapper/src/main.js:29:1

Node.js v19.1.0
I poked around a little, and this happens when one of the values in the Dataset is undefined.
For example, the output of await Dataset.getData() would be something like this:
{
  count: 68,
  desc: false,
  items: [
    {
      'Descripción': [Object],
      Fechas: [Object],
      Productos: [Object],
      'Parámetros de Calificación': [Object],
      Archivos: [Array],
      url: 'https://www.compraspublicas.gob.ec/ProcesoContratacion/compras/PC/informacionProcesoContratacion2.cpe?idSoliCompra=GILqItCW52eDlBQxzLpaLlDdArmhJscPjPHoxQjiTfA,'
    },
    undefined,
    {
      'Descripción': [Object],
      Fechas: [Object],
      Productos: [Object],
      'Parámetros de Calificación': [Object],
      Archivos: [Array],
      url: 'https://www.compraspublicas.gob.ec/ProcesoContratacion/compras/PC/informacionProcesoContratacion2.cpe?idSoliCompra=vRmOB40vA8u58yblEZpsmW3AMHl8RaM3CHv6oQPAf5Y,'
    },
  ],
  limit: 999999999999,
  offset: 0,
  total: 68
}
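One way to see which entries are affected is to scan the `items` array for anything that is not an object or an array, since that is what the CSV library rejects. The sketch below is only an illustration: `findInvalidRecords` is a hypothetical helper, not part of Crawlee, and `sampleItems` is made-up data standing in for the `items` returned by `Dataset.getData()`.

```javascript
// Hypothetical helper (not part of Crawlee): report the positions of items
// that are neither objects nor arrays, i.e. the ones the CSV export rejects.
function findInvalidRecords(items) {
    const invalid = [];
    for (let index = 0; index < items.length; index++) {
        const item = items[index];
        const isRecord = item !== null && typeof item === 'object';
        if (!isRecord) invalid.push({ index, value: item });
    }
    return invalid;
}

// Made-up stand-in for `(await Dataset.getData()).items`.
const sampleItems = [
    { url: 'https://example.com/a' },
    undefined,
    { url: 'https://example.com/b' },
];

console.log(findInvalidRecords(sampleItems));
```

For the sample data this reports a single invalid entry at index 1. Knowing the index can help trace the bad item back to the request that produced it.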
You can try getting the dataset items with the clean option (https://crawlee.dev/api/core/interface/DatasetDataOptions#clean), but the root cause of the error is that you have a corrupted item: undefined is not expected as an item value; it is supposed to be a JSON object.
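A possible workaround is to filter the items yourself before writing the CSV. This is a minimal sketch, not a confirmed fix: `keepValidRecords` is a hypothetical helper, `rawItems` is made-up sample data, and whether `clean: true` alone removes an `undefined` item is an assumption that has not been verified here.

```javascript
// Hypothetical helper (not part of Crawlee): keep only items that can be
// serialized as CSV rows, dropping undefined, null, and primitive values.
function keepValidRecords(items) {
    return items.filter((item) => item !== null && typeof item === 'object');
}

// Made-up stand-in for the `items` array returned by `Dataset.getData()`.
const rawItems = [
    { url: 'https://example.com/a' },
    undefined,
    { url: 'https://example.com/b' },
];

const cleanItems = keepValidRecords(rawItems);
console.log(`${rawItems.length - cleanItems.length} invalid item(s) dropped`);

// In a real crawler the same idea might look like this (assumption: the
// static Dataset.getData() accepts DatasetDataOptions, as linked above):
//   const { items } = await Dataset.getData({ clean: true });
//   const rows = keepValidRecords(items);
// `rows` could then be written with a CSV library of your choice instead of
// calling Dataset.exportToCSV() on the dataset that still holds the bad item.
```

Filtering after the fact only hides the symptom; the corrupted item is still stored in the dataset.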
This is a bug in Crawlee, so it will be fixed.
Would you please send me the dataset ID privately? I would like to see how the undefined got there.