I-Lab

Checks before your data is archived as a datapackage and/or published

As only a limited number of things can be checked automatically, the I-Lab Datamanager will perform a qualitative assessment of your datapackage once you’ve submitted it to the Vault and/or published it.

The aim of this assessment is to ensure that datapackages:

  • are self-explanatory for other researchers
  • meet basic standards of quality
  • comply with rules and regulations on sensitive data and privacy.

This assessment consists of a number of checks.

Folders

  • Is the folder structure logical?
  • Is a logical naming convention used?

Codebook

Does the dataset contain a codebook describing:

  • the setup of the research
  • the variables of the dataset
  • the units used
  • the instruments
  • the sampling method
  • the sample size
  • the experimental setup?

Large dataset

  • Is there a document describing the dataset?

Raw data

  • If the dataset is based on raw data that is not included in the datapackage itself, is there a reference to the location of the raw dataset?
  • If the dataset contains data processed from raw data, does it describe how the processed data was derived from the raw data, e.g. by providing algorithms and/or transformation scripts?

Valid data

  • Is the data ‘valid’ in a formal sense? For example, an Excel sheet with calculations should not contain cells with warnings like ‘invalid value’.
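
As an illustration of how you might pre-check a spreadsheet yourself before submitting, the sketch below scans a CSV export of a sheet for common Excel error literals such as #DIV/0! or #VALUE!. It is not part of Yoda or of the Datamanager's procedure; the function name find_error_cells and the sample data are hypothetical, and the list of error literals is an assumption about what counts as an invalid cell.

```python
import csv
import io

# Common Excel error literals; a cell holding one of these signals a failed calculation.
# (Assumption: these are the "warnings" worth flagging; extend the set as needed.)
EXCEL_ERRORS = {"#NULL!", "#DIV/0!", "#VALUE!", "#REF!", "#NAME?", "#NUM!", "#N/A"}


def find_error_cells(csv_text):
    """Return (row, column, value) for every cell that holds an Excel error literal.

    Rows and columns are 1-based, matching how a spreadsheet displays them.
    """
    hits = []
    for row_no, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        for col_no, cell in enumerate(row, start=1):
            value = cell.strip()
            if value in EXCEL_ERRORS:
                hits.append((row_no, col_no, value))
    return hits


# Hypothetical CSV export of a sheet in which two density calculations failed.
sample = (
    "sample,mass_g,volume_ml,density\n"
    "A,10,5,2\n"
    "B,12,0,#DIV/0!\n"
    "C,n/a,4,#VALUE!\n"
)

for hit in find_error_cells(sample):
    print(hit)
```

An empty result does not prove the data is correct; it only shows that no calculation visibly failed in the exported sheet.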

When publishing a datapackage

  • Is there a valid License Type defined in the Yoda Metadata?
  • If an embargo date has been defined in the Yoda Metadata, does it represent a reasonable period? For example, when the datapackage is to be stored for 10 years and the embargo date expires a day before the retention date of the datapackage, that will not be considered ‘reasonable’.
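
To get a feel for the embargo check, the sketch below computes which share of the retention period an embargo would cover. The assessment itself is a qualitative judgement by the Datamanager; the function names, the example dates and in particular the threshold MAX_REASONABLE_SHARE are hypothetical and are not an official rule.

```python
from datetime import date


def add_years(start, years):
    """Return the same calendar day `years` later (29 Feb falls back to 28 Feb)."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def embargo_share(archived_on, embargo_until, retention_years):
    """Fraction of the retention period (0.0 to 1.0) during which the data is under embargo."""
    retention_end = add_years(archived_on, retention_years)
    total_days = (retention_end - archived_on).days
    embargo_days = (embargo_until - archived_on).days
    return max(0.0, min(1.0, embargo_days / total_days))


# HYPOTHETICAL threshold, chosen only for illustration.
MAX_REASONABLE_SHARE = 0.5

# The example from the text: 10 years of retention, embargo ends one day before that.
archived = date(2024, 1, 1)
share = embargo_share(archived, date(2033, 12, 31), 10)

if share > MAX_REASONABLE_SHARE:
    print("Embargo covers most of the retention period; expect questions.")
else:
    print("Embargo leaves a substantial open period.")
```

With these dates the embargo covers nearly the whole retention period, which matches the case the text describes as not reasonable.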

When submitting and/or publishing a datapackage

  • Does the description filled out in the Yoda metadata form make sense; e.g. does the datapackage Description provide sufficient information, do the tags provide for good data discovery in the Catalogue, etc.?
  • In case of Open Data: does the dataset contain data that might be considered private or sensitive, and thus a liability?

Should the Datamanager conclude that your dataset does not (yet) meet the quality standards for submission to the Vault and/or publication, they will contact you and provide concrete suggestions for improvement.