Checks before your data is archived as a datapackage and/or published
Because only a limited number of things can be checked automatically, the I-Lab Datamanager will perform a qualitative assessment of your datapackage once you have submitted it to the Vault and/or published it.
The aim of this assessment is to ensure that datapackages:
- are self-explanatory for other researchers
- meet basic standards of quality
- comply with rules and regulations on sensitive data and privacy.
This assessment consists of a number of checks.
Folders
- Is the structure logical?
- Is a logical naming convention used?
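Whether a structure is logical remains a judgment call, but a consistent naming convention can be pre-checked before submission. The sketch below assumes one possible convention (lowercase letters, digits, dots, hyphens and underscores only); this convention and the function name `find_irregular_names` are illustrative assumptions, not rules set by Yoda or the Datamanager.

```python
import re
from pathlib import PurePosixPath

# Hypothetical convention, chosen for illustration only: each folder or file
# name starts with a lowercase letter or digit and contains only lowercase
# letters, digits, dots, hyphens and underscores.
ALLOWED_NAME = re.compile(r"^[a-z0-9][a-z0-9._-]*$")


def find_irregular_names(paths):
    """Return the paths with at least one component that breaks the assumed convention."""
    irregular = []
    for path in paths:
        parts = PurePosixPath(path).parts
        if any(not ALLOWED_NAME.match(part) for part in parts):
            irregular.append(path)
    return irregular


# Made-up example paths, relative to the root of a datapackage.
example_paths = [
    "raw/2021-03-01_survey.csv",
    "processed/Final version (2).xlsx",
    "docs/codebook.pdf",
]
flagged = find_irregular_names(example_paths)
print(flagged)
```

A flagged path is only a prompt to take a second look; names that break this particular pattern may still be perfectly logical in the context of your research.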
Codebook
Does the dataset contain a codebook describing:
- the setup of the research
- the variables of the dataset
- the units used
- the instruments
- the sampling method
- the sample size
- the experimental setup?
Large dataset
- Is there a document describing the dataset?
Raw data
- If the dataset is based on raw data that is not included in the set itself, is there a reference to the location of the raw dataset?
- If the dataset contains data processed from raw data, does it describe how the processed data was derived from the raw data, e.g. by providing algorithms and/or transformation scripts?
Valid data
- Is the data ‘valid’ in a formal sense? For example, an Excel sheet with calculations should not contain cells with warnings like ‘invalid value’.
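This kind of formal validity can be pre-checked before submission. The sketch below assumes the sheet has been exported to CSV and scans it for the error strings Excel shows in cells whose formula failed (such as `#VALUE!` or `#DIV/0!`). The function name `find_error_cells` and the sample data are illustrative assumptions, and the Datamanager's own assessment may look at more than this.

```python
import csv
import io

# Error strings that Excel displays in cells whose formula could not be evaluated.
EXCEL_ERRORS = {"#DIV/0!", "#N/A", "#NAME?", "#NULL!", "#NUM!", "#REF!", "#VALUE!"}


def find_error_cells(csv_text):
    """Return (row, column, value) for each cell holding an Excel error string.

    Row and column numbers are 1-based, counting the header row as row 1.
    """
    hits = []
    reader = csv.reader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        for col_number, value in enumerate(row, start=1):
            cleaned = value.strip()
            if cleaned in EXCEL_ERRORS:
                hits.append((row_number, col_number, cleaned))
    return hits


# Made-up sample standing in for a sheet exported to CSV.
sample = "id,weight_kg,bmi\n1,70,22.9\n2,,#DIV/0!\n3,81,#VALUE!\n"
errors = find_error_cells(sample)
for row_number, col_number, value in errors:
    print(f"row {row_number}, column {col_number}: {value}")
```

To check a real file, read its contents first, for example with `open(path, newline="").read()`, and pass the text to the function.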
When publishing a datapackage
- Is there a valid License Type defined in the Yoda Metadata?
- If an embargo date has been defined in the Yoda Metadata, does it represent a reasonable period? For example, if the datapackage is to be stored for 10 years and the embargo date expires a day before the retention date of the datapackage, that will not be considered ‘reasonable’.
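The embargo example can be expressed as simple date arithmetic. The sketch below compares the end of the embargo with the retention date and requires a minimum number of days in between. The threshold `min_open_days` is a hypothetical value chosen for illustration; what counts as ‘reasonable’ is decided by the Datamanager, and no fixed number is given here.

```python
from datetime import date


def embargo_is_reasonable(embargo_end, retention_end, min_open_days=365):
    """Return True if at least min_open_days remain between embargo end and retention date.

    min_open_days is an assumed threshold for illustration, not an official rule.
    """
    open_days = (retention_end - embargo_end).days
    return open_days >= min_open_days


# A datapackage stored for 10 years, with made-up dates.
retention_end = date(2035, 1, 1)
day_before_retention = date(2034, 12, 31)
two_years_after_deposit = date(2027, 1, 1)

print(embargo_is_reasonable(day_before_retention, retention_end))
print(embargo_is_reasonable(two_years_after_deposit, retention_end))
```

An embargo that ends one day before the retention date fails this check for any sensible threshold, which matches the example given above.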
When submitting and/or publishing a datapackage
- Does the information filled out in the Yoda Metadata form make sense? For example, does the datapackage Description provide sufficient information, and do the tags provide for good data discovery in the Catalogue?
- In case of Open Data: does the dataset contain data that might be considered private or sensitive and could therefore be a liability?
Should the Datamanager conclude that your dataset does not (yet) meet the quality standards for submission to the Vault and/or publication, they will contact you and provide concrete suggestions for improvement.