As part of the data life cycle, research data will be contributed to a repository to support preservation and discovery. A research project may generate many different iterations of the same dataset: for example, the raw data from the instruments, as well as datasets that already include computational transformations of the data.
To focus resources and attention on the datasets that matter most, the project team should define its core data assets as early in the process as possible, preferably at the conceptual stage and in the data management plan. It may be helpful to speak with your local data archivist or librarian to determine which datasets (or iterations of datasets) should be considered core and which should be discarded. These core datasets will be the basis for publications and require thorough documentation and description.
- Only datasets with significant long-term value should be contributed to a repository, so the project team must decide which datasets to keep.
- Data that cannot be recreated, or that would be costly to reproduce, should be saved.
- Data that may be worth saving fall into four categories: observational, experimental, simulation, and derived (or compiled).
- Your funder or institution may have requirements and policies governing contribution to repositories.
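The criteria above can be captured as a simple appraisal checklist. The following is a minimal sketch in Python, not a prescribed tool: the `Dataset` fields, the `Category` enum, and the `should_preserve` rule are hypothetical names chosen for illustration, and the way the criteria are combined (keep a dataset if any one criterion applies) is an assumption that a project team would adapt to its own policies.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """The four categories of data named in the list above."""
    OBSERVATIONAL = "observational"
    EXPERIMENTAL = "experimental"
    SIMULATION = "simulation"
    DERIVED = "derived"  # also called "compiled"


@dataclass
class Dataset:
    """Hypothetical appraisal record for one dataset (or one iteration of it)."""
    name: str
    category: Category
    can_be_recreated: bool
    costly_to_reproduce: bool
    long_term_value: bool
    required_by_policy: bool = False  # funder or institutional requirement


def should_preserve(ds: Dataset) -> bool:
    """Return True if the dataset is a candidate for repository deposit.

    Assumption: any single criterion is enough to keep the dataset.
    """
    if ds.required_by_policy:
        return True
    irreplaceable = (not ds.can_be_recreated) or ds.costly_to_reproduce
    return ds.long_term_value or irreplaceable


# Illustrative records only; the names and values are made up.
raw = Dataset("raw_sensor_readings", Category.OBSERVATIONAL,
              can_be_recreated=False, costly_to_reproduce=False,
              long_term_value=True)
scratch = Dataset("intermediate_scratch_table", Category.DERIVED,
                  can_be_recreated=True, costly_to_reproduce=False,
                  long_term_value=False)

for ds in (raw, scratch):
    decision = "keep" if should_preserve(ds) else "discard"
    print(f"{ds.name}: {decision}")
```

A checklist like this does not replace the judgment of the project team or a data archivist; it only records the answers to the questions above in a consistent form so that decisions can be reviewed later.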
Given the amount of data produced by scientific research, keeping everything is neither practical nor economically feasible.
Decisions about what data to keep will help to focus project resources on those data that should be stored for long-term preservation.
Whyte, Angus. Appraise and Select Research Data for Curation. Digital Curation Centre. http://www.dcc.ac.uk/resources/how-guides/appraise-select-research-data