


You're currently accessing the archived version of the DataONE website. To see our new design and keep up to date with the latest DataONE news, visit our new website at https://dataone.org

Decide what data to preserve

Best Practice: 

The process of science generates a variety of products that are worthy of preservation. Researchers should consider all elements of the scientific process in deciding what to preserve:

  • Raw data
  • Tables and databases of raw or cleaned observation records and measurements
  • Intermediate products, such as partly summarized or coded data that are the input to the next step in an analysis
  • Documentation of the protocols used
  • Software or algorithms developed to prepare data (cleaning scripts) or perform analyses
  • Results of an analysis, which can themselves be starting points or ingredients in future analyses, e.g., distribution maps, population trends, and mean measurements
  • Any data sets obtained from others that were used in data processing
  • Multimedia, whether it documents procedures or constitutes standalone data

When deciding which data products to preserve, researchers should consider the costs of preserving them:

  • Raw data are usually worth preserving
  • Consider space requirements when deciding whether to preserve data
  • If data can be easily or automatically re-created from the raw data, consider not preserving them. Conversely, data that have undergone quality control and analysis may be costly to reproduce, so consider preserving them
  • Algorithms and software source code cost very little to preserve
  • Results of analyses may be particularly valuable for future discovery and cost very little to preserve
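The cost considerations above can be read as a rough, ordered decision rule. The Python sketch below encodes them that way, assuming each product can be described by a few yes/no properties and a size. The `DataProduct` class, its fields, the `preservation_advice` function, and the `space_budget_gb` threshold are hypothetical illustrations, not a DataONE tool; real decisions will weigh these factors with more judgment than a fixed order allows.

```python
from dataclasses import dataclass


@dataclass
class DataProduct:
    """Hypothetical description of one product of a research project."""
    name: str
    is_raw: bool = False              # raw observations or measurements
    is_code_or_results: bool = False  # source code, algorithms, or analysis results
    easily_recreated: bool = False    # can be regenerated automatically from raw data
    size_gb: float = 0.0


def preservation_advice(product: DataProduct, space_budget_gb: float) -> str:
    """Apply the cost heuristics in order; the space budget is an assumed threshold."""
    if product.is_raw:
        return "preserve"  # raw data are usually worth preserving
    if product.is_code_or_results:
        return "preserve"  # code and results cost very little to preserve
    if product.easily_recreated:
        return "consider not preserving"  # cheap to regenerate from raw data
    if product.size_gb > space_budget_gb:
        return "review space requirements"  # large, hard-to-recreate product
    # e.g., quality-controlled, analyzed data that would be costly to reproduce
    return "preserve"


# Illustrative inventory; names and sizes are made up for this example.
products = [
    DataProduct("field_observations.csv", is_raw=True, size_gb=2.0),
    DataProduct("clean_data.py", is_code_or_results=True),
    DataProduct("daily_means.csv", easily_recreated=True, size_gb=0.1),
    DataProduct("qc_checked_records.csv", size_gb=1.5),
]
for p in products:
    print(p.name, "->", preservation_advice(p, space_budget_gb=10.0))
```

Checking raw data and cheap-to-store products first mirrors the guidance that these are almost always worth keeping; only the remaining intermediate products are weighed against how easily they can be re-created and how much space they need.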

Researchers should consider the following goals and benefits of preservation:

  • Enabling re-analysis of the same products to determine whether the same conclusions are reached
  • Enabling re-use of the products for new analysis and discovery
  • Enabling restoration of original products if working datasets are lost

Description Rationale: 

To meet multiple goals for preservation, researchers should think broadly about the digital products that their project generates, preserve as many as possible, and plan the appropriate preservation methods for each.

Cindy Parr
Heather Henkel
Keven Comerford