Jean: Agricultural scientist at a field station


Photo credit: https://www.flickr.com/photos/cimmyt/9538063625
Picture is of Dr. Barnabas Kiula of the International Maize and Wheat Improvement Center.
The person represented here is not affiliated with DataONE and use of their image does not reflect endorsement of DataONE services.

Name, age, and education: 

Jean is an agricultural scientist working at the Cornell University agricultural field station in Geneva, NY. He received a PhD in horticulture from Virginia Polytechnic Institute in 1987.

Life or career goals, fears, hopes, and attitudes: 

Jean is a tenured associate professor in the Department of Horticulture. Jean’s current project uses the tomato as a model system to study sympodial growth (a growth pattern in which the stem is a succession of growths rather than one). He is driven by the agricultural research station’s program of research and the need to publish and to obtain grant funding to support his research.

A day in the life: 

Jean’s project involves two types of data: phenotypic and genomic. The phenotypic data are collected with pencil and paper after seeds are germinated. The genomic data come from a sequencing machine and are assembled computationally. Jean has two types of sequence data: whole genome and transcriptome. His work produces very large volumes of molecular data that require significant technical support. Jean uses pedigree numbers to connect genotype, phenotype, and generation, and all data are recorded in a pedigree book. He uses the MIENS guidelines for metadata for the genomic data (Yilmaz et al., 2010, doi:10.1038/npre.2010.5252.2), as there are no metadata standards specifically for his discipline, probably because researchers are still working out how to handle and analyze the data. He knows plant ontologies exist, but doesn’t use them because they are too general to serve his needs.
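
As an illustration of how pedigree numbers can tie the two kinds of data together, the following is a minimal sketch, not a description of Jean's actual pedigree book. The record fields, the pedigree number format, and the function name `link_by_pedigree` are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhenotypeRecord:
    pedigree_id: str   # hypothetical pedigree number, e.g. "P-0412"
    generation: int    # generation of the plant that was scored
    trait: str         # name of the observed trait
    value: float       # measured or scored value


@dataclass(frozen=True)
class SequenceRecord:
    pedigree_id: str   # same hypothetical pedigree number as above
    data_type: str     # "whole_genome" or "transcriptome"
    file_name: str     # where the assembled data are stored


def link_by_pedigree(phenotypes, sequences):
    """Group phenotype and sequence records under their shared pedigree number."""
    linked = {}
    for rec in phenotypes:
        entry = linked.setdefault(rec.pedigree_id, {"phenotypes": [], "sequences": []})
        entry["phenotypes"].append(rec)
    for rec in sequences:
        entry = linked.setdefault(rec.pedigree_id, {"phenotypes": [], "sequences": []})
        entry["sequences"].append(rec)
    return linked


# Illustrative records only; none of these values come from a real study.
phenotypes = [
    PhenotypeRecord("P-0412", 3, "leaves_per_sympodial_unit", 3.0),
    PhenotypeRecord("P-0413", 3, "leaves_per_sympodial_unit", 2.0),
]
sequences = [
    SequenceRecord("P-0412", "transcriptome", "P-0412_rnaseq.fasta"),
]
linked = link_by_pedigree(phenotypes, sequences)
```

Keying every record on the pedigree number means a plant with phenotype scores but no sequence data (or the reverse) still appears in the result, which makes gaps in the record easy to spot.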

Reasons for using DataONE to share and to reuse data
Needs and expectations of DataONE tools: 

Jean is wary of going exclusively digital with his phenotype data because of the horror stories he’s heard from other colleagues who have lost lots of work. However, he does transcribe data from paper to an Excel sheet. He keeps the paper copy and sometimes refers back to it to jog his memory. He uses basic visualizations within Excel to verify the accuracy of data transfer and to correct (or verify) any outliers; if these functions were easier to perform using DataONE tools, he might be convinced that digitizing his data for deposition at a DataONE member node is worthwhile.
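
The kind of check Jean performs by eye in Excel can also be sketched in code. The example below is an assumption-laden illustration rather than a DataONE tool: it replaces visual inspection with a modified z-score based on the median absolute deviation, and the function name `flag_outliers`, the threshold of 3.5, and the sample values are all hypothetical.

```python
import statistics


def flag_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds the threshold.

    The modified z-score uses the median and the median absolute deviation
    (MAD), so a single mistyped value does not distort the baseline.
    """
    median = statistics.median(values)
    deviations = [abs(v - median) for v in values]
    mad = statistics.median(deviations)
    if mad == 0:
        # At least half the values equal the median; no spread to compare against.
        return []
    return [
        i for i, dev in enumerate(deviations)
        if 0.6745 * dev / mad > threshold
    ]


# Hypothetical transcribed measurements; 121.0 looks like a slipped decimal point.
transcribed = [12.1, 11.8, 12.4, 121.0, 12.0, 11.9]
suspect_rows = flag_outliers(transcribed)
```

Flagged rows are candidates for comparison against the paper copy, not automatic corrections; a genuine extreme phenotype would be flagged in the same way as a typing error.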

He does have concerns about the long-term preservation of his data, as there is currently no formal process in place for long-term data management. He might be interested in depositing data at a member node for preservation (e.g., migration of data formats), though only if doing so were as easy as (or at least not much harder than) maintaining local backups.

Intellectual and physical skills that can be applied: 

Jean is quite focused on his own research and has not historically involved many colleagues in collaborative work outside of his particular area of specialization. As such, he does not see the rationale for common data management protocols and believes that his data are only likely to be of interest to a very select number of researchers, most of whom he knows personally. That said, he is interested in being able to perform longitudinal and synthetic analyses of his own work, something that is currently impossible because of the shifting standards applied to genomic data. This issue is interesting enough to Jean that he would be likely to contribute his expertise and sample data for the purpose of developing ontologies that actually meet his needs and could be supported for subsequent use in DataONE.

Technical support available: 

Jean funds good technical support within his research group. He knows that data management and archiving is becoming a more important issue for his field, and he is willing to devote resources to doing a better job of it, despite his concerns about the ultimate utility to his own work.

Personal biases about data sharing and reuse (and data management more generally): 

Jean does not normally share his pedigree book because it would not make sense to others, but he freely distributes seeds to colleagues who ask for them. He considers these seeds to be data. When he receives seeds from others, he “vets” the data by germinating the seeds and confirming the phenotype. He has hired a web developer to help visualize some of the collected data.

The assembled genomes he is willing to share immediately and thinks others should do the same. The transcriptome data are used to answer a biological question and thus are more sensitive. He would be willing to share the raw transcriptome data after publication, but does not want to be scooped in publications or proposals.

Repositories exist for genome data (e.g., GenBank), but not for raw phenotypic data or raw sequence reads. Jean uses standard gene nomenclature to describe mutants, but feels unqualified to handle metadata.

Jean currently collects data for his own use. He validates his data and describes it using the MIENS guidelines for metadata. Deposit is in the form of publications based on summaries and analyses; some of the data are shared, e.g., gene sequences in GenBank.

DataONE could provide tools to help Jean maintain his data in a consistent fashion over time. The motivation to use DataONE would be for better description of his data and for long-term preservation.

Comparison of current and DataONE-enabled practices:
Project Planning: 
  • Management Planning: Develops a project Data Management Plan following examples provided on the DataONE portal.
Current data collection: 

Jean collects phenotypic and genomic data.

DataONE enabled data collection: 

No change.

Current data description: 

Jean uses the MIENS guidelines for metadata for the genomic data, but does not describe the phenotypic data.

DataONE enabled description: 
  • Training: Learns how to use Morpho (a metadata management editor) based on instructional materials available in the DataONE Best Practices Database and associated downloadable video instructions.
  • Helps develop an ontology for describing data to enable longitudinal analyses.
Current data preservation: 

Deposits genomic data but has no other long-term data preservation plans.

DataONE enabled preservation: 
  • Data Preservation: Deposits the data and metadata in a DataONE member node data repository for long-term preservation of the data.
  • Data Preservation: Submits a research paper to a journal associated with Dryad—a DataONE Member Node. Upon acceptance, he submits the publication-relevant data, metadata, and model to Dryad where they are given a DOI (digital object identifier) and preserved in the Dryad repository.
  • Citation: Upon publication, he adds the publication reference and the data citation (including DOIs for both; provided by Dryad and the journal) to his CV.
Current data discovery: 


DataONE enabled discovery: 
  • Citation: Another scientist working in Mexico on a similar study discovers the new publication and data created by Jean and cites Jean in their own work.
Current data integration: 


DataONE enabled integration: 


Current data analyses: 

Uses standard desktop analysis tools.

DataONE enabled analysis: 
  • Data Visualization: Uses data analysis and visualization tools identified through the DataONE Tools Database or available as part of the Investigator Toolkit to analyze existing data that he will use in his own research.
  • Data Visualization: Creates graphics using tools identified via DataONE.

Data Conservancy Jean persona by Anne Thessen: Interview with Zach Lippman and comments from Sherri Simmons; revised by Kevin Crowston