
The call for the 2019 DataONE Summer Internship Program is now closed

The 2019 DataONE Summer Internship Program

The Data Observation Network for Earth (DataONE) is a virtual organization, supported by the U.S. National Science Foundation, dedicated to providing open, persistent, robust, and secure access to biodiversity and environmental data. DataONE is pleased to announce the availability of summer research internships for undergraduates, graduate students and recent postgraduates.

Program Information

Interns undertake a 9-week program of work centered around one of the projects listed below. Each intern will be paired with one primary mentor and, in some cases, secondary and tertiary mentors. Interns need not necessarily be at the same location or institution as their mentor(s).


February 22 - Application period opens
March 25 - Deadline for receipt of applications at 11:59 PM Mountain time
April 8 - Notification of acceptance and scheduling of face-to-face meetings (schedules permitting)
May 20 - Program begins*
June 18 - Midterm evaluations
July 19 - Program concludes**
* Some allowance will be made for students who are unavailable during these dates due to their school calendar.
** Program may not extend beyond August 9, 2019.


The program is open to undergraduate students, graduate students, and postgraduates who have received their degree within the past five years. Given the broad range of projects, there are no restrictions on academic background or field of study. Interns must be at least 18 years of age by the program start date, must be currently enrolled or employed at a U.S. university or other research institution, and must currently reside in, and be eligible to work in, the United States. Interns are expected to be available approximately 40 hours/week during the internship period (noted above), with significant availability during normal business hours. Interns from previous years are eligible to participate.

Financial Support

Interns will receive a stipend of $5,000 for participation, paid in two installments (one at the midterm and one at the conclusion of the program). In addition, required travel expenses will be borne by DataONE. Participation in the program after the midterm is contingent on satisfactory performance. The University of New Mexico will administer funds. Interns will need to supply their own computing equipment and internet connection. For students who are not U.S. citizens or permanent residents, complete visa information will be required, and it may be necessary for the funds to be paid through the student’s university or research institution. In such cases, the student will need to provide the necessary contact information for their organization.


Projects will be announced when the application period opens. Projects will cover a range of topic areas and vary in the extent and type of prior background required of the intern. Not all projects are guaranteed funding and the interests and expertise of the applicants will, in part, determine which projects will be selected for the program.

2019 Project Titles

Project 1: Tools to enhance community driven data management education
Project 2: Provenance for Self or Others? A Study with Hands-on Experiments
Project 3: Supporting Community Outreach and Advocacy for Open Data
Project 4: Reach and Citation of DataONE
Project 5: Build capacity for using DataONE via Python
Project 6: A Reproducible Network Analysis of the DataONE Linked Open Data graph

Project Details

Project 1: Tools to enhance community driven data management education

Primary Mentor(s): Megan Mach
Secondary Mentor(s): Dave Vieglais

Necessary Prerequisites:

  • Excellent organizational and data management skills
  • Upper level undergraduate student or above in environmental science, information science, computer science or a related subject, including science communication
  • Experience using GitHub and/or GitHub Pages
  • Ability to process technical information and synthesize it into high-level concepts

Desirable Skills / Qualifications:

  • Undergraduate degree in environmental science, information science, computer science or a related subject
  • Awareness of open data and data sharing practices
  • Good communication skills and ability to talk to people from different disciplines

Expected Outcomes:

  • Technical tutorials supporting community-sourced data management education materials
  • with provenance linkages fully describing the computational processes that produced those outputs
  • DOI assignment for data management education resources
  • A survey, developed and administered by the intern, assessing the usability of the tutorials and the DOI creation method

Project Description:
About DataONE
DataONE supports synthesis research through enhanced search and discovery of Earth and environmental science data from across a network of integrated data repositories. Efficiencies for researchers include reduced time spent on data discovery, refined search functions that return more relevant results, and the ability to download data from multiple repositories, among others. Researchers working in synthesis science or conducting systematic reviews or meta-analyses will benefit from using DataONE as a data search engine.

The problem
Over the past ten years DataONE has focused both on making Earth and environmental data accessible and on highlighting the importance of strong data management skills for researchers. We have published data management education modules and led workshops to develop best practices. As a next step, we have moved many of these materials to a community-based platform to increase their use and usability by the research community. Education materials are now hosted through GitHub on our Data Management Skillbuilding Hub, and we want users to download them, edit them, and help keep them up to date.

The project
This project will develop technical tutorials, aimed at a non-technical audience, on submitting material to the Skillbuilding Hub infrastructure (in GitHub). These tutorials will be written in markdown as pages within the Hub and will provide support for the contribution of several different types of content. To ensure the longevity of these materials, this internship will also support development of a method for assigning a DOI to each specific education resource, in collaboration with the DataONE team. Usability of the tutorials and DOI creation will be assessed through the creation of a survey, administered to the DataONE Users Group and other identified parties. Feedback will be incorporated into the finalized materials, to be launched on the Skillbuilding Hub.

Project 2: Provenance for Self or Others? A Study with Hands-on Experiments

Primary Mentor(s): Bertram Ludäscher
Secondary Mentor(s): Michael Gryk
Additional Mentor(s): Robert Sandusky

Necessary Prerequisites:

  • Experience or interest in scientific data management, reproducibility (computational, scientific), and provenance
  • Familiarity with open science concepts

Desirable Skills / Qualifications:

  • Experience with databases (e.g., SQL)
  • Programming experience (e.g., in Python)
  • Experience with provenance tools and applications

Expected Outcomes:

  • An annotated bibliography of provenance research
  • A report with findings and recommendations based on a literature survey and the intern’s own hands-on experiences.
  • A draft research design to investigate adoption of, and barriers to adoption of, data provenance tools and techniques, based on the literature survey and informed by the intern’s own hands-on experiences.

Project Description:
Data provenance is an important form of metadata that captures the lineage and processing history of data products resulting from data-driven analyses and workflows. Provenance information can increase the transparency, reproducibility, and reuse of data products. Recent years have seen considerable research and development efforts devoted to standards, tools, and applications that capture, store, query, and visualize provenance.
The goal of this project is to study contemporary use of provenance in different stages of the data life-cycle in order to answer questions such as: Who is creating or using provenance, and for what purposes? Is provenance capture and use already an ingrained best practice in some domains, or is it viewed as yet another “metadata chore” that scientists reluctantly deal with?
This project consists of two parts: (i) an “environmental scan” / survey of the research literature on data provenance with a focus on provenance tools and applications (possibly including some limited survey work), and (ii) a hands-on part whose goal is to use commonly mentioned tools in their prototypical settings. A key outcome is a report with findings and recommendations based on the literature survey and the intern’s own hands-on experiences.

Project 3: Supporting Community Outreach and Advocacy for Open Data

Primary Mentor(s): Robert Sandusky, Karl Benedict
Secondary Mentor(s): Amber Budden, Megan Mach

Necessary Prerequisites:

  • Undergraduate degree in environmental science, information science, computer science or a related subject
  • Experience creating presentation materials in PowerPoint, Keynote or related software
  • Ability to process technical information and synthesize it into high-level concepts
  • Interest in design and an ‘eye’ for aesthetics
  • Good oral and written communication skills and ability to communicate effectively with people from different disciplines

Desirable Skills / Qualifications:

  • Experience with Adobe design software such as InDesign or Photoshop
  • Demonstrated experience in development of designed print materials
  • Awareness of open data and data sharing practices

Expected Outcomes:

  • Create a final summary and assessment of interviews and research conducted during the previous year to support creation of talking points and materials required to effectively communicate the DataONE mission, vision, products and services
  • Develop an outreach kit to help DataONE community members advocate for DataONE and promote its products, services, and benefits. The outreach kit may include presentation (slide) materials, downloadable PDFs, shared Google Drive or other resources
  • Create a final summary of community input regarding the ongoing evolution of DataONE’s community governance model

Project Description:
The DataONE Users Group (DUG) is the worldwide community of Earth observation data authors, users, and diverse stakeholders that make up the DataONE partnership communities. The primary function of the DUG has been to represent the needs and interests of these communities in the activities of DataONE. Members of the DUG include representatives of the member repositories, coordinating nodes, researchers and other relevant groups (e.g., research networks, professional societies, libraries, academic institutions, and data centers).

As DataONE moves towards a sustainable future (https://www.dataone.org/future) the user community will become increasingly important in contributing to a community-driven organizational structure and in advocating for DataONE products and services. To support this distributed advocacy, we seek to develop an outreach kit for the user group members. The outreach kit content will be grounded in the data compiled from interviews, surveys, and other sources during the previous year.

The intern will design a DataONE outreach kit in collaboration with the primary mentors, who are the current chairs of the DataONE Users Group. The intern will collaborate on development of communication materials for each of the topics / products in formats identified as valuable to the user community (e.g., PDF downloads, slide presentations, and image directories). These materials will build on previous materials developed by the DataONE team and be consistent with current DataONE branding.

Project 4: Reach and Citation of DataONE

Primary Mentor(s): Amber Budden
Secondary Mentor(s): Amanda Whitmire
Additional Mentor(s): CEO WG

Necessary Prerequisites:

  • Excellent organizational and data management skills
  • Experience with bibliographic management systems such as Zotero
  • Experience querying Web of Science, Google Scholar, etc. for research articles
  • Undergraduate degree in information science or computer science, or an undergraduate degree in another domain with interest or experience in bibliometrics or citation analysis

Desirable Skills / Qualifications:

  • Master's degree in library or information science

Expected Outcomes:

  • A public bibliographic library of publications from the DataONE project
  • A public bibliographic library of publications citing DataONE
  • A database containing information on references to DataONE across the web
  • A summary report on representation of DataONE across the web and in published literature

Project Description:
As part of our transition to a sustainable future (https://www.dataone.org/future), DataONE seeks to develop a comprehensive understanding of the way in which the organization is discussed and referenced in the broader community. This information will support strategic communication and outreach planning and provide insights into future collaborations and partnerships to be pursued.

Scholarly communications are one method of assessing recognition, and DataONE maintains a database of articles published by DataONE as well as articles citing DataONE. However, many references are non-formal citations and exist on web pages, in blogs, or in other communications. Additionally, even within the published literature, there is variation in how and where DataONE is cited, resulting in some articles not being accurately indexed.

This project will undertake several activities. First, the current database of publications and citing articles will be reviewed and transferred to a public bibliographic manager such as Zotero, enabling community contributions to the library moving forward. In doing so, a thorough search of the literature will be conducted to ensure the library is up to date. Second, building on a previous DataONE project that explored ARL library citation of DataONE, this internship will investigate the incidence of DataONE citations and links on pages across the web. These will be categorized by factors such as type of page, type of mention, and where the link directs. Research of this type will augment current usage data to help us understand which products and services are valued by the community and to explore variation across stakeholder types.

Project 5: Build capacity for using DataONE via Python

Primary Mentor(s): Bryce Mecum
Secondary Mentor(s): Dave Vieglais
Additional Mentor(s): Roger Dahl

Necessary Prerequisites:

  • Experience with software development using the Python programming language
  • Familiarity with Git repositories and the GitHub environment
  • Familiarity with the JupyterLab / Jupyter Notebook environment

Desirable Skills / Qualifications:

  • Experience with packaging Python tools for distribution
  • Technical documentation skills
  • Experience developing unit tests and software test cases

Expected Outcomes:

  • Creation of educational materials for the scientific Python community showcasing how Python can be used with DataONE
  • One or more integrations with the scientific Python community
  • Updated documentation for the d1_python libraries
  • Streamlined installation of the d1_python library and simplified libraries for using it in client applications

Project Description:
Programmatic tools for working with DataONE exist, but they need to be discoverable and easy to use in order for communities to form around their use. The d1_python library is a core software library developed and used by the DataONE development team for working with the DataONE infrastructure. The library is quite extensive and can be daunting to install and use, especially for simple use cases such as searching, uploading, or downloading data. The goal of this project is to build capacity within the scientific Python community around DataONE by developing examples, documentation, and possibly additional libraries or extensions that simplify use of the d1_python library and reduce the barriers researchers face in adopting it.

Project 6: A Reproducible Network Analysis of the DataONE Linked Open Data graph

Primary Mentor(s): Bryce Mecum
Secondary Mentor(s): Dave Vieglais

Necessary Prerequisites:

  • Experience with accessing structured data over HTTP
  • Experience working with XML and JSON
  • Experience with a scripting language such as Python or R

Desirable Skills / Qualifications:

  • Familiarity with the linked open data paradigm
  • Familiarity with network analysis techniques
  • Understanding of graph data structures
  • Familiarity with earth science metadata standards

Expected Outcomes:

  • Identification of a set of key metrics or questions to apply to the DataONE linked open data graph
  • A reproducible analysis of key metrics or questions in the form of a report that can be re-run periodically to track changes over time

Project Description:
With over 800,000 datasets accessible through programmatic interfaces, DataONE provides a rich corpus of machine-readable metadata that is also expressed as a linked open data (LOD) graph. The goal of this project is to explore the LOD graph of DataONE and provide a network analysis of the graph, including how the network differs from the content available through the traditional DataONE Application Programming Interface (API). For example: How interconnected are datasets and researchers? How many individual authors contributed to how many datasets? Can fields such as keywords be normalized to a small set of controlled vocabularies? How do network analysis measures differ by metadata standard, year of publication, or other facets?
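
As a rough illustration of the kind of question listed above (which authors contributed to which datasets, and how interconnected researchers are), the following minimal Python sketch works on a tiny hand-made sample. The dataset identifiers, author names, and function names are all hypothetical and are not drawn from DataONE; a real analysis would read records from the LOD graph rather than from a hard-coded list.

```python
from collections import defaultdict

# Hypothetical toy records: (dataset identifier, list of author names).
# These values are invented for illustration only.
records = [
    ("dataset-A", ["Rivera", "Chen"]),
    ("dataset-B", ["Chen"]),
    ("dataset-C", ["Okafor", "Chen"]),
    ("dataset-D", ["Silva"]),
]


def datasets_per_author(records):
    """Map each author to the set of datasets they contributed to."""
    by_author = defaultdict(set)
    for dataset_id, authors in records:
        for author in authors:
            by_author[author].add(dataset_id)
    return dict(by_author)


def coauthor_components(records):
    """Group authors into connected components of the co-authorship graph.

    Two authors are linked if they share at least one dataset.
    Uses a simple union-find structure.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for _, authors in records:
        for author in authors:
            find(author)  # register every author, even solo ones
        for other in authors[1:]:
            union(authors[0], other)

    groups = defaultdict(set)
    for author in list(parent):
        groups[find(author)].add(author)
    # Largest components first, then alphabetical, for stable output.
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))


if __name__ == "__main__":
    for author, datasets in sorted(datasets_per_author(records).items()):
        print(author, len(datasets))
    print(coauthor_components(records))
```

Because the functions take plain lists as input, the same script can be re-run on a fresh extract to track how these counts change over time, in the spirit of the reproducible report described in the expected outcomes.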

To Apply

Full details of the application process, and links to forms, will be available when the application period opens.
Required application materials include: 1) a resume that includes educational history, current position, any publications or honors, and full contact information (including phone number, e-mail address, and mailing address); 2) a cover letter identifying the project you are interested in, the contributions you expect to make to the project, relevant background, value of the internship program to your career objectives and your approach to meeting the project deliverables; and 3) a letter of reference.

Applications must be completed by 11:59 PM (Mountain time) on March 25th. Links to the application forms are provided below. Applicants should also provide a letter of reference. The letter of reference should be sent directly by its author to internship@dataone.org by the application deadline.

  1. The cover letter should address the following questions:
    • Which DataONE Summer Internship project(s) are you most interested in and why?
    • What contributions do you expect to be able to make to the project(s)?
    • What background do you have which is relevant to the project(s)?
    • What do you expect to learn and/or achieve by participating?
    • What are your thoughts and ideas about the project, including particular suggestions for ways of achieving the project objectives?
    • How will participation in this program help you achieve your educational and career objectives?
    • Are there any factors that would affect your ability to participate, including other summer employment, university schedules, and other commitments?
  2. The resume should include the applicant’s educational history, current position, any publications or honors, and full contact information (including phone number, e-mail address, and mailing address).
  3. The letter of reference should be sent directly to internship@dataone.org and should be from a professor, supervisor, or mentor.

The internship application is now closed

Evaluation of applications

Applications will be evaluated according to the following criteria:

  • The academic and technical qualifications of the applicant.
  • Evidence of strong written and oral communication skills.
  • The extent to which the applicant can provide substantive contributions to one or more projects, including the applicant’s ideas for project implementation.
  • The extent to which the internship would be of value to the career development of the applicant.
  • The availability of the applicant during the period of the internship.

Intellectual Property

DataONE is predicated on openness and universal access. Software is developed under one of several open source licenses, and copyrightable content produced during the course of the project will be made available under a Creative Commons (CC-BY 3.0) license. Where appropriate, projects may result in published articles and conference presentations, on which the intern is expected to make a substantive contribution, and receive credit for that contribution.

Funding acknowledgement

Previous Summer Internships were supported by a National Science Foundation Award (NSF Award 0830944): "DataNetONE (Observation Network for Earth)". Current Summer Internships are supported by National Science Foundation Award #1430508.

For more information

If you have questions about, or problems with, the application process or the internship program in general, please e-mail internship@dataone.org.