We've launched a new website!

You're currently accessing the archived version of the DataONE website. To see our new design and keep up to date with the latest DataONE news, visit our new website at https://dataone.org

Software Tools

Dash


Dash is an open source, community driven project that takes a unique approach to data publication and digital preservation.

Dash focuses on search, presentation, and discovery and delegates the responsibility for the data preservation function to the underlying repository with which it is integrated. It is a project based at the University of California Curation Center (UC3), a program at California Digital Library (CDL) that aims to develop interdisciplinary research data infrastructure.

Additional Information:

In today’s technologically advanced world, the data generated by researchers is increasingly born digital and subject to intensive transformation and analyses before publication. The various file formats, software, and hardware required to succeed in the modern research landscape can become daunting, especially since education about digital data management has not kept pace with these technological advancements. There is a significant gap between the data management skills needed by modern researchers and their current abilities; the gap is more noticeable given the current increase in funder

Dash employs a multi-tenant user interface that gives partners extensive opportunities for local branding and customization, supports existing campus login credentials, and offers the Dash service under a tenant-specific URL, an important consideration in driving adoption. We welcome collaborations with other organizations wishing to provide a simple, intuitive data publication service on top of more cumbersome legacy systems.

There are currently eight live instances of Dash:
  • UC Berkeley
  • UC Irvine
  • UC Merced
  • UC Office of the President
  • UC Riverside
  • UC Santa Cruz
  • UC San Francisco
  • ONEshare (in partnership with DataONE)

Architecture and Implementation

Dash is completely open source, and its code is publicly available on GitHub (http://cdluc3.github.io/dash/). Dash is based on an underlying Ruby on Rails data publication platform called Stash, which comprises three main functional components: Store, Harvest, and Share.

  • Store: The Store component is responsible for the selection of datasets; their description in terms of configurable metadata schemas, including specification of ORCID and FundRef identifiers for researcher and funder disambiguation; the assignment of DOIs for stable citation and retrieval; designation of an optional limited-time embargo; and packaging and submission to the integrated repository.
  • Harvest: The Harvest component is responsible for retrieving descriptive metadata from that repository for inclusion in a Solr search index.
  • Share: The Share component, based on GeoBlacklight, is responsible for the faceted search and browse interface.
Tags: collaboration, data storage, preservation
Cost: Free

CyberTracker

CyberTracker is a software tool that allows users to collect field data with handheld computers or PDAs. It can also be used to create digital field guides because it allows rich content to be displayed in conjunction with data capture fields.

The CyberTracker Species Identification Filter consists of a sequence of screens, each with a checklist of characteristic features of a species. Once data has been filtered, it can be exported to Microsoft Excel, comma-separated values (CSV), XML, or HTML formats. Creating data elements for each screen automatically creates a structured database. CyberTracker provides some templates.

CyberTracker software can be used on smart phones and handheld computers with GPS to record observations. The design allows users to display icons, text or both, which makes data collection faster. It also allows field data collection by non-literate users and school children. CyberTracker Conservation is a non-profit organization whose vision is to promote the development of a worldwide environmental monitoring network.

Additional Information:
  • Réveillon, A. 2009. The GBIF Integrated Publishing Toolkit User Manual, version 1.0. Copenhagen: Global Biodiversity Information Facility. 37 pp.
  • http://en.wikipedia.org/wiki/CyberTracker
  • Related Tools: DiGIR, TAPIR
Tags: data entry, metadata
Cost: Free

Confluence

Confluence is a commercial wiki product from Atlassian, used by many universities and open source software projects. It provides rich and flexible editing capabilities and a plugin environment for extending the features of the wiki, with an extensive range of plugins available. Many organizations use it for documentation, group collaboration, project or course sites, knowledge management, and internal web sites. It supports a range of access control options for viewing and editing, from private to group to fully public access. It also supports a range of export options that make it easy to get information out of the wiki in a form that can be re-purposed.

Additional Information:

Non-profits and Open Source projects can use Confluence for free, and academic pricing is relatively inexpensive. See http://www.atlassian.com/software/confluence/pricing.jsp for more detailed information.

Tags: collaboration, repository, web 2.0
Contributor: MG, TH
Cost: Cost-basis

Collection Caster

Collection casting is a process for advertising a data set by creating a structured Atom news feed so people and computer systems can find your data. The "Collection Caster" tool is a web-based application that creates a "cast" (an eXtensible Markup Language/XML file) for your data set. You then place the XML file on your web server and create links to it from wherever you would like to advertise your data.
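As a rough illustration of what such a "cast" might contain, the Python sketch below builds a minimal Atom feed with one entry per data set. It is not the Collection Caster tool's own code or output format: the function name `build_cast`, the example URLs, and the choice of elements are assumptions, and a real cast may carry additional metadata elements.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def build_cast(feed_title, feed_id, updated, entries):
    """Build a minimal Atom feed (hypothetical helper) describing data sets.

    `entries` is a list of dicts with "title", "url", and "summary" keys.
    """
    # Serialize Atom as the default namespace instead of an ns0: prefix.
    ET.register_namespace("", ATOM_NS)
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    ET.SubElement(feed, f"{{{ATOM_NS}}}title").text = feed_title
    ET.SubElement(feed, f"{{{ATOM_NS}}}id").text = feed_id
    ET.SubElement(feed, f"{{{ATOM_NS}}}updated").text = updated
    for item in entries:
        entry = ET.SubElement(feed, f"{{{ATOM_NS}}}entry")
        ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = item["title"]
        ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = item["url"]
        ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated
        # The link tells people and harvesters where the data set lives.
        ET.SubElement(entry, f"{{{ATOM_NS}}}link", href=item["url"])
        ET.SubElement(entry, f"{{{ATOM_NS}}}summary").text = item["summary"]
    return ET.tostring(feed, encoding="unicode")


# Example values are placeholders, not real data sets.
cast_xml = build_cast(
    "Example data set casts",
    "http://example.org/casts/feed.xml",
    "2012-01-01T00:00:00Z",
    [{
        "title": "Example stream temperature data",
        "url": "http://example.org/data/stream-temperature",
        "summary": "Hourly stream temperature readings.",
    }],
)
print(cast_xml)
```

The resulting XML text would be saved to a file on the web server and linked from the pages where the data set is advertised.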

Contributor: RD, JP
Cost: Free

ColdFusion

ColdFusion, an Adobe product, is both a platform and a language (ColdFusion Markup Language [CFML]) that enables developers to build, deploy, and maintain Internet applications. It is specifically designed to make it easier to connect HTML pages to a database and thereby create dynamically generated web pages. Website content is managed in a relational database and can therefore be generated on the fly, so changing data in the database updates every web page that draws on it.
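The general pattern of database-driven pages can be sketched in a few lines. The example below is written in Python with an in-memory SQLite database, not in CFML, and does not use any ColdFusion API; the table name `tools`, the function `render_page`, and the sample rows are all assumptions made for illustration.

```python
import sqlite3
from html import escape


def render_page(conn, heading):
    """Generate an HTML fragment on the fly from the current database rows."""
    rows = conn.execute("SELECT name, cost FROM tools ORDER BY name").fetchall()
    items = "".join(
        f"<li>{escape(name)} ({escape(cost)})</li>" for name, cost in rows
    )
    return f"<h1>{escape(heading)}</h1><ul>{items}</ul>"


# Hypothetical content table standing in for a site's relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tools (name TEXT, cost TEXT)")
conn.executemany(
    "INSERT INTO tools VALUES (?, ?)",
    [("Dash", "Free"), ("Confluence", "Cost-basis")],
)

before = render_page(conn, "Software Tools")

# A single change in the database...
conn.execute("UPDATE tools SET cost = 'Free for non-profits' WHERE name = 'Confluence'")

# ...shows up in every page rendered from it afterwards.
after_listing = render_page(conn, "Software Tools")
after_pricing = render_page(conn, "Pricing Overview")
print(after_listing)
```

Because each page is rendered from the database at request time, no page needs to be edited by hand when the underlying data changes.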

Tags: content management system, database, web 2.0
Contributor: TV, EB
Cost: Cost-basis

ClustalW2 / ClustalW / ClustalX

ClustalW2, ClustalW, and ClustalX are general-purpose multiple sequence alignment tools for DNA or proteins. ClustalW is the command-line version and ClustalX is the graphical version of Clustal; the current version is ClustalW2. Multiple alignments of protein sequences can identify conserved sequence regions. This is useful in designing experiments to test and modify the function of specific proteins, in predicting the function and structure of proteins, and in identifying new members of protein families. Clustal produces biologically meaningful multiple sequence alignments of divergent sequences by calculating the best match for the selected sequences and lining them up so that the identities, similarities, and differences can be seen. Evolutionary relationships can be viewed as cladograms or phylograms.
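To make the idea of "lining them up so that the identities can be seen" concrete, the Python sketch below takes sequences that are already aligned and marks fully conserved columns with `*`, in the spirit of the consensus line printed under Clustal alignments. It does not perform an alignment or call Clustal; the function name `conservation_line` and the toy sequences are assumptions for illustration only.

```python
def conservation_line(aligned):
    """Mark columns where every aligned sequence has the same residue.

    `aligned` is a list of equal-length strings in which '-' denotes a gap.
    Returns a string with '*' for fully conserved columns and ' ' otherwise.
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    marks = []
    for column in zip(*aligned):
        first = column[0]
        # A column counts as conserved only if it has no gap and one residue.
        conserved = first != "-" and all(res == first for res in column)
        marks.append("*" if conserved else " ")
    return "".join(marks)


# Toy, made-up protein fragments that are already aligned.
alignment = ["MKT-AYIA", "MKTQAYLA", "MRTQAYIA"]
for seq in alignment:
    print(seq)
print(conservation_line(alignment))
```

Runs of `*` in the last line point to conserved regions, which are the regions the paragraph above describes as candidates for functional or structural importance.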

Additional Information:
  • http://www.ebi.ac.uk/Tools/clustalw2/index.html
  • Thompson J.D., Higgins D.G. and Gibson T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research 22: 4673-4680.
  • Higgins D.G., Thompson J.D. and Gibson T.J. (1996) Using CLUSTAL for multiple sequence alignments. Methods Enzymol. 266: 383-402.
  • Larkin M.A., Blackshields G., Brown N.P., Chenna R., McGettigan P.A., McWilliam H., Valentin F., Wallace I.M., Wilm A., Lopez R., Thompson J.D., Gibson T.J. and Higgins D.G. (2007) ClustalW and ClustalX version 2. Bioinformatics 23(21): 2947-2948.
  • Tuimala, J. Using ClustalX for multiple sequence alignment (tutorial available at http://www.clustal.org/)
Tags: bioinformatics, phylogenetics, sequence alignment
Cost: Free

CiteULike

CiteULike is a free web-based bibliography manager. It allows you to post, view, and organize scientific papers. Several journal services offer one-click linking to CiteULike for saving references. The application also allows you to post links on a variety of social networking sites. Users can search the site for publications that others have added and can share reference lists publicly.

Groups can be established within this site to share publications of interest.

Tags: bibliography, social networking, web 2.0
Contributor: JB, CS
Cost: Free

CiteBank

CiteBank is an open access repository that aggregates citations for biodiversity publications and delivers access to biodiversity-related articles. It provides search and browse capabilities for biodiversity publications stored in multiple international repositories. It includes a storage platform for articles and documents that have been digitized but are not yet online. It also provides a common system for scholars to share their specialist bibliographies. Users can upload, edit, and share their own personal lists of references and citations. CiteBank indexes the Biodiversity Heritage Library (BHL).

Tags: bibliography, biodiversity, discover, repository, social networking
Contributor: RL, CS
Cost: Free

CatMDEdit

CatMDEdit is a metadata editor that facilitates the documentation of resources, focusing on the description of geographic information resources. The metadata conforms to the Dublin Core and ISO 19115 (Geographic Information) standards. The tool can automatically generate metadata for some common geospatial data file formats, including Shapefile, DGN, ECW, FICC, GeoTIFF, GIF/GFW, JPG/JGW, and PNG/PGW. It also allows the automatic creation of metadata for collections of related resources, in particular spatial series that arise when geometric resources are fragmented into datasets of manageable size and similar scale.

CatMDEdit is available in Spanish, English, French, Polish, Portuguese, and Czech versions. It is an initiative of the National Geographic Institute of Spain (IGN), developed through a scientific and technical collaboration between IGN and the Advanced Information Systems Group (IAAA) of the University of Zaragoza, with the technical support of GeoSpatiumLab (GSL).

Tags: geospatial, GIS, metadata, metadata editor
Contributor: RL, CS
Cost: Free

Box

Box allows you to store and share content online. Files and folders can be shared as web links and synced from the desktop. This means that files can be automatically backed up from multiple computers or devices and stored on the Box server. Box also provides search tools and the ability to view files without downloading them.

Box supports standard web browsers and mobile devices such as Android, iPhone, and iPad. It can be automatically accessed through a variety of other mobile apps, and it integrates with other collaboration software such as Google Docs, Gmail, and Microsoft SharePoint. Box allows free use of up to 5 GB of storage and has pricing plans for enterprise capabilities, larger storage use, and additional features such as versioning and encrypted storage.

Tags: data storage, repository, web 2.0
Contributor: MG, MD
Cost: Free