
File Organization System, Meet Collaborator’s File Organization System

A freshwater ecologist who studies natural phytoplankton communities had been working on a project for several years when unforeseen trouble cropped up and the still-incomplete project came to a standstill. After days and weeks of pondering what to do about the obstacles that were hampering progress, she decided to put the project on hold for a while. Still experiencing the mix of relief and frustration that came with that decision, she turned her attention to another project, which investigated changes over time in lake temperatures and in the depths at which different types of lake phytoplankton are found. After some time, however, that project also ran into difficulties. Prolonged agonizing over these setbacks (and worrying over all of the time invested without any manuscripts to show for it) was wearing her down when suddenly a moment of clarity revealed an idea that was surely pure genius: the problem with these projects was that they were not big enough!

You may now be questioning the distressed ecologist’s sanity, but her moment of insight actually turned out to be a valuable one. Many of the problems that had cropped up in her two studies could be resolved if she expanded the study to involve more data and more organisms. Fortunately, she also knew another freshwater ecologist who was investigating yet another species of phytoplankton, and he, too, had not yet published his work. She discussed her dilemma and her brilliant idea with him, and he was completely on board, especially since he had been having similar difficulties. They were full of optimism and eager to embark on this new journey. However, after contemplating the next step, they realized they were in for a few storms before they could hit smooth sailing.

Between them, the two ecologists had three separate projects, each with its own folders containing multiple files and multiple versions of data and information. When they compared the analyses for all of the projects, they came to the unwelcome realization that the analyses were not all in the same format. Two different programs with different formats had been used: an open-source statistical and graphics scripting program called R and a commercial point-and-click program called JMP. They were overwhelmed, to say the least! This is where the real trouble began. One of the researchers already had her data and documentation on a collaborative project management server, but this system was no longer supported at her institution, meaning that the files needed to be transferred somewhere else. Although her data were on a collaborative site, she wasn’t sure whether the files there were up to date, because another collaborator had been working on the files and might not have added the most recent versions. The second ecologist had his data and documentation on his personal computer, but his file system and file-naming conventions were not very systematic, which meant he would need to go through many of the files to refresh his memory about what each one contained. How were they going to merge everything from their separate projects into one clear, organized workflow and file system that both of them could use at the same time, and how were they going to keep adding data and information so that they could eventually complete the collaborative manuscript?

Both of them spent an entire week organizing their data into the new project folder and establishing the most up-to-date results of their combined projects. They began by setting up a shared Dropbox folder, which allowed them to upload and share their data files, the code associated with their individual analyses, and other project documentation. To make navigating the files easier, they agreed to always put the date before the file name of each file they saved, so they could quickly see which version of a file was the most recent. However, because they were constantly making changes, things very quickly became confusing, and it was nearly impossible to keep track of exactly what had been changed and when. Even though they had invested so much time in setting up a file organization system that they both understood, things were getting out of control! As the number and complexity of files grew, finding the most recent version of a file and understanding exactly how it differed from other versions became a frustrating experience.
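
To make the date-first naming convention concrete, here is a minimal sketch in Python of how the newest copy of each file could be picked out of such a folder. The story does not say what date format the ecologists used, so the `YYYY-MM-DD_` prefix, the function name `latest_versions`, and the example file names are all assumptions made for illustration.

```python
import re

# Assumed pattern (not from the story): "YYYY-MM-DD_" followed by the base file name.
DATE_PREFIX = re.compile(r"^(\d{4}-\d{2}-\d{2})_(.+)$")


def latest_versions(filenames):
    """Return a dict mapping each base file name to its most recently dated copy."""
    newest = {}
    for name in filenames:
        match = DATE_PREFIX.match(name)
        if match is None:
            continue  # ignore files that do not follow the naming convention
        date, base = match.groups()
        # Dates written as YYYY-MM-DD sort chronologically when compared as strings.
        if base not in newest or date > newest[base][0]:
            newest[base] = (date, name)
    return {base: name for base, (_, name) in newest.items()}


# Hypothetical folder contents, invented for this example.
files = [
    "2015-03-01_lake_temps.csv",
    "2015-04-12_lake_temps.csv",
    "2015-03-20_analysis.R",
    "notes.txt",
]
print(latest_versions(files))
```

A listing like this answers only which copy is newest. It says nothing about what changed between two dated copies or who changed it, which is the gap the two ecologists kept running into.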

Discussion points: Despite their best efforts to keep things organized, their system wasn’t working. Frustration began to overtake their initial excitement about the possibilities the collaborative project offered. What to do, what to do?

Story contributed by Dr. Derek Gray with additional information from Kara Woo.
Image: CC-BY-NC-SA by Jon Shablotnik via flickr