Opening the Door for Open Data

Decades' worth of ecological data points, metadata, and databases have been amassed by scientists around the world. Where all these data end up has major implications for the future of research and conservation, yet there are no standard practices for sharing data with other scientists or the public. Shared data could open the door for new research opportunities and could be used to make better-informed policy and conservation decisions. However, most raw data and potentially useful information end up filed away by those who initially gathered them, sometimes never to be glanced at again. Published results that are not open access often go unnoticed by experts from other fields because they are locked behind paywalls. Additionally, there are few incentives and resources for researchers to share their data. Even so, several efforts are underway to simplify and encourage the sharing of environmental data, open up access to important information, and improve transparency.

Overcoming difficulties with data sharing requires linking databases and scientists from various organizations to minimize the redundancy of management systems and prevent the proliferation of scattered and incomplete datasets. Federal agencies, which the OPEN Government Data Act recently required to make their data available to the public, have largely led the movement towards developing management systems that can increase data transparency and accessibility. States have also begun to mandate data improvements through legislation, such as AB 1755 (the Open and Transparent Water Data Act) in California. Many agencies and other organizations are seeking to organize massive amounts of data in a meaningful and practical way by starting with regional or project-specific management systems, such as the FACT Network, the Florida Atlantic Coast Telemetry group for fish and sea turtle telemetry. These smaller systems often make it possible to compile, view, and analyze data in different ways, but they must be scaled up and established as a valuable resource for academics and larger regional management associations to have a broader impact.

At its annual meeting earlier this year, the Bay-Delta Interagency Ecological Program (IEP) hosted a workshop called “Fame, Freedom, and Fairness with Open Data,” at which the IEP’s Data Utilization Working Group (DUWG) presented its efforts to achieve open science. Currently, IEP data are stored across a variety of databases, some more accessible than others. The DUWG is hoping to make open science easier in the Bay-Delta by offering templates for data management plans that describe data collection, archiving, and storage, along with a template for metadata (descriptions or other information about a dataset) that uses the Ecological Metadata Language standard to increase transparency. For data curation and archiving, the IEP chose the Environmental Data Initiative (EDI), which assigns each dataset a Digital Object Identifier (DOI) so it can be uniquely identified and adequately cited in publications (hence the “fame” and “fairness”). So far, the IEP has uploaded data from four different surveys to this service. Once uploaded to EDI, the information will eventually be transmitted to the AB 1755 database, which will act as a central hub to coordinate existing water data in California. From there, it will go to various portals such as Bay Delta Live and SacPas, which serve as access points and provide geographic or graphical tools for visualizing data.

Despite recognized risks and difficulties, such as concerns about the cost of sharing data, misinterpretation of one’s data, and misuse of species information, there may be rewards for those who opt to share their data. Greater feedback and quality control from a wider audience could improve studies and articles before they are published. The recent “open science” movement even aims to make all aspects of the scientific process more collaborative, from study design to data management to sharing results. Open-source data portals such as Zooniverse gather data with the help of the public and have led to several publications (Cooper et al. 2014) and conservation decisions (Kobori et al. 2015). Possibilities clearly lie hidden within all the data sitting on the shelf, and new systems for more transparent and coordinated data management appear to be on the horizon.

This post was featured in our weekly e-newsletter, the Fish Report.