Mission
The primary goal of data management is to
provide long term storage and maintenance of the Shortgrass Steppe LTER data as
well as access to the data by LTER scientists, students, and the public. The
design of our archival procedures, relational database management system, and
web-based data access system is oriented toward achieving this goal. In
providing access to SGS LTER data for scientists at Colorado State University, we
also consider the needs of scientists worldwide who wish to access the data.
The second goal for data management is to assist the LTER scientists in the
analysis of the data and the use of the data in modeling activities.
Overview
Information management ideally starts before
data collection begins at the site.
Communication between researchers and the information management (IM)
team begins with project initiation. The
IM team remains involved during data collection, verification, entry, QA/QC,
archival, and publication (Brunt 2000). Currently, additional data are being directly downloaded to
our database from data loggers in the field, thus shortening the time between
data collection and entry. After
digital data pass quality assurance, the information is transferred to the
SGS-LTER Relational Database Management System (RDBMS) residing on the SGS-LTER
server, where it becomes accessible to the public through our website (http://sgs.cnr.colostate.edu/Data/DataLibrary.htm).
We have developed a strong web-based tool with
the Agricultural Research Service (ARS) for researchers to contribute metadata
to the database. An online form
allows researchers to automatically enter metadata into the SGS-LTER Access
RDBMS. End users may query the
database for project information dating back to 1940.
We will continue to develop and refine these tools as we participate in
developing metadata content standards across the Network (see http://sgs.cnr.colostate.edu/ars).
Meteorological data from the site are
transferred to Colorado State University by modem each night. The data are then
processed by a data "filter" that verifies that the data values fall
within reasonable ranges. Where errors occur in the data stream, the filter
reports the errors and replaces the affected values with missing value codes.
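The range check described above could be sketched as follows. This is a minimal illustration in Python, not the filter actually used at the site: the variable names, the plausible ranges, and the missing value code -9999 are all assumptions made for the example.

```python
# Hypothetical sketch of a range-check filter for nightly meteorological data.
# The missing value code and the ranges below are assumed, not taken from the
# SGS LTER system.
MISSING = -9999.0

RANGES = {
    "air_temp_c": (-50.0, 50.0),
    "rel_humidity_pct": (0.0, 100.0),
    "wind_speed_m_s": (0.0, 75.0),
}


def filter_record(record, ranges=RANGES, missing=MISSING):
    """Return (cleaned_record, errors) for one observation.

    Values outside their range are reported and replaced with the
    missing value code; fields without a defined range pass through.
    """
    cleaned = dict(record)
    errors = []
    for name, (low, high) in ranges.items():
        value = record.get(name)
        if value is None:
            continue  # variable not present in this record
        if not (low <= value <= high):
            errors.append(f"{name}={value} outside [{low}, {high}]")
            cleaned[name] = missing
    return cleaned, errors


record = {
    "timestamp": "2002-02-27T14:00",
    "air_temp_c": 612.0,        # implausible reading, e.g. a sensor fault
    "rel_humidity_pct": 43.0,
}
cleaned, errors = filter_record(record)
for message in errors:
    print(message)
```

In this sketch the original record is left unchanged and a cleaned copy is returned, so the reported errors can be checked against the raw values later.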
Information management for the SGS LTER
project has traditionally relied upon using flat ASCII files to store data,
ASCII files to describe the data (metadata files), and locally written programs
to access, view, plot and download the data. However, in 1996 all of the SGS
LTER data and metadata were loaded into an Access relational database. This
database allows information management personnel and scientists to query any
part of the database via our web site. ASCII text versions of these datasets are
also still available. Our
information management strategy is based upon two basic tenets for data
management at the SGS LTER: (1) data are to be maintained in ways that ensure
that the data will be accessible several decades from now, and (2) the analysis
of data is to be conducted by the LTER scientists. The role of the information
management staff is to see to it that data are properly recorded, transcribed,
documented and stored to meet these two goals.
The scope of information management activities
at the SGS LTER has been expanding over the last few years. GIS data are being
used extensively for some studies, synthetic analyses and regional scale
extrapolations are continuing to be of interest to the LTER scientists, and
collaborative efforts for exchanging data between LTER sites have been initiated
(Burke and Lauenroth 1993, Lauenroth et al. 1993). As a result of the 1993 site
review, we have critically re-examined our current data management plan.
The Current Data Base System
The LTER database is an Access relational
database, which is run on a Microsoft server housed within the Natural Resources
Ecology Laboratory at Colorado State University. These data are the primary data
for the project and consist of the original verified field observations. The
LTER database is backed up to 8-mm cassettes as part of the standard backup
procedure for the local network. In addition, backup files of the database are
created weekly. The information manager retains one copy of the data, and the PI
keeps another copy.
The data in the Access database include field
data from experiments or monitoring studies conducted at Shortgrass Steppe field
sites. These data include observations collected prior to the start of the LTER
project, primarily from the International Biological Program's Pawnee site.
Associated with each data file should be a file that provides a description of
the format of the data, the name of the investigator responsible for the data,
methods used for collecting the data, problems encountered with the collection
of the data, and other pertinent information. Such documentation of a data set
is often called metadata. The documentation of the data is essential if the data
are to be used in the future. Many of the data sets collected under the IBP do
not yet have adequate data description files. We are adding these descriptions
to the database as time permits.
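As an illustration only, a data description record and a simple completeness check might look like the sketch below. The field names and the choice of which fields count as required are assumptions made for this example; they are not the structure of the SGS LTER database.

```python
from dataclasses import dataclass


@dataclass
class DataDescription:
    """Hypothetical description (metadata) record for one data file."""
    dataset: str
    file_format: str = ""    # description of the format of the data
    investigator: str = ""   # investigator responsible for the data
    methods: str = ""        # methods used for collecting the data
    problems: str = ""       # problems encountered during collection
    notes: str = ""          # other pertinent information


# Assumed minimum for an "adequate" description in this sketch.
REQUIRED = ("file_format", "investigator", "methods")


def missing_fields(desc):
    """List the required fields that are still empty."""
    return [name for name in REQUIRED if not getattr(desc, name).strip()]


example = DataDescription(dataset="example_ibp_dataset",
                          file_format="fixed-width ASCII")
print(missing_fields(example))
```

A check of this kind could be used to list older data sets whose descriptions still need to be completed.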
The investigators associated with the LTER
project have offices in many different buildings across the campus. In addition,
the LTER database is used by scientists across the nation. We have developed
tools for accessing the data in the LTER database from anywhere that has access
to the internet. The system is based on a series of WWW forms that allow the
user to query and download specific datasets using active server pages. Our
user-friendly website provides an interactive interface to the database.
LTER information management also includes the
maintenance of a bibliographic database for publications related to the LTER
project and the SGS site. The list of publications is updated annually, printed
and distributed. The bibliography is also maintained on the network. Recently we
have begun to investigate the feasibility of providing bi-directional links
between entries in the bibliographic database and the metadata files for the
data used in the reference. The SGS LTER bibliographic database is searchable by
author, keyword, year, and publication type.
Inter-site Information Management Activities
The LTER data managers agreed at the July
1993 Data Manager's meeting to pursue methods to facilitate the exchange of data
between the sites. The strategy to be employed is to use a common metadata
format to describe the data being transferred. Current efforts at the LTER
Network level include the development of a Network
Information System that will facilitate
synthetic uses of common LTER datasets.
SGS-LTER participates in DTOC (Data Table of
Contents), Personnel database, CLIMDB (All Site Climate Database), ANPP (All
Site Annual Net Primary Production), and All Site Bibliography Network
Information Systems (NIS) modules that are maintained by the Network Office (http://lternet.edu/data).
As we progress through the new millennium, the
“Decade of Synthesis and Standardization” of metadata continues to be a
pressing issue at both the site and Network level (Stafford, personal
communication; Baker et al. 2000). The information management team is also
actively involved in developing a “content standard” for metadata within the
LTER community. This new standard, called Ecological Metadata Language (EML),
will simplify data access (http://caplter.asu.edu/data/metadata/workshop012002.htm).
Policies for Data Management
Definitions
Restricted Access
Restricted access limits access to a data set to the investigator responsible
for its collection. The investigator can provide the data to others, and has
the authority to approve or deny requests for access to the data.
Open Access
Open access permits anyone who so desires to access and use a set of data
without requiring permission from the investigator. However, all users of SGS
LTER data must notify the data manager and acknowledge the SGS LTER in any
publications that result from the data.
Long Term Data Set
A long-term data set is one that spans more than three years, and in which the
data are clearly part of an ongoing study.
Data Access Review Committee
The data access review committee is envisioned as consisting of one LTER PI,
one non-LTER scientist, and one graduate student. If a question arises
regarding an extension request, a scientist from the field of study related to
the data set in question shall be consulted to help evaluate the request.
Metadata
The information that describes a data set, including specific features of the
file format, units for variables (fields), definitions for fields, and general
information describing the experimental design, study site, etc.
Access and Proprietary Rights
1. Metadata will be accessible to all (open access).
2. There is a 3-year interval from the end of field work during which data will
be maintained as restricted access for short-term studies. After this 3-year
period, the data shall change to open access. If the investigator wishes to
keep data access restricted for longer than 3 years, the investigator must
provide a written request annually to justify the restricted access extension.
The data access review committee will review the requests and recommend whether
restricted access should be continued. In the absence of a written request, the
data will be assigned to the open access category. Publication of the data
should weigh against continuing restricted access, but factors such as a long
period from acceptance to publication should be weighed in favor of maintaining
restricted access. The data access review committee shall meet annually to
review written justifications that request the extension of restricted access
on particular data sets.
3. After 6 years from the end of the field work, data will be placed in the
open access category unless there are valid extenuating circumstances to justify
further retention in the restricted access class. More leniency should be given
for maintaining restricted access on long-term data sets.
- Long-term data sets should be subject to a "moving window" on the access
policy, such that after year 4 or following publication of the investigator's
results, the first year's data become liable for reclassification to open
access unless a request for maintaining restricted access is made.
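The timing rules for short-term studies can be summarized in a short sketch. It is a simplified reading of the policy above, written in Python for illustration: the function and parameter names are hypothetical, the committee's judgment is reduced to two yes/no flags, the data are treated as open on the anniversary date itself, and the "moving window" for long-term data sets is not modeled.

```python
from datetime import date

RESTRICTED_YEARS = 3   # default restricted period after the end of field work
HARD_LIMIT_YEARS = 6   # after this, only extenuating circumstances apply


def years_after(start, years):
    """Return the date `years` years after `start`."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # start is Feb 29 and the target year is not a leap year
        return start.replace(year=start.year + years, day=28)


def access_status(field_work_end, today,
                  extension_approved=False, extenuating=False):
    """Classify a short-term data set as 'restricted' or 'open'.

    extension_approved: the annual written request was made and approved.
    extenuating: valid extenuating circumstances exist after 6 years.
    """
    if today < years_after(field_work_end, RESTRICTED_YEARS):
        return "restricted"
    if today < years_after(field_work_end, HARD_LIMIT_YEARS):
        return "restricted" if extension_approved else "open"
    return "restricted" if extenuating else "open"


end = date(1999, 9, 30)
print(access_status(end, date(2001, 1, 1)))
print(access_status(end, date(2003, 1, 1)))
print(access_status(end, date(2003, 1, 1), extension_approved=True))
```

Metadata are not covered by this function because they are open access in all cases.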
Quality Control
1. Metadata provided by a researcher will require review and a recommendation
for acceptance by another scientist. The researcher will be asked to provide
the names of potential reviewers.
2. A mechanism for eliciting comments on the content of metadata will be
established in order to provide additional feedback for improvement of quality.
3. A standardized list of keywords for describing a data set in the metadata
will be prepared and used when appropriate. These keywords may be a subset of
the keywords used in the bibliography.
4. Rules for developing species codes will be defined. Existing species codes
could be listed as well. We will consider using species codes developed for use
with the SCS, BLM or the Plant Information Network (PIN).
5. Maps will be associated with metadata for field experiments, showing
locations of experimental areas.
- A standard for stakes and marking plates used in the field will be developed.
Responsibilities of the Information Management Staff and Scientists
1. Data should be turned in within 3 months after the end of an experiment or
field season, with exceptions granted for factors such as sample analysis
delays.
2. In order to keep track of the expected date for submission of data, a simple
form will be created and distributed along with the ARS and PNG form by which
investigators request permission to use the site. The form should also be
submitted prior to conducting greenhouse or laboratory experiments for the LTER
program.
3. Non-LTER scientists who collect data relevant to the LTER project should
also be encouraged to submit data for inclusion in the database. If the data
cannot be obtained, such scientists should be asked to submit metadata
describing their data sets so that a record is kept of their research.
4. Priorities for submission of data and entry into the system will be set by
the information management staff, who are also responsible for informing the
scientists if other priorities will delay getting the data into the system in a
timely fashion.
5. Material will be prepared to hand to scientists who are contemplating
research under LTER, to be used prior to starting laboratory or field
experiments. The material will describe the purpose of data management, the
responsibilities of the scientists and data management staff, standards for
conventions such as keywords, species lists, and location data, and forms for
preparing metadata.
- The responsibilities of the information management staff, in order of
priority, should be (1) to ensure that data are placed in the data management
system on a timely basis and that the data undergo quality control checks, (2)
to make the data accessible to the appropriate people, depending on access
status, and (3) to provide scientists with analytical support at the level of
providing data in a form that can be used by others, plus help with preparing
graphics or other non-technical analyses. Tasks such as statistical analyses
are not to be directly supported.
References
Baker, K.S., B.J. Benson, D.L. Henshaw, D.
Blodgett, J.H. Porter, and S.G. Stafford. 2000. Evolution of a multisite network
information system: The LTER information management paradigm.
BioScience 50: 963-978.
Brunt, J.W. 2000. Data management principles, implementation, and
administration. Pp. 25-47 in W.K. Michener and J.W. Brunt (eds.), Ecological
Data: Design, Management and Processing. Blackwell Science Ltd., Oxford, UK.
Burke, I.C. and W.K. Lauenroth. 1993. What do
LTER results mean? Extrapolating from site to region and decade to century.
Ecol. Mod. 67:49-80.
Lauenroth, W.K., D.L. Urban, D.P. Coffin, W.J.
Parton, H.H. Shugart, T.B. Kirchner, and T.M. Smith. 1993. Modeling vegetation
structure-ecosystem process interactions across sites and ecosystems. Ecol. Mod.
67:49-80.
Last Updated February 28, 2002
