Database Management
Policies and Procedures
Data Tracking System
The implementation of our data tracking system (Fig. 5.1) means that data management
starts long before data are
collected. Investigators must submit a request form to the data manager as part of the
procedure to conduct research. The form contains specific information about the study that
the data manager logs into the tracking system. After entering preliminary information
about a project into the system, the data manager tracks the progress of the dataset from
data collection through to availability on the SGS WWW site. The SGS data tracking system
increases the efficiency of data entry into the system, improves project-wide awareness of
the scope of research activities at any given point in time, and improves overall data
quality by providing consistency among datasets through standardized data collection
forms.
Data Delivery and Verification
The data manager maintains contact with the investigator throughout the project, and
data are submitted to the data manager no later than three months following the end of the
study. For experiments that use the LTER field crew, data are submitted directly to the
data manager. The data manager then verifies each dataset against its data description to
ensure that there are no inconsistencies between the actual data and the metadata.
Immediately following verification, the data manager updates the database by storing each
dataset as a separate table in the database and relating it to the metadata via a dataset
ID code.
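The verify-then-store step can be sketched as follows. This is a minimal illustration, not the SGS implementation: it uses Python's built-in sqlite3 module as a stand-in for the site database, and the function names, the `metadata` table layout, and the `data_<id>` table naming are all assumptions made for the example.

```python
import sqlite3


def verify_dataset(rows, described_columns):
    """Return a list of inconsistencies between the data rows and the
    column names given in the data description (metadata)."""
    problems = []
    expected = set(described_columns)
    for i, row in enumerate(rows):
        actual = set(row)
        missing = expected - actual
        extra = actual - expected
        if missing:
            problems.append(f"row {i}: missing {sorted(missing)}")
        if extra:
            problems.append(f"row {i}: undocumented {sorted(extra)}")
    return problems


def store_dataset(conn, dataset_id, title, columns, rows):
    """Store a verified dataset as its own table and relate it to its
    metadata record through the dataset ID code (hypothetical schema)."""
    # Identifiers cannot be bound as SQL parameters, so validate them first.
    if not dataset_id.isalnum() or not all(c.isidentifier() for c in columns):
        raise ValueError("dataset ID and column names must be plain identifiers")
    problems = verify_dataset(rows, columns)
    if problems:
        raise ValueError("; ".join(problems))

    table = f"data_{dataset_id}"
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata "
        "(dataset_id TEXT PRIMARY KEY, title TEXT, table_name TEXT)"
    )
    column_sql = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({column_sql})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(row[c] for c in columns) for row in rows],
    )
    conn.execute("INSERT INTO metadata VALUES (?, ?, ?)", (dataset_id, title, table))
    conn.commit()
    return table
```

A dataset whose rows do not match the described columns is rejected before any table is created, which mirrors the order of operations described above (verification first, database update second).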
Data Archiving
The data manager archives the data and metadata by storing them
together in one ASCII text file per dataset. Using this format for data storage ensures
readability over the long term. Datasets are stored redundantly via the following four
methods to provide security against accidental data loss or destruction:
- hard disk on a Sun workstation (daily)
- system level back-ups on high-density 8-mm cartridges (daily)
- SGS archives on high-density 8-mm cartridges (quarterly)
- copies of the original data forms, made for microfilming
Data Access
The metadata for all datasets are made publicly accessible as soon as they are received
from the investigator. Datasets are available to the public three years after the end date
of the study or following the investigator's publication of study results (whichever comes
first). Investigators are notified when data are scheduled to go public and may at this
point submit a written request to extend the access restriction on their data. A committee
reviews the request and determines whether the data may remain under restricted access and
for how long.
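The release rule above reduces to a small date calculation. The sketch below is illustrative only: the function name and parameters are hypothetical, and a committee-approved extension is modeled simply as an optional later date, which is an assumption about how such a decision might be recorded.

```python
from datetime import date
from typing import Optional


def public_release_date(
    study_end: date,
    published: Optional[date] = None,
    extended_until: Optional[date] = None,
) -> date:
    """Date on which a dataset becomes public: three years after the end of
    the study or the publication date, whichever comes first, unless an
    approved extension pushes it later."""
    try:
        three_years = study_end.replace(year=study_end.year + 3)
    except ValueError:
        # study_end is 29 February and the target year is not a leap year
        three_years = study_end.replace(year=study_end.year + 3, day=28)

    release = three_years if published is None else min(three_years, published)
    if extended_until is not None:
        release = max(release, extended_until)
    return release
```

For example, a study ending on 30 June 1993 with results published on 15 January 1995 would be released on the publication date, while the same study with no publication would be released on 30 June 1996.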
SGS WWW Site
Once a dataset is verified, the metadata will be automatically posted to our SGS WWW
site. Our goal is to use the SGS web site as the primary mode of information dissemination
for SGS scientists, the scientific community, and the public. (The data access portion
of our WWW site is currently being developed. We expect it to be operational by March 1,
1996. The rest of the site is fully functional now.)
The design of our web site will enable scientists everywhere as well as the public to
easily access our on-line data library (see Tables 1.2, 1.3, and 1.4 for lists of the
datasets currently managed by SGS-LTER). We have developed a standardized list of keywords
that forms the foundation for searching the SGS data library on our WWW site. A user may
either travel through the hierarchy of research categories to locate datasets of interest
or may use a keyword search. The structure of our home page gives a user viewing the data
the following options:
1. data files containing the actual rows and columns of data;
2. the metadata text, describing the study and its datasets;
3. graphs of the data, which are generated "on the fly" as the user queries the dataset;
4. graphic images associated with datasets, such as experiment designs, maps, and photos
of the study site.
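The two ways of locating datasets described above, browsing the hierarchy of research categories and searching the standardized keyword list, can be sketched with two simple indexes. This is an illustration under assumptions, not the site's search code: the class name, the category paths, and the keywords used in the usage note are all hypothetical.

```python
from collections import defaultdict


class DataLibrary:
    """Toy index supporting keyword search and category browsing."""

    def __init__(self, standardized_keywords):
        self.keywords = {k.lower() for k in standardized_keywords}
        self.by_keyword = defaultdict(set)
        self.by_category = defaultdict(set)

    def add(self, dataset_id, category_path, keywords):
        """Register a dataset under a category path and a set of keywords."""
        normalized = {k.lower() for k in keywords}
        unknown = normalized - self.keywords
        if unknown:
            raise ValueError(f"keywords not in standardized list: {sorted(unknown)}")
        for keyword in normalized:
            self.by_keyword[keyword].add(dataset_id)
        # Register under every prefix of the path so that browsing works
        # at any level of the hierarchy.
        for depth in range(1, len(category_path) + 1):
            self.by_category[tuple(category_path[:depth])].add(dataset_id)

    def search(self, *keywords):
        """Return the datasets tagged with all of the given keywords."""
        sets = [self.by_keyword.get(k.lower(), set()) for k in keywords]
        return sorted(set.intersection(*sets)) if sets else []

    def browse(self, *category_path):
        """Return the datasets filed at or below the given category path."""
        return sorted(self.by_category.get(tuple(category_path), set()))
```

With a library built from the keywords "soil", "nitrogen", and "birds", a dataset added under the path ("ecosystem", "nutrients") would be found both by `browse("ecosystem")` and by `search("soil")`. Rejecting keywords outside the standardized list is what keeps the search vocabulary consistent across datasets.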
We have recently assembled a comprehensive species list that is available on our home
site. The list includes every species found at the site and is divided into seven
categories: plants, birds, mammals, arthropods, microarthropods, nematodes and herpetiles.
For a more complete description of our data management policies and procedures, please see
our data management policy posted on our Web site.
Data Management Software
In the past, our data management system relied upon in-house software and programming
that provided leading-edge technology in the field. Given the rapid improvement of
commercial database systems software, we have re-evaluated the efficiency of in-house
software development and are now migrating to a relational database system (ORACLE).
As we write this, we are moving our data into ORACLE and working to connect the ORACLE
database directly to the WWW site using ORACLE HTML tools. This custom-tailored system
and its associated applications will allow us to meet the specific needs of SGS
scientists.
We are very excited about the future of our data management system and our ability to
maximize the utility of our relational database management system (RDBMS) for our
scientists and the public. We envision a state-of-the-art RDBMS capable of providing
high-quality, long-term data storage and enhanced access to these data through a dynamic
link to the SGS WWW site. Our web site will no longer store static data files, but will
allow visitors to execute dynamic, on-the-fly data requests and analyses. By achieving a
state-of-the-art RDBMS, we will greatly contribute to the long-term success of SGS as an
outstanding ecological research site well into the next century.
Daily GIS management covers the collection of new data, the extension of existing
spatial data, and the maintenance of metadata. Expansion of the SGS to include the Pawnee
National Grasslands (PNG) allows us to acquire more ecologically complete landscape-level
data. Data new to the SGS study area include: (1) prairie dog town locations, (2) swift
fox locations, (3) plant communities and associated range site descriptions, and (4) land
use and Conservation Reserve Program (CRP) treatments.
We use an extended ARC/INFO data library structure for analysis and daily management of
spatial data and metadata (Fig. 5.3). These data are then made available across the WWW
in several formats to
accommodate the needs of investigators. Since many users simply wish to view the data, map
views stored in a Map Atlas are accessible for viewing in raster format, and downloading
in black-and-white or color postscript format for local printing of high-quality graphics.
A new method for access and retrieval of historical field study sites is now being
adopted at SGS. This format stores each study location as a polygon in the Study Site
library layer. This new format will allow scientists and data managers to more easily
identify past and ongoing research based on plant or animal species, soil key words if
appropriate, researcher names, dates of study, and of course geographic proximity. This
structure will form a link between the GIS data library and the field data in the data
management system.
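The kind of lookup this study-site layer is meant to support can be sketched in a few lines. This is a rough illustration under stated assumptions, not the ARC/INFO library structure itself: the `StudySite` record, its field names, and the `find_sites` function are hypothetical, coordinates are assumed to be in a projected system with consistent units, and proximity is approximated by the distance to the mean of the polygon's vertices rather than by a true polygon distance.

```python
from dataclasses import dataclass
from datetime import date
from math import hypot


@dataclass
class StudySite:
    dataset_id: str   # assumed link to the field data in the data management system
    researcher: str
    species: tuple    # species names associated with the study
    start: date
    end: date
    polygon: tuple    # ((x, y), ...) vertices of the study location

    def center(self):
        """Mean of the vertices: a rough stand-in for a true centroid."""
        xs = [x for x, _ in self.polygon]
        ys = [y for _, y in self.polygon]
        return sum(xs) / len(xs), sum(ys) / len(ys)


def find_sites(sites, species=None, researcher=None, active_on=None,
               near=None, within=None):
    """Return dataset IDs of study sites matching every criterion supplied."""
    results = []
    for site in sites:
        if species is not None and species not in site.species:
            continue
        if researcher is not None and researcher != site.researcher:
            continue
        if active_on is not None and not (site.start <= active_on <= site.end):
            continue
        if near is not None and within is not None:
            cx, cy = site.center()
            if hypot(cx - near[0], cy - near[1]) > within:
                continue
        results.append(site.dataset_id)
    return results
```

Because each polygon record carries a dataset identifier, a match on species, researcher, date, or location leads directly to the corresponding field data, which is the linking role described above.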
GIS metadata at our site conform to the Content Standards for Digital Geospatial Metadata.
Approximately 75 percent of the metadata elements for this standard are appropriate and
used at our site, with approximately 20 percent of these being required elements. This
information is currently stored in relational database tables and accessible for internal
use and maintenance. Text output files are made available for outside and network users.
For new and recent data layers, the required metadata elements are complete. Metadata for
spatial data preceding the standard, although well-documented, may never have all of the
required elements we currently collect.
Research Support
GIS research analysis is conducted primarily using Arc/Info and IMAGINE software. These
GIS analyses range from plant-level scanning and analysis of root characteristics, to
plot-level identification of plant growth and mortality, to landscape-level assessments
of nutrient run-off.
Prior to the advent of WWW Internet viewers, we supported machine- and software-independent
views of our SGS Map Atlas through on-line map images. These map images could
be viewed within the Colorado State University network using Unix-based non-GIS viewing
tools, or transferred to remote locations via file transfer protocol for viewing. This
served primarily as a mechanism to facilitate communication and visualization for
research. These views are now supported and accessible through our WWW site.
Purchased, SGS-automated, and project data are saved in duplicate on 8 mm tapes in the
original format, with the second copy stored in a separate location from the first. Data
automated or developed in-house are stored in Arc/Info export format and are reviewed
yearly for compatibility maintenance. The final products of project data are stored and
reviewed in a similar manner. Final products are also stored together with all associated
work files on 8 mm tape in triplicate: two copies for our site and one copy for the
researcher. These are identified with the name of the project, date of completion and the
researchers' names.