Species Distribution Maps - Methodology
Andrew Lister and
How were the maps made?
The Forest Inventory and Analysis (FIA) units of the USDA Forest
Service seek to improve our understanding of the nation's forests
with respect to their quantity, distribution, use, and health. There
has recently been a great deal of interest in producing high-quality,
high-resolution maps of forest inventory information,
such as species distributions (Riemann Hershey et al. 1997, Moeur
and Riemann Hershey 1999, Iverson et al. 1999), pockets of high-value
commercial trees (King 2000), and forest distribution (Zhu 1994).
Basics of Geostatistics
To create these maps, we implemented a technique called
Sequential Gaussian Conditional Simulation (SGCS). SGCS is a method
from geostatistics, a branch of statistics used to characterize
spatial distributions and to estimate variables at unsampled
locations. The idea behind geostatistics is simple: on average,
samples taken close together are more similar than samples taken
farther apart. For example, if you lay a grid of points over an
area and measure some variable at each point, you might find that,
on average, adjacent points have more similar values than points
2 or 3 units apart. Many environmental factors, such as soils,
climate, tree growth, species distribution, and topography, exhibit
this pattern, which is known as spatial autocorrelation.
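The claim that nearby samples are more alike can be checked directly with an empirical semivariogram, which averages half the squared difference over all pairs of points separated by a given lag. The sketch below assumes evenly spaced points along a single transect and uses made-up values; the function name `empirical_semivariogram` is hypothetical, and real plot data would need distance classes in two dimensions rather than exact integer lags.

```python
def empirical_semivariogram(values, max_lag):
    """Return {lag: semivariance} for evenly spaced values along a transect."""
    result = {}
    for h in range(1, max_lag + 1):
        # Differences for all pairs of points exactly h steps apart.
        diffs = [values[i + h] - values[i] for i in range(len(values) - h)]
        # Semivariance: half the mean squared difference at this lag.
        result[h] = sum(d * d for d in diffs) / (2 * len(diffs))
    return result

# Made-up, smoothly varying values (not FIA data).
transect = [0.10, 0.12, 0.15, 0.20, 0.26, 0.30, 0.33, 0.31, 0.27, 0.22]
for lag, gamma in empirical_semivariogram(transect, 3).items():
    print(f"lag {lag}: semivariance {gamma:.4f}")
```

For this smooth transect the semivariance grows with lag, which is the pattern a fitted variogram model then summarizes.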
Using this principle, we can build a mathematical model, called a
variogram, of how dissimilarity changes with distance. In a typical
variogram, separation distance is plotted on the horizontal axis and
dissimilarity on the vertical axis: as the separation distance
increases, the dissimilarity increases and then levels off. This
model is then used in a procedure known as kriging (named after
D.G. Krige, a South African mining engineer) to help estimate values
of variables at unsampled locations. SGCS uses kriging within a
simulation framework to produce a distribution of possible estimates
at each location.
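The page does not say which variogram model was fitted. As one common choice, the spherical model sketched below reproduces the shape just described: it rises with distance and levels off at the sill once the separation exceeds the range. The nugget, sill, and range values are hypothetical, not fitted to FIA data.

```python
def spherical_variogram(h, nugget, sill, range_):
    """Spherical variogram model.

    nugget: dissimilarity just above zero distance; sill: level at which the
    curve flattens; range_: distance beyond which dissimilarity stops growing.
    """
    if h <= 0:
        return 0.0  # a location has zero dissimilarity with itself
    if h >= range_:
        return sill  # beyond the range, points are effectively unrelated
    r = h / range_
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)

# Hypothetical parameters, not fitted to FIA data.
for h in (0, 10, 20, 30, 40, 60):
    print(h, round(spherical_variogram(h, nugget=0.01, sill=0.05, range_=40.0), 4))
```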
One advantage of kriging is that the variogram model, which is
essentially a curve fitted to the data much like a regression
equation, can be used to help estimate the value at an unsampled
point located at a known distance from sampled points. Kriging
assigns a weight to each known point surrounding the point to be
estimated, based on the variogram values for the distances between
that known point and the point being estimated (and among the known
points themselves), and the estimate is the weighted average of the
surrounding points' values. In other words, because we know how far
the point to be estimated is from each known point, and the model
tells us how similar points at that distance tend to be, we know how
much influence each known point's value should have on the estimate.
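A minimal sketch of ordinary kriging, assuming a spherical variogram with hypothetical parameters and plot coordinates in arbitrary units. The weights come from a linear system built from variogram values between every pair of known points and between each known point and the target, plus a constraint that the weights sum to 1. The helper names (`solve`, `ordinary_kriging`) and the plot values are illustrative, not part of any FIA software or data.

```python
import math

def spherical_variogram(h, nugget=0.01, sill=0.05, range_=40.0):
    """Spherical model; the default parameters are hypothetical."""
    if h <= 0:
        return 0.0
    if h >= range_:
        return sill
    r = h / range_
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ordinary_kriging(known, target, gamma=spherical_variogram):
    """Estimate at target from known = [(x, y, value), ...]; returns (estimate, weights)."""
    n = len(known)
    # Variogram between every pair of known points, plus a row/column of 1s
    # (a Lagrange multiplier) that forces the weights to sum to 1.
    a = [[gamma(math.dist(p[:2], q[:2])) for q in known] + [1.0] for p in known]
    a.append([1.0] * n + [0.0])
    # Variogram between each known point and the target location.
    b = [gamma(math.dist(p[:2], target)) for p in known] + [1.0]
    weights = solve(a, b)[:n]
    estimate = sum(w * p[2] for w, p in zip(weights, known))
    return estimate, weights

# Hypothetical plots: (x, y, relative importance).
plots = [(0.0, 0.0, 0.40), (10.0, 0.0, 0.20), (0.0, 10.0, 0.30), (30.0, 30.0, 0.05)]
estimate, weights = ordinary_kriging(plots, (2.0, 2.0))
print(round(estimate, 3), [round(w, 3) for w in weights])
```

Because the variogram is zero at zero distance, kriging at a sampled location returns that sample's value exactly; at (2, 2) the closest plot receives the largest weight.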
Kriging and its variants (e.g., cokriging, kriging
with local means, indicator kriging, and SGCS) have been used for
decades in the mining industry but only recently in the forestry
community. To produce the species importance maps on these pages,
we implemented SGCS as follows:
- We retrieved basal area data from our FIA plot database.
- For each plot, we calculated each species' importance as that
species' share of the plot's total basal area, expressed as a
percentage (i.e., relative importance, or relative dominance).
Only forested plots were used in the analysis, and saplings were
included.
- Next, we performed some data preprocessing, and created the
mathematical model describing the relationship between similarity
and separation distance (i.e., the variogram).
- The next step was to run the simulation: we created 100
simulated maps (realizations), each built by kriging from a
combination of the original data values and previously simulated
values. As a result, each location on the map (each pixel of the
final map image) has a distribution of 100 estimates.
- We then analyzed each pixel's distribution of 100 estimates to
find the summary value that best agrees with the FIA data's
county-level estimates. For example, we might report the mean of
the distribution as the final value. If the technique's
assumptions are met, this mean is the best estimate given
everything we know about the data (see the histograms below).
- Finally, we calculated the interquartile range (IQR) of the
distribution of estimates (see below). This statistic is the
difference between the 75th and 25th percentiles of a
distribution. For example, if the value below which 25% of the
estimates fall (the 25th percentile) is 0.31, and the value below
which 75% of the estimates fall (the 75th percentile) is 0.86,
then the IQR is 0.86 - 0.31 = 0.55 (55%). Large IQRs indicate
wide distributions (less confidence in the reported estimate),
and small IQRs indicate narrow distributions (more confidence in
the reported estimate).
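The importance calculation in the second step amounts to a grouped share of basal area. This sketch assumes a simple list of `(plot_id, species, basal_area)` tree records, a hypothetical layout rather than the FIA database schema, and assumes non-forested plots have already been removed; the function name and sample records are illustrative.

```python
from collections import defaultdict

def relative_importance(trees):
    """Percent of each plot's total basal area contributed by each species."""
    plot_totals = defaultdict(float)  # total basal area per plot
    species_ba = defaultdict(float)   # basal area per (plot, species)
    for plot_id, species, basal_area in trees:
        plot_totals[plot_id] += basal_area
        species_ba[(plot_id, species)] += basal_area
    return {
        key: 100.0 * ba / plot_totals[key[0]]
        for key, ba in species_ba.items()
        if plot_totals[key[0]] > 0  # skip plots with no basal area
    }

# Hypothetical tree records (saplings would be included the same way).
trees = [
    ("P1", "red maple", 2.0), ("P1", "red maple", 1.0),
    ("P1", "black cherry", 1.0),
    ("P2", "white oak", 3.0), ("P2", "red maple", 1.0),
]
print(relative_importance(trees))
```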
One useful aspect of the simulation procedure is that the error
estimate for a given point depends not only on the density of
surrounding samples but also on how similar those samples are to
one another. The resulting "error maps," which accompany the
estimates, are useful when evaluating the final map product.
Furthermore, we need not choose the mean as our final estimate; we
can tailor it to the goals of our study by choosing the median or
any other percentile.
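To show how a distribution of estimates arises at each location, here is a heavily simplified one-dimensional sketch of sequential Gaussian simulation. It assumes the data have already been transformed to standard normal scores (a full implementation would perform that transform and back-transform the results), uses simple kriging with a zero mean and a hypothetical exponential covariance, and conditions each node on its nearest eight known or previously simulated values. The function names, parameters, and data are illustrative only. The per-node mean, median, and IQR across 100 realizations correspond to the summary maps described above.

```python
import math
import random
import statistics

def cov(h, range_=10.0):
    """Exponential covariance with unit sill; the practical range is hypothetical."""
    return math.exp(-3.0 * abs(h) / range_)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def sgs_realization(data, nodes, rng, max_neighbors=8):
    """One conditional realization on a 1-D grid.

    data maps sampled locations to normal-score values; nodes lists every
    grid location. Sampled nodes keep their data values.
    """
    known = dict(data)  # conditioning set grows as nodes are simulated
    path = [x for x in nodes if x not in known]
    rng.shuffle(path)  # visit unsampled nodes in random order
    for x in path:
        nbrs = sorted(known, key=lambda p: abs(p - x))[:max_neighbors]
        a = [[cov(p - q) for q in nbrs] for p in nbrs]
        b = [cov(p - x) for p in nbrs]
        w = solve(a, b)  # simple kriging weights, global mean assumed to be 0
        mean = sum(wi * known[p] for wi, p in zip(w, nbrs))
        var = max(cov(0.0) - sum(wi * bi for wi, bi in zip(w, b)), 0.0)
        known[x] = rng.gauss(mean, math.sqrt(var))  # draw, then condition on it
    return [known[x] for x in nodes]

def summarize(realizations):
    """Per-node (mean, median, IQR) across all realizations."""
    out = []
    for values in zip(*realizations):
        q1, median, q3 = statistics.quantiles(values, n=4)
        out.append((statistics.fmean(values), median, q3 - q1))
    return out

rng = random.Random(42)
data = {0: 1.2, 10: -0.5, 20: 0.3}  # hypothetical normal scores at sampled nodes
nodes = list(range(21))
realizations = [sgs_realization(data, nodes, rng) for _ in range(100)]
stats = summarize(realizations)
print("node 5: mean %.2f, median %.2f, IQR %.2f" % stats[5])
```

Adding each simulated value to the conditioning set is what makes later draws honor earlier ones, so each realization carries the modeled spatial continuity instead of the smoothed surface a single kriged map gives. Sampled nodes always have an IQR of zero, while nodes far from data show wider spreads; any other percentile could be read off the same sorted values.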
These maps are useful as graphical representations of the spatial
variation in species importance, as inputs into geospatial models,
or as tools that can help guide other studies. We would like
to emphasize that these are estimates, and that they carry varying
levels of uncertainty. The IQR maps displayed on these pages can
give you an idea of which estimates are more uncertain than others.
Before using these maps for any purpose, we recommend that you
contact us for advice and recommendations on appropriate uses of
the information. We make no guarantees as to the appropriateness
of the maps for any purpose.
For more information on the Sequential Gaussian Conditional Simulation
procedure, see our geostatistics workshop webpage, Ed Isaaks's
page, Deutsch and Journel (1998), Isaaks and Srivastava (1989),
Goovaerts (1997), and the other references below.
References
Goovaerts, P. 1997. Geostatistics for Natural Resources Evaluation.
Oxford University Press, New York. 483p.
Isaaks, EH and RM Srivastava. 1989. An Introduction to Applied
Geostatistics. Oxford University Press, New York. 561p.
Iverson, LR, AM Prasad, BJ Hale, and EK Sutherland. 1999. An atlas
of current and potential future distributions of common trees of
the eastern United States. General Technical Report NE-265. Newtown
Square, PA: USDA Forest Service, Northeastern Research Station.
King, SL. 2000. Sequential Gaussian simulation vs. simulated annealing
for locating pockets of high-value commercial trees in Pennsylvania.
Annals of Operations Research 95: 177-203.
Lister, A, R Riemann, and M Hoppus. 2000. Use of regression and
geostatistical techniques to predict tree species distributions
at regional scales. 4th International Conference on Integrating
GIS and Environmental Modeling (GIS/EM4): Problems, Prospects
and Research Needs. Banff, Alberta, Canada, September 2-8.
Lister, AJ, R Riemann, and M Hoppus. 2000. A nonparametric
geostatistical approach for estimating species importance.
The 2nd Annual Forest Inventory and Analysis (FIA) Symposium.
Salt Lake City, Utah, October 17-18, 2000.
Moeur, M, and R Riemann Hershey. 1999. Preserving spatial and attribute
correlation in the interpolation of forest inventory data. In: Lowell
K, Jaton A, editors. Spatial accuracy assessment: Land information
uncertainty in natural resources. Chelsea, MI: Ann Arbor Press.
Riemann Hershey, R, MA Ramirez, and DA Drake. 1997. Using geostatistical
techniques to map the distribution of tree species from ground inventory
data. In: Gregoire, T. et al., editors. Modeling longitudinal and
spatially correlated data: methods, applications, and future directions.
Lecture Notes in Statistics 122. New York: Springer-Verlag. p 187-198.
Rossi, RE, DJ Mulla, AG Journel, and EH Franz. 1992. Geostatistical
tools for modeling and interpreting ecological spatial dependence.
Ecological Monographs 62(2):277-314.
Zhu, Z. 1994. Forest Density Mapping in the Lower
48 States: A Regression Procedure. Research Paper SO-280.
New Orleans, LA: USDA Forest Service, Southern Forest Experiment Station.