Geostatistics Workshop
Definitions and Descriptions
Variograms/correlograms
-- measures of spatial continuity
Description: The variogram
depicts how the variation between data values changes with
increasing distance between them. Since we're trying
to estimate one value from its neighbors, what we're looking for
here is structure, i.e. something we can model. It is this model
of how (e.g.) sugar maple values vary with distance away from
any single point that is used in these interpolation methods.
More detailed description:
The 3 most common measures of spatial continuity in geostatistics
are the variogram, the covariance function, and the correlogram.
They are all functions that numerically characterize the strength
of association between observations of the response variable as
a function of distance and possibly direction. They assume that
the spatial autocorrelation doesn’t depend on where the pair of
observations is located, just on the distance between them (and
possibly on their orientation relative to each other; we look at
this via an anisotropic variogram). They are all calculated in
essentially the same way: the values of all pairs of points a given
distance apart are compared, and the variance, covariance, or
correlation is calculated for that separation (lag) distance.
Plotting these values against distance gives the
variogram/covariance function/correlogram
(see web site; and/or see Andy’s description using the h-scatterplot
to illustrate this more clearly).
Differences between the measures:
There are some subtle differences between the measures.
The covariance is similar to the variogram except that it standardizes
for local means, and the correlogram standardizes for both
local means and local variances. Where these are noticeably
different, this may indicate a lack of stationarity (i.e.
local means and variances vary across the dataset). Viewing
and modeling the correlogram (called 'autocorrelation' in Surfer)
is generally believed to be the best of the 3 to use in the interpolation.
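As a rough illustration of how the three measures are obtained for one lag, the following minimal sketch pairs up all points whose separation is close to the lag distance and computes the semivariance, covariance, and correlation from those pairs. It assumes NumPy and an omnidirectional search; the function name `lag_statistics`, the tolerance parameter, and the sample transect are hypothetical and not taken from the workshop software.

```python
import numpy as np

def lag_statistics(coords, values, lag, tol):
    """Semivariance, covariance and correlation for one lag (omnidirectional).

    Pairs are kept when their separation is within `tol` of `lag`.
    Every pair is used in both directions, so head and tail statistics match.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    # Distances between every pair of sample locations.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    i, j = np.nonzero(np.abs(dist - lag) <= tol)
    keep = i != j  # never pair a point with itself
    tail, head = values[i[keep]], values[j[keep]]
    if tail.size == 0:
        return float("nan"), float("nan"), float("nan")
    # Variogram: half the mean squared difference between paired values.
    semivariance = 0.5 * np.mean((head - tail) ** 2)
    # Covariance: removes the means of the values that fall in this lag.
    covariance = np.mean(head * tail) - head.mean() * tail.mean()
    # Correlogram: additionally divides by the spread of those values.
    spread = head.std() * tail.std()
    correlation = covariance / spread if spread > 0 else float("nan")
    return float(semivariance), float(covariance), float(correlation)

# Hypothetical transect of five plots, one unit apart.
coords = [[0, 0], [1, 0], [2, 0], [3, 0], [4, 0]]
values = [1.0, 2.0, 2.0, 5.0, 6.0]
for lag in (1.0, 2.0, 3.0):
    print(lag, lag_statistics(coords, values, lag, tol=0.1))
```

Repeating the call over a range of lags and plotting each measure against lag distance gives the three curves described above.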
A variogram or correlogram is just a single view of the data.
Varying the lag spacing and/or the range of distances displayed
in the graph often gives a slightly different picture that may
provide more information than your first drawing. In addition,
strong univariate characteristics of your data,
such as a large number of 0's and thus a highly skewed distribution,
can sometimes mask spatial structure that is really in the data.
In this case, viewing the variogram or correlogram after transforming
the data (e.g. by normal-scoring) can reveal
substantially more visible spatial structure. In general,
"it is easy to mask spatial continuity by a poor choice of lag
spacing, direction angles, or a poor handling of outlier values.
It is rare to generate spatial continuity that does not exist."
(Deutsch and Journel, 1998, p. 58-59*).
*Note: "there are two notable exceptions
to this statement: (1) clustered data may cause certain
measure of spatial continuity to show an artificial structure
[not common with FIA plots], and (2) the combination of severe
anisotropy and large angular tolerance can artificially increase
the range of correlation in the direction of minimum continuity
[severe anisotropy is probably also not that common...]"
(Deutsch and Journel, 1998, p. 58-59).
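The normal-score transform mentioned above can be sketched as a rank-based mapping onto standard normal quantiles. This is a minimal sketch, assuming NumPy and the Python standard library; the function name `normal_score` is hypothetical, and giving tied values (such as many 0's) the same score is a simplifying assumption, since other tie-breaking schemes exist.

```python
import numpy as np
from statistics import NormalDist

def normal_score(values):
    """Map each value to the standard normal quantile of its rank.

    Tied values receive the same (average) rank, an assumption made here.
    """
    values = np.asarray(values, dtype=float).ravel()
    n = len(values)
    # Average 1-based rank for each distinct value.
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)
    average_rank = last_rank - (counts - 1) / 2.0
    ranks = average_rank[inverse]
    # Cumulative probabilities strictly inside (0, 1).
    p = (ranks - 0.5) / n
    standard_normal = NormalDist()
    return np.array([standard_normal.inv_cdf(q) for q in p])

# Hypothetical skewed sample with several zeros.
sample = [0, 0, 0, 1, 2, 4, 9, 30]
print(normal_score(sample))
```

The variogram or correlogram would then be computed on the returned scores rather than on the raw values.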
Illustrations:
- Important parts of the variogram/correlogram
- Different variogram views of the same data
Ordinary Kriging (OK)
Brief description:
OK estimates are essentially weighted moving averages of the sample
data values, with weights that take the distance, direction and
redundancy of neighboring points into account using the model fitted
to the variogram. It is designed to be the best linear unbiased
estimate.
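To make the "weighted moving average" idea concrete, here is a minimal sketch of ordinary kriging at a single target location. It assumes NumPy, an isotropic exponential covariance model with an illustrative sill and range, and no nugget; the names `exp_cov` and `ordinary_kriging` and the sample points are hypothetical.

```python
import numpy as np

def exp_cov(h, sill=1.0, a=10.0):
    """Exponential covariance model (sill and range `a` are illustrative)."""
    return sill * np.exp(-3.0 * np.asarray(h, dtype=float) / a)

def ordinary_kriging(coords, values, target, cov=exp_cov):
    """Return (estimate, estimation variance, weights) at `target`."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(values)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d0 = np.linalg.norm(coords - target, axis=1)
    # Kriging system: data-to-data covariances plus the unbiasedness row/column.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = cov(d)
    A[n, n] = 0.0
    b = np.append(cov(d0), 1.0)
    solution = np.linalg.solve(A, b)
    weights, mu = solution[:n], solution[n]
    estimate = float(weights @ values)
    variance = float(cov(0.0) - weights @ cov(d0) - mu)
    return estimate, variance, weights

# One isolated point to the west, two nearly coincident points to the east.
coords = [[-1.0, 0.0], [1.0, 0.0], [1.0, 0.1]]
values = [10.0, 20.0, 22.0]
estimate, variance, weights = ordinary_kriging(coords, values, [0.0, 0.0])
print(estimate, variance, weights)
```

In this configuration the two clustered points carry largely redundant information, so they share weight between them while the isolated point receives more weight than either one. The weights always sum to 1.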
Illustrations:
- Example output
Features (assumptions, outputs, and characteristics):
- It honors the overall (global) mean (but not the sample histogram
or sample variogram/correlogram)
- It honors the sample data values
- The result is distinctly smoothed and pockets of high values
(or a single high value in areas of sparse ground points) can
have a big effect on the final output
- It reports an estimation variance, but this is not usually
a very useful measure of the true uncertainty of the estimate
because it does not reflect the local data values, only the
number and proximity of sample points used in the estimate
- Assumptions: it makes more distributional assumptions than IK
and fewer than SGCS, i.e. it appears to perform better the more
nearly normal the data are (this last point is tentative)
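For reference, the ordinary kriging estimator, the system that determines its weights, and the resulting estimation variance can be written in the usual textbook form. The notation is an assumption and does not come from the workshop pages: the weights are written as lambda_i, C is the modeled covariance, and mu is the Lagrange multiplier for the unbiasedness constraint.

```latex
\hat{z}(x_0) = \sum_{i=1}^{n} \lambda_i \, z(x_i),
\qquad \sum_{i=1}^{n} \lambda_i = 1

\sum_{j=1}^{n} \lambda_j \, C(x_i - x_j) + \mu = C(x_i - x_0),
\qquad i = 1, \dots, n

\sigma^2_{OK}(x_0) = C(0) - \sum_{i=1}^{n} \lambda_i \, C(x_i - x_0) - \mu
```

The data values z(x_i) appear only in the estimate. The weights and the estimation variance depend solely on the covariance model and on the locations of the sample points, which is why the variance reflects the number and proximity of the samples rather than the local data values.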
Sequential Gaussian Conditional Simulation (SGCS)
Brief description: Instead
of coming up with a single best estimate, SGCS comes up with many
different, equally probable, alternative realizations.
From this set of realizations, an entire distribution function
can be built for each cell, representing the range of possible
values.
More detailed description:
SGCS first transforms the data into a normal distribution.
(Thus it uses the model of the variogram/correlogram calculated
from the normal-scored data.) It then selects one grid
node at random and kriges the value at that location.
It then draws a random number from a normal (Gaussian) distribution
that has been constructed to have a variance equal to the
kriging variance and a mean equal to the kriged value.
This value (the random value chosen from that distribution)
is the simulated value for that grid node. It then selects
another grid node at random and repeats, including all previously
simulated nodes in the kriging calculation. This preserves
the spatial variability as modeled in the variogram. When
all nodes have been simulated for an individual realization,
it then backtransforms the values to the original distribution.
This gives us the first realization. It then repeats for
all the other realizations using a different random number sequence.
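The sequence of steps above can be sketched in a few lines for a one-dimensional grid. This is a simplified illustration under stated assumptions, not the implementation used in the workshop: it assumes NumPy, simple kriging with zero mean on the normal scores, a unit-sill exponential covariance with an illustrative range, use of all data and all previously simulated nodes (no search neighborhood), and a back-transform by linear interpolation between the data quantiles with the tails clamped to the data range. The names `exp_cov` and `sgs_1d` are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def exp_cov(h, a=10.0):
    """Unit-sill exponential covariance for normal scores (illustrative range)."""
    return np.exp(-3.0 * np.abs(h) / a)

def sgs_1d(data_x, data_z, grid_x, n_real=50, seed=0, cov=exp_cov):
    """Return an array of shape (n_real, len(grid_x)) of simulated values."""
    rng = np.random.default_rng(seed)
    data_x = np.asarray(data_x, dtype=float)
    data_z = np.asarray(data_z, dtype=float)
    grid_x = np.asarray(grid_x, dtype=float)
    # 1. Normal-score transform of the data (rank based).
    order = np.argsort(data_z)
    p = (np.arange(len(data_z)) + 0.5) / len(data_z)
    sorted_scores = np.array([NormalDist().inv_cdf(q) for q in p])
    scores = np.empty(len(data_z))
    scores[order] = sorted_scores
    sorted_z = data_z[order]
    out = np.empty((n_real, len(grid_x)))
    for r in range(n_real):
        known_x, known_y = data_x.copy(), scores.copy()
        sim = np.empty(len(grid_x))
        # 2. Visit the grid nodes in random order.
        for k in rng.permutation(len(grid_x)):
            # 3. Krige the node from the data and the nodes simulated so far.
            C = cov(known_x[:, None] - known_x[None, :])
            c0 = cov(known_x - grid_x[k])
            w = np.linalg.solve(C, c0)
            mean = w @ known_y
            var = 1.0 - w @ c0
            if var < 1e-10:
                var = 0.0  # node coincides with a known location
            # 4. Draw the simulated value from N(kriged value, kriging variance).
            sim[k] = rng.normal(mean, np.sqrt(var))
            if var > 0.0:
                known_x = np.append(known_x, grid_x[k])
                known_y = np.append(known_y, sim[k])
        # 5. Back-transform the scores to the original data distribution.
        out[r] = np.interp(sim, sorted_scores, sorted_z)
    return out

# Hypothetical transect: four samples, grid nodes every unit from 0 to 20.
realizations = sgs_1d([2.0, 7.0, 13.0, 18.0], [0.0, 5.0, 1.0, 12.0],
                      np.arange(0.0, 21.0), n_real=20, seed=1)
print(realizations.shape)
```

Each row of the result is one realization. Nodes that coincide with a sample location reproduce the sample value in every realization, while nodes between samples vary from one realization to the next.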
Illustrations:
- Example realizations
- Summary statistics are calculated from these distributions
for each cell
- Example maps of a single realization,
the chosen estimate and a map of the uncertainties associated
with that estimate
- The plus and minus uncertainty can be
different (thus you might want to maintain these separately)
- An illustration of the differences between
the OK output and the SGCS output
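The per-cell summary statistics, including separate plus and minus uncertainties, can be sketched as follows. This is a minimal sketch assuming NumPy; using the median as the chosen estimate and the 5th and 95th percentiles as the uncertainty bounds are assumptions for illustration, and the function name `summarize_realizations` is hypothetical.

```python
import numpy as np

def summarize_realizations(realizations, lower=5, upper=95):
    """Summarize an array of shape (n_realizations, n_cells) cell by cell.

    The estimate is the median across realizations (an assumption here);
    'minus' and 'plus' are its distances to the lower and upper percentiles.
    """
    r = np.asarray(realizations, dtype=float)
    estimate = np.median(r, axis=0)
    low = np.percentile(r, lower, axis=0)
    high = np.percentile(r, upper, axis=0)
    return {"estimate": estimate, "minus": estimate - low, "plus": high - estimate}

# Hypothetical cell with a skewed set of simulated values.
cell = np.array([[0.0], [0.0], [0.0], [1.0], [10.0]])
summary = summarize_realizations(cell)
print(summary["estimate"], summary["minus"], summary["plus"])
```

For a skewed distribution such as this one, the plus uncertainty is much larger than the minus uncertainty, which is the reason for keeping the two separately.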
Features:
Assumptions:
- Requires a multivariate normal (multi-Gaussian) distribution of the data.
Univariate normality is achieved by normal-scoring the data.
Bivariate normality (i.e. normality between pairs of points, as revealed
in the variogram) can be checked (see illustration), and higher-order
normality can only be assumed from there
- Stochastic modeling (such as SGCS) is particularly
useful when there is a belief in some ‘space of uncertainty’
and that this technique can produce outcomes that sample this
space fairly. (Similarly, this assumes that each
realization contains a realistic level of spatial heterogeneity.)
Outputs and Characteristics
- Each realization maintains both the sample histogram and
variogram (univariate and bivariate statistics)
- It honors the sample data values
- Where there is higher local variation, there will be higher
uncertainty in the estimates.
- Produces a directly understandable measure of uncertainty
with the estimate
- Each realization typically contains a realistic level of
spatial heterogeneity
- The characteristics of the summary dataset can include:
  - Contains closer to the full range of data values than OK
  - Retains some indication of the local variability
Indicator Kriging (IK)
Brief Description: IK is
essentially Ordinary Kriging, except that instead of calculating
weighted means of the %ba/acre values themselves,
it divides the data into only 2 classes (above and below a
designated cutoff value of interest) and calculates the probability
that that condition occurs. In other words, IK is OK applied to
the indicator-coded data, using a separate model for each cutoff.
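A minimal sketch of this idea for one cutoff and one target location follows. It assumes NumPy, codes each sample as 1 when it is at or below the cutoff and 0 otherwise, and applies ordinary kriging weights to those indicators. The exponential covariance stands in for the model that would be fitted separately for each cutoff, and the names `indicator_transform`, `exp_cov`, and `indicator_kriging` are hypothetical.

```python
import numpy as np

def indicator_transform(values, cutoff):
    """1.0 where the value is at or below the cutoff, otherwise 0.0."""
    return (np.asarray(values, dtype=float) <= cutoff).astype(float)

def exp_cov(h, sill=1.0, a=10.0):
    """Illustrative indicator covariance model; fit one per cutoff in practice."""
    return sill * np.exp(-3.0 * np.asarray(h, dtype=float) / a)

def indicator_kriging(coords, values, target, cutoff, cov=exp_cov):
    """Estimated probability that the value at `target` is at or below `cutoff`."""
    coords = np.asarray(coords, dtype=float)
    target = np.asarray(target, dtype=float)
    indicators = indicator_transform(values, cutoff)
    n = len(indicators)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d0 = np.linalg.norm(coords - target, axis=1)
    # Ordinary kriging system, identical in form to OK on the raw values.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = cov(d)
    A[n, n] = 0.0
    b = np.append(cov(d0), 1.0)
    weights = np.linalg.solve(A, b)[:n]
    # Kriged indicators can fall slightly outside [0, 1]; clip them back.
    return float(np.clip(weights @ indicators, 0.0, 1.0))

# Hypothetical samples: two low values close together, one high value farther away.
coords = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]]
values = [2.0, 3.0, 40.0]
print(indicator_kriging(coords, values, [0.5, 0.0], cutoff=10.0))
```

Repeating the calculation for every grid cell gives a probability map for that cutoff.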
Illustrations:
- Example output
- Incorporating soft information
Features (assumptions, outputs, and characteristics):
- Provides, for each estimated cell, the probability
that it falls below that cutoff value
- This probability can be used by the user
to reflect on which side s/he wants to err…
- Is particularly useful if there is a specific
threshold value(s) of interest
- Makes no assumptions about the data distribution
- The results of several cutoffs can be combined
to create a single map of several %ba/acre classes (like the
OK map) – the disadvantage of this is that each cutoff has
to be modeled and interpolated separately
- Can easily incorporate soft/ancillary information
into the indicator kriging via prior probabilities
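Combining the results of several cutoffs into one class map can be sketched as below. This is a minimal sketch assuming NumPy and an input array of already-kriged probabilities, one row per cutoff in ascending order. The running-maximum step that forces the probabilities to be non-decreasing across cutoffs is one simple correction chosen for illustration (other correction schemes exist), and the function name `classify_from_cutoffs` is hypothetical.

```python
import numpy as np

def classify_from_cutoffs(prob_below):
    """Turn per-cutoff probabilities into class probabilities and a class map.

    prob_below has shape (n_cutoffs, n_cells); entry [k, c] is the estimated
    probability that cell c is at or below cutoff k (cutoffs ascending).
    Returns (class_probs, most_likely_class) with n_cutoffs + 1 classes.
    """
    p = np.clip(np.asarray(prob_below, dtype=float), 0.0, 1.0)
    # Probabilities must not decrease as the cutoff increases.
    p = np.maximum.accumulate(p, axis=0)
    n_cells = p.shape[1]
    cdf = np.vstack([np.zeros(n_cells), p, np.ones(n_cells)])
    # The probability of each class is the jump in the CDF across its bounds.
    class_probs = np.diff(cdf, axis=0)
    return class_probs, class_probs.argmax(axis=0)

# Hypothetical probabilities for two cutoffs at two cells.
prob_below = [[0.2, 0.9],
              [0.7, 0.8]]
class_probs, classes = classify_from_cutoffs(prob_below)
print(class_probs)
print(classes)
```

Class 0 is "at or below the first cutoff", the last class is "above the highest cutoff", and the class probabilities for each cell sum to 1.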