USDA Forest Service
 Northeastern Forest Inventory & Analysis
Contact: Andrew Lister

 

Forest Inventory & Analysis Program
11 Campus Blvd.
Suite 200
Newtown Square, PA 19073-3294

(610)557-4075
(610)557-4250 FAX
(610)557-4132 TTY/TDD


GIS /Spatial Statistics

Geostatistics Workshop

Definitions and Descriptions


Variograms/correlograms -- measures of spatial continuity

Description:  The variogram depicts how the variation between data values changes as the distance between them increases.  Since we're trying to estimate one value from its neighbors, what we're looking for here is structure: something we can model.  It is this model of how (e.g.) sugar maple values vary with distance away from any single point that is used in these interpolation methods.

More detailed description:  The three most common measures of spatial continuity in geostatistics are the variogram, the covariance function, and the correlogram.  All three are functions that numerically characterize the strength of association between observations of the response variable as a function of distance and possibly direction.  They assume that the spatial autocorrelation doesn't depend on where the pair of observations is located, just on the distance between them (and possibly on their orientation relative to each other, which we look at via an anisotropic variogram).  The calculation is basically this: the values of all pairs of observations a given distance apart are compared, and the variance, covariance, or correlation is calculated for that separation (lag) distance.  Plotting these values against distance gives the variogram, covariance function, or correlogram (see web site; and/or see Andy's description using the h-scatterplot to illustrate this more clearly).

Differences between the measures:  There are some subtle differences between the measures.  The covariance is similar to the variogram except that it standardizes for local means, and the correlogram standardizes for both local means and local variances.  Where these are noticeably different, this may indicate a lack of stationarity (i.e. local means and variances vary across the dataset).  Viewing and modeling the correlogram (called 'autocorrelation' in Surfer) is generally believed to be the best of the three to use in the interpolation.

A variogram or correlogram is just a single view of the data.  Varying the lag spacing and/or the range of distances displayed in the graph often gives a slightly different picture that may provide more information than your first drawing.  In addition, strong univariate characteristics of your data, such as a large number of 0's and thus a highly skewed distribution, can sometimes mask some of the spatial structure that is really in the data.  In this case, viewing the variogram or correlogram after transforming the data, such as by normal-scoring, can reveal substantially more visible spatial structure.  In general, "it is easy to mask spatial continuity by a poor choice of lag spacing, direction angles, or a poor handling of outlier values.  It is rare to generate spatial continuity that does not exist."  (Deutsch and Journel, 1998, p. 58-59*).

*Note:  "there are two notable exceptions to this statement:  (1) clustered data may cause certain measure of spatial continuity to show an artificial structure [not common with FIA plots], and (2) the combination of severe anisotropy and large angular tolerance can artificially increase the range of correlation in the direction of minimum continuity [severe anisotropy is probably also not that common...]"  (Deutsch and Journel, 1998, p. 58-59).
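The pairwise calculation described above can be sketched in a few lines of Python.  This is only an illustrative, omnidirectional version, not the workshop's own software: the function name `spatial_continuity`, the simple distance binning, and the synthetic demo data are all assumptions made for the example.  Each pair is counted in both directions so that the "tail" and "head" means of a lag are identical.

```python
import numpy as np

def spatial_continuity(coords, values, lag_width, n_lags):
    """Omnidirectional semivariogram, covariance and correlogram per lag bin.

    Hypothetical helper: bin k holds pairs with k*lag_width <= distance < (k+1)*lag_width.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)          # every unique pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    lag_bin = np.floor(dist / lag_width).astype(int)

    n_pairs = np.zeros(n_lags, dtype=int)
    mean_dist, gamma, cov, rho = (np.full(n_lags, np.nan) for _ in range(4))
    for k in range(n_lags):
        in_bin = lag_bin == k
        n_pairs[k] = in_bin.sum()
        if n_pairs[k] == 0:
            continue                                   # empty lag stays NaN
        # Use each pair in both directions (tail->head and head->tail).
        tail = np.concatenate([values[i][in_bin], values[j][in_bin]])
        head = np.concatenate([values[j][in_bin], values[i][in_bin]])
        mean_dist[k] = dist[in_bin].mean()
        gamma[k] = 0.5 * np.mean((tail - head) ** 2)   # semivariogram
        cov[k] = np.mean(tail * head) - tail.mean() * head.mean()  # removes lag means
        spread = tail.std() * head.std()
        if spread > 0:
            rho[k] = cov[k] / spread                   # also removes lag variances
    return mean_dist, n_pairs, gamma, cov, rho

# Synthetic demo data (not FIA data): a smooth east-west pattern plus noise.
rng = np.random.default_rng(0)
xy = rng.uniform(0.0, 100.0, size=(200, 2))
z = np.sin(xy[:, 0] / 20.0) + 0.3 * rng.normal(size=200)
for h, n, g, c, r in zip(*spatial_continuity(xy, z, lag_width=10.0, n_lags=8)):
    print(f"lag {h:6.1f}  pairs {n:5d}  gamma {g:6.3f}  cov {c:6.3f}  rho {r:6.3f}")
```

Re-running with a different `lag_width` or `n_lags` is a quick way to see how much the picture changes with the choice of lag spacing.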

Illustration:
 

Variogram
Variogram examples
important parts of the variogram/correlogram
different variogram views of the same data

Ordinary Kriging (OK)

Brief description:  OK estimates are essentially weighted moving averages of the sample data values, with weights that take the distance, direction and redundancy of neighboring points into account using the model fitted to the variogram.  It is designed to be the best linear unbiased estimator.
Illustrations:
Ordinary Kriging
example output

Features:   (assumptions, outputs, and characteristics)

  • It honors the overall (global) mean (but not the sample histogram or sample variogram/correlogram)
  • It honors the sample data values
  • The result is distinctly smoothed and pockets of high values (or a single high value in areas of sparse ground points) can have a big effect on the final output
  • It reports an estimation variance, but this is not usually a very useful measure of the true uncertainty of the estimate because it does not reflect the local data values, only the number and proximity of sample points used in the estimate
  • Assumptions: more than IK, fewer than SGCS; i.e. it is thought to perform better the closer the data are to a normal distribution (???)
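As a rough illustration of the points above, the following Python sketch solves the ordinary kriging system for one target location.  It is a minimal example under stated assumptions, not the software used in the workshop: the exponential variogram model, its parameter values, the function names and the four sample points are all made up for the demonstration.

```python
import numpy as np

def exponential_variogram(h, nugget, sill, practical_range):
    """Assumed variogram model; 'sill' is the total sill, gamma(0) = 0."""
    h = np.asarray(h, dtype=float)
    g = nugget + (sill - nugget) * (1.0 - np.exp(-3.0 * h / practical_range))
    return np.where(h > 0, g, 0.0)

def ordinary_kriging(coords, values, target, variogram):
    """Return (estimate, estimation variance, weights) at one target location."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(values)
    pair_dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    # Kriging system in variogram form, with a Lagrange multiplier that
    # forces the weights to sum to 1 (the unbiasedness condition).
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(pair_dist)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = variogram(np.linalg.norm(coords - np.asarray(target, dtype=float), axis=1))
    solution = np.linalg.solve(A, b)
    weights, lagrange = solution[:n], solution[n]
    estimate = weights @ values
    variance = weights @ b[:n] + lagrange   # depends on geometry only, not on values
    return estimate, variance, weights

# Hypothetical sample points and values.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
vals = np.array([10.0, 20.0, 30.0, 40.0])
model = lambda h: exponential_variogram(h, nugget=0.1, sill=1.0, practical_range=3.0)
est, var, w = ordinary_kriging(pts, vals, [0.5, 0.5], model)
print("estimate", est, "variance", var, "weights", w)
```

Because the variance is built only from variogram values and weights, changing the sample values while keeping the same locations leaves it unchanged, which is the limitation noted in the list above.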

Sequential Gaussian Conditional Simulation (SGCS)

    Brief description:  Instead of coming up with a single best estimate, conditional simulation (CS) comes up with many different, equally probable, alternative realizations.  From this set of realizations, an entire distribution function can be built for each cell, representing the range of possible values.

    More detailed description:  SGCS first transforms the data into a normal distribution.   (Thus it uses the model of the variogram/correlogram calculated from the normal-scored data.)  It then selects one grid node at random and kriges the value at that location.  It then draws a random number from a normal (Gaussian) distribution constructed to have a mean equal to the kriged value and a variance equal to the kriging variance.  This randomly drawn value is the simulated value for that grid node.  It then selects another grid node at random and repeats, including all previously simulated nodes in the kriging calculation.  This preserves the spatial variability as modeled in the variogram.  When all nodes have been simulated for an individual realization, it back-transforms the values to the original distribution.  This gives us the first realization.  It then repeats for all the other realizations, using a different random number sequence for each.
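The sequence of steps just described can be sketched for a one-dimensional transect as follows.  This is a simplified illustration, not the workshop's software, and several choices are assumptions made for the example: simple kriging with a zero mean in normal-score space, an exponential covariance with unit sill and no nugget, use of all available nodes rather than a search neighborhood, and a back-transform that interpolates between the sample values without extrapolating the tails.  The function names and the sample data are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def normal_scores(values):
    """Rank-based transform of the data to standard normal scores."""
    values = np.asarray(values, dtype=float)
    ranks = np.argsort(np.argsort(values))
    probs = (ranks + 0.5) / len(values)
    return np.array([NormalDist().inv_cdf(p) for p in probs])

def covariance(h, practical_range):
    """Assumed exponential covariance model, unit sill, no nugget."""
    return np.exp(-3.0 * np.abs(h) / practical_range)

def sgs_realization(x_data, z_data, x_grid, practical_range, rng):
    """One realization along a 1-D transect, conditioned on the data."""
    z_data = np.asarray(z_data, dtype=float)
    scores = normal_scores(z_data)
    known_x = list(np.asarray(x_data, dtype=float))
    known_y = list(scores)
    simulated = np.empty(len(x_grid))
    for idx in rng.permutation(len(x_grid)):           # random path through the nodes
        x0 = x_grid[idx]
        kx, ky = np.array(known_x), np.array(known_y)
        C = covariance(kx[:, None] - kx[None, :], practical_range)
        c0 = covariance(kx - x0, practical_range)
        weights = np.linalg.solve(C, c0)               # simple kriging weights
        mean = weights @ ky
        variance = max(1.0 - weights @ c0, 0.0)
        if variance < 1e-10:                           # node sits on a data point
            simulated[idx] = mean
            continue
        simulated[idx] = rng.normal(mean, np.sqrt(variance))
        known_x.append(x0)                             # condition later nodes on it
        known_y.append(simulated[idx])
    # Back-transform from normal scores to the original data distribution.
    order = np.argsort(scores)
    return np.interp(simulated, scores[order], z_data[order])

# Hypothetical transect: four samples, 21 grid nodes, three realizations.
x_obs = np.array([0.0, 3.0, 7.0, 10.0])
z_obs = np.array([5.0, 0.0, 12.0, 3.0])
grid = np.linspace(0.0, 10.0, 21)
realizations = np.array([
    sgs_realization(x_obs, z_obs, grid, 4.0, np.random.default_rng(seed))
    for seed in range(3)
])
print(realizations.round(1))
```

Each seed gives a different realization, but all of them pass through the sample values at the sample locations.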

    Illustrations:

    SGCS Summary statistics
    Example realizations
    Summary statistics are calculated from these distributions for each cell
    Maine map Uncertainty
    Example maps of a single realization, the chosen estimate and a map of the uncertainties associated with that estimate
    The plus and minus uncertainty can be different, (thus you might want to maintain these separately)
    compare
    An illustration of the differences between the OK output and the SGCS output

     

Features:

    Assumptions:

    • Requires a multi-normal distribution of the data.  Univariate normality is achieved by normal-scoring the data.  Bivariate normality (i.e. normality between pairs of points, as revealed in the variogram) can be checked (see illus), and normality of the higher-order (multi-point) distributions can only be assumed from there
    • Stochastic modeling (such as SGCS) is particularly useful when there is a belief in some ‘space of uncertainty’ and a belief that this technique can produce outcomes that sample this space fairly.  (Similarly, this assumes that each realization contains a realistic level of spatial heterogeneity.)

      Outputs and Characteristics
    • Each realization maintains both the sample histogram and variogram (univariate and bivariate statistics)
    • It honors the sample data values
    • Where there is higher local variation, there will be higher uncertainty in the estimates.
    • Produces a directly understandable measure of uncertainty with the estimate
    • Each realization typically contains a realistic level of spatial heterogeneity
    • The characteristics of the summary dataset can include:
        • Contains closer to the full range of data values than OK
        • Retains some indication of the local variability
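One way the per-cell summary dataset might be derived from a stack of realizations is sketched below.  The choice of the median as the estimate and of the 5th and 95th percentiles as the uncertainty bounds is an assumption made for illustration, as are the function name `summarize_realizations` and the toy numbers; the plus and minus uncertainties are kept separate because they need not be equal.

```python
import numpy as np

def summarize_realizations(realizations, lower_pct=5.0, upper_pct=95.0):
    """Per-cell summary of an array shaped (n_realizations, n_cells)."""
    r = np.asarray(realizations, dtype=float)
    estimate = np.median(r, axis=0)                 # assumed choice of estimate
    lower = np.percentile(r, lower_pct, axis=0)
    upper = np.percentile(r, upper_pct, axis=0)
    return {
        "estimate": estimate,
        "minus": estimate - lower,                  # downward uncertainty
        "plus": upper - estimate,                   # upward uncertainty
    }

# Toy stack: 5 realizations of 2 cells; the first cell is strongly skewed.
stack = np.array([[0.0, 4.0],
                  [0.0, 5.0],
                  [0.0, 6.0],
                  [0.0, 5.0],
                  [10.0, 5.0]])
summary = summarize_realizations(stack)
for name, cell_values in summary.items():
    print(name, cell_values)
```

For the skewed first cell the upward uncertainty is much larger than the downward one, which is why the two might be mapped separately.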

Indicator Kriging (IK)

Brief Description:  IK is essentially Ordinary Kriging, except that instead of calculating weighted means of the %ba/acre values themselves, it divides the data into only 2 classes (above and below a designated cutoff value of interest) and calculates the probability that that condition occurs.  With several cutoffs, IK is the use of OK with a separate model for each cutoff.

Illustrations:

Indicator Kriging
compare
example output
incorporating soft information

 

Features:  (assumptions, outputs, and characteristics)

  • Provides, for each estimated cell, the probability that it falls below that cutoff value
  • The user can use this probability to decide on which side of the cutoff s/he wants to err
  • Is particularly useful if there is a specific threshold value(s) of interest
  • Makes no assumptions about the data distribution
  • The results of several cutoffs can be combined to create a single map of several %ba/acre classes (like the OK map) – the disadvantage of this is that each cutoff has to be modeled and interpolated separately
  • Can easily incorporate soft/ancillary information into the indicator kriging via prior probabilities
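The indicator coding, and the combination of several cutoffs into class probabilities, can be sketched in Python as follows.  This only illustrates the bookkeeping around IK, not the kriging itself, and rests on assumptions: the function names are hypothetical, the cutoff probabilities in the demo are invented, and the running-maximum correction (which keeps the probabilities within 0 to 1 and non-decreasing as the cutoff rises) is a simplification added here, not something described in the text above.

```python
import numpy as np

def indicator_transform(values, cutoff):
    """Code each sample as 1 if it is at or below the cutoff, else 0."""
    return (np.asarray(values, dtype=float) <= cutoff).astype(float)

def class_probabilities(cutoff_probs):
    """Turn P(value <= cutoff_k) per cell into probabilities per class.

    cutoff_probs has shape (n_cutoffs, n_cells), cutoffs in ascending order.
    Returns shape (n_cutoffs + 1, n_cells); each column sums to 1.
    """
    p = np.clip(np.asarray(cutoff_probs, dtype=float), 0.0, 1.0)
    p = np.maximum.accumulate(p, axis=0)       # force non-decreasing probabilities
    n_cells = p.shape[1]
    padded = np.vstack([np.zeros(n_cells), p, np.ones(n_cells)])
    return np.diff(padded, axis=0)

# Hypothetical sample values coded at a cutoff of 5.
print(indicator_transform([0.0, 5.0, 20.0, 3.0], cutoff=5.0))

# Invented interpolated probabilities for 3 cutoffs at 2 cells.
probs = np.array([[0.2, 0.6],
                  [0.1, 0.7],     # first cell dips below the previous cutoff
                  [1.1, 0.9]])    # first cell exceeds 1
classes = class_probabilities(probs)
print(classes)
print("most likely class per cell:", classes.argmax(axis=0))
```

Interpolating the 0/1 indicator values with OK is what yields a probability at each cell, since a weighted average of indicators is an estimate of the proportion at or below the cutoff.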
