USDA Forest Service
 Northeastern Forest Inventory & Analysis
Contact: Andrew Lister

 

Forest Inventory & Analysis Program
11 Campus Blvd.
Suite 200
Newtown Square, PA 19073-3294

(610)557-4075
(610)557-4250 FAX
(610)557-4132 TTY/TDD


GIS /Spatial Statistics

Geostatistics Workshop

Outline of Steps

Data Prep
  • Select data for analysis and extract data from database
    • Using just forested plots (see discussion)
    • Using 'importance value' (see example .sql files used in NE for getting this from Oracle)
      • Removing those plots with 0 total ba/acre because they represent missing info on species occurrence or dominance...
    • Check variables (did you specify all live trees instead of all trees?  Saplings included? Clearcuts included?...)
  • View data spatially and project coordinates from geographic (lat/long) into a real-world projection (e.g. Albers, State Plane or UTM)
    • Prepare/reformat data (e.g. in Excel for import into ArcView) -- video (xyztoshape), Step-by-step
    • View data spatially (e.g. in ArcView) -- video (tableaddevent), Step-by-step
    • Project coordinates (e.g. using ArcView) -- video (project), Step-by-step
    • Add the newly projected XY coordinates to the attribute table in ArcView -- video (addcoor), Step-by-step
      • See the video (loadext) for installing extensions
    • See the alternatives to ArcView for the projection step (both free and commercial) if you'd rather try something else.
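The projection step above can be illustrated outside ArcView with a minimal sketch of the Albers equal-area conic forward equations. This is not the workshop's procedure: it uses the spherical form of the projection (a real workflow would use an ellipsoidal datum), and the function name, the Earth radius, and the default standard parallels and origin (values commonly used for the conterminous United States) are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean spherical radius; an assumption of this sketch


def albers_forward(lat_deg, lon_deg, lat1=29.5, lat2=45.5, lat0=23.0, lon0=-96.0):
    """Project latitude/longitude (degrees) to Albers x, y in metres (spherical form).

    lat1/lat2 are the standard parallels; lat0/lon0 are the projection origin.
    The defaults are illustrative assumptions, not values from the workshop.
    """
    phi, lam = math.radians(lat_deg), math.radians(lon_deg)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    phi0, lam0 = math.radians(lat0), math.radians(lon0)

    n = (math.sin(phi1) + math.sin(phi2)) / 2.0         # cone constant
    c = math.cos(phi1) ** 2 + 2.0 * n * math.sin(phi1)
    rho = EARTH_RADIUS_M * math.sqrt(c - 2.0 * n * math.sin(phi)) / n
    rho0 = EARTH_RADIUS_M * math.sqrt(c - 2.0 * n * math.sin(phi0)) / n
    theta = n * (lam - lam0)
    return rho * math.sin(theta), rho0 - rho * math.cos(theta)


if __name__ == "__main__":
    # A hypothetical plot location in Pennsylvania.
    x, y = albers_forward(40.0, -77.0)
    print(round(x), round(y))
```

The projected x and y values would then be stored alongside each plot's attributes, which is what the "add newly projected XY coordinates" step does in ArcView.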
Examine data for errors and check data characteristics (a.k.a. EDA--exploratory data analysis)
  • For errors
    • Checking for duplicate locations (e.g. in Excel or with an ArcView script; search for "duplicates" at the ESRI ArcScripts page) and resolving/removing duplicates
    • Visually checking for obvious location errors (e.g. in ArcView or Surfer (classed post map))
    • Remove missing values (e.g. in Excel)
    • and any other expected errors in your region...
  • For characteristics -- looking for trends, interesting patterns and average point spacings -- discussion
    • Do summary stats (e.g. in Surfer or Excel)
    • Check a classified map of the plots (e.g. in ArcView or Surfer)
    • Calculate min, max and average point spacings (variogram report in Surfer)
    • This will also involve looking at the variogram and correlogram for spatial structure
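The checks listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the Surfer, Excel or ArcView procedure: plots are assumed to be (x, y, value) tuples in projected units, "point spacing" is interpreted here as nearest-neighbour distance (whether Surfer's variogram report computes it the same way is not confirmed), and all function names are hypothetical.

```python
import math
from collections import defaultdict


def find_duplicate_locations(points):
    """Return {(x, y): [row indices]} for coordinates shared by more than one plot."""
    rows_at = defaultdict(list)
    for i, (x, y, _value) in enumerate(points):
        rows_at[(x, y)].append(i)
    return {xy: rows for xy, rows in rows_at.items() if len(rows) > 1}


def nearest_neighbor_spacing(points):
    """Return (min, max, mean) of each plot's distance to its nearest neighbour."""
    nearest = []
    for i, (xi, yi, _vi) in enumerate(points):
        nearest.append(min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj, _vj) in enumerate(points) if j != i
        ))
    return min(nearest), max(nearest), sum(nearest) / len(nearest)


def empirical_semivariogram(points, lag_width, n_lags):
    """Return [(mean distance, semivariance, pair count)] for each non-empty lag bin."""
    dist_sum = [0.0] * n_lags
    sq_sum = [0.0] * n_lags
    count = [0] * n_lags
    for i in range(len(points)):
        xi, yi, vi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, vj = points[j]
            h = math.hypot(xi - xj, yi - yj)
            k = int(h // lag_width)          # index of the lag bin for this pair
            if k < n_lags:
                dist_sum[k] += h
                sq_sum[k] += (vi - vj) ** 2
                count[k] += 1
    # Semivariance is half the mean squared difference within each bin.
    return [
        (dist_sum[k] / count[k], sq_sum[k] / (2.0 * count[k]), count[k])
        for k in range(n_lags) if count[k] > 0
    ]
```

Both loops are quadratic in the number of plots, which is acceptable for a quick look at a few thousand points; duplicates should be resolved before computing spacings, since a duplicated location yields a nearest-neighbour distance of zero.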
Data Prep--part 2
  • Create the dataset for modeling -- this will include the plots contained within the area of interest
    • This can involve dividing the state or multi-state area into smaller zones.  You can do this by using existing areas like ecoregions (Rachel's approach) or hand-drawn zones (Andy's approach, video) -- discussion.
    • Step-by-step
  • Create the buffered dataset for the interpolation -- this will include the same plots as above plus all plots in an area bordering the region of interest, usually up to a distance equivalent to the size of the search radius used.  This provides the algorithm with all the neighboring plot data available when it is estimating those cells/locations on the edge of the area of interest, thereby reducing an edge effect.
  • Prepare data for normal-scoring -- remove duplicate values (discussion)
    • Eliminate duplicates using a duplicate elimination procedure -- either rankdupe or a procedure in Excel -- see step-by-step procedure for Excel
  • Normal-scoring the data -- step-by-step
  • Note:  normal-scoring is not necessary if you are going to run OK on the untransformed data or krig in Surfer (though you might consider using some transformation to normalize the data -- see discussion/links below/ai-geostats homepage), or if you are going to run IK or SICS.  In the first case (OK), you would calculate and model the correlogram of the original data; in the last two cases (IK and SICS), you would calculate and model the indicator variogram/correlogram from the original data.
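To make the normal-scoring step concrete, here is a minimal sketch of a rank-based normal-score transform. It is not the rankdupe or Excel procedure referred to above: ties are broken by random ordering (one common way to "despike" duplicate values), the plotting position (rank + 0.5) / n is an assumption, and the function name is hypothetical.

```python
import random
from statistics import NormalDist


def normal_scores(values, seed=0):
    """Map each value to a standard-normal score according to its rank.

    Equal values are ordered by a random key so that every observation
    receives a distinct score; the seed makes the ordering repeatable.
    """
    rng = random.Random(seed)
    n = len(values)
    order = sorted(range(n), key=lambda i: (values[i], rng.random()))

    std_normal = NormalDist()                # mean 0, standard deviation 1
    scores = [0.0] * n
    for rank, i in enumerate(order):
        p = (rank + 0.5) / n                 # cumulative probability strictly inside (0, 1)
        scores[i] = std_normal.inv_cdf(p)
    return scores


if __name__ == "__main__":
    basal_area = [12.0, 0.5, 7.3, 7.3, 30.1]   # hypothetical values with one tie
    print(normal_scores(basal_area))
```

The two tied values of 7.3 receive adjacent but different scores, which is why duplicates have to be dealt with before the transform.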
Modeling the variograms/correlograms
Sequential Gaussian Conditional Simulation, definition/description
  • Choice of simulation parameters -- guidelines, parameter entry example parameter file
    • Make sure to use original data (not the nscored data) as the input variable
  • Running the simulation  -- software sgsim902 -- see step-by-step instructions on running GSLIB routines
  • Calculating summary stats from the 100 realizations -- software postme99, Step-by-step
  • Using a nonforest mask to remove those output points that have been modeled in nonforest areas, Step-by-step
  • Choosing the percentile to use as the estimate -- guidelines, discussion, illus
    • Based on which percentile falls closest to the county-level stats generated from FIA plots, Step-by-step
  • Choosing the range to use as the uncertainty -- use a measure of uncertainty appropriate for your goals (e.g., CV tells you something different than variance or IQR) -- guidelines, discussion
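The post-processing steps above (summary statistics across realizations, then a choice of percentile and uncertainty measure) can be sketched as follows. This is an illustration only and not what postme99 does internally: realizations are assumed to be equal-length lists of cell values, the percentile is chosen by comparing a map-wide mean with a single reference mean (the workshop compares against county-level statistics), and all names are hypothetical.

```python
from statistics import mean, pstdev


def percentile(sorted_values, q):
    """Linear-interpolated percentile (q in 0..100) of an already sorted list."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def summarize_cells(realizations, percentiles=(25, 50, 75)):
    """Per-cell summary across realizations; realizations[r][c] is cell c in run r."""
    summaries = []
    for cell_values in zip(*realizations):
        ordered = sorted(cell_values)
        m = mean(ordered)
        summaries.append({
            "mean": m,
            "percentiles": {q: percentile(ordered, q) for q in percentiles},
            "iqr": percentile(ordered, 75) - percentile(ordered, 25),
            "cv": pstdev(ordered) / m if m else float("nan"),
        })
    return summaries


def closest_percentile(summaries, reference_mean):
    """Pick the percentile whose map-wide mean is closest to a reference mean."""
    candidates = summaries[0]["percentiles"].keys()
    return min(
        candidates,
        key=lambda q: abs(mean(s["percentiles"][q] for s in summaries) - reference_mean),
    )
```

The IQR and CV are kept side by side because, as noted above, they answer different questions: the IQR is in the units of the variable, while the CV is relative to the local mean.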
Accuracy Assessment
  • Plot-pixel analysis and whether plot values fall within the measure of uncertainty presented -- illus graphic
  • Reproduction of the univariate statistics, and the variogram/correlogram --checking for algorithm performance/bias -- illus-histo, illus-vario, step-by-step, video (mse.avi)
  • Check and report local statistics of the output map vs. the local stats of the original dataset (e.g. local averages, local stdevs) -- illus, video (mse, allstatsprep, zonedefinitionarcview, avsumstats, postmegridpoint, etc. .avi)
  • Check and report county statistics derived from the output map vs. county statistics derived from the original data -- illus
  • Can also do plot-pixel analysis and examine how close the estimates come to the sample data (e.g. R2), but this has substantial limitations...discussion
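Two of the plot-pixel checks above reduce to short calculations, sketched here under simple assumptions: each plot has already been paired with the lower and upper bounds of the uncertainty interval at its pixel, and with the pixel's estimate. The function names are hypothetical, and R² is computed as 1 - SSE/SST, which is one of several definitions in use.

```python
def interval_coverage(plot_values, lower, upper):
    """Fraction of plots whose observed value lies within [lower, upper] at its pixel."""
    inside = sum(1 for v, lo, hi in zip(plot_values, lower, upper) if lo <= v <= hi)
    return inside / len(plot_values)


def r_squared(observed, estimated):
    """Coefficient of determination of estimates against observations (1 - SSE/SST)."""
    obs_mean = sum(observed) / len(observed)
    sse = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    sst = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - sse / sst


if __name__ == "__main__":
    plots = [10.0, 20.0, 30.0, 40.0]               # hypothetical plot values
    print(interval_coverage(plots, [5, 25, 25, 30], [15, 35, 35, 50]))
    print(r_squared(plots, [12.0, 18.0, 33.0, 37.0]))
```

Because the simulation is conditioned on the plot data, a comparison against the same plots will tend to look optimistic, which is one of the limitations of plot-pixel analysis mentioned above.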
Presentation (map creation) and Distribution (putting maps and data up on the web)
  • How to make a finished map
    • Step-by-step for using ArcView (we'll do this for both a point-symbol version and a raster version...) -- videos
    • Joining the separately interpolated regions together again (using the arc grid command "mosaic")
  • How to put data up on the web using Imapper software (see software section)
Documentation of the datasets
  • Recording the data used (pretty specifically, e.g. the details in the .sql batch file), and the routine used
  • Recording the models used, and algorithm parameters used (search radius, etc.)
  • Recording any zones used
  • Recording the method/decision rule used for choice of estimate and uncertainty
  • Possible discussion of the opportunities and limitations of this dataset -- see/refer to some of the GAP notes on this as they did a rather nice, thorough job of this...
An illustration of some potential applications of the datasets
  • Species risk maps -- illus
  • Use of the output from several species distributions together, as well as several percentiles in a single map to get an idea of where to look for oaks in Pennsylvania...  -- illus
  • Labeling for classes of stand occupancy and acceptable and unacceptable uncertainty for the project at hand...  -- illus
