GIS/Spatial Statistics
Geostatistics Workshop
Outline of Steps
Data Prep
- Select data for analysis and extract data from database
- Using just forested plots (see discussion)
- Using 'importance value' (see example .sql files used in NE
for getting this from Oracle)
- Removing those plots with 0 total ba/acre because they
represent missing info on species occurrence or dominance...
- Check variables (did you specify all live trees instead of
all trees? Saplings included? Clearcuts included?...)
- View data spatially and project coordinates from geographic
(lat/long) into a real-world projection (e.g. Albers, State Plane
or UTM)
- Prepare/reformat data (e.g. in Excel for import into ArcView)
-- video (xyztoshape), Step-by-step
- View data spatially (e.g. in ArcView) -- video
(tableaddevent), Step-by-step
- Project coordinates (e.g. using ArcView) -- video
(project), Step-by-step
- Add the newly projected XY coordinates to the attribute table
in ArcView -- video (addcoor), Step-by-step
- See the video (loadext) for installing extensions
- See alternatives to ArcView for the projection step (both free
and commercial) if you'd rather try something else.
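To illustrate what the projection step does to the coordinates, here is a minimal sketch of a forward Albers equal-area conic projection. It is not the ArcView procedure from the videos: it assumes a spherical Earth (GIS packages normally use an ellipsoid and a datum), and the function name and default parameters (standard parallels 29.5 and 45.5, central meridian -96, origin latitude 23, values commonly used for the conterminous US) are assumptions chosen for illustration only.

```python
import math

# Assumption: spherical Earth with mean radius in meters. Real GIS software
# uses an ellipsoid and datum, so results will differ somewhat from ArcView.
EARTH_RADIUS_M = 6_371_000.0


def albers_forward(lon_deg, lat_deg, lon0_deg=-96.0, lat0_deg=23.0,
                   lat1_deg=29.5, lat2_deg=45.5, radius=EARTH_RADIUS_M):
    """Project lon/lat (degrees) to Albers x/y (meters), spherical form."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    lon0, lat0 = math.radians(lon0_deg), math.radians(lat0_deg)
    lat1, lat2 = math.radians(lat1_deg), math.radians(lat2_deg)

    n = (math.sin(lat1) + math.sin(lat2)) / 2.0          # cone constant
    c = math.cos(lat1) ** 2 + 2.0 * n * math.sin(lat1)
    rho = radius * math.sqrt(c - 2.0 * n * math.sin(lat)) / n
    rho0 = radius * math.sqrt(c - 2.0 * n * math.sin(lat0)) / n
    theta = n * (lon - lon0)

    x = rho * math.sin(theta)          # easting relative to central meridian
    y = rho0 - rho * math.cos(theta)   # northing relative to origin latitude
    return x, y
```

Once plots are in projected meters, distances between them (point spacings, search radii, variogram lags) can be computed with ordinary planar geometry, which is the reason for projecting before the geostatistical steps.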
Examine data for errors and check data characteristics (a.k.a. EDA --
exploratory data analysis)
- For errors
- Checking for duplicate locations (e.g. in Excel or with
an ArcView script; search for "duplicates" at the ESRI
ArcScripts page) and resolving/removing duplicates
- Visually checking for obvious location errors (e.g. in
ArcView or Surfer (classed post map))
- Remove missing values (e.g. in Excel)
- and any other expected errors in your region...
- For characteristics -- looking for trends, interesting patterns,
and average point spacings -- discussion
- Do summary stats (e.g. in Surfer or Excel)
- Check a classified map of the plots (e.g. in ArcView or
Surfer)
- Calculate min, max and average point spacings (variogram
report in Surfer)
- This will also involve looking at the variogram and correlogram
for spatial structure
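As a rough alternative to the Excel, ArcView and Surfer checks listed above, the duplicate-location check and the point-spacing summary can be sketched in a few lines of Python. The function names and the (plot_id, x, y) input layout are hypothetical, coordinates are assumed to be already projected, and "spacing" here means each plot's distance to its nearest neighbor, which may not match exactly how the Surfer variogram report defines it.

```python
import math
from collections import defaultdict


def find_duplicate_locations(points):
    """Return {(x, y): [plot_ids]} for every location shared by 2+ plots.

    points: list of (plot_id, x, y) tuples in projected coordinates.
    """
    by_xy = defaultdict(list)
    for plot_id, x, y in points:
        by_xy[(x, y)].append(plot_id)
    return {xy: ids for xy, ids in by_xy.items() if len(ids) > 1}


def nearest_neighbor_spacing(points):
    """Return (min, max, mean) of each plot's distance to its nearest neighbor.

    Brute force O(n^2); needs at least two points. A minimum of 0 means
    duplicate locations are still present in the data.
    """
    nearest = []
    for i, (_, xi, yi) in enumerate(points):
        best = min(math.hypot(xi - xj, yi - yj)
                   for j, (_, xj, yj) in enumerate(points) if j != i)
        nearest.append(best)
    return min(nearest), max(nearest), sum(nearest) / len(nearest)
```

Running the spacing summary again after duplicates are resolved gives a quick confirmation that no zero-distance pairs remain.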
Data Prep -- part 2
- Create the dataset for modeling -- this will include the plots
contained within the area of interest
- This can involve dividing the state or multi-state area into
smaller zones. You can do this by using existing areas
like ecoregions (Rachel's approach) or hand-drawn zones (Andy's
approach, video) -- discussion, Step-by-step
- Create the buffered dataset for the interpolation -- this will
include the same plots as above plus all plots in an area bordering
the region of interest, usually up to a distance equivalent to
the size of the search radius used. This provides the algorithm
with all the neighboring plot data available when it is estimating
those cells/locations on the edge of the area of interest, thereby
reducing an edge effect.
- Prepare data for normal-scoring -- remove duplicate values
(discussion)
Duplicate values need to be eliminated with a duplicate-elimination
procedure -- either rankdupe or a procedure in Excel (see the
step-by-step procedure for Excel)
- Normal-scoring the data -- step-by-step
Note: normal-scoring is not necessary if you are going to run
ordinary kriging (OK) on just the regular data or krig in Surfer
(though you might consider using some transformation to normalize
the data -- see discussion/links below/ai-geostats homepage for
discussion), or if you are going to run IK or SICS. In the first
case (OK), you would calculate and model the correlogram of the
original data, and in the last two cases (IK and SICS), you would
calculate and model the indicator variogram/correlogram from the
original data.
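For readers who want to see what normal-scoring does, the following is a minimal sketch of a rank-based normal-score transform. It is not the workshop's step-by-step procedure: the function name is hypothetical, the data are assumed to be equally weighted, and the plotting position (rank + 0.5) / n is one common convention, assumed here for illustration. It refuses tied values, which is why duplicate values are dealt with first.

```python
from statistics import NormalDist


def normal_scores(values):
    """Map each value to the standard-normal quantile of its rank.

    Assumes equal weights and no tied values; ties must be resolved
    beforehand (e.g. by a duplicate-elimination step).
    """
    n = len(values)
    if len(set(values)) != n:
        raise ValueError("tied values present; resolve duplicates first")

    std_normal = NormalDist()  # mean 0, standard deviation 1
    order = sorted(range(n), key=lambda i: values[i])  # indices, low to high

    scores = [0.0] * n
    for rank, idx in enumerate(order):
        p = (rank + 0.5) / n           # assumed plotting position in (0, 1)
        scores[idx] = std_normal.inv_cdf(p)
    return scores
```

The transform preserves the ordering of the data and returns scores in the same order as the input, so they can be written back alongside the original values.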
Modeling the variograms/correlograms
Sequential Gaussian Conditional Simulation, definition/description
- Choice of simulation parameters -- guidelines, parameter
entry, example parameter file
- Make sure to use original data (not the nscored data) as
the input variable
- Running the simulation -- software sgsim902 -- see step-by-step
instructions on running GSLIB routines
- Calculating summary stats from the 100 realizations -- software
postme99, Step-by-step
- Using a nonforest mask to remove those output points that have
been modeled in nonforest areas, Step-by-step
- Choosing the percentile to use as the estimate -- guidelines,
discussion, illus
- Based on which percentile falls closest to the county-level
stats generated from FIA plots, Step-by-step
- Choosing the range to use as the uncertainty -- you want a
measure of uncertainty that is appropriate for your goals (e.g. CV
tells you something different than variance or IQR) -- guidelines,
discussion
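The summary and percentile-selection steps above can be sketched as follows. This is not the postme99 routine; it is a small illustration, with hypothetical function names, of summarizing the simulated values at one location across all realizations and of picking the percentile whose county-level map statistic falls closest to the FIA plot-based statistic. The inclusive quantile method is an assumption, and other percentile conventions will give slightly different numbers.

```python
from statistics import mean, pstdev, quantiles


def summarize_location(realized_values):
    """Summarize one location's values across realizations (e.g. 100 of them)."""
    # 99 cut points; cuts[k - 1] is the k-th percentile.
    cuts = quantiles(realized_values, n=100, method="inclusive")
    p25, p50, p75 = cuts[24], cuts[49], cuts[74]
    m = mean(realized_values)
    sd = pstdev(realized_values)
    return {
        "p25": p25,
        "median": p50,
        "p75": p75,
        "iqr": p75 - p25,               # spread of the middle half
        "variance": sd ** 2,            # absolute spread
        "cv": sd / m if m else None,    # spread relative to the mean
    }


def closest_percentile(map_stat_by_percentile, fia_stat):
    """Pick the percentile whose county-level map statistic is nearest to
    the county-level statistic from the FIA plots.

    map_stat_by_percentile: {percentile: county statistic from that map}
    """
    return min(map_stat_by_percentile,
               key=lambda p: abs(map_stat_by_percentile[p] - fia_stat))
```

Returning IQR, variance and CV side by side makes it easier to compare how each measure would read for a given project before settling on one.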
Accuracy Assessment
- Plot-pixel analysis and whether plot values fall within the
measure of uncertainty presented -- illus graphic
- Reproduction of the univariate statistics and the variogram/correlogram
-- checking for algorithm performance/bias -- illus-histo, illus-vario,
step-by-step, video (mse.avi)
- Check and report local statistics of the output map vs. the
local stats of the original dataset (e.g. local averages, local
stdevs) -- illus, video (mse,
allstatsprep, zonedefinitionarcview, avsumstats, postmegridpoint,
etc. .avi)
- Check and report county statistics derived from the output
map vs. county statistics derived from the original data -- illus
- Can also do plot-pixel analysis and examine how close the estimates
come to the sample data (e.g. R²), but this has substantial
limitations -- discussion
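Two of the plot-pixel checks above can be sketched in Python as shown below. The function names are hypothetical, and R² is computed here as the coefficient of determination (1 minus residual over total sum of squares), which is an assumption; a squared correlation coefficient is another common reading and gives different values when the estimates are biased.

```python
def interval_coverage(observed, lower, upper):
    """Fraction of plot values that fall inside the uncertainty interval
    reported for the pixel each plot sits in."""
    inside = sum(1 for obs, lo, hi in zip(observed, lower, upper)
                 if lo <= obs <= hi)
    return inside / len(observed)


def r_squared(observed, estimated):
    """Coefficient of determination of pixel estimates against plot values.

    Requires the observed values to vary (nonzero total sum of squares).
    """
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((obs - est) ** 2 for obs, est in zip(observed, estimated))
    ss_tot = sum((obs - mean_obs) ** 2 for obs in observed)
    return 1.0 - ss_res / ss_tot
```

If the interval is, say, the 25th to 75th percentile of the realizations, coverage near one half would be consistent with the stated uncertainty, while much lower or higher coverage suggests the interval is too narrow or too wide.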
Presentation (map creation) and Distribution (putting maps and data up on the web)
- How to make a finished map
- Step-by-step for using ArcView (we'll do this for both a
point-symbol version and a raster version...) -- videos
- Joining the separately interpolated regions together again
(using the arc grid command "mosaic")
- How to put data up on the web using Imapper software (see software
section)
Documentation of the datasets
- Recording the data used (pretty specifically, e.g. the details
in the .sql batch file), and the routine used
- Recording the models used, and algorithm parameters used (search
radius, etc.)
- Recording any zones used
- Recording the method/decision rule used for choice of estimate
and uncertainty
- Possible discussion of the opportunities and limitations of
this dataset -- see/refer to some of the GAP notes on this as
they did a rather nice, thorough job of this...
An illustration of some potential applications of the datasets
- Species risk maps -- illus
- Use of the output from several species distributions together,
as well as several percentiles in a single map to get an idea
of where to look for oaks in Pennsylvania... -- illus
- Labeling for classes of stand occupancy and acceptable and
unacceptable uncertainty for the project at hand... -- illus