 |
Geostatistics Workshop
Step-by-Step
Preparing/reformatting
data in Excel for ArcView
Now, with your lat, long, id, attribute1, attribute2,
etc... file, prepare the data in a spreadsheet like Excel for conversion
to a shapefile. This is not a vital step, but is recommended
because it allows you to visualize the data, check for obvious location
errors, look for basic spatial patterns or trends in the data, etc.
You should end up with a comma-delimited table with one header row,
and decimal degrees X and Y columns. Get the video
(xyztoshape.avi).
-- back to Outline--.
Importing
and viewing data in ArcView
In ArcView, you want to add the table, and then choose
"view--add event theme", choosing the decimal degrees longitude
as the X and decimal degrees latitude as the Y. you will then
be able to see your plots in geographic coordinates. You will
need to project them into a projection such as Albers, State Plane,
or UTM, however, in order to have the distances between the points
be measured accurately across the dataset. In other words,
you cannot accurately do modeling with x,y data in decimal degrees!
Get the video (tableaddevent.avi).
-- back to Outline--
Projecting
coordinates using ArcView
You will need to check to see if you have "projector",
or if you have av 3.2, the projection wizard, available. If
not, you will need to download it here, and place all of the files
in the "ext32" directory, wherever you have arcview installed.
If you are using unix arcview, you should automatically have projector
as an option. If not, you need to have someone copy the projector
extension to the $AVHOME/ext directory, wherever arcview is installed
on your system. **The same goes for the other extensions referenced
in this document (xtools, dnr tools, etc.); all extensions can be
handled the same way! Get the video
(project.avi).
-- back to Outline--
Adding
newly projected XY coordinates to attribute table in ArcView
Now that your data are projected, you need to use a script,
or a program written in arcview's "avenue" language, to add the
x, y coordinates to the attribute table you imported. You
have a shapefile now, but it does not explicitly have the x,y coordinates
in albers projection in tabular form. You want to load, compile,
and execute the addcoor.ave script, available from the scripts
page. Then, you will export this text file so that you
can bring it into Surfer, and then begin modelling. Get the
video (addcoor.avi).
-- back to Outline--
Create
the dataset for modeling
- Define zones using the drawing tool or an existing ecoregion
(or other) coverage. If using user-defined areas that you
just drew, convert the drawings to shapefiles using xtools "convert
graphics to shape".
- With the point theme active do "theme-select by theme", and
select the features of the active theme that 'have their center
in' the selected features of the zone/ecoregion theme (the polygon
coverage or shapefile) of interest.
- Export the point attribute table (with the records still selected
from the above operation) as 'delimited text (only the selected
records will be exported) and import into Excel. Get the
video (zonedefinitionarcview.zip).
-- back to Outline--
Create
the dataset for interpolation
- create another polygon coverage or shapefile in which the polygon
of interest (i.e. the ecoregion or user-defined area) is larger
by a certain buffer zone -- ideally equivalent to the search radius
that will be used in the interpolation.
- and with that selected, do the same steps as above to create
this second dataset. Get the video
(zonedefinitionarcview.zip).
-- back to Outline--
Eliminating
duplicate values (prior to nscoring) and NSCORE
- Organize rows of interest in Excel. Sort by row of interest
(species). Remove missing values (NOT necessarily 0’s; if
0 is a measurement, consider leaving it. One needs to determine
the specific product one wishes to produce (e.g., the importance
of red maple on forest land, or the importance of red maple on
all land. If the former, include zero observations for red
maple on forested plots, but don’t include zeros which are nonforest
plots; if the latter, don’t omit any zero’s.))
- Make a column of tiny, incrementally increasing numbers, then
a column of random numbers. Sort "tiny" and "rand" by "rand".
- Then make a new column, species_dupe, and use the formula "if
species = cell above it, then species + tiny; otherwise, just
species". This will add a tiny amount to each duplicate
value, randomly. ***Remember to copy and paste special the
values of species_dupe when done
- Delete away "tiny" and "rand", put into simplified geo eas
format, and save as MS-Dos text in Wordpad or Word.
- Simplified geo eas format is this:
title
#vars
varname1
varname2
varname i
data1
data2 .... datai
data1
data2 ... datai
etc,
e.g. data from zone3 red maple
4
x
y
basal_area
volume
43000003
35999998 120 4000
43883999
35999967 80 2399
etc.
- Edit nscore.par (see example parameter file included with program
in the software seciton), or see the gslib
help page. ****NOTE It's best to run the gslib
programs from the dos prompt, i.e., open the msdos prompt, and
then navigate to where you've got the nscore.exe stored, and then
type "nscore", and follow prompts.
Bringing
the data (output from nscore) into Surfer, and viewing and modeling
the variogram/correlogram
- Bring the *.out file from nscore into Surfer’s spreadsheet
(treat consecutive delimiters as one, tab and space delimited
checked)
- Remove header, and save (as original file name from arcview
export (basically, it keeps #’s of extra files you generate to
a minimum; you won’t need the old file anymore anyway.)
- After importing the data into Surfer and checking to make sure
everything's ok (view the worksheet), you can create a variogram
by doing grid -- variogram -- new variogram. Be sure to
manipulate the maximul lag distance in the new variogram dialogue
under options (the default is often rather short). Remember,
you want to have a large enough distance to describe the entire
form of the variogram, but have an adequate number of pairs of
points (at least 30, but preferably hundreds) at the long distances
to have faith in your semivariance value. Choose the 'autocorrelation'
option (this will display the correlogram--discussion). Get the
video (variomodelx.avi).
- Do variogram modeling (see guidelines) using the nscored data.
- Record variogram parameters (using snagit for this is handy)
and paste into excel; I like to paste the variogram graph itself,
the experimental tab, and the model tab, so later I can exactly
reproduce a variogram that I made with minimum effort.
-- back to Outline--
Variogram
Modeling Steps/Tips/Guidelines-- (see the videos
for this)
viewing and modeling the omnidirectional correlogram
('autocorrelation estimator type' in Surfer)
For all steps here, see the Surfer Help. It’s VERY
useful. 1. Do Grid-Variogram-New Variogram in Surfer; choose correct
columns, update statistics, and check on range of x and y in statistics;
now, under options, choose a reasonable range for the maximum lag
distance
Tip: The default is a little short. What you want to do is choose
a maximum lag distance that will allow you to describe the form
of the variogram AND have an adequate number of pairs of points
in each lag class. You can choose a distance as large as the
maximum x and y ranges as your data, but you probably won’t have
many pairs of points that are separated by this distance.
It’s best to choose a large range, nearly as large as the maximum
range. You can alter this later. 2. Once the variogram has
been calculated, choose autocorrelation estimator*, then choose
6000 meter lags for FIA data, and set the vertical scale to be something
well over the actual sill (like 1.1 or 1.2—this makes it easier
to compare the different directions). ***Remember to hit “apply”
after making all changes to see the results of the change***
Your goal should be to get a variogram point as close to the y axis
as possible, but have enough pairs of points within that lag bin
to have some confidence in it (at least 30, but more is advisable!).
*Isaaks has suggested that the use of the autocorrelation (correlogram)
measure avoids problems with the presence of a trend (nonstationary
random function). (see http://www.isaaks.com). Note:
you can overlap lag classes. For example, consider the case
where the Max Lag Distance is 100, the Number of Lags is five, and
the Lag Width is 40. The resulting lag intervals would be
0 to 30, 10 to 50, 30 to 70, 50 to 90, and 70 to 100. Notice
these intervals overlap. Overlapping lag intervals causes
a moving average type smoothing of the experimental variogram.
Often, it is easier to select an appropriate model for such smoothed
experimental variograms. Note: the linear model is there by
default. You should delete this right away (under the model
tab)—it’s upsetting to see it there! You can add and delete
models and edit parameters on the experimental tab. I’d recommend
playing with this for awhile. Remember: The scale is
the sill –(minus) the nugget, and the length is the range. Note:
when using an EXPONENTIAL model remember that in Surfer the convention
is to use 1/3 of the GSLIB range as the 'length', while the range
in GSLIB refers to the real range in meters as you read it off the
correlogram--e.g. if it's the last component, that distance where
you think the variogram actually reaches the sill. If you
use the GAUSSIAN model (which is a little tricky and we don't really
recommend--we usually stick with combos of the spherical and exponential),
the range is also referred to differently so you should check the
Surfer help for the proper format. Tip: We chose these options
after a lot of trial and error. 3. Decide on what combo
of models is best by getting the omnidirectional variogram (tolerance
is 90 degrees). This is the variogram that fits all the directional
variograms the best. (see Surfer help for a discussion of this).
-- back to Outline--
Modeling
anisotropy (see video (autofit.avi)
& video (anisotropy.avi)
r.e. this):
Get a good look at the model. Look at the number
of pairs, the presence of multiple structures, and the variogram
form. Now, cycle through the various directions: set
the tolerance to 30, and step through in multiples of 30 degrees.
Look for whether or not the range changes; look for how well the
omnidirectional model fits all directions.
Tip: The point is to choose a combination of models that fits
all directions. However, you want to very carefully model
the direction of maximum continuity (longest range), and the direction
90 degrees from that (usually the minimum range)---see the example
from the GSLIB book. One way to do this is to choose a combo
of models and a nugget for your omnidirectional model, and then
to model the variogram using this combo of models and nugget in
the maximum and minimum directions of continuity. You should choose
a variogram using your knowledge and intuition about the phenomenon
under question. You might believe that two structures is better
than one because over short distances, you’re looking at spatial
autocorrelation at the small watershed level, whereas at great distances,
you’re looking at autocorrelation at the regional level, and the
autocorrelation functions differ. I prefer a blend of both
philosophies: I’ll look at the variogram and ask myself if
it appears to be reasonable. Before I add another structure,
I’ll ask if I can imagine a biological reason for the blip I’m noticing
in my variogram. Most practitioners recommend sticking to one or
two models, and we tend to do this as well. Note: The Surfer
autofit method is ok, and is a black box, but you won't always agree
with the results. Remember: a statistically best-fitting
model (which is what an autofitting program does) is not necessarily
a best-fitting model in practice.
-- back to Outline--
Calculating
summary stats from the 100 realizations using postme99
- Edit in wordpad postme99.f by changing the numbers in this
line (near the top):
real
z(158,261,100),cut(100)
where the 100 is the number of simulations, the 158 is the
nx and the 261 is the ny (columns and rows).
- Run g77 (after downloading from the software section and installing
; read directions carefully! If you get stuck, call Andy)
by typing g77 postme99.f, then rename the output (a.exe) to pmx.exe,
x being the descriptor for the output file. This makes the
postme executable to use to post process your data. **Note, you
can also use the postsim.exe program available from the gslib
download page. However, you can only choose one percentile
at a time. See the gslib help r.e. this.
- When the simulation finishes: Run pmx.exe on the simulation
output file, specifying e, m, v, cv, iqr, 10, 20, etc. (the number
of the percentile) as the output files. E is the mean (who
knows why it's called "e-type mean") value, m is the median, v
is the variance, cv is coefficient of variation, iqr is interquartile
range (the range of the middle 50% of the data), and the numbers
are the percentiles (value below which this % of the data values
fall). This is very important for future processing; these
files must be named this way or you will not be able to run the
aml. Each file you create is a value for a given percentile
of the distribution of values generated at each location.
In other words, you will have 100 estimates of basal area at a
given pixel. These 100 values have a distribution, and postme
figures out for each pixel what is the “x” percentile (what is
the value below which x % of the other values fall) of the distribution.
***Note: If you don’t have time to run all of the simulations,
you can just stop it, then run postme, specifying the number of
realizations you intend to run. It will quit after it reaches
the end of the file, then you can run it with just that # of realizations.
-- back to Outline--
Using
a Mask
Using a nonforest mask to remove those output points that have been
modeled in nonforest areas This is more complicated, but doable.
You can get an AVHRR (1 km pixel resolution) forest/nonforest map
at http://www.epa.gov/docs/grd/forest_inventory/
. It will be in arcinfo grid format in Albers projection, and
you basically need to clip out your area of interest (in arc-info,
do "grid--clip" (see help) , reclassify each pixel of this clipped
map as forest (0) or nonforest (null) (arc-info grid command "remap"),
and then add your clipped AVHRR map to the finished map from the simulation.
This will blot out areas of the map that are nonforest. You
can also use a GAP or MRLC
product to do the same (30 m. pixels). Or, you can just present
your map as "potential distribution" or something. See our section
on documenting the dataset.
-- back to Outline--
Final
Map Production:
How to Make a Finished Map: There are 2 ways to make your final
map: In Surfer and in arcview.
**Note: if you separate your map into regions (e.g., ecoregions),
you will want to mosaic them together before making your final “giant”
map using the grid command “mosaic” (see arcinfo help); then, decompose
the grid into a Surfer file (if using the Surfer method) or leave
it as is, but clip it using the gridclip command (see arcinfo help)
(if using the arcview method). You might also want to use
a forest/nonforest mask, as well (above).
Surfer Method: (good to know, but more involved; see the
video (finalmaplong)
and video (finalmapshort.avi))
1. Convert, in one way or another, your boundary file to a Surfer
blanking file --- .bln file format (see below for description
of format); you do this to blot out the areas outside of your boundaries
that you interpolated. You can do this very easily by running
the ungen.ave script in arcview; this will take a shapefile (or
maybe an arc coverage) and turn it into a .csv textfile of vertices.
You will need to edit this file (in Surfer worksheet) so that it
adheres to the Surfer bln file format (i.e., you will need to add
a header (see below)). **Note: specify 0 as the flag if you
want to blank out the region outside your boundary! Next, you will
need to blank the area outside of this boundary, using the surfer
grid—blank command. You will make a new output grid file.
2. Now, draw the grid "image", you’ve created in 1 (from using either
postmesurfer or gridsurfer amls) and you will see your grid constrained
to the boundary (i.e. the edges outside the boundary clipped out).
You can add the boundary file as a base map, and then “overlay maps”
to overlay them, if you like. 3. You can change your
color of your output map by double clicking the map and fooling
around with the settings. See the Surfer help—it’s pretty
clear.
Excperpt from surfer help:
Golden Software Blanking File Description
This is an ASCII format file used to store geographic information
including areas, curves, and points. Even though the primary
use of GS Blanking files is to indicate regions to be "blanked-out",
they can also be used for simple boundaries and decorative illustrations.
The general format of the file is: length,flag "Pname 1"
x1,y1
x2,y2
...
xn,yn
length,flag "Pname 2"
x1,y1
x2,y2
... xn,yn
The length value is an integer which indicates the number of X,Y
coordinate pairs that follow.
The flag value is 1 if the region inside areas is to be blanked
and 0 if the region outside areas is to be blanked.
Pname is optional and is the name of a primary ID to be associated
with the object. The primary ID is used to link the object
to external data.
Following lines contain the actual X,Y coordinate pairs that make
up the object. These can be integers or real numbers, and
are stored 1 pair per line. The type of object is determined as
follows: * If the type/length field is 1, the object is considered
a point. One coordinate pair follows.
* If the type/length field is greater than 1 and the first
and last coordinate pairs are equal, the object is considered a
simple closed area. Otherwise, the object is considered a
curve.
Arcview
Method (recommended)
There are 2 ways to do this in arcview: one is a raster-based
(pixels) method, and one is a point-based method.: Arc-Info ArcView
Raster (pixel) Method:
See the video (grid2aviewlayout.avi)
for details on this. In order to make a finished map in arcview,
you need to first run the “gridviewer.aml” in arc info. Basically,
this aml will take a grid that you’ve made, create a remap table
and a colormap table based on colors that I like and show you the
grid on your display (you have to run this from the exceed x terminal
window, not from generic telnet). *you can change gridviewer.aml
to get your own colors. You can figure out what will be nice
colors by doing the “shadegrid” command in arc (see arcinfo help;
basically, this allows you to interactively make a nice color ramp).
If you save the colorramp, be sure to delete it before you run gridviewer.aml.
1. run gridviewer.aml (you might want to clip out your grid based
on a boundary file by doing the arc grid command "clip"; see the
arc-info help on its use)
2. Now, you can open up your grid in an arcview project called "legendforimage.apr".
3. Next, doubleclick the image legend and choose "interval", then
apply when you’re all done. Then, you should have your green
map on the screen.
4. Now, hit the little painter's pallete icon on view's menu.
Save the temporary file (this creates a fake legend, based on the
colormap file that’s in the same directory as the grid you opened
as an image. The levels in the legend correspond to the levels
you entered in gridviewer.aml.
5. In your layout, simplify the legend, and then plug in your own
numbers corresponding to each level. You will then have a
nice map and legend! Arc-Info/ArcView Vector (Point) method (easier):
With your outputs from postme or postsim.exe, you should run postmegridpoint,
available with the amls' you can download
from the software section. Once this is done, open the files
up in arcview, and either use the dnr tools arcview script to summarize
them (if you choose) in order to choose the percentile you want
(see below), or open up the final point shapefile you want to use.
Then you need to decide on the final map scale you want to use,
and make your view window exactly that size. Then, you need
to classify your points as you like, and turn them into squares
(see the video (postmegridpoint.avi)
on how to do this if you're not sure). This can be then brought
into a layout and turned into a final map.
Choosing
the percentile to use as the estimate
Back in ArcView, you want to create or import a county coverage
(or FIA units...). This will allow you to calculate county-level
stats of FIA data and compare that with the same area's statistics
calculated from the modeled map (reminder: after the nonforest
areas have been masked out/removed...). This will allow
you to test which of your postme results agrees best with the
FIA summary stats for the same areas.
Next, you want to summarize your plot data in ArcView:
using DNR’s arcview tools extension (after you install it, you
can find it under “window” in the top toolbar when a view is active),
click the summation button. Choose “add record id” if prompted,
and then choose as the summary field the species you simulated.
This will create for you a table of summary values, which you
can delete later, and it will automatically join this summary
table to your polygon theme. You’re really interested in
mean and standard deviation. Export your polygon theme attribute
table, somewhere you want to keep it.
Now, with your postme results, you want to do the same thing
you did with the plot data, but only with the map data.
The purpose of these last two steps is to create 2 statistics
files which can be compared. You will have statistics for
each county, and you will want to use them to see which of your
realizations you will want to choose as the one you will report
as your final estimate. (e.g., after you’ve compared the
plot statistics with those of the estimated maps, you might find
that the 50th percentile more closely matches the local statistics
of the fia data than the 30th percentile for some reason…)
In general, the 50th through the 70th percentiles seem to most
closely match with the FIA means, with the rarer species typically
requiring the higher percentiles. (see discussion).
You want to compare these statistics files: how does
the mean and sd of the original data in county x agree with those
from the estimates of each of the realizations? See the
video on this: video
(avsumstats), video
(excel join), & video
(pivottable.avi).
|