USDA Forest ServiceSkip navigational links  
 Northeastern Forest Inventory & Analysis
 Go to: NE FIA Home Page
 Go to:
 Go to:
Go to:
 Go to:
 Go to:
Go to:
 Go to: Publications & Products
 Go to:
Go to: FIA Site Map
 Go to: NE Station
 Go to:
 Go to:

Go to:Introduction

Go to:Flowchart of Process

Viewing:Outline of Steps

Go to:Definitions and Descriptions

Go to:Discussion

Viewing:Step by Step

Go to:Examples & Downloads

Go to:Videos

Go to:Arc Scripts

Viewing:Contact: Andrew Lister

 

Forest Inventory & Analysis Program
11 Campus Blvd.
Suite 200
Newtown Square, PA 19073-3294

(610)557-4075
(610)557-4250 FAX
(610)557-4132 TTY/TDD

 United States Department of Agriculture Forest Service. USDA logo which links to the department's national site. Forest Service logo which links to the agency's national site.
 

GIS /Spatial Statistics

Geostatistics Workshop

Step-by-Step

Go to:Preparing/reformatting data in Excel for ArcView

Now, with your lat, long, id, attribute1, attribute2, etc... file, prepare the data in a spreadsheet like Excel for conversion to a shapefile.  This is not a vital step, but is recommended because it allows you to visualize the data, check for obvious location errors, look for basic spatial patterns or trends in the data, etc.  You should end up with a comma-delimited table with one header row, and decimal degrees X and Y columns. Get the video (xyztoshape.avi).
-- back to Outline--.
Go to:Importing and viewing data in ArcView
In ArcView, you want to add the table, and then choose "view--add event theme", choosing the decimal degrees longitude as the X and decimal degrees latitude as the Y.  you will then be able to see your plots in geographic coordinates.  You will need to project them into a projection such as Albers, State Plane, or UTM, however, in order to have the distances between the points be measured accurately across the dataset.  In other words, you cannot accurately do modeling with x,y data in decimal degrees!  Get the video (tableaddevent.avi).
-- back to Outline--
Go to:Projecting coordinates using ArcView
You will need to check to see if you have "projector", or if you have av 3.2, the projection wizard, available.  If not, you will need to download it here, and place all of the files in the "ext32" directory, wherever you have arcview installed.  If you are using unix arcview, you should automatically have projector as an option.  If not, you need to have someone copy the projector extension to the $AVHOME/ext directory, wherever arcview is installed on your system.  **The same goes for the other extensions referenced in this document (xtools, dnr tools, etc.); all extensions can be handled the same way!  Get the video (project.avi).
-- back to Outline--
Go to:Adding newly projected XY coordinates to attribute table in ArcView
Now that your data are projected, you need to use a script, or a program written in arcview's "avenue" language, to add the x, y coordinates to the attribute table you imported.  You have a shapefile now, but it does not explicitly have the x,y coordinates in albers projection in tabular form.  You want to load, compile, and execute the addcoor.ave script, available from the scripts page.  Then, you will export this text file so that you can bring it into Surfer, and then begin modelling.  Get the video (addcoor.avi).
-- back to Outline--
Go to:Create the dataset for modeling
  • Define zones using the drawing tool or an existing ecoregion (or other) coverage.  If using user-defined areas that you just drew, convert the drawings to shapefiles using xtools "convert graphics to shape".
  • With the point theme active do "theme-select by theme", and select the features of the active theme that 'have their center in' the selected features of the zone/ecoregion theme (the polygon coverage or shapefile) of interest.
  • Export the point attribute table (with the records still selected from the above operation) as 'delimited text (only the selected records will be exported) and import into Excel.  Get the video (zonedefinitionarcview.zip).
  • -- back to Outline--
Go to:Create the dataset for interpolation
  • create another polygon coverage or shapefile in which the polygon of interest (i.e. the ecoregion or user-defined area) is larger by a certain buffer zone -- ideally equivalent to the search radius that will be used in the interpolation.
  • and with that selected, do the same steps as above to create this second dataset.  Get the video (zonedefinitionarcview.zip).
  • -- back to Outline--
Go to:Eliminating duplicate values (prior to nscoring) and NSCORE
  • Organize rows of interest in Excel.  Sort by row of interest (species).  Remove missing values (NOT necessarily 0’s; if 0 is a measurement, consider leaving it.  One needs to determine the specific product one wishes to produce (e.g., the importance of red maple on forest land, or the importance of red maple on all land.  If the former, include zero observations for red maple on forested plots, but don’t include zeros which are nonforest plots; if the latter, don’t omit any zero’s.))
  • Make a column of tiny, incrementally increasing numbers, then a column of random numbers.  Sort "tiny" and "rand" by "rand".
  • Then make a new column, species_dupe, and use the formula "if species = cell above it, then species + tiny; otherwise, just species".  This will add a tiny amount to each duplicate value, randomly.  ***Remember to copy and paste special the values of species_dupe when done
  • Delete away "tiny" and "rand", put into simplified geo eas format, and save as MS-Dos text in Wordpad or Word.
  • Simplified geo eas format is this:
        title
        #vars
        varname1
        varname2
        varname i
        data1    data2    ....   datai
        data1    data2    ...    datai
        etc,
e.g.    data from zone3 red maple
          4
          x
          y
          basal_area
          volume
          43000003    35999998    120    4000
          43883999    35999967    80      2399
           etc.
  • Edit nscore.par (see example parameter file included with program in the software seciton), or see the gslib help page.  ****NOTE  It's best to run the gslib programs from the dos prompt, i.e., open the msdos prompt, and then navigate to where you've got the nscore.exe stored, and then type "nscore", and follow prompts.
Go to:Bringing the data (output from nscore) into Surfer, and viewing and modeling the variogram/correlogram
  • Bring the *.out file from nscore into Surfer’s spreadsheet (treat consecutive delimiters as one, tab and space delimited checked)
  • Remove header, and save (as original file name from arcview export (basically, it keeps #’s of extra files you generate to a minimum; you won’t need the old file anymore anyway.)
  • After importing the data into Surfer and checking to make sure everything's ok (view the worksheet), you can create a variogram by doing grid -- variogram -- new variogram.  Be sure to manipulate the maximul lag distance in the new variogram dialogue under options (the default is often rather short).  Remember, you want to have a large enough distance to describe the entire form of the variogram, but have an adequate number of pairs of points (at least 30, but preferably hundreds) at the long distances to have faith in your semivariance value.  Choose the 'autocorrelation' option (this will display the correlogram--discussion). Get the video (variomodelx.avi).
  • Do variogram modeling (see guidelines) using the nscored data.
  • Record variogram parameters (using snagit for this is handy) and paste into excel; I like to paste the variogram graph itself, the experimental tab, and the model tab, so later I can exactly reproduce a variogram that I made with minimum effort.
  • -- back to Outline--
Go to:Variogram Modeling Steps/Tips/Guidelines-- (see the videos for this)
            viewing and modeling the omnidirectional correlogram ('autocorrelation estimator type' in Surfer)
For all steps here, see the Surfer Help.  It’s VERY useful. 1. Do Grid-Variogram-New Variogram in Surfer; choose correct columns, update statistics, and check on range of x and y in statistics; now, under options, choose a reasonable range for the maximum lag distance
Tip: The default is a little short. What you want to do is choose a maximum lag distance that will allow you to describe the form of the variogram AND have an adequate number of pairs of points in each lag class.  You can choose a distance as large as the maximum x and y ranges as your data, but you probably won’t have many pairs of points that are separated by this distance.  It’s best to choose a large range, nearly as large as the maximum range.  You can alter this later. 2. Once the variogram has been calculated, choose autocorrelation estimator*, then choose 6000 meter lags for FIA data, and set the vertical scale to be something well over the actual sill (like 1.1 or 1.2—this makes it easier to compare the different directions).  ***Remember to hit “apply” after making all changes to see the results of the change***  Your goal should be to get a variogram point as close to the y axis as possible, but have enough pairs of points within that lag bin to have some confidence in it (at least 30, but more is advisable!).   *Isaaks has suggested that the use of the autocorrelation (correlogram) measure avoids problems with the presence of a trend (nonstationary random function).  (see http://www.isaaks.com). Note:  you can overlap lag classes.  For example, consider the case where the Max Lag Distance is 100, the Number of Lags is five, and the Lag Width is 40.  The resulting lag intervals would be 0 to 30, 10 to 50, 30 to 70, 50 to 90, and 70 to 100.  Notice these intervals overlap.  Overlapping lag intervals causes a moving average type smoothing of the experimental variogram.  Often, it is easier to select an appropriate model for such smoothed experimental variograms. Note:  the linear model is there by default.  You should delete this right away (under the model tab)—it’s upsetting to see it there!  You can add and delete models and edit parameters on the experimental tab.  I’d recommend playing with this for awhile.  Remember:  The scale is the sill –(minus) the nugget, and the length is the range. Note:  when using an EXPONENTIAL model remember that in Surfer the convention is to use 1/3 of the GSLIB range as the 'length', while the range in GSLIB refers to the real range in meters as you read it off the correlogram--e.g. if it's the last component, that distance where you think the variogram actually reaches the sill.  If you use the GAUSSIAN model (which is a little tricky and we don't really recommend--we usually stick with combos of the spherical and exponential), the range is also referred to differently so you should check the Surfer help for the proper format. Tip:  We chose these options after a lot of trial and error. 3.   Decide on what combo of models is best by getting the omnidirectional variogram (tolerance is 90 degrees).  This is the variogram that fits all the directional variograms the best. (see Surfer help for a discussion of this).
-- back to Outline--
Go to:Modeling anisotropy (see video (autofit.avi) & video (anisotropy.avi) r.e. this):
Get a good look at the model.  Look at the number of pairs, the presence of multiple structures, and the variogram form.  Now, cycle through the various directions:  set the tolerance to 30, and step through in multiples of 30 degrees.  Look for whether or not the range changes; look for how well the omnidirectional model fits all directions.
Tip:  The point is to choose a combination of models that fits all directions.  However, you want to very carefully model the direction of maximum continuity (longest range), and the direction 90 degrees from that (usually the minimum range)---see the example from the GSLIB book.  One way to do this is to choose a combo of models and a nugget for your omnidirectional model, and then to model the variogram using this combo of models and nugget in the maximum and minimum directions of continuity. You should choose a variogram using your knowledge and intuition about the phenomenon under question.  You might believe that two structures is better than one because over short distances, you’re looking at spatial autocorrelation at the small watershed level, whereas at great distances, you’re looking at autocorrelation at the regional level, and the autocorrelation functions differ.  I prefer a blend of both philosophies:  I’ll look at the variogram and ask myself if it appears to be reasonable.  Before I add another structure, I’ll ask if I can imagine a biological reason for the blip I’m noticing in my variogram. Most practitioners recommend sticking to one or two models, and we tend to do this as well. Note:  The Surfer autofit method is ok, and is a black box, but you won't always agree with the results.  Remember:  a statistically best-fitting model (which is what an autofitting program does) is not necessarily a best-fitting model in practice.
-- back to Outline--
Go to:Calculating summary stats from the 100 realizations using postme99
  • Edit in wordpad postme99.f by changing the numbers in this line (near the top):

  •                   real              z(158,261,100),cut(100)
     where the 100 is the number of simulations, the 158 is the nx and the 261 is the ny (columns and rows).
     
  • Run g77 (after downloading from the software section and installing ; read directions carefully!  If you get stuck, call Andy) by typing g77 postme99.f, then rename the output (a.exe) to pmx.exe, x being the descriptor for the output file.  This makes the postme executable to use to post process your data. **Note, you can also use the postsim.exe program available from the gslib download page.  However, you can only choose one percentile at a time.  See the gslib help r.e. this.

  •  
  • When the simulation finishes:  Run pmx.exe on the simulation output file, specifying e, m, v, cv, iqr, 10, 20, etc. (the number of the percentile) as the output files.  E is the mean (who knows why it's called "e-type mean") value, m is the median, v is the variance, cv is coefficient of variation, iqr is interquartile range (the range of the middle 50% of the data), and the numbers are the percentiles (value below which this % of the data values fall).  This is very important for future processing; these files must be named this way or you will not be able to run the aml.  Each file you create is a value for a given percentile of the distribution of values generated at each location.  In other words, you will have 100 estimates of basal area at a given pixel.  These 100 values have a distribution, and postme figures out for each pixel what is the “x” percentile (what is the value below which x % of the other values fall) of the distribution.

  •    
     
      ***Note:  If you don’t have time to run all of the simulations, you can just stop it, then run postme, specifying the number of realizations you intend to run.  It will quit after it reaches the end of the file, then you can run it with just that # of realizations.
-- back to Outline--
Go to:Using a Mask
Using a nonforest mask to remove those output points that have been modeled in nonforest areas This is more complicated, but doable.  You can get an AVHRR (1 km pixel resolution) forest/nonforest map at http://www.epa.gov/docs/grd/forest_inventory/ .  It will be in arcinfo grid format in Albers projection, and you basically need to clip out your area of interest (in arc-info, do "grid--clip" (see help) , reclassify each pixel of this clipped map as forest (0) or nonforest (null) (arc-info grid command "remap"), and then add your clipped AVHRR map to the finished map from the simulation.  This will blot out areas of the map that are nonforest.  You can also use a GAP or MRLC product to do the same (30 m. pixels).  Or, you can just present your map as "potential distribution" or something.  See our section on documenting the dataset.
-- back to Outline--

Go to:Final Map Production:
How to Make a Finished Map: There are 2 ways to make your final map:  In Surfer and in arcview.
**Note:  if you separate your map into regions (e.g., ecoregions), you will want to mosaic them together before making your final “giant” map using the grid command “mosaic” (see arcinfo help); then, decompose the grid into a Surfer file (if using the Surfer method) or leave it as is, but clip it using the gridclip command (see arcinfo help) (if using the arcview method).  You might also want to use a forest/nonforest mask, as well (above).
  Surfer Method: (good to know, but more involved; see the video (finalmaplong) and video (finalmapshort.avi))
1. Convert, in one way or another, your boundary file to a Surfer blanking file ---  .bln file format (see below for description of format); you do this to blot out the areas outside of your boundaries that you interpolated.  You can do this very easily by running the ungen.ave script in arcview; this will take a shapefile (or maybe an arc coverage) and turn it into a .csv textfile of vertices.  You will need to edit this file (in Surfer worksheet) so that it adheres to the Surfer bln file format (i.e., you will need to add a header (see below)).  **Note: specify 0 as the flag if you want to blank out the region outside your boundary! Next, you will need to blank the area outside of this boundary, using the surfer grid—blank command.  You will make a new output grid file. 2. Now, draw the grid "image", you’ve created in 1 (from using either postmesurfer or gridsurfer amls) and you will see your grid constrained to the boundary (i.e. the edges outside the boundary clipped out).  You can add the boundary file as a base map, and then “overlay maps” to overlay them, if you like. 3.   You can change your color of your output map by double clicking the map and fooling around with the settings.  See the Surfer help—it’s pretty clear.
  Excperpt from surfer help:
Golden Software Blanking File Description
This is an ASCII format file used to store geographic information including areas, curves, and points.  Even though the primary use of GS Blanking files is to indicate regions to be "blanked-out", they can also be used for simple boundaries and decorative illustrations.  The general format of the file is: length,flag "Pname 1"
x1,y1
x2,y2
...
xn,yn
length,flag "Pname 2"
x1,y1
x2,y2
... xn,yn
The length value is an integer which indicates the number of X,Y coordinate pairs that follow.
The flag value is 1 if the region inside areas is to be blanked and 0 if the region outside areas is to be blanked.
Pname is optional and is the name of a primary ID to be associated with the object.  The primary ID is used to link the object to external data.
Following lines contain the actual X,Y coordinate pairs that make up the object.  These can be integers or real numbers, and are stored 1 pair per line. The type of object is determined as follows:   * If the type/length field is 1, the object is considered a point.  One coordinate pair follows.
  * If the type/length field is greater than 1 and the first and last coordinate pairs are equal, the object is considered a simple closed area.  Otherwise, the object is considered a curve.

Go to:Arcview Method (recommended)
There are 2 ways to do this in arcview:  one is a raster-based (pixels) method, and one is a point-based method.: Arc-Info ArcView Raster (pixel) Method:
See the video (grid2aviewlayout.avi) for details on this. In order to make a finished map in arcview, you need to first run the “gridviewer.aml” in arc info.  Basically, this aml will take a grid that you’ve made, create a remap table and a colormap table based on colors that I like and show you the grid on your display (you have to run this from the exceed x terminal window, not from generic telnet).  *you can change gridviewer.aml to get your own colors.  You can figure out what will be nice colors by doing the “shadegrid” command in arc (see arcinfo help; basically, this allows you to interactively make a nice color ramp).  If you save the colorramp, be sure to delete it before you run gridviewer.aml. 1. run gridviewer.aml (you might want to clip out your grid based on a boundary file by doing the arc grid command "clip"; see the arc-info help on its use)
2. Now, you can open up your grid in an arcview project called "legendforimage.apr".
3. Next, doubleclick the image legend and choose "interval", then apply when you’re all done.  Then, you should have your green map on the screen.
4. Now, hit the little painter's pallete icon on view's menu.  Save the temporary file (this creates a fake legend, based on the colormap file that’s in the same directory as the grid you opened as an image.  The levels in the legend correspond to the levels you entered in gridviewer.aml.
5. In your layout, simplify the legend, and then plug in your own numbers corresponding to each level.  You will then have a nice map and legend! Arc-Info/ArcView Vector (Point) method (easier):
With your outputs from postme or postsim.exe, you should run postmegridpoint, available with the amls' you can download from the software section.  Once this is done, open the files up in arcview, and either use the dnr tools arcview script to summarize them (if you choose) in order to choose the percentile you want (see below), or open up the final point shapefile you want to use.  Then you need to decide on the final map scale you want to use, and make your view window exactly that size.  Then, you need to classify your points as you like, and turn them into squares (see the video (postmegridpoint.avi) on how to do this if you're not sure).  This can be then brought into a layout and turned into a final map.

Go to:Choosing the percentile to use as the estimate

  • Back in ArcView, you want to create or import a county coverage (or FIA units...).  This will allow you to calculate county-level stats of FIA data and compare that with the same area's statistics calculated from the modeled map (reminder:  after the nonforest areas have been masked out/removed...).  This will allow you to test which of your postme results agrees best with the FIA summary stats for the same areas.
  • Next, you want to summarize your plot data in ArcView:  using DNR’s arcview tools extension (after you install it, you can find it under “window” in the top toolbar when a view is active), click the summation button.  Choose “add record id” if prompted, and then choose as the summary field the species you simulated.  This will create for you a table of summary values, which you can delete later, and it will automatically join this summary table to your polygon theme.  You’re really interested in mean and standard deviation.  Export your polygon theme attribute table, somewhere you want to keep it.
  • Now, with your postme results, you want to do the same thing you did with the plot data, but only with the map data.  The purpose of these last two steps is to create 2 statistics files which can be compared.  You will have statistics for each county, and you will want to use them to see which of your realizations you will want to choose as the one you will report as your final estimate.  (e.g., after you’ve compared the plot statistics with those of the estimated maps, you might find that the 50th percentile more closely matches the local statistics of the fia data than the 30th percentile for some reason…)  In general, the 50th through the 70th percentiles seem to most closely match with the FIA means, with the rarer species typically requiring the higher percentiles. (see discussion).
  • You want to compare these statistics files:  how does the mean and sd of the original data in county x agree with those from the estimates of each of the realizations?  See the video on this:  video (avsumstats), video (excel join), & video (pivottable.avi).