Rachel Riemann Hershey
Current, accessible information on the distribution of tree species aids considerably in understanding and managing ecosystems. However, such detailed information on forest composition is typically only available from ground inventory. We used geostatistical techniques to create an interpolated dataset--a `map' of individual species distribution--from known sample information. In a previous study, indicator kriging and sequential gaussian conditional simulation (sgCS) were found to be promising tools for modeling sugar maple distribution from the USDA National Forest Inventory and Analysis (FIA) data. The two techniques provided an estimate of species occurrence and a measure of uncertainty associated with that estimate, while retaining much of the local variability present in the sample data. In this study, these techniques are applied to 9 additional species in Pennsylvania and the estimates and uncertainty information examined. Several output datasets are available for each species including: the probability of species occurrence, an estimate of its relative abundance, and a plus-and-minus level of uncertainty associated with that estimate. The datasets, used in conjunction with one another, provide the user with considerable flexibility in setting up the balance of errors of omission and commission that best suit the analysis under consideration. Similarities and differences between the species are identified and discussed, with particular regard to their possible effect on the final estimates. Examples of how the datasets can be used are also presented, and accuracy is discussed. Indicator kriging and sgCS, used in conjunction with FIA sample data, are straightforward and effective techniques to describe species occurrence and relative density across a state.
Data describing forest composition--a basic data source for many aspects of ecosystem analyses, models, and management--are generally unavailable and/or are stored in fixed forest-type categories. But forest communities are often not well characterized by the discrete categories imposed by forest-cover-type divisions. Inherent in each forest-type category is an entire continuum (usually multi-dimensional) of different species and their relative importance.
Extensive individual species data are available from the Forest Inventory and Analysis (FIA) inventory, but only at the sample plot locations. To improve upon current county level information and simple area averages, we used geostatistical techniques to examine and incorporate the data's inherent spatial structure into local area estimates. In a previous study, we compared the geostatistical techniques available to interpolate FIA sample data to create a ``map'' of tree species distribution (Riemann Hershey et al. 1997). The tools of ordinary kriging, multigaussian kriging, indicator kriging, and sequential gaussian conditional simulation (sgCS) were used to estimate the occurrence and distribution of sugar maple in Pennsylvania. After considering the phenomenon being examined, the sample data being used, and the kind(s) of output desired in this study, indicator kriging and sgCS were the best interpolation tools to:
a) The estimate of the probability of each species' presence or absence was provided by indicator kriging. An indicator transform divides the data into two classes--either above or below a designated cutoff value; in this study 0% ba/acre indicating presence or absence. Indicator kriging calculates for each cell an estimate of the probability that it falls above or below the cutoff value. The output dataset thus indicates the probability that sugar maple occurred at each location given the available sample information.
b) The estimate of the relative amount of sugar maple at that location--e.g., whether the species represented a minor, moderate, or a major component of the total ba/acre on the plot at that location--was provided by sgCS. Sequential gaussian conditional simulation determines multiple estimates for each cell. These estimates are equally probable, alternative realizations of the data, determined from multiple simulation runs. From this set of estimates, an entire conditional probability distribution can be built for each cell, representing a range of possible values. A summary statistic such as the mean, median, or any percentile of this distribution can then be chosen and used as the modeled ``estimate'' of %ba/acre for that cell.
c) A level of uncertainty is always associated with any estimate. Knowing how much uncertainty exists will help the user identify whether that uncertainty is acceptable for a specific task and how the data can be used. In addition, knowing how much uncertainty exists helps the producer identify areas in which additional sampling would most improve the estimates. For estimates of %ba/acre, summary statistics such as standard deviation or interquartile range were calculated from the distribution of simulated values to describe the variation associated with the %ba/acre estimate for each cell. For estimates of species' presence/absence, indicator kriging provides a probability. Although not strictly an uncertainty value, a probability of occurrence value can be effectively used to select a cutoff that reflects the user's preferences for errors of omission versus commission in the identification of areas of species occurrence.
d) Tree species in Pennsylvania exhibit a high level of local variation as a result of natural environmental factors and land use histories. At the intensity of sampling present in the FIA sample data, much of this local variation cannot be modeled and effectively predicted by spatial information alone, and instead appears as variation that is unexplained by neighboring plots. However, such local variability is an important characteristic of the distribution of a species. Thus, we did not want this local variability to become hidden behind a regional average of the resource, but to remain as apparent and accessible to the user as possible in the final estimated dataset(s). Sequential gaussian conditional simulation was the most effective of the interpolation methods at maintaining local variation.
e) One feature of a well-designed sampling scheme is that it is sensitive to and can report, with an acceptable level of error, those univariate and spatial characteristics of the phenomena of interest. In this ideal situation, the characteristics of the sample data represent reasonably well those of the phenomenon itself. Each of these estimation techniques honors and maintains different aspects of the original data. The specific goals of the interpolation task at hand will determine the priorities, but in general the more characteristics of the sample data that are preserved in the estimated dataset, the more desirable the dataset. Sequential gaussian conditional simulation again did the best job of maintaining the univariate, bivariate (variogram and covariance), and local characteristics of the sample data.
f) As is true with many plant and animal populations, tree species have population distributions that are distinctly skewed toward younger individuals-- more small trees than large mature ones. In addition, Pennsylvania, like most of the northeastern states, contains primarily mixed forests. Individual species rarely occur in pure stands. The 10 species examined included 8 of the most common by volume in Pennsylvania, and yet more than 50% of the time, when a species occurred on a plot, it occurred as only a minor component (here defined as making up less than 20% of the total ba/acre on that plot). Both factors are combined in the %ba/acre ``relative importance'' value, resulting in a highly skewed frequency distribution. Such extreme characteristics in the sample data can cause difficulties and biases when used with some of the interpolation methods that depend on assumptions about the normality of the distribution of the sample data (Isaaks and Srivastava 1989). One particular advantage of indicator kriging is that it makes no assumptions about the distribution of the data. Sequential gaussian conditional simulation, on the other hand, does assume that the data are multi-normally distributed and stationary, and must be used more carefully. The sgCS routine used here, from Deutsch and Journel (1992), performs a 1-1, invertible normal-score transform on the data before running the simulation. In addition, however, the data also should be checked for bivariate normality and a decision made as to whether to assume multivariate normality before the results of conditional simulation are accepted (Rossi et al. 1993).
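The normal-score transform mentioned in item (f) can be illustrated with a minimal sketch. This is not the Deutsch and Journel (1992) routine; it is a rank-based approximation using only the Python standard library. The function names, the mid-point plotting position, and the rule for breaking ties among the many 0% plots are all assumptions made for illustration.

```python
from statistics import NormalDist


def normal_score_transform(values):
    """Return standard-normal scores for `values`, assigned by rank.

    Assumption: ties (e.g., the many 0% ba/acre plots) are broken by input
    order so the mapping stays one-to-one; other tie rules are possible.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # stable sort
    std_normal = NormalDist()
    scores = [0.0] * n
    for rank, idx in enumerate(order):
        # Mid-point plotting position keeps probabilities strictly in (0, 1).
        scores[idx] = std_normal.inv_cdf((rank + 0.5) / n)
    return scores


def back_transform(score, values, scores):
    """Map a normal score back to data units by linear interpolation."""
    pairs = sorted(zip(scores, values))
    if score <= pairs[0][0]:
        return pairs[0][1]
    if score >= pairs[-1][0]:
        return pairs[-1][1]
    for (s0, v0), (s1, v1) in zip(pairs, pairs[1:]):
        if s0 <= score <= s1:
            return v0 + (score - s0) / (s1 - s0) * (v1 - v0)


# Hypothetical %ba/acre values for six plots, strongly skewed toward zero.
pct_ba = [0.0, 12.0, 0.0, 40.0, 5.0, 0.0]
scores = normal_score_transform(pct_ba)
```

Simulated values are produced in normal-score space, so a back-transform of this kind is needed before any percentile is reported in %ba/acre.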
The data exploration techniques helped to understand the spatial characteristics of the species/variable being examined. These techniques included univariate analysis, variograms and other spatial dependence analyses, and calculating local statistics. The resulting information was critical not only in determining what interpolation methods were most suitable and for checking the sample data for errors, but also for understanding the characteristics of the sample data and thus the phenomena being investigated. Geostatistical techniques offered ways to explore, organize, and summarize spatial patterns in the data that can provide clues to the variation and spatial behavior of the individual species under investigation.
As summarized above, the previous study offered promising results for using geostatistical techniques for estimating the distribution of sugar maple from FIA data. Thus, the same geostatistical methods were applied in this study to nine additional species: red oak, white oak, chestnut oak, black oak, hemlock, red maple, beech, white pine, and yellow birch. This list includes 8 of the top 10 most abundant species in Pennsylvania by volume, and two species (yellow birch and white pine) that are much less common (Alerich 1993). The latter two were included to investigate how relative rarity affects the accuracy of the estimates.
Tree species distribution is affected by many factors, including both environmental conditions and direct human influence through harvesting and other land use histories. As a result of being differentially affected by all of these factors, each species exhibits different patterns and scales of spatial distribution. Some of these factors occur at scales much smaller than the sampling intensity of the FIA data, and some occur over larger areas, representing broad-scale variation in the species distribution. In the previous study, it was found that a substantial amount of variation in sugar maple distribution (97%) was resolved at the sampling scale used for the FIA plots. This spatial dependence could, therefore, be modeled and used to support estimates of species occurrence and relative ``importance'' (%ba/acre). The goal of this study is to examine to what extent this is true for the other species. More specifically, the objectives of this study are to determine:
The sample data were collected by the Northeastern Forest Experiment Station's Forest Inventory and Analysis (FIA) unit. Basal area--the summed cross-sectional area at breast height--is calculated for all live trees 1.0 inches DBH or larger on the plot (Hansen et al. 1992). The data were for individual tree species, by basal area (ba) per acre as a proportion of the total basal area (% ba/acre). The data were accessed from individual tree records in the USFS Eastwide tree-level database and summarized as %ba/acre for each species by plot. In Pennsylvania, there were a total of 5,100 plots. Nonforested plots and those with total ba/acre equal to zero as a result of missing data were removed--leaving 2,905 plots.
Each species was examined independently. As in the previous study, the data for each species were organized, summarized, and explored using univariate statistics, measures of spatial dependence (variogram, covariance, and correlogram), and spatial distribution of local statistics across the state. All species were similar in many of their basic characteristics to the previously investigated species, sugar maple. Each exhibited a positively skewed distribution, with more than 50% of the plots containing less than 1% ba/acre in every species except red maple and red oak. A variogram was calculated for both the raw sample data and for a 1-1, invertible normal-score transform of the sample data, using a lag distance of 500m and no directional component (anisotropy). Although there may be some directional differences in local areas, such as in the NE-SW direction in the ridge and valley section, or in the E-W direction of the Allegheny Plateau, the mixture and complexity of topography in Pennsylvania is such that when the state is treated as a single region, little anisotropy is exhibited in the variogram. Such regional differences in spatial pattern provide additional evidence for dividing the area of interest into smaller regions for modeling and simulation, but this more time-consuming approach was not used in this study. In every instance, the variogram for the normal-scored data exhibited considerably more spatial dependence and structure than for the raw data (Figure 1), revealing a spatial structure that had been hidden by the strong positive skew of the data. As sgCS uses normal-scored data, it was the model fitted to the normal-scored variogram that was used in the conditional simulation. An indicator variogram also was calculated and modeled for use in the indicator kriging.
To assess how areas of `local' variability in the sample data changed across the state, the mean and standard deviation were calculated for each of the 23,400 cells (3,000 x 3,000 m each), using a 15 x 15 km area as the window defining the size of the `local' area. All species exhibited a proportional effect, with areas of high mean corresponding with areas of high local standard deviation, indicating a lack of global stationarity. Using normal-scored data largely eliminated this situation. As mentioned previously, sgCS assumes the data are multi-normal. The data were transformed to be univariate normal, and were checked for bivariate normality by comparing the indicator variograms at each decile against what their bivariate normal equivalent would be (Deutsch and Journel, 1992). In each case, the species were found to be very close to bivariate normal. Higher moment normality was assumed.
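The omnidirectional sample variogram described above can be sketched as follows. This is an illustrative calculation, not the software used in the study: the function names, the binning of pair distances into classes of width `lag`, and the brute-force pair loop are assumptions. The same function can be applied to raw values, normal scores, or indicator-transformed values.

```python
import math


def indicator_transform(values, cutoff=0.0):
    """Code each value as 1 (above the cutoff) or 0 (at or below it)."""
    return [1 if v > cutoff else 0 for v in values]


def empirical_variogram(coords, values, lag, n_lags):
    """Omnidirectional semivariogram: gamma(h) = sum((zi - zj)^2) / (2 N(h)).

    Pairs are binned into distance classes [k*lag, (k+1)*lag).
    Returns a list of (bin-centre distance, gamma or None, pair count).
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(coords[i], coords[j])
            k = int(h // lag)
            if k < n_lags:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    result = []
    for k in range(n_lags):
        gamma = sums[k] / (2 * counts[k]) if counts[k] else None
        result.append(((k + 0.5) * lag, gamma, counts[k]))
    return result


# Hypothetical plot coordinates (m) and %ba/acre values.
coords = [(0.0, 0.0), (500.0, 0.0), (1000.0, 0.0)]
pct_ba = [0.0, 10.0, 30.0]
raw_variogram = empirical_variogram(coords, pct_ba, lag=500.0, n_lags=3)
indicator_variogram = empirical_variogram(
    coords, indicator_transform(pct_ba), lag=500.0, n_lags=3
)
```

The pair loop grows with the square of the number of plots, which is acceptable for a sketch but would normally be replaced by a spatial index for several thousand plots.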
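A minimal sketch of the moving-window local statistics follows. The function name, the square window centred on the cell, and the use of the sample standard deviation are assumptions; the text does not specify these details.

```python
from statistics import mean, stdev


def local_stats(plots, cx, cy, window=15000.0):
    """Mean and standard deviation of plot values inside a square window.

    `plots` is a list of (x, y, value) tuples in metres and %ba/acre.
    The window is `window` metres on a side, centred on (cx, cy).
    Returns None when fewer than two plots fall inside the window.
    """
    half = window / 2.0
    vals = [v for x, y, v in plots
            if abs(x - cx) <= half and abs(y - cy) <= half]
    if len(vals) < 2:
        return None
    return mean(vals), stdev(vals)


# Hypothetical plots: two near the origin, one 20 km to the east.
plots = [(0.0, 0.0, 10.0), (1000.0, 1000.0, 20.0), (20000.0, 0.0, 50.0)]
stats_at_origin = local_stats(plots, 0.0, 0.0)
```

Evaluating this at every cell centre and plotting local mean against local standard deviation is one way to look for the proportional effect described above.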
Indicator kriging and sgCS were run for each species, using models derived from the appropriate variograms. The estimation parameters of cell size (3,000m), search radius (10,000m), and minimum:maximum number of points used (1:16) were taken directly from the results of the previous study. In sgCS, 100 simulations were run for each species.
With the exception of red maple, all species examined demonstrated substantial spatial dependence in the variogram of normal-scored data, with 64 to 97% of the variation explained by the visible structure and capable of being modeled (Table 1). In some species, much of that spatial dependence was contained in a very long-range trend of about 100,000 meters. White oak, black oak, chestnut oak, and beech all fell into this category. The rest of the species appeared to split the bulk of the explained spatial dependence over 2 ranges. For red oak, this was both a short- and medium-range pattern (12,000 and 40,000m); sugar maple a very short- and a long-range pattern (2,100 and 60,000m); and yellow birch a medium- and long-range pattern (19,000 and 80,000m) (Figure 2 and Table 2). The spatial dependence exhibited in the indicator variograms was much less, ranging from only 32 to 57% of the variation explained (Figure 3 and Table 3). This is not ideal for estimation and suggests the necessity for further refinement to decrease the uncertainty associated with those estimates. In general, however, where spatial dependence exists, incorporating that information into the model should improve the estimate.
The results create several possible output datasets. Figure 4a-d shows the results for beech, using 4 datasets to represent the species. Part (a) shows the estimated probability of beech occurrence, as calculated from indicator kriging. Part (b) is the 70th percentile value of the 100 sgCS simulations, representing the chosen estimate of beech in %ba/acre. The uncertainty associated with that estimate (here chosen to be a percentile range capturing approximately 2/3 of the distribution) is described in (c) and (d). Part (c) expresses the minus variation, or that distance in %ba/acre values between the 70th percentile and the bottom of that range (the 17th percentile), and part (d) expresses the plus variation (the difference between the 83rd and 70th percentiles).
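The per-cell summary behind parts (b) through (d) can be sketched as below. The percentile choices (70th, 17th, 83rd) come from the text; the function names and the linear-interpolation percentile rule are assumptions, and other percentile definitions would give slightly different values.

```python
def percentile(sorted_vals, q):
    """Percentile q (0-100) by linear interpolation between order statistics."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def summarize_cell(simulated, est_q=70, low_q=17, high_q=83):
    """Estimate plus minus/plus variation from one cell's simulated values."""
    s = sorted(simulated)
    estimate = percentile(s, est_q)
    return {
        "estimate": estimate,                        # part (b)
        "minus": estimate - percentile(s, low_q),    # part (c)
        "plus": percentile(s, high_q) - estimate,    # part (d)
    }


# Hypothetical simulated %ba/acre values for a single cell.
cell_sims = [0.0, 0.0, 2.0, 3.0, 5.0, 8.0, 12.0, 15.0, 22.0, 30.0]
summary = summarize_cell(cell_sims)
```

Because the estimate sits at the 70th percentile rather than the median, the minus variation is generally larger than the plus variation.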
Sequential gaussian conditional simulation preserves the sample histogram and variogram. To check how well it performed, given the unknowns about the normality of the sample data, the histogram and variogram of several single realizations were compared with those of the sample data (in normal-space). The realizations output by sgCS recreated almost identically the sample variogram at lag distances up to 40,000 m, indicating that sgCS, with the models and search criteria used, performed up to expectations over those distances.
The output from sgCS creates a distribution function of possible values for each cell. From this distribution, we can calculate any summary statistic to use as the 'estimate' and any summary statistic to use as the measure of uncertainty, depending on the objectives of the immediate task. The mean estimate, as expected, creates a noticeably smoothed dataset, similar to the output from ordinary kriging (with the exception that an uncertainty term such as the standard deviation can now be calculated with it). The median estimate dataset in each case maintained the sample histogram, did not noticeably smooth the data, and retained much of the local variation. The median estimate, however, provided a very low estimate of %ba/acre values, typically well below what was expected from previous knowledge. As a result, the 70th percentile was chosen as the 'estimate' of beech %ba/acre. It retained many of the characteristics of the sample data in terms of the histogram and local statistics, and more closely maintained the county averages. As the measure of uncertainty about this value, the middle two-thirds of the distribution was chosen.
The results of sgCS identify much more specifically than county summaries where concentrations of each species occur. When species are relatively rare, as yellow birch and white pine are in Pennsylvania, or patchy, as red oak is, a summary at county level smooths the information enough to hide completely all the local detail as to where within the counties concentrations of these individual species occur. The results of sgCS highlighted rather than smoothed over this detail.
All four maps presented here provide useful information about species occurrence and distribution. The probability that a species occurs is a unique and useful dataset. Species occurrence is often a first decision criterion in land management, particularly in situations where it is not yet known what density of a particular tree species is required before a wildlife species considers it suitable habitat, or before a pest finds conditions favorable. By reporting a probability of occurrence for each cell, the user is no longer faced with a fixed final map based on preferences assumed at the time of model creation, but can use any probability as the cutoff to create a species presence/absence map depending on the objectives of the task at hand. For example, if a particular insect is known to live in hemlock forests and the objective is to limit the search to only those areas where there is a high probability of finding suitable conditions, we might set the cutoff at the probability level of >= 0.8 to create a map of the area of interest. If, however, we are most interested in not missing any areas where the insect might occur, we might set the probability level much lower, say >= 0.4.
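The cutoff choice in the hemlock example amounts to thresholding the probability grid, as in this small sketch. The function name and the nested-list grid are assumptions made for illustration; the 0.8 and 0.4 cutoffs are the ones used in the text.

```python
def presence_map(prob_grid, cutoff):
    """Mark each cell True where the probability of occurrence >= cutoff."""
    return [[p >= cutoff for p in row] for row in prob_grid]


# Hypothetical 2 x 2 grid of indicator-kriging probabilities for hemlock.
hemlock_prob = [[0.9, 0.5],
                [0.3, 0.8]]

# High cutoff: fewer cells flagged, lower error of commission.
search_area = presence_map(hemlock_prob, 0.8)
# Low cutoff: more cells flagged, lower error of omission.
survey_area = presence_map(hemlock_prob, 0.4)
```

The same probability grid supports both maps; only the cutoff changes with the objective.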
The results of sgCS are particularly useful when there is interest in the distribution of densities at which a species occurs. For example, if it is known that a certain predominance of chestnut oak is required for use by bear as permanent territory, but that a lesser dominance of oak is possible if certain other conditions also exist, an estimate of the %ba/acre of oak for each area is useful information. Figure 5 is a map produced from the sgCS estimates for chestnut oak, illustrating where it occurs as a major (>40% ba/acre), moderate (20 to 40% ba/acre), and minor (1 to 20% ba/acre) component of the forest. The %ba/acre estimate can be associated with a corresponding map reflecting the uncertainty found in the conditional simulation. In Figure 6 the uncertainty is divided into three classes reflecting what the user considered to be acceptable, moderate, and unacceptable uncertainty. Highlighted in red in Figure 6 are those areas where the uncertainty is greater than the level considered acceptable. Here, these are primarily areas where the local variability of chestnut oak values exceeded the limits of resolution of the current sample design and intensity. Such a map illustrates those areas to target should more time and money become available to collect additional inventory data for chestnut oak, and which areas to first discount should another decision criterion become available. Again, the output from sgCS itself is unclassed; the user defines the classes of interest from the building blocks provided.
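The classes shown in Figure 5 can be derived from the unclassed estimates with a simple rule, sketched here. The class limits are from the text; the function name, the treatment of values exactly at 20% and 40%, and the label for values below 1% are assumptions.

```python
def importance_class(pct_ba):
    """Classify a %ba/acre estimate into the component classes of Figure 5."""
    if pct_ba > 40.0:
        return "major"
    if pct_ba >= 20.0:   # assumption: 20% and 40% both fall in "moderate"
        return "moderate"
    if pct_ba >= 1.0:
        return "minor"
    return "absent or trace"  # assumption: label for values below 1%


# Hypothetical chestnut oak estimates for four cells.
estimates = [55.0, 25.0, 3.0, 0.2]
classes = [importance_class(e) for e in estimates]
```

A user with different habitat criteria would change only the limits, not the underlying estimates.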
The same %ba/acre dataset(s) also can be used to create a forest-cover-type ``map.'' The advantage of creating and maintaining estimated datasets for each individual species is that forest cover types can be uniquely defined to capture more accurately the habitat required for a particular study. For example, the user might define a SM/B type as those areas where the combined %ba/acre of sugar maple and beech amounts to more than 50% of the total ba/acre in the stand, and where no more than 20% of the rest is yellow birch. For this purpose, the summary statistic (whether mean, median, or any of the percentiles of each cell's simulated distribution) was chosen specifically for the intended purpose in the same way the different levels of probability were chosen when mapping species presence/absence from indicator kriging estimates. If the objective is to reduce the error of commission (i.e., classifying areas as sugar maple/beech (SM/B) in the estimated map that really are not SM/B), then using a percentile at the lower end of each cell's distribution would be more desirable. If, however, the objective is to reduce the error of omission (i.e., reduce the possibility of missing areas that do contain SM/B), then using a higher percentile from the distribution, such as 70%, would be more desirable (Figure 7).
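The SM/B definition given as an example can be written as a rule on three per-cell estimates. This sketch assumes one reading of the definition: "the rest" is taken to be the basal area not accounted for by sugar maple and beech, and yellow birch may make up at most 20% of it. The function name and that reading are assumptions.

```python
def is_sm_b(sugar_maple, beech, yellow_birch):
    """Return True if a cell meets the example SM/B cover-type definition.

    All inputs are %ba/acre estimates for the same cell, each taken from
    the percentile chosen for the analysis (e.g., the 70th).
    """
    combined = sugar_maple + beech
    rest = 100.0 - combined  # assumption: remainder of total basal area
    return combined > 50.0 and yellow_birch <= 0.20 * rest


# Hypothetical cell: 40% sugar maple, 20% beech, 5% yellow birch.
cell_is_sm_b = is_sm_b(40.0, 20.0, 5.0)
```

Feeding the rule lower percentiles makes it harder for a cell to qualify, which reduces errors of commission; higher percentiles reduce errors of omission.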
The accuracy of the results from any model is important information for users of those data. How well does the output dataset portray the characteristics of the phenomena it was designed to capture? If it was intended to estimate current landcover as observed on the ground, how well does it do that? If it was intended to capture local variability, how well does it do that? If it was intended to identify where there are higher concentrations of the resource, how well does it do that?
Assuming that the sample dataset reflects the phenomena it was designed to capture (a product of the sample design and intensity), we can test how closely the model represents the characteristics of the sample data by checking the univariate, variogram/covariance, and local statistics of the estimated datasets against those of the sample dataset. As expected by definition (assuming the data fit all the assumptions of normality), an individual realization of sgCS maintained a histogram nearly identical to that of the sample data, as well as the covariance and variogram at lag distances up to 40,000 meters. At the larger lags, however, both the variogram and covariance remained close but not identical to those of the sample data, indicating that there is some structure in the data at the large lags that was not effectively incorporated in the simulation. Incorporating the structure that exists over these larger distances could be addressed using a two-stage simulation.
High local spatial variability contributes to the uncertainty that at any given point within that cell the estimate being reported would match what was measured on the ground. Local variability in the data is a reflection of the growth and distribution patterns of the tree species. It affects the uncertainty of the estimate when %ba/acre values vary over shorter distances than the sampling intensity resolves. In this study, every tree species examined in Pennsylvania exhibited more local spatial variability than was resolved by FIA plots averaging 2.5 km apart. This unexplained variation was reflected in the variogram as a sometimes substantial nugget and resulted in higher levels of uncertainty associated with the estimates in this study.
Local variability also adds uncertainty to any area estimate or summary. This is an additional factor in the interpolated datasets, because a point estimate is being assigned to represent an entire 3 x 3 km cell. Spatial datasets, like non-spatial classifications, are themselves abstractions or generalizations of some spatial variation that is really there on the ground. Goodchild and Gopal (1989) describe two familiar examples: ``The area labeled `soil type A' on a map of soils is not in reality all type A, and its boundaries are not sharp breaks but transition zones. Similarly, the area labeled `population density 1,000-2,000 sq. km.' does not in fact have between 1,000 and 2,000 in every square kilometer, or between 10 and 20 in every hectare, since the spatial distribution of a population is punctiform and can only be approximated by a smooth surface.'' But rarely can we, or do we want to, estimate down to a level of spatial detail such that each area is completely homogeneous, even among the few variables we may be interested in. The limits of the data, including sampling design and intensity, typically limit the level of resolution possible by interpolation. The inherent local variability of the phenomena has a large effect on the uncertainty of the predicted value at each location.
The variogram model describing the spatial relationship between neighboring locations is the critical element of any spatial estimation. The model is designed to match closely the spatial relationship observed in the sample data (e.g. the structure and dependence observed in the sample variogram), paying particular attention to those distances, usually the shorter ones, that are used in the estimation. However, when subpopulations of a species have a significantly different pattern of spatial distribution, calculating a single variogram for the entire area in effect captures more than one population within that single variogram and results in a model that is probably appropriate to neither population. In such situations, treating the populations separately in the modeling and estimation will certainly improve final estimates (Riemann Hershey 1997).
In this study, the %ba/acre estimates of four species were further refined. Sugar maple and the two rare species, white pine and yellow birch, were divided into several different populations by region, and the process of variogram modeling and estimation was repeated for each region. With white pine and yellow birch, the regions were designed to target more specifically those areas in which the species was more dominant. With sugar maple, ecological subregions, at the section level, were used as the default regions for identifying potentially different populations of each species (McNab and Avers 1984). In all three situations, the regionalized variograms were substantially different from the average variograms, suggesting that some improvement in the model should be possible by using this local spatial information. When sgCS was performed using the locally tuned models, it revealed subtle differences in the simulated results for yellow birch and white pine, and more substantial differences in the results for sugar maple, including a reduction in the uncertainty. Running a separate indicator model and kriging on the two regions also modified the final map; however, the effect was also subtle. The fourth species, hemlock, was refined based primarily on the suspicion from historical information that small and large stand-size classes may have different spatial distribution patterns. When variograms were calculated separately for these two populations (using a cutoff of 45 years), they were indeed substantially different in shape, sill, nugget, and range (Figure 8).
What the map is going to be used for strongly affects what is considered acceptable ``error'', and what types of error are most and least tolerable. As described above, one advantage of both the probability output of indicator kriging and the modeled distribution output of sgCS is that the information is available to create maps with objective and desired error clearly in mind.
The estimated datasets output by indicator kriging and sgCS have the potential to be very useful. Every species examined, with the exception of red maple, exhibited substantial spatial dependence in the variograms of the normal-scored data, suggesting that there is considerable benefit to incorporating that spatial structure in the estimation of species occurrence and importance from FIA data. Each species exhibited some variety in spatial patterns and structure as well as the amount of spatial dependence, and each may require different levels of additional fine-tuning, depending upon the objectives of the specific analysis and the time and expertise available. Many of the species examined for regional differences did exhibit such differences in the variograms; however, the effect on the final simulated results varied. Rare species, including those that occur on less than 25% of the sample plots, are modeled more effectively by sgCS than any method of local area averaging.
These geostatistical techniques make explicit the uncertainties associated with an estimate in a form that can be incorporated when the data are used. This feature adds considerable utility and flexibility in the use of the resulting estimates, as the risk of errors of commission or omission can be specifically determined and manipulated to fit the current objectives. Recognizing the assumptions, limitations, and choice of errors in a map instead of blindly accepting the output as final allows the user to apply an estimated dataset much more effectively and with a better understanding of its logical capabilities and precision.
Maintaining individual species information separately allows considerable flexibility in the use of species distribution data. Instead of being limited to previously defined fixed classes, forest cover types can be uniquely defined to capture more accurately the habitat required for a particular study. The potential also exists to use one or more of the species datasets as a decision layer in the interpretation of satellite imagery. The two datasets offer complementary information about the species composition that really exists on the ground.
The techniques used in this study are neither extremely time-consuming nor difficult to process, and can be easily extended to additional species, states, and variables. There is a high level of variance associated with these estimates of %ba/acre--in many locations this variance can be as much as the estimate itself. Nevertheless, the dataset provides a very descriptive picture of species distribution at the state level. In comparison to previous depictions of current species distribution from FIA data by summarizing at the county level, this method provides a much more detailed picture of species occurrence and distribution. Although estimates would probably be improved and variances diminished by additional investigation into and/or sampling of each species, the current estimates are informative and provide a useful basis from which to proceed.
One advantage of making the accuracy and uncertainty of a modeled estimate explicit is that it provides substantial information for effectively improving the estimate, should the objective demand it and should time and money become available to do so. As was observed with sugar maple, white pine, yellow birch, and hemlock, it may be possible to significantly improve the estimates of %ba/acre by refining the analysis. When subpopulations of a species have a significantly different pattern of spatial distribution, treating the populations separately in the interpolation will improve the final estimates. These populations may be described by regional land features or by some other defining characteristic (e.g., stand age for hemlock). For sugar maple, dividing the state into several broad ecological regions made a significant difference in the calculated variogram and thus in the final simulated estimates. Previous knowledge about the species is another important clue--e.g., that different ecological regions may be causing distinct spatial distribution patterns, or that different size-class or age populations may have different spatial distribution patterns over the landscape. Hemlock is an example of the latter. As a result of past management practices that involved heavy harvesting of large hemlock for the tanning industry, today there are often relict large individuals among a relatively wider distribution of smaller, younger trees that have grown up in the interim (Hough and Forbes 1943, Powell and Considine 1982).
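One way to check whether subpopulations differ in spatial structure is to compute a separate experimental variogram for each. The Python (NumPy) sketch below is a simple omnidirectional estimator written for illustration; it is not the GSLIB routine, and the function names, lag bins, and grouping labels are assumptions.

```python
import numpy as np

def empirical_variogram(xy, z, bin_edges):
    """Omnidirectional semivariance per lag bin (NaN where a bin is empty)."""
    xy, z = np.asarray(xy, float), np.asarray(z, float)
    i, j = np.triu_indices(len(z), k=1)          # each pair counted once
    dist = np.hypot(*(xy[i] - xy[j]).T)          # pair separation distances
    semi = 0.5 * (z[i] - z[j]) ** 2              # half squared differences
    which = np.digitize(dist, bin_edges) - 1     # lag bin index of each pair
    n_bins = len(bin_edges) - 1
    return np.array([semi[which == b].mean() if np.any(which == b) else np.nan
                     for b in range(n_bins)])

def variograms_by_group(xy, z, groups, bin_edges):
    """One variogram per subpopulation label (e.g. region or stand-size class)."""
    xy, z, groups = np.asarray(xy, float), np.asarray(z, float), np.asarray(groups)
    return {g: empirical_variogram(xy[groups == g], z[groups == g], bin_edges)
            for g in np.unique(groups)}
```

If the per-group variograms differ clearly in range or sill, as Figure 8 reports for yellow birch and hemlock, that is a hint that the groups may be better modeled separately.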
These datasets of individual species distribution do not contain any of the fine-scale forest/nonforest detail. If such information is desired, more detailed datasets describing the forest/nonforest land cover in Pennsylvania would have to be derived from a more intense point sample or from the continuous but averaged data available from satellite imagery (e.g., Zhu 1992). Such detailed datasets can then be used as a `mask' overlaid on any of the datasets of species distribution to provide a more realistic picture of where individual species currently occur amidst the more spatially detailed mosaic of forest/nonforest land cover (Figure 9).
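The masking step can be sketched as follows in Python (NumPy), assuming the species estimates and the forest/nonforest layer have already been resampled to the same grid. The function name, the use of NaN as a no-data value, and the example arrays are hypothetical.

```python
import numpy as np

def apply_forest_mask(species, is_forest, nodata=np.nan):
    """Keep species estimates on forest cells; set nonforest cells to nodata."""
    return np.where(is_forest, species, nodata)

# Hypothetical %ba/acre estimates and a forest/nonforest layer on the same grid.
species = np.array([[35.0, 12.0], [0.0, 48.0]])
is_forest = np.array([[True, False], [True, True]])
masked = apply_forest_mask(species, is_forest)
```

Using a no-data value instead of zero keeps nonforest cells distinguishable from forest cells where the species is estimated to be absent.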
Tree species exhibit more than just spatial dependence; they may also exhibit some correlation with particular soils or topography or with reflectance data from satellite imagery. Based on the strength of that correlation, such data can be incorporated into several of the geostatistical techniques as ancillary or `soft' information to improve the estimation of an individual species. Tree species also exhibit joint distribution relationships with other tree species. This fact is an important feature of habitat classifications, yet these relationships are not necessarily maintained when each species is estimated separately. Other techniques, such as imputation (e.g., Moeur and Stage 1995), are designed to maintain the joint distribution structure of variables, but do not take full account of the spatial structure. Investigation is currently underway to identify techniques that would better maintain both the spatial structure and the joint distribution/attribute structure of the sample data, such as cosimulation, indicator simulation, or an explicit combination of several techniques.
Alerich, C.L., Forest Statistics for Pennsylvania--1978 and 1989. Resource Bulletin NE-126. USDA Forest Service, Northeastern Forest Experiment Station. Radnor, PA, 1993, p. 244.
Deutsch, C.V. and A.G. Journel, GSLIB: Geostatistical Software Library and User's Guide. Oxford University Press, New York, 1992, p. 340.
Goodchild, M. and S. Gopal, Preface. pp. xi-xv. In: Accuracy of Spatial Databases. Goodchild, M. and S. Gopal (eds). Taylor & Francis, New York. 1989, p. 290.
Hansen, M.H., T. Frieswyk, J.F. Glover, and J.F. Kelly, The Eastwide Forest Inventory Data Base: Users Manual. General Technical Report NC-151. USDA Forest Service, North Central Experiment Station, St. Paul, MN, 1992, p. 48.
Hough, A.F. and R.D. Forbes, The ecology and silvics of forests in the High Plateaus of Pennsylvania. Ecological Monographs. 13, pp. 299-320, 1943.
Isaaks, E.H. and R.M. Srivastava, An Introduction to Applied Geostatistics. Oxford University Press, New York, 1989, p. 561.
McNab, W.H. and P.E. Avers, Ecological subregions of the United States: section descriptions. Administrative Publication WO-WSA-5. USDA Forest Service, Washington, DC. 1994, p. 267.
Moeur, M., and A.R. Stage, Most Similar Neighbor: an improved sampling inference procedure for natural resource planning. Forest Science. 41(2), pp. 337-359, 1995.
Powell, D.S. and T.J. Considine, An analysis of Pennsylvania's forest resources. Resource Bulletin NE-69. USDA Forest Service, Northeastern Forest Experiment Station, Broomall, PA, 1982, p. 97.
Riemann Hershey, R., M.A. Ramirez, and D.A. Drake, Using geostatistical techniques to map the distribution of tree species from ground inventory data. pp. 187-198. In: Modelling Longitudinal and Spatially Correlated Data: methods, applications, and future directions. Lecture Notes in Statistics vol 122. Gregoire, T.G. et al (eds). Springer-Verlag, New York. 1997, p. 402.
Rossi, R.E., P.W. Borth, and J.J. Tollefson, Stochastic simulation for characterizing ecological spatial patterns and appraising risk. Ecological Applications. 3(4), pp. 719-735, 1993.
Zhu, Z., Advanced very high resolution radiometer data to update forest area change for midsouth states. Res. Pap. SO-270. USDA Forest Service, Southern Forest Experiment Station, New Orleans, LA, 1992, p. 11.
Figure captions:
Figure 1. Spatial dependence as demonstrated by the variogram of the raw data (a) and variogram of the normal-scored data (b) for white pine.
Figure 2a-j. Variograms from the normal-scored data and models for each of the 10 species. These models were used in the sgCS.
Figure 3a-j. Indicator variograms and models for each of the 10 species. These models were used in the indicator kriging (IK).
Figure 4. Four modeled datasets describing the distribution of beech in Pennsylvania: a) the estimated probability of occurrence using indicator kriging, b) the 70th percentile from 100 sgCS realizations, c) the minus variation (70th-17th percentile), and d) the plus variation (83rd-70th percentile) about the 70th percentile estimate.
Figure 5. Chestnut oak occurrence as a major (>40% ba/acre), moderate (20-40%), and minor component (<20%) of the total ba/acre. Derived from sgCS estimates using the 75th percentile.
Figure 6. The uncertainty associated with the estimates used in Figure 5, in classes of: acceptable (<= ±10% ba/acre), moderate (±10-25% ba/acre), and unacceptable (> ±25% ba/acre). Derived from the sgCS estimates.
Figure 7. The occurrence of SM/B type, defined as those areas where the combined %ba/acre of sugar maple and beech amounts to more than 50% of the total ba/acre in the stand, and derived from the 70th percentile estimate from sgCS.
Figure 8. Variograms calculated for specified subpopulations of the sample dataset. Separating subpopulations of a) yellow birch by region and b) hemlock by stand size resulted in distinctly different variograms in both range and sill.
Figure 9. A ``map'' of sugar maple %ba/acre with nonforest areas masked out. The dataset used is the 70th percentile of sgCS, and the nonforest area is derived from AVHRR data (Evans and Zhu 1992).
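The classes named in the captions for Figures 5 and 6 amount to simple thresholds on the estimate and on its uncertainty. The Python (NumPy) sketch below encodes them as integer classes. Treating the uncertainty as a single plus-or-minus half-width per cell, along with the function names and example values, is an assumption made for illustration.

```python
import numpy as np

def component_class(pct_ba):
    """0 = minor (<20), 1 = moderate (20-40), 2 = major (>40) %ba/acre."""
    pct_ba = np.asarray(pct_ba, float)
    return np.where(pct_ba > 40, 2, np.where(pct_ba >= 20, 1, 0))

def uncertainty_class(half_width):
    """0 = acceptable (<=10), 1 = moderate (10-25), 2 = unacceptable (>25)."""
    return np.digitize(np.asarray(half_width, float), [10.0, 25.0], right=True)

# Hypothetical estimates and plus-or-minus half-widths for four cells.
estimate = np.array([5.0, 20.0, 40.0, 55.0])
half_width = np.array([10.0, 12.0, 25.0, 30.0])
comp = component_class(estimate)
unc = uncertainty_class(half_width)
```

Combining the two class layers, for example by keeping only major-component cells with acceptable uncertainty, is one way to tailor a map to a given tolerance for error.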
Table 1. The percent variation explained by the spatial dependence in the variograms.

| Species | Indicator variogram (%) | Normal-scores variogram (%) |
|---|---|---|
| Beech | 46 | 76 |
| Black oak | 37 | 75 |
| Chestnut oak | 44 | 82 |
| Hemlock | 57 | 66 |
| Red maple | 32 | 35 |
| Red oak | 38 | 64 |
| Sugar maple | 32 | 97 |
| White oak | 50 | 79 |
| White pine | 37 | 84 |
| Yellow birch | 39 | 81 |
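The normal-scores column of Table 1 appears consistent with the share of the total sill that is not nugget, computed from the model parameters in Table 2. The text does not state this formula, so the Python sketch below should be read as an assumption that happens to reproduce the tabulated values after rounding; the function and variable names are illustrative.

```python
def percent_spatial(nugget, components):
    """Share of the total sill not attributed to the nugget, in percent."""
    partial_sill = sum(components)
    return 100.0 * partial_sill / (nugget + partial_sill)

# Nugget and structure components copied from Table 2 (normal-scores models).
table2 = {
    "Beech": (0.24, [0.09, 0.67]),
    "Black oak": (0.20, [0.05, 0.55]),
    "Chestnut oak": (0.18, [0.06, 0.76]),
    "Hemlock": (0.25, [0.11, 0.38]),
    "Red maple": (0.60, [0.18, 0.15]),
    "Red oak": (0.35, [0.25, 0.36]),
    "Sugar maple": (0.03, [0.27, 0.19, 0.5]),
    "White oak": (0.22, [0.18, 0.64]),
    "White pine": (0.16, [0.48, 0.26, 0.1]),
    "Yellow birch": (0.15, [0.03, 0.25, 0.36]),
}
explained = {name: round(percent_spatial(nugget, comps))
             for name, (nugget, comps) in table2.items()}
```

A low value, such as that for red maple, indicates that most of the modeled variance is nugget and that nearby samples carry comparatively little information about one another.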
Table 2. Isotropic function and parameters used to define the model used for each species in sgCS.

| Species | # structures | Nugget effect | Function | Range (m) | Component |
|---|---|---|---|---|---|
| Beech | 2 | .24 | spherical | 20000 | .09 |
| | | | gaussian | 160000 | .67 |
| Black oak | 2 | .20 | spherical | 5000 | .05 |
| | | | gaussian | 95000 | .55 |
| Chestnut oak | 2 | .18 | spherical | 5000 | .06 |
| | | | gaussian | 190000 | .76 |
| Hemlock | 2 | .25 | gaussian | 30000 | .11 |
| | | | spherical | 100000 | .38 |
| Red maple | 2 | .60 | spherical | 18000 | .18 |
| | | | spherical | 120000 | .15 |
| Red oak | 2 | .35 | exponential | 12000 | .25 |
| | | | gaussian | 40000 | .36 |
| Sugar maple | 3 | .03 | exponential | 2100 | .27 |
| | | | spherical | 60000 | .19 |
| | | | spherical | 220000 | .5 |
| White oak | 2 | .22 | spherical | 11000 | .18 |
| | | | gaussian | 90000 | .64 |
| White pine | 3 | .16 | gaussian | 21500 | .48 |
| | | | spherical | 100000 | .26 |
| | | | spherical | 140000 | .1 |
| Yellow birch | 3 | .15 | exponential | 6000 | .03 |
| | | | gaussian | 19000 | .25 |
| | | | spherical | 80000 | .36 |
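To show how the parameters in Table 2 define a nested model, the Python sketch below evaluates the semivariance at a given lag as the nugget plus the sum of the listed structures. It assumes that the tabulated range of the exponential and gaussian structures is a practical range (the distance at which about 95% of the component is reached); the text does not say which range convention was used, so that factor and the function names are assumptions.

```python
import math

def structure(kind, h, a, c):
    """Semivariance contribution of one structure at lag h (range a, sill c)."""
    if kind == "spherical":
        r = min(h / a, 1.0)                      # flat beyond the range
        return c * (1.5 * r - 0.5 * r ** 3)
    if kind == "exponential":
        return c * (1.0 - math.exp(-3.0 * h / a))         # assumed practical range
    if kind == "gaussian":
        return c * (1.0 - math.exp(-3.0 * (h / a) ** 2))  # assumed practical range
    raise ValueError(f"unknown structure: {kind}")

def nested_variogram(h, nugget, structures):
    """Nugget plus the sum of all structures; zero at h == 0 by definition."""
    if h == 0:
        return 0.0
    return nugget + sum(structure(k, h, a, c) for k, a, c in structures)

# Beech model from Table 2: nugget .24, spherical (20000 m, .09), gaussian (160000 m, .67).
beech = (0.24, [("spherical", 20000, 0.09), ("gaussian", 160000, 0.67)])
gamma_50km = nested_variogram(50000, *beech)
```

Evaluating the model over a range of lags and plotting it against the experimental variogram is a quick check that the parameters were transcribed correctly.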
Table 3. Isotropic function and parameters used to define the models used in the indicator kriging.

| Species | # structures | Nugget effect | Function | Range (m) | Component |
|---|---|---|---|---|---|
| Beech | 2 | .11 | exponential | 20000 | .04 |
| | | | spherical | 120000 | .055 |
| Black oak | 2 | .10 | exponential | 20000 | .02 |
| | | | spherical | 120000 | .0375 |
| Chestnut oak | 2 | .11 | spherical | 21000 | .025 |
| | | | spherical | 120000 | .06 |
| Hemlock | 2 | .08 | exponential | 13500 | .09 |
| | | | spherical | 80000 | .015 |
| Red maple | 2 | .08 | exponential | 90000 | .035 |
| | | | spherical | 120000 | .02 |
| Red oak | 2 | .15 | exponential | 30000 | .07 |
| | | | spherical | 70000 | .02 |
| Sugar maple | 2 | .17 | spherical | 40000 | .03 |
| | | | spherical | 100000 | .015 |
| White oak | 2 | .11 | spherical | 10000 | .05 |
| | | | spherical | 70000 | .06 |
| White pine | 2 | .085 | spherical | 20000 | .025 |
| | | | spherical | 120000 | .025 |
| Yellow birch | 2 | .08 | exponential | 20000 | .05 |