to Abstract/Full-text at the ESA site
PREDICTING ABUNDANCE FOR 80 TREE SPECIES FOLLOWING CLIMATE CHANGE IN THE EASTERN UNITED STATES
Louis R. Iverson and
Anantha M. Prasad
USDA Forest Service, Northeastern Research Station,
359 Main Road, Delaware, Ohio 43015 USA
Abstract-- Projected climate warming will potentially have profound effects on the earth's biota, including a large redistribution of tree species. We developed models to evaluate potential shifts for 80 individual tree species in the eastern United States. First, environmental factors associated with current ranges of tree species were assessed using geographic information systems (GIS) in conjunction with regression tree analysis (RTA). The method was then extended to better understand the potential of species to survive and/or migrate under a changed climate. We collected, summarized, and analyzed data for climate, soils, land use, elevation, and species assemblages for more than 2100 counties east of the 100th meridian. Forest Inventory Analysis (FIA) data for more than 100,000 forested plots in the East provided the tree species range and abundance information for the trees. RTA was used to devise prediction rules from current species-environment relationships, which were then used to replicate the current distribution as well as predict the future potential distributions under two scenarios of climate change with 2xCO2. Validation measures prove the utility of the RTA modeling approach for mapping current tree importance values across large areas, leading to increased confidence in the predictions of potential future species distributions.
With our analysis of potential effects, we show that roughly 30 species could expand their range and/or weighted importance at least 10%, while an additional 30 species could decrease by at least 10%, following equilibrium after a changed climate. Depending on the global change scenario used, 4-9 species would potentially move out of the United States to the north. Nearly half of the species assessed (36 out of 80) showed the potential for the ecological optima to shift at least 100 km to the north, including seven that could move over 250 km. Within these potential future distributions, actual species redistributions will be controlled by actual migration rates through fragmented landscapes.
Key words: geographic information systems (GIS), landscape ecology, regression tree analysis (RTA), predictive vegetation mapping, climate change, tree species migration, tree species ranges, tree species distribution, forest inventory, global change, envelope analysis, species-environment relationships
Key phrases: Forest inventory data in the eastern United States; regression tree analysis and GIS; predicting potential future species distributions
The buildup of anthropogenic greenhouse gases are related to a general warming trend on the planet, and current estimates from global circulation models (GCM's) predict 1 to 4.5o C temperature increases as a result of doubling atmospheric CO2, probably by the end of the next century (Kattenberg et al. 1996). This trend suggests a potential to trigger major changes in the earth's living systems, including temperate forests. Many research lines, including this one, are devoted to predict these potential major changes in the earth's biota, especially with respect to the potential necessary migration of plant species.
Paleontological studies of the Holocene, when there was a climatic warming but not nearly at the rate being projected by current global climate change models, show the following points with respect to species migration (Jacobson et al. 1987, Delcourt and Delcourt 1988, Davis and Zabinski 1992, Webb and Bartlein 1992, Malanson 1993): (1) Species did see a shift in their geographical ranges, generally northward; (2) Species responses were individualistic -- the rates and direction of migration differed among taxa. Species assemblages did not remain the same; (3) Species responded to climate change in an equilibrium manner, and, at the continental scale of evaluation, competition and dispersal mechanisms did not seem to play a large role in the responses of species. Migrations were occurring over thousands of years and over a relatively uninterrupted landscape. Under current greenhouse warming scenarios, however, the climate is projected to change at a faster rate, and when combined with today's fragmented landscapes, the importance of competition, dispersal mechanisms, and nonequilibrium responses may be critical in the final species assemblages. Moreover, individual species models are necessary to account for the individualistic responses expected under climate change.
With the advance of geographic information systems (GIS) and the concomitant plethora of data available for landscapes, several techniques can now be used to predict the geographic distribution of vegetation from mapped environmental variables (Franklin 1995). For continuous data, these include regression models, general linear models, general additive models, and regression tree models. Especially important drivers to vegetation models are climatic, edaphic, and topographic variables. At local scales, modeling of vegetation pattern depends largely on local variations of topography and geomorphology (e.g., Reed et al. 1993, Ertsen et al. 1995, Iverson et al. 1996a, 1997a). At regional scales, overall vegetation patterns have been assumed to depend more on general climatic patterns (e.g., Booth 1990, Prentice et al. 1992, Woodward 1992, Box et al. 1993, Neilson 1995). To some degree at least, we evaluate both in this study.
Various GCM's are being built to simulate the future climate system, which can then be used to predict possible changes in the earth's forest biota. Several approaches are being taken at a number of scales and locations. Because such models cannot be truly validated (Rastetter 1996), multiple avenues of research are encouraged to look for convergence. Much discussion centers on the various approaches and the comparisons among them (e.g., McGuire et al. 1993, Hobbs 1994, Melillo et al. 1995). Several models assess changes in global biomes (e.g., Prentice et al. 1992, Woodward 1992, Neilson and Marks 1994), including recent models that couple vegetation effects directly into the global circulation models so that feedbacks are continuously incorporated into the outcome (e.g., Foley et al. 1996), while others assess a selected number of species over a regional scale (e.g., Bonan and Sirois 1992, Mackey and Sims 1993, Burton and Cumming 1995, Dyer 1995, Hughes et al. 1996, Starfield and Chapin 1996). For example, Huntley et al. (1995) used a three-way climate response surface to model the present and future distributions of eight species in Europe.
However, there have been few or no studies which model the entire suite of major tree species, individually, across their continental ranges. Such an approach is the only way to begin to identify the possible new outcomes in species assemblages after climate change. Most of the above referenced studies also deal only with range shifts, not with projected shifts in species importance. In this study, we empirically model each of 80 species across the eastern United States, so that the different habitat requirements among species are accommodated by the models. We statistically evaluate, using regression tree analysis (RTA), the relationship of 34 environmental variables to tree importance values in the eastern United States, and then use the derived relationships to predict their present and potential future ranges and importance values. Our approach is unique in the sense that it is a multiple species effort at a continental scale. Our study should be evaluated as a macro-scale study involving multiple species with its underlying strengths and limitations.
Data base generation Data were extracted from several sources for land east of the 100th meridian (Fig. 1). The county was chosen as the mapping unit because it is the reporting unit for many sources of data and, for the most part except for some northern counties, has roughly the same area across the study region. The Modifiable Area Unit Problem, where spatial aggregations can lead to problems with scaling and zoning (Jelinski and Wu 1996), was an issue, though it was minimized by using percentages and weighted averages as input variables.We evaluated over 100 environmental/land use/socioeconomic variables for each of nearly 2500 counties in the East. Because of missing tree information on some counties, only 2124 counties were finally mapped. The final number of variables was reduced to 34 (Table 1) by (a) removing highly redundant variables as deduced via correlation analysis; (b) dropping variables selected by experts as excessively difficult to interpret; and (c) scoring 65 variables for their value in an earlier RTA run on all species and dropping the variables of lower importance (Table 1).
Tree ranges and importance values.-- The USDA Forest Service periodically determines the extent, condition, and volume of timber, growth, and removals of the Nation's forest land by the work of six Forest Service Forest Inventory and Analysis (FIA) units. Four FIA units produced a data base of standard format called the Eastwide Data Base (EWDB) for the 37 states from North Dakota to Texas and east. These data are stored in three record types (Hansen et al. 1992): county data, plot data, and tree data. Plot locations are not precisely located but county location was provided for each plot. We used the data from more than 100,000 plots and nearly 3 million trees to summarize the desired county-level information needed for this study. We evaluated 196 species of trees from the EWDB, but as described later, only 80 species were modeled due to sample restrictions.
We first summarized the information for individual forested plots. Tree records represented observations of seedlings, saplings, and overstory trees. Tree species, tree status, and diameter at breast height were combined with information on plot size to compute a single summary record for each plot. This record contained, by species, estimates of the average number of stems and total basal area of understory and overstory trees per unit area. From this information, we generated importance values (IV) for each species as follows:
For each species, average IV per county was calculated by aggregating plot level information. These values were linked with a county coverage of the United States (ESRI 1992) for mapping into density slices of IV. With these maps, biogeographical characteristics (such as absolute and optimum range) of the species can be visualized. Further details on the methodology, along with a table summarizing the number of plots, number of species, percent forest, and dates of the inventories by state, are given in Iverson et al. (1996b).
Climatic factors. -- Monthly means (averaged from 1948-1987) of precipitation, temperature, and potential evapotranspiration for the current climate were extracted from a data base generated by the USEPA (1993). The data had been interpolated into 10 x 10 km grid cells for the conterminous United States. From these data, we also calculated annual means of each of the above variables, and we derived two attributes based on their physiological importance to tree growth for this region: July-August ratio of precipitation to potential evapotranspiration (PET) (the time most prone to drought stress in the eastern U.S.), and May-September (i.e., growing season) mean temperature. The data were then transformed to county averages via weighted averaging, based on county area.
Two scenarios of future monthly temperature and precipitation under an equilibrium state of 2 X CO2 were used for predictions of potential future species distributions: the Geophysical Fluid Dynamics Laboratory (GFDL) (Wetherald and Manabe 1988) and Goodard Institute of Space Studies (GISS) (Hansen et al. 1988) models, which depict a divergent set of possible outcomes. These GCM output data were prepared by USEPA (1993) into future equilibrium estimates, by 10 x 10 km grid, for monthly precipitation, temperature, and potential evapotranspiration. These variables, along with the derived variables mentioned above, were substituted into the data base for predicting potential future species distributions.
Soil factors.-- The State Soil Geographic Data Base (STATSGO) was developed by the USDA Soil Conservation Service (now Natural Resource Conservation Service) to help achieve their mandate to collect, store, maintain, and distribute soil-survey information for U.S. lands. STATSGO data contain physical and chemical soil properties for about 18,000 soil series recognized in the Nation (Soil Conservation Service 1991). STATSGO maps were compiled by generalizing more detailed soil-survey maps into soil associations in a scale (1: 250,000) more appropriate for regional analysis.
Preprocessing was necessary before maps of suitable attributes could be produced for our study. We selected 14 variables related to tree species habitat: pH, available water capacity, organic matter, permeability, bulk density, cation exchange capacity, depth to bedrock, T factor (water erodibility), K factor (wind erodibility), slope, potential soil productivity, and several variables related to texture (e.g., percent clay, percent coarse fragments, percent volume of soil flowing through screens with meshes of various sizes). Weighted averages by depth and by area had to be calculated for county estimates of the soil variables, as detailed in Iverson et al. (1996b).
Additional soil information was obtained from the GEOECOLOGY databases (Olson et al. 1980), including percentage of the county in each of five soil orders (Table 1).
Land use/cover factors.-- GEOECOLOGY (Olson et al. 1980) data were used for estimations of percent forest, urban, crop, disturbed, and grazing/pasture land (Table 1). These estimates originated from the USDA Soil Conservation Service's National Resources Inventory for 1977.
Elevation.-- Maximum, minimum, and variation of elevation were derived for each county from 1:250,000 U.S. Geological Survey (USGS) Digital Elevation Model (DEM) files obtained from the USGS internet site (US Geological Survey 1987).
Landscape pattern.-- The 1-km AVHRR forest cover map (USDA Forest Service 1993) was used to generate statistics on forest-cover pattern by county. Several landscape pattern indices were calculated using FRAGSTATS (McGarigal and Marks 1995), but only edge density was used in the final analysis.
Regression tree analysis
We use regression tree analysis (RTA, also known as classification and regression trees, or CART), to decipher the relationships between environmental factors and species distribution. The methods can either predict classes (classification trees) or average values (regression trees), depending on the nature of the response variable. Developed by Breiman et al. (1984), and used first in the medical field, the methods have only been used in ecological studies since about 1987 (Verbyla 1987). Not incidentally, their use has coincided with the use of geographic information systems, which allow model outputs to be readily mapped across landscapes. Since then, they have been used primarily in classification (e.g., Borchert et al. 1989, Moore et al. 1991, Lees and Ritman 1991, Baker 1993, Lynn et al. 1995), though examples of the ecological use of regression trees are found in Davis et al. (1990) and Michaelsen et al. (1994)
RTA is a recursive data partitioning algorithm that initially splits the data set into two subsets based on a single best predictor variable (the variable that minimizes the variance in the response). It then does the same on each of the subsets and so on recursively. The output is a tree with branches and terminal nodes. The predicted value at each terminal node is the average at that node, which is relatively homogeneous (Chambers and Hastie 1993). RTA is therefore much more flexible in uncovering structure in data that have variables that may be hierarchical, non-linear, non-additive, or categorical in nature. RTA is rapidly gaining popularity as a means of devising prediction rules for rapid and repeated evaluation, as a screening method for variables, as a diagnostic technique to assess the adequacy of linear models, and for summarizing large multivariate data sets (Clark and Pregibon 1992).
There are several key advantages for using RTA in our application, which covers such a wide spatial domain, over classical statistical methods (Verbyla 1987, Clark and Pregibon 1992, Breiman et al 1984, Michaelsen et al. 1994): (1) adeptness of the RTA to capture nonadditive behavior, where relationships between the response variable and some predictor variables are conditional on the values of other predictors. RTA deals with interaction effects by subsetting data without specifying the interaction terms in advance of the statistical analysis (as is necessary in multiple linear regression). For example, in our study, the factors associated with the northern range limits for a species may be quite different from the factors regulating the southern limit of the species; (2) This advantage allows, in effect, a stratification of the country so that some variables may be most related to the IV of species A for a particular region of the country, but that a different set of variables may drive its importance elsewhere; (3) Numerical and categorical variables can easily be used together and interpreted, because the RTA essentially converts continuous data into two categories at each node. The outcome is a set of step functions that provides a better capturing of non-linear relationships, while also providing a reasonable solution for linear relationships; (4) The variables that operate at large scales are used for splitting criteria early in the model, while variables that influence the response variable locally are used in decision rules near the terminal nodes (Moore et al. 1991). Thus we could expect that broad climatic patterns are captured higher up on the tree while more local effects (soil, elevation, etc.) determine more local distributional variations. It should, however, be recognized that since our data set is aggregated to a county level scale, RTA cannot capture the environmental drivers that operate on species at a very fine scale (e.g., individual slopes or valley bottoms).
There are limitations of RTA, however. While RTA may be the most appropriate tool for analyzing data sets with many possibly interacting predictor variables and for separating macro-level and micro-level effects on the response, the response ideally should be approximately normal, since least squares and the group mean are used in deriving the best split (Clark and Pregibon 1992). Even though the IV response in our example more closely resembles a Poisson distribution due to the high frequency of zeros, the effect of long tails is much more confined than in a linear regression model as the regression is local (B.D. Ripley, personal communication). Also, since we are mainly interested in variables that are driving the distribution geographically, we believe this is a reasonable approach as transforming the response would complicate the interpretability of the results. We are taking a further step and predicting the effect of a changed climate based on the model. While there is danger of extrapolating the results beyond the model's predictive ability for some species, it does provide a reasonable estimate of the species' potential migration under changed climatic conditions, whose preferences we know a priori.
Prediction of current species importance values.-- Regression trees were generated in S-PLUS (Statistical Sciences 1993) for each tree species. Species importance value (IV, based on basal area and number of stems) was the response variable, with the 34 predictor variables (see Table 1). The regression trees were generated on a split data set -- a random selection of 80% of the data -- so that 20% of the data remained for validation efforts. RTA models were generated after pruning the full tree to 12-20 nodes, depending on the rate of change of deviance explained. When the additional deviance explained was small for added nodes, the RTA model building was stopped. The resulting model was used to generate predictions of IV of each species for each county. While this pruning effort carried some subjectivity, it is the most reasonable approach while evaluating 80 tree species - picking nodes through cross validation by resampling is not without some major limitations (Venables and Ripley 1994).
The response predicted by RTA for zero values of IV was almost always a fraction less than one. Through testing across all species, we determined that predicted IV scores < 1.0 were essentially zero and were set as such. Predicted IV scores between 1.0 and 3.0 were classed 'fuzzy' in that we were not confident in the prediction of these low IV scores. The predictions of IV classes were then output to Arc/Info for mapping, using Unix tools and Arc/Info's Arc Macro Language (ESRI 1993).
In addition to the RTA models using all the 34 predictors, we built models using only climatic variables (variables 28-34 in Table 1). This analysis was done to compare prediction effectiveness with that using edaphic and land use variables.
For several species, maps were also generated that depict the counties that fall along particular branches of the RTA tree. In this way, the environmental variables most responsible for the predicted IV are shown geographically. This is a major strength of the RTA wherein we can depict counties that fall along particular branches of the RTA in a map and show which variables are driving the distribution of the species geographically.
Validation/verification of the RTA outputs.-- To evaluate the model outputs, a comparison between predicted current and actual (FIA) distributions was made using correlation, verification, and validation processes. Scattergrams and correlations of IV were calculated for each species. Verification, to assess how well the models recreate the distributions with the entire data set, was performed by assessing for each species the number and proportion of counties found in (a) both the actual and predicted scenarios, (b) only the actual, and (c) only the predicted. This allowed calculation of the percent correct assignment [(a/a+b+c) * 100, hereafter referred to as classification accuracy], and also the calculation of omission errors, where the model failed to predict the presence of the target species even though it was recorded in the FIA data [(a/a+b) * 100, hereafter referred to as omission accuracy]. We were especially interested in the omission accuracy. The error was deemed less serious if the model predicted a county with a species that was not recorded because it is quite possible that the FIA sampling may have missed it. FIA attempts to place a plot for about every 2000 ha of forest land; uncommon or habitat-specific species could easily be missed by the plots. A validation comparison was made by using a randomly selected portion of the counties (80%) to predict the IV of the remaining 20% (n=424). Similar calculations of error were then made on this 20% data set.
Prediction of future species importance values.-- Once the regression trees were generated, they could be used to generate not only predictive maps of current distributions, but also potential future distributions under a scenario of a changed climate. We did this by replacing the climate-related variables in our predictor variable set with those based on the GFDL and GISS models. The replaced variables were (see Table 1 for the description): MAYSEPT, JARPPET, JANT, JULT, AVGT, PET, and PPT. The previously established regression trees then were used with the new predictive variables, and the data output to Arc/Info as before.
Assessment of potential species changes.-- We devised several metrics to assess the potential species shifts as a result of the modeled changes in IV of each species. The first metric, change in area, is the difference, in percent, between the predicted current areal extent and the predicted potential future areal extent under the two climate change scenarios. So, a score of 100 means no change, scores less than 100 represent a contraction of the range, and scores greater than 100 are an expansion of the species range. Because of relatively large uncertainty in the 'fuzzy' class of predicted IV (1.0-3.0), we took a conservative approach and calculated this metric only for counties with IV > 3.0.
The second metric, change in weighted average of IV, is a weighted (by area of counties) average of IV for all counties calculated for predicted current vs. the two climate change scenarios; as above, a score below 100 indicates the overall importance of the species will decline (though areal extent may or may not decline). In this case, we included the 'fuzzy' class of predicted IV in the calculation since county areas with low IV would not contribute much to the overall score.
The third metric is the change in northern and southern range limits. Because we were not interested in spurious shifts out on the tails of the distribution, we statistically identified the range limits for predicted current and future ranges. The country was subdivided into a grid cell network of 10 x 10 km cells; a total of 277 rows, or latitudinal strips, was generated (Fig.1). The relative IV were calculated for each row (total IV/total area in row), and box plots were generated, in S-PLUS (Statistical Sciences, 1993), for the summed IV for each species. From this distribution, the IV of the first quartile was identified; the first and last row where this IV was exceeded was deemed the northern and southern limit of the distribution, respectively. Shifts in the northern or southern limits were simply calculated by taking the difference between current and projected future limits.
The fourth metric is the change in ecological optimum. The mean of the interquartile range was deemed the 'optimum' latitude range for the species. Both of the latter metrics used only those areas with predicted IV values > 3.0; the 'fuzzy' class was not included.
Model assumptions and limitations
Here we attempt to state our assumptions explicitly so that progress can continue as uncertainties are reduced. They fall into several categories.
Global change scenarios.-- We described earlier the GCM model outputs of projected future climate used in our model. Uncertainty remains in projected climate change under a doubled CO2 scenario, especially as it plays out spatially across the continent. The impacts of this uncertainty on our results is reduced by using two scenarios, but changes in estimates of future climate will continue to occur. Data can be reanalyzed when updated, fine-scaled predictions of future climate become available.
Data inputs.-- Any time multiple GIS layers from disparate sources and scales are overlaid, errors will propagate through the data (Goodchild and Gopal 1989). This impact is minimized in this study by using a large sampling unit, the county, as the common spatial unit. On the other hand, some factors have their scale of impact at an area much smaller than the county; i.e., some counties are very diverse, and some important ecological factors could be averaged out at the scale of a county. For example, small zones of high elevation, bottomlands, or unique soils, which harbor specialized flora, may be lost in the averaging process. There is also error associated with the FIA sampling of trees; occasionally species that do in fact reside in the county will be missed by the FIA sampling plots.
Biology.-- The method described here does not account for changes in physiological and species-interaction effects in the model outputs. Therefore, there is no way to assess changes in competition among the 'new' species mix, nor is there a way to account for whatever gains in water-use efficiency may accompany elevated CO2 (Neilson 1995). In a criticism of model-based assessments of climate change effects on forests, Loehle and LeBlanc (1996) note that many forest simulation models assume that tree species occur in all environments where it is possible for them to survive, and that they cannot survive outside the climatic conditions of their current range (fundamental vs. realized niche). This pattern may be especially problematic if confined to a small number of climate variables such as growing degree days or precipitation. The RTA models here are also susceptible to this criticism, though reduced by considering a wide range of variables and the fact that we are only trying to evaluate potential range changes due to climate change. These models assume equilibrium conditions, and that there are no barriers to migration.
Analysis.-- As mentioned previously, RTA does carry limitations, and spurious or non causative relationships will appear, especially when RTA methodology is applied to 80 species without fine-tuning for individual species preferences. We realize that improvement of models would be possible for many of the species, if individual characteristics and spatial trends are taken into account.
RESULTS AND DISCUSSION
Regression trees were generated for each species; an example for quaking aspen (Populus tremuloides) is shown in Fig. 2. At the terminal ends of each branch, a predicted IV is presented. In this example, the relatively large importance of mean annual temperature is apparent, for the distance between nodes is in proportion to the variance explained by the variable. The highest predicted importance occurs on the left branch, where a maximum IV of 91 was obtained if the mean annual temperature was less than 4.4o C, the coefficient of variation of elevation in the county was less than 17.8% (i.e., the county is homogeneously rather flat), the average potential evapotranspiration was less than 60.0 mm/month, and the percent soil which can pass a number 10 sieve was greater than 92.3% (i.e., the soils are primarily sands and finer material). Only eight counties matched these criteria. On the other branch, the lowest IV, 0.14, was found for counties that had an average temperature above 8.12o C; this condition occurs in 1704 counties. Because we used a cutoff of 1.0 for a species to be considered present, none of the 1704 counties was predicted to have quaking aspen.
The distribution of variables most responsible for the predicted IV for three species shows the additional value of the RTA in that different variables operate in controlling the distribution (Fig. 3). For example with southern red oak (Quercus falcata var. falcata), there is essentially no representation of the species in 1222 counties with an average January temperature below 0.34oC. In this example, the maximum IV (8.0) is found in 94 counties that have: a January temperature >0.34oC, an average slope in the county exceeding 2.3%, average total water holding capacity exceeding 19.5 cm, and a maximum elevation > 160 m (Fig. 3). The northern band of southern red oak presence is found where January temperature is less than 3.2oC, indicating that temperature may be the key factor regulating the northern boundary of the species.
For Virginia pine (Pinus virginiana), pH is the first controlling variable, with the species found only where pH < 4.35, except for 26 counties with higher pH but with average slopes exceeding 21% (Fig. 3). If the pH criterion is satisfied and January temperatures < 0.7o C, then the species is found (IV > 1) only in counties where slopes > 20.8%. This could be a situation where the species can survive colder winters in protected steep valleys or on more southerly exposures. Virginia pine prefers warmer temperatures in January (>0.7o C) but relatively cooler temperatures in July (<25.1o C), unless the maximum elevation in the county exceeds 889 m (Fig. 3). The importance of the species is thus distributed spatially by soil and climate factors.
American beech (Fagus grandifolia) reaches its maximum IVs when potential evapotranspiration (PET) is less than 46.5 mm/month and when maximum elevation exceeds 663 m, conditions met by 76 counties in the northern Appalachians. For lower elevations in the north, beech can be prevalent if precipitation exceeds 850 mm. For counties with higher PET, beech is mostly found in the high sloping regions of the southern Appalachians (Fig. 3). In each of the examples and, for most all species, it was a combination of climate and edaphic variables needed for the best RTA models. Edaphic factors have been ignored in many previous studies using a 'climate envelope' approach, and was one of the criticisms by and Loehle and LeBlanc (1996).
Prediction of current species distributions
Tree models work best if sufficient samples are available (Moore et al. 1991, Clark and Pregibon 1992). Analysis of the verification data set revealed there was a significant correlation between the number of counties for which the target species was recorded in the FIA data and the classification accuracy (Table 2). Those species with larger samples tend to have increased model success based on correlations between predicted and actual IV. Rarer species also tend to have more specific habitat requirements that would be difficult for the coarse, county-level data to capture. Some not-so-rare species with more specific habitat requirements also fall into this category. It should be borne in mind that the model was not fine-tuned for individual species preferences as we were interested in macro-scale comparison among several species. Also, the added benefit of the analysis is that it highlights species that require more individual attention and/or finer resolution data. In light of the above considerations, only those species with a recorded minimum IV of 3.0 in each of at least 100 counties (out of a possible 2124, or 4.7% of the counties) are included in the reporting here. A total of 80 species matches this criterion (Table 2).
Predicted current distributions match FIA data reasonably well for most species. For example, for quaking aspen (Populus tremuloides, Fig. 4), the FIA data recorded the species in a total of 338 counties. Total classification accuracy was 71% on the entire data set, and 67% on the 20% validation data set. If only considering error of omission (the RTA model predicted no target species when the FIA sampling recorded it), the accuracies are 81 and 77%, respectively (Table 2). The correlation of IVs (for counties with IV at least 3.0) between predicted and FIA data is 0.81. The species is a generalist, being the most widely dispersed species in the United States, and can grow on a wide variety of soil types (Elias 1980, Perala 1990). The county-level of resolution is, therefore, adequate to capture the major environmental variables driving its distribution, and conditions represented by county averages are adequate to model the species.
For flowering dogwood (Cornus florida) (Fig. 5), the model has classification accuracies of 73% and 65% for the full and 20% data set, and omission accuracies of 82% and 76%, respectively (Table 2). This species too is a generalist, but favors lighter soils with good drainage (McLemore 1990).
For all 80 species, classification accuracies ranged widely, from 19% for black willow (Salix nigra) to 80% for sweetgum (Liquidambar styraciflua); the average was 52% in the full verification data set (Table 2). The omission accuracies were somewhat better, ranging from 20 to 96%, with an average accuracy of 63%. In general, the more specialized the species is with respect to edaphic conditions, the less accuracy in the RTA model predictions. In fact, in nearly all situations where the accuracy was less than about 40%, the species is known to be a habitat specialist, preferring bottomlands, alkaline areas, or disturbed systems. These are not attributes readily captured in the coarse-resolution data set used here. Several poorly classified species, including, Morus rubra, Nysssa aquatica, Platanus occidentalis, Populus grandidentata, Salix nigra, and Taxodium distichum, are all bottomland species. Bottomland habitats would occupy a small minority of most counties, and county-resolution data would not be expected to consistently capture the appropriate information in the RTA process. Lower classification accuracies were also apparent in taxa that were reported at the genus level (for example, Salix sp. and Ulmus sp., Table 2). Because these taxa include a number of different species, with varying habitat requirements, the RTA would be less likely to uncover precise trends. An exception is Carya sp., in which several species of hickory have similar habitat requirements.
Species that showed high classification and omission accuracies (greater than about 70% classification accuracy or 80% omission accuracy), on the other hand, tended to be capable of occupying a large variety of habitats within their range of distribution. For example, Acer rubrum is one of the most common species in eastern North America (Elias 1980, Golet et al. 1993). Liquidambar styraciflua, Nyssa sylvatica, Pinus taeda, Quercus alba, and Q. rubra are also generalist species, occupying a wide variety of sites within their ranges (Elias 1980). County-level environmental data are much more likely to represent conditions for these species than the specialist species.
When the RTA analysis using only climatic variables as inputs was compared to the above-described results, we found the inclusion of edaphic and land-use variables to be vitally important for many species. For example the climate-only model failed for Taxodium distichum, whose distribution is mainly driven by elevation and permeability. For 70 of the 80 species, the correlation between actual and predicted IV was improved with the addition of the extra variables. In addition, when using only climate variables, one could cause the resultant model to be over-driven under changed climate scenarios. These results indicates the importance of assessing edaphic barriers and the like in the prediction of species shifts following climate change.
Prediction of potential future species importance and area
Predicted changes in potential species importance by county
Predicted changes in major species importance for any county can be generated with the RTA outputs (Table 4). For example in Vinton County in southern Ohio, the GISS climate projections show that, among 15 species projected to decline, Acer saccharum and Prunus serotina would decline sharply while Oxydendrum arboreum and Quercus alba would increase in importance. An additional 14 species are projected to change very little (less than one IV unit, on average) under the changed climate.
Predicted changes in potential species boundaries and optima
Estimates of southern limit, northern limit, and zone of ecological optimum, along with the estimated IV at the ecological optimum, were calculated using quartile distributions (Table 5). Though this statistical method of calculating limits is flawed for some species by outlier counties having predicted distributions according to the models, it gives us a metric for quantifying potential migration for most species. Because each number corresponds to a latitudinal strip of 10 km starting from the southern tip of Florida, we can begin to estimate potential shifts in the northern and southern limits. According to this analysis, four species (Betula papyrifera, Pinus resinosa, Populus grandidentata, and Thuja occidentalis) are projected by the GISS GCM model to have their southern optimum move north of the United States border, while five additional species (Abies balsamea, Acer saccharum, Betula alleghaniensis, Populus tremuloides, and Prunus serotina) are projected to disappear according to the GFDL model. Shifts of the southern optima are also apparent for several species which do stay in the United States: Acer pensylvanicum, Betula lenta, Carya glabra, Celtis occidentalis, Cercis canadensis, Cornus florida, Crataegus sp., Diospyros virginiana, Oxydendrum arboreum, Pinus virginiana, Quercus muehlenbergii, Q. palustris, Tilia americana and Tsuga canadensis show a sizeable (>100 km) northward shift in their southern boundaries (Table 5). Of course, because of tree longevity and remnant refugia, it would take centuries for these shifts to be realized (Loehle and LeBlanc 1996).
For a large portion of the species, the northern optimum abuts the Canadian border so an estimate of potential migration is not possible (row number 208 for northern optimum in Table 5). However, for the species where the northern optimum is projected to stay south of the Canadian border, sizeable shifts in northern optima are apparent. For example, the following species are projected to shift their northern limit by at least 100 km to the north: Celtis laevigata, Cercis canadensis, Cornus florida, Diospyros virginiana, Liquidambar styraciflua, Nyssa sylvatica, Oxydendrum arboreum, Pinus echinata, P. elliottii, P. palustris, P. taeda, Quercus alba, Q. coccinea, Q. laurifolia, Q. marilandica, Q. muehlenbergii, Q. nigra, Q. phellos, Q. stellata, and Ulmus alata. Many of these species do not concomitantly shift their southern limits northward. Differing environmental factors are related to the range boundaries on the north vs. south. This geographic shift in variable importance shows the power of the RTA analysis: for general linear or general additive methods, the variables have to operate equally everywhere.
Because calculations of the ecological optimum latitude are not as prone to outliers and boundary effects, they may provide the best indicator of projected species movement (Table 5). A total of 36 species are projected to have their zone of maximum IV migrate at least 100 km north. Of these, seven are estimated to move in excess of 250 km: Cercis canadensis, Prunus serotina, Quercus coccinea, Q. marilandica, Q. stellata, Tsuga canadensis, and Ulmus alata. Interestingly, five taxa are projected to have their ecological optima shift south: Carpinus caroliniana, Fagus grandifolia, Juniperus virginiana, Magnolia virginiana, and Sassafras albidum. In these cases, there seems to be a relative constriction of range to the higher elevations of the southern Appalachians.
Estimated IV for the ecological optima are also presented for current and future projections (Table 5). Some species are projected to significantly increase in importance at their ecological optima, including Celtis occidentalis, Gleditsia triacanthos, Juglans nigra, Magnolia virginiana, Nyssa biflora, Pinus elliottii, Pinus strobus, Quercus stellata, Taxodium distichum, Tilia americana, and Ulmus alata. On the other hand, significant decreases in importance are projected for several species, including Acer rubrum, Liquidambar styraciflua, and Quercus macrocarpa.
RTA was used here to predict potential migrations of trees under a 2xCO2 climate scenario. We emphasize that potential future ranges presented here do not represent forecasts, but rather an indication of the potential impact on species distributions. With our analysis of potential effects, we show that 27-36 species would significantly increase in area and/or weighted IV, while an additional 30-34 species would decrease by at least 10%, following equilibrium after a changed climate. Nearly half of the species assessed (36 out of 80) showed the potential for the ecological optima to shift at least 100 km to the north, including seven that could move over 250 km. Depending on the GCM used, 4-9 species would potentially move out of the United States.
Historic rates of migration (~10-50 km/century) will not likely occur with current fragmented habitat. Even at historic rates, many species would not reach the potential distributions predicted here within the next century without major intervention. In work underway in a related research effort (e.g. Schwartz 1996, Iverson et al. 1997b), we are investigating more realistic migration scenarios based on historic species migration rates and actual landscapes.
Sincere thanks to all the people who provided data for this effort, and to the USDA Forest Service Northern Global Change Program (R. Birdsey, Program Manager) for support. Thanks also to Mark Schwartz, David Mladenoff, Charles Scott, Thomas Jacob, William Baker, and two anonymous reviewers for technical reviews, and to Mary Buchanan for editorial review.
The use of trade, firm, or corporation names in this publication is for the information and convenience of the reader. Such use does not constitute and official endorsement or approval by the U.S. Department of Agriculture or the Forest Service of any product or service to the exclusion of others that may be suitable.
Baker, F. A. 1993. Classification and regression tree analysis for assessing hazard of pine mortality caused by Heterobasidion annosum. Plant Disease 77:136-139.
Bonan, G. B. and L. Sirois. 1992. Air temperature, tree growth, and the northern and southern range limits to Picea mariana. Journal of Vegetation Science 3:495-506.
Booth, T. H. 1990. Mapping regions climatically suitable for particular tree species at the global scale. Forest Ecology and Management 36:47-60.
Borchert, M., F. W. Davis, J. Michaelsen, and L. D. Oyler. 1989. Interactions of factors affecting seedling recruitment of blue oak (Quercus douglasii) in California. Ecology 70:389-404.
Box, E. O., D. W. Crumpacker, and E. D. Hardin. 1993. A climatic model for location of plant species in Florida, U.S.A. Journal of Biogeography 20:629-644.
Breiman, L., J. Freidman, R. Olshen, and C. Stone. 1984. Classification and regression trees. Wadsworth, Belmont, California.
Burton, P. J. and S. G. Cumming. 1995. Potential effects of climatic change on some western Canadian forests, based on phenological enhancements to a patch model of forest succession. Water, Air and Soil Pollution 82:401-414.
Chambers, J.M. and T.J. Hastie. 1993. Statistical Models in S. Chapman & Hall, New York.
Clark, L. A. and D. Pregibon, 1992. Tree-based models. Pages 377-419 in T. J. Hastie. Statistical Models in S. Wadsworth, Pacific Grove, California.
Davis, F. W., J. Michaelsen, R. Dubayah, and J. Dozier, 1990. Optimal terrain stratification for integrating ground data from FIFE. Pages 11-15. in Symposium on FIFE, First ISLSCP Field Experiment. American Meteorological Society, Boston, MA.
Davis, M. B. and C. Zabinski, 1992. Changes in geographical range resulting from greenhouse warming: effects on biodiversity in forests. Pages 297-308 in R. L. Peters and T. E. Lovejoy, editors. Global warming and biological diversity. Yale University Press, New Haven, CT.
Delcourt, H. R. and P. A. Delcourt. 1988. Quaternary landscape ecology; Relevant scales in space and time. Landscape Ecology 2 :23-44.
Dyer, J. M. 1995. Assessment of climatic warming using a model of forest species migration. Ecological Modelling 79:199-219.
Elias, T. S. 1980. The complete trees of North America. Van Nostrand Reinhold Co., New York. 948 pp.
Environmental Systems Research Institute, Inc. (ESRI). 1992. ArcUSA 1:2M, User's Guide and Data Reference. Environmental Systems Research Institute, Inc., Redlands, CA.
Environmental Systems Research Institute, Inc. (ESRI). 1993. Arc/Info GRID command reference. Environmental Systems Research Institute, Inc., Redlands, CA.
Ertsen, A. C. D., J. W. Frens, J. W. Nieuwenhuis, and M. J. Wassen. 1995. An approach to modelling the relationship between plant species and site conditions in terrestrial ecosystems. Landscape and Urban Planning 31:143-151.
Foley, J. A., I. C. Prentice, N. Ramankutty, S. Levis, D. Pollard, S. Sitch, and A. Haxeltine. 1996. An integrated biosphere model of land surface processes, terrestrial carbon balance and vegetation dynamics. Global Biogeochemical Cycles 10:603-628.
Franklin, J. 1995. Predictive vegetation mapping: geographic modelling of biospatial patterns in relation to environmental gradients. Progress in Physical Geography. 19:474-499.
Golet, F. C., A. J. K. Calhoun, W. R. DeRagon, D. J. Lowry, and A. J. Gold. 1993. Ecology of red maple swamps in the glaciated northeast: a community profile. Biological Report 12; US Department of the Interior; Fish and Wildlife Service. Washington, D.C.
Goodchild, M. F. and S. Gopal. 1989. Accuracy of Spatial Databases. Taylor and Francis, Bristol, PA. 290 pp.
Hansen, J., I. Fung, A. Lacis, D. Rind, S. Lebedeff, and R. Ruedy. 1988. Global climate changes as forecast by Goddard Institute for Space Studies three-dimensional model. Journal of Geophysical Research 93:9341-9364.
Hansen, M. H., T. Frieswyk, J. F. Glover, and J. F. Kelly. 1992. The eastwide forest inventory data base: users manual. General Technical Report NC-151. USDA Forest Service, North Central Forest Experiment Station. St. Paul, Minnesota.
Hobbs, Richard J. 1994. Dynamics of vegetation mosaics: Can we predict responses to global change? Ecoscience 1(4):346-356.
Hughes, L., E. M. Cawsey, and M. Westoby. 1996. Climatic range sizes of Eucalyptus species in relation to future climate change. Global Ecology and Biogeography Letters 5:23-29.
Huntley, B., P. Berry, W. Cramer, and A. P. McDonald. 1995. Modelling present and potential future ranges of some European higher plants using climate response surfaces. Journal of Biogeography 22:967-1001.
Iverson, L. R., M. E. Dale, C. T. Scott, and A. Prasad. 1997a. A GIS-derived integrated moisture index to predict forest composition and productivity in Ohio forests. Landscape Ecology 12:xxx-xxx.
Iverson, L. R., A. Prasad, and C. T. Scott, 1996b. Preparation of forest inventory and analysis (FIA) and state soil geographic data base (STATSGO) data for global change research in the eastern United States. in J. Hom, R. Birdsey, and K. O'Brian, eds. Proceedings, 1995 meeting of the Northern Global Change Program. General Technical Report NE-214. USDA Forest Service, Northeastern Forest Experiment Station. Radnor, Pennsylvania.
Iverson, L. R., C. T. Scott, M. Dale, and A. M. G. Prasad. 1996a. Development of an integrated moisture index for predicting species composition. Pages 101-116 in M. Kohl and G. Z. Gertner, eds. Caring for the forest: research in a changing world. Statistics, mathematics, and computers. Swiss Federal Institute for Forest, Snow and Landscape Research, Birmensdorf, Switzerland.
Iverson, L. R., A. Prasad, and M. W. Schwartz. 1997b. Regression tree analysis and a simulation model to project potential future tree importance in the eastern United States. Bulletin of the Ecological Society of America 78:115 (abstract).
Jacobson, G. L. Jr., T. Webb, III., and E. C. Grimm. 1987. Patterns and rates of vegetation change during the deglaciation of eastern North America. Pages 277-288 in W. F. Ruddiman and H. E. Jr. Wright, eds. North America and adjacent oceans during the last deglaciation. The Geological Society of America, Boulder, Colorado.
Jelinski, D.E., and J. Wu. 1996. The modifiable areal unit problem and implications for landscape ecology. Landscape Ecology 11:129-140.
Kattenberg, A. F. Giorgi H. Grassl G. A. Meehl J. F. B. Mitchell R. J. Stouffer T. Tokioka A. J. Weaver T. M. L. Wigley. 1996. Climate models - Projections of Future Climate. Pages 285-357 in J. T. Houghtonet al., editors. Climate Change 1995: The Science of Climate Change. Intergovernmental Panel on Climate Change. Cambridge University Press, Cambridge, UK.
Lees, B. G. and K. Ritman. 1991. Decision-tree and rule-induction approach to integration of remotely sensed and GIS data in mapping vegetation in disturbed or hilly environments. Environmental Management 15 :823-831.
Lenihan, J. M. and R. P. Neilson. 1993. A rule-based vegetation formation model for Canada. Journal of Biogeography 20:615-628.
Loehle, C. and D. LeBlanc. 1996. Model-based assessments of climate change effects on forests: a critical review. Ecological Modelling 90:1-31.
Lynn, H., C.L. Mohler, S.D. DeGloria, and C.E. McCulloch. 1995. Error assessment in decision-tree models applied to vegetation analysis. Landscape Ecology 10:323-335.
Mackey, B. G. and R. A. Sims. 1993. A climatic analysis of selected boreal tree species, and potential responses to global climate change. World Resource Review 5:469-487.
Malanson, G. P. 1993. Comment on modeling ecological response to climatic change. Climatic Change 23:95-105.
McGarigal, K. and B. J. Marks. 1995. FRAGSTATS: spatial pattern analysis program for quantifying landscape structure. General Technical Report PNW-GTR-351. USDA Forest Service. Pacific Northwest Research Station. Portland, Oregon.
Mcguire, A. D., L. A. Joyce, D. W. Kicklighter, J. M. Meillo, G. Esser, and C. J. Vorosmarty. 1993. Productivity response of climax temperate forests to elevated temperature and carbon dioxide: a North American comparison between two global models. Climatic Change 24:287-310.
McLemore, B.F. 1990. Cornus florida L. Flowering dogwood. Pages 278-283 in Burns. R.M. and B. H. Honkala, Coordinators. Silvics of North America. Volume 2, Hardwoods. USDA Forest Service, Agriculture Handbook 654, Washington, DC.
Melillo, J. M., J. Borchers, J. Chaney, H. Fisher, S. Fox, and A. Haxeltine. 1995. Vegetation/ecosystem modeling and analysis project: comparing biogeography and biogeochemistry models in a continental-scale study of terrestrial ecosystem reposnses to climate change and CO2 doubling. Global Biogeochemical Cycles 9:407-437.
Michaelsen, J., D. S. Schimel, M.A. Friedl, F.W. Davis, and R. C. Dubayah. 1994. Regression tree analysis of satellite and terrain data to guide vegetation sampling and surveys. Journal of Vegetation Science 5:673-686.
Moore, D. E., B. G. Lees, and S. M. Davey. 1991. A new method for predicting vegetation distributions using decision tree analysis in a geographic information system. Journal of Environmental Management 15:59-71.
Neilson, R. P. 1995. A model for predicting continental-scale vegetation distribution and water balance. Ecological Applications 5:362-385.
Neilson, R. P. and D. Marks. 1994. A global perspective of regional vegetation and hydrologic sensitivities from climatic change. Journal of Vegetation Science 5:715-730.
Olson, R. J., C. J. Emerson, and M. K. Nungesser. 1980. Geoecology: a county-level environmental data base for the conterminous United States. Publication No. 1537. Oak Ridge National Laboratory Environmental Sciences Division. Oak Ridge, TN.
Perala, D. A. 1990. Populus tremuloides Michx. Quaking Aspen. Pages 555-569 in Burns. R.M. and B. H. Honkala, Coordinators. Silvics of North America. Volume 2, Hardwoods. USDA Forest Service, Agriculture Handbook 654, Washington, DC.
Prentice, I. C., W. Cramer, S. P. Harrison, R. Leemans, R. A. Monserud, and A. M. Solomon. 1992. A global biome model based on plant physiology and dominance, soil properties and climate. Journal of Biogeography 19:117-134.
Rastetter, E. B. 1996. Validating models of ecosystem response to global change. BioScience 46:190-197.
Reed, R. A., R. K. Peet, M. W. Palmer, and P. S. White. 1993. Scale dependence of vegetation-environment correlation: A case study of a North Carolina piedmont woodland. Journal of Vegetation Science 4:329-340.
Schwartz, M. W. 1996. Assessing the ability of plants to respond to climatic change through distribution shifts. in J. Hom, R. Birdsey, and K. O'Brian, eds. Proceedings, 1995 meeting of the Northern Global Change Program. General Technical Report NE-214, USDA Forest Service, Northeastern Forest Experiment Station. Radnor, Pennsylvania.
Soil Conservation Service. 1991. State soil geographic data base (STATSGO) data users guide. Miscellaneous Publication 1492, USDA Soil Conservation Service. Washington, D.C.
Starfield, A. M. and Chapin, F. S. 1996. Model of transient changes in arctic and boreal vegetation in response to climate and land use change. Ecological Applications 6(3):842-864.
Statistical Sciences. 1993. S-PLUS Guide to Statistical and Mathematical Analysis, vers. 3.2. StatSci, Seattle, Washington.
US Geological Survey. 1987. Digital elevation models: U.S. Geological Survey Data User's Guide 5. U.S. Geological Survey. Reston, VA.
USDA Forest Service. 1993. Forest type groups of the United States. Map produced by Z. Zhu, D.L. Evans, and K. Winterberger. Southern Forest Experiment Station, Starksville, MS.
USEPA. 1993. EPA-Corvallis model-derived climate database and 2xCO2 predictions for long-term mean monthly temperature, vapor pressure, wind velocity and potential evapotranspiration from the Regional Water Balance Model and precipitation from the PRISM model, for the conterminous United States. Digital raster data on a 10 x 10 km, 470 x 295 Albers Equal Area grid, in "Image Processing Workbench" format. USEPA Environmental Research Laboratory. Corvallis, OR.
Venables W.N. and B.D. Ripley. 1994. Modern Applied Statistics with S-Plus. Springer-Verlag, New York.
V Verbyla, David L. 1987. Classification trees: a new discrimination tool. Canadian Journal of Forest Research 17:1150-1152.
Walsh, S. J., D. R. Lightfoot, and D. R. Butler. 1987. Recognition and assessment of error in geographic information systems. Photogrammetric Engineering and Remote Sensing 53:1423-1430.
Webb, T., III and P. J. Bartlein. 1992. Global changes during the last 3 million years: climatic controls and biotic responses. Annual Review of Ecology and Systematics. 23:141-173.
Wetherald, R. T. and S. Manabe. 1988. Cloud feedback processes in a general circulation model. Journal of Atmospheric Science 45:1397-1415.
Woodward, F. I. 1992. Tansley Review no. 41. Predicting plant responses to global environmental change. New Phytologist 122:239-251.
Fig. 1. Study area is the United States east of the 100th meridian. Horizontal lines correspond to the row number for the study area, in increments of 10 km. For example, row 50 corresponds to 500 km above the southern limit of the study area. Latitude and longitude are also given in dashed lines.
Fig. 2. Regression tree for quaking aspen (Populus tremuloides), using a nonuniform tree of 12 nodes. If the rule at the top of a branch is true, follow the left branch; if false, follow the right branch. The numbers at the termination points refer to predicted IV for that particular branch of the model, which is the average IV at that node.
Fig. 3. Mapped environmental variables most related to the distribution of three species: a) southern red oak (Quercus falcata var. falcata), b) Virginia pine (Pinus virginiana), and c) American beech (Fagus grandifolia). The indented legend shows the hierarchical structure of the data; see Table 1 for the variables in the first column. The second column is the number of counties that the equation pertains to, and the third column is the average IV for the species for those counties.
Fig. 4. Example model outputs for quaking aspen(Populus tremuloides), including a) actual county IV as calculated from FIA data; b) predicted current IV from the RTA model; c) predicted potential future IV after climate change according to the GISS GCM; d) predicted potential future IV after climate change according to the GFDL GCM; e) difference between predicted current and predicted future IV according to the GISS model; and f) difference between predicted current and predicted future IV according to the GFDL model. This example shows a species with the potential to contract its range and importance under climate change.
Fig. 5. Example model outputs for flowering dogwood (Cornus florida), including a) actual county IV as calculated from FIA data; b) predicted current IV from the RTA model; c) predicted potential future IV after climate change according to the GISS GCM; d) predicted potential future IV after climate change according to the GFDL GCM; e) difference between predicted current and predicted future IV according to the GISS model; and f) difference between predicted current and predicted future IV according to the GFDL model. This example shows a species with the potential to expand its range and importance under climate change.
Fig. 6. Number of species showing decreases or increases of IV (IV) or area for the two GCM scenarios evaluated. The total number of species is less than 80 for the 'Both' categories because those species that fall in different classes between GCM scenarios are not tallied.