Sharing Forest Inventory Data with Java and XML

One way to enable interchange between different storage systems is to adopt a single industry-wide interchange format that serves as the output format for all exporting systems and the input format for all importing systems. With a single interchange format, different devices can display the same document, as illustrated in Figure 1. Many industries have used this approach to perform data interchange: the industry forms a standards body that defines the tag set and grammar of the markup language. This document describes a rough draft of such a markup language for forest inventory data.

Before we discuss the use of XML in document exchange, let's discuss the use of HTML in data transport and storage.

HTML

HTML is used to store and transmit most documents on the Web, although it works best for hypertext and multimedia. Its use for data and documents is limited in that it cannot be used to:

HTML's usefulness is also limited by its small, fixed set of tags, which cannot be changed.

XML

XML was built to provide a set of specifications to enable delivery of self-describing data structures of arbitrary depth and complexity. XML differs from HTML in that:

Java and XML

There has been considerable interest in the use of Java and XML because of the promise of "Portable Code" (Java) and "Portable Data" (XML). The combination of Java and XML can solve the problem of displaying forestry data in different ways for different users. Both XML and Java are Internet-friendly. XML was designed to be an optimized, flexible, readable format that is straightforward to use over the Internet; Java has been network-aware from the beginning in its support of sockets, HTTP, HTML, and servers. Both support Unicode and therefore contribute to internationalized applications. Much as Java gives programmers the ability to represent complicated data structures and object-oriented models (sometimes in a tree or table view), XML is well suited to representing complex, hierarchical data models. XML is object-oriented in the sense that it can describe objects of the real world, or of any abstract problem domain, by modeling their properties as they are, instead of enforcing a normalized decomposition into various tables linked by relations.

Unique Namespace

Namespaces, part of the XML family of specifications, were created to guarantee uniqueness among XML element names. When two XML documents from different sources are merged, a collision can occur because the same term can be used to name different items. Using a unique, two-part name (a namespace plus a local name) avoids such collisions. The namespace specification requires that a URI (Uniform Resource Identifier) be associated with each namespace. A URL is recommended for use as the URI, and the URL that we will define is http://www.fs.fed.us/ne/morgantown/4557/treesxml

An XML document must be designed so that namespace collisions are avoided.
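As a rough illustration of how a namespace keeps identically named elements apart, the following Java sketch parses a small document in which two vocabularies both define an element called Name. Only the trees namespace URI comes from the text above; the second namespace URI, the element names, and the sample values are assumptions made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NamespaceExample {
    // Namespace URI proposed in the text.
    static final String TREES_NS = "http://www.fs.fed.us/ne/morgantown/4557/treesxml";
    // Hypothetical second vocabulary that also uses the local name "Name".
    static final String OWNER_NS = "http://example.org/landowner";

    // Illustrative document (assumed structure): two different <Name> elements.
    static final String XML =
        "<?xml version=\"1.0\"?>"
        + "<Stand xmlns:t=\"" + TREES_NS + "\" xmlns:o=\"" + OWNER_NS + "\">"
        + "<t:Name>Red Maple</t:Name>"
        + "<o:Name>Jane Smith</o:Name>"
        + "</Stand>";

    // Returns the text of the first <Name> element in the given namespace,
    // or null if the document contains no such element.
    static String firstName(String xml, String namespaceUri) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP parsers ignore namespaces by default
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList names = doc.getElementsByTagNameNS(namespaceUri, "Name");
            return names.getLength() == 0 ? null : names.item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstName(XML, TREES_NS)); // prints Red Maple
        System.out.println(firstName(XML, OWNER_NS)); // prints Jane Smith
    }
}
```

Because each lookup is qualified by a namespace URI, the two Name elements never collide, even though their local names are identical.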

Document Type Definition

The Document Type Definition (DTD) defines the structure of the XML data and is the equivalent of a class template.

The DTD file specifies rules for how the XML document's elements, attributes, and other data are defined and logically related. A sample DTD for forest inventory data is shown in Figure 2.
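To make the role of a DTD more concrete, here is a minimal Java sketch that checks a document against a DTD using the validating parser that ships with the JDK. The DTD below is a hypothetical stand-in embedded in the document itself; it is not the DTD of Figure 2, and the element names CommonName and StemCount are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class DtdValidationExample {
    // Hypothetical internal DTD: every Tree needs a CommonName, then a StemCount.
    static final String DTD =
        "<!DOCTYPE TreeSpecies [\n"
        + "  <!ELEMENT TreeSpecies (Tree+)>\n"
        + "  <!ELEMENT Tree (CommonName, StemCount)>\n"
        + "  <!ELEMENT CommonName (#PCDATA)>\n"
        + "  <!ELEMENT StemCount (#PCDATA)>\n"
        + "]>\n";

    static final String VALID = "<?xml version=\"1.0\"?>\n" + DTD
        + "<TreeSpecies><Tree><CommonName>Red Maple</CommonName>"
        + "<StemCount>120</StemCount></Tree></TreeSpecies>";

    // Same document with the required StemCount element left out.
    static final String INVALID = "<?xml version=\"1.0\"?>\n" + DTD
        + "<TreeSpecies><Tree><CommonName>Red Maple</CommonName></Tree></TreeSpecies>";

    // Parses the document with DTD validation switched on and returns the
    // validity errors reported by the parser (an empty list means valid).
    static List<String> validate(String xml) {
        List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // validate against the DOCTYPE
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { /* warnings are ignored */ }
                @Override public void error(SAXParseException e) { problems.add(e.getMessage()); }
                @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Malformed XML (a fatal error) ends up here.
            problems.add(String.valueOf(e.getMessage()));
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println("valid document problems:   " + validate(VALID));
        System.out.println("invalid document problems: " + validate(INVALID));
    }
}
```

The exact wording of the error messages depends on the parser implementation, so the sketch only distinguishes between an empty and a non-empty list of problems.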

XML for Tree Species Data

Tree Species data can be formatted in XML. The Tree Species data from the Stand-Damage Model was used as an example of how data can be formatted in this manner. In Figure 3 the Stand-Damage Model input file for Red Maple Oak was transformed into an XML format. This file begins with the line <?xml version ="1.0"?>, which is the standard beginning for all XML files. The next line is <TreeSpecies>: this is the Root Element of this XML file. All XML files must open and close with a Root Element; note that this file ends with </TreeSpecies>. The next line is the Element <Tree>, which is the beginning of a sub-node. All of the data for the species Red Maple is contained between the <Tree> and </Tree> tags.

One thing you might notice immediately is that, unlike the flat text file, the XML data is self-describing; that is, you can tell from the element names what each value refers to.
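The self-describing nature of such a file can be sketched in Java by walking the children of a Tree element and printing each element name next to its value. The TreeSpecies and Tree elements follow the description above; the child elements CommonName and MaxAge and their values are placeholders invented for this example, since the actual contents of Figure 3 are not reproduced here.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TreeSpeciesReader {
    // Illustrative document; child element names and values are assumptions.
    static final String XML =
        "<?xml version=\"1.0\"?>"
        + "<TreeSpecies>"
        + "<Tree>"
        + "<CommonName>Red Maple</CommonName>"
        + "<MaxAge>150</MaxAge>"
        + "</Tree>"
        + "</TreeSpecies>";

    // Reads the first <Tree> element and returns its child elements as
    // name/value pairs, in document order.
    static Map<String, String> readFirstTree(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Element tree = (Element) doc.getElementsByTagName("Tree").item(0);
            Map<String, String> fields = new LinkedHashMap<>();
            NodeList children = tree.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Skip whitespace text nodes; keep only element children.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    fields.put(child.getNodeName(), child.getTextContent().trim());
                }
            }
            return fields;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse tree species XML", e);
        }
    }

    public static void main(String[] args) {
        // Each value is printed with the element name that describes it.
        readFirstTree(XML).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

The reader needs no prior knowledge of the column layout: the element names travel with the data, which is what a fixed-position flat file cannot offer.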

Importing XML Stem Count (Trees Per Acre)

The Java Stand-Damage Model allows the user to import Stem Count information. This option is available as the Import Stem Count item in the Trees pull-down menu of the Main Menu.

Creating a Schema

An XML Schema describes what one or more XML documents should look like: it defines the Elements the documents contain and the order in which they appear. Unlike a DTD (Document Type Definition), an XML Schema is itself written in XML syntax and supports strong datatypes.
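The difference that strong datatypes make can be sketched with the javax.xml.validation API included in the JDK. The schema below is a hypothetical example written for this illustration, not a schema published for the Stand-Damage Model; the element names CommonName and StemCount are assumptions. It requires StemCount to be a non-negative integer, a constraint a DTD cannot express.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SchemaValidationExample {
    // Hypothetical schema: element order is fixed and StemCount is typed.
    static final String XSD =
        "<?xml version=\"1.0\"?>"
        + "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"TreeSpecies\">"
        + "<xs:complexType><xs:sequence>"
        + "<xs:element name=\"Tree\" maxOccurs=\"unbounded\">"
        + "<xs:complexType><xs:sequence>"
        + "<xs:element name=\"CommonName\" type=\"xs:string\"/>"
        + "<xs:element name=\"StemCount\" type=\"xs:nonNegativeInteger\"/>"
        + "</xs:sequence></xs:complexType>"
        + "</xs:element>"
        + "</xs:sequence></xs:complexType>"
        + "</xs:element>"
        + "</xs:schema>";

    static final String GOOD =
        "<TreeSpecies><Tree><CommonName>Red Maple</CommonName>"
        + "<StemCount>120</StemCount></Tree></TreeSpecies>";

    // StemCount is not a number, so the datatype check should reject it.
    static final String BAD_TYPE =
        "<TreeSpecies><Tree><CommonName>Red Maple</CommonName>"
        + "<StemCount>many</StemCount></Tree></TreeSpecies>";

    // Returns true if the document conforms to the schema above.
    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("The schema itself could not be compiled", e);
        }
        try {
            // With no error handler set, the validator throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(GOOD));     // prints true
        System.out.println(isValid(BAD_TYPE)); // prints false
    }
}
```

Under a DTD that declares StemCount as #PCDATA, the second document would be accepted, because a DTD treats the element content as untyped character data.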

Sources:

www.wdvl.com/Authoring/Languages/XML/Java/perfect_pair.html
www-106.ibm.com/developerworks/education/tutorial-prog/overview.html