Help for Eglimpse

The archive that you are searching has been indexed by software called Glimpse. Glimpse is a very powerful indexing and query system that allows you to search through many files very quickly. The interface that you are using is called Eglimpse. Eglimpse is a CGI program that provides a web interface to the glimpse program. It makes a query on your behalf, collects the results, and displays them via the web. Eglimpse also provides a way for you to set search options, and it remembers those options each time you make a query.


Query String

glimpse supports a large variety of query strings, including simple strings, strings with classes of characters, sets of strings, wild cards, and regular expressions.

Strings
Strings are any sequence of characters, including the special symbols `^' for beginning of line and `$' for end of line. The following special characters (`$', `^', `*', `[', `|', `(', `)', `!', and `\'), as well as the following meta characters special to glimpse: `;', `,', `#', `<', `>', `-', and `.', should be preceded by `\' if they are to be matched as regular characters. For example, \^abc\\ corresponds to the string ^abc\, whereas ^abc corresponds to the string abc at the beginning of a line.
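If you build query strings in a script, you may want to escape literal text automatically. The following Python sketch prefixes every character from the two lists above with `\'. The helper name escape_glimpse is hypothetical; it is not part of glimpse or eglimpse.

```python
# Hypothetical helper (not part of glimpse/eglimpse): escape the special
# characters and glimpse meta characters listed in the help text so that
# they are matched literally.
GLIMPSE_SPECIAL = set("$^*[|()!\\") | set(";,#<>-.")

def escape_glimpse(literal: str) -> str:
    """Prefix each special or meta character in `literal` with a backslash."""
    return "".join("\\" + ch if ch in GLIMPSE_SPECIAL else ch for ch in literal)

print(escape_glimpse("^abc\\"))   # prints \^abc\\
print(escape_glimpse("3.14;pi"))  # prints 3\.14\;pi
```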

Classes of characters
A list of characters inside [] (in order) corresponds to any character from the list. For example, [a-ho-z] is any character between a and h or between o and z. The symbol `^' inside [] complements the list. For example, [^i-n] denotes any character in the character set except the characters `i' through `n'. The symbol `^' thus has two meanings, but this is consistent with egrep. The symbol `.' (don't care) stands for any symbol except the newline symbol.
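Python's re module gives these simple constructs the same meaning as egrep, so it can be used to try out character classes before running a glimpse query. This is only an illustration of the semantics described above, not glimpse itself.

```python
import re

# Same meaning as the egrep-style classes described above (illustration only).
in_ranges = re.compile(r"[a-ho-z]")  # any character from a-h or o-z
not_i_to_n = re.compile(r"[^i-n]")   # any character except i through n
any_char = re.compile(r"a.c")        # '.' is "don't care"

print(bool(in_ranges.fullmatch("c")))    # True
print(bool(in_ranges.fullmatch("k")))    # False
print(bool(not_i_to_n.fullmatch("k")))   # False
print(bool(not_i_to_n.fullmatch("z")))   # True
print(bool(any_char.fullmatch("abc")))   # True
print(bool(any_char.fullmatch("a\nc")))  # False: '.' does not match a newline
```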

Boolean operations
Glimpse supports an `AND' operation denoted by the symbol `;', an `OR' operation denoted by the symbol `,', and any combination of the two. For example, the query string `pizza;cheeseburger' will output all lines containing both patterns. The query string `{political,computer};science' will match `political science' or `science of computers'.
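To make the AND/OR semantics concrete, here is a small Python sketch that evaluates a simplified query against a single line. It assumes a reduced grammar (terms joined by `;', each term either a plain string or a `{a,b,...}' group of `,'-separated alternatives) and plain substring matching; glimpse's real parser handles more than this, and the function name is hypothetical.

```python
def matches(query: str, line: str) -> bool:
    """Simplified, hypothetical evaluator for glimpse-style Boolean queries.

    ';' joins terms that must ALL be present (AND); within a term, ','
    separates alternatives of which ANY may be present (OR). An optional
    pair of braces around a term only groups its alternatives.
    """
    for term in query.split(";"):
        if term.startswith("{") and term.endswith("}"):
            term = term[1:-1]
        if not any(alt in line for alt in term.split(",")):
            return False
    return True

print(matches("pizza;cheeseburger", "a pizza and a cheeseburger"))     # True
print(matches("pizza;cheeseburger", "just a pizza"))                   # False
print(matches("{political,computer};science", "political science"))    # True
print(matches("{political,computer};science", "science of computers")) # True
```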

Wild cards
The symbol `#' is used to denote a sequence of any number (including 0) of arbitrary characters. The symbol # is equivalent to .* in egrep. In fact, .* will work too, because it is a valid regular expression (see below), but unless this is part of an actual regular expression, # will work faster. (Currently glimpse is experiencing some problems with #.)
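Because `#' is equivalent to `.*', a query built only from literal text and `#' can be translated mechanically into a regular expression. The Python sketch below does this with the hypothetical helper hash_wildcard_to_regex, treating every other character literally (a simplification; it does not handle the other query operators).

```python
import re

def hash_wildcard_to_regex(pattern: str) -> str:
    """Replace each '#' with '.*'; escape everything else as literal text."""
    return ".*".join(re.escape(part) for part in pattern.split("#"))

regex = hash_wildcard_to_regex("abc#xyz")
print(regex)                                   # abc.*xyz
print(bool(re.search(regex, "abc123xyz")))     # True
print(bool(re.search(regex, "abcxyz")))        # True: '#' may match nothing
print(bool(re.search(regex, "xyz then abc")))  # False
```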

Combination of exact and approximate matching
Any pattern inside angle brackets <> must match the text exactly, even when the rest of the pattern is matched with errors. For example, <mathemat>ics matches mathematical with one error (replacing the last s with an a), but mathe<matics> does not match mathematical no matter how many errors are allowed. (This option is buggy at the moment.)
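The intended semantics can be sketched in Python with a straightforward dynamic-programming edit distance. This is not the algorithm glimpse uses; it assumes exactly one <...> segment, counts insertions, deletions and substitutions as one error each, and requires the bracketed part to appear literally with the unbracketed parts matched approximately on either side of it. The function names are hypothetical.

```python
def min_prefix_distance(pattern: str, text: str) -> int:
    """Smallest edit distance between `pattern` and any prefix of `text`."""
    prev = list(range(len(text) + 1))  # empty pattern vs. text[:j] costs j
    for i in range(1, len(pattern) + 1):
        cur = [i] + [0] * len(text)
        for j in range(1, len(text) + 1):
            cost = 0 if pattern[i - 1] == text[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # pattern char left unmatched
                         cur[j - 1] + 1,      # extra text char
                         prev[j - 1] + cost)  # match or substitution
        prev = cur
    return min(prev)

def constrained_match(query: str, text: str, max_errors: int) -> bool:
    """Approximate match where the single <...> segment must match exactly."""
    start = query.index("<")
    end = query.index(">", start)
    before, exact, after = query[:start], query[start + 1:end], query[end + 1:]
    pos = text.find(exact)
    while pos != -1:
        left, right = text[:pos], text[pos + len(exact):]
        # `before` must end right where the exact part starts (compare reversed),
        # and `after` must start right where the exact part ends.
        errors = (min_prefix_distance(before[::-1], left[::-1])
                  + min_prefix_distance(after, right))
        if errors <= max_errors:
            return True
        pos = text.find(exact, pos + 1)
    return False

print(constrained_match("<mathemat>ics", "mathematical", 1))  # True
print(constrained_match("mathe<matics>", "mathematical", 3))  # False
```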

Regular expressions
Since the index is word based, a regular expression must match words that appear in the index for glimpse to find it. Glimpse first strips all non-alphabetic characters from the regular expression and searches the index for all remaining words. It then applies the regular expression matching algorithm to the files found in the index. For example, the query string `abc.*xyz' will search the index for all files that contain both `abc' and `xyz', and then search directly for `abc.*xyz' in those files. The syntax of regular expressions in glimpse is in general the same as that for agrep. The union operation `|', Kleene closure `*', and parentheses () are all supported. Currently `+' is not supported. Regular expressions are currently limited to approximately 30 characters.
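The two-phase strategy just described (index lookup on the alphabetic words, then a direct regular-expression search of the candidate files) can be sketched in Python as follows. The in-memory index, the file contents and the function name are all hypothetical stand-ins. The sketch assumes an extracted word may occur inside a longer indexed word, and, like the description above, it simply requires every extracted word, so it does not treat `|', character classes, or escapes such as \d specially.

```python
import re

def indexed_regex_search(regex: str, files: dict[str, str]) -> list[str]:
    """Filter files through a word index, then apply the regex to the survivors."""
    # Stand-in for glimpse's prebuilt index: lower-cased word -> files containing it.
    index: dict[str, set[str]] = {}
    for name, text in files.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index.setdefault(word, set()).add(name)

    # Phase 1: strip non-alphabetic characters from the regex and keep only
    # files that contain every remaining word.
    candidates = set(files)
    for word in re.findall(r"[a-z]+", regex.lower()):
        candidates &= {n for key, names in index.items() if word in key for n in names}

    # Phase 2: run the real regular expression only on the candidate files.
    return sorted(name for name in candidates if re.search(regex, files[name]))

files = {
    "a.txt": "abc then xyz",
    "b.txt": "xyz then abc",  # passes the index filter, fails the regex
    "c.txt": "abc only",      # filtered out by the index
}
print(indexed_regex_search("abc.*xyz", files))  # ['a.txt']
```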


Results

When a query is made, the results are saved on the web server and returned to you in pages of 20 matches each, up to a maximum of 200 matches in total. You can page through the results using the Page: numbers in the upper right-hand corner of the results page.
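The paging arithmetic works out as follows: with 20 matches per page and a cap of 200 matches, a query never produces more than 10 pages. This Python sketch assumes the cap keeps the first 200 matches; the constant and function names are hypothetical.

```python
import math

PAGE_SIZE = 20     # matches per results page
MAX_MATCHES = 200  # matches kept in total

def page_count(total_found: int) -> int:
    """Number of result pages for a query that found `total_found` matches."""
    return math.ceil(min(total_found, MAX_MATCHES) / PAGE_SIZE)

def page_of(results: list, page: int) -> list:
    """Matches shown on the 1-based `page`, after capping at MAX_MATCHES."""
    start = (page - 1) * PAGE_SIZE
    return results[:MAX_MATCHES][start:start + PAGE_SIZE]

found = list(range(1, 351))    # pretend a query found 350 matches
print(page_count(len(found)))  # 10
print(page_of(found, 3)[0])    # 41
print(page_of(found, 11))      # []
```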

A summary of the total number of matches and the number of documents currently in the archive is also supplied at the top of each results page.

The results page also contains a working search form that allows you to make additional queries or change any options for the query that you just made (although some changes require you to re-run your query before they take effect). The query field in this form should contain the query string that produced the results you are viewing.

Results can be shown in two formats. The standard format is an "Alta Vista"-like format that shows the title of the document, the number of matches in that document (when available; depending on your query, this information is not always provided), the date the file was last updated, and the size of the file. Below that line is approximately the first 200 bytes of the file, with HTML, newlines, and some other miscellaneous characters filtered out. Last is the actual URL of the document that matched your search. Here is an example:

AIX V4.1 Migration Volume 2   (396 matches - 7/19/1996 - 149 K)
AIX V4.1 Migration Volume 2 Migrating Multiple Systems Issues installation? media? Choosing the Installation Method New and Complete Overwrite Install destroyed and can be imported later. **** Attention! ****
http://strobe.weeg.uiowa.edu/aix-redbooks/htmlbooks/sg244653.00/migvlan.html
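A summary line like the one in this example can be produced by stripping tags, decoding HTML entities, collapsing newlines and whitespace, and truncating. The Python sketch below is only an approximation of that idea: it assumes tags are removed with a simple regular expression and counts characters rather than bytes, and the function name is hypothetical.

```python
import html
import re

def summarize(document: str, limit: int = 200) -> str:
    """Rough text summary: tags removed, entities decoded, whitespace collapsed."""
    text = re.sub(r"<[^>]*>", " ", document)  # drop HTML tags
    text = html.unescape(text)                # decode entities such as &amp;
    text = re.sub(r"\s+", " ", text).strip()  # newlines/runs of spaces -> one space
    return text[:limit]

page = "<html><title>AIX</title>\n<p>Migration &amp; install</p></html>"
print(summarize(page))  # AIX Migration & install
```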

The brief format displays results in a table, with one matching document per line. Each line contains the title of the document (which you can select to go directly to that document), the number of matches in that document (if that information is available), the date the file was last updated, and the size of the file. Here is an example:

Title                                              Hits   Date        Size
AIX V4.1 Migration Volume 2                        396    7/19/1996   149 K
Introduction to PCI-Based RS/6000 Servers          150    7/19/1996   41 K
Managing AIX V4 on PCI-Based RS/6000 Workstations  104    7/19/1996   36 K

You can change your preferred output format at any time, and the change will take effect without requiring you to re-run your query.


Search Options

There are a number of options that you can set by pressing the "Options" button and filling out the form on the Options page that is displayed. One option (Output Format) affects how results are displayed; the rest affect how the search is actually performed, so if you change their values, you will need to re-run your search for the new options to take effect.

When you fill out and submit the Options page, the options are saved in a "cookie" and given to your browser. Your browser saves this cookie, and presents it to eglimpse each time a request is made, or results are displayed. From that point on, each time you use eglimpse, the values in your cookie will override the default values that have been set up by the web administrator for that archive.
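The override rule (cookie values win over the administrator's defaults) amounts to a dictionary merge. The Python sketch below uses the standard http.cookies module to parse a Cookie header. It assumes one cookie per option, and the option names and default values are hypothetical, not eglimpse's actual cookie layout.

```python
from http.cookies import SimpleCookie

# Hypothetical defaults standing in for those set by the web administrator.
DEFAULTS = {"format": "standard", "errors": "0"}

def effective_options(cookie_header: str) -> dict:
    """Start from the defaults and let known options in the cookie override them."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    options = dict(DEFAULTS)
    for name, morsel in cookie.items():
        if name in options:  # ignore cookies that are not search options
            options[name] = morsel.value
    return options

print(effective_options("format=brief; errors=1"))
# {'format': 'brief', 'errors': '1'}
```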

The options that you can set are:


Limitations

Glimpse's index is word based, so a pattern that contains more than one word cannot be found in the index directly. Glimpse overcomes this weakness by splitting any multi-word pattern into its set of words and looking for all of them in the index. For example, the query string `linear programming' will first consult the index to find all files containing both linear and programming, and then search those files for the combined pattern. This is usually an effective solution, but it can be slow when both words are very common and their combination is not.

As mentioned in the Query String section above, some characters serve as meta characters for glimpse and must be preceded by `\' to be searched for literally. The most common examples are `.' (which matches any single character) and `*' (the Kleene closure). So the query string "ab.de" will match abcde, but "ab\.de" will not; and "ab*de" will not match ab*de, but "ab\*de" will. The meta character `-' is translated automatically to a hyphen unless it appears between [] (in which case it denotes a range of characters).
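Python's re module treats `.', `*' and `\' the same way here, so it can show the effect of escaping on the four example queries. This illustrates regular-expression semantics only; it is not glimpse itself.

```python
import re

print(bool(re.search(r"ab.de", "abcde")))   # True: '.' matches any character
print(bool(re.search(r"ab\.de", "abcde")))  # False: '\.' matches only a period
print(bool(re.search(r"ab*de", "ab*de")))   # False: 'b*' means zero or more b's
print(bool(re.search(r"ab\*de", "ab*de")))  # True: '\*' matches a literal '*'
```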

Glimpse's index stores all words in lower case. When glimpse searches the index, it first converts all patterns to lower case, finds the appropriate files, and then searches the actual files using the original patterns. So, for example, the query string ABCXYZ will first find all files containing abcxyz in any combination of lower and upper case, and then search these files directly, so only the right case will be found. One problem with this approach is that misspellings caused by wrong case (capitalization) are hard to discover.
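The behaviour described in this section (a lower-cased word index consulted for every word of a multi-word pattern, followed by a direct, case-sensitive search for the whole pattern) can be sketched in Python as below. The in-memory index and function names are hypothetical, and the sketch assumes a query word may occur inside a longer indexed word.

```python
import re

def build_index(files: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-cased word to the names of the files that contain it."""
    index: dict[str, set[str]] = {}
    for name, text in files.items():
        for word in re.findall(r"\w+", text.lower()):
            index.setdefault(word, set()).add(name)
    return index

def phrase_search(phrase: str, files: dict[str, str]) -> list[str]:
    index = build_index(files)
    # Step 1: look up every word of the phrase, lower-cased, in the index.
    candidates = set(files)
    for word in phrase.lower().split():
        candidates &= {n for key, names in index.items() if word in key for n in names}
    # Step 2: search the candidate files for the whole phrase, in its original case.
    return sorted(name for name in candidates if phrase in files[name])

files = {
    "a.txt": "an introduction to linear programming",
    "b.txt": "linear algebra; dynamic programming",  # both words, not the phrase
    "c.txt": "Linear Programming in practice",
}
print(phrase_search("linear programming", files))  # ['a.txt']
print(phrase_search("Linear Programming", files))  # ['c.txt']
```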

There is no size limit for simple patterns, including simple patterns within Boolean expressions. More complicated patterns, such as regular expressions, are currently limited to approximately 30 characters. Lines are limited to 1024 characters. Records are limited to 48K and may be truncated if they are larger than that. Words longer than 64 characters are not indexed.
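The word-length limit means an over-long word never reaches the index, so presumably an index lookup for it finds nothing. The Python sketch below shows that filtering step; the tokenization rule (runs of lower-case letters and digits) and the function name are assumptions, not glimpse's actual indexer.

```python
import re

MAX_INDEXED_WORD = 64  # words longer than this are not indexed

def indexed_words(text: str) -> set[str]:
    """Lower-cased words of `text` that fit within the word-length limit."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower())
            if len(w) <= MAX_INDEXED_WORD}

print(sorted(indexed_words("Short words " + "x" * 65)))  # ['short', 'words']
```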


Most of the documentation on this page is "taken" directly from the glimpse man pages.