Strings
Strings are any sequence of characters, including the special symbols `^' for
beginning of line and `$' for end of line. The following special characters
(`$', `^', `*', `[', `|', `(', `)', `!', and `\') as well as
the following meta characters special to glimpse: `;', `,',
`#', `<', `>', `-', and `.', should
be preceded by `\' if they are to be matched as regular characters. For
example, \^abc\\ corresponds to the string ^abc\, whereas ^abc corresponds to
the string abc at the beginning of a line.
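The escaping rule above can be sketched as a small helper. This is only an illustration: the function name is hypothetical, and the character set is copied from the list in this section rather than taken from glimpse itself.

```python
# Characters this section lists as special: the regular-expression symbols
# plus the meta characters specific to glimpse.
GLIMPSE_SPECIAL = set("$^*[|()!\\" + ";,#<>-.")


def escape_glimpse_literal(text):
    """Prefix each special character with a backslash (hypothetical helper)."""
    return "".join("\\" + ch if ch in GLIMPSE_SPECIAL else ch for ch in text)


# Build query strings that search for these texts literally.
print(escape_glimpse_literal("^abc\\"))
print(escape_glimpse_literal("a.b-c"))
```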
Classes of characters
A list of characters inside [] (in order) corresponds to any character from
the list. For example, [a-ho-z] is any character between a and h or between o
and z. The symbol `^' inside [] complements the list. For example, [^i-n]
denotes any character in the character set except the characters `i' through `n'. The
symbol `^' thus has two meanings, but this is consistent with egrep. The
symbol `.' (don't care) stands for any symbol (except for the newline
symbol).
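As a rough illustration of these classes, the sketch below uses Python's re module as a stand-in for glimpse. That is an assumption made for the example; the helper name is hypothetical, and only the behaviours described above are exercised.

```python
import re

# Python's re module stands in for glimpse here (an assumption).
in_ranges = re.compile(r"[a-ho-z]")    # any character from a to h or from o to z
not_i_to_n = re.compile(r"[^i-n]")     # any character except i through n
any_char = re.compile(r".")            # any character except newline


def class_matches(pattern, ch):
    """Return True if the single character ch belongs to the class."""
    return pattern.fullmatch(ch) is not None


for ch in "ckz":
    print(ch, class_matches(in_ranges, ch), class_matches(not_i_to_n, ch))
```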
Boolean operations
Glimpse supports an `AND' operation denoted by the symbol `;', an `OR'
operation denoted by the symbol `,', and any combination of the two. For example, the
query string `pizza;cheeseburger' will output all lines
containing both patterns. The query string
`{political,computer};science' will match `political
science' or `science of computers'.
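A minimal sketch of how such a query could be evaluated against one line of text. It is not glimpse's implementation: the function names are hypothetical, patterns are treated as plain substrings, only one level of {} grouping is handled, and `,' is assumed to bind more tightly than `;' when no braces are present.

```python
def split_top_level(query, sep):
    """Split query on sep, ignoring separators inside {...} groups."""
    parts, depth, current = [], 0, []
    for ch in query:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def strip_braces(term):
    """Remove one surrounding pair of braces, if present."""
    if term.startswith("{") and term.endswith("}"):
        return term[1:-1]
    return term


def line_matches(query, line):
    """Every `;' group must match; within a group any `,' alternative suffices."""
    for group in split_top_level(query, ";"):
        alternatives = split_top_level(strip_braces(group), ",")
        if not any(alt in line for alt in alternatives):
            return False
    return True


print(line_matches("{political,computer};science", "science of computers"))
```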
Wild cards
The symbol `#' is used to denote a sequence of any number (including 0) of
arbitrary characters. The symbol # is equivalent to .* in egrep. In fact, .*
will work too, because it is a valid regular expression (see below), but
unless this is part of an actual regular expression, # will work
faster. (Currently glimpse is experiencing some problems with #.)
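The stated equivalence between # and .* can be illustrated by translating a wild-card pattern into a regular expression. The sketch assumes Python's re module as a stand-in, uses a hypothetical helper name, and does not handle an escaped `\#'.

```python
import re


def wildcard_to_regex(pattern):
    """Turn each # into .* and treat everything else as literal text."""
    literals = pattern.split("#")
    return ".*".join(re.escape(part) for part in literals)


regex = wildcard_to_regex("abc#xyz")
print(regex)
print(bool(re.search(regex, "abc---xyz")), bool(re.search(regex, "abcxyz")))
```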
Combination of exact and approximate matching
Any pattern inside angle brackets <> must match the text exactly even
if errors are allowed in the rest of the match. For example, <mathemat>ics matches
mathematical with one error (replacing the last s with an a), but
mathe<matics> does not match mathematical no matter how many errors are
allowed. (This option is buggy at the moment.)
Regular expressions
Since the index is word based, a regular expression must match words that
appear in the index for glimpse to find it. Glimpse first strips all
non-alphabetic characters from the regular expression and searches the index
for the remaining words. It then applies the regular expression matching algorithm to
the files found in the index. For example, the query string `abc.*xyz' will
search the index for all files that contain both `abc' and `xyz', and then
search directly for `abc.*xyz' in those files. The syntax of regular
expressions in glimpse is in general the same as that for
agrep. The union operation `|', Kleene closure `*', and parentheses ()
are all supported. Currently `+' is not supported. Regular expressions are
currently limited to approximately 30 characters.
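The two-phase search described above can be sketched as follows. This is a toy model under stated assumptions: the function names and the in-memory "archive" are hypothetical, Python's re module stands in for glimpse's matcher, and every remaining word is required to be present, which mirrors the `abc.*xyz' example only.

```python
import re


def index_terms(regex):
    """Words left after stripping non-alphabetic characters, lower-cased."""
    return [w.lower() for w in re.findall(r"[A-Za-z]+", regex)]


def two_phase_search(regex, files):
    """files maps a file name to its text (a toy stand-in for an archive)."""
    # Toy word index: file name -> set of lower-cased words in that file.
    index = {name: set(re.findall(r"[a-z]+", text.lower()))
             for name, text in files.items()}
    terms = index_terms(regex)
    # Phase 1: keep files whose indexed words include every term.
    candidates = [name for name, words in index.items()
                  if all(t in words for t in terms)]
    # Phase 2: run the full regular expression on each line of the candidates.
    compiled = re.compile(regex)
    return [name for name in candidates
            if any(compiled.search(line) for line in files[name].splitlines())]


files = {
    "a.txt": "abc then xyz",   # both words indexed, and the line matches
    "b.txt": "abc\nxyz",       # both words indexed, but on different lines
    "c.txt": "abcd xyz",       # `abc' is not an indexed word in this file
}
print(two_phase_search("abc.*xyz", files))
```

The last file shows why the regular expression must correspond to whole words in the index: the text would match the expression, but the index lookup never selects the file.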
A summary of the total number of matches and the number of documents currently in the archive is also supplied at the top of each results page.
The results page also contains a working form that lets you make additional queries or change any options of the query you just made (although some changes require you to re-run your query before they take effect). The query field in the search form should contain the query string that generated the results you are viewing.
Results are shown in one of two formats. The standard format is an "Alta Vista"-like format that shows the title of the document, the number of matches in that document (when available; this depends on the results of your query), the date the file was last updated, and the size of the file. Below that line is approximately the first 200 bytes of the file, with HTML, newlines, and some other miscellaneous characters filtered out. Last is the actual URL of the document that matched your search. Here is an example:
| Title | Hits | Date | Size |
| AIX V4.1 Migration Volume 2 | 396 | 7/19/1996 | 149 K |
| Introduction to PCI-Based RS/6000 Servers | 150 | 7/19/1996 | 41 K |
| Managing AIX V4 on PCI-Based RS/6000 Workstations | 104 | 7/19/1996 | 36 K |
You can change your preferred output format at any time, and the change will take effect without requiring you to re-run your query.
When you fill out and submit the Options page, the options are saved in a "cookie" and given to your browser. Your browser saves this cookie, and presents it to eglimpse each time a request is made, or results are displayed. From that point on, each time you use eglimpse, the values in your cookie will override the default values that have been set up by the web administrator for that archive.
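The override behaviour described above can be sketched with Python's standard http.cookies module. The option names, cookie layout, and function name are hypothetical; eglimpse's actual cookie format is not documented here.

```python
from http.cookies import SimpleCookie


def effective_options(defaults, cookie_header):
    """Start from the administrator's defaults, then apply cookie values."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    options = dict(defaults)
    for name, morsel in cookie.items():
        if name in options:          # ignore cookies that are not known options
            options[name] = morsel.value
    return options


# Hypothetical option names and values, for illustration only.
defaults = {"case_sensitive": "No", "misspellings": "0"}
print(effective_options(defaults, "misspellings=2"))
```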
The options that you can set are:
Case Sensitive
Tells glimpse whether to distinguish case while making a search. When this
option is set to "No", case is ignored (e.g., "A" and "a" are considered
equivalent). When this option is set to "No", and Word Match Only
is set to "Yes", searches become much faster.
Word Match Only
Search for the pattern as a word (i.e., surrounded by non-alphanumeric
characters). For example, when the query string is "car", and this option is
set, a search will match "car", but not "characters" and not "car10". This
option does not work with regular expressions. It is recommended to have the
Case Sensitive option set to "No", and this option set to "Yes" by
default.
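The word-match rule above (a pattern surrounded by non-alphanumeric characters) can be approximated with lookaround assertions in Python's re module. This is a sketch with a hypothetical function name, not glimpse's own matching code.

```python
import re


def word_match(word, line):
    """True if word occurs in line with no letter or digit on either side."""
    pattern = r"(?<![A-Za-z0-9])" + re.escape(word) + r"(?![A-Za-z0-9])"
    return re.search(pattern, line) is not None


for text in ("my car is red", "characters", "car10"):
    print(text, word_match("car", text))
```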
Misspellings
This option is an integer from 0 to 8 specifying the maximum number of
errors permitted in finding approximate matches (the default is
zero, meaning exact matching). Generally, each insertion, deletion, or substitution counts as one
error. Since the index stores only lower case characters, errors of
substituting upper case with lower case may be missed. Allowing errors in the
match requires more time and can slow down the match by a factor of 2-4. Be
very careful when specifying more than one error, as the number of matches
tends to grow very quickly.
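To make the error count concrete, the sketch below computes the classic edit distance, in which each insertion, deletion, or substitution costs one. It compares two whole words and uses hypothetical function names; it illustrates the counting rule only and is not the algorithm glimpse uses.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def within_errors(pattern, word, max_errors):
    """True if word matches pattern with at most max_errors errors."""
    return edit_distance(pattern, word) <= max_errors


print(edit_distance("kitten", "sitting"))
```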
File Pattern
This option limits the search to those files whose name (including the whole
path) matches the file pattern given. This option can be used in a variety of
applications to provide limited search even for one large index. If the file
pattern matches a directory, then all files with this directory on their path
will be considered. To limit the search to actual file names, use $ at the
end of the pattern. The file pattern can be a regular expression and even a
Boolean pattern. This option can speed up the search significantly.
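The effect of a file pattern, including the trailing $, can be sketched by filtering a list of paths. Python's re module stands in for glimpse's pattern matching, and the paths and function name are made up for the example.

```python
import re


def filter_paths(paths, file_pattern):
    """Keep paths in which the file pattern matches anywhere."""
    compiled = re.compile(file_pattern)
    return [p for p in paths if compiled.search(p)]


paths = ["docs/aix/intro.html", "docs/aix.html", "src/main.c"]
print(filter_paths(paths, "aix"))           # also matches the directory name
print(filter_paths(paths, r"aix\.html$"))   # anchored to the end of the path
```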
Date Filter
This is an integer value (or empty). If set, your search will only return files
that were created or modified in the last "N" days, where "N" is the value
you have set for this option.
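A minimal sketch of the date filter, assuming each file carries a last-modified timestamp. The function name and data layout are hypothetical, and None models an empty option value.

```python
from datetime import datetime, timedelta


def modified_within(files, days, now):
    """files: list of (name, last_modified); days: int, or None for no filter."""
    if days is None:
        return [name for name, _ in files]
    cutoff = now - timedelta(days=days)
    return [name for name, modified in files if modified >= cutoff]


now = datetime(1996, 7, 20)
files = [("new.html", datetime(1996, 7, 19)), ("old.html", datetime(1996, 1, 1))]
print(modified_within(files, 7, now))
```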
As mentioned in the section on Query Strings above, some characters serve as meta characters for glimpse and need to be preceded by `\' to search for them. The most common examples are the characters `.' (which stands for a wild card), and `*' (the Kleene closure). So, the query string "ab.de" will match abcde, but "ab\.de" will not, and "ab*de" will not match ab*de, but "ab\*de" will. The meta character - is translated automatically to a hyphen unless it appears between [] (in which case it denotes a range of characters).
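The four cases in this paragraph can be checked with Python's re module, which treats `.' and `*' the same way in these patterns. Using it as a stand-in for glimpse is an assumption, and the helper name is hypothetical.

```python
import re


def found(pattern, text):
    """True if the pattern matches somewhere in text."""
    return re.search(pattern, text) is not None


print(found("ab.de", "abcde"))     # `.' matches any character
print(found(r"ab\.de", "abcde"))   # escaped: only a literal dot matches
print(found("ab*de", "ab*de"))     # `*' means zero or more b's
print(found(r"ab\*de", "ab*de"))   # escaped: a literal asterisk
```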
The index of glimpse stores all patterns in lower case. When glimpse searches the index, it first converts all patterns to lower case, finds the appropriate files, and then searches the actual files using the original patterns. So, for example, the query string ABCXYZ will first find all files containing abcxyz in any combination of lower and upper case, and will then search these files directly, so only occurrences with the right case will be found. One limitation of this approach is that misspellings caused by wrong case are hard to discover.
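The lower-cased index lookup followed by a case-exact file search can be modelled as below. This is a toy sketch: the function name and in-memory files are hypothetical, and plain substring tests replace the real index and matcher.

```python
def case_aware_search(pattern, files):
    """Return (candidate files from the index step, files that really match)."""
    # Step 1: the index holds lower case only, so the lookup ignores case.
    candidates = [name for name, text in files.items()
                  if pattern.lower() in text.lower()]
    # Step 2: search the actual text with the original pattern.
    hits = [name for name in candidates if pattern in files[name]]
    return candidates, hits


files = {"a": "has ABCXYZ here", "b": "has abcxyz here", "c": "nothing"}
print(case_aware_search("ABCXYZ", files))
```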
There is no size limit for simple patterns and simple patterns within Boolean expressions. More complicated patterns, such as regular expressions, are currently limited to approximately 30 characters. Lines are limited to 1024 characters. Records are limited to 48K, and may be truncated if they are larger than that. Words longer than 64 characters are not indexed.