+ <sect2 id="querymodel-pqf-apt-mapping">
+ <title>Mapping from PQF atomic APT queries to Zebra internal
+ register indexes</title>
+ <para>
+ The rules for mapping PQF APT queries are rather tricky to grasp at
+ first. We first describe how Zebra decides which internal register
+ or string index to use, based on the use attribute or access point
+ given in the query. We then describe how the structure type of that
+ register is determined.
+ </para>
+
+ <sect3 id="querymodel-pqf-apt-mapping-accesspoint">
+ <title>Mapping of PQF APT access points</title>
+ <para>
+ Zebra understands four fundamentally different types of access
+ points, of which only the
+ <emphasis>numeric use attribute</emphasis> type access points
+ are defined by the <ulink url="&url.z39.50;">Z39.50</ulink>
+ standard.
+ All other access point types are Zebra specific, and non-portable.
+ </para>
+
+ <table id="querymodel-zebra-mapping-accesspoint-types"
+ frame="all" rowsep="1" colsep="1" align="center">
+
+ <caption>Access point name mapping</caption>
+ <thead>
+ <tr>
+ <td>Access Point</td>
+ <td>Type</td>
+ <td>Grammar</td>
+ <td>Notes</td>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>Use attribute</td>
+ <td>numeric</td>
+ <td>[1-9][0-9]*</td>
+ <td>directly mapped to string index name</td>
+ </tr>
+ <tr>
+ <td>String index name</td>
+ <td>string</td>
+ <td>[a-zA-Z](\-?[a-zA-Z0-9])*</td>
+ <td>normalized name is used as internal string index name</td>
+ </tr>
+ <tr>
+ <td>Zebra internal index name</td>
+ <td>zebra</td>
+ <td>_[a-zA-Z](_?[a-zA-Z0-9])*</td>
+ <td>hardwired internal string index name</td>
+ </tr>
+ <tr>
+ <td>XPATH special index</td>
+ <td>XPath</td>
+ <td>/.*</td>
+ <td>special XPath search for GRS-indexed records</td>
+ </tr>
+ </tbody>
+ </table>
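+ <para>
+ The grammars in the table above can be read as a small classifier.
+ The following Python sketch is illustrative only and not part of
+ Zebra: the function <literal>access_point_type</literal> and its
+ type labels are hypothetical, the numeric grammar is written as
+ <literal>[1-9][0-9]*</literal> so that values such as 1010 are
+ accepted, and <literal>_ALLRECORDS</literal> is used purely as
+ sample input. The checks are made in the order of the table rows.
+ </para>
+ <programlisting><![CDATA[
```python
import re

# Grammars from the access point table, checked in table order.
# Each one must match the whole access point (re.fullmatch).
ACCESS_POINT_GRAMMARS = [
    ("numeric", re.compile(r"[1-9][0-9]*")),
    ("string", re.compile(r"[a-zA-Z](-?[a-zA-Z0-9])*")),
    ("zebra", re.compile(r"_[a-zA-Z](_?[a-zA-Z0-9])*")),
    ("xpath", re.compile(r"/.*")),
]


def access_point_type(name: str) -> str:
    """Classify the value of '@attr 1=...' by the grammars in the table."""
    for kind, grammar in ACCESS_POINT_GRAMMARS:
        if grammar.fullmatch(name):
            return kind
    raise ValueError("not a valid access point: %r" % name)


# Sample inputs only; the type labels are this sketch's own.
for ap in ["1010", "Body-of-text", "_ALLRECORDS", "/record/title"]:
    print(ap, "->", access_point_type(ap))
```
+ ]]></programlisting>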
+
+ <para>
+ <literal>Attribute set names</literal> and
+ <literal>string index names</literal> are normalized
+ according to the following rules: all <emphasis>single</emphasis>
+ hyphens <literal>'-'</literal> are stripped, and all upper case
+ letters are folded to lower case.
+ </para>
+
+ <para>
+ <emphasis>Numeric use attributes</emphasis> are mapped
+ to the Zebra internal
+ string index according to the attribute set definition in use.
+ The default attribute set is <literal>Bib-1</literal>, and may be
+ omitted in the PQF query.
+ </para>
+
+ <para>
+ Given the name normalization and the numeric use attribute
+ mapping, the following PQF queries are equivalent (assuming the
+ default configuration has not been altered):
+ <screen>
+ Z> find @attr 1=Body-of-text serenade
+ Z> find @attr 1=bodyoftext serenade
+ Z> find @attr 1=BodyOfText serenade
+ Z> find @attr 1=bO-d-Y-of-tE-x-t serenade
+ Z> find @attr 1=1010 serenade
+ Z> find @attrset Bib-1 @attr 1=1010 serenade
+ Z> find @attrset bib1 @attr 1=1010 serenade
+ Z> find @attrset Bib1 @attr 1=1010 serenade
+ Z> find @attrset b-I-b-1 @attr 1=1010 serenade
+ </screen>
+ </para>
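+ <para>
+ To make this equivalence concrete, the Python sketch below reduces
+ an access point and an attribute set name to a single internal
+ string index name. It is a model, not Zebra's code:
+ <literal>BIB1_USE</literal> is a hypothetical three-entry excerpt of
+ the Bib-1 use attribute table, <literal>normalize</literal> reads
+ "single hyphen" as a hyphen not adjacent to another hyphen (an
+ assumption), and the attribute set is consulted only for numeric use
+ attributes, since string index names ignore it.
+ </para>
+ <programlisting><![CDATA[
```python
import re

# Hypothetical three-entry excerpt of the Bib-1 use attribute table.
BIB1_USE = {4: "Title", 1010: "Body-of-text", 1016: "Any"}
# Attribute sets known to this sketch, keyed by normalized set name.
ATTRIBUTE_SETS = {"bib1": BIB1_USE}


def normalize(name: str) -> str:
    """Drop hyphens not adjacent to another hyphen, then fold to lower case."""
    return re.sub(r"(?<!-)-(?!-)", "", name).lower()


def internal_index(access_point: str, attrset: str = "Bib-1") -> str:
    """Reduce a use attribute to a normalized internal string index name.

    The attribute set is only consulted for numeric use attributes.
    """
    if access_point.isdigit():
        table = ATTRIBUTE_SETS[normalize(attrset)]
        access_point = table[int(access_point)]
    return normalize(access_point)


queries = [
    ("Body-of-text", "Bib-1"), ("bodyoftext", "Bib-1"),
    ("BodyOfText", "Bib-1"), ("bO-d-Y-of-tE-x-t", "Bib-1"),
    ("1010", "Bib-1"), ("1010", "bib1"), ("1010", "Bib1"),
    ("1010", "b-I-b-1"),
]
print({internal_index(ap, s) for ap, s in queries})  # {'bodyoftext'}
```
+ ]]></programlisting>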
+
+ <para>
+ The <emphasis>numerical</emphasis>
+ <literal>use attributes (type 1)</literal>
+ are interpreted according to the
+ attribute sets which have been loaded in the
+ <literal>zebra.cfg</literal> file, and are matched against specific
+ fields as specified in the <literal>.abs</literal> file which
+ describes the profile of the records which have been loaded.
+ If no use attribute is provided, a default of
+ <literal>Bib-1 Use Any (1016)</literal> is
+ assumed.
+ The predefined <literal>use attribute sets</literal>
+ can be reconfigured by tweaking the configuration files
+ <filename>tab/*.att</filename>, and
+ new attribute sets can be defined by adding similar files in the
+ configuration path <literal>profilePath</literal> of the server.
+ </para>
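+ <para>
+ The attribute set files are plain text. The sketch below assumes,
+ for illustration, that the relevant lines have the form
+ <literal>att NUMBER NAME</literal> and skips everything else; check
+ the actual files under <filename>tab/</filename> before relying on
+ this. The sample text and the helper names
+ <literal>parse_att</literal> and <literal>use_attribute_name</literal>
+ are hypothetical. The sketch also applies the
+ <literal>Bib-1 Use Any (1016)</literal> default when a query carries
+ no use attribute.
+ </para>
+ <programlisting><![CDATA[
```python
# Hypothetical excerpt, assuming 'att NUMBER NAME' lines.
SAMPLE_ATT = """\
# Bib-1 use attributes (excerpt)
att 4    Title
att 1010 Body-of-text
att 1016 Any
"""

# Bib-1 Use Any, assumed when a query carries no use attribute.
DEFAULT_USE = 1016


def parse_att(text: str) -> dict:
    """Collect the 'att' lines of an attribute set file into {number: name}."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "att":
            table[int(fields[1])] = fields[2]
    return table


def use_attribute_name(attrs: dict, table: dict) -> str:
    """Name of the type-1 (use) attribute in attrs {type: value}, or Any."""
    return table[attrs.get(1, DEFAULT_USE)]


bib1 = parse_att(SAMPLE_ATT)
print(use_attribute_name({1: 1010}, bib1))  # Body-of-text
print(use_attribute_name({}, bib1))         # Any
```
+ ]]></programlisting>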
+
+ <para>
+ <literal>String indexes</literal> can be accessed directly,
+ independently of which attribute set is in use; the attribute set is
+ simply ignored. The name normalization described above applies.
+ <literal>String index names</literal> are defined in the
+ used indexing filter configuration files, for example in the
+ <literal>GRS</literal>
+ <filename>*.abs</filename> configuration files, or in the
+ <literal>alvis</literal> filter XSLT indexing stylesheets.
+ </para>
+
+ <para>
+ <literal>Zebra internal indexes</literal> can be accessed directly,
+ according to the same rules as the user defined
+ <literal>string indexes</literal>. The only difference is that
+ <literal>Zebra internal index names</literal> are hardwired,
+ all uppercase and
+ must start with the character <literal>'_'</literal>.
+ </para>
+
+ <para>
+ Finally, <literal>XPATH</literal> access points are only
+ available using the <literal>GRS</literal> filter for indexing.
+ These access point names must start with the character
+ <literal>'/'</literal>; they are <emphasis>not
+ normalized</emphasis>, but passed unaltered to the Zebra internal
+ XPATH engine. See <xref linkend="querymodel-use-xpath"/>.
+ </para>
+
+
+ </sect3>
+
+
+ <sect3 id="querymodel-pqf-apt-mapping-structuretype">
+ <title>Mapping of PQF APT structure and completeness to
+ register type</title>
+ <para>
+ In its default configuration, Zebra internally has several
+ different types of registers or indexes, whose tokenization and
+ character normalization rules differ. This reflects the fact that
+ searching fundamentally different tokens, such as dates, numbers,
+ bitfields and string-based text, needs different rule sets.
+ </para>
+
+ <table id="querymodel-zebra-mapping-structure-types"
+ frame="all" rowsep="1" colsep="1" align="center">
+
+ <caption>Structure and completeness mapping to register types</caption>
+ <thead>
+ <tr>
+ <td>Structure</td>
+ <td>Completeness</td>
+ <td>Register type</td>
+ <td>Notes</td>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>
+ phrase (@attr 4=1), word (@attr 4=2),
+ word-list (@attr 4=6),
+ free-form-text (@attr 4=105), or document-text (@attr 4=106)
+ </td>
+ <td>Incomplete field (@attr 6=1)</td>
+ <td>Word ('w')</td>
+ <td>Traditional tokenized and character normalized word index</td>
+ </tr>
+ <tr>
+ <td>
+ phrase (@attr 4=1), word (@attr 4=2),
+ word-list (@attr 4=6),
+ free-form-text (@attr 4=105), or document-text (@attr 4=106)
+ </td>
+ <td>Complete field (@attr 6=3)</td>
+ <td>Phrase ('p')</td>
+ <td>Character normalized, but not tokenized index for phrase
+ matches
+ </td>
+ </tr>
+ <tr>
+ <td>urx (@attr 4=104)</td>
+ <td>ignored</td>
+ <td>URX/URL ('u')</td>
+ <td>Special index for URL web addresses</td>
+ </tr>
+ <tr>
+ <td>numeric (@attr 4=109)</td>
+ <td>ignored</td>
+ <td>Numeric ('n')</td>
+ <td>Special index for numeric values</td>
+ </tr>
+ <tr>
+ <td>key (@attr 4=3)</td>
+ <td>ignored</td>
+ <td>Null bitmap ('0')</td>
+ <td>Used for non-tokenized and non-normalized bit sequences</td>
+ </tr>
+ <tr>
+ <td>year (@attr 4=4)</td>
+ <td>ignored</td>
+ <td>Year ('y')</td>
+ <td>Non-tokenized and non-normalized 4-digit numbers</td>
+ </tr>
+ <tr>
+ <td>date (@attr 4=5)</td>
+ <td>ignored</td>
+ <td>Date ('d')</td>
+ <td>Non-tokenized and non-normalized ISO date strings</td>
+ </tr>
+ <tr>
+ <td>ignored</td>
+ <td>ignored</td>
+ <td>Sort ('s')</td>
+ <td>Used with special sort attribute set (@attr 7=1, @attr 7=2)</td>
+ </tr>
+ <tr>
+ <td>overruled</td>
+ <td>overruled</td>
+ <td>special</td>
+ <td>Internal record ID register, used whenever
+ Relation Always Matches (@attr 2=103) is specified</td>
+ </tr>
+ </tbody>
+ </table>
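+ <para>
+ The table can be read as a decision procedure. The Python sketch
+ below encodes its rows; it is not a transcription of the code in
+ <filename>util/zebramap.c</filename>. Attributes are passed as a
+ hypothetical <literal>{type: value}</literal> dictionary, and two
+ behaviors are assumptions not stated in the table: a missing
+ structure attribute is treated like word (4=2), and Complete subfield
+ (6=2) is grouped with Complete field (6=3).
+ </para>
+ <programlisting><![CDATA[
```python
# Structure values (attribute type 4) sharing the word/phrase rows.
TEXT_STRUCTURES = {1, 2, 6, 105, 106}
# Structure values whose register does not depend on completeness.
FIXED_REGISTERS = {104: "u", 109: "n", 3: "0", 4: "y", 5: "d"}


def register_type(attrs: dict) -> str:
    """Pick a register type from PQF attributes {type: value}, per the table."""
    if attrs.get(2) == 103:        # Relation Always Matches overrules all
        return "special"
    if 7 in attrs:                 # sort attribute set
        return "s"
    structure = attrs.get(4, 2)    # assumption: missing structure acts as word
    if structure in FIXED_REGISTERS:
        return FIXED_REGISTERS[structure]
    if structure in TEXT_STRUCTURES:
        completeness = attrs.get(6, 1)  # incomplete field is the default
        # Assumption: complete subfield (6=2) behaves like complete field.
        return "p" if completeness in (2, 3) else "w"
    raise ValueError("structure %r is not covered by the table" % structure)


print(register_type({1: 4, 4: 1, 6: 3}))  # p
print(register_type({1: 4, 4: 1, 6: 1}))  # w
```
+ ]]></programlisting>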
+
+ <!-- see in util/zebramap.c -->
+
+ <para>
+ If a <emphasis>Structure</emphasis> attribute of
+ <emphasis>Phrase</emphasis> is used in conjunction with a
+ <emphasis>Completeness</emphasis> attribute of
+ <emphasis>Complete (Sub)field</emphasis>, the term is matched
+ against the contents of the phrase (long word) register, if one
+ exists for the given <emphasis>Use</emphasis> attribute.
+ A phrase register is created for those fields in the
+ GRS <filename>*.abs</filename> file that contain a
+ <literal>p</literal>-specifier.
+ <screen>
+ Z> scan @attr 1=Title @attr 4=1 @attr 6=3 beethoven
+ ...
+ bayreuther festspiele (1)
+ * beethoven bibliography database (1)
+ benny carter (1)
+ ...
+ Z> find @attr 1=Title @attr 4=1 @attr 6=3 "beethoven bibliography"
+ ...
+ Number of hits: 0, setno 5
+ ...
+ Z> find @attr 1=Title @attr 4=1 @attr 6=3 "beethoven bibliography database"
+ ...
+ Number of hits: 1, setno 6
+ </screen>
+ </para>
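+ <para>
+ A toy model may help explain why the shorter phrase above finds
+ nothing. In the Python sketch below, a phrase register keeps each
+ field value whole, so a complete-field search must equal an entire
+ entry. Lower-casing and whitespace folding stand in for Zebra's real
+ character normalization, and the single title is hypothetical.
+ </para>
+ <programlisting><![CDATA[
```python
def normalize_field(text: str) -> str:
    """Stand-in character normalization: lower case, single spaces."""
    return " ".join(text.lower().split())


def build_phrase_register(titles):
    """Map each normalized whole title to the ids of records containing it."""
    register = {}
    for rec_id, title in enumerate(titles):
        register.setdefault(normalize_field(title), []).append(rec_id)
    return register


def phrase_find(register, term):
    """Complete-field match: the term must equal an entire register entry."""
    return register.get(normalize_field(term), [])


titles = ["Beethoven Bibliography Database"]  # hypothetical record
reg = build_phrase_register(titles)
print(len(phrase_find(reg, "beethoven bibliography")))           # 0
print(len(phrase_find(reg, "beethoven bibliography database")))  # 1
```
+ ]]></programlisting>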
+
+ <para>
+ If <emphasis>Structure</emphasis>=<emphasis>Phrase</emphasis> is
+ used in conjunction with <emphasis>Incomplete Field</emphasis> (the
+ default value for <emphasis>Completeness</emphasis>), the
+ search is directed against the normal word registers, but if the term
+ contains multiple words, the term will only match if all of the words
+ are found immediately adjacent, and in the given order.
+ The word search is performed on those fields that are indexed as
+ type <literal>w</literal> in the GRS <filename>*.abs</filename> file.
+ <screen>
+ Z> scan @attr 1=Title @attr 4=1 @attr 6=1 beethoven
+ ...
+ beefheart (1)
+ * beethoven (18)
+ beethovens (7)
+ ...
+ Z> find @attr 1=Title @attr 4=1 @attr 6=1 beethoven
+ ...
+ Number of hits: 18, setno 1
+ ...
+ Z> find @attr 1=Title @attr 4=1 @attr 6=1 "beethoven bibliography"
+ ...
+ Number of hits: 2, setno 2
+ ...
+ </screen>
+ </para>
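+ <para>
+ By contrast, the word register stores individual words together
+ with their positions. The Python sketch below, using a hypothetical
+ tokenizer and hypothetical sample titles, accepts a record only when
+ all words of the term occur next to each other and in the given
+ order. It models the matching rule described above, not Zebra's
+ register layout.
+ </para>
+ <programlisting><![CDATA[
```python
import re
from collections import defaultdict


def tokenize(text: str):
    """Stand-in tokenizer: lower-cased runs of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_word_register(titles):
    """Map each word to a list of (record id, word position) pairs."""
    register = defaultdict(list)
    for rec_id, title in enumerate(titles):
        for pos, word in enumerate(tokenize(title)):
            register[word].append((rec_id, pos))
    return register


def adjacent_find(register, term):
    """Record ids where all term words occur adjacently and in order."""
    words = tokenize(term)
    if not words:
        return set()
    hits = set()
    # Every occurrence of the first word is a candidate starting point.
    for rec_id, start in register.get(words[0], []):
        if all((rec_id, start + i) in register.get(w, ())
               for i, w in enumerate(words[1:], start=1)):
            hits.add(rec_id)
    return hits


titles = ["Beethoven Bibliography Database", "A Beethoven bibliography",
          "Bibliography of Beethoven"]  # hypothetical records
reg = build_word_register(titles)
print(sorted(adjacent_find(reg, "beethoven")))               # [0, 1, 2]
print(sorted(adjacent_find(reg, "beethoven bibliography")))  # [0, 1]
```
+ ]]></programlisting>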
+
+ <para>
+ If the <emphasis>Structure</emphasis> attribute is
+ <emphasis>Word List</emphasis>,
+ <emphasis>Free-form Text</emphasis>, or
+ <emphasis>Document Text</emphasis>, the term is treated as a
+ natural-language, relevance-ranked query.
+ This search type uses the word register, i.e. those fields
+ that are indexed as type <literal>w</literal> in the
+ GRS <filename>*.abs</filename> file.
+ </para>
+
+ <para>
+ If the <emphasis>Structure</emphasis> attribute is
+ <emphasis>Numeric String</emphasis>, the term is treated as an integer.
+ The search is performed on those fields that are indexed
+ as type <literal>n</literal> in the GRS
+ <filename>*.abs</filename> file.
+ </para>
+
+ <para>
+ If the <emphasis>Structure</emphasis> attribute is
+ <emphasis>URX</emphasis>, the term is treated as a URX (URL) entity.
+ The search is performed on those fields that are indexed as type
+ <literal>u</literal> in the <filename>*.abs</filename> file.
+ </para>
+
+ <para>
+ If the <emphasis>Structure</emphasis> attribute is
+ <emphasis>Local Number</emphasis>, the term is treated as a
+ native Zebra record identifier.
+ </para>
+
+ <para>
+ If the <emphasis>Relation</emphasis> attribute is
+ <emphasis>Equals</emphasis> (default), the term is matched
+ in a normal fashion (modulo truncation and processing of
+ individual words, if required).
+ If <emphasis>Relation</emphasis> is <emphasis>Less Than</emphasis>,
+ <emphasis>Less Than or Equal</emphasis>,
+ <emphasis>Greater Than</emphasis>, or <emphasis>Greater Than or
+ Equal</emphasis>, the term is assumed to be numerical, and a
+ standard regular expression is constructed that matches all numbers
+ satisfying the relation.
+ If <emphasis>Relation</emphasis> is <emphasis>Relevance</emphasis>,
+ the standard natural-language query processor is invoked.
+ </para>
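+ <para>
+ As an illustration of how a relation can be expressed as a regular
+ expression, the Python sketch below builds a pattern that matches
+ exactly the non-negative decimal integers, written without leading
+ zeros, that are smaller than a given bound. It uses Python's
+ <literal>re</literal> module as a stand-in; the construction Zebra
+ actually performs may differ, and <literal>less_than_regex</literal>
+ is a hypothetical name.
+ </para>
+ <programlisting><![CDATA[
```python
import re


def less_than_regex(n: int) -> str:
    """Regex matching the decimal integers 0 .. n-1 (no leading zeros)."""
    s = str(n)
    alts = []
    # Every number with fewer digits than n is smaller than n.
    for length in range(1, len(s)):
        alts.append("[0-9]" if length == 1 else "[1-9][0-9]{%d}" % (length - 1))
    # Same number of digits: equal prefix, a smaller digit, then any digits.
    for i, ch in enumerate(s):
        lo = 1 if (i == 0 and len(s) > 1) else 0  # no leading zero
        hi = int(ch) - 1
        if hi >= lo:
            rest = len(s) - i - 1
            tail = "[0-9]{%d}" % rest if rest else ""
            alts.append(s[:i] + "[%d-%d]" % (lo, hi) + tail)
    # "(?!)" never matches: nothing is below zero.
    return "(?:" + "|".join(alts) + ")" if alts else "(?!)"


pattern = less_than_regex(105)
print(pattern)  # (?:[0-9]|[1-9][0-9]{1}|10[0-4])
print(bool(re.fullmatch(pattern, "104")), bool(re.fullmatch(pattern, "105")))
```
+ ]]></programlisting>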
+
+ <para>
+ For the <emphasis>Truncation</emphasis> attribute,
+ <emphasis>No Truncation</emphasis> is the default.
+ <emphasis>Left Truncation</emphasis> is not supported.
+ <emphasis>Process # in search term</emphasis> is supported, as is
+ <emphasis>Regxp-1</emphasis>.
+ <emphasis>Regxp-2</emphasis> enables the fault-tolerant (fuzzy)
+ search. By default, a single error (deletion, insertion,
+ replacement) is accepted when terms are matched against the register
+ contents.
+ </para>
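+ <para>
+ The fault-tolerant match accepts a register term when it lies within
+ a given number of single-character deletions, insertions or
+ replacements of the query term, i.e. within that Levenshtein edit
+ distance. The Python sketch below shows this acceptance test with the
+ default of one error; it is a plain dynamic-programming check, not
+ Zebra's register scanning code, and <literal>within_edits</literal>
+ is a hypothetical name.
+ </para>
+ <programlisting><![CDATA[
```python
def within_edits(a: str, b: str, k: int = 1) -> bool:
    """True if a turns into b with at most k insertions, deletions or
    replacements (Levenshtein distance at most k)."""
    if abs(len(a) - len(b)) > k:
        return False  # the length gap alone needs more than k edits
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # replace or keep
        prev = cur
    return prev[-1] <= k


# Register terms taken from the scan example earlier in this section.
terms = ["beefheart", "beethoven", "beethovens"]
print([t for t in terms if within_edits("beethoven", t)])
# ['beethoven', 'beethovens']
```
+ ]]></programlisting>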
+
+ </sect3>
+ </sect2>
+
+ <sect2 id="querymodel-regular">
+ <title>Zebra Regular Expressions in Truncation Attribute (type = 5)</title>
+
+ <para>
+ Each term in a query is interpreted as a regular expression if
+ the truncation value is either <emphasis>Regxp-1 (@attr 5=102)</emphasis>
+ or <emphasis>Regxp-2 (@attr 5=103)</emphasis>.
+ Both query types use the same syntax, built from these operands:
+ </para>
+
+ <table id="querymodel-regular-operands-table"
+ frame="all" rowsep="1" colsep="1" align="center">
+
+ <caption>Regular Expression Operands</caption>
+ <!--
+ <thead>
+ <tr><td>one</td><td>two</td></tr>
+ </thead>
+ -->
+ <tbody>
+ <tr>
+ <td><literal>x</literal></td>
+ <td>Matches the character <literal>x</literal>.</td>
+ </tr>
+ <tr>
+ <td><literal>.</literal></td>
+ <td>Matches any character.</td>
+ </tr>
+ <tr>
+ <td><literal>[ .. ]</literal></td>
+ <td>Matches the set of characters specified;
+ such as <literal>[abc]</literal> or <literal>[a-c]</literal>.</td>
+ </tr>
+ </tbody>
+ </table>
+
+ <para>
+ The above operands can be combined with the following operators:
+ </para>
+
+ <table id="querymodel-regular-operators-table"
+ frame="all" rowsep="1" colsep="1" align="center">
+ <caption>Regular Expression Operators</caption>
+ <!--
+ <thead>
+ <tr><td>one</td><td>two</td></tr>
+ </thead>
+ -->
+ <tbody>
+ <tr>
+ <td><literal>x*</literal></td>
+ <td>Matches <literal>x</literal> zero or more times.
+ Priority: high.</td>
+ </tr>
+ <tr>
+ <td><literal>x+</literal></td>
+ <td>Matches <literal>x</literal> one or more times.
+ Priority: high.</td>
+ </tr>
+ <tr>
+ <td><literal>x?</literal></td>
+ <td> Matches <literal>x</literal> zero times or once.
+ Priority: high.</td>
+ </tr>
+ <tr>
+ <td><literal>xy</literal></td>
+ <td> Matches <literal>x</literal>, then <literal>y</literal>.
+ Priority: medium.</td>
+ </tr>
+ <tr>
+ <td><literal>x|y</literal></td>
+ <td> Matches either <literal>x</literal> or <literal>y</literal>.
+ Priority: low.</td>
+ </tr>
+ <tr>
+ <td><literal>( )</literal></td>
+ <td>The order of evaluation may be changed by using parentheses.</td>
+ </tr>
+ </tbody>
+ </table>
+
+ <para>
+ If the first character of the <literal>Regxp-2</literal> query
+ is a plus character (<literal>+</literal>) it marks the
+ beginning of a section with non-standard specifiers.
+ The next plus character marks the end of the section.
+ Currently Zebra only supports one specifier, the error tolerance,
+ which consists of one digit.
+ </para>
+
+ <para>
+ Since the plus operator is normally a suffix operator, this addition
+ to the query syntax does not conflict with the syntax of standard
+ regular expressions.
+ </para>
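+ <para>
+ A minimal Python sketch of splitting the specifier section off a
+ Regxp-2 term, assuming exactly the form described above: a leading
+ plus, one digit, and a closing plus. The function name
+ <literal>split_regxp2</literal> is hypothetical, and the fallback of
+ one error mirrors the default described for Regxp-2 earlier.
+ </para>
+ <programlisting><![CDATA[
```python
DEFAULT_TOLERANCE = 1  # one error accepted unless a specifier says otherwise


def split_regxp2(term: str):
    """Split an optional leading '+N+' specifier section off a Regxp-2 term.

    Returns (error_tolerance, regular_expression).
    """
    if not term.startswith("+"):
        return DEFAULT_TOLERANCE, term
    end = term.find("+", 1)  # the next plus closes the section
    if end != 2 or not term[1].isdigit():
        raise ValueError("expected '+N+' with one digit N in %r" % term)
    return int(term[1]), term[end + 1:]


print(split_regxp2("+2+informat.*"))  # (2, 'informat.*')
print(split_regxp2("informat.*"))     # (1, 'informat.*')
```
+ ]]></programlisting>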
+
+ <para>
+ For example, a phrase search with regular expressions in
+ the title register is performed like this:
+ <screen>
+ Z> find @attr 1=4 @attr 5=102 "informat.* retrieval"
+ </screen>
+ </para>
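+ <para>
+ This regular expression phrase search can be modeled as follows:
+ each whitespace-separated word of the term is a separate regular
+ expression that must match a whole word, and the matched words must
+ be adjacent and in order. The Python sketch below uses the
+ <literal>re</literal> module as a stand-in for Zebra's own matcher
+ (the operators in the tables above have the same meaning there); the
+ function name and the titles are hypothetical.
+ </para>
+ <programlisting><![CDATA[
```python
import re


def regex_phrase_find(titles, term):
    """Indexes of titles where consecutive words fully match the term's
    per-word regular expressions, in order."""
    patterns = [re.compile(p) for p in term.lower().split()]
    if not patterns:
        return []
    hits = []
    for rec_id, title in enumerate(titles):
        words = title.lower().split()
        # Try every window of len(patterns) consecutive words.
        for start in range(len(words) - len(patterns) + 1):
            if all(p.fullmatch(words[start + i]) for i, p in enumerate(patterns)):
                hits.append(rec_id)
                break
    return hits


titles = ["Information Retrieval", "Informatics retrieval systems",
          "Retrieval of information"]  # hypothetical records
print(regex_phrase_find(titles, "informat.* retrieval"))  # [0, 1]
```
+ ]]></programlisting>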
+
+ <para>
+ Combinations with other attributes are possible. For example, a
+ ranked search with a regular expression:
+ <screen>
+ Z> find @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
+ </screen>
+ </para>
+ </sect2>
+