doc/querymodel.xml

   1  <chapter id="querymodel">
   2   <!-- $Id: querymodel.xml,v 1.7 2006-06-16 10:30:12 marc Exp $ -->
   3   <title>Query Model</title>
   4
   5   <sect1 id="querymodel-overview">
   6    <title>Query Model Overview</title>
   7
   8
   9    <sect2 id="querymodel-query-languages">
  10     <title>Query Languages</title>
  11
  12     <para>
  13      Zebra was born as a networked Information Retrieval engine adhering
  14      to the international standards
  15      <ulink url="&url.z39.50;">Z39.50</ulink> and
  16      <ulink url="&url.sru;">SRU</ulink>,
  17      and implements the query model defined there.
  18      Unfortunately, the Z39.50 query model defines only a binary
  19      encoded representation, which is used as transport packaging in
  20      the Z39.50 protocol layer. This representation is not human
  21      readable, and it offers no convenient way to specify queries.
  22     </para>
  23    <!-- tell about RPN - include link to YAZ
  24         url.yaz.pqf -->
  25
  26
  27
  28    <sect3 id="querymodel-query-languages-pqf">
  29     <title>Prefix Query Format (PQF)</title>
  30
  31    <para>
  32      Index Data has defined a textual representation in the
  33      <literal>Prefix Query Format</literal>, or
  34      <literal>PQF</literal> for short, which has since been adopted by other
  35      parties developing Z39.50 software. It is also often referred to as
  36      <literal>Prefix Query Notation</literal>, or
  37      <literal>PQN</literal> for short, and is thoroughly explained in
  38      <xref linkend="querymodel-pqf"/>.
  39     </para>
  40    </sect3>
  41
  42
  43    <!-- PQF/RPN is natively supported. CQL is NOT . So we need a map -->
  44    <sect3 id="querymodel-query-languages-cql">
  45     <title>Common Query Language (CQL)</title>
  46    <para>
  47      In addition, Zebra can be configured to understand and map the
  48      <literal>Common Query Language</literal>
  49      (<ulink url="&url.cql;">CQL</ulink>)
  50      to PQF. An introduction to the mapping to the internal query
  51      representation is given in
  52      <xref linkend="querymodel-cql-to-pqf"/>.
  53     </para>
  54    </sect3>
  55
  56    </sect2>
  57
  58    <sect2 id="querymodel-query-types">
  59     <title>Query types</title>
  60     <para>
  61     </para>
  62
  63     <sect3 id="querymodel-query-type-explain">
  64      <title>Explain Queries</title>
  65      <para>
  66      </para>
  67     </sect3>
  68
  69     <sect3 id="querymodel-query-type-search">
  70      <title>Search Queries</title>
  71      <para>
  72      </para>
  73     </sect3>
  74
  75     <sect3 id="querymodel-query-type-scan">
  76      <title>Scan Queries</title>
  77      <para>
  78      </para>
  79     </sect3>
  80
  81    </sect2>
  82
  83  </sect1>
  84
  85
  86   <sect1 id="querymodel-pqf">
  87    <title>Prefix Query Format structure and syntax</title>
  88    <para>
  89     The <ulink url="&url.yaz.pqf;">PQF grammar</ulink>
  90     is documented in the YAZ manual, and is not
  91     repeated here. During search, this textual PQF representation
  92     is always mapped to the equivalent Zebra internal
  93     query parse tree.
  94    </para>
  95
  96    <sect2 id="querymodel-pqf-tree">
  97     <title>PQF tree structure</title>
  98     <para>
  99      The PQF parse tree - or the equivalent textual representation -
 100      may start with one specification of the
 101      <emphasis>attribute set</emphasis> used. This is followed by a query
 102      tree, which
 103      consists of <emphasis>atomic query parts (APT)</emphasis>, optionally
 104      paired by <emphasis>boolean binary operators</emphasis>, and
 105      finally <emphasis>recursively combined</emphasis> into
 106      complex query trees.
 107     </para>
 108
 109     <sect3 id="querymodel-attribute-sets">
 110      <title>Attribute sets</title>
 111      <para>
 112       Attribute sets define the exact meaning and semantics of queries
 113       issued. Zebra comes with some predefined attribute set
 114       definitions, others can easily be defined and added to the
 115       configuration.
 116       <note>
 117        The Zebra internal query processing is modeled after
 118        the <literal>Bib1</literal> attribute set, and the non-use
 119        attributes type 2-6 are hard-wired in. It is therefore essential
 120        to be familiar with <xref linkend="querymodel-bib1"/>.
 121       </note>
 122      </para>
 123
 124      <table id="querymodel-attribute-sets-table"
 125       frame="all" rowsep="1" colsep="1" align="center">
 126
 127       <caption>Attribute sets predefined in Zebra</caption>
 128        <!--
 129        <thead>
 130        <tr><td>one</td><td>two</td></tr>
 131       </thead>
 132        -->
 133        <tbody>
 134         <tr>
 135          <td><literal>exp-1</literal></td>
 136          <td><literal>Explain</literal> attribute set</td>
 137          <td>Special attribute set used on the special automagic
 138           <literal>IR-Explain-1</literal> database to gain information on
 139           server capabilities, database names, and database
 140           semantics.</td>
 141         </tr>
 142         <tr>
 143          <td><literal>bib-1</literal></td>
 144          <td><literal>Bib1</literal> attribute set</td>
 145          <td>Standard PQF query language attribute set which defines the
 146           semantics of Z39.50 searching. In addition, all of the
 147           non-use attributes (type 2-9) define the Zebra internal query
 148           processing.</td>
 149         </tr>
 150         <tr>
 151          <td><literal>gils</literal></td>
 152          <td><literal>GILS</literal> attribute set</td>
 153          <td>Extension to the <literal>Bib1</literal> attribute set.</td>
 154         </tr>
 155        </tbody>
 156      </table>
 157     </sect3>
 158
 159     <sect3 id="querymodel-boolean-operators">
 160      <title>Boolean operators</title>
 161      <para>
 162       A pair of subquery trees, or of atomic queries, is combined
 163       using the standard boolean operators into new query trees.
 164      </para>
 165
 166      <table id="querymodel-boolean-operators-table"
 167       frame="all" rowsep="1" colsep="1" align="center">
 168
 169       <caption>Boolean operators</caption>
 170        <!--
 171        <thead>
 172        <tr><td>one</td><td>two</td></tr>
 173       </thead>
 174        -->
 175        <tbody>
 176         <tr><td><literal>@and</literal></td>
 177          <td>binary <literal>AND</literal> operator</td>
 178          <td>Set intersection of the hit sets of two atomic queries</td>
 179         </tr>
 180         <tr><td><literal>@or</literal></td>
 181          <td>binary <literal>OR</literal> operator</td>
 182          <td>Set union of the hit sets of two atomic queries</td>
 183         </tr>
 184         <tr><td><literal>@not</literal></td>
 185          <td>binary <literal>AND NOT</literal> operator</td>
 186          <td>Set difference (relative complement) of the hit sets of two atomic queries</td>
 187         </tr>
 188         <tr><td><literal>@prox</literal></td>
 189          <td>binary <literal>PROXIMITY</literal> operator</td>
 190          <td>Set intersection of the hit sets of two atomic queries. In
 191           addition, all documents which do not satisfy the requested
 192           query term proximity are purged from the intersection set.
 193           The result is usually a proper subset of that of the AND
 194           operation.</td>
 195         </tr>
 196        </tbody>
 197      </table>
 198
 199      <para>
 200       For example, we can combine the terms
 201       <emphasis>information</emphasis> and <emphasis>retrieval</emphasis>
 202       into different searches in the default index of the default
 203       attribute set as follows.
 204       Querying for the union of all documents containing the
 205       terms <emphasis>information</emphasis> OR
 206       <emphasis>retrieval</emphasis>:
 207       <screen>
 208        Z> find @or information retrieval
 209       </screen>
 210      </para>
 211      <para>
 212       Querying for the intersection of all documents containing the
 213       terms <emphasis>information</emphasis> AND
 214       <emphasis>retrieval</emphasis>.
 215       The hit set is a subset of the corresponding
 216       OR query:
 217       <screen>
 218        Z> find @and information retrieval
 219       </screen>
 220      </para>
 221      <para>
 222       Querying for the intersection of all documents containing the
 223       terms <emphasis>information</emphasis> AND
 224       <emphasis>retrieval</emphasis>, taking proximity into account.
 225       The hit set is a subset of the corresponding
 226       AND query (see the PQF grammar for the proximity parameters):
 227       <screen>
 228        Z> find @prox 0 3 0 2 k 2 information retrieval
 229       </screen>
 230      </para>
 231      <para>
 232       Querying for the intersection of all documents containing the
 233       terms <emphasis>information</emphasis> AND
 234       <emphasis>retrieval</emphasis>, in the same order and near each
 235       other as described in the term list.
 236       The hit set is a subset of the corresponding
 237       PROXIMITY query:
 238       <screen>
 239        Z> find "information retrieval"
 240       </screen>
 241      </para>
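     <para>
      Because each operator takes two operands, and an operand may itself
      be an operator expression, queries nest in prefix order. As a minimal
      sketch (the terms <emphasis>data</emphasis> and
      <emphasis>mining</emphasis> are arbitrary illustrations, not taken
      from any shipped example database), the following query asks for
      documents containing both <emphasis>information</emphasis> and
      <emphasis>retrieval</emphasis>, or both <emphasis>data</emphasis>
      and <emphasis>mining</emphasis>:
      <screen>
       Z> find @or @and information retrieval @and data mining
      </screen>
      Similarly, documents containing <emphasis>information</emphasis> but
      not <emphasis>retrieval</emphasis> could be requested with:
      <screen>
       Z> find @not information retrieval
      </screen>
     </para>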
 242     </sect3>
 243
 244
 245     <sect3 id="querymodel-atomic-queries">
 246      <title>Atomic queries (APT)</title>
 247      <para>
 248       Atomic queries are the query parts which work on one access point
 249       only. These consist of <literal>an attribute list</literal>
 250       followed by a <literal>single term</literal> or a
 251       <literal>quoted term list</literal>, and are often called
 252       <emphasis>Attributes-Plus-Terms (APT)</emphasis> queries.
 253      </para>
 254      <para>
 255       Unsupplied non-use attributes type 2-9 are either inherited from
 256       higher nodes in the query tree, or are set to Zebra's default values.
 257       See <xref linkend="querymodel-bib1"/> for details.
 258      </para>
 259
 260      <table id="querymodel-atomic-queries-table"
 261       frame="all" rowsep="1" colsep="1" align="center">
 262
 263       <caption>Atomic queries</caption>
 264        <!--
 265        <thead>
 266        <tr><td>one</td><td>two</td></tr>
 267       </thead>
 268        -->
 269        <tbody>
 270         <tr><td><emphasis>attribute list</emphasis></td>
 271          <td>List of <literal>orthogonal</literal> attributes</td>
 272          <td>Any of the orthogonal attribute types may be omitted,
 273           these are inherited from higher query tree nodes, or if not
 274           inherited, are set to the default Zebra configuration values.
 275          </td>
 276         </tr>
 277         <tr><td><emphasis>term</emphasis></td>
 278          <td>single <literal>term</literal>
 279           or <literal>quoted term list</literal>   </td>
 280          <td>Here the search terms or list of search terms is added
 281           to the query</td>
 282         </tr>
 283        </tbody>
 284      </table>
 285      <para>
 286       Querying for the term <emphasis>information</emphasis> in the
 287       default index using the default attribute set, the server choice
 288       of access point/index, and the default non-use attributes:
 289       <screen>
 290        Z> find "information"
 291       </screen>
 292      </para>
 293      <para>
 294       Equivalent query fully specified including all default values:
 295       <screen>
 296        Z> find @attrset bib-1 @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1 "information"
 297       </screen>
 298      </para>
 299
 300      <para>
 301       Finding all documents which have empty titles. Notice that the
 302       empty term must be quoted, but is otherwise legal.
 303       <screen>
 304        Z> find @attr 1=4 ""
 305       </screen>
 306      </para>
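     <para>
      To illustrate the inheritance rule stated above: attributes given
      before an operator apply to every atomic query below that operator
      in the query tree. Assuming a title index is configured for the
      <literal>bib-1</literal> use attribute 4 (an assumption about your
      setup), the following two queries should be equivalent:
      <screen>
       Z> find @attr 1=4 @and information retrieval
       Z> find @and @attr 1=4 information @attr 1=4 retrieval
      </screen>
     </para>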
 307
 308     </sect3>
 309
 310     <sect3 id="querymodel-use-string">
 311      <title>Zebra's special use attribute type 1 of form 'string'</title>
 312      <para>
 313       The numeric <literal>use (type 1)</literal> attribute is usually
 314       referred to from a given
 315       attribute set. In addition, Zebra lets you use
 316       <emphasis>any internal index
 317        name defined in your configuration</emphasis>
 318       as use attribute value. This is a great feature for
 319       debugging, and when you do
 320       not need the complexity of defined use attribute values. It is
 321       the preferred way of accessing Zebra indexes directly.
 322      </para>
 323      <para>
 324       Finding all documents which have the term list "information
 325       retrieval" in a Zebra index, using its internal full string name:
 326       <screen>
 327        Z> find @attr 1=sometext "information retrieval"
 328       </screen>
 329      </para>
 330      <para>
 331       Searching the bib-1 use attribute 54 using its string name:
 332       <screen>
 333        Z> find @attr 1=Code-language eng
 334       </screen>
 335      </para>
 336      <para>
 337       Searching in any silly string index - if it's defined in your
 338       indexing rules and can be parsed by the PQF parser.
 339       This is definitely not the recommended use of
 340       this facility, as it might confuse your users with some very
 341       unexpected results.
 342       <screen>
 343        Z> find @attr 1=silly/xpath/alike[@index]/name "information retrieval"
 344       </screen>
 345      </para>
 346      <para>
 347       See <xref linkend="querymodel-bib1-mapping"/> for details, and
 348       <xref linkend="server-sru"/>
 349       for the SRU PQF query extension using string names as a fast
 350       debugging facility.
 351      </para>
 352     </sect3>
 353
 354     <sect3 id="querymodel-use-xpath">
 355      <title>Zebra's special use attribute type 1 of form 'XPath'
 356       for GRS filters</title>
 357      <para>
 358       As we have seen above, it is possible (albeit seldom a great
 359       idea) to emulate
 360       <ulink url="http://www.w3.org/TR/xpath">XPath 1.0</ulink> based
 361       search by defining <literal>use (type 1)</literal>
 362       <emphasis>string</emphasis> attributes which in appearance
 363       <emphasis>resemble XPath queries</emphasis>. There are two
 364       problems with this approach: first, the XPath look-alike has to
 365       be defined at indexing time, so no new, undefined
 366       XPath queries can be entered at search time, and second, it might
 367       confuse users very much that an XPath-alike index name in fact
 368       gets populated from a possibly entirely different XML element
 369       than it pretends to access.
 370      </para>
 371      <para>
 372       When using the <literal>GRS Record Model</literal>
 373       (see  <xref linkend="record-model-grs"/>), we have the
 374       possibility to embed <emphasis>live</emphasis>
 375       XPath expressions
 376       in the PQF queries, which are here called
 377       <literal>use (type 1)</literal> <emphasis>xpath</emphasis>
 378       attributes. You must enable the
 379       <literal>xpath enable</literal> directive in your
 380       <literal>.abs</literal> config files.
 381      </para>
 382      <note>
 383       Only a <emphasis>very</emphasis> restricted subset of the
 384       <ulink url="http://www.w3.org/TR/xpath">XPath 1.0</ulink>
 385       standard is supported as the GRS record model is simpler than
 386       a full XML DOM structure. See the following examples for
 387       possibilities.
 388      </note>
 389      <para>
 390       Finding all documents which have the term "content"
 391       inside a text node found in a specific XML DOM
 392       <emphasis>subtree</emphasis>, whose starting element is
 393       addressed by XPath:
 394       <screen>
 395        Z> find @attr 1=/root content
 396        Z> find @attr 1=/root/first content
 397       </screen>
 398       <emphasis>Notice that the
 399        XPath must be absolute, i.e., must start with '/', and that the
 400        XPath <literal>descendant-or-self</literal> axis followed by a
 401        text node selection <literal>text()</literal> is implicitly
 402        appended to the stated XPath.
 403       </emphasis>
 404       It follows that the above searches are interpreted as:
 405       <screen>
 406        Z> find @attr 1=/root//text() content
 407        Z> find @attr 1=/root/first//text() content
 408       </screen>
 409      </para>
 410
 411      <para>
 412       The addressing XPath can be filtered by a predicate working on exact
 413       string values in
 414       attributes (in the XML sense). This query returns all those documents which
 415       have the term "english" contained in one of the text subnodes of
 416       the subtree defined by the XPath
 417       <literal>/record/title[@lang='en']</literal>:
 418       <screen>
 419        Z> find @attr 1=/record/title[@lang='en'] english
 420       </screen>
 421      </para>
 422
 423      <para>
 424       Combining numeric indexes, boolean expressions,
 425       and xpath based searches is possible:
 426       <screen>
 427        Z> find @attr 1=/record/title @and foo bar
 428        Z> find @and @attr 1=/record/title foo @attr 1=4 bar
 429       </screen>
 430      </para>
 431      <para>
 432       Escaping PQF keywords and other non-parseable XPath constructs
 433       with <literal>'{ }'</literal> to prevent syntax errors:
 434       <screen>
 435        Z> find @attr {1=/root/first[@attr='danish']} content
 436        Z> find @attr {1=/root/second[@attr='danish lake']} content
 437        Z> find @attr {1=/root/third[@attr='dansk s\xc3\xb8']} content
 438       </screen>
 439      </para>
 440      <warning>
 441       It is worth mentioning that these dynamically performed XPath
 442       queries are a performance bottleneck, as no optimized
 443       specialized indexes can be used. Therefore, avoid the use of
 444       this facility when speed is essential, and the database content
 445       size is medium to large.
 446      </warning>
 447     </sect3>
 448
 449    </sect2>
 450
 451    <sect2 id="querymodel-exp1">
 452     <title>Explain Attribute Set</title>
 453     <para>
 454      The Z39.50 standard defines the
 455      <ulink url="&url.z39.50.explain;">Explain</ulink> attribute set
 456      <literal>exp-1</literal>, which is used to discover information
 457      about a server's search semantics and functional capabilities.
 458      Zebra exposes a "classic"
 459      Explain database by base name <literal>IR-Explain-1</literal>, which
 460      is populated with system internal information.
 461     </para>
 462    <para>
 463      The attribute-set <literal>exp-1</literal> consists of a single
 464      <literal>Use (type 1)</literal> attribute.
 465     </para>
 466     <para>
 467      In addition, the non-Use
 468      <literal>bib-1</literal> attributes, that is, the types
 469      <literal>Relation</literal>, <literal>Position</literal>,
 470      <literal>Structure</literal>, <literal>Truncation</literal>,
 471      and <literal>Completeness</literal> are imported from
 472      the <literal>bib-1</literal> attribute set, and may be used
 473      within any explain query.
 474     </para>
 475
 476     <sect3 id="querymodel-exp1-use">
 477     <title>Use Attributes (type = 1)</title>
 478      <para>
 479       The following Explain search attributes are supported:
 480       <literal>ExplainCategory</literal> (@attr 1=1),
 481       <literal>DatabaseName</literal> (@attr 1=3),
 482       <literal>DateAdded</literal> (@attr 1=9),
 483       <literal>DateChanged</literal> (@attr 1=10).
 484      </para>
 485      <para>
 486       A search in the use attribute  <literal>ExplainCategory</literal>
 487       supports only these predefined values:
 488       <literal>CategoryList</literal>, <literal>TargetInfo</literal>,
 489       <literal>DatabaseInfo</literal>, <literal>AttributeDetails</literal>.
 490      </para>
 491      <para>
 492       See <filename>tab/explain.att</filename> and the
 493       <ulink url="&url.z39.50;">Z39.50</ulink> standard
 494       for more information.
 495      </para>
 496     </sect3>
 497
 498     <sect3>
 499      <title>Explain searches with yaz-client</title>
 500      <para>
 501       Classic Explain only defines retrieval of Explain information
 502       via ASN.1. Practically no Z39.50 clients support this. Fortunately
 503       they don't have to - Zebra allows retrieval of this information
 504       in other formats:
 505       <literal>SUTRS</literal>, <literal>XML</literal>,
 506       <literal>GRS-1</literal> and  <literal>ASN.1</literal> Explain.
 507      </para>
 508
 509      <para>
 510       List supported categories to find out which explain commands are
 511       supported:
 512       <screen>
 513        Z> base IR-Explain-1
 514        Z> find @attr exp1 1=1 categorylist
 515        Z> form sutrs
 516        Z> show 1+2
 517       </screen>
 518      </para>
 519
 520      <para>
 521       Get target info, that is, investigate which databases exist at
 522       this server endpoint:
 523       <screen>
 524        Z> base IR-Explain-1
 525        Z> find @attr exp1 1=1 targetinfo
 526        Z> form xml
 527        Z> show 1+1
 528        Z> form grs-1
 529        Z> show 1+1
 530        Z> form sutrs
 531        Z> show 1+1
 532       </screen>
 533      </para>
 534
 535      <para>
 536       List all supported databases. The number of hits
 537       is the number of databases found, which most commonly are the
 538       following two:
 539       the <literal>Default</literal> and the
 540       <literal>IR-Explain-1</literal> databases.
 541       <screen>
 542        Z> base IR-Explain-1
 543        Z> find @attr exp1 1=1 databaseinfo
 544        Z> form sutrs
 545        Z> show 1+2
 546       </screen>
 547      </para>
 548
 549      <para>
 550       Get database info record for database <literal>Default</literal>.
 551       <screen>
 552        Z> base IR-Explain-1
 553        Z> find @and @attr exp1 1=1 databaseinfo @attr exp1 1=3 Default
 554       </screen>
 555       Identical query with explicitly specified attribute set:
 556       <screen>
 557        Z> base IR-Explain-1
 558        Z> find @attrset exp1 @and @attr 1=1 databaseinfo @attr 1=3 Default
 559       </screen>
 560      </para>
 561
 562      <para>
 563       Get attribute details record for database
 564       <literal>Default</literal>.
 565       This query is very useful to study the internal Zebra indexes.
 566       If records have been indexed using the <literal>alvis</literal>
 567       XSLT filter, the string representation names of the known indexes can be
 568       found.
 569       <screen>
 570        Z> base IR-Explain-1
 571        Z> find @and @attr exp1 1=1 attributedetails @attr exp1 1=3 Default
 572       </screen>
 573       Identical query with explicitly specified attribute set:
 574       <screen>
 575        Z> base IR-Explain-1
 576        Z> find @attrset exp1 @and @attr 1=1 attributedetails @attr 1=3 Default
 577       </screen>
 578      </para>
 579     </sect3>
 580
 581    </sect2>
 582
 583    <sect2 id="querymodel-bib1">
 584     <title>Bib1 Attribute Set</title>
 585     <para>
 586      Something about querying to be written ..
 587     </para>
 588     <para>
 589      Most of the information contained in this section is an excerpt of
 590      the <literal>ATTRIBUTE SET BIB-1 (Z39.50-1995)
 591       SEMANTICS</literal>,
 592      found at <ulink url="&url.z39.50.attset.bib1.1995;">The BIB-1
 593       Attribute Set Semantics</ulink> from 1995, also in an updated
 594      <ulink url="&url.z39.50.attset.bib1;">Bib-1
 595       Attribute Set</ulink>
 596      version from 2003. Index Data is not the copyright holder of this
 597      information.
 598     </para>
 599
 600
 601    <sect3 id="querymodel-bib1-use">
 602      <title>Use Attributes (type 1)</title>
 603
 604
 605     <para>
 606      A use attribute specifies an access point for any atomic query.
 607      These access points are highly dependent on the attribute set used
 608      in the query, and are user configurable using the following
 609      default configuration files:
 610      <filename>tab/bib1.att</filename>,
 611      <filename>tab/dan1.att</filename>,
 612      <filename>tab/explain.att</filename>, and
 613      <filename>tab/gils.att</filename>.
 614      New attribute sets can be added by adding new
 615      <filename>tab/*.att</filename> configuration files, which need to
 616      be sourced in the main configuration <filename>zebra.cfg</filename>.
 617      </para>
 618
 619     <para>
 620      In addition, Zebra allows access to
 621      <emphasis>internal index names</emphasis> and <emphasis>dynamic
 622      XPath</emphasis> as use attributes.
 623      See  <xref linkend="querymodel-use-string"/> and
 624      <xref linkend="querymodel-use-xpath"/> for
 625      alternative access to the Zebra internal index names and XPath queries.
 626     </para>
 627
 628     <para>
 629      Phrase search for <emphasis>information retrieval</emphasis> in
 630      the title-register:
 631      <screen>
 632       Z> find @attr 1=4 "information retrieval"
 633      </screen>
 634     </para>
 635     </sect3>
 636
 637     <sect3 id="querymodel-bib1-relation">
 638      <title>Relation Attributes (type 2)</title>
 639
 640      <para>
 641       Relation attributes describe the relationship of the access
 642       point (left side
 643       of the relation) to the search term as qualified by the attributes (right
 644       side of the relation), e.g., Date-publication &lt;= 1975.
 645       </para>
 646
 647      <table id="querymodel-bib1-relation-table"
 648       frame="all" rowsep="1" colsep="1" align="center">
 649
 650       <caption>Relation Attributes (type 2)</caption>
 651       <thead>
 652         <tr>
 653          <td>Relation</td>
 654          <td>Value</td>
 655          <td>Notes</td>
 656         </tr>
 657        </thead>
 658        <tbody>
 659         <tr>
 660          <td> Less than</td>
 661          <td>1</td>
 662          <td>supported</td>
 663         </tr>
 664         <tr>
 665          <td>Less than or equal</td>
 666          <td>2</td>
 667          <td>supported</td>
 668         </tr>
 669         <tr>
 670          <td>Equal</td>
 671          <td>3</td>
 672          <td>default</td>
 673         </tr>
 674         <tr>
 675          <td>Greater or equal</td>
 676          <td>4</td>
 677          <td>supported</td>
 678         </tr>
 679         <tr>
 680          <td>Greater than</td>
 681          <td>5</td>
 682          <td>supported</td>
 683         </tr>
 684         <tr>
 685          <td>Not equal</td>
 686          <td>6</td>
 687          <td>unsupported</td>
 688         </tr>
 689         <tr>
 690          <td>Phonetic</td>
 691          <td>100</td>
 692          <td>unsupported</td>
 693         </tr>
 694         <tr>
 695          <td>Stem</td>
 696          <td>101</td>
 697          <td>unsupported</td>
 698         </tr>
 699         <tr>
 700          <td>Relevance</td>
 701          <td>102</td>
 702          <td>supported</td>
 703         </tr>
 704         <tr>
 705          <td>AlwaysMatches</td>
 706          <td>103</td>
 707          <td>unsupported</td>
 708         </tr>
 709        </tbody>
 710      </table>
 711
 712      <para>
 713       The relation attribute
 714       <literal>relevance (102)</literal> is supported, see
 715       <xref linkend="administration-ranking"/> for full information.
 716       <!-- always-matches (103) not supported for all indexes -->
 717      </para>
 718
 719     <para>
 720      All ordering operations are based on a lexicographical ordering,
 721      <emphasis>except</emphasis> when the
 722      <literal>structure attribute numeric (109)</literal> is used. In
 723      this case, ordering is numerical. See
 724       <xref linkend="querymodel-bib1-structure"/>.
 725     </para>
 726
 727      <para>
 728      Ranked search for <emphasis>information retrieval</emphasis> in
 729      the title-register:
 730      <screen>
 731       Z> find @attr 1=4 @attr 2=102 "information retrieval"
 732      </screen>
 733     </para>
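     <para>
      An ordering relation is combined with a use attribute in the same
      way. As a sketch of the Date-publication example above, assuming the
      <literal>bib-1</literal> use attribute 31 (Date of publication) is
      indexed in your configuration, records published in or before 1975
      could be requested with the relation
      <literal>less than or equal (2)</literal>:
      <screen>
       Z> find @attr 1=31 @attr 2=2 1975
      </screen>
      As noted above, the comparison is lexicographical unless the
      structure attribute <literal>numeric (109)</literal> is added, which
      in turn assumes that the index was defined as a numeric index at
      index time:
      <screen>
       Z> find @attr 1=31 @attr 2=2 @attr 4=109 1975
      </screen>
     </para>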
 734     </sect3>
 735
 736     <sect3 id="querymodel-bib1-position">
 737      <title>Position Attributes (type 3)</title>
 738
 739      <para>
 740       The position attribute specifies the location of the search term
 741       within the field or subfield in which it appears.
 742      </para>
 743
 744      <table id="querymodel-bib1-position-table"
 745       frame="all" rowsep="1" colsep="1" align="center">
 746
 747       <caption>Position Attributes (type 3)</caption>
 748       <thead>
 749         <tr>
 750          <td>Position</td>
 751          <td>Value</td>
 752          <td>Notes</td>
 753         </tr>
 754        </thead>
 755        <tbody>
 756         <tr>
 757          <td>First in field </td>
 758          <td>1</td>
 759          <td>unsupported</td>
 760         </tr>
 761         <tr>
 762          <td>First in subfield</td>
 763          <td>2</td>
 764          <td>unsupported</td>
 765         </tr>
 766         <tr>
 767          <td>Any position in field</td>
 768          <td>3</td>
 769          <td>default</td>
 770         </tr>
 771        </tbody>
 772      </table>
 773
 774     <para>
 775       The position attribute values <literal>first in field (1)</literal>,
 776       and <literal>first in subfield (2)</literal> are unsupported.
 777       Using them does not trigger an error, but silently defaults to
 778       <literal>any position in field (3)</literal>.
 779       <!-- It should -->
 780       </para>
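     <para>
      For illustration, and following the behaviour described above, the
      first query below is accepted but should be evaluated exactly like
      the second (use attribute 4 is assumed to be a configured title
      index):
      <screen>
       Z> find @attr 1=4 @attr 3=1 information
       Z> find @attr 1=4 @attr 3=3 information
      </screen>
     </para>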
 781     </sect3>
 782
 783     <sect3 id="querymodel-bib1-structure">
 784      <title>Structure Attributes (type 4)</title>
 785
 786      <para>
 787       The structure attribute specifies the type of search
 788       term. This causes the search to be mapped on
 789       different Zebra internal indexes, which must have been defined
 790       at index time.
 791      </para>
 792
 793      <para>
 794       The possible values of the
 795       <literal>structure attribute (type 4)</literal> can be defined
 796       using the configuration file
 797       <filename>tab/default.idx</filename>.
 798       The default configuration is summarized in this table.
 799      </para>
 800
 801      <table id="querymodel-bib1-structure-table"
 802       frame="all" rowsep="1" colsep="1" align="center">
 803
 804       <caption>Structure Attributes (type 4)</caption>
 805       <thead>
 806         <tr>
 807          <td>Structure</td>
 808          <td>Value</td>
 809          <td>Notes</td>
 810         </tr>
 811        </thead>
 812        <tbody>
 813         <tr>
 814          <td>Phrase </td>
 815          <td>1</td>
 816          <td>default</td>
 817         </tr>
 818         <tr>
 819          <td>Word</td>
 820          <td>2</td>
 821          <td>supported</td>
 822         </tr>
 823         <tr>
 824          <td>Key</td>
 825          <td>3</td>
 826          <td>supported</td>
 827         </tr>
 828         <tr>
 829          <td>Year</td>
 830          <td>4</td>
 831          <td>supported</td>
 832         </tr>
 833         <tr>
 834          <td>Date (normalized)</td>
 835          <td>5</td>
 836          <td>supported</td>
 837         </tr>
 838         <tr>
 839          <td>Word list</td>
 840          <td>6</td>
 841          <td>supported</td>
 842         </tr>
 843         <tr>
 844          <td>Date (un-normalized)</td>
 845          <td>100</td>
 846          <td>unsupported</td>
 847         </tr>
 848         <tr>
 849          <td>Name (normalized) </td>
 850          <td>101</td>
 851          <td>unsupported</td>
 852         </tr>
 853         <tr>
 854          <td>Name (un-normalized) </td>
 855          <td>102</td>
 856          <td>unsupported</td>
 857         </tr>
 858         <tr>
 859          <td>Structure</td>
 860          <td>103</td>
 861          <td>unsupported</td>
 862         </tr>
 863         <tr>
 864          <td>Urx</td>
 865          <td>104</td>
 866          <td>supported</td>
 867         </tr>
 868         <tr>
 869          <td>Free-form-text</td>
 870          <td>105</td>
 871          <td>supported</td>
 872         </tr>
 873         <tr>
 874          <td>Document-text</td>
 875          <td>106</td>
 876          <td>supported</td>
 877         </tr>
 878         <tr>
 879          <td>Local-number</td>
 880          <td>107</td>
 881          <td>supported</td>
 882         </tr>
 883         <tr>
 884          <td>String</td>
 885          <td>108</td>
 886          <td>unsupported</td>
 887         </tr>
 888         <tr>
 889          <td>Numeric string</td>
 890          <td>109</td>
 891          <td>supported</td>
 892         </tr>
 893        </tbody>
 894      </table>
 895     </sect3>
 896
 897     <para>
 898      The structure attribute value
 899      <literal>Local-number (107)</literal>
 900      is supported, and always maps to the Zebra internal document ID.
 901      </para>
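         <para>
          As a minimal sketch, assuming that a record with the internal
          document ID <literal>10</literal> exists in the database (the ID
          is purely illustrative), the following query should retrieve
          exactly that record:
          <screen>
           Z> find @attr 4=107 10
          </screen>
         </para>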
 902
 903     <para>
 904      For example, in
 905      the GILS schema (<literal>gils.abs</literal>), the
 906      west-bounding-coordinate is indexed as type <literal>n</literal>,
 907      and is therefore searched by specifying
 908      <emphasis>structure</emphasis>=<emphasis>Numeric String</emphasis>.
 909      To match all those records with west-bounding-coordinate greater
 910      than -114 we use the following query:
 911      <screen>
 912       Z> find @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
 913      </screen>
 914     </para>
 915
 916     <sect3 id="querymodel-bib1-truncation">
 917      <title>Truncation Attributes (type = 5)</title>
 918
 919      <para>
 920       The truncation attribute specifies whether the search term and
 921       the hit terms may differ by one or more characters.
 922       Using non-default truncation attributes will broaden the
 923       document hit set of a search query.
 924      </para>
 925
 926      <table id="querymodel-bib1-truncation-table"
 927       frame="all" rowsep="1" colsep="1" align="center">
 928
 929       <caption>Truncation Attributes (type 5)</caption>
 930       <thead>
 931         <tr>
 932          <td>Truncation</td>
 933          <td>Value</td>
 934          <td>Notes</td>
 935         </tr>
 936        </thead>
 937        <tbody>
 938         <tr>
 939          <td>Right truncation </td>
 940          <td>1</td>
 941          <td>supported</td>
 942         </tr>
 943         <tr>
 944          <td>Left truncation</td>
 945          <td>2</td>
 946          <td>supported</td>
 947         </tr>
 948         <tr>
 949          <td>Left and right truncation</td>
 950          <td>3</td>
 951          <td>supported</td>
 952         </tr>
 953         <tr>
 954          <td>Do not truncate</td>
 955          <td>100</td>
 956          <td>default</td>
 957         </tr>
 958         <tr>
 959          <td>Process # in search term</td>
 960          <td>101</td>
 961          <td>supported</td>
 962         </tr>
 963         <tr>
 964          <td>RegExpr-1 </td>
 965          <td>102</td>
 966          <td>supported</td>
 967         </tr>
 968         <tr>
 969          <td>RegExpr-2</td>
 970          <td>103</td>
 971          <td>supported</td>
 972         </tr>
 973        </tbody>
 974      </table>
 975
 976      <para>
 977       Truncation attribute value
 978       <literal>Process # in search term (101)</literal> is a
 979       poor-man's regular expression search. It maps
 980       each <literal>#</literal> to <literal>.*</literal>, and
 981       then performs a <literal>Regexp-1 (102)</literal> regular
 982       expression search.
 983      </para>
 984      <para>
 985       Truncation attribute value
 986       <literal>Regexp-1 (102)</literal> is a normal regular expression
 987       search, see <xref linkend="querymodel-regular"/>.
 988      </para>
 989      <para>
 990       Truncation attribute value
 991       <literal>Regexp-2 (103)</literal> is a Zebra specific extension
 992       which allows <emphasis>fuzzy</emphasis> matches. One single
 993       error in spelling of search terms is allowed, i.e., a document
 994       is hit if it includes a term which can be mapped to the used
 995       search term by one character substitution, addition, deletion or
 996       change of position.
 997       </para>
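          <para>
           The following queries sketch how the different truncation
           attribute values are used. The search terms are purely
           illustrative, and the actual hits depend on the indexed data:
           <screen>
            Z> find @attr 1=4 @attr 5=1 inform
            Z> find @attr 1=4 @attr 5=2 ation
            Z> find @attr 1=4 @attr 5=3 format
            Z> find @attr 1=4 @attr 5=101 inf#tion
            Z> find @attr 1=4 @attr 5=103 informaton
           </screen>
           Assuming a title term <literal>information</literal> in the
           index, each of these queries would be expected to match it: by
           right truncation, left truncation, left and right truncation,
           <literal>#</literal> processing, and fuzzy matching with one
           missing character, respectively.
          </para>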
 998       <!--
 999       Special 104, 105, 106 are deprecated and will be removed! -->
1000     </sect3>
1001
1002     <sect3 id="querymodel-bib1-completeness">
1003     <title>Completeness Attributes (type = 6)</title>
1004      <para>
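1005       The completeness attribute is only used when the structure
1006       attribute selects register type <literal>w</literal> or
1007       <literal>p</literal>; it is ignored for all other register types.
1008       <literal>Incomplete field (1)</literal> is the default and makes
1009       Zebra use register type <literal>w</literal>.
1010       <literal>Complete subfield (2)</literal> and <literal>Complete
1011       field (3)</literal> both trigger register type <literal>p</literal>.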
1012      </para>
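          <para>
           As an illustration, assuming that the title field is indexed in
           both a <literal>w</literal> and a <literal>p</literal> register,
           the first query below is directed at the word register, while
           the second matches the term against complete fields in the
           phrase register:
           <screen>
            Z> find @attr 1=4 @attr 6=1 "information retrieval"
            Z> find @attr 1=4 @attr 6=3 "information retrieval"
           </screen>
          </para>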
1013     </sect3>
1014    </sect2>
1015
1016
1017    <sect2 id="querymodel-zebra-attr-search">
1018     <title>Zebra specific Search Extensions to all Attribute Sets</title>
1019     <para>
1020      Zebra extends the Bib1 attribute types, and these extensions are
1021      recognized regardless of the attribute
1022      set used in a <literal>search</literal> operation query.
1023     </para>
1024
1025      <table id="querymodel-zebra-attr-search-table"
1026       frame="all" rowsep="1" colsep="1" align="center">
1027
1028       <caption>Zebra Search Attribute Extensions</caption>
1029        <thead>
1030         <tr>
1031          <td>Name</td>
1032          <td>Value</td>
1033          <td>Operation</td>
1034          <td>Zebra version</td>
1035         </tr>
1036       </thead>
1037        <tbody>
1038         <tr>
1039          <td>Embedded Sort</td>
1040          <td>7</td>
1041          <td>search</td>
1042          <td>1.1</td>
1043         </tr>
1044         <tr>
1045          <td>Term Set</td>
1046          <td>8</td>
1047          <td>search</td>
1048          <td>1.1</td>
1049         </tr>
1050         <tr>
1051          <td>Rank Weight</td>
1052          <td>9</td>
1053          <td>search</td>
1054          <td>1.1</td>
1055         </tr>
1056         <tr>
1057          <td>Approx Limit</td>
1058          <td>9</td>
1059          <td>search</td>
1060          <td>1.4</td>
1061         </tr>
1062         <tr>
1063          <td>Term Reference</td>
1064          <td>10</td>
1065          <td>search</td>
1066          <td>1.4</td>
1067         </tr>
1068        </tbody>
1069       </table>
1070
1071     <sect3 id="querymodel-zebra-attr-sorting">
1072      <title>Zebra Extension Embedded Sort Attribute (type 7)</title>
1073     </sect3>
1074     <para>
1075      The embedded sort is a way to specify sorting within a query, thus
1076      removing the need to send a Sort Request separately. It is
1077      faster and does not require clients to deal with the Sort
1078      Facility.
1079     </para>
1080     <para>
1081      The possible values of attribute <literal>type 7</literal> are
1082      <literal>1</literal> (ascending) and
1083      <literal>2</literal> (descending).
1084      The attributes+term (APT) node is separate from the
1085      rest and must be <literal>@or</literal>'ed.
1086      The term associated with the APT is the sorting level as an integer,
1087      where <literal>0</literal> means primary sort,
1088      <literal>1</literal> means secondary sort, and so forth.
1089      See also <xref linkend="administration-ranking"/>.
1090     </para>
1091     <para>
1092      For example, to search for water and sort by title (ascending):
1093      <screen>
1094       Z> find @or @attr 1=1016 water @attr 7=1 @attr 1=4 0
1095      </screen>
1096     </para>
1097     <para>
1098      Or, to search for water and sort by title ascending, then by date descending:
1099      <screen>
1100       Z> find @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1
1101      </screen>
1102     </para>
1103
1104     <sect3 id="querymodel-zebra-attr-estimation">
1105      <title>Zebra Extension Term Set Attribute (type 8)</title>
1106     </sect3>
1107     <para>
1108      The Term Set feature allows a search to store the
1109      hitting terms in a "pseudo" result set; it thus combines a search
1110      (as usual) with a scan-like facility. It requires a client that
1111      supports named result sets, since the search generates two result
1112      sets. The value of attribute 8 is the name of a result set (string).
1113      The terms in the named term set are returned as SUTRS records.
1114     </para>
1115     <para>
1116      For example, to search for u in title, right truncated, and
1117      store the result in a term set named 'aset':
1118      <screen>
1119       Z> find @attr 5=1 @attr 1=4 @attr 8=aset u
1120      </screen>
1121     </para>
1122     <warning>
1123      The model has one serious flaw: the size of the term set is not
1124      known. This feature is experimental. Do not use in production code.
1125     </warning>
1126
1127     <sect3 id="querymodel-zebra-attr-weight">
1128      <title>Zebra Extension Rank Weight Attribute (type 9)</title>
1129     </sect3>
1130     <para>
1131      Rank weight is a way to pass a value to a ranking algorithm, so
1132      that one APT has one value while another has a different one.
1133      See also <xref linkend="administration-ranking"/>.
1134     </para>
1135     <para>
1136      For example, to search for utah in title with weight 30 as well
1137      as in any field with weight 20:
1138      <screen>
1139       Z> find @attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 utah
1140      </screen>
1141     </para>
1142
1143     <sect3 id="querymodel-zebra-attr-limit">
1144      <title>Zebra Extension Approximative Limit Attribute (type 9)</title>
1145     </sect3>
1146     <para>
1147      Newer Zebra versions normally estimate the hit count for every APT
1148      (leaf) in the query tree. These hit counts are returned as part of
1149      the searchResult-1 facility in the binary encoded Z39.50 search
1150      response packages.
1151     </para>
1152     <para>
1153      By setting a limit for the APT we can make Zebra switch to
1154      approximate hit counts when a certain hit count limit is
1155      reached. A value of zero means exact hit count.
1156     </para>
1157     <para>
1158      For example, we might be interested in the exact hit count for a, but
1159      for b we allow hit count estimates for 1000 and higher:
1160      <screen>
1161       Z> find @and a @attr 9=1000 b
1162      </screen>
1163     </para>
1164     <note>
1165      The estimated hit count facility makes searches faster, as one
1166      only needs to process large hit lists partially.
1167     </note>
1168     <warning>
1169      This facility clashes with rank weight, because ranking requires
1170      that all documents in the hit lists are examined for scoring and
1171      re-sorting.
1172      It is an experimental
1173      extension. Do not use in production code.
1174     </warning>
1175
1176     <sect3 id="querymodel-zebra-attr-termref">
1177      <title>Zebra Extension Term Reference Attribute (type 10)</title>
1178     </sect3>
1179     <para>
1180      Zebra supports the <literal>searchResult-1</literal> facility.
1181      If the <literal>Term Reference Attribute (type 10)</literal> is
1182      given, it specifies a subqueryId value returned as part of the
1183      search result. It is a way for a client to name an APT part of a
1184      query.
1185     </para>
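         <para>
          A sketch of how this might look, where the subqueryId values
          <literal>t1</literal> and <literal>t2</literal> are hypothetical
          names chosen by the client, one for each APT:
          <screen>
           Z> find @and @attr 10=t1 @attr 1=4 water @attr 10=t2 @attr 1=1016 soil
          </screen>
         </para>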
1186     <!--
1187     <para>
1188      <screen>
1189      </screen>
1190     </para>
1191     -->
1192     <warning>
1193      Experimental. Do not use in production code.
1194     </warning>
1195
1196
1197    </sect2>
1198
1199
1200    <sect2 id="querymodel-zebra-attr-scan">
1201     <title>Zebra specific Scan Extensions to all Attribute Sets</title>
1202     <para>
1203      Zebra extends the Bib1 attribute types, and these extensions are
1204      recognized regardless of attribute
1205      set used in a <literal>scan</literal> operation query.
1206     </para>
1207      <table id="querymodel-zebra-attr-scan-table"
1208       frame="all" rowsep="1" colsep="1" align="center">
1209
1210       <caption>Zebra Scan Attribute Extensions</caption>
1211        <thead>
1212         <tr>
1213          <td>Name</td>
1214          <td>Type</td>
1215          <td>Operation</td>
1216          <td>Zebra version</td>
1217         </tr>
1218       </thead>
1219        <tbody>
1220         <tr>
1221          <td>Result Set Narrow</td>
1222          <td>8</td>
1223          <td>scan</td>
1224          <td>1.3</td>
1225         </tr>
1226         <tr>
1227          <td>Approximative Limit</td>
1228          <td>9</td>
1229          <td>scan</td>
1230          <td>1.4</td>
1231         </tr>
1232        </tbody>
1233       </table>
1234
1235     <sect3 id="querymodel-zebra-attr-narrow">
1236      <title>Zebra Extension Result Set Narrow (type 8)</title>
1237     </sect3>
1238     <para>
1239      If attribute <literal>Result Set Narrow (type 8)</literal>
1240      is given for <literal>scan</literal>, the value is the name of a
1241      result set. Each hit count in <literal>scan</literal> is
1242      <literal>@and</literal>'ed with the result set given.
1243     </para>
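         <para>
          A sketch of the intended usage, assuming that the preceding
          search was stored under the result set name <literal>1</literal>
          (the actual name depends on the client):
          <screen>
           Z> find @attr 1=4 amadeus
           Z> scan @attr 1=4 @attr 8=1 mozart
          </screen>
          Here the hit count reported for each scanned title term would be
          restricted to records that are also in result set
          <literal>1</literal>.
         </para>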
1244     <!--
1245     <para>
1246      <screen>
1247      </screen>
1248     </para>
1249     -->
1250     <warning>
1251      Experimental and buggy. Definitely not to be used in production code.
1252     </warning>
1253
1254     <sect3 id="querymodel-zebra-attr-approx">
1255      <title>Zebra Extension Approximative Limit (type 9)</title>
1256     </sect3>
1257     <para>
1258      The <literal>Zebra Extension Approximative Limit (type
1259       9)</literal> enables approximate
1260      hit counts for <literal>scan</literal>, in the same
1261      way as for <literal>search</literal> hit counts.
1262     </para>
1263     <!--
1264     <para>
1265      <screen>
1266      </screen>
1267     </para>
1268     -->
1269     <warning>
1270      Experimental. Do not use in production code.
1271     </warning>
1272
1273
1274    </sect2>
1275
1276
1277    <sect2 id="querymodel-bib1-mapping">
1278     <title>Mapping from Bib1 Attributes to Zebra internal
1279      register indexes</title>
1280     <para>
1281      TO-DO
1282      </para>
1283
1284
1285      <!-- see in util/zebramap.c
1286       int zebra_maps_attr
1287
1288   if (completeness_value == 2 || completeness_value == 3)
1289         *complete_flag = 1;
1290     else
1291         *complete_flag = 0;
1292     *reg_id = 0;
1293
1294     *sort_flag =(sort_relation_value > 0) ? 1 : 0;
1295     *search_type = "phrase";
1296     strcpy(rank_type, "void");
1297     if (relation_value == 102)
1298     {
1299         if (weight_value == -1)
1300             weight_value = 34;
1301         sprintf(rank_type, "rank,w=%d,u=%d", weight_value, use_value);
1302     }
1303     if (relation_value == 103)
1304     {
1305         *search_type = "always";
1306         *reg_id = 'w';
1307         return 0;
1308     }
1309     if (*complete_flag)
1310         *reg_id = 'p';
1311     else
1312         *reg_id = 'w';
1313     switch (structure_value)
1314     {
1315     case 6:   /* word list */
1316         *search_type = "and-list";
1317         break;
1318     case 105: /* free-form-text */
1319         *search_type = "or-list";
1320         break;
1321     case 106: /* document-text */
1322         *search_type = "or-list";
1323         break;
1324     case -1:
1325     case 1:   /* phrase */
1326     case 2:   /* word */
1327     case 108: /* string */
1328         *search_type = "phrase";
1329         break;
1330    case 107: /* local-number */
1331         *search_type = "local";
1332         *reg_id = 0;
1333         break;
1334     case 109: /* numeric string */
1335         *reg_id = 'n';
1336         *search_type = "numeric";
1337         break;
1338     case 104: /* urx */
1339         *reg_id = 'u';
1340         *search_type = "phrase";
1341         break;
1342     case 3:   /* key */
1343         *reg_id = '0';
1344         *search_type = "phrase";
1345         break;
1346     case 4:  /* year */
1347         *reg_id = 'y';
1348         *search_type = "phrase";
1349         break;
1350     case 5:  /* date */
1351         *reg_id = 'd';
1352         *search_type = "phrase";
1353         break;
1354     default:
1355         return -1;
1356     }
1357     return 0;
1358
1359      -->
1360
1361
1362     <para>
1363      <emphasis>Use</emphasis> attributes are interpreted according to the
1364      attribute sets which have been loaded in the
1365     <literal>zebra.cfg</literal> file, and are matched against specific
1366      fields as specified in the <literal>.abs</literal> file which
1367      describes the profile of the records which have been loaded.
1368      If no Use attribute is provided, a default of Bib-1 Any is assumed.
1369     </para>
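         <para>
          For instance, given the default described above, the following
          two queries should produce the same hit set, since
          <literal>1016</literal> is the Bib-1 Use attribute
          <emphasis>Any</emphasis>:
          <screen>
           Z> find water
           Z> find @attr 1=1016 water
          </screen>
         </para>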
1370
1371     <para>
1372      If a <emphasis>Structure</emphasis> attribute of
1373      <emphasis>Phrase</emphasis> is used in conjunction with a
1374      <emphasis>Completeness</emphasis> attribute of
1375      <emphasis>Complete (Sub)field</emphasis>, the term is matched
1376      against the contents of the phrase (long word) register, if one
1377      exists for the given <emphasis>Use</emphasis> attribute.
1378      A phrase register is created for those fields in the
1379      <literal>.abs</literal> file that contain a
1380      <literal>p</literal>-specifier.
1381      <!-- ### whatever the hell _that_ is -->
1382     </para>
1383
1384     <para>
1385      If <emphasis>Structure</emphasis>=<emphasis>Phrase</emphasis> is
1386      used in conjunction with <emphasis>Incomplete Field</emphasis> (the
1387      default value for <emphasis>Completeness</emphasis>), the
1388      search is directed against the normal word registers, but if the term
1389      contains multiple words, the term will only match if all of the words
1390      are found immediately adjacent, and in the given order.
1391      The word search is performed on those fields that are indexed as
1392      type <literal>w</literal> in the <literal>.abs</literal> file.
1393     </para>
1394
1395     <para>
1396      If the <emphasis>Structure</emphasis> attribute is
1397      <emphasis>Word List</emphasis>,
1398      <emphasis>Free-form Text</emphasis>, or
1399      <emphasis>Document Text</emphasis>, the term is treated as a
1400      natural-language, relevance-ranked query.
1401      This search type uses the word register, i.e. those fields
1402      that are indexed as type <literal>w</literal> in the
1403      <literal>.abs</literal> file.
1404     </para>
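         <para>
          As a sketch, the first query below uses
          <emphasis>Word List</emphasis> (6) and the second
          <emphasis>Free-form Text</emphasis> (105). Both are directed at
          the word register of the title field; the search terms are
          illustrative:
          <screen>
           Z> find @attr 1=4 @attr 4=6 "information retrieval"
           Z> find @attr 1=4 @attr 4=105 "information retrieval"
          </screen>
         </para>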
1405
1406     <para>
1407      If the <emphasis>Structure</emphasis> attribute is
1408      <emphasis>Numeric String</emphasis> the term is treated as an integer.
1409      The search is performed on those fields that are indexed
1410      as type <literal>n</literal> in the <literal>.abs</literal> file.
1411     </para>
1412
1413     <para>
1414      If the <emphasis>Structure</emphasis> attribute is
1415      <emphasis>URx</emphasis> the term is treated as a URX (URL) entity.
1416      The search is performed on those fields that are indexed as type
1417      <literal>u</literal> in the <literal>.abs</literal> file.
1418     </para>
1419
1420     <para>
1421      If the <emphasis>Structure</emphasis> attribute is
1422      <emphasis>Local Number</emphasis> the term is treated as
1423      a native Zebra Record Identifier.
1424     </para>
1425
1426     <para>
1427      If the <emphasis>Relation</emphasis> attribute is
1428      <emphasis>Equals</emphasis> (default), the term is matched
1429      in a normal fashion (modulo truncation and processing of
1430      individual words, if required).
1431      If <emphasis>Relation</emphasis> is <emphasis>Less Than</emphasis>,
1432      <emphasis>Less Than or Equal</emphasis>,
1433      <emphasis>Greater than</emphasis>, or <emphasis>Greater than or
1434       Equal</emphasis>, the term is assumed to be numerical, and a
1435      standard regular expression is constructed to match the given
1436      expression.
1437      If <emphasis>Relation</emphasis> is <emphasis>Relevance</emphasis>,
1438      the standard natural-language query processor is invoked.
1439     </para>
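         <para>
          For example, assuming a numerically searchable date field, the
          first query below matches dates less than or equal to 1990,
          while the second requests relevance ranking of the title hits:
          <screen>
           Z> find @attr 1=30 @attr 2=2 1990
           Z> find @attr 1=4 @attr 2=102 "information retrieval"
          </screen>
         </para>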
1440
1441     <para>
1442      For the <emphasis>Truncation</emphasis> attribute,
1443      <emphasis>No Truncation</emphasis> is the default.
1444      <emphasis>Left Truncation</emphasis> is not supported.
1445      <emphasis>Process # in search term</emphasis> is supported, as is
1446      <emphasis>Regexp-1</emphasis>.
1447      <emphasis>Regexp-2</emphasis> enables the fault-tolerant (fuzzy)
1448      search. As a default, a single error (deletion, insertion,
1449      replacement) is accepted when terms are matched against the register
1450      contents.
1451     </para>
1452    </sect2>
1453
1454    <sect2  id="querymodel-regular">
1455     <title>Zebra Regular Expressions in Truncation Attribute (type = 5)</title>
1456
1457     <para>
1458      Each term in a query is interpreted as a regular expression if
1459      the truncation value is either <emphasis>Regexp-1 (@attr 5=102)</emphasis>
1460      or <emphasis>Regexp-2 (@attr 5=103)</emphasis>.
1461      Both query types follow the same syntax with the operands:
1462     </para>
1463
1464      <table id="querymodel-regular-operands-table"
1465       frame="all" rowsep="1" colsep="1" align="center">
1466
1467       <caption>Regular Expression Operands</caption>
1468        <!--
1469        <thead>
1470        <tr><td>one</td><td>two</td></tr>
1471       </thead>
1472        -->
1473        <tbody>
1474         <tr>
1475          <td><literal>x</literal></td>
1476          <td>Matches the character <literal>x</literal>.</td>
1477         </tr>
1478         <tr>
1479          <td><literal>.</literal></td>
1480          <td>Matches any character.</td>
1481         </tr>
1482         <tr>
1483          <td><literal>[ .. ]</literal></td>
1484          <td>Matches the set of characters specified;
1485          such as <literal>[abc]</literal> or <literal>[a-c]</literal>.</td>
1486         </tr>
1487        </tbody>
1488       </table>
1489
1490     <para>
1491      The above operands can be combined with the following operators:
1492     </para>
1493
1494      <table id="querymodel-regular-operators-table"
1495       frame="all" rowsep="1" colsep="1" align="center">
1496       <caption>Regular Expression Operators</caption>
1497        <!--
1498        <thead>
1499        <tr><td>one</td><td>two</td></tr>
1500       </thead>
1501        -->
1502        <tbody>
1503         <tr>
1504          <td><literal>x*</literal></td>
1505          <td>Matches <literal>x</literal> zero or more times.
1506           Priority: high.</td>
1507         </tr>
1508         <tr>
1509          <td><literal>x+</literal></td>
1510          <td>Matches <literal>x</literal> one or more times.
1511           Priority: high.</td>
1512         </tr>
1513         <tr>
1514          <td><literal>x?</literal></td>
1515          <td> Matches <literal>x</literal> zero or once.
1516           Priority: high.</td>
1517         </tr>
1518         <tr>
1519          <td><literal>xy</literal></td>
1520          <td> Matches <literal>x</literal>, then <literal>y</literal>.
1521          Priority: medium.</td>
1522         </tr>
1523         <tr>
1524          <td><literal>x|y</literal></td>
1525          <td> Matches either <literal>x</literal> or <literal>y</literal>.
1526          Priority: low.</td>
1527         </tr>
1528         <tr>
1529          <td><literal>( )</literal></td>
1530          <td>The order of evaluation may be changed by using parentheses.</td>
1531         </tr>
1532        </tbody>
1533       </table>
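         <para>
          A few sketches of how operands and operators combine; the
          search terms are illustrative only:
          <screen>
           Z> find @attr 1=4 @attr 5=102 "colou?r"
           Z> find @attr 1=4 @attr 5=102 "gr(a|e)y"
           Z> find @attr 1=4 @attr 5=102 "[a-c].*"
          </screen>
          The first would be expected to match the title terms
          <literal>color</literal> and <literal>colour</literal>, the
          second <literal>gray</literal> and <literal>grey</literal>, and
          the third any term starting with <literal>a</literal>,
          <literal>b</literal> or <literal>c</literal>.
         </para>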
1534
1535     <para>
1536      If the first character of the <literal>Regexp-2</literal> query
1537      is a plus character (<literal>+</literal>) it marks the
1538      beginning of a section with non-standard specifiers.
1539      The next plus character marks the end of the section.
1540      Currently Zebra only supports one specifier, the error tolerance,
1541      which consists of one digit.
1542     </para>
1543
1544     <para>
1545      Since the plus operator is normally a suffix operator, this addition to
1546      the query syntax doesn't violate the syntax for standard regular
1547      expressions.
1548     </para>
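         <para>
          Following the description of the specifier section, a fuzzy
          search that tolerates up to two errors could be sketched as
          follows; the misspelled term is illustrative:
          <screen>
           Z> find @attr 1=4 @attr 5=103 "+2+informaton"
          </screen>
         </para>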
1549
1550     <para>
1551      For example, a phrase search with regular expressions in
1552      the title-register is performed like this:
1553      <screen>
1554       Z> find @attr 1=4 @attr 5=102 "informat.* retrieval"
1555      </screen>
1556     </para>
1557
1558     <para>
1559      Combinations with other attributes are possible. For example, a
1560      ranked search with a regular expression:
1561      <screen>
1562       Z> find @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
1563      </screen>
1564     </para>
1565    </sect2>
1566
1567
1568    <!--
1569    <para>
1570     The RecordType parameter in the <literal>zebra.cfg</literal> file, or
1571     the <literal>-t</literal> option to the indexer tells Zebra how to
1572     process input records.
1573     Two basic types of processing are available - raw text and structured
1574     data. Raw text is just that, and it is selected by providing the
1575     argument <literal>text</literal> to Zebra. Structured records are
1576     all handled internally using the basic mechanisms described in the
1577     subsequent sections.
1578     Zebra can read structured records in many different formats.
1579    </para>
1580    -->
1581   </sect1>
1582
1583
1584   <sect1 id="querymodel-cql-to-pqf">
1585    <title>Server Side CQL to PQF Query Translation</title>
1586    <para>
1587     Using the
1588     <literal>&lt;cql2rpn&gt;l2rpn.txt&lt;/cql2rpn&gt;</literal>
1589       YAZ Frontend Virtual
1590     Hosts option, one can configure
1591     the YAZ Frontend CQL-to-PQF
1592     converter, specifying the interpretation of various
1593     <ulink url="&url.cql;">CQL</ulink>
1594     indexes, relations, etc. in terms of Type-1 query attributes.
1595     <!-- The  yaz-client config file -->
1596    </para>
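        <para>
         A mapping file for the converter might contain entries like the
         following. This is an illustrative fragment only: the index
         names and attribute values shown are assumptions, and a complete
         file needs further entries, such as context set declarations.
         The YAZ manual is the authoritative reference for the syntax.
         <screen>
         <![CDATA[
          index.cql.serverChoice    = 1=1016
          index.dc.title            = 1=4
          relation.eq               = 2=3
          relationModifier.relevant = 2=102
          ]]>
         </screen>
        </para>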
1597    <para>
1598     For example, using server-side CQL-to-PQF conversion, one might
1599     query a Zebra server like this:
1600     <screen>
1601     <![CDATA[
1602      yaz-client localhost:9999
1603      Z> querytype cql
1604      Z> find text=(plant and soil)
1605      ]]>
1606     </screen>
1607      If properly configured, even static relevance ranking can
1608      be performed using CQL query syntax:
1609     <screen>
1610     <![CDATA[
1611      Z> find text = /relevant (plant and soil)
1612      ]]>
1613      </screen>
1614    </para>
1615
1616    <para>
1617     The same configuration can also be used to
1618     search using client-side CQL-to-PQF conversion.
1619     The only differences are <literal>querytype cql2rpn</literal>
1620     instead of
1621     <literal>querytype cql</literal>, and the call specifying a local
1622     conversion file:
1623     <screen>
1624     <![CDATA[
1625      yaz-client -q local/cql2pqf.txt localhost:9999
1626      Z> querytype cql2rpn
1627      Z> find text=(plant and soil)
1628      ]]>
1629      </screen>
1630    </para>
1631
1632    <para>
1633     Exhaustive information can be found in the
1634     section "Specification of CQL to RPN mappings" in the YAZ manual,
1635     <ulink url="http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map">
1636      http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map</ulink>,
1637     and is therefore not repeated here.
1638    </para>
1639   <!--
1640   <para>
1641     See
1642       <ulink url="http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html">
1643       http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html</ulink>
1644     for the Maintenance Agency's work-in-progress mapping of Dublin Core
1645     indexes to Attribute Architecture (util, XD and BIB-2)
1646     attributes.
1647    </para>
1648    -->
1649  </sect1>
1650
1651
1652
1653 </chapter>
1654
1655  <!-- Keep this comment at the end of the file
1656  Local variables:
1657  mode: sgml
1658  sgml-omittag:t
1659  sgml-shorttag:t
1660  sgml-minimize-attributes:nil
1661  sgml-always-quote-attributes:t
1662  sgml-indent-step:1
1663  sgml-indent-data:t
1664  sgml-parent-document: "zebra.xml"
1665  sgml-local-catalogs: nil
1666  sgml-namecase-general:t
1667  End:
1668  -->