doc/querymodel.xml

   1  <chapter id="querymodel">
   2   <!-- $Id: querymodel.xml,v 1.9 2006-06-20 14:20:50 marc Exp $ -->
   3   <title>Query Model</title>
   4
   5   <sect1 id="querymodel-overview">
   6    <title>Query Model Overview</title>
   7
   8
   9    <sect2 id="querymodel-query-languages">
  10     <title>Query Languages</title>
  11
  12     <para>
  13      Zebra was designed as a networked Information Retrieval engine adhering
  14      to the international standards
  15      <ulink url="&url.z39.50;">Z39.50</ulink> and
  16      <ulink url="&url.sru;">SRU</ulink>,
  17      and implements the
  18      <literal>type-1 Reverse Polish Notation (RPN)</literal> query
  19      model defined there.
  20      Unfortunately, this model defines only a binary
  21      encoded representation, which is used as transport packaging in
  22      the Z39.50 protocol layer. This representation is not human
  23      readable, nor does it define any convenient way to specify queries.
  24     </para>
  25     <para>
  26      Since the <literal>type-1 (RPN)</literal>
  27      query structure has no direct, useful string
  28      representation, every origin application needs to provide some
  29      form of mapping from a local query notation or representation to it.
  30      </para>
  31
  32
  33    <sect3 id="querymodel-query-languages-pqf">
  34     <title>Prefix Query Format (PQF)</title>
  35
  36    <para>
  37      Index Data has defined a textual representation, the
  38      <literal>Prefix Query Format</literal>, or
  39      <literal>PQF</literal> for short, which maps
  40       <literal>one-to-one</literal> to binary encoded
  41       <literal>type-1 RPN</literal> query packages.
  42       It has been adopted by other
  43       parties developing Z39.50 software, and is often referred to as
  44      <literal>Prefix Query Notation</literal>, or
  45      <literal>PQN</literal> for short. See
  46      <xref linkend="querymodel-pqf"/> for further explanations and
  47      descriptions of Zebra's capabilities.
  48     </para>
  49    </sect3>
  50
  51    <sect3 id="querymodel-query-languages-cql">
  52     <title>Common Query Language (CQL)</title>
  53      <para>
  54       The <literal>type-1 RPN</literal> query model,
  55       expressed in <literal>PQF/PQN</literal>, is natively supported.
  56       In contrast, the default <literal>SRU</literal>
  57       web service <literal>Common Query Language</literal>
  58      <ulink url="&url.cql;">CQL</ulink> is not natively supported.
  59      </para>
  60      <para>
  61      Zebra can be configured to understand and map CQL to PQF. See
  62      <xref linkend="querymodel-cql-to-pqf"/>.
  63     </para>
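           <para>
            As an illustration only, and assuming a hypothetical mapping
            configuration in which the CQL index <literal>dc.title</literal>
            is bound to the Bib-1 use attribute 4, a CQL query and the kind
            of PQF query it could be translated into might look as follows.
            The attributes actually emitted depend entirely on the
            configured mapping.
            <screen>
             CQL: dc.title = "information retrieval"
             PQF: @attr 1=4 "information retrieval"
            </screen>
           </para>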
  64    </sect3>
  65
  66    </sect2>
  67
  68    <sect2 id="querymodel-operation-types">
  69     <title>Operation types</title>
  70     <para>
  71      Zebra supports all three
  72      <literal>Z39.50/SRU</literal> operations defined in the
  73      standards: <literal>explain</literal>, <literal>search</literal>,
  74      and <literal>scan</literal>. A short description of the
  75      functionality and purpose of each follows.
  76     </para>
  77
  78     <sect3 id="querymodel-operation-type-explain">
  79      <title>Explain Operation</title>
  80      <para>
  81       The <emphasis>syntax</emphasis> of Z39.50/SRU queries is
  82       well known to any client, but the specific
  83       <emphasis>semantics</emphasis> - taking into account a
  84       particular server's functionality and abilities - must be
  85       discovered from case to case. This is the purpose of the
  86       <literal>explain</literal> operation, which provides the means
  87       for learning which
  88       <emphasis>fields</emphasis> (also called
  89       <emphasis>indexes</emphasis> or <emphasis>access points</emphasis>)
  90       are provided, which default parameters the server uses, which
  91       document retrieval formats are defined, and which specific parts
  92       of the general query model are supported.
  93      </para>
  94      <para>
  95       Z39.50 embeds the <literal>explain</literal> operation
  96       by performing a
  97       <literal>search</literal> in the special
  98       <literal>IR-Explain-1</literal> database;
  99       see <xref linkend="querymodel-exp1"/>.
 100      </para>
 101      <para>
 102       In SRU, <literal>explain</literal> is an entirely separate
 103       operation, which returns a <literal>ZeeRex
 104       XML</literal> record according to the
 105       structure defined by the protocol.
 106      </para>
 107      <para>
 108       In both cases, the information gathered through
 109       <literal>explain</literal> operations can be used to
 110       auto-configure a client user interface to the server's
 111       capabilities.
 112      </para>
 113     </sect3>
 114
 115     <sect3 id="querymodel-operation-type-search">
 116      <title>Search Operation</title>
 117      <para>
 118       Search and retrieve interactions are the raison d'être.
 119       They are used to query the remote database and
 120       return search result documents.  Search queries range from
 121       simple free text searches to nested complex boolean queries,
 122       targeting specific indexes, and possibly enhanced with many
 123       query semantic specifications. Search interactions are the heart
 124       and soul of Z39.50/SRU servers.
 125      </para>
 126     </sect3>
 127
 128     <sect3 id="querymodel-operation-type-scan">
 129      <title>Scan Operation</title>
 130      <para>
 131       The <literal>scan</literal> operation is a helper functionality,
 132        which operates on one index or access point at a time.
 133      </para>
 134      <para>
 135       It provides
 136       the means to investigate the content of specific indexes.
 137       Scanning an index returns a handful of terms actually found in
 138       the index, and in addition the <literal>scan</literal>
 139       operation returns the number of documents indexed by each term.
 140       A search client can use this information to propose proper
 141       spelling of search terms, to auto-fill search boxes, or to
 142       display  controlled vocabularies.
 143      </para>
 144     </sect3>
 145
 146    </sect2>
 147
 148  </sect1>
 149
 150
 151   <sect1 id="querymodel-pqf">
 152    <title>Prefix Query Format structure and syntax</title>
 153    <para>
 154     The <ulink url="&url.yaz.pqf;">PQF grammar</ulink>
 155     is documented in the YAZ manual, and shall not be
 156     repeated here. During search, this textual PQF representation
 157     is always mapped to the equivalent Zebra internal
 158     query parse tree.
 159    </para>
 160
 161    <sect2 id="querymodel-pqf-tree">
 162     <title>PQF tree structure</title>
 163     <para>
 164      The PQF parse tree - or the equivalent textual representation -
 165      may start with one specification of the
 166      <emphasis>attribute set</emphasis> used. A query tree
 167      follows, which
 168      consists of <emphasis>atomic query parts (APT)</emphasis> or
 169      <emphasis>named result sets</emphasis>, optionally
 170      paired by <emphasis>boolean binary operators</emphasis>, and
 171      finally  <emphasis>recursively combined </emphasis> into
 172      complex query trees.
 173     </para>
 174
 175     <sect3 id="querymodel-attribute-sets">
 176      <title>Attribute sets</title>
 177      <para>
 178       Attribute sets define the exact meaning and semantics of queries
 179       issued. Zebra comes with some predefined attribute set
 180       definitions; others can easily be defined and added to the
 181       configuration.
 182      </para>
 183
 184
 185      <table id="querymodel-attribute-sets-table"
 186       frame="all" rowsep="1" colsep="1" align="center">
 187
 188       <caption>Attribute sets predefined in Zebra</caption>
 189
 190        <thead>
 191        <tr>
 192          <td>Attribute set</td>
 193          <td>Short hand</td>
 194          <td>Notes</td>
 195          <td>Status</td>
 196         </tr>
 197       </thead>
 198
 199        <tbody>
 200         <tr>
 201          <td><literal>Explain</literal> attribute set</td>
 202          <td><literal>exp-1</literal></td>
 203          <td>Special attribute set used on the special automagic
 204           <literal>IR-Explain-1</literal> database to gain information on
 205           server capabilities, database names, and database
 206           semantics.</td>
 207          <td>predefined</td>
 208         </tr>
 209         <tr>
 210          <td><literal>Bib1</literal> attribute set</td>
 211          <td><literal>bib-1</literal></td>
 212          <td>Standard PQF query language attribute set which defines the
 213           semantics of Z39.50 searching. In addition, all of the
 214           non-use attributes (type 2-9) define the hard-wired
 215           Zebra internal query
 216           processing.</td>
 217          <td>default</td>
 218         </tr>
 219         <tr>
 220          <td><literal>GILS</literal> attribute set</td>
 221          <td><literal>gils</literal></td>
 222          <td>Extension to the <literal>Bib1</literal> attribute set.</td>
 223          <td>predefined</td>
 224         </tr>
 225         <!--
 226         <tr>
 227          <td><literal>IDXPATH</literal> attribute set</td>
 228          <td><literal>idxpath</literal></td>
 229          <td>Hardwired XPATH like attribute set, only available for
 230              indexing with the GRS record model</td>
 231          <td>hardwired</td>
 232         </tr>
 233         -->
 234        </tbody>
 235      </table>
 236     </sect3>
 237
 238     <para>
 239      The use attributes (type 1) of the predefined attribute sets can
 240      be reconfigured by editing the files
 241      <filename>tab/*.att</filename>.
 242      New attribute sets can be defined by adding similar files in the
 243      configuration path of the server.
 244     </para>
 245
 246       <note>
 247        The Zebra internal query processing is modeled after
 248        the <literal>Bib1</literal> attribute set, and the non-use
 249        attributes type 2-6 are hard-wired in. It is therefore essential
 250        to be familiar with <xref linkend="querymodel-bib1-nonuse"/>.
 251       </note>
 252
 253
 254     <sect3 id="querymodel-boolean-operators">
 255      <title>Boolean operators</title>
 256      <para>
 257       A pair of subquery trees, or of atomic queries, is combined
 258       using the standard boolean operators into new query trees.
 259      </para>
 260
 261      <table id="querymodel-boolean-operators-table"
 262       frame="all" rowsep="1" colsep="1" align="center">
 263
 264       <caption>Boolean operators</caption>
 265        <!--
 266        <thead>
 267        <tr><td>one</td><td>two</td></tr>
 268       </thead>
 269        -->
 270        <tbody>
 271         <tr><td><literal>@and</literal></td>
 272          <td>binary <literal>AND</literal> operator</td>
 273          <td>Set intersection of the hit sets of two atomic queries</td>
 274         </tr>
 275         <tr><td><literal>@or</literal></td>
 276          <td>binary <literal>OR</literal> operator</td>
 277          <td>Set union of the hit sets of two atomic queries</td>
 278         </tr>
 279         <tr><td><literal>@not</literal></td>
 280          <td>binary <literal>AND NOT</literal> operator</td>
 281          <td>Set difference of the hit sets of two atomic queries</td>
 282         </tr>
 283         <tr><td><literal>@prox</literal></td>
 284          <td>binary <literal>PROXIMITY</literal> operator</td>
 285          <td>Set intersection of the hit sets of two atomic queries. In
 286           addition, the intersection set is purged of all
 287           documents which do not satisfy the requested query
 288           term proximity. Usually a proper subset of the AND
 289           operation.</td>
 290         </tr>
 291        </tbody>
 292      </table>
 293
 294      <para>
 295       For example, we can combine the terms
 296       <emphasis>information</emphasis> and <emphasis>retrieval</emphasis>
 297       into different searches in the default index of the default
 298       attribute set as follows.
 299       Querying for the union of all documents containing the
 300       terms <emphasis>information</emphasis> OR
 301       <emphasis>retrieval</emphasis>:
 302       <screen>
 303        Z> find @or information retrieval
 304       </screen>
 305      </para>
 306      <para>
 307       Querying for the intersection of all documents containing the
 308       terms <emphasis>information</emphasis> AND
 309       <emphasis>retrieval</emphasis>.
 310       The hit set is a subset of the corresponding
 311       OR query.
 312       <screen>
 313        Z> find @and information retrieval
 314       </screen>
 315      </para>
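           <para>
            By analogy with the examples above, a query for all documents
            containing the term <emphasis>information</emphasis> but NOT
            <emphasis>retrieval</emphasis> might be written as follows.
            This is a sketch only; it assumes the same default index and
            default attribute set as the other examples.
            <screen>
             Z> find @not information retrieval
            </screen>
           </para>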
 316      <para>
 317       Querying for the intersection of all documents containing the
 318       terms <emphasis>information</emphasis> AND
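 319       <emphasis>retrieval</emphasis>, taking proximity into account.
 320       The hit set is a subset of the corresponding
 321       AND query. The PQF grammar expects six proximity parameters (exclusion, distance, ordered, relation, which-code, unit-code); the values below request both terms, in any order, within a distance of at most 3, counted in words.
 322       <screen>
 323        Z> find @prox 0 3 0 2 k 2 information retrieval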
 324       </screen>
 325      </para>
 326      <para>
 327       Querying for the intersection of all documents containing the
 328       terms <emphasis>information</emphasis> AND
 329       <emphasis>retrieval</emphasis>, in the same order and near each
 330       other as described in the term list.
 331       The hit set is a subset of the corresponding
 332       PROXIMITY query.
 333       <screen>
 334        Z> find "information retrieval"
 335       </screen>
 336      </para>
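           <para>
            Because each operator combines two whole subtrees, queries can
            be nested. As a sketch, with the extra terms
            <emphasis>data</emphasis> and <emphasis>mining</emphasis>
            chosen purely for illustration, the following query reads as
            (information AND retrieval) OR (data AND mining):
            <screen>
             Z> find @or @and information retrieval @and data mining
            </screen>
           </para>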
 337     </sect3>
 338
 339
 340     <sect3 id="querymodel-atomic-queries">
 341      <title>Atomic queries (APT)</title>
 342      <para>
 343       Atomic queries are the query parts which work on one access point
 344       only. These consist of <literal>an attribute list</literal>
 345       followed by a <literal>single term</literal> or a
 346       <literal>quoted term list</literal>, and are often called
 347       <emphasis>Attributes-Plus-Terms (APT)</emphasis> queries.
 348      </para>
 349      <para>
 350       Unsupplied non-use attributes type 2-9 are either inherited from
 351       higher nodes in the query tree, or are set to Zebra's default values.
 352       See <xref linkend="querymodel-bib1"/> for details.
 353      </para>
 354
 355      <table id="querymodel-atomic-queries-table"
 356       frame="all" rowsep="1" colsep="1" align="center">
 357
 358       <caption>Atomic queries</caption>
 359        <!--
 360        <thead>
 361        <tr><td>one</td><td>two</td></tr>
 362       </thead>
 363        -->
 364        <tbody>
 365         <tr><td><emphasis>attribute list</emphasis></td>
 366          <td>List of <literal>orthogonal</literal> attributes</td>
 367          <td>Any of the orthogonal attribute types may be omitted;
 368           these are then inherited from higher query tree nodes, or if not
 369           inherited, are set to the default Zebra configuration values.
 370          </td>
 371         </tr>
 372         <tr><td><emphasis>term</emphasis></td>
 373          <td>single <literal>term</literal>
 374           or <literal>quoted term list</literal>   </td>
 375          <td>Here the search terms or list of search terms is added
 376           to the query</td>
 377         </tr>
 378        </tbody>
 379      </table>
 380      <para>
 381       Querying for the term <emphasis>information</emphasis> in the
 382       default index using the default attribute set, the server choice
 383       of access point/index, and the default non-use attributes:
 384       <screen>
 385        Z> find "information"
 386       </screen>
 387      </para>
 388      <para>
 389       Equivalent query fully specified including all default values:
 390       <screen>
 391        Z> find @attrset bib-1 @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1 "information"
 392       </screen>
 393      </para>
 394
 395      <para>
 396       Finding all documents which have empty titles. Notice that the
 397       empty term must be quoted, but is otherwise legal.
 398       <screen>
 399        Z> find @attr 1=4 ""
 400       </screen>
 401      </para>
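           <para>
            Attribute inheritance can be sketched as follows, assuming the
            title index (use attribute 4) from the example above. An
            attribute placed before an operator applies to every atomic
            query below that node, so the two queries are expected to be
            equivalent:
            <screen>
             Z> find @attr 1=4 @and information retrieval
             Z> find @and @attr 1=4 information @attr 1=4 retrieval
            </screen>
           </para>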
 402
 403     </sect3>
 404
 405
 406     <sect3 id="querymodel-resultset">
 407      <title>Named Result Sets</title>
 408      <para>
 409       Named result sets are supported in Zebra, and result sets can be
 410       used as operands without limitations.
 411      </para>
 412      <para>
 413       After the execution of a search, the result set is available at
 414       the server, such that the client can use it for subsequent
 415       searches or retrieval requests. The Z39.50 standard actually
 416       stresses the fact that result sets are volatile. A result set may cease
 417       to exist at any point in time after the search, in which case the server will
 418       send a diagnostic to the effect that the requested
 419       result set does not exist any more.
 420      </para>
 421
 422      <para>
 423       Defining a named result set and re-using it in the next query,
 424       using <literal>yaz-client</literal>.
 425       <screen>
 426        Z> f @attr 1=4 mozart
 427        ...
 428        Number of hits: 43, setno 1
 429        ...
 430        Z> f @and @set 1 @attr 1=4 amadeus
 431        ...
 432        Number of hits: 14, setno 2
 433        ...
 434        Z> f @attr 1=1016 beethoven
 435        ...
 436        Number of hits: 26, setno 3
 437        ...
 438       </screen>
 439      </para>
 440
 441      <note>
 442       Named result sets are only supported by the Z39.50 protocol.
 443       The SRU web service is stateless, and therefore the notion of
 444       named result sets does not exist when accessing a Zebra server by
 445       the SRU protocol.
 446      </note>
 447     </sect3>
 448
 449
 450     <sect3 id="querymodel-use-string">
 451      <title>Zebra's special use attribute type 1 of form 'string'</title>
 452      <para>
 453       The numeric <literal>use (type 1)</literal> attribute is usually
 454       referred to from a given
 455       attribute set. In addition, Zebra lets you use
 456       <emphasis>any internal index
 457        name defined in your configuration</emphasis>
 458       as use attribute value. This is a great feature for
 459       debugging, and when you do
 460       not need the complexity of defined use attribute values. It is
 461       the preferred way of accessing Zebra indexes directly.
 462      </para>
 463      <para>
 464       Finding all documents which have the term list "information
 465       retrieval" in a Zebra index, using its internal full string name.
 466       <screen>
 467        Z> find @attr 1=sometext "information retrieval"
 468       </screen>
 469      </para>
 470      <para>
 471       Searching the bib-1 use attribute 54 using its string name:
 472       <screen>
 473        Z> find @attr 1=Code-language eng
 474       </screen>
 475      </para>
 476      <para>
 477       Searching in any silly string index - if it's defined in your
 478       indexing rules and can be parsed by the PQF parser.
 479       This is definitely not the recommended use of
 480       this facility, as it might confuse your users with some very
 481       unexpected results.
 482       <screen>
 483        Z> find @attr 1=silly/xpath/alike[@index]/name "information retrieval"
 484       </screen>
 485      </para>
 486      <para>
 487       See <xref linkend="querymodel-bib1-mapping"/> for details, and
 488       <xref linkend="server-sru"/>
 489       for the SRU PQF query extension using string names as a fast
 490       debugging facility.
 491      </para>
 492     </sect3>
 493
 494     <sect3 id="querymodel-use-xpath">
 495      <title>Zebra's special use attribute type 1 of form 'XPath'
 496       for GRS filters</title>
 497      <para>
 498       As we have seen above, it is possible (albeit seldom a great
 499       idea) to emulate
 500       <ulink url="http://www.w3.org/TR/xpath">XPath 1.0</ulink> based
 501       search by defining <literal>use (type 1)</literal>
 502       <emphasis>string</emphasis> attributes which in appearance
 503       <emphasis>resemble XPath queries</emphasis>. There are two
 504       problems with this approach: first, the XPath-look-alike has to
 505       be defined at indexing time, so no new, undefined
 506       XPath queries can be entered at search time, and second, it might
 507       confuse users very much that an XPath-alike index name in fact
 508       gets populated from a possibly entirely different XML element
 509       than it pretends to access.
 510      </para>
 511      <para>
 512       When using the <literal>GRS Record Model</literal>
 513       (see  <xref linkend="record-model-grs"/>), we have the
 514       possibility to embed <emphasis>live</emphasis>
 515       XPath expressions
 516       in the PQF queries, which are here called
 517       <literal>use (type 1)</literal> <emphasis>xpath</emphasis>
 518       attributes. You must set the
 519       <literal>xpath enable</literal> directive in your
 520       <literal>.abs</literal> config files.
 521      </para>
 522      <note>
 523       Only a <emphasis>very</emphasis> restricted subset of the
 524       <ulink url="http://www.w3.org/TR/xpath">XPath 1.0</ulink>
 525       standard is supported as the GRS record model is simpler than
 526       a full XML DOM structure. See the following examples for
 527       possibilities.
 528      </note>
 529      <para>
 530       Finding all documents which have the term "content"
 531       inside a text node found in a specific XML DOM
 532       <emphasis>subtree</emphasis>, whose starting element is
 533       addressed by XPath.
 534       <screen>
 535        Z> find @attr 1=/root content
 536        Z> find @attr 1=/root/first content
 537       </screen>
 538       <emphasis>Notice that the
 539        XPath must be absolute, i.e., must start with '/', and that the
 540        XPath <literal>descendant-or-self</literal> axis followed by a
 541        text node selection <literal>text()</literal> is implicitly
 542        appended to the stated XPath.
 543       </emphasis>
 544       It follows that the above searches are interpreted as:
 545       <screen>
 546        Z> find @attr 1=/root//text() content
 547        Z> find @attr 1=/root/first//text() content
 548       </screen>
 549      </para>
 550
 551      <para>
 552       The addressing XPath can be filtered by a predicate working on exact
 553       string values in
 554       attributes (in the XML sense). The following query returns all documents which
 555       have the term "english" contained in one of the text subnodes of
 556       the subtree defined by the XPath
 557       <literal>/record/title[@lang='en']</literal>:
 558       <screen>
 559        Z> find @attr 1=/record/title[@lang='en'] english
 560       </screen>
 561      </para>
 562
 563      <para>
 564       Combining numeric indexes, boolean expressions,
 565       and xpath based searches is possible:
 566       <screen>
 567        Z> find @attr 1=/record/title @and foo bar
 568        Z> find @and @attr 1=/record/title foo @attr 1=4 bar
 569       </screen>
 570      </para>
 571      <para>
 572       Escaping PQF keywords and other non-parseable XPath constructs
 573       with <literal>'{ }'</literal> to prevent syntax errors:
 574       <screen>
 575        Z> find @attr {1=/root/first[@attr='danish']} content
 576        Z> find @attr {1=/root/second[@attr='danish lake']} content
 577        Z> find @attr {1=/root/third[@attr='dansk s\xc3\xb8']} content
 578       </screen>
 579      </para>
 580      <warning>
 581       It is worth mentioning that these dynamically performed XPath
 582       queries are a performance bottleneck, as no optimized
 583       specialized indexes can be used. Therefore, avoid the use of
 584       this facility when speed is essential, and the database content
 585       size is medium to large.
 586      </warning>
 587
 588      <!--
 589
 590      Shall I document the special 'ixpath' attribute set ?? Marc
 591       X-Path searching
 592
 593      Search for all documents with specific path.
 594
 595      For path /c1/c2/.../cn use @attr idxpath 1=1 @attr 4=3 cn/cn-1/../c1/
 596      Specifically for /c, use @attr idxpath 1=1 @attr 4=3 c/
 597
 598      Search for CDATA in elememts
 599
 600      @attr idxpath 1=1016 text
 601
 602      Search for CDATA in attributes
 603
 604      @attr idxpath 1=1015 text
 605
 606      Search for all documents with given attribute type
 607
 608      @attr idxpath 1=3 @attr 4=3 type
 609      -->
 610
 611     </sect3>
 612
 613    </sect2>
 614
 615    <sect2 id="querymodel-exp1">
 616     <title>Explain Attribute Set</title>
 617     <para>
 618      The Z39.50 standard defines the
 619      <ulink url="&url.z39.50.explain;">Explain</ulink> attribute set
 620      <literal>exp-1</literal>, which is used to discover information
 621      about a server's search semantics and functional capabilities.
 622      Zebra exposes a  "classic"
 623      Explain database by base name <literal>IR-Explain-1</literal>, which
 624      is populated with system internal information.
 625     </para>
 626    <para>
 627      The attribute-set <literal>exp-1</literal> consists of a single
 628      <literal>Use (type 1)</literal> attribute.
 629     </para>
 630     <para>
 631      In addition, the non-Use
 632      <literal>bib-1</literal> attributes, that is, the types
 633      <literal>Relation</literal>, <literal>Position</literal>,
 634      <literal>Structure</literal>, <literal>Truncation</literal>,
 635      and <literal>Completeness</literal> are imported from
 636      the <literal>bib-1</literal> attribute set, and may be used
 637      within any explain query.
 638     </para>
 639
 640     <sect3 id="querymodel-exp1-use">
 641     <title>Use Attributes (type = 1)</title>
 642      <para>
 643       The following Explain search attributes are supported:
 644       <literal>ExplainCategory</literal> (@attr 1=1),
 645       <literal>DatabaseName</literal> (@attr 1=3),
 646       <literal>DateAdded</literal> (@attr 1=9),
 647       <literal>DateChanged</literal> (@attr 1=10).
 648      </para>
 649      <para>
 650       A search in the use attribute  <literal>ExplainCategory</literal>
 651       supports only these predefined values:
 652       <literal>CategoryList</literal>, <literal>TargetInfo</literal>,
 653       <literal>DatabaseInfo</literal>, <literal>AttributeDetails</literal>.
 654      </para>
 655      <para>
 656       See <filename>tab/explain.att</filename> and the
 657       <ulink url="&url.z39.50;">Z39.50</ulink> standard
 658       for more information.
 659      </para>
 660     </sect3>
 661
 662     <sect3>
 663      <title>Explain searches with yaz-client</title>
 664      <para>
 665       Classic Explain only defines retrieval of Explain information
 666       via ASN.1. Practically no Z39.50 client supports this. Fortunately
 667       they don't have to - Zebra allows retrieval of this information
 668       in other formats:
 669       <literal>SUTRS</literal>, <literal>XML</literal>,
 670       <literal>GRS-1</literal> and  <literal>ASN.1</literal> Explain.
 671      </para>
 672
 673      <para>
 674       List supported categories to find out which explain commands are
 675       supported:
 676       <screen>
 677        Z> base IR-Explain-1
 678        Z> find @attr exp1 1=1 categorylist
 679        Z> form sutrs
 680        Z> show 1+2
 681       </screen>
 682      </para>
 683
 684      <para>
 685       Get target info, that is, investigate which databases exist at
 686       this server endpoint:
 687       <screen>
 688        Z> base IR-Explain-1
 689        Z> find @attr exp1 1=1 targetinfo
 690        Z> form xml
 691        Z> show 1+1
 692        Z> form grs-1
 693        Z> show 1+1
 694        Z> form sutrs
 695        Z> show 1+1
 696       </screen>
 697      </para>
 698
 699      <para>
 700       List all supported databases. The number of hits
 701       is the number of databases found, which most commonly are the
 702       following two:
 703       the <literal>Default</literal> and the
 704       <literal>IR-Explain-1</literal> databases.
 705       <screen>
 706        Z> base IR-Explain-1
 707        Z> find @attr exp1 1=1 databaseinfo
 708        Z> form sutrs
 709        Z> show 1+2
 710       </screen>
 711      </para>
 712
 713      <para>
 714       Get database info record for database <literal>Default</literal>.
 715       <screen>
 716        Z> base IR-Explain-1
 717        Z> find @and @attr exp1 1=1 databaseinfo @attr exp1 1=3 Default
 718       </screen>
 719       Identical query with explicitly specified attribute set:
 720       <screen>
 721        Z> base IR-Explain-1
 722        Z> find @attrset exp1 @and @attr 1=1 databaseinfo @attr 1=3 Default
 723       </screen>
 724      </para>
 725
 726      <para>
 727       Get attribute details record for database
 728       <literal>Default</literal>.
 729       This query is very useful to study the internal Zebra indexes.
 730       If records have been indexed using the <literal>alvis</literal>
 731       XSLT filter, the string representation names of the known indexes can be
 732       found.
 733       <screen>
 734        Z> base IR-Explain-1
 735        Z> find @and @attr exp1 1=1 attributedetails @attr exp1 1=3 Default
 736       </screen>
 737       Identical query with explicitly specified attribute set:
 738       <screen>
 739        Z> base IR-Explain-1
 740        Z> find @attrset exp1 @and @attr 1=1 attributedetails @attr 1=3 Default
 741       </screen>
 742      </para>
 743     </sect3>
 744
 745    </sect2>
 746
 747    <sect2 id="querymodel-bib1">
 748     <title>Bib1 Attribute Set</title>
 749     <para>
 750      Most of the information contained in this section is an excerpt of
 751      the <literal>ATTRIBUTE SET BIB-1 (Z39.50-1995)
 752       SEMANTICS</literal>,
 753      found at <ulink url="&url.z39.50.attset.bib1.1995;">The BIB-1
 754       Attribute Set Semantics</ulink> from 1995, also available in an updated
 755      <ulink url="&url.z39.50.attset.bib1;">Bib-1
 756       Attribute Set</ulink>
 757      version from 2003. Index Data is not the copyright holder of this
 758      information, except for the configuration details, the listing of
 759      Zebra's capabilities, and the example queries.
 760     </para>
 761
 762
 763    <sect3 id="querymodel-bib1-use">
 764      <title>Use Attributes (type 1)</title>
 765
 766     <para>
 767      A use attribute specifies an access point for any atomic query.
 768      These access points are highly dependent on the attribute set used
 769      in the query, and are user configurable using the following
 770      default configuration files:
 771      <filename>tab/bib1.att</filename>,
 772      <filename>tab/dan1.att</filename>,
 773      <filename>tab/explain.att</filename>, and
 774      <filename>tab/gils.att</filename>.
 775      New attribute sets can be added by adding new
 776      <filename>tab/*.att</filename> configuration files, which need to
 777      be sourced in the main configuration <filename>zebra.cfg</filename>.
 778      </para>
 779
 780     <para>
 781      In addition, Zebra allows access to
 782      <emphasis>internal index names</emphasis> and <emphasis>dynamic
 783      XPath</emphasis> as use attributes.
 784      See  <xref linkend="querymodel-use-string"/> and
 785      <xref linkend="querymodel-use-xpath"/> for
 786      alternative access to the Zebra internal index names and XPath queries.
 787     </para>
 788
 789     <para>
 790      Phrase search for <emphasis>information retrieval</emphasis> in
 791      the title-register:
 792      <screen>
 793       Z> find @attr 1=4 "information retrieval"
 794      </screen>
 795     </para>
 796     </sect3>
 797
 798    </sect2>
 799
 800
 801    <sect2 id="querymodel-bib1-nonuse">
 802      <title>Zebra general Bib1 Non-Use Attributes (type 2-6)</title>
 803
 804     <sect3 id="querymodel-bib1-relation">
 805      <title>Relation Attributes (type 2)</title>
 806
 807      <para>
 808       Relation attributes describe the relationship of the access
 809       point (left side
 810       of the relation) to the search term as qualified by the attributes (right
 811       side of the relation), e.g., Date-publication &lt;= 1975.
 812       </para>
 813
 814      <table id="querymodel-bib1-relation-table"
 815       frame="all" rowsep="1" colsep="1" align="center">
 816
 817       <caption>Relation Attributes (type 2)</caption>
 818       <thead>
 819         <tr>
 820          <td>Relation</td>
 821          <td>Value</td>
 822          <td>Notes</td>
 823         </tr>
 824        </thead>
 825        <tbody>
 826         <tr>
 827          <td> Less than</td>
 828          <td>1</td>
 829          <td>supported</td>
 830         </tr>
 831         <tr>
 832          <td>Less than or equal</td>
 833          <td>2</td>
 834          <td>supported</td>
 835         </tr>
 836         <tr>
 837          <td>Equal</td>
 838          <td>3</td>
 839          <td>default</td>
 840         </tr>
 841         <tr>
 842          <td>Greater or equal</td>
 843          <td>4</td>
 844          <td>supported</td>
 845         </tr>
 846         <tr>
 847          <td>Greater than</td>
 848          <td>5</td>
 849          <td>supported</td>
 850         </tr>
 851         <tr>
 852          <td>Not equal</td>
 853          <td>6</td>
 854          <td>unsupported</td>
 855         </tr>
 856         <tr>
 857          <td>Phonetic</td>
 858          <td>100</td>
 859          <td>unsupported</td>
 860         </tr>
 861         <tr>
 862          <td>Stem</td>
 863          <td>101</td>
 864          <td>unsupported</td>
 865         </tr>
 866         <tr>
 867          <td>Relevance</td>
 868          <td>102</td>
 869          <td>supported</td>
 870         </tr>
 871         <tr>
 872          <td>AlwaysMatches</td>
 873          <td>103</td>
 874          <td>unsupported</td>
 875         </tr>
 876        </tbody>
 877      </table>
 878
 879      <para>
 880       The relation attribute
 881       <literal>relevance (102)</literal> is supported, see
 882       <xref linkend="administration-ranking"/> for full information.
 883       <!-- always-matches (103) not supported for all indexes -->
 884      </para>
 885
 886     <para>
 887      All ordering operations are based on a lexicographical ordering,
 888      <emphasis>except</emphasis> when the
 889      <literal>structure attribute numeric (109)</literal> is used. In
 890      this case, ordering is numerical. See
 891       <xref linkend="querymodel-bib1-structure"/>.
 892     </para>
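     <para>
      As a sketch of an ordering relation, the following query asks for
      records with a publication date less than or equal to 1975. It
      assumes that the Bib-1 use attribute 31 (Date of publication) is
      indexed in the local configuration, which may not be the case:
      <screen>
       Z> find @attr 1=31 @attr 2=2 1975
      </screen>
      Unless the <literal>structure attribute numeric (109)</literal>
      is added, the comparison is lexicographical.
     </para>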
 893
 894      <para>
 895      Ranked search for <emphasis>information retrieval</emphasis> in
 896      the title-register:
 897      <screen>
 898       Z> find @attr 1=4 @attr 2=102 "information retrieval"
 899      </screen>
 900     </para>
 901     </sect3>
 902
 903     <sect3 id="querymodel-bib1-position">
 904      <title>Position Attributes (type 3)</title>
 905
 906      <para>
 907       The position attribute specifies the location of the search term
 908       within the field or subfield in which it appears.
 909      </para>
 910
 911      <table id="querymodel-bib1-position-table"
 912       frame="all" rowsep="1" colsep="1" align="center">
 913
 914       <caption>Position Attributes (type 3)</caption>
 915       <thead>
 916         <tr>
 917          <td>Position</td>
 918          <td>Value</td>
 919          <td>Notes</td>
 920         </tr>
 921        </thead>
 922        <tbody>
 923         <tr>
 924          <td>First in field </td>
 925          <td>1</td>
 926          <td>unsupported</td>
 927         </tr>
 928         <tr>
 929          <td>First in subfield</td>
 930          <td>2</td>
 931          <td>unsupported</td>
 932         </tr>
 933         <tr>
 934          <td>Any position in field</td>
 935          <td>3</td>
 936          <td>default</td>
 937         </tr>
 938        </tbody>
 939      </table>
 940
 941     <para>
 942       The position attribute values <literal>first in field (1)</literal>
 943       and <literal>first in subfield (2)</literal> are unsupported.
 944       Using them does not trigger an error; Zebra silently defaults to
 945       <literal>any position in field (3)</literal>.
 946       <!-- It should -->
 947       </para>
 948     </sect3>
 949
 950     <sect3 id="querymodel-bib1-structure">
 951      <title>Structure Attributes (type 4)</title>
 952
 953      <para>
 954       The structure attribute specifies the type of search
 955       term. This causes the search to be mapped on
 956       different Zebra internal indexes, which must have been defined
 957       at index time.
 958      </para>
 959
 960      <para>
 961       The possible values of the
 962       <literal>structure attribute (type 4)</literal> can be defined
 963       using the configuration file
 964       <filename>tab/default.idx</filename>.
 965       The default configuration is summarized in this table.
 966      </para>
 967
 968      <table id="querymodel-bib1-structure-table"
 969       frame="all" rowsep="1" colsep="1" align="center">
 970
 971       <caption>Structure Attributes (type 4)</caption>
 972       <thead>
 973         <tr>
 974          <td>Structure</td>
 975          <td>Value</td>
 976          <td>Notes</td>
 977         </tr>
 978        </thead>
 979        <tbody>
 980         <tr>
 981          <td>Phrase </td>
 982          <td>1</td>
 983          <td>default</td>
 984         </tr>
 985         <tr>
 986          <td>Word</td>
 987          <td>2</td>
 988          <td>supported</td>
 989         </tr>
 990         <tr>
 991          <td>Key</td>
 992          <td>3</td>
 993          <td>supported</td>
 994         </tr>
 995         <tr>
 996          <td>Year</td>
 997          <td>4</td>
 998          <td>supported</td>
 999         </tr>
1000         <tr>
1001          <td>Date (normalized)</td>
1002          <td>5</td>
1003          <td>supported</td>
1004         </tr>
1005         <tr>
1006          <td>Word list</td>
1007          <td>6</td>
1008          <td>supported</td>
1009         </tr>
1010         <tr>
1011          <td>Date (un-normalized)</td>
1012          <td>100</td>
1013          <td>unsupported</td>
1014         </tr>
1015         <tr>
1016          <td>Name (normalized) </td>
1017          <td>101</td>
1018          <td>unsupported</td>
1019         </tr>
1020         <tr>
1021          <td>Name (un-normalized) </td>
1022          <td>102</td>
1023          <td>unsupported</td>
1024         </tr>
1025         <tr>
1026          <td>Structure</td>
1027          <td>103</td>
1028          <td>unsupported</td>
1029         </tr>
1030         <tr>
1031          <td>Urx</td>
1032          <td>104</td>
1033          <td>supported</td>
1034         </tr>
1035         <tr>
1036          <td>Free-form-text</td>
1037          <td>105</td>
1038          <td>supported</td>
1039         </tr>
1040         <tr>
1041          <td>Document-text</td>
1042          <td>106</td>
1043          <td>supported</td>
1044         </tr>
1045         <tr>
1046          <td>Local-number</td>
1047          <td>107</td>
1048          <td>supported</td>
1049         </tr>
1050         <tr>
1051          <td>String</td>
1052          <td>108</td>
1053          <td>unsupported</td>
1054         </tr>
1055         <tr>
1056          <td>Numeric string</td>
1057          <td>109</td>
1058          <td>supported</td>
1059         </tr>
1060        </tbody>
1061      </table>
1062     </sect3>
1063
1064     <para>
1065      The structure attribute value <literal>local-number
1066       (107)</literal>
1067      is supported, and always maps to the Zebra internal document ID.
1068      </para>
1069
1070     <para>
1071      For example, in
1072      the GILS schema (<literal>gils.abs</literal>), the
1073      west-bounding-coordinate is indexed as type <literal>n</literal>,
1074      and is therefore searched by specifying
1075      <emphasis>structure</emphasis>=<emphasis>Numeric String</emphasis>.
1076      To match all those records with west-bounding-coordinate greater
1077      than -114 we use the following query:
1078      <screen>
1079       Z> find @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
1080      </screen>
1081     </para>
1082
1083     <sect3 id="querymodel-bib1-truncation">
1084      <title>Truncation Attributes (type 5)</title>
1085
1086      <para>
1087       The truncation attribute specifies whether variations of one or
1088       more characters are allowed between search term and hit terms, or
1089       not. Using non-default truncation attributes will broaden the
1090       document hit set of a search query.
1091      </para>
1092
1093      <table id="querymodel-bib1-truncation-table"
1094       frame="all" rowsep="1" colsep="1" align="center">
1095
1096       <caption>Truncation Attributes (type 5)</caption>
1097       <thead>
1098         <tr>
1099          <td>Truncation</td>
1100          <td>Value</td>
1101          <td>Notes</td>
1102         </tr>
1103        </thead>
1104        <tbody>
1105         <tr>
1106          <td>Right truncation </td>
1107          <td>1</td>
1108          <td>supported</td>
1109         </tr>
1110         <tr>
1111          <td>Left truncation</td>
1112          <td>2</td>
1113          <td>supported</td>
1114         </tr>
1115         <tr>
1116          <td>Left and right truncation</td>
1117          <td>3</td>
1118          <td>supported</td>
1119         </tr>
1120         <tr>
1121          <td>Do not truncate</td>
1122          <td>100</td>
1123          <td>default</td>
1124         </tr>
1125         <tr>
1126          <td>Process # in search term</td>
1127          <td>101</td>
1128          <td>supported</td>
1129         </tr>
1130         <tr>
1131          <td>RegExpr-1 </td>
1132          <td>102</td>
1133          <td>supported</td>
1134         </tr>
1135         <tr>
1136          <td>RegExpr-2</td>
1137          <td>103</td>
1138          <td>supported</td>
1139         </tr>
1140        </tbody>
1141      </table>
1142
1143      <para>
1144       Truncation attribute value
1145       <literal>Process # in search term (101)</literal> is a
1146       poor-man's regular expression search. It maps
1147       each <literal>#</literal> to <literal>.*</literal>, and
1148       then performs a <literal>Regexp-1 (102)</literal> regular
1149       expression search.
1150      </para>
1151      <para>
1152       Truncation attribute value
1153        <literal>Regexp-1 (102)</literal> is a normal regular expression search,
1154       see <xref linkend="querymodel-regular"/>.
1155      </para>
1156      <para>
1157        Truncation attribute value
1158       <literal>Regexp-2 (103)</literal> is a Zebra-specific extension
1159       which allows <emphasis>fuzzy</emphasis> matches. One single
1160       error in spelling of search terms is allowed, i.e., a document
1161       is hit if it includes a term which can be mapped to the used
1162       search term by one character substitution, addition, deletion or
1163       change of position.
1164       </para>
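      <para>
       The following queries sketch three of the truncation values
       listed in the table, all searching the title register. The
       search terms are made up for illustration, and the hits depend
       on the indexed records:
       <screen>
        Z> find @attr 1=4 @attr 5=1 inform
        Z> find @attr 1=4 @attr 5=101 "inform#n"
        Z> find @attr 1=4 @attr 5=103 informaton
       </screen>
       The first query is right truncated, the second has its
       <literal>#</literal> mapped to <literal>.*</literal> before a
       regular expression search, and the third tolerates one spelling
       error in the term.
      </para>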
1165       <!--
1166       Special 104, 105, 106 are deprecated and will be removed! -->
1167     </sect3>
1168
1169     <sect3 id="querymodel-bib1-completeness">
1170     <title>Completeness Attributes (type 6)</title>
1171      <para>
1172       This attribute is only used if register type <literal>w</literal>
1173       or <literal>p</literal> is to be chosen; completeness is ignored
1174       in all other cases.
1175       <literal>Incomplete field (1)</literal> is the default and makes Zebra use
1176       register type <literal>w</literal>.
1177       <literal>Complete subfield (2)</literal> and
1178       <literal>complete field (3)</literal> both trigger register type <literal>p</literal>.
1179      </para>
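      <para>
       As a sketch, and assuming that a phrase register
       (type <literal>p</literal>) exists for the title index, the
       following query matches the term against complete title fields
       instead of individual words:
       <screen>
        Z> find @attr 1=4 @attr 6=3 "information retrieval"
       </screen>
      </para>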
1180     </sect3>
1181    </sect2>
1182
1183
1184    <sect2 id="querymodel-zebra-attr-search">
1185     <title>Zebra specific Search Extensions to all Attribute Sets</title>
1186     <para>
1187      Zebra extends the Bib1 attribute types, and these extensions are
1188      recognized regardless of attribute
1189      set used in a <literal>search</literal> operation query.
1190     </para>
1191
1192      <table id="querymodel-zebra-attr-search-table"
1193       frame="all" rowsep="1" colsep="1" align="center">
1194
1195       <caption>Zebra Search Attribute Extensions</caption>
1196        <thead>
1197         <tr>
1198          <td>Name</td>
1199          <td>Value</td>
1200          <td>Operation</td>
1201          <td>Zebra version</td>
1202         </tr>
1203       </thead>
1204        <tbody>
1205         <tr>
1206          <td>Embedded Sort</td>
1207          <td>7</td>
1208          <td>search</td>
1209          <td>1.1</td>
1210         </tr>
1211         <tr>
1212          <td>Term Set</td>
1213          <td>8</td>
1214          <td>search</td>
1215          <td>1.1</td>
1216         </tr>
1217         <tr>
1218          <td>Rank Weight</td>
1219          <td>9</td>
1220          <td>search</td>
1221          <td>1.1</td>
1222         </tr>
1223         <tr>
1224          <td>Approx Limit</td>
1225          <td>9</td>
1226          <td>search</td>
1227          <td>1.4</td>
1228         </tr>
1229         <tr>
1230          <td>Term Reference</td>
1231          <td>10</td>
1232          <td>search</td>
1233          <td>1.4</td>
1234         </tr>
1235        </tbody>
1236       </table>
1237
1238     <sect3 id="querymodel-zebra-attr-sorting">
1239      <title>Zebra Extension Embedded Sort Attribute (type 7)</title>
1240     </sect3>
1241     <para>
1242      The embedded sort is a way to specify sorting within a query, thus
1243      removing the need to send a Sort Request separately. It is
1244      faster and does not require clients to deal with the Sort
1245      Facility.
1246     </para>
1247     <para>
1248      The possible values after attribute <literal>type 7</literal> are
1249      <literal>1</literal> ascending and
1250      <literal>2</literal> descending.
1251      The attributes+term (APT) node is separate from the
1252      rest and must be <literal>@or</literal>'ed.
1253      The term associated with APT is the sorting level in integers,
1254      where <literal>0</literal> means primary sort,
1255      <literal>1</literal> means secondary sort, and so forth.
1256      See also <xref linkend="administration-ranking"/>.
1257     </para>
1258     <para>
1259      For example, searching for water, sort by title (ascending)
1260      <screen>
1261       Z> find @or @attr 1=1016 water @attr 7=1 @attr 1=4 0
1262      </screen>
1263     </para>
1264     <para>
1265      Or, searching for water, sort by title ascending, then date descending
1266      <screen>
1267       Z> find @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1
1268      </screen>
1269     </para>
1270
1271     <sect3 id="querymodel-zebra-attr-estimation">
1272      <title>Zebra Extension Term Set Attribute (type 8)</title>
1273     </sect3>
1274     <para>
1275      The Term Set feature is a facility that allows a search to store
1276      hitting terms in a "pseudo" result set; thus a search (as usual) plus
1277      a scan-like facility. It requires a client that can do named result
1278      sets, since the search generates two result sets. The value for
1279      attribute 8 is the name of a result set (string). The terms in
1280      the named term set are returned as SUTRS records.
1281     </para>
1282     <para>
1283      For example, searching  for u in title, right truncated, and
1284      storing the result in term set named 'aset'
1285      <screen>
1286       Z> find @attr 5=1 @attr 1=4 @attr 8=aset u
1287      </screen>
1288     </para>
1289     <warning>
1290      The model has one serious flaw: we don't know the size of the term
1291      set. This feature is experimental. Do not use it in production code.
1292     </warning>
1293
1294     <sect3 id="querymodel-zebra-attr-weight">
1295      <title>Zebra Extension Rank Weight Attribute (type 9)</title>
1296     </sect3>
1297     <para>
1298      Rank weight is a way to pass a value to a ranking algorithm, so
1299      that one APT has one value while another has a different one.
1300      See also <xref linkend="administration-ranking"/>.
1301     </para>
1302     <para>
1303      For example, searching  for utah in title with weight 30 as well
1304      as any with weight 20:
1305      <screen>
1306       Z> find @attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 utah
1307      </screen>
1308     </para>
1309
1310     <sect3 id="querymodel-zebra-attr-limit">
1311      <title>Zebra Extension Approximative Limit Attribute (type 9)</title>
1312     </sect3>
1313     <para>
1314      Newer Zebra versions normally estimate the hit count for every APT
1315      (leaf) in the query tree. These hit counts are returned as part of
1316      the searchResult-1 facility in the binary encoded Z39.50 search
1317      response packages.
1318     </para>
1319     <para>
1320      By setting a limit for the APT we can make Zebra switch to an
1321      approximate hit count when a certain hit count limit is
1322      reached. A value of zero means exact hit count.
1323     </para>
1324     <para>
1325      For example, we might be interested in exact hit count for a, but
1326      for b we allow hit count estimates for 1000 and higher.
1327      <screen>
1328       Z> find @and a @attr 9=1000 b
1329      </screen>
1330     </para>
1331     <note>
1332      The estimated hit count facility makes searches faster, as one
1333      only needs to process large hit lists partially.
1334     </note>
1335     <warning>
1336      This facility clashes with rank weight, because there all
1337      documents in the hit lists need to be examined for scoring and
1338      re-sorting.
1339      It is an experimental
1340      extension. Do not use in production code.
1341     </warning>
1342
1343     <sect3 id="querymodel-zebra-attr-termref">
1344      <title>Zebra Extension Term Reference Attribute (type 10)</title>
1345     </sect3>
1346     <para>
1347      Zebra supports the <literal>searchResult-1</literal> facility.
1348      If the <literal>Term Reference Attribute (type 10)</literal> is
1349      given, it specifies a subqueryId value that is returned as part of the
1350      search result. It is a way for a client to name an APT part of a
1351      query.
1352     </para>
1353     <!--
1354     <para>
1355      <screen>
1356      </screen>
1357     </para>
1358     -->
1359     <warning>
1360      Experimental. Do not use in production code.
1361     </warning>
1362
1363
1364    </sect2>
1365
1366
1367    <sect2 id="querymodel-zebra-attr-scan">
1368     <title>Zebra specific Scan Extensions to all Attribute Sets</title>
1369     <para>
1370      Zebra extends the Bib1 attribute types, and these extensions are
1371      recognized regardless of attribute
1372      set used in a <literal>scan</literal> operation query.
1373     </para>
1374      <table id="querymodel-zebra-attr-scan-table"
1375       frame="all" rowsep="1" colsep="1" align="center">
1376
1377       <caption>Zebra Scan Attribute Extensions</caption>
1378        <thead>
1379         <tr>
1380          <td>Name</td>
1381          <td>Type</td>
1382          <td>Operation</td>
1383          <td>Zebra version</td>
1384         </tr>
1385       </thead>
1386        <tbody>
1387         <tr>
1388          <td>Result Set Narrow</td>
1389          <td>8</td>
1390          <td>scan</td>
1391          <td>1.3</td>
1392         </tr>
1393         <tr>
1394          <td>Approximative Limit</td>
1395          <td>9</td>
1396          <td>scan</td>
1397          <td>1.4</td>
1398         </tr>
1399        </tbody>
1400       </table>
1401
1402     <sect3 id="querymodel-zebra-attr-narrow">
1403      <title>Zebra Extension Result Set Narrow (type 8)</title>
1404     </sect3>
1405     <para>
1406      If attribute <literal>Result Set Narrow (type 8)</literal>
1407      is given for <literal>scan</literal>, the value is the name of a
1408      result set. Each hit count in <literal>scan</literal> is
1409      <literal>@and</literal>'ed with the result set given.
1410     </para>
1411     <para>
1412      Consider for example
1413      the case of scanning all title fields around the
1414      scanterm <emphasis>mozart</emphasis>, then refining the scan by
1415      issuing a filtering query for <emphasis>amadeus</emphasis> to
1416      restrict the scan to the result set of the query:
1417      <screen>
1418       Z> scan @attr 1=4 mozart
1419       ...
1420       * mozart (43)
1421         mozartforskningen (1)
1422         mozartiana (1)
1423         mozarts (16)
1424       ...
1425       Z> f @attr 1=4 amadeus
1426       ...
1427       Number of hits: 15, setno 2
1428       ...
1429       Z> scan @attr 1=4 @attr 8=2 mozart
1430       ...
1431       * mozart (14)
1432         mozartforskningen (0)
1433         mozartiana (0)
1434         mozarts (1)
1435       ...
1436      </screen>
1437     </para>
1438
1439     <warning>
1440      Experimental. Do not use in production code.
1441     </warning>
1442
1443     <sect3 id="querymodel-zebra-attr-approx">
1444      <title>Zebra Extension Approximative Limit (type 9)</title>
1445     </sect3>
1446     <para>
1447      The <literal>Zebra Extension Approximative Limit (type
1448       9)</literal> is a way to enable approximate
1449      hit counts for <literal>scan</literal>, in the same
1450      way as for <literal>search</literal> hit counts.
1451     </para>
1452     <!--
1453     <para>
1454      <screen>
1455      </screen>
1456     </para>
1457     -->
1458     <warning>
1459      Experimental and buggy. Definitely not to be used in production code.
1460     </warning>
1461
1462
1463    </sect2>
1464
1465
1466    <sect2 id="querymodel-bib1-mapping">
1467     <title>Mapping from Bib1 Attributes to Zebra internal
1468      register indexes</title>
1469     <para>
1470      TO-DO
1471      </para>
1472
1473
1474      <!-- see in util/zebramap.c
1475       int zebra_maps_attr
1476
1477   if (completeness_value == 2 || completeness_value == 3)
1478         *complete_flag = 1;
1479     else
1480         *complete_flag = 0;
1481     *reg_id = 0;
1482
1483     *sort_flag =(sort_relation_value > 0) ? 1 : 0;
1484     *search_type = "phrase";
1485     strcpy(rank_type, "void");
1486     if (relation_value == 102)
1487     {
1488         if (weight_value == -1)
1489             weight_value = 34;
1490         sprintf(rank_type, "rank,w=%d,u=%d", weight_value, use_value);
1491     }
1492     if (relation_value == 103)
1493     {
1494         *search_type = "always";
1495         *reg_id = 'w';
1496         return 0;
1497     }
1498     if (*complete_flag)
1499         *reg_id = 'p';
1500     else
1501         *reg_id = 'w';
1502     switch (structure_value)
1503     {
1504     case 6:   /* word list */
1505         *search_type = "and-list";
1506         break;
1507     case 105: /* free-form-text */
1508         *search_type = "or-list";
1509         break;
1510     case 106: /* document-text */
1511         *search_type = "or-list";
1512         break;
1513     case -1:
1514     case 1:   /* phrase */
1515     case 2:   /* word */
1516     case 108: /* string */
1517         *search_type = "phrase";
1518         break;
1519    case 107: /* local-number */
1520         *search_type = "local";
1521         *reg_id = 0;
1522         break;
1523     case 109: /* numeric string */
1524         *reg_id = 'n';
1525         *search_type = "numeric";
1526         break;
1527     case 104: /* urx */
1528         *reg_id = 'u';
1529         *search_type = "phrase";
1530         break;
1531     case 3:   /* key */
1532         *reg_id = '0';
1533         *search_type = "phrase";
1534         break;
1535     case 4:  /* year */
1536         *reg_id = 'y';
1537         *search_type = "phrase";
1538         break;
1539     case 5:  /* date */
1540         *reg_id = 'd';
1541         *search_type = "phrase";
1542         break;
1543     default:
1544         return -1;
1545     }
1546     return 0;
1547
1548      -->
1549
1550
1551     <para>
1552      <emphasis>Use</emphasis> attributes are interpreted according to the
1553      attribute sets which have been loaded in the
1554     <literal>zebra.cfg</literal> file, and are matched against specific
1555      fields as specified in the <literal>.abs</literal> file which
1556      describes the profile of the records which have been loaded.
1557      If no Use attribute is provided, a default of Bib-1 Any is assumed.
1558     </para>
1559
1560     <para>
1561      If a <emphasis>Structure</emphasis> attribute of
1562      <emphasis>Phrase</emphasis> is used in conjunction with a
1563      <emphasis>Completeness</emphasis> attribute of
1564      <emphasis>Complete (Sub)field</emphasis>, the term is matched
1565      against the contents of the phrase (long word) register, if one
1566      exists for the given <emphasis>Use</emphasis> attribute.
1567      A phrase register is created for those fields in the
1568      <literal>.abs</literal> file that contain a
1569      <literal>p</literal>-specifier.
1570      <!-- ### whatever the hell _that_ is -->
1571     </para>
1572
1573     <para>
1574      If <emphasis>Structure</emphasis>=<emphasis>Phrase</emphasis> is
1575      used in conjunction with <emphasis>Incomplete Field</emphasis> (the
1576      default value for <emphasis>Completeness</emphasis>), the
1577      search is directed against the normal word registers, but if the term
1578      contains multiple words, the term will only match if all of the words
1579      are found immediately adjacent, and in the given order.
1580      The word search is performed on those fields that are indexed as
1581      type <literal>w</literal> in the <literal>.abs</literal> file.
1582     </para>
1583
1584     <para>
1585      If the <emphasis>Structure</emphasis> attribute is
1586      <emphasis>Word List</emphasis>,
1587      <emphasis>Free-form Text</emphasis>, or
1588      <emphasis>Document Text</emphasis>, the term is treated as a
1589      natural-language, relevance-ranked query.
1590      This search type uses the word register, i.e. those fields
1591      that are indexed as type <literal>w</literal> in the
1592      <literal>.abs</literal> file.
1593     </para>
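     <para>
      As a sketch of the difference between these structure values, and
      assuming the default mapping in which <emphasis>Word List</emphasis>
      requires all words to match while
      <emphasis>Free-form Text</emphasis> accepts any of them, the
      following two queries search the title register with the same
      term:
      <screen>
       Z> find @attr 1=4 @attr 4=6 "information retrieval"
       Z> find @attr 1=4 @attr 4=105 "information retrieval"
      </screen>
     </para>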
1594
1595     <para>
1596      If the <emphasis>Structure</emphasis> attribute is
1597      <emphasis>Numeric String</emphasis> the term is treated as an integer.
1598      The search is performed on those fields that are indexed
1599      as type <literal>n</literal> in the <literal>.abs</literal> file.
1600     </para>
1601
1602     <para>
1603      If the <emphasis>Structure</emphasis> attribute is
1604      <emphasis>URx</emphasis> the term is treated as a URX (URL) entity.
1605      The search is performed on those fields that are indexed as type
1606      <literal>u</literal> in the <literal>.abs</literal> file.
1607     </para>
1608
1609     <para>
1610      If the <emphasis>Structure</emphasis> attribute is
1611      <emphasis>Local Number</emphasis> the term is treated as
1612      a native Zebra Record Identifier.
1613     </para>
1614
1615     <para>
1616      If the <emphasis>Relation</emphasis> attribute is
1617      <emphasis>Equals</emphasis> (default), the term is matched
1618      in a normal fashion (modulo truncation and processing of
1619      individual words, if required).
1620      If <emphasis>Relation</emphasis> is <emphasis>Less Than</emphasis>,
1621      <emphasis>Less Than or Equal</emphasis>,
1622      <emphasis>Greater than</emphasis>, or <emphasis>Greater than or
1623       Equal</emphasis>, the term is assumed to be numerical, and a
1624      standard regular expression is constructed to match the given
1625      expression.
1626      If <emphasis>Relation</emphasis> is <emphasis>Relevance</emphasis>,
1627      the standard natural-language query processor is invoked.
1628     </para>
1629
1630     <para>
1631      For the <emphasis>Truncation</emphasis> attribute,
1632      <emphasis>No Truncation</emphasis> is the default.
1633      <emphasis>Left Truncation</emphasis> is not supported.
1634      <emphasis>Process # in search term</emphasis> is supported, as is
1635      <emphasis>Regexp-1</emphasis>.
1636      <emphasis>Regexp-2</emphasis> enables the fault-tolerant (fuzzy)
1637      search. As a default, a single error (deletion, insertion,
1638      replacement) is accepted when terms are matched against the register
1639      contents.
1640     </para>
1641    </sect2>
1642
1643    <sect2  id="querymodel-regular">
1644     <title>Zebra Regular Expressions in Truncation Attribute (type = 5)</title>
1645
1646     <para>
1647      Each term in a query is interpreted as a regular expression if
1648      the truncation value is either <emphasis>Regexp-1 (@attr 5=102)</emphasis>
1649      or <emphasis>Regexp-2 (@attr 5=103)</emphasis>.
1650      Both query types follow the same syntax with the operands:
1651     </para>
1652
1653      <table id="querymodel-regular-operands-table"
1654       frame="all" rowsep="1" colsep="1" align="center">
1655
1656       <caption>Regular Expression Operands</caption>
1657        <!--
1658        <thead>
1659        <tr><td>one</td><td>two</td></tr>
1660       </thead>
1661        -->
1662        <tbody>
1663         <tr>
1664          <td><literal>x</literal></td>
1665          <td>Matches the character <literal>x</literal>.</td>
1666         </tr>
1667         <tr>
1668          <td><literal>.</literal></td>
1669          <td>Matches any character.</td>
1670         </tr>
1671         <tr>
1672          <td><literal>[ .. ]</literal></td>
1673          <td>Matches the set of characters specified;
1674          such as <literal>[abc]</literal> or <literal>[a-c]</literal>.</td>
1675         </tr>
1676        </tbody>
1677       </table>
1678
1679     <para>
1680      The above operands can be combined with the following operators:
1681     </para>
1682
1683      <table id="querymodel-regular-operators-table"
1684       frame="all" rowsep="1" colsep="1" align="center">
1685       <caption>Regular Expression Operators</caption>
1686        <!--
1687        <thead>
1688        <tr><td>one</td><td>two</td></tr>
1689       </thead>
1690        -->
1691        <tbody>
1692         <tr>
1693          <td><literal>x*</literal></td>
1694          <td>Matches <literal>x</literal> zero or more times.
1695           Priority: high.</td>
1696         </tr>
1697         <tr>
1698          <td><literal>x+</literal></td>
1699          <td>Matches <literal>x</literal> one or more times.
1700           Priority: high.</td>
1701         </tr>
1702         <tr>
1703          <td><literal>x?</literal></td>
1704          <td>Matches <literal>x</literal> zero times or once.
1705           Priority: high.</td>
1706         </tr>
1707         <tr>
1708          <td><literal>xy</literal></td>
1709          <td> Matches <literal>x</literal>, then <literal>y</literal>.
1710          Priority: medium.</td>
1711         </tr>
1712         <tr>
1713          <td><literal>x|y</literal></td>
1714          <td> Matches either <literal>x</literal> or <literal>y</literal>.
1715          Priority: low.</td>
1716         </tr>
1717         <tr>
1718          <td><literal>( )</literal></td>
1719          <td>The order of evaluation may be changed by using parentheses.</td>
1720         </tr>
1721        </tbody>
1722       </table>
1723
1724     <para>
1725      If the first character of the <literal>Regexp-2</literal> query
1726      is a plus character (<literal>+</literal>), it marks the
1727      beginning of a section with non-standard specifiers.
1728      The next plus character marks the end of the section.
1729      Currently Zebra supports only one specifier, the error tolerance,
1730      which consists of one digit.
1731     </para>
1732
1733     <para>
1734      Since the plus operator is normally a suffix operator, a standard
1735      regular expression cannot begin with a plus character, so this
1736      addition does not conflict with standard regular expression syntax.
1737     </para>
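
    <para>
     As a sketch only, a query with an error tolerance of one might
     look as follows. This assumes that truncation attribute value
     <literal>103</literal> selects <literal>Regexp-2</literal> and
     that the specifier section is written as
     <literal>+1+</literal> directly in front of the expression;
     the search term itself is invented for illustration.
     <screen>
      Z> find @attr 1=4 @attr 5=103 "+1+informat.* retrieval"
     </screen>
    </para>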
1738
1739     <para>
1740      For example, a phrase search with regular expressions in
1741      the title-register is performed like this:
1742      <screen>
1743       Z> find @attr 1=4 @attr 5=102 "informat.* retrieval"
1744      </screen>
1745     </para>
1746
1747     <para>
1748      Combinations with other attributes are possible. For example, a
1749      ranked search with a regular expression:
1750      <screen>
1751       Z> find @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
1752      </screen>
1753     </para>
1754    </sect2>
1755
1756
1757    <!--
1758    <para>
1759     The RecordType parameter in the <literal>zebra.cfg</literal> file, or
1760     the <literal>-t</literal> option to the indexer tells Zebra how to
1761     process input records.
1762     Two basic types of processing are available - raw text and structured
1763     data. Raw text is just that, and it is selected by providing the
1764     argument <literal>text</literal> to Zebra. Structured records are
1765     all handled internally using the basic mechanisms described in the
1766     subsequent sections.
1767     Zebra can read structured records in many different formats.
1768    </para>
1769    -->
1770   </sect1>
1771
1772
1773   <sect1 id="querymodel-cql-to-pqf">
1774    <title>Server Side CQL to PQF Query Translation</title>
1775    <para>
1776     Using the
1777     <literal>&lt;cql2rpn&gt;l2rpn.txt&lt;/cql2rpn&gt;</literal>
1778       YAZ Frontend Virtual
1779     Hosts option, one can configure
1780     the YAZ Frontend CQL-to-PQF
1781     converter, specifying the interpretation of various
1782     <ulink url="&url.cql;">CQL</ulink>
1783     indexes, relations, etc. in terms of Type-1 query attributes.
1784     <!-- The  yaz-client config file -->
1785    </para>
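   <para>
    A minimal sketch of such a YAZ Frontend Virtual Hosts
    configuration might look like the following. The listener
    address, the <literal>id</literal> values and the file names
    <literal>zebra.cfg</literal> and <literal>cql2pqf.txt</literal>
    are assumptions chosen for illustration; consult the YAZ manual
    for the authoritative element list.
    <screen>
    <![CDATA[
     <yazgfs>
      <listen id="public">tcp:@:9999</listen>
      <server id="zebra" listenref="public">
       <config>zebra.cfg</config>
       <cql2rpn>cql2pqf.txt</cql2rpn>
      </server>
     </yazgfs>
     ]]>
    </screen>
   </para>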
1786    <para>
1787     For example, using server-side CQL-to-PQF conversion, one might
1788     query a Zebra server like this:
1789     <screen>
1790     <![CDATA[
1791      yaz-client localhost:9999
1792      Z> querytype cql
1793      Z> find text=(plant and soil)
1794      ]]>
1795     </screen>
1796      and, if properly configured, even static relevance ranking can
1797      be performed using CQL query syntax:
1798     <screen>
1799     <![CDATA[
1800      Z> find text = /relevant (plant and soil)
1801      ]]>
1802      </screen>
1803    </para>
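
   <para>
    For orientation, a mapping file driving this conversion could
    contain lines like the ones sketched below. This is an
    illustrative excerpt under assumptions, not a complete or
    authoritative file: the attribute values are modelled on the
    sample mapping distributed with YAZ, and the line for the
    <literal>text</literal> index is hypothetical. The
    <literal>relevant</literal> relation modifier is mapped to
    relation attribute <literal>2=102</literal>, the same attribute
    used for ranked searches in PQF.
    <screen>
    <![CDATA[
     # context set prefix used by the index names below
     set.cql = info:srw/cql-context-set/1/cql-v1.1

     # index names to use attributes
     index.cql.serverChoice = 1=1016
     # hypothetical mapping for the "text" index queried above
     index.cql.text = 1=1010

     # relations and relation modifiers
     relation.eq = 2=3
     relation.scr = 2=3
     relationModifier.relevant = 2=102

     # defaults for structure and position
     structure.* = 4=1
     position.any = 3=3 6=1
     ]]>
    </screen>
   </para>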
1804
1805    <para>
1806     The same configuration can also be used to
1807     search using client-side CQL-to-PQF conversion.
1808     The only differences are <literal>querytype cql2rpn</literal>
1809     instead of
1810     <literal>querytype cql</literal>, and the <literal>-q</literal>
1811     option specifying a local conversion file:
1812     <screen>
1813     <![CDATA[
1814      yaz-client -q local/cql2pqf.txt localhost:9999
1815      Z> querytype cql2rpn
1816      Z> find text=(plant and soil)
1817      ]]>
1818      </screen>
1819    </para>
1820
1821    <para>
1822     Exhaustive information can be found in the section
1823     "Specification of CQL to RPN mappings" in the YAZ manual,
1824     <ulink url="http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map">
1825      http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map</ulink>,
1826    and is therefore not repeated here.
1827    </para>
1828   <!--
1829   <para>
1830     See
1831       <ulink url="http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html">
1832       http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html</ulink>
1833     for the Maintenance Agency's work-in-progress mapping of Dublin Core
1834     indexes to Attribute Architecture (util, XD and BIB-2)
1835     attributes.
1836    </para>
1837    -->
1838  </sect1>
1839
1840
1841
1842 </chapter>
1843
1844  <!-- Keep this comment at the end of the file
1845  Local variables:
1846  mode: sgml
1847  sgml-omittag:t
1848  sgml-shorttag:t
1849  sgml-minimize-attributes:nil
1850  sgml-always-quote-attributes:t
1851  sgml-indent-step:1
1852  sgml-indent-data:t
1853  sgml-parent-document: "zebra.xml"
1854  sgml-local-catalogs: nil
1855  sgml-namecase-general:t
1856  End:
1857  -->