1 <chapter id="querymodel">
2 <!-- $Id: querymodel.xml,v 1.7 2006-06-16 10:30:12 marc Exp $ -->
3 <title>Query Model</title>
5 <sect1 id="querymodel-overview">
6 <title>Query Model Overview</title>
9 <sect2 id="querymodel-query-languages">
10 <title>Query Languages</title>
Zebra was born as a networked Information Retrieval engine adhering
to the international standards
<ulink url="&url.z39.50;">Z39.50</ulink> and
<ulink url="&url.sru;">SRU</ulink>,
and implements the query model defined there.
Unfortunately, the Z39.50 query model defines only a binary
encoded representation, which is used as transport packaging in
the Z39.50 protocol layer. This representation is not human
readable, nor does it define any convenient way to specify queries.
23 <!-- tell about RPN - include link to YAZ
28 <sect3 id="querymodel-query-languages-pqf">
29 <title>Prefix Query Format (PQF)</title>
Index Data has defined a textual representation called the
<literal>Prefix Query Format</literal>, or
<literal>PQF</literal> for short, which has since been adopted by other
parties developing Z39.50 software. It is also often referred to as
<literal>Prefix Query Notation</literal>, or
<literal>PQN</literal> for short, and is thoroughly explained in
38 <xref linkend="querymodel-pqf"/>.
43 <!-- PQF/RPN is natively supported. CQL is NOT . So we need a map -->
44 <sect3 id="querymodel-query-languages-cql">
45 <title>Common Query Language (CQL)</title>
47 In addition, Zebra can be configured to understand and map the
48 <literal>Common Query Language</literal>
49 (<ulink url="&url.cql;">CQL</ulink>)
50 to PQF. See an introduction on the mapping to the internal query
52 <xref linkend="querymodel-cql-to-pqf"/>.
58 <sect2 id="querymodel-query-types">
59 <title>Query types</title>
63 <sect3 id="querymodel-query-type-explain">
64 <title>Explain Queries</title>
69 <sect3 id="querymodel-query-type-search">
70 <title>Search Queries</title>
75 <sect3 id="querymodel-query-type-scan">
76 <title>Scan Queries</title>
86 <sect1 id="querymodel-pqf">
87 <title>Prefix Query Format structure and syntax</title>
The <ulink url="&url.yaz.pqf;">PQF grammar</ulink>
is documented in the YAZ manual, and shall not be
repeated here. During search, this textual PQF representation
is always mapped to the equivalent Zebra internal
96 <sect2 id="querymodel-pqf-tree">
97 <title>PQF tree structure</title>
99 The PQF parse tree - or the equivalent textual representation -
100 may start with one specification of the
101 <emphasis>attribute set</emphasis> used. Following is a query
consists of <emphasis>atomic query parts (APT)</emphasis>, possibly
paired by <emphasis>boolean binary operators</emphasis>, and
finally <emphasis>recursively combined</emphasis> into
109 <sect3 id="querymodel-attribute-sets">
110 <title>Attribute sets</title>
112 Attribute sets define the exact meaning and semantics of queries
issued. Zebra comes with some predefined attribute set
definitions; others can easily be defined and added to the
The Zebra internal query processing is modeled after
118 the <literal>Bib1</literal> attribute set, and the non-use
119 attributes type 2-6 are hard-wired in. It is therefore essential
120 to be familiar with <xref linkend="querymodel-bib1"/>.
124 <table id="querymodel-attribute-sets-table"
125 frame="all" rowsep="1" colsep="1" align="center">
127 <caption>Attribute sets predefined in Zebra</caption>
130 <tr><td>one</td><td>two</td></tr>
135 <td><literal>exp-1</literal></td>
136 <td><literal>Explain</literal> attribute set</td>
137 <td>Special attribute set used on the special automagic
138 <literal>IR-Explain-1</literal> database to gain information on
139 server capabilities, database names, and database
143 <td><literal>bib-1</literal></td>
144 <td><literal>Bib1</literal> attribute set</td>
145 <td>Standard PQF query language attribute set which defines the
146 semantics of Z39.50 searching. In addition, all of the
147 non-use attributes (type 2-9) define the Zebra internal query
151 <td><literal>gils</literal></td>
152 <td><literal>GILS</literal> attribute set</td>
<td>Extension to the <literal>Bib1</literal> attribute set.</td>
159 <sect3 id="querymodel-boolean-operators">
160 <title>Boolean operators</title>
162 A pair of subquery trees, or of atomic queries, is combined
163 using the standard boolean operators into new query trees.
166 <table id="querymodel-boolean-operators-table"
167 frame="all" rowsep="1" colsep="1" align="center">
169 <caption>Boolean operators</caption>
172 <tr><td>one</td><td>two</td></tr>
176 <tr><td><literal>@and</literal></td>
177 <td>binary <literal>AND</literal> operator</td>
<td>Set intersection of two atomic queries' hit sets</td>
<tr><td><literal>@or</literal></td>
<td>binary <literal>OR</literal> operator</td>
<td>Set union of two atomic queries' hit sets</td>
184 <tr><td><literal>@not</literal></td>
185 <td>binary <literal>AND NOT</literal> operator</td>
<td>Set difference of two atomic queries' hit sets</td>
188 <tr><td><literal>@prox</literal></td>
<td>binary <literal>PROXIMITY</literal> operator</td>
<td>Set intersection of two atomic queries' hit sets. In
addition, the intersection set is purged of all
documents which do not satisfy the requested query
term proximity. Usually a proper subset of the AND
200 For example, we can combine the terms
201 <emphasis>information</emphasis> and <emphasis>retrieval</emphasis>
202 into different searches in the default index of the default
203 attribute set as follows.
204 Querying for the union of all documents containing the
205 terms <emphasis>information</emphasis> OR
206 <emphasis>retrieval</emphasis>:
208 Z> find @or information retrieval
212 Querying for the intersection of all documents containing the
213 terms <emphasis>information</emphasis> AND
214 <emphasis>retrieval</emphasis>:
The hit set is a subset of the corresponding
218 Z> find @and information retrieval
222 Querying for the intersection of all documents containing the
223 terms <emphasis>information</emphasis> AND
224 <emphasis>retrieval</emphasis>, taking proximity into account:
The hit set is a subset of the corresponding
228 Z> find @prox information retrieval
232 Querying for the intersection of all documents containing the
233 terms <emphasis>information</emphasis> AND
234 <emphasis>retrieval</emphasis>, in the same order and near each
other as described in the term list.
The hit set is a subset of the corresponding
239 Z> find "information retrieval"
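The recursive pairing of subqueries described above can be sketched in a few lines of Python. This is only an illustration of the prefix (operator first) tree layout, not part of Zebra or YAZ: the function name to_pqf and the tuple-based tree are assumptions made for this example. The @prox operator is left out on purpose, because in YAZ's PQF it takes additional parameters that are not covered here.

```python
def to_pqf(node):
    """Render a query tree as a PQF string in prefix order.

    A node is either a term (str) or a tuple (operator, left, right).
    """
    if isinstance(node, str):
        # Leaf: quote empty or multi-word terms so they stay one term list.
        if node == "" or " " in node:
            return '"' + node + '"'
        return node
    operator, left, right = node
    if operator not in ("@and", "@or", "@not"):
        raise ValueError("unsupported operator: " + operator)
    # The operator comes first, followed by both rendered subtrees.
    return " ".join([operator, to_pqf(left), to_pqf(right)])


# Hypothetical tree: (information OR retrieval) AND "zebra engine"
tree = ("@and", ("@or", "information", "retrieval"), "zebra engine")
print(to_pqf(tree))
```

The resulting string can be pasted after the find command of a client such as yaz-client.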
245 <sect3 id="querymodel-atomic-queries">
246 <title>Atomic queries (APT)</title>
Atomic queries are the query parts which work on one access point
249 only. These consist of <literal>an attribute list</literal>
250 followed by a <literal>single term</literal> or a
251 <literal>quoted term list</literal>, and are often called
252 <emphasis>Attributes-Plus-Terms (APT)</emphasis> queries.
255 Unsupplied non-use attributes type 2-9 are either inherited from
256 higher nodes in the query tree, or are set to Zebra's default values.
257 See <xref linkend="querymodel-bib1"/> for details.
260 <table id="querymodel-atomic-queries-table"
261 frame="all" rowsep="1" colsep="1" align="center">
263 <caption>Atomic queries</caption>
266 <tr><td>one</td><td>two</td></tr>
270 <tr><td><emphasis>attribute list</emphasis></td>
271 <td>List of <literal>orthogonal</literal> attributes</td>
<td>Any of the orthogonal attribute types may be omitted;
273 these are inherited from higher query tree nodes, or if not
274 inherited, are set to the default Zebra configuration values.
277 <tr><td><emphasis>term</emphasis></td>
278 <td>single <literal>term</literal>
279 or <literal>quoted term list</literal> </td>
280 <td>Here the search terms or list of search terms is added
286 Querying for the term <emphasis>information</emphasis> in the
default index using the default attribute set, the server choice
288 of access point/index, and the default non-use attributes.
290 Z> find "information"
294 Equivalent query fully specified including all default values:
296 Z> find @attrset bib-1 @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1 "information"
301 Finding all documents which have empty titles. Notice that the
302 empty term must be quoted, but is otherwise legal.
310 <sect3 id="querymodel-use-string">
311 <title>Zebra's special use attribute type 1 of form 'string'</title>
The numeric <literal>use (type 1)</literal> attribute is usually
referred to from a given
attribute set. In addition, Zebra lets you use
<emphasis>any internal index
name defined in your configuration</emphasis>
as use attribute value. This is a great feature for
debugging, and when you do
not need the complexity of defined use attribute values. It is
the preferred way of accessing Zebra indexes directly.
Finding all documents which have the term list "information
retrieval" in a Zebra index, using its internal full string name.
327 Z> find @attr 1=sometext "information retrieval"
Searching the bib-1 use attribute 54 using its string name:
333 Z> find @attr 1=Code-language eng
337 Searching in any silly string index - if it's defined in your
338 indexation rules and can be parsed by the PQF parser.
339 This is definitely not the recommended use of
340 this facility, as it might confuse your users with some very
343 Z> find @attr 1=silly/xpath/alike[@index]/name "information retrieval"
347 See <xref linkend="querymodel-bib1-mapping"/> for details, and
348 <xref linkend="server-sru"/>
for the SRU PQF query extension using string names as a fast
354 <sect3 id="querymodel-use-xpath">
355 <title>Zebra's special use attribute type 1 of form 'XPath'
356 for GRS filters</title>
358 As we have seen above, it is possible (albeit seldom a great
360 <ulink url="http://www.w3.org/TR/xpath">XPath 1.0</ulink> based
361 search by defining <literal>use (type 1)</literal>
<emphasis>string</emphasis> attributes which in appearance
<emphasis>resemble XPath queries</emphasis>. There are two
problems with this approach: first, the XPath look-alike has to
be defined at indexing time, so no new undefined
XPath queries can be entered at search time, and second, it might
confuse users very much that an XPath-alike index name in fact
gets populated from a possibly entirely different XML element
than it pretends to access.
372 When using the <literal>GRS Record Model</literal>
373 (see <xref linkend="record-model-grs"/>), we have the
possibility to embed <emphasis>live</emphasis>
376 in the PQF queries, which are here called
377 <literal>use (type 1)</literal> <emphasis>xpath</emphasis>
378 attributes. You must enable the
379 <literal>xpath enable</literal> directive in your
380 <literal>.abs</literal> config files.
383 Only a <emphasis>very</emphasis> restricted subset of the
384 <ulink url="http://www.w3.org/TR/xpath">XPath 1.0</ulink>
385 standard is supported as the GRS record model is simpler than
386 a full XML DOM structure. See the following examples for
390 Finding all documents which have the term "content"
391 inside a text node found in a specific XML DOM
392 <emphasis>subtree</emphasis>, whose starting element is
395 Z> find @attr 1=/root content
396 Z> find @attr 1=/root/first content
398 <emphasis>Notice that the
399 XPath must be absolute, i.e., must start with '/', and that the
XPath <literal>descendant-or-self</literal> axis followed by a
401 text node selection <literal>text()</literal> is implicitly
402 appended to the stated XPath.
404 It follows that the above searches are interpreted as:
406 Z> find @attr 1=/root//text() content
407 Z> find @attr 1=/root/first//text() content
Filter the addressing XPath by a predicate working on exact
414 attributes (in the XML sense) can be done: return all those docs which
415 have the term "english" contained in one of all text subnodes of
416 the subtree defined by the XPath
417 <literal>/record/title[@lang='en']</literal>
419 Z> find @attr 1=/record/title[@lang='en'] english
424 Combining numeric indexes, boolean expressions,
425 and xpath based searches is possible:
427 Z> find @attr 1=/record/title @and foo bar
428 Z> find @and @attr 1=/record/title foo @attr 1=4 bar
432 Escaping PQF keywords and other non-parseable XPath constructs
433 with <literal>'{ }'</literal> to prevent syntax errors:
435 Z> find @attr {1=/root/first[@attr='danish']} content
436 Z> find @attr {1=/root/second[@attr='danish lake']}
437 Z> find @attr {1=/root/third[@attr='dansk s\xc3\xb8']}
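A client that assembles such queries programmatically can apply the brace escaping shown above in one place. The following Python sketch is an assumption-laden illustration: the helper names xpath_use_attr and xpath_query are hypothetical, and the only rules encoded are the ones stated in this section (the XPath must be absolute, and the whole 1=... attribute is wrapped in braces).

```python
def xpath_use_attr(xpath):
    """Return an '@attr {1=...}' clause for an absolute XPath."""
    if not xpath.startswith("/"):
        # This section requires absolute XPath expressions.
        raise ValueError("XPath must be absolute: " + xpath)
    if "{" in xpath or "}" in xpath:
        # Nested braces are outside the scope of this simple sketch.
        raise ValueError("braces are not handled by this sketch")
    return "@attr {1=" + xpath + "}"


def xpath_query(xpath, term):
    """Combine the escaped use attribute with a search term."""
    return xpath_use_attr(xpath) + " " + term


print(xpath_query("/root/first[@attr='danish']", "content"))
```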
It is worth mentioning that these dynamically performed XPath
queries are a performance bottleneck, as no optimized
specialized indexes can be used. Therefore, avoid the use of
this facility when speed is essential, and the database content
size is medium to large.
451 <sect2 id="querymodel-exp1">
452 <title>Explain Attribute Set</title>
The Z39.50 standard defines the
<ulink url="&url.z39.50.explain;">Explain</ulink> attribute set
<literal>exp-1</literal>, which is used to discover information
about a server's search semantics and functional capabilities.
Zebra exposes a "classic"
Explain database by base name <literal>IR-Explain-1</literal>, which
is populated with system internal information.
463 The attribute-set <literal>exp-1</literal> consists of a single
464 <literal>Use (type 1)</literal> attribute.
467 In addition, the non-Use
468 <literal>bib-1</literal> attributes, that is, the types
469 <literal>Relation</literal>, <literal>Position</literal>,
470 <literal>Structure</literal>, <literal>Truncation</literal>,
471 and <literal>Completeness</literal> are imported from
472 the <literal>bib-1</literal> attribute set, and may be used
473 within any explain query.
476 <sect3 id="querymodel-exp1-use">
477 <title>Use Attributes (type = 1)</title>
The following Explain search attributes are supported:
<literal>ExplainCategory</literal> (@attr 1=1),
<literal>DatabaseName</literal> (@attr 1=3),
<literal>DateAdded</literal> (@attr 1=9),
<literal>DateChanged</literal> (@attr 1=10).
486 A search in the use attribute <literal>ExplainCategory</literal>
487 supports only these predefined values:
488 <literal>CategoryList</literal>, <literal>TargetInfo</literal>,
489 <literal>DatabaseInfo</literal>, <literal>AttributeDetails</literal>.
492 See <filename>tab/explain.att</filename> and the
493 <ulink url="&url.z39.50;">Z39.50</ulink> standard
494 for more information.
499 <title>Explain searches with yaz-client</title>
Classic Explain only defines retrieval of Explain information
via ASN.1. Practically no Z39.50 client supports this. Fortunately
they don't have to - Zebra allows retrieval of this information
505 <literal>SUTRS</literal>, <literal>XML</literal>,
506 <literal>GRS-1</literal> and <literal>ASN.1</literal> Explain.
510 List supported categories to find out which explain commands are
514 Z> find @attr exp1 1=1 categorylist
521 Get target info, that is, investigate which databases exist at
522 this server endpoint:
525 Z> find @attr exp1 1=1 targetinfo
536 List all supported databases, the number of hits
537 is the number of databases found, which most commonly are the
539 the <literal>Default</literal> and the
540 <literal>IR-Explain-1</literal> databases.
543 Z> find @attr exp1 1=1 databaseinfo
550 Get database info record for database <literal>Default</literal>.
553 Z> find @and @attr exp1 1=1 databaseinfo @attr exp1 1=3 Default
555 Identical query with explicitly specified attribute set:
558 Z> find @attrset exp1 @and @attr 1=1 databaseinfo @attr 1=3 Default
563 Get attribute details record for database
564 <literal>Default</literal>.
565 This query is very useful to study the internal Zebra indexes.
566 If records have been indexed using the <literal>alvis</literal>
567 XSLT filter, the string representation names of the known indexes can be
571 Z> find @and @attr exp1 1=1 attributedetails @attr exp1 1=3 Default
573 Identical query with explicitly specified attribute set:
576 Z> find @attrset exp1 @and @attr 1=1 attributedetails @attr 1=3 Default
583 <sect2 id="querymodel-bib1">
584 <title>Bib1 Attribute Set</title>
586 Something about querying to be written ..
589 Most of the information contained in this section is an excerpt of
590 the <literal>ATTRIBUTE SET BIB-1 (Z39.50-1995)
found at <ulink url="&url.z39.50.attset.bib1.1995;">The BIB-1
Attribute Set Semantics</ulink> from 1995, also in an updated
594 <ulink url="&url.z39.50.attset.bib1;">Bib-1
595 Attribute Set</ulink>
596 version from 2003. Index Data is not the copyright holder of this
601 <sect3 id="querymodel-bib1-use">
602 <title>Use Attributes (type 1)</title>
606 A use attribute specifies an access point for any atomic query.
These access points are highly dependent on the attribute set used
608 in the query, and are user configurable using the following
609 default configuration files:
610 <filename>tab/bib1.att</filename>,
611 <filename>tab/dan1.att</filename>,
612 <filename>tab/explain.att</filename>, and
613 <filename>tab/gils.att</filename>.
614 New attribute sets can be added by adding new
615 <filename>tab/*.att</filename> configuration files, which need to
616 be sourced in the main configuration <filename>zebra.cfg</filename>.
In addition, Zebra allows access to
<emphasis>internal index names</emphasis> and <emphasis>dynamic
XPath</emphasis> as use attributes.
See <xref linkend="querymodel-use-string"/> and
<xref linkend="querymodel-use-xpath"/> for
alternative access to the Zebra internal index names and XPath queries.
629 Phrase search for <emphasis>information retrieval</emphasis> in
632 Z> find @attr 1=4 "information retrieval"
637 <sect3 id="querymodel-bib1-relation">
638 <title>Relation Attributes (type 2)</title>
641 Relation attributes describe the relationship of the access
643 of the relation) to the search term as qualified by the attributes (right
side of the relation), e.g., Date-publication &lt;= 1975.
647 <table id="querymodel-bib1-relation-table"
648 frame="all" rowsep="1" colsep="1" align="center">
650 <caption>Relation Attributes (type 2)</caption>
665 <td>Less than or equal</td>
675 <td>Greater or equal</td>
680 <td>Greater than</td>
705 <td>AlwaysMatches</td>
713 The relation attribute
714 <literal>relevance (102)</literal> is supported, see
715 <xref linkend="administration-ranking"/> for full information.
716 <!-- always-matches (103) not supported for all indexes -->
All ordering operations are based on a lexicographical ordering,
<emphasis>except</emphasis> when the
722 <literal>structure attribute numeric (109)</literal> is used. In
723 this case, ordering is numerical. See
724 <xref linkend="querymodel-bib1-structure"/>.
728 Ranked search for <emphasis>information retrieval</emphasis> in
731 Z> find @attr 1=4 @attr 2=102 "information retrieval"
736 <sect3 id="querymodel-bib1-position">
737 <title>Position Attributes (type 3)</title>
740 The position attribute specifies the location of the search term
741 within the field or subfield in which it appears.
744 <table id="querymodel-bib1-position-table"
745 frame="all" rowsep="1" colsep="1" align="center">
747 <caption>Position Attributes (type 3)</caption>
757 <td>First in field </td>
762 <td>First in subfield</td>
767 <td>Any position in field</td>
The position attribute values <literal>first in field (1)</literal>
and <literal>first in subfield (2)</literal> are unsupported.
Using them does not trigger an error, but silently defaults to
<literal>any position in field (3)</literal>.
783 <sect3 id="querymodel-bib1-structure">
784 <title>Structure Attributes (type 4)</title>
787 The structure attribute specifies the type of search
788 term. This causes the search to be mapped on
789 different Zebra internal indexes, which must have been defined
The possible values of the
<literal>structure attribute (type 4)</literal> can be defined
using the configuration file
<filename>tab/default.idx</filename>.
The default configuration is summarized in this table.
801 <table id="querymodel-bib1-structure-table"
802 frame="all" rowsep="1" colsep="1" align="center">
804 <caption>Structure Attributes (type 4)</caption>
834 <td>Date (normalized)</td>
844 <td>Date (un-normalized)</td>
849 <td>Name (normalized) </td>
854 <td>Name (un-normalized) </td>
869 <td>Free-form-text</td>
874 <td>Document-text</td>
879 <td>Local-number</td>
889 <td>Numeric string</td>
898 The structure attribute value <literal>local-number
is supported, and always maps to the Zebra internal document ID.
905 the GILS schema (<literal>gils.abs</literal>), the
906 west-bounding-coordinate is indexed as type <literal>n</literal>,
907 and is therefore searched by specifying
908 <emphasis>structure</emphasis>=<emphasis>Numeric String</emphasis>.
909 To match all those records with west-bounding-coordinate greater
910 than -114 we use the following query:
912 Z> find @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
916 <sect3 id="querymodel-bib1-truncation">
917 <title>Truncation Attributes (type = 5)</title>
920 The truncation attribute specifies whether variations of one or
more characters are allowed between search term and hit terms, or
922 not. Using non-default truncation attributes will broaden the
923 document hit set of a search query.
926 <table id="querymodel-bib1-truncation-table"
927 frame="all" rowsep="1" colsep="1" align="center">
929 <caption>Truncation Attributes (type 5)</caption>
939 <td>Right truncation </td>
944 <td>Left truncation</td>
949 <td>Left and right truncation</td>
954 <td>Do not truncate</td>
959 <td>Process # in search term</td>
977 Truncation attribute value
978 <literal>Process # in search term (100)</literal> is a
979 poor-man's regular expression search. It maps
980 each <literal>#</literal> to <literal>.*</literal>, and
then performs a <literal>Regexp-1 (102)</literal> regular
985 Truncation attribute value
986 <literal>Regexp-1 (102)</literal> is a normal regular search,
990 Truncation attribute value
<literal>Regexp-2 (103)</literal> is a Zebra-specific extension
992 which allows <emphasis>fuzzy</emphasis> matches. One single
993 error in spelling of search terms is allowed, i.e., a document
994 is hit if it includes a term which can be mapped to the used
995 search term by one character substitution, addition, deletion or
999 Special 104, 105, 106 are deprecated and will be removed! -->
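The mapping performed for <literal>Process # in search term (100)</literal>, where each # becomes .*, can be illustrated with a small Python sketch. The function name hash_to_regex is hypothetical, and Python's re module is used only to demonstrate the idea: its regular expression syntax is not the same as Zebra's, so this is not a description of Zebra's internal matching.

```python
import re


def hash_to_regex(term):
    """Translate a 'Process # in search term' pattern to a regular expression."""
    # Escape each literal piece, then join the pieces with '.*',
    # which stands in for every '#' found in the original term.
    return ".*".join(re.escape(piece) for piece in term.split("#"))


pattern = hash_to_regex("inf#tion")
print(pattern)
# The whole candidate term must match, not just a part of it.
print(bool(re.fullmatch(pattern, "information")))
```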
1002 <sect3 id="querymodel-bib1-completeness">
1003 <title>Completeness Attributes (type = 6)</title>
This attribute is ONLY used if structure w, p is to be
chosen. Completeness is ignored if not w, p is to be
Incomplete field(1) is the default and makes Zebra use
Complete subfield(2) and complete field(3) both trigger
1011 search field type p.
1017 <sect2 id="querymodel-zebra-attr-search">
<title>Zebra specific Search Extensions to all Attribute Sets</title>
Zebra extends the Bib1 attribute types, and these extensions are
1021 recognized regardless of attribute
1022 set used in a <literal>search</literal> operation query.
1025 <table id="querymodel-zebra-attr-search-table"
1026 frame="all" rowsep="1" colsep="1" align="center">
<caption>Zebra Search Attribute Extensions</caption>
1034 <td>Zebra version</td>
1039 <td>Embedded Sort</td>
1051 <td>Rank Weight</td>
1057 <td>Approx Limit</td>
1063 <td>Term Reference</td>
1071 <sect3 id="querymodel-zebra-attr-sorting">
<title>Zebra Extension Embedded Sort Attribute (type 7)</title>
1075 The embedded sort is a way to specify sort within a query - thus
1076 removing the need to send a Sort Request separately. It is both
1077 faster and does not require clients to deal with the Sort
1081 The possible values after attribute <literal>type 7</literal> are
1082 <literal>1</literal> ascending and
1083 <literal>2</literal> descending.
1084 The attributes+term (APT) node is separate from the
1085 rest and must be <literal>@or</literal>'ed.
1086 The term associated with APT is the sorting level in integers,
1087 where <literal>0</literal> means primary sort,
1088 <literal>1</literal> means secondary sort, and so forth.
1089 See also <xref linkend="administration-ranking"/>.
1092 For example, searching for water, sort by title (ascending)
1094 Z> find @or @attr 1=1016 water @attr 7=1 @attr 1=4 0
1098 Or, searching for water, sort by title ascending, then date descending
1100 Z> find @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1
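The rule above (one @or per sort key, with the sorting level as the term of the sort APT) is mechanical enough to generate. The Python sketch below is an illustration only; the function name with_sort and its argument layout are assumptions for this example, while the attribute values 7=1 and 7=2 come from the description above.

```python
def with_sort(query, sort_keys):
    """Attach embedded sort nodes to a PQF query string.

    sort_keys is a list of (use_attribute, direction) pairs in priority
    order; direction is "ascending" or "descending".
    """
    directions = {"ascending": 1, "descending": 2}
    result = query
    for level, (use_attr, direction) in enumerate(sort_keys):
        # The term of the sort APT is the sorting level: 0 is primary.
        sort_apt = "@attr 7={} @attr 1={} {}".format(
            directions[direction], use_attr, level)
        # Each sort APT is @or'ed onto the query built so far.
        result = "@or {} {}".format(result, sort_apt)
    return result


print(with_sort("@attr 1=1016 water", [(4, "ascending"), (30, "descending")]))
```

With the two sort keys shown, the generated string is the same as the title-then-date query above.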
1104 <sect3 id="querymodel-zebra-attr-estimation">
<title>Zebra Extension Term Set Attribute (type 8)</title>
1108 The Term Set feature is a facility that allows a search to store
1109 hitting terms in a "pseudo" resultset; thus a search (as usual) +
1110 a scan-like facility. Requires a client that can do named result
1111 sets since the search generates two result sets. The value for
1112 attribute 8 is the name of a result set (string). The terms in
1113 the named term set are returned as SUTRS records.
1116 For example, searching for u in title, right truncated, and
1117 storing the result in term set named 'aset'
1119 Z> find @attr 5=1 @attr 1=4 @attr 8=aset u
1123 The model has one serious flaw: we don't know the size of term
1124 set. Experimental. Do not use in production code.
1127 <sect3 id="querymodel-zebra-attr-weight">
<title>Zebra Extension Rank Weight Attribute (type 9)</title>
Rank weight is a way to pass a value to a ranking algorithm - so
that one APT has one value - while another has a different one.
1133 See also <xref linkend="administration-ranking"/>.
1136 For example, searching for utah in title with weight 30 as well
1137 as any with weight 20:
1139 Z> find @attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 utah
1143 <sect3 id="querymodel-zebra-attr-limit">
<title>Zebra Extension Approximative Limit Attribute (type 9)</title>
Newer Zebra versions normally estimate the hit count for every APT
1148 (leaf) in the query tree. These hit counts are returned as part of
1149 the searchResult-1 facility in the binary encoded Z39.50 search
By setting a limit for the APT we can make Zebra switch to an
1154 approximate hit count when a certain hit count limit is
1155 reached. A value of zero means exact hit count.
For example, we might be interested in exact hit count for a, but
1159 for b we allow hit count estimates for 1000 and higher.
1161 Z> find @and a @attr 9=1000 b
The estimated hit count facility makes searches faster, as one
1166 only needs to process large hit lists partially.
1169 This facility clashes with rank weight, because there all
1170 documents in the hit lists need to be examined for scoring and
1172 It is an experimental
extension. Do not use in production code.
1176 <sect3 id="querymodel-zebra-attr-termref">
<title>Zebra Extension Term Reference Attribute (type 10)</title>
1180 Zebra supports the <literal>searchResult-1</literal> facility.
1181 If the <literal>Term Reference Attribute (type 10)</literal> is
1182 given, that specifies a subqueryId value returned as part of the
1183 search result. It is a way for a client to name an APT part of a
1193 Experimental. Do not use in production code.
1200 <sect2 id="querymodel-zebra-attr-scan">
<title>Zebra specific Scan Extensions to all Attribute Sets</title>
Zebra extends the Bib1 attribute types, and these extensions are
1204 recognized regardless of attribute
1205 set used in a <literal>scan</literal> operation query.
1207 <table id="querymodel-zebra-attr-scan-table"
1208 frame="all" rowsep="1" colsep="1" align="center">
<caption>Zebra Scan Attribute Extensions</caption>
1216 <td>Zebra version</td>
1221 <td>Result Set Narrow</td>
1227 <td>Approximative Limit</td>
1235 <sect3 id="querymodel-zebra-attr-narrow">
<title>Zebra Extension Result Set Narrow (type 8)</title>
1239 If attribute <literal>Result Set Narrow (type 8)</literal>
1240 is given for <literal>scan</literal>, the value is the name of a
1241 result set. Each hit count in <literal>scan</literal> is
1242 <literal>@and</literal>'ed with the result set given.
1251 Experimental and buggy. Definitely not to be used in production code.
1254 <sect3 id="querymodel-zebra-attr-approx">
<title>Zebra Extension Approximative Limit (type 9)</title>
The <literal>Zebra Extension Approximative Limit (type
9)</literal> is a way to enable approximate
1260 hit counts for <literal>scan</literal> hit counts, in the same
1261 way as for <literal>search</literal> hit counts.
1270 Experimental. Do not use in production code.
1277 <sect2 id="querymodel-bib1-mapping">
1278 <title>Mapping from Bib1 Attributes to Zebra internal
1279 register indexes</title>
1285 <!-- see in util/zebramap.c
1288 if (completeness_value == 2 || completeness_value == 3)
1294 *sort_flag =(sort_relation_value > 0) ? 1 : 0;
1295 *search_type = "phrase";
1296 strcpy(rank_type, "void");
1297 if (relation_value == 102)
1299 if (weight_value == -1)
1301 sprintf(rank_type, "rank,w=%d,u=%d", weight_value, use_value);
1303 if (relation_value == 103)
1305 *search_type = "always";
1313 switch (structure_value)
1315 case 6: /* word list */
1316 *search_type = "and-list";
1318 case 105: /* free-form-text */
1319 *search_type = "or-list";
1321 case 106: /* document-text */
1322 *search_type = "or-list";
1325 case 1: /* phrase */
1327 case 108: /* string */
1328 *search_type = "phrase";
1330 case 107: /* local-number */
1331 *search_type = "local";
1334 case 109: /* numeric string */
1336 *search_type = "numeric";
1340 *search_type = "phrase";
1344 *search_type = "phrase";
1348 *search_type = "phrase";
1352 *search_type = "phrase";
1363 <emphasis>Use</emphasis> attributes are interpreted according to the
1364 attribute sets which have been loaded in the
1365 <literal>zebra.cfg</literal> file, and are matched against specific
1366 fields as specified in the <literal>.abs</literal> file which
1367 describes the profile of the records which have been loaded.
1368 If no Use attribute is provided, a default of Bib-1 Any is assumed.
1372 If a <emphasis>Structure</emphasis> attribute of
1373 <emphasis>Phrase</emphasis> is used in conjunction with a
1374 <emphasis>Completeness</emphasis> attribute of
1375 <emphasis>Complete (Sub)field</emphasis>, the term is matched
1376 against the contents of the phrase (long word) register, if one
1377 exists for the given <emphasis>Use</emphasis> attribute.
1378 A phrase register is created for those fields in the
<literal>.abs</literal> file that contain a
1380 <literal>p</literal>-specifier.
1381 <!-- ### whatever the hell _that_ is -->
1385 If <emphasis>Structure</emphasis>=<emphasis>Phrase</emphasis> is
1386 used in conjunction with <emphasis>Incomplete Field</emphasis> - the
1387 default value for <emphasis>Completeness</emphasis>, the
1388 search is directed against the normal word registers, but if the term
1389 contains multiple words, the term will only match if all of the words
1390 are found immediately adjacent, and in the given order.
1391 The word search is performed on those fields that are indexed as
1392 type <literal>w</literal> in the <literal>.abs</literal> file.
1396 If the <emphasis>Structure</emphasis> attribute is
1397 <emphasis>Word List</emphasis>,
1398 <emphasis>Free-form Text</emphasis>, or
1399 <emphasis>Document Text</emphasis>, the term is treated as a
1400 natural-language, relevance-ranked query.
1401 This search type uses the word register, i.e. those fields
1402 that are indexed as type <literal>w</literal> in the
1403 <literal>.abs</literal> file.
1407 If the <emphasis>Structure</emphasis> attribute is
1408 <emphasis>Numeric String</emphasis> the term is treated as an integer.
1409 The search is performed on those fields that are indexed
1410 as type <literal>n</literal> in the <literal>.abs</literal> file.
1414 If the <emphasis>Structure</emphasis> attribute is
1415 <emphasis>URx</emphasis>, the term is treated as a URX (URL) entity.
1416 The search is performed on those fields that are indexed as type
1417 <literal>u</literal> in the <literal>.abs</literal> file.
1421 If the <emphasis>Structure</emphasis> attribute is
1422 <emphasis>Local Number</emphasis>, the term is treated as a
1423 native Zebra record identifier.
1427 If the <emphasis>Relation</emphasis> attribute is
1428 <emphasis>Equals</emphasis> (default), the term is matched
1429 in a normal fashion (modulo truncation and processing of
1430 individual words, if required).
1431 If <emphasis>Relation</emphasis> is <emphasis>Less Than</emphasis>,
1432 <emphasis>Less Than or Equal</emphasis>,
1433 <emphasis>Greater Than</emphasis>, or <emphasis>Greater Than or
1434 Equal</emphasis>, the term is assumed to be numerical, and a
1435 standard regular expression is constructed to match the given expression.
1437 If <emphasis>Relation</emphasis> is <emphasis>Relevance</emphasis>,
1438 the standard natural-language query processor is invoked.
1442 For the <emphasis>Truncation</emphasis> attribute,
1443 <emphasis>No Truncation</emphasis> is the default.
1444 <emphasis>Left Truncation</emphasis> is not supported.
1445 <emphasis>Process # in search term</emphasis> is supported, as is
1446 <emphasis>Regxp-1</emphasis>.
1447 <emphasis>Regxp-2</emphasis> enables the fault-tolerant (fuzzy)
1448 search. As a default, a single error (deletion, insertion,
1449 replacement) is accepted when terms are matched against the register contents.
1454 <sect2 id="querymodel-regular">
1455 <title>Zebra Regular Expressions in Truncation Attribute (type = 5)</title>
1458 Each term in a query is interpreted as a regular expression if
1459 the truncation value is either <emphasis>Regxp-1 (@attr 5=102)</emphasis>
1460 or <emphasis>Regxp-2 (@attr 5=103)</emphasis>.
1461 Both query types follow the same syntax with the operands:
1464 <table id="querymodel-regular-operands-table"
1465 frame="all" rowsep="1" colsep="1" align="center">
1467 <caption>Regular Expression Operands</caption>
1470 <tr><td>Operand</td><td>Description</td></tr>
1475 <td><literal>x</literal></td>
1476 <td>Matches the character <literal>x</literal>.</td>
1479 <td><literal>.</literal></td>
1480 <td>Matches any character.</td>
1483 <td><literal>[ .. ]</literal></td>
1484 <td>Matches the set of characters specified;
1485 such as <literal>[abc]</literal> or <literal>[a-c]</literal>.</td>
1491 The above operands can be combined with the following operators:
1494 <table id="querymodel-regular-operators-table"
1495 frame="all" rowsep="1" colsep="1" align="center">
1496 <caption>Regular Expression Operators</caption>
1499 <tr><td>Operator</td><td>Description</td></tr>
1504 <td><literal>x*</literal></td>
1505 <td>Matches <literal>x</literal> zero or more times.
1506 Priority: high.</td>
1509 <td><literal>x+</literal></td>
1510 <td>Matches <literal>x</literal> one or more times.
1511 Priority: high.</td>
1514 <td><literal>x?</literal></td>
1515 <td>Matches <literal>x</literal> zero times or once.
1516 Priority: high.</td>
1519 <td><literal>xy</literal></td>
1520 <td>Matches <literal>x</literal>, then <literal>y</literal>.
1521 Priority: medium.</td>
1524 <td><literal>x|y</literal></td>
1525 <td>Matches either <literal>x</literal> or <literal>y</literal>.
1526 Priority: low.</td>
1529 <td><literal>( )</literal></td>
1530 <td>The order of evaluation may be changed by using parentheses.</td>
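The operands and operators listed above belong to the common core of most regular expression dialects, so a pattern can be prototyped outside Zebra before it is used in a query. The following is a minimal sketch using Python's <literal>re</literal> module, which is not Zebra's matcher. The helper name <literal>matches</literal> is hypothetical, and the requirement that the pattern cover the whole term is an assumption made for illustration only.

```python
import re


def matches(pattern: str, term: str) -> bool:
    """Return True if the pattern covers the whole term.

    Assumption: whole-term matching; this uses Python's re engine,
    not Zebra's own regular expression matcher.
    """
    return re.fullmatch(pattern, term) is not None


# Repetition binds tighter than concatenation: "ab*" means a(b*).
assert matches("ab*", "abbb")
assert not matches("ab*", "abab")

# Parentheses change the order of evaluation.
assert matches("(ab)*", "abab")

# Alternation binds loosest: "ab|cd" means (ab)|(cd).
assert matches("ab|cd", "cd")
assert not matches("ab|cd", "abd")

# Any-character and character-set operands.
assert matches("informat.*", "information")
assert matches("[a-c]at", "bat")
```

A pattern that behaves as intended here is only a starting point; the result should still be verified against the Zebra index itself.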
1536 If the first character of the <literal>Regxp-2</literal> query
1537 is a plus character (<literal>+</literal>), it marks the
1538 beginning of a section with non-standard specifiers.
1539 The next plus character marks the end of the section.
1540 Currently Zebra only supports one specifier, the error tolerance,
1541 which consists of one digit.
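The specifier section described above can be illustrated with a short sketch that splits a <literal>Regxp-2</literal> term into its error tolerance and the remaining pattern. The function <literal>split_error_tolerance</literal> is a hypothetical helper that follows the prose description only; it is not Zebra's parser. The default of one error is taken from the description of <emphasis>Regxp-2</emphasis> given earlier.

```python
def split_error_tolerance(term: str, default: int = 1):
    """Split a Regxp-2 term into (error_tolerance, pattern).

    Hypothetical helper based on the prose description, not on Zebra's code.
    A leading '+' opens the specifier section and the next '+' closes it.
    """
    if not term.startswith("+"):
        # No specifier section: fall back to the default tolerance.
        return default, term
    end = term.find("+", 1)
    if end == -1:
        raise ValueError("specifier section is not closed by a second '+'")
    spec = term[1:end]
    if len(spec) != 1 or spec not in "0123456789":
        raise ValueError("error tolerance must be exactly one digit")
    return int(spec), term[end + 1:]


print(split_error_tolerance("+2+informat"))  # (2, 'informat')
print(split_error_tolerance("informat"))     # (1, 'informat')
```

Under this reading, a term such as <literal>+2+informat</literal> would request a tolerance of two errors for the pattern <literal>informat</literal>.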
1545 Since the plus operator is normally a suffix operator, the addition to
1546 the query syntax doesn't violate the syntax for standard regular expressions.
1551 For example, a phrase search with regular expressions in
1552 the title-register is performed like this:
1554 Z> find @attr 1=4 @attr 5=102 "informat.* retrieval"
1559 Combinations with other attributes are possible. For example, a
1560 ranked search with a regular expression:
1562 Z> find @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
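When such queries are generated by a program rather than typed by hand, the attribute prefix can be assembled from pairs of attribute type and value. The following is a minimal sketch; <literal>build_pqf</literal> is a hypothetical helper, not part of Zebra or YAZ, and it deliberately does not escape quotes or backslashes inside the term.

```python
def build_pqf(attributes, term):
    """Compose a PQF query string from (type, value) pairs and a term.

    Hypothetical helper for illustration only. The term is wrapped in
    double quotes as-is; quotes or backslashes inside it are not escaped.
    """
    quoted = f'"{term}"'
    if not attributes:
        return quoted
    prefix = " ".join(f"@attr {atype}={value}" for atype, value in attributes)
    return f"{prefix} {quoted}"


# Title register (1=4) with Regxp-1 truncation (5=102).
print(build_pqf([(1, 4), (5, 102)], "informat.* retrieval"))

# The same search with the relevance relation (2=102) added.
print(build_pqf([(1, 4), (5, 102), (2, 102)], "informat.* retrieval"))
```

The two calls reproduce the query strings used in the examples above, which can then be passed to the <literal>find</literal> command of <literal>yaz-client</literal>.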
1570 The RecordType parameter in the <literal>zebra.cfg</literal> file, or
1571 the <literal>-t</literal> option to the indexer tells Zebra how to
1572 process input records.
1573 Two basic types of processing are available - raw text and structured
1574 data. Raw text is just that, and it is selected by providing the
1575 argument <literal>text</literal> to Zebra. Structured records are
1576 all handled internally using the basic mechanisms described in the
1577 subsequent sections.
1578 Zebra can read structured records in many different formats.
1584 <sect1 id="querymodel-cql-to-pqf">
1585 <title>Server Side CQL to PQF Query Translation</title>
1588 Using the <literal>&lt;cql2rpn&gt;l2rpn.txt&lt;/cql2rpn&gt;</literal>
1589 YAZ Frontend Virtual
1590 Hosts option, one can configure
1591 the YAZ Frontend CQL-to-PQF
1592 converter, specifying the interpretation of various
1593 <ulink url="&url.cql;">CQL</ulink>
1594 indexes, relations, etc. in terms of Type-1 query attributes.
1595 <!-- The yaz-client config file -->
1598 For example, using server-side CQL-to-PQF conversion, one might
1599 query a Zebra server like this:
1602 yaz-client localhost:9999
1604 Z> find text=(plant and soil)
1607 and - if properly configured - even static relevance ranking can
1608 be performed using CQL query syntax:
1611 Z> find text = /relevant (plant and soil)
1617 The same configuration can also be used to
1618 search using client-side CQL-to-PQF conversion.
1619 The only differences are <literal>querytype cql2rpn</literal> instead of
1621 <literal>querytype cql</literal>, and a call that specifies a local conversion file:
1625 yaz-client -q local/cql2pqf.txt localhost:9999
1626 Z> querytype cql2rpn
1627 Z> find text=(plant and soil)
1633 Exhaustive information can be found in the
1634 section "Specification of CQL to RPN mappings" of the YAZ manual,
1635 <ulink url="http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map">
1636 http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map</ulink>,
1637 and is therefore not repeated here.
1642 See <ulink url="http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html">
1643 http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html</ulink>
1644 for the Maintenance Agency's work-in-progress mapping of Dublin Core
1645 indexes to Attribute Architecture (util, XD and BIB-2).
1655 <!-- Keep this comment at the end of the file
1660 sgml-minimize-attributes:nil
1661 sgml-always-quote-attributes:t
1664 sgml-parent-document: "zebra.xml"
1665 sgml-local-catalogs: nil
1666 sgml-namecase-general:t