1 <chapter id="querymodel">
2 <!-- $Id: querymodel.xml,v 1.9 2006-06-20 14:20:50 marc Exp $ -->
3 <title>Query Model</title>
5 <sect1 id="querymodel-overview">
6 <title>Query Model Overview</title>
9 <sect2 id="querymodel-query-languages">
10 <title>Query Languages</title>
Zebra was born as a networked Information Retrieval engine adhering
14 to the international standards
15 <ulink url="&url.z39.50;">Z39.50</ulink> and
16 <ulink url="&url.sru;">SRU</ulink>,
18 <literal>type-1 Reverse Polish Notation (RPN)</literal> query
Unfortunately, this model defines only a binary-encoded
representation, which is used as transport packaging in
the Z39.50 protocol layer. This representation is not human-readable,
nor does it define any convenient way to specify queries.
26 Since the <literal>type-1 (RPN)</literal>
27 query structure has no direct, useful string
28 representation, every origin application needs to provide some
29 form of mapping from a local query notation or representation to it.
33 <sect3 id="querymodel-query-languages-pqf">
34 <title>Prefix Query Format (PQF)</title>
Index Data has defined a textual representation in the
<literal>Prefix Query Format</literal>, or
<literal>PQF</literal> for short, which maps
<literal>one-to-one</literal> to binary encoded
<literal>type-1 RPN</literal> query packages.
It has been adopted by other
parties developing Z39.50 software, and is often referred to as
<literal>Prefix Query Notation</literal>, or
<literal>PQN</literal> for short. See
<xref linkend="querymodel-pqf"/> for further explanations and
descriptions of Zebra's capabilities.
51 <sect3 id="querymodel-query-languages-cql">
52 <title>Common Query Language (CQL)</title>
54 The query model of the <literal>type-1 RPN</literal>,
55 expressed in <literal>PQF/PQN</literal> is natively supported.
On the other hand, the default <literal>SRU</literal>
web service query language, the <literal>Common Query Language</literal>
(<ulink url="&url.cql;">CQL</ulink>), is not natively supported.
61 Zebra can be configured to understand and map CQL to PQF. See
62 <xref linkend="querymodel-cql-to-pqf"/>.
68 <sect2 id="querymodel-operation-types">
69 <title>Operation types</title>
Zebra supports all three
<literal>Z39.50/SRU</literal> operations defined in the
standards: <literal>explain</literal>, <literal>search</literal>,
and <literal>scan</literal>. A short description of the
functionality and purpose of each follows.
78 <sect3 id="querymodel-operation-type-explain">
79 <title>Explain Operation</title>
81 The <emphasis>syntax</emphasis> of Z39.50/SRU queries is
82 well known to any client, but the specific
<emphasis>semantics</emphasis> - taking into account a
particular server's functionalities and abilities - must be
discovered from case to case. Enter the
86 <literal>explain</literal> operation, which provides the means
<emphasis>fields</emphasis> (also called
<emphasis>indexes</emphasis> or <emphasis>access points</emphasis>)
are provided, which default parameters the server uses, which
retrieval document formats are defined, and which specific parts
92 of the general query model are supported.
The Z39.50 protocol embeds the <literal>explain</literal> operation
97 <literal>search</literal> in the magic
98 <literal>IR-Explain-1</literal> database;
99 see <xref linkend="querymodel-exp1"/>.
In SRU, <literal>explain</literal> is an entirely separate
operation, which returns a <literal>ZeeRex
XML</literal> record according to the
105 structure defined by the protocol.
108 In both cases, the information gathered through
109 <literal>explain</literal> operations can be used to
auto-configure a client user interface to the server's
115 <sect3 id="querymodel-operation-type-search">
116 <title>Search Operation</title>
Search and retrieve interactions are the raison d'être.
They are used to query the remote database and
return search result documents. Search queries range from
simple free text searches to nested complex boolean queries,
targeting specific indexes, and possibly enhanced with many
query semantic specifications. Search interactions are the heart
and soul of Z39.50/SRU servers.
128 <sect3 id="querymodel-operation-type-scan">
129 <title>Scan Operation</title>
131 The <literal>scan</literal> operation is a helper functionality,
which operates on one index or access point at a time.
136 the means to investigate the content of specific indexes.
Scanning an index returns a handful of terms actually found in
the index, and in addition the <literal>scan</literal>
operation returns the number of documents indexed by each term.
140 A search client can use this information to propose proper
141 spelling of search terms, to auto-fill search boxes, or to
142 display controlled vocabularies.
151 <sect1 id="querymodel-pqf">
152 <title>Prefix Query Format structure and syntax</title>
The <ulink url="&url.yaz.pqf;">PQF grammar</ulink>
is documented in the YAZ manual, and shall not be
repeated here. During search, this textual PQF representation
is always mapped to the equivalent Zebra internal
161 <sect2 id="querymodel-pqf-tree">
162 <title>PQF tree structure</title>
164 The PQF parse tree - or the equivalent textual representation -
165 may start with one specification of the
166 <emphasis>attribute set</emphasis> used. Following is a query
168 consists of <emphasis>atomic query parts (APT)</emphasis> or
<emphasis>named result sets</emphasis>, possibly
paired by <emphasis>boolean binary operators</emphasis>, and
finally <emphasis>recursively combined</emphasis> into
175 <sect3 id="querymodel-attribute-sets">
176 <title>Attribute sets</title>
178 Attribute sets define the exact meaning and semantics of queries
issued. Zebra comes with some predefined attribute set
definitions; others can easily be defined and added to the
185 <table id="querymodel-attribute-sets-table"
186 frame="all" rowsep="1" colsep="1" align="center">
188 <caption>Attribute sets predefined in Zebra</caption>
192 <td>Attribute set</td>
201 <td><literal>Explain</literal> attribute set</td>
202 <td><literal>exp-1</literal></td>
203 <td>Special attribute set used on the special automagic
204 <literal>IR-Explain-1</literal> database to gain information on
205 server capabilities, database names, and database
210 <td><literal>Bib1</literal> attribute set</td>
211 <td><literal>bib-1</literal></td>
212 <td>Standard PQF query language attribute set which defines the
213 semantics of Z39.50 searching. In addition, all of the
214 non-use attributes (type 2-9) define the hard-wired
220 <td><literal>GILS</literal> attribute set</td>
221 <td><literal>gils</literal></td>
<td>Extension to the <literal>Bib1</literal> attribute set.</td>
227 <td><literal>IDXPATH</literal> attribute set</td>
228 <td><literal>idxpath</literal></td>
<td>Hardwired XPath-like attribute set, only available for
indexing with the GRS record model</td>
239 The use attributes (type 1) of the predefined attribute sets can
240 be reconfigured by tweaking the files
241 <filename>tab/*.att</filename>.
242 New attribute sets can be defined by adding similar files in the
243 configuration path of the server.
247 The Zebra internal query processing is modeled after
248 the <literal>Bib1</literal> attribute set, and the non-use
249 attributes type 2-6 are hard-wired in. It is therefore essential
250 to be familiar with <xref linkend="querymodel-bib1-nonuse"/>.
254 <sect3 id="querymodel-boolean-operators">
255 <title>Boolean operators</title>
257 A pair of subquery trees, or of atomic queries, is combined
258 using the standard boolean operators into new query trees.
261 <table id="querymodel-boolean-operators-table"
262 frame="all" rowsep="1" colsep="1" align="center">
264 <caption>Boolean operators</caption>
267 <tr><td>one</td><td>two</td></tr>
271 <tr><td><literal>@and</literal></td>
272 <td>binary <literal>AND</literal> operator</td>
<td>Set intersection of two atomic queries' hit sets</td>
<tr><td><literal>@or</literal></td>
<td>binary <literal>OR</literal> operator</td>
<td>Set union of two atomic queries' hit sets</td>
<tr><td><literal>@not</literal></td>
<td>binary <literal>AND NOT</literal> operator</td>
<td>Set complement of two atomic queries' hit sets</td>
<tr><td><literal>@prox</literal></td>
<td>binary <literal>PROXIMITY</literal> operator</td>
<td>Set intersection of two atomic queries' hit sets. In
addition, the intersection set is purged of all
287 documents which do not satisfy the requested query
288 term proximity. Usually a proper subset of the AND
295 For example, we can combine the terms
296 <emphasis>information</emphasis> and <emphasis>retrieval</emphasis>
297 into different searches in the default index of the default
298 attribute set as follows.
299 Querying for the union of all documents containing the
300 terms <emphasis>information</emphasis> OR
301 <emphasis>retrieval</emphasis>:
303 Z> find @or information retrieval
307 Querying for the intersection of all documents containing the
308 terms <emphasis>information</emphasis> AND
309 <emphasis>retrieval</emphasis>:
The hit set is a subset of the corresponding
313 Z> find @and information retrieval
317 Querying for the intersection of all documents containing the
318 terms <emphasis>information</emphasis> AND
319 <emphasis>retrieval</emphasis>, taking proximity into account:
The hit set is a subset of the corresponding
323 Z> find @prox information retrieval
327 Querying for the intersection of all documents containing the
328 terms <emphasis>information</emphasis> AND
329 <emphasis>retrieval</emphasis>, in the same order and near each
other as described in the term list.
The hit set is a subset of the corresponding
334 Z> find "information retrieval"
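<para>
Because the operators are written in prefix position, queries like
the ones above are easy to generate programmatically. The following
Python sketch is illustrative only and is not part of Zebra or YAZ;
the helper name <literal>combine</literal> is hypothetical, and
<literal>@prox</literal> is left out to keep the sketch small.
</para>

```python
def combine(operator, left, right):
    """Join two PQF subqueries with a binary operator in prefix position."""
    allowed = {"@and", "@or", "@not"}
    if operator not in allowed:
        raise ValueError("unsupported operator: " + operator)
    return "{} {} {}".format(operator, left, right)


# Operators nest recursively because each result is itself a subquery.
union = combine("@or", "information", "retrieval")
nested = combine("@and", union, "zebra")
print(union)   # @or information retrieval
print(nested)  # @and @or information retrieval zebra
```

<para>
The resulting strings can be pasted after <literal>find</literal>
in a <literal>yaz-client</literal> session.
</para>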
340 <sect3 id="querymodel-atomic-queries">
341 <title>Atomic queries (APT)</title>
Atomic queries are the query parts which work on one access point
344 only. These consist of <literal>an attribute list</literal>
345 followed by a <literal>single term</literal> or a
346 <literal>quoted term list</literal>, and are often called
347 <emphasis>Attributes-Plus-Terms (APT)</emphasis> queries.
350 Unsupplied non-use attributes type 2-9 are either inherited from
351 higher nodes in the query tree, or are set to Zebra's default values.
352 See <xref linkend="querymodel-bib1"/> for details.
355 <table id="querymodel-atomic-queries-table"
356 frame="all" rowsep="1" colsep="1" align="center">
358 <caption>Atomic queries</caption>
361 <tr><td>one</td><td>two</td></tr>
365 <tr><td><emphasis>attribute list</emphasis></td>
366 <td>List of <literal>orthogonal</literal> attributes</td>
367 <td>Any of the orthogonal attribute types may be omitted,
368 these are inherited from higher query tree nodes, or if not
369 inherited, are set to the default Zebra configuration values.
372 <tr><td><emphasis>term</emphasis></td>
373 <td>single <literal>term</literal>
374 or <literal>quoted term list</literal> </td>
375 <td>Here the search terms or list of search terms is added
381 Querying for the term <emphasis>information</emphasis> in the
default index using the default attribute set, the server choice
383 of access point/index, and the default non-use attributes.
385 Z> find "information"
389 Equivalent query fully specified including all default values:
391 Z> find @attrset bib-1 @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1 "information"
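<para>
The relationship between the short and the fully specified form can
be sketched in a few lines of Python. This is only an illustration
of how omitted attributes fall back to defaults; the names
<literal>DEFAULTS</literal> and <literal>fully_specified</literal>
are hypothetical, and the default values are simply copied from the
query shown above.
</para>

```python
# Default values copied from the fully specified query in the text.
DEFAULTS = {1: "1017", 2: "3", 3: "3", 4: "1", 5: "100", 6: "1"}


def fully_specified(term, overrides=None, attrset="bib-1"):
    """Return a PQF query string with every attribute type 1-6 spelled out."""
    attrs = dict(DEFAULTS)
    attrs.update(overrides or {})
    parts = ["@attrset", attrset]
    for attr_type in sorted(attrs):
        parts.append("@attr {}={}".format(attr_type, attrs[attr_type]))
    parts.append('"{}"'.format(term))
    return " ".join(parts)


print(fully_specified("information"))
# Overriding only the use attribute keeps all other defaults in place.
print(fully_specified("information", {1: "4"}))
```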
396 Finding all documents which have empty titles. Notice that the
397 empty term must be quoted, but is otherwise legal.
406 <sect3 id="querymodel-resultset">
407 <title>Named Result Sets</title>
409 Named result sets are supported in Zebra, and result sets can be
410 used as operands without limitations.
413 After the execution of a search, the result set is available at
414 the server, such that the client can use it for subsequent
searches or retrieval requests. The Z39.50 standard actually
stresses the fact that result sets are volatile. A result set may cease
to exist at any point in time after the search, and the server will
then send a diagnostic to the effect that the requested
result set does not exist any more.
423 Defining a named result set and re-using it in the next query,
424 using <literal>yaz-client</literal>.
426 Z> f @attr 1=4 mozart
428 Number of hits: 43, setno 1
430 Z> f @and @set 1 @attr 1=4 amadeus
432 Number of hits: 14, setno 2
434 Z> f @attr 1=1016 beethoven
436 Number of hits: 26, setno 3
442 Named result sets are only supported by the Z39.50 protocol.
443 The SRU web service is stateless, and therefore the notion of
named result sets does not exist when accessing a Zebra server by
450 <sect3 id="querymodel-use-string">
451 <title>Zebra's special use attribute type 1 of form 'string'</title>
The numeric <literal>use (type 1)</literal> attribute is usually
referred to from a given
attribute set. In addition, Zebra lets you use
<emphasis>any internal index
name defined in your configuration</emphasis>
as use attribute value. This is a great feature for
debugging, and when you do
not need the complexity of defined use attribute values. It is
the preferred way of accessing Zebra indexes directly.
Finding all documents which have the term list "information
retrieval" in a Zebra index, using its internal full string name.
467 Z> find @attr 1=sometext "information retrieval"
Searching the bib-1 use attribute 54 using its string name:
473 Z> find @attr 1=Code-language eng
477 Searching in any silly string index - if it's defined in your
478 indexation rules and can be parsed by the PQF parser.
479 This is definitely not the recommended use of
480 this facility, as it might confuse your users with some very
483 Z> find @attr 1=silly/xpath/alike[@index]/name "information retrieval"
487 See <xref linkend="querymodel-bib1-mapping"/> for details, and
488 <xref linkend="server-sru"/>
for the SRU PQF query extension using string names as a fast
494 <sect3 id="querymodel-use-xpath">
495 <title>Zebra's special use attribute type 1 of form 'XPath'
496 for GRS filters</title>
498 As we have seen above, it is possible (albeit seldom a great
500 <ulink url="http://www.w3.org/TR/xpath">XPath 1.0</ulink> based
501 search by defining <literal>use (type 1)</literal>
<emphasis>string</emphasis> attributes which in appearance
<emphasis>resemble XPath queries</emphasis>. There are two
problems with this approach: first, the XPath-look-alike has to
be defined at indexation time, so no new undefined
XPath queries can be entered at search time, and second, it might
confuse users very much that an XPath-alike index name in fact
gets populated from a possibly entirely different XML element
than it pretends to access.
512 When using the <literal>GRS Record Model</literal>
513 (see <xref linkend="record-model-grs"/>), we have the
possibility to embed <emphasis>live</emphasis>
516 in the PQF queries, which are here called
517 <literal>use (type 1)</literal> <emphasis>xpath</emphasis>
518 attributes. You must enable the
519 <literal>xpath enable</literal> directive in your
520 <literal>.abs</literal> config files.
523 Only a <emphasis>very</emphasis> restricted subset of the
524 <ulink url="http://www.w3.org/TR/xpath">XPath 1.0</ulink>
525 standard is supported as the GRS record model is simpler than
526 a full XML DOM structure. See the following examples for
530 Finding all documents which have the term "content"
531 inside a text node found in a specific XML DOM
532 <emphasis>subtree</emphasis>, whose starting element is
535 Z> find @attr 1=/root content
536 Z> find @attr 1=/root/first content
538 <emphasis>Notice that the
539 XPath must be absolute, i.e., must start with '/', and that the
XPath <literal>descendant-or-self</literal> axis followed by a
541 text node selection <literal>text()</literal> is implicitly
542 appended to the stated XPath.
544 It follows that the above searches are interpreted as:
546 Z> find @attr 1=/root//text() content
547 Z> find @attr 1=/root/first//text() content
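<para>
The rewriting rule can be mimicked with a small Python function.
This is purely an illustration of the rule stated above, not Zebra's
actual implementation; the function name
<literal>interpreted_xpath</literal> is hypothetical.
</para>

```python
def interpreted_xpath(xpath):
    """Append the implicit '//text()' step to an absolute XPath use attribute."""
    if not xpath.startswith("/"):
        raise ValueError("XPath use attributes must be absolute: " + xpath)
    return xpath.rstrip("/") + "//text()"


for path in ("/root", "/root/first"):
    print("@attr 1={} content".format(interpreted_xpath(path)))
# @attr 1=/root//text() content
# @attr 1=/root/first//text() content
```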
Filtering the addressing XPath by a predicate working on exact
attributes (in the XML sense) can be done: return all those documents which
555 have the term "english" contained in one of all text subnodes of
556 the subtree defined by the XPath
557 <literal>/record/title[@lang='en']</literal>
559 Z> find @attr 1=/record/title[@lang='en'] english
564 Combining numeric indexes, boolean expressions,
565 and xpath based searches is possible:
567 Z> find @attr 1=/record/title @and foo bar
568 Z> find @and @attr 1=/record/title foo @attr 1=4 bar
572 Escaping PQF keywords and other non-parseable XPath constructs
573 with <literal>'{ }'</literal> to prevent syntax errors:
575 Z> find @attr {1=/root/first[@attr='danish']} content
576 Z> find @attr {1=/root/second[@attr='danish lake']}
577 Z> find @attr {1=/root/third[@attr='dansk s\xc3\xb8']}
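<para>
A client that assembles such queries from user input can apply the
brace escaping mechanically. The following Python sketch assumes
that wrapping the whole <literal>type=value</literal> pair in braces
is sufficient, as in the examples above; the helper name
<literal>xpath_attr</literal> is hypothetical, and XPaths which
themselves contain braces are rejected rather than handled.
</para>

```python
def xpath_attr(xpath, attr_type=1):
    """Return an '@attr' clause with the type=value pair wrapped in braces."""
    if "{" in xpath or "}" in xpath:
        raise ValueError("braces inside the XPath are not handled by this sketch")
    return "@attr {" + "{}={}".format(attr_type, xpath) + "}"


print(xpath_attr("/root/second[@attr='danish lake']"))
# @attr {1=/root/second[@attr='danish lake']}
```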
It is worth mentioning that these dynamically performed XPath
queries are a performance bottleneck, as no optimized
specialized indexes can be used. Therefore, avoid the use of
this facility when speed is essential, and the database content
size is medium to large.
593 Search for all documents with specific path.
595 For path /c1/c2/.../cn use @attr idxpath 1=1 @attr 4=3 cn/cn-1/../c1/
596 Specifically for /c, use @attr idxpath 1=1 @attr 4=3 c/
Search for CDATA in elements
600 @attr idxpath 1=1016 text
602 Search for CDATA in attributes
604 @attr idxpath 1=1015 text
606 Search for all documents with given attribute type
608 @attr idxpath 1=3 @attr 4=3 type
615 <sect2 id="querymodel-exp1">
616 <title>Explain Attribute Set</title>
618 The Z39.50 standard defines the
<ulink url="&url.z39.50.explain;">Explain</ulink> attribute set
<literal>exp-1</literal>, which is used to discover information
about a server's search semantics and functional capabilities.
622 Zebra exposes a "classic"
623 Explain database by base name <literal>IR-Explain-1</literal>, which
624 is populated with system internal information.
627 The attribute-set <literal>exp-1</literal> consists of a single
628 <literal>Use (type 1)</literal> attribute.
631 In addition, the non-Use
632 <literal>bib-1</literal> attributes, that is, the types
633 <literal>Relation</literal>, <literal>Position</literal>,
634 <literal>Structure</literal>, <literal>Truncation</literal>,
635 and <literal>Completeness</literal> are imported from
636 the <literal>bib-1</literal> attribute set, and may be used
637 within any explain query.
640 <sect3 id="querymodel-exp1-use">
641 <title>Use Attributes (type = 1)</title>
The following Explain search attributes are supported:
644 <literal>ExplainCategory</literal> (@attr 1=1),
645 <literal>DatabaseName</literal> (@attr 1=3),
646 <literal>DateAdded</literal> (@attr 1=9),
<literal>DateChanged</literal> (@attr 1=10).
650 A search in the use attribute <literal>ExplainCategory</literal>
651 supports only these predefined values:
652 <literal>CategoryList</literal>, <literal>TargetInfo</literal>,
653 <literal>DatabaseInfo</literal>, <literal>AttributeDetails</literal>.
656 See <filename>tab/explain.att</filename> and the
657 <ulink url="&url.z39.50;">Z39.50</ulink> standard
658 for more information.
663 <title>Explain searches with yaz-client</title>
665 Classic Explain only defines retrieval of Explain information
via ASN.1. Practically no Z39.50 client supports this. Fortunately
667 they don't have to - Zebra allows retrieval of this information
669 <literal>SUTRS</literal>, <literal>XML</literal>,
670 <literal>GRS-1</literal> and <literal>ASN.1</literal> Explain.
674 List supported categories to find out which explain commands are
678 Z> find @attr exp1 1=1 categorylist
685 Get target info, that is, investigate which databases exist at
686 this server endpoint:
689 Z> find @attr exp1 1=1 targetinfo
List all supported databases. The number of hits
701 is the number of databases found, which most commonly are the
703 the <literal>Default</literal> and the
704 <literal>IR-Explain-1</literal> databases.
707 Z> find @attr exp1 1=1 databaseinfo
714 Get database info record for database <literal>Default</literal>.
717 Z> find @and @attr exp1 1=1 databaseinfo @attr exp1 1=3 Default
719 Identical query with explicitly specified attribute set:
722 Z> find @attrset exp1 @and @attr 1=1 databaseinfo @attr 1=3 Default
727 Get attribute details record for database
728 <literal>Default</literal>.
729 This query is very useful to study the internal Zebra indexes.
730 If records have been indexed using the <literal>alvis</literal>
731 XSLT filter, the string representation names of the known indexes can be
735 Z> find @and @attr exp1 1=1 attributedetails @attr exp1 1=3 Default
737 Identical query with explicitly specified attribute set:
740 Z> find @attrset exp1 @and @attr 1=1 attributedetails @attr 1=3 Default
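<para>
The Explain queries shown above all follow the same pattern, which
the following Python sketch makes explicit. It is an illustration
only; the names <literal>CATEGORIES</literal> and
<literal>explain_query</literal> are hypothetical, and the category
values are the four listed earlier in this section.
</para>

```python
CATEGORIES = ("categorylist", "targetinfo", "databaseinfo", "attributedetails")


def explain_query(category, database=None):
    """Build a PQF query for IR-Explain-1 using the exp1 attribute set."""
    if category.lower() not in CATEGORIES:
        raise ValueError("unknown ExplainCategory: " + category)
    query = "@attr 1=1 {}".format(category.lower())
    if database is not None:
        # Restrict the category search to one database name (use attribute 3).
        query = "@and {} @attr 1=3 {}".format(query, database)
    return "@attrset exp1 " + query


print(explain_query("databaseinfo", "Default"))
# @attrset exp1 @and @attr 1=1 databaseinfo @attr 1=3 Default
```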
747 <sect2 id="querymodel-bib1">
748 <title>Bib1 Attribute Set</title>
750 Most of the information contained in this section is an excerpt of
751 the <literal>ATTRIBUTE SET BIB-1 (Z39.50-1995)
found at <ulink url="&url.z39.50.attset.bib1.1995;">The BIB-1
Attribute Set Semantics</ulink> from 1995, also in an updated
755 <ulink url="&url.z39.50.attset.bib1;">Bib-1
756 Attribute Set</ulink>
757 version from 2003. Index Data is not the copyright holder of this
758 information, except for the configuration details, the listing of
759 Zebra's capabilities, and the example queries.
763 <sect3 id="querymodel-bib1-use">
764 <title>Use Attributes (type 1)</title>
767 A use attribute specifies an access point for any atomic query.
These access points are highly dependent on the attribute set used
769 in the query, and are user configurable using the following
770 default configuration files:
771 <filename>tab/bib1.att</filename>,
772 <filename>tab/dan1.att</filename>,
773 <filename>tab/explain.att</filename>, and
774 <filename>tab/gils.att</filename>.
775 New attribute sets can be added by adding new
776 <filename>tab/*.att</filename> configuration files, which need to
777 be sourced in the main configuration <filename>zebra.cfg</filename>.
In addition, Zebra allows access to
<emphasis>internal index names</emphasis> and <emphasis>dynamic
XPath</emphasis> as use attributes.
See <xref linkend="querymodel-use-string"/> and
<xref linkend="querymodel-use-xpath"/> for
alternative access to the Zebra internal index names and XPath queries.
790 Phrase search for <emphasis>information retrieval</emphasis> in
793 Z> find @attr 1=4 "information retrieval"
801 <sect2 id="querymodel-bib1-nonuse">
802 <title>Zebra general Bib1 Non-Use Attributes (type 2-6)</title>
804 <sect3 id="querymodel-bib1-relation">
805 <title>Relation Attributes (type 2)</title>
808 Relation attributes describe the relationship of the access
810 of the relation) to the search term as qualified by the attributes (right
side of the relation), e.g., Date-publication &lt;= 1975.
814 <table id="querymodel-bib1-relation-table"
815 frame="all" rowsep="1" colsep="1" align="center">
817 <caption>Relation Attributes (type 2)</caption>
832 <td>Less than or equal</td>
842 <td>Greater or equal</td>
847 <td>Greater than</td>
872 <td>AlwaysMatches</td>
880 The relation attribute
881 <literal>relevance (102)</literal> is supported, see
882 <xref linkend="administration-ranking"/> for full information.
883 <!-- always-matches (103) not supported for all indexes -->
887 All ordering operations are based on a lexicographical ordering,
<emphasis>except</emphasis> when the
889 <literal>structure attribute numeric (109)</literal> is used. In
890 this case, ordering is numerical. See
891 <xref linkend="querymodel-bib1-structure"/>.
895 Ranked search for <emphasis>information retrieval</emphasis> in
898 Z> find @attr 1=4 @attr 2=102 "information retrieval"
903 <sect3 id="querymodel-bib1-position">
904 <title>Position Attributes (type 3)</title>
907 The position attribute specifies the location of the search term
908 within the field or subfield in which it appears.
911 <table id="querymodel-bib1-position-table"
912 frame="all" rowsep="1" colsep="1" align="center">
914 <caption>Position Attributes (type 3)</caption>
924 <td>First in field </td>
929 <td>First in subfield</td>
934 <td>Any position in field</td>
The position attribute values <literal>first in field (1)</literal>
and <literal>first in subfield (2)</literal> are unsupported.
Using them does not trigger an error, but silently defaults to
<literal>any position in field (3)</literal>.
950 <sect3 id="querymodel-bib1-structure">
951 <title>Structure Attributes (type 4)</title>
954 The structure attribute specifies the type of search
955 term. This causes the search to be mapped on
956 different Zebra internal indexes, which must have been defined
961 The possible values of the
962 <literal>structure attribute (type 4)</literal> can be defined
using the configuration file
<filename>tab/default.idx</filename>.
The default configuration is summarized in this table.
968 <table id="querymodel-bib1-structure-table"
969 frame="all" rowsep="1" colsep="1" align="center">
971 <caption>Structure Attributes (type 4)</caption>
1001 <td>Date (normalized)</td>
1011 <td>Date (un-normalized)</td>
1013 <td>unsupported</td>
1016 <td>Name (normalized) </td>
1018 <td>unsupported</td>
1021 <td>Name (un-normalized) </td>
1023 <td>unsupported</td>
1028 <td>unsupported</td>
1036 <td>Free-form-text</td>
1041 <td>Document-text</td>
1046 <td>Local-number</td>
1053 <td>unsupported</td>
1056 <td>Numeric string</td>
1065 The structure attribute value <literal>local-number
is supported, and always maps to the Zebra internal document ID.
1072 the GILS schema (<literal>gils.abs</literal>), the
1073 west-bounding-coordinate is indexed as type <literal>n</literal>,
1074 and is therefore searched by specifying
1075 <emphasis>structure</emphasis>=<emphasis>Numeric String</emphasis>.
1076 To match all those records with west-bounding-coordinate greater
1077 than -114 we use the following query:
1079 Z> find @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
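<para>
Numeric comparisons of this kind combine a relation attribute
(type 2) with the numeric structure attribute (4=109). The Python
sketch below assembles such a query; it is an illustration only, the
names <literal>RELATIONS</literal> and
<literal>numeric_compare</literal> are hypothetical, and the mapping
covers just the Bib-1 ordering relations 1 to 5.
</para>

```python
# Bib-1 relation attribute values (type 2) for the ordering relations.
RELATIONS = {"lt": 1, "le": 2, "eq": 3, "ge": 4, "gt": 5}


def numeric_compare(use_attr, relation, value, attrset=None):
    """Build a PQF comparison using structure 'numeric string' (4=109)."""
    if attrset:
        use = "@attr {} 1={}".format(attrset, use_attr)
    else:
        use = "@attr 1={}".format(use_attr)
    return "@attr 4=109 @attr 2={} {} {}".format(RELATIONS[relation], use, value)


print(numeric_compare(2038, "gt", -114, attrset="gils"))
# @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
```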
1083 <sect3 id="querymodel-bib1-truncation">
1084 <title>Truncation Attributes (type = 5)</title>
1087 The truncation attribute specifies whether variations of one or
more characters are allowed between search term and hit terms, or
1089 not. Using non-default truncation attributes will broaden the
1090 document hit set of a search query.
1093 <table id="querymodel-bib1-truncation-table"
1094 frame="all" rowsep="1" colsep="1" align="center">
1096 <caption>Truncation Attributes (type 5)</caption>
1106 <td>Right truncation </td>
1111 <td>Left truncation</td>
1116 <td>Left and right truncation</td>
1121 <td>Do not truncate</td>
1126 <td>Process # in search term</td>
1144 Truncation attribute value
1145 <literal>Process # in search term (100)</literal> is a
1146 poor-man's regular expression search. It maps
1147 each <literal>#</literal> to <literal>.*</literal>, and
then performs a <literal>Regexp-1 (102)</literal> regular
1152 Truncation attribute value
<literal>Regexp-1 (102)</literal> is a normal regular expression search,
1157 Truncation attribute value
<literal>Regexp-2 (103) </literal> is a Zebra specific extension
1159 which allows <emphasis>fuzzy</emphasis> matches. One single
1160 error in spelling of search terms is allowed, i.e., a document
1161 is hit if it includes a term which can be mapped to the used
1162 search term by one character substitution, addition, deletion or
<!-- Special 104, 105, 106 are deprecated and will be removed! -->
1169 <sect3 id="querymodel-bib1-completeness">
1170 <title>Completeness Attributes (type = 6)</title>
1172 This attribute is ONLY used if structure w, p is to be
chosen. Completeness is ignored if not w, p is to be
1175 Incomplete field(1) is the default and makes Zebra use
Complete subfield(2) and complete field(3) both trigger
1178 search field type p.
1184 <sect2 id="querymodel-zebra-attr-search">
<title>Zebra specific Search Extensions to all Attribute Sets</title>
Zebra extends the Bib1 attribute types, and these extensions are
1188 recognized regardless of attribute
1189 set used in a <literal>search</literal> operation query.
1192 <table id="querymodel-zebra-attr-search-table"
1193 frame="all" rowsep="1" colsep="1" align="center">
<caption>Zebra Search Attribute Extensions</caption>
1201 <td>Zebra version</td>
1206 <td>Embedded Sort</td>
1218 <td>Rank Weight</td>
1224 <td>Approx Limit</td>
1230 <td>Term Reference</td>
1238 <sect3 id="querymodel-zebra-attr-sorting">
<title>Zebra Extension Embedded Sort Attribute (type 7)</title>
1242 The embedded sort is a way to specify sort within a query - thus
1243 removing the need to send a Sort Request separately. It is both
1244 faster and does not require clients to deal with the Sort
1248 The possible values after attribute <literal>type 7</literal> are
1249 <literal>1</literal> ascending and
1250 <literal>2</literal> descending.
1251 The attributes+term (APT) node is separate from the
1252 rest and must be <literal>@or</literal>'ed.
1253 The term associated with APT is the sorting level in integers,
1254 where <literal>0</literal> means primary sort,
1255 <literal>1</literal> means secondary sort, and so forth.
1256 See also <xref linkend="administration-ranking"/>.
1259 For example, searching for water, sort by title (ascending)
1261 Z> find @or @attr 1=1016 water @attr 7=1 @attr 1=4 0
1265 Or, searching for water, sort by title ascending, then date descending
1267 Z> find @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1
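<para>
The nesting of <literal>@or</literal> operators in the two examples
above follows a simple rule: each additional sort key wraps the
query built so far. The following Python sketch shows that rule; it
is an illustration only, and the names
<literal>with_sort</literal>, <literal>ASCENDING</literal> and
<literal>DESCENDING</literal> are hypothetical.
</para>

```python
ASCENDING, DESCENDING = 1, 2


def with_sort(query, sort_keys):
    """Attach embedded sort nodes (type 7) to a PQF query.

    sort_keys is a list of (use_attribute, direction) pairs; the position
    in the list becomes the sort level (0 = primary, 1 = secondary, ...).
    """
    result = query
    for level, (use_attr, direction) in enumerate(sort_keys):
        sort_node = "@attr 7={} @attr 1={} {}".format(direction, use_attr, level)
        result = "@or {} {}".format(result, sort_node)
    return result


print(with_sort("@attr 1=1016 water", [(4, ASCENDING), (30, DESCENDING)]))
# @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1
```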
1271 <sect3 id="querymodel-zebra-attr-estimation">
<title>Zebra Extension Term Set Attribute (type 8)</title>
1275 The Term Set feature is a facility that allows a search to store
1276 hitting terms in a "pseudo" resultset; thus a search (as usual) +
1277 a scan-like facility. Requires a client that can do named result
1278 sets since the search generates two result sets. The value for
1279 attribute 8 is the name of a result set (string). The terms in
1280 the named term set are returned as SUTRS records.
1283 For example, searching for u in title, right truncated, and
1284 storing the result in term set named 'aset'
1286 Z> find @attr 5=1 @attr 1=4 @attr 8=aset u
1290 The model has one serious flaw: we don't know the size of term
1291 set. Experimental. Do not use in production code.
1294 <sect3 id="querymodel-zebra-attr-weight">
<title>Zebra Extension Rank Weight Attribute (type 9)</title>
Rank weight is a way to pass a value to a ranking algorithm - so
that one APT has one value - while another has a different one.
1300 See also <xref linkend="administration-ranking"/>.
1303 For example, searching for utah in title with weight 30 as well
1304 as any with weight 20:
1306 Z> find @attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 utah
1310 <sect3 id="querymodel-zebra-attr-limit">
<title>Zebra Extension Approximative Limit Attribute (type 9)</title>
Newer Zebra versions normally estimate the hit count for every APT
1315 (leaf) in the query tree. These hit counts are returned as part of
1316 the searchResult-1 facility in the binary encoded Z39.50 search
By setting a limit for the APT we can make Zebra switch to
approximate hit counts when a certain hit count limit is
reached. A value of zero means exact hit count.
For example, we might be interested in exact hit count for a, but
1326 for b we allow hit count estimates for 1000 and higher.
1328 Z> find @and a @attr 9=1000 b
The estimated hit count facility makes searches faster, as one
1333 only needs to process large hit lists partially.
1336 This facility clashes with rank weight, because there all
1337 documents in the hit lists need to be examined for scoring and
1339 It is an experimental
1340 extension. Do not use in production code.
1343 <sect3 id="querymodel-zebra-attr-termref">
1344 <title>Zebra Extension Term Reference Attribute (type 10)</title>
1347 Zebra supports the <literal>searchResult-1</literal> facility.
1348 If the <literal>Term Reference Attribute (type 10)</literal> is
1349 given, it specifies a subqueryId value returned as part of the
1350 search result. It is a way for a client to name an APT part of a
1360 Experimental. Do not use in production code.
1367 <sect2 id="querymodel-zebra-attr-scan">
1368 <title>Zebra specific Scan Extensions to all Attribute Sets</title>
1370 Zebra extends the Bib1 attribute types, and these extensions are
1371 recognized regardless of the attribute
1372 set used in a <literal>scan</literal> operation query.
1374 <table id="querymodel-zebra-attr-scan-table"
1375 frame="all" rowsep="1" colsep="1" align="center">
1377 <caption>Zebra Scan Attribute Extensions</caption>
1383 <td>Zebra version</td>
1388 <td>Result Set Narrow</td>
1394 <td>Approximative Limit</td>
1402 <sect3 id="querymodel-zebra-attr-narrow">
1403 <title>Zebra Extension Result Set Narrow (type 8)</title>
1406 If attribute <literal>Result Set Narrow (type 8)</literal>
1407 is given for <literal>scan</literal>, the value is the name of a
1408 result set. Each hit count in <literal>scan</literal> is
1409 <literal>@and</literal>'ed with the result set given.
1412 Consider for example
1413 the case of scanning all title fields around the
1414 scanterm <emphasis>mozart</emphasis>, then refining the scan by
1415 issuing a filtering query for <emphasis>amadeus</emphasis> to
1416 restrict the scan to the result set of the query:
1418 Z> scan @attr 1=4 mozart
1421 mozartforskningen (1)
1425 Z> f @attr 1=4 amadeus
1427 Number of hits: 15, setno 2
1429 Z> scan @attr 1=4 @attr 8=2 mozart
1432 mozartforskningen (0)
1440 Experimental. Do not use in production code.
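
The effect of narrowing can be pictured with plain sets. The following Python sketch is only an illustration of the idea, not Zebra's implementation: the index contents, the record ids and the function name scan are all made up for the example. Each scan term's record set is intersected with the named result set before it is counted.

```python
# Hypothetical title index: scan term -> ids of the records containing it.
title_index = {
    "mozart": {1, 2, 3, 4},
    "mozartforskningen": {5},
    "mozarts": {2, 6},
}


def scan(index, narrow_to=None):
    """Return (term, hit count) pairs in term order.

    If narrow_to is given, each term's record set is intersected with it
    before counting, mimicking the effect of @attr 8 on a scan.
    """
    counts = []
    for term in sorted(index):
        records = index[term]
        if narrow_to is not None:
            records = records.intersection(narrow_to)
        counts.append((term, len(records)))
    return counts


amadeus_hits = {1, 2, 3}  # made-up result set of the filtering query

plain = scan(title_index)
narrowed = scan(title_index, narrow_to=amadeus_hits)
print(plain)
print(narrowed)
```

With this toy data, a term whose only record lies outside the result set drops to a count of zero in the narrowed scan, as mozartforskningen does in the session above.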
1443 <sect3 id="querymodel-zebra-attr-approx">
1444 <title>Zebra Extension Approximative Limit (type 9)</title>
1447 The <literal>Zebra Extension Approximative Limit (type
1448 9)</literal> is a way to enable approximate
1449 hit counts for <literal>scan</literal>, in the same
1450 way as for <literal>search</literal> hit counts.
1459 Experimental and buggy. Definitely not to be used in production code.
1466 <sect2 id="querymodel-bib1-mapping">
1467 <title>Mapping from Bib1 Attributes to Zebra internal
1468 register indexes</title>
1474 <!-- see in util/zebramap.c
1477 if (completeness_value == 2 || completeness_value == 3)
1483 *sort_flag =(sort_relation_value > 0) ? 1 : 0;
1484 *search_type = "phrase";
1485 strcpy(rank_type, "void");
1486 if (relation_value == 102)
1488 if (weight_value == -1)
1490 sprintf(rank_type, "rank,w=%d,u=%d", weight_value, use_value);
1492 if (relation_value == 103)
1494 *search_type = "always";
1502 switch (structure_value)
1504 case 6: /* word list */
1505 *search_type = "and-list";
1507 case 105: /* free-form-text */
1508 *search_type = "or-list";
1510 case 106: /* document-text */
1511 *search_type = "or-list";
1514 case 1: /* phrase */
1516 case 108: /* string */
1517 *search_type = "phrase";
1519 case 107: /* local-number */
1520 *search_type = "local";
1523 case 109: /* numeric string */
1525 *search_type = "numeric";
1529 *search_type = "phrase";
1533 *search_type = "phrase";
1537 *search_type = "phrase";
1541 *search_type = "phrase";
1552 <emphasis>Use</emphasis> attributes are interpreted according to the
1553 attribute sets which have been loaded in the
1554 <literal>zebra.cfg</literal> file, and are matched against specific
1555 fields as specified in the <literal>.abs</literal> file which
1556 describes the profile of the records which have been loaded.
1557 If no Use attribute is provided, a default of Bib-1 Any is assumed.
1561 If a <emphasis>Structure</emphasis> attribute of
1562 <emphasis>Phrase</emphasis> is used in conjunction with a
1563 <emphasis>Completeness</emphasis> attribute of
1564 <emphasis>Complete (Sub)field</emphasis>, the term is matched
1565 against the contents of the phrase (long word) register, if one
1566 exists for the given <emphasis>Use</emphasis> attribute.
1567 A phrase register is created for those fields in the
1568 <literal>.abs</literal> file that contain a
1569 <literal>p</literal>-specifier.
1570 <!-- ### whatever the hell _that_ is -->
1574 If <emphasis>Structure</emphasis>=<emphasis>Phrase</emphasis> is
1575 used in conjunction with <emphasis>Incomplete Field</emphasis> (the
1576 default value for <emphasis>Completeness</emphasis>), the
1577 search is directed against the normal word registers, but if the term
1578 contains multiple words, the term will only match if all of the words
1579 are found immediately adjacent, and in the given order.
1580 The word search is performed on those fields that are indexed as
1581 type <literal>w</literal> in the <literal>.abs</literal> file.
1585 If the <emphasis>Structure</emphasis> attribute is
1586 <emphasis>Word List</emphasis>,
1587 <emphasis>Free-form Text</emphasis>, or
1588 <emphasis>Document Text</emphasis>, the term is treated as a
1589 natural-language, relevance-ranked query.
1590 This search type uses the word register, i.e. those fields
1591 that are indexed as type <literal>w</literal> in the
1592 <literal>.abs</literal> file.
1596 If the <emphasis>Structure</emphasis> attribute is
1597 <emphasis>Numeric String</emphasis>, the term is treated as an integer.
1598 The search is performed on those fields that are indexed
1599 as type <literal>n</literal> in the <literal>.abs</literal> file.
1603 If the <emphasis>Structure</emphasis> attribute is
1604 <emphasis>URx</emphasis>, the term is treated as a URX (URL) entity.
1605 The search is performed on those fields that are indexed as type
1606 <literal>u</literal> in the <literal>.abs</literal> file.
1610 If the <emphasis>Structure</emphasis> attribute is
1611 <emphasis>Local Number</emphasis>, the term is treated as a
1612 native Zebra Record Identifier.
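
The Structure cases can be summarised as a lookup table. The sketch below is a Python paraphrase limited to the cases named in the util/zebramap.c excerpt kept as a comment in this section's source; it is not the Zebra code itself, the names STRUCTURE_SEARCH_TYPE and search_type_for are hypothetical, and structure values not visible in that excerpt are deliberately left out.

```python
# Structure attribute value -> search type, restricted to the cases that
# appear in the util/zebramap.c excerpt. Other values are not listed here.
STRUCTURE_SEARCH_TYPE = {
    6: "and-list",    # word list
    105: "or-list",   # free-form-text
    106: "or-list",   # document-text
    1: "phrase",      # phrase
    108: "phrase",    # string
    107: "local",     # local-number
    109: "numeric",   # numeric string
}


def search_type_for(structure_value):
    """Return the search type for a structure value, or None if unlisted."""
    return STRUCTURE_SEARCH_TYPE.get(structure_value)


print(search_type_for(6))
print(search_type_for(109))
```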
1616 If the <emphasis>Relation</emphasis> attribute is
1617 <emphasis>Equals</emphasis> (default), the term is matched
1618 in a normal fashion (modulo truncation and processing of
1619 individual words, if required).
1620 If <emphasis>Relation</emphasis> is <emphasis>Less Than</emphasis>,
1621 <emphasis>Less Than or Equal</emphasis>,
1622 <emphasis>Greater than</emphasis>, or <emphasis>Greater than or
1623 Equal</emphasis>, the term is assumed to be numerical, and a
1624 standard regular expression is constructed to match the given
1626 If <emphasis>Relation</emphasis> is <emphasis>Relevance</emphasis>,
1627 the standard natural-language query processor is invoked.
1631 For the <emphasis>Truncation</emphasis> attribute,
1632 <emphasis>No Truncation</emphasis> is the default.
1633 <emphasis>Left Truncation</emphasis> is not supported.
1634 <emphasis>Process # in search term</emphasis> is supported, as is
1635 <emphasis>Regxp-1</emphasis>.
1636 <emphasis>Regxp-2</emphasis> enables the fault-tolerant (fuzzy)
1637 search. As a default, a single error (deletion, insertion,
1638 replacement) is accepted when terms are matched against the register
1643 <sect2 id="querymodel-regular">
1644 <title>Zebra Regular Expressions in Truncation Attribute (type = 5)</title>
1647 Each term in a query is interpreted as a regular expression if
1648 the truncation value is either <emphasis>Regxp-1 (@attr 5=102)</emphasis>
1649 or <emphasis>Regxp-2 (@attr 5=103)</emphasis>.
1650 Both query types follow the same syntax with the operands:
1653 <table id="querymodel-regular-operands-table"
1654 frame="all" rowsep="1" colsep="1" align="center">
1656 <caption>Regular Expression Operands</caption>
1659 <tr><td>Operand</td><td>Meaning</td></tr>
1664 <td><literal>x</literal></td>
1665 <td>Matches the character <literal>x</literal>.</td>
1668 <td><literal>.</literal></td>
1669 <td>Matches any character.</td>
1672 <td><literal>[ .. ]</literal></td>
1673 <td>Matches the set of characters specified;
1674 such as <literal>[abc]</literal> or <literal>[a-c]</literal>.</td>
1680 The above operands can be combined with the following operators:
1683 <table id="querymodel-regular-operators-table"
1684 frame="all" rowsep="1" colsep="1" align="center">
1685 <caption>Regular Expression Operators</caption>
1688 <tr><td>Operator</td><td>Meaning</td></tr>
1693 <td><literal>x*</literal></td>
1694 <td>Matches <literal>x</literal> zero or more times.
1695 Priority: high.</td>
1698 <td><literal>x+</literal></td>
1699 <td>Matches <literal>x</literal> one or more times.
1700 Priority: high.</td>
1703 <td><literal>x?</literal></td>
1704 <td>Matches <literal>x</literal> zero or one times.
1705 Priority: high.</td>
1708 <td><literal>xy</literal></td>
1709 <td> Matches <literal>x</literal>, then <literal>y</literal>.
1710 Priority: medium.</td>
1713 <td><literal>x|y</literal></td>
1714 <td> Matches either <literal>x</literal> or <literal>y</literal>.
1718 <td><literal>( )</literal></td>
1719 <td>The order of evaluation may be changed by using parentheses.</td>
1725 If the first character of the <literal>Regxp-2</literal> query
1726 is a plus character (<literal>+</literal>) it marks the
1727 beginning of a section with non-standard specifiers.
1728 The next plus character marks the end of the section.
1729 Currently Zebra only supports one specifier, the error tolerance,
1730 which consists of one digit.
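
The specifier section can be split off with a few lines of code. The following Python sketch is an illustration under stated assumptions rather than Zebra's parser: the function names are hypothetical, the default tolerance of one error is taken from the description of Regxp-2 in this chapter, and the edit distance function counts deletions, insertions and replacements between two literal terms only, which is a simplification of fuzzy matching against a regular expression.

```python
def split_regxp2(term, default_errors=1):
    """Split a Regxp-2 term into (error tolerance, remaining pattern).

    A leading "+" opens the specifier section and the next "+" closes it.
    Only a one-digit error tolerance is recognised in this sketch.
    """
    if not term.startswith("+"):
        return default_errors, term
    end = term.find("+", 1)
    spec = term[1:end] if end != -1 else ""
    if len(spec) != 1 or spec not in "0123456789":
        raise ValueError(f"malformed specifier section in {term!r}")
    return int(spec), term[end + 1:]


def edit_distance(a, b):
    """Count the deletions, insertions and replacements turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1]


errors, pattern = split_regxp2("+2+informat.*")
print(errors, pattern)
print(edit_distance("mozart", "mozrt"))
```

Under this reading, a term without a leading plus keeps the default tolerance, and a literal term matches when its edit distance to the indexed term does not exceed that tolerance.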
1734 Since the plus operator is normally a suffix operator, the addition to
1735 the query syntax doesn't violate the syntax for standard regular
1740 For example, a phrase search with regular expressions in
1741 the title-register is performed like this:
1743 Z> find @attr 1=4 @attr 5=102 "informat.* retrieval"
1748 Combinations with other attributes are possible. For example, a
1749 ranked search with a regular expression:
1751 Z> find @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
1759 The RecordType parameter in the <literal>zebra.cfg</literal> file, or
1760 the <literal>-t</literal> option to the indexer tells Zebra how to
1761 process input records.
1762 Two basic types of processing are available - raw text and structured
1763 data. Raw text is just that, and it is selected by providing the
1764 argument <literal>text</literal> to Zebra. Structured records are
1765 all handled internally using the basic mechanisms described in the
1766 subsequent sections.
1767 Zebra can read structured records in many different formats.
1773 <sect1 id="querymodel-cql-to-pqf">
1774 <title>Server Side CQL to PQF Query Translation</title>
1777 Using the <literal>&lt;cql2rpn&gt;l2rpn.txt&lt;/cql2rpn&gt;</literal>
1778 YAZ Frontend Virtual
1779 Hosts option, one can configure
1780 the YAZ Frontend CQL-to-PQF
1781 converter, specifying the interpretation of various
1782 <ulink url="&url.cql;">CQL</ulink>
1783 indexes, relations, etc. in terms of Type-1 query attributes.
1784 <!-- The yaz-client config file -->
1787 For example, using server-side CQL-to-PQF conversion, one might
1788 query a Zebra server like this:
1791 yaz-client localhost:9999
1793 Z> find text=(plant and soil)
1796 and, if properly configured, even static relevance ranking can
1797 be performed using CQL query syntax:
1800 Z> find text = /relevant (plant and soil)
1806 By the way, the same configuration can be used to
1807 search using client-side CQL-to-PQF conversion:
1808 (the only difference is <literal>querytype cql2rpn</literal>
1810 <literal>querytype cql</literal>, and the call specifying a local
1814 yaz-client -q local/cql2pqf.txt localhost:9999
1815 Z> querytype cql2rpn
1816 Z> find text=(plant and soil)
1822 Exhaustive information can be found in the
1823 section "Specification of CQL to RPN mappings" in the YAZ manual,
1824 <ulink url="http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map">
1825 http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map</ulink>,
1826 and is therefore not repeated here.
1831 <ulink url="http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html">
1832 http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html</ulink>
1833 for the Maintenance Agency's work-in-progress mapping of Dublin Core
1834 indexes to Attribute Architecture (util, XD and BIB-2)
1844 <!-- Keep this comment at the end of the file
1849 sgml-minimize-attributes:nil
1850 sgml-always-quote-attributes:t
1853 sgml-parent-document: "zebra.xml"
1854 sgml-local-catalogs: nil
1855 sgml-namecase-general:t