2 <!-- $Id: server.xml,v 1.15 2006-02-16 14:45:51 marc Exp $ -->
3 <title>The Z39.50 Server</title>
6 <title>Running the Z39.50 Server (zebrasrv)</title>
FIXME - We need to be consistent here, zebraidx had the options at the
end, and lots of explaining text before them. Same for zebrasrv! -H
FIXME - At least we need a small intro, what is zebrasrv, and how it
can be run (inetd, nt service, stand-alone program, daemon...) -H
15 <!-- re-write by MC, using the newly created input files for the
19 <sect2><title>DESCRIPTION</title>
20 <para>Zebra is a high-performance, general-purpose structured text indexing
21 and retrieval engine. It reads structured records in a variety of input
formats (e.g. email, XML, MARC) and allows access to them through exact
23 boolean search expressions and relevance-ranked free-text queries.
26 <command>zebrasrv</command> is the Z39.50 and <ulink url="http://www.loc.gov/standards/sru/srw/">SRW</ulink>/U frontend
27 server for the <command>Zebra</command> indexer.
30 On Unix you can run the <command>zebrasrv</command>
31 server from the command line - and put it
32 in the background. It may also operate under the inet daemon.
33 On WIN32 you can run the server as a console application or
39 <title>SYNOPSIS</title>
44 <title>OPTIONS</title>
47 The options for <command>zebrasrv</command> are the same
48 as those for YAZ' <command>yaz-ztest</command>.
49 Option <literal>-c</literal> specifies a Zebra configuration
50 file - if omitted <filename>zebra.cfg</filename> is read.
55 <sect2 id="gfs-config"><title>VIRTUAL HOSTS</title>
<command>zebrasrv</command> uses the YAZ server frontend and
supports multiple virtual servers behind multiple listening sockets.
62 <sect2><title>FILES</title>
64 <filename>zebra.cfg</filename>
67 <sect2><title>SEE ALSO</title>
70 <refentrytitle>zebraidx</refentrytitle>
71 <manvolnum>1</manvolnum>
74 <refentrytitle>yaz-ztest</refentrytitle>
75 <manvolnum>8</manvolnum>
79 Section "The Z39.50 Server" in the Zebra manual.
80 <filename>http://www.indexdata.dk/zebra/doc/server.tkl</filename>
83 Section "Virtual Hosts" in the YAZ manual.
84 <filename>http://www.indexdata.dk/yaz/doc/server.vhosts.tkl</filename>
87 Section "Specification of <ulink url="http://www.loc.gov/standards/sru/cql/">CQL</ulink> to RPN mappings" in the YAZ manual.
88 <filename>http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map</filename>
91 The Zebra software is Copyright <command>Index Data</command>
92 <filename>http://www.indexdata.dk</filename>
93 and distributed under the
100 <emphasis remap="bf">Syntax</emphasis>
103 zebrasrv [options] [listener-address ...]
109 <emphasis remap="bf">Options</emphasis>
113 <term>-a <replaceable>APDU file</replaceable></term>
116 Specify a file for dumping PDUs (for diagnostic purposes).
117 The special name "-" sends output to <literal>stderr</literal>.
122 <term>-c <replaceable>config-file</replaceable></term>
125 Read configuration information from
126 <replaceable>config-file</replaceable>.
127 The default configuration is <literal>./zebra.cfg</literal>.
135 Don't fork on connection requests. This can be useful for
136 symbolic-level debugging. The server can only accept a single
137 connection in this mode.
145 Use the Z39.50 protocol. Currently the only protocol supported.
146 The option is retained for historical reasons, and for future
152 <term>-l <replaceable>logfile</replaceable></term>
155 Specify an output file for the diagnostic messages.
156 The default is to write this information to <literal>stderr</literal>.
161 <term>-v <replaceable>log-level</replaceable></term>
164 The log level. Use a comma-separated list of members of the set
165 {fatal,debug,warn,log,all,none}.
170 <term>-u <replaceable>username</replaceable></term>
173 Set user ID. Sets the real UID of the server process to that of the
174 given <replaceable>username</replaceable>.
175 It's useful if you aren't comfortable with having the
176 server run as root, but you need to start it as such to bind a
182 <term>-w <replaceable>working-directory</replaceable></term>
185 Change working directory.
193 Run under the Internet superserver, <literal>inetd</literal>.
194 Make sure you use the logfile option <literal>-l</literal> in
195 conjunction with this mode and specify the <literal>-l</literal>
196 option before any other options.
201 <term>-t <replaceable>timeout</replaceable></term>
204 Set the idle session timeout (default 60 minutes).
209 <term>-k <replaceable>kilobytes</replaceable></term>
212 Set the (approximate) maximum size of
213 present response messages. Default is 1024 KB (1 MB).
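<para>
As a hedged illustration of how the options above combine, the
following invocations assume a configuration file named
<filename>zebra.cfg</filename> in the current directory. The log file
name, the user name and the port numbers are placeholders chosen for
this sketch, and the listener addresses follow the YAZ
<literal>tcp:host:port</literal> convention, in which
<literal>@</literal> stands for any local address.
</para>
<screen>
zebrasrv -c zebra.cfg -l zebrasrv.log -v log,warn tcp:@:9999
zebrasrv -c zebra.cfg -l zebrasrv.log -u nobody tcp:@:210
</screen>
<para>
The second form is meant to be started as root, so that the
privileged port 210 can be bound before the server switches to the
given user.
</para>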
223 <sect1 id="protocol-support">
224 <title>Z39.50 Protocol Support and Behavior</title>
227 <title>Initialization</title>
230 During initialization, the server will negotiate to version 3 of the
231 Z39.50 protocol, and the option bits for Search, Present, Scan,
232 NamedResultSets, and concurrentOperations will be set, if requested by
233 the client. The maximum PDU size is negotiated down to a maximum of
240 <title>Search</title>
243 FIXME - Need to explain the string tag stuff before people get bogged
244 down with all these attribute numbers. Perhaps in its own
The supported query types are 1 and 101. All operators are currently
250 supported with the restriction that only proximity units of type "word"
251 are supported for the proximity operator.
252 Queries can be arbitrarily complex.
253 Named result sets are supported, and result sets can be used as operands
255 Searches may span multiple databases.
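<para>
The use of a result set as an operand and the proximity operator
described above might look as follows in PQF, the query notation
used for the examples in this chapter. This is a sketch only: the
result set name <literal>seta</literal> is hypothetical and is
assumed to have been created by an earlier search, and the
<literal>@prox</literal> parameters follow the YAZ PQF grammar
(exclusion, distance, ordered, relation, which-code, unit-code),
where the known unit code 2 denotes "word".
</para>
<screen>
@and @set seta @attr 1=4 retrieval
@prox 0 3 1 2 k 2 information retrieval
</screen>
<para>
The second query asks for the two words in the given order and at
most three words apart.
</para>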
259 The server has full support for piggy-backed retrieval (see
260 also the following section).
264 <emphasis>Use</emphasis> attributes are interpreted according to the
265 attribute sets which have been loaded in the
266 <literal>zebra.cfg</literal> file, and are matched against specific
267 fields as specified in the <literal>.abs</literal> file which
268 describes the profile of the records which have been loaded.
269 If no Use attribute is provided, a default of Bib-1 Any is assumed.
273 If a <emphasis>Structure</emphasis> attribute of
274 <emphasis>Phrase</emphasis> is used in conjunction with a
275 <emphasis>Completeness</emphasis> attribute of
276 <emphasis>Complete (Sub)field</emphasis>, the term is matched
277 against the contents of the phrase (long word) register, if one
278 exists for the given <emphasis>Use</emphasis> attribute.
279 A phrase register is created for those fields in the
<literal>.abs</literal> file that contain a
281 <literal>p</literal>-specifier.
282 <!-- ### whatever the hell _that_ is -->
If <emphasis>Structure</emphasis>=<emphasis>Phrase</emphasis> is
used in conjunction with <emphasis>Incomplete Field</emphasis> (the
default value for <emphasis>Completeness</emphasis>), the
289 search is directed against the normal word registers, but if the term
290 contains multiple words, the term will only match if all of the words
291 are found immediately adjacent, and in the given order.
292 The word search is performed on those fields that are indexed as
293 type <literal>w</literal> in the <literal>.abs</literal> file.
297 If the <emphasis>Structure</emphasis> attribute is
298 <emphasis>Word List</emphasis>,
299 <emphasis>Free-form Text</emphasis>, or
300 <emphasis>Document Text</emphasis>, the term is treated as a
301 natural-language, relevance-ranked query.
302 This search type uses the word register, i.e. those fields
303 that are indexed as type <literal>w</literal> in the
304 <literal>.abs</literal> file.
308 If the <emphasis>Structure</emphasis> attribute is
309 <emphasis>Numeric String</emphasis> the term is treated as an integer.
310 The search is performed on those fields that are indexed
311 as type <literal>n</literal> in the <literal>.abs</literal> file.
315 If the <emphasis>Structure</emphasis> attribute is
316 <emphasis>URx</emphasis> the term is treated as a URX (URL) entity.
317 The search is performed on those fields that are indexed as type
318 <literal>u</literal> in the <literal>.abs</literal> file.
322 If the <emphasis>Structure</emphasis> attribute is
<emphasis>Local Number</emphasis> the term is treated as a
native Zebra Record Identifier.
328 If the <emphasis>Relation</emphasis> attribute is
329 <emphasis>Equals</emphasis> (default), the term is matched
330 in a normal fashion (modulo truncation and processing of
331 individual words, if required).
332 If <emphasis>Relation</emphasis> is <emphasis>Less Than</emphasis>,
333 <emphasis>Less Than or Equal</emphasis>,
<emphasis>Greater Than</emphasis>, or <emphasis>Greater Than or
Equal</emphasis>, the term is assumed to be numerical, and a
336 standard regular expression is constructed to match the given
338 If <emphasis>Relation</emphasis> is <emphasis>Relevance</emphasis>,
339 the standard natural-language query processor is invoked.
343 For the <emphasis>Truncation</emphasis> attribute,
344 <emphasis>No Truncation</emphasis> is the default.
345 <emphasis>Left Truncation</emphasis> is not supported.
346 <emphasis>Process # in search term</emphasis> is supported, as is
347 <emphasis>Regxp-1</emphasis>.
348 <emphasis>Regxp-2</emphasis> enables the fault-tolerant (fuzzy)
349 search. As a default, a single error (deletion, insertion,
350 replacement) is accepted when terms are matched against the register
355 <title>Regular expressions</title>
358 Each term in a query is interpreted as a regular expression if
359 the truncation value is either <emphasis>Regxp-1</emphasis> (102)
360 or <emphasis>Regxp-2</emphasis> (103).
361 Both query types follow the same syntax with the operands:
368 Matches the character <emphasis>x</emphasis>.
376 Matches any character.
381 <term><literal>[</literal>..<literal>]</literal></term>
384 Matches the set of characters specified;
385 such as <literal>[abc]</literal> or <literal>[a-c]</literal>.
397 Matches <emphasis>x</emphasis> zero or more times. Priority: high.
405 Matches <emphasis>x</emphasis> one or more times. Priority: high.
Matches <emphasis>x</emphasis> zero times or once. Priority: high.
421 Matches <emphasis>x</emphasis>, then <emphasis>y</emphasis>.
430 Matches either <emphasis>x</emphasis> or <emphasis>y</emphasis>.
436 The order of evaluation may be changed by using parentheses.
440 If the first character of the <emphasis>Regxp-2</emphasis> query
441 is a plus character (<literal>+</literal>) it marks the
442 beginning of a section with non-standard specifiers.
443 The next plus character marks the end of the section.
444 Currently Zebra only supports one specifier, the error tolerance,
which consists of one digit.
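<para>
A minimal sketch of this syntax, assuming that a title register
(Bib-1 Use attribute 4) is available: the leading
<literal>+2+</literal> section sets the error tolerance to two, so
the misspelt term below, which lacks two letters, should still match
<emphasis>information</emphasis>.
</para>
<screen>
@attr 1=4 @attr 5=103 "+2+infrmaton"
</screen>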
449 Since the plus operator is normally a suffix operator the addition to
450 the query syntax doesn't violate the syntax for standard regular
457 <title>Query examples</title>
460 Phrase search for <emphasis>information retrieval</emphasis> in
463 @attr 1=4 "information retrieval"
468 Ranked search for the same thing:
470 @attr 1=4 @attr 2=102 "Information retrieval"
475 Phrase search with a regular expression:
477 @attr 1=4 @attr 5=102 "informat.* retrieval"
482 Ranked search with a regular expression:
484 @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
489 In the GILS schema (<literal>gils.abs</literal>), the
490 west-bounding-coordinate is indexed as type <literal>n</literal>,
491 and is therefore searched by specifying
492 <emphasis>structure</emphasis>=<emphasis>Numeric String</emphasis>.
493 To match all those records with west-bounding-coordinate greater
494 than -114 we use the following query:
496 @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
503 <title>Present</title>
505 The present facility is supported in a standard fashion. The requested
506 record syntax is matched against the ones supported by the profile of
507 each record retrieved. If no record syntax is given, SUTRS is the
508 default. The requested element set name, again, is matched against any
509 provided by the relevant record profiles.
515 The attribute combinations provided with the termListAndStartPoint are
516 processed in the same way as operands in a query (see above).
517 Currently, only the term and the globalOccurrences are returned with
518 the termInfo structure.
525 Z39.50 specifies three different types of sort criteria.
Of these, Zebra supports the attribute specification type, in which
case the Use attribute specifies the "Sort register".
528 Sort registers are created for those fields that are of type "sort" in
529 the default.idx file.
530 The corresponding character mapping file in default.idx specifies the
531 ordinal of each character used in the actual sort.
535 Z39.50 allows the client to specify sorting on one or more input
536 result sets and one output result set.
Zebra supports sorting on one result set only, which may or may not
538 be the same as the output result set.
544 If a Close PDU is received, the server will respond with a Close PDU
545 with reason=FINISHED, no matter which protocol version was negotiated
546 during initialization. If the protocol version is 3 or more, the
547 server will generate a Close PDU under certain circumstances,
548 including a session timeout (60 minutes by default), and certain kinds of
549 protocol errors. Once a Close PDU has been sent, the protocol
550 association is considered broken, and the transport connection will be
551 closed immediately upon receipt of further data, or following a short
559 <chapter id="server-sru">
560 <title>The SRU/SRW Server</title>
562 In addition to Z39.50, Zebra supports the more recent and
563 web-friendly IR protocol SRU, described at
564 <ulink url="http://www.loc.gov/sru"/>.
565 SRU is ``Search/Retrieve via URL'', a simple, REST-like protocol
566 that uses HTTP GET to request search responses. The request
567 itself is made of parameters such as
568 <literal>query</literal>,
569 <literal>startRecord</literal>,
570 <literal>maximumRecords</literal>
572 <literal>recordSchema</literal>;
573 the response is an XML document containing hit-count, result-set
574 records, diagnostics, etc. SRU can be thought of as a re-casting
575 of Z39.50 semantics in web-friendly terms; or as a standardisation
576 of the ad-hoc query parameters used by search engines such as Google
577 and AltaVista; or as a superset of A9's OpenSearch (which it
581 Zebra further supports SRW, described at
582 <ulink url="http://www.loc.gov/srw"/>.
583 SRW is the ``Search/Retrieve Web Service'', a SOAP-based alternative
584 implementation of the abstract protocol that SRU implements as HTTP
585 GET requests. In SRW, requests are encoded as XML documents which
586 are posted to the server. The responses are identical to those
returned by SRU servers, except that they are wrapped in several
layers of SOAP envelope.
591 Zebra supports all three protocols - Z39.50, SRU and SRW - on the
same port, recognising what protocol is used by each incoming
request and handling it accordingly. This is achieved through
594 the use of Deep Magic; civilians are warned not to stand too close.
597 From here on, ``SRU'' is used to indicate both the SRU and SRW
598 protocols, as they are identical except for the transport used for
599 the protocol packets and Zebra's support for them is equivalent.
602 <sect1 id="server-sru-run">
603 <title>Running the SRU Server (zebrasrv)</title>
605 Because Zebra supports all three protocols on one port, it would
606 seem to follow that the SRU server is run in the same way as
607 the Z39.50 server, as described above. This is true, but only in
608 an uninterestingly vacuous way: a Zebra server run in this manner
609 will indeed recognise and accept SRU requests; but since it
610 doesn't know how to handle the CQL queries that these protocols
611 use, all it can do is send failure responses.
615 It is possible to cheat, by having SRU search Zebra with
616 a PQF query instead of CQL, using the
617 <literal>x-pquery</literal>
619 <literal>query</literal>.
621 <emphasis role="strong">non-standard extension</emphasis>
623 <emphasis role="strong">very naughty</emphasis>
624 thing to do, but it does give you a way to see Zebra serving SRU
625 ``right out of the box''. If you start your favourite Zebra
626 server in the usual way, on port 9999, then you can send your web
630 http://localhost:9999/Default?version=1.1
631 &operation=searchRetrieve
632 &x-pquery=mineral
634 &maximumRecords=1
637 This will display the XML-formatted SRU response that includes the
638 first record in the result-set found by the query
639 <literal>mineral</literal>. (For clarity, the SRU URL is shown
here broken across lines, but the lines should be joined together
to make a single-line URL for the browser to submit.)
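<para>
Joined as described, and restricted to the parameters shown above,
the request would read:
</para>
<screen><![CDATA[
http://localhost:9999/Default?version=1.1&operation=searchRetrieve&x-pquery=mineral&maximumRecords=1
]]></screen>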
645 In order to turn on Zebra's support for CQL queries, it's necessary
646 to have the YAZ generic front-end (which Zebra uses) translate them
647 into the Z39.50 Type-1 query format that is used internally. And
648 to do this, the generic front-end's own configuration file must be
649 used. This file is described
650 <link linkend="gfs-config">elsewhere</link>;
651 the salient point for SRU support is that
652 <command>zebrasrv</command>
653 must be started with the
654 <literal>-f frontendConfigFile</literal>
655 option rather than the
656 <literal>-c zebraConfigFile</literal>
658 and that the front-end configuration file must include both a
659 reference to the Zebra configuration file and the CQL-to-PQF
660 translator configuration file.
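<para>
For instance, assuming the front-end configuration has been saved
under the hypothetical name <filename>yazgfs.xml</filename> in the
current directory, the server might be started as:
</para>
<screen>
zebrasrv -f yazgfs.xml
</screen>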
663 A minimal front-end configuration file that does this would read as
669 <config>zebra.cfg</config>
670 <cql2rpn>../../tab/pqf.properties</cql2rpn>
<literal>&lt;config&gt;</literal>
677 element contains the name of the Zebra configuration file that was
678 previously specified by the
679 <literal>-c</literal>
680 command-line argument, and the
<literal>&lt;cql2rpn&gt;</literal>
682 element contains the name of the CQL properties file specifying how
683 various CQL indexes, relations, etc. are translated into Type-1
A Zebra server running with such a configuration can then be
688 queried using proper, conformant SRU URLs with CQL queries:
691 http://localhost:9999/Default?version=1.1
692 &operation=searchRetrieve
693 &query=title=utah and description=epicent*
695 &maximumRecords=1
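<para>
Browsers typically tolerate the spaces in such a query, but other
HTTP clients may need the <literal>query</literal> value to be
percent-encoded. As a sketch, here is the same kind of request on a
single line, limited to the parameters shown above, with spaces
written as <literal>%20</literal> and the equals signs inside the CQL
query written as <literal>%3D</literal>:
</para>
<screen><![CDATA[
http://localhost:9999/Default?version=1.1&operation=searchRetrieve&query=title%3Dutah%20and%20description%3Depicent*&maximumRecords=1
]]></screen>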
699 <sect1 id="server-sru-support">
700 <title>SRU and SRW Protocol Support and Behavior</title>
702 Zebra running as an SRU server supports SRU version 1.1, including
703 CQL version 1.1. In particular, it provides support for the
704 following elements of the protocol.
708 <title>Search and Retrieval</title>
710 Zebra fully supports SRU's core
711 <literal>searchRetrieve</literal>
712 operation, as described at
713 <ulink url="http://www.loc.gov/standards/sru/sru-spec.html"/>
716 One of the great strengths of SRU is that it mandates a standard
717 query language, CQL, and that all conforming implementations can
718 therefore be trusted to correctly interpret the same queries. It
719 is with some shame, then, that we admit that Zebra also supports
720 an additional query language, our own Prefix Query Format (PQF,
721 <ulink url="http://indexdata.com/yaz/doc/tools.tkl#PQF"/>).
722 A PQF query is submitted by using the extension parameter
723 <literal>x-pquery</literal>,
725 <literal>query</literal>
726 parameter must be omitted, which makes the request not valid SRU.
727 Please don't do this.
734 Zebra does <emphasis>not</emphasis> support SRU's
735 <literal>scan</literal>
736 operation, as described at
737 <ulink url="http://www.loc.gov/standards/sru/scan/"/>
740 This is a rather embarrassing surprise as the pieces are all
741 there: Z39.50 scan is supported, and SRU scan requests are
742 recognised and diagnosed. To add further to the embarrassment, a
743 mutant form of SRU scan <emphasis>is</emphasis> supported, using
744 the non-standard <literal>x-pScanClause</literal> parameter in
745 place of the standard <literal>scanClause</literal> to scan on a
751 <title>Explain</title>
753 Zebra fully supports SRU's core
754 <literal>explain</literal>
755 operation, as described at
756 <ulink url="http://www.loc.gov/standards/sru/explain/index.html"/>
759 The ZeeRex record explaining a database may be requested either
760 with a fully fledged SRU request (with
761 <literal>operation</literal>=<literal>explain</literal>
762 and version-number specified)
763 or with a simple HTTP GET at the server's basename.
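<para>
Assuming the host, port and database name used in the earlier
examples (<literal>localhost</literal>, 9999 and
<literal>Default</literal>), the two forms of explain request would
look like this:
</para>
<screen><![CDATA[
http://localhost:9999/Default?version=1.1&operation=explain
http://localhost:9999/Default
]]></screen>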
768 <title>Initialization, Present, Sort, Close</title>
770 In the Z39.50 protocol, Initialization, Present, Sort and Close
771 are separate operations. In SRU, however, these operations do not
777 SRU has no explicit initialization handshake phase, but
778 commences immediately with searching, scanning and explain
784 Neither does SRU have a close operation, since the protocol is
785 stateless and each request is self-contained. (It is true that
786 multiple SRU request/response pairs may be implemented as
787 multiple HTTP request/response pairs over a single persistent
788 TCP/IP connection; but the closure of that connection is not a
789 protocol-level operation.)
794 Retrieval in SRU is part of the
795 <literal>searchRetrieve</literal> operation, in which a search
796 is submitted and the response includes a subset of the records
797 in the result set. There is no direct analogue of Z39.50's
798 Present operation which requests records from an established
799 result set. In SRU, this is achieved by sending a subsequent
800 <literal>searchRetrieve</literal> request with the query
801 <literal>cql.resultSetId=</literal><emphasis>id</emphasis> where
802 <emphasis>id</emphasis> is the identifier of the previously
803 generated result-set.
Sorting in SRU is done within the
<literal>searchRetrieve</literal> operation - in v1.1, by an
explicit <literal>sortKeys</literal> parameter, but the forthcoming
v1.2 or v2.0 will most likely use an extension of the query
language, CQL, for sorting: see
813 <ulink url="http://zing.z3950.org/cql/sorting.html"/>
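<para>
As a sketch of the result-set retrieval described in the list above:
assuming an earlier response reported a result-set identifier, shown
here as the placeholder <literal>id</literal>, and the host, port and
database name of the earlier examples, the second record of that set
could be requested with:
</para>
<screen><![CDATA[
http://localhost:9999/Default?version=1.1&operation=searchRetrieve&query=cql.resultSetId=id&startRecord=2&maximumRecords=1
]]></screen>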
818 It can be seen, then, that while Zebra operating as an SRU server
819 does not provide the same set of operations as when operating as a
820 Z39.50 server, it does provide equivalent functionality.
826 <!-- Keep this comment at the end of the file
831 sgml-minimize-attributes:nil
832 sgml-always-quote-attributes:t
835 sgml-parent-document: "zebra.xml"
836 sgml-local-catalogs: nil
837 sgml-namecase-general:t