2 <!-- $Id: server.xml,v 1.23 2006-06-13 09:27:01 marc Exp $ -->
3 <title>The Z39.50 Server</title>
6 <title>Running the Z39.50 Server (zebrasrv)</title>
9 FIXME - We need to be consistent here, zebraidx had the options at the
10 end, and lots of explaining text before them. Same for zebrasvr! -H
11 FIXME - At least we need a small intro, what is zebrasvr, and how it
12 can be run (inetd, nt service, stand-alone program, daemon...) -H
15 <!-- re-write by MC, using the newly created input files for the
19 <sect2><title>Description</title>
20 <para>Zebra is a high-performance, general-purpose structured text indexing
21 and retrieval engine. It reads structured records in a variety of input
22 formats (e.g. email, XML, MARC) and allows access to them through exact
23 boolean search expressions and relevance-ranked free-text queries.
26 <command>zebrasrv</command> is the Z39.50 and <ulink url="http://www.loc.gov/standards/sru/srw/">SRW</ulink>/U frontend
27 server for the <command>Zebra</command> indexer.
30 On Unix you can run the <command>zebrasrv</command>
31 server from the command line - and put it
32 in the background. It may also operate under the inet daemon.
33 On WIN32 you can run the server as a console application or as a WIN32 service.
39 <title>Synopsis</title>
44 <title>Options</title>
47 The options for <command>zebrasrv</command> are the same
48 as those for YAZ' <command>yaz-ztest</command>.
49 Option <literal>-c</literal> specifies a Zebra configuration
50 file - if omitted <filename>zebra.cfg</filename> is read.
56 <sect2><title>Files</title>
58 <filename>zebra.cfg</filename>
61 <sect2><title>See Also</title>
64 <refentrytitle>zebraidx</refentrytitle>
65 <manvolnum>1</manvolnum>
68 <refentrytitle>yaz-ztest</refentrytitle>
69 <manvolnum>8</manvolnum>
73 The Zebra software is Copyright <command>Index Data</command>
74 <filename>http://www.indexdata.dk</filename>
75 and distributed under the
82 <emphasis remap="bf">Syntax</emphasis>
85 zebrasrv [options] [listener-address ...]
91 <emphasis remap="bf">Options</emphasis>
95 <term>-a <replaceable>APDU file</replaceable></term>
98 Specify a file for dumping PDUs (for diagnostic purposes).
99 The special name "-" sends output to <literal>stderr</literal>.
104 <term>-c <replaceable>config-file</replaceable></term>
107 Read configuration information from
108 <replaceable>config-file</replaceable>.
109 The default configuration is <literal>./zebra.cfg</literal>.
117 Don't fork on connection requests. This can be useful for
118 symbolic-level debugging. The server can only accept a single
119 connection in this mode.
127 Use the Z39.50 protocol. Currently the only protocol supported.
128 The option is retained for historical reasons, and for future extensions.
134 <term>-l <replaceable>logfile</replaceable></term>
137 Specify an output file for the diagnostic messages.
138 The default is to write this information to <literal>stderr</literal>.
143 <term>-v <replaceable>log-level</replaceable></term>
146 The log level. Use a comma-separated list of members of the set
147 {fatal,debug,warn,log,all,none}.
152 <term>-u <replaceable>username</replaceable></term>
155 Set user ID. Sets the real UID of the server process to that of the
156 given <replaceable>username</replaceable>.
157 It's useful if you aren't comfortable with having the
158 server run as root, but you need to start it as such to bind a privileged port.
164 <term>-w <replaceable>working-directory</replaceable></term>
167 Change working directory.
175 Run under the Internet superserver, <literal>inetd</literal>.
176 Make sure you use the logfile option <literal>-l</literal> in
177 conjunction with this mode and specify the <literal>-l</literal>
178 option before any other options.
183 <term>-t <replaceable>timeout</replaceable></term>
186 Set the idle session timeout (default 60 minutes).
191 <term>-k <replaceable>kilobytes</replaceable></term>
194 Set the (approximate) maximum size of
195 present response messages. Default is 1024 KB (1 MB).
205 <sect1 id="protocol-support">
206 <title>Z39.50 Protocol Support and Behavior</title>
209 <title>Initialization</title>
212 During initialization, the server will negotiate to version 3 of the
213 Z39.50 protocol, and the option bits for Search, Present, Scan,
214 NamedResultSets, and concurrentOperations will be set, if requested by
215 the client. The maximum PDU size is negotiated down to a maximum of
222 <title>Search</title>
225 FIXME - Need to explain the string tag stuff before people get bogged
226 down with all these attribute numbers. Perhaps in its own
231 The supported query types are 1 and 101. All operators are currently
232 supported with the restriction that only proximity units of type "word"
233 are supported for the proximity operator.
234 Queries can be arbitrarily complex.
235 Named result sets are supported, and result sets can be used as operands
237 Searches may span multiple databases.
241 The server has full support for piggy-backed retrieval (see
242 also the following section).
248 <title>Present</title>
250 The present facility is supported in a standard fashion. The requested
251 record syntax is matched against the ones supported by the profile of
252 each record retrieved. If no record syntax is given, SUTRS is the
253 default. The requested element set name, again, is matched against any
254 provided by the relevant record profiles.
260 The attribute combinations provided with the termListAndStartPoint are
261 processed in the same way as operands in a query (see above).
262 Currently, only the term and the globalOccurrences are returned with
263 the termInfo structure.
270 Z39.50 specifies three different types of sort criteria.
271 Of these Zebra supports the attribute specification type in which
272 case the use attribute specifies the "Sort register".
273 Sort registers are created for those fields that are of type "sort" in
274 the default.idx file.
275 The corresponding character mapping file in default.idx specifies the
276 ordinal of each character used in the actual sort.
280 Z39.50 allows the client to specify sorting on one or more input
281 result sets and one output result set.
282 Zebra supports sorting on one result set only, which may or may not
283 be the same as the output result set.
289 If a Close PDU is received, the server will respond with a Close PDU
290 with reason=FINISHED, no matter which protocol version was negotiated
291 during initialization. If the protocol version is 3 or more, the
292 server will generate a Close PDU under certain circumstances,
293 including a session timeout (60 minutes by default), and certain kinds of
294 protocol errors. Once a Close PDU has been sent, the protocol
295 association is considered broken, and the transport connection will be
296 closed immediately upon receipt of further data, or following a short timeout.
302 <title>Explain</title>
304 Zebra maintains a "classic"
305 <ulink url="&url.z39.50.explain;">Explain</ulink> database
307 This database is called <literal>IR-Explain-1</literal> and can be
308 searched using the attribute set <literal>exp-1</literal>.
311 The records in the explain database are of type
312 <literal>grs.sgml</literal> and can be retrieved as
313 <literal>SUTRS</literal>, <literal>XML</literal>,
314 <literal>GRS-1</literal> and <literal>ASN.1</literal> Explain.
317 Classic Explain only defines retrieval of Explain information
318 via ASN.1. Practically no Z39.50 client supports this. Fortunately
319 they don't have to - since Zebra allows retrieval of this information
320 in the other formats.
323 The root element for the Explain grs.sgml records is
324 <literal>explain</literal>, thus
325 <filename>explain.abs</filename> is used for indexing.
329 Zebra <emphasis>must</emphasis> be able to locate
330 <filename>explain.abs</filename> in order to index the Explain
331 records properly. Zebra will work without it but the information
332 will not be searchable.
340 <chapter id="server-sru">
341 <title>The SRU/SRW Server</title>
343 In addition to Z39.50, Zebra supports the more recent and
344 web-friendly IR protocol SRU, described at
345 <ulink url="http://www.loc.gov/sru"/>.
346 SRU is ``Search/Retrieve via URL'', a simple, REST-like protocol
347 that uses HTTP GET to request search responses. The request
348 itself is made of parameters such as
349 <literal>query</literal>,
350 <literal>startRecord</literal>,
351 <literal>maximumRecords</literal> and
353 <literal>recordSchema</literal>;
354 the response is an XML document containing hit-count, result-set
355 records, diagnostics, etc. SRU can be thought of as a re-casting
356 of Z39.50 semantics in web-friendly terms; or as a standardisation
357 of the ad-hoc query parameters used by search engines such as Google
358 and AltaVista; or as a superset of A9's OpenSearch (which it
362 Zebra further supports SRW, described at
363 <ulink url="http://www.loc.gov/srw"/>.
364 SRW is the ``Search/Retrieve Web Service'', a SOAP-based alternative
365 implementation of the abstract protocol that SRU implements as HTTP
366 GET requests. In SRW, requests are encoded as XML documents which
367 are posted to the server. The responses are identical to those
368 returned by SRU servers, except that they are wrapped in several
369 layers of SOAP envelope.
372 Zebra supports all three protocols - Z39.50, SRU and SRW - on the
373 same port, recognising which protocol is used by each incoming
374 request and handling it accordingly. This is achieved through
375 the use of Deep Magic; civilians are warned not to stand too close.
378 From here on, ``SRU'' is used to indicate both the SRU and SRW
379 protocols, as they are identical except for the transport used for
380 the protocol packets and Zebra's support for them is equivalent.
383 <sect1 id="server-sru-run">
384 <title>Running the SRU Server (zebrasrv)</title>
386 Because Zebra supports all three protocols on one port, it would
387 seem to follow that the SRU server is run in the same way as
388 the Z39.50 server, as described above. This is true, but only in
389 an uninterestingly vacuous way: a Zebra server run in this manner
390 will indeed recognise and accept SRU requests; but since it
391 doesn't know how to handle the CQL queries that these protocols
392 use, all it can do is send failure responses.
396 It is possible to cheat, by having SRU search Zebra with
397 a PQF query instead of CQL, using the
398 <literal>x-pquery</literal> parameter instead of
400 <literal>query</literal>. Using this parameter is a
402 <emphasis role="strong">non-standard extension</emphasis> and a
404 <emphasis role="strong">very naughty</emphasis>
405 thing to do, but it does give you a way to see Zebra serving SRU
406 ``right out of the box''. If you start your favourite Zebra
407 server in the usual way, on port 9999, then you can send your web
411 http://localhost:9999/Default?version=1.1
412 &operation=searchRetrieve
413 &x-pquery=mineral
415 &maximumRecords=1
418 This will display the XML-formatted SRU response that includes the
419 first record in the result-set found by the query
420 <literal>mineral</literal>. (For clarity, the SRU URL is shown
421 here broken across lines, but the lines should be joined together
422 to make a single-line URL for the browser to submit.)
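The joining step can be sketched in a few lines of Python, using nothing beyond the language itself. This is an illustration only, not part of Zebra: the names fragments and join_url are hypothetical, and the list holds just the URL fragments shown above.

```python
# Illustrative sketch: rebuild the single-line SRU URL from the
# fragments shown above. All names here are hypothetical.
fragments = [
    "http://localhost:9999/Default?version=1.1",
    "  &operation=searchRetrieve",
    "  &x-pquery=mineral",
    "  &maximumRecords=1",
]


def join_url(lines):
    """Strip the layout whitespace from each fragment and concatenate."""
    return "".join(line.strip() for line in lines)


url = join_url(fragments)
print(url)
```

The result contains no whitespace, so it can be pasted into a browser's address bar as it stands.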
426 In order to turn on Zebra's support for CQL queries, it's necessary
427 to have the YAZ generic front-end (which Zebra uses) translate them
428 into the Z39.50 Type-1 query format that is used internally. And
429 to do this, the generic front-end's own configuration file must be
430 used. This file is described
431 <link linkend="gfs-config">elsewhere</link>;
432 the salient point for SRU support is that
433 <command>zebrasrv</command>
434 must be started with the
435 <literal>-f frontendConfigFile</literal>
436 option rather than the
437 <literal>-c zebraConfigFile</literal> option,
439 and that the front-end configuration file must include both a
440 reference to the Zebra configuration file and the CQL-to-PQF
441 translator configuration file.
444 A minimal front-end configuration file that does this would read as follows:
450 <config>zebra.cfg</config>
451 <cql2rpn>../../tab/pqf.properties</cql2rpn>
457 <literal>&lt;config&gt;</literal>
458 element contains the name of the Zebra configuration file that was
459 previously specified by the
460 <literal>-c</literal>
461 command-line argument, and the
462 <literal>&lt;cql2rpn&gt;</literal>
463 element contains the name of the CQL properties file specifying how
464 various CQL indexes, relations, etc. are translated into Type-1 queries.
468 A Zebra server running with such a configuration can then be
469 queried using proper, conformant SRU URLs with CQL queries:
472 http://localhost:9999/Default?version=1.1
473 &operation=searchRetrieve
474 &query=title=utah and description=epicent*
476 &maximumRecords=1
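The CQL query in this URL contains spaces, which a browser will usually encode for you but a program must encode itself. The following is a minimal sketch in Python, assuming only the standard urllib.parse module; the helper name sru_search_url is hypothetical, and the base URL and query are the ones from the example above.

```python
from urllib.parse import quote, urlencode


def sru_search_url(base, cql, maximum_records=1):
    """Build a searchRetrieve URL with the CQL query percent-encoded."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": cql,
        "maximumRecords": str(maximum_records),
    }
    # quote_via=quote encodes spaces as %20 rather than "+".
    return base + "?" + urlencode(params, quote_via=quote)


url = sru_search_url("http://localhost:9999/Default",
                     "title=utah and description=epicent*")
print(url)
```

Encoding the whole parameter value also escapes the equals signs and the asterisk inside the query, so they cannot be mistaken for URL syntax.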
480 <sect1 id="server-sru-support">
481 <title>SRU and SRW Protocol Support and Behavior</title>
483 Zebra running as an SRU server supports SRU version 1.1, including
484 CQL version 1.1. In particular, it provides support for the
485 following elements of the protocol.
489 <title>Search and Retrieval</title>
491 Zebra fully supports SRU's core
492 <literal>searchRetrieve</literal>
493 operation, as described at
494 <ulink url="http://www.loc.gov/standards/sru/sru-spec.html"/>
497 One of the great strengths of SRU is that it mandates a standard
498 query language, CQL, and that all conforming implementations can
499 therefore be trusted to correctly interpret the same queries. It
500 is with some shame, then, that we admit that Zebra also supports
501 an additional query language, our own Prefix Query Format (PQF,
502 <ulink url="http://indexdata.com/yaz/doc/tools.tkl#PQF"/>).
503 A PQF query is submitted by using the extension parameter
504 <literal>x-pquery</literal>, in which case the
506 <literal>query</literal>
507 parameter must be omitted, which makes the request not valid SRU.
508 Please don't do this.
515 Zebra does <emphasis>not</emphasis> support SRU's
516 <literal>scan</literal>
517 operation, as described at
518 <ulink url="http://www.loc.gov/standards/sru/scan/"/>
521 This is a rather embarrassing surprise as the pieces are all
522 there: Z39.50 scan is supported, and SRU scan requests are
523 recognised and diagnosed. To add further to the embarrassment, a
524 mutant form of SRU scan <emphasis>is</emphasis> supported, using
525 the non-standard <literal>x-pScanClause</literal> parameter in
526 place of the standard <literal>scanClause</literal> to scan on a PQF query clause.
532 <title>Explain</title>
534 Zebra fully supports SRU's core
535 <literal>explain</literal>
536 operation, as described at
537 <ulink url="http://www.loc.gov/standards/sru/explain/index.html"/>
540 The ZeeRex record explaining a database may be requested either
541 with a fully fledged SRU request (with
542 <literal>operation</literal>=<literal>explain</literal>
543 and version-number specified)
544 or with a simple HTTP GET at the server's basename.
545 The ZeeRex record returned in response is the one embedded
546 in the YAZ Frontend Server configuration file that is described in the
547 <link linkend="gfs-config">Virtual Hosts</link> documentation.
550 Unfortunately, the data found in the
551 CQL-to-PQF text file must be added by hand to the explain
552 section of the YAZ Frontend Server configuration file to be able
553 to provide a suitable explain record.
554 Too bad, but this is all extremely
555 new alpha stuff, and a lot of work has yet to be done.
558 There is no linkage whatsoever between the Z39.50 explain model
559 and the SRU/SRW explain response (well, at least not implemented
560 in Zebra, that is). Zebra does not provide a means of obtaining
561 the ZeeRex record using Z39.50.
566 <title>Some SRU Examples</title>
568 Surf into <literal>http://localhost:9999</literal>
569 to get an explain response, or use
571 http://localhost:9999/?version=1.1&operation=explain
575 See number of hits for a query
577 http://localhost:9999/?version=1.1&operation=searchRetrieve
578 &query=text=(plant%20and%20soil)
582 Fetch record 5-7 in Dublin Core format
584 http://localhost:9999/?version=1.1&operation=searchRetrieve
585 &query=text=(plant%20and%20soil)
586 &startRecord=5&maximumRecords=3&recordSchema=dc
590 You can even search with PQF queries, using the <emphasis>extended naughty
591 parameter</emphasis> <literal>x-pquery</literal>
593 http://localhost:9999/?version=1.1&operation=searchRetrieve
594 &x-pquery=@attr%201=text%20@and%20plant%20soil
598 Or scan indexes using the <emphasis>extended extremely naughty
599 parameter</emphasis> <literal>x-pScanClause</literal>
601 http://localhost:9999/?version=1.1&operation=scan
602 &x-pScanClause=@attr%201=text%20something
604 <emphasis>Don't do this in production code!</emphasis>
605 But it's a great fast debugging aid.
610 <title>Initialization, Present, Sort, Close</title>
612 In the Z39.50 protocol, Initialization, Present, Sort and Close
613 are separate operations. In SRU, however, these operations do not exist.
619 SRU has no explicit initialization handshake phase, but
620 commences immediately with searching, scanning and explain operations.
626 Neither does SRU have a close operation, since the protocol is
627 stateless and each request is self-contained. (It is true that
628 multiple SRU request/response pairs may be implemented as
629 multiple HTTP request/response pairs over a single persistent
630 TCP/IP connection; but the closure of that connection is not a
631 protocol-level operation.)
636 Retrieval in SRU is part of the
637 <literal>searchRetrieve</literal> operation, in which a search
638 is submitted and the response includes a subset of the records
639 in the result set. There is no direct analogue of Z39.50's
640 Present operation which requests records from an established
641 result set. In SRU, this is achieved by sending a subsequent
642 <literal>searchRetrieve</literal> request with the query
643 <literal>cql.resultSetId=</literal><emphasis>id</emphasis> where
644 <emphasis>id</emphasis> is the identifier of the previously
645 generated result-set.
650 Sorting in SRU is done within the
651 <literal>searchRetrieve</literal> operation - in v1.1, by an
652 explicit <literal>sortKeys</literal> parameter, but the forthcoming
653 v1.2 or v2.0 will most likely use an extension of the query
654 language, CQL, for sorting: see
655 <ulink url="http://zing.z3950.org/cql/sorting.html"/>
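The result-set follow-up described in the retrieval item above can be sketched as follows. This is a hedged illustration in Python using only urllib.parse: the helper name present_url and the result-set identifier "42" are hypothetical, and the sketch assumes that an earlier response supplied an identifier that is still valid on the server.

```python
from urllib.parse import quote, urlencode


def present_url(base, result_set_id, first, last):
    """URL requesting records first..last (1-based, inclusive) of an
    existing result set, using a cql.resultSetId query."""
    if not (first >= 1 and last >= first):
        raise ValueError("expected 1-based positions, first not after last")
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": "cql.resultSetId=" + result_set_id,
        "startRecord": str(first),
        # An inclusive range first..last holds last - first + 1 records.
        "maximumRecords": str(last - first + 1),
    }
    return base + "?" + urlencode(params, quote_via=quote)


# "42" stands in for an identifier taken from an earlier response.
url = present_url("http://localhost:9999/Default", "42", 5, 7)
print(url)
```

Note the inclusive arithmetic: records 5 to 7 are three records, so the request carries a startRecord of 5 and a maximumRecords of 3.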
660 It can be seen, then, that while Zebra operating as an SRU server
661 does not provide the same set of operations as when operating as a
662 Z39.50 server, it does provide equivalent functionality.
668 <!-- Keep this comment at the end of the file
673 sgml-minimize-attributes:nil
674 sgml-always-quote-attributes:t
677 sgml-parent-document: "zebra.xml"
678 sgml-local-catalogs: nil
679 sgml-namecase-general:t