X-Git-Url: http://jsfdemo.indexdata.com/?a=blobdiff_plain;f=doc%2Fserver.xml;h=af98f12d7eb5b4ccb2a8b5d7ec61be6c0cec4e67;hb=7415d28c149c1bab51fe93aeaccdd14085b69bd9;hp=f5fa9ea64b7c04cb36682a17b35beda4e9769a04;hpb=25a37c9be836f891281688788a7a1f967ea2b2cb;p=idzebra-moved-to-github.git
diff --git a/doc/server.xml b/doc/server.xml
index f5fa9ea..af98f12 100644
--- a/doc/server.xml
+++ b/doc/server.xml
@@ -1,5 +1,5 @@
-
+
The Z39.50 Server
@@ -12,11 +12,77 @@
can be run (inetd, nt service, stand-alone program, daemon...) -H
-->
+
+
+
+ Description
+ Zebra is a high-performance, general-purpose structured text indexing
+ and retrieval engine. It reads structured records in a variety of input
+ formats (e.g. email, XML, MARC) and allows access to them through exact
+ boolean search expressions and relevance-ranked free-text queries.
+
+
+ zebrasrv is the Z39.50 and SRW/U frontend
+ server for the Zebra indexer.
+
+
+ On Unix you can run the zebrasrv
+ server from the command line, optionally putting it
+ in the background. It may also operate under the inet daemon.
+ On WIN32 you can run the server as a console application or
+ as a WIN32 Service.
+
+
+
+
+ Synopsis
+ &zebrasrv-synopsis;
+
+
+
+ Options
+
+
+ The options for zebrasrv are the same
+ as those for YAZ's yaz-ztest.
+ Option -c specifies a Zebra configuration
+ file; if it is omitted, zebra.cfg is read.
+
+
+ &zebrasrv-options;
+
+
+ Files
+
+ zebra.cfg
+
+
+ See Also
+
+
+ zebraidx
+ 1
+ ,
+
+ yaz-ztest
+ 8
+
+
+
+ The Zebra software is Copyright Index Data
+ http://www.indexdata.dk
+ and is distributed under the
+ GPLv2 license.
+
+
+
+
+
Z39.50 Protocol Support and Behavior
@@ -239,245 +242,8 @@
also the following section).
-
- Use attributes are interpreted according to the
- attribute sets which have been loaded in the
- zebra.cfg file, and are matched against specific
- fields as specified in the .abs file which
- describes the profile of the records which have been loaded.
- If no Use attribute is provided, a default of Bib-1 Any is assumed.
-
-
-
- If a Structure attribute of
- Phrase is used in conjunction with a
- Completeness attribute of
- Complete (Sub)field, the term is matched
- against the contents of the phrase (long word) register, if one
- exists for the given Use attribute.
- A phrase register is created for those fields in the
- .abs file that contains a
- p-specifier.
-
-
-
-
- If Structure=Phrase is
- used in conjunction with Incomplete Field - the
- default value for Completeness, the
- search is directed against the normal word registers, but if the term
- contains multiple words, the term will only match if all of the words
- are found immediately adjacent, and in the given order.
- The word search is performed on those fields that are indexed as
- type w in the .abs file.
-
-
-
- If the Structure attribute is
- Word List,
- Free-form Text, or
- Document Text, the term is treated as a
- natural-language, relevance-ranked query.
- This search type uses the word register, i.e. those fields
- that are indexed as type w in the
- .abs file.
-
-
-
- If the Structure attribute is
- Numeric String the term is treated as an integer.
- The search is performed on those fields that are indexed
- as type n in the .abs file.
-
-
-
- If the Structure attribute is
- URx the term is treated as a URX (URL) entity.
- The search is performed on those fields that are indexed as type
- u in the .abs file.
-
-
-
- If the Structure attribute is
- Local Number the term is treated as
- native Zebra Record Identifier.
-
-
-
- If the Relation attribute is
- Equals (default), the term is matched
- in a normal fashion (modulo truncation and processing of
- individual words, if required).
- If Relation is Less Than,
- Less Than or Equal,
- Greater than, or Greater than or
- Equal, the term is assumed to be numerical, and a
- standard regular expression is constructed to match the given
- expression.
- If Relation is Relevance,
- the standard natural-language query processor is invoked.
-
-
-
- For the Truncation attribute,
- No Truncation is the default.
- Left Truncation is not supported.
- Process # is supported, as is
- Regxp-1.
- Regxp-2 enables the fault-tolerant (fuzzy)
- search. As a default, a single error (deletion, insertion,
- replacement) is accepted when terms are matched against the register
- contents.
-
-
-
- Regular expressions
-
-
- Each term in a query is interpreted as a regular expression if
- the truncation value is either Regxp-1 (102)
- or Regxp-2 (103).
- Both query types follow the same syntax with the operands:
-
-
-
- x
-
-
- Matches the character x.
-
-
-
-
- .
-
-
- Matches any character.
-
-
-
-
- [..]
-
-
- Matches the set of characters specified;
- such as [abc] or [a-c].
-
-
-
-
- and the operators:
-
-
-
- x*
-
-
- Matches x zero or more times. Priority: high.
-
-
-
-
- x+
-
-
- Matches x one or more times. Priority: high.
-
-
-
-
- x?
-
-
- Matches x zero or once. Priority: high.
-
-
-
-
- xy
-
-
- Matches x, then y.
- Priority: medium.
-
-
-
-
- x|y
-
-
- Matches either x or y.
- Priority: low.
-
-
-
-
- The order of evaluation may be changed by using parentheses.
-
-
-
- If the first character of the Regxp-2 query
- is a plus character (+) it marks the
- beginning of a section with non-standard specifiers.
- The next plus character marks the end of the section.
- Currently Zebra only supports one specifier, the error tolerance,
- which consists one digit.
-
-
-
- Since the plus operator is normally a suffix operator the addition to
- the query syntax doesn't violate the syntax for standard regular
- expressions.
-
-
-
-
-
- Query examples
-
-
- Phrase search for information retrieval in
- the title-register:
-
- @attr 1=4 "information retrieval"
-
-
-
-
- Ranked search for the same thing:
-
- @attr 1=4 @attr 2=102 "Information retrieval"
-
-
-
-
- Phrase search with a regular expression:
-
- @attr 1=4 @attr 5=102 "informat.* retrieval"
-
-
-
-
- Ranked search with a regular expression:
-
- @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
-
-
-
-
- In the GILS schema (gils.abs), the
- west-bounding-coordinate is indexed as type n,
- and is therefore searched by specifying
- structure=Numeric String.
- To match all those records with west-bounding-coordinate greater
- than -114 we use the following query:
-
- @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
-
-
-
-
-
+
+
Present
@@ -531,8 +297,364 @@
timeout.
+
+
+ Explain
+
+ Zebra maintains a "classic"
+ Explain database
+ on the side.
+ This database is called IR-Explain-1 and can be
+ searched using the attribute set exp-1.
+
+
+ The records in the explain database are of type
+ grs.sgml.
+ The root element for the Explain grs.sgml records is
+ explain, thus
+ explain.abs is used for indexing.
+
+
+
+ Zebra must be able to locate
+ explain.abs in order to index the Explain
+ records properly. Zebra will work without it, but the information
+ will not be searchable.
+
+
+
+
+
+
+ The SRU/SRW Server
+
+ In addition to Z39.50, Zebra supports the more recent and
+ web-friendly IR protocol SRU, described at
+ .
+ SRU is ``Search/Retrieve via URL'', a simple, REST-like protocol
+ that uses HTTP GET to request search responses. The request
+ itself is made of parameters such as
+ query,
+ startRecord,
+ maximumRecords
+ and
+ recordSchema;
+ the response is an XML document containing hit-count, result-set
+ records, diagnostics, etc. SRU can be thought of as a re-casting
+ of Z39.50 semantics in web-friendly terms; or as a standardisation
+ of the ad-hoc query parameters used by search engines such as Google
+ and AltaVista; or as a superset of A9's OpenSearch (which it
+ predates).
+
+
+ Zebra further supports SRW, described at
+ .
+ SRW is the ``Search/Retrieve Web Service'', a SOAP-based alternative
+ implementation of the abstract protocol that SRU implements as HTTP
+ GET requests. In SRW, requests are encoded as XML documents which
+ are posted to the server. The responses are identical to those
+ returned by SRU servers, except that they are wrapped in several
+ layers of SOAP envelope.
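To make the SOAP wrapping concrete, here is a minimal sketch that builds an SRW searchRetrieveRequest using only Python's standard library. It assumes a SOAP 1.1 envelope and the SRW 1.1 namespace http://www.loc.gov/zing/srw/. The helper name srw_search_request is hypothetical. A real client would then POST the resulting bytes to the server over HTTP.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
SRW_NS = "http://www.loc.gov/zing/srw/"                # SRW 1.1 namespace (assumed)


def srw_search_request(cql: str, start: int = 1, maximum: int = 10) -> bytes:
    """Build a SOAP-wrapped SRW searchRetrieveRequest (hypothetical helper)."""
    ET.register_namespace("SOAP-ENV", SOAP_NS)
    ET.register_namespace("srw", SRW_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{SRW_NS}}}searchRetrieveRequest")
    # The same parameters an SRU URL would carry, expressed as child elements.
    for name, value in [("version", "1.1"), ("query", cql),
                        ("startRecord", str(start)),
                        ("maximumRecords", str(maximum))]:
        ET.SubElement(request, f"{{{SRW_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8")


print(srw_search_request("title=utah").decode("utf-8"))
```

The response comes back in the same kind of envelope. Once the SOAP layers are removed, its content is what an SRU server would have returned.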
+
+
+ Zebra supports all three protocols (Z39.50, SRU and SRW) on the
+ same port, recognising which protocol each incoming request uses
+ and handling it accordingly. This is achieved through
+ the use of Deep Magic; civilians are warned not to stand too close.
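The text does not say how the detection works, and the sketch below is not YAZ's code. It only illustrates why sharing one port is feasible: the opening bytes of each protocol differ. It assumes that SRU arrives as an HTTP GET, SRW as an HTTP POST, and Z39.50 as a BER-encoded InitializeRequest. The function name guess_protocol is hypothetical.

```python
def guess_protocol(first_bytes: bytes) -> str:
    """Hypothetical classifier for the first bytes of a new connection.

    Illustration only; this is not YAZ's actual detection logic.
    """
    # SRU (v1.1) is sent as an HTTP GET; SRW is an HTTP POST carrying SOAP.
    if first_bytes.startswith(b"GET "):
        return "sru"
    if first_bytes.startswith(b"POST "):
        return "srw"
    # A Z39.50 session opens with an InitializeRequest APDU, BER-encoded with
    # the context-specific, constructed tag [20]: 0x80 | 0x20 | 20 == 0xB4.
    if first_bytes[:1] == b"\xb4":
        return "z39.50"
    return "unknown"
```

A production server would presumably inspect more than the first few bytes before committing to a protocol handler.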
+
+
+ From here on, ``SRU'' is used to indicate both the SRU and SRW
+ protocols, as they are identical except for the transport used for
+ the protocol packets and Zebra's support for them is equivalent.
+
+
+
+ Running the SRU Server (zebrasrv)
+
+ Because Zebra supports all three protocols on one port, it would
+ seem to follow that the SRU server is run in the same way as
+ the Z39.50 server, as described above. This is true, but only in
+ an uninterestingly vacuous way: a Zebra server run in this manner
+ will indeed recognise and accept SRU requests; but since it
+ doesn't know how to handle the CQL queries that these protocols
+ use, all it can do is send failure responses.
+
+
+
+ It is possible to cheat by having SRU search Zebra with
+ a PQF query instead of CQL, using the
+ x-pquery
+ parameter instead of
+ query.
+ This is a
+ non-standard extension
+ of SRU, and a
+ very naughty
+ thing to do, but it does give you a way to see Zebra serving SRU
+ ``right out of the box''. If you start your favourite Zebra
+ server in the usual way, on port 9999, then you can send your web
+ browser to:
+
+
+ http://localhost:9999/Default?version=1.1
+ &operation=searchRetrieve
+ &x-pquery=mineral
+ &startRecord=1
+ &maximumRecords=1
+
+
+ This will display the XML-formatted SRU response that includes the
+ first record in the result-set found by the query
+ mineral. (For clarity, the SRU URL is shown
+ here broken across lines, but the lines should be joined together
+ to make a single-line URL for the browser to submit.)
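Instead of joining the lines by hand, you can assemble the URL programmatically. Below is a minimal sketch using Python's urllib.parse. It assumes the local server and database name used above (localhost:9999, Default). The helper name pqf_search_url is hypothetical.

```python
from urllib.parse import urlencode


def pqf_search_url(base: str, pqf: str, start: int = 1, maximum: int = 1) -> str:
    """Hypothetical helper: SRU searchRetrieve URL using the x-pquery extension."""
    params = [
        ("version", "1.1"),
        ("operation", "searchRetrieve"),
        ("x-pquery", pqf),        # non-standard: takes the place of "query"
        ("startRecord", start),
        ("maximumRecords", maximum),
    ]
    return base + "?" + urlencode(params)


print(pqf_search_url("http://localhost:9999/Default", "mineral"))
# http://localhost:9999/Default?version=1.1&operation=searchRetrieve&x-pquery=mineral&startRecord=1&maximumRecords=1
```

urlencode also percent-encodes characters in richer PQF queries, such as @, = and spaces, which a browser address bar would otherwise have to cope with.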
+
+
+
+ In order to turn on Zebra's support for CQL queries, it's necessary
+ to have the YAZ generic front-end (which Zebra uses) translate them
+ into the Z39.50 Type-1 query format that is used internally. To
+ do this, the generic front-end's own configuration file must be
+ used. This file is described
+ elsewhere;
+ the salient point for SRU support is that
+ zebrasrv
+ must be started with the
+ -f frontendConfigFile
+ option rather than the
+ -c zebraConfigFile
+ option,
+ and that the front-end configuration file must include both a
+ reference to the Zebra configuration file and the CQL-to-PQF
+ translator configuration file.
+
+
+ A minimal front-end configuration file that does this would read as
+ follows:
+
+<yazgfs>
+  <server>
+    <config>zebra.cfg</config>
+    <cql2rpn>../../tab/pqf.properties</cql2rpn>
+  </server>
+</yazgfs>
+
+ The
+ <config>
+ element contains the name of the Zebra configuration file that was
+ previously specified by the
+ -c
+ command-line argument, and the
+ <cql2rpn>
+ element contains the name of the CQL properties file specifying how
+ various CQL indexes, relations, etc. are translated into Type-1
+ queries.
+
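The front-end file must reference both the Zebra configuration file and the CQL-to-PQF properties file. That requirement can be checked before starting zebrasrv with -f. The sketch below is a hypothetical pre-flight check, not part of Zebra or YAZ. It assumes the usual YAZ front-end layout: a yazgfs root whose server child holds config and cql2rpn elements.

```python
import xml.etree.ElementTree as ET


def missing_frontend_refs(frontend_xml: str) -> list:
    """Hypothetical pre-flight check for a YAZ front-end config file.

    Returns the names of required elements that are absent or empty,
    looking for a <server> element directly under the root.
    """
    server = ET.fromstring(frontend_xml).find("server")
    if server is None:
        return ["server"]
    missing = []
    for name in ("config", "cql2rpn"):  # Zebra config + CQL-to-PQF properties
        element = server.find(name)
        if element is None or not (element.text or "").strip():
            missing.append(name)
    return missing


example = """<yazgfs>
  <server>
    <config>zebra.cfg</config>
    <cql2rpn>../../tab/pqf.properties</cql2rpn>
  </server>
</yazgfs>"""
print(missing_frontend_refs(example))  # []
```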
+
+ A Zebra server running with such a configuration can then be
+ queried using proper, conformant SRU URLs with CQL queries:
+
+
+ http://localhost:9999/Default?version=1.1
+ &operation=searchRetrieve
+ &query=title=utah and description=epicent*
+ &startRecord=1
+ &maximumRecords=1
+
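The CQL query in that URL contains spaces, = and *. These must be percent-encoded before a browser or HTTP client sends the request. The sketch below assumes the same local server. It also checks that decoding the query string recovers exactly the CQL the server will parse.

```python
from urllib.parse import parse_qs, quote, urlsplit

cql = "title=utah and description=epicent*"

# safe="" makes quote() escape everything except letters, digits and "_.-~",
# so the CQL's own "=" and spaces cannot be confused with the URL syntax.
encoded = quote(cql, safe="")
url = ("http://localhost:9999/Default?version=1.1&operation=searchRetrieve"
       "&query=" + encoded + "&startRecord=1&maximumRecords=1")

print(encoded)  # title%3Dutah%20and%20description%3Depicent%2A
# Decoding the query string gives back exactly the original CQL.
print(parse_qs(urlsplit(url).query)["query"][0] == cql)  # True
```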
+
+
+
+ SRU and SRW Protocol Support and Behavior
+
+ Zebra running as an SRU server supports SRU version 1.1, including
+ CQL version 1.1. In particular, it provides support for the
+ following elements of the protocol.
+
+
+
+ Search and Retrieval
+
+ Zebra fully supports SRU's core
+ searchRetrieve
+ operation, as described at
+
+
+
+ One of the great strengths of SRU is that it mandates a standard
+ query language, CQL, and that all conforming implementations can
+ therefore be trusted to correctly interpret the same queries. It
+ is with some shame, then, that we admit that Zebra also supports
+ an additional query language, our own Prefix Query Format (PQF,
+ ).
+ A PQF query is submitted by using the extension parameter
+ x-pquery,
+ in which case the
+ query
+ parameter must be omitted, which makes the request invalid SRU.
+ Please don't do this.
+
+
+
+
+ Scan
+
+ Zebra supports SRU's
+ scan
+ operation, as described at
+ .
+ Scanning with CQL syntax, using the
+ standard scanClause parameter, is the default.
+
+
+ In addition, a
+ mutant form of SRU scan is supported, using
+ the non-standard x-pScanClause parameter in
+ place of the standard scanClause to scan on a
+ PQF query clause.
+
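The only difference between the two forms is which parameter carries the clause. Below is a minimal sketch that builds either kind of scan URL. The helper name is hypothetical, and so are the index names in the examples, since they depend on the CQL-to-PQF properties and the attribute set in use. maximumTerms is a standard SRU 1.1 scan parameter; whether Zebra honours it is not covered here.

```python
from urllib.parse import urlencode


def scan_url(base: str, clause: str, pqf: bool = False, maximum_terms: int = 20) -> str:
    """Build an SRU 1.1 scan URL (hypothetical helper).

    With pqf=False the clause is CQL and goes in the standard scanClause
    parameter; with pqf=True it is PQF and goes in Zebra's non-standard
    x-pScanClause parameter instead.
    """
    clause_param = "x-pScanClause" if pqf else "scanClause"
    return base + "?" + urlencode([
        ("version", "1.1"),
        ("operation", "scan"),
        (clause_param, clause),
        ("maximumTerms", maximum_terms),
    ])


print(scan_url("http://localhost:9999/Default", "title=utah"))
print(scan_url("http://localhost:9999/Default", "@attr 1=4 utah", pqf=True))
```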
+
+
+
+ Explain
+
+ Zebra fully supports SRU's core
+ explain
+ operation, as described at
+
+
+
+ The ZeeRex record explaining a database may be requested either
+ with a fully fledged SRU request (with
+ operation=explain
+ and version-number specified)
+ or with a simple HTTP GET at the server's basename.
+ The ZeeRex record returned in response is the one embedded
+ in the YAZ Frontend Server configuration file that is described in the
+ Virtual Hosts documentation.
+
+
+ Unfortunately, the data found in the
+ CQL-to-PQF text file must be added by hand into the explain
+ section of the YAZ Frontend Server configuration file in order
+ to provide a suitable explain record.
+ Too bad, but this is all extremely
+ new alpha stuff, and a lot of work has yet to be done.
+
+
+ There is no linkage whatsoever between the Z39.50 explain model
+ and the SRU/SRW explain response (at least, none is implemented
+ in Zebra). Zebra provides no means of obtaining the ZeeRex record
+ via Z39.50.
+
+
+
+
+ Some SRU Examples
+
+ Surf into http://localhost:9999
+ to get an explain response, or use
+
+
+
+ See the number of hits for a query
+
+
+
+ Fetch records 5-7 in Dublin Core format
+
+
+
+ You can even search with PQF queries using the naughty extension
+ parameter x-pquery
+
+
+
+ Or scan indexes using the extremely naughty extension
+ parameter x-pScanClause
+
+ Don't do this in production code!
+ But it's a great fast debugging aid.
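A client that only wants the hit count, as in the second example, can read the numberOfRecords element from the response. Below is a minimal sketch. It assumes the SRU/SRW 1.1 response namespace http://www.loc.gov/zing/srw/, and the sample response is made up for illustration. The function searches all descendants rather than only the root's children. Because of that, it also works on an SRW response still wrapped in its SOAP envelope.

```python
import xml.etree.ElementTree as ET

NS = {"srw": "http://www.loc.gov/zing/srw/"}  # SRU/SRW 1.1 namespace (assumed)


def hit_count(response_xml: str) -> int:
    """Return numberOfRecords from an SRU response or a SOAP-wrapped SRW one."""
    root = ET.fromstring(response_xml)
    node = root.find(".//srw:numberOfRecords", NS)
    if node is None or node.text is None:
        raise ValueError("no numberOfRecords element (check for diagnostics)")
    return int(node.text)


# Made-up response, for illustration only.
SAMPLE = """<srw:searchRetrieveResponse xmlns:srw="http://www.loc.gov/zing/srw/">
  <srw:version>1.1</srw:version>
  <srw:numberOfRecords>42</srw:numberOfRecords>
</srw:searchRetrieveResponse>"""

print(hit_count(SAMPLE))  # 42
```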
+
+
+
+
+ Initialization, Present, Sort, Close
+
+ In the Z39.50 protocol, Initialization, Present, Sort and Close
+ are separate operations. In SRU, however, these operations do not
+ exist.
+
+
+
+
+ SRU has no explicit initialization handshake phase, but
+ commences immediately with searching, scanning and explain
+ operations.
+
+
+
+
+ Neither does SRU have a close operation, since the protocol is
+ stateless and each request is self-contained. (It is true that
+ multiple SRU request/response pairs may be implemented as
+ multiple HTTP request/response pairs over a single persistent
+ TCP/IP connection; but the closure of that connection is not a
+ protocol-level operation.)
+
+
+
+
+ Retrieval in SRU is part of the
+ searchRetrieve operation, in which a search
+ is submitted and the response includes a subset of the records
+ in the result set. There is no direct analogue of Z39.50's
+ Present operation, which requests records from an established
+ result set. In SRU, this is achieved by sending a subsequent
+ searchRetrieve request with the query
+ cql.resultSetId=id, where
+ id is the identifier of the previously
+ generated result-set.
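Emulating Present therefore means building a fresh searchRetrieve URL whose query names the earlier result set. Below is a minimal sketch. The helper name and the result-set id "abc123" are hypothetical, and the id is assumed to contain no double quotes. It is wrapped in quotes here, which is ordinary CQL term quoting.

```python
from urllib.parse import urlencode


def present_url(base: str, result_set_id: str, start: int, maximum: int) -> str:
    """Hypothetical helper: fetch more records from an existing result set.

    Emulates Z39.50 Present with a new searchRetrieve whose CQL query
    names the earlier result set. Assumes the id contains no double quotes.
    """
    query = f'cql.resultSetId="{result_set_id}"'
    return base + "?" + urlencode([
        ("version", "1.1"),
        ("operation", "searchRetrieve"),
        ("query", query),
        ("startRecord", start),
        ("maximumRecords", maximum),
    ])


# Records 11-20 of a result set the server previously called "abc123".
print(present_url("http://localhost:9999/Default", "abc123", 11, 10))
```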
+
+
+
+
+ Sorting in SRU is done within the
+ searchRetrieve operation: in v1.1, by an
+ explicit sort parameter, but the forthcoming
+ v1.2 or v2.0 will most likely use an extension of the query
+ language, CQL, for sorting; see
+
+
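In SRU 1.1, the explicit sort parameter is, as we understand the specification, called sortKeys. Each key is a comma-separated tuple of path, schema, ascending, caseSensitive and missingValue; trailing fields may be omitted, and keys are separated by spaces. The helper below is a hypothetical sketch of that format. The index names are examples only, and whether Zebra honours sortKeys is not stated here.

```python
def sort_keys(*keys: tuple) -> str:
    """Format an SRU 1.1 sortKeys value (hypothetical helper).

    Each key is (path, schema, ascending). Only the first three fields of
    path,schema,ascending,caseSensitive,missingValue are emitted; an empty
    schema leaves its slot blank.
    """
    formatted = []
    for path, schema, ascending in keys:
        formatted.append(f"{path},{schema or ''},{1 if ascending else 0}")
    return " ".join(formatted)


print(sort_keys(("dc.title", None, True), ("dc.date", None, False)))
# dc.title,,1 dc.date,,0
```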
+
+
+
+ It can be seen, then, that while Zebra operating as an SRU server
+ does not provide the same set of operations as when operating as a
+ Z39.50 server, it does provide equivalent functionality.
+
+
+
+
+