<chapter id="administration">
- <!-- $Id: administration.xml,v 1.49 2007-02-02 09:58:39 marc Exp $ -->
+ <!-- $Id: administration.xml,v 1.50 2007-02-02 11:10:08 marc Exp $ -->
<title>Administrating &zebra;</title>
<!-- ### It's a bit daft that this chapter (which describes half of
the configuration-file formats) is separated from
</para>
<para>
- Both the &zebra; administrative tool and the Z39.50 server share a
+ Both the &zebra; administrative tool and the &z3950; server share a
set of index files and a global configuration file.
The name of the configuration file defaults to
<literal>zebra.cfg</literal>.
In the configuration file, the group name is placed before the option
name itself, separated by a dot (.). For instance, to set the record type
for group <literal>public</literal> to <literal>grs.sgml</literal>
- (the SGML-like format for structured records) you would write:
+ (the &sgml;-like format for structured records) you would write:
</para>
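<para>
  A minimal sketch of such a configuration line, assuming the
  <literal>group.option</literal> dot syntax just described (the group
  name <literal>public</literal> comes from the example above; the
  directive name follows the <literal>recordType</literal> setting used
  elsewhere in this chapter):
  <screen>
public.recordType: grs.sgml
  </screen>
</para>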
<para>
<replaceable>database</replaceable></term>
<listitem>
<para>
- Specifies the Z39.50 database name.
+ Specifies the &z3950; database name.
<!-- FIXME - now we can have multiple databases in one server. -H -->
</para>
</listitem>
of permissions currently: read (r) and write (w). By default,
users not listed in a permission directive are given the read
privilege. To specify permissions for a user with no
- username, or Z39.50 anonymous style use
+ username (&z3950; anonymous style), use
<literal>anonymous</literal>. The permstring consists of
a sequence of characters. Include character <literal>w</literal>
for write/update access, <literal>r</literal> for read access and
mounted on a CD-ROM drive,
you may want &zebra; to make an internal copy of them. To do this,
you specify 1 (true) in the <literal>storeData</literal> setting. When
- the Z39.50 server retrieves the records they will be read from the
+ the &z3950; server retrieves the records they will be read from the
internal file structures of the system.
</para>
<para>
Consider a system in which you have a group of text files called
<literal>simple</literal>.
- That group of records should belong to a Z39.50 database called
+ That group of records should belong to a &z3950; database called
<literal>textbase</literal>.
The following <literal>zebra.cfg</literal> file will suffice:
</para>
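<para>
  A sketch of such a file, assuming the group-prefixed directives
  described earlier in this chapter (the <literal>profilePath</literal>
  value is illustrative and depends on the installation):
  <screen>
profilePath: /usr/local/idzebra/tab
attset: bib1.att
simple.recordType: text
simple.database: textbase
  </screen>
</para>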
information. If you have a group of records that explicitly associates
an ID with each record, this method is convenient. For example, the
record format may contain a title or an ID-number, unique within the group.
- In either case you specify the Z39.50 attribute set and use-attribute
+ In either case you specify the &z3950; attribute set and use-attribute
location in which this information is stored, and the system looks at
that field to determine the identity of the record.
</para>
<para>
For instance, the sample GILS records that come with the &zebra;
distribution contain a unique ID in the data tagged Control-Identifier.
- The data is mapped to the Bib-1 use attribute Identifier-standard
+ The data is mapped to the &bib1; use attribute Identifier-standard
(code 1007). To use this field as a record id, specify
<literal>(bib1,Identifier-standard)</literal> as the value of the
<literal>recordId</literal> in the configuration file.
</para>
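<para>
  Expressed as a configuration line (shown here in the group-less form;
  a group prefix would work the same way):
  <screen>
recordId: (bib1,Identifier-standard)
  </screen>
</para>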
<para>
The experimental <literal>alvis</literal> filter provides a
- directive to fetch static rank information out of the indexed XML
+ directive to fetch static rank information out of the indexed &xml;
records, thus making <emphasis>all</emphasis> hit sets ordered
after <emphasis>ascending</emphasis> static
rank, and for those docs which have the same static rank, ordered
indexing time (this is why we
call it ``dynamic ranking'' in the first place ...)
It is invoked by adding
- the Bib-1 relation attribute with
- value ``relevance'' to the PQF query (that is,
+ the &bib1; relation attribute with
+ value ``relevance'' to the &pqf; query (that is,
<literal>@attr 2=102</literal>, see also
<ulink url="&url.z39.50;bib1.html">
- The BIB-1 Attribute Set Semantics</ulink>, also in
+ The &bib1; Attribute Set Semantics</ulink>, also in
<ulink url="&url.z39.50.attset.bib1;">HTML</ulink>).
To find all articles with the word <literal>Eoraptor</literal> in
- the title, and present them relevance ranked, issue the PQF query:
+ the title, and present them relevance ranked, issue the &pqf; query:
<screen>
@attr 2=102 @attr 1=4 Eoraptor
</screen>
</para>
<sect3 id="administration-ranking-dynamic-rank1">
- <title>Dynamically ranking using PQF queries with the 'rank-1'
+ <title>Dynamically ranking using &pqf; queries with the 'rank-1'
algorithm</title>
<para>
</para>
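<para>
  The ranking algorithm can be selected in
  <literal>zebra.cfg</literal>; a sketch, assuming a
  <literal>rank</literal> directive named after the algorithm in this
  section's title:
  <screen>
rank: rank-1
  </screen>
</para>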
<para>
It is possible to apply dynamic ranking on only parts of the
- PQF query:
+ &pqf; query:
<screen>
@and @attr 2=102 @attr 1=1010 Utah @attr 1=1018 Springer
</screen>
</para>
<para>
Ranking weights may be used to pass a value to a ranking
- algorithm, using the non-standard BIB-1 attribute type 9.
+ algorithm, using the non-standard &bib1; attribute type 9.
This allows one branch of a query to use one value while
another branch uses a different one. For example, we can search
for <literal>utah</literal> in the
</para>
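<para>
  A sketch of such a weighted query, assuming type-9 weight values of
  30 and 20 (the numbers are purely illustrative; each attribute list
  applies to the term that follows it):
  <screen>
@and @attr 2=102 @attr 9=30 @attr 1=1010 Utah @attr 2=102 @attr 9=20 @attr 1=1018 Springer
  </screen>
</para>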
<para>
The default weight is
- sqrt(1000) ~ 34 , as the Z39.50 standard prescribes that the top score
+ sqrt(1000) ~ 34, as the &z3950; standard prescribes that the top score
is 1000 and the bottom score is 0, encoded in integers.
</para>
<warning>
<!--
<sect3 id="administration-ranking-dynamic-rank1">
- <title>Dynamically ranking PQF queries with the 'rank-static'
+ <title>Dynamically ranking &pqf; queries with the 'rank-static'
algorithm</title>
<para>
The dummy <literal>rank-static</literal> reranking/scoring
</sect3>
<sect3 id="administration-ranking-dynamic-cql">
- <title>Dynamically ranking CQL queries</title>
+ <title>Dynamically ranking &cql; queries</title>
<para>
- Dynamic ranking can be enabled during sever side CQL
+ Dynamic ranking can be enabled during server-side &cql;
query expansion by adding <literal>@attr 2=102</literal>
- chunks to the CQL config file. For example
+ chunks to the &cql; config file. For example
<screen>
relationModifier.relevant = 2=102
</screen>
- invokes dynamic ranking each time a CQL query of the form
+ invokes dynamic ranking each time a &cql; query of the form
<screen>
Z> querytype cql
Z> f alvis.text =/relevant house
</screen>
is issued. Dynamic ranking can also be automatically used on
- specific CQL indexes by (for example) setting
+ specific &cql; indexes by (for example) setting
<screen>
index.alvis.text = 1=text 2=102
</screen>
- which then invokes dynamic ranking each time a CQL query of the form
+ which then invokes dynamic ranking each time a &cql; query of the form
<screen>
Z> querytype cql
Z> f alvis.text = house
&zebra; sorts efficiently using special sorting indexes
(type=<literal>s</literal>), so each sortable index must be known
at indexing time, specified in the configuration of record
- indexing. For example, to enable sorting according to the BIB-1
+ indexing. For example, to enable sorting according to the &bib1;
<literal>Date/time-added-to-db</literal> field, one could add the line
<screen>
xelm /*/@created Date/time-added-to-db:s
<para>
Indexing can be specified at searching time using a query term
carrying the non-standard
- BIB-1 attribute-type <literal>7</literal>. This removes the
- need to send a Z39.50 <literal>Sort Request</literal>
+ &bib1; attribute-type <literal>7</literal>. This removes the
+ need to send a &z3950; <literal>Sort Request</literal>
separately, and can dramatically improve latency when the client
and server are on separate networks.
The sorting part of the query is separate from the rest of the
</para>
<para>
A sorting subquery needs two attributes: an index (such as a
- BIB-1 type-1 attribute) specifying which index to sort on, and a
+ &bib1; type-1 attribute) specifying which index to sort on, and a
type-7 attribute whose value is <literal>1</literal> for
ascending sorting, or <literal>2</literal> for descending. The
term associated with the sorting attribute is the priority of
on.
</para>
<para>For example, a search for water, sort by title (ascending),
- is expressed by the PQF query
+ is expressed by the &pqf; query
<screen>
@or @attr 1=1016 water @attr 7=1 @attr 1=4 0
</screen>
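A descending sort would presumably just flip the type-7 value to
<literal>2</literal>:
<screen>
@or @attr 1=1016 water @attr 7=2 @attr 1=4 0
</screen>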
<note>
<para>
Extended services are only supported when accessing the &zebra;
- server using the <ulink url="&url.z39.50;">Z39.50</ulink>
- protocol. The <ulink url="&url.sru;">SRU</ulink> protocol does
+ server using the <ulink url="&url.z39.50;">&z3950;</ulink>
+ protocol. The <ulink url="&url.sru;">&sru;</ulink> protocol does
not support extended services.
</para>
</note>
storeKeys: 1
</screen>
The general record type should be set to any record filter which
- is able to parse XML records, you may use any of the two
+ is able to parse &xml; records; you may use either of the two
declarations (but not both simultaneously!)
<screen>
recordType: grs.xml
<para>
It is not possible to carry information about record types or
similar to &zebra; when using extended services, due to
- limitations of the <ulink url="&url.z39.50;">Z39.50</ulink>
+ limitations of the <ulink url="&url.z39.50;">&z3950;</ulink>
protocol. Therefore, indexing filters cannot be chosen on a
- per-record basis. One and only one general XML indexing filter
+ per-record basis. One and only one general &xml; indexing filter
must be defined.
<!-- but because it is represented as an OID, we would need some
form of proprietary mapping scheme between record type strings and
OIDs. -->
<!--
However, as a minimum, it would be extremely useful to enable
- people to use MARC21, assuming grs.marcxml.marc21 as a record
+ people to use &marc21;, assuming grs.marcxml.marc21 as a record
type.
-->
</para>
<sect2 id="administration-extended-services-z3950">
- <title>Extended services in the Z39.50 protocol</title>
+ <title>Extended services in the &z3950; protocol</title>
<para>
- The <ulink url="&url.z39.50;">Z39.50</ulink> standard allows
+ The <ulink url="&url.z39.50;">&z3950;</ulink> standard allows
servers to accept special binary <emphasis>extended services</emphasis>
protocol packages, which may be used to insert, update, and delete
records on servers. These carry control and update
</para>
<table id="administration-extended-services-z3950-table" frame="top">
- <title>Extended services Z39.50 Package Fields</title>
+ <title>Extended services &z3950; Package Fields</title>
<tgroup cols="3">
<thead>
<row>
</row>
<row>
<entry><literal>record</literal></entry>
- <entry><literal>XML string</literal></entry>
- <entry>An XML formatted string containing the record</entry>
+ <entry><literal>&xml; string</literal></entry>
+ <entry>An &xml; formatted string containing the record</entry>
</row>
<row>
<entry><literal>syntax</literal></entry>
<entry><literal>'xml'</literal></entry>
- <entry>Only XML record syntax is supported</entry>
+ <entry>Only &xml; record syntax is supported</entry>
</row>
<row>
<entry><literal>recordIdOpaque</literal></entry>
<para>
When retrieving existing
- records indexed with GRS indexing filters, the &zebra; internal
+ records indexed with &grs1; indexing filters, the &zebra; internal
ID number is returned in the field
<literal>/*/id:idzebra/localnumber</literal> in the namespace
<literal>xmlns:id="http://www.indexdata.dk/zebra/"</literal>,
]]>
</screen>
Now that the <literal>Default</literal> database has been created,
- we can insert an XML file (esdd0006.grs
+ we can insert an &xml; file (esdd0006.grs
from example/gils/records) and index it:
<screen>
<![CDATA[
<title>Extended services from yaz-php</title>
<para>
- Extended services are also available from the YAZ PHP client layer. An
- example of an YAZ-PHP extended service transaction is given here:
+ Extended services are also available from the &yaz; &php; client layer. An
+ example of a &yaz;-&php; extended service transaction is given here:
<screen>
<![CDATA[
$record = '<record><title>A fine specimen of a record</title></record>';
<chapter id="architecture">
- <!-- $Id: architecture.xml,v 1.19 2007-02-02 09:58:39 marc Exp $ -->
+ <!-- $Id: architecture.xml,v 1.20 2007-02-02 11:10:08 marc Exp $ -->
<title>Overview of &zebra; Architecture</title>
<section id="architecture-representation">
<varlistentry>
<term>Search Evaluation</term>
<listitem>
- <para>by execution of search requests expressed in PQF/RPN
+ <para>by execution of search requests expressed in &pqf;/&rpn;
data structures, which are handed over from
- the YAZ server frontend API. Search evaluation includes
+ the &yaz; server frontend &api;. Search evaluation includes
construction of hit lists according to boolean combinations
of simpler searches. Fast performance is achieved by careful
use of index structures, and by evaluation specific index hit
<term>Record Presentation</term>
<listitem>
<para>returns - possibly ranked - result sets, hit
- numbers, and the like internal data to the YAZ server backend API
+ numbers, and similar internal data to the &yaz; server backend &api;
for shipping to the client. Each individual filter module
implements its own specific presentation formats.
</para>
<section id="componentsearcher">
<title>&zebra; Searcher/Retriever</title>
<para>
- This is the executable which runs the Z39.50/SRU/SRW server and
+ This is the executable which runs the &z3950;/&sru;/&srw; server and
glues together the core libraries and the filter modules to one
great Information Retrieval server application.
</para>
</section>
<section id="componentyazserver">
- <title>YAZ Server Frontend</title>
+ <title>&yaz; Server Frontend</title>
<para>
- The YAZ server frontend is
- a full fledged stateful Z39.50 server taking client
+ The &yaz; server frontend is
+ a full-fledged stateful &z3950; server taking client
connections, and forwarding search and scan requests to the
&zebra; core indexer.
</para>
<para>
- In addition to Z39.50 requests, the YAZ server frontend acts
+ In addition to &z3950; requests, the &yaz; server frontend acts
as an HTTP server, honoring
- <ulink url="&url.srw;">SRU SOAP</ulink>
+ <ulink url="&url.srw;">&sru; &soap;</ulink>
requests, and
- <ulink url="&url.sru;">SRU REST</ulink>
+ <ulink url="&url.sru;">&sru; &rest;</ulink>
requests. Moreover, it can
translate incoming
- <ulink url="&url.cql;">CQL</ulink>
+ <ulink url="&url.cql;">&cql;</ulink>
queries to
- <ulink url="&url.yaz.pqf;">PQF</ulink>
+ <ulink url="&url.yaz.pqf;">&pqf;</ulink>
queries, if
correctly configured.
</para>
<para>
- <ulink url="&url.yaz;">YAZ</ulink>
+ <ulink url="&url.yaz;">&yaz;</ulink>
is an Open Source
toolkit that allows you to develop software using the
- ANSI Z39.50/ISO23950 standard for information retrieval.
+ &ansi; &z3950;/ISO23950 standard for information retrieval.
It is packaged in the Debian packages
<literal>yaz</literal> and <literal>libyaz</literal>.
</para>
<section id="componentmodulesalvis">
- <title>ALVIS XML Record Model and Filter Module</title>
+ <title>ALVIS &xml; Record Model and Filter Module</title>
<para>
- The Alvis filter for XML files is an XSLT based input
+ The Alvis filter for &xml; files is an &xslt; based input
filter.
- It indexes element and attribute content of any thinkable XML format
- using full XPATH support, a feature which the standard &zebra;
- GRS SGML and XML filters lacked. The indexed documents are
- parsed into a standard XML DOM tree, which restricts record size
+ It indexes element and attribute content of any conceivable &xml; format
+ using full &xpath; support, a feature which the standard &zebra;
+ &grs1; &sgml; and &xml; filters lacked. The indexed documents are
+ parsed into a standard &xml; &dom; tree, which restricts record size
according to availability of memory.
</para>
<para>
The Alvis filter
- uses XSLT display stylesheets, which let
+ uses &xslt; display stylesheets, which let
the &zebra; DB administrator associate multiple, different views on
- the same XML document type. These views are chosen on-the-fly in
+ the same &xml; document type. These views are chosen on-the-fly at
search time.
</para>
<para>
In addition, the Alvis filter configuration is not bound to the
- arcane BIB-1 Z39.50 library catalogue indexing traditions and
+ arcane &bib1; &z3950; library catalogue indexing traditions and
folklore, and is therefore easier to understand.
</para>
<para>
their Pagerank algorithm.
</para>
<para>
- Details on the experimental Alvis XSLT filter are found in
+ Details on the experimental Alvis &xslt; filter are found in
<xref linkend="record-model-alvisxslt"/>.
</para>
<para>
</section>
<section id="componentmodulesgrs">
- <title>GRS Record Model and Filter Modules</title>
+ <title>&grs1; Record Model and Filter Modules</title>
<para>
- The GRS filter modules described in
+ The &grs1; filter modules described in
<xref linkend="grs"/>
- are all based on the Z39.50 specifications, and it is absolutely
- mandatory to have the reference pages on BIB-1 attribute sets on
- you hand when configuring GRS filters. The GRS filters come in
+ are all based on the &z3950; specifications, and it is absolutely
+ mandatory to have the reference pages on &bib1; attribute sets on
+ your hand when configuring &grs1; filters. The &grs1; filters come in
different flavors, and a short introduction is needed here.
- GRS filters of various kind have also been called ABS filters due
+ &grs1; filters of various kinds have also been called ABS filters due
to the <filename>*.abs</filename> configuration file suffix.
</para>
<para>
The <emphasis>grs.marc</emphasis> and
<emphasis>grs.marcxml</emphasis> filters are suited to parse and
- index binary and XML versions of traditional library MARC records
+ index binary and &xml; versions of traditional library &marc; records
based on the ISO2709 standard. The Debian package for both
filters is
<literal>libidzebra-2.0-mod-grs-marc</literal>.
</para>
<para>
- GRS TCL scriptable filters for extensive user configuration come
+ &grs1; TCL scriptable filters for extensive user configuration come
in two flavors: a regular expression filter
<emphasis>grs.regx</emphasis> using TCL regular expressions, and
a general scriptable TCL filter called
<literal>libidzebra-2.0-mod-grs-regx</literal> Debian package.
</para>
<para>
- A general purpose SGML filter is called
+ A general purpose &sgml; filter is called
<emphasis>grs.sgml</emphasis>. This filter is not yet packaged,
but planned to be in the
<literal>libidzebra-2.0-mod-grs-sgml</literal> Debian package.
<literal>libidzebra-2.0-mod-grs-xml</literal> includes the
<emphasis>grs.xml</emphasis> filter which uses <ulink
url="&url.expat;">Expat</ulink> to
- parse records in XML and turn them into ID&zebra;'s internal GRS node
- trees. Have also a look at the Alvis XML/XSLT filter described in
+ parse records in &xml; and turn them into ID&zebra;'s internal &grs1; node
+ trees. Also have a look at the Alvis &xml;/&xslt; filter described in
the next section.
</para>
</section>
<para>
When records are accessed by the system, they are represented
- in their local, or native format. This might be SGML or HTML files,
- News or Mail archives, MARC records. If the system doesn't already
+ in their local, or native format. This might be &sgml; or HTML files,
+ News or Mail archives, or &marc; records. If the system doesn't already
know how to read the type of data you need to store, you can set up an
input filter by preparing conversion rules based on regular
expressions, possibly augmented by a flexible scripting language
<para>
Before transmitting records to the client, they are first
converted from the internal structure to a form suitable for exchange
- over the network - according to the Z39.50 standard.
+ over the network - according to the &z3950; standard.
</para>
</listitem>
In particular, the regular record filters are not invoked when
these are in use.
This can in some cases make the retrieval faster than regular
- retrieval operations (for MARC, XML etc).
+ retrieval operations (for &marc;, &xml; etc).
</para>
<table id="special-retrieval-types">
<title>Special Retrieval Elements</title>
<row>
<entry><literal>zebra::meta::sysno</literal></entry>
<entry>Get &zebra; record system ID</entry>
- <entry>XML and SUTRS</entry>
+ <entry>&xml; and &sutrs;</entry>
</row>
<row>
<entry><literal>zebra::data</literal></entry>
<row>
<entry><literal>zebra::meta</literal></entry>
<entry>Get &zebra; record internal metadata</entry>
- <entry>XML and SUTRS</entry>
+ <entry>&xml; and &sutrs;</entry>
</row>
<row>
<entry><literal>zebra::index</literal></entry>
<entry>Get all indexed keys for record</entry>
- <entry>XML and SUTRS</entry>
+ <entry>&xml; and &sutrs;</entry>
</row>
<row>
<entry>
<entry>
Get indexed keys for field <replaceable>f</replaceable> for record
</entry>
- <entry>XML and SUTRS</entry>
+ <entry>&xml; and &sutrs;</entry>
</row>
<row>
<entry>
Get indexed keys for field <replaceable>f</replaceable>
and type <replaceable>t</replaceable> for record
</entry>
- <entry>XML and SUTRS</entry>
+ <entry>&xml; and &sutrs;</entry>
</row>
</tbody>
</tgroup>
Z> elements zebra::meta::sysno
Z> s 1+1
</screen>
- displays in <literal>XML</literal> record syntax only internal
+ displays in <literal>&xml;</literal> record syntax only the internal
record system number, whereas
<screen>
Z> f @attr 1=title my
Z> s 1+1
</screen>
will display all indexed tokens from all indexed fields of the
- first record, and it will display in <literal>SUTRS</literal>
+ first record, and it will display in <literal>&sutrs;</literal>
record syntax, whereas
<screen>
Z> f @attr 1=title my
Z> elements zebra::index::title:p
Z> s 1+1
</screen>
- displays in <literal>XML</literal> record syntax only the content
+ displays in <literal>&xml;</literal> record syntax only the content
of the zebra string index <literal>title</literal>, or
even only the type <literal>p</literal> phrase indexed part of it.
</para>
<note>
<para>
- Trying to access numeric <literal>Bib-1</literal> use
+ Trying to access numeric <literal>&bib1;</literal> use
attributes or trying to access non-existent &zebra; internal string
access points will result in Diagnostic 25: Specified element set
name not valid for specified database.
<chapter id="examples">
- <!-- $Id: examples.xml,v 1.25 2007-02-02 09:58:39 marc Exp $ -->
+ <!-- $Id: examples.xml,v 1.26 2007-02-02 11:10:08 marc Exp $ -->
<title>Example Configurations</title>
<sect1 id="examples-overview">
</sect1>
<sect1 id="example1">
- <title>Example 1: XML Indexing And Searching</title>
+ <title>Example 1: &xml; Indexing And Searching</title>
<para>
This example shows how &zebra; can be used with absolutely minimal
configuration to index a body of
- <ulink url="&url.xml;">XML</ulink>
+ <ulink url="&url.xml;">&xml;</ulink>
documents, and search them using
<ulink url="&url.xpath;">XPath</ulink>
expressions to specify access points.
records are generated from the family tree in the file
<literal>dino.tree</literal>.)
Type <literal>make records/dino.xml</literal>
- to make the XML data file.
- (Or you could just type <literal>make dino</literal> to build the XML
+ to make the &xml; data file.
+ (Or you could just type <literal>make dino</literal> to build the &xml;
data file, create the database and populate it with the taxonomic
records all in one shot - but then you wouldn't learn anything,
would you? :-)
</para>
<para>
- Now we need to create a &zebra; database to hold and index the XML
+ Now we need to create a &zebra; database to hold and index the &xml;
records. We do this with the
&zebra; indexer, <command>zebraidx</command>, which is
driven by the <literal>zebra.cfg</literal> configuration file.
</para>
<para>
That's all you need for a minimal &zebra; configuration. Now you can
- roll the XML records into the database and build the indexes:
+ roll the &xml; records into the database and build the indexes:
<screen>
zebraidx update records
</screen>
<xref linkend="zebrasrv"/>.
</para>
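<para>
  For instance, to start the server on port 9999 (the address form
  used by the client session shown next):
  <screen>
zebrasrv @:9999
  </screen>
</para>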
<para>
- Now you can use the Z39.50 client program of your choice to execute
- XPath-based boolean queries and fetch the XML records that satisfy
+ Now you can use the &z3950; client program of your choice to execute
+ XPath-based boolean queries and fetch the &xml; records that satisfy
them:
<screen>
$ yaz-client @:9999
<para>
How, then, can we build broadcasting Information Retrieval
applications that look for records in many different databases?
- The Z39.50 protocol offers a powerful and general solution to this:
- abstract ``access points''. In the Z39.50 model, an access point
+ The &z3950; protocol offers a powerful and general solution to this:
+ abstract ``access points''. In the &z3950; model, an access point
is simply a point at which searches can be directed. Nothing is
said about implementation: in a given database, an access point
might be implemented as an index, a path into physical records, an
</para>
<para>
For convenience, access points are gathered into <firstterm>attribute
- sets</firstterm>. For example, the BIB-1 attribute set is supposed to
+ sets</firstterm>. For example, the &bib1; attribute set is supposed to
contain bibliographic access points such as author, title, subject
and ISBN; the GEO attribute set contains access points pertaining
to geospatial information (bounding coordinates, stratum, latitude
(provenance, inscriptions, etc.)
</para>
<para>
- In practice, the BIB-1 attribute set has tended to be a dumping
+ In practice, the &bib1; attribute set has tended to be a dumping
ground for all sorts of access points, so that, for example, it
includes some geospatial access points as well as strictly
bibliographic ones. Nevertheless, this model
records in databases.
</para>
<para>
- In the BIB-1 attribute set, a taxon name is probably best
+ In the &bib1; attribute set, a taxon name is probably best
interpreted as a title - that is, a phrase that identifies the item
- in question. BIB-1 represents title searches by
+ in question. &bib1; represents title searches by
access point 4. (See
- <ulink url="&url.z39.50.bib1.semantics;">The BIB-1 Attribute
+ <ulink url="&url.z39.50.bib1.semantics;">The &bib1; Attribute
Set Semantics</ulink>)
So we need to configure our dinosaur database so that searches for
- BIB-1 access point 4 look in the
+ &bib1; access point 4 look in the
<literal><termName></literal> element,
inside the top-level
<literal><Zthes></literal> element.
</para>
<para>
This is a two-step process. First, we need to tell &zebra; that we
- want to support the BIB-1 attribute set. Then we need to tell it
+ want to support the &bib1; attribute set. Then we need to tell it
which elements of its record pertain to access point 4.
</para>
<para>
</callout>
<callout arearefs="attset.attset">
<para>
- Declare Bib-1 attribute set. See <filename>bib1.att</filename> in
+ Declare &bib1; attribute set. See <filename>bib1.att</filename> in
&zebra;'s <filename>tab</filename> directory.
</para>
</callout>
<callout arearefs="termName">
<para>
Make <literal>termName</literal> word searchable by both
- Zthes attribute termName (1002) and Bib-1 atttribute title (4).
+ Zthes attribute termName (1002) and &bib1; attribute title (4).
</para>
</callout>
</calloutlist>
</programlistingco>
<para>
- After re-indexing, we can search the database using Bib-1
+ After re-indexing, we can search the database using &bib1;
attribute title, as follows:
<screen>
Z> form xml
Z> s
Sent presentRequest (1+1).
Records: 1
-[Default]Record type: XML
+[Default]Record type: &xml;
<Zthes>
<termId>2</termId>
<termName>Eoraptor</termName>
-<!-- $Id: installation.xml,v 1.34 2007-02-02 09:58:39 marc Exp $ -->
+<!-- $Id: installation.xml,v 1.35 2007-02-02 11:10:08 marc Exp $ -->
<chapter id="installation">
<title>Installation</title>
<para>
- &zebra; is written in ANSI C and was implemented with portability in mind.
+ &zebra; is written in &ansi; C and was implemented with portability in mind.
We primarily use <ulink url="&url.gcc;">GCC</ulink> on UNIX and
<ulink url="&url.vstudio;">Microsoft Visual C++</ulink> on Windows.
</para>
(required)</term>
<listitem>
<para>
- &zebra; uses YAZ to support <ulink url="&url.z39.50;">Z39.50</ulink> /
- <ulink url="&url.sru;">SRU</ulink>.
- Also the memory management utilites from YAZ is used by &zebra;.
+ &zebra; uses &yaz; to support <ulink url="&url.z39.50;">&z3950;</ulink> /
+ <ulink url="&url.sru;">&sru;</ulink>.
+ &zebra; also uses the memory management utilities from &yaz;.
</para>
</listitem>
</varlistentry>
(optional)</term>
<listitem>
<para>
- XML parser. If you're going to index real XML you should
+ &xml; parser. If you're going to index real &xml; you should
install this (filter grs.xml). On most systems you should be able
to find binary Expat packages.
</para>
<para>
On Unix, GCC works fine, but any native
C compiler should work as long as it is
- ANSI C compliant.
+ &ansi; C compliant.
</para>
<para>
<term><literal>zebrasrv</literal></term>
<listitem>
<para>
- The Z39.50 server and search engine.
+ The &z3950; server and search engine.
</para>
</listitem>
</varlistentry>
<para>
The <literal>.so</literal>-files are &zebra; record filter modules.
There are modules for reading
- MARC (<filename>mod-grs-marc.so</filename>),
- XML (<filename>mod-grs-xml.so</filename>) , etc.
+ &marc; (<filename>mod-grs-marc.so</filename>),
+ &xml; (<filename>mod-grs-xml.so</filename>) , etc.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><literal>YAZDIR</literal></term>
<listitem><para>
- Directory of YAZ source. &zebra;'s makefile expects to find
+ Directory of &yaz; source. &zebra;'s makefile expects to find
<filename>yaz.lib</filename>, <filename>yaz.dll</filename>
in <replaceable>yazdir</replaceable><literal>/lib</literal> and
<replaceable>yazdir</replaceable><literal>/bin</literal> respectively.
<para>
The <literal>DEBUG</literal> setting in the makefile for &zebra; must
be set to the same value as <literal>DEBUG</literal> setting in the
- makefile for YAZ.
+ makefile for &yaz;.
If not, the &zebra; server/indexer will crash.
</para>
</warning>
redirection to other fields.
For example, the following snippet of
a custom <filename>custom/bib1.att</filename>
- Bib-1 attribute set definition file is no
+ &bib1; attribute set definition file is no
longer supported:
<screen>
att 1016 Any 1016,4,1005,62
</para>
<para>
Similar behaviour can be expressed in the new release by defining
- a new index <literal>Any:w</literal> in all GRS
+ a new index <literal>Any:w</literal> in all &grs1;
<filename>*.abs</filename> record indexing configuration files.
The above example configuration needs to make the changes
from version 1.3.x indexing instructions
<screen>
att 1016 Body-of-text
</screen>
- with equivalent outcome without editing all GRS
+ with equivalent outcome without editing all &grs1;
<filename>*.abs</filename> record indexing configuration files.
</para>
<para>
Server installations which use the special
- <literal>IDXPATH</literal> attribute set must add the following
+ <literal>&idxpath;</literal> attribute set must add the following
line to the <filename>zebra.cfg</filename> configuration file:
<screen>
attset: idxpath.att
<chapter id="introduction">
- <!-- $Id: introduction.xml,v 1.42 2007-02-02 09:58:39 marc Exp $ -->
+ <!-- $Id: introduction.xml,v 1.43 2007-02-02 11:10:08 marc Exp $ -->
<title>Introduction</title>
<section id="overview">
<para>
&zebra; is a free, fast, friendly information management system. It can
- index records in XML/SGML, MARC, e-mail archives and many other
+ index records in &xml;/&sgml;, &marc;, e-mail archives and many other
formats, and quickly find them using a combination of boolean
searching and relevance ranking. Search-and-retrieve applications can
- be written using APIs in a wide variety of languages, communicating
+ be written using &api;s in a wide variety of languages, communicating
with the &zebra; server using industry-standard information-retrieval
protocols or web services.
</para>
&zebra; is a networked component which acts as a reliable &z3950; server
for both record/document search, presentation, insert, update and
delete operations. In addition, it understands the &sru; family of
- webservices, which exist in REST GET/POST and truly SOAP flavors.
+ web services, which exist in &rest; &get;/&post; and true &soap; flavors.
</para>
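As an illustration of the &rest; &get; flavor, an &sru; searchRetrieve request is nothing more than a URL carrying the standard protocol parameters. The host, port, and database name below are hypothetical placeholders; the parameter names come from the &sru; specification, not from &zebra;-specific configuration.

```python
from urllib.parse import urlencode

# Hypothetical Zebra SRU endpoint; host, port, and database are examples only.
base = "http://localhost:9999/mydb"

# Standard SRU 1.1 searchRetrieve parameters.
params = {
    "version": "1.1",
    "operation": "searchRetrieve",
    "query": "dc.title=beethoven",
    "maximumRecords": "5",
}

url = base + "?" + urlencode(params)
print(url)
```

The &soap; flavor wraps the same operation in a SOAP envelope instead of URL parameters.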
<para>
&zebra; is available as MS Windows 2003 Server (32 bit) self-extracting
<ulink url="http://indexdata.dk/zebra/">&zebra;</ulink>
is a high-performance, general-purpose structured text
indexing and retrieval engine. It reads records in a
- variety of input formats (eg. email, XML, MARC) and provides access
+ variety of input formats (e.g. email, &xml;, &marc;) and provides access
to them through a powerful combination of boolean search
expressions and relevance-ranked free-text queries.
</para>
&zebra; supports large databases (tens of millions of records,
tens of gigabytes of data). It allows safe, incremental
database updates on live systems. Because &zebra; supports
- the industry-standard information retrieval protocol, Z39.50,
+ the industry-standard information retrieval protocol, &z3950;,
you can search &zebra; databases using an enormous variety of
programs and toolkits, both commercial and free, which understand
this protocol. Application libraries are available to allow
bespoke clients to be written in Perl, C, C++, Java, Tcl, Visual
- Basic, Python, PHP and more - see the
- <ulink url="&url.zoom;">ZOOM web site</ulink>
+ Basic, Python, &php; and more - see the
+ <ulink url="&url.zoom;">&zoom; web site</ulink>
for more information on some of these client toolkits.
</para>
<tbody>
<row>
<entry>Boolean query language</entry>
- <entry>CQL and RPN/PQF</entry>
- <entry>The type-1 Reverse Polish Notation (RPN)
- and it's textual representation Prefix Query Format (PQF) are
- supported. The Common Query Language (CQL) can be configured as
- a mapping from CQL to RPN/PQF</entry>
+ <entry>&cql; and &rpn;/&pqf;</entry>
+ <entry>The type-1 Reverse Polish Notation (&rpn;)
+ and its textual representation Prefix Query Format (&pqf;) are
+ supported. The Common Query Language (&cql;) can be configured as
+ a mapping from &cql; to &rpn;/&pqf;</entry>
<entry><xref linkend="querymodel-query-languages-pqf"/>
<xref linkend="querymodel-cql-to-pqf"/></entry>
</row>
<row>
<entry>Operation types</entry>
- <entry> Z39.50/SRU explain, search, and scan</entry>
+ <entry> &z3950;/&sru; explain, search, and scan</entry>
<entry></entry>
<entry><xref linkend="querymodel-operation-types"/></entry>
</row>
<row>
<entry>Recursive boolean query tree</entry>
- <entry>CQL and RPN/PQF</entry>
- <entry>Both CQL and RPN/PQF allow atomic query parts (APT) to
+ <entry>&cql; and &rpn;/&pqf;</entry>
+ <entry>Both &cql; and &rpn;/&pqf; allow atomic query parts (&apt;) to
be combined into complex boolean query trees</entry>
<entry><xref linkend="querymodel-rpn-tree"/></entry>
</row>
</row>
<row>
<entry>Complex semi-structured Documents</entry>
- <entry>XML and GRS-1 Documents</entry>
- <entry>Both XML and GRS-1 documents exhibit a DOM like internal
+ <entry>&xml; and &grs1; Documents</entry>
+ <entry>Both &xml; and &grs1; documents exhibit a &dom;-like internal
representation allowing for complex indexing and display rules</entry>
<entry><xref linkend=""/></entry>
</row>
</row>
<row>
<entry>Input document formats</entry>
- <entry>XML, SGML, Text, ISO2709 (MARC)</entry>
+ <entry>&xml;, &sgml;, Text, ISO2709 (&marc;)</entry>
<entry>
A system of input filters driven by
regular expressions allows most ASCII-based
data formats to be easily processed.
- SGML, XML, ISO2709 (MARC), and raw text are also
+ &sgml;, &xml;, ISO2709 (&marc;), and raw text are also
supported.</entry>
<entry><xref linkend=""/></entry>
</row>
</row>
<row>
<entry>Remote update</entry>
- <entry>Z39.50 extended services</entry>
+ <entry>&z3950; extended services</entry>
<entry></entry>
<entry><xref linkend=""/></entry>
</row>
<entry><xref linkend=""/></entry>
</row>
<row>
- <entry>Z39.50</entry>
- <entry>Z39.50 protocol support</entry>
+ <entry>&z3950;</entry>
+ <entry>&z3950; protocol support</entry>
<entry> Protocol facilities: Init, Search, Present (retrieval),
Segmentation (support for very large records), Delete, Scan
(index browsing), Sort, Close and support for the ``update''
- Extended Service to add or replace an existing XML
+ Extended Service to add or replace an existing &xml;
record. Piggy-backed presents are honored in the search
request. Named result sets are supported.</entry>
<entry><xref linkend=""/></entry>
<entry>Record Syntaxes</entry>
<entry></entry>
<entry> Multiple record syntaxes
- for data retrieval: GRS-1, SUTRS,
- XML, ISO2709 (MARC), etc. Records can be mapped between record syntaxes
+ for data retrieval: &grs1;, &sutrs;,
+ &xml;, ISO2709 (&marc;), etc. Records can be mapped between record syntaxes
and schemas on the fly.</entry>
<entry><xref linkend=""/></entry>
</row>
<row>
<entry>Web Service support</entry>
- <entry>SRU GET/POST/SOAP</entry>
+ <entry>&sru_gps;</entry>
<entry> The protocol operations <literal>explain</literal>,
<literal>searchRetrieve</literal> and <literal>scan</literal>
- are supported. <ulink url="&url.cql;">CQL</ulink> to internal
- query model RPN conversion is supported. Extended RPN queries
+ are supported. <ulink url="&url.cql;">&cql;</ulink> to internal
+ query model &rpn; conversion is supported. Extended &rpn; queries
for search/retrieve and scan are supported.</entry>
<entry><xref linkend=""/></entry>
</row>
</para>
<para>
In early 2005, the Koha project development team began looking at
- ways to improve MARC support and overcome scalability limitations
+ ways to improve &marc; support and overcome scalability limitations
in the Koha 2.x series. After extensive evaluations of the best
of the Open Source textual database engines - including MySQL
full-text searching, PostgreSQL, Lucene and Plucene - the team
and relevance-ranked free-text queries, both of which the Koha
2.x series lack. &zebra; also supports incremental and safe
database updates, which allow on-the-fly record
- management. Finally, since &zebra; has at its heart the Z39.50
+ management. Finally, since &zebra; has at its heart the &z3950;
protocol, it greatly improves Koha's support for that critical
library standard."
</para>
from virtually any computer with an Internet connection, has
template based layout allowing anyone to alter the visual
appearance of Emilda, and is
- XML based language for fast and easy portability to virtually any
+ an &xml;-based language for fast and easy portability to virtually any
language.
Currently, Emilda is used at three schools in Espoo, Finland.
</para>
<para>
- As a surplus, 100% MARC compatibility has been achieved using the
+ As a bonus, 100% &marc; compatibility has been achieved using the
&zebra; Server from Index Data as backend server.
</para>
</section>
is a netbased library service offering all
traditional functions on a very high level plus many new
services. Reindex.net is a comprehensive and powerful web system
- based on standards such as XML and Z39.50.
- updates. Reindex supports MARC21, danMARC eller Dublin Core with
+ based on standards such as &xml; and &z3950;.
+ updates. Reindex supports &marc21;, dan&marc; or Dublin Core with
UTF-8 encoding.
</para>
<para>
Reindex.net runs on GNU/Debian Linux with &zebra; and Simpleserver
from Index
Data for bibliographic data. The relational database system
- Sybase 9 XML is used for
+ Sybase 9 &xml; is used for
administrative data.
- Internally MARCXML is used for bibliographical records. Update
- utilizes Z39.50 extended services.
+ Internally &marcxml; is used for bibliographical records. Update
+ utilizes &z3950; extended services.
</para>
</section>
The &zebra; information retrieval indexing machine is used inside
the Alvis framework to
manage huge collections of natural language processed and
- enhanced XML data, coming from a topic relevant web crawl.
- In this application, &zebra; swallows and manages 37GB of XML data
+ enhanced &xml; data coming from a topic-relevant web crawl.
+ In this application, &zebra; swallows and manages 37GB of &xml; data
in about 4 hours, resulting in search times of fractions of
seconds.
</para>
<para>
The member libraries send in data files representing their
periodicals, including both brief bibliographic data and summary
- holdings. Then 21 individual Z39.50 targets are created, each
+ holdings. Then 21 individual &z3950; targets are created, each
using &zebra;, and all mounted on the single hardware server.
- The live service provides a web gateway allowing Z39.50 searching
+ The live service provides a web gateway allowing &z3950; searching
of all of the targets or a selection of them. &zebra;'s small
footprint allows a relatively modest system to comfortably host
the 21 servers.
</section>
<section id="nli">
- <title>NLI-Z39.50 - a Natural Language Interface for Libraries</title>
+ <title>NLI-&z3950; - a Natural Language Interface for Libraries</title>
<para>
Fernuniversität Hagen in Germany has developed a natural
language interface for access to library databases.
In order to evaluate this interface for recall and precision, they
chose &zebra; as the basis for retrieval effectiveness. The &zebra;
server contains a copy of the GIRT database, consisting of more
- than 76000 records in SGML format (bibliographic records from
- social science), which are mapped to MARC for presentation.
+ than 76,000 records in &sgml; format (bibliographic records from
+ social science), which are mapped to &marc; for presentation.
</para>
<para>
(GIRT is the German Indexing and Retrieval Testdatabase. It is a
<listitem>
<para>
- Improved support for XML in search and retrieval. Eventually,
+ Improved support for &xml; in search and retrieval. Eventually,
the goal is for &zebra; to pull double duty as a flexible
- information retrieval engine and high-performance XML
+ information retrieval engine and high-performance &xml;
repository. The recent addition of XPath searching is one
example of the kind of enhancement we're working on.
</para>
<para>
- There is also the experimental <literal>ALVIS XSLT</literal>
- XML input filter, which unleashes the full power of DOM based
- XSLT transformations during indexing and record retrieval. Work
+ There is also the experimental <literal>ALVIS &xslt;</literal>
+ &xml; input filter, which unleashes the full power of &dom; based
+ &xslt; transformations during indexing and record retrieval. Work
on this filter has been sponsored by the ALVIS EU project
<ulink url="http://www.alvis.info/alvis/"/>. We expect this filter to
mature soon, as it is planned to be included in the version 2.0
<listitem>
<para>
Finalisation and documentation of &zebra;'s C programming
- API, allowing updates, database management and other functions
- not readily expressed in Z39.50. We will also consider
- exposing the API through SOAP.
+ &api;, allowing updates, database management and other functions
+ not readily expressed in &z3950;. We will also consider
+ exposing the &api; through &soap;.
</para>
</listitem>
<?xml version="1.0" encoding="iso-8859-1" standalone="no" ?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
-<!-- $Id: marc_indexing.xml,v 1.4 2007-02-02 09:58:39 marc Exp $ -->
+<!-- $Id: marc_indexing.xml,v 1.5 2007-02-02 11:10:08 marc Exp $ -->
<book id="marc_indexing">
<bookinfo>
- <title>Indexing of MARC records by &zebra;</title>
+ <title>Indexing of &marc; records by &zebra;</title>
<abstract>
- <simpara>&zebra; is suitable for distribution of MARC records via Z39.50. We
- have a several possibilities to describe the indexing process of MARC records.
+ <simpara>&zebra; is suitable for distribution of &marc; records via &z3950;. We
+ have several possibilities to describe the indexing process of &marc; records.
This document shows these possibilities.
</simpara>
</abstract>
</bookinfo>
<chapter id="simple">
- <title>Simple indexing of MARC records</title>
+ <title>Simple indexing of &marc; records</title>
<para>Simple indexing is not described yet.</para>
</chapter>
<chapter id="extended">
- <title>Extended indexing of MARC records</title>
+ <title>Extended indexing of &marc; records</title>
-<para>Extended indexing of MARC records will help you if you need index a
+<para>Extended indexing of &marc; records will help you if you need to index a
combination of subfields, or index only a part of the whole field,
-or use during indexing process embedded fields of MARC record.
+or use embedded fields of the &marc; record during the indexing process.
</para>
-<para>Extended indexing of MARC records additionally allows:
+<para>Extended indexing of &marc; records additionally allows:
<itemizedlist>
<listitem>
-<para>to index data in LEADER of MARC record</para>
+<para>to index data in the LEADER of the &marc; record</para>
</listitem>
<listitem>
</listitem>
<listitem>
-<para>to index linked fields for UNIMARC based formats</para>
+<para>to index linked fields for UNI&marc;-based formats</para>
</listitem>
</itemizedlist>
</para>
<note><para>Compared with the simple indexing process, the extended indexing
-may increase (about 2-3 times) the time of indexing process for MARC
+may increase the indexing time (by about 2-3 times) for &marc;
records.</para></note>
<sect1 id="formula">
<title>The index-formula</title>
<para>At the beginning, we have to define the term <emphasis>index-formula</emphasis>
-for MARC records. This term helps to understand the notation of extended indexing of MARC records
+for &marc; records. This term helps to understand the notation of extended indexing of &marc; records
by &zebra;. Our definition is based on the document <ulink url="http://www.rba.ru/rusmarc/soft/Z39-50.htm">"The
-table of conformity for Z39.50 use attributes and RUSMARC fields"</ulink>.
+table of conformity for &z3950; use attributes and R&usmarc; fields"</ulink>.
The document is available only in Russian.</para>
<para>The <emphasis>index-formula</emphasis> is the combination of subfields presented in the following way:</para>
71-00$a, $g, $h ($c){.$b ($c)} , (1)
</screen>
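The combination can be pictured with a small sketch (the data and helper below are hypothetical illustrations, not part of &zebra;): given a field's subfields, the index value is the concatenation of the subfields named in the formula, in the formula's order, skipping any subfield the record happens to lack.

```python
def index_value(field, subfields):
    """Concatenate the subfields named in an index-formula, in order,
    skipping any that the record does not contain."""
    return " ".join(field[code] for code in subfields if code in field)

# Hypothetical 71-00 field with subfields $a, $g, $h.
field = {"a": "Mozart", "g": "Wolfgang Amadeus", "h": "composer"}
print(index_value(field, ("a", "g", "h")))
# prints: Mozart Wolfgang Amadeus composer
```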
-<para>We know that &zebra; supports a Bib-1 attribute - right truncation.
+<para>We know that &zebra; supports a &bib1; attribute - right truncation.
In this case, the <emphasis>index-formula</emphasis> (1) consists of
forms, defined in the same way as (1)</para>
71-00$a
</screen>
-<note><para>The original MARC record may be without some elements, which included in <emphasis>index-formula</emphasis>.</para>
+<note><para>The original &marc; record may lack some of the elements included in the <emphasis>index-formula</emphasis>.</para>
</note>
<para>This notation includes the following operands:
<varlistentry>
<term>-</term>
- <listitem><para>The position may contain any value, defined by MARC format.
+ <listitem><para>The position may contain any value defined by the &marc; format.
For example, <emphasis>index-formula</emphasis></para>
<screen>
</varlistentry>
</variablelist>
-<note><para>All another operands are the same as accepted in MARC world.</para>
+<note><para>All other operands are the same as those accepted in the &marc; world.</para>
</note>
</para>
</sect1>
(<literal>.abs</literal> file). It means that names beginning with
<literal>"mc-"</literal> are interpreted by &zebra; as
<emphasis>index-formula</emphasis>. The database index is created and
-linked with <emphasis>access point</emphasis> (Bib-1 use attribute)
+linked with <emphasis>access point</emphasis> (&bib1; use attribute)
according to this formula.</para>
<para>For example, <emphasis>index-formula</emphasis></para>
<varlistentry>
<term>.</term>
-<listitem><para>The position may contain any value, defined by MARC format. For example,
+<listitem><para>The position may contain any value defined by the &marc; format. For example,
<emphasis>index-formula</emphasis></para>
<screen>
</para>
<note>
-<para>All another operands are the same as accepted in MARC world.</para>
+<para>All other operands are the same as those accepted in the &marc; world.</para>
</note>
<sect2>
elm mc-008[0-5] Date/time-added-to-db !
</screen>
-<para>or for RUSMARC (this data included in 100th field)</para>
+<para>or for R&usmarc; (this data is included in field 100)</para>
<screen>
elm mc-100___$a[0-7]_ Date/time-added-to-db !
<para>using indicators while indexing</para>
-<para>For RUSMARC <emphasis>index-formula</emphasis>
+<para>For R&usmarc; <emphasis>index-formula</emphasis>
<literal>70-#1$a, $g</literal> matches</para>
<screen>
<listitem>
-<para>indexing embedded (linked) fields for UNIMARC based formats</para>
+<para>indexing embedded (linked) fields for UNI&marc;-based formats</para>
-<para>For RUSMARC <emphasis>index-formula</emphasis>
+<para>For R&usmarc; <emphasis>index-formula</emphasis>
<literal>4--#-$170-#1$a, $g ($c)</literal> matches</para>
<screen>
<chapter id="querymodel">
- <!-- $Id: querymodel.xml,v 1.29 2007-02-02 09:58:39 marc Exp $ -->
+ <!-- $Id: querymodel.xml,v 1.30 2007-02-02 11:10:08 marc Exp $ -->
<title>Query Model</title>
<section id="querymodel-overview">
<para>
&zebra; was born as a networking Information Retrieval engine adhering
to the international standards
- <ulink url="&url.z39.50;">Z39.50</ulink> and
- <ulink url="&url.sru;">SRU</ulink>,
+ <ulink url="&url.z39.50;">&z3950;</ulink> and
+ <ulink url="&url.sru;">&sru;</ulink>,
and implements the
- type-1 Reverse Polish Notation (RPN) query
+ type-1 Reverse Polish Notation (&rpn;) query
model defined there.
Unfortunately, this model defines only a binary-encoded
representation, which is used as transport packaging in
- the Z39.50 protocol layer. This representation is not human
+ the &z3950; protocol layer. This representation is not human
readable, nor does it define any convenient way to specify queries.
</para>
<para>
- Since the type-1 (RPN)
+ Since the type-1 (&rpn;)
query structure has no direct, useful string
representation, every client application needs to provide some
form of mapping from a local query notation or representation to it.
<section id="querymodel-query-languages-pqf">
- <title>Prefix Query Format (PQF)</title>
+ <title>Prefix Query Format (&pqf;)</title>
<para>
Index Data has defined a textual representation in the
<ulink url="&url.yaz.pqf;">Prefix Query Format</ulink>, in short
- <emphasis>PQF</emphasis>, which maps
+ <emphasis>&pqf;</emphasis>, which maps
one-to-one to binary encoded
- <emphasis>type-1 RPN</emphasis> queries.
- PQF has been adopted by other
- parties developing Z39.50 software, and is often referred to as
+ <emphasis>type-1 &rpn;</emphasis> queries.
+ &pqf; has been adopted by other
+ parties developing &z3950; software, and is often referred to as
<emphasis>Prefix Query Notation</emphasis>, or in short
- PQN. See
+ &pqn;. See
<xref linkend="querymodel-rpn"/> for further explanations and
descriptions of &zebra;'s capabilities.
</para>
</section>
<section id="querymodel-query-languages-cql">
- <title>Common Query Language (CQL)</title>
+ <title>Common Query Language (&cql;)</title>
<para>
- The query model of the type-1 RPN,
- expressed in PQF/PQN is natively supported.
- On the other hand, the default SRU
+ The query model of the type-1 &rpn;,
+ expressed in &pqf;/&pqn;, is natively supported.
+ On the other hand, the default &sru;
web services <emphasis>Common Query Language</emphasis>
- <ulink url="&url.cql;">CQL</ulink> is not natively supported.
+ <ulink url="&url.cql;">&cql;</ulink> is not natively supported.
</para>
<para>
- &zebra; can be configured to understand and map CQL to PQF. See
+ &zebra; can be configured to understand and map &cql; to &pqf;. See
<xref linkend="querymodel-cql-to-pqf"/>.
</para>
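The idea of the mapping can be sketched roughly as follows. The index-to-use-attribute table here is an assumed example for illustration only; in a real installation the mapping is read from a &cql;-to-&pqf; configuration file, not hard-coded.

```python
# Assumed example mapping from CQL index names to Bib-1 use attributes
# (4 = Title, 1003 = Author); illustrative, not Zebra's shipped config.
USE_ATTR = {"title": 4, "author": 1003}

def apt(index, term):
    """Render one CQL index/term clause as a PQF attributes-plus-term query."""
    return f"@attr 1={USE_ATTR[index]} {term}"

def cql_and(left, right):
    """CQL 'and' becomes the prefix boolean operator @and."""
    return f"@and {left} {right}"

# CQL: title=beethoven and author=solomon
print(cql_and(apt("title", "beethoven"), apt("author", "solomon")))
# prints: @and @attr 1=4 beethoven @attr 1=1003 solomon
```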
</section>
<title>Operation types</title>
<para>
&zebra; supports all of the three different
- Z39.50/SRU operations defined in the
+ &z3950;/&sru; operations defined in the
standards: explain, search,
and scan. A short description of the
functionality and purpose of each is quite in order here.
<section id="querymodel-operation-type-explain">
<title>Explain Operation</title>
<para>
- The <emphasis>syntax</emphasis> of Z39.50/SRU queries is
+ The <emphasis>syntax</emphasis> of &z3950;/&sru; queries is
well known to any client, but the specific
<emphasis>semantics</emphasis> - taking into account a
particular server's functionalities and abilities - must be
of the general query model are supported.
</para>
<para>
- The Z39.50 embeds the explain operation
+ The &z3950; protocol embeds the explain operation
by performing a
search in the magic
<literal>IR-Explain-1</literal> database;
see <xref linkend="querymodel-exp1"/>.
</para>
<para>
- In SRU, explain is an entirely separate
- operation, which returns an ZeeRex XML record according to the
+ In &sru;, explain is an entirely separate
+ operation, which returns a ZeeRex &xml; record according to the
structure defined by the protocol.
</para>
<para>
simple free text searches to nested complex boolean queries,
targeting specific indexes, and possibly enhanced with many
query semantic specifications. Search interactions are the heart
- and soul of Z39.50/SRU servers.
+ and soul of &z3950;/&sru; servers.
</para>
</section>
<section id="querymodel-rpn">
- <title>RPN queries and semantics</title>
+ <title>&rpn; queries and semantics</title>
<para>
- The <ulink url="&url.yaz.pqf;">PQF grammar</ulink>
- is documented in the YAZ manual, and shall not be
- repeated here. This textual PQF representation
+ The <ulink url="&url.yaz.pqf;">&pqf; grammar</ulink>
+ is documented in the &yaz; manual, and shall not be
+ repeated here. This textual &pqf; representation
is not transmitted to &zebra; during search, but it is in the
- client mapped to the equivalent Z39.50 binary
+ client mapped to the equivalent &z3950; binary
query parse tree.
</para>
<section id="querymodel-rpn-tree">
- <title>RPN tree structure</title>
+ <title>&rpn; tree structure</title>
<para>
- The RPN parse tree - or the equivalent textual representation in PQF -
+ The &rpn; parse tree - or the equivalent textual representation in &pqf; -
may start with one specification of the
<emphasis>attribute set</emphasis> used. Following is a query
tree, which
- consists of <emphasis>atomic query parts (APT)</emphasis> or
+ consists of <emphasis>atomic query parts (&apt;)</emphasis> or
<emphasis>named result sets</emphasis>, possibly
paired by <emphasis>boolean binary operators</emphasis>, and
finally <emphasis>recursively combined </emphasis> into
<thead>
<row>
<entry>Attribute set</entry>
- <entry>PQF notation (Short hand)</entry>
+ <entry>&pqf; notation (shorthand)</entry>
<entry>Status</entry>
<entry>Notes</entry>
</row>
<entry>predefined</entry>
</row>
<row>
- <entry>Bib-1</entry>
+ <entry>&bib1;</entry>
<entry><literal>bib-1</literal></entry>
- <entry>Standard PQF query language attribute set which defines the
- semantics of Z39.50 searching. In addition, all of the
+ <entry>Standard &pqf; query language attribute set which defines the
+ semantics of &z3950; searching. In addition, all of the
non-use attributes (types 2-12) define the hard-wired
&zebra; internal query
processing.</entry>
<row>
<entry>GILS</entry>
<entry><literal>gils</literal></entry>
- <entry>Extension to the Bib-1 attribute set.</entry>
+ <entry>Extension to the &bib1; attribute set.</entry>
<entry>predefined</entry>
</row>
<!--
<row>
- <entry>IDXPATH</entry>
+ <entry>&idxpath;</entry>
<entry><literal>idxpath</literal></entry>
- <entry>Hardwired XPATH like attribute set, only available for
- indexing with the GRS record model</entry>
+ <entry>Hardwired &xpath;-like attribute set, only available for
+ indexing with the &grs1; record model</entry>
<entry>deprecated</entry>
</row>
-->
<note>
<para>
The &zebra; internal query processing is modeled after
- the Bib-1 attribute set, and the non-use
+ the &bib1; attribute set, and the non-use
attributes type 2-6 are hard-wired in. It is therefore essential
to be familiar with <xref linkend="querymodel-bib1-nonuse"/>.
</para>
<emphasis>retrieval</emphasis>, taking proximity into account:
The hit set is a subset of the corresponding
AND query
- (see the <ulink url="&url.yaz.pqf;">PQF grammar</ulink> for
+ (see the <ulink url="&url.yaz.pqf;">&pqf; grammar</ulink> for
details on the proximity operator):
<screen>
Z> find @prox 0 3 0 2 k 2 information retrieval
<section id="querymodel-atomic-queries">
- <title>Atomic queries (APT)</title>
+ <title>Atomic queries (&apt;)</title>
<para>
Atomic queries are the query parts which work on one access point
only. These consist of <emphasis>an attribute list</emphasis>
followed by a <emphasis>single term</emphasis> or a
<emphasis>quoted term list</emphasis>, and are often called
- <emphasis>Attributes-Plus-Terms (APT)</emphasis> queries.
+ <emphasis>Attributes-Plus-Terms (&apt;)</emphasis> queries.
</para>
<para>
- Atomic (APT) queries are always leaf nodes in the PQF query tree.
+ Atomic (&apt;) queries are always leaf nodes in the &pqf; query tree.
Un-supplied non-use attribute types 2-12 are either inherited from
higher nodes in the query tree, or are set to &zebra;'s default values.
See <xref linkend="querymodel-bib1"/> for details.
</para>
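The precedence rule can be sketched as follows; the numeric default values below are purely illustrative placeholders, not &zebra;'s actual defaults.

```python
# Illustrative placeholder defaults for non-use attribute types;
# these numbers are NOT Zebra's real defaults.
DEFAULTS = {2: 3, 3: 3, 4: 1, 5: 100, 6: 1}

def effective_attrs(inherited, explicit):
    """Explicitly supplied attributes win over inherited ones,
    which in turn win over the engine defaults."""
    attrs = dict(DEFAULTS)
    attrs.update(inherited)
    attrs.update(explicit)
    return attrs

print(effective_attrs({5: 1}, {2: 102}))
```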
<table id="querymodel-atomic-queries-table" frame="top">
- <title>Atomic queries (APT)</title>
+ <title>Atomic queries (&apt;)</title>
<tgroup cols="3">
<thead>
<row>
<para>
The <emphasis>scan</emphasis> operation is only supported with
- atomic APT queries, as it is bound to one access point at a
+ atomic &apt; queries, as it is bound to one access point at a
time. Boolean query trees are not allowed during
<emphasis>scan</emphasis>.
</para>
<para>
Named result sets are supported in &zebra;, and result sets can be
used as operands without limitations. It follows that named
- result sets are leaf nodes in the PQF query tree, exactly as
- atomic APT queries are.
+ result sets are leaf nodes in the &pqf; query tree, exactly as
+ atomic &apt; queries are.
</para>
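Since a named result set is a leaf node just like an &apt; query, it can be combined freely with further criteria. A hypothetical sketch of building such a &pqf; string (the set name and query are examples):

```python
def refine_set(setname, apt):
    """AND a previously named result set (@set) with a new APT clause."""
    return f"@and @set {setname} {apt}"

print(refine_set("myset", "@attr 1=4 beethoven"))
# prints: @and @set myset @attr 1=4 beethoven
```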
<para>
After the execution of a search, the result set is available at
<note>
<para>
- Named result sets are only supported by the Z39.50 protocol.
- The SRU web service is stateless, and therefore the notion of
+ Named result sets are only supported by the &z3950; protocol.
+ The &sru; web service is stateless, and therefore the notion of
named result sets does not exist when accessing a &zebra; server by
- the SRU protocol.
+ the &sru; protocol.
</para>
</note>
</section>
<para>
It is possible to search
in any silly string index - if it's defined in your
- indexation rules and can be parsed by the PQF parser.
+ indexing rules and can be parsed by the &pqf; parser.
This is definitely not the recommended use of
this facility, as it might confuse your users with some very
unexpected results.
<para>
See also <xref linkend="querymodel-pqf-apt-mapping"/> for details, and
<xref linkend="zebrasrv-sru"/>
- for the SRU PQF query extension using string names as a fast
+ for the &sru; &pqf; query extension using string names as a fast
debugging facility.
</para>
</section>
<section id="querymodel-use-xpath">
<title>&zebra;'s special access point of type 'XPath'
- for GRS filters</title>
+ for &grs1; filters</title>
<para>
As we have seen above, it is possible (albeit seldom a great
idea) to emulate
be defined at indexation time, no new undefined
XPath queries can be entered at search time, and second, it might
confuse users very much that an XPath-alike index name in fact
- gets populated from a possible entirely different XML element
+ gets populated from a possibly entirely different &xml; element
than it pretends to access.
</para>
<para>
- When using the GRS Record Model
+ When using the &grs1; Record Model
(see <xref linkend="grs"/>), we have the
possibility to embed <emphasis>live</emphasis>
XPath expressions
- in the PQF queries, which are here called
+ in the &pqf; queries, which are here called
<emphasis>use (type 1)</emphasis> <emphasis>xpath</emphasis>
attributes. You must enable the
<literal>xpath enable</literal> directive in your
<para>
Only a <emphasis>very</emphasis> restricted subset of the
<ulink url="http://www.w3.org/TR/xpath">XPath 1.0</ulink>
- standard is supported as the GRS record model is simpler than
- a full XML DOM structure. See the following examples for
+ standard is supported as the &grs1; record model is simpler than
+ a full &xml; &dom; structure. See the following examples for
possibilities.
</para>
</note>
<para>
Finding all documents which have the term "content"
- inside a text node found in a specific XML DOM
+ inside a text node found in a specific &xml; &dom;
<emphasis>subtree</emphasis>, whose starting element is
addressed by XPath.
<screen>
<para>
Filtering the addressing XPath by a predicate working on exact
string values in
- attributes (in the XML sense) can be done: return all those docs which
+ attributes (in the &xml; sense) can be done: return all those docs which
have the term "english" contained in one of all text sub nodes of
the subtree defined by the XPath
<literal>/record/title[@lang='en']</literal>. And similar
</screen>
</para>
<para>
- Escaping PQF keywords and other non-parseable XPath constructs
- with <literal>'{ }'</literal> to prevent client-side PQF parsing
+ Escaping &pqf; keywords and other non-parseable XPath constructs
+ with <literal>'{ }'</literal> to prevent client-side &pqf; parsing
syntax errors:
<screen>
Z> find @attr {1=/root/first[@attr='danish']} content
<section id="querymodel-exp1">
<title>Explain Attribute Set</title>
<para>
- The Z39.50 standard defines the
+ The &z3950; standard defines the
<ulink url="&url.z39.50.explain;">Explain</ulink> attribute set
Exp-1, which is used to discover information
about a server's search semantics and functional capabilities
</para>
<para>
In addition, the non-Use
- Bib-1 attributes, that is, the types
+ &bib1; attributes, that is, the types
<emphasis>Relation</emphasis>, <emphasis>Position</emphasis>,
<emphasis>Structure</emphasis>, <emphasis>Truncation</emphasis>,
and <emphasis>Completeness</emphasis> are imported from
- the Bib-1 attribute set, and may be used
+ the &bib1; attribute set, and may be used
within any explain query.
</para>
</para>
<para>
See <filename>tab/explain.att</filename> and the
- <ulink url="&url.z39.50;">Z39.50</ulink> standard
+ <ulink url="&url.z39.50;">&z3950;</ulink> standard
for more information.
</para>
</section>
<title>Explain searches with yaz-client</title>
<para>
Classic Explain only defines retrieval of Explain information
- via ASN.1. Practically no Z39.50 clients supports this. Fortunately
+ via ASN.1. Practically no &z3950; clients support this. Fortunately
they don't have to - &zebra; allows retrieval of this information
in other formats:
- <literal>SUTRS</literal>, <literal>XML</literal>,
- <literal>GRS-1</literal> and <literal>ASN.1</literal> Explain.
+ <literal>&sutrs;</literal>, <literal>&xml;</literal>,
+ <literal>&grs1;</literal> and <literal>ASN.1</literal> Explain.
</para>
<para>
<literal>Default</literal>.
This query is very useful to study the internal &zebra; indexes.
If records have been indexed using the <literal>alvis</literal>
- XSLT filter, the string representation names of the known indexes can be
+ &xslt; filter, the string representation names of the known indexes can be
found.
<screen>
Z> base IR-Explain-1
</section>
<section id="querymodel-bib1">
- <title>Bib-1 Attribute Set</title>
+ <title>&bib1; Attribute Set</title>
<para>
Most of the information contained in this section is an excerpt from
- the ATTRIBUTE SET BIB-1 (Z39.50-1995) SEMANTICS
- found at <ulink url="&url.z39.50.attset.bib1.1995;">. The Bib-1
+ the ATTRIBUTE SET &bib1; (&z3950;-1995) SEMANTICS
+ found at <ulink url="&url.z39.50.attset.bib1.1995;">. The &bib1;
Attribute Set Semantics</ulink> from 1995, also in an updated
- <ulink url="&url.z39.50.attset.bib1;">Bib-1
+ <ulink url="&url.z39.50.attset.bib1;">&bib1;
Attribute Set</ulink>
version from 2003. Index Data is not the copyright holder of this
information, except for the configuration details, the listing of
<filename>tab/gils.att</filename>.
</para>
<para>
- For example, some few Bib-1 use
+ For example, a few of the &bib1; use
attributes defined in <filename>tab/bib1.att</filename> are:
<screen>
att 1 Personal-name
<emphasis>AlwaysMatches (103)</emphasis> is a
great way to discover how many documents have been indexed in a
given field. The search term is ignored, but needed for correct
- PQF syntax. An empty search term may be supplied.
+ &pqf; syntax. An empty search term may be supplied.
<screen>
Z> find @attr 1=Title @attr 2=103 ""
Z> find @attr 1=Title @attr 2=103 @attr 4=1 ""
is supported, and maps to the boolean <literal>AND</literal>
combination of words supplied. The word list is useful when
google-like bag-of-word queries need to be translated from a GUI
- query language to PQF. For example, the following queries
+ query language to &pqf;. For example, the following queries
are equivalent:
<screen>
Z> find @attr 1=Title @attr 4=6 "mozart amadeus"
</para>
<note>
<para>
- The exact mapping between PQF queries and &zebra; internal indexes
+ The exact mapping between &pqf; queries and &zebra; internal indexes
and index types is explained in
<xref linkend="querymodel-pqf-apt-mapping"/>.
</para>
</para>
<para>
The <literal>Complete subfield (2)</literal> is a relic
- from the happy <literal>MARC</literal>
+ from the happy <literal>&marc;</literal>
binary format days. &zebra; does not support it, but maps silently
to <literal>Complete field (3)</literal>.
</para>
<note>
<para>
- The exact mapping between PQF queries and &zebra; internal indexes
+ The exact mapping between &pqf; queries and &zebra; internal indexes
and index types is explained in
<xref linkend="querymodel-pqf-apt-mapping"/>.
</para>
<section id="querymodel-zebra">
- <title>Extended &zebra; RPN Features</title>
+ <title>Extended &zebra; &rpn; Features</title>
<para>
The &zebra; internal query engine has been extended to specific needs
not covered by the <literal>bib-1</literal> attribute set query
<section id="querymodel-zebra-attr-search">
<title>&zebra; specific Search Extensions to all Attribute Sets</title>
<para>
- &zebra; extends the Bib-1 attribute types, and these extensions are
+ &zebra; extends the &bib1; attribute types, and these extensions are
recognized regardless of attribute
set used in a <literal>search</literal> operation query.
</para>
The possible values after attribute <literal>type 7</literal> are
<literal>1</literal> ascending and
<literal>2</literal> descending.
- The attributes+term (APT) node is separate from the
+ The attributes+term (&apt;) node is separate from the
rest and must be <literal>@or</literal>'ed.
- The term associated with APT is the sorting level in integers,
+ The term associated with &apt; is the sorting level in integers,
where <literal>0</literal> means primary sort,
<literal>1</literal> means secondary sort, and so forth.
See also <xref linkend="administration-ranking"/>.
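<para>
 A sketch of such a sorted search (the use attribute values are
 illustrative):
 <screen>
  Z> find @or @attr 1=4 computer @attr 7=1 @attr 1=4 0
 </screen>
 Here the first &apt; carries the actual search, and the second,
 <literal>@or</literal>'ed &apt; requests an ascending sort
 (type 7, value 1) at primary sort level (term
 <literal>0</literal>).
</para>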
a scan-like facility. Requires a client that can do named result
sets since the search generates two result sets. The value for
attribute 8 is the name of a result set (string). The terms in
- the named term set are returned as SUTRS records.
+ the named term set are returned as &sutrs; records.
</para>
<para>
For example, searching for u in title, right truncated, and
<title>&zebra; Extension Rank Weight Attribute (type 9)</title>
<para>
Rank weight is a way to pass a value to a ranking algorithm - so
- that one APT has one value - while another as a different one.
+ that one &apt; has one value - while another has a different one.
See also <xref linkend="administration-ranking"/>.
</para>
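<para>
 For example (a sketch; the use attribute values and weights are
 illustrative), the following ranked query weights title hits
 higher than body-of-text hits:
 <screen>
  Z> find @attr 2=102 @or @attr 9=30 @attr 1=4 water @attr 9=20 @attr 1=1010 water
 </screen>
</para>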
<para>
&zebra; supports the searchResult-1 facility.
If the Term Reference Attribute (type 10) is
given, that specifies a subqueryId value returned as part of the
- search result. It is a way for a client to name an APT part of a
+ search result. It is a way for a client to name an &apt; part of a
query.
</para>
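<para>
 For example (the subqueryId values are arbitrary strings chosen
 by the client):
 <screen>
  Z> find @and @attr 10=left @attr 1=4 water @attr 10=right @attr 1=1010 soil
 </screen>
 The hit counts for the two &apt; leaves are then reported in the
 searchResult-1 facility tagged with the subqueryId values
 <literal>left</literal> and <literal>right</literal>.
</para>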
<!--
<title>Local Approximative Limit Attribute (type 11)</title>
<para>
&zebra; computes - unless otherwise configured -
- the exact hit count for every APT
+ the exact hit count for every &apt;
(leaf) in the query tree. These hit counts are returned as part of
- the searchResult-1 facility in the binary encoded Z39.50 search
+ the searchResult-1 facility in the binary encoded &z3950; search
response packages.
</para>
<para>
- By setting an estimation limit size of the resultset of the APT
+ By setting an estimated size limit on the result set of the &apt;
leaves, &zebra; stops processing the result set when the limit
is reached.
Hit counts under this limit are still precise, but hit counts over it
</para>
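<para>
 For example (a sketch; the limit value is arbitrary), the
 following query stops exact hit counting for the single &apt;
 leaf after roughly 1000 records:
 <screen>
  Z> find @attr 11=1000 @attr 1=4 water
 </screen>
</para>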
<para>
The attribute (12) can occur anywhere in the query tree.
- Unlike regular attributes it does not relate to the leaf (APT)
+ Unlike regular attributes it does not relate to the leaf (&apt;)
- but to the whole query.
</para>
<warning>
</section>
<section id="querymodel-idxpath">
- <title>&zebra; special IDXPATH Attribute Set for GRS indexing</title>
+ <title>&zebra; special &idxpath; Attribute Set for &grs1; indexing</title>
<para>
The attribute-set <literal>idxpath</literal> consists of
Use (type 1) attributes only. All non-use attributes behave as normal.
</para>
<para>
This feature is enabled when defining the
- <literal>xpath enable</literal> option in the GRS filter
+ <literal>xpath enable</literal> option in the &grs1; filter
<filename>*.abs</filename> configuration files. If one wants to use
the special <literal>idxpath</literal> numeric attribute set, the
main &zebra; configuration file <filename>zebra.cfg</filename>
</warning>
<section id="querymodel-idxpath-use">
- <title>IDXPATH Use Attributes (type = 1)</title>
+ <title>&idxpath; Use Attributes (type = 1)</title>
<para>
- This attribute set allows one to search GRS filter indexed
- records by XPATH like structured index names.
+ This attribute set allows one to search &grs1; filter indexed
+ records by &xpath; like structured index names.
</para>
<warning>
</warning>
<table id="querymodel-idxpath-use-table" frame="top">
- <title>&zebra; specific IDXPATH Use Attributes (type 1)</title>
+ <title>&zebra; specific &idxpath; Use Attributes (type 1)</title>
<tgroup cols="4">
<thead>
<row>
- <entry>IDXPATH</entry>
+ <entry>&idxpath;</entry>
<entry>Value</entry>
<entry>String Index</entry>
<entry>Notes</entry>
</thead>
<tbody>
<row>
- <entry>XPATH Begin</entry>
+ <entry>&xpath; Begin</entry>
<entry>1</entry>
<entry>_XPATH_BEGIN</entry>
<entry>deprecated</entry>
</row>
<row>
- <entry>XPATH End</entry>
+ <entry>&xpath; End</entry>
<entry>2</entry>
<entry>_XPATH_END</entry>
<entry>deprecated</entry>
</row>
<row>
- <entry>XPATH CData</entry>
+ <entry>&xpath; CData</entry>
<entry>1016</entry>
<entry>_XPATH_CDATA</entry>
<entry>deprecated</entry>
</row>
<row>
- <entry>XPATH Attribute Name</entry>
+ <entry>&xpath; Attribute Name</entry>
<entry>3</entry>
<entry>_XPATH_ATTR_NAME</entry>
<entry>deprecated</entry>
</row>
<row>
- <entry>XPATH Attribute CData</entry>
+ <entry>&xpath; Attribute CData</entry>
<entry>1015</entry>
<entry>_XPATH_ATTR_CDATA</entry>
<entry>deprecated</entry>
</screen>
</para>
<para>
- Search for all documents where specific nested XPATH
+ Search for all documents where specific nested &xpath;
<literal>/c1/c2/../cn</literal> exists. Notice the very
counter-intuitive <emphasis>reverse</emphasis> notation!
<screen>
</screen>
</para>
<para>
- Search for all documents with have an XML element node
- including an XML attribute named <emphasis>creator</emphasis>
+ Search for all documents which have an &xml; element node
+ including an &xml; attribute named <emphasis>creator</emphasis>
<screen>
Z> find @attrset idxpath @attr 1=3 @attr 4=3 creator
Z> find @attr 1=_XPATH_ATTR_NAME @attr 4=3 creator
<section id="querymodel-pqf-apt-mapping">
- <title>Mapping from PQF atomic APT queries to &zebra; internal
+ <title>Mapping from &pqf; atomic &apt; queries to &zebra; internal
register indexes</title>
<para>
- The rules for PQF APT mapping are rather tricky to grasp in the
+ The rules for &pqf; &apt; mapping are rather tricky to grasp in the
first place. We deal first with the rules for deciding which
internal register or string index to use, according to the use
attribute or access point specified in the query. Thereafter we
</para>
<section id="querymodel-pqf-apt-mapping-accesspoint">
- <title>Mapping of PQF APT access points</title>
+ <title>Mapping of &pqf; &apt; access points</title>
<para>
&zebra; understands four fundamentally different types of access
points, of which only the
<emphasis>numeric use attribute</emphasis> type access points
- are defined by the <ulink url="&url.z39.50;">Z39.50</ulink>
+ are defined by the <ulink url="&url.z39.50;">&z3950;</ulink>
standard.
All other access point types are &zebra; specific, and non-portable.
</para>
<entry>hardwired internal string index name</entry>
</row>
<row>
- <entry>XPATH special index</entry>
+ <entry>&xpath; special index</entry>
<entry>XPath</entry>
<entry>/.*</entry>
- <entry>special xpath search for GRS indexed records</entry>
+ <entry>special xpath search for &grs1; indexed records</entry>
</row>
</tbody>
</tgroup>
<emphasis>Numeric use attributes</emphasis> are mapped
to the &zebra; internal
string index according to the attribute set definition in use.
- The default attribute set is <literal>Bib-1</literal>, and may be
- omitted in the PQF query.
+ The default attribute set is <literal>&bib1;</literal>, and may be
+ omitted in the &pqf; query.
</para>
<para>
According to normalization and numeric
use attribute mapping, it follows that the following
- PQF queries are considered equivalent (assuming the default
+ &pqf; queries are considered equivalent (assuming the default
configuration has not been altered):
<screen>
Z> find @attr 1=Body-of-text serenade
Z> find @attr 1=BodyOfText serenade
Z> find @attr 1=bO-d-Y-of-tE-x-t serenade
Z> find @attr 1=1010 serenade
- Z> find @attrset Bib-1 @attr 1=1010 serenade
+ Z> find @attrset &bib1; @attr 1=1010 serenade
Z> find @attrset bib1 @attr 1=1010 serenade
Z> find @attrset Bib1 @attr 1=1010 serenade
Z> find @attrset b-I-b-1 @attr 1=1010 serenade
fields as specified in the <literal>.abs</literal> file which
describes the profile of the records which have been loaded.
If no use attribute is provided, a default of
- Bib-1 Use Any (1016) is assumed.
+ &bib1; Use Any (1016) is assumed.
The predefined use attribute sets
can be reconfigured by tweaking the configuration files
<filename>tab/*.att</filename>, and
ignored. The above mentioned name normalization applies.
String index names are defined in the
used indexing filter configuration files, for example in the
- <literal>GRS</literal>
+ <literal>&grs1;</literal>
<filename>*.abs</filename> configuration files, or in the
- <literal>alvis</literal> filter XSLT indexing stylesheets.
+ <literal>alvis</literal> filter &xslt; indexing stylesheets.
</para>
<para>
</para>
<para>
- Finally, <literal>XPATH</literal> access points are only
- available using the <literal>GRS</literal> filter for indexing.
+ Finally, <literal>&xpath;</literal> access points are only
+ available using the <literal>&grs1;</literal> filter for indexing.
These access point names must start with the character
<literal>'/'</literal>; they are <emphasis>not
normalized</emphasis>, but passed unaltered to the &zebra; internal
- XPATH engine. See <xref linkend="querymodel-use-xpath"/>.
+ &xpath; engine. See <xref linkend="querymodel-use-xpath"/>.
</para>
<section id="querymodel-pqf-apt-mapping-structuretype">
- <title>Mapping of PQF APT structure and completeness to
+ <title>Mapping of &pqf; &apt; structure and completeness to
register type</title>
<para>
Internally &zebra; has in its default configuration several
against the contents of the phrase (long word) register, if one
exists for the given <emphasis>Use</emphasis> attribute.
A phrase register is created for those fields in the
- GRS <filename>*.abs</filename> file that contains a
+ &grs1; <filename>*.abs</filename> file that contain a
<literal>p</literal>-specifier.
<screen>
Z> scan @attr 1=Title @attr 4=1 @attr 6=3 beethoven
contains multiple words, the term will only match if all of the words
are found immediately adjacent, and in the given order.
The word search is performed on those fields that are indexed as
- type <literal>w</literal> in the GRS <filename>*.abs</filename> file.
+ type <literal>w</literal> in the &grs1; <filename>*.abs</filename> file.
<screen>
Z> scan @attr 1=Title @attr 4=1 @attr 6=1 beethoven
...
natural-language, relevance-ranked query.
This search type uses the word register, i.e. those fields
that are indexed as type <literal>w</literal> in the
- GRS <filename>*.abs</filename> file.
+ &grs1; <filename>*.abs</filename> file.
</para>
<para>
If the <emphasis>Structure</emphasis> attribute is
<emphasis>Numeric String</emphasis> the term is treated as an integer.
The search is performed on those fields that are indexed
- as type <literal>n</literal> in the GRS
+ as type <literal>n</literal> in the &grs1;
<filename>*.abs</filename> file.
</para>
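<para>
 For example (assuming a field indexed as type
 <literal>n</literal>; the index name is illustrative), a numeric
 search with structure attribute <literal>109</literal> might look
 like:
 <screen>
  Z> find @attr 1=Date @attr 4=109 @attr 2=2 2007
 </screen>
 where the relation attribute <literal>2=2</literal> requests all
 records with a value less than or equal to 2007.
</para>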
<section id="querymodel-cql-to-pqf">
- <title>Server Side CQL to PQF Query Translation</title>
+ <title>Server Side &cql; to &pqf; Query Translation</title>
<para>
Using the
<literal><cql2rpn>l2rpn.txt</cql2rpn></literal>
- YAZ Frontend Virtual
+ &yaz; Frontend Virtual
Hosts option, one can configure
- the YAZ Frontend CQL-to-PQF
+ the &yaz; Frontend &cql;-to-&pqf;
converter, specifying the interpretation of various
- <ulink url="&url.cql;">CQL</ulink>
+ <ulink url="&url.cql;">&cql;</ulink>
indexes, relations, etc. in terms of Type-1 query attributes.
<!-- The yaz-client config file -->
</para>
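<para>
 A minimal &cql;-to-&pqf; mapping file might look like this (a
 sketch only; the index names and attribute values are
 illustrative and must match the actual &zebra; index
 configuration):
 <screen>
  index.cql.serverChoice = 1=1016
  index.dc.title         = 1=4
  index.dc.creator       = 1=1003
  relation.eq            = 2=3
  structure.*            = 4=1
  truncation.right       = 5=1
 </screen>
</para>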
<para>
- For example, using server-side CQL-to-PQF conversion, one might
+ For example, using server-side &cql;-to-&pqf; conversion, one might
query a &zebra; server like this:
<screen>
<![CDATA[
]]>
</screen>
and - if properly configured - even static relevance ranking can
- be performed using CQL query syntax:
+ be performed using &cql; query syntax:
<screen>
<![CDATA[
Z> find text = /relevant (plant and soil)
<para>
By the way, the same configuration can be used to
- search using client-side CQL-to-PQF conversion:
+ search using client-side &cql;-to-&pqf; conversion:
(the only difference is <literal>querytype cql2rpn</literal>
instead of
<literal>querytype cql</literal>, and the call specifying a local
<para>
Exhaustive information can be found in the
- Section "Specification of CQL to RPN mappings" in the YAZ manual.
+ section "Specification of &cql; to &rpn; mappings" of the &yaz; manual,
<ulink url="&url.yaz.cql2pqf;"/>,
and shall therefore not be repeated here.
</para>
<chapter id="quick-start">
- <!-- $Id: quickstart.xml,v 1.12 2007-02-02 09:58:39 marc Exp $ -->
+ <!-- $Id: quickstart.xml,v 1.13 2007-02-02 11:10:08 marc Exp $ -->
<title>Quick Start </title>
<para>
named <literal>Default</literal>.
The database contains records structured according to
the GILS profile, and the server will
- return records in USMARC, GRS-1, or SUTRS format depending
+ return records in &usmarc;, &grs1;, or &sutrs; format depending
on what the client asks for.
</para>
<para>
- To test the server, you can use any Z39.50 client.
+ To test the server, you can use any &z3950; client.
For instance, you can use the demo command-line client that comes
- with YAZ:
+ with &yaz;:
</para>
<para>
<screen>
</para>
<para>
- The default retrieval syntax for the client is USMARC, and the
+ The default retrieval syntax for the client is &usmarc;, and the
default element set is <literal>F</literal> (``full record''). To
try other formats and element sets for the same record, try:
</para>
<note>
<para>You may notice that more fields are returned when your
- client requests SUTRS, GRS-1 or XML records.
+ client requests &sutrs;, &grs1; or &xml; records.
This is normal - not all of the GILS data elements have mappings in
- the USMARC record format.
+ the &usmarc; record format.
</para>
</note>
<para>
<chapter id="record-model-alvisxslt">
- <!-- $Id: recordmodel-alvisxslt.xml,v 1.14 2007-02-02 09:58:39 marc Exp $ -->
+ <!-- $Id: recordmodel-alvisxslt.xml,v 1.15 2007-02-02 11:10:08 marc Exp $ -->
<title>ALVIS &xml; Record Model and Filter Module</title>
<section id="record-model-alvisxslt-filter">
<title>ALVIS Record Filter</title>
<para>
- The experimental, loadable Alvis &xml;/XSLT filter module
+ The experimental, loadable Alvis &xml;/&xslt; filter module
<literal>mod-alvis.so</literal> is packaged in the GNU/Debian package
<literal>libidzebra1.4-mod-alvis</literal>.
It is invoked by the <filename>zebra.cfg</filename> configuration statement
</screen>
In this example, the filter is applied to all data files with suffix
<filename>*.xml</filename>, and the
- Alvis XSLT filter configuration file is found in the
+ Alvis &xslt; filter configuration file is found in the
path <filename>db/filter_alvis_conf.xml</filename>.
</para>
- <para>The Alvis XSLT filter configuration file must be
+ <para>The Alvis &xslt; filter configuration file must be
valid &xml;. It might look like this (this example is
- used for indexing and display of OAI harvested records):
+ used for indexing and display of &oai; harvested records):
<screen>
<?xml version="1.0" encoding="UTF-8"?>
<schemaInfo>
<schema name="index" identifier="http://indexdata.dk/zebra/xslt/1"
stylesheet="xsl/oai2index.xsl" />
<schema name="dc" stylesheet="xsl/oai2dc.xsl" />
- <!-- use split level 2 when indexing whole OAI Record lists -->
+ <!-- use split level 2 when indexing whole &oai; Record lists -->
<split level="2"/>
</schemaInfo>
</screen>
names defined in the <literal>name</literal> attributes must be
unique, these are the literal <literal>schema</literal> or
<literal>element set</literal> names used in
- <ulink url="http://www.loc.gov/standards/sru/srw/">SRW</ulink>,
- <ulink url="&url.sru;">SRU</ulink> and
- Z39.50 protocol queries.
+ <ulink url="http://www.loc.gov/standards/sru/srw/">&srw;</ulink>,
+ <ulink url="&url.sru;">&sru;</ulink> and
+ &z3950; protocol queries.
The paths in the <literal>stylesheet</literal> attributes
are relative to &zebra;'s working directory, or absolute paths from
the file system root.
The <literal><split level="2"/></literal> decides where the
&xml; Reader shall split the
collections of records into individual records, which then are
- loaded into DOM, and have the indexing XSLT stylesheet applied.
+ loaded into &dom;, and have the indexing &xslt; stylesheet applied.
</para>
<para>
- There must be exactly one indexing XSLT stylesheet, which is
+ There must be exactly one indexing &xslt; stylesheet, which is
defined by the magic attribute
<literal>identifier="http://indexdata.dk/zebra/xslt/1"</literal>.
</para>
<title>ALVIS Internal Record Representation</title>
<para>When indexing, an &xml; Reader is invoked to split the input
files into suitable record &xml; pieces. Each record piece is then
- transformed to an &xml; DOM structure, which is essentially the
- record model. Only XSLT transformations can be applied during
+ transformed to an &xml; &dom; structure, which is essentially the
+ record model. Only &xslt; transformations can be applied during
indexing, search and retrieval. Consequently, output formats are
- restricted to whatever XSLT can deliver from the record &xml;
+ restricted to whatever &xslt; can deliver from the record &xml;
structure, be it other &xml; formats, HTML, or plain text. In case
you have <literal>libxslt1</literal> running with EXSLT support,
you can use this functionality inside the Alvis
- filter configuration XSLT stylesheets.
+ filter configuration &xslt; stylesheets.
</para>
</section>
<section id="record-model-alvisxslt-canonical">
<title>ALVIS Canonical Indexing Format</title>
- <para>The output of the indexing XSLT stylesheets must contain
+ <para>The output of the indexing &xslt; stylesheets must contain
certain elements in the magic
<literal>xmlns:z="http://indexdata.dk/zebra/xslt/1"</literal>
- namespace. The output of the XSLT indexing transformation is then
- parsed using DOM methods, and the contained instructions are
+ namespace. The output of the &xslt; indexing transformation is then
+ parsed using &dom; methods, and the contained instructions are
performed on the <emphasis>magic elements and their
subtrees</emphasis>.
</para>
</screen>
</para>
<para>This means the following: From the original &xml; file
- <literal>one-record.xml</literal> (or from the &xml; record DOM of the
+ <literal>one-record.xml</literal> (or from the &xml; record &dom; of the
same form coming from a split input file), the indexing
stylesheet produces an indexing &xml; record, which is defined by
the <literal>record</literal> element in the magic namespace
the same character normalization map <literal>w</literal>.
</para>
<para>
- Finally, this example configuration can be queried using PQF
- queries, either transported by Z39.50, (here using a yaz-client)
+ Finally, this example configuration can be queried using &pqf;
+ queries, either transported by &z3950;, (here using a yaz-client)
<screen>
<![CDATA[
Z> open localhost:9999
or the proprietary
extentions <literal>x-pquery</literal> and
<literal>x-pScanClause</literal> to
- SRU, and SRW
+ &sru;, and &srw;
<screen>
<![CDATA[
http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=%40attr+1%3Ddc_creator+%40attr+4%3D6+%22the
http://localhost:9999/?version=1.1&operation=scan&x-pScanClause=@attr+1=dc_date+@attr+4=2+a
]]>
</screen>
- See <xref linkend="zebrasrv-sru"/> for more information on SRU/SRW
- configuration, and <xref linkend="gfs-config"/> or the YAZ
- <ulink url="&url.yaz.cql;">CQL section</ulink>
- for the details or the YAZ frontend server.
+ See <xref linkend="zebrasrv-sru"/> for more information on &sru;/&srw;
+ configuration, and <xref linkend="gfs-config"/> or the &yaz;
+ <ulink url="&url.yaz.cql;">&cql; section</ulink>
+ for the details of the &yaz; frontend server.
</para>
<para>
Notice that there are no <filename>*.abs</filename>,
- <filename>*.est</filename>, <filename>*.map</filename>, or other GRS-1
+ <filename>*.est</filename>, <filename>*.map</filename>, or other &grs1;
filter configuration files involved in this process, and that the
literal index names are used during search and retrieval.
</para>
<para>
As mentioned above, there can be only one indexing
stylesheet, and configuring the indexing process is a matter
- of writing an XSLT stylesheet which produces &xml; output containing the
+ of writing an &xslt; stylesheet which produces &xml; output containing the
magic elements discussed in
<xref linkend="record-model-alvisxslt-internal"/>.
Obviously, there are millions of different ways to accomplish this
Stylesheets can be written in the <emphasis>pull</emphasis> or
the <emphasis>push</emphasis> style: <emphasis>pull</emphasis>
means that the output &xml; structure is taken as starting point of
- the internal structure of the XSLT stylesheet, and portions of
+ the internal structure of the &xslt; stylesheet, and portions of
the input &xml; are <emphasis>pulled</emphasis> out and inserted
into the right spots of the output &xml; structure. On the other
- side, <emphasis>push</emphasis> XSLT stylesheets are recursavly
+ side, <emphasis>push</emphasis> &xslt; stylesheets recursively
call their template definitions, a process which is driven
by the input &xml; structure, and wake up to produce some output &xml;
whenever some special conditions in the input stylesheets are
met. The <emphasis>pull</emphasis> type is well-suited for input
&xml; with a strong and well-defined structure and semantics, like the
- following OAI indexing example, whereas the
+ following &oai; indexing example, whereas the
<emphasis>push</emphasis> type might be the only possible way to
sort out deeply recursive input &xml; formats.
</para>
<para>
A <emphasis>pull</emphasis> stylesheet example used to index
- OAI harvested records could use some of the following template
+ &oai; harvested records could use some of the following template
definitions:
<screen>
<![CDATA[
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:z="http://indexdata.dk/zebra/xslt/1"
xmlns:oai="http://www.openarchives.org/OAI/2.0/"
xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
version="1.0">
<xsl:template match="/">
<z:record z:id="{normalize-space(oai:record/oai:header/oai:identifier)}"
z:type="update">
<!-- you might want to use z:rank="{some XSLT function here}" -->
<xsl:apply-templates/>
</z:record>
</xsl:template>
<!-- OAI indexing templates -->
<xsl:template match="oai:record/oai:header/oai:identifier">
<z:index name="oai_identifier" type="0">
<xsl:value-of select="."/>
<para>
Notice also,
that the names and types of the indexes can be defined in the
- indexing XSLT stylesheet <emphasis>dynamically according to
+ indexing &xslt; stylesheet <emphasis>dynamically according to
content in the original &xml; records</emphasis>, which has
opportunities for great power and wizardry as well as grand
disaster.
<title>ALVIS Exchange Formats</title>
<para>
An exchange format can be anything which can be the outcome of an
- XSLT transformation, as far as the stylesheet is registered in
- the main Alvis XSLT filter configuration file, see
+ &xslt; transformation, as long as the stylesheet is registered in
+ the main Alvis &xslt; filter configuration file, see
<xref linkend="record-model-alvisxslt-filter"/>.
In principle anything that can be expressed in &xml;, HTML, and
TEXT can be the output of a <literal>schema</literal> or
<literal>element set</literal> directive during search, as long as
the information comes from the
- <emphasis>original input record &xml; DOM tree</emphasis>
+ <emphasis>original input record &xml; &dom; tree</emphasis>
(and not the transformed and <emphasis>indexed</emphasis> &xml;!!).
</para>
<para>
</section>
<section id="record-model-alvisxslt-example">
- <title>ALVIS Filter OAI Indexing Example</title>
+ <title>ALVIS Filter &oai; Indexing Example</title>
<para>
The sourcecode tarball contains a working Alvis filter example in
the directory <filename>examples/alvis-oai/</filename>, which
should get you started.
</para>
<para>
- More example data can be harvested from any OAI complient server,
- see details at the OAI
+ More example data can be harvested from any &oai; compliant server,
+ see details at the &oai;
<ulink url="http://www.openarchives.org/">
http://www.openarchives.org/</ulink> web site, and the community
links at
<!--
c) Main "alvis" XSLT filter config file:
cat db/filter_alvis_conf.xml
<?xml version="1.0" encoding="UTF8"?>
The split level decides where the SAX parser shall split the
collections of records into individual records, which then are
loaded into DOM, and have the indexing XSLT stylesheet applied.
The indexing stylesheet is found by its identifier.
and so on.
- in db/ a cql2pqf.txt yaz-client config file
which is also used in the yaz-server <ulink url="&url.cql;">CQL</ulink>-to-PQF process
see: http://www.indexdata.com/yaz/doc/tools.tkl#tools.cql.map
- in db/ an indexing XSLT stylesheet. This is a PULL-type XSLT thing,
as it constructs the new &xml; structure by pulling data out of the
respective elements/attributes of the old structure.
<chapter id="grs">
- <!-- $Id: recordmodel-grs.xml,v 1.6 2007-02-02 09:58:39 marc Exp $ -->
- <title>GRS Record Model and Filter Modules</title>
+ <!-- $Id: recordmodel-grs.xml,v 1.7 2007-02-02 11:10:08 marc Exp $ -->
+ <title>&grs1; Record Model and Filter Modules</title>
<para>
The record model described in this chapter applies to the fundamental,
<section id="grs-filters">
- <title>GRS Record Filters</title>
+ <title>&grs1; Record Filters</title>
<para>
Many basic subtypes of the <emphasis>grs</emphasis> type are
currently available:
<para>
This is the canonical input format
described in <xref linkend="grs-canonical-format"/>. It uses a
- simple SGML-like syntax.
+ simple &sgml;-like syntax.
</para>
</listitem>
</varlistentry>
<listitem>
<para>
This allows &zebra; to read
- records in the ISO2709 (MARC) encoding standard.
+ records in the ISO2709 (&marc;) encoding standard.
Last parameter <replaceable>type</replaceable> names the
<literal>.abs</literal> file (see below)
- which describes the specific MARC structure of the input record as
+ which describes the specific &marc; structure of the input record as
well as the indexing rules.
</para>
<para>The <literal>grs.marc</literal> filter uses an internal representation
- which is not XML conformant. In particular MARC tags are
- presented as elements with the same name. And XML elements
+ which is not &xml; conformant. In particular &marc; tags are
+ presented as elements with the same name, and &xml; element names
may not start with digits. Therefore this filter is only
- suitable for systems returning GRS-1 and MARC records. For XML
+ suitable for systems returning &grs1; and &marc; records. For &xml;
use <literal>grs.marcxml</literal> filter instead (see below).
</para>
<para>
This allows &zebra; to read ISO2709 encoded records.
Last parameter <replaceable>type</replaceable> names the
<literal>.abs</literal> file (see below)
- which describes the specific MARC structure of the input record as
+ which describes the specific &marc; structure of the input record as
well as the indexing rules.
</para>
<para>
The internal representation for <literal>grs.marcxml</literal>
- is the same as for <ulink url="&url.marcxml;">MARCXML</ulink>.
+ is the same as for <ulink url="&url.marcxml;">&marcxml;</ulink>.
It is slightly more complicated to work with than
- <literal>grs.marc</literal> but XML conformant.
+ <literal>grs.marc</literal>, but it is &xml; conformant.
</para>
<para>
The loadable <literal>grs.marcxml</literal> filter module
<term><literal>grs.xml</literal></term>
<listitem>
<para>
- This filter reads XML records and uses
+ This filter reads &xml; records and uses
<ulink url="http://expat.sourceforge.net/">Expat</ulink> to
parse them and convert them into ID&zebra;'s internal
<literal>grs</literal> record model.
- Only one record per file is supported, due to the fact XML does
+ Only one record per file is supported, because &xml; does
not allow two documents to "follow" each other (there is no way
to know when a document is finished).
This filter is only available if &zebra; is compiled with EXPAT support.
</para>
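<para>
 A minimal <filename>zebra.cfg</filename> fragment enabling this
 filter might look like (a sketch only):
 <screen>
  recordType: grs.xml
 </screen>
</para>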
<section id="grs-canonical-format">
- <title>GRS Canonical Input Format</title>
+ <title>&grs1; Canonical Input Format</title>
<para>
Although input data can take any form, it is sometimes useful to
describe the record processing capabilities of the system in terms of
a single, canonical input format that gives access to the full
spectrum of structure and flexibility in the system. In &zebra;, this
- canonical format is an "SGML-like" syntax.
+ canonical format is an "&sgml;-like" syntax.
</para>
<para>
contains only a single element (strictly speaking, that makes it an
illegal GILS record, since the GILS profile includes several mandatory
elements - &zebra; does not validate the contents of a record against
- the Z39.50 profile, however - it merely attempts to match up elements
+ the &z3950; profile, however - it merely attempts to match up elements
of a local representation with the given schema):
</para>
textual data elements which might appear in different languages, and
images which may appear in different formats or layouts.
The variant system in &zebra; is essentially a representation of
- the variant mechanism of Z39.50-1995.
+ the variant mechanism of &z3950;-1995.
</para>
<para>
<para>
The title element above comes in two variants. Both have the IANA body
type "text/plain", but one is in English, and the other in
- Danish. The client, using the element selection mechanism of Z39.50,
+ Danish. The client, using the element selection mechanism of &z3950;,
can retrieve information about the available variant forms of data
elements, or it can select specific variants based on the requirements
of the end-user.
</section>
<section id="grs-regx-tcl">
- <title>GRS REGX And TCL Input Filters</title>
+ <title>&grs1; REGX And TCL Input Filters</title>
<para>
In order to handle general input formats, &zebra; allows the
</section>
<section id="grs-internal-representation">
- <title>GRS Internal Record Representation</title>
+ <title>&grs1; Internal Record Representation</title>
<para>
When records are manipulated by the system, they're represented in a
<para>
In practice, each variant node is associated with a triple of class,
- type, value, corresponding to the variant mechanism of Z39.50.
+ type, value, corresponding to the variant mechanism of &z3950;.
</para>
</section>
</section>
<section id="grs-conf">
- <title>GRS Record Model Configuration</title>
+ <title>&grs1; Record Model Configuration</title>
<para>
The following sections describe the configuration files that govern
<listitem>
<para>
- The object identifier of the Z39.50 schema associated
+ The object identifier of the &z3950; schema associated
with the ARS, so that it can be referred to by the client.
</para>
</listitem>
ask for a subset of the data elements contained in a record. Element
set names, in the retrieval module, are mapped to <emphasis>element
specifications</emphasis>, which contain information equivalent to the
- <emphasis>Espec-1</emphasis> syntax of Z39.50.
+ <emphasis>Espec-1</emphasis> syntax of &z3950;.
</para>
</listitem>
<listitem>
<para>
Possibly, a set of rules describing the mapping of elements to a
- MARC representation.
+ &marc; representation.
</para>
</listitem>
<listitem>
<para>
A list of element descriptions (this is the actual ARS of the
- schema, in Z39.50 terms), which lists the ways in which the various
+ schema, in &z3950; terms), which lists the ways in which the various
tags can be used and organized hierarchically.
</para>
</listitem>
<para>
The number of different file types may appear daunting at first, but
- each type corresponds fairly clearly to a single aspect of the Z39.50
+ each type corresponds fairly clearly to a single aspect of the &z3950;
retrieval facilities. Further, the average database administrator,
who is simply reusing an existing profile for which tables already
exist, shouldn't have to worry too much about the contents of these tables.
<title>The Abstract Syntax (.abs) Files</title>
<para>
- The name of this file type is slightly misleading in Z39.50 terms,
+ The name of this file type is slightly misleading in &z3950; terms,
since, apart from the actual abstract syntax of the profile, it also
includes most of the other definitions that go into a database
profile.
</para>
<para>
- When a record in the canonical, SGML-like format is read from a file
+ When a record in the canonical, &sgml;-like format is read from a file
or from the database, the first tag of the file should reference the
profile that governs the layout of the record. If the first tag of the
record is, say, <literal><gils></literal>, the system will look
<para>
(m) The reference name of the OID for the profile.
The reference names can be found in the <emphasis>util</emphasis>
- module of YAZ.
+ module of &yaz;.
</para>
</listitem>
</varlistentry>
<para>
(o) Points to a file containing parameters
for representing the record contents in the ISO2709 syntax.
- Read the description of the MARC representation facility below.
+ Read the description of the &marc; representation facility below.
</para>
</listitem>
</varlistentry>
<para>
(o,r) Adds an element to the abstract record syntax of the schema.
The <replaceable>path</replaceable> follows the
- syntax which is suggested by the Z39.50 document - that is, a sequence
+ syntax which is suggested by the &z3950; document - that is, a sequence
of tags separated by slashes (/). Each tag is given as a
comma-separated pair of tag type and value, surrounded by parentheses.
The <replaceable>name</replaceable> is the name of the element, and
<term>melm <replaceable>field$subfield attributes</replaceable></term>
<listitem>
<para>
- This directive is specifically for MARC-formatted records,
- ingested either in the form of MARCXML documents, or in the
+ This directive is specifically for &marc;-formatted records,
+ ingested either in the form of &marcxml; documents, or in the
ISO2709/Z39.2 format using the grs.marcxml input filter. You can
specify indexing rules for any subfield, or you can leave off the
<replaceable>$subfield</replaceable> part and specify default rules
<listitem>
<para>
This directive specifies character encoding for external records.
- For records such as XML that specifies encoding within the
+ For records such as &xml; that specify the encoding within the
file via a header, this directive is ignored.
If neither this directive is given nor an encoding is set
within external records, ISO-8859-1 encoding is assumed.
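For example, to declare that external records use UTF-8, one might write
(a sketch, assuming the directive keyword is <literal>encoding</literal>):
<screen>
encoding utf-8
</screen>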
An automatically generated identifier for the record,
unique within this database. It is represented by the
<literal><localControlNumber></literal> element in
- XML and the <literal>(1,14)</literal> tag in GRS-1.
+ &xml; and the <literal>(1,14)</literal> tag in &grs1;.
</para></listitem>
</varlistentry>
<varlistentry>
(m) The reference name of the OID for
the attribute set.
The reference names can be found in the <replaceable>util</replaceable>
- module of <replaceable>YAZ</replaceable>.
+ module of <replaceable>&yaz;</replaceable>.
</para>
</listitem></varlistentry>
<varlistentry>
set. For instance, many new attribute sets are defined as extensions
to the <replaceable>bib-1</replaceable> set.
This is an important feature of the retrieval
- system of Z39.50, as it ensures the highest possible level of
+ system of &z3950;, as it ensures the highest possible level of
interoperability, as those access points of your database which are
derived from the external set (say, bib-1) can be used even by clients
who are unaware of the new set.
<para>
This file type defines the tagset of the profile, possibly by
referencing other tag sets (most tag sets, for instance, will include
- tagsetG and tagsetM from the Z39.50 specification. The file may
+ tagsetG and tagsetM from the &z3950; specification). The file may
contain the following directives.
</para>
<para>
(o) The reference name of the OID for the tag set.
The reference names can be found in the <emphasis>util</emphasis>
- module of <emphasis>YAZ</emphasis>.
+ module of <emphasis>&yaz;</emphasis>.
The directive is optional, since not all tag sets
are registered outside of their schema.
</para>
<para>
(o) The reference name of the OID for
the variant set, if one is required. The reference names can be found
- in the <emphasis>util</emphasis> module of <emphasis>YAZ</emphasis>.
+ in the <emphasis>util</emphasis> module of <emphasis>&yaz;</emphasis>.
</para>
</listitem></varlistentry>
<varlistentry>
The element set specification files describe a selection of a subset
of the elements of a database record. The element selection mechanism
is equivalent to the one supplied by the <emphasis>Espec-1</emphasis>
- syntax of the Z39.50 specification.
+ syntax of the &z3950; specification.
In fact, the internal representation of an element set
specification is identical to the <emphasis>Espec-1</emphasis> structure,
and we'll refer you to the description of that structure for most of
a schema that differs from the native schema of the record. For
instance, a client might only know how to process WAIS records, while
the database record is represented in a more specific schema, such as
- GILS. In this module, a mapping of data to one of the MARC formats is
+ GILS. In this module, a mapping of data to one of the &marc; formats is
also thought of as a schema mapping (mapping the elements of the
- record into fields consistent with the given MARC specification, prior
+ record into fields consistent with the given &marc; specification, prior
to actually converting the data to the ISO2709). This use of the
- object identifier for USMARC as a schema identifier represents an
+ object identifier for &usmarc; as a schema identifier represents an
overloading of the OID which might not be entirely proper. However,
it represents the dual role of schema and record syntax which
- is assumed by the MARC family in Z39.50.
+ is assumed by the &marc; family in &z3950;.
</para>
<!--
This is used, for instance, by a server receiving a request to present
a record in a different schema from the native one.
The name, again, is found in the <emphasis>oid</emphasis>
- module of <emphasis>YAZ</emphasis>.
+ module of <emphasis>&yaz;</emphasis>.
</para>
</listitem></varlistentry>
<varlistentry>
</section>
<section id="grs-mar-files">
- <title>The MARC (ISO2709) Representation (.mar) Files</title>
+ <title>The &marc; (ISO2709) Representation (.mar) Files</title>
<para>
This file provides rules for representing a record in the ISO2709
<!--
NOTE: FIXME! This will be described better. We're in the process of
- re-evaluating and most likely changing the way that MARC records are
+ re-evaluating and most likely changing the way that &marc; records are
handled by the system.</emphasis>
-->
</section>
<section id="grs-exchange-formats">
- <title>GRS Exchange Formats</title>
+ <title>&grs1; Exchange Formats</title>
<para>
Converting records from the internal structure to an exchange format
<itemizedlist>
<listitem>
<para>
- GRS-1. The internal representation is based on GRS-1/XML, so the
+ &grs1;. The internal representation is based on &grs1;/&xml;, so the
conversion here is straightforward. The system will create
applied variant and supported variant lists as required, if a record
contains variant information.
<listitem>
<para>
- XML. The internal representation is based on GRS-1/XML so
- the mapping is trivial. Note that XML schemas, preprocessing
+ &xml;. The internal representation is based on &grs1;/&xml; so
+ the mapping is trivial. Note that &xml; schemas, preprocessing
instructions and comments are not part of the internal representation
- and therefore will never be part of a generated XML record.
+ and therefore will never be part of a generated &xml; record.
Future versions of &zebra; will support that.
</para>
</listitem>
<listitem>
<para>
- SUTRS. Again, the mapping is fairly straightforward. Indentation
+ &sutrs;. Again, the mapping is fairly straightforward. Indentation
is used to show the hierarchical structure of the record. All
- "GRS" type records support both the GRS-1 and SUTRS
+ "&grs1;" type records support both the &grs1; and &sutrs;
representations.
- <!-- FIXME - What is SUTRS - should be expanded here -->
+ <!-- FIXME - What is &sutrs; - should be expanded here -->
</para>
</listitem>
<listitem>
<para>
- ISO2709-based formats (USMARC, etc.). Only records with a
+ ISO2709-based formats (&usmarc;, etc.). Only records with a
two-level structure (corresponding to fields and subfields) can be
directly mapped to ISO2709. For records with a different structuring
- (eg., GILS), the representation in a structure like USMARC involves a
+ (e.g., GILS), the representation in a structure like &usmarc; involves a
schema-mapping (see <xref linkend="schema-mapping"/>), to an
- "implied" USMARC schema (implied,
+ "implied" &usmarc; schema (implied,
because there is no formal schema which specifies the use of the
- USMARC fields outside of ISO2709). The resultant, two-level record is
+ &usmarc; fields outside of ISO2709). The resultant, two-level record is
then mapped directly from the internal representation to ISO2709. See
the GILS schema definition files for a detailed example of this
approach.
</section>
<section id="grs-extended-marc-indexing">
- <title>Extended indexing of MARC records</title>
+ <title>Extended indexing of &marc; records</title>
- <para>Extended indexing of MARC records will help you if you need index a
+ <para>Extended indexing of &marc; records will help you if you need to index a
combination of subfields, or index only a part of the whole field,
- or use during indexing process embedded fields of MARC record.
+ or use embedded fields of the &marc; record during the indexing process.
</para>
- <para>Extended indexing of MARC records additionally allows:
+ <para>Extended indexing of &marc; records additionally allows:
<itemizedlist>
<listitem>
- <para>to index data in LEADER of MARC record</para>
+ <para>to index data in the LEADER of a &marc; record</para>
</listitem>
<listitem>
</listitem>
<listitem>
- <para>to index linked fields for UNIMARC based formats</para>
+ <para>to index linked fields for UNIMARC-based formats</para>
</listitem>
</itemizedlist>
</para>
<note><para>Compared with the simple indexing process, extended indexing
- may increase (about 2-3 times) the time of indexing process for MARC
+ may increase the indexing time (by about 2-3 times) for &marc;
records.</para></note>
<section id="formula">
<title>The index-formula</title>
<para>First, we have to define the term
- <emphasis>index-formula</emphasis> for MARC records. This term helps
- to understand the notation of extended indexing of MARC records by &zebra;.
+ <emphasis>index-formula</emphasis> for &marc; records. This term helps
+ to understand the notation of extended indexing of &marc; records by &zebra;.
Our definition is based on the document
<ulink url="http://www.rba.ru/rusmarc/soft/Z39-50.htm">"The table
- of conformity for Z39.50 use attributes and RUSMARC fields"</ulink>.
+ of conformity for &z3950; use attributes and RUSMARC fields"</ulink>.
The document is available in Russian only.</para>
<para>
</screen>
<para>
- We know that &zebra; supports a Bib-1 attribute - right truncation.
+ We know that &zebra; supports a &bib1; attribute - right truncation.
In this case, the <emphasis>index-formula</emphasis> (1) consists of
forms, defined in the same way as (1)</para>
</screen>
<note>
- <para>The original MARC record may be without some elements, which included in <emphasis>index-formula</emphasis>.
+ <para>The original &marc; record may lack some of the elements included in the <emphasis>index-formula</emphasis>.
</para>
</note>
<varlistentry>
<term>-</term>
<listitem><para>The position may contain any value, defined by
- MARC format.
+ the &marc; format.
For example, <emphasis>index-formula</emphasis></para>
<screen>
<note>
<para>
- All another operands are the same as accepted in MARC world.
+ All other operands are the same as those accepted in the &marc; world.
</para>
</note>
</para>
(<literal>.abs</literal> file). This means that names beginning with
<literal>"mc-"</literal> are interpreted by &zebra; as
<emphasis>index-formula</emphasis>. The database index is created and
- linked with <emphasis>access point</emphasis> (Bib-1 use attribute)
+ linked with <emphasis>access point</emphasis> (&bib1; use attribute)
according to this formula.</para>
<para>For example, <emphasis>index-formula</emphasis></para>
<varlistentry>
<term>.</term>
<listitem><para>The position may contain any value, defined by
- MARC format. For example,
+ the &marc; format. For example,
<emphasis>index-formula</emphasis></para>
<screen>
</para>
<note>
- <para>All another operands are the same as accepted in MARC world.</para>
+ <para>All other operands are the same as those accepted in the &marc; world.</para>
</note>
<section id="grs-examples">
elm mc-008[0-5] Date/time-added-to-db !
</screen>
- <para>or for RUSMARC (this data included in 100th field)</para>
+ <para>or for RUSMARC (this data is included in field 100)</para>
<screen>
elm mc-100___$a[0-7]_ Date/time-added-to-db !
<para>using indicators while indexing</para>
- <para>For RUSMARC <emphasis>index-formula</emphasis>
+ <para>For RUSMARC <emphasis>index-formula</emphasis>
<literal>70-#1$a, $g</literal> matches</para>
<screen>
<listitem>
- <para>indexing embedded (linked) fields for UNIMARC based
+ <para>indexing embedded (linked) fields for UNIMARC-based
formats</para>
- <para>For RUSMARC <emphasis>index-formula</emphasis>
+ <para>For RUSMARC <emphasis>index-formula</emphasis>
<literal>4--#-$170-#1$a, $g ($c)</literal> matches</para>
<screen><![CDATA[
<!ENTITY test SYSTEM "test.xml">
]>
-<!-- $Id: zebra.xml,v 1.15 2007-02-02 09:58:40 marc Exp $ -->
+<!-- $Id: zebra.xml,v 1.16 2007-02-02 11:10:08 marc Exp $ -->
<book id="zebra">
<bookinfo>
<title>&zebra; - User's Guide and Reference</title>
can index records in &xml;, &sgml;, &marc;, e-mail archives and many
other formats, and quickly find them using a combination of
boolean searching and relevance ranking. Search-and-retrieve
- applications can be written using APIs in a wide variety of
+ applications can be written using &api;s in a wide variety of
languages, communicating with the &zebra; server using
industry-standard information-retrieval protocols or web services.
</simpara>
<!ENTITY % common SYSTEM "common/common.ent">
%common;
]>
-<!-- $Id: zebraidx.xml,v 1.11 2007-02-02 09:58:40 marc Exp $ -->
+<!-- $Id: zebraidx.xml,v 1.12 2007-02-02 11:10:08 marc Exp $ -->
<refentry id="zebraidx">
<refentryinfo>
<productname>zebra</productname>
<listitem>
<para>
The records located should be associated with the database name
- <replaceable>database</replaceable> for access through the Z39.50 server.
+ <replaceable>database</replaceable> for access through the &z3950; server.
</para>
</listitem>
</varlistentry>
<!--
- $Id: zebrasrv-options.xml,v 1.6 2006-09-05 12:01:31 adam Exp $
+ $Id: zebrasrv-options.xml,v 1.7 2007-02-02 11:10:08 marc Exp $
Options for generic frontend server and yaz-ztest.
Included in both manual and man page for yaz-ztest
Note - these files have been altered for zebrasrv, and are not in
<varlistentry><term><literal>-z</literal></term>
<listitem><para>
- Use the Z39.50 protocol (default). This option and <literal>-s</literal>
+ Use the &z3950; protocol (default). This option and <literal>-s</literal>
complement each other.
You can use both multiple times on the same command
line, between listener-specifications (see below). This way, you
<varlistentry><term><literal>-f </literal>
<replaceable>vconfig</replaceable></term>
- <listitem><para>This specifies an XML file that describes
- one or more YAZ frontend virtual servers. See section VIRTUAL
+ <listitem><para>This specifies an &xml; file that describes
+ one or more &yaz; frontend virtual servers. See section VIRTUAL
HOSTS for details.
</para></listitem></varlistentry>
<screen>
hostname | IP-number [: portnumber]
</screen>
- The port number defaults to 210 (standard Z39.50 port) for
+ The port number defaults to 210 (standard &z3950; port) for
privileged users (root), and 9999 for normal users.
The special hostname "@" is mapped to
the address INADDR_ANY, which causes the server to listen on any local
<para>
The default behavior for <literal>zebrasrv</literal> - if started
as a non-privileged user - is to establish
- a single TCP/IP listener, for the Z39.50 protocol, on port 9999.
+ a single TCP/IP listener, for the &z3950; protocol, on port 9999.
<screen>
zebrasrv @
zebrasrv tcp:some.server.name.org:1234
<para>
To start the server listening on the registered port for
- Z39.50, or on a filesystem socket,
+ &z3950;, or on a filesystem socket,
and to drop root privileges once the ports are bound, execute
the server like this from a root shell:
<screen>
<!--
- $Id: zebrasrv-synopsis.xml,v 1.4 2006-09-05 12:01:31 adam Exp $
- cmd description of YAZ GFS application.
+ $Id: zebrasrv-synopsis.xml,v 1.5 2007-02-02 11:10:08 marc Exp $
+ cmd description of &yaz; GFS application.
Included in both manual and man page for yaz-ztest
-->
<!--
- $Id: zebrasrv-virtual.xml,v 1.8 2006-09-05 12:01:31 adam Exp $
- Description of the virtual host mechanism in YAZ GFS
+ $Id: zebrasrv-virtual.xml,v 1.9 2007-02-02 11:10:08 marc Exp $
+ Description of the virtual host mechanism in &yaz; GFS
Included in both manual and man page for yaz-ztest
-->
<para>
- The Virtual hosts mechanism allows a YAZ frontend server to
+ The Virtual hosts mechanism allows a &yaz; frontend server to
support multiple backends. A backend is selected on the basis of
the TCP/IP binding (port+listening address) and/or the virtual host.
</para>
<para>
A backend can be configured to execute in a particular working
- directory. Or the YAZ frontend may perform <ulink url="&url.cql;">CQL</ulink> to RPN conversion, thus
- allowing traditional Z39.50 backends to be offered as a
-<ulink url="&url.sru;">SRU</ulink> service.
- SRU Explain information for a particular backend may also be specified.
+ directory. Or the &yaz; frontend may perform <ulink url="&url.cql;">&cql;</ulink> to &rpn; conversion, thus
+ allowing traditional &z3950; backends to be offered as a
+<ulink url="&url.sru;">&sru;</ulink> service.
+ &sru; Explain information for a particular backend may also be specified.
</para>
<para>
For the HTTP protocol, the virtual host is specified in the Host header.
- For the Z39.50 protocol, the virtual host is specified as in the
+ For the &z3950; protocol, the virtual host is specified as in the
Initialize Request in the OtherInfo, OID 1.2.840.10003.10.1000.81.1.
</para>
<note>
<para>
- Not all Z39.50 clients allows the VHOST information to be set.
+ Not all &z3950; clients allow the VHOST information to be set.
For those the selection of the backend must rely on the
TCP/IP information alone (port and address).
</para>
</note>
<para>
- The YAZ frontend server uses XML to describe the backend
+ The &yaz; frontend server uses &xml; to describe the backend
configurations. Command-line option <literal>-f</literal>
- specifies filename of the XML configuration.
+ specifies filename of the &xml; configuration.
</para>
<para>
The configuration uses the root element <literal>yazgfs</literal>.
<listitem>
<para>
Specifies a working directory for this backend server. If
- specifid, the YAZ fronend changes current working directory
+ specified, the &yaz; frontend changes the current working directory
to this directory whenever a backend of this type is
started (backend handler bend_start), stopped (backend handler bend_stop)
and initialized (bend_init).
<varlistentry><term>element <literal>cql2rpn</literal> (optional)</term>
<listitem>
<para>
- Specifies a filename that includes <ulink url="&url.cql;">CQL</ulink> to RPN conversion for this
- backend server. See <ulink url="&url.cql;">CQL</ulink> section in YAZ manual.
- If given, the backend server will only "see" a Type-1/RPN query.
+ Specifies a filename that includes <ulink url="&url.cql;">&cql;</ulink> to &rpn; conversion for this
+ backend server. See <ulink url="&url.cql;">&cql;</ulink> section in &yaz; manual.
+ If given, the backend server will only "see" a Type-1/&rpn; query.
</para>
</listitem>
</varlistentry>
<varlistentry><term>element <literal>explain</literal> (optional)</term>
<listitem>
<para>
- Specifies <ulink url="&url.sru;">SRU</ulink> ZeeRex content for this
+ Specifies <ulink url="&url.sru;">&sru;</ulink> ZeeRex content for this
server - copied verbatim to the client.
As things are now, some of the Explain content seems redundant
because host information, etc. is also stored elsewhere.
</para>
<para>
- The XML below configures a server that accepts connections from
+ The &xml; below configures a server that accepts connections from
two ports, TCP/IP port 9900 and a local UNIX file socket.
We name the TCP/IP server <literal>public</literal> and the
other server <literal>internal</literal>.
</para>
<para>
For <literal>"server2"</literal> elements for
-<ulink url="&url.cql;">CQL</ulink> to RPN conversion
+<ulink url="&url.cql;">&cql;</ulink> to &rpn; conversion
is supported and explain information has been added (a short one here
to keep the example small).
</para>
<!ENTITY % common SYSTEM "common/common.ent">
%common;
]>
- <!-- $Id: zebrasrv.xml,v 1.3 2007-01-15 14:55:50 adam Exp $ -->
+ <!-- $Id: zebrasrv.xml,v 1.4 2007-02-02 11:10:08 marc Exp $ -->
<refentry id="zebrasrv">
<refentryinfo>
<productname>zebra</productname>
<refsect1><title>DESCRIPTION</title>
<para>Zebra is a high-performance, general-purpose structured text indexing
and retrieval engine. It reads structured records in a variety of input
- formats (eg. email, XML, MARC) and allows access to them through exact
+ formats (e.g. email, &xml;, &marc;) and allows access to them through exact
boolean search expressions and relevance-ranked free-text queries.
</para>
<para>
- <command>zebrasrv</command> is the Z39.50 and SRU frontend
+ <command>zebrasrv</command> is the &z3950; and &sru; frontend
server for the <command>Zebra</command> search engine and indexer.
</para>
<para>
<para>
The options for <command>zebrasrv</command> are the same
- as those for YAZ' <command>yaz-ztest</command>.
+ as those for &yaz;'s <command>yaz-ztest</command>.
Option <literal>-c</literal> specifies a Zebra configuration
file - if omitted <filename>zebra.cfg</filename> is read.
</para>
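<para>
For instance, to start the server with an explicit configuration file,
listening on the default port (the path is hypothetical):
</para>
<screen>
zebrasrv -c /opt/zebra/zebra.cfg @
</screen>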
</refsect1>
<refsect1 id="protocol-support">
- <title>Z39.50 Protocol Support and Behavior</title>
+ <title>&z3950; Protocol Support and Behavior</title>
<refsect2 id="zebrasrv-initialization">
- <title>Z39.50 Initialization</title>
+ <title>&z3950; Initialization</title>
<para>
During initialization, the server will negotiate to version 3 of the
- Z39.50 protocol, and the option bits for Search, Present, Scan,
+ &z3950; protocol, and the option bits for Search, Present, Scan,
NamedResultSets, and concurrentOperations will be set, if requested by
the client. The maximum PDU size is negotiated down to a maximum of
1 MB by default.
</refsect2>
<refsect2 id="zebrasrv-search">
- <title>Z39.50 Search</title>
+ <title>&z3950; Search</title>
<para>
The supported query types are 1 and 101. All operators are currently
</refsect2>
<refsect2 id="zebrasrv-present">
- <title>Z39.50 Present</title>
+ <title>&z3950; Present</title>
<para>
The present facility is supported in a standard fashion. The requested
record syntax is matched against the ones supported by the profile of
- each record retrieved. If no record syntax is given, SUTRS is the
+ each record retrieved. If no record syntax is given, &sutrs; is the
default. The requested element set name, again, is matched against any
provided by the relevant record profiles.
</para>
</refsect2>
<refsect2 id="zebrasrv-scan">
- <title>Z39.50 Scan</title>
+ <title>&z3950; Scan</title>
<para>
The attribute combinations provided with the termListAndStartPoint are
processed in the same way as operands in a query (see above).
</para>
</refsect2>
<refsect2 id="zebrasrv-sort">
- <title>Z39.50 Sort</title>
+ <title>&z3950; Sort</title>
<para>
- Z39.50 specifies three different types of sort criteria.
+ &z3950; specifies three different types of sort criteria.
Of these, Zebra supports the attribute specification type, in which
case the use attribute specifies the "Sort register".
Sort registers are created for those fields that are of type "sort" in
</para>
<para>
- Z39.50 allows the client to specify sorting on one or more input
+ &z3950; allows the client to specify sorting on one or more input
result sets and one output result set.
Zebra supports sorting on one result set only which may or may not
be the same as the output result set.
</para>
</refsect2>
<refsect2 id="zebrasrv-close">
- <title>Z39.50 Close</title>
+ <title>&z3950; Close</title>
<para>
If a Close PDU is received, the server will respond with a Close PDU
with reason=FINISHED, no matter which protocol version was negotiated
</refsect2>
<refsect2 id="zebrasrv-explain">
- <title>Z39.50 Explain</title>
+ <title>&z3950; Explain</title>
<para>
Zebra maintains a "classic"
- <ulink url="&url.z39.50.explain;">Z39.50 Explain</ulink> database
+ <ulink url="&url.z39.50.explain;">&z3950; Explain</ulink> database
on the side.
This database is called <literal>IR-Explain-1</literal> and can be
searched using the attribute set <literal>exp-1</literal>.
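A hypothetical interactive session against this database, using the
<command>yaz-client</command> command, might look like this:
<screen>
Z> base IR-Explain-1
Z> find @attr exp1 1=1 databaseinfo
Z> show 1
</screen>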
</refsect2>
</refsect1>
<refsect1 id="zebrasrv-sru">
- <title>The SRU Server</title>
+ <title>The &sru; Server</title>
<para>
- In addition to Z39.50, Zebra supports the more recent and
- web-friendly IR protocol <ulink url="&url.sru;">SRU</ulink>.
- SRU can be carried over SOAP or a REST-like protocol
- that uses HTTP GET or POST to request search responses. The request
+ In addition to &z3950;, Zebra supports the more recent and
+ web-friendly IR protocol <ulink url="&url.sru;">&sru;</ulink>.
+ &sru; can be carried over &soap; or a &rest;-like protocol
+ that uses HTTP &get; or &post; to request search responses. The request
itself is made of parameters such as
<literal>query</literal>,
<literal>startRecord</literal>,
<literal>maximumRecords</literal>
and
<literal>recordSchema</literal>;
- the response is an XML document containing hit-count, result-set
- records, diagnostics, etc. SRU can be thought of as a re-casting
- of Z39.50 semantics in web-friendly terms; or as a standardisation
+ the response is an &xml; document containing hit-count, result-set
+ records, diagnostics, etc. &sru; can be thought of as a re-casting
+ of &z3950; semantics in web-friendly terms; or as a standardisation
of the ad-hoc query parameters used by search engines such as Google
and AltaVista; or as a superset of A9's OpenSearch (which it
predates).
</para>
<para>
- Zebra supports Z39.50, SRU GET, SRU POST, SRU SOAP (SRW)
+ Zebra supports &z3950;, &sru; &get;, &sru; &post;, and &sru; &soap; (&srw;)
- on the same port, recognising what protocol is used by each incoming
+ on the same port, recognising which protocol is used by each incoming
request and handling it accordingly. This is achieved through
the use of Deep Magic; civilians are warned not to stand too close.
</para>
<refsect2 id="zebrasrv-sru-run">
- <title>Running zebrasrv as an SRU Server</title>
+ <title>Running zebrasrv as an &sru; Server</title>
<para>
Because Zebra supports all protocols on one port, it would
- seem to follow that the SRU server is run in the same way as
- the Z39.50 server, as described above. This is true, but only in
+ seem to follow that the &sru; server is run in the same way as
+ the &z3950; server, as described above. This is true, but only in
an uninterestingly vacuous way: a Zebra server run in this manner
- will indeed recognise and accept SRU requests; but since it
- doesn't know how to handle the CQL queries that these protocols
+ will indeed recognise and accept &sru; requests; but since it
+ doesn't know how to handle the &cql; queries that these protocols
use, all it can do is send failure responses.
</para>
<note>
<para>
- It is possible to cheat, by having SRU search Zebra with
- a PQF query instead of CQL, using the
+ It is possible to cheat, by having &sru; search Zebra with
+ a &pqf; query instead of &cql;, using the
<literal>x-pquery</literal>
parameter instead of
<literal>query</literal>.
This is a
<emphasis role="strong">non-standard extension</emphasis>
- of CQL, and a
+ of &cql;, and a
<emphasis role="strong">very naughty</emphasis>
- thing to do, but it does give you a way to see Zebra serving SRU
+ thing to do, but it does give you a way to see Zebra serving &sru;
``right out of the box''. If you start your favourite Zebra
server in the usual way, on port 9999, then you can send your web
browser to:
&maximumRecords=1
</screen>
<para>
- This will display the XML-formatted SRU response that includes the
+ This will display the &xml;-formatted &sru; response that includes the
first record in the result-set found by the query
- <literal>mineral</literal>. (For clarity, the SRU URL is shown
+ <literal>mineral</literal>. (For clarity, the &sru; URL is shown
here broken across lines, but the lines should be joined together
to make a single-line URL for the browser to submit.)
</para>
</note>
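The pieces of such a URL can also be assembled programmatically. The following sketch (Python, assuming a server on localhost:9999 as in the example above) builds an <literal>x-pquery</literal> searchRetrieve URL of the kind just shown:

```python
from urllib.parse import urlencode

# Assumed server address, matching the example above.
base = "http://localhost:9999/"

# x-pquery is Zebra's non-standard replacement for the mandatory
# "query" parameter; it carries a PQF query instead of CQL.
params = {
    "version": "1.1",
    "operation": "searchRetrieve",
    "x-pquery": "mineral",
    "maximumRecords": "1",
}
url = base + "?" + urlencode(params)
print(url)
```

Here <literal>urlencode</literal> takes care of joining the parameters into the single-line URL that the browser must submit.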
<para>
- In order to turn on Zebra's support for CQL queries, it's necessary
- to have the YAZ generic front-end (which Zebra uses) translate them
- into the Z39.50 Type-1 query format that is used internally. And
+ In order to turn on Zebra's support for &cql; queries, it's necessary
+ to have the &yaz; generic front-end (which Zebra uses) translate them
+ into the &z3950; Type-1 query format that is used internally. And
to do this, the generic front-end's own configuration file must be
used. See <xref linkend="gfs-config"/>;
- the salient point for SRU support is that
+ the salient point for &sru; support is that
<command>zebrasrv</command>
must be started with the
<literal>-f frontendConfigFile</literal>
option rather than the
<literal>-c zebraConfigFile</literal>
option,
and that the front-end configuration file must include both a
- reference to the Zebra configuration file and the CQL-to-PQF
+ reference to the Zebra configuration file and the &cql;-to-&pqf;
translator configuration file.
</para>
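A minimal front-end configuration file of that shape might look as follows. This is a sketch only: the file names <literal>zebra.cfg</literal> and <literal>pqf.properties</literal> are illustrative.

```xml
<yazgfs>
  <server>
    <!-- the Zebra configuration file, as otherwise named by -c -->
    <config>zebra.cfg</config>
    <!-- the CQL-to-PQF translator configuration file -->
    <cql2rpn>pqf.properties</cql2rpn>
  </server>
</yazgfs>
```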
<para>
<literal>-c</literal>
command-line argument, and the
<literal><cql2rpn></literal>
- element contains the name of the CQL properties file specifying how
- various CQL indexes, relations, etc. are translated into Type-1
+ element contains the name of the &cql; properties file specifying how
+ various &cql; indexes, relations, etc. are translated into Type-1
queries.
</para>
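Such a properties file maps &cql; indexes and relations onto Type-1 (Bib-1) attributes. A fragment might look like this; the index names and attribute values are illustrative only:

```properties
# Default index used when the CQL query names none (Bib-1 use attribute)
index.cql.serverChoice = 1=1016
# A Dublin Core title index
index.dc.title = 1=4
# CQL relations map onto Bib-1 relation attributes (type 2)
relation.eq = 2=3
relation.< = 2=1
relation.> = 2=5
```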
<para>
A Zebra server running with such a configuration can then be
- queried using proper, conformant SRU URLs with CQL queries:
+ queried using proper, conformant &sru; URLs with &cql; queries:
</para>
<screen>
http://localhost:9999/Default?version=1.1
</refsect2>
</refsect1>
<refsect1 id="zebrasrv-sru-support">
- <title>SRU Protocol Support and Behavior</title>
+ <title>&sru; Protocol Support and Behavior</title>
<para>
- Zebra running as an SRU server supports SRU version 1.1, including
- CQL version 1.1. In particular, it provides support for the
+ Zebra running as an &sru; server supports &sru; version 1.1, including
+ &cql; version 1.1. In particular, it provides support for the
following elements of the protocol.
</para>
<refsect2 id="zebrasrvr-search-and-retrieval">
- <title>SRU Search and Retrieval</title>
+ <title>&sru; Search and Retrieval</title>
<para>
Zebra supports the
- <ulink url="&url.sru.searchretrieve;">SRU searchRetrieve</ulink>
+ <ulink url="&url.sru.searchretrieve;">&sru; searchRetrieve</ulink>
operation.
</para>
<para>
- One of the great strengths of SRU is that it mandates a standard
- query language, CQL, and that all conforming implementations can
+ One of the great strengths of &sru; is that it mandates a standard
+ query language, &cql;, and that all conforming implementations can
therefore be trusted to correctly interpret the same queries. It
is with some shame, then, that we admit that Zebra also supports
an additional query language, our own Prefix Query Format
- (<ulink url="&url.yaz.pqf;">PQF</ulink>).
- A PQF query is submitted by using the extension parameter
+ (<ulink url="&url.yaz.pqf;">&pqf;</ulink>).
+ A &pqf; query is submitted by using the extension parameter
<literal>x-pquery</literal>,
in which case the
<literal>query</literal>
- parameter must be omitted, which makes the request not valid SRU.
+ parameter must be omitted, which makes the request not valid &sru;.
Please feel free to use this facility within your own
- applications; but be aware that it is not only non-standard SRU
+ applications; but be aware that it is not only non-standard &sru;
but not even syntactically valid, since it omits the mandatory
<literal>query</literal> parameter.
</para>
</refsect2>
<refsect2 id="zebrasrv-sru-scan">
- <title>SRU Scan</title>
+ <title>&sru; Scan</title>
<para>
- Zebra supports <ulink url="&url.sru.scan;">SRU scan</ulink>
+ Zebra supports <ulink url="&url.sru.scan;">&sru; scan</ulink>
operation.
- Scanning using CQL syntax is the default, where the
+ Scanning using &cql; syntax is the default, where the
standard <literal>scanClause</literal> parameter is used.
</para>
<para>
In addition, a
- mutant form of SRU scan is supported, using
+ mutant form of &sru; scan is supported, using
the non-standard <literal>x-pScanClause</literal> parameter in
place of the standard <literal>scanClause</literal> to scan on a
- PQF query clause.
+ &pqf; query clause.
</para>
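As a sketch (again assuming a server on localhost:9999; the index name and &pqf; attribute are illustrative), the standard and the mutant scan requests differ only in the clause parameter:

```python
from urllib.parse import urlencode

# Assumed server address; index name and PQF attribute are illustrative.
base = "http://localhost:9999/"

# Standard SRU scan, using a CQL scan clause:
cql_scan = base + "?" + urlencode({
    "version": "1.1",
    "operation": "scan",
    "scanClause": "title=mineral",
})

# Non-standard variant, scanning on a PQF clause instead:
pqf_scan = base + "?" + urlencode({
    "version": "1.1",
    "operation": "scan",
    "x-pScanClause": "@attr 1=4 mineral",
})
print(cql_scan)
print(pqf_scan)
```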
</refsect2>
<refsect2 id="zebrasrv-sru-explain">
- <title>SRU Explain</title>
+ <title>&sru; Explain</title>
<para>
- Zebra supports <ulink url="&url.sru.explain;">SRU explain</ulink>.
+ Zebra supports <ulink url="&url.sru.explain;">&sru; explain</ulink>.
</para>
<para>
The ZeeRex record explaining a database may be requested either
- with a fully fledged SRU request (with
+ with a fully fledged &sru; request (with
<literal>operation</literal>=<literal>explain</literal>
and version-number specified)
- or with a simple HTTP GET at the server's basename.
+ or with a simple HTTP &get; at the server's basename.
The ZeeRex record returned in response is the one embedded
- in the YAZ Frontend Server configuration file that is described in the
+ in the &yaz; Frontend Server configuration file that is described in the
<xref linkend="gfs-config"/>.
</para>
<para>
Unfortunately, the data found in the
- CQL-to-PQF text file must be added by hand-craft into the explain
- section of the YAZ Frontend Server configuration file to be able
+ &cql;-to-&pqf; text file must be hand-crafted into the explain
+ section of the &yaz; Frontend Server configuration file in order
to provide a suitable explain record.
This is still early alpha functionality, and a lot of work
has yet to be done.
</para>
<para>
- There is no linkeage whatsoever between the Z39.50 explain model
- and the SRU explain response (well, at least not implemented
+ There is no linkage whatsoever between the &z3950; explain model
+ and the &sru; explain response (at least, none is implemented
in Zebra). Zebra does not provide a means of using
- Z39.50 to obtain the ZeeRex record.
+ &z3950; to obtain the ZeeRex record.
</para>
</refsect2>
<refsect2 id="zebrasrv-non-sru-ops">
- <title>Other SRU operations</title>
+ <title>Other &sru; operations</title>
<para>
- In the Z39.50 protocol, Initialization, Present, Sort and Close
- are separate operations. In SRU, however, these operations do not
+ In the &z3950; protocol, Initialization, Present, Sort and Close
+ are separate operations. In &sru;, however, these operations do not
exist.
</para>
<itemizedlist>
<listitem>
<para>
- SRU has no explicit initialization handshake phase, but
+ &sru; has no explicit initialization handshake phase, but
commences immediately with searching, scanning and explain
operations.
</para>
</listitem>
<listitem>
<para>
- Neither does SRU have a close operation, since the protocol is
+ Neither does &sru; have a close operation, since the protocol is
stateless and each request is self-contained. (It is true that
- multiple SRU request/response pairs may be implemented as
+ multiple &sru; request/response pairs may be implemented as
multiple HTTP request/response pairs over a single persistent
TCP/IP connection; but the closure of that connection is not a
protocol-level operation.)
</para>
</listitem>
<listitem>
<para>
- Retrieval in SRU is part of the
+ Retrieval in &sru; is part of the
<literal>searchRetrieve</literal> operation, in which a search
is submitted and the response includes a subset of the records
- in the result set. There is no direct analogue of Z39.50's
+ in the result set. There is no direct analogue of &z3950;'s
Present operation which requests records from an established
- result set. In SRU, this is achieved by sending a subsequent
+ result set. In &sru;, this is achieved by sending a subsequent
<literal>searchRetrieve</literal> request with the query
<literal>cql.resultSetId=</literal><emphasis>id</emphasis> where
<emphasis>id</emphasis> is the identifier of the previously
established result set.
</para>
</listitem>
<listitem>
<para>
- Sorting in CQL is done within the
+ Sorting in &cql; is done within the
<literal>searchRetrieve</literal> operation - in v1.1, by an
explicit <literal>sort</literal> parameter, but the forthcoming
v1.2 or v2.0 will most likely use an extension of the query
- language, <ulink url="&url.cql.sorting;">CQL sorting</ulink>.
+ language, <ulink url="&url.cql.sorting;">&cql; sorting</ulink>.
</para>
</listitem>
</itemizedlist>
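The result-set mechanism described above can be sketched as a pair of requests (the server address and the result-set identifier <literal>42</literal> are made up for illustration):

```python
from urllib.parse import urlencode

# Assumed server address.
base = "http://localhost:9999/"

# Initial search: creates a result set on the server.
first = base + "?" + urlencode({
    "version": "1.1",
    "operation": "searchRetrieve",
    "query": "mineral",
    "maximumRecords": "1",
})

# Follow-up: fetch more records from that result set by its
# identifier ("42" is a made-up id taken from the first response).
followup = base + "?" + urlencode({
    "version": "1.1",
    "operation": "searchRetrieve",
    "query": "cql.resultSetId=42",
    "startRecord": "2",
    "maximumRecords": "10",
})
print(first)
print(followup)
```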
<para>
- It can be seen, then, that while Zebra operating as an SRU server
+ It can be seen, then, that while Zebra operating as an &sru; server
does not provide the same set of operations as when operating as a
- Z39.50 server, it does provide equivalent functionality.
+ &z3950; server, it does provide equivalent functionality.
</para>
</refsect2>
</refsect1>
<refsect1 id="zebrasrv-sru-examples">
- <title>SRU Examples</title>
+ <title>&sru; Examples</title>
<para>
Surf into <literal>http://localhost:9999</literal>
to get an explain response, or use
]]></screen>
</para>
<para>
- Even search using PQF queries using the <emphasis>extended naughty
+ You can even search using &pqf; queries via the <emphasis>extended
naughty parameter</emphasis> <literal>x-pquery</literal>:
<screen><![CDATA[
http://localhost:9999/?version=1.1&operation=searchRetrieve
</refsect1>
- <refsect1 id="gfs-config"><title>YAZ server virtual hosts</title>
+ <refsect1 id="gfs-config"><title>&yaz; server virtual hosts</title>
&zebrasrv-virtual;
</refsect1>