<chapter id="administration">
- <!-- $Id: administration.xml,v 1.48 2007-01-17 13:31:36 marc Exp $ -->
- <title>Administrating Zebra</title>
+ <!-- $Id: administration.xml,v 1.49 2007-02-02 09:58:39 marc Exp $ -->
+ <title>Administrating &zebra;</title>
<!-- ### It's a bit daft that this chapter (which describes half of
the configuration-file formats) is separated from
"recordmodel-grs.xml" (which describes the other half) by the
-->
<para>
- Unlike many simpler retrieval systems, Zebra supports safe, incremental
+ Unlike many simpler retrieval systems, &zebra; supports safe, incremental
updates to an existing index.
</para>
<para>
- Normally, when Zebra modifies the index it reads a number of records
+ Normally, when &zebra; modifies the index it reads a number of records
that you specify.
Depending on your specifications and on the contents of each record
one of the following events takes place for each record:
<listitem>
<para>
The record is indexed as if it never occurred before.
- Either the Zebra system doesn't know how to identify the record or
- Zebra can identify the record but didn't find it to be already indexed.
+ Either the &zebra; system doesn't know how to identify the record or
+ &zebra; can identify the record but didn't find it to be already indexed.
</para>
</listitem>
</varlistentry>
</para>
<para>
- Please note that in both the modify- and delete- case the Zebra
+ Please note that in both the modify- and delete- case the &zebra;
indexer must be able to generate a unique key that identifies the record
in question (more on this below).
</para>
<para>
- To administrate the Zebra retrieval system, you run the
+ To administrate the &zebra; retrieval system, you run the
<literal>zebraidx</literal> program.
This program supports a number of options which are preceded by a dash,
and a few commands (not preceded by a dash).
</para>
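<para>
As a sketch of how options and commands combine (assuming a
configuration file named <literal>zebra.cfg</literal> in the current
directory and a hypothetical directory <literal>records</literal>
holding the input files), an invocation might look like this:
<screen>
zebraidx -c zebra.cfg update records
</screen>
Here <literal>-c</literal> is an option, while
<literal>update</literal> is a command that takes the directory as
its argument.
</para>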
<para>
- Both the Zebra administrative tool and the Z39.50 server share a
+ Both the &zebra; administrative tool and the Z39.50 server share a
set of index files and a global configuration file.
The name of the configuration file defaults to
<literal>zebra.cfg</literal>.
Indexing is a per-record process, in which an insert, modify, or delete
will occur. Before a record is indexed, search keys are extracted from
the original record, whatever its layout (SGML, HTML, text, etc.).
- The Zebra system currently supports two fundamental types of records:
+ The &zebra; system currently supports two fundamental types of records:
structured and simple text.
To specify a particular extraction process, use either the
command line option <literal>-t</literal> or specify a
</sect1>
<sect1 id="zebra-cfg">
- <title>The Zebra Configuration File</title>
+ <title>The &zebra; Configuration File</title>
<para>
- The Zebra configuration file, read by <literal>zebraidx</literal> and
+ The &zebra; configuration file, read by <literal>zebraidx</literal> and
<literal>zebrasrv</literal> defaults to <literal>zebra.cfg</literal>
unless specified by <literal>-c</literal> option.
</para>
<listitem>
<para>
Specifies whether the records should be stored internally
- in the Zebra system files.
+ in the &zebra; system files.
If you want to maintain the raw records yourself,
this option should be false (0).
- If you want Zebra to take care of the records for you, it
+ If you want &zebra; to take care of the records for you, it
should be true (1).
</para>
</listitem>
<term>register: <replaceable>register-location</replaceable></term>
<listitem>
<para>
- Specifies the location of the various register files that Zebra uses
+ Specifies the location of the various register files that &zebra; uses
to represent your databases.
See <xref linkend="register-location"/>.
</para>
<term>shadow: <replaceable>register-location</replaceable></term>
<listitem>
<para>
- Enables the <emphasis>safe update</emphasis> facility of Zebra, and
+ Enables the <emphasis>safe update</emphasis> facility of &zebra;, and
tells the system where to place the required, temporary files.
See <xref linkend="shadow-registers"/>.
</para>
<term>estimatehits: <replaceable>integer</replaceable></term>
<listitem>
<para>
- Controls whether Zebra should calculate approximite hit counts and
+ Controls whether &zebra; should calculate approximate hit counts and
at which hit count it is to be enabled.
A value of 0 disables approximate hit counts.
For a positive value approximate hit count is enabled
<term>root: <replaceable>dir</replaceable></term>
<listitem>
<para>
- Specifies a directory base for Zebra. All relative paths
+ Specifies a directory base for &zebra;. All relative paths
given (in profilePath, register, shadow) are based on this
- directory. This setting is useful if your Zebra server
+ directory. This setting is useful if your &zebra; server
is running in a different directory from where
<literal>zebra.cfg</literal> is located.
</para>
<term>passwd: <replaceable>file</replaceable></term>
<listitem>
<para>
- Specifies a file with description of user accounts for Zebra.
+ Specifies a file with description of user accounts for &zebra;.
The format is similar to that known to Apache's htpasswd files
and UNIX' passwd files. Non-empty lines not beginning with
# are considered account lines. There is one account per line.
<term>passwd.c: <replaceable>file</replaceable></term>
<listitem>
<para>
- Specifies a file with description of user accounts for Zebra.
+ Specifies a file with description of user accounts for &zebra;.
File format is similar to that used by the passwd directive except
that the passwords are encrypted. Use Apache's htpasswd or similar
for maintenance.
<listitem>
<para>
Specifies permissions (privileges) for a user who is allowed
- to access Zebra via the passwd system. There are two kinds
+ to access &zebra; via the passwd system. There are two kinds
of permissions currently: read (r) and write (w). By default
users not listed in a permission directive are given the read
privilege. To specify permissions for a user with no
<title>Locating Records</title>
<para>
- The default behavior of the Zebra system is to reference the
+ The default behavior of the &zebra; system is to reference the
records from their original location, i.e. where they were found when you
ran <literal>zebraidx</literal>.
That is, when a client wishes to retrieve a record
If your input files are not permanent - for example if you retrieve
your records from an outside source, or if they were temporarily
mounted on a CD-ROM drive,
- you may want Zebra to make an internal copy of them. To do this,
+ you may want &zebra; to make an internal copy of them. To do this,
you specify 1 (true) in the <literal>storeData</literal> setting. When
the Z39.50 server retrieves the records they will be read from the
internal file structures of the system.
To enable indexing with pathname IDs, you must specify
<literal>file</literal> as the value of <literal>recordId</literal>
in the configuration file. In addition, you should set
- <literal>storeKeys</literal> to <literal>1</literal>, since the Zebra
+ <literal>storeKeys</literal> to <literal>1</literal>, since the &zebra;
indexer must save additional information about the contents of each record
in order to modify the indexes correctly at a later time.
</para>
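<para>
A minimal sketch of the corresponding <literal>zebra.cfg</literal>
entries, using only the two settings named in the preceding
paragraph, might read:
<screen>
recordId: file
storeKeys: 1
</screen>
Any other settings your configuration needs are left out of this
fragment.
</para>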
<note>
<para>You cannot start out with a group of records with simple
indexing (no record IDs as in the previous section) and then later
- enable file record Ids. Zebra must know from the first time that you
+ enable file record IDs. &zebra; must know from the first time that you
index the group that
the files should be indexed with file record IDs.
</para>
</para>
<para>
- For instance, the sample GILS records that come with the Zebra
+ For instance, the sample GILS records that come with the &zebra;
distribution contain a unique ID in the data tagged Control-Identifier.
The data is mapped to the Bib-1 use attribute Identifier-standard
(code 1007). To use this field as a record id, specify
<literal>zebraidx</literal>. If you wish to store these, possibly large,
files somewhere else, you must add the <literal>register</literal>
entry to the <literal>zebra.cfg</literal> file.
- Furthermore, the Zebra system allows its file
+ Furthermore, the &zebra; system allows its file
structures to span multiple file systems, which is useful for
managing very large databases.
</para>
The <emphasis>dir</emphasis> specifies a directory in which index files
will be stored and the <emphasis>size</emphasis> specifies the maximum
- size of all files in that directory. The Zebra indexer system fills
+ size of all files in that directory. The &zebra; indexer system fills
each directory in the order specified and uses the next specified
directories as needed.
The <emphasis>size</emphasis> is an integer followed by a qualifier
</para>
<para>
- Note that Zebra does not verify that the amount of space specified is
+ Note that &zebra; does not verify that the amount of space specified is
actually available on the directory (file system) specified - it is
your responsibility to ensure that enough space is available, and that
other applications do not attempt to use the free space. In a large
production system, it is recommended that you allocate one or more
- file system exclusively to the Zebra register files.
+ file systems exclusively to the &zebra; register files.
</para>
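<para>
As an illustration only, a <literal>register</literal> entry that
spreads the register files over two file systems could look like the
following. The directory names and sizes are hypothetical, and the
<emphasis>dir</emphasis>:<emphasis>size</emphasis> form with
<literal>M</literal> and <literal>G</literal> qualifiers is assumed
here; check it against the description of the setting before relying
on it.
<screen>
register: /d1:500M /d2:1G
</screen>
With such a setting the indexer would be expected to fill
<filename>/d1</filename> up to the stated size before moving on to
<filename>/d2</filename>.
</para>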
</sect1>
<title>Description</title>
<para>
- The Zebra server supports <emphasis>updating</emphasis> of the index
+ The &zebra; server supports <emphasis>updating</emphasis> of the index
structures. That is, you can add, modify, or remove records from
- databases managed by Zebra without rebuilding the entire index.
+ databases managed by &zebra; without rebuilding the entire index.
Since this process involves modifying structured files with various
references between blocks of data in the files, the update process
is inherently sensitive to system crashes, or to process interruptions:
<para>
You can solve these problems by enabling the shadow register system in
- Zebra.
+ &zebra;.
During the updating procedure, <literal>zebraidx</literal> will temporarily
write changes to the involved files in a set of "shadow
files", without modifying the files that are accessed by the
<title>Overview</title>
<para>
The default ordering of a result set is left up to the server,
- which inside Zebra means sorting in ascending document ID order.
+ which inside &zebra; means sorting in ascending document ID order.
This is not always the order in which humans want to browse the sometimes
quite large hit sets. Ranking and sorting come to the rescue.
</para>
Simply put, <literal>dynamic relevance ranking</literal>
sorts a set of retrieved records such that those most likely to be
relevant to your request are retrieved first.
- Internally, Zebra retrieves all documents that satisfy your
+ Internally, &zebra; retrieves all documents that satisfy your
query, and re-orders the hit list to arrange them based on
a measurement of similarity between your query and the content of
each record.
<title>Static Ranking</title>
<para>
- Zebra uses internally inverted indexes to look up term occurencies
+ &zebra; internally uses inverted indexes to look up term occurrences
in documents. Multiple queries from different indexes can be
combined by the binary boolean operations <literal>AND</literal>,
<literal>OR</literal> and/or <literal>NOT</literal> (which
<screen>
staticrank: 1
</screen>
- directive in the main core Zebra configuration file, the internal document
+ directive in the main core &zebra; configuration file, the internal document
keys used for ordering are augmented by a preceding integer, which
contains the static rank of a given document, and the index lists
are ordered
algorithms, which only consider searching in one full-text
index, this one works on multiple indexes at the same time.
More precisely,
- Zebra does boolean queries and searches in specific addressed
+ &zebra; does boolean queries and searches in specific addressed
indexes (there are inverted indexes pointing from terms in the
dictionary to documents and term positions inside documents).
It works like this:
<sect2 id="administration-ranking-sorting">
<title>Sorting</title>
<para>
- Zebra sorts efficiently using special sorting indexes
+ &zebra; sorts efficiently using special sorting indexes
(type=<literal>s</literal>), so each sortable index must be known
at indexing time, specified in the configuration of record
indexing. For example, to enable sorting according to the BIB-1
<note>
<para>
- Extended services are only supported when accessing the Zebra
+ Extended services are only supported when accessing the &zebra;
server using the <ulink url="&url.z39.50;">Z39.50</ulink>
protocol. The <ulink url="&url.sru;">SRU</ulink> protocol does
not support extended services.
<para>
The extended services are not enabled by default in zebra - due to the
- fact that they modify the system. Zebra can be configured
+ fact that they modify the system. &zebra; can be configured
to allow anybody to
search, and to allow only updates for a particular admin user
in the main zebra configuration file <filename>zebra.cfg</filename>.
<screen>
admin:secret
</screen>
- It is essential to configure Zebra to store records internally,
+ It is essential to configure &zebra; to store records internally,
and to support
modifications and deletion of records:
<screen>
<note>
<para>
It is not possible to carry information about record types or
- similar to Zebra when using extended services, due to
+ similar to &zebra; when using extended services, due to
limitations of the <ulink url="&url.z39.50;">Z39.50</ulink>
protocol. Therefore, indexing filters cannot be chosen on a
per-record basis. One and only one general XML indexing filter
<row>
<entry><literal>recordIdNumber </literal></entry>
<entry><literal>positive number</literal></entry>
- <entry>Zebra's internal system number,
+ <entry>&zebra;'s internal system number,
not allowed for <literal>recordInsert</literal> or
<literal>specialUpdate</literal> actions which result in fresh
record inserts.
<para>
During all actions, the
usual rules for internal record ID generation apply, unless an
- optional <literal>recordIdNumber</literal> Zebra internal ID or a
+ optional <literal>recordIdNumber</literal> &zebra; internal ID or a
<literal>recordIdOpaque</literal> string identifier is assigned.
The default ID generation is
configured using the <literal>recordId:</literal> from
<para>
Setting of the <literal>recordIdNumber</literal> parameter,
- which must be an existing Zebra internal system ID number, is not
+ which must be an existing &zebra; internal system ID number, is not
allowed during any <literal>recordInsert</literal> or
<literal>specialUpdate</literal> action resulting in fresh record
inserts.
<para>
When retrieving existing
- records indexed with GRS indexing filters, the Zebra internal
+ records indexed with GRS indexing filters, the &zebra; internal
ID number is returned in the field
<literal>/*/id:idzebra/localnumber</literal> in the namespace
<literal>xmlns:id="http://www.indexdata.dk/zebra/"</literal>,
<para>
A new element set for retrieval of internal record
data has been added, which can be used to access minimal records
- containing only the <literal>recordIdNumber</literal> Zebra
+ containing only the <literal>recordIdNumber</literal> &zebra;
internal ID, or the <literal>recordIdOpaque</literal> string
identifier. This works for any indexing filter used.
See <xref linkend="special-retrieval"/>.
records. This identifier will
replace zebra's own automagic identifier generation with a unique
mapping from <literal>recordIdOpaque</literal> to the
- Zebra internal <literal>recordIdNumber</literal>.
+ &zebra; internal <literal>recordIdNumber</literal>.
<emphasis>The opaque <literal>recordIdOpaque</literal> string
identifiers
are not visible in retrieval records, nor are
searchable, so the value of this parameter is
questionable. It serves mostly as a convenient mapping from
- application domain string identifiers to Zebra internal ID's.
+ application domain string identifiers to &zebra; internal IDs.
</emphasis>
</para>
</sect2>
<chapter id="architecture">
- <!-- $Id: architecture.xml,v 1.18 2007-02-01 21:08:52 marc Exp $ -->
- <title>Overview of Zebra Architecture</title>
+ <!-- $Id: architecture.xml,v 1.19 2007-02-02 09:58:39 marc Exp $ -->
+ <title>Overview of &zebra; Architecture</title>
<section id="architecture-representation">
<title>Local Representation</title>
<para>
- As mentioned earlier, Zebra places few restrictions on the type of
+ As mentioned earlier, &zebra; places few restrictions on the type of
data that you can index and manage. Generally, whatever the form of
the data, it is parsed by an input filter specific to that format, and
- turned into an internal structure that Zebra knows how to handle. This
+ turned into an internal structure that &zebra; knows how to handle. This
process takes place whenever the record is accessed - for indexing and
retrieval.
</para>
<para>
The RecordType parameter in the <literal>zebra.cfg</literal> file, or
- the <literal>-t</literal> option to the indexer tells Zebra how to
+ the <literal>-t</literal> option to the indexer tells &zebra; how to
process input records.
Two basic types of processing are available - raw text and structured
data. Raw text is just that, and it is selected by providing the
- argument <emphasis>text</emphasis> to Zebra. Structured records are
+ argument <emphasis>text</emphasis> to &zebra;. Structured records are
all handled internally using the basic mechanisms described in the
subsequent sections.
- Zebra can read structured records in many different formats.
+ &zebra; can read structured records in many different formats.
<!--
How this is done is governed by additional parameters after the
"grs" keyword, separated by "." characters.
<section id="architecture-maincomponents">
<title>Main Components</title>
<para>
- The Zebra system is designed to support a wide range of data management
+ The &zebra; system is designed to support a wide range of data management
applications. The system can be configured to handle virtually any
kind of structured data. Each record in the system is associated with
a <emphasis>record schema</emphasis> which lends context to the data
one database, the system poses no such restrictions.
</para>
<para>
- The Zebra indexer and information retrieval server consists of the
+ The &zebra; indexer and information retrieval server consists of the
following main applications: the <command>zebraidx</command>
indexing maintenance utility, and the <command>zebrasrv</command>
information query and retrieval server. Both are using some of the
<para>
The virtual Debian package <literal>idzebra-2.0</literal>
installs all the necessary packages to start
- working with Zebra - including utility programs, development libraries,
+ working with &zebra; - including utility programs, development libraries,
documentation and modules.
</para>
<section id="componentcore">
- <title>Core Zebra Libraries Containing Common Functionality</title>
+ <title>Core &zebra; Libraries Containing Common Functionality</title>
<para>
- The core Zebra module is the meat of the <command>zebraidx</command>
+ The core &zebra; module is the meat of the <command>zebraidx</command>
indexing maintenance utility, and the <command>zebrasrv</command>
information query and retrieval server binaries. In short, the core
libraries are responsible for
<varlistentry>
<term>Index Maintenance</term>
<listitem>
- <para> Zebra maintains Term Dictionaries and ISAM index
+ <para> &zebra; maintains Term Dictionaries and ISAM index
entries in inverted index structures kept on disk. These are
optimized for fast insert, update and delete, as well as good
search performance.
</para>
<para>
The Debian package <literal>libidzebra-2.0</literal>
- contains all run-time libraries for Zebra, the
+ contains all run-time libraries for &zebra;, the
documentation in PDF and HTML is found in
<literal>idzebra-2.0-doc</literal>, and
<literal>idzebra-2.0-common</literal>
- includes common essential Zebra configuration files.
+ includes common essential &zebra; configuration files.
</para>
</section>
<section id="componentindexer">
- <title>Zebra Indexer</title>
+ <title>&zebra; Indexer</title>
<para>
The <command>zebraidx</command>
indexing maintenance utility
</section>
<section id="componentsearcher">
- <title>Zebra Searcher/Retriever</title>
+ <title>&zebra; Searcher/Retriever</title>
<para>
This is the executable which runs the Z39.50/SRU/SRW server and
glues together the core libraries and the filter modules to one
The YAZ server frontend is
a full-fledged stateful Z39.50 server taking client
connections, and forwarding search and scan requests to the
- Zebra core indexer.
+ &zebra; core indexer.
</para>
<para>
In addition to Z39.50 requests, the YAZ server frontend acts
The Alvis filter for XML files is an XSLT based input
filter.
It indexes element and attribute content of any conceivable XML format
- using full XPATH support, a feature which the standard Zebra
+ using full XPATH support, a feature which the standard &zebra;
GRS SGML and XML filters lacked. The indexed documents are
parsed into a standard XML DOM tree, which restricts record size
according to availability of memory.
<para>
The Alvis filter
uses XSLT display stylesheets, which let
- the Zebra DB administrator associate multiple, different views on
+ the &zebra; DB administrator associate multiple, different views on
the same XML document type. These views are chosen on the fly at
search time.
</para>
<literal>libidzebra-2.0-mod-grs-xml</literal> includes the
<emphasis>grs.xml</emphasis> filter which uses <ulink
url="&url.expat;">Expat</ulink> to
- parse records in XML and turn them into IDZebra's internal GRS node
+ parse records in XML and turn them into ID&zebra;'s internal GRS node
trees. See also the Alvis XML/XSLT filter described in
the next section.
</para>
</section>
<section id="special-retrieval">
- <title>Retrieval of Zebra internal record data</title>
+ <title>Retrieval of &zebra; internal record data</title>
<para>
- Starting with <literal>Zebra</literal> version 2.0.5 or newer, it is
+ Starting with <literal>&zebra;</literal> version 2.0.5, it is
possible to use a special element set which has the prefix
<literal>zebra::</literal>.
</para>
<para>
Using this element will, regardless of record type, return
- Zebra's internal index structure/data for a record.
+ &zebra;'s internal index structure/data for a record.
In particular, the regular record filters are not invoked when
these are in use.
This can in some cases make the retrieval faster than regular
<tbody>
<row>
<entry><literal>zebra::meta::sysno</literal></entry>
- <entry>Get Zebra record system ID</entry>
+ <entry>Get &zebra; record system ID</entry>
<entry>XML and SUTRS</entry>
</row>
<row>
</row>
<row>
<entry><literal>zebra::meta</literal></entry>
- <entry>Get Zebra record internal metadata</entry>
+ <entry>Get &zebra; record internal metadata</entry>
<entry>XML and SUTRS</entry>
</row>
<row>
specific transformations will be applied to the raw record data.
</para>
<para>
- Also, Zebra internal metadata about the record can be accessed:
+ Also, &zebra; internal metadata about the record can be accessed:
<screen>
Z> f @attr 1=title my
Z> format xml
<chapter id="examples">
- <!-- $Id: examples.xml,v 1.24 2006-09-22 12:34:45 adam Exp $ -->
+ <!-- $Id: examples.xml,v 1.25 2007-02-02 09:58:39 marc Exp $ -->
<title>Example Configurations</title>
<sect1 id="examples-overview">
option to specify an alternative master configuration file.
</para>
<para>
- The master configuration file tells Zebra:
+ The master configuration file tells &zebra;:
<itemizedlist>
<listitem>
<title>Example 1: XML Indexing And Searching</title>
<para>
- This example shows how Zebra can be used with absolutely minimal
+ This example shows how &zebra; can be used with absolutely minimal
configuration to index a body of
<ulink url="&url.xml;">XML</ulink>
documents, and search them using
would you? :-)
</para>
<para>
- Now we need to create a Zebra database to hold and index the XML
+ Now we need to create a &zebra; database to hold and index the XML
records. We do this with the
- Zebra indexer, <command>zebraidx</command>, which is
+ &zebra; indexer, <command>zebraidx</command>, which is
driven by the <literal>zebra.cfg</literal> configuration file.
For our purposes, we don't need any
special behaviour - we can use the defaults - so we can start with a
</screen>
</para>
<para>
- That's all you need for a minimal Zebra configuration. Now you can
+ That's all you need for a minimal &zebra; configuration. Now you can
roll the XML records into the database and build the indexes:
<screen>
zebraidx update records
<literal><Zthes></literal> element.
</para>
<para>
- This is a two-step process. First, we need to tell Zebra that we
+ This is a two-step process. First, we need to tell &zebra; that we
want to support the BIB-1 attribute set. Then we need to tell it
which elements of its record pertain to access point 4.
</para>
<callout arearefs="attset.attset">
<para>
Declare Bib-1 attribute set. See <filename>bib1.att</filename> in
- Zebra's <filename>tab</filename> directory.
+ &zebra;'s <filename>tab</filename> directory.
</para>
</callout>
<callout arearefs="termId">
by exporting a line-drawing done in TGIF, then converted that to the
GIF using a shell-script called "epstogif" which used an appallingly
baroque sequence of conversions, which I would prefer not to pollute
-the Zebra build environment with:
+the &zebra; build environment with:
#!/bin/sh
<chapter id="fields-and-charsets">
- <!-- $Id: field-structure.xml,v 1.11 2007-01-15 14:51:04 adam Exp $ -->
+ <!-- $Id: field-structure.xml,v 1.12 2007-02-02 09:58:39 marc Exp $ -->
<title>Field Structure and Character Sets
</title>
<para>
In order to provide a flexible approach to national character set
- handling, Zebra allows the administrator to configure the set up the
+ handling, &zebra; allows the administrator to set up the
system to handle any 8-bit character set — including sets that
require multi-octet diacritics or other multi-octet characters. The
definition of a character set includes a specification of the
<para>
The character map files are used to define the word tokenization
and character normalization performed before inserting text into
- the inverse indexes. Zebra ships with the predefined character map
+ the inverse indexes. &zebra; ships with the predefined character map
files <filename>tab/*.chr</filename>. Users are allowed to add
and/or modify maps according to their needs.
</para>
<table id="character-map-table" frame="top">
- <title>Character maps predefined in Zebra</title>
+ <title>Character maps predefined in &zebra;</title>
<tgroup cols="3">
<thead>
<row>
<para>
In addition to specifying sort orders, space (blank) handling,
and upper/lowercase folding, you can also use the character map
- files to make Zebra ignore leading articles in sorting records,
+ files to make &zebra; ignore leading articles in sorting records,
or when doing complete field searching.
</para>
<para>
<!ENTITY % common SYSTEM "common/common.ent">
%common;
]>
-<!-- $Id: idzebra-config.xml,v 1.2 2007-01-15 14:55:50 adam Exp $ -->
+<!-- $Id: idzebra-config.xml,v 1.3 2007-02-02 09:58:39 marc Exp $ -->
<refentry id="idzebra-config">
<refentryinfo>
<productname>zebra</productname>
<varlistentry>
<term>--modules</term>
<listitem><para>
- Return directory for Zebra modules.
+ Return directory for &zebra; modules.
</para></listitem>
</varlistentry>
<appendix id="indexdata">
- <!-- $Id: indexdata.xml,v 1.10 2006-09-22 12:34:45 adam Exp $ -->
- <title>About Index Data and the Zebra Server</title>
+ <!-- $Id: indexdata.xml,v 1.11 2007-02-02 09:58:39 marc Exp $ -->
+ <title>About Index Data and the &zebra; Server</title>
<para>
Index Data is a consulting and software-development enterprise that
-<!-- $Id: installation.xml,v 1.33 2007-01-09 09:11:22 adam Exp $ -->
+<!-- $Id: installation.xml,v 1.34 2007-02-02 09:58:39 marc Exp $ -->
<chapter id="installation">
<title>Installation</title>
<para>
- Zebra is written in ANSI C and was implemented with portability in mind.
+ &zebra; is written in ANSI C and was implemented with portability in mind.
We primarily use <ulink url="&url.gcc;">GCC</ulink> on UNIX and
<ulink url="&url.vstudio;">Microsoft Visual C++</ulink> on Windows.
</para>
</para>
<para>
- Zebra can be configured to use the following utilities (most of
+ &zebra; can be configured to use the following utilities (most of
which are optional):
<variablelist>
(required)</term>
<listitem>
<para>
- Zebra uses YAZ to support <ulink url="&url.z39.50;">Z39.50</ulink> /
+ &zebra; uses YAZ to support <ulink url="&url.z39.50;">Z39.50</ulink> /
<ulink url="&url.sru;">SRU</ulink>.
- Also the memory management utilites from YAZ is used by Zebra.
+ The memory management utilities from YAZ are also used by &zebra;.
</para>
</listitem>
</varlistentry>
<listitem>
<para>
Tcl is required if you need to use the Tcl record filter
- for Zebra. You can find binary packages for Tcl for many
+ for &zebra;. You can find binary packages for Tcl for many
Unices and Windows.
</para>
</listitem>
<listitem>
<para>
GNU Automake and Autoconf are only required if you're
- using the CVS version of Zebra. You do not need these
- if you have fetched a Zebra tar.
+ using the CVS version of &zebra;. You do not need these
+ if you have fetched a &zebra; tar.
</para>
</listitem>
</varlistentry>
<listitem>
<para>
These tools are only required if you're writing
- documentation for Zebra. You need the following
+ documentation for &zebra;. You need the following
Debian packages: docbook, docbook-xml, docbook-xsl,
docbook-utils, xsltproc.
</para>
shell script attempts to guess correct values for various
system-dependent variables used during compilation.
It uses those values to create a <literal>Makefile</literal> in each
- directory of Zebra.
+ directory of &zebra;.
</para>
<para>
<term><literal>index/*.so</literal></term>
<listitem>
<para>
- The <literal>.so</literal>-files are Zebra record filter modules.
+ The <literal>.so</literal>-files are &zebra; record filter modules.
There are modules for reading
MARC (<filename>mod-grs-marc.so</filename>),
XML (<filename>mod-grs-xml.so</filename>) , etc.
<note>
<para>
Using configure option <literal>--disable-shared</literal> builds
- Zebra statically and links "in" Zebra filter code statically, i.e.
+ &zebra; statically and links "in" &zebra; filter code statically, i.e.
no <literal>.so</literal>-files are generated.
</para>
</note>
<para>
- You can now use Zebra. If you wish to install it system-wide, then
+ You can now use &zebra;. If you wish to install it system-wide, then
as root type
<screen>
make install
</screen>
- By default this will install the Zebra executables in
+ By default this will install the &zebra; executables in
<filename>/usr/local/bin</filename>,
and the standard configuration files in
<filename>/usr/local/share/idzebra-2.0</filename>. If
apt-get update
</screen>
as <literal>root</literal>, the
- <ulink url="&url.idzebra;">Zebra</ulink> indexer is
+ <ulink url="&url.idzebra;">&zebra;</ulink> indexer is
easily installed issuing
<screen>
apt-get install idzebra-2.0 idzebra-2.0-doc
<section id="installation-debia-nother">
<title>Ubuntu/Debian and GNU/Debian on other platforms</title>
<para>
- These <ulink url="&url.idzebra;">Zebra</ulink>
+ These <ulink url="&url.idzebra;">&zebra;</ulink>
packages are specifically compiled for
GNU/Debian Linux systems. Installation on other
GNU/Debian systems is possible by
apt-get build-dep idzebra-2.0
</screen>
as <literal>root</literal>, the
- <ulink url="&url.idzebra;">Zebra</ulink> indexer is
+ <ulink url="&url.idzebra;">&zebra;</ulink> indexer is
recompiled and installed issuing
<screen>
fakeroot apt-get source --compile idzebra-2.0
</section>
<section id="installation-win32"><title>WIN32</title>
- <para>The easiest way to install Zebra on Windows is by downloading
+ <para>The easiest way to install &zebra; on Windows is by downloading
an installer from
<ulink url="&url.idzebra.download.win32;">here</ulink>.
The installer comes with source too - in case you wish to
- compile Zebra with different Compiler options.
+ compile &zebra; with different Compiler options.
</para>
<para>
- Zebra is shipped with "makefiles" for the NMAKE tool that comes
+ &zebra; is shipped with "makefiles" for the NMAKE tool that comes
with <ulink url="&url.vstudio;">Microsoft Visual C++</ulink>.
Versions 2003 and 2005 have been tested. We expect that Zebra compiles
with version 6 as well.
<varlistentry>
<term><literal>YAZDIR</literal></term>
<listitem><para>
- Directory of YAZ source. Zebra's makefile expects to find
+ Directory of YAZ source. &zebra;'s makefile expects to find
<filename>yaz.lib</filename>, <filename>yaz.dll</filename>
in <replaceable>yazdir</replaceable><literal>/lib</literal> and
<replaceable>yazdir</replaceable><literal>/bin</literal> respectively.
<term><literal>HAVE_EXPAT</literal>,
<literal>EXPAT_DIR</literal></term>
<listitem><para>
- If <literal>HAVE_EXPAT</literal> is set to 1, Zebra is compiled
+ If <literal>HAVE_EXPAT</literal> is set to 1, &zebra; is compiled
with <ulink url="&url.expat;">Expat</ulink> support.
In this configuration, set
<literal>EXPAT_DIR</literal> to the Expat source directory.
<term><literal>HAVE_ICONV</literal>,
<literal>ICONV_DIR</literal></term>
<listitem><para>
- If <literal>HAVE_ICONV</literal> is set to 1, Zebra is compiled
+ If <literal>HAVE_ICONV</literal> is set to 1, &zebra; is compiled
with iconv support. In this configuration, set
<literal>ICONV_DIR</literal> to the iconv source directory.
Iconv binaries can be downloaded from
<literal>BZIP2DEF</literal>
</term>
<listitem><para>
- Define these symbols if Zebra is to be compiled with
+ Define these symbols if &zebra; is to be compiled with
<ulink url="&url.bzip2;">BZIP2</ulink> record compression support.
</para></listitem>
</varlistentry>
</para>
<warning>
<para>
- The <literal>DEBUG</literal> setting in the makefile for Zebra must
+ The <literal>DEBUG</literal> setting in the makefile for &zebra; must
be set to the same value as <literal>DEBUG</literal> setting in the
makefile for YAZ.
- If not, the Zebra server/indexer will crash.
+ If not, the &zebra; server/indexer will crash.
</para>
</warning>
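<para>
The consistency requirement in the warning above can be checked
mechanically before building. The following is a minimal sketch, not part
of the &zebra; distribution: the helper names are hypothetical, and it
assumes that both makefiles assign the setting on a line of the form
<literal>DEBUG=value</literal>.
</para>

```python
import re

# Hypothetical helper, not shipped with Zebra or YAZ.
# Assumption: each makefile sets the flag on a line such as "DEBUG=0".
_DEBUG_LINE = re.compile(r"^\s*DEBUG\s*=\s*(\S+)", re.MULTILINE)


def debug_setting(makefile_text):
    """Return the value of the first DEBUG assignment, or None if absent."""
    match = _DEBUG_LINE.search(makefile_text)
    return match.group(1) if match else None


def debug_settings_agree(zebra_makefile_text, yaz_makefile_text):
    """True only when both makefiles define DEBUG with the same value."""
    zebra_debug = debug_setting(zebra_makefile_text)
    yaz_debug = debug_setting(yaz_makefile_text)
    return zebra_debug is not None and zebra_debug == yaz_debug
```

<para>
The functions take the makefile contents as strings, so the caller decides
where the two files are read from.
</para>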
<para>
</para>
</note>
<para>
- If you wish to recompile Zebra - for example if you modify
+ If you wish to recompile &zebra; - for example if you modify
settings in the <filename>makefile</filename> - you can delete
object files etc. by running:
<screen>
<variablelist>
<varlistentry><term><filename>bin/zebraidx.exe</filename></term>
<listitem><para>
- The Zebra indexer.
+ The &zebra; indexer.
</para></listitem></varlistentry>
<varlistentry><term><filename>bin/zebrasrv.exe</filename></term>
<listitem><para>
- The Zebra server.
+ The &zebra; server.
</para></listitem></varlistentry>
</variablelist>
<section id="installation-upgrade">
- <title>Upgrading from Zebra version 1.3.x</title>
+ <title>Upgrading from &zebra; version 1.3.x</title>
<para>
- Zebra's installation directories have changed a bit. In addition,
+ &zebra;'s installation directories have changed a bit. In addition,
the new loadable modules must be defined in the
master <filename>zebra.cfg</filename> configuration file. The old
version 1.3.x configuration options
</para>
<note>
<para>
- The internal binary register structures have changed; all Zebra
+ The internal binary register structures have changed; all &zebra;
databases must be re-indexed after upgrade.
</para>
</note>
<chapter id="introduction">
- <!-- $Id: introduction.xml,v 1.41 2007-02-01 20:49:05 marc Exp $ -->
+ <!-- $Id: introduction.xml,v 1.42 2007-02-02 09:58:39 marc Exp $ -->
<title>Introduction</title>
<section id="overview">
<title>Overview</title>
+ <para>
+ &zebra; is a free, fast, friendly information management system. It can
+ index records in XML/SGML, MARC, e-mail archives and many other
+ formats, and quickly find them using a combination of boolean
+ searching and relevance ranking. Search-and-retrieve applications can
+ be written using APIs in a wide variety of languages, communicating
+ with the &zebra; server using industry-standard information-retrieval
+ protocols or web services.
+ </para>
+ <para>
+ &zebra; is licensed Open Source, and can be
+ deployed by anyone for any purpose without license fees. The C source
+ code is open to anybody to read and change under the GPL license.
+ </para>
+ <para>
+ &zebra; is a networked component which acts as a reliable &z3950; server
+ for record/document search, presentation, insert, update and
+ delete operations. In addition, it understands the &sru; family of
+ web services, which exist in REST GET/POST and true SOAP flavors.
+ </para>
+ <para>
+ &zebra; is available as MS Windows 2003 Server (32 bit) self-extracting
+ package as well as GNU/Debian Linux (32 bit and 64 bit) precompiled
+ packages. It has been deployed successfully on other Unix systems,
+ including Sun Sparc, HP Unix, and many variants of Linux and BSD
+ based systems.
+ </para>
+ <para>
+ <ulink url="http://www.indexdata.com/zebra/">http://www.indexdata.com/zebra/</ulink>
+ <ulink url="http://ftp.indexdata.dk/pub/zebra/win32/">http://ftp.indexdata.dk/pub/zebra/win32/</ulink>
+ <ulink url="http://ftp.indexdata.dk/pub/zebra/debian/">http://ftp.indexdata.dk/pub/zebra/debian/</ulink>
+ </para>
+
<para>
- <ulink url="http://indexdata.dk/zebra/">Zebra</ulink>
+ <ulink url="http://indexdata.dk/zebra/">&zebra;</ulink>
is a high-performance, general-purpose structured text
indexing and retrieval engine. It reads records in a
variety of input formats (e.g. email, XML, MARC) and provides access
</para>
<para>
- Zebra supports large databases (tens of millions of records,
+ &zebra; supports large databases (tens of millions of records,
tens of gigabytes of data). It allows safe, incremental
- database updates on live systems. Because Zebra supports
+ database updates on live systems. Because &zebra; supports
the industry-standard information retrieval protocol, Z39.50,
- you can search Zebra databases using an enormous variety of
+ you can search &zebra; databases using an enormous variety of
programs and toolkits, both commercial and free, which understand
this protocol. Application libraries are available to allow
bespoke clients to be written in Perl, C, C++, Java, Tcl, Visual
</para>
<para>
- This document is an introduction to the Zebra system. It explains
+ This document is an introduction to the &zebra; system. It explains
how to compile the software, how to prepare your first database,
and how to configure the server to give you the
functionality that you need.
</section>
<section id="features">
- <title>Zebra Features Overview</title>
+ <title>&zebra; Features Overview</title>
<table id="table-features-overview" frame="top">
- <title>Zebra Features Overview</title>
+ <title>&zebra; Features Overview</title>
<tgroup cols="4">
<thead>
<row>
<entry>Document storage</entry>
<entry>Index-only, Key storage, Document storage</entry>
<entry>Data can be, and usually is, imported
- into Zebra's own storage, but Zebra can also refer to
+ into &zebra;'s own storage, but &zebra; can also refer to
external files, building and maintaining indexes of "live"
collections.</entry>
<entry><xref linkend=""/></entry>
<row>
<entry>Supported Platforms</entry>
<entry>UNIX, Linux, Windows (NT/2000/2003/XP)</entry>
- <entry>Zebra is written in portable C, so it runs on most
+ <entry>&zebra; is written in portable C, so it runs on most
Unix-like systems as well as Windows (NT/2000/2003/XP). Binary
distributions are
available for GNU/Debian Linux and Windows</entry>
</section>
<section id="introduction-apps">
- <title>References and Zebra based Applications</title>
+ <title>References and &zebra; based Applications</title>
<para>
- Zebra has been deployed in numerous applications, in both the
+ &zebra; has been deployed in numerous applications, in both the
academic and commercial worlds, in application domains as diverse
as bibliographic catalogues, geospatial information, structured
vocabulary browsing, government information locators, civic
<para>
<ulink url="http://liblime.com/">LibLime</ulink>,
a company that is marketing and supporting Koha, adds in
- the new release of Koha 3.0 the Zebra
+ the new release of Koha 3.0 the &zebra;
database server to drive its bibliographic database.
</para>
<para>
in the Koha 2.x series. After extensive evaluations of the best
of the Open Source textual database engines - including MySQL
full-text searching, PostgreSQL, Lucene and Plucene - the team
- selected Zebra.
+ selected &zebra;.
</para>
<para>
- "Zebra completely eliminates scalability limitations, because it
+ "&zebra; completely eliminates scalability limitations, because it
can support tens of millions of records." explained Joshua
Ferraro, LibLime's Technology President and Koha's Project
Release Manager. "Our performance tests showed search results in
modest i386 900Mhz test server."
</para>
<para>
- "Zebra also includes support for true boolean search expressions
+ "&zebra; also includes support for true boolean search expressions
and relevance-ranked free-text queries, both of which the Koha
- 2.x series lack. Zebra also supports incremental and safe
+ 2.x series lack. &zebra; also supports incremental and safe
database updates, which allow on-the-fly record
- management. Finally, since Zebra has at its heart the Z39.50
+ management. Finally, since &zebra; has at its heart the Z39.50
protocol, it greatly improves Koha's support for that critical
library standard."
</para>
<para>
- Although the bibliographic database will be moved to Zebra, Koha
+ Although the bibliographic database will be moved to &zebra;, Koha
3.0 will continue to use a relational SQL-based database design
for the 'factual' database. "Relational database managers have
their strengths, in spite of their inability to handle large
</para>
<para>
As a bonus, 100% MARC compatibility has been achieved using the
- Zebra Server from Index Data as backend server.
+ &zebra; Server from Index Data as backend server.
</para>
</section>
UTF8-encoding.
</para>
<para>
- Reindex.net runs on GNU/Debian Linux with Zebra and Simpleserver
+ Reindex.net runs on GNU/Debian Linux with &zebra; and Simpleserver
from Index
Data for bibliographic data. The relational database system
Sybase 9 XML is used for
bioinformatics.
</para>
<para>
- The Zebra information retrieval indexing machine is used inside
+ The &zebra; information retrieval indexing machine is used inside
the Alvis framework to
manage huge collections of natural language processed and
enhanced XML data, coming from a topic relevant web crawl.
- In this application, Zebra swallows and manages 37GB of XML data
+ In this application, &zebra; swallows and manages 37GB of XML data
in about 4 hours, resulting in search times of fractions of
seconds.
</para>
The member libraries send in data files representing their
periodicals, including both brief bibliographic data and summary
holdings. Then 21 individual Z39.50 targets are created, each
- using Zebra, and all mounted on the single hardware server.
+ using &zebra;, and all mounted on the single hardware server.
The live service provides a web gateway allowing Z39.50 searching
- of all of the targets or a selection of them. Zebra's small
+ of all of the targets or a selection of them. &zebra;'s small
footprint allows a relatively modest system to comfortably host
the 21 servers.
</para>
<!-- <ulink
url="http://ki212.fernuni-hagen.de/nli/NLIintro.html"/> -->
In order to evaluate this interface for recall and precision, they
- chose Zebra as the basis for retrieval effectiveness. The Zebra
+ chose &zebra; as the basis for retrieval effectiveness. The &zebra;
server contains a copy of the GIRT database, consisting of more
than 76000 records in SGML format (bibliographic records from
social science), which are mapped to MARC for presentation.
<section id="various-web-indexes">
<title>Various web indexes</title>
<para>
- Zebra has been used by a variety of institutions to construct
+ &zebra; has been used by a variety of institutions to construct
indexes of large web sites, typically in the region of tens of
millions of pages. In this role, it functions somewhat similarly
to the engine of Google or AltaVista, but for a selected intranet
For example, Liverpool University's web-search facility (see on
the home page at
<ulink url="http://www.liv.ac.uk/"/>
- and many sub-pages) works by relevance-searching a Zebra database
+ and many sub-pages) works by relevance-searching a &zebra; database
which is populated by the Harvest-NG web-crawling software.
</para>
<para>
</para>
<para>
Kang-Jin Lee
- has recently modified the Harvest web indexer to use Zebra as
+ has recently modified the Harvest web indexer to use &zebra; as
its native repository engine. His comments on the switch over
from the old engine are revealing:
<blockquote>
<para>
- The first results after some testing with Zebra are very
+ The first results after some testing with &zebra; are very
promising. The tests were done with around 220,000 SOIF files,
which occupies 1.6GB of disk space.
</para>
<para>
- Building the index from scratch takes around one hour with Zebra
+ Building the index from scratch takes around one hour with &zebra;
where [old-engine] needs around five hours. While [old-engine]
- blocks search requests when updating its index, Zebra can still
+ blocks search requests when updating its index, &zebra; can still
answer search requests.
[...]
- Zebra supports incremental indexing which will speed up indexing
+ &zebra; supports incremental indexing which will speed up indexing
even further.
</para>
<para>
While the search time of [old-engine] varies from some seconds
- to some minutes depending how expensive the query is, Zebra
+ to some minutes depending how expensive the query is, &zebra;
usually takes around one to three seconds, even for expensive
queries.
[...]
- Zebra can search more than 100 times faster than [old-engine]
+ &zebra; can search more than 100 times faster than [old-engine]
and can process multiple search requests simultaneously
</para>
<para>
<section id="introduction-support">
<title>Support</title>
<para>
- You can get support for Zebra from at least three sources.
+ You can get support for &zebra; from at least three sources.
</para>
<para>
- First, there's the Zebra web site at
+ First, there's the &zebra; web site at
<ulink url="&url.idzebra;"/>,
which always has the most recent version available for download.
- If you have a problem with Zebra, the first thing to do is see
+ If you have a problem with &zebra;, the first thing to do is see
whether it's fixed in the current release.
</para>
<para>
- Second, there's the Zebra mailing list. Its home page at
+ Second, there's the &zebra; mailing list. Its home page at
<ulink url="&url.idzebra.mailinglist;"/>
includes a complete archive of all messages that have ever been
- posted on the list. The Zebra mailing list is used both for
+ posted on the list. The &zebra; mailing list is used both for
announcements from the authors (new
releases, bug fixes, etc.) and general discussion. You are welcome
to seek support there. Join by filling in the form on the list home page.
<listitem>
<para>
Improved support for XML in search and retrieval. Eventually,
- the goal is for Zebra to pull double duty as a flexible
+ the goal is for &zebra; to pull double duty as a flexible
information retrieval engine and high-performance XML
repository. The recent addition of XPath searching is one
example of the kind of enhancement we're working on.
on this filter has been sponsored by the ALVIS EU project
<ulink url="http://www.alvis.info/alvis/"/>. We expect this filter to
mature soon, as it is planned to be included in the version 2.0
- release of Zebra.
+ release of &zebra;.
</para>
</listitem>
<listitem>
<para>
- Finalisation and documentation of Zebra's C programming
+ Finalisation and documentation of &zebra;'s C programming
API, allowing updates, database management and other functions
not readily expressed in Z39.50. We will also consider
exposing the API through SOAP.
<appendix id="license">
- <!-- $Id: license.xml,v 1.14 2007-01-15 15:10:16 adam Exp $ -->
+ <!-- $Id: license.xml,v 1.15 2007-02-02 09:58:39 marc Exp $ -->
<title>License</title>
<para>
- Zebra Server,
+ &zebra; Server,
Copyright © 1995-2007 Index Data ApS.
</para>
<para>
- Zebra is free software; you can redistribute it and/or modify it under
+ &zebra; is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2, or (at your option) any later
version.
</para>
<para>
- Zebra is distributed in the hope that it will be useful, but WITHOUT ANY
+ &zebra; is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.
<para>
You should have received a copy of the GNU General Public License
- along with Zebra; see the file LICENSE.zebra. If not, write to the
+ along with &zebra;; see the file LICENSE.zebra. If not, write to the
Free Software Foundation,
51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
</para>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
-<!-- $Id: marc_indexing.xml,v 1.3 2006-02-15 11:07:47 marc Exp $ -->
+<!-- $Id: marc_indexing.xml,v 1.4 2007-02-02 09:58:39 marc Exp $ -->
<book id="marc_indexing">
<bookinfo>
- <title>Indexing of MARC records by Zebra</title>
+ <title>Indexing of MARC records by &zebra;</title>
<abstract>
- <simpara>Zebra is suitable for distribution of MARC records via Z39.50. We
+ <simpara>&zebra; is suitable for distribution of MARC records via Z39.50. We
have several possibilities to describe the indexing process of MARC records.
This document shows these possibilities.
</simpara>
<para>At the beginning, we have to define the term <emphasis>index-formula</emphasis>
for MARC records. This term helps to understand the notation of extended indexing of MARC records
-by Zebra. Our definition is based on the document <ulink url="http://www.rba.ru/rusmarc/soft/Z39-50.htm">"The
+by &zebra;. Our definition is based on the document <ulink url="http://www.rba.ru/rusmarc/soft/Z39-50.htm">"The
table of conformity for Z39.50 use attributes and RUSMARC fields"</ulink>.
The document is available only in Russian.</para>
71-00$a, $g, $h ($c){.$b ($c)} , (1)
</screen>
-<para>We know that Zebra supports a Bib-1 attribute - right truncation.
+<para>We know that &zebra; supports a Bib-1 attribute - right truncation.
In this case, the <emphasis>index-formula</emphasis> (1) consists of
forms defined in the same way as (1).</para>
</sect1>
<sect1 id="notation">
-<title>Notation of <emphasis>index-formula</emphasis> for Zebra</title>
+<title>Notation of <emphasis>index-formula</emphasis> for &zebra;</title>
<para>Extended indexing overloads the <literal>path</literal> of the
-<literal>elm</literal> definition in the abstract syntax file of Zebra
+<literal>elm</literal> definition in the abstract syntax file of &zebra;
(<literal>.abs</literal> file). It means that names beginning with
-<literal>"mc-"</literal> are interpreted by Zebra as
+<literal>"mc-"</literal> are interpreted by &zebra; as
<emphasis>index-formula</emphasis>. The database index is created and
linked with <emphasis>access point</emphasis> (Bib-1 use attribute)
according to this formula.</para>
elm 70._1_$a,_$g_ Author !:w,!:p
</screen>
-<para>When Zebra finds a field matching the <literal>"70."</literal> pattern, it checks
+<para>When &zebra; finds a field matching the <literal>"70."</literal> pattern, it checks
the indicators. In this case the value of the first indicator doesn't matter, but
the value of the second one must be whitespace; otherwise the field is not
indexed.</para>
<chapter id="querymodel">
- <!-- $Id: querymodel.xml,v 1.28 2007-01-17 12:59:38 adam Exp $ -->
+ <!-- $Id: querymodel.xml,v 1.29 2007-02-02 09:58:39 marc Exp $ -->
<title>Query Model</title>
<section id="querymodel-overview">
<title>Query Languages</title>
<para>
- Zebra was born as a networking Information Retrieval engine adhering
+ &zebra; was born as a networking Information Retrieval engine adhering
to the international standards
<ulink url="&url.z39.50;">Z39.50</ulink> and
<ulink url="&url.sru;">SRU</ulink>,
<emphasis>Prefix Query Notation</emphasis>, or in short
PQN. See
<xref linkend="querymodel-rpn"/> for further explanations and
- descriptions of Zebra's capabilities.
+ descriptions of &zebra;'s capabilities.
</para>
</section>
<ulink url="&url.cql;">CQL</ulink> is not natively supported.
</para>
<para>
- Zebra can be configured to understand and map CQL to PQF. See
+ &zebra; can be configured to understand and map CQL to PQF. See
<xref linkend="querymodel-cql-to-pqf"/>.
</para>
</section>
<section id="querymodel-operation-types">
<title>Operation types</title>
<para>
- Zebra supports all of the three different
+ &zebra; supports all of the three different
Z39.50/SRU operations defined in the
standards: explain, search,
and scan. A short description of the
The <ulink url="&url.yaz.pqf;">PQF grammar</ulink>
is documented in the YAZ manual, and shall not be
repeated here. This textual PQF representation
- is not transmitted to Zebra during search; instead, the
+ is not transmitted to &zebra; during search; instead, the
client maps it to the equivalent Z39.50 binary
query parse tree.
</para>
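<para>
To illustrate what mapping a textual query to a parse tree means, the
following minimal sketch parses a small subset of PQF into nested
dictionaries. It is an illustration only: <literal>parse_pqf</literal> is a
hypothetical name, the dictionary layout is not the Z39.50 binary encoding,
and only <literal>@and</literal>, <literal>@or</literal>,
<literal>@not</literal> and <literal>@attr type=value</literal> (without an
attribute set name) are handled.
</para>

```python
import shlex

# Boolean operators of the simplified subset; each takes two operands.
OPERATORS = {"@and", "@or", "@not"}


def parse_pqf(text):
    """Parse a simplified PQF string into a tree of nested dicts."""
    tokens = shlex.split(text)  # honours double-quoted multi-word terms
    tree, rest = _parse(tokens, {})
    if rest:
        raise ValueError("trailing tokens: %r" % (rest,))
    return tree


def _parse(tokens, attrs):
    """Parse one query from the token list; return (node, unused tokens)."""
    if not tokens:
        raise ValueError("unexpected end of query")
    head, rest = tokens[0], tokens[1:]
    if head in OPERATORS:
        left, rest = _parse(rest, attrs)
        right, rest = _parse(rest, attrs)
        return {"op": head[1:], "left": left, "right": right}, rest
    if head == "@attr":
        if not rest or "=" not in rest[0]:
            raise ValueError("@attr must be followed by type=value")
        attr_type, value = rest[0].split("=", 1)
        # The attribute applies to the whole query that follows it.
        return _parse(rest[1:], {**attrs, int(attr_type): value})
    # Anything else is a search term: a leaf node carrying its attributes.
    return {"term": head, "attrs": dict(attrs)}, rest
```

<para>
With this sketch, <literal>parse_pqf('@and @attr 1=4 computer @attr 1=1003 "Jones, K"')</literal>
yields an <literal>and</literal> node whose two leaves carry the use
attributes 4 and 1003 respectively.
</para>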
<title>Attribute sets</title>
<para>
Attribute sets define the exact meaning and semantics of queries
- issued. Zebra comes with some predefined attribute set
+ issued. &zebra; comes with some predefined attribute set
definitions; others can easily be defined and added to the
configuration.
</para>
<table id="querymodel-attribute-sets-table" frame="top">
- <title>Attribute sets predefined in Zebra</title>
+ <title>Attribute sets predefined in &zebra;</title>
<tgroup cols="4">
<thead>
<row>
<entry>Standard PQF query language attribute set which defines the
semantics of Z39.50 searching. In addition, all of the
non-use attributes (types 2-12) define the hard-wired
- Zebra internal query
+ &zebra; internal query
processing.</entry>
<entry>default</entry>
</row>
<note>
<para>
- The Zebra internal query processing is modeled after
+ The &zebra; internal query processing is modeled after
the Bib-1 attribute set, and the non-use
attributes type 2-6 are hard-wired in. It is therefore essential
to be familiar with <xref linkend="querymodel-bib1-nonuse"/>.
<para>
Atomic (APT) queries are always leaf nodes in the PQF query tree.
Unsupplied non-use attributes (types 2-12) are either inherited from
- higher nodes in the query tree, or are set to Zebra's default values.
+ higher nodes in the query tree, or are set to &zebra;'s default values.
See <xref linkend="querymodel-bib1"/> for details.
</para>
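<para>
The precedence described above (attributes given on the node itself, then
attributes inherited from higher nodes, then defaults) can be sketched as a
simple dictionary merge. The function name and the defaults table are
assumptions made for illustration; the only default taken from this manual
is completeness (type 6) being
<literal>Incomplete subfield (1)</literal>.
</para>

```python
# Illustrative defaults table keyed by attribute type.
# Only type 6 (completeness) = 1 is taken from the manual text;
# a real table would list the other non-use attribute types as well.
EXAMPLE_DEFAULTS = {6: 1}


def resolve_attributes(own, inherited, defaults):
    """Return the effective attributes for one leaf of the query tree.

    own       -- attributes given directly on the leaf
    inherited -- attributes collected from higher nodes in the tree
    defaults  -- fallback values used when nothing else is supplied
    """
    resolved = dict(defaults)   # lowest precedence
    resolved.update(inherited)  # overrides the defaults
    resolved.update(own)        # highest precedence
    return resolved
```

<para>
For example, a leaf with its own structure attribute (type 4) keeps it,
even if a higher node supplied a different structure value, while a
relation attribute (type 2) given only on the higher node is inherited.
</para>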
<entry>List of <emphasis>orthogonal</emphasis> attributes</entry>
<entry>Any of the orthogonal attribute types may be omitted;
these are inherited from higher query tree nodes, or if not
- inherited, are set to the default Zebra configuration values.
+ inherited, are set to the default &zebra; configuration values.
</entry>
</row>
<row>
<section id="querymodel-resultset">
<title>Named Result Sets</title>
<para>
- Named result sets are supported in Zebra, and result sets can be
+ Named result sets are supported in &zebra;, and result sets can be
used as operands without limitations. It follows that named
result sets are leaf nodes in the PQF query tree, exactly as
atomic APT queries are.
<para>
Named result sets are only supported by the Z39.50 protocol.
The SRU web service is stateless, and therefore the notion of
- named result sets does not exist when accessing a Zebra server by
+ named result sets does not exist when accessing a &zebra; server by
the SRU protocol.
</para>
</note>
</section>
<section id="querymodel-use-string">
- <title>Zebra's special access point of type 'string'</title>
+ <title>&zebra;'s special access point of type 'string'</title>
<para>
The numeric <emphasis>use (type 1)</emphasis> attribute is usually
referred to from a given
- attribute set. In addition, Zebra lets you use
+ attribute set. In addition, &zebra; lets you use
<emphasis>any internal index
name defined in your configuration</emphasis>
as use attribute value. This is a great feature for
debugging, and when you do
not need the complexity of defined use attribute values. It is
- the preferred way of accessing Zebra indexes directly.
+ the preferred way of accessing &zebra; indexes directly.
</para>
<para>
Finding all documents which have the term list "information
- retrieval" in a Zebra index, using its internal full string
+ retrieval" in a &zebra; index, using its internal full string
name. Scanning the same index.
<screen>
Z> find @attr 1=sometext "information retrieval"
</section>
<section id="querymodel-use-xpath">
- <title>Zebra's special access point of type 'XPath'
+ <title>&zebra;'s special access point of type 'XPath'
for GRS filters</title>
<para>
As we have seen above, it is possible (albeit seldom a great
<ulink url="&url.z39.50.explain;">Explain</ulink> attribute set
Exp-1, which is used to discover information
about a server's search semantics and functional capabilities.
- Zebra exposes a "classic"
+ &zebra; exposes a "classic"
Explain database by base name <literal>IR-Explain-1</literal>, which
is populated with system internal information.
</para>
<para>
Classic Explain only defines retrieval of Explain information
via ASN.1. Practically no Z39.50 clients support this. Fortunately
- they don't have to - Zebra allows retrieval of this information
+ they don't have to - &zebra; allows retrieval of this information
in other formats:
<literal>SUTRS</literal>, <literal>XML</literal>,
<literal>GRS-1</literal> and <literal>ASN.1</literal> Explain.
<para>
Get attribute details record for database
<literal>Default</literal>.
- This query is very useful to study the internal Zebra indexes.
+ This query is very useful to study the internal &zebra; indexes.
If records have been indexed using the <literal>alvis</literal>
XSLT filter, the string representation names of the known indexes can be
found.
Attribute Set</ulink>
version from 2003. Index Data is not the copyright holder of this
information, except for the configuration details, the listing of
- Zebra's capabilities, and the example queries.
+ &zebra;'s capabilities, and the example queries.
</para>
be sourced in the main configuration <filename>zebra.cfg</filename>.
</para>
<para>
- In addition, Zebra allows the access of
+ In addition, &zebra; allows the access of
<emphasis>internal index names</emphasis> and <emphasis>dynamic
XPath</emphasis> as use attributes; see
<xref linkend="querymodel-use-string"/> and
<section id="querymodel-bib1-nonuse">
- <title>Zebra general Bib1 Non-Use Attributes (type 2-6)</title>
+ <title>&zebra; general Bib1 Non-Use Attributes (type 2-6)</title>
<section id="querymodel-bib1-relation">
<title>Relation Attributes (type 2)</title>
<note>
<para>
- Zebra only supports first-in-field searches if the
+ &zebra; only supports first-in-field searches if the
<literal>firstinfield</literal> is enabled for the index.
Refer to <xref linkend="default-idx-file"/>.
- Zebra does not distinguish between first in field and
+ &zebra; does not distinguish between first in field and
first in subfield. They result in the same hit count.
- Searching for first position in (sub)field is only supported in Zebra
+ Searching for first position in (sub)field is only supported in &zebra;
2.0.2 and later.
</para>
</note>
<para>
The structure attribute specifies the type of search
term. This causes the search to be mapped on
- different Zebra internal indexes, which must have been defined
+ different &zebra; internal indexes, which must have been defined
at index time.
</para>
<para>
The structure attribute value
<literal>Local number (107)</literal>
- is supported, and always maps to the Zebra internal document ID,
+ is supported, and always maps to the &zebra; internal document ID,
irrespective of which use attribute is specified. The following queries
have exactly the same unique record in the hit set:
<screen>
</para>
<note>
<para>
- The exact mapping between PQF queries and Zebra internal indexes
+ The exact mapping between PQF queries and &zebra; internal indexes
and index types is explained in
<xref linkend="querymodel-pqf-apt-mapping"/>.
</para>
<para>
The truncation attribute value
- <literal>Regexp-2 (103) </literal> is a Zebra specific extension
+ <literal>Regexp-2 (103) </literal> is a &zebra; specific extension
which allows <emphasis>fuzzy</emphasis> matches. One single
error in spelling of search terms is allowed, i.e., a document
is hit if it includes a term which can be mapped to the used
</para>
<para>
<literal>Incomplete subfield (1)</literal> is the default, and
- makes Zebra use
+ makes &zebra; use
register <literal>type="w"</literal>, whereas
<literal>Complete field (3)</literal> triggers
search and scan in index <literal>type="p"</literal>.
<para>
The <literal>Complete subfield (2)</literal> is a relic
from the happy <literal>MARC</literal>
- binary format days. Zebra does not support it, but silently maps it
+ binary format days. &zebra; does not support it, but silently maps it
to <literal>Complete field (3)</literal>.
</para>
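<para>
The completeness rules described in this section can be summarised as a
small lookup. This is a sketch of the documented behaviour only, not of
the actual implementation, and the function name is hypothetical.
</para>

```python
# Register types as described in this section:
# Incomplete subfield (1) -> "w", Complete field (3) -> "p".
COMPLETENESS_TO_REGISTER = {1: "w", 3: "p"}


def register_for_completeness(value=1):
    """Return the register type searched for a completeness attribute.

    The default argument mirrors the documented default,
    Incomplete subfield (1).
    """
    if value == 2:
        # Complete subfield (2) is not supported and is silently
        # treated as Complete field (3).
        value = 3
    if value not in COMPLETENESS_TO_REGISTER:
        raise ValueError("unknown completeness attribute value: %r" % (value,))
    return COMPLETENESS_TO_REGISTER[value]
```

<para>
Calling the function without an argument returns <literal>"w"</literal>,
while the values 2 and 3 both return <literal>"p"</literal>.
</para>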
<note>
<para>
- The exact mapping between PQF queries and Zebra internal indexes
+ The exact mapping between PQF queries and &zebra; internal indexes
and index types is explained in
<xref linkend="querymodel-pqf-apt-mapping"/>.
</para>
<section id="querymodel-zebra">
- <title>Extended Zebra RPN Features</title>
+ <title>Extended &zebra; RPN Features</title>
<para>
- The Zebra internal query engine has been extended to meet specific needs
+ The &zebra; internal query engine has been extended to meet specific needs
not covered by the <literal>bib-1</literal> attribute set query
model. These extensions are <emphasis>non-standard</emphasis>
and <emphasis>non-portable</emphasis>: most functional extensions
</para>
<section id="querymodel-zebra-attr-allrecords">
- <title>Zebra specific retrieval of all records</title>
+ <title>&zebra; specific retrieval of all records</title>
<para>
- Zebra defines a hardwired <literal>string</literal> index name
+ &zebra; defines a hardwired <literal>string</literal> index name
called <literal>_ALLRECORDS</literal>. It matches any record
contained in the database, if used in conjunction with
the relation attribute
<para>
The special string index <literal>_ALLRECORDS</literal> is
experimental, and the provided functionality and syntax may very
- well change in future releases of Zebra.
+ well change in future releases of &zebra;.
</para>
</warning>
</section>
<section id="querymodel-zebra-attr-search">
- <title>Zebra specific Search Extensions to all Attribute Sets</title>
+ <title>&zebra; specific Search Extensions to all Attribute Sets</title>
<para>
- Zebra extends the Bib-1 attribute types, and these extensions are
+ &zebra; extends the Bib-1 attribute types, and these extensions are
recognized regardless of attribute
set used in a <literal>search</literal> operation query.
</para>
<table id="querymodel-zebra-attr-search-table" frame="top">
- <title>Zebra Search Attribute Extensions</title>
+ <title>&zebra; Search Attribute Extensions</title>
<tgroup cols="4">
<thead>
<row>
<entry>Name</entry>
<entry>Value</entry>
<entry>Operation</entry>
- <entry>Zebra version</entry>
+ <entry>&zebra; version</entry>
</row>
</thead>
<tbody>
</table>
<section id="querymodel-zebra-attr-sorting">
- <title>Zebra Extension Embedded Sort Attribute (type 7)</title>
+ <title>&zebra; Extension Embedded Sort Attribute (type 7)</title>
<para>
The embedded sort is a way to specify sort within a query - thus
removing the need to send a Sort Request separately. It is both
</section>
<!--
- Zebra Extension Term Set Attribute
+ &zebra; Extension Term Set Attribute
From the manual text, I can not see what is the point with this feature.
I think it makes more sense when there are multiple terms in a query, or
something...
<!--
<section id="querymodel-zebra-attr-estimation">
- <title>Zebra Extension Term Set Attribute (type 8)</title>
+ <title>&zebra; Extension Term Set Attribute (type 8)</title>
<para>
The Term Set feature is a facility that allows a search to store
hitting terms in a "pseudo" resultset; thus a search (as usual) +
<section id="querymodel-zebra-attr-weight">
- <title>Zebra Extension Rank Weight Attribute (type 9)</title>
+ <title>&zebra; Extension Rank Weight Attribute (type 9)</title>
<para>
Rank weight is a way to pass a value to a ranking algorithm - so
that one APT has one value - while another has a different one.
</section>
<section id="querymodel-zebra-attr-termref">
- <title>Zebra Extension Term Reference Attribute (type 10)</title>
+ <title>&zebra; Extension Term Reference Attribute (type 10)</title>
<para>
- Zebra supports the searchResult-1 facility.
+ &zebra; supports the searchResult-1 facility.
If the Term Reference Attribute (type 10) is
given, that specifies a subqueryId value returned as part of the
search result. It is a way for a client to name an APT part of a
<section id="querymodel-zebra-local-attr-limit">
<title>Local Approximative Limit Attribute (type 11)</title>
<para>
- Zebra computes - unless otherwise configured -
+ &zebra; computes - unless otherwise configured -
the exact hit count for every APT
(leaf) in the query tree. These hit counts are returned as part of
the searchResult-1 facility in the binary encoded Z39.50 search
</para>
<para>
By setting an estimation limit size of the resultset of the APT
- leaves, Zebra stoppes processing the result set when the limit
+ leaves, &zebra; stops processing the result set when the limit
length is reached.
Hit counts under this limit are still precise, but hit counts over it
are estimated using the statistics gathered from the chopped
<section id="querymodel-zebra-global-attr-limit">
<title>Global Approximative Limit Attribute (type 12)</title>
<para>
- By default Zebra computes precise hit counts for a query as
+ By default &zebra; computes precise hit counts for a query as
a whole. Setting attribute 12 makes it perform approximative
hit counts instead. It has the same semantics as
<literal>estimatehits</literal> for the <xref linkend="zebra-cfg"/>.
</section>
<section id="querymodel-zebra-attr-scan">
- <title>Zebra specific Scan Extensions to all Attribute Sets</title>
+ <title>&zebra; specific Scan Extensions to all Attribute Sets</title>
<para>
- Zebra extends the Bib1 attribute types, and these extensions are
+ &zebra; extends the Bib1 attribute types, and these extensions are
recognized regardless of attribute
set used in a scan operation query.
</para>
<table id="querymodel-zebra-attr-scan-table" frame="top">
- <title>Zebra Scan Attribute Extensions</title>
+ <title>&zebra; Scan Attribute Extensions</title>
<tgroup cols="4">
<thead>
<row>
<entry>Name</entry>
<entry>Type</entry>
<entry>Operation</entry>
- <entry>Zebra version</entry>
+ <entry>&zebra; version</entry>
</row>
</thead>
<tbody>
</table>
<section id="querymodel-zebra-attr-narrow">
- <title>Zebra Extension Result Set Narrow (type 8)</title>
+ <title>&zebra; Extension Result Set Narrow (type 8)</title>
<para>
If attribute Result Set Narrow (type 8)
is given for scan, the value is the name of a
</para>
<para>
- Zebra 2.0.2 and later is able to skip 0 hit counts. This, however,
+ &zebra; 2.0.2 and later is able to skip 0 hit counts. This, however,
is known not to scale if the number of terms to skip is high.
This most likely will happen if the result set is small (and
thus results in many 0 hits).
</section>
<section id="querymodel-zebra-attr-approx">
- <title>Zebra Extension Approximative Limit (type 11)</title>
+ <title>&zebra; Extension Approximative Limit (type 11)</title>
<para>
- The Zebra Extension Approximative Limit (type 11) is a way to
+ The &zebra; Extension Approximative Limit (type 11) is a way to
enable approximate hit counts for scan hit counts, in the same
way as for search hit counts.
</para>
</section>
<section id="querymodel-idxpath">
- <title>Zebra special IDXPATH Attribute Set for GRS indexing</title>
+ <title>&zebra; special IDXPATH Attribute Set for GRS indexing</title>
<para>
The attribute-set <literal>idxpath</literal> consists of a single
Use (type 1) attribute. All non-use attributes behave as normal.
<literal>xpath enable</literal> option in the GRS filter
<filename>*.abs</filename> configuration files. If one wants to use
the special <literal>idxpath</literal> numeric attribute set, the
- main Zebra configuration file <filename>zebra.cfg</filename>
+ main &zebra; configuration file <filename>zebra.cfg</filename>
directive <literal>attset: idxpath.att</literal> must be enabled.
</para>
<warning>
<para>
The <literal>idxpath</literal> is deprecated, may not be
- supported in future Zebra versions, and should definitely
+ supported in future &zebra; versions, and should definitely
not be used in production code.
</para>
</warning>
</warning>
<table id="querymodel-idxpath-use-table" frame="top">
- <title>Zebra specific IDXPATH Use Attributes (type 1)</title>
+ <title>&zebra; specific IDXPATH Use Attributes (type 1)</title>
<tgroup cols="4">
<thead>
<row>
<section id="querymodel-pqf-apt-mapping">
- <title>Mapping from PQF atomic APT queries to Zebra internal
+ <title>Mapping from PQF atomic APT queries to &zebra; internal
register indexes</title>
<para>
The rules for PQF APT mapping are rather tricky to grasp in the
<section id="querymodel-pqf-apt-mapping-accesspoint">
<title>Mapping of PQF APT access points</title>
<para>
- Zebra understands four fundamental different types of access
+ &zebra; understands four fundamentally different types of access
points, of which only the
<emphasis>numeric use attribute</emphasis> type access points
are defined by the <ulink url="&url.z39.50;">Z39.50</ulink>
standard.
- All other access point types are Zebra specific, and non-portable.
+ All other access point types are &zebra; specific, and non-portable.
</para>
<table id="querymodel-zebra-mapping-accesspoint-types" frame="top">
<entry>normalized name is used as internal string index name</entry>
</row>
<row>
- <entry>Zebra internal index name</entry>
+ <entry>&zebra; internal index name</entry>
<entry>zebra</entry>
<entry>_[a-zA-Z](_?[a-zA-Z0-9])*</entry>
<entry>hardwired internal string index name</entry>
<para>
<emphasis>Numeric use attributes</emphasis> are mapped
- to the Zebra internal
+ to the &zebra; internal
string index according to the attribute set definition in use.
The default attribute set is <literal>Bib-1</literal>, and may be
omitted in the PQF query.
</para>
<para>
- Zebra internal indexes can be accessed directly,
+ &zebra; internal indexes can be accessed directly,
according to the same rules as the user defined
string indexes. The only difference is that
- Zebra internal index names are hardwired,
+ &zebra; internal index names are hardwired,
all uppercase and
must start with the character <literal>'_'</literal>.
</para>
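+ <para>
+ As a hedged illustration of direct access to an internal index,
+ the hardwired <literal>_ALLRECORDS</literal> index might be queried
+ like any other string index. This sketch assumes the
+ <literal>find</literal> command of a PQF-speaking client and the
+ Bib-1 relation attribute <emphasis>AlwaysMatches</emphasis>
+ (value 103) with an empty term:
+ <screen>
+ find @attr 1=_ALLRECORDS @attr 2=103 ""
+ </screen>
+ </para>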
available using the <literal>GRS</literal> filter for indexing.
These access point names must start with the character
<literal>'/'</literal>, they are <emphasis>not
- normalized</emphasis>, but passed unaltered to the Zebra internal
+ normalized</emphasis>, but passed unaltered to the &zebra; internal
XPATH engine. See <xref linkend="querymodel-use-xpath"/>.
</para>
<title>Mapping of PQF APT structure and completeness to
register type</title>
<para>
- Internally Zebra has in it's default configuration several
+ Internally &zebra; has in its default configuration several
different types of registers or indexes, whose tokenization and
character normalization rules differ. This reflects the fact that
searching fundamentally different tokens like dates, numbers,
<para>
If the <emphasis>Structure</emphasis> attribute is
<emphasis>Local Number</emphasis> the term is treated as
- native Zebra Record Identifier.
+ native &zebra; Record Identifier.
</para>
<para>
</section>
<section id="querymodel-regular">
- <title>Zebra Regular Expressions in Truncation Attribute (type = 5)</title>
+ <title>&zebra; Regular Expressions in Truncation Attribute (type = 5)</title>
<para>
Each term in a query is interpreted as a regular expression if
is a plus character (<literal>+</literal>) it marks the
beginning of a section with non-standard specifiers.
The next plus character marks the end of the section.
- Currently Zebra only supports one specifier, the error tolerance,
+ Currently &zebra; only supports one specifier, the error tolerance,
which consists of one digit.
<!-- TODO Nice thing, but what does
that error tolerance digit *mean*? Maybe an example would be nice? -->
<!--
<para>
The RecordType parameter in the <literal>zebra.cfg</literal> file, or
- the <literal>-t</literal> option to the indexer tells Zebra how to
+ the <literal>-t</literal> option to the indexer tells &zebra; how to
process input records.
Two basic types of processing are available - raw text and structured
data. Raw text is just that, and it is selected by providing the
- argument <literal>text</literal> to Zebra. Structured records are
+ argument <literal>text</literal> to &zebra;. Structured records are
all handled internally using the basic mechanisms described in the
subsequent sections.
- Zebra can read structured records in many different formats.
+ &zebra; can read structured records in many different formats.
</para>
-->
</section>
<chapter id="quick-start">
- <!-- $Id: quickstart.xml,v 1.11 2006-07-03 12:16:31 sondberg Exp $ -->
+ <!-- $Id: quickstart.xml,v 1.12 2007-02-02 09:58:39 marc Exp $ -->
<title>Quick Start </title>
<para>
<!-- ### ulink to GILS profile: what's the URL? -->
In this section, we will test the system by indexing a small set of
- sample GILS records that are included with the Zebra distribution,
- running a Zebra server against the newly created database, and
+ sample GILS records that are included with the &zebra; distribution,
+ running a &zebra; server against the newly created database, and
searching the indexes with a client that connects to that server.
</para>
<para>
</para>
<para>
- The Zebra index that you have just created has a single database
+ The &zebra; index that you have just created has a single database
named <literal>Default</literal>.
The database contains records structured according to
the GILS profile, and the server will
<chapter id="record-model-alvisxslt">
- <!-- $Id: recordmodel-alvisxslt.xml,v 1.13 2007-02-01 21:26:30 marc Exp $ -->
- <title>ALVIS XML Record Model and Filter Module</title>
+ <!-- $Id: recordmodel-alvisxslt.xml,v 1.14 2007-02-02 09:58:39 marc Exp $ -->
+ <title>ALVIS &xml; Record Model and Filter Module</title>
<para>
The record model described in this chapter applies to the fundamental,
- structured XML
+ structured &xml;
record type <literal>alvis</literal>, introduced in
- <xref linkend="componentmodulesalvis"/>. The ALVIS XML record model
+ <xref linkend="componentmodulesalvis"/>. The ALVIS &xml; record model
is experimental, and its inner workings might change in future
- releases of the Zebra Information Server.
+ releases of the &zebra; Information Server.
</para>
<para> This filter has been developed under the
<section id="record-model-alvisxslt-filter">
<title>ALVIS Record Filter</title>
<para>
- The experimental, loadable Alvis XML/XSLT filter module
+ The experimental, loadable Alvis &xml;/XSLT filter module
<literal>mod-alvis.so</literal> is packaged in the GNU/Debian package
<literal>libidzebra1.4-mod-alvis</literal>.
It is invoked by the <filename>zebra.cfg</filename> configuration statement
path <filename>db/filter_alvis_conf.xml</filename>.
</para>
<para>The Alvis XSLT filter configuration file must be
- valid XML. It might look like this (This example is
+ valid &xml;. It might look like this (This example is
used for indexing and display of OAI harvested records):
<screen>
<?xml version="1.0" encoding="UTF-8"?>
</para>
<para>
The <literal><split level="2"/></literal> decides where the
- XML Reader shall split the
+ &xml; Reader shall split the
collections of records into individual records, which then are
loaded into DOM, and have the indexing XSLT stylesheet applied.
</para>
<section id="record-model-alvisxslt-internal">
<title>ALVIS Internal Record Representation</title>
- <para>When indexing, an XML Reader is invoked to split the input
- files into suitable record XML pieces. Each record piece is then
- transformed to an XML DOM structure, which is essentially the
+ <para>When indexing, an &xml; Reader is invoked to split the input
+ files into suitable record &xml; pieces. Each record piece is then
+ transformed to an &xml; DOM structure, which is essentially the
record model. Only XSLT transformations can be applied during
index, search and retrieval. Consequently, output formats are
- restricted to whatever XSLT can deliver from the record XML
- structure, be it other XML formats, HTML, or plain text. In case
+ restricted to whatever XSLT can deliver from the record &xml;
+ structure, be it other &xml; formats, HTML, or plain text. In case
you have <literal>libxslt1</literal> running with EXSLT support,
you can use this functionality inside the Alvis
filter configuration XSLT stylesheets.
</z:record>
</screen>
</para>
- <para>This means the following: From the original XML file
- <literal>one-record.xml</literal> (or from the XML record DOM of the
+ <para>This means the following: From the original &xml; file
+ <literal>one-record.xml</literal> (or from the &xml; record DOM of the
same form coming from a split input file), the indexing
- stylesheet produces an indexing XML record, which is defined by
+ stylesheet produces an indexing &xml; record, which is defined by
the <literal>record</literal> element in the magic namespace
<literal>xmlns:z="http://indexdata.dk/zebra/xslt/1"</literal>.
- Zebra uses the content of
+ &zebra; uses the content of
<literal>z:id="oai:JTRS:CP-3290---Volume-I"</literal> as internal
record ID, and - in case static ranking is set - the content of
<literal>z:rank="47896"</literal> as static rank. Following the
<para>
As mentioned above, there can be only one indexing
stylesheet, and configuration of the indexing process is a synonym
- of writing an XSLT stylesheet which produces XML output containing the
+ of writing an XSLT stylesheet which produces &xml; output containing the
magic elements discussed in
<xref linkend="record-model-alvisxslt-internal"/>.
Obviously, there are millions of different ways to accomplish this
<para>
Stylesheets can be written in the <emphasis>pull</emphasis> or
the <emphasis>push</emphasis> style: <emphasis>pull</emphasis>
- means that the output XML structure is taken as starting point of
+ means that the output &xml; structure is taken as starting point of
the internal structure of the XSLT stylesheet, and portions of
- the input XML are <emphasis>pulled</emphasis> out and inserted
- into the right spots of the output XML structure. On the other
+ the input &xml; are <emphasis>pulled</emphasis> out and inserted
+ into the right spots of the output &xml; structure. On the other
side, <emphasis>push</emphasis> XSLT stylesheets are recursively
calling their template definitions, a process which is commanded
- by the input XML structure, and avake to produce some output XML
+ by the input &xml; structure, and awake to produce some output &xml;
whenever some special conditions in the input stylesheets are
met. The <emphasis>pull</emphasis> type is well-suited for input
- XML with strong and well-defined structure and semantcs, like the
+ &xml; with strong and well-defined structure and semantics, like the
following OAI indexing example, whereas the
<emphasis>push</emphasis> type might be the only possible way to
- sort out deeply recursive input XML formats.
+ sort out deeply recursive input &xml; formats.
</para>
<para>
A <emphasis>pull</emphasis> stylesheet example used to index
Notice also,
that the names and types of the indexes can be defined in the
indexing XSLT stylesheet <emphasis>dynamically according to
- content in the original XML records</emphasis>, which has
+ content in the original &xml; records</emphasis>, which has
opportunities for great power and wizardry as well as grand
disaster.
</para>
<para>
The following excerpt of a <emphasis>push</emphasis> stylesheet
<emphasis>might</emphasis>
- be a good idea according to your strict control of the XML
+ be a good idea according to your strict control of the &xml;
input format (due to rigorous checking against well-defined and
- tight RelaxNG or XML Schema's, for example):
+ tight RelaxNG or &xml; Schemas, for example):
<screen>
<![CDATA[
<xsl:template name="element-name-indexes">
]]>
</screen>
This template creates indexes which have the name of the working
- node of any input XML file, and assigns a '1' to the index.
+ node of any input &xml; file, and assigns a '1' to the index.
The example query
<literal>find @attr 1=xyz 1</literal>
finds all files which contain at least one
- <literal>xyz</literal> XML element. In case you can not control
+ <literal>xyz</literal> &xml; element. In case you can not control
which element names the input files contain, you might ask for
disaster and bad karma using this technique.
</para>
XSLT transformation, as far as the stylesheet is registered in
the main Alvis XSLT filter configuration file, see
<xref linkend="record-model-alvisxslt-filter"/>.
- In principle anything that can be expressed in XML, HTML, and
+ In principle anything that can be expressed in &xml;, HTML, and
TEXT can be the output of a <literal>schema</literal> or
<literal>element set</literal> directive during search, as long as
the information comes from the
- <emphasis>original input record XML DOM tree</emphasis>
- (and not the transformed and <emphasis>indexed</emphasis> XML!!).
+ <emphasis>original input record &xml; DOM tree</emphasis>
+ (and not the transformed and <emphasis>indexed</emphasis> &xml;!!).
</para>
<para>
- In addition, internal administrative information from the Zebra
+ In addition, internal administrative information from the &zebra;
indexer can be accessed during record retrieval. The following
example is a summary of the possibilities:
<screen>
see: http://www.indexdata.com/yaz/doc/tools.tkl#tools.cql.map
- in db/ an indexing XSLT stylesheet. This is a PULL-type XSLT thing,
- as it constructs the new XML structure by pulling data out of the
+ as it constructs the new &xml; structure by pulling data out of the
respective elements/attributes of the old structure.
Notice the special zebra namespace, and the special elements in this
indicates that a new record with given id and static rank has to be updated.
<z:index name="title" type="w">
- encloses all the text/XML which shall be indexed in the index named
+ encloses all the text/&xml; which shall be indexed in the index named
"title" and of index type "w" (see file default.idx in your zebra
installation)
<chapter id="grs">
- <!-- $Id: recordmodel-grs.xml,v 1.5 2006-10-11 12:37:23 adam Exp $ -->
+ <!-- $Id: recordmodel-grs.xml,v 1.6 2007-02-02 09:58:39 marc Exp $ -->
<title>GRS Record Model and Filter Modules</title>
<para>
<term><literal>grs.marc.</literal><replaceable>type</replaceable></term>
<listitem>
<para>
- This allows Zebra to read
+ This allows &zebra; to read
records in the ISO2709 (MARC) encoding standard.
Last parameter <replaceable>type</replaceable> names the
<literal>.abs</literal> file (see below)
<term><literal>grs.marcxml.</literal><replaceable>type</replaceable></term>
<listitem>
<para>
- This allows Zebra to read ISO2709 encoded records.
+ This allows &zebra; to read ISO2709 encoded records.
Last parameter <replaceable>type</replaceable> names the
<literal>.abs</literal> file (see below)
which describes the specific MARC structure of the input record as
<para>
This filter reads XML records and uses
<ulink url="http://expat.sourceforge.net/">Expat</ulink> to
- parse them and convert them into IDZebra's internal
+ parse them and convert them into ID&zebra;'s internal
<literal>grs</literal> record model.
Only one record per file is supported, due to the fact XML does
not allow two documents to "follow" each other (there is no way
to know when a document is finished).
- This filter is only available if Zebra is compiled with EXPAT support.
+ This filter is only available if &zebra; is compiled with EXPAT support.
</para>
<para>
The loadable <literal>grs.xml</literal> filter module
Although input data can take any form, it is sometimes useful to
describe the record processing capabilities of the system in terms of
a single, canonical input format that gives access to the full
- spectrum of structure and flexibility in the system. In Zebra, this
+ spectrum of structure and flexibility in the system. In &zebra;, this
canonical format is an "SGML-like" syntax.
</para>
<!-- There is no indentation in the example above! -H
-note-
-para-
- The indentation used above is used to illustrate how Zebra
+ The indentation used above is used to illustrate how &zebra;
interprets the mark-up. The indentation, in itself, has no
significance to the parser for the canonical input format, which
discards superfluous whitespace.
The following is a GILS record that
contains only a single element (strictly speaking, that makes it an
illegal GILS record, since the GILS profile includes several mandatory
- elements - Zebra does not validate the contents of a record against
+ elements - &zebra; does not validate the contents of a record against
the Z39.50 profile, however - it merely attempts to match up elements
of a local representation with the given schema):
</para>
<title>Variants</title>
<para>
- Zebra allows you to provide individual data elements in a number of
+ &zebra; allows you to provide individual data elements in a number of
<emphasis>variant forms</emphasis>. Examples of variant forms are
textual data elements which might appear in different languages, and
images which may appear in different formats or layouts.
- The variant system in Zebra is essentially a representation of
+ The variant system in &zebra; is essentially a representation of
the variant mechanism of Z39.50-1995.
</para>
<title>GRS REGX And TCL Input Filters</title>
<para>
- In order to handle general input formats, Zebra allows the
+ In order to handle general input formats, &zebra; allows the
operator to define filters which read individual records in their
native format and produce an internal representation that the system
can work with.
</para>
<para>
- If Zebra is compiled with support for Tcl enabled, the statements
+ If &zebra; is compiled with support for Tcl enabled, the statements
described above are supplemented with a complete
scripting environment, including control structures (conditional
expressions and loop constructs), and powerful string manipulation
<term>sysno</term>
<listitem>
<para>
- Zebra's system number (record ID) for the
+ &zebra;'s system number (record ID) for the
record. By default this is mapped to element
<literal>localControlNumber</literal>.
</para>
</term>
<listitem>
<para>
- Specifies what information, if any, Zebra should
+ Specifies what information, if any, &zebra; should
automatically include in retrieval records for the
``system fields'' that it supports.
<replaceable>systemTag</replaceable> may
the mapping is trivial. Note that XML schemas, preprocessing
instructions and comments are not part of the internal representation
and therefore will never be part of a generated XML record.
- Future versions of the Zebra will support that.
+ Future versions of &zebra; will support that.
</para>
</listitem>
<para>At the beginning, we have to define the term
<emphasis>index-formula</emphasis> for MARC records. This term helps
- to understand the notation of extended indexing of MARC records by Zebra.
+ to understand the notation of extended indexing of MARC records by &zebra;.
Our definition is based on the document
<ulink url="http://www.rba.ru/rusmarc/soft/Z39-50.htm">"The table
of conformity for Z39.50 use attributes and RUSMARC fields"</ulink>.
</screen>
<para>
- We know that Zebra supports a Bib-1 attribute - right truncation.
+ We know that &zebra; supports a Bib-1 attribute - right truncation.
In this case, the <emphasis>index-formula</emphasis> (1) consists from
forms, defined in the same way as (1)</para>
</section>
<section id="notation">
- <title>Notation of <emphasis>index-formula</emphasis> for Zebra</title>
+ <title>Notation of <emphasis>index-formula</emphasis> for &zebra;</title>
<para>Extended indexing overloads <literal>path</literal> of
- <literal>elm</literal> definition in abstract syntax file of Zebra
+ <literal>elm</literal> definition in abstract syntax file of &zebra;
(<literal>.abs</literal> file). It means that names beginning with
- <literal>"mc-"</literal> are interpreted by Zebra as
+ <literal>"mc-"</literal> are interpreted by &zebra; as
<emphasis>index-formula</emphasis>. The database index is created and
linked with <emphasis>access point</emphasis> (Bib-1 use attribute)
according to this formula.</para>
elm 70._1_$a,_$g_ Author !:w,!:p
</screen>
- <para>When Zebra finds a field according to
+ <para>When &zebra; finds a field according to
<literal>"70."</literal> pattern it checks the indicators. In this
case the value of first indicator doesn't matter, but the value of
second one must be whitespace, in another case a field is not
<!ENTITY test SYSTEM "test.xml">
]>
-<!-- $Id: zebra.xml,v 1.14 2007-02-01 21:04:15 marc Exp $ -->
+<!-- $Id: zebra.xml,v 1.15 2007-02-02 09:58:40 marc Exp $ -->
<book id="zebra">
<bookinfo>
- <title>Zebra - User's Guide and Reference</title>
+ <title>&zebra; - User's Guide and Reference</title>
<authorgroup>
- <author>
- <firstname>Adam</firstname><surname>Dickmeiss</surname>
- </author>
- <author>
- <firstname>Heikki</firstname><surname>Levanto</surname>
- </author>
- <author>
- <firstname>Marc</firstname><surname>Cromme</surname>
- </author>
- <author>
- <firstname>Mike</firstname><surname>Taylor</surname>
- </author>
- <author>
- <firstname>Sebastian</firstname><surname>Hammer</surname>
- </author>
+ <author>&adam;</author>
+ <author>&heikki;</author>
+ <author>&marccromme;</author>
+ <author>&mike;</author>
+ <author>&sebastian;</author>
</authorgroup>
<releaseinfo>&version;</releaseinfo>
<copyright>
</copyright>
<abstract>
<simpara>
- Zebra is a free, fast, friendly information management system. It
- can index records in XML/SGML, MARC, e-mail archives and many
+ &zebra; is a free, fast, friendly information management system. It
+ can index records in &xml;, &sgml;, &marc;, e-mail archives and many
other formats, and quickly find them using a combination of
boolean searching and relevance ranking. Search-and-retrieve
applications can be written using APIs in a wide variety of
- languages, communicating with the Zebra server using
+ languages, communicating with the &zebra; server using
industry-standard information-retrieval protocols or web services.
</simpara>
<simpara>
- This manual explains how to build and install Zebra, configure it
+ This manual explains how to build and install &zebra;, configure it
appropriately for your application, add data and set up a running
- information service. It describes version &version; of Zebra.
+ information service. It describes version &version; of &zebra;.
</simpara>
<simpara>
<inlinemediaobject>
<!ENTITY % common SYSTEM "common/common.ent">
%common;
]>
-<!-- $Id: zebraidx.xml,v 1.10 2007-01-15 14:55:50 adam Exp $ -->
+<!-- $Id: zebraidx.xml,v 1.11 2007-02-02 09:58:40 marc Exp $ -->
<refentry id="zebraidx">
<refentryinfo>
<productname>zebra</productname>
<refnamediv>
<refname>zebraidx</refname>
- <refpurpose>Zebra Administrative Tool</refpurpose>
+ <refpurpose>&zebra; Administrative Tool</refpurpose>
</refnamediv>
<refsynopsisdiv>
<refsect1><title>DESCRIPTION</title>
<para>
<command>zebraidx</command> allows you to insert, delete or update
- records in Zebra. <command>zebraidx</command> accepts a set options
+ records in &zebra;. <command>zebraidx</command> accepts a set of options
(see below) and exactly one command (mandatory).
</para>
</refsect1>
<replaceable>directory</replaceable>.
If no directory is provided, a list of files is read from
<literal>stdin</literal>.
- See <link linkend="administration">Administration</link> in the Zebra
+ See <link linkend="administration">Administration</link> in the &zebra;
Manual.
</para>
</listitem>
commands to the register. This command is only available if the use of
shadow register files is enabled
(see <link linkend="shadow-registers">Shadow Registers</link> in the
- Zebra Manual).
+ &zebra; Manual).
</para>
</listitem>
</varlistentry>
and <literal>grs</literal><replaceable>.subtype</replaceable>.
Generally, it is probably advisable to specify the record types
in the <literal>zebra.cfg</literal> file (see
- <link linkend="record-types">Record Types</link> in the Zebra manual),
+ <link linkend="record-types">Record Types</link> in the &zebra; manual),
to avoid confusion at subsequent updates.
</para>
</listitem>
<para>
Update the files according to the group
settings for <replaceable>group</replaceable>
- (see <link linkend="zebra-cfg">Zebra Configuration File</link> in
- the Zebra manual).
+ (see <link linkend="zebra-cfg">&zebra; Configuration File</link> in
+ the &zebra; manual).
</para>
</listitem>
</varlistentry>
<para>
Disable the use of shadow registers for this operation
(see <link linkend="shadow-registers">Shadow Registers in
- the Zebra manual</link>).
+ the &zebra; manual</link>).
</para>
</listitem>
</varlistentry>
<term>-V</term>
<listitem>
<para>
- Show Zebra version.
+ Show &zebra; version.
</para>
</listitem>
</varlistentry>