<!ENTITY % common SYSTEM "common/common.ent">
%common;
]>
-<!-- $Id: book.xml,v 1.9 2007-04-10 08:59:35 adam Exp $ -->
+<!-- $Id: book.xml,v 1.13 2007-05-25 12:30:27 marc Exp $ -->
<book id="book">
<bookinfo>
<title>Pazpar2 - User's Guide and Reference</title>
<author>
<firstname>Sebastian</firstname><surname>Hammer</surname>
</author>
+ <author>
+ <firstname>Adam</firstname><surname>Dickmeiss</surname>
+ </author>
+ <author>
+ <firstname>Marc</firstname><surname>Cromme</surname>
+ </author>
<releaseinfo>&version;</releaseinfo>
<copyright>
<year>&copyright-year;</year>
</para>
</listitem>
</varlistentry>
+ <varlistentry><term><ulink url="&url.icu;">International
+ Components for Unicode (ICU)</ulink></term>
+ <listitem>
+ <para>
+ ICU provides Unicode support for non-English languages whose
+ characters lie outside the range of 7-bit ASCII, such as
+ Greek, Russian, German and French. Pazpar2 uses ICU for
+ Unicode character conversion, Unicode normalization, case
+ folding and other fundamental operations needed for
+ tokenization, normalization and ranking of records.
+ </para>
+ <para>
+ Compiling and linking against the ICU libraries is optional,
+ but strongly recommended for use in an international
+ environment.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
<para>
</para>
<screen>
apt-get install libyaz-dev
+ apt-get install libicu36-dev
</screen>
<para>
With these packages installed, the usual configure + make
<chapter id="using">
<title>Using pazpar2</title>
<para>
- This chapter provides a general introduction to the use and deployment of pazpar2.
+ This chapter provides a general introduction to the use and
+ deployment of pazpar2.
</para>
<section id="architecture">
functionality, but it isn't a requirement -- you can choose to use
pazpar2 entirely as a backend to your regular server-side scripting.
When you do use pazpar2 in conjunction
- with browser scripting (JavaScript/Ajax, Flash, applets, etc.), there are
- special considerations.
+ with browser scripting (JavaScript/Ajax, Flash, applets,
+ etc.), there are special considerations.
</para>
<para>
</section>
<section id="client">
- <title>Client development</title>
+ <title>Client development overview</title>
<para>
You can use pazpar2 from any environment that allows you to use
webservices. The initial goal of the software was to support
can be fetched using the 'record' command.
</para>
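+ <para>
+ As an illustration only, the following Python sketch shows how a
+ client might build requests for the <literal>init</literal>,
+ <literal>search</literal>, <literal>show</literal> and
+ <literal>record</literal> commands. It also shows how the client
+ might extract the session identifier from the XML returned by
+ <literal>init</literal>. The sketch assumes a pazpar2 server at the
+ hypothetical address
+ <literal>http://localhost:9004/search.pz2</literal>. The helper names
+ <literal>command_url</literal> and <literal>session_from_init</literal>
+ are also hypothetical and are not part of pazpar2.
+ </para>
+ <screen>
+from urllib.parse import urlencode
+import xml.etree.ElementTree as ET
+
+# Hypothetical endpoint: adjust host, port and path to your installation.
+BASE_URL = "http://localhost:9004/search.pz2"
+
+
+def command_url(command, **params):
+    """Build the request URL for one pazpar2 command."""
+    query = {"command": command}
+    query.update(params)
+    return BASE_URL + "?" + urlencode(query)
+
+
+def session_from_init(response_xml):
+    """Return the session id carried in an 'init' response."""
+    root = ET.fromstring(response_xml)
+    if root.findtext("status") != "OK":
+        raise RuntimeError("pazpar2 init failed")
+    return root.findtext("session")
+ </screen>
+ <para>
+ A client would typically request <literal>init</literal> once and
+ pass the returned session to <literal>search</literal>. It would then
+ request <literal>show</literal> repeatedly as results arrive, and use
+ <literal>record</literal> with a record id to fetch full details.
+ Sending the requests is left to whatever HTTP library the client
+ already uses.
+ </para>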
</section>
+
+ <section id="nonstandard">
+ <title>Connecting to non-standard resources</title>
+ <para>
+ Pazpar2 uses Z39.50 as its switchboard language -- i.e. as far as it
+ is concerned, all resources speak Z39.50. It is, however, equipped
+ to handle a broad range of different server behavior, through
+ configurable query mapping and record normalization. If you develop
+ configuration, stylesheets, etc., for a new type of resource, we
+ encourage you to share your work. But you can also use pazpar2 to
+ connect to hundreds of resources that do not support standard
+ protocols.
+ </para>
+
+ <para>
+ For a growing number of resources, Z39.50 is all you need. Over the
+ last few years, a number of commercial, full-text resources have
+ implemented Z39.50. These can be used through pazpar2 with little or
+ no effort. Resources that use non-standard record formats will
+ require a bit of XSLT work, but that's all.
+ </para>
+
+ <para>
+ But what about resources that don't support Z39.50 at all? The NISO
+ SRU (MXG) protocol is slowly gathering steam. Other resources might
+ support OpenSearch, private, XML/HTTP-based protocols, or something
+ else entirely. Some databases exist only as web user interfaces and
+ will require screen-scraping. Still others exist only as static
+ files, or perhaps as databases supporting the OAI-PMH protocol.
+ There is hope! Read on.
+ </para>
+
+ <para>
+ Index Data continues to advocate the support of open standards. We
+ work with database vendors to support standards, so you don't have
+ to worry about programming against non-standard services. We also
+ provide tools (see <ulink
+ url="http://www.indexdata.com/simpleserver">SimpleServer</ulink>)
+ which make it comparatively easy to build gateways against servers
+ with non-standard behavior. Again, we encourage you to share any
+ work you do in this direction.
+ </para>
+
+ <para>
+ But the bottom line is that working with non-standard resources in
+ metasearching is really, really hard. If you want to build a
+ project with pazpar2, and you need access to resources with
+ non-standard interfaces, we can help. We run gateways to more than
+ 2,000 popular, commercial databases and other resources, making it
+ simple to plug them directly into pazpar2. For a small annual fee per
+ database, we can help you establish connections to your licensed
+ resources. Meanwhile, you can help: build your own
+ standards-compliant gateways, host them for others, or share the
+ code! And tell your vendors that they can save everybody money and
+ increase the appeal of their resources by supporting standards.
+ </para>
+
+ <para>
+ There are those who will ask us why we are using Z39.50 as our
+ switchboard language rather than a different protocol. Basically,
+ we believe that Z39.50 is presently the most widely implemented
+ information retrieval protocol that has the level of functionality
+ required to support a good metasearching experience (structured
+ searching, structured, well-defined results). It is also compact and
+ efficient, and there is a very broad range of tools available to
+ implement it.
+ </para>
+ </section>
+
+ <section id="unicode">
+ <title>Unicode compliance</title>
+ <para>
+ Pazpar2 is Unicode compliant and language- and locale-aware, but
+ only to the extent that the backend Z39.50 targets it uses are.
+ A few badly behaving targets can spoil the search experience
+ considerably if, for example, Greek, Russian or other non-ASCII
+ search terms are entered. In these cases some targets return
+ records irrelevant to the query, and the result screens will be
+ cluttered with noise.
+ </para>
+ <para>
+ While noise from misbehaving targets cannot be removed, it can
+ be reduced by using truly Unicode-based ranking. This option is
+ available to the system administrator if ICU support is compiled
+ into Pazpar2; see <xref linkend="installation"/> for details.
+ </para>
+ <para>
+ In addition, the ICU tokenization and normalization rules must
+ be defined in the master configuration file described in
+ <xref linkend="config-server"/>.
+ </para>
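+ <para>
+ To give an idea of what such rules accomplish, the following Python
+ sketch performs comparable steps with the Python standard library
+ instead of ICU. It applies Unicode normalization (NFKC), case folding,
+ and splitting into word tokens. It is neither Pazpar2 code nor ICU
+ rule syntax, and the function name <literal>tokenize</literal> is
+ hypothetical. With it, <literal>ΚΌΣΜΟΣ</literal> and
+ <literal>κόσμος</literal> produce the same token, as do
+ <literal>Straße</literal> and <literal>STRASSE</literal>. A plain
+ 7-bit ASCII lowercase comparison would achieve neither match.
+ </para>
+ <screen>
+import re
+import unicodedata
+
+
+def tokenize(text):
+    """Normalize, case-fold and split text into word tokens."""
+    # NFKC composes sequences such as e + combining acute into one code
+    # point; case folding then maps Greek capital and final sigma to
+    # small sigma, and German sharp s to "ss".
+    folded = unicodedata.normalize("NFKC", text).casefold()
+    # Normalize again, since case folding can emit decomposed sequences.
+    folded = unicodedata.normalize("NFKC", folded)
+    # On str patterns, \w matches Unicode word characters, not only ASCII.
+    return re.findall(r"\w+", folded)
+ </screen>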
+ </section>
+
</chapter> <!-- Using pazpar2 -->
<reference id="reference">
<title>Reference</title>
- <partintro>
+ <partintro id="reference-introduction">
<para>
The material in this chapter is drawn directly from the individual
manual entries.