Updated documentation. This update may be unstable, as I can't presently test on...

author Sebastian Hammer <quinn@indexdata.com>

Fri, 19 Jan 2007 18:28:08 +0000 (18:28 +0000)

committer Sebastian Hammer <quinn@indexdata.com>

Fri, 19 Jan 2007 18:28:08 +0000 (18:28 +0000)
diff --git a/doc/book.xml b/doc/book.xml
index 7d28253..4ec781e 100644
--- a/doc/book.xml
+++ b/doc/book.xml
@@ -9,165 +9,369 @@
       <!ENTITY % common SYSTEM "common/common.ent">
       %common;
  ]>
-<!-- $Id: book.xml,v 1.4 2007-01-13 05:48:41 quinn Exp $ -->
+<!-- $Id: book.xml,v 1.5 2007-01-19 18:28:08 quinn Exp $ -->
  <book id="book">
- <bookinfo>
-  <title>Pazpar2 - User's Guide and Reference</title>
-  <author>
-   <firstname>Sebastian</firstname><surname>Hammer</surname>
-  </author>
-  <copyright>
-   <year>&copyright-year;</year>
-   <holder>Index Data</holder>
-  </copyright>
-  <abstract>
-   <simpara>
-    Pazpar2 - High-performance, user-interface independent, metasearching
-         middleware featuring record merging, relevance ranking, and faceted search
-         results.
-   </simpara>
-   <simpara>
-    This document is a guide and reference to Pazpar version &version;.
-   </simpara>
-   <simpara>
-    <inlinemediaobject>
-     <imageobject>
-      <imagedata fileref="common/id.png" format="PNG"/>
-     </imageobject>
-     <imageobject>
-      <imagedata fileref="common/id.eps" format="EPS"/>
-     </imageobject>
-    </inlinemediaobject>
-   </simpara>
-  </abstract>
- </bookinfo>
-
- <chapter id="introduction">
-  <title>Introduction</title>
-  <para>
-    Pazpar2 is a stand-alone package which implements
-    the best we know to do in terms of the core metasearching
-    functionality; that is, searching a number of databases in parallel,
-    merging, and analyzing the results. Additional functionality such as
-    user management, attractive displays are expected to be implemented by
-    applications that use pazpar2. Pazpar2 is user interface independent.
-    Its functionality is exposed through a simple REST-style webservice API,
-    designed to be simple to use from an Ajax-anbled browser, from a
-    higher-level server-side language like PHP or Java, or even from a Flash
-    application.
-  </para>
-  <para>
-    Once you launch a search in pazpar2, the operation continues behind the
-    scenes. Pazpar2 connects to servers, carries out searches, and
-    retrieves, deduplicates, and stores results internally. Your application
-    code may periodically inquire about the status of an ongoing operation,
-    and ask to see records or other result set facets.
-  </para>
-  <para>
-    Pazpar2 is designed to be highly configurable. Incoming records are
-    normalized to XML/UTF-8, and then further normalized using XSLT to a
-    simple internal representation that is suitable for analysis. By
-    providing XSLT stylesheets for different kinds of result records, you
-    can tune pazpar2 to work against different kinds of information
-    retrieval servers. Finally, metadata is extracted, in a configurable
-    way, from this internal record, to support display, merging, ranking,
-    result set facets, and sorting. Pazpar2 is not bound to a specific model
-    of metadata, such as DublinCore or MARC -- by providing the right
-    configuration, it can work with a number of different kinds of data in
-    support of many different applications.
-  </para>
-  <para>
-    Pazpar2 is designed to be efficient and scalable. You can set it up to
-    search several hundred targets in parallel, or you can use it to support
-    hundreds of concurrent users. It is implemented with the same attention
-    to performance and economy that we use in our indexing engines, so that
-    you can focus on building your application. You can devote all of your
-    attention to usability and let pazpar2 do what it does best -- search.
-   </para>
- </chapter>
-
- <chapter id="license">
-  <title>Pazpar2 License</title>
-  <para>To be decided and written.</para>
- </chapter>
- 
- <chapter id="installation">
-  <title>Installation</title>
-  <para>
-   Pazpar2 depends on the following tools/libraries:
-   <variablelist>
-    <varlistentry><term><ulink url="&url.yaz;">YAZ</ulink></term>
-     <listitem>
-      <para>
-       The popular Z39.50 toolkit for the C language. YAZ must be
-       compiled with Libxml2/Libxslt support.
-      </para>
-     </listitem>
-    </varlistentry>
-   </variablelist>
-  </para>
-  <para>
-   In order to compile Pazpar2 an ANSI C compiler is
-   required. The requirements should be the same as for YAZ.
-  </para>
-
-  <section id="installation.unix">
-   <title>Installation on Unix (from Source)</title>
+  <bookinfo>
+   <title>Pazpar2 - User's Guide and Reference</title>
+   <author>
+    <firstname>Sebastian</firstname><surname>Hammer</surname>
+   </author>
+   <copyright>
+    <year>&copyright-year;</year>
+    <holder>Index Data</holder>
+   </copyright>
+   <abstract>
+    <simpara>
+       Pazpar2 is a high-performance, user interface-independent, data
+       model-independent metasearching
+       middleware featuring merging, relevance ranking, record sorting, 
+       and faceted results.
+    </simpara>
+    <simpara>
+     This document is a guide and reference to Pazpar2 version &version;.
+    </simpara>
+    <simpara>
+     <inlinemediaobject>
+      <imageobject>
+       <imagedata fileref="common/id.png" format="PNG"/>
+      </imageobject>
+      <imageobject>
+       <imagedata fileref="common/id.eps" format="EPS"/>
+      </imageobject>
+     </inlinemediaobject>
+    </simpara>
+   </abstract>
+  </bookinfo>
+
+  <chapter id="introduction">
+   <title>Introduction</title>
     <para>
-    Here is a quick step-by-step guide on how to compile the
-    tools that Pazpar2 uses. Only few systems have none of the required
-    tools binary packages. If, for example, Libxml2/libxslt are already
-    installed as development packages use these.
+     Pazpar2 is a stand-alone metasearch client with a webservice API, designed
+     to be used from a browser-based client (JavaScript, Flash, Java,
+     etc.), from server-side code, or from any combination of the two.
+     Pazpar2 is a highly optimized client designed to
+     search many resources in parallel. It implements record merging,
+     relevance ranking and sorting by arbitrary data content, and facet
+     analysis for browsing purposes. It is designed to be data model
+     independent, and is capable of working with MARC, DublinCore, or any
+     other XML-structured response format -- XSLT is used to normalize and extract
+     data from retrieval records for display and analysis. It can be used
+     against any server which supports the Z39.50 protocol. Proprietary
+     backend modules can be used to support a large number of other protocols
+     (please contact Index Data for further information about this).
     </para>
-   
     <para>
-    Ensure that the development libraries + header files are
-    available on your system before compiling Pazpar2. For installation
-    of YAZ, refer to the YAZ installation chapter.
+     Additional functionality, such as user management and attractive
+     displays, is expected to be implemented by
+     applications that use pazpar2. Pazpar2 is user interface independent.
+     Its functionality is exposed through a simple REST-style webservice API,
+     designed to be easy to use from an Ajax-enabled browser, Flash
+     animation, Java applet, etc., or from a higher-level server-side language
+     like PHP or Java. Because session information can be shared between
+     browser-based logic and your server-side scripting, there is tremendous
+     flexibility in how you implement your business logic on top of pazpar2.
     </para>
-   <screen>
-    gunzip -c pazpar2-version.tar.gz|tar xf -
-    cd pazpar2-version
-    ./configure
-    make
-    su
-    make install
-   </screen>
-  </section>
-
-  <section id="installation.debian">
-   <title>Installation on Debian GNU/Linux</title>
     <para>
-    All dependencies for Pazpar2 are available as 
-    <ulink url="&url.debian;">Debian</ulink>
-    packages for the sarge (stable in 2005) and etch (testing in 2005)
-    distributions.
+     Once you launch a search in pazpar2, the operation continues behind the
+     scenes. Pazpar2 connects to servers, carries out searches, and
+     retrieves, deduplicates, and stores results internally. Your application
+     code may periodically inquire about the status of an ongoing operation,
+     and ask to see records or other result set facets. Results become
+     available immediately, and it is easy to build end-user interfaces which
+     feel extremely responsive, even when searching more than 100 servers
+     concurrently.
     </para>
     <para>
-    The procedures for Debian based systems, such as
-    <ulink url="&url.ubuntu;">Ubuntu</ulink> is probably similar
+     Pazpar2 is designed to be highly configurable. Incoming records are
+     normalized to XML/UTF-8, and then further normalized using XSLT to a
+     simple internal representation that is suitable for analysis. By
+     providing XSLT stylesheets for different kinds of result records, you
+     can tune pazpar2 to work against different kinds of information
+     retrieval servers. Finally, metadata is extracted, in a configurable
+     way, from this internal record, to support display, merging, ranking,
+     result set facets, and sorting. Pazpar2 is not bound to a specific model
+     of metadata, such as DublinCore or MARC -- by providing the right
+     configuration, it can work with a number of different kinds of data in
+     support of many different applications.
     </para>
-   <screen>
-    apt-get install libyaz-dev
-   </screen>
     <para>
-    With these packages installed, the usual configure + make
-    procedure can be used for Pazpar2 as outlined in
-    <xref linkend="installation.unix"/>.
+     Pazpar2 is designed to be efficient and scalable. You can set it up to
+     search several hundred targets in parallel, or you can use it to support
+     hundreds of concurrent users. It is implemented with the same attention
+     to performance and economy that we use in our indexing engines, so that
+     you can focus on building your application, without worrying about the
+     details of metasearch logic. You can devote all of your attention to
+     usability and let pazpar2 do what it does best -- metasearch.
+    </para>
+    <para>
+      If you wish to connect to commercial or other databases which do not
+      support open standards, please contact Index Data. We have a licensing
+      agreement with a third party vendor which will enable pazpar2 to access
+      thousands of online databases, in addition to the vast number of catalogs
+      and online services that support the Z39.50 protocol.
+    </para>
+    <para>
+      Pazpar2 is our attempt to re-think the traditional paradigms for
+      implementing and deploying metasearch logic, with an uncompromising
+      approach to performance, and attempting to make maximum use of the
+      capabilities of modern browsers. The demo user interface that
+      accompanies the distribution is but one example. If you think of new
+      ways of using pazpar2, we hope you'll share them with us, and if we
+      can provide assistance with regard to training, design, programming,
+      integration with different backends, hosting, or support, please don't
+      hesitate to contact us. If you'd like to see functionality in pazpar2
+      that is not there today, please let us know. It may
+      already be in our development pipeline, or there might be a
+      possibility for you to help out by sponsoring development time or
+      code. Either way, get in touch and we will give you straight answers.
+    </para>
+    <para>
+      Enjoy!
+    </para>
+  </chapter>
+
+
+  <chapter id="license">
+   <title>Pazpar2 License</title>
+   <para>To be decided and written.</para>
+  </chapter>
+  
+  <chapter id="installation">
+   <title>Installation</title>
+   <para>
+    Pazpar2 depends on the following tools/libraries:
+    <variablelist>
+     <varlistentry><term><ulink url="&url.yaz;">YAZ</ulink></term>
+      <listitem>
+       <para>
+       The popular Z39.50 toolkit for the C language. YAZ must be
+       compiled with Libxml2/Libxslt support.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
     </para>
-  </section>
- </chapter>
- 
- <reference id="reference">
-  <title>Reference</title>
-  <partintro>
     <para>
-    The material in this chapter is drawn directly from the individual
-    manual entries.
+    In order to compile Pazpar2, an ANSI C compiler is
+    required. The requirements should be the same as for YAZ.
     </para>
-  </partintro>
-  &manref;
- </reference>
+
+   <section id="installation.unix">
+    <title>Installation on Unix (from Source)</title>
+    <para>
+     Here is a quick step-by-step guide on how to compile Pazpar2 from
+     source. Only a few systems lack binary packages for all of the required
+     tools; if, for example, Libxml2/Libxslt are already installed as
+     development packages, use these.
+    </para>
+    
+    <para>
+     Ensure that the development libraries and header files are
+     available on your system before compiling Pazpar2. For installation
+     of YAZ, refer to the installation chapter of the YAZ manual.
+    </para>
+    <screen>
+     gunzip -c pazpar2-version.tar.gz|tar xf -
+     cd pazpar2-version
+     ./configure
+     make
+     su
+     make install
+    </screen>
+   </section>
+
+   <section id="installation.debian">
+    <title>Installation on Debian GNU/Linux</title>
+    <para>
+     All dependencies for Pazpar2 are available as 
+     <ulink url="&url.debian;">Debian</ulink>
+     packages for the sarge (stable in 2005) and etch (testing in 2005)
+     distributions.
+    </para>
+    <para>
+     The procedure for Debian-based systems, such as
+     <ulink url="&url.ubuntu;">Ubuntu</ulink>, is probably similar.
+    </para>
+    <screen>
+     apt-get install libyaz-dev
+    </screen>
+    <para>
+     With these packages installed, the usual configure and make
+     procedure can be used for Pazpar2 as outlined in
+     <xref linkend="installation.unix"/>.
+    </para>
+   </section>
+  </chapter>
+
+  <chapter id="using">
+    <title>Using pazpar2</title>
+    <para>
+      This chapter provides a general introduction to the use and deployment of pazpar2.
+    </para>
+
+    <section id="architecture">
+      <title>Pazpar2 and your systems architecture</title>
+      <para>
+       Pazpar2 is designed to provide asynchronous, behind-the-scenes
+       metasearching functionality to your application, exposing this
+       functionality using a simple webservice API that can be accessed
+       from any number of development environments. In particular, it is
+       possible to combine pazpar2 either with your server-side dynamic
+       website scripting, with scripting or code running in the browser, or
+       with any combination of the two. Pazpar2 is an excellent tool for
+       building advanced, Ajax-based user interfaces for metasearch
+       functionality, but it isn't a requirement -- you can choose to use
+       pazpar2 entirely as a backend to your regular server-side scripting.
+       When you do use pazpar2 in conjunction
+       with browser scripting (JavaScript/Ajax, Flash, applets, etc.), there are
+       special considerations.
+      </para>
+
+      <para>
+        Pazpar2 implements a simple but efficient HTTP server, and it is
+       designed to interact directly with scripting running in the browser
+       for the best possible performance, and to limit overhead when
+       several browser clients generate numerous webservice requests.
+       However, it is still desirable to use a conventional webserver,
+       such as Apache, to serve up graphics, HTML documents, and
+       server-side scripting. Because the security sandbox of
+       most browser-side programming environments only allows communication
+       with the server from which the enclosing HTML page or object
+       originated, pazpar2 is designed so that it can act as a transparent
+       proxy in front of an existing webserver (see <xref
+       linkend="pazpar2_conf"/> for details). In this mode, all regular
+       HTTP requests are transparently passed through to your webserver,
+       while pazpar2 only intercepts search-related webservice requests.
+      </para>
+
+      <para>
+        If you want to expose your combined service on port 80, you can
+       run your regular webserver on a different port, on a different
+       server, or on a different IP address associated with the same server.
+      </para>
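+
+      <para>
+        As an illustration only, the relevant part of the pazpar2
+       configuration file for such a setup might look like the fragment
+       below. It assumes, hypothetically, that pazpar2 listens on port 80
+       of the server address 192.0.2.10, while Apache has been moved to port
+       8080 on the same machine; substitute your own addresses and ports.
+       The 'listen' and 'proxy' elements are described in <xref
+       linkend="pazpar2_conf"/>.
+       <screen><![CDATA[
+<pazpar2 xmlns="http://www.indexdata.com/pazpar2/1.0">
+  <server>
+    <!-- Hypothetical address; pazpar2 answers on port 80 here -->
+    <listen host="192.0.2.10" port="80"/>
+    <!-- Everything except search.pz2 requests is passed on to Apache -->
+    <proxy host="localhost" port="8080"/>
+  </server>
+</pazpar2>
+]]></screen>
+        Because the browser sees a single host and port, pages served by
+       Apache and the search.pz2 webservice requests handled by pazpar2
+       share one origin. This keeps browser-side code within its security
+       sandbox.
+      </para>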
+
+      <para>
+        Sometimes, it may be necessary to implement functionality on your
+       regular webserver that makes use of search results, for example to
+       implement data import functionality, emailing results, history
+       lists, personal citation lists, interlibrary loan functionality,
+       etc. Fortunately, it is simple to exchange information between
+       pazpar2, your browser scripting, and backend server-side scripting.
+       You can send a session ID and possibly a record ID from your browser
+       code to your server code, and from there use pazpar2's webservice API
+       to access result sets or individual records. You could even 'hide'
+       all of pazpar2's functionality behind your own API implemented on
+       the server side, and access that from the browser or elsewhere. The
+       possibilities are just about endless.
+      </para>
+    </section>
+
+    <section id="data_model">
+      <title>Your data model</title>
+      <para>
+        Pazpar2 does not have a preconceived notion of what makes up a data
+       model. There are no assumptions that records have specific fields or
+       that they are organized in any particular way. The only assumption
+       is that data comes packaged in a form that the software can work
+       with (presently, that means XML or MARC), and that you can provide
+       the necessary information to massage it into pazpar2's internal
+       record abstraction.
+      </para>
+
+      <para>
+        Handling retrieval records in pazpar2 is a two-step process. First,
+       you decide which data elements of the source record you are
+       interested in, and you specify any desired massaging or combining of
+       elements using an XSLT stylesheet (MARC records are automatically
+       normalized to MARCXML before this step). If desired, you can run
+       multiple XSLT stylesheets in series to accomplish this, but the
+       output of the last one should be a representation of the record in a
+       schema that pazpar2 understands.
+      </para>
+
+      <para>
+        The intermediate, internal representation of the record looks like
+       this:
+       <screen><![CDATA[
+<record   xmlns="http://www.indexdata.com/pazpar2/1.0"
+         mergekey="title The Shining author King, Stephen">
+
+    <metadata type="title">The Shining</metadata>
+
+    <metadata type="author">King, Stephen</metadata>
+
+    <metadata type="kind">ebook</metadata>
+
+    <!-- ... and so on -->
+</record>
+]]></screen>
+
+        As you can see, there isn't much to it. There are really only a few
+       important elements to this file.
+      </para>
+
+      <para>
+        Elements should belong to the namespace
+       http://www.indexdata.com/pazpar2/1.0. If the root node contains the
+       attribute 'mergekey', then every record that generates the same
+       merge key (normalized for case differences, white space, and
+       truncation) will be joined into a cluster. In other words, you
+       decide how records are merged. If you don't include a merge key,
+       records are never merged. The 'metadata' elements provide the meat
+       of the record -- the content. The 'type' attribute is used to
+       match each element against processing rules that determine what
+       happens to the data element next.
+      </para>
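+
+      <para>
+        As a rough sketch, an XSLT 1.0 stylesheet that produces this
+       representation from MARCXML could look like the following. The choice
+       of MARC fields (245$a for title, 100$a for author, 260$c for date) and
+       the 'date' type name are illustrative assumptions. This is not a
+       stylesheet shipped with pazpar2, so adapt it to your own data.
+       <screen><![CDATA[
+<xsl:stylesheet version="1.0"
+    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
+    xmlns:marc="http://www.loc.gov/MARC21/slim"
+    xmlns:pz="http://www.indexdata.com/pazpar2/1.0">
+
+  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>
+
+  <!-- Assumes the MARCXML record is the document element -->
+  <xsl:template match="/marc:record">
+    <xsl:variable name="title"
+        select="marc:datafield[@tag='245']/marc:subfield[@code='a']"/>
+    <xsl:variable name="author"
+        select="marc:datafield[@tag='100']/marc:subfield[@code='a']"/>
+
+    <!-- Same title and author give the same merge key, hence one cluster -->
+    <pz:record mergekey="{concat('title ', $title, ' author ', $author)}">
+      <xsl:for-each select="$title">
+        <pz:metadata type="title"><xsl:value-of select="."/></pz:metadata>
+      </xsl:for-each>
+      <xsl:for-each select="$author">
+        <pz:metadata type="author"><xsl:value-of select="."/></pz:metadata>
+      </xsl:for-each>
+      <xsl:for-each select="marc:datafield[@tag='260']/marc:subfield[@code='c']">
+        <pz:metadata type="date"><xsl:value-of select="."/></pz:metadata>
+      </xsl:for-each>
+    </pz:record>
+  </xsl:template>
+
+  <!-- Suppress the text of anything not matched above -->
+  <xsl:template match="text()"/>
+</xsl:stylesheet>
+]]></screen>
+        Each 'type' value emitted here needs a corresponding 'metadata'
+       element in the configuration file. Otherwise pazpar2 warns about an
+       unknown element.
+      </para>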
+
+      <para>
+        The next processing step is the extraction of metadata from the
+       intermediate representation of the record. This is governed by the
+       'metadata' elements in the 'service' section of the configuration
+       file. See <xref linkend="config-server"/> for details. The metadata
+       in the retrieval record ultimately drives merging, sorting, ranking,
+       the extraction of browse facets, and display, all configurable.
+      </para>
+    </section>
+
+    <section id="client">
+      <title>Client development</title>
+      <para>
+        You can use pazpar2 from any environment that allows you to use
+       webservices. The initial goal of the software was to support
+       Ajax-based applications, but there literally are no limits to what
+       you can do. You can use pazpar2 from JavaScript, Flash, Java, etc.,
+       on the browser side, and from any development environment on the
+       server side, and you can pass session tokens and record IDs freely
+       around between these environments to build sophisticated applications.
+       Use your imagination.
+      </para>
+
+      <para>
+        The webservice API of pazpar2 is described in detail in <xref
+       linkend="pazpar2_protocol"/>.
+      </para>
+
+      <para>
+        In brief, you use the 'init' command to create a session, a
+       temporary workspace which carries information about the current
+       search. You start a new search using the 'search' command. Once the
+       search has been started, you can follow its progress using the
+       'stat', 'bytarget', 'termlist', or 'show' commands. Detailed records
+       can be fetched using the 'record' command.
+      </para>
+    </section>
+  </chapter> <!-- Using pazpar2 -->
+
+  <reference id="reference">
+   <title>Reference</title>
+   <partintro>
+    <para>
+     The material in this chapter is drawn directly from the individual
+     manual entries.
+    </para>
+   </partintro>
+   &manref;
+  </reference>
  </book>
  
   <!-- Keep this comment at the end of the file
diff --git a/doc/pazpar2_conf.xml b/doc/pazpar2_conf.xml
index 6deafa2..b8e86ea 100644
--- a/doc/pazpar2_conf.xml
+++ b/doc/pazpar2_conf.xml
@@ -8,7 +8,7 @@
       <!ENTITY % common SYSTEM "common/common.ent">
       %common;
  ]>
-<!-- $Id: pazpar2_conf.xml,v 1.2 2007-01-12 15:31:30 adam Exp $ -->
+<!-- $Id: pazpar2_conf.xml,v 1.3 2007-01-19 18:28:08 quinn Exp $ -->
  <refentry id="pazpar2_conf">
   <refentryinfo>
    <productname>Pazpar2</productname>
@@ -31,8 +31,284 @@
   </refsynopsisdiv>
   
   <refsect1><title>DESCRIPTION</title>
-  <para></para>
+   <para>
+     The pazpar2 configuration file, together with any referenced XSLT files,
+     governs pazpar2's behavior as a client, and controls the normalization and
+     extraction of data elements from incoming result records, for the
+     purposes of merging, sorting, facet analysis, and display.
+    </para>
+
+    <para>
+      The file is specified using the option -f on the pazpar2 command line.
+      There is not presently a way to reload the configuration file without
+      restarting pazpar2, although this will most likely be added some time
+      in the future.
+    </para>
   </refsect1>
+
+ <refsect1><title>FORMAT</title>
+   <para>
+     The configuration file is XML-structured. It must be valid XML. All
+     elements specific to pazpar2 should belong to the namespace
+     "http://www.indexdata.com/pazpar2/1.0" (this is assumed in the
+     following examples). The root element is named 'pazpar2'. Under the
+     root element are a number of elements which group categories of
+     information. The categories are described below.
+    </para>
+
+    <refsect2 id="config-server"><title>server</title>
+      <para>
+        This section governs overall behavior of the client. The data
+       elements are described below.
+      </para>
+      <variablelist> <!-- level 1 -->
+        <varlistentry>
+         <term>listen</term>
+         <listitem>
+           <para>
+             Configures the webservice -- this controls how you can connect
+             to pazpar2 from your browser or server-side code. The
+             attributes 'host' and 'port' control the binding of the
+             server. The 'host' attribute can be used to bind the server to
+             a secondary IP address of your system, enabling you to run
+             pazpar2 on port 80 alongside a conventional web server. You
+             can override this setting on the command line using the option -h.
+           </para>
+         </listitem>
+       </varlistentry>
+
+       <varlistentry>
+         <term>proxy</term>
+         <listitem>
+           <para>
+             If this item is given, pazpar2 will forward all incoming HTTP
+             requests that do not contain the filename 'search.pz2' to the
+             host and port specified using the 'host' and 'port'
+             attributes. This functionality is crucial if you wish to use
+             pazpar2 in conjunction with browser-based code (JS, Flash,
+             applets, etc.) which operates in a security sandbox. Such code
+             can only connect to the same server from which the enclosing
+             HTML page originated. Pazpar2's proxy functionality enables you
+             to host all of the main pages (plus images, CSS, etc.) of your
+             application on a conventional webserver, while efficiently
+             processing webservice requests for metasearch status, results,
+             etc.
+           </para>
+         </listitem>
+       </varlistentry>
+
+       <varlistentry>
+         <term>service</term>
+         <listitem>
+           <para>
+             This nested element controls the behavior of pazpar2 with
+             respect to your data model. In pazpar2, incoming records are
+             normalized, using XSLT, into an internal representation (see
+             the <link
+             linkend="config-retrievalprofile">retrievalprofile</link> section).
+             The 'service' section controls the further processing and
+             extraction of data from the internal representation, primarily
+             through the 'metadata' sub-element.
+           </para>
+
+           <variablelist> <!-- Level 2 -->
+             <varlistentry><term>metadata</term>
+              <listitem>
+               <para>
+                 One of these elements is required for every data element in
+                 the internal representation of the record (see
+                 <xref linkend="data_model"/>). It governs
+                 subsequent processing as it pertains to sorting, relevance
+                 ranking, merging, and display of data elements. It supports
+                 the following attributes:
+               </para>
+
+               <variablelist> <!-- level 3 -->
+                 <varlistentry><term>name</term>
+                   <listitem>
+                     <para>
+                       This is the name of the data element. It is matched
+                       against the 'type' attribute of the 'metadata' element
+                       in the normalized record. A warning is produced if
+                       metadata elements with an unknown name are found in the
+                       normalized record. This name is also used to represent
+                       data elements in the records returned by the
+                       webservice API, and to name sort lists and browse
+                       facets.
+                     </para>
+                   </listitem>
+                 </varlistentry>
+
+                 <varlistentry><term>type</term>
+                   <listitem>
+                     <para>
+                       The type of data element. This value governs any
+                       normalization or special processing that might take
+                       place on an element. Possible values are 'generic'
+                       (basic string) and 'year' (a range is computed if
+                       multiple years are found in the record). Note: This
+                       list is likely to grow in the future.
+                     </para>
+                   </listitem>
+                 </varlistentry>
+
+                 <varlistentry><term>brief</term>
+                   <listitem>
+                     <para>
+                       If this is set to 'yes', then the data element is
+                       included in brief records in the webservice API. Note
+                       that this only makes sense for metadata elements that
+                       are merged (see below). The default value is 'no'.
+                     </para>
+                   </listitem>
+                 </varlistentry>
+
+                 <varlistentry><term>sortkey</term>
+                   <listitem>
+                     <para>
+                       Specifies that this data element is to be used for
+                       sorting. The possible values are 'numeric' (numeric
+                       value), 'skiparticle' (string; skip common, leading
+                       articles), and 'no' (no sorting). The default value is
+                       'no'.
+                     </para>
+                   </listitem>
+                 </varlistentry>
+
+                 <varlistentry><term>rank</term>
+                   <listitem>
+                     <para>
+                       Specifies that this element is to be used to help rank
+                       records against the user's query (when ranking is
+                       requested). The value is an integer, used as a
+                       multiplier against the basic TF*IDF score. A value of
+                       1 is the base; higher values give additional weight to
+                       elements of this type. The default is '0', which
+                       excludes this element from the rank calculation.
+                     </para>
+                   </listitem>
+                 </varlistentry>
+
+                 <varlistentry><term>termlist</term>
+                   <listitem>
+                     <para>
+                       Specifies that this element is to be used as a
+                       termlist, or browse facet. Values are tabulated from
+                       incoming records, and a list of the most frequent
+                       values (with their associated frequencies) is made
+                       available to the client through the webservice API.
+                       The possible values are 'yes' and 'no' (default).
+                     </para>
+                   </listitem>
+                 </varlistentry>
+
+                 <varlistentry><term>merge</term>
+                   <listitem>
+                     <para>
+                       This governs whether and how elements are extracted
+                       from individual records and merged into cluster
+                       records. The possible values are: 'unique' (include
+                       all unique elements), 'longest' (include only the
+                       longest element, by string length), 'range' (calculate
+                       a range of values across all matching records), 'all'
+                       (include all elements), or 'no' (don't merge; this is
+                       the default).
+                     </para>
+                   </listitem>
+                 </varlistentry>
+               </variablelist> <!-- attributes to metadata -->
+              </listitem>
+             </varlistentry>
+           </variablelist>     <!-- Data elements in service directive -->
+         </listitem>
+       </varlistentry>
+      </variablelist>           <!-- Data elements in server directive -->
+    </refsect2>
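The rank and merge attributes described above can be modelled in a few lines. This Python sketch is illustrative only and is not pazpar2's implementation: the function names weighted_score and merge_values are hypothetical, per-element TF*IDF scores are assumed to be computed elsewhere, and 'range' assumes integer values such as publication years.

```python
"""Hypothetical model of the 'rank' and 'merge' metadata attributes.

It mirrors the documented semantics only; it is not pazpar2's code.
"""


def weighted_score(tfidf_by_element: dict[str, float],
                   rank_by_element: dict[str, int]) -> float:
    # Each element's TF*IDF score is multiplied by its 'rank' value.
    # Elements without a rank attribute default to 0 and drop out.
    return sum(score * rank_by_element.get(name, 0)
               for name, score in tfidf_by_element.items())


def merge_values(mode: str, values: list[str]):
    """Combine one element's values from all records in a cluster."""
    if mode == "no":
        return None                      # element is not merged (default)
    if mode == "all":
        return list(values)              # keep every value, duplicates included
    if mode == "unique":
        # Keep each distinct value once, preserving first-seen order.
        return list(dict.fromkeys(values))
    if mode == "longest":
        # Keep only the longest value, measured by string length.
        return max(values, key=len) if values else None
    if mode == "range":
        # Assumes numeric values such as publication years.
        nums = [int(v) for v in values]
        return (min(nums), max(nums)) if nums else None
    raise ValueError(f"unknown merge mode: {mode!r}")
```

For instance, if title has a rank of 2 and author has no rank attribute, only the title score contributes to the total, doubled.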
+
+    <refsect2 id="config-queryprofile">
+      <title>queryprofile</title>
+      <para>
+        At the moment, this directive is ignored; a single global CCL
+       mapping file, located in etc/default.bib, governs how queries are
+       mapped to Z39.50 Type-1 queries. This will change shortly.
+      </para>
+    </refsect2>
+
+    <refsect2 id="config-retrievalprofile">
+      <title>retrievalprofile</title>
+      <para>
+       Note: In the present version, there is a single retrieval
+       profile. However, in a future release, it will be possible to
+       associate unique retrieval profiles with different targets, or to
+       generate retrieval profiles using XSLT from the ZeeRex description of
+       a target.
+      </para>
+      
+      <para>
+        The following data elements are recognized for the retrievalprofile
+       directive:
+      </para>
+      
+      <variablelist>
+        <varlistentry><term>requestsyntax</term>
+         <listitem>
+           <para>
+             This element specifies the record syntax to request from the
+             target. It is only meaningful for Z39.50 targets.
+           </para>
+         </listitem>
+       </varlistentry>
+
+       <varlistentry><term>nativesyntax</term>
+         <listitem>
+           <para>
+             This element specifies the native syntax and encoding of the
+             result records. The default is XML. The following attributes
+             are defined:
+           </para>
+           <variablelist>
+             <varlistentry><term>name</term>
+               <listitem>
+                 <para>
+                   The name of the syntax. Currently recognized values are
+                   'iso2709' (MARC) and 'xml'.
+                 </para>
+               </listitem>
+             </varlistentry>
+
+             <varlistentry><term>format</term>
+               <listitem>
+                 <para>
+                   The format, or schema, to be expected. Default is
+                   'marc21'.
+                 </para>
+               </listitem>
+             </varlistentry>
+
+             <varlistentry><term>encoding</term>
+               <listitem>
+                 <para>
+                   The encoding of the response record. Typical values for
+                   MARC records are 'marc8' (general MARC-8), 'marc8s'
+                   (MARC-8, but mapped to precomposed UTF-8 characters, which
+                   is more suitable for web browsers), and 'latin1'.
+                 </para>
+               </listitem>
+             </varlistentry>
+
+             <varlistentry><term>mapto</term>
+               <listitem>
+                 <para>
+                   Specifies the flavor of MARCXML to map results to.
+                   The default is 'marcxml'; 'marcxchange' is also possible
+                   and is useful for Danish DANMARC records.
+                 </para>
+               </listitem>
+             </varlistentry>
+           </variablelist> <!-- parameters to nativesyntax directive -->
+         </listitem>
+       </varlistentry>
+      </variablelist> <!-- sub-elements in retrievalprofile -->
+    </refsect2>
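The nativesyntax attributes and their documented defaults can be summarized as a small data model. This Python sketch is hypothetical: the class name NativeSyntax is illustrative, it accepts only the values named in this section, and it assumes no default for encoding because none is documented.

```python
from dataclasses import dataclass
from typing import Optional

# Only the syntax names listed in this section; others may exist.
KNOWN_SYNTAXES = {"iso2709", "xml"}


@dataclass
class NativeSyntax:
    """Hypothetical model of the nativesyntax attributes and defaults."""
    name: str = "xml"                # 'iso2709' (MARC) or 'xml'; default XML
    format: str = "marc21"           # expected format or schema
    encoding: Optional[str] = None   # e.g. 'marc8', 'marc8s', 'latin1'
    mapto: str = "marcxml"           # or 'marcxchange' (e.g. DANMARC)

    def __post_init__(self):
        # Reject values this section does not mention.
        if self.name not in KNOWN_SYNTAXES:
            raise ValueError(f"unrecognized syntax name: {self.name!r}")
        if self.mapto not in ("marcxml", "marcxchange"):
            raise ValueError(f"unrecognized mapto value: {self.mapto!r}")
```

A MARC target delivering MARC-8 records for display in a browser would then be described as NativeSyntax(name="iso2709", encoding="marc8s"), with format and mapto left at their defaults.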
+
+  </refsect1>
   
   <refsect1><title>OPTIONS</title>
    <para></para>
diff --git a/doc/pazpar2_protocol.xml b/doc/pazpar2_protocol.xml

index 537d98d..404f6c3 100644 (file)
--- a/doc/pazpar2_protocol.xml
+++ b/doc/pazpar2_protocol.xml
@@ -8,7 +8,7 @@
       <!ENTITY % common SYSTEM "common/common.ent">
       %common;
  ]>
-<!-- $Id: pazpar2_protocol.xml,v 1.2 2007-01-12 15:21:04 adam Exp $ -->
+<!-- $Id: pazpar2_protocol.xml,v 1.3 2007-01-19 18:28:08 quinn Exp $ -->
  <refentry id="pazpar2_protocol">
   <refentryinfo>
    <productname>Pazpar2</productname>
@@ -27,12 +27,13 @@
   <refsect1><title>DESCRIPTION</title>
    <para>
   Webservice requests are any requests that refer to the filename "search.pz2". Arguments
-   are GET-style parameters. Argument 'command' is required and specifies
-   command. Any request not recognized as a webservice request as described,
-   are forwarded to the HTTP server specified in configuration.
-   This way, the webserver can host the user interface (itself dynamic
-   or static HTML), and AJAX-style calls can be used from JS to interact
-   with the search logic. 
+   are GET-style parameters. Argument 'command' is always required and specifies
+   the operation to perform. Any request not recognized as a webservice
+   request is forwarded to the HTTP server specified in the configuration
+   using the proxy setting.
+   This way, a regular webserver can host the user interface (itself dynamic
+   or static HTML), and AJAX-style calls can be used from JavaScript (or any other client-side
+   scripting environment) to interact with the search logic in pazpar2. 
    </para>
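Because every webservice call is an ordinary GET request, a client needs little more than URL construction. The following minimal Python sketch uses only the standard library. The helper names pz2_url and is_webservice_request are hypothetical, as is any base URL a caller passes in (for example http://localhost:9004); none of them are part of pazpar2.

```python
from urllib.parse import urlencode, urlsplit


def pz2_url(base: str, command: str, **params) -> str:
    """Build a webservice request URL for search.pz2 (hypothetical helper).

    'command' is always required; other keyword arguments (session,
    query, start, num, ...) become GET-style parameters. Spaces in
    values are encoded as '+', as in query=computer+science.
    """
    query = urlencode({"command": command, **params})
    return f"{base.rstrip('/')}/search.pz2?{query}"


def is_webservice_request(url: str) -> bool:
    # Only requests naming search.pz2 are handled as webservice calls;
    # anything else would be forwarded to the proxied HTTP server.
    path = urlsplit(url).path
    return path.rsplit("/", 1)[-1] == "search.pz2"
```

Calling pz2_url with the command "search", a session ID and the query "computer science" yields the same parameters as the search example in this document, with the query encoded as computer+science.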
    <para>
     Each command is described in sub sections to follow.
@@ -108,7 +109,7 @@
     <para>
      Example:
      <screen><![CDATA[
-search.pz2?session=2044502273&command=search&query=computer
+search.pz2?session=2044502273&command=search&query=computer+science
  ]]>
       </screen>
      Response:
@@ -123,7 +124,7 @@ search.pz2?session=2044502273&command=search&query=computer
    <refsect2 id="command-stat">
     <title>stat</title>
     <para>
-    Provides status of ongoing search. Parameters:
+    Provides status information about an ongoing search. Parameters:
  
      <variablelist>
       <varlistentry>
@@ -147,7 +148,7 @@ search.pz2?session=2044502273&command=stat
  <stat>
    <activeclients>3</activeclients>
    <hits>7</hits>                   -- Total hitcount
-  <records>7</records>             -- Total number of records fetched
+  <records>7</records>             -- Total number of records fetched in last query
    <clients>1</clients>             -- Total number of associated clients
    <unconnected>0</unconnected>     -- Number of disconnected clients
    <connecting>0</connecting>       -- Number of clients in connecting state
@@ -180,7 +181,7 @@ search.pz2?session=2044502273&command=stat
        <term>start</term>
        <listitem>
         <para>First record to show - 0-indexed.</para>
-      </listitem>
+      </listitem>
       </varlistentry>
       
       <varlistentry>
@@ -196,33 +197,47 @@ search.pz2?session=2044502273&command=stat
        <term>block</term>
        <listitem>
         <para>
-       If block is set, the command will hang until there are records ready
+       If block is set to 1, the command will hang until there are records ready
         to display. Use this to show first records rapidly without
         requiring rapid polling.
         </para>
        </listitem>
       </varlistentry>
  
+     <varlistentry>
+       <term>sort</term>
+       <listitem>
+         <para>
+          Specifies sort criteria. The argument is a comma-separated list
+          (no whitespace allowed) of sort fields, with the highest-priority
+          field first. A sort field may be followed by a colon and the
+          number '0' (decreasing) or '1' (increasing), indicating the order
+          in which results are sorted according to that field. Decreasing
+          order (0) is the default.
+        </para>
+       </listitem>
+      </varlistentry>
+
      </variablelist>
     </para>
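The sort rules above can be sketched in Python as follows. The names parse_sort and sort_records are hypothetical, records are modelled as dictionaries whose values are compared as strings, and the code illustrates the documented semantics rather than how pazpar2 sorts internally.

```python
def parse_sort(spec: str) -> list[tuple[str, bool]]:
    """Parse a sort argument such as 'title:1,date'.

    Returns (field, increasing) pairs, highest priority first. A ':1'
    suffix means increasing order; ':0' or no suffix means decreasing.
    """
    if any(ch.isspace() for ch in spec):
        raise ValueError("whitespace is not allowed in sort criteria")
    keys = []
    for item in spec.split(","):
        field, _, direction = item.partition(":")
        if not field or direction not in ("", "0", "1"):
            raise ValueError(f"bad sort field: {item!r}")
        keys.append((field, direction == "1"))
    return keys


def sort_records(records: list[dict], spec: str) -> list[dict]:
    """Sort records by every field in spec, earlier fields taking priority."""
    result = list(records)
    # Python's sort is stable, so sorting by the lowest-priority key first
    # and the highest-priority key last yields a multi-key ordering.
    for field, increasing in reversed(parse_sort(spec)):
        result.sort(key=lambda rec: rec.get(field, ""), reverse=not increasing)
    return result
```

With this model, sort=title:1 from the example below orders hits by increasing title, while a bare field such as date sorts in decreasing order.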
     <para>
      Example:
      <screen><![CDATA[
-search.pz2?session=2044502273&command=show&start=0&num=2
+search.pz2?session=2044502273&command=show&start=0&num=2&sort=title:1
  ]]></screen>
      Output:
      <screen><![CDATA[
  <show>
    <status>OK</status>
-  <activeclients>3</activeclients>
-  <merged>6</merged>
-  <total>7</total>
-  <start>0</start>
-  <num>2</num>
+  <activeclients>3</activeclients>     -- How many clients are still working
+  <merged>6</merged>                   -- Number of merged records
+  <total>7</total>                     -- Total of all hitcounts
+  <start>0</start>                     -- The start number you requested
+  <num>2</num>                         -- Number of records retrieved
    <hit>
      <md-title>How to program a computer, by Jack Collins</md-title>
-    <count>2</count> <!-- Number of merged records -->
-    <recid>6</recid>
+    <count>2</count>                   -- Number of merged records 
+    <recid>6</recid>                   -- Record ID for this record
    </hit>
    <hit>
      <md-title>
@@ -243,6 +258,15 @@ search.pz2?session=2044502273&command=show&start=0&num=2
  
      <variablelist>
       <varlistentry>
+      <term>session</term>
+      <listitem>
+       <para>
+       Session ID
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
        <term>id</term>
        <listitem>
         <para>
@@ -326,14 +350,61 @@ Output:
      <screen><![CDATA[
  <term>
    <name>library2.mcmaster.ca</name>
-  <frequency>11734</frequency>
-  <state>Client_Idle</state>
-  <diagnostic>0</diagnostic>
+  <frequency>11734</frequency>         -- Number of hits
+  <state>Client_Idle</state>           -- See the description of 'bytarget' below
+  <diagnostic>0</diagnostic>           -- Z39.50 diagnostic codes
  </term>
  ]]></screen>
      </para>
    </refsect2>
  
+
+  <refsect2 id="command-bytarget">
+   <title>bytarget</title>
+   <para>
+    Returns information about the status of each active client. Parameters:
+
+    <variablelist>
+     <varlistentry>
+      <term>session</term>
+      <listitem>
+       <para>
+          Session ID.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+   </para>
+   <para> 
+    Example:
+    <screen><![CDATA[
+search.pz2?session=605047297&command=bytarget
+]]></screen>
+
+    Example output:
+    
+    <screen><![CDATA[
+<bytarget>
+  <status>OK</status>
+  <target>
+    <id>z3950.loc.gov/voyager/</id>
+    <hits>10000</hits>
+    <diagnostic>0</diagnostic>
+    <records>65</records>
+    <state>Client_Presenting</state>
+  </target>
+  <!-- ... more target nodes below as necessary -->
+</bytarget>
+]]></screen>
+
+   The following client states are defined: Client_Connecting,
+   Client_Connected, Client_Idle, Client_Initializing, Client_Searching,
+   Client_Presenting, Client_Error, Client_Failed, Client_Disconnected, and
+   Client_Stopped.
+   </para>
+  </refsect2>
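A client can poll bytarget to follow the progress of each target. The Python sketch below parses a response shaped like the example output above using the standard library's xml.etree.ElementTree. The function names are hypothetical, and the FINISHED_STATES set is an assumption: this document lists the client states but does not say which ones mean a target is done.

```python
import xml.etree.ElementTree as ET

# Assumption: this set of "finished" states is an illustrative guess,
# not something the pazpar2 documentation specifies.
FINISHED_STATES = {"Client_Idle", "Client_Error", "Client_Failed",
                   "Client_Disconnected", "Client_Stopped"}


def parse_bytarget(xml_text: str) -> list[dict]:
    """Return one dict per target node in a bytarget response."""
    root = ET.fromstring(xml_text)
    if root.findtext("status") != "OK":
        raise RuntimeError("bytarget command did not return OK")
    targets = []
    for node in root.findall("target"):
        targets.append({
            "id": node.findtext("id"),
            "hits": int(node.findtext("hits", "0")),
            "diagnostic": int(node.findtext("diagnostic", "0")),
            "records": int(node.findtext("records", "0")),
            "state": node.findtext("state"),
        })
    return targets


def still_working(targets: list[dict]) -> list[str]:
    """IDs of targets whose state is not in the assumed finished set."""
    return [t["id"] for t in targets if t["state"] not in FINISHED_STATES]
```

A polling loop could stop once still_working returns an empty list. The choice of finished states should be checked against pazpar2's actual behavior first.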
+
   </refsect1>
  </refentry>