X-Git-Url: http://jsfdemo.indexdata.com/?a=blobdiff_plain;f=doc%2Fbook.xml;h=4441b2d5b4330557af675626d3abb8137e91cacb;hb=e954f104fba0c6ef9142f09042e0b7e7f73d7388;hp=27256cd8c9011d4de0356a07e831011f868a9c6b;hpb=b4edc17e33ad611f70f231072bedcd2ab6415476;p=pazpar2-moved-to-github.git

diff --git a/doc/book.xml b/doc/book.xml
index 27256cd..4441b2d 100644
--- a/doc/book.xml
+++ b/doc/book.xml
@@ -9,125 +9,369 @@
      <!ENTITY % common SYSTEM "common/common.ent">
      %common;
 ]>
-<!-- $Id: book.xml,v 1.1 2007-01-10 09:44:20 adam Exp $ -->
+<!-- $Id: book.xml,v 1.6 2007-01-19 21:50:02 adam Exp $ -->
 <book id="book">
  <bookinfo>
-  <title>pazpar2 - User's Guide and Reference</title>
+  <title>Pazpar2 - User's Guide and Reference</title>
   <author>
    <firstname>Sebastian</firstname><surname>Hammer</surname>
   </author>
+  <releaseinfo>&version;</releaseinfo>
   <copyright>
    <year>&copyright-year;</year>
    <holder>Index Data</holder>
   </copyright>
   <abstract>
    <simpara>
-    pazpar2 - High-performance, user-interface 
-    user-interface independtent metasearching middleware.
+    Pazpar2 is a high-performance, user interface-independent, data
+    model-independent metasearching
+    middleware featuring merging, relevance ranking, record sorting, 
+    and faceted results.
    </simpara>
    <simpara>
-    This document is a guide and reference to pazpar version &version;.
+    This document is a guide and reference to Pazpar2 version &version;.
    </simpara>
    <simpara>
     <inlinemediaobject>
      <imageobject>
       <imagedata fileref="common/id.png" format="PNG"/>
-     </imageobject>
-     <imageobject>
-      <imagedata fileref="common/id.eps" format="EPS"/>
-     </imageobject>
-    </inlinemediaobject>
+   </imageobject>
+    <imageobject>
+     <imagedata fileref="common/id.eps" format="EPS"/>
+   </imageobject>
+   </inlinemediaobject>
    </simpara>
   </abstract>
- </bookinfo>
-
- <chapter id="introduction">
-  <title>Introduction</title>
-  
-  <para>
-   <ulink url="&url.pazpar2;">pazpar2</ulink> is.. To be written.
-  </para>
- </chapter>
-
- <chapter id="license">
-  <title>pazpar2 License</title>
-  <para>To be decided and written.</para>
- </chapter>
+  </bookinfo>
  
- <chapter id="installation">
-  <title>Installation</title>
-  <para>
-   pazpar2 depends on the following tools/libraries:
-   <variablelist>
-    <varlistentry><term><ulink url="&url.yaz;">YAZ</ulink></term>
-     <listitem>
-      <para>
-       The popular Z39.50 toolkit for the C language. YAZ must be
-       compiled with Libxml2/Libxslt support.
-      </para>
-     </listitem>
-    </varlistentry>
-   </variablelist>
-  </para>
-  <para>
-   In order to compile pazpar2 an ANSI C compiler is
-   required. The requirements should be the same as for YAZ.
-  </para>
-
-  <section id="installation.unix">
-   <title>Installation on Unix (from Source)</title>
+  <chapter id="introduction">
+   <title>Introduction</title>
    <para>
-    Here is a quick step-by-step guide on how to compile the
-    tools that pazpar2 uses. Only few systems have none of the required
-    tools binary packages. If, for example, Libxml2/libxslt are already
-    installed as development packages use those (and omit compilation).
+     Pazpar2 is a stand-alone metasearch client with a webservice API, designed
+     to be used either from a browser-based client (JavaScript, Flash, Java,
+     etc.), from server-side code, or any combination of the two.
+     Pazpar2 is a highly optimized client designed to
+     search many resources in parallel. It implements record merging,
+     relevance-ranking and sorting by arbitrary data content, and facet
+     analysis for browsing purposes. It is designed to be data model
+     independent, and is capable of working with MARC, DublinCore, or any
+     other XML-structured response format -- XSLT is used to normalize and extract
+     data from retrieval records for display and analysis. It can be used
+     against any server which supports the Z39.50 protocol. Proprietary
+     backend modules can be used to support a large number of other protocols
+     (please contact Index Data for further information about this).
    </para>
-   
    <para>
-    Ensure that the development libraries + header files are
-    available on your system before compiling pazpar2. For installation
-    of YAZ, refer to the YAZ installation chapter.
+     Additional functionality, such as user management and
+     attractive displays, is expected to be implemented by
+     applications that use pazpar2. Pazpar2 is user interface independent.
+     Its functionality is exposed through a simple REST-style webservice API,
+     designed to be simple to use from an Ajax-enabled browser, Flash
+     animation, Java applet, etc., or from a higher-level server-side language
+     like PHP or Java. Because session information can be shared between
+     browser-based logic and your server-side scripting, there is tremendous
+     flexibility in how you implement your business logic on top of pazpar2.
    </para>
-   <screen>
-    gunzip -c pazpar2-version.tar.gz|tar xf -
-    cd pazpar2-version
-    ./configure
-    make
-    su
-    make install
-   </screen>
-  </section>
-
-  <section id="installation.debian">
-   <title>Installation on Debian GNU/Linux</title>
    <para>
-    All dependencies for pazpar2 are available as 
-    <ulink url="&url.debian;">Debian</ulink>
-    packages for the sarge (stable in 2005) and etch (testing in 2005)
-    distributions.
+     Once you launch a search in pazpar2, the operation continues behind the
+     scenes. Pazpar2 connects to servers, carries out searches, and
+     retrieves, deduplicates, and stores results internally. Your application
+     code may periodically inquire about the status of an ongoing operation,
+     and ask to see records or other result set facets. Results become
+     available immediately, and it is easy to build end-user interfaces which
+     feel extremely responsive, even when searching more than 100 servers
+     concurrently.
    </para>
    <para>
-    The procedures for Debian based systems, such as
-    <ulink url="&url.ubuntu;">Ubuntu</ulink> is probably similar
+     Pazpar2 is designed to be highly configurable. Incoming records are
+     normalized to XML/UTF-8, and then further normalized using XSLT to a
+     simple internal representation that is suitable for analysis. By
+     providing XSLT stylesheets for different kinds of result records, you
+     can tune pazpar2 to work against different kinds of information
+     retrieval servers. Finally, metadata is extracted, in a configurable
+     way, from this internal record, to support display, merging, ranking,
+     result set facets, and sorting. Pazpar2 is not bound to a specific model
+     of metadata, such as DublinCore or MARC -- by providing the right
+     configuration, it can work with a number of different kinds of data in
+     support of many different applications.
    </para>
-   <screen>
-    apt-get install libyaz-dev
-   </screen>
    <para>
-    With these packages installed, the usual configure + make
-    procedure can be used for pazpar2 as outlined in
-    <xref linkend="installation.unix"/>.
+     Pazpar2 is designed to be efficient and scalable. You can set it up to
+     search several hundred targets in parallel, or you can use it to support
+     hundreds of concurrent users. It is implemented with the same attention
+     to performance and economy that we use in our indexing engines, so that
+     you can focus on building your application, without worrying about the
+     details of metasearch logic. You can devote all of your attention to
+     usability and let pazpar2 do what it does best -- metasearch.
+    </para>
+    <para>
+      If you wish to connect to commercial or other databases which do not
+      support open standards, please contact Index Data. We have a licensing
+      agreement with a third party vendor which will enable pazpar2 to access
+      thousands of online databases, in addition to the vast number of catalogs
+      and online services that support the Z39.50 protocol.
+    </para>
+    <para>
+      Pazpar2 is our attempt to re-think the traditional paradigms for
+      implementing and deploying metasearch logic, with an uncompromising
+      approach to performance, and attempting to make maximum use of the
+      capabilities of modern browsers. The demo user interface that
+      accompanies the distribution is but one example. If you think of new
+      ways of using pazpar2, we hope you'll share them with us, and if we
+      can provide assistance with regard to training, design, programming,
+      integration with different backends, hosting, or support, please don't
+      hesitate to contact us. If you'd like to see functionality in pazpar2
+      that is not there today, please don't hesitate to contact us. It may
+      already be in our development pipeline, or there might be a
+      possibility for you to help out by sponsoring development time or
+      code. Either way, get in touch and we will give you straight answers.
+    </para>
+    <para>
+      Enjoy!
+    </para>
+  </chapter>
+
+
+  <chapter id="license">
+   <title>Pazpar2 License</title>
+   <para>To be decided and written.</para>
+  </chapter>
+  
+  <chapter id="installation">
+   <title>Installation</title>
+   <para>
+    Pazpar2 depends on the following tools/libraries:
+    <variablelist>
+     <varlistentry><term><ulink url="&url.yaz;">YAZ</ulink></term>
+      <listitem>
+       <para>
+	The popular Z39.50 toolkit for the C language. YAZ must be
+	compiled with Libxml2/Libxslt support.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
    </para>
-  </section>
- </chapter>
- 
- <reference id="refguide">
-  <title>Reference guide</title>
+   <para>
+    In order to compile Pazpar2 an ANSI C compiler is
+    required. The requirements should be the same as for YAZ.
+   </para>
+
+   <section id="installation.unix">
+    <title>Installation on Unix (from Source)</title>
+    <para>
+     Here is a quick step-by-step guide on how to compile the
+     tools that Pazpar2 uses. Most systems provide at least some of the
+     required tools as binary packages. If, for example, Libxml2/Libxslt
+     are already installed as development packages, use these.
+    </para>
+    
+    <para>
+     Ensure that the development libraries + header files are
+     available on your system before compiling Pazpar2. For installation
+     of YAZ, refer to the YAZ installation chapter.
+    </para>
+    <screen>
+     gunzip -c pazpar2-version.tar.gz|tar xf -
+     cd pazpar2-version
+     ./configure
+     make
+     su
+     make install
+    </screen>
+   </section>
+
+   <section id="installation.debian">
+    <title>Installation on Debian GNU/Linux</title>
+    <para>
+     All dependencies for Pazpar2 are available as 
+     <ulink url="&url.debian;">Debian</ulink>
+     packages for the sarge (stable in 2005) and etch (testing in 2005)
+     distributions.
+    </para>
+    <para>
+     The procedure for Debian-based systems, such as
+     <ulink url="&url.ubuntu;">Ubuntu</ulink>, is probably similar.
+    </para>
+    <screen>
+     apt-get install libyaz-dev
+    </screen>
     <para>
-     The material in this chapter is drawn directly from the individual
-     manual entries.
+     With these packages installed, the usual configure + make
+     procedure can be used for Pazpar2 as outlined in
+     <xref linkend="installation.unix"/>.
     </para>
-    &manref;
+   </section>
+  </chapter>
+
+  <chapter id="using">
+    <title>Using pazpar2</title>
+    <para>
+      This chapter provides a general introduction to the use and deployment of pazpar2.
+    </para>
+
+    <section id="architecture">
+      <title>Pazpar2 and your systems architecture</title>
+      <para>
+	Pazpar2 is designed to provide asynchronous, behind-the-scenes
+	metasearching functionality to your application, exposing this
+	functionality using a simple webservice API that can be accessed
+	from any number of development environments. In particular, it is
+	possible to combine pazpar2 either with your server-side dynamic
+	website scripting, with scripting or code running in the browser, or
+	with any combination of the two. Pazpar2 is an excellent tool for
+	building advanced, Ajax-based user interfaces for metasearch
+	functionality, but it isn't a requirement -- you can choose to use
+	pazpar2 entirely as a backend to your regular server-side scripting.
+	When you do use pazpar2 in conjunction
+	with browser scripting (JavaScript/Ajax, Flash, applets, etc.), there are
+	special considerations.
+      </para>
+
+      <para>
+        Pazpar2 implements a simple but efficient HTTP server, and it is
+	designed to interact directly with scripting running in the browser
+	for the best possible performance, and to limit overhead when
+	several browser clients generate numerous webservice requests.
+	However, it is still desirable to use a conventional webserver,
+	such as Apache, to serve up graphics, HTML documents, and
+	server-side scripting. Because the security sandbox environment of
+	most browser-side programming environments only allows communication
+	with the server from which the enclosing HTML page or object
+	originated, pazpar2 is designed so that it can act as a transparent
+	proxy in front of an existing webserver (see <xref
+	linkend="pazpar2_conf"/> for details). In this mode, all regular
+	HTTP requests are transparently passed through to your webserver,
+	while pazpar2 only intercepts search-related webservice requests.
+      </para>
+
+      <para>
+        If you want to expose your combined service on port 80, you can
+	either run your regular webserver on a different port, a different
+	server, or a different IP address associated with the same server.
+      </para>
+
+      <para>
+        Sometimes, it may be necessary to implement functionality on your
+	regular webserver that makes use of search results, for example to
+	implement data import functionality, emailing results, history
+	lists, personal citation lists, interlibrary loan functionality,
+	etc. Fortunately, it is simple to exchange information between
+	pazpar2, your browser scripting, and backend server-side scripting.
+	You can send a session ID and possibly a record ID from your browser
+	code to your server code, and from there use pazpar2's webservice API
+	to access result sets or individual records. You could even 'hide'
+	all of pazpar2's functionality behind your own API implemented on
+	the server-side, and access that from the browser or elsewhere. The
+	possibilities are just about endless.
+      </para>
+    </section>
+
+    <section id="data_model">
+      <title>Your data model</title>
+      <para>
+        Pazpar2 does not have a preconceived notion of what makes up a data
+	model. There is no assumption that records have specific fields or
+	that they are organized in any particular way. The only assumption
+	is that data comes packaged in a form that the software can work
+	with (presently, that means XML or MARC), and that you can provide
+	the necessary information to massage it into pazpar2's internal
+	record abstraction.
+      </para>
+
+      <para>
+        Handling retrieval records in pazpar2 is a two-step process. First,
+	you decide which data elements of the source record you are
+	interested in, and you specify any desired massaging or combining of
+	elements using an XSLT stylesheet (MARC records are automatically
+	normalized to MARCXML before this step). If desired, you can run
+	multiple XSLT stylesheets in series to accomplish this, but the
+	output of the last one should be a representation of the record in a
+	schema that pazpar2 understands.
+      </para>
+
+      <para>
+        The intermediate, internal representation of the record looks like
+	this:
+	<screen><![CDATA[
+<record   xmlns="http://www.indexdata.com/pazpar2/1.0"
+	  mergekey="title The Shining author King, Stephen">
+
+    <metadata type="title">The Shining</metadata>
+
+    <metadata type="author">King, Stephen</metadata>
+
+    <metadata type="kind">ebook</metadata>
+
+    <!-- ... and so on -->
+</record>
+]]></screen>
+
+        As you can see, there isn't much to it. There are really only a few
+	important elements to this file.
+      </para>
+
+      <para>
+        Elements should belong to the namespace
+	http://www.indexdata.com/pazpar2/1.0. If the root node contains the
+	attribute 'mergekey', then every record that generates the same
+	merge key (normalized for case differences, white space, and
+	truncation) will be joined into a cluster. In other words, you
+	decide how records are merged. If you don't include a merge key,
+	records are never merged. The 'metadata' elements provide the meat
+	of the record -- the content. The 'type' attribute is used to
+	match each element against processing rules that determine what
+	happens to the data element next.
+      </para>
+
+      <para>
+        The next processing step is the extraction of metadata from the
+	intermediate representation of the record. This is governed by the
+	'metadata' elements in the 'service' section of the configuration
+	file. See <xref linkend="config-server"/> for details. The metadata
+	in the retrieval record ultimately drives merging, sorting, ranking,
+	the extraction of browse facets, and display, all configurable.
+      </para>
+    </section>
+
+    <section id="client">
+      <title>Client development</title>
+      <para>
+        You can use pazpar2 from any environment that allows you to use
+	webservices. The initial goal of the software was to support
+	Ajax-based applications, but there literally are no limits to what
+	you can do. You can use pazpar2 from JavaScript, Flash, Java, etc.,
+	on the browser side, and from any development environment on the
+	server side, and you can pass session tokens and record IDs freely
+	around between these environments to build sophisticated applications.
+	Use your imagination.
+      </para>
+
+      <para>
+        The webservice API of pazpar2 is described in detail in <xref
+	linkend="pazpar2_protocol"/>.
+      </para>
+
+      <para>
+        In brief, you use the 'init' command to create a session, a
+	temporary workspace which carries information about the current
+	search. You start a new search using the 'search' command. Once the
+	search has been started, you can follow its progress using the
+	'stat', 'bytarget', 'termlist', or 'show' commands. Detailed records
+	can be fetched using the 'record' command.
+      </para>
+    </section>
+  </chapter> <!-- Using pazpar2 -->
+
+ <reference id="reference">
+  <title>Reference</title>
+  <partintro>
+   <para>
+    The material in this chapter is drawn directly from the individual
+    manual entries.
+   </para>
+  </partintro>
+  &manref;
  </reference>
 </book>