Removed records/taxa.xml, simplified config

diff --git a/doc/examples.xml b/doc/examples.xml

index 3a49b6c..153eaed 100644
--- a/doc/examples.xml
+++ b/doc/examples.xml
@@ -1,5 +1,5 @@
  <chapter id="examples">
- <!-- $Id: examples.xml,v 1.2 2002-08-29 16:30:22 mike Exp $ -->
+ <!-- $Id: examples.xml,v 1.6 2002-09-20 09:58:04 mike Exp $ -->
   <title>Example Configurations</title>
  
   <sect1>
@@ -43,85 +43,55 @@
    </para>
   </sect1>
  
- <sect1>
-  <title>First Example: Minimal Configuration</title>
+ <sect1 id="example1">
+  <title>Example 1: Minimal Configuration</title>
  
    <para>
-   This example shows how Zebra can be used, with absolutely minimal
-   configuration, to index a body of XML documents, and search them
+   This example shows how Zebra can be used with absolutely minimal
+   configuration to index a body of XML documents, and search them
     using XPath expressions to specify access points.
    </para>
    <para>
-   Go to the
-   <literal>zebra/examples/dinosauricon</literal>
-   directory.  There you will find two significant files:
+   Go to the <literal>zebra/examples/dinosauricon</literal> directory.
+   There you will find a <literal>records</literal> subdirectory,
+   which contains some raw XML data to be added to the database: in
+   this case, two files, <literal>genera.xml</literal> and
+   <literal>taxa.xml</literal>, which contain information about all
+   the known dinosaur genera as of August 2002.
+  </para>
+  <para>
+   Now we need to create the Zebra database, which we do with the
+   Zebra indexer, <literal>zebraidx</literal>.  This program's
+   behaviour is driven by a configuration file, generally called
+   <literal>zebra.cfg</literal>, although this can be changed with the
+   <literal>-c</literal> option.  For our purposes, we don't need any
+   special behaviour - we can use the defaults - so an empty
+   configuration will do just fine.  We can either create an empty
+   <literal>zebra.cfg</literal> or specify the name of an existing
+   empty file using, for example, <literal>-c /dev/null</literal>.
+  </para>
+  <para>
+   In this case, we'll use an empty <literal>zebra.cfg</literal> so
+   we can add more configuration to it later.
    </para>
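
As a minimal sketch of the step described above, assuming a POSIX shell: the script below creates `zebra.cfg` if it is missing and passes it to the indexer with the `-c` option named in the text. The guard around `zebraidx` is an addition for illustration only, so that the script still runs on a machine where Zebra is not installed or the `records` directory is absent.

```shell
#!/bin/sh
# Create zebra.cfg if it does not exist yet. touch leaves the contents
# of an existing file alone, so later configuration is not lost.
touch zebra.cfg

# Run the indexer only when it is installed and there is something to
# index; otherwise just print the command that would be run.
if command -v zebraidx >/dev/null 2>&1 && [ -d records ]; then
    zebraidx -c zebra.cfg -t grs.sgml update records
else
    echo "would run: zebraidx -c zebra.cfg -t grs.sgml update records"
fi
```

Naming the file explicitly with `-c` is optional here, since `zebra.cfg` is the name the text gives as the usual one; it is spelled out only to make the dependency visible.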
-
-  <itemizedlist>
-   <listitem>
-    <para>
-     The <literal>records</literal> subdirectory, which contains the
-     raw XML data to be added to the database: in this case, just one
-     file, <literal>genera.xml</literal>, which contains information
-     about all the known dinosaur genera as of October 2000.
-     <!-- ### Get more recent data -->
-    </para>
-   </listitem>
-
-   <listitem>
-    <para>
-     The master configuration file, <literal>zebra.cfg</literal>,
-     which is as short and simple as it can be:
-     <!-- ### Keep this up to date -->
-     <screen>
-       # $Header: /home/cvsroot/idis/doc/examples.xml,v 1.2 2002-08-29 16:30:22 mike Exp $
-       # Bare-bones master configuration file for Zebra
-       profilePath: .:../../tab:../../../yaz/tab
-     </screen>
-     Apart from the comments, which are ignored, all this specifies is
-     that the server should recognise the attribute set described in
-     the file called
-     <literal>bib1.att</literal>.
-    </para>
-    <!-- ### What is an attribute set? -->
-   </listitem>
-
-<!--
-   <listitem>
-    <para>
-     The BIB-1 attribute set configuration file,
-     <literal>bib1.att</literal>, which is also as short as possible:
-     <screen>
-       # $Header: /home/cvsroot/idis/doc/examples.xml,v 1.2 2002-08-29 16:30:22 mike Exp $
-       # Bare-bones BIB-1 attribute set file for Zebra
-       reference Bib-1
-     </screen>
-     Apart from the comments, all this specifies is that reference of
-     the attribute set described by this file is
-     <literal>Bib-1</literal>, a name recognised by the system as
-     referring to a well-known opaque identifier that is transmitted
-     by clients as part of their searches.
-     ### Yeuch!  Surely we can say that better!
-    </para>
-    <para>
-     ### Can't we somehow say this trivial thing in the main
-     configuration file?
-    </para>
-   </listitem>
--->
-  </itemizedlist>
-
    <para>
     That's all you need for a minimal Zebra configuration.  Now you can
     roll the XML records into the database and build the indexes:
     <screen>
         zebraidx -t grs.sgml update records
     </screen>
-   <!-- ### What does "grs.sgml" actually mean? -->
-   and start the server which, by default listens on port 9999:
+   (### What does "grs.sgml" actually mean?)
+  </para>
+  <para>
+   Now start the server.  Like the indexer, its behaviour is
+   controlled by a configuration file, generally
+   <literal>zebra.cfg</literal>; and like the indexer, it works just
+   fine with an empty configuration.
     <screen>
         zebrasrv
     </screen>
+   By default, the server listens on TCP port 9999, although
+   this can easily be changed.
    </para>
    <para>
     Now you can use the Z39.50 client program of your choice to execute
@@ -151,10 +121,154 @@
         &lt;idzebra:size&gt;359&lt;/idzebra:size&gt;&lt;idzebra:localnumber&gt;447&lt;/idzebra:localnumber&gt;&lt;idzebra:filename&gt;records/genera.xml&lt;/idzebra:filename&gt;&lt;/GENUS&gt;
     </screen>
    </para>
+  <para>
+   Now wasn't that easy?
+  </para>
   </sect1>
  
+ <sect1 id="example2">
+  <title>Example 2: Adding Some Configuration</title>
+
+  <para>
+   You may have noticed as <literal>zebraidx</literal> was building
+   the database that it issued several warnings, which we ignored at
+   the time:
+   <screen>
+zebraidx -t grs.sgml update records
+02:12:32-30/08: zebraidx(18151) [warn] default.idx [No such file or directory]
+02:12:32-30/08: zebraidx(18151) [warn] Couldn't open explain.abs [No such file or directory]
+02:12:32-30/08: zebraidx(18151) [warn] records/genera.xml:0 Couldn't open GENUS.abs [No such file or directory]
+02:12:32-30/08: zebraidx(18151) [warn] records/genera.xml:0 Unknown register type: 0
+02:12:32-30/08: zebraidx(18151) [warn] records/genera.xml:0 Unknown register type: w
+02:12:35-30/08: zebraidx(18151) [warn] records/taxa.xml:0 Couldn't open TAXON.abs [No such file or directory]
+   </screen>
+   And the server issued several more as the client connected to it,
+   then searched for and retrieved a record:
+   <screen>
+02:17:10-30/08: zebrasrv(18165) [warn] default.idx [No such file or directory]
+02:17:10-30/08: zebrasrv(18165) [warn] Couldn't open explain.abs [No such file or directory]
+02:17:57-30/08: zebrasrv(18165) [warn] Unknown register type: w
+02:18:42-30/08: zebrasrv(18165) [warn] Couldn't open GENUS.abs [No such file or directory]
+   </screen>
+  </para>
+ </sect1>
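
The warnings quoted above mostly name files that Zebra looked for and did not find. As a rough sketch, assuming the log format shown in those excerpts and a POSIX `sed`, the missing file names can be pulled out like this. The sample text is copied from the `zebraidx` output above; the variable names `log` and `missing` are illustrative only.

```shell
#!/bin/sh
# Sample warnings copied from the zebraidx run quoted in the text.
log="02:12:32-30/08: zebraidx(18151) [warn] default.idx [No such file or directory]
02:12:32-30/08: zebraidx(18151) [warn] Couldn't open explain.abs [No such file or directory]
02:12:32-30/08: zebraidx(18151) [warn] records/genera.xml:0 Couldn't open GENUS.abs [No such file or directory]
02:12:32-30/08: zebraidx(18151) [warn] records/genera.xml:0 Unknown register type: 0
02:12:35-30/08: zebraidx(18151) [warn] records/taxa.xml:0 Couldn't open TAXON.abs [No such file or directory]"

# Keep only the word immediately before the "No such file" marker,
# then remove duplicates. Lines without the marker print nothing.
missing=$(printf '%s\n' "$log" |
    sed -n 's/.* \([^ ]*\) \[No such file or directory]$/\1/p' |
    LC_ALL=C sort -u)

printf '%s\n' "$missing"
```

The "Unknown register type" warnings carry no file name and are deliberately skipped, so the result lists only the configuration files that were not found.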
  </chapter>
  
+<!--
+
+   <listitem>
+    <para>
+     The master configuration file, <literal>zebra.cfg</literal>,
+     which is as short and simple as it can be:
+     <screen>
+       # $Header: /home/cvsroot/idis/doc/examples.xml,v 1.6 2002-09-20 09:58:04 mike Exp $
+       # Bare-bones master configuration file for Zebra
+       profilePath: .:../../tab:../../../yaz/tab
+     </screen>
+     Apart from the comments, which are ignored, all this specifies is
+     that the server should recognise the attribute set described in
+     the file called
+     <literal>bib1.att</literal>.
+     ### What is an attribute set?
+    </para>
+   </listitem>
+
+   <listitem>
+    <para>
+     The BIB-1 attribute set configuration file,
+     <literal>bib1.att</literal>, which is also as short as possible:
+     <screen>
+       # $Header: /home/cvsroot/idis/doc/examples.xml,v 1.6 2002-09-20 09:58:04 mike Exp $
+       # Bare-bones BIB-1 attribute set file for Zebra
+       reference Bib-1
+     </screen>
+     Apart from the comments, all this specifies is that reference of
+     the attribute set described by this file is
+     <literal>Bib-1</literal>, a name recognised by the system as
+     referring to a well-known opaque identifier that is transmitted
+     by clients as part of their searches.
+     ### Yeuch!  Surely we can say that better!
+    </para>
+    <para>
+     ### Can't we somehow say this trivial thing in the main
+     configuration file?
+    </para>
+   </listitem>
+-->
+
+<!--
+       The simplest hello-world example could go like this:
+       
+       Index the document
+       
+       <book>
+          <title>The art of motorcycle maintenance</title>
+          <subject scheme="Dewey">zen</subject>
+       </book>
+       
+       And search it like
+       
+       f @attr 1=/book/title motorcycle
+       
+       f @attr 1=/book/subject[@scheme=Dewey] zen
+       
+       If you suddenly decide you want broader interop, you can add
+       an abs file (more or less like this):
+       
+       attset bib1.att
+       tagset tagsetg.tag
+       
+       elm (2,1)       title   title
+       elm (2,21)      subject  subject
+-->
+
+<!--
+How to include images:
+
+       <mediaobject>
+         <imageobject>
+           <imagedata fileref="system.eps" format="eps"/>
+         </imageobject>
+         <imageobject>
+           <imagedata fileref="system.gif" format="gif"/>
+         </imageobject>
+         <textobject>
+           <phrase>The Multi-Lingual Search System Architecture</phrase>
+         </textobject>
+         <caption>
+           <para>
+             <emphasis role="strong">
+               The Multi-Lingual Search System Architecture.
+             </emphasis>
+           </para>
+           <para>
+             Network connections across local area networks are
+             represented by straight lines, and those over the
+             internet by jagged lines.
+           </para>
+         </caption>
+       </mediaobject>
+
+Where the three <*object> thingies inside the top-level <mediaobject>
+are decreasingly preferred versions to include, depending on what the
+rendering engine can handle.  I generated the EPS version of the image
+by exporting a line-drawing done in TGIF, then converted that to the
+GIF using a shell-script called "epstogif" which used an appallingly
+baroque sequence of conversions, which I would prefer not to pollute
+the Zebra build environment with:
+
+       #!/bin/sh
+
+       # Yes, what follows is stupidly convoluted, but I can't find a
+       # more straightforward path from the EPS generated by tgif's
+       # "Print" command into a browser-friendly format.
+
+       file=`echo "$1" | sed 's/\.eps//'`
+       ps2pdf "$1" "$file".pdf
+       pdftopbm "$file".pdf "$file"
+       pnmscale 0.50 < "$file"-000001.pbm | pnmcrop | ppmtogif
+       rm -f "$file".pdf "$file"-000001.pbm
+
+-->
+
   <!-- Keep this comment at the end of the file
   Local variables:
   mode: sgml