Rolling mods to Marc's new ranking prose. (Check in early, check in

author Mike Taylor <mike@indexdata.com>

Tue, 2 May 2006 12:23:02 +0000 (12:23 +0000)

committer Mike Taylor <mike@indexdata.com>

Tue, 2 May 2006 12:23:02 +0000 (12:23 +0000)
diff --git a/doc/administration.xml b/doc/administration.xml

index 773edfd..1dd6a22 100644
--- a/doc/administration.xml
+++ b/doc/administration.xml
@@ -1,5 +1,5 @@
  <chapter id="administration">
- <!-- $Id: administration.xml,v 1.31 2006-05-01 13:07:40 marc Exp $ -->
+ <!-- $Id: administration.xml,v 1.32 2006-05-02 12:23:02 mike Exp $ -->
   <title>Administrating Zebra</title>
   <!-- ### It's a bit daft that this chapter (which describes half of
            the configuration-file formats) is separated from
@@ -925,6 +925,8 @@
   <sect1 id="administration-ranking">
    <title>Relevance Ranking and Sorting of Result Sets</title>
  
+  <sect2>
+   <title>Overview</title>
     <para>
      The default ordering of a result set is left up to the server,
      which inside Zebra means sorting in ascending document ID order. 
@@ -933,7 +935,7 @@
     </para>
  
     <para> 
-    In case a good presentation ordering can be computed at
+    In cases where a good presentation ordering can be computed at
      indexing time, we can use a fixed <literal>static ranking</literal>
      scheme, which is provided for the <literal>alvis</literal>
      indexing filter. This defines a fixed ordering of hit lists,
@@ -944,12 +946,12 @@
      There are cases, however, where relevance of hit set documents is
      highly dependent on the query processed.
      Simply put, <literal>dynamic relevance ranking</literal> 
-    sortes a set of retrieved 
+    sorts a set of retrieved 
      records such
      that those most likely to be relevant to your request are
      retrieved first. 
-    Internally, Zebra  retrieves all documents ID's that satisfy your
-    search query, and re-orders the hit list to arrange them based on
+    Internally, Zebra retrieves all documents that satisfy your
+    query, and re-orders the hit list to arrange them based on
      a measurement of similarity between your query and the content of
      each record. 
     </para>
@@ -960,7 +962,7 @@
      lexicographical ordering of certain sort indexes created at
      indexing time.
     </para>
-
+  </sect2>
  
  
   <sect2 id="administration-ranking-static">
@@ -995,12 +997,9 @@
      are ordered 
      first by ascending static rank,
      then by ascending document <literal>ID</literal>.
-   </para>
-   <para>
-    This implies that the default rank <literal>0</literal> 
-    is the best rank at the
-    beginning of the list, and <literal>max int</literal> 
-    is the worst static rank.
+    Zero
+    is the ``best'' rank, as it occurs at the
+    beginning of the list; higher numbers represent worse scores.
     </para>
     <para>
      The experimental <literal>alvis</literal> filter provides a
@@ -1009,7 +1008,7 @@
      after <emphasis>ascending</emphasis> static
      rank, and for those doc's which have the same static rank, ordered
      after <emphasis>ascending</emphasis> doc <literal>ID</literal>.
-    See <xref linkend="record-model-alvisxslt"/> for the glory details.
+    See <xref linkend="record-model-alvisxslt"/> for the gory details.
     </para>
      </sect2>
  
@@ -1017,20 +1016,20 @@
   <sect2 id="administration-ranking-dynamic">
    <title>Dynamic Ranking</title>
     <para>
-    If one wants to do a little fiddeling with the static rank order,
-    one has to invoke additional re-ranking/re-ordering using dynamic 
-    reranking or score functions. These functions return positive
-    interger scores, where <emphasis>highest</emphasis> score is 
-    <emphasis>best</emphasis>, which means that the
-    hit sets will be sorted according to
+    In order to fiddle with the static rank order, it is necessary to
+    invoke additional re-ranking/re-ordering using dynamic
+    ranking or score functions. These functions return positive
+    integer scores, where <emphasis>highest</emphasis> score is 
+    ``best'';
+    hit sets are sorted according to
      <emphasis>decending</emphasis> 
      scores (in contrary
      to the index lists which are sorted according to
-    <emphasis>ascending</emphasis> rank  number and document ID).
+    ascending rank number and document ID).
     </para>
     <para>
-    Those are in the zebra config file enabled by a directive like (use
-    only one of these a time!):
+    Dynamic ranking is enabled by a directive like one of the
+    following in the zebra config file (use only one of these a time!):
      <screen> 
      rank: rank-1        # default TDF-IDF like
      rank: rank-static   # dummy do-nothing
@@ -1039,33 +1038,36 @@
      Notice that the <literal>rank-1</literal> and
      <literal>zvrank</literal> do not use the static rank 
      information in the list keys, and will produce the same ordering
-    with our without static ranking enabled.
+    with or without static ranking enabled.
     </para>
     <para>
      The dummy <literal>rank-static</literal> reranking/scoring
      function returns just 
      <literal>score = max int - staticrank</literal>
-    in order to preserve the ordering of hit sets with and without it's
-    call.
-     Obviously, to combine static and dynamic ranking usefully, one wants
+    in order to preserve the static ordering of hit sets that would
+    have been produced had it not been invoked.
+    Obviously, to combine static and dynamic ranking usefully,
+    it is necessary
      to make a new ranking 
-    function, which is left
+    function; this is left
      as an exercise for the reader. 
     </para>
  
  
     <para>
-    Invoking dynamic ranking is done in query time (this is why we
-    call it 'dynamic ranking' in the first place ..). One has to add
+    Dynamic ranking is done at query time rather than
+    indexing time (this is why we
+    call it ``dynamic ranking'' in the first place ...)
+    It is invoked by adding
      the Bib-1 relation attribute with
-    value "relevance" to the PQF query (that is, <literal>@attr
-    2=102</literal>, see also  
+    value ``relevance'' to the PQF query (that is,
+    <literal>@attr&nbsp;2=102</literal>, see also  
      <ulink url="ftp://ftp.loc.gov/pub/z3950/defs/bib1.txt">
       The BIB-1 Attribute Set Semantics</ulink>). 
-    To find all articles with the word 'Eoraptor' in
-    the title, and present them relevance ranked, one issues the PQF query:
+    To find all articles with the word <literal>Eoraptor</literal> in
+    the title, and present them relevance ranked, issue the PQF query:
      <screen>
-     Z> f @attr 2=102 @attr 1=4 Eoraptor
+     @attr 2=102 @attr 1=4 Eoraptor
      </screen>
     </para>
   
@@ -1080,8 +1082,8 @@
        with <literal>estimated hit sizes</literal>, as all documents in
        a hit set must be acessed to compute the correct placing in a
        ranking sorted list. Therefore the use attribute setting
-      <literal>@attr 2=102</literal> clashes with 
-      <literal>@attr 9=</literal>. 
+      <literal>@attr&nbsp;2=102</literal> clashes with 
+      <literal>@attr&nbsp;9=integer</literal>. 
       </para>
     </warning>
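
Note on the rank-static hunk above: the revised prose says that rank-static returns score = max int - staticrank so that sorting by descending score reproduces the static ordering. The following is a minimal sketch of that arithmetic only, not Zebra's implementation. The names MAX_INT, static_order, rank_static_score and dynamic_order are hypothetical, hits are modelled as (doc_id, staticrank) tuples, and breaking score ties by ascending document ID is an assumption made for the illustration.

```python
import sys

# Assumption: a stand-in for the "max int" mentioned in the documentation.
MAX_INT = sys.maxsize


def static_order(hits):
    """Index-list order: ascending static rank, then ascending document ID."""
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


def rank_static_score(hit):
    """Score as described for rank-static: max int minus the static rank."""
    return MAX_INT - hit[1]


def dynamic_order(hits):
    """Descending score; ties broken by ascending document ID (assumed)."""
    return sorted(hits, key=lambda hit: (-rank_static_score(hit), hit[0]))


# Hypothetical hit set of (doc_id, staticrank) pairs.
hits = [(7, 0), (3, 2), (5, 0), (1, 2), (9, 1)]

# Both orderings agree, which is the property the prose describes.
assert static_order(hits) == dynamic_order(hits)
```

Because the score is a strictly decreasing function of the static rank, a descending sort on score is the same as an ascending sort on static rank, which is why the documentation calls this function a do-nothing dummy.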