<chapter id="administration">
- <!-- $Id: administration.xml,v 1.31 2006-05-01 13:07:40 marc Exp $ -->
+ <!-- $Id: administration.xml,v 1.32 2006-05-02 12:23:02 mike Exp $ -->
<title>Administrating Zebra</title>
<!-- ### It's a bit daft that this chapter (which describes half of
the configuration-file formats) is separated from
<sect1 id="administration-ranking">
<title>Relevance Ranking and Sorting of Result Sets</title>
+ <sect2>
+ <title>Overview</title>
<para>
The default ordering of a result set is left up to the server,
which inside Zebra means sorting in ascending document ID order.
</para>
<para>
- In case a good presentation ordering can be computed at
+ In cases where a good presentation ordering can be computed at
indexing time, we can use a fixed <literal>static ranking</literal>
scheme, which is provided for the <literal>alvis</literal>
indexing filter. This defines a fixed ordering of hit lists,
There are cases, however, where relevance of hit set documents is
highly dependent on the query processed.
Simply put, <literal>dynamic relevance ranking</literal>
- sortes a set of retrieved
+ sorts a set of retrieved
records such
that those most likely to be relevant to your request are
retrieved first.
- Internally, Zebra retrieves all documents ID's that satisfy your
- search query, and re-orders the hit list to arrange them based on
+ Internally, Zebra retrieves all documents that satisfy your
+ query, and re-orders the hit list to arrange them based on
a measurement of similarity between your query and the content of
each record.
</para>
lexicographical ordering of certain sort indexes created at
indexing time.
</para>
-
+ </sect2>
<sect2 id="administration-ranking-static">
are ordered
first by ascending static rank,
then by ascending document <literal>ID</literal>.
- </para>
- <para>
- This implies that the default rank <literal>0</literal>
- is the best rank at the
- beginning of the list, and <literal>max int</literal>
- is the worst static rank.
+ Zero
+ is the ``best'' rank, as it occurs at the
+ beginning of the list; higher numbers represent worse scores.
</para>
<para>
The experimental <literal>alvis</literal> filter provides a
after <emphasis>ascending</emphasis> static
rank, and for those documents which have the same static rank, ordered
after <emphasis>ascending</emphasis> doc <literal>ID</literal>.
- See <xref linkend="record-model-alvisxslt"/> for the glory details.
+ See <xref linkend="record-model-alvisxslt"/> for the gory details.
</para>
</sect2>
<sect2 id="administration-ranking-dynamic">
<title>Dynamic Ranking</title>
<para>
- If one wants to do a little fiddeling with the static rank order,
- one has to invoke additional re-ranking/re-ordering using dynamic
- reranking or score functions. These functions return positive
- interger scores, where <emphasis>highest</emphasis> score is
- <emphasis>best</emphasis>, which means that the
- hit sets will be sorted according to
+ In order to fiddle with the static rank order, it is necessary to
+ invoke additional re-ranking/re-ordering using dynamic
+ ranking or score functions. These functions return positive
+ integer scores, where <emphasis>highest</emphasis> score is
+ ``best'';
+ hit sets are sorted according to
<emphasis>descending</emphasis>
scores (in contrast
to the index lists which are sorted according to
- <emphasis>ascending</emphasis> rank number and document ID).
+ ascending rank number and document ID).
</para>
<para>
- Those are in the zebra config file enabled by a directive like (use
- only one of these a time!):
+ Dynamic ranking is enabled by a directive like one of the
+ following in the zebra config file (use only one of these at a time!):
<screen>
rank: rank-1 # default TF-IDF like
rank: rank-static # dummy do-nothing
Notice that the <literal>rank-1</literal> and
<literal>zvrank</literal> do not use the static rank
information in the list keys, and will produce the same ordering
- with our without static ranking enabled.
+ with or without static ranking enabled.
</para>
<para>
The dummy <literal>rank-static</literal> reranking/scoring
function returns just
<literal>score = max int - staticrank</literal>
- in order to preserve the ordering of hit sets with and without it's
- call.
- Obviously, to combine static and dynamic ranking usefully, one wants
+ in order to preserve the static ordering of hit sets that would
+ have been produced had it not been invoked.
+ Obviously, to combine static and dynamic ranking usefully,
+ it is necessary
to make a new ranking
- function, which is left
+ function; this is left
as an exercise for the reader.
</para>
<para>
- Invoking dynamic ranking is done in query time (this is why we
- call it 'dynamic ranking' in the first place ..). One has to add
+ Dynamic ranking is done at query time rather than
+ indexing time (this is why we
+ call it ``dynamic ranking'' in the first place ...).
+ It is invoked by adding
the Bib-1 relation attribute with
- value "relevance" to the PQF query (that is, <literal>@attr
- 2=102</literal>, see also
+ value ``relevance'' to the PQF query (that is,
+ <literal>@attr 2=102</literal>, see also
<ulink url="ftp://ftp.loc.gov/pub/z3950/defs/bib1.txt">
The BIB-1 Attribute Set Semantics</ulink>).
- To find all articles with the word 'Eoraptor' in
- the title, and present them relevance ranked, one issues the PQF query:
+ To find all articles with the word <literal>Eoraptor</literal> in
+ the title, and present them relevance ranked, issue the PQF query:
<screen>
- Z> f @attr 2=102 @attr 1=4 Eoraptor
+ @attr 2=102 @attr 1=4 Eoraptor
</screen>
</para>
with <literal>estimated hit sizes</literal>, as all documents in
a hit set must be accessed to compute the correct placing in a
ranking sorted list. Therefore the use attribute setting
- <literal>@attr 2=102</literal> clashes with
- <literal>@attr 9=</literal>.
+ <literal>@attr 2=102</literal> clashes with
+ <literal>@attr 9=integer</literal>.
</para>
</warning>