+ <refsect1><title>FORMAT</title>
+ <para>
+ The configuration file is XML-structured. It must be valid XML. All
+ elements specific to Pazpar2 should belong to the namespace
+ <literal>http://www.indexdata.com/pazpar2/1.0</literal>
+ (this is assumed in the
+ following examples). The root element is named <literal>pazpar2</literal>.
+ Under the root element are a number of elements which group categories of
+ information. The categories are described below.
+ </para>
+
+ <refsect2 id="config-server"><title>server</title>
+ <para>
+ This section governs overall behavior of the client. The data
+ elements are described below.
+ </para>
+ <variablelist> <!-- level 1 -->
+ <varlistentry>
+ <term>listen</term>
+ <listitem>
+ <para>
+ Configures the webservice -- this controls how you can connect
+ to Pazpar2 from your browser or server-side code. The
+ attributes 'host' and 'port' control the binding of the
+ server. The 'host' attribute can be used to bind the server to
+ a secondary IP address of your system, enabling you to run
+ Pazpar2 on port 80 alongside a conventional web server. You
+ can override this setting on the command line using the option -h.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>proxy</term>
+ <listitem>
+ <para>
+ If this item is given, Pazpar2 will forward all incoming HTTP
+ requests that do not contain the filename 'search.pz2' to the
+ host and port specified using the 'host' and 'port'
+ attributes. The 'myurl' attribute is required, and should provide
+ the base URL of the server. Generally, the HTTP URL for the host
+ specified in the 'listen' parameter. This functionality is
+ crucial if you wish to use
+ Pazpar2 in conjunction with browser-based code (JS, Flash,
+ applets, etc.) which operates in a security sandbox. Such code
+ can only connect to the same server from which the enclosing
+ HTML page originated. Pazpar2s proxy functionality enables you
+ to host all of the main pages (plus images, CSS, etc) of your
+ application on a conventional webserver, while efficiently
+ processing webservice requests for metasearch status, results,
+ etc.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>relevance</term>
+ <listitem>
+ <para>
+ Specifies ICU tokenization and transformation rules
+ for tokens that are used in Pazpar2's relevance ranking. The 'id'
+ attribute is currently not used, and the 'locale'
+ attribute must be set to one of the locale strings
+ defined in ICU. The child elements listed below can be
+ in any order, except the 'index' element which logically
+ belongs to the end of the list. The stated tokenization,
+ transformation and charmapping instructions are performed
+ in order from top to bottom.
+ </para>
+ <variablelist> <!-- Level 2 -->
+ <varlistentry><term>casemap</term>
+ <listitem>
+ <para>
+ The attribute 'rule' defines the direction of the
+ per-character casemapping, allowed values are "l"
+ (lower), "u" (upper), "t" (title).
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry><term>transform</term>
+ <listitem>
+ <para>
+ Normalization and transformation of tokens follows
+ the rules defined in the 'rule' attribute. For
+ possible values we refer to the extensive ICU
+ documentation found at the
+ <ulink url="&url.icu.transform;">ICU
+ transformation</ulink> home page. Set filtering
+ principles are explained at the
+ <ulink url="&url.icu.unicode.set;">ICU set and
+ filtering</ulink> page.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry><term>tokenize</term>
+ <listitem>
+ <para>
+ Tokenization is the only rule in the ICU chain
+ which splits one token into multiple tokens. The
+ 'rule' attribute may have the following values:
+ "s" (sentence), "l" (line-break), "w" (word), and
+ "c" (character), the later probably not being
+ very useful in a pruning Pazpar2 installation.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>sort</term>
+ <listitem>
+ <para>
+ Specifies ICU tokenization and transformation rules
+ for tokens that are used in Pazpar2's sorting. The contents
+ is similar to that of <literal>relevance</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>mergekey</term>
+ <listitem>
+ <para>
+ Specifies ICU tokenization and transformation rules
+ for tokens that are used in Pazpar2's mergekey. The contents
+ is similar to that of <literal>relevance</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>service</term>
+ <listitem>
+ <para>
+ This nested element controls the behavior of Pazpar2 with
+ respect to your data model. In Pazpar2, incoming records are
+ normalized, using XSLT, into an internal representation.
+ The 'service' section controls the further processing and
+ extraction of data from the internal representation, primarily
+ through the 'metadata' sub-element.
+ </para>
+
+ <variablelist> <!-- Level 2 -->
+ <varlistentry><term>metadata</term>
+ <listitem>
+ <para>
+ One of these elements is required for every data element in
+ the internal representation of the record (see
+ <xref linkend="data_model"/>. It governs
+ subsequent processing as pertains to sorting, relevance
+ ranking, merging, and display of data elements. It supports
+ the following attributes:
+ </para>
+
+ <variablelist> <!-- level 3 -->
+ <varlistentry><term>name</term>
+ <listitem>
+ <para>
+ This is the name of the data element. It is matched
+ against the 'type' attribute of the
+ 'metadata' element
+ in the normalized record. A warning is produced if
+ metadata elements with an unknown name are
+ found in the
+ normalized record. This name is also used to
+ represent
+ data elements in the records returned by the
+ webservice API, and to name sort lists and browse
+ facets.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry><term>type</term>
+ <listitem>
+ <para>
+ The type of data element. This value governs any
+ normalization or special processing that might take
+ place on an element. Possible values are 'generic'
+ (basic string), 'year' (a range is computed if
+ multiple years are found in the record). Note: This
+ list is likely to increase in the future.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry><term>brief</term>
+ <listitem>
+ <para>
+ If this is set to 'yes', then the data element is
+ includes in brief records in the webservice API. Note
+ that this only makes sense for metadata elements that
+ are merged (see below). The default value is 'no'.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry><term>sortkey</term>
+ <listitem>
+ <para>
+ Specifies that this data element is to be used for
+ sorting. The possible values are 'numeric' (numeric
+ value), 'skiparticle' (string; skip common, leading
+ articles), and 'no' (no sorting). The default value is
+ 'no'.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry><term>rank</term>
+ <listitem>
+ <para>
+ Specifies that this element is to be used to
+ help rank
+ records against the user's query (when ranking is
+ requested). The value is an integer, used as a
+ multiplier against the basic TF*IDF score. A value of
+ 1 is the base, higher values give additional
+ weight to
+ elements of this type. The default is '0', which
+ excludes this element from the rank calculation.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry><term>termlist</term>
+ <listitem>
+ <para>
+ Specifies that this element is to be used as a
+ termlist, or browse facet. Values are tabulated from
+ incoming records, and a highscore of values (with
+ their associated frequency) is made available to the
+ client through the webservice API.
+ The possible values
+ are 'yes' and 'no' (default).
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry><term>merge</term>
+ <listitem>
+ <para>
+ This governs whether, and how elements are extracted
+ from individual records and merged into cluster
+ records. The possible values are: 'unique' (include
+ all unique elements), 'longest' (include only the
+ longest element (strlen), 'range' (calculate a range
+ of values across all matching records), 'all' (include
+ all elements), or 'no' (don't merge; this is the
+ default);
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry><term>mergekey</term>
+ <listitem>
+ <para>
+ If set to <literal>yes</literal>, the value of this
+ metadata element is appended to the resulting mergekey.
+ By default metadata is not part of a mergekey.
+ </para>
+ </listitem>
+ </varlistentry>
+
+
+ <varlistentry><term>setting</term>
+ <listitem>
+ <para>
+ This attribute allows you to make use of static database
+ settings in the processing of records. Three possible values
+ are allowed. 'no' is the default and doesn't do anything.
+ 'postproc' copies the value of a setting with the same name
+ into the output of the normalization stylesheet(s). 'parameter'
+ makes the value of a setting with the same name available
+ as a parameter to the normalization stylesheet, so you
+ can further process the value inside of the stylesheet, or use
+ the value to decide how to deal with other data values.
+ </para>
+ <para>
+ </para>
+ The purpose of using settings in this way can either be to
+ control the behavior of normalization stylesheet in a database-
+ dependent way, or to easily make database-dependent values
+ available to display-logic in your user interface, without having
+ to implement complicated interactions between the user interface
+ and your configuration system.
+ </listitem>
+ </varlistentry>
+ </variablelist> <!-- attributes to metadata -->
+
+ </listitem>
+ </varlistentry>
+ </variablelist> <!-- Data elements in service directive -->
+ </listitem>
+ </varlistentry>
+ </variablelist> <!-- Data elements in server directive -->
+ </refsect2>
+