<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook V4.1//EN"
    "http://www.oasis-open.org/docbook/xml/4.1/docbookx.dtd"
[
     <!ENTITY % local SYSTEM "local.ent">
     %local;
     <!ENTITY % entities SYSTEM "entities.ent">
     %entities;
     <!ENTITY % common SYSTEM "common/common.ent">
     %common;
]>
<!-- $Id: pazpar2_conf.xml,v 1.7 2007-01-26 18:53:55 quinn Exp $ -->
<refentry id="pazpar2_conf">
 <refentryinfo>
  <productname>Pazpar2</productname>
  <productnumber>&version;</productnumber>
 </refentryinfo>
 <refmeta>
  <refentrytitle>Pazpar2 conf</refentrytitle>
  <manvolnum>5</manvolnum>
 </refmeta>

 <refnamediv>
  <refname>pazpar2_conf</refname>
  <refpurpose>Pazpar2 Configuration</refpurpose>
 </refnamediv>

 <refsynopsisdiv>
  <cmdsynopsis>
   <command>pazpar2.conf</command>
  </cmdsynopsis>
 </refsynopsisdiv>
 <refsect1><title>DESCRIPTION</title>
  <para>
   The pazpar2 configuration file, together with any referenced XSLT
   files, governs pazpar2's behavior as a client and controls the
   normalization and extraction of data elements from incoming result
   records for the purposes of merging, sorting, facet analysis, and
   display.
  </para>

  <para>
   The file is specified using the option -f on the pazpar2 command line.
   There is currently no way to reload the configuration file without
   restarting pazpar2, although this will most likely be added at some
   point.
  </para>
 </refsect1>
 <refsect1><title>FORMAT</title>
  <para>
   The configuration file is an XML document and must be valid XML. All
   elements specific to pazpar2 should belong to the namespace
   "http://www.indexdata.com/pazpar2/1.0" (this is assumed in the
   following examples). The root element is named 'pazpar2'. Under the
   root element, a number of elements group the settings into
   categories. The categories are described below.
  </para>
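  <para>
   As a rough sketch of this layout, a minimal configuration skeleton
   might look like the listing below. The three child elements are
   taken from the categories documented in this section; that they
   appear as siblings in this order is an assumption made for
   illustration.
  </para>
  <screen><![CDATA[
<?xml version="1.0" encoding="UTF-8"?>
<pazpar2 xmlns="http://www.indexdata.com/pazpar2/1.0">
  <server>
    <!-- listen, proxy and service settings go here -->
  </server>
  <queryprofile/>
  <retrievalprofile>
    <!-- requestsyntax and nativesyntax settings go here -->
  </retrievalprofile>
</pazpar2>
]]></screen>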
  <refsect2 id="config-server"><title>server</title>
   <para>
    This section governs the overall behavior of the client. The data
    elements are described below.
   </para>
   <variablelist> <!-- level 1 -->
    <varlistentry><term>listen</term>
     <listitem>
      <para>
       Configures the webservice; this controls how you can connect
       to pazpar2 from your browser or server-side code. The
       attributes 'host' and 'port' control the binding of the
       server. The 'host' attribute can be used to bind the server to
       a secondary IP address of your system, enabling you to run
       pazpar2 on port 80 alongside a conventional web server. You
       can override this setting on the command line using the option -h.
      </para>
     </listitem>
    </varlistentry>
    <varlistentry><term>proxy</term>
     <listitem>
      <para>
       If this item is given, pazpar2 will forward all incoming HTTP
       requests that do not contain the filename 'search.pz2' to the
       host and port specified using the 'host' and 'port'
       attributes. This functionality is crucial if you wish to use
       pazpar2 in conjunction with browser-based code (JS, Flash,
       applets, etc.) which operates in a security sandbox. Such code
       can only connect to the same server from which the enclosing
       HTML page originated. Pazpar2's proxy functionality enables you
       to host all of the main pages (plus images, CSS, etc.) of your
       application on a conventional webserver, while pazpar2
       efficiently processes webservice requests for metasearch status
       and results.
      </para>
     </listitem>
    </varlistentry>
    <varlistentry><term>service</term>
     <listitem>
      <para>
       This nested element controls the behavior of pazpar2 with
       respect to your data model. In pazpar2, incoming records are
       normalized, using XSLT, into an internal representation (see
       the <link linkend="config_retrievalprofile">retrievalprofile</link>
       section). The 'service' section controls the further processing
       and extraction of data from the internal representation,
       primarily through the 'metadata' sub-element.
      </para>

      <variablelist> <!-- Level 2 -->
       <varlistentry><term>metadata</term>
        <listitem>
         <para>
          One of these elements is required for every data element in
          the internal representation of the record (see
          <xref linkend="data_model"/>). It governs
          subsequent processing as it pertains to sorting, relevance
          ranking, merging, and display of data elements. It supports
          the following attributes:
         </para>

         <variablelist> <!-- level 3 -->
          <varlistentry><term>name</term>
           <listitem>
            <para>
             This is the name of the data element. It is matched
             against the 'type' attribute of the 'metadata' element
             in the normalized record. A warning is produced if
             metadata elements with an unknown name are found in the
             normalized record. This name is also used to represent
             data elements in the records returned by the
             webservice API, and to name sort lists and browse
             facets.
            </para>
           </listitem>
          </varlistentry>
          <varlistentry><term>type</term>
           <listitem>
            <para>
             The type of data element. This value governs any
             normalization or special processing that might take
             place on an element. Possible values are 'generic'
             (basic string) and 'year' (a range is computed if
             multiple years are found in the record). Note: This
             list is likely to grow in the future.
            </para>
           </listitem>
          </varlistentry>
          <varlistentry><term>brief</term>
           <listitem>
            <para>
             If this is set to 'yes', then the data element is
             included in brief records in the webservice API. Note
             that this only makes sense for metadata elements that
             are merged (see below). The default value is 'no'.
            </para>
           </listitem>
          </varlistentry>
          <varlistentry><term>sortkey</term>
           <listitem>
            <para>
             Specifies that this data element is to be used for
             sorting. The possible values are 'numeric' (numeric
             value), 'skiparticle' (string; skip common, leading
             articles), and 'no' (no sorting). The default value is
            </para>
           </listitem>
          </varlistentry>
          <varlistentry><term>rank</term>
           <listitem>
            <para>
             Specifies that this element is to be used to help rank
             records against the user's query (when ranking is
             requested). The value is an integer, used as a
             multiplier against the basic TF*IDF score. A value of
             1 is the base; higher values give additional weight to
             elements of this type. The default is '0', which
             excludes this element from the rank calculation.
            </para>
           </listitem>
          </varlistentry>
          <varlistentry><term>termlist</term>
           <listitem>
            <para>
             Specifies that this element is to be used as a
             termlist, or browse facet. Values are tabulated from
             incoming records, and a highscore list of values (with
             their associated frequencies) is made available to the
             client through the webservice API. The possible values
             are 'yes' and 'no' (default).
            </para>
           </listitem>
          </varlistentry>
          <varlistentry><term>merge</term>
           <listitem>
            <para>
             This governs whether, and how, elements are extracted
             from individual records and merged into cluster
             records. The possible values are: 'unique' (include
             all unique elements), 'longest' (include only the
             longest element (strlen)), 'range' (calculate a range
             of values across all matching records), 'all' (include
             all elements), or 'no' (don't merge; this is the
             default).
            </para>
           </listitem>
          </varlistentry>
         </variablelist> <!-- attributes to metadata -->
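         <para>
          To illustrate how these attributes combine, the sketch below
          declares three metadata elements: a title that appears in
          brief records, sorts with leading articles skipped, keeps the
          longest value when merging and is weighted in ranking; a date
          treated as a year range; and a subject offered as a browse
          facet. The element names and rank weights are illustrative
          choices only, not required values.
         </para>
         <screen><![CDATA[
<metadata name="title" brief="yes" sortkey="skiparticle" merge="longest" rank="6"/>
<metadata name="date" type="year" brief="yes" sortkey="numeric" merge="range"/>
<metadata name="subject" merge="unique" termlist="yes" rank="3"/>
]]></screen>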
        </listitem>
       </varlistentry>
      </variablelist> <!-- Data elements in service directive -->
     </listitem>
    </varlistentry>
   </variablelist> <!-- Data elements in server directive -->
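   <para>
    Putting the elements of this section together, a server section
    might be sketched as follows. It binds the webservice to one
    address on port 80 and forwards all other requests to a
    conventional web server. The IP address, host name and proxy port
    are placeholders chosen for illustration, and the single metadata
    element merely stands in for a fuller data model.
   </para>
   <screen><![CDATA[
<server>
  <!-- Bind the webservice to a secondary IP address (placeholder) -->
  <listen host="192.0.2.10" port="80"/>
  <!-- Requests not naming search.pz2 are forwarded here (placeholder) -->
  <proxy host="www.example.org" port="8080"/>
  <service>
    <metadata name="title" brief="yes" merge="longest"/>
  </service>
</server>
]]></screen>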
  </refsect2>

  <refsect2 id="config-queryprofile"><title>queryprofile</title>
   <para>
    At the moment, this directive is ignored; there is one global
    CCL-mapping file which governs the mapping of queries to Z39.50
    type-1. This file is located in etc/default.bib. This will change.
   </para>
  </refsect2>
  <refsect2 id="config_retrievalprofile"><title>retrievalprofile</title>
   <para>
    Note: In the present version, there is a single retrieval
    profile. However, in a future release, it will be possible to
    associate unique retrieval profiles with different targets, or to
    generate retrieval profiles using XSLT from the ZeeRex description of
    a target.
   </para>

   <para>
    The following data elements are recognized for the retrievalprofile
    directive:
   </para>

   <variablelist>
    <varlistentry><term>requestsyntax</term>
     <listitem>
      <para>
       This element specifies the request syntax to be used in queries. It only
       makes sense for Z39.50-type targets.
      </para>
     </listitem>
    </varlistentry>
    <varlistentry><term>nativesyntax</term>
     <listitem>
      <para>
       This element specifies the native syntax and encoding of the
       result records. The default is XML. It supports the following
       attributes:
      </para>

      <variablelist>
       <varlistentry><term>name</term>
        <listitem>
         <para>
          The name of the syntax. Currently recognized values are
          'iso2709' (MARC) and 'xml'.
         </para>
        </listitem>
       </varlistentry>

       <varlistentry><term>format</term>
        <listitem>
         <para>
          The format, or schema, to be expected. Default is
         </para>
        </listitem>
       </varlistentry>

       <varlistentry><term>encoding</term>
        <listitem>
         <para>
          The encoding of the response record. Typical values for
          MARC records are 'marc8' (general MARC-8), 'marc8s'
          (MARC-8, but maps to precomposed UTF-8 characters, more
          suitable for use in web browsers), and 'latin1'.
         </para>
        </listitem>
       </varlistentry>

       <varlistentry><term>mapto</term>
        <listitem>
         <para>
          Specifies the flavor of MARCXML to map results to.
          Default is 'marcxml'. 'marcxchange' is also possible, and
          useful for Danish DANMARC records.
         </para>
        </listitem>
       </varlistentry>
      </variablelist> <!-- parameters to nativesyntax directive -->
     </listitem>
    </varlistentry>
   </variablelist> <!-- sub-elements in retrievalprofile -->
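   <para>
    As a sketch of how these elements fit together, the following
    retrieval profile could describe a Z39.50 target that returns
    MARC21 records in MARC-8 encoding, to be converted to MARCXML.
    The values are taken from the attribute descriptions in this
    section; whether they suit a particular target is an assumption
    that has to be checked against that target.
   </para>
   <screen><![CDATA[
<retrievalprofile>
  <!-- Syntax requested from the Z39.50 target -->
  <requestsyntax>marc21</requestsyntax>
  <!-- ISO 2709 (MARC) records, MARC-8 mapped to precomposed UTF-8 -->
  <nativesyntax name="iso2709" format="marc21" encoding="marc8s" mapto="marcxml"/>
</retrievalprofile>
]]></screen>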
  </refsect2>
 </refsect1>

 <refsect1><title>EXAMPLE</title>
  <para>Below is a working example configuration:
   <screen><![CDATA[
<?xml version="1.0" encoding="UTF-8"?>
<pazpar2 xmlns="http://www.indexdata.com/pazpar2/1.0">

  <server>
    <listen port="9004"/>
    <proxy host="us1.indexdata.com"/>

    <service>
      <metadata name="title" brief="yes" sortkey="skiparticle" merge="longest" rank="6"/>
      <metadata name="isbn" merge="unique"/>
      <metadata name="date" brief="yes" sortkey="numeric" type="year" merge="range"/>
      <metadata name="author" brief="yes" termlist="yes" merge="longest" rank="2"/>
      <metadata name="subject" merge="unique" termlist="yes" rank="3"/>
      <metadata name="url" merge="unique"/>
    </service>
  </server>

  <queryprofile/> <!-- Like a CCL profile++. Can optionally refer to XSLT to
                       convert ZeeRex into queryprofile. Multiple profiles can exist. -->

  <retrievalprofile>
    <requestsyntax>marc21</requestsyntax>
    <nativesyntax name="iso2709" format="marc21" encoding="marc8s" mapto="marcxml"/>
    <map type="xslt" stylesheet="marc21.xsl"/>
  </retrievalprofile>

</pazpar2>
]]></screen>
  </para>
 </refsect1>
</refentry>
<!-- Keep this comment at the end of the file
Local variables:
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-parent-document:nil
sgml-local-catalogs: nil
sgml-namecase-general:t
End:
-->