X-Git-Url: http://jsfdemo.indexdata.com/?a=blobdiff_plain;f=doc%2Fpazpar2_conf.xml;h=d7de16fd944949dcc80fd39d1e0e5ae46e5070c1;hb=bb879dfc882455b329168fbb5684f0b277c26b82;hp=c8b85c7d33c6fc23ba76640dd6f87a1c6b84bca7;hpb=61b955f3f2bd41dc16b658a28c17f031801b2331;p=pazpar2-moved-to-github.git
diff --git a/doc/pazpar2_conf.xml b/doc/pazpar2_conf.xml
index c8b85c7..d7de16f 100644
--- a/doc/pazpar2_conf.xml
+++ b/doc/pazpar2_conf.xml
@@ -13,10 +13,13 @@
Pazpar2
&version;
+ Index Data
+
Pazpar2 conf
5
+ File formats and conventions
@@ -30,7 +33,8 @@
- DESCRIPTION
+
+ DESCRIPTION
The Pazpar2 configuration file, together with any referenced XSLT files,
governs Pazpar2's behavior as a client, and controls the normalization and
@@ -46,7 +50,8 @@
- FORMAT
+
+ FORMAT
The configuration file is XML-structured. It must be well-formed XML. All
elements specific to Pazpar2 should belong to the namespace
@@ -57,24 +62,27 @@
information. The categories are described below.
- threads
-
- This section is optional and is supported for Pazpar2 version 1.3.1 and
- later . It is identified by element "threads" which
- may include one attribute "number" which specifies
- the number of worker-threads that the Pazpar2 instance is to use.
- A value of 0 (zero) disables worker-threads (all work is carried out
- in main thread).
-
+
+ threads
+
+ This section is optional and is supported for Pazpar2 version 1.3.1 and
+ later. It is identified by the element "threads", which
+ may include one attribute, "number", which specifies
+ the number of worker threads that the Pazpar2 instance is to use.
+ A value of 0 (zero) disables worker threads (all work is carried out
+ in the main thread).
+
- server
+
+ server
This section governs overall behavior of a server endpoint. It is identified
by the element "server" which takes an optional attribute, "id", which
identifies this particular Pazpar2 server. Any string value for "id"
may be given.
- The data
+
+ The data
elements are described below. From Pazpar2 version 1.2 this is
a repeatable element.
@@ -118,13 +126,23 @@
- relevance / sort / mergekey
+ icu_chain
- Specifies character set normalization for relevancy / sorting
- and the mergekey - for the server. These definitions serves as
+ Specifies character set normalization for relevancy, sorting,
+ mergekey and facets for the server. These definitions serve as
default for services that don't have these given. For the meaning
- of these settings refer to the "relevance" element inside service.
+ of these settings refer to the icu_chain
+ element inside service.
+
+
+
+
+
+ relevance / sort / mergekey / facet
+
+
+ Obsolete. Use element icu_chain instead.
@@ -166,19 +184,21 @@
- metadata
+
+ metadata
One of these elements is required for every data element in
the internal representation of the record (see
. It governs
- subsequent processing as pertains to sorting, relevance
- ranking, merging, and display of data elements. It supports
- the following attributes:
+ subsequent processing as pertains to sorting, relevance
+ ranking, merging, and display of data elements. It supports
+ the following attributes:
- name
+
+ name
This is the name of the data element. It is matched
@@ -196,7 +216,8 @@
- type
+
+ type
The type of data element. This value governs any
@@ -209,7 +230,8 @@
- brief
+
+ brief
If this is set to 'yes', then the data element is
@@ -220,7 +242,8 @@
- sortkey
+
+ sortkey
Specifies that this data element is to be used for
@@ -232,23 +255,45 @@
- rank
+
+ rank
Specifies that this element is to be used to
help rank
records against the user's query (when ranking is
- requested). The value is an integer, used as a
+ requested).
+ The value is of the form
+
+ M [F N]
+
+ where M is an integer, used as a
multiplier against the basic TF*IDF score. A value of
- 1 is the base, higher values give additional
- weight to
+ 1 is the base, higher values give additional weight to
elements of this type. The default is '0', which
excludes this element from the rank calculation.
+
+ F is a CCL field and N is the multiplier for terms
+ that match that CCL field in the search.
+ The F+N combination allows the system to use a different
+ multiplier for a certain field. For example, a rank value of
+ "1 au 3" gives a multiplier of 3 for
+ all terms searched in the au(thor) field and 1 for everything else.
+
+
+ For Pazpar2 1.6.13 and later, the rank may also be defined
+ "per-document" by the normalization stylesheet.
+
+
+ The per field rank was introduced in Pazpar2 1.6.15. Earlier
+ releases only allowed a rank value M (simple integer).
+
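The Python sketch below shows one way to read the `M [F N]` form. It models the rule described above and is not Pazpar2's code. `parse_rank` and `term_multiplier` are hypothetical names, and the sketch accepts only the single optional `F N` pair documented here.

```python
from __future__ import annotations

# Sketch only: hypothetical helpers modelling the documented rank value,
# not part of Pazpar2.


def parse_rank(value: str) -> tuple[int, dict[str, int]]:
    """Split a rank value of the form 'M [F N]' into (M, {F: N})."""
    parts = value.split()
    if len(parts) not in (1, 3):
        raise ValueError(f"rank must be 'M' or 'M F N', got {value!r}")
    base = int(parts[0])
    per_field = {parts[1]: int(parts[2])} if len(parts) == 3 else {}
    return base, per_field


def term_multiplier(rank: str, ccl_field: str | None) -> int:
    """Multiplier for a query term searched in ccl_field (None: unqualified)."""
    base, per_field = parse_rank(rank)
    if ccl_field is None:
        return base
    # Terms in the named CCL field get N; all other terms fall back to M.
    return per_field.get(ccl_field, base)
```

With `"1 au 3"`, a term searched in `au` gets multiplier 3, and a term in any other field gets 1. A plain `"0"` keeps the element out of the rank calculation.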
- termlist
+
+ termlist
Specifies that this element is to be used as a
@@ -262,7 +307,8 @@
- merge
+
+ merge
This governs whether and how elements are extracted
@@ -276,8 +322,9 @@
-
- mergekey
+
+
+ mergekey
If set to 'required', the value of this
@@ -300,7 +347,41 @@
- setting
+
+ facetrule
+
+
+ Specifies the ICU rule set to be used for normalizing
+ facets. If facetrule is omitted from metadata, the
+ rule set 'facet' is used.
+
+
+
+
+
+ limitmap
+
+
+ Specifies a default limitmap for this field. This is to avoid mass
+ configuring of targets. However, it is important to review this on a
+ per-target basis, since it is usually target-specific. See pz:limitmap for the format.
+
+
+
+
+
+ facetmap
+
+
+ Specifies a default facetmap for this field. This is to avoid mass
+ configuring of targets. However, it is important to review this on a
+ per-target basis, since it is usually target-specific. See pz:facetmap for the format.
+
+
+
+
+
+ setting
This attribute allows you to make use of static database
@@ -328,15 +409,40 @@
-
+
- relevance
+ xslt
- Specifies ICU tokenization and transformation rules
- for tokens that are used in Pazpar2's relevance ranking.
- The 'id' attribute is currently not used, and the 'locale'
- attribute must be set to one of the locale strings
+ Defines an XSLT stylesheet. The xslt
+ element takes exactly one attribute, id,
+ which names the stylesheet. This name can be referred to in the pz:xslt
+ target setting.
+
+
+ The content of the xslt element is the embedded stylesheet XML.
+
+
+
+
+ icu_chain
+
+
+ Specifies a named ICU rule set. The icu_chain element must include
+ attribute 'id' which specifies the identifier (name) for the ICU
+ rule set.
+ Pazpar2 uses particular rule sets for particular purposes.
+ Rule set 'relevance' is used to normalize
+ terms for relevance ranking. Rule set 'sort' is used to
+ normalize terms for sorting. Rule set 'mergekey' is used to
+ normalize terms for making a mergekey and, finally, rule set 'facet'
+ is normally used to normalize facet terms, unless
+ facetrule is given for a
+ metadata field.
+
+
+ The icu_chain element must also include a 'locale'
+ attribute which must be set to one of the locale strings
defined in ICU. The child elements listed below can be
in any order, except the 'index' element which logically
belongs to the end of the list. The stated tokenization,
@@ -344,7 +450,8 @@
in order from top to bottom.
- casemap
+
+ casemap
The attribute 'rule' defines the direction of the
@@ -353,7 +460,8 @@
- transform
+
+ transform
Normalization and transformation of tokens follows
@@ -361,14 +469,15 @@
possible values we refer to the extensive ICU
documentation found at the
ICU
- transformation home page. Set filtering
+ transformation home page. Set filtering
principles are explained at the
ICU set and
- filtering page.
+ filtering page.
- tokenize
+
+ tokenize
Tokenization is the only rule in the ICU chain
@@ -390,12 +499,33 @@
+ relevance
+
+
+ Specifies the ICU rule set used for relevance ranking.
+ The child element of 'relevance' must be 'icu_chain' and the
+ 'id' attribute of the icu_chain is ignored. This
+ definition is obsolete and should be replaced by the equivalent
+ construct:
+
+ <icu_chain id="relevance" locale="en">..</icu_chain>
+
+
+
+
+
+
sort
- Specifies ICU tokenization and transformation rules
- for tokens that are used in Pazpar2's sorting. The contents
- is similar to that of relevance.
+ Specifies the ICU rule set used for sorting.
+ The child element of 'sort' must be 'icu_chain' and the
+ 'id' attribute of the icu_chain is ignored. This
+ definition is obsolete and should be replaced by the equivalent
+ construct:
+
+ <icu_chain id="sort" locale="en">..</icu_chain>
+
@@ -405,12 +535,99 @@
Specifies ICU tokenization and transformation rules
- for tokens that are used in Pazpar2's mergekey. The contents
- is similar to that of relevance.
+ for tokens that are used in Pazpar2's mergekey.
+ The child element of 'mergekey' must be 'icu_chain' and the
+ 'id' attribute of the icu_chain is ignored. This
+ definition is obsolete and should be replaced by the equivalent
+ construct:
+
+ <icu_chain id="mergekey" locale="en">..</icu_chain>
+
+
+
+
+
+
+ facet
+
+
+ Specifies ICU tokenization and transformation rules
+ for tokens that are used in Pazpar2's facets.
+ The child element of 'facet' must be 'icu_chain' and the
+ 'id' attribute of the icu_chain is ignored. This
+ definition is obsolete and should be replaced by the equivalent
+ construct:
+
+ <icu_chain id="facet" locale="en">..</icu_chain>
+
+
+
+
+
+
+ ccldirective
+
+
+ Customizes the CCL parsing (interpretation of the query parameter
+ of the search command).
+ The name and value of the CCL directive are given by attributes
+ 'name' and 'value' respectively. Refer to the list of possible names
+ in the
+
+ YAZ manual
+ .
+
+
+
+
+
+ rank
+
+
+ Customizes the ranking (relevance) algorithm.
+ Attribute 'cluster' is a boolean
+ that controls whether Pazpar2 should boost ranking for merged
+ records. It is 'yes' by default. A value of 'no' will make
+ Pazpar2 average the rankings of the records in a cluster.
+
+
+ This configuration was added in Pazpar2 1.6.18.
+
+
+
+
+
+ sort-default
+
+
+ Specifies the default sort criteria (default 'relevance'),
+ which previously were hard-coded as the default criteria in search.
+ This is a work-around to avoid re-searching when using
+ target-based sorting. For this to work efficiently,
+ the search must also have the sort criteria parameter; otherwise
+ Pazpar2 will re-search if the sort criteria change
+ between the search and show commands.
+
+
+ This configuration was added in Pazpar2 1.6.20.
+
settings
@@ -444,68 +661,69 @@
-
-
-
-
- EXAMPLE
- Below is a working example configuration:
-
-
-
-
-
-
-
-
+ EXAMPLE
+
+ Below is a working example configuration:
+
+
+
+
+
+
+
+
+
+
-
-
+
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- ]]>
-
+
+
+
+
+
+
+
+
+
+
+
+
+
+ ]]>
+
- INCLUDE FACILITY
+
+ INCLUDE FACILITY
The XML configuration may be partitioned into multiple files by using
the include element, which takes a single attribute,
src. The value of the src attribute is a
regular shell-like glob pattern. For example,
- ]]>
+
+ ]]>
The include facility requires Pazpar2 version 1.2.
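The Python sketch below shows how a shell-like glob in `src` selects files. `expand_include` is a hypothetical helper. The text above does not say how Pazpar2 anchors relative patterns or orders matches, so resolving relative patterns against a base directory and sorting the matches are assumptions of this sketch.

```python
from __future__ import annotations

import glob
import os

# Sketch only: hypothetical helper, not part of Pazpar2.


def expand_include(src: str, base_dir: str = ".") -> list[str]:
    """Return the files selected by an include src glob pattern."""
    # Assumption: relative patterns are resolved against base_dir.
    pattern = src if os.path.isabs(src) else os.path.join(base_dir, src)
    # Sorted for a deterministic result; Pazpar2's own order is not stated.
    return sorted(glob.glob(pattern))
```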
- TARGET SETTINGS
+
+ TARGET SETTINGS
Pazpar2 features a cunning scheme by which you can associate various
kinds of attributes, or settings with search targets. This can be done
@@ -552,7 +770,7 @@
on a per-session basis. This allows the client to override specific CCL fields
for searching, etc., to meet the needs of a session or user.
-
+
Finally, as an extreme case of this, the webservice client can
introduce entirely new targets, on the fly, as part of the
@@ -564,66 +782,70 @@
long as the webservice client is prepared to supply the necessary
information at the beginning of every session.
-
+
- The following discussion of practical issues related to session and settings
- management are cast in terms of a user interface based on Ajax/Javascript
- technology. It would apply equally well to many other kinds of browser-based logic.
+ The following discussion of practical issues related to session
+ and settings management is cast in terms of a user interface based on
+ Ajax/Javascript technology. It would apply equally well to many other
+ kinds of browser-based logic.
-
+
- Typically, a Javascript client is not allowed to directly alter the parameters
- of a session. There are two reasons for this. One has to do with access
- to information; typically, information about a user will be stored in a
- system on the server side, or it will be accessible in some way from the server.
- However, since the Javascript client cannot be entirely trusted (some hostile
- agent might in fact 'pretend' to be a regular ws client), it is more robust
- to control session settings from scripting that you run as part of your
- webserver. Typically, this can be handled during the session initialization,
- as follows:
+ Typically, a Javascript client is not allowed to directly alter the
+ parameters of a session. There are two reasons for this. One has to do
+ with access to information; typically, information about a user will
+ be stored in a system on the server side, or it will be accessible in
+ some way from the server. However, since the Javascript client cannot
+ be entirely trusted (some hostile agent might in fact 'pretend' to be
+ a regular ws client), it is more robust to control session settings
+ from scripting that you run as part of your webserver. Typically, this
+ can be handled during the session initialization, as follows:
-
+
- Step 1: The Javascript client loads, and asks the webserver for a new Pazpar2
- session ID. This can be done using a Javascript call, for instance. Note that
- it is possible to submit Ajax HTTPXmlRequest calls either to Pazpar2 or to the
- webserver that Pazpar2 is proxying for. See (XXX Insert link to Pazpar2 protocol).
-
-
+ Step 1: The Javascript client loads and asks the webserver for a
+ new Pazpar2 session ID. This can be done using a Javascript call, for
+ instance. Note that it is possible to submit Ajax XMLHttpRequest calls
+ either to Pazpar2 or to the webserver that Pazpar2 is proxying
+ for. See (XXX Insert link to Pazpar2 protocol).
+
+
Step 2: Code on the webserver authenticates the user by database lookup,
LDAP access, NCIP, etc. It then determines which resources the user has access to,
and any user-specific parameters that are to be applied during this session.
-
+
- Step 3: The webserver initializes a new Pazpar2 settings, and sets user-specific
- parameters as necessary, using the init webservice command. A new session ID is
- returned.
+ Step 3: The webserver initializes a new Pazpar2 session, and sets
+ user-specific parameters as necessary, using the init webservice
+ command. A new session ID is returned.
-
+
- Step 4: The webserver returns this session ID to the Javascript client, which then
- uses the session ID to submit searches, show results, etc.
+ Step 4: The webserver returns this session ID to the Javascript
+ client, which then uses the session ID to submit searches, show
+ results, etc.
-
+
- Step 5: When the Javascript client ceases to use the session, Pazpar2 destroys
- any session-specific information.
+ Step 5: When the Javascript client ceases to use the session,
+ Pazpar2 destroys any session-specific information.
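Steps 3 and 4 can be sketched on the webserver side in Python. This is an illustration, not a reference client. The endpoint `http://localhost:9004/search.pz2`, the init response shape (an `init` root containing `status` and `session` elements) and the helper names are all assumptions. The sketch also splits step 3 into an `init` call followed by a `settings` call, which passes user-specific values as `name[target]=value`. Check the Pazpar2 protocol documentation for your version. The code only builds URLs and parses a response string, so it makes no network calls.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Sketch only: hypothetical helpers; the endpoint below is an assumption.
PZ2_URL = "http://localhost:9004/search.pz2"


def init_url() -> str:
    """URL the webserver calls to create a new session (step 3)."""
    return PZ2_URL + "?" + urlencode({"command": "init"})


def session_id(init_response: str) -> str:
    """Pull the session ID out of an init response (response shape assumed)."""
    root = ET.fromstring(init_response)
    if root.findtext("status") != "OK":
        raise RuntimeError("Pazpar2 init failed")
    sid = root.findtext("session")
    if not sid:
        raise RuntimeError("init response has no session ID")
    return sid


def settings_url(session: str, settings: dict[tuple[str, str], str]) -> str:
    """URL applying user-specific settings, encoded as name[target]=value."""
    params = {"command": "settings", "session": session}
    for (name, target), value in settings.items():
        params[f"{name}[{target}]"] = value
    return PZ2_URL + "?" + urlencode(params)
```

The webserver would then hand the extracted session ID to the Javascript client (step 4).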
- SETTINGS FILE FORMAT
+
+ SETTINGS FILE FORMAT
Each file contains a root element named <settings>. It may
contain one or more <set> elements. The settings and set
- elements may contain the following attributes. Attributes in the set node
- overrides those in the setting root element. Each set node must
+ elements may contain the following attributes. Attributes in the set
+ node override those in the settings root element. Each set node must
specify (directly, or inherited from the parent node) at least a
target, name, and value.
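The Python sketch below illustrates the inheritance rule. Each `<set>` starts from the attributes of the root `<settings>` element and overrides them with its own. `effective_settings` is a hypothetical helper. It ignores the `precedence` attribute and wildcard matching, and the target ID in the sample is illustrative only.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Sketch only: hypothetical helper, not Pazpar2's settings loader.
REQUIRED = ("target", "name", "value")


def effective_settings(xml_text: str) -> list[tuple[str, ...]]:
    """Resolve each <set> to (target, name, value), inheriting from <settings>."""
    root = ET.fromstring(xml_text)
    if root.tag != "settings":
        raise ValueError("root element must be <settings>")
    resolved = []
    for node in root.findall("set"):
        # Attributes on <set> override those inherited from <settings>.
        attrs = {**root.attrib, **node.attrib}
        missing = [a for a in REQUIRED if a not in attrs]
        if missing:
            raise ValueError(f"<set> lacks {', '.join(missing)}")
        resolved.append(tuple(attrs[a] for a in REQUIRED))
    return resolved
```

For instance, a root `<settings target="*">` with a child `<set name="pz:allow" value="1"/>` resolves to `("*", "pz:allow", "1")`.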
-
+
target
@@ -648,6 +870,11 @@
multiple overlapping settings with the same name and target
value, the 'precedence' attribute determines what happens.
+
+ For Pazpar2 1.6.4 or later, the target ID may be user-defined, in
+ which case the actual host, port, etc. are given by the setting
+ pz:url.
+
@@ -686,7 +913,7 @@
-
+
By setting defaults for target, name, or value in the root
settings node, you can use the settings files in many different
@@ -698,80 +925,84 @@
many databases with a given category or class that makes sense
within your application.
-
+
The following examples illustrate uses of the settings system to
associate settings with targets to meet different requirements.
-
+
The example below associates a set of default values that can be
used across many targets. Note the wildcard for targets.
This associates the given settings with all targets for which no
other information is provided.
+
-
+
-
-
+
+
-
-
-
-
-
-
+
+
+
+
+
+
+
+
+
-
+
-
-
+
+
-
-
+
+
-
+
-
-
+
+
-
+
- ]]>
+ ]]>
-
+
The next example shows certain settings overridden for one target,
one which returns XML records containing DublinCore elements, and
which furthermore requires a username/password.
-
-
-
+
+
+
+
-
-
- ]]>
+
+
+ ]]>
-
+
The following example associates a specific name/value combination
with a number of targets. The targets below are access-restricted,
and can only be used by users with special credentials.
-
-
-
- ]]>
+
+
+
+
+ ]]>
-
+
-
- RESERVED SETTING NAMES
+
+
+ RESERVED SETTING NAMES
The following setting names are reserved by Pazpar2 to control the
behavior of the client function.
@@ -860,9 +1091,9 @@
pz:queryencoding
- The encoding of the search terms that a target accepts. Most
- targets do not honor UTF-8 in which case this needs to be specified.
- Each term in a query will be converted if this setting is given.
+ The encoding of the search terms that a target accepts. Most
+ targets do not honor UTF-8 in which case this needs to be specified.
+ Each term in a query will be converted if this setting is given.
@@ -880,13 +1111,21 @@
- pz:xslt
+ pz:xslt
- Is a comma separated list of of files that specifies
+ A comma-separated list of stylesheet names that specify
how to convert incoming records to the internal representation.
+ For each name, the embedded stylesheets (XSL) that come with the
+ service definition are consulted first and take precedence over
+ external files; see the xslt element
+ of the service definition.
+ If the name does not match an embedded stylesheet, it is
+ considered a filename.
+
+
The suffix of each file specifies the kind of transformation.
Suffix ".xsl" makes an XSL transform. Suffix
".mmap" will use the MMAP transform (described below).
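The lookup order described for pz:xslt can be sketched in Python as follows. `resolve_xslt` is a hypothetical helper, and `embedded` stands for the `id` values of the service's `xslt` elements. Rejecting names that are neither embedded nor end in `.xsl` or `.mmap` is this sketch's choice, not documented behavior.

```python
from __future__ import annotations

# Sketch only: hypothetical helper modelling the pz:xslt lookup order.


def resolve_xslt(setting: str, embedded: set[str]) -> list[tuple[str, str]]:
    """Classify each name in a pz:xslt value as embedded, .xsl or .mmap."""
    steps = []
    for name in (part.strip() for part in setting.split(",")):
        if not name:
            continue
        if name in embedded:  # embedded stylesheets take precedence over files
            steps.append(("embedded", name))
        elif name.endswith(".xsl"):
            steps.append(("xsl-file", name))
        elif name.endswith(".mmap"):
            steps.append(("mmap-file", name))
        else:
            # Sketch's choice: any other suffix is rejected here.
            raise ValueError(f"no embedded stylesheet or known suffix: {name}")
    return steps
```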
@@ -902,12 +1141,13 @@
performance with the alternate "MARC map" format. Provide the
path of a file with extension ".mmap" containing on each line:
- <field> <subfield> <metadata element>
+ <field> <subfield> <metadata element>
For example:
- 245 a title
- 500 $ description
- 773 * citation
+ 245 a title
+ 500 $ description
+ 773 * citation
+
To map the field value specify a subfield of '$'. To store a
concatenation of all subfields, specify a subfield of '*'.
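Here is a small Python parser for the three-column MARC map format. `MmapRule` and `parse_mmap` are hypothetical names. The sketch only records which subfield form each line uses: a subfield code, `$`, or `*`. It does not model how Pazpar2 extracts the values.

```python
from __future__ import annotations

from typing import NamedTuple

# Sketch only: hypothetical parser for the ".mmap" line format.


class MmapRule(NamedTuple):
    field: str     # MARC field tag, e.g. "245"
    subfield: str  # a subfield code, "$" or "*"
    element: str   # internal metadata element, e.g. "title"

    @property
    def mode(self) -> str:
        if self.subfield == "$":
            return "field value"
        if self.subfield == "*":
            return "concatenation of all subfields"
        return "subfield " + self.subfield


def parse_mmap(text: str) -> list[MmapRule]:
    """Parse '<field> <subfield> <metadata element>' lines."""
    rules = []
    for lineno, line in enumerate(text.splitlines(), 1):
        if not line.strip():
            continue  # this sketch skips blank lines
        parts = line.split()
        if len(parts) != 3:
            raise ValueError(f"line {lineno}: expected 3 columns, got {line!r}")
        rules.append(MmapRule(*parts))
    return rules
```

Applied to the three example lines above, it yields one rule per line, and their modes are `subfield a`, `field value` and `concatenation of all subfields`.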
@@ -927,9 +1167,10 @@
Allows or denies access to the resources it is applied to. Possible
- values are '0' and '1'. The default is '1' (allow access to this resource).
- See the manual section on authorization and authentication for discussion
- about how to use this setting.
+ values are '0' and '1'.
+ The default is '1' (allow access to this resource).
+ See the manual section on authorization and authentication for
+ discussion about how to use this setting.
@@ -943,6 +1184,15 @@
+ pz:presentchunk
+
+
+ Controls the chunk size in present requests. Pazpar2 will
+ make (maxrecs / chunk) request(s). The default is 20.
+
+
+
+
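A short Python sketch can illustrate the request count for pz:presentchunk. The text gives the count as maxrecs / chunk. This sketch assumes the division rounds up, so that a final, smaller request fetches the remainder. `present_requests` and its zero-based offsets are hypothetical.

```python
from __future__ import annotations

# Sketch only: hypothetical model of splitting present requests.


def present_requests(maxrecs: int, chunk: int = 20) -> list[tuple[int, int]]:
    """(offset, count) pairs covering maxrecs records in chunks of size chunk."""
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    # The last request is smaller when chunk does not divide maxrecs.
    return [(offset, min(chunk, maxrecs - offset))
            for offset in range(0, maxrecs, chunk)]
```

With maxrecs 50 and the default chunk of 20, this yields three requests: (0, 20), (20, 20) and (40, 10).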
pz:id
@@ -979,7 +1229,7 @@
This setting enables
- SRU/SOLR
+ SRU/Solr
support.
It has four possible settings.
'get' enables SRU access through GET requests. 'post' enables SRU/POST
@@ -988,8 +1238,8 @@
the protocol.
- A value of 'solr' anables SOLR client support. This is supported
- for Pazpar version 1.5.0 and later.
+ A value of 'solr' enables Solr client support. This is supported
+ for Pazpar2 version 1.5.0 and later.
@@ -1000,7 +1250,7 @@
This allows the SRU version to be specified. If unset, Pazpar2
will use the default of YAZ (currently 1.2). It should be set
- to 1.1 or 1.2.
+ to 1.1 or 1.2. For Solr, the currently supported/tested versions are 1.4 and 3.x.
@@ -1010,7 +1260,7 @@
Allows you to specify an arbitrary PQF query language substring.
- The provided string is prefixed the user's query after it has been
+ The provided string is prefixed to the user's query after it has been
normalized to PQF internally in pazpar2.
This allows you to attach complex 'filters' to queries for a given
target, sometimes necessary to select sub-catalogs
@@ -1033,6 +1283,17 @@
@and @attr 1=30 @attr 2=3 %Y %%
would search for the current year combined with the original PQF (%%).
+
+ This setting can also be used as a more general alternative to
+ pz:pqf_prefix: a way of embedding the submitted query
+ anywhere in the string rather than only appending it to a prefix. For
+ example, if it is desired to omit all records satisfying the
+ query @attr 1=pica.bib 0007, then this
+ subquery can be combined with the submitted query as the second
+ argument of @not (AND-NOT) by using the
+ pz:pqf_strftime value @not %% @attr 1=pica.bib
+ 0007.
+
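The substitution performed by pz:pqf_strftime can be modelled in Python. This sketch is not Pazpar2's implementation. It splits the template on `%%` so that `strftime` never sees that sequence, expands the remaining conversions, and joins the pieces with the user's PQF query. `expand_pqf_strftime` is a hypothetical name, and the date is passed in explicitly so the result is reproducible.

```python
from datetime import datetime

# Sketch only: hypothetical model of pz:pqf_strftime expansion.


def expand_pqf_strftime(template: str, query: str, now: datetime) -> str:
    """Replace %% with the user's PQF query and other % sequences via strftime."""
    # Split on %% first: strftime itself would turn %% into a literal %.
    pieces = template.split("%%")
    return query.join(now.strftime(piece) for piece in pieces)
```

For 1 May 2024 and the query `@attr 1=4 water`, the template `@and @attr 1=30 @attr 2=3 %Y %%` expands to `@and @attr 1=30 @attr 2=3 2024 @attr 1=4 water`.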
@@ -1051,20 +1312,151 @@
Specifies a filter which allows Pazpar2 to only include
- records that meet a certain criteria in a result. Unmatched records
- will be ignored. The filter takes the form name[~value] , which
+ records that meet certain criteria in a result.
+ Unmatched records will be ignored.
+ The filter takes the form name, name~value, or name=value, which
will include only records with metadata element (name) that has the
- substring (value) given. If value is omitted all records with the
- metadata present will be included.
+ substring (~value), or a value that matches exactly (=value).
+ If value is omitted, all records with the named metadata element
+ present will be included.
-
+
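The three pz:recordfilter forms can be sketched as a predicate in Python. `record_matches` is hypothetical. It models a record as a mapping from metadata element names to lists of values, which is an assumption of the sketch.

```python
from __future__ import annotations

# Sketch only: hypothetical predicate; the record layout is assumed.


def record_matches(record: dict[str, list[str]], rule: str) -> bool:
    """Apply a pz:recordfilter rule of the form name, name~value or name=value."""
    positions = [rule.find(op) for op in "~=" if op in rule]
    if not positions:
        # Plain name: keep the record if the element is present at all.
        return bool(record.get(rule))
    i = min(positions)  # the first operator separates name from value
    name, op, wanted = rule[:i], rule[i], rule[i + 1:]
    values = record.get(name, [])
    if op == "~":
        return any(wanted in v for v in values)  # substring match
    return any(v == wanted for v in values)  # exact match
```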
+
+ pz:preferred
+
+
+ Specifies that a target is preferred, e.g. a possibly local or faster
+ target. Using block=pref on the show command will wait for all preferred
+ targets to return records before releasing the block.
+ If no target is preferred, block=pref is identical to block=1,
+ which releases when one target has returned records.
+
+
+
+
+ pz:block_timeout
+
+
+ (Not yet implemented.)
+ Specifies the time after which a block should be released anyway.
+
+
+
+
+ pz:termlist_term_count
+
+
+ Specifies the number of facet terms to be requested from the target.
+ The default is unspecified, i.e. server-decided. Also see pz:facetmap.
+
+
+
+
+ pz:termlist_term_factor
+
+
+ Specifies whether to use a factor for Pazpar2-generated facets (1) or not (0).
+ When mixing locally generated facets (computed from the downloaded pz:maxrecs samples)
+ with native (target-generated) facets, the latter will dominate the facet list,
+ since they are generated based on the complete result set.
+ By scaling up the facet count using the ratio between the total hit count and the sample size,
+ the total facet count can be approximated and thus better compared with native facets.
+ This is not enabled by default.
+
+
+
+
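The scaling described for pz:termlist_term_factor amounts to multiplying a locally counted facet by total hits / sample size. In this minimal Python sketch, `scaled_count` is a hypothetical name and rounding to the nearest integer is an assumption.

```python
# Sketch only: hypothetical model of the facet count scaling.


def scaled_count(local_count: int, total_hits: int, sample_size: int) -> int:
    """Scale a facet count from the downloaded sample up to the full result set."""
    if sample_size <= 0:
        return local_count  # nothing downloaded: no ratio to apply
    return round(local_count * total_hits / sample_size)
```

A term seen 5 times among 100 downloaded records, out of 1000 hits, scales to 50. That figure is comparable with a native, target-generated count.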
+
+ pz:facetmap:name
+
+
+ Specifies that for field name, the target
+ supports (native) facets. The value is the name of the
+ field on the target.
+
+
+
+ At this point only Solr targets have been tested with this
+ facility.
+
+
+
+
+
+
+ pz:limitmap:name
+
+
+ Specifies attributes for limiting a search to a field, using
+ the limit parameter of the search command. It can be used to filter locally
+ or remotely (search in a target). In some cases the mapping of
+ a field to a value is identical to an existing cclmap field; in
+ other cases the field must be specified in a different way, for
+ example to match a complete field (rather than parts of a subfield).
+
+
+ The value of limitmap may have one of three forms: a reference to
+ an existing CCL field, a raw PQF string or a local limit. The leading string
+ determines the type: ccl: for a CCL field,
+ rpn: for PQF/RPN, or local:
+ for filtering in Pazpar2. The local: form may be followed
+ by the name of a metadata field (the default is to use the name of the
+ limitmap itself).
+
+
+
+ The limitmap facility is supported for Pazpar2 version 1.6.0 and later.
+ Local filtering is supported for Pazpar2 1.6.6 and later.
+
+
+
+
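The three limitmap forms can be told apart by their leading string, as in this Python sketch. `parse_limitmap` is hypothetical. It splits on the first colon only, so an `rpn:` string may itself contain colons. A bare `local:` falls back to the limitmap's own name, as described above.

```python
from __future__ import annotations

# Sketch only: hypothetical parser for pz:limitmap:<name> values.


def parse_limitmap(limit_name: str, value: str) -> tuple[str, str]:
    """Return (kind, argument) for a pz:limitmap:<limit_name> value."""
    kind, sep, arg = value.partition(":")
    if not sep or kind not in ("ccl", "rpn", "local"):
        raise ValueError(f"limitmap must start with ccl:, rpn: or local:, got {value!r}")
    if kind == "local" and not arg:
        arg = limit_name  # default: metadata field named like the limitmap
    return kind, arg
```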
+
+ pz:url
+
+
+ Specifies the URL for the target, overriding the target ID.
+
+
+
+ pz:url is only recognized for
+ Pazpar2 1.6.4 and later.
+
+
+
+
+
+
+ pz:sortmap:field
+
+
+ Specifies native sorting for a target, where
+ field is a sort criterion (see the command
+ show). The value has two components separated by a colon: strategy and
+ native-field. Strategy is one of z3950,
+ type7, cql,
+ sru11, or embed.
+ The second component, native-field, is the field that is recognized
+ by the target.
+
+
+
+ Only supported for Pazpar2 1.6.4 and later.
+
+
+
+
+
+
+
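A pz:sortmap value can be split and checked in the same way. The Python sketch below is hypothetical: `parse_sortmap` and the sample native field names are illustrative. It splits on the first colon and accepts only the strategies listed above.

```python
from __future__ import annotations

# Sketch only: hypothetical parser for pz:sortmap:<field> values.
STRATEGIES = {"z3950", "type7", "cql", "sru11", "embed"}


def parse_sortmap(value: str) -> tuple[str, str]:
    """Split a pz:sortmap value into (strategy, native_field)."""
    strategy, sep, native_field = value.partition(":")
    if not sep or strategy not in STRATEGIES:
        raise ValueError(f"unknown sortmap strategy in {value!r}")
    return strategy, native_field
```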
-
+
- SEE ALSO
+
+ SEE ALSO
pazpar2
@@ -1083,15 +1475,7 @@