X-Git-Url: http://jsfdemo.indexdata.com/?a=blobdiff_plain;ds=sidebyside;f=doc%2Fpazpar2_conf.xml;h=7acdfb50ab2067fab7a90ed8ea9361fc8db60733;hb=3e59f10fe3e32e6e632b46a1118e8a06c2b722a9;hp=17578a9715e373ac86872ef7a14ec2677bebef4e;hpb=f110bc50c58b63a6fe4eaaddf40a4789c27b83bf;p=pazpar2-moved-to-github.git diff --git a/doc/pazpar2_conf.xml b/doc/pazpar2_conf.xml index 17578a9..7acdfb5 100644 --- a/doc/pazpar2_conf.xml +++ b/doc/pazpar2_conf.xml @@ -1,6 +1,6 @@ - %local; @@ -13,10 +13,13 @@ Pazpar2 &version; + Index Data + Pazpar2 conf 5 + File formats and conventions @@ -30,7 +33,8 @@ - DESCRIPTION + + DESCRIPTION The Pazpar2 configuration file, together with any referenced XSLT files, govern Pazpar2's behavior as a client, and control the normalization and @@ -46,7 +50,8 @@ - FORMAT + + FORMAT The configuration file is XML-structured. It must be well-formed XML. All elements specific to Pazpar2 should belong to the namespace @@ -57,24 +62,27 @@ information. The categories are described below. - threads - - This section is optional and is supported for Pazpar2 version 1.3.1 and - later . It is identified by element "threads" which - may include one attribute "number" which specifies - the number of worker-threads that the Pazpar2 instance is to use. - A value of 0 (zero) disables worker-threads (all work is carried out - in main thread). - + + threads + + This section is optional and is supported for Pazpar2 version 1.3.1 and + later . It is identified by element "threads" which + may include one attribute "number" which specifies + the number of worker-threads that the Pazpar2 instance is to use. + A value of 0 (zero) disables worker-threads (all work is carried out + in main thread). + - server + + server This section governs overall behavior of a server endpoint. It is identified by the element "server" which takes an optional attribute, "id", which identifies this particular Pazpar2 server. Any string value for "id" may be given. 
- The data + + The data elements are described below. From Pazpar2 version 1.2 this is a repeatable element. @@ -118,13 +126,23 @@ - relevance / sort / mergekey + icu_chain - Specifies character set normalization for relevancy / sorting - and the mergekey - for the server. These definitions serves as + Specifies character set normalization for relevancy / sorting / + mergekey and facets - for the server. These definitions serves as default for services that don't have these given. For the meaning - of these settings refer to the "relevance" element inside service. + of these settings refer to the + "icu_chain" element inside service. + + + + + + relevance / sort / mergekey / facet + + + Obsolete. Use element icu_chain instead. @@ -166,19 +184,21 @@ - metadata + + metadata One of these elements is required for every data element in the internal representation of the record (see . It governs - subsequent processing as pertains to sorting, relevance - ranking, merging, and display of data elements. It supports - the following attributes: + subsequent processing as pertains to sorting, relevance + ranking, merging, and display of data elements. It supports + the following attributes: - name + + name This is the name of the data element. It is matched @@ -196,7 +216,8 @@ - type + + type The type of data element. This value governs any @@ -209,7 +230,8 @@ - brief + + brief If this is set to 'yes', then the data element is @@ -220,7 +242,8 @@ - sortkey + + sortkey Specifies that this data element is to be used for @@ -232,7 +255,8 @@ - rank + + rank Specifies that this element is to be used to @@ -248,7 +272,8 @@ - termlist + + termlist Specifies that this element is to be used as a @@ -262,7 +287,8 @@ - merge + + merge This governs whether, and how elements are extracted @@ -276,8 +302,9 @@ - - mergekey + + + mergekey If set to 'required', the value of this @@ -300,7 +327,19 @@ - setting + + facetrule + + + Specifies the ICU rule set to be used for normalizing + facets. 
If facetrule is omitted from metadata, the + rule set 'facet' is used. + + + + + + setting This attribute allows you to make use of static database @@ -328,15 +367,26 @@ - + - relevance + icu_chain - Specifies ICU tokenization and transformation rules - for tokens that are used in Pazpar2's relevance ranking. - The 'id' attribute is currently not used, and the 'locale' - attribute must be set to one of the locale strings + Specifies a named ICU rule set. The icu_chain element must include + attribute 'id' which specifies the identifier (name) for the ICU + rule set. + Pazpar2 uses the particular rule sets for particular purposes. + Rule set 'relevance' is used to normalize + terms for relevance ranking. Rule set 'sort' is used to + normalize terms for sorting. Rule set 'mergekey' is used to + normalize terms for making a mergekey and, finally. Rule set 'facet' + is normally used to normalize facet terms, unless + facetrule is given for a + metadata field. + + + The icu_chain element must also include a 'locale' + attribute which must be set to one of the locale strings defined in ICU. The child elements listed below can be in any order, except the 'index' element which logically belongs to the end of the list. The stated tokenization, @@ -344,7 +394,8 @@ in order from top to bottom. - casemap + + casemap The attribute 'rule' defines the direction of the @@ -353,7 +404,8 @@ - transform + + transform Normalization and transformation of tokens follows @@ -361,14 +413,15 @@ possible values we refer to the extensive ICU documentation found at the ICU - transformation home page. Set filtering + transformation home page. Set filtering principles are explained at the ICU set and - filtering page. + filtering page. - tokenize + + tokenize Tokenization is the only rule in the ICU chain @@ -390,12 +443,33 @@ + relevance + + + Specifies the ICU rule set used for relevance ranking. 
+ The child element of 'relevance' must be 'icu_chain' and the + 'id' attribute of the icu_chain is ignored. This + definition is obsolete and should be replaced by the equivalent + construct: + + <icu_chain id="relevance" locale="en">..</icu_chain> + + + + + + sort - Specifies ICU tokenization and transformation rules - for tokens that are used in Pazpar2's sorting. The contents - are similar to those of relevance. + Specifies the ICU rule set used for sorting. + The child element of 'sort' must be 'icu_chain' and the + 'id' attribute of the icu_chain is ignored. This + definition is obsolete and should be replaced by the equivalent + construct: + + <icu_chain id="sort" locale="en">..</icu_chain> + @@ -405,13 +479,36 @@ Specifies ICU tokenization and transformation rules - for tokens that are used in Pazpar2's mergekey. The contents - are similar to those of relevance. + for tokens that are used in Pazpar2's mergekey. + The child element of 'mergekey' must be 'icu_chain' and the + 'id' attribute of the icu_chain is ignored. This + definition is obsolete and should be replaced by the equivalent + construct: + + <icu_chain id="mergekey" locale="en">..</icu_chain> + + facet + + + Specifies ICU tokenization and transformation rules + for tokens that are used in Pazpar2's facets. + The child element of 'facet' must be 'icu_chain' and the + 'id' attribute of the icu_chain is ignored. This + definition is obsolete and should be replaced by the equivalent + construct: + + <icu_chain id="facet" locale="en">..</icu_chain> + + + + + + settings @@ -444,68 +541,69 @@ - - - - - EXAMPLE - Below is a working example configuration: - - - - - - - - + EXAMPLE + + Below is a working example configuration: + + + + + + + + + + - - + - - - - - - - - - - - - - - - - - ]]> - + + + + + + + + + + + + + + ]]> + - INCLUDE FACILITY + + INCLUDE FACILITY The XML configuration may be partitioned into multiple files by using the include element which takes a single attribute, src. 
The value of the src attribute is a regular shell-like glob pattern. For example, - ]]> + + ]]> The include facility requires Pazpar2 version 1.2. - TARGET SETTINGS + + TARGET SETTINGS Pazpar2 features a cunning scheme by which you can associate various kinds of attributes, or settings with search targets. This can be done @@ -552,7 +650,7 @@ on a per-session basis. This allows the client to override specific CCL fields for searching, etc., to meet the needs of a session or user. - + Finally, as an extreme case of this, the webservice client can introduce entirely new targets, on the fly, as part of the @@ -564,66 +662,70 @@ long as the webservice client is prepared to supply the necessary information at the beginning of every session. - + - The following discussion of practical issues related to session and settings - management is cast in terms of a user interface based on Ajax/Javascript - technology. It would apply equally well to many other kinds of browser-based logic. + The following discussion of practical issues related to session + and settings management is cast in terms of a user interface based on + Ajax/Javascript technology. It would apply equally well to many other + kinds of browser-based logic. - + - Typically, a Javascript client is not allowed to directly alter the parameters - of a session. There are two reasons for this. One has to do with access - to information; typically, information about a user will be stored in a - system on the server side, or it will be accessible in some way from the server. - However, since the Javascript client cannot be entirely trusted (some hostile - agent might in fact 'pretend' to be a regular ws client), it is more robust - to control session settings from scripting that you run as part of your - webserver. Typically, this can be handled during the session initialization, - as follows: + Typically, a Javascript client is not allowed to directly alter the + parameters of a session. There are two reasons for this. 
One has to do + with access to information; typically, information about a user will + be stored in a system on the server side, or it will be accessible in + some way from the server. However, since the Javascript client cannot + be entirely trusted (some hostile agent might in fact 'pretend' to be + a regular ws client), it is more robust to control session settings + from scripting that you run as part of your webserver. Typically, this + can be handled during the session initialization, as follows: - + - Step 1: The Javascript client loads, and asks the webserver for a new Pazpar2 - session ID. This can be done using a Javascript call, for instance. Note that - it is possible to submit Ajax HTTPXmlRequest calls either to Pazpar2 or to the - webserver that Pazpar2 is proxying for. See (XXX Insert link to Pazpar2 protocol). - - + Step 1: The Javascript client loads, and asks the webserver for a + new Pazpar2 session ID. This can be done using a Javascript call, for + instance. Note that it is possible to submit Ajax HTTPXmlRequest calls + either to Pazpar2 or to the webserver that Pazpar2 is proxying + for. See (XXX Insert link to Pazpar2 protocol). + + Step 2: Code on the webserver authenticates the user, by database lookup, LDAP access, NCIP, etc. Determines which resources the user has access to, and any user-specific parameters that are to be applied during this session. - + - Step 3: The webserver initializes a new Pazpar2 settings, and sets user-specific - parameters as necessary, using the init webservice command. A new session ID is - returned. + Step 3: The webserver initializes a new Pazpar2 settings, and sets + user-specific parameters as necessary, using the init webservice + command. A new session ID is returned. - + - Step 4: The webserver returns this session ID to the Javascript client, which then - uses the session ID to submit searches, show results, etc. 
+ Step 4: The webserver returns this session ID to the Javascript + client, which then uses the session ID to submit searches, show + results, etc. - + - Step 5: When the Javascript client ceases to use the session, Pazpar2 destroys - any session-specific information. + Step 5: When the Javascript client ceases to use the session, + Pazpar2 destroys any session-specific information. - SETTINGS FILE FORMAT + + SETTINGS FILE FORMAT Each file contains a root element named <settings>. It may contain one or more <set> elements. The settings and set - elements may contain the following attributes. Attributes in the set node - overrides those in the setting root element. Each set node must + elements may contain the following attributes. Attributes in the set + node overrides those in the setting root element. Each set node must specify (directly, or inherited from the parent node) at least a target, name, and value. - + target @@ -686,7 +788,7 @@ - + By setting defaults for target, name, or value in the root settings node, you can use the settings files in many different @@ -698,80 +800,84 @@ many databases with a given category or class that makes sense within your application. - + The following examples illustrate uses of the settings system to associate settings with targets to meet different requirements. - + The example below associates a set of default values that can be used across many targets. Note the wildcard for targets. This associates the given settings with all targets for which no other information is provided. + - + - - + + - - - - - - + + + + + + + + + - + - - + + - - + + - + - - + + - + - ]]> + ]]> - + The next example shows certain settings overridden for one target, one which returns XML records containing DublinCore elements, and which furthermore requires a username/password. - - - + + + + - - - ]]> + + + ]]> - + The following example associates a specific name/value combination with a number of targets. 
The targets below are access-restricted, and can only be used by users with special credentials. - - - - ]]> + + + + + ]]> - + - - RESERVED SETTING NAMES + + + RESERVED SETTING NAMES The following setting names are reserved by Pazpar2 to control the behavior of the client function. @@ -860,9 +966,9 @@ pz:queryencoding - The encoding of the search terms that a target accepts. Most - targets do not honor UTF-8, in which case this needs to be specified. - Each term in a query will be converted if this setting is given. + The encoding of the search terms that a target accepts. Most + targets do not honor UTF-8, in which case this needs to be specified. + Each term in a query will be converted if this setting is given. @@ -902,12 +1008,13 @@ performance with the alternate "MARC map" format. Provide the path of a file with extension ".mmap" containing on each line: - <field> <subfield> <metadata element> + <field> <subfield> <metadata element> For example: - 245 a title - 500 $ description - 773 * citation + 245 a title + 500 $ description + 773 * citation + To map the field value specify a subfield of '$'. To store a concatenation of all subfields, specify a subfield of '*'. @@ -927,9 +1034,10 @@ Allows or denies access to the resources it is applied to. Possible - values are '0' and '1'. The default is '1' (allow access to this resource). - See the manual section on authorization and authentication for discussion - about how to use this setting. + values are '0' and '1'. + The default is '1' (allow access to this resource). + See the manual section on authorization and authentication for + discussion about how to use this setting. @@ -988,8 +1096,8 @@ the protocol. - A value of 'solr' enables SOLR client support. This is supported - for Pazpar2 version 1.5.0 and later. + A value of 'solr' enables SOLR client support. This is supported + for Pazpar2 version 1.5.0 and later. @@ -1000,7 +1108,7 @@ This allows SRU version to be specified. 
If unset, Pazpar2 will use the default of YAZ (currently 1.2). Should be set - to 1.1 or 1.2. + to 1.1 or 1.2. For SOLR, the current supported/tested version is 1.4 @@ -1051,20 +1159,88 @@ Specifies a filter which allows Pazpar2 to only include - records that meet certain criteria in a result. Unmatched records - will be ignored. The filter takes the form name[~value] , which + records that meet certain criteria in a result. + Unmatched records will be ignored. + The filter takes the form name, name~value, or name=value, which will include only records with metadata element (name) that has the - substring (value) given. If value is omitted all records with the - metadata present will be included. + substring (~value) given, or matches exactly (=value). + If value is omitted, all records with the named metadata element + present will be included. + + + pz:preferred + + + Specifies that a target is preferred, e.g. a possibly local, faster + target. Using block=pref on the show command waits for all these + targets to return records before releasing the block. + If no target is preferred, block=pref is identical to block=1, + which releases the block when one target has returned records. + + + + + + pz:block_timeout + + + (Not yet implemented). + Specifies the time after which a block should be released anyway. + + + + + + pz:facetmap:name + + + Specifies that for field name, the target + supports (native) facets. The value is the name of the + field on the target. + + + + At this point only SOLR targets have been tested with this + facility. + + + + + + + pz:limitmap:name + + + Specifies attributes for limiting a search to a field - using + the limit parameter for search. In some cases the mapping of + a field to a value is identical to an existing cclmap field; in + other cases the field must be specified in a different way - for + example to match a complete field (rather than parts of a subfield). 
+ + + The value of limitmap may have one of two forms: a reference to + an existing CCL field or a raw PQF string. The leading string + determines the type: either ccl: for a CCL field or + rpn: for PQF/RPN. + + + + The limitmap facility is supported for Pazpar2 version 1.6.0. + + + + + - + - SEE ALSO + + SEE ALSO pazpar2 @@ -1083,15 +1259,7 @@
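
Editor's illustration (not part of the patch): the two pz:limitmap value forms described in this diff could be written in a settings file roughly as sketched below. The field names "date" and "author", the assumption that a CCL field named "date" is already defined for the target, and the Bib-1 attribute values in the PQF string are chosen only for illustration.

```xml
<!-- Illustrative sketch only; field names and attribute values are assumptions. -->
<settings target="*">
  <!-- ccl: form, referring to a CCL field assumed to be defined already -->
  <set name="pz:limitmap:date" value="ccl:date"/>
  <!-- rpn: form, a raw PQF string; here Bib-1 use=1003 (author)
       with completeness=3 (complete field) -->
  <set name="pz:limitmap:author" value="rpn:@attr 1=1003 @attr 6=3"/>
</settings>
```

The rpn: form suits the case mentioned above where a limit must match a complete field rather than parts of a subfield. The ccl: form simply reuses an existing field mapping.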