X-Git-Url: http://jsfdemo.indexdata.com/?a=blobdiff_plain;f=doc%2Fpazpar2_conf.xml;h=158b3beaade4a3e74645a51291cdb5f07b731f5b;hb=a58a0ed09301d1a773ad5489fba90b9dddfb1bfd;hp=e2be4c03429cf80661340e244c8a65d447164947;hpb=15ad28a5c988047bc3f55fda74f7351aefa66e3d;p=pazpar2-moved-to-github.git diff --git a/doc/pazpar2_conf.xml b/doc/pazpar2_conf.xml index e2be4c0..158b3be 100644 --- a/doc/pazpar2_conf.xml +++ b/doc/pazpar2_conf.xml @@ -5,10 +5,10 @@ %local; %entities; - - %common; + + %idcommon; ]> - + Pazpar2 @@ -101,17 +101,68 @@ - zproxy + icu_chain - If this item is given, pazpar2 will send all Z39.50 - packages through this Z39.50 proxy server. - At least one of the 'host' and 'post' attributes is required. - The 'host' attribute may contain both host name and port - number, seperated by a colon ':', or only the host name. - An empty 'host' attribute sets the Z39.50 host address - to 'localhost'. + Definition of the ICU tokenization and normalization rules, + which are used if ICU support is compiled in. The 'id' + attribute is currently not used, and the 'locale' + attribute must be set to one of the locale strings + defined in ICU. The child elements listed below can be + in any order, except the 'index' element, which logically + belongs at the end of the list. The stated tokenization, + normalization and charmapping instructions are performed + in order from top to bottom. + + casemap + + + The attribute 'rule' defines the direction of the + per-character casemapping; allowed values are "l" + (lower), "u" (upper), "t" (title). + + + + normalize + + + Normalization and transformation of tokens follows + the rules defined in the 'rule' attribute. For + possible values we refer to the extensive ICU + documentation found at the + ICU + transformation home page. Set filtering + principles are explained at the + ICU set and + filtering page. + + + + tokenize + + + Tokenization is the only rule in the ICU chain + which splits one token into multiple tokens.
The + 'rule' attribute may have the following values: + "s" (sentence), "l" (line-break), "w" (word), and + "c" (character), the latter probably not being + very useful in a running Pazpar2 installation. + + + + index + + + Finally, the 'index' element instruction - without + any 'rule' attribute - is used to store the tokens + after chain processing in the relevance ranking + unit of Pazpar2. It will always be the last + instruction in the chain. + + + + @@ -144,10 +195,13 @@ This is the name of the data element. It is matched - against the 'type' attribute of the 'metadata' element + against the 'type' attribute of the + 'metadata' element in the normalized record. A warning is produced if - metdata elements with an unknown name are found in the - normalized record. This name is also used to represent + metadata elements with an unknown name are + found in the + normalized record. This name is also used to + represent data elements in the records returned by the webservice API, and to name sort lists and browse facets. @@ -194,11 +248,13 @@ rank - Specifies that this element is to be used to help rank + Specifies that this element is to be used to + help rank records against the user's query (when ranking is requested). The value is an integer, used as a multiplier against the basic TF*IDF score. A value of - 1 is the base, higher values give additional weight to + 1 is the base, higher values give additional + weight to elements of this type. The default is '0', which excludes this element from the rank calculation. @@ -212,7 +268,8 @@ termlist, or browse facet. Values are tabulated from incoming records, and a highscore of values (with their associated frequency) is made available to the - client through the webservice API. The possible values + client through the webservice API. + The possible values are 'yes' and 'no' (default).
@@ -254,9 +311,16 @@ - - - + + @@ -473,7 +537,7 @@ - + @@ -647,6 +711,16 @@ + + pz:zproxy + + + The 'pz:zproxy' setting has the value syntax + 'host.internet.address:port'; it is used to tunnel Z39.50 + requests through the named Z39.50 proxy. + + +
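For readers of this patch, the 'icu_chain' description added in the second hunk could correspond to a configuration fragment along the following lines. This is only a sketch: the element and attribute names come from the documentation text above, while the 'id' value, the 'en' locale, and the particular transform rule are illustrative assumptions and are not part of this commit.

```xml
<!-- Hypothetical icu_chain; instructions run from top to bottom. -->
<!-- 'id' is documented as currently unused; "example" is a placeholder. -->
<icu_chain id="example" locale="en">
  <!-- assumed ICU transform rule: strip control characters -->
  <normalize rule="[:Control:] Any-Remove"/>
  <!-- "w" = split the input into word tokens -->
  <tokenize rule="w"/>
  <!-- "l" = map every character to lower case -->
  <casemap rule="l"/>
  <!-- no 'rule' attribute; always the last instruction in the chain -->
  <index/>
</icu_chain>
```

Placing 'tokenize' before 'casemap' here is a choice rather than a requirement, since the text states that the child elements may appear in any order apart from 'index'.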