X-Git-Url: http://jsfdemo.indexdata.com/?a=blobdiff_plain;f=doc%2Fpazpar2_conf.xml;h=6db09992ecb7a94b483112fb1aca243bcb2c681f;hb=7d4df14710ba99f1530844584a2bba931c4adbcd;hp=4410c9b63f46d1c8156a7211b9c92863e722474e;hpb=a3653d95470cd9b5d587b846052691c8db2c93c4;p=pazpar2-moved-to-github.git

diff --git a/doc/pazpar2_conf.xml b/doc/pazpar2_conf.xml
index 4410c9b..6db0999 100644
--- a/doc/pazpar2_conf.xml
+++ b/doc/pazpar2_conf.xml
@@ -8,7 +8,7 @@
 %common; ]>
-
+
 Pazpar2
@@ -116,6 +116,72 @@
+ icu_chain
+
+
+ Definition of ICU tokenization and normalization rules,
+ used if ICU support is compiled in. The 'id'
+ attribute is currently not used, and the 'locale'
+ attribute must be set to one of the locale strings
+ defined in ICU. The child elements listed below can be
+ in any order, except the 'index' element which logically
+ belongs to the end of the list. The stated tokenization,
+ normalization and charmapping instructions are performed
+ in order from top to bottom.
+
+
+ casemap
+
+
+ The attribute 'rule' defines the direction of the
+ per-character casemapping; allowed values are "l"
+ (lower), "u" (upper), "t" (title).
+
+
+
+ normalize
+
+
+ Normalization and transformation of tokens follow
+ the rules defined in the 'rule' attribute. For
+ possible values we refer to the extensive ICU
+ documentation found at the
+ ICU
+ transformation home page. Set filtering
+ principles are explained at the
+ ICU set and
+ filtering page.
+
+
+
+ tokenize
+
+
+ Tokenization is the only rule in the ICU chain
+ which splits one token into multiple tokens. The
+ 'rule' attribute may have the following values:
+ "s" (sentence), "l" (line-break), "w" (word), and
+ "c" (character), the latter probably not being
+ very useful in a running pazpar2 installation.
+
+
+
+ index
+
+
+ Finally the 'index' element instruction - without
+ any 'rule' attribute - is used to store the tokens
+ after chain processing in the relevance ranking
+ unit of Pazpar2.
It will always be the last
+ instruction in the chain.
+
+
+
+
+
+
+
+
 service
@@ -144,10 +210,13 @@
 This is the name of the data element. It is matched
- against the 'type' attribute of the 'metadata' element
+ against the 'type' attribute of the
+ 'metadata' element
 in the normalized record. A warning is produced if
- metdata elements with an unknown name are found in the
+ metadata elements with an unknown name are
+ found in the
+ normalized record. This name is also used to
+ represent
 data elements in the records returned by the webservice API, and to name sort lists and browse facets.
@@ -194,11 +263,13 @@
 rank
- Specifies that this element is to be used to help rank
+ Specifies that this element is to be used to
+ help rank
 records against the user's query (when ranking is requested). The value is an integer, used as a multiplier against the basic TF*IDF score. A value of
- 1 is the base, higher values give additional weight to
+ 1 is the base, higher values give additional
+ weight to
 elements of this type. The default is '0', which excludes this element from the rank calculation.
@@ -212,7 +283,8 @@
 termlist, or browse facet. Values are tabulated from incoming records, and a highscore of values (with their associated frequency) is made available to the
- client through the webservice API. The possible values
+ client through the webservice API.
+ The possible values
 are 'yes' and 'no' (default).
@@ -258,6 +330,18 @@
+
+
+
+
@@ -277,10 +361,10 @@
 TARGET SETTINGS
 Pazpar2 features a cunning scheme by which you can associate various
- kinds of attributes, or settings with search targets. This is done
- through XML files; each file can associate one or more settings
- with one or more targets. The file format is generic in nature,
- designed to support a wide range of application requirements. The
+ kinds of attributes, or settings with search targets.
This can be done
+ through XML files which are read at startup; each file can associate
+ one or more settings with one or more targets. The file format is generic
+ in nature, designed to support a wide range of application requirements. The
 settings can be purely technical things, like, how to perform a title search against a given target, or it can associate arbitrary name=value pairs with groups of targets -- for instance, if you would like to
@@ -306,7 +390,73 @@
 overridden, to allow use of pazpar2 in a consortial or multi-library environment, where different end-users may need to be represented to some search targets in different ways. This, again, can be managed
- using an external database or other lookup mechanism.
+ using an external database or other lookup mechanism. Setting overrides
+ can be performed using either the 'init' or the 'settings' webservice
+ command (see XXX ref to pazpar2 protocol).
+
+
+
+ In fact, every setting that applies to a database (except pz:id, which
+ can only be used for filtering targets to use for a search) can be overridden
+ on a per-session basis. This allows the client to override specific CCL fields
+ for searching, etc., to meet the needs of a session or user.
+
+
+
+ Finally, as an extreme case of this, the webservice client can
+ introduce entirely new targets, on the fly, as part of the init or
+ settings command. This is useful if you desire to manage information
+ about your search targets in a separate application such as a database.
+ You do not need any static settings file whatsoever to run pazpar2 -- as
+ long as the webservice client is prepared to supply the necessary
+ information at the beginning of every session.
+
+
+
+ NOTE: The following discussion of practical issues related to session and settings
+ management is cast in terms of a user interface based on Ajax/Javascript
+ technology. It would apply equally well to many other kinds of browser-based logic.
+
+
+ Typically, a Javascript client is not allowed to directly alter the parameters
+ of a session. There are two reasons for this. One has to do with access
+ to information; typically, information about a user will be stored in a
+ system on the server side, or it will be accessible in some way from the server.
+ The other is trust: since the Javascript client cannot be entirely trusted (some hostile
+ agent might in fact 'pretend' to be a regular webservice client), it is more robust
+ to control session settings from scripting that you run as part of your
+ webserver. Typically, this can be handled during the session initialization,
+ as follows:
+
+
+
+ Step 1: The Javascript client loads, and asks the webserver for a new pazpar2
+ session ID. This can be done using a Javascript call, for instance. Note that
+ it is possible to submit Ajax XMLHttpRequest calls either to pazpar2 or to the
+ webserver that pazpar2 is proxying for. See (XXX Insert link to pazpar2 protocol).
+
+
+
+ Step 2: Code on the webserver authenticates the user, by database lookup,
+ LDAP access, NCIP, etc. It determines which resources the user has access to,
+ and any user-specific parameters that are to be applied during this session.
+
+
+
+ Step 3: The webserver initializes a new pazpar2 session, and sets user-specific
+ parameters as necessary, using the init webservice command. A new session ID is
+ returned.
+
+
+
+ Step 4: The webserver returns this session ID to the Javascript client, which then
+ uses the session ID to submit searches, show results, etc.
+
+
+
+ Step 5: When the Javascript client ceases to use the session, pazpar2 destroys
+ any session-specific information.
 SETTINGS FILE FORMAT
@@ -407,7 +557,7 @@
-
+