X-Git-Url: http://jsfdemo.indexdata.com/?a=blobdiff_plain;f=doc%2Fpazpar2_conf.xml;h=158b3beaade4a3e74645a51291cdb5f07b731f5b;hb=10ca5845f61713f01f0f179f38643e0863e17fe5;hp=fac262d29eff8e1fcb331524a72e496b01bd3de5;hpb=d73d99a26ccd3403bcb4805a0145b1f8583efd10;p=pazpar2-moved-to-github.git diff --git a/doc/pazpar2_conf.xml b/doc/pazpar2_conf.xml index fac262d..158b3be 100644 --- a/doc/pazpar2_conf.xml +++ b/doc/pazpar2_conf.xml @@ -5,10 +5,10 @@ %local; %entities; - - %common; + + %idcommon; ]> - + Pazpar2 @@ -101,17 +101,68 @@ - zproxy + icu_chain - If this item is given, pazpar2 will send all Z39.50 - packages through this Z39.50 proxy server. - At least one of the 'host' and 'post' attributes is required. - The 'host' attribute may contain both host name and port - number, seperated by a colon ':', or only the host name. - An empty 'host' attribute sets the Z39.50 host address - to 'localhost'. + Definitions of ICU tokenization and normalization rules + are used if ICU support is compiled in. The 'id' + attribute is currently not used, and the 'locale' + attribute must be set to one of the locale strings + defined in ICU. The child elements listed below can be + in any order, except the 'index' element, which logically + belongs to the end of the list. The stated tokenization, + normalization and charmapping instructions are performed + in order from top to bottom. + + casemap + + + The attribute 'rule' defines the direction of the + per-character casemapping; allowed values are "l" + (lower), "u" (upper), "t" (title). + + + + normalize + + + Normalization and transformation of tokens follow + the rules defined in the 'rule' attribute. For + possible values, see the extensive ICU + documentation found at the + ICU + transformation home page. Set filtering + principles are explained at the + ICU set and + filtering page. + + + + tokenize + + + Tokenization is the only rule in the ICU chain + which splits one token into multiple tokens.
The + 'rule' attribute may have the following values: + "s" (sentence), "l" (line-break), "w" (word), and + "c" (character), the latter probably not being + very useful in a running pazpar2 installation. + + + + index + + + Finally, the 'index' element instruction (without + any 'rule' attribute) is used to store the tokens + after chain processing in the relevance ranking + unit of Pazpar2. It will always be the last + instruction in the chain. + + + + @@ -121,9 +172,7 @@ This nested element controls the behavior of pazpar2 with respect to your data model. In pazpar2, incoming records are - normalized, using XSLT, into an internal representation (see - the retrievalprofile secion. + normalized, using XSLT, into an internal representation. The 'service' section controls the further processing and extraction of data from the internal representation, primarily through the 'metdata' sub-element. @@ -146,10 +195,13 @@ This is the name of the data element. It is matched - against the 'type' attribute of the 'metadata' element + against the 'type' attribute of the + 'metadata' element in the normalized record. A warning is produced if - metdata elements with an unknown name are found in the - normalized record. This name is also used to represent + metadata elements with an unknown name are + found in the + normalized record. This name is also used to + represent data elements in the records returned by the webservice API, and to name sort lists and browse facets. @@ -196,11 +248,13 @@ rank - Specifies that this element is to be used to help rank + Specifies that this element is to be used to + help rank records against the user's query (when ranking is requested). The value is an integer, used as a multiplier against the basic TF*IDF score. A value of - 1 is the base, higher values give additional weight to + 1 is the base; higher values give additional + weight to elements of this type. The default is '0', which excludes this element from the rank calculation.
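The 'rank' multiplier described above can be illustrated with a small Python sketch. It assumes a plain TF*IDF with a natural-log IDF over fields that are already tokenized. The helpers `tf_idf` and `weighted_score` and the sample record are hypothetical, and this is not pazpar2's own relevance code.

```python
import math


def tf_idf(term, tokens, doc_freq, n_docs):
    """Plain TF*IDF for one term in one field's token list (natural-log IDF)."""
    tf = tokens.count(term)
    df = doc_freq.get(term, 0)
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(n_docs / df)


def weighted_score(query_terms, record, rank_weights, doc_freq, n_docs):
    """Sum TF*IDF over fields, multiplied by each field's 'rank' value.

    A field missing from rank_weights gets weight 0, mirroring the
    documented default of '0', which excludes it from the calculation.
    """
    score = 0.0
    for field, tokens in record.items():
        weight = rank_weights.get(field, 0)
        if weight == 0:
            continue  # rank '0' (or unset): field ignored for ranking
        for term in query_terms:
            score += weight * tf_idf(term, tokens, doc_freq, n_docs)
    return score


# Hypothetical record, already tokenized and lower-cased.
record = {
    "title": ["history", "of", "rome"],
    "description": ["rome", "and", "its", "empire", "rome"],
    "date": ["1999"],
}
weights = {"title": 2, "description": 1}  # 'date' left at the default 0
score = weighted_score(["rome"], record, weights, {"rome": 1}, 10)
print(round(score, 3))  # prints 9.21
```

Giving 'title' a value of 2 doubles its contribution relative to a field at the base value of 1. The 'date' field stays at the default 0 and never affects the score.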
@@ -214,7 +268,8 @@ termlist, or browse facet. Values are tabulated from incoming records, and a highscore of values (with their associated frequency) is made available to the - client through the webservice API. The possible values + client through the webservice API. + The possible values are 'yes' and 'no' (default). @@ -244,91 +299,6 @@ - queryprofile - - At the moment, this directive is ignored; there is one global - CCL-mapping file which governs the mapping of queries to Z39.50 - type-1. This file is located in etc/default.bib. This will change - shortly. - - - - retrievalprofile - - Note: In the present version, there is a single retrieval - profile. However, in a future release, it will be possible to - associate unique retrieval profiles with different targets, or to - generate retrieval profiles using XSLT from the ZeeRex description of - a target. - - - - The following data elements are recognized for the retrievalprofile - directive: - - - - requestsyntax - - - This element specifies the request syntax to be used in queries. It only - makes sense for Z39.50-type targets. - - - - - nativesyntax - - - This element specifies the native syntax and encoding of the - result records. The default is XML. The following attributes - are defined: - - - name - - - The name of the syntax. Currently recognized values are - 'iso2709' (MARC), and 'xml'. - - - - - format - - - The format, or schema, to be expected. Default is - 'marc21'. - - - - - encoding - - - The encoding of the response record. Typical values for - MARC records are 'marc8' (general MARC-8), 'marc8s' - (MARC-8, but maps to precomposed UTF-8 characters, more - suitable for use in web browsers), 'latin1'. - - - - - mapto - - - Specifies the flavor of MARCXML to map results to. - Default is 'marcxml'. 'marcxchange' is also possible, and - useful for Danish DANMARC records. 
- - - - - - - - - EXAMPLE @@ -341,9 +311,16 @@ - - - + + @@ -356,33 +333,24 @@ - - - - marc21 - - - - ]]> - TARGET SETTINGS + TARGET SETTINGS Pazpar2 features a cunning scheme by which you can associate various - kinds of attributes, or settings with search targets. This is done - through XML files, and each file can associate one or more settings - with one or more targets. The file format is generic in nature, - designed to support a wide range of application requirements. The + kinds of attributes, or settings, with search targets. This can be done + through XML files which are read at startup; each file can associate + one or more settings with one or more targets. The file format is generic + in nature, designed to support a wide range of application requirements. The settings can be purely technical things, like, how to perform a title search against a given target, or it can associate arbitrary name=value pairs with groups of targets -- for instance, if you would like to place all commercial full-text bases in one group for selection - purposes, or you would like to control what targets are accessible to a - given user. + purposes, or you would like to control what targets are accessible + to users by default. @@ -391,10 +359,371 @@ process any settings files found therein. + + Clients of the pazpar2 webservice interface can selectively override + settings for individual targets within the scope of one session. This + can be used in conjunction with an external authentication system to + determine which resources are to be accessible to which users. Pazpar2 + itself has no notion of end-users, and so can be used in conjunction + with any type of authentication system. Similarly, the authentication + tokens submitted to access-controlled search targets can be + overridden, to allow use of pazpar2 in a consortial or multi-library + environment, where different end-users may need to be represented to + some search targets in different ways.
This, again, can be managed + using an external database or other lookup mechanism. Setting overrides + can be performed using either the 'init' or the 'settings' webservice + command (see XXX ref to pazpar2 protocol). + + + + In fact, every setting that applies to a database (except pz:id, which + can only be used for filtering targets to use for a search) can be overridden + on a per-session basis. This allows the client to override specific CCL fields + for searching, etc., to meet the needs of a session or user. + + + + Finally, as an extreme case of this, the webservice client can + introduce entirely new targets, on the fly, as part of the init or + settings command. This is useful if you desire to manage information + about your search targets in a separate application such as a database. + You do not need any static settings file whatsoever to run pazpar2 -- as + long as the webservice client is prepared to supply the necessary + information at the beginning of every session. + + + + NOTE: The following discussion of practical issues related to session and settings + management is cast in terms of a user interface based on Ajax/Javascript + technology. It would apply equally well to many other kinds of browser-based logic. + + + + Typically, a Javascript client is not allowed to directly alter the parameters + of a session. There are two reasons for this. One has to do with access + to information; typically, information about a user will be stored in a + system on the server side, or it will be accessible in some way from the server. + The other is trust: since the Javascript client cannot be entirely trusted (some hostile + agent might in fact 'pretend' to be a regular webservice client), it is more robust + to control session settings from scripting that you run as part of your + webserver. Typically, this can be handled during the session initialization, + as follows:
This can be done using a Javascript call, for instance. Note that + it is possible to submit Ajax HTTPXmlRequest calls either to pazpar2 or to the + webserver that pazpar2 is proxying for. See (XXX Insert link to pazpar2 protocol). + + + + Step 2: Code on the webserver authenticates the user, by database lookup, + LDAP access, NCIP, etc. Determines which resources the user has access to, + and any user-specific parameters that are to be applied during this session. + + + + Step 3: The webserver initializes a new pazpar2 settings, and sets user-specific + parameters as necessary, using the init webservice command. A new session ID is + returned. + + + + Step 4: The webserver returns this session ID to the Javascript client, which then + uses the session ID to submit searches, show results, etc. + + + + Step 5: When the Javascript client ceases to use the session, pazpar2 destroys + any session-specific information. + + SETTINGS FILE FORMAT + Each file contains a root element named <settings>. It may + contain one or more <set> elements. The settings and set + elements may contain the following attributes. Attributes in the set node + overrides those in the setting root element. Each set node must + specify (directly, or inherited from the parent node) at least a + target, name, and value. + + + + target + + + This specifies the search target to which this setting should be + applied. Targets are identified by their Z39.50 URL, generally + including the host, port, and database name, (e.g. + bagel.indexdata.com:210/marc). Two wildcard forms are accepted: + * (asterisk) matches all known targets; + bagel.indexdata.com:210/* matches all known databases on the given + host. + + + A precedence system determines what happens if there are + overlapping values for the same setting name for the same + target. A setting for a specific target name overrides a + setting whch specifies target using a wildcard. 
This makes it + easy to set defaults for all targets, and then override them + for specific targets or hosts. If there are + multiple overlapping settings with the same name and target + value, the 'precedence' attribute determines what happens. + + + + + name + + + The name of the setting. This can be anything you like. + However, pazpar2 reserves a number of setting names for + specific purposes, all starting with 'pz:', and it is a good + idea to avoid that prefix if you make up your own setting + names. See below for a list of reserved variables. + + + + + value + + + The value of the setting. Generally, this can be anything you + want -- however, some of the reserved settings may expect + specific kinds of values. + + + + + precedence + + + This should be an integer. If not provided, the default value + is 0. If two (or more) settings have the same content for + target and name, the precedence value determines the outcome. + If both settings have the same precedence value, they are both + applied to the target(s). If one has a higher value, then the + value of that setting is applied, and the other one is ignored. + + + + + + + By setting defaults for target, name, or value in the root + settings node, you can use the settings files in many different + ways. For instance, you can use a single file to set defaults for + many different settings, like search fields, retrieval syntaxes, + etc. You can have one file per server, which groups settings for + that server or target. You could also have one file which associates + a number of targets with a given setting, for instance, to associate + many databases with a given category or class that makes sense + within your application. + + + + The following examples illustrate uses of the settings system to + associate settings with targets to meet different requirements. + + + + The example below associates a set of default values that can be + used across many targets. Note the wildcard for targets. 
+ This associates the given settings with all targets for which no + other information is provided. + + + + + + + + + + + + + + + + + + + + + + + + + + + + ]]> + + + + The next example shows certain settings overridden for one target, + one which returns XML records containing DublinCore elements, and + which furthermore requires a username/password. + + + + + + + + ]]> + + + + The following example associates a specific name/value combination + with a number of targets. The targets below are access-restricted, + and can only be used by users with special credentials. + + + + + ]]> + + + + + RESERVED SETTING NAMES + + The following setting names are reserved by pazpar2 to control the + behavior of the client function. + + + + + pz:cclmap:xxx + + + This establishes a CCL field definition or other setting, for + the purpose of mapping end-user queries. XXX is the field or + setting name, and the value of the setting provides parameters + (e.g. parameters to send to the server, etc.). Please consult + the YAZ manual for a full overview of the many capabilities of + the powerful and flexible CCL parser. + + + Note that it is easy to establish a set of default parameters, + and then override them individually for a given target. + + + + + pz:requestsyntax + + + This specifies the record syntax to use when requesting + records from a given server. The value can be a symbolic name like + marc21 or xml, or it can be a Z39.50-style dot-separated OID. + + + + + pz:elements + + + The element set name to be used when retrieving records from a + server (not yet implemented). + + + + + pz:piggyback + + + Piggybacking enables pazpar2 to retrieve records from the + server as part of the search response in Z39.50. Almost all + servers support this (or fail gracefully), but a few + servers will produce undesirable results. + Set to '1' to enable piggybacking, '0' to disable it. Default + is 1 (piggybacking enabled).
+ + + + + pz:nativesyntax + + + The representation (syntax) of the retrieval records. Currently + recognized values are iso2709 and xml. + + + For iso2709, you can also specify a native character set, e.g. "iso2709;latin-1". + If no character set is provided, MARC-8 is assumed. + + + + + pz:xslt + + + Provides the path of an XSLT stylesheet which will be used to + map incoming records to the internal representation. + + + + + pz:authentication + + + Sets an authentication string for a given server. See the section on + authorization and authentication for discussion. + + + + + pz:allow + + + Allows or denies access to the resources it is applied to. Possible + values are '0' and '1'. The default is '1' (allow access to this resource). + See the manual section on authorization and authentication for discussion + about how to use this setting. + + + + + pz:maxrecs + + + Controls the maximum number of records to be retrieved from a + server. The default is 100 (not yet implemented). + + + + + pz:id + + + This setting can't be 'set' -- it contains the ID (normally + ZURL) for a given target, and is useful for filtering -- + specifically when you want to select one or more specific + targets in the search command. + + + + + pz:zproxy + + + The 'pz:zproxy' setting has the value syntax + 'host.internet.address:port' and is used to tunnel Z39.50 + requests through the named Z39.50 proxy. + + + + +
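The precedence rules given under SETTINGS FILE FORMAT can be sketched as follows. A setting for a specific target overrides a wildcard. Among settings with the same target and name, the higher 'precedence' wins, and equal values are all applied.

This is a Python illustration under stated assumptions, not pazpar2's implementation:

- The `Setting` class and the `resolve` function are hypothetical.
- Settings are assumed to be already parsed from the XML files.
- Ranking a 'host:port/*' wildcard above '*' is inferred from the remark about overriding defaults "for specific targets or hosts".
- 'example.org:210/db' is a made-up target.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Setting:
    target: str
    name: str
    value: str
    precedence: int = 0  # documented default


def specificity(pattern: str, target_id: str) -> Optional[int]:
    """How specifically `pattern` matches `target_id`, or None if it does not.

    2 = exact target, 1 = 'host:port/*' wildcard, 0 = '*' wildcard.
    Ranking the host wildcard above '*' is an assumption of this sketch.
    """
    if pattern == target_id:
        return 2
    if pattern.endswith("/*") and target_id.startswith(pattern[:-1]):
        return 1
    if pattern == "*":
        return 0
    return None


def resolve(settings: List[Setting], target_id: str, name: str) -> List[str]:
    """Return the value(s) of setting `name` that apply to `target_id`."""
    matches = [(specificity(s.target, target_id), s) for s in settings if s.name == name]
    matches = [(level, s) for level, s in matches if level is not None]
    if not matches:
        return []
    # A more specific target pattern overrides a less specific one.
    best = max(level for level, _ in matches)
    same_target = [s for level, s in matches if level == best]
    # Same target and name: the highest precedence wins; ties are all applied.
    top = max(s.precedence for s in same_target)
    return [s.value for s in same_target if s.precedence == top]


settings = [
    Setting("*", "pz:piggyback", "1"),
    Setting("bagel.indexdata.com:210/*", "pz:piggyback", "0"),
    Setting("bagel.indexdata.com:210/marc", "pz:requestsyntax", "marc21"),
    Setting("bagel.indexdata.com:210/marc", "pz:requestsyntax", "xml", precedence=1),
    Setting("example.org:210/db", "category", "a"),  # hypothetical target
    Setting("example.org:210/db", "category", "b"),
]

print(resolve(settings, "bagel.indexdata.com:210/marc", "pz:piggyback"))  # ['0']
print(resolve(settings, "example.org:210/db", "category"))  # ['a', 'b']
```

Returning a list rather than a single value reflects the documented rule that equal-precedence settings are "both applied to the target(s)".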