X-Git-Url: http://jsfdemo.indexdata.com/?a=blobdiff_plain;f=doc%2Fpazpar2_conf.xml;h=06cce2c788ab8810c3feaf0ac3752c78b5e2dfc1;hb=2fdbd5de9185e926401609c22b328f07af0248bd;hp=b8e86eac72050e8eab2e3a4bd1959579a1ba87e9;hpb=8f48376798d4b43d962726ef68f547cbd471d670;p=pazpar2-moved-to-github.git diff --git a/doc/pazpar2_conf.xml b/doc/pazpar2_conf.xml index b8e86ea..06cce2c 100644 --- a/doc/pazpar2_conf.xml +++ b/doc/pazpar2_conf.xml @@ -5,10 +5,10 @@ %local; %entities; - - %common; + + %idcommon; ]> - + Pazpar2 @@ -84,7 +84,10 @@ If this item is given, pazpar2 will forward all incoming HTTP requests that do not contain the filename 'search.pz2' to the host and port specified using the 'host' and 'port' - attributes. This functionality is crucial if you wish to use + attributes. The 'myurl' attribute is required, and should provide + the base URL of the server. Generally, this is the HTTP URL for the host + specified in the 'listen' parameter. This functionality is + crucial if you wish to use pazpar2 in conjunction with browser-based code (JS, Flash, applets, etc.) which operates in a security sandbox. Such code can only connect to the same server from which the enclosing @@ -98,14 +101,93 @@ + zproxy + + + If this item is given, pazpar2 will send all Z39.50 + packages through this Z39.50 proxy server. + At least one of the 'host' and 'port' attributes is required. + The 'host' attribute may contain both host name and port + number, separated by a colon ':', or only the host name. + An empty 'host' attribute sets the Z39.50 host address + to 'localhost'. + + + + + + icu_chain + + + Definitions of ICU tokenization and normalization rules + are used if ICU support is compiled in. The 'id' + attribute is currently not used, and the 'locale' + attribute must be set to one of the locale strings + defined in ICU. The child elements listed below can be + in any order, except the 'index' element which logically + belongs to the end of the list. 
The stated tokenization, + normalization and charmapping instructions are performed + in order from top to bottom. + + + casemap + + + The attribute 'rule' defines the direction of the + per-character casemapping; allowed values are "l" + (lower), "u" (upper), "t" (title). + + + + normalize + + + Normalization and transformation of tokens follow + the rules defined in the 'rule' attribute. For + possible values we refer to the extensive ICU + documentation found at the + ICU + transformation home page. Set filtering + principles are explained at the + ICU set and + filtering page. + + + + tokenize + + + Tokenization is the only rule in the ICU chain + which splits one token into multiple tokens. The + 'rule' attribute may have the following values: + "s" (sentence), "l" (line-break), "w" (word), and + "c" (character), the latter probably not being + very useful in a running pazpar2 installation. + + + + index + + + Finally the 'index' element instruction - without + any 'rule' attribute - is used to store the tokens + after chain processing in the relevance ranking + unit of Pazpar2. It will always be the last + instruction in the chain. + + + + + + + + service This nested element controls the behavior of pazpar2 with respect to your data model. In pazpar2, incoming records are - normalized, using XSLT, into an internal representation (see - the retrievalprofile secion. + normalized, using XSLT, into an internal representation. The 'service' section controls the further processing and extraction of data from the internal representation, primarily through the 'metadata' sub-element. @@ -113,109 +195,118 @@ metadata - - One of these elements is required for every data element in - the internal representation of the record (see - . It governs - subsequent processing as pertains to sorting, relevance - ranking, merging, and display of data elements. It supports - the following attributes: - - - - name - - - This is the name of the data element. 
It is matched - against the 'type' attribute of the 'metadata' element - in the normalized record. A warning is produced if - metdata elements with an unknown name are found in the - normalized record. This name is also used to represent - data elements in the records returned by the - webservice API, and to name sort lists and browse - facets. - - - - - type - - - The type of data element. This value governs any - normalization or special processing that might take - place on an element. Possible values are 'generic' - (basic string), 'year' (a range is computed if - multiple years are found in the record). Note: This - list is likely to increase in the future. - - - - - brief - - - If this is set to 'yes', then the data element is - includes in brief records in the webservice API. Note - that this only makes sense for metadata elements that - are merged (see below). The default value is 'no'. - - - - - sortkey - - - Specifies that this data element is to be used for - sorting. The possible values are 'numeric' (numeric - value), 'skiparticle' (string; skip common, leading - articles), and 'no' (no sorting). The default value is - 'no'. - - - - - rank - - - Specifies that this element is to be used to help rank - records against the user's query (when ranking is - requested). The value is an integer, used as a - multiplier against the basic TF*IDF score. A value of - 1 is the base, higher values give additional weight to - elements of this type. The default is '0', which - excludes this element from the rank calculation. - - - - - termlist - - - Specifies that this element is to be used as a - termlist, or browse facet. Values are tabulated from - incoming records, and a highscore of values (with - their associated frequency) is made available to the - client through the webservice API. The possible values - are 'yes' and 'no' (default). 
- - - - merge - - - This governs whether, and how elements are extracted - from individual records and merged into cluster - records. The possible values are: 'unique' (include - all unique elements), 'longest' (include only the - longest element (strlen), 'range' (calculate a range - of values across al matching records), 'all' (include - all elements), or 'no' (don't merge; this is the - default); - - - - + + + One of these elements is required for every data element in + the internal representation of the record (see + . It governs + subsequent processing as pertains to sorting, relevance + ranking, merging, and display of data elements. It supports + the following attributes: + + + + name + + + This is the name of the data element. It is matched + against the 'type' attribute of the + 'metadata' element + in the normalized record. A warning is produced if + metadata elements with an unknown name are + found in the + normalized record. This name is also used to + represent + data elements in the records returned by the + webservice API, and to name sort lists and browse + facets. + + + + + type + + + The type of data element. This value governs any + normalization or special processing that might take + place on an element. Possible values are 'generic' + (basic string), 'year' (a range is computed if + multiple years are found in the record). Note: This + list is likely to increase in the future. + + + + + brief + + + If this is set to 'yes', then the data element is + included in brief records in the webservice API. Note + that this only makes sense for metadata elements that + are merged (see below). The default value is 'no'. + + + + + sortkey + + + Specifies that this data element is to be used for + sorting. The possible values are 'numeric' (numeric + value), 'skiparticle' (string; skip common, leading + articles), and 'no' (no sorting). The default value is + 'no'. 
+ + + + + rank + + + Specifies that this element is to be used to + help rank + records against the user's query (when ranking is + requested). The value is an integer, used as a + multiplier against the basic TF*IDF score. A value of + 1 is the base; higher values give additional + weight to + elements of this type. The default is '0', which + excludes this element from the rank calculation. + + + + + termlist + + + Specifies that this element is to be used as a + termlist, or browse facet. Values are tabulated from + incoming records, and a highscore of values (with + their associated frequency) is made available to the + client through the webservice API. + The possible values + are 'yes' and 'no' (default). + + + + + merge + + + This governs whether, and how, elements are extracted + from individual records and merged into cluster + records. The possible values are: 'unique' (include + all unique elements), 'longest' (include only the + longest element (strlen)), 'range' (calculate a range + of values across all matching records), 'all' (include + all elements), or 'no' (don't merge; this is the + default). + + + + + + @@ -223,106 +314,428 @@ - + + + EXAMPLE + Below is a working example configuration: + + + + + + + + + + + + + + + + + + + + + + + + + + +]]> + + + + TARGET SETTINGS + + Pazpar2 features a cunning scheme by which you can associate various + kinds of attributes, or settings, with search targets. This can be done + through XML files which are read at startup; each file can associate + one or more settings with one or more targets. The file format is generic + in nature, designed to support a wide range of application requirements. 
The + settings can be purely technical things, such as how to perform a title + search against a given target, or they can associate arbitrary name=value + pairs with groups of targets -- for instance, if you would like to + place all commercial full-text bases in one group for selection + purposes, or you would like to control what targets are accessible + to users by default. + + + + During startup, pazpar2 will recursively read a specified directory + (can be identified in the pazpar2.cfg file or on the command line), and + process any settings files found therein. + + + + Clients of the pazpar2 webservice interface can selectively override + settings for individual targets within the scope of one session. This + can be used in conjunction with an external authentication system to + determine which resources are to be accessible to which users. Pazpar2 + itself has no notion of end-users, and so can be used in conjunction + with any type of authentication system. Similarly, the authentication + tokens submitted to access-controlled search targets can be + overridden, to allow use of pazpar2 in a consortial or multi-library + environment, where different end-users may need to be represented to + some search targets in different ways. This, again, can be managed + using an external database or other lookup mechanism. Setting overrides + can be performed either using the 'init' or the 'settings' webservice + command (see XXX ref to pazpar2 protocol). + + + + In fact, every setting that applies to a database (except pz:id, which + can only be used for filtering targets to use for a search) can be overridden + on a per-session basis. This allows the client to override specific CCL fields + for searching, etc., to meet the needs of a session or user. + + + + Finally, as an extreme case of this, the webservice client can + introduce entirely new targets, on the fly, as part of the init or + settings command. 
This is useful if you desire to manage information + about your search targets in a separate application such as a database. + You do not need any static settings file whatsoever to run pazpar2 -- as + long as the webservice client is prepared to supply the necessary + information at the beginning of every session. + + + + NOTE: The following discussion of practical issues related to session and settings + management is cast in terms of a user interface based on Ajax/Javascript + technology. It would apply equally well to many other kinds of browser-based logic. + + + + Typically, a Javascript client is not allowed to directly alter the parameters + of a session. There are two reasons for this. One has to do with access + to information; typically, information about a user will be stored in a + system on the server side, or it will be accessible in some way from the server. + The other has to do with trust: since the Javascript client cannot be entirely trusted (some hostile + agent might in fact 'pretend' to be a regular ws client), it is more robust + to control session settings from scripting that you run as part of your + webserver. Typically, this can be handled during the session initialization, + as follows: + + + + Step 1: The Javascript client loads, and asks the webserver for a new pazpar2 + session ID. This can be done using a Javascript call, for instance. Note that + it is possible to submit Ajax XMLHttpRequest calls either to pazpar2 or to the + webserver that pazpar2 is proxying for. See (XXX Insert link to pazpar2 protocol). + + + + Step 2: Code on the webserver authenticates the user, by database lookup, + LDAP access, NCIP, etc. It determines which resources the user has access to, + and any user-specific parameters that are to be applied during this session. + + + + Step 3: The webserver initializes a new pazpar2 session, and sets user-specific + parameters as necessary, using the init webservice command. A new session ID is + returned. 
+ + + + Step 4: The webserver returns this session ID to the Javascript client, which then + uses the session ID to submit searches, show results, etc. + + + + Step 5: When the Javascript client ceases to use the session, pazpar2 destroys + any session-specific information. + + + SETTINGS FILE FORMAT + + Each file contains a root element named <settings>. It may + contain one or more <set> elements. The settings and set + elements may contain the following attributes. Attributes in the set node + override those in the settings root element. Each set node must + specify (directly, or inherited from the parent node) at least a + target, name, and value. + + + + + target + + + This specifies the search target to which this setting should be + applied. Targets are identified by their Z39.50 URL, generally + including the host, port, and database name (e.g. + bagel.indexdata.com:210/marc). Two wildcard forms are accepted: + * (asterisk) matches all known targets; + bagel.indexdata.com:210/* matches all known databases on the given + host. + + + A precedence system determines what happens if there are + overlapping values for the same setting name for the same + target. A setting for a specific target name overrides a + setting which specifies the target using a wildcard. This makes it + easy to set defaults for all targets, and then override them + for specific targets or hosts. If there are + multiple overlapping settings with the same name and target + value, the 'precedence' attribute determines what happens. + + + + + name + + + The name of the setting. This can be anything you like. + However, pazpar2 reserves a number of setting names for + specific purposes, all starting with 'pz:', and it is a good + idea to avoid that prefix if you make up your own setting + names. See below for a list of reserved variables. + + + + + value + + + The value of the setting. 
Generally, this can be anything you + want -- however, some of the reserved settings may expect + specific kinds of values. + + + + + precedence + + + This should be an integer. If not provided, the default value + is 0. If two (or more) settings have the same content for + target and name, the precedence value determines the outcome. + If both settings have the same precedence value, they are both + applied to the target(s). If one has a higher value, then the + value of that setting is applied, and the other one is ignored. + + + + + - At the moment, this directive is ignored; there is one global - CCL-mapping file which governs the mapping of queries to Z39.50 - type-1. This file is located in etc/default.bib. This will change - shortly. + By setting defaults for target, name, or value in the root + settings node, you can use the settings files in many different + ways. For instance, you can use a single file to set defaults for + many different settings, like search fields, retrieval syntaxes, + etc. You can have one file per server, which groups settings for + that server or target. You could also have one file which associates + a number of targets with a given setting, for instance, to associate + many databases with a given category or class that makes sense + within your application. - - - Note: In the present version, there is a single retrieval - profile. However, in a future release, it will be possible to - associate unique retrieval profiles with different targets, or to - generate retrieval profiles using XSLT from the ZeeRex description of - a target. + The following examples illustrate uses of the settings system to + associate settings with targets to meet different requirements. - + - The following data elements are recognized for the retrievalprofile - directive: + The example below associates a set of default values that can be + used across many targets. Note the wildcard for targets. 
+ This associates the given settings with all targets for which no + other information is provided. + + + + + + + + + + + + + + + + + + + + + + + + + + + + ]]> + + + + The next example shows certain settings overridden for one target, + one which returns XML records containing DublinCore elements, and + which furthermore requires a username/password. + + + + + + + + ]]> + + + + The following example associates a specific name/value combination + with a number of targets. The targets below are access-restricted, + and can only be used by users with special credentials. + + + + + ]]> - + + + + RESERVED SETTING NAMES + + The following setting names are reserved by pazpar2 to control the + behavior of the client function. + + - requestsyntax + + pz:cclmap:xxx - This element specifies the request syntax to be used in queries. It only - makes sense for Z39.50-type targets. + This establishes a CCL field definition or other setting, for + the purpose of mapping end-user queries. XXX is the field or + setting name, and the value of the setting provides parameters + (e.g. parameters to send to the server, etc.). Please consult + the YAZ manual for a full overview of the many capabilities of + the powerful and flexible CCL parser. + + + Note that it is easy to establish a set of default parameters, + and then override them individually for a given target. - - nativesyntax + + pz:requestsyntax - This element specifies the native syntax and encoding of the - result records. The default is XML. The following attributes - are defined: + This specifies the record syntax to use when requesting + records from a given server. The value can be a symbolic name like + marc21 or xml, or it can be a Z39.50-style dot-separated OID. + + + + + pz:elements + + + The element set name to be used when retrieving records from a + server (not yet implemented). + + + + + pz:piggyback + + + Piggybacking enables the client to retrieve records from the + server as part of the search response in Z39.50. 
Almost all + servers support this (or fail gracefully), but a few + servers will produce undesirable results. + Set to '1' to enable piggybacking, '0' to disable it. Default + is 1 (piggybacking enabled). + + + + + pz:nativesyntax + + + The representation (syntax) of the retrieval records. Currently + recognized values are iso2709 and xml. + + + For iso2709, you can also specify a native character set, e.g. "iso2709;latin-1". + If no character set is provided, MARC-8 is assumed. + + + + + pz:xslt + + + Provides the path of an XSLT stylesheet which will be used to + map incoming records to the internal representation. + + + + + pz:authentication + + + Sets an authentication string for a given server. See the section on + authorization and authentication for discussion. + + + + + pz:allow + + + Allows or denies access to the resources it is applied to. Possible + values are '0' and '1'. The default is '1' (allow access to this resource). + See the manual section on authorization and authentication for discussion + about how to use this setting. + + + + + pz:maxrecs + + + Controls the maximum number of records to be retrieved from a + server. The default is 100 (not yet implemented). - - name - - - The name of the syntax. Currently recognized values are - 'iso2709' (MARC), and 'xml'. - - - - - format - - - The format, or schema, to be expected. Default is - 'marc21'. - - - - - encoding - - - The encoding of the response record. Typical values for - MARC records are 'marc8' (general MARC-8), 'marc8s' - (MARC-8, but maps to precomposed UTF-8 characters, more - suitable for use in web browsers), 'latin1'. - - - - - mapto - - - Specifies the flavor of MARCXML to map results to. - Default is 'marcxml'. 'marcxchange' is also possible, and - useful for Danish DANMARC records. 
- - - - - + + pz:id + + + This setting can't be 'set' -- it contains the ID (normally + ZURL) for a given target, and is useful for filtering -- + specifically when you want to select one or more specific + targets in the search command. + + + + - - - OPTIONS - - - - EXAMPLES - - - - FILES - -
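
The example listings in the patch above did not survive extraction, so the following is only a hedged sketch of what a settings file could look like under the SETTINGS FILE FORMAT and RESERVED SETTING NAMES text. The element and attribute names (settings, set, target, name, value) and the target bagel.indexdata.com:210/marc come from that text; the particular values chosen are illustrative assumptions, not the defaults shipped with pazpar2.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only: element and attribute names follow the
     SETTINGS FILE FORMAT section; the values are assumptions. -->
<settings target="*">
  <!-- The root element supplies target="*", so these defaults apply
       to all known targets unless a set node says otherwise. -->
  <set name="pz:requestsyntax" value="marc21"/>
  <set name="pz:nativesyntax" value="iso2709"/>
  <set name="pz:piggyback" value="1"/>

  <!-- A set node naming a specific target overrides the wildcard
       default for the same setting name. -->
  <set target="bagel.indexdata.com:210/marc"
       name="pz:nativesyntax" value="iso2709;latin-1"/>
</settings>
```

Putting the wildcard target on the root element keeps each default to a single short set node, while the last entry shows the documented rule that a setting for a specific target wins over one given by wildcard.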