X-Git-Url: http://jsfdemo.indexdata.com/?a=blobdiff_plain;f=doc%2Fpazpar2_conf.xml;h=5e8ef1a81e20d7a26ff97f58a6f43a7fb238e631;hb=cce5da278d7ce802452bbc14b4c2d577f638291b;hp=6db09992ecb7a94b483112fb1aca243bcb2c681f;hpb=b7b3b09b5bf04a832b9602d4717d7e1eb512079c;p=pazpar2-moved-to-github.git diff --git a/doc/pazpar2_conf.xml b/doc/pazpar2_conf.xml index 6db0999..5e8ef1a 100644 --- a/doc/pazpar2_conf.xml +++ b/doc/pazpar2_conf.xml @@ -1,3 +1,4 @@ + %entities; - - %common; + + %idcommon; ]> - Pazpar2 @@ -31,290 +31,322 @@ DESCRIPTION - - The pazpar2 configuration file, together with any referenced XSLT files, - govern pazpar2's behavior as a client, and control the normalization and - extraction of data elements from incoming result records, for the - purposes of merging, sorting, facet analysis, and display. - - - - The file is specified using the option -f on the pazpar2 command line. - There is not presently a way to reload the configuration file without - restarting pazpar2, although this will most likely be added some time - in the future. - + + The Pazpar2 configuration file, together with any referenced XSLT files, + governs Pazpar2's behavior as a client, and controls the normalization and + extraction of data elements from incoming result records, for the + purposes of merging, sorting, facet analysis, and display. + + + + The file is specified using the option -f on the Pazpar2 command line. + There is not presently a way to reload the configuration file without + restarting Pazpar2, although this will most likely be added some time + in the future. + - + FORMAT + + The configuration file is XML-structured. It must be valid XML. All + elements specific to Pazpar2 should belong to the namespace + http://www.indexdata.com/pazpar2/1.0 + (this is assumed in the + following examples). The root element is named pazpar2. + Under the root element are a number of elements which group categories of + information. The categories are described below.
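A minimal skeleton of the layout described above might look like the following. It is only a sketch. It shows the namespaced root element and the `listen` and `proxy` elements of the `server` section, which are described below. The port numbers, proxy host and `myurl` value are placeholders chosen for illustration, not defaults.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical minimal skeleton; port, host and URL values are placeholders -->
<pazpar2 xmlns="http://www.indexdata.com/pazpar2/1.0">
  <server>
    <!-- Webservice binding; can be overridden with -h on the command line -->
    <listen port="9004"/>
    <!-- Requests not for search.pz2 are forwarded to this web server -->
    <proxy host="localhost" port="80" myurl="http://localhost/"/>
  </server>
</pazpar2>
```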
+ + + server - The configuration file is XML-structured. It must be valid XML. All - elements specific to pazpar2 should belong to the namespace - "http://www.indexdata.com/pazpar2/1.0" (this is assumed in the - following examples). The root element is named 'pazpar2'. Under the - root element are a number of elements which group categories of - information. The categories are described below. - + This section governs overall behavior of the client. The data + elements are described below. + + + + listen + + + Configures the webservice -- this controls how you can connect + to Pazpar2 from your browser or server-side code. The + attributes 'host' and 'port' control the binding of the + server. The 'host' attribute can be used to bind the server to + a secondary IP address of your system, enabling you to run + Pazpar2 on port 80 alongside a conventional web server. You + can override this setting on the command line using the option -h. + + + + + + proxy + + + If this item is given, Pazpar2 will forward all incoming HTTP + requests that do not contain the filename 'search.pz2' to the + host and port specified using the 'host' and 'port' + attributes. The 'myurl' attribute is required, and should provide + the base URL of the server. This is generally the HTTP URL for the host + specified in the 'listen' parameter. This functionality is + crucial if you wish to use + Pazpar2 in conjunction with browser-based code (JS, Flash, + applets, etc.) which operates in a security sandbox. Such code + can only connect to the same server from which the enclosing + HTML page originated. Pazpar2's proxy functionality enables you + to host all of the main pages (plus images, CSS, etc.) of your + application on a conventional webserver, while efficiently + processing webservice requests for metasearch status, results, + etc. + + + + + + relevance + + + Specifies ICU tokenization and transformation rules + for tokens that are used in Pazpar2's relevance ranking.
The 'id' + attribute is currently not used, and the 'locale' + attribute must be set to one of the locale strings + defined in ICU. The child elements listed below can be + in any order, except the 'index' element which logically + belongs to the end of the list. The stated tokenization, + transformation and charmapping instructions are performed + in order from top to bottom. + + + casemap + + + The attribute 'rule' defines the direction of the + per-character casemapping; allowed values are "l" + (lower), "u" (upper), "t" (title). + + + + transform + + + Normalization and transformation of tokens follows + the rules defined in the 'rule' attribute. For + possible values we refer to the extensive ICU + documentation found at the + ICU + transformation home page. Set filtering + principles are explained at the + ICU set and + filtering page. + + + + tokenize + + + Tokenization is the only rule in the ICU chain + which splits one token into multiple tokens. The + 'rule' attribute may have the following values: + "s" (sentence), "l" (line-break), "w" (word), and + "c" (character), the latter probably not being + very useful in a running Pazpar2 installation. + + + + + + - server + + sort + + + Specifies ICU tokenization and transformation rules + for tokens that are used in Pazpar2's sorting. The contents + are similar to those of relevance. + + + + + + mergekey + + + Specifies ICU tokenization and transformation rules + for tokens that are used in Pazpar2's mergekey. The contents + are similar to those of relevance. + + + + + + service + - This section governs overall behavior of the client. The data - elements are described below. + This nested element controls the behavior of Pazpar2 with + respect to your data model. In Pazpar2, incoming records are + normalized, using XSLT, into an internal representation. + The 'service' section controls the further processing and + extraction of data from the internal representation, primarily + through the 'metadata' sub-element.
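A relevance chain built from the casemap, transform and tokenize instructions above might look roughly like this sketch. It assumes each instruction is written as an empty child element of `relevance` that carries a `rule` attribute, as the descriptions suggest. The `en` locale and the ICU transform rule strings are illustrative choices, not defaults shipped with Pazpar2. The instructions run from top to bottom.

```xml
<!-- Sketch only: locale and rule strings are illustrative assumptions -->
<relevance id="relevance" locale="en">
  <!-- Remove control characters before tokenizing -->
  <transform rule="[:Control:] Any-Remove"/>
  <!-- Split the field into word tokens -->
  <tokenize rule="w"/>
  <!-- Strip whitespace and punctuation from the resulting tokens -->
  <transform rule="[[:WhiteSpace:][:Punctuation:]] Any-Remove"/>
  <!-- Lower-case every token -->
  <casemap rule="l"/>
</relevance>
```

The sort and mergekey chains would presumably follow the same pattern inside their own elements.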
- - - listen - + + + metadata + + + One of these elements is required for every data element in + the internal representation of the record (see + . It governs + subsequent processing as pertains to sorting, relevance + ranking, merging, and display of data elements. It supports + the following attributes: + + + + name + - Configures the webservice -- this controls how you can connect - to pazpar2 from your browser or server-side code. The - attributes 'host' and 'port' control the binding of the - server. The 'host' attribute can be used to bind the server to - a secondary IP address of your system, enabling you to run - pazpar2 on port 80 alongside a conventional web server. You - can override this setting on the command lineusing the option -h. + This is the name of the data element. It is matched + against the 'type' attribute of the + 'metadata' element + in the normalized record. A warning is produced if + metadata elements with an unknown name are + found in the + normalized record. This name is also used to + represent + data elements in the records returned by the + webservice API, and to name sort lists and browse + facets. - - - - - proxy - + + + + type + - If this item is given, pazpar2 will forward all incoming HTTP - requests that do not contain the filename 'search.pz2' to the - host and port specified using the 'host' and 'port' - attributes. The 'myurl' attribute is required, and should provide - the base URL of the server. Generally, the HTTP URL for the host - specified in the 'listen' parameter. This functionality is - crucial if you wish to use - pazpar2 in conjunction with browser-based code (JS, Flash, - applets, etc.) which operates in a security sandbox. Such code - can only connect to the same server from which the enclosing - HTML page originated. 
Pazpar2s proxy functionality enables you - to host all of the main pages (plus images, CSS, etc) of your - application on a conventional webserver, while efficiently - processing webservice requests for metasearch status, results, - etc. + The type of data element. This value governs any + normalization or special processing that might take + place on an element. Possible values are 'generic' + (basic string), 'year' (a range is computed if + multiple years are found in the record). Note: This + list is likely to increase in the future. - - - - - zproxy - + + + + brief + - If this item is given, pazpar2 will send all Z39.50 - packages through this Z39.50 proxy server. - At least one of the 'host' and 'post' attributes is required. - The 'host' attribute may contain both host name and port - number, seperated by a colon ':', or only the host name. - An empty 'host' attribute sets the Z39.50 host address - to 'localhost'. + If this is set to 'yes', then the data element is + included in brief records in the webservice API. Note + that this only makes sense for metadata elements that + are merged (see below). The default value is 'no'. - - - - - icu_chain - + + + + sortkey + - Definition of ICU tokenization and normalization rules - are used if ICU support is compiled in. The 'id' - attribute is currently not used, and the 'locale' - attribute must be set to one of the locale strings - defined in ICU. The child elements listed below can be - in any order, except the 'index' element which logically - belongs to the end of the list. The stated tokenization, - normalization and charmapping instructions are performed - in order from top to bottom. + Specifies that this data element is to be used for + sorting. The possible values are 'numeric' (numeric + value), 'skiparticle' (string; skip common, leading + articles), and 'no' (no sorting). The default value is + 'no'.
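As an illustration, `metadata` declarations for a title and a publication year might combine the attributes described so far with `merge` (described below), since `brief` only makes sense for merged elements. The names `title` and `date` are hypothetical. They would have to match the `type` attribute of the `metadata` elements that your XSLT produces in the normalized records.

```xml
<!-- Illustrative declarations inside <service>; element names are examples -->
<service>
  <!-- Shown in brief records; sorted with common leading articles skipped -->
  <metadata name="title" type="generic" brief="yes" merge="longest"
            sortkey="skiparticle"/>
  <!-- A year range is computed when merged records carry several years -->
  <metadata name="date" type="year" brief="yes" merge="range"
            sortkey="numeric"/>
</service>
```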
- - casemap - - - The attribure 'rule' defines the direction of the - per-character casemapping, allowed values are "l" - (lower), "u" (upper), "t" (title). - - - - normalize - - - Normalization and transformation of tokens follows - the rules defined in the 'rule' attribute. For - possible values we refer to the extensive ICU - documentation found at the - ICU - transformation home page. Set filtering - principles are explained at the - ICU set and - filtering page. - - - - tokenize - - - Tokenization is the only rule in the ICU chain - which splits one token into multiple tokens. The - 'rule' attribute may have the following values: - "s" (sentence), "l" (line-break), "w" (word), and - "c" (character), the later probably not beeing - very useful in a runing pazpar2 installation. - - - - index - - - Finally the 'index' element instruction - without - any 'rule' attribute - is used to store the tokens - after chain processing in the relevance ranking - unit of Pazpar2. It will always be the last - instruction in the chain. - - - - - - - - - service - + + + + rank + - This nested element controls the behavior of pazpar2 with - respect to your data model. In pazpar2, incoming records are - normalized, using XSLT, into an internal representation. - The 'service' section controls the further processing and - extraction of data from the internal representation, primarily - through the 'metdata' sub-element. + Specifies that this element is to be used to + help rank + records against the user's query (when ranking is + requested). The value is an integer, used as a + multiplier against the basic TF*IDF score. A value of + 1 is the base; higher values give additional + weight to + elements of this type. The default is '0', which + excludes this element from the rank calculation. - - - metadata - - - One of these elements is required for every data element in - the internal representation of the record (see -
It governs - subsequent processing as pertains to sorting, relevance - ranking, merging, and display of data elements. It supports - the following attributes: - - - - name - - - This is the name of the data element. It is matched - against the 'type' attribute of the - 'metadata' element - in the normalized record. A warning is produced if - metdata elements with an unknown name are - found in the - normalized record. This name is also used to - represent - data elements in the records returned by the - webservice API, and to name sort lists and browse - facets. - - - - - type - - - The type of data element. This value governs any - normalization or special processing that might take - place on an element. Possible values are 'generic' - (basic string), 'year' (a range is computed if - multiple years are found in the record). Note: This - list is likely to increase in the future. - - - - - brief - - - If this is set to 'yes', then the data element is - includes in brief records in the webservice API. Note - that this only makes sense for metadata elements that - are merged (see below). The default value is 'no'. - - - - - sortkey - - - Specifies that this data element is to be used for - sorting. The possible values are 'numeric' (numeric - value), 'skiparticle' (string; skip common, leading - articles), and 'no' (no sorting). The default value is - 'no'. - - - - - rank - - - Specifies that this element is to be used to - help rank - records against the user's query (when ranking is - requested). The value is an integer, used as a - multiplier against the basic TF*IDF score. A value of - 1 is the base, higher values give additional - weight to - elements of this type. The default is '0', which - excludes this element from the rank calculation. - - - - - termlist - - - Specifies that this element is to be used as a - termlist, or browse facet. 
Values are tabulated from - incoming records, and a highscore of values (with - their associated frequency) is made available to the - client through the webservice API. - The possible values - are 'yes' and 'no' (default). - - - - - merge - - - This governs whether, and how elements are extracted - from individual records and merged into cluster - records. The possible values are: 'unique' (include - all unique elements), 'longest' (include only the - longest element (strlen), 'range' (calculate a range - of values across al matching records), 'all' (include - all elements), or 'no' (don't merge; this is the + + + + termlist + + + Specifies that this element is to be used as a + termlist, or browse facet. Values are tabulated from + incoming records, and a highscore of values (with + their associated frequency) is made available to the + client through the webservice API. + The possible values + are 'yes' and 'no' (default). + + + + + merge + + + This governs whether and how elements are extracted + from individual records and merged into cluster + records. The possible values are: 'unique' (include + all unique elements), 'longest' (include only the + longest element (strlen)), 'range' (calculate a range + of values across all matching records), 'all' (include + all elements), or 'no' (don't merge; this is the default); - - - - + + + - - - - - - - + mergekey + + + If set to yes, the value of this + metadata element is appended to the resulting mergekey. + By default metadata is not part of a mergekey. + + + - + + setting + + + This attribute allows you to make use of static database + settings in the processing of records. Three possible values + are allowed. 'no' is the default and doesn't do anything. + 'postproc' copies the value of a setting with the same name + into the output of the normalization stylesheet(s).
'parameter' + makes the value of a setting with the same name available + as a parameter to the normalization stylesheet, so you + can further process the value inside of the stylesheet, or use + the value to decide how to deal with other data values. + + + + The purpose of using settings in this way can either be to + control the behavior of the normalization stylesheet in a + database-dependent way, or to easily make database-dependent values + available to display-logic in your user interface, without having + to implement complicated interactions between the user interface + and your configuration system. + + + + + + + + + + + + + EXAMPLE Below is a working example configuration: @@ -326,11 +358,6 @@ - - - - - - @@ -572,7 +603,10 @@ - + + + + @@ -582,13 +616,13 @@ ]]> - - - - The next example shows certain settings overriden for one target, - one which returns XML records containing DublinCore elements, and - which furthermore requires a username/password. - + + + The next example shows certain settings overridden for one target, + one which returns XML records containing DublinCore elements, and + which furthermore requires a username/password. + @@ -597,144 +631,232 @@ ]]> - - - - The following example associates a specific name/value combination - with a number of targets. The targets below are access-restricted, - and can only be used by users with special credentials. - + + + The following example associates a specific name/value combination + with a number of targets. The targets below are access-restricted, + and can only be used by users with special credentials. + ]]> + + + + + RESERVED SETTING NAMES + + The following setting names are reserved by Pazpar2 to control the + behavior of the client function. + + + + + pz:cclmap:xxx + + + This establishes a CCL field definition or other setting, for + the purpose of mapping end-user queries. XXX is the field or + setting name, and the value of the setting provides parameters + (e.g.
parameters to send to the server, etc.). Please consult + the YAZ manual for a full overview of the many capabilities of + the powerful and flexible CCL parser. + + + Note that it is easy to establish a set of default parameters, + and then override them individually for a given target. + + + + + pz:requestsyntax + + + This specifies the record syntax to use when requesting + records from a given server. The value can be a symbolic name like + marc21 or xml, or it can be a Z39.50-style dot-separated OID. + + + + + pz:elements + + + The element set name to be used when retrieving records from a + server. + + + + + pz:piggyback + + + Piggybacking enables the client to retrieve records from the + server as part of the search response in Z39.50. Almost all + servers support this (or fail gracefully), but a few + servers will produce undesirable results. + Set to '1' to enable piggybacking, '0' to disable it. Default + is 1 (piggybacking enabled). + + + + + pz:nativesyntax + + + The representation (syntax) of the retrieval records. Currently + recognized values are iso2709 and xml. + + + For iso2709, you can also specify a native character set, e.g. "iso2709;latin-1". + If no character set is provided, MARC-8 is assumed. + + If pz:nativesyntax is not specified, Pazpar2 will attempt to determine + the value based on the response from the server. + + + - + + pz:queryencoding + + + The encoding of the search terms that a target accepts. Most + targets do not honor UTF-8, in which case this needs to be specified. + Each term in a query will be converted if this setting is given. + + + - RESERVED SETTING NAMES + + pz:xslt + + + Provides the path of an XSLT stylesheet which will be used to + map incoming records to the internal representation. + + + + + pz:authentication + - The following setting names are reserved by pazpar2 to control the - behavior of the client function. + Sets an authentication string for a given server.
See the section on + authorization and authentication for discussion. + + + + pz:allow + + + Allows or denies access to the resources it is applied to. Possible + values are '0' and '1'. The default is '1' (allow access to this resource). + See the manual section on authorization and authentication for discussion + about how to use this setting. + + + + + pz:maxrecs + + + Controls the maximum number of records to be retrieved from a + server. The default is 100. + + + + + pz:id + + + This setting can't be 'set' -- it contains the ID (normally + ZURL) for a given target, and is useful for filtering -- + specifically when you want to select one or more specific + targets in the search command. + + + + + pz:zproxy + + + The 'pz:zproxy' setting has the value syntax + 'host.internet.address:port' and is used to tunnel Z39.50 + requests through the named Z39.50 proxy. + + + - - - pz:cclmap:xxx - - - This establishes a CCL field definition or other setting, for - the purpose of mapping end-user queries. XXX is the field or - setting name, and the value of the setting provides parameters - (e.g. parameters to send to the server, etc.). Please consult - the YAZ manual for a full overview of the many capabilities of - the powerful and flexible CCL parser. - - - Note that it is easy to etablish a set of default parameters, - and then override them individually for a given target. - - - - - pz:requestsyntax - - - This specifies the record syntax to use when requesting - records from a given server. The value can be a symbolic name like - marc21 or xml, or it can be a Z39.50-style dot-separated OID. - - - - - pz:elements - - - The element set name to be used when retrieving records from a - server (not yet implemented). - - - - - pz:piggyback - - - Piggybacking enables the server to retrieve records from the - server as part of the search response in Z39.50. Almost all - servers support this (or fail it gracefully), but a few - servers will produce undesirable results.
- Set to '1' to enable piggybacking, '0' to disable it. Default - is 1 (piggybacking enabled). - - - - - pz:nativesyntax - - - The representation (syntax) of the retrieval records. Currently - recognized values are iso2709 and xml. - - - For iso2709, can also specify a native character set, e.g. "iso2709;latin-1". - If no character set is provided, MARC-8 is assumed. - - - - - pz:xslt - - - Provides the path of an XSLT stylesheet which will be used to - map incoming records to the internal representation. - - - - - pz:authentication - - - Sets an authentication string for a given server. See the section on - authorization and authentication for discussion. - - - - - pz:allow - - - Allows or denies access to the resources it is applied to. Possible - values are '0' and '1'. The default is '1' (allow access to this resource). - See the manual section on authorization and authentication for discussion - about how to use this setting. - - - - - pz:maxrecs - - - Controls the maximum number of records to be retrieved from a - server. The default is 100 (not yet implemented). - - - - - pz:id - - - This setting can't be 'set' -- it contains the ID (normally - ZURL) for a given target, and is useful for filtering -- - specifically when you want to select one or more specific - targets in the search command. - - - - - + + pz:apdulog + + + If the 'pz:apdulog' setting is defined and has a value other than 0, + then Z39.50 APDUs are written to the log. + + + + + + pz:sru + + + This setting enables SRU/SRW support. It has three possible values: + 'get' enables SRU access through GET requests; 'post' enables SRU/POST + support, which is less commonly supported but useful if very large requests are + to be submitted; 'srw' enables the SRW variation of the protocol. + + + + + + pz:sru_version + + + This allows the SRU version to be specified. If unset, Pazpar2 + will use the default of YAZ (currently 1.2). It should be set + to 1.1 or 1.2.
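As a sketch, enabling SRU over GET with a pinned protocol version for a single target might look like the fragment below. It assumes the settings-file layout in which a `settings` element names a target and holds `set` elements with `name` and `value` attributes. That layout is not described in this section. The target address is a hypothetical placeholder.

```xml
<!-- Hypothetical settings fragment; the target address is a placeholder -->
<settings target="sru.example.org:80/catalog">
  <!-- Talk to this target with SRU over HTTP GET -->
  <set name="pz:sru" value="get"/>
  <!-- Pin the SRU version instead of relying on the YAZ default -->
  <set name="pz:sru_version" value="1.1"/>
</settings>
```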
+ + + + + + pz:pqf_prefix + + + Allows you to specify an arbitrary PQF query language substring. The provided + string is prefixed to the user's query after it has been normalized to PQF + internally in Pazpar2. This allows you to attach complex 'filters' to + queries for a given target, sometimes necessary to select sub-catalogs + in union catalog systems, etc. + + + + + + SEE ALSO + + + pazpar2 + 8 + + + yaz-icu + 1 + + + pazpar2_protocol + 7 + + +