X-Git-Url: http://jsfdemo.indexdata.com/?a=blobdiff_plain;f=doc%2Ftools.xml;h=5b09ab305811d81f111927be7cb4b24edbf3e70f;hb=06fcbb2ac6af85e8a659ff9b000e16455a5cb847;hp=8b3fe80fe73a0166080fc9f1d9d3f3973d6c2b65;hpb=0f72f09a46621eb0aa9960b990dd35c221333e4d;p=yaz-moved-to-github.git
diff --git a/doc/tools.xml b/doc/tools.xml
index 8b3fe80..5b09ab3 100644
--- a/doc/tools.xml
+++ b/doc/tools.xml
@@ -1,675 +1,2302 @@
-
-Supporting Tools
-
-
-In support of the service API - primarily the ASN module, which
-provides the programmatic interface to the Z39.50 APDUs, YAZ contains
-a collection of tools that support the development of applications.
-
-
-Query Syntax Parsers
-
-
-Since the type-1 (RPN) query structure has no direct, useful string
-representation, every origin application needs to provide some form of
-mapping from a local query notation or representation to a
-Z_RPNQuery structure. Some programmers will prefer to
-construct the query manually, perhaps using odr_malloc()
-to simplify memory management. The &yaz; distribution includes two separate,
-query-generating tools that may be of use to you.
-
-
-Prefix Query Format
-
-
-Since RPN or reverse polish notation is really just a fancy way of
-describing a suffix notation format (operator follows operands), it
-would seem that the confusion is total when we now introduce a prefix
-notation for RPN. The reason is one of simple laziness - it's somewhat
-simpler to interpret a prefix format, and this utility was designed
-for maximum simplicity, to provide a baseline representation for use
-in simple test applications and scripting environments (like Tcl). The
-demonstration client included with YAZ uses the PQF.
-
-
-The PQF is defined by the pquery module in the YAZ library. The
-pquery.h file provides the declaration of the functions
-
-
-Z_RPNQuery *p_query_rpn (ODR o, oid_proto proto, const char *qbuf);
-
-Z_AttributesPlusTerm *p_query_scan (ODR o, oid_proto proto,
- Odr_oid **attributeSetP, const char *qbuf);
-
-int p_query_attset (const char *arg);
-
-
-The function p_query_rpn() takes as arguments an
-&odr; stream (see section The ODR Module)
-to provide a memory source (the structure created is released on
-the next call to odr_reset() on the stream), a
-protocol identifier (one of the constants PROTO_Z3950 and
-PROTO_SR), an attribute set
-reference, and finally a null-terminated string holding the query
-string.
-
-
-If the parse went well, p_query_rpn() returns a
-pointer to a Z_RPNQuery structure which can be
-placed directly into a Z_SearchRequest.
-
-
-
-The p_query_attset specifies which attribute set to use if
-the query doesn't specify one by the @attrset operator.
-The p_query_attset returns 0 if the argument is a
-valid attribute set specifier; otherwise the function returns -1.
-
-
-
-The grammar of the PQF is as follows:
-
-
-
-Query ::= [ AttSet ] QueryStruct.
-
-AttSet ::= string.
-
-QueryStruct ::= { Attribute } Simple | Complex.
-
-Attribute ::= '@attr' AttributeType '=' AttributeValue.
-
-AttributeType ::= integer.
-
-AttributeValue ::= integer.
-
-Complex ::= Operator QueryStruct QueryStruct.
-
-Operator ::= '@and' | '@or' | '@not' | '@prox' Proximity.
-
-Simple ::= ResultSet | Term.
-
-ResultSet ::= '@set' string.
-
-Term ::= string | '"' string '"'.
-
-Proximity ::= Exclusion Distance Ordered Relation WhichCode UnitCode.
-
-Exclusion ::= '1' | '0' | 'void'.
-
-Distance ::= integer.
-
-Ordered ::= '1' | '0'.
-
-Relation ::= integer.
-
-WhichCode ::= 'known' | 'private' | integer.
-
-UnitCode ::= integer.
-
-
-
-You will note that the syntax above is a fairly faithful
-representation of RPN, except for the Attibute, which has been
-moved a step away from the term, allowing you to associate one or more
-attributes with an entire query structure. The parser will
-automatically apply the given attributes to each term as required.
-
-
-
-The following are all examples of valid queries in the PQF.
-
-
-
-dylan
-
-"bob dylan"
-
-@or "dylan" "zimmerman"
-
-@set Result-1
-
-@or @and bob dylan @set Result-1
-
-@attr 4=1 @and @attr 1=1 "bob dylan" @attr 1=4 "slow train coming"
-
-@attr 4=1 @attr 1=4 "self portrait"
-
-@prox 0 3 1 2 k 2 dylan zimmerman
-
-
-
-Common Command Language
-
-
-Not all users enjoy typing in prefix query structures and numerical
-attribute values, even in a minimalistic test client. In the library
-world, the more intuitive Common Command Language (or ISO 8777) has
-enjoyed some popularity - especially before the widespread
-availability of graphical interfaces. It is still useful in
-applications where you for some reason or other need to provide a
-symbolic language for expressing boolean query structures.
-
-
-
-The EUROPAGATE research project working under the Libraries programme
-of the European Commission's DG XIII has, amongst other useful tools,
-implemented a general-purpose CCL parser which produces an output
-structure that can be trivially converted to the internal RPN
-representation of YAZ (The Z_RPNQuery structure).
-Since the CCL utility - along with the rest of the software
-produced by EUROPAGATE - is made freely available on a liberal license, it
-is included as a supplement to YAZ.
-
-
-CCL Syntax
-
-
-The CCL parser obeys the following grammar for the FIND argument.
-The syntax is annotated by in the lines prefixed by
-‐‐.
-
-
-
-CCL-Find ::= CCL-Find Op Elements
- | Elements.
-
-Op ::= "and" | "or" | "not"
--- The above means that Elements are separated by boolean operators.
-
-Elements ::= '(' CCL-Find ')'
- | Set
- | Terms
- | Qualifiers Relation Terms
- | Qualifiers Relation '(' CCL-Find ')'
- | Qualifiers '=' string '-' string
--- Elements is either a recursive definition, a result set reference, a
--- list of terms, qualifiers followed by terms, qualifiers followed
--- by a recursive definition or qualifiers in a range (lower - upper).
-
-Set ::= 'set' = string
--- Reference to a result set
-
-Terms ::= Terms Prox Term
- | Term
--- Proximity of terms.
-
-Term ::= Term string
- | string
--- This basically means that a term may include a blank
-
-Qualifiers ::= Qualifiers ',' string
- | string
--- Qualifiers is a list of strings separated by comma
-
-Relation ::= '=' | '>=' | '<=' | '<>' | '>' | '<'
--- Relational operators. This really doesn't follow the ISO8777
--- standard.
-
-Prox ::= '%' | '!'
--- Proximity operator
-
-
-
-
-The following queries are all valid:
-
-
-
-dylan
-
-"bob dylan"
-
-dylan or zimmerman
-
-set=1
-
-(dylan and bob) or set=1
-
-
-
-Assuming that the qualifiers ti, au
-and date are defined we may use:
-
-
-
-ti=self portrait
-
-au=(bob dylan and slow train coming)
-
-date>1980 and (ti=((self portrait)))
-
-
-
-
-CCL Qualifiers
-
-
-Qualifiers are used to direct the search to a particular searchable
-index, such as title (ti) and author indexes (au). The CCL standard
-itself doesn't specify a particular set of qualifiers, but it does
-suggest a few short-hand notations. You can customize the CCL parser
-to support a particular set of qualifiers to relect the current target
-profile. Traditionally, a qualifier would map to a particular
-use-attribute within the BIB-1 attribute set. However, you could also
-define qualifiers that would set, for example, the
-structure-attribute.
-
-
-
-Consider a scenario where the target support ranked searches in the
-title-index. In this case, the user could specify
-
-
->
-ti,ranked=knuth computer
-
-
-and the ranked would map to structure=free-form-text
-(4=105) and the ti would map to title (1=4).
-
-
-
-A "profile" with a set predefined CCL qualifiers can be read from a
-file. The YAZ client reads its CCL qualifiers from a file named
-default.bib. Each line in the file has the form:
-
-
-
-qualifier-name
- type=valtype=val ...
-
-
-
-where qualifier-name is the name of the
-qualifier to be used (eg. ti),
-type is a BIB-1 category type and
-val is the corresponding BIB-1 attribute value.
-The type can be either numeric or it may be
-either u (use), r (relation),
-p (position), s (structure),
-t (truncation) or c (completeness).
-The qualifier-nameterm has a
-special meaning. The types and values for this definition is used when
-no qualifiers are present.
-
-
-
-Consider the following definition:
-
-
-
-ti u=4 s=1
-au u=1 s=1
-term s=105
-
-
-Two qualifiers are defined, ti and au.
-They both set the structure-attribute to phrase (1). ti
-sets the use-attribute to 4. au sets the use-attribute
-to 1. When no qualifiers are used in the query the structure-attribute is
-set to free-form-text (105).
-
-
-
-CCL API
-
-All public definitions can be found in the header file
-ccl.h. A profile identifier is of type
-CCL_bibset. A profile must be created with the call to
-the function ccl_qual_mk which returns a profile
-handle of type CCL_bibset.
-
-
-
-To read a file containing qualifier definitions the function
-ccl_qual_file may be convenient. This function takes
-an already opened FILE handle pointer as argument
-along with a CCL_bibset handle.
-
-
-
-To parse a simple string with a FIND query use the function
-
-
- struct ccl_rpn_node *ccl_find_str (CCL_bibset bibset, const char *str,
- int *error, int *pos);
-
-
-which takes the CCL profile (bibset) and query
-(str) as input. Upon successful completion the RPN
-tree is returned. If an error eccur, such as a syntax error, the integer
-pointed to by error holds the error code and
-pos holds the offset inside query string in which
-the parsing failed.
-
-
-
-An english representation of the error may be obtained by calling
-the ccl_err_msg function. The error codes are listed in
-ccl.h.
-
-
-
-To convert the CCL RPN tree (type struct ccl_rpn_node *)
-to the Z_RPNQuery of YAZ the function ccl_rpn_query
-must be used. This function which is part of YAZ is implemented in
-yaz-ccl.c.
-After calling this function the CCL RPN tree is probably no longer
-needed. The ccl_rpn_delete destroys the CCL RPN tree.
-
-
-
-A CCL profile may be destroyed by calling the ccl_qual_rm
-function.
-
-
-
-The token names for the CCL operators may be changed by setting the
-globals (all type char *)
-ccl_token_and, ccl_token_or,
-ccl_token_not and ccl_token_set.
-An operator may have aliases, i.e. there may be more than one name for
-the operator. To do this, separate each alias with a space character.
-
-
-
-
-Object Identifiers
-
-
-The basic YAZ representation of an OID is an array of integers,
-terminated with the value -1. The &odr; module provides two
-utility-functions to create and copy this type of data elements:
-
-
-
- Odr_oid *odr_getoidbystr(ODR o, char *str);
-
-
-
-Creates an OID based on a string-based representation using dots (.)
-to separate elements in the OID.
-
-
-
-Odr_oid *odr_oiddup(ODR odr, Odr_oid *o);
-
-
-
-Creates a copy of the OID referenced by the o parameter.
-Both functions take an &odr; stream as parameter. This stream is used to
-allocate memory for the data elements, which is released on a
-subsequent call to odr_reset() on that stream.
-
-
-
-The OID module provides a higher-level representation of the
-family of object identifers which describe the Z39.50 protocol and its
-related objects. The definition of the module interface is given in
-the oid.h file.
-
-
-
-The interface is mainly based on the oident structure. The
-definition of this structure looks like this:
-
-
-
-typedef struct oident
-{
- oid_proto proto;
- oid_class oclass;
- oid_value value;
- int oidsuffix[OID_SIZE];
- char *desc;
-} oident;
-
-
-
-The proto field takes one of the values
-
-
-
-PROTO_Z3950
-PROTO_SR
-
-
-
-If you don't care about talking to SR-based implementations (few
-exist, and they may become fewer still if and when the ISO SR and ANSI
-Z39.50 documents are merged into a single standard), you can ignore
-this field on incoming packages, and always set it to PROTO_Z3950
-for outgoing packages.
-
-
-
-The oclass field takes one of the values
-
-
-
-CLASS_APPCTX
-CLASS_ABSYN
-CLASS_ATTSET
-CLASS_TRANSYN
-CLASS_DIAGSET
-CLASS_RECSYN
-CLASS_RESFORM
-CLASS_ACCFORM
-CLASS_EXTSERV
-CLASS_USERINFO
-CLASS_ELEMSPEC
-CLASS_VARSET
-CLASS_SCHEMA
-CLASS_TAGSET
-CLASS_GENERAL
-
-
-
-corresponding to the OID classes defined by the Z39.50 standard.
-
-Finally, the value field takes one of the values
-
-
-
-VAL_APDU
-VAL_BER
-VAL_BASIC_CTX
-VAL_BIB1
-VAL_EXP1
-VAL_EXT1
-VAL_CCL1
-VAL_GILS
-VAL_WAIS
-VAL_STAS
-VAL_DIAG1
-VAL_ISO2709
-VAL_UNIMARC
-VAL_INTERMARC
-VAL_CCF
-VAL_USMARC
-VAL_UKMARC
-VAL_NORMARC
-VAL_LIBRISMARC
-VAL_DANMARC
-VAL_FINMARC
-VAL_MAB
-VAL_CANMARC
-VAL_SBN
-VAL_PICAMARC
-VAL_AUSMARC
-VAL_IBERMARC
-VAL_EXPLAIN
-VAL_SUTRS
-VAL_OPAC
-VAL_SUMMARY
-VAL_GRS0
-VAL_GRS1
-VAL_EXTENDED
-VAL_RESOURCE1
-VAL_RESOURCE2
-VAL_PROMPT1
-VAL_DES1
-VAL_KRB1
-VAL_PRESSET
-VAL_PQUERY
-VAL_PCQUERY
-VAL_ITEMORDER
-VAL_DBUPDATE
-VAL_EXPORTSPEC
-VAL_EXPORTINV
-VAL_NONE
-VAL_SETM
-VAL_SETG
-VAL_VAR1
-VAL_ESPEC1
-
-
-
-again, corresponding to the specific OIDs defined by the standard.
-
-
-
-The desc field contains a brief, mnemonic name for the OID in question.
-
-
-
-The function
-
-
-
- struct oident *oid_getentbyoid(int *o);
-
-
-
-takes as argument an OID, and returns a pointer to a static area
-containing an oident structure. You typically use
-this function when you receive a PDU containing an OID, and you wish
-to branch out depending on the specific OID value.
-
-
-
-The function
-
-
-
- int *oid_ent_to_oid(struct oident *ent, int *dst);
-
-
-
-Takes as argument an oident structure - in which
-the proto, oclass/, and
-value fields are assumed to be set correctly -
-and returns a pointer to a the buffer as given by dst
-containing the base
-representation of the corresponding OID. The function returns
-NULL and the array dst is unchanged if a mapping couldn't place.
-The array dst should be at least of size
-OID_SIZE.
-
-
-
-The oid_ent_to_oid() function can be used whenever
-you need to prepare a PDU containing one or more OIDs. The separation of
-the protocol element from the remainer of the
-OID-description makes it simple to write applications that can
-communicate with either Z39.50 or OSI SR-based applications.
-
-
-
-The function
-
-
-<
- oid_value oid_getvalbyname(const char *name);
-
-
-
-takes as argument a mnemonic OID name, and returns the
-/value field of the first entry in the database that
-contains the given name in its desc field.
-
-
-
-Finally, the module provides the following utility functions, whose
-meaning should be obvious:
-
-
-
- void oid_oidcpy(int *t, int *s);
- void oid_oidcat(int *t, int *s);
- int oid_oidcmp(int *o1, int *o2);
- int oid_oidlen(int *o);
-
-
-
-
-The OID module has been criticized - and perhaps rightly so
-- for needlessly abstracting the
-representation of OIDs. Other toolkits use a simple
-string-representation of OIDs with good results. In practice, we have
-found the interface comfortable and quick to work with, and it is a
-simple matter (for what it's worth) to create applications compatible with
-both ISO SR and Z39.50. Finally, the use of the /oident
-database is by no means mandatory. You can easily create your
-own system for representing OIDs, as long as it is compatible with the
-low-level integer-array representation of the ODR module.
-
-
-
-
-
-Nibble Memory
-
-
-Sometimes when you need to allocate and construct a large,
-interconnected complex of structures, it can be a bit of a pain to
-release the associated memory again. For the structures describing the
-Z39.50 PDUs and related structures, it is convenient to use the
-memory-management system of the &odr; subsystem (see
-Using ODR). However, in some circumstances
-where you might otherwise benefit from using a simple nibble memory
-management system, it may be impractical to use
-odr_malloc() and odr_reset().
-For this purpose, the memory manager which also supports the &odr; streams
-is made available in the NMEM module. The external interface to this module is given in the nmem.h file.
-
-
-
-The following prototypes are given:
-
-
-
-NMEM nmem_create(void);
-void nmem_destroy(NMEM n);
-void *nmem_malloc(NMEM n, int size);
-void nmem_reset(NMEM n);
-int nmem_total(NMEM n);
-void nmem_init(void);
-
-
-
-The nmem_create() function returns a pointer to a
-memory control handle, which can be released again by
-nmem_destroy() when no longer needed.
-The function nmem_malloc() allocates a block of
-memory of the requested size. A call to nmem_reset() or
-nmem_destroy() will release all memory allocated on
-the handle since it was created (or since the last call to
-nmem_reset(). The function
-nmem_total() returns the number of bytes currently
-allocated on the handle.
-
-
-
-
-The nibble memory pool is shared amonst threads. POSIX
-mutex'es and WIN32 Critical sections are introduced to keep the
-module thread safe. On WIN32 function nmem_init()
-initialises the Critical Section handle and should be called once before any
-other nmem function is used.
-
-
-
-
-
\ No newline at end of file
+ Supporting Tools
+
+
+ In support of the service API - primarily the ASN module, which
+ provides the programmatic interface to the Z39.50 APDUs, &yaz; contains
+ a collection of tools that support the development of applications.
+
+
+ Query Syntax Parsers
+
+
+ Since the type-1 (RPN) query structure has no direct, useful string
+ representation, every origin application needs to provide some form of
+ mapping from a local query notation or representation to a
+ Z_RPNQuery structure. Some programmers will prefer to
+ construct the query manually, perhaps using
+ odr_malloc() to simplify memory management.
+ The &yaz; distribution includes three separate, query-generating tools
+ that may be of use to you.
+
+
+ Prefix Query Format
+
+
+ Since RPN or reverse polish notation is really just a fancy way of
+ describing a suffix notation format (operator follows operands), it
+ would seem that the confusion is total when we now introduce a prefix
+ notation for RPN. The reason is one of simple laziness - it's somewhat
+ simpler to interpret a prefix format, and this utility was designed
+ for maximum simplicity, to provide a baseline representation for use
+ in simple test applications and scripting environments (like Tcl). The
+ demonstration client included with YAZ uses the PQF.
+
+
+
+
+ The PQF has been adopted by other parties developing Z39.50
+ software. It is often referred to as Prefix Query Notation
+ - PQN.
+
+
+
+ The PQF is defined by the pquery module in the YAZ library.
+ There are two sets of functions that have similar behavior. The first
+ set operates on a PQF parser handle; the second set doesn't. The
+ first set is more flexible than the second. The second set
+ is obsolete and is only provided to ensure backwards compatibility.
+
+
+ The functions in the first set all operate on a PQF parser handle:
+
+
+ #include <yaz/pquery.h>
+
+ YAZ_PQF_Parser yaz_pqf_create (void);
+
+ void yaz_pqf_destroy (YAZ_PQF_Parser p);
+
+ Z_RPNQuery *yaz_pqf_parse (YAZ_PQF_Parser p, ODR o, const char *qbuf);
+
+ Z_AttributesPlusTerm *yaz_pqf_scan (YAZ_PQF_Parser p, ODR o,
+ Odr_oid **attributeSetId, const char *qbuf);
+
+
+ int yaz_pqf_error (YAZ_PQF_Parser p, const char **msg, size_t *off);
+
+
+ A PQF parser is created and destroyed by the functions
+ yaz_pqf_create and
+ yaz_pqf_destroy respectively.
+ Function yaz_pqf_parse parses the query given
+ by string qbuf. If parsing was successful,
+ a Z39.50 RPN Query is returned which is created using ODR stream
+ o. If parsing failed, a NULL pointer is
+ returned.
+ Function yaz_pqf_scan takes a scan query in
+ qbuf. If parsing was successful, the function
+ returns a pointer to the attributes plus term and sets
+ attributeSetId to hold the attribute set for the
+ scan request - both allocated using ODR stream o.
+ If parsing failed, yaz_pqf_scan returns a NULL pointer.
+ Error information for bad queries can be obtained by a call to
+ yaz_pqf_error which returns an error code,
+ sets *msg to point to an error description,
+ and sets *off to the offset within the last
+ query where parsing failed.
+
+
+ The second set of functions is declared as follows:
+
+
+ #include <yaz/pquery.h>
+
+ Z_RPNQuery *p_query_rpn (ODR o, oid_proto proto, const char *qbuf);
+
+ Z_AttributesPlusTerm *p_query_scan (ODR o, oid_proto proto,
+ Odr_oid **attributeSetP, const char *qbuf);
+
+ int p_query_attset (const char *arg);
+
+
+ The function p_query_rpn() takes as arguments an
+ &odr; stream (see section The ODR Module)
+ to provide a memory source (the structure created is released on
+ the next call to odr_reset() on the stream), a
+ protocol identifier (one of the constants PROTO_Z3950 and
+ PROTO_SR), and
+ finally a null-terminated string holding the query string.
+
+
+ If the parse went well, p_query_rpn() returns a
+ pointer to a Z_RPNQuery structure which can be
+ placed directly into a Z_SearchRequest.
+ If parsing failed because of a syntax error, a NULL pointer is returned.
+
+
+ The function p_query_attset specifies which attribute set
+ to use if the query doesn't specify one by the
+ @attrset operator.
+ The function returns 0 if the argument is a
+ valid attribute set specifier; otherwise it returns -1.
+
+
+
+ The grammar of the PQF is as follows:
+
+
+
+ query ::= top-set query-struct.
+
+ top-set ::= [ '@attrset' string ]
+
+ query-struct ::= attr-spec | simple | complex | '@term' term-type query
+
+ attr-spec ::= '@attr' [ string ] string query-struct
+
+ complex ::= operator query-struct query-struct.
+
+ operator ::= '@and' | '@or' | '@not' | '@prox' proximity.
+
+ simple ::= result-set | term.
+
+ result-set ::= '@set' string.
+
+ term ::= string.
+
+ proximity ::= exclusion distance ordered relation which-code unit-code.
+
+ exclusion ::= '1' | '0' | 'void'.
+
+ distance ::= integer.
+
+ ordered ::= '1' | '0'.
+
+ relation ::= integer.
+
+ which-code ::= 'known' | 'private' | integer.
+
+ unit-code ::= integer.
+
+ term-type ::= 'general' | 'numeric' | 'string' | 'oid' | 'datetime' | 'null'.
+
+
+
+ You will note that the syntax above is a fairly faithful
+ representation of RPN, except for the Attribute, which has been
+ moved a step away from the term, allowing you to associate one or more
+ attributes with an entire query structure. The parser will
+ automatically apply the given attributes to each term as required.
+
+
+
+ The @attr operator is followed by an attribute specification
+ (attr-spec above). The specification consists
+ of an optional attribute set, an attribute type-value pair and
+ a sub-query. The attribute type-value pair is packed in one string:
+ an attribute type, an equals sign, and an attribute value, like this:
+ @attr 1=1003.
+ The type is always an integer but the value may be either an
+ integer or a string (if it doesn't start with a digit character).
+ A string attribute-value is encoded as a Type-1 ``complex''
+ attribute with the list of values containing the single string
+ specified, and including no semantic indicators.
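+
+ The packing of the type-value pair can be illustrated with a small,
+ self-contained sketch. It is not the YAZ parser: split_attr and
+ struct attr_pair are hypothetical names introduced here, and the only
+ rule applied is the one stated above (the type is an integer, and the
+ value is numeric when it starts with a digit).
+

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for one parsed "type=value" pair (not a YAZ type). */
struct attr_pair {
    int type;              /* attribute type, always an integer */
    int is_numeric;        /* 1 if the value starts with a digit */
    long num_value;        /* valid only when is_numeric is 1 */
    const char *str_value; /* points into the input when is_numeric is 0 */
};

/* Splits a string such as "1=1003" or "1=/book/title".
   Returns 0 on success, -1 if the input is not of the form type=value. */
int split_attr(const char *spec, struct attr_pair *out)
{
    const char *eq;
    char *end;
    long type;

    assert(spec != NULL && out != NULL);
    eq = strchr(spec, '=');
    if (eq == NULL || eq == spec || eq[1] == '\0')
        return -1;                    /* missing '=', type or value */
    if (!isdigit((unsigned char) spec[0]))
        return -1;                    /* the type must be an integer */
    type = strtol(spec, &end, 10);
    if (end != eq)
        return -1;                    /* stray characters before '=' */

    out->type = (int) type;
    out->is_numeric = isdigit((unsigned char) eq[1]) ? 1 : 0;
    out->num_value = out->is_numeric ? strtol(eq + 1, NULL, 10) : 0;
    out->str_value = out->is_numeric ? NULL : eq + 1;
    return 0;
}
```

+
+ With this sketch, 1=1003 yields type 1 with the numeric value 1003,
+ while 1=/book/title yields type 1 with the string value /book/title.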
+
+
+
+ Version 3 of the Z39.50 specification defines various encodings of terms.
+ Use @term type
+ string,
+ where type is one of: general,
+ numeric or string
+ (for InternationalString).
+ If no term type has been given, the general form
+ is used. This is the only encoding allowed in both versions 2 and 3
+ of the Z39.50 standard.
+
+
+
+ Using Proximity Operators with PQF
+
+
+ This is an advanced topic, describing how to construct
+ queries that make very specific requirements on the
+ relative location of their operands.
+ You may wish to skip this section and go straight to
+ the example PQF queries.
+
+
+
+
+ Most Z39.50 servers do not support proximity searching, or
+ support only a small subset of the full functionality that
+ can be expressed using the PQF proximity operator. Be
+ aware that the ability to express a
+ query in PQF is no guarantee that any given server will
+ be able to execute it.
+
+
+
+
+
+ The proximity operator @prox is a special
+ and more restrictive version of the conjunction operator
+ @and. Its semantics are described in
+ section 3.7.2 (Proximity) of the Z39.50 standard itself, which
+ can be read on-line.
+
+
+
+ In PQF, the proximity operation is represented by a sequence
+ of the form
+
+@prox exclusion distance ordered relation which-code unit-code
+
+ in which the meanings of the parameters are as described in
+ the standard, and they can take the following values:
+
+ exclusion
+ 0 = false (i.e. the proximity condition specified by the
+ remaining parameters must be satisfied) or
+ 1 = true (the proximity condition specified by the
+ remaining parameters must not be
+ satisfied).
+
+ distance
+ An integer specifying the difference between the locations
+ of the operands: e.g. two adjacent words would have
+ distance=1 since their locations differ by one unit.
+
+ ordered
+ 1 = ordered (the operands must occur in the order the
+ query specifies them) or
+ 0 = unordered (they may appear in either order).
+
+ relation
+ Recognised values are
+ 1 (lessThan),
+ 2 (lessThanOrEqual),
+ 3 (equal),
+ 4 (greaterThanOrEqual),
+ 5 (greaterThan) and
+ 6 (notEqual).
+
+ which-code
+ known
+ or
+ k
+ (the unit-code parameter is taken from the well-known list
+ of alternatives described below) or
+ private
+ or
+ p
+ (the unit-code parameter has semantics specific to an
+ out-of-band agreement such as a profile).
+
+ unit-code
+ If the which-code parameter is known
+ then the recognised values are
+ 1 (character),
+ 2 (word),
+ 3 (sentence),
+ 4 (paragraph),
+ 5 (section),
+ 6 (chapter),
+ 7 (document),
+ 8 (element),
+ 9 (subelement),
+ 10 (elementType) and
+ 11 (byte).
+ If which-code is private then the
+ acceptable values are determined by the profile.
+
+
+ (The numeric values of the relation and well-known unit-code
+ parameters are taken straight from
+ the ASN.1 of the proximity structure in the standard.)
+
+
+
+ PQF queries
+
+
+ PQF queries using simple terms
+
+
+ dylan
+
+ "bob dylan"
+
+
+
+
+ PQF boolean operators
+
+
+ @or "dylan" "zimmerman"
+
+ @and @or dylan zimmerman when
+
+ @and when @or dylan zimmerman
+
+
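+
+ To make the operand order of the prefix notation concrete, the
+ following sketch rewrites an already tokenized query into
+ parenthesised infix form. It is an illustration only and not part of
+ YAZ: pqf_to_infix and its helpers are hypothetical names, only the
+ operators @and, @or and @not are handled, and every other token is
+ treated as a plain term.
+

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Appends s to the NUL-terminated buffer out; returns -1 if it does not fit. */
static int append(char *out, size_t outsz, const char *s)
{
    if (strlen(out) + strlen(s) + 1 > outsz)
        return -1;
    strcat(out, s);
    return 0;
}

/* Consumes one query structure starting at tok[*pos]. */
static int convert(const char *const *tok, int ntok, int *pos,
                   char *out, size_t outsz)
{
    const char *t;

    if (*pos >= ntok)
        return -1;                        /* operator is missing an operand */
    t = tok[(*pos)++];
    if (strcmp(t, "@and") == 0 || strcmp(t, "@or") == 0 ||
        strcmp(t, "@not") == 0) {
        /* A binary operator is followed by exactly two query structures. */
        if (append(out, outsz, "(") != 0 ||
            convert(tok, ntok, pos, out, outsz) != 0 ||
            append(out, outsz, " ") != 0 ||
            append(out, outsz, t + 1) != 0 ||   /* operator name without '@' */
            append(out, outsz, " ") != 0 ||
            convert(tok, ntok, pos, out, outsz) != 0 ||
            append(out, outsz, ")") != 0)
            return -1;
        return 0;
    }
    return append(out, outsz, t);         /* a plain term */
}

/* Returns 0 and fills out on success, -1 on a malformed query or overflow. */
int pqf_to_infix(const char *const *tok, int ntok, char *out, size_t outsz)
{
    int pos = 0;

    assert(tok != NULL && out != NULL && outsz > 0);
    out[0] = '\0';
    if (convert(tok, ntok, &pos, out, outsz) != 0)
        return -1;
    return pos == ntok ? 0 : -1;          /* leftover tokens are an error */
}
```

+
+ For the tokens of @and @or dylan zimmerman when the sketch produces
+ ((dylan or zimmerman) and when); for @and when @or dylan zimmerman it
+ produces (when and (dylan or zimmerman)).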
+
+
+ PQF references to result sets
+
+
+ @set Result-1
+
+ @and @set seta @set setb
+
+
+
+
+ Attributes for terms
+
+
+ @attr 1=4 computer
+
+ @attr 1=4 @attr 4=1 "self portrait"
+
+ @attrset exp1 @attr 1=1 CategoryList
+
+ @attr gils 1=2008 Copenhagen
+
+ @attr 1=/book/title computer
+
+
+
+
+ PQF Proximity queries
+
+
+ @prox 0 3 1 2 k 2 dylan zimmerman
+
+
+ Here the parameters 0, 3, 1, 2, k and 2 represent exclusion,
+ distance, ordered, relation, which-code and unit-code, in that
+ order. So:
+
+
+ exclusion = 0: the proximity condition must hold
+
+
+ distance = 3: the terms must be three units apart
+
+
+ ordered = 1: they must occur in the order they are specified
+
+
+ relation = 2: lessThanOrEqual (to the distance of 3 units)
+
+
+ which-code is ``known'', so the standard unit-codes are used
+
+
+ unit-code = 2: word.
+
+
+ So the whole proximity query means that the words
+ dylan and zimmerman must
+ both occur in the record, in that order, differing in position
+ by three or fewer words (i.e. with two or fewer words between
+ them.) The query would find ``Bob Dylan, aka. Robert
+ Zimmerman'', but not ``Bob Dylan, born as Robert Zimmerman''
+ since the distance in this case is four.
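+
+ The reasoning above can be checked with a small sketch that applies
+ the proximity parameters to two operand positions. This is not code
+ from YAZ or from any server: prox_match and the enum are hypothetical
+ names, positions are assumed to be counted in the chosen unit (words
+ here), and treating exclusion as a plain negation of the whole test is
+ an assumption of this sketch.
+

```c
#include <assert.h>
#include <stdlib.h>

/* Relation codes as listed for the relation parameter. */
enum prox_relation {
    PROX_LESS_THAN = 1,
    PROX_LESS_THAN_OR_EQUAL = 2,
    PROX_EQUAL = 3,
    PROX_GREATER_THAN_OR_EQUAL = 4,
    PROX_GREATER_THAN = 5,
    PROX_NOT_EQUAL = 6
};

/* Returns 1 if operands at unit positions pos_a and pos_b satisfy the
   proximity test, 0 otherwise. */
int prox_match(int pos_a, int pos_b, int exclusion, int distance,
               int ordered, int relation)
{
    int diff = pos_b - pos_a;
    int d = abs(diff);
    int ok = 0;

    assert(relation >= PROX_LESS_THAN && relation <= PROX_NOT_EQUAL);

    if (!(ordered && diff < 0)) {     /* ordered: first operand must come first */
        switch (relation) {
        case PROX_LESS_THAN:             ok = d <  distance; break;
        case PROX_LESS_THAN_OR_EQUAL:    ok = d <= distance; break;
        case PROX_EQUAL:                 ok = d == distance; break;
        case PROX_GREATER_THAN_OR_EQUAL: ok = d >= distance; break;
        case PROX_GREATER_THAN:          ok = d >  distance; break;
        case PROX_NOT_EQUAL:             ok = d != distance; break;
        }
    }
    /* Assumption: exclusion simply inverts the outcome. */
    return exclusion ? !ok : ok;
}
```

+
+ Numbering the words of ``Bob Dylan, aka. Robert Zimmerman'' from 1,
+ dylan is at position 2 and zimmerman at position 5, so
+ prox_match(2, 5, 0, 3, 1, 2) returns 1. In ``Bob Dylan, born as Robert
+ Zimmerman'' zimmerman is at position 6, and prox_match(2, 6, 0, 3, 1, 2)
+ returns 0.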
+
+
+
+
+ PQF specification of search term type
+
+
+ @term string "a UTF-8 string, maybe?"
+
+
+
+
+ PQF mixed queries
+
+
+ @or @and bob dylan @set Result-1
+
+ @attr 4=1 @and @attr 1=1 "bob dylan" @attr 1=4 "slow train coming"
+
+ @and @attr 2=4 @attr gils 1=2038 -114 @attr 2=2 @attr gils 1=2039 -109
+
+
+
+ The last of these examples is a spatial search: in
+ the GILS attribute set,
+ access point
+ 2038 indicates West Bounding Coordinate and
+ 2039 indicates East Bounding Coordinate,
+ so the query is for areas extending from -114 degrees
+ to no more than -109 degrees.
+
+
+
+
+
+
+ CCL
+
+
+ Not all users enjoy typing in prefix query structures and numerical
+ attribute values, even in a minimalistic test client. In the library
+ world, the more intuitive Common Command Language - CCL (ISO 8777)
+ has enjoyed some popularity - especially before the widespread
+ availability of graphical interfaces. It is still useful in
+ applications where you for some reason or other need to provide a
+ symbolic language for expressing boolean query structures.
+
+
+
+ CCL Syntax
+
+
+ The CCL parser obeys the following grammar for the FIND argument.
+ The syntax is annotated by comments on the lines prefixed by
+ --.
+
+
+
+ CCL-Find ::= CCL-Find Op Elements
+ | Elements.
+
+ Op ::= "and" | "or" | "not"
+ -- The above means that Elements are separated by boolean operators.
+
+ Elements ::= '(' CCL-Find ')'
+ | Set
+ | Terms
+ | Qualifiers Relation Terms
+ | Qualifiers Relation '(' CCL-Find ')'
+ | Qualifiers '=' string '-' string
+ -- Elements is either a recursive definition, a result set reference, a
+ -- list of terms, qualifiers followed by terms, qualifiers followed
+ -- by a recursive definition or qualifiers in a range (lower - upper).
+
+ Set ::= 'set' = string
+ -- Reference to a result set
+
+ Terms ::= Terms Prox Term
+ | Term
+ -- Proximity of terms.
+
+ Term ::= Term string
+ | string
+ -- This basically means that a term may include a blank
+
+ Qualifiers ::= Qualifiers ',' string
+ | string
+ -- Qualifiers is a list of strings separated by comma
+
+ Relation ::= '=' | '>=' | '<=' | '<>' | '>' | '<'
+ -- Relational operators. This really doesn't follow the ISO8777
+ -- standard.
+
+ Prox ::= '%' | '!'
+ -- Proximity operator
+
+
+
+
+ CCL queries
+
+ The following queries are all valid:
+
+
+
+ dylan
+
+ "bob dylan"
+
+ dylan or zimmerman
+
+ set=1
+
+ (dylan and bob) or set=1
+
+
+
+ Assuming that the qualifiers ti,
+ au
+ and date are defined we may use:
+
+
+
+ ti=self portrait
+
+ au=(bob dylan and slow train coming)
+
+ date>1980 and (ti=((self portrait)))
+
+
+
+
+
+
+ CCL Qualifiers
+
+
+ Qualifiers are used to direct the search to a particular searchable
+ index, such as title (ti) and author indexes (au). The CCL standard
+ itself doesn't specify a particular set of qualifiers, but it does
+ suggest a few short-hand notations. You can customize the CCL parser
+ to support a particular set of qualifiers to reflect the current target
+ profile. Traditionally, a qualifier would map to a particular
+ use-attribute within the BIB-1 attribute set. It is also
+ possible to set other attributes, such as the structure
+ attribute.
+
+
+
+ A CCL profile is a set of predefined CCL qualifiers that may be
+ read from a file or set in the CCL API.
+ The YAZ client reads its CCL qualifiers from a file named
+ default.bib. There are four types of
+ lines in a CCL profile: qualifier specification,
+ qualifier alias, comments and directives.
+
+
+ Qualifier specification
+
+ A qualifier specification is of the form:
+
+
+
+ qualifier-name
+ [attributeset,]type=val
+ [attributeset,]type=val ...
+
+
+
+ where qualifier-name is the name of the
+ qualifier to be used (eg. ti),
+ type is the attribute type in the attribute
+ set (Bib-1 is used if no attribute set is given) and
+ val is the attribute value.
+ The type can be specified either as an
+ integer or as a single letter:
+ u for use,
+ r for relation, p for position,
+ s for structure, t for truncation
+ or c for completeness.
+ The attributes for the special qualifier name term
+ are used when no CCL qualifier is given in a query.
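+
+ The two ways of writing the type can be folded into one lookup, as in
+ the sketch below. It is not the CCL parser from YAZ: ccl_attr_type is
+ a hypothetical name, and the numbers 1 to 6 are the Bib-1 attribute
+ types for use, relation, position, structure, truncation and
+ completeness.
+

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>

/* Returns the numeric attribute type for a type token from a qualifier
   specification, or -1 if the token is neither an integer nor one of the
   single letters u, r, p, s, t, c. */
int ccl_attr_type(const char *type)
{
    assert(type != NULL);

    if (isdigit((unsigned char) type[0])) {
        char *end;
        long n = strtol(type, &end, 10);
        /* Accept only a complete, positive integer that fits in an int. */
        return (*end == '\0' && n > 0 && n <= INT_MAX) ? (int) n : -1;
    }
    if (type[0] == '\0' || type[1] != '\0')
        return -1;                    /* empty, or more than one letter */

    switch (type[0]) {
    case 'u': return 1;               /* use */
    case 'r': return 2;               /* relation */
    case 'p': return 3;               /* position */
    case 's': return 4;               /* structure */
    case 't': return 5;               /* truncation */
    case 'c': return 6;               /* completeness */
    default:  return -1;
    }
}
```

+
+ Under this mapping the specifications s=1 and 4=1 select the same
+ attribute type, since both s and 4 resolve to type 4.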
+
+ Common Bib-1 attributes
+
+
+
+
+
+ Type
+ Description
+
+
+
+
+ u=value
+
+ Use attribute (1). Common use attributes are
+ 1 Personal-name, 4 Title, 7 ISBN, 8 ISSN, 30 Date,
+ 62 Subject, 1003 Author, 1016 Any. Specify value
+ as an integer.
+
+
+
+
+ r=value
+
+ Relation attribute (2). Common values are
+ 1 <, 2 <=, 3 =, 4 >=, 5 >, 6 <>,
+ 100 phonetic, 101 stem, 102 relevance, 103 always matches.
+
+
+
+
+ p=value
+
+ Position attribute (3). Values: 1 first in field, 2
+ first in any subfield, 3 any position in field.
+
+
+
+
+ s=value
+
+ Structure attribute (4). Values: 1 phrase, 2 word,
+ 3 key, 4 year, 5 date, 6 word list, 100 date (un),
+ 101 name (norm), 102 name (un), 103 structure, 104 urx,
+ 105 free-form-text, 106 document-text, 107 local-number,
+ 108 string, 109 numeric string.
+
+
+
+
+ t=value
+
+ Truncation attribute (5). Values: 1 right, 2 left,
+ 3 left & right, 100 none, 101 process #, 102 regular-1,
+ 103 regular-2, 104 CCL.
+
+
+
+
+ c=value
+
+ Completeness attribute (6). Values: 1 incomplete subfield,
+ 2 complete subfield, 3 complete field.
+
+
+
+
+
+
+
+
+ Refer to or the complete
+ list of Bib-1 attributes
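+
+ As a rough illustration of the numeric form of a qualifier attribute,
+ the following C sketch maps the single-letter types listed above to
+ their Bib-1 attribute type numbers and parses one type=val pair.
+ The function names are hypothetical and are not part of the CCL API;
+ attribute-set prefixes and the non-numeric special values are
+ deliberately not handled.
+

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Map a single-letter CCL attribute type to its Bib-1 attribute type
   number. Returns 0 for an unknown letter. */
int bib1_type_from_letter(char letter)
{
    switch (letter) {
    case 'u': return 1; /* use */
    case 'r': return 2; /* relation */
    case 'p': return 3; /* position */
    case 's': return 4; /* structure */
    case 't': return 5; /* truncation */
    case 'c': return 6; /* completeness */
    default:  return 0;
    }
}

/* Parse one numeric "type=val" pair such as "u=4" or "1=4".
   Returns 1 on success and 0 on malformed input. */
int parse_attr_pair(const char *spec, int *type, int *value)
{
    char *end;
    long t;
    long v;

    assert(spec && type && value);
    if (isdigit((unsigned char) spec[0])) {
        t = strtol(spec, &end, 10);      /* integer type, e.g. "1=4" */
    } else {
        t = bib1_type_from_letter(spec[0]);
        end = (char *) spec + 1;         /* skip the single letter */
    }
    if (t <= 0 || *end != '=')
        return 0;
    if (!isdigit((unsigned char) end[1]))
        return 0;                        /* e.g. "s=pw" is not numeric */
    v = strtol(end + 1, &end, 10);
    if (*end != '\0')
        return 0;
    *type = (int) t;
    *value = (int) v;
    return 1;
}
```

+
+ With this sketch, u=4 yields type 1 and value 4, the same pair
+ that the integer form 1=4 produces.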
+
+
+ It is also possible to specify non-numeric attribute values,
+ which are used in combination with certain types.
+ The special combinations are:
+
+
+ Special attribute combos
+
+
+
+
+
+ Name
+ Description
+
+
+
+
+ s=pw
+ The structure is set to either word or phrase depending
+ on the number of tokens in a term (phrase-word).
+
+
+
+ s=al
+ Each token in the term is ANDed. (and-list).
+ This does not set the structure at all.
+
+
+
+ s=ol
+ Each token in the term is ORed. (or-list).
+ This does not set the structure at all.
+
+
+
+ r=o
+ Allows ranges and the operators greater-than, less-than, ...
+ equals.
+ This sets Bib-1 relation attribute accordingly (relation
+ ordered). A query construct is only treated as a range if
+ dash is used and that is surrounded by white-space. So
+ -1980 is treated as term
+ "-1980" not <= 1980.
+ If - 1980 is used, however, that is
+ treated as a range.
+
+
+
+ r=r
+ Similar to r=o but assumes that terms
+ are non-negative (not prefixed with -).
+ Thus, a dash will always be treated as a range.
+ The construct 1980-1990 is
+ treated as a range with r=r but as a
+ single term "1980-1990" with
+ r=o. The special attribute
+ r=r is available in YAZ 2.0.24 or later.
+
+
+
+ t=l
+ Allows term to be left-truncated.
+ If term is of the form ?x, the resulting
+ Type-1 term is x and truncation is left.
+
+
+
+ t=r
+ Allows term to be right-truncated.
+ If term is of the form x?, the resulting
+ Type-1 term is x and truncation is right.
+
+
+
+ t=n
+ If term does not include ?, the
+ truncation attribute is set to none (100).
+
+
+
+ t=b
+ Allows term to be both left & right truncated.
+ If term is of the form ?x?, the
+ resulting term is x and truncation is
+ set to both left & right.
+
+
+
+
+
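+
+ The truncation rules in the table above can be sketched in C as
+ follows. This is a simplified illustration, not the parser's actual
+ code: it behaves as if t=l, t=r, t=b and t=n were all enabled, and the
+ function name split_truncation is hypothetical.
+

```c
#include <assert.h>
#include <string.h>

/* Bib-1 truncation attribute values (type 5), as listed earlier. */
enum { TRUNC_RIGHT = 1, TRUNC_LEFT = 2, TRUNC_BOTH = 3, TRUNC_NONE = 100 };

/* Strip a leading and/or trailing truncation character from term, copy
   what remains to out, and return the matching truncation value.
   Returns -1 if out (of size outsz) is too small. */
int split_truncation(const char *term, char trunc_char,
                     char *out, size_t outsz)
{
    size_t len;
    size_t body;
    int left;
    int right;

    assert(term && out);
    len = strlen(term);
    /* Never strip so much that the term itself becomes empty. */
    right = len >= 2 && term[len - 1] == trunc_char;
    left = len - (size_t) right >= 2 && term[0] == trunc_char;
    body = len - (size_t) left - (size_t) right;
    if (body + 1 > outsz)
        return -1;
    memcpy(out, term + left, body);
    out[body] = '\0';
    if (left && right)
        return TRUNC_BOTH;
    if (left)
        return TRUNC_LEFT;
    return right ? TRUNC_RIGHT : TRUNC_NONE;
}
```

+
+ The truncation character is passed in because it can be changed with
+ the truncation directive of a CCL profile.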
+
+ CCL profile
+
+ Consider the following definition:
+
+
+
+ ti u=4 s=1
+ au u=1 s=1
+ term s=105
+ ranked r=102
+ date u=30 r=o
+
+
+ ti and au both set
+ structure attribute to phrase (s=1).
+ ti
+ sets the use-attribute to 4. au sets the
+ use-attribute to 1.
+ When no qualifiers are used in the query the structure-attribute is
+ set to free-form-text (105) (rule for term).
+ The date sets the relation attribute to
+ the relation used in the CCL query and sets the use attribute
+ to 30 (Bib-1 Date).
+
+
+ You can combine attributes. To search for "ranked title" you
+ can do
+
+ ti,ranked=knuth computer
+
+ which will set relation=ranked, use=title, structure=phrase.
+
+
+ Query
+
+ date > 1980
+
+ is a valid query. But
+
+ ti > 1980
+
+ is invalid.
+
+
+
+
+ Qualifier alias
+
+ A qualifier alias is of the form:
+
+
+ q
+ q1 q2 ...
+
+
+ which declares q to
+ be an alias for q1,
+ q2... such that the CCL
+ query q=x is equivalent to
+ q1=x or q2=x or ....
+
+
+
+
+ Comments
+
+ Lines consisting only of white space and lines that begin with
+ the character # are treated as comments.
+
+
+
+
+ Directives
+
+ Directive specifications take the form
+
+ @directive value
+
+
+ CCL directives
+
+
+
+
+
+
+ Name
+ Description
+ Default
+
+
+
+
+ truncation
+ Truncation character
+ ?
+
+
+ field
+ Specifies how multiple fields are to be
+ combined. There are two modes: or:
+ multiple qualifier fields are ORed,
+ merge: attributes for the qualifier
+ fields are merged and assigned to one term.
+
+ merge
+
+
+ case
+ Specifies whether CCL operators and qualifiers should be
+ compared with case sensitivity or not. Specify 0 for
+ case sensitive; 1 for case insensitive.
+ 0
+
+
+
+ and
+ Specifies token for CCL operator AND.
+ and
+
+
+
+ or
+ Specifies token for CCL operator OR.
+ or
+
+
+
+ not
+ Specifies token for CCL operator NOT.
+ not
+
+
+
+ set
+ Specifies token for CCL operator SET.
+ set
+
+
+
+
+
+
+
+ CCL API
+
+ All public definitions can be found in the header file
+ ccl.h. A profile identifier is of type
+ CCL_bibset. A profile must be created with the call
+ to the function ccl_qual_mk which returns a profile
+ handle of type CCL_bibset.
+
+
+
+ To read a file containing qualifier definitions the function
+ ccl_qual_file may be convenient. This function
+ takes an already opened FILE handle pointer as
+ argument along with a CCL_bibset handle.
+
+
+
+ To parse a simple string with a FIND query use the function
+
+
+struct ccl_rpn_node *ccl_find_str (CCL_bibset bibset, const char *str,
+ int *error, int *pos);
+
+
+ which takes the CCL profile (bibset) and query
+ (str) as input. Upon successful completion the RPN
+ tree is returned. If an error occurs, such as a syntax error, the integer
+ pointed to by error holds the error code and
+ pos holds the offset in the query string at which
+ parsing failed.
+
+
+
+ An English representation of the error may be obtained by calling
+ the ccl_err_msg function. The error codes are
+ listed in ccl.h.
+
+
+
+ To convert the CCL RPN tree (type
+ struct ccl_rpn_node *)
+ to the Z_RPNQuery of YAZ the function ccl_rpn_query
+ must be used. This function, which is part of YAZ, is implemented in
+ yaz-ccl.c.
+ After calling this function the CCL RPN tree is probably no longer
+ needed. The function ccl_rpn_delete destroys the CCL RPN tree.
+
+
+
+ A CCL profile may be destroyed by calling the
+ ccl_qual_rm function.
+
+
+
+ The token names for the CCL operators may be changed by setting the
+ globals (all type char *)
+ ccl_token_and, ccl_token_or,
+ ccl_token_not and ccl_token_set.
+ An operator may have aliases, i.e. there may be more than one name for
+ the operator. To do this, separate each alias with a space character.
+
+
+
+ CQL
+
+ CQL
+ - Common Query Language - was defined for the
+ SRU protocol.
+ In many ways CQL has a similar syntax to CCL,
+ but its objective is different. Where CCL aims to be
+ an end-user language, CQL is the protocol
+ query language for SRU.
+
+
+
+ If you are new to CQL, read the
+ Gentle Introduction.
+
+
+
+ The CQL parser in &yaz; provides the following:
+
+
+
+ It parses and validates a CQL query.
+
+
+
+
+ It generates a C structure that allows you to convert
+ a CQL query to some other query language, such as SQL.
+
+
+
+
+ The parser converts a valid CQL query to PQF, thus providing a
+ way to use CQL for both SRU servers and Z39.50 targets at the
+ same time.
+
+
+
+
+ The parser converts CQL to
+ XCQL.
+ XCQL is an XML representation of CQL.
+ XCQL is part of the SRU specification. However, since SRU
+ supports CQL only, we don't expect XCQL to be widely used.
+ Furthermore, CQL has the advantage over XCQL that it is
+ easy to read.
+
+
+
+
+ CQL parsing
+
+ A CQL parser is represented by the CQL_parser
+ handle. Its contents should be considered &yaz; internal (private).
+
+#include <yaz/cql.h>
+
+typedef struct cql_parser *CQL_parser;
+
+CQL_parser cql_parser_create(void);
+void cql_parser_destroy(CQL_parser cp);
+
+ A parser is created by cql_parser_create and
+ is destroyed by cql_parser_destroy.
+
+
+ To parse a CQL query string, the following function
+ is provided:
+
+int cql_parser_string(CQL_parser cp, const char *str);
+
+ The function cql_parser_string parses the CQL
+ query string str.
+ If the query was valid (no syntax errors), then zero is returned;
+ otherwise -1 is returned to indicate a syntax error.
+
+
+
+int cql_parser_stream(CQL_parser cp,
+ int (*getbyte)(void *client_data),
+ void (*ungetbyte)(int b, void *client_data),
+ void *client_data);
+
+int cql_parser_stdio(CQL_parser cp, FILE *f);
+
+ The functions cql_parser_stream and
+ cql_parser_stdio parse a CQL query
+ - just like cql_parser_string.
+ The only difference is that the CQL query can be
+ fed to the parser in different ways.
+ The function cql_parser_stream uses a generic
+ byte stream as input. The function cql_parser_stdio
+ uses a FILE handle which is opened for reading.
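+
+ A pair of callbacks with the getbyte and ungetbyte signatures shown
+ above could be written as in this sketch, which reads from a
+ NUL-terminated string. The struct str_stream type and the function
+ names are hypothetical, and returning 0 at end of input is an
+ assumption about what the parser expects, not something stated here.
+

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical client_data: a cursor over a NUL-terminated query. */
struct str_stream {
    const char *buf;
    size_t pos;
};

/* getbyte callback: return the next byte, or 0 at end of input
   (assumed end-of-input convention). */
int str_getbyte(void *client_data)
{
    struct str_stream *s = client_data;

    assert(s && s->buf);
    if (s->buf[s->pos] == '\0')
        return 0;
    return (unsigned char) s->buf[s->pos++];
}

/* ungetbyte callback: push the most recently read byte back. */
void str_ungetbyte(int b, void *client_data)
{
    struct str_stream *s = client_data;

    assert(s);
    if (b != 0 && s->pos > 0)
        s->pos--;
}
```

+
+ Such callbacks would be handed to cql_parser_stream together with a
+ pointer to the struct str_stream as client_data.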
+
+
+
+ CQL tree
+
+ If the query string is valid, the CQL parser
+ generates a tree representing the structure of the
+ CQL query.
+
+
+
+struct cql_node *cql_parser_result(CQL_parser cp);
+
+ cql_parser_result returns
+ a pointer to the root node of the resulting tree.
+
+
+ Each node in a CQL tree is represented by a
+ struct cql_node.
+ It is defined as follows:
+
+#define CQL_NODE_ST 1
+#define CQL_NODE_BOOL 2
+struct cql_node {
+ int which;
+ union {
+ struct {
+ char *index;
+ char *index_uri;
+ char *term;
+ char *relation;
+ char *relation_uri;
+ struct cql_node *modifiers;
+ } st;
+ struct {
+ char *value;
+ struct cql_node *left;
+ struct cql_node *right;
+ struct cql_node *modifiers;
+ } boolean;
+ } u;
+};
+
+ There are two node types: search term (ST) and boolean (BOOL).
+ A modifier is treated as a search term too.
+
+
+ The search term node has six members:
+
+
+
+ index: index for search term.
+ If an index is unspecified for a search term,
+ index will be NULL.
+
+
+
+
+ index_uri: index URI for search term
+ or NULL if none could be resolved for the index.
+
+
+
+
+ term: the search term itself.
+
+
+
+
+ relation: relation for search term.
+
+
+
+
+ relation_uri: relation URI for search term.
+
+
+
+
+ modifiers: relation modifiers for search
+ term. The modifiers are a list of cql_nodes,
+ each of type ST.
+
+
+
+
+
+
+ The boolean node represents both and,
+ or, not as well as
+ proximity.
+
+
+
+ left and right: the left
+ and right operands, respectively.
+
+
+
+
+ modifiers: proximity arguments.
+
+
+
+
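+
+ To show how the two node types fit together, the sketch below walks a
+ CQL tree and counts its search terms. The struct is a local copy of the
+ declaration shown above so that the example stands alone; a real
+ program would include yaz/cql.h instead. The function
+ count_search_terms is hypothetical, and modifiers are not followed.
+

```c
#include <assert.h>
#include <stddef.h>

/* Local copy of the declarations shown above (normally from yaz/cql.h). */
#define CQL_NODE_ST 1
#define CQL_NODE_BOOL 2
struct cql_node {
    int which;
    union {
        struct {
            char *index;
            char *index_uri;
            char *term;
            char *relation;
            char *relation_uri;
            struct cql_node *modifiers;
        } st;
        struct {
            char *value;
            struct cql_node *left;
            struct cql_node *right;
            struct cql_node *modifiers;
        } boolean;
    } u;
};

/* Count the search term (ST) nodes reachable through boolean nodes. */
int count_search_terms(const struct cql_node *cn)
{
    if (!cn)
        return 0;
    if (cn->which == CQL_NODE_ST)
        return 1;
    /* Only two node types are described, so anything else is a bug. */
    assert(cn->which == CQL_NODE_BOOL);
    return count_search_terms(cn->u.boolean.left)
        + count_search_terms(cn->u.boolean.right);
}
```

+
+ The same recursive shape applies to any conversion of the tree into
+ another query language.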
+
+
+ CQL to PQF conversion
+
+ Conversion to PQF (and Z39.50 RPN) is complicated by the fact
+ that the resulting RPN depends on the Z39.50 target
+ capabilities (combinations of supported attributes).
+ In addition, CQL and SRU operate on index prefixes
+ (URI or strings), whereas the RPN uses Object Identifiers
+ for attribute sets.
+
+
+ The CQL library of &yaz; defines a cql_transform_t
+ type. It represents a particular mapping between CQL and RPN.
+ This handle is created and destroyed by the functions:
+
+cql_transform_t cql_transform_open_FILE (FILE *f);
+cql_transform_t cql_transform_open_fname(const char *fname);
+void cql_transform_close(cql_transform_t ct);
+
+ The first two functions create a transformation handle from
+ either an already open FILE or from a filename respectively.
+
+
+ The handle is destroyed by cql_transform_close,
+ after which the handle must not be referenced again.
+
+
+ When a cql_transform_t handle has been created
+ you can convert to RPN.
+
+int cql_transform_buf(cql_transform_t ct,
+ struct cql_node *cn, char *out, int max);
+
+ This function converts the CQL tree cn
+ using handle ct.
+ For the resulting PQF, you supply a buffer out
+ which must be able to hold at least max
+ characters.
+
+
+ If conversion failed, cql_transform_buf
+ returns a non-zero SRU error code; otherwise zero is returned
+ (conversion successful). The meanings of the numeric error
+ codes are listed in the SRU specifications at
+
+
+
+ If conversion fails, more information can be obtained by calling
+
+int cql_transform_error(cql_transform_t ct, char **addinfop);
+
+ This function returns the most recently returned numeric
+ error-code and sets the string-pointer at
+ *addinfop to point to a string containing
+ additional information about the error that occurred: for
+ example, if the error code is 15 (``Illegal or unsupported context
+ set''), the additional information is the name of the requested
+ context set that was not recognised.
+
+
+ The SRU error-codes may be translated into brief human-readable
+ error messages using
+
+const char *cql_strerror(int code);
+
+
+
+ If you wish to be able to produce a PQF result in a different
+ way, there are two alternatives.
+
+void cql_transform_pr(cql_transform_t ct,
+ struct cql_node *cn,
+ void (*pr)(const char *buf, void *client_data),
+ void *client_data);
+
+int cql_transform_FILE(cql_transform_t ct,
+ struct cql_node *cn, FILE *f);
+
+ The former function produces output to a user-defined
+ output stream. The latter writes the result to an already
+ open FILE.
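+
+ A callback matching the pr parameter of cql_transform_pr might simply
+ append each fragment to a buffer, as in this sketch. The struct
+ pqf_sink type, its fixed size of 256 bytes and the function name are
+ assumptions made for the example.
+

```c
#include <assert.h>
#include <string.h>

/* Hypothetical client_data: a fixed-size buffer that collects output. */
struct pqf_sink {
    char buf[256];
    size_t len;
    int overflow;   /* set to 1 if a fragment did not fit */
};

/* Matches the type void (*pr)(const char *buf, void *client_data). */
void pqf_sink_pr(const char *buf, void *client_data)
{
    struct pqf_sink *sink = client_data;
    size_t n;

    assert(buf && sink);
    n = strlen(buf);
    if (sink->len + n >= sizeof sink->buf) {
        sink->overflow = 1;     /* drop the fragment, remember the error */
        return;
    }
    memcpy(sink->buf + sink->len, buf, n + 1);  /* copies the NUL too */
    sink->len += n;
}
```

+
+ The sink would be zero-initialized, passed as client_data, and
+ inspected once the conversion has returned.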
+
+
+
+ Specification of CQL to RPN mappings
+
+ The file supplied to functions
+ cql_transform_open_FILE,
+ cql_transform_open_fname follows
+ a structure found in many Unix utilities.
+ It consists of mapping specifications - one per line.
+ Lines starting with # are ignored (comments).
+
+
+ Each line is of the form
+
+ CQL pattern = RPN equivalent
+
+
+
+ An RPN pattern is a simple attribute list. Each attribute pair
+ takes the form:
+
+ [set] type=value
+
+ The attribute set is optional.
+ The type is the attribute type,
+ value the attribute value.
+
+
+ The character * (asterisk) has special meaning
+ when used in the RPN pattern.
+ Each occurrence of * is substituted with the
+ CQL matching name (index, relation, qualifier etc).
+ This facility can be used to copy a CQL name verbatim to the RPN result.
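+
+ The effect of the asterisk can be pictured with a small substitution
+ routine. The sketch below is only an illustration of the rule just
+ described; expand_rpn_pattern is a hypothetical name and is not how
+ the library necessarily implements it.
+

```c
#include <assert.h>
#include <string.h>

/* Copy pattern to out, replacing every '*' with name.
   Returns 0 on success, -1 if out (of size outsz) is too small. */
int expand_rpn_pattern(const char *pattern, const char *name,
                       char *out, size_t outsz)
{
    size_t used = 0;
    size_t name_len;

    assert(pattern && name && out);
    name_len = strlen(name);
    for (; *pattern; pattern++) {
        if (*pattern == '*') {
            if (used + name_len >= outsz)
                return -1;
            memcpy(out + used, name, name_len);
            used += name_len;
        } else {
            if (used + 1 >= outsz)
                return -1;
            out[used++] = *pattern;
        }
    }
    if (used >= outsz)
        return -1;
    out[used] = '\0';
    return 0;
}
```

+
+ For instance, an RPN pattern of 1=* matched by the CQL index name
+ title expands to 1=title.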
+
+
+ The following CQL patterns are recognized:
+
+
+ index.set.name
+
+
+
+ This pattern is invoked when a CQL index, such as
+ dc.title is converted. set
+ and name are the context set and index
+ name respectively.
+ Typically, the RPN specifies an equivalent use attribute.
+
+
+ For terms not bound by an index the pattern
+ index.cql.serverChoice is used.
+ Here, the prefix cql is defined as
+ http://www.loc.gov/zing/cql/cql-indexes/v1.0/.
+ If this pattern is not defined, the mapping will fail.
+
+
+ The pattern,
+ index.set.*
+ is used when no other index pattern is matched.
+
+
+
+
+ qualifier.set.name
+ (DEPRECATED)
+
+
+
+ For backwards compatibility, this is recognised as a synonym of
+ index.set.name
+
+
+
+
+ relation.relation
+
+
+
+ This pattern specifies how a CQL relation is mapped to RPN.
+ relation is the name of the relation
+ operator. Since = is used as
+ separator between CQL pattern and RPN, CQL relations
+ including = cannot be
+ used directly. To avoid a conflict, the names
+ ge,
+ eq,
+ le,
+ must be used for CQL operators, greater-than-or-equal,
+ equal, less-than-or-equal respectively.
+ The RPN pattern is supposed to include a relation attribute.
+
+
+ For terms not bound by a relation, the pattern
+ relation.scr is used. If the pattern
+ is not defined, the mapping will fail.
+
+
+ The special pattern, relation.* is used
+ when no other relation pattern is matched.
+
+
+
+
+
+ relationModifier.mod
+
+
+
+ This pattern specifies how a CQL relation modifier is mapped to RPN.
+ The RPN pattern is usually a relation attribute.
+
+
+
+
+
+ structure.type
+
+
+
+ This pattern specifies how a CQL structure is mapped to RPN.
+ Note that this CQL pattern is somewhat similar to
+ the CQL pattern relation.
+ The type is a CQL relation.
+
+
+ The pattern, structure.* is used
+ when no other structure pattern is matched.
+ Usually, the RPN equivalent specifies a structure attribute.
+
+
+
+
+
+ position.type
+
+
+
+ This pattern specifies how the anchor (position) of
+ CQL is mapped to RPN.
+ The type is one
+ of first, any,
+ last, firstAndLast.
+
+
+ The pattern, position.* is used
+ when no other position pattern is matched.
+
+
+
+
+
+ set.prefix
+
+
+
+ This specification defines a CQL context set for a given prefix.
+ The value on the right hand side is the URI for the set -
+ not RPN. All prefixes used in
+ index patterns must be defined this way.
+
+
+
+
+
+ set
+
+
+
+ This specification defines a default CQL context set for index names.
+ The value on the right hand side is the URI for the set.
+
+
+
+
+
+
+
+ CQL to RPN mapping file
+
+ This simple file defines two context sets, three indexes and three
+ relations, a position pattern and a default structure.
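+
+ The file itself is not reproduced at this point. A mapping file
+ consistent with the description and with the conversions that follow
+ might look like the sketch below. The cql URI is the one given earlier
+ for index.cql.serverChoice; the entries for serverChoice, dc.title,
+ relation.scr, position.any and structure.* follow from the PQF results
+ shown next. The dc.subject index and the two remaining relations are
+ assumptions added only to match the stated counts.
+

```
# Context sets
set.cql = http://www.loc.gov/zing/cql/cql-indexes/v1.0/
set.dc  = http://www.loc.gov/zing/cql/dc-indexes/v1.0/

# Indexes
index.cql.serverChoice = 1=1016
index.dc.title         = 1=4
index.dc.subject       = 1=21

# Relations
relation.<   = 2=1
relation.eq  = 2=3
relation.scr = 2=3

# Position and default structure
position.any = 3=3 6=1
structure.*  = 4=1
```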
+
+
+
+
+ With the mappings above, the CQL query
+
+ computer
+
+ is converted to the PQF:
+
+ @attr 1=1016 @attr 2=3 @attr 4=1 @attr 3=3 @attr 6=1 "computer"
+
+ by rules index.cql.serverChoice,
+ relation.scr, structure.*,
+ position.any.
+
+
+ CQL query
+
+ computer^
+
+ is rejected, since position.right is
+ undefined.
+
+
+ CQL query
+
+ >my = "http://www.loc.gov/zing/cql/dc-indexes/v1.0/" my.title = x
+
+ is converted to
+
+ @attr 1=4 @attr 2=3 @attr 4=1 @attr 3=3 @attr 6=1 "x"
+
+
+
+
+ CQL to RPN string attributes
+
+ In this example we allow any index to be passed to RPN as
+ a use attribute.
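+
+ The mapping file for this example is not shown here either. A file
+ along the following lines would produce the conversion given below;
+ the http://bogus/rpn URI and the resulting attributes come from the
+ surrounding text, while the serverChoice line is an assumption.
+

```
# Context set, also used as the default
set.rpn = http://bogus/rpn
set     = http://bogus/rpn

# Default index when the query names none (assumed value)
index.rpn.serverChoice = 1=any

# Pass any index name through as a string use attribute
index.rpn.*  = 1=*

relation.eq  = 2=3
structure.*  = 4=1
position.any = 3=3
```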
+
+
+
+
+ The http://bogus/rpn context set is also the default
+ so we can make queries such as
+
+ title = a
+
+ which is converted to
+
+ @attr 2=3 @attr 4=1 @attr 3=3 @attr 1=title "a"
+
+
+
+
+ CQL to RPN using Bath Profile
+
+ The file etc/pqf.properties has mappings from
+ the Bath Profile and Dublin Core to RPN.
+ If YAZ is installed as a package it's usually located
+ in /usr/share/yaz/etc and part of the
+ development package, such as libyaz-dev.
+
+
+
+ CQL to XCQL conversion
+
+ Conversion from CQL to XCQL is trivial and does not
+ require a mapping to be defined.
+ There are three functions to choose from depending on the
+ way you wish to store the resulting output (XML buffer
+ containing XCQL).
+
+int cql_to_xml_buf(struct cql_node *cn, char *out, int max);
+void cql_to_xml(struct cql_node *cn,
+ void (*pr)(const char *buf, void *client_data),
+ void *client_data);
+void cql_to_xml_stdio(struct cql_node *cn, FILE *f);
+
+ Function cql_to_xml_buf converts
+ to XCQL and stores result in a user supplied buffer of a given
+ max size.
+
+
+ cql_to_xml writes the result to
+ a user-defined output stream.
+ cql_to_xml_stdio writes to
+ a file.
+
+
+
+
+ Object Identifiers
+
+
+ The basic YAZ representation of an OID is an array of integers,
+ terminated with the value -1. Each integer is of type
+ Odr_oid.
+
+
+ Fundamental OID operations and the type Odr_oid
+ are defined in yaz/oid_util.h.
+
+
+ An OID can either be declared as an automatic variable or it can be
+ allocated using the memory utilities or ODR/NMEM. It's
+ guaranteed that an OID can fit in OID_SIZE integers.
+
+ Create OID on stack
+
+ We can create an OID for the Bib-1 attribute set with:
+
+ Odr_oid bib1[OID_SIZE];
+ bib1[0] = 1;
+ bib1[1] = 2;
+ bib1[2] = 840;
+ bib1[3] = 10003;
+ bib1[4] = 3;
+ bib1[5] = 1;
+ bib1[6] = -1;
+
+
+
+
+ An OID may also be filled from a string-based representation using
+ dots (.). This is achieved by function
+
+ int oid_dotstring_to_oid(const char *name, Odr_oid *oid);
+
+ This function returns 0 if name could be converted; -1 otherwise.
+
+ Using oid_dotstring_to_oid
+
+ We can fill the Bib-1 attribute set OID more easily with:
+
+ Odr_oid bib1[OID_SIZE];
+ oid_dotstring_to_oid("1.2.840.10003.3.1", bib1);
+
+
+
+
+ We can also allocate an OID dynamically on an ODR stream with:
+
+ Odr_oid *odr_getoidbystr(ODR o, const char *str);
+
+ This creates an OID from string-based representation using dots.
+ This function takes an &odr; stream as a parameter. This stream is used to
+ allocate memory for the data elements, which is released on a
+ subsequent call to odr_reset() on that stream.
+
+
+ Using odr_getoidbystr
+
+ We can create an OID for the Bib-1 attribute set with:
+
+ Odr_oid *bib1 = odr_getoidbystr(odr, "1.2.840.10003.3.1");
+
+
+
+
+
+ The function
+
+ char *oid_oid_to_dotstring(const Odr_oid *oid, char *oidbuf)
+
+ does the reverse of oid_dotstring_to_oid. It
+ converts an OID to the string-based representation using dots.
+ The supplied char buffer oidbuf holds the resulting
+ string and must be at least OID_STR_MAX in size.
+
+
+
+ OIDs can be copied with oid_oidcpy which takes
+ two OID lists as arguments. Alternatively, an OID copy can be allocated
+ on an ODR stream with:
+
+ Odr_oid *odr_oiddup(ODR odr, const Odr_oid *o);
+
+
+
+
+ OIDs can be compared with oid_oidcmp which returns
+ zero if the two OIDs provided are identical; non-zero otherwise.
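+
+ The array layout described in this section, a run of arcs ended by -1,
+ can be explored without the library. The sketch below uses a local
+ typedef, oid_elem, as a stand-in for Odr_oid, and its three functions
+ are hypothetical illustrations rather than replacements for
+ oid_oidcmp or oid_oid_to_dotstring.
+

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef int oid_elem;   /* stand-in for Odr_oid in this sketch */

/* Number of arcs before the -1 terminator. */
size_t oid_arc_count(const oid_elem *oid)
{
    size_t n = 0;

    assert(oid);
    while (oid[n] >= 0)
        n++;
    return n;
}

/* Returns 1 if both OIDs hold the same arcs, 0 otherwise. */
int oid_same(const oid_elem *a, const oid_elem *b)
{
    size_t n = oid_arc_count(a);

    return n == oid_arc_count(b) && memcmp(a, b, n * sizeof *a) == 0;
}

/* Write the dotted form of oid to out.
   Returns 0 on success, -1 if out (of size outsz) is too small. */
int oid_to_dots(const oid_elem *oid, char *out, size_t outsz)
{
    size_t used = 0;
    size_t i;

    assert(oid && out && outsz > 0);
    out[0] = '\0';
    for (i = 0; oid[i] >= 0; i++) {
        int n = snprintf(out + used, outsz - used, "%s%d",
                         i ? "." : "", oid[i]);
        if (n < 0 || (size_t) n >= outsz - used)
            return -1;
        used += (size_t) n;
    }
    return 0;
}
```

+
+ Because the terminator is part of the array, a buffer for an OID
+ needs one element more than the number of arcs.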
+
+
+ OID database
+
+ From YAZ version 3 and later, the oident system has been replaced
+ by an OID database. OID database is a misnomer: the old oident
+ system was also a database.
+
+
+ The OID database is really just a map between named Object Identifiers
+ (string) and their OID raw equivalents. Most operations either
+ convert from string to OID or the other way around.
+
+
+ Unfortunately, whenever we supply a string we must also specify the
+ OID class. The class is necessary because some
+ strings correspond to multiple OIDs. An example of such a string is
+ Bib-1 which may either be an attribute-set
+ or a diagnostic-set.
+
+
+ Applications using the YAZ database should include
+ yaz/oid_db.h.
+
+
+ A YAZ database handle is of type yaz_oid_db_t.
+ Actually that's a pointer, but you need not deal with that.
+ YAZ has a built-in database which can be considered "constant" for
+ most purposes.
+ We can get hold of it by using the function yaz_oid_std.
+
+
+ All functions with prefix yaz_string_to_oid
+ convert from class + string to OID. We have variants of this
+ operation due to different memory allocation strategies.
+
+
+ All functions with prefix
+ yaz_oid_to_string convert from OID to string
+ + class.
+
+
+ Create OID with YAZ DB
+
+ We can create an OID for the Bib-1 attribute set on the ODR stream
+ odr with:
+
+ Odr_oid *bib1 =
+ yaz_string_to_oid_odr(yaz_oid_std(), CLASS_ATTSET, "Bib-1", odr);
+
+ This is more complex than using odr_getoidbystr.
+ You would only use yaz_string_to_oid_odr when the
+ string (here Bib-1) is supplied by a user or configuration.
+
+
+
+
+ Standard OIDs
+
+
+ All the object identifiers in the standard OID database as returned
+ by yaz_oid_std can be referenced directly in a
+ program as a constant OID.
+ Each constant OID is prefixed with yaz_oid_ -
+ followed by OID class (lowercase) - then by OID name (normalized and
+ lowercase).
+
+
+ See for list of all object identifiers
+ built into YAZ.
+ These are declared in yaz/oid_std.h but are
+ included by yaz/oid_db.h as well.
+
+
+ Use a built-in OID
+
+ We can allocate our own OID filled with the constant OID for
+ Bib-1 with:
+
+ Odr_oid *bib1 = odr_oiddup(o, yaz_oid_attset_bib1);
+
+
+
+
+
+ Nibble Memory
+
+
+ Sometimes when you need to allocate and construct a large,
+ interconnected complex of structures, it can be a bit of a pain to
+ release the associated memory again. For the structures describing the
+ Z39.50 PDUs and related structures, it is convenient to use the
+ memory-management system of the &odr; subsystem (see
+ ). However, in some circumstances
+ where you might otherwise benefit from using a simple nibble memory
+ management system, it may be impractical to use
+ odr_malloc() and odr_reset().
+ For this purpose, the memory manager which also supports the &odr;
+ streams is made available in the NMEM module. The external interface
+ to this module is given in the nmem.h file.
+
+
+
+ The following prototypes are given:
+
+
+
+ NMEM nmem_create(void);
+ void nmem_destroy(NMEM n);
+ void *nmem_malloc(NMEM n, int size);
+ void nmem_reset(NMEM n);
+ int nmem_total(NMEM n);
+ void nmem_init(void);
+ void nmem_exit(void);
+
+
+
+ The nmem_create() function returns a pointer to a
+ memory control handle, which can be released again by
+ nmem_destroy() when no longer needed.
+ The function nmem_malloc() allocates a block of
+ memory of the requested size. A call to nmem_reset()
+ or nmem_destroy() will release all memory allocated
+ on the handle since it was created (or since the last call to
+ nmem_reset()). The function
+ nmem_total() returns the number of bytes currently
+ allocated on the handle.
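+
+ The allocate-many, release-at-once pattern described above can be
+ illustrated with a small stand-alone allocator. This is only a sketch
+ of the idea under simplifying assumptions: every name beginning with
+ arena_ is hypothetical, each request is a separate malloc call rather
+ than a piece nibbled from a larger chunk, and no locking is done.
+

```c
#include <assert.h>
#include <stdlib.h>

/* Header placed in front of every allocation. The union forces the
   payload that follows it to be suitably aligned for common types. */
union arena_block {
    struct {
        union arena_block *next;
        size_t size;
    } hdr;
    long double align_ld;
    long long align_ll;
};

struct arena {
    union arena_block *blocks;  /* all live allocations */
    size_t total;               /* bytes handed out since create/reset */
};

struct arena *arena_create(void)
{
    struct arena *a = malloc(sizeof *a);

    if (a) {
        a->blocks = NULL;
        a->total = 0;
    }
    return a;
}

void *arena_malloc(struct arena *a, size_t size)
{
    union arena_block *b;

    assert(a);
    b = malloc(sizeof *b + size);
    if (!b)
        return NULL;
    b->hdr.next = a->blocks;
    b->hdr.size = size;
    a->blocks = b;
    a->total += size;
    return b + 1;               /* payload starts right after the header */
}

/* Release every allocation made on the arena but keep the handle. */
void arena_reset(struct arena *a)
{
    assert(a);
    while (a->blocks) {
        union arena_block *next = a->blocks->hdr.next;

        free(a->blocks);
        a->blocks = next;
    }
    a->total = 0;
}

size_t arena_total(const struct arena *a)
{
    assert(a);
    return a->total;
}

void arena_destroy(struct arena *a)
{
    if (a) {
        arena_reset(a);
        free(a);
    }
}
```

+
+ The caller never frees individual pieces; a single reset or destroy
+ releases them all, which is what makes this style convenient for
+ interconnected structures.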
+
+
+
+ The nibble memory pool is shared amongst threads. POSIX
+ mutexes and WIN32 critical sections are used to keep the
+ module thread safe. Function nmem_init()
+ initializes the nibble memory library and it is called automatically
+ the first time the YAZ.DLL is loaded. &yaz; uses
+ function DllMain to achieve this. You should
+ not call nmem_init or
+ nmem_exit unless you're absolutely sure what
+ you're doing. Note that in previous &yaz; versions you'd have to call
+ nmem_init yourself.
+
+
+
+
+ Log
+
+ &yaz; has evolved a fairly complex log system which should be useful both
+ for debugging &yaz; itself, debugging applications that use &yaz;, and for
+ production use of those applications.
+
+
+ The log functions are declared in header yaz/log.h
+ and implemented in src/log.c.
+ Due to name clashes with syslog and some math utilities the logging
+ interface has been modified as of YAZ 2.0.29. The obsolete interface
+ is still available in header file yaz/log.h.
+ The key points of the interface are:
+
+
+ void yaz_log(int level, const char *fmt, ...)
+
+ void yaz_log_init(int level, const char *prefix, const char *name);
+ void yaz_log_init_file(const char *fname);
+ void yaz_log_init_level(int level);
+ void yaz_log_init_prefix(const char *prefix);
+ void yaz_log_time_format(const char *fmt);
+ void yaz_log_init_max_size(int mx);
+
+ int yaz_log_mask_str(const char *str);
+ int yaz_log_module_level(const char *name);
+
+
+
+ The reason for the whole log module is the yaz_log
+ function. It takes a bitmask indicating the log levels, a
+ printf-like format string, and a variable number of
+ arguments to log.
+
+
+
+ The log level is a bit mask that says on which level(s)
+ the log entry should be made, and optionally sets some behaviour of the
+ logging. In the simplest cases, it can be one of YLOG_FATAL,
+ YLOG_DEBUG, YLOG_WARN, YLOG_LOG. Those can be combined with bits
+ that modify the way the log entry is written: YLOG_ERRNO,
+ YLOG_NOTIME, YLOG_FLUSH.
+ Most of the rest of the bits are deprecated, and should not be used. Use
+ the dynamic log levels instead.
+
+
+
+ Applications that use &yaz; should not use YLOG_LOG for ordinary
+ messages, but should make use of the dynamic loglevel system. This consists
+ of two parts, defining the loglevel and checking it.
+
+
+
+ To define the log levels, the (main) program should pass a string to
+ yaz_log_mask_str to define which log levels are to be
+ logged. This string should be a comma-separated list of log level names,
+ and can contain both hard-coded names and dynamic ones. The log level
+ calculation starts with YLOG_DEFAULT_LEVEL and adds a bit
+ for each word it meets, unless the word starts with a '-', in which case it
+ clears the bit. If the string 'none' is found,
+ all bits are cleared. Typically this string comes from the command-line,
+ often identified by -v. The
+ yaz_log_mask_str returns a log level that should be
+ passed to yaz_log_init_level for it to take effect.
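+
+ The calculation described above (start from a default, set a bit per
+ word, clear it for a leading '-', clear everything for 'none') can be
+ sketched as follows. The level names, bit values and function names
+ are invented for the example, and unknown words are simply ignored
+ here instead of being treated as dynamic levels.
+

```c
#include <assert.h>
#include <string.h>

/* Hypothetical level names and bit values, chosen for this sketch only. */
struct level_name {
    const char *name;
    int bit;
};

static const struct level_name level_names[] = {
    { "fatal", 0x01 }, { "debug", 0x02 }, { "warn", 0x04 },
    { "log", 0x08 }, { "request", 0x10 }
};

/* Look up a level name of length len; returns its bit, or 0 if unknown. */
int level_bit(const char *word, size_t len)
{
    size_t i;

    for (i = 0; i < sizeof level_names / sizeof level_names[0]; i++) {
        if (strlen(level_names[i].name) == len
            && strncmp(level_names[i].name, word, len) == 0)
            return level_names[i].bit;
    }
    return 0;
}

/* Compute a mask from a comma-separated list, starting at default_level. */
int mask_from_string(const char *str, int default_level)
{
    int mask = default_level;

    assert(str);
    while (*str) {
        size_t len = strcspn(str, ",");

        if (len == 4 && strncmp(str, "none", 4) == 0)
            mask = 0;                              /* clear all bits */
        else if (len > 1 && str[0] == '-')
            mask &= ~level_bit(str + 1, len - 1);  /* clear one bit */
        else
            mask |= level_bit(str, len);           /* set one bit */
        str += len;
        if (*str == ',')
            str++;
    }
    return mask;
}
```

+
+ Words are processed left to right, so none followed by a name leaves
+ only that one level enabled.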
+
+
+
+ Each module should check which log bits it should use, by calling
+ yaz_log_module_level with a suitable name for the
+ module. Any preceding path and extension are stripped from the name,
+ so it is quite possible to use __FILE__ for it. If the
+ name has been passed to yaz_log_mask_str, the routine
+ returns a non-zero bitmask, which should then be used in subsequent calls
+ to yaz_log. (It can also be tested, so as to avoid unnecessary calls to
+ yaz_log, in time-critical places, or when the log entry would take time
+ to construct.)
+
+
+
+ Yaz uses the following dynamic log levels:
+ server, session, request, requestdetail for the server
+ functionality.
+ zoom for the zoom client api.
+ ztest for the simple test server.
+ malloc, nmem, odr, eventl for internal debugging of yaz itself.
+ Of course, any program using yaz is welcome to define as many new ones as
+ it needs.
+
+
+
+ By default the log is written to stderr, but this can be changed by a call
+ to yaz_log_init_file or
+ yaz_log_init. If the log is directed to a file, the
+ file size is checked at every write, and if it exceeds the limit given in
+ yaz_log_init_max_size, the log is rotated. The
+ rotation keeps one old version (with a .1 appended to
+ the name). The size defaults to 1GB. Setting it to zero will disable the
+ rotation feature.
+
+
+
+ A typical yaz-log looks like this
+ 13:23:14-23/11 yaz-ztest(1) [session] Starting session from tcp:127.0.0.1 (pid=30968)
+ 13:23:14-23/11 yaz-ztest(1) [request] Init from 'YAZ' (81) (ver 2.0.28) OK
+ 13:23:17-23/11 yaz-ztest(1) [request] Search Z: @attrset Bib-1 foo OK:7 hits
+ 13:23:22-23/11 yaz-ztest(1) [request] Present: [1] 2+2 OK 2 records returned
+ 13:24:13-23/11 yaz-ztest(1) [request] Close OK
+
+
+
+ The log entries start with a time stamp. This can be omitted by setting the
+ YLOG_NOTIME bit in the loglevel. This way automatic tests
+ can be hoped to produce identical log files, that are easy to diff. The
+ format of the time stamp can be set with
+ yaz_log_time_format, which takes a format string just
+ like strftime.
+
+
+
+ Next in a log line comes the prefix, often the name of the program. For
+ yaz-based servers, it can also contain the session number. Then
+ come one or more logbits in square brackets, depending on the logging
+ level set by yaz_log_init_level and the loglevel
+ passed to yaz_log. Finally comes the format
+ string and additional values passed to yaz_log.
+
+
+
+ The log level YLOG_LOGLVL, enabled by the string
+ loglevel, will log all the log-level affecting
+ operations. This can come in handy if you need to know what other log
+ levels would be useful. Grep the logfile for [loglevel].
+
+
+
+ The log system is almost independent of the rest of &yaz;. The only
+ important dependence is on nmem, and that only for
+ using the semaphore definition there.
+
+
+
+ The dynamic log levels and log rotation were introduced in &yaz; 2.0.28. At
+ the same time, the log bit names were changed from
+ LOG_something to YLOG_something,
+ to avoid collision with syslog.h.
+
+
+
+
+ MARC
+
+
+ YAZ provides a fast utility that decodes MARC records and
+ encodes to a variety of output formats. The MARC records must
+ be encoded in ISO2709.
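+
+ Since the decode functions take a buffer and its size, a caller
+ reading records from a stream needs the record length first. In an
+ ISO2709 record the 24-byte leader carries the record length in
+ positions 0-4 and the base address of data in positions 12-16. The
+ sketch below reads both; the function names are hypothetical and are
+ not part of the MARC API of YAZ.
+

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define ISO2709_LEADER_SIZE 24

/* Parse n ASCII decimal digits starting at s. Returns -1 on a non-digit. */
int parse_digits(const char *s, size_t n)
{
    int value = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (!isdigit((unsigned char) s[i]))
            return -1;
        value = value * 10 + (s[i] - '0');
    }
    return value;
}

/* Record length from leader positions 0-4, or -1 if missing or malformed. */
int iso2709_record_length(const char *buf, size_t bsize)
{
    int len;

    assert(buf);
    if (bsize < ISO2709_LEADER_SIZE)
        return -1;
    len = parse_digits(buf, 5);
    return len >= ISO2709_LEADER_SIZE ? len : -1;
}

/* Base address of data from leader positions 12-16, or -1 on error. */
int iso2709_base_address(const char *buf, size_t bsize)
{
    assert(buf);
    if (bsize < ISO2709_LEADER_SIZE)
        return -1;
    return parse_digits(buf + 12, 5);
}
```

+
+ A reader would typically fetch the leader, compute the record length,
+ read the rest of the record, and only then hand the whole buffer to
+ the decoder.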
+
+
+
+ /* create handler */
+ yaz_marc_t yaz_marc_create(void);
+ /* destroy */
+ void yaz_marc_destroy(yaz_marc_t mt);
+
+ /* set XML mode YAZ_MARC_LINE, YAZ_MARC_SIMPLEXML, ... */
+ void yaz_marc_xml(yaz_marc_t mt, int xmlmode);
+ #define YAZ_MARC_LINE 0
+ #define YAZ_MARC_SIMPLEXML 1
+ #define YAZ_MARC_OAIMARC 2
+ #define YAZ_MARC_MARCXML 3
+ #define YAZ_MARC_ISO2709 4
+ #define YAZ_MARC_XCHANGE 5
+
+ /* supply iconv handle for character set conversion .. */
+ void yaz_marc_iconv(yaz_marc_t mt, yaz_iconv_t cd);
+
+ /* set debug level, 0=none, 1=more, 2=even more, .. */
+ void yaz_marc_debug(yaz_marc_t mt, int level);
+
+ /* decode MARC in buf of size bsize. Returns >0 on success; <=0 on failure.
+ On success, result in *result with size *rsize. */
+ int yaz_marc_decode_buf (yaz_marc_t mt, const char *buf, int bsize,
+ char **result, int *rsize);
+
+ /* decode MARC in buf of size bsize. Returns >0 on success; <=0 on failure.
+ On success, result in WRBUF */
+ int yaz_marc_decode_wrbuf (yaz_marc_t mt, const char *buf,
+ int bsize, WRBUF wrbuf);
+
+
+ A MARC conversion handle must be created by using
+ yaz_marc_create and destroyed
+ by calling yaz_marc_destroy.
+
+
+ All other functions operate on a yaz_marc_t handle.
+ The output is specified by a call to yaz_marc_xml.
+ The xmlmode must be one of
+
+
+ YAZ_MARC_LINE
+
+
+ A simple line-by-line format suitable for display but not
+ recommended for further (machine) processing.
+
+
+
+
+
+ YAZ_MARC_MARCXML
+
+
+ The resulting record is converted to MARCXML.
+
+
+
+
+
+ YAZ_MARC_ISO2709
+
+
+ The resulting record is converted to ISO2709 (MARC).
+
+
+
+
+
+
+ The actual conversion functions are
+ yaz_marc_decode_buf and
+ yaz_marc_decode_wrbuf, which decode and encode
+ a MARC record. The former function operates on simple buffers; the
+ latter stores the resulting record in a WRBUF handle (WRBUF is a simple string
+ type).
+
+
+ Display of MARC record
+
+ The following program snippet illustrates how the MARC API may
+ be used to convert a MARC record to the line-by-line format:
+
+
+
+
+
+
+
+ Retrieval Facility
+
+ YAZ version 2.1.20 or later includes a Retrieval facility tool
+ which allows an SRU/Z39.50 server to describe itself and perform record
+ conversions. The idea is the following:
+
+
+
+
+ An SRU/Z39.50 client sends a retrieval request which includes
+ a combination of the following parameters: syntax (format),
+ schema (or element set name).
+
+
+
+
+
+ The retrieval facility is invoked with parameters in a
+ server/proxy. The retrieval facility matches the parameters against a set of
+ "supported" retrieval types.
+ If there is no match, the retrieval facility signals an error
+ (syntax and / or schema not supported).
+
+
+
+
+
+ For a successful match, the backend is invoked with the same
+ or altered retrieval parameters (syntax, schema). If
+ a record is received from the backend, it is converted to the
+ frontend name / syntax.
+
+
+
+
+
+ The resulting record is sent back to the client and tagged with
+ the frontend syntax / schema.
+
+
+
+
+
+
+ The Retrieval facility is driven by an XML configuration. The
+ configuration is neither Z39.50 ZeeRex nor SRU ZeeRex, but it
+ should be easy to generate both of them from the XML configuration
+ (unfortunately, the two versions
+ of ZeeRex differ substantially in this regard).
+
+
+ Retrieval XML format
+
+ All elements should be covered by namespace
+ http://indexdata.com/yaz .
+ The root element node must be retrievalinfo.
+
+
+ The retrievalinfo element must include one or
+ more retrieval elements. Each
+ retrieval defines a specific combination of
+ syntax, name and identifier supported by this retrieval service.
+
+
+ The retrieval element may include any of the
+ following attributes:
+
+ syntax (REQUIRED)
+
+
+ Defines the record syntax. Possible values are any
+ of the names defined in YAZ' OID database or a raw
+ OID in the form (n.n ... n).
+
+
+
+ name (OPTIONAL)
+
+
+ Defines the name of the retrieval format. This can be
+ any string. For SRU, the value is equivalent to schema (short-hand);
+ for Z39.50 it's equivalent to simple element set name.
+ For YAZ 3.0.24 and later this name may be specified as a glob
+ expression with operators
+ * and ?.
+
+
+
+ identifier (OPTIONAL)
+
+
+ Defines the URI schema name of the retrieval format. This can be
+ any string. For SRU, the value is equivalent to URI schema.
+ For Z39.50, there is no equivalent.
+
+
+
+
+
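+ As a sketch of how these attributes combine, the fragment below declares
+ three retrieval types. The syntax names usmarc and xml, the names F and
+ marcxml, and the identifier URI are illustrative assumptions; substitute
+ the values that apply to your own service.
+
+ ```xml
+ <retrievalinfo xmlns="http://indexdata.com/yaz">
+   <!-- syntax and name given: MARC records with element set name F -->
+   <retrieval syntax="usmarc" name="F"/>
+   <!-- name as a glob expression (YAZ 3.0.24 and later) -->
+   <retrieval syntax="usmarc" name="B*"/>
+   <!-- syntax, name and identifier given: an XML schema known by
+        both a short-hand name and a URI -->
+   <retrieval syntax="xml" name="marcxml"
+              identifier="info:srw/schema/1/marcxml-v1.1"/>
+ </retrievalinfo>
+ ```
+
+ Only syntax is required, so the first two entries omit the identifier.
+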
+
+ The retrieval may include one
+ backend element. If a backend
+ element is given, it specifies how the records are retrieved by
+ some backend and how the records are converted from the backend to
+ the "frontend".
+
+
+ The attributes name and syntax
+ may be specified for the backend element. The
+ semantics of these attributes are equivalent to those for the
+ retrieval. However, these values are passed to
+ the "backend".
+
+
+ The backend element may include one or more
+ conversion instructions (as child elements). The supported
+ conversions are:
+
+ marc
+
+
+ The marc element specifies a conversion
+ to and from ISO2709 encoded MARC and
+ &acro.marcxml;/MarcXchange.
+ The following attributes may be specified:
+
+
+ inputformat (REQUIRED)
+
+
+ Format of input. Supported values are
+ marc (for ISO2709) and xml
+ (for MARCXML/MarcXchange).
+
+
+
+
+ outputformat (REQUIRED)
+
+
+ Format of output. Supported values are
+ line (MARC line format),
+ marcxml (for MARCXML),
+ marc (ISO2709) and
+ marcxchange (for MarcXchange).
+
+
+
+
+ inputcharset (OPTIONAL)
+
+
+ Encoding of input. For XML input formats, this need not
+ be given, but for ISO2709-based input formats, this should
+ be set to the encoding used. For MARC21 records, a common
+ inputcharset value would be marc-8.
+
+
+
+
+ outputcharset (OPTIONAL)
+
+
+ Encoding of output. If outputformat is XML based, it is
+ strongly recommended to use utf-8.
+
+
+
+
+
+
+
+
+ xslt
+
+
+ The xslt element specifies a conversion
+ via &acro.xslt;. The following attributes may be specified:
+
+
+ stylesheet (REQUIRED)
+
+
+ Stylesheet file.
+
+
+
+
+
+
+
+
+
+
+
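+ To make the backend and conversion elements concrete, here is a sketch of
+ a single retrieval entry that offers MARCXML to the client while fetching
+ ISO2709 from the backend. The syntax names usmarc and xml and the names
+ F and marcxml are assumptions chosen for illustration.
+
+ ```xml
+ <retrieval xmlns="http://indexdata.com/yaz" syntax="xml" name="marcxml">
+   <!-- ask the backend for ISO2709 MARC with element set name F -->
+   <backend syntax="usmarc" name="F">
+     <!-- convert ISO2709 in MARC-8 to MARCXML in UTF-8 -->
+     <marc inputformat="marc" outputformat="marcxml"
+           inputcharset="marc-8" outputcharset="utf-8"/>
+   </backend>
+ </retrieval>
+ ```
+
+ An xslt element with a stylesheet attribute could follow the marc element
+ inside the same backend when a further transformation is needed.
+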
+
+ Retrieval Facility Examples
+
+ MARC21 backend
+
+ A typical way to use the retrieval facility is to enable XML
+ for servers that only support ISO2709 encoded MARC21 records.
+
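+ A minimal sketch of such a configuration is given below. It assumes that
+ the backend delivers MARC21 in MARC-8 under the syntax name usmarc with
+ element set names F and B, and that an XSLT stylesheet converting MARCXML
+ to Dublin Core is available under the hypothetical file name
+ MARC21slim2DC.xsl. The name dc and the identifier URI are likewise
+ illustrative.
+
+ ```xml
+ <retrievalinfo xmlns="http://indexdata.com/yaz">
+   <!-- MARC21 full and brief records, passed through unchanged -->
+   <retrieval syntax="usmarc" name="F"/>
+   <retrieval syntax="usmarc" name="B"/>
+   <!-- MARCXML, produced from the backend's ISO2709 records -->
+   <retrieval syntax="xml" name="marcxml"
+              identifier="info:srw/schema/1/marcxml-v1.1">
+     <backend syntax="usmarc" name="F">
+       <marc inputformat="marc" outputformat="marcxml"
+             inputcharset="marc-8"/>
+     </backend>
+   </retrieval>
+   <!-- Dublin Core, produced by MARCXML conversion followed by XSLT -->
+   <retrieval syntax="xml" name="dc">
+     <backend syntax="usmarc" name="F">
+       <marc inputformat="marc" outputformat="marcxml"
+             inputcharset="marc-8"/>
+       <xslt stylesheet="MARC21slim2DC.xsl"/>
+     </backend>
+   </retrieval>
+ </retrievalinfo>
+ ```
+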
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+]]>
+
+
+ This means that our frontend supports:
+
+
+
+ MARC21 F(ull) records.
+
+
+
+
+ MARC21 B(rief) records.
+
+
+
+
+
+ MARCXML records.
+
+
+
+
+
+ Dublin core records.
+
+
+
+
+
+
+
+ API
+
+ It should be easy to use the retrieval facility from applications. Refer
+ to the headers
+ yaz/retrieval.h and
+ yaz/record_conv.h.
+
+
+
+
+
+