From a4d2f62568bcf788630502fc1cbcad1163d3f87a Mon Sep 17 00:00:00 2001 From: Marc Cromme Date: Tue, 13 Jun 2006 09:26:59 +0000 Subject: [PATCH] added chapter on query model, PQF, bib1-attribute sets. Much documentation still needed on these issues --- doc/Makefile.am | 18 +- doc/administration.xml | 70 +---- doc/architecture.xml | 104 +------ doc/entities.ent | 3 +- doc/querymodel.xml | 750 ++++++++++++++++++++++++++++++++++++++++++++++++ doc/server.xml | 329 +-------------------- doc/zebra.xml | 3 +- 7 files changed, 771 insertions(+), 506 deletions(-) create mode 100644 doc/querymodel.xml diff --git a/doc/Makefile.am b/doc/Makefile.am index d088bdf..9d74037 100644 --- a/doc/Makefile.am +++ b/doc/Makefile.am @@ -1,30 +1,34 @@ -## $Id: Makefile.am,v 1.47 2006-06-12 09:39:17 marc Exp $ +## $Id: Makefile.am,v 1.48 2006-06-13 09:26:59 marc Exp $ docdir=$(datadir)/doc/@PACKAGE@ SUBDIRS = common -XMLFILES = zebra.xml \ +XMLFILES = \ administration.xml \ architecture.xml \ examples.xml \ + idzebra-config-man.xml \ indexdata.xml \ installation.xml \ introduction.xml \ license.xml \ marc_indexing.xml \ + querymodel.xml \ quickstart.xml \ recordmodel-alvisxslt.xml \ recordmodel-grs.xml \ server.xml \ + zebra.xml \ zebraidx-commands.xml \ + zebraidx-man.xml \ zebraidx-options.xml \ zebraidx.xml \ + zebrasrv-man.xml \ zebrasrv-options.xml \ zebrasrv-synopsis.xml \ zebrasrv-virtual.xml HTMLFILES = \ - administration-cql-to-pqf.html \ administration-extended-services.html \ administration-ranking.html \ administration.html \ @@ -43,6 +47,8 @@ HTMLFILES = \ gfs-config.html \ grs-exchange-formats.html \ grs-internal-representation.html \ + htmlhelp.hhp \ + index.html \ indexdata.html \ installation.debian.html \ installation.html \ @@ -51,6 +57,9 @@ HTMLFILES = \ license.html \ locating-records.html \ protocol-support.html \ + querymodel-cql-to-pqf.html \ + querymodel-pqf.html \ + querymodel.html \ quick-start.html \ record-model-alvisxslt-conf.html \ record-model-alvisxslt.html 
\ @@ -63,9 +72,10 @@ HTMLFILES = \ shadow-registers.html \ simple-indexing.html \ support.html \ - index.html \ + toc.hhc \ zebraidx.html + PNGFILES=zebra.png EPSFILES=zebra.eps diff --git a/doc/administration.xml b/doc/administration.xml index caaeaf4..10babae 100644 --- a/doc/administration.xml +++ b/doc/administration.xml @@ -1,5 +1,5 @@ - + Administrating Zebra - - - For example, using server-side CQL-to-PQF conversion, one might - query a zebra server like this: - - querytype cql - Z> find text=(plant and soil) - ]]> - - and - if properly configured - even static relevance ranking can - be performed using CQL query syntax: - - find text = /relevant (plant and soil) - ]]> - - - - - By the way, the same configuration can be used to - search using client-side CQL-to-PQF conversion: - (the only difference is querytype cql2rpn - instead of - querytype cql, and the call specifying a local - conversion file) - - querytype cql2rpn - Z> find text=(plant and soil) - ]]> - - - - - Exhaustive information can be found in the - Section "Specification of CQL to RPN mappings" in the YAZ manual. - - http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map, - and shall therefore not be repeated here. - - - - - diff --git a/doc/architecture.xml b/doc/architecture.xml index 60281e3..ff840a6 100644 --- a/doc/architecture.xml +++ b/doc/architecture.xml @@ -1,5 +1,5 @@ - + Overview of Zebra Architecture @@ -382,108 +382,6 @@ - - diff --git a/doc/entities.ent b/doc/entities.ent index a953715..b0713e4 100644 --- a/doc/entities.ent +++ b/doc/entities.ent @@ -1,10 +1,11 @@ - + + diff --git a/doc/querymodel.xml b/doc/querymodel.xml new file mode 100644 index 0000000..bae113f --- /dev/null +++ b/doc/querymodel.xml @@ -0,0 +1,750 @@ + + + Query Model + + + Query Model Overview + + + Zebra was born as a networked Information Retrieval engine adhering + to the international standards + Z39.50 and + SRU, + and implements the query model defined there.
+ Unfortunately, the Z39.50 query model defines only a binary + encoded representation, which is used as transport packaging in + the Z39.50 protocol layer. This representation is not human + readable, nor does it define any convenient way to specify queries. + + + Therefore, Index Data has defined a textual representation in the + Prefix Query Format, in short + PQF, which has since been adopted by other + parties developing Z39.50 software. It is also often referred to as + Prefix Query Notation, or in short + PQN, and is thoroughly explained in + . + + + + In addition, Zebra can be configured to understand and map the + Common Query Language + (CQL) + to PQF. See the introduction to the mapping to the internal query + representation in + . + + + + + Prefix Query Format structure and syntax + + The + PQF + grammar is documented in the YAZ manual. + During search, this textual PQF representation + is always mapped to the equivalent Zebra internal + query parse tree. + + + + + + + Explain Attribute Set + + The attribute set exp-1 is defined for + searching an Explain IR-Explain-1 database. + It consists of a single Use (type 1) attribute. + + + In addition, the non-Use + bib-1 attributes, that is, the types + Relation, Position, + Structure, Truncation, + and Completeness, are imported from + the bib-1 attribute set, and may be used + within any Explain query. + + + + Use Attributes (type = 1) + + The following Explain search attributes are supported: + ExplainCategory (@attr 1=1), + DatabaseName (@attr 1=3), + DateAdded (@attr 1=9), + DateChanged (@attr 1=10). + + + A search in the Use attribute ExplainCategory + supports only these predefined values: + CategoryList, TargetInfo, + DatabaseInfo, AttributeDetails. + + + See tab/explain.att and the + for more information.
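To make the Use attribute values above concrete, the following Python sketch assembles the kind of PQF strings that the yaz-client Explain examples in this chapter type by hand. The helper name explain_query, its lowercasing of category names and its quoting of multi-word database names are illustrative assumptions, not part of Zebra or YAZ; only the attribute numbers (ExplainCategory = 1, DatabaseName = 3) and the category names come from the text.

```python
from typing import Optional

# Hypothetical helper, not part of Zebra or YAZ: it only formats PQF text.
USE = 1                # attribute type 1 (Use)
EXPLAIN_CATEGORY = 1   # ExplainCategory, i.e. @attr 1=1
DATABASE_NAME = 3      # DatabaseName, i.e. @attr 1=3

# The only values an ExplainCategory search supports, per the text above.
CATEGORIES = {"categorylist", "targetinfo", "databaseinfo", "attributedetails"}


def explain_query(category: str, database: Optional[str] = None) -> str:
    """Build a PQF query against the IR-Explain-1 database."""
    term = category.lower()  # the yaz-client examples use lowercase terms
    if term not in CATEGORIES:
        raise ValueError(f"unsupported Explain category: {category!r}")
    query = f"@attr exp1 {USE}={EXPLAIN_CATEGORY} {term}"
    if database is None:
        return query
    if any(ch.isspace() for ch in database):
        database = f'"{database}"'  # assumption: quote multi-word names
    db_clause = f"@attr exp1 {USE}={DATABASE_NAME} {database}"
    # PQF is prefix notation: the operator precedes both operands.
    return f"@and {query} {db_clause}"


print(explain_query("DatabaseInfo", "Default"))
# prints: @and @attr exp1 1=1 databaseinfo @attr exp1 1=3 Default
```

The returned string is meant to follow the find command in yaz-client, after base IR-Explain-1 has selected the Explain database.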
+ + + + + Explain searches with yaz-client + + List supported categories to find out which Explain commands are + supported: + + Z> base IR-Explain-1 + Z> find @attr exp1 1=1 categorylist + Z> form sutrs + Z> show 1+2 + + + + + Get target info, that is, investigate which databases exist at + this server endpoint: + + Z> base IR-Explain-1 + Z> find @attr exp1 1=1 targetinfo + Z> form xml + Z> show 1+1 + Z> form grs-1 + Z> show 1+1 + Z> form sutrs + Z> show 1+1 + + + + + List all supported databases. The number of hits + is the number of databases found, which most commonly are the + following two: + the Default and the + IR-Explain-1 databases. + + Z> base IR-Explain-1 + Z> f @attr exp1 1=1 databaseinfo + Z> form sutrs + Z> show 1+2 + + + + + Get the database info record for database Default. + + Z> base IR-Explain-1 + Z> find @and @attr exp1 1=1 databaseinfo @attr exp1 1=3 Default + + Identical query with explicitly specified attribute set: + + Z> base IR-Explain-1 + Z> find @attrset exp1 @and @attr 1=1 databaseinfo @attr 1=3 Default + + + + + Get the attribute details record for database + Default. + This query is very useful for studying the internal Zebra indexes. + If records have been indexed using the alvis + XSLT filter, the string representation names of the known indexes can be + found. + + Z> base IR-Explain-1 + Z> find @and @attr exp1 1=1 attributedetails @attr exp1 1=3 Default + + Identical query with explicitly specified attribute set: + + Z> base IR-Explain-1 + Z> find @attrset exp1 @and @attr 1=1 attributedetails @attr 1=3 Default + + + + + + + + Bib1 Attribute Set + + Something about querying to be written .. + + + Most of the information contained in this section is an excerpt of + the ATTRIBUTE SET BIB-1 (Z39.50-1995) + SEMANTICS, found at The BIB-1 + Attribute Set Semantics from 1995, and also in an updated + Bib-1 + Attribute Set + version from 2003. Index Data is not the copyright holder of this + information.
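The single-term queries in the Explain examples above share one shape: an optional @attrset set prefix, zero or more @attr [set] type=value clauses, and then a single term. As a rough illustration of that shape only, here is a deliberately minimal Python parser. It is a hypothetical sketch, not the YAZ PQF parser, and it ignores boolean operators such as @and, result-set references and the other constructs covered by the full grammar in the YAZ manual.

```python
import shlex
from typing import List, Optional, Tuple

# (attribute set or None, attribute type, attribute value)
Attr = Tuple[Optional[str], int, str]


def parse_simple_pqf(query: str) -> Tuple[List[Attr], str]:
    """Split a single-term PQF query into its attributes and its term.

    Hypothetical sketch: handles only '@attrset', '@attr' and one
    (optionally double-quoted) term; values are kept as strings.
    """
    tokens = shlex.split(query)  # keeps "quoted phrases" together
    pos = 0
    default_set = None
    if tokens[:1] == ["@attrset"]:
        if len(tokens) < 2:
            raise ValueError("@attrset needs a set name")
        default_set, pos = tokens[1], 2
    attrs: List[Attr] = []
    while pos < len(tokens) and tokens[pos] == "@attr":
        pos += 1
        attr_set = default_set
        # An explicit set name (e.g. exp1, gils) contains no '='.
        if pos < len(tokens) and "=" not in tokens[pos]:
            attr_set, pos = tokens[pos], pos + 1
        if pos >= len(tokens) or "=" not in tokens[pos]:
            raise ValueError("expected type=value after @attr")
        attr_type, value = tokens[pos].split("=", 1)
        attrs.append((attr_set, int(attr_type), value))
        pos += 1
    if pos != len(tokens) - 1:
        raise ValueError("expected exactly one term after the attributes")
    return attrs, tokens[pos]


print(parse_simple_pqf("@attr exp1 1=1 categorylist"))
# prints: ([('exp1', 1, '1')], 'categorylist')
```

Keeping attribute values as strings is a design choice of this sketch, so that a non-numeric value would not be rejected by the parser itself.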
+ + + + + Use Attributes (type = 1) + + + + Relation Attributes (type = 2) + + + + + + Position Attributes (type = 3) + + + + Structure Attributes (type = 4) + + + + Truncation Attributes (type = 5) + + + + Completeness Attributes (type = 6) + + + + Zebra Extension Sorting Attributes (type = 7) + + + + Zebra Extension Search Estimation Attributes (type = 8) + + + + Zebra Extension Weight Attributes (type = 9) + + + + + + Mapping from Bib1 Attributes to Zebra internal + register indexes + + + + + Use attributes are interpreted according to the + attribute sets which have been loaded in the + zebra.cfg file, and are matched against specific + fields as specified in the .abs file which + describes the profile of the records which have been loaded. + If no Use attribute is provided, a default of Bib-1 Any is assumed. + + + + If a Structure attribute of + Phrase is used in conjunction with a + Completeness attribute of + Complete (Sub)field, the term is matched + against the contents of the phrase (long word) register, if one + exists for the given Use attribute. + A phrase register is created for those fields in the + .abs file that contain a + p specifier. + + + + + If Structure=Phrase is + used in conjunction with Incomplete Field (the + default value for Completeness), the + search is directed against the normal word registers, but if the term + contains multiple words, the term will only match if all of the words + are found immediately adjacent, and in the given order. + The word search is performed on those fields that are indexed as + type w in the .abs file. + + + + If the Structure attribute is + Word List, + Free-form Text, or + Document Text, the term is treated as a + natural-language, relevance-ranked query. + This search type uses the word register, i.e. those fields + that are indexed as type w in the + .abs file. + + + + If the Structure attribute is + Numeric String, the term is treated as an integer.
+ The search is performed on those fields that are indexed + as type n in the .abs file. + + + + If the Structure attribute is + URx, the term is treated as a URX (URL) entity. + The search is performed on those fields that are indexed as type + u in the .abs file. + + + + If the Structure attribute is + Local Number, the term is treated as + a native Zebra record identifier. + + + + If the Relation attribute is + Equals (default), the term is matched + in a normal fashion (modulo truncation and processing of + individual words, if required). + If Relation is Less Than, + Less Than or Equal, + Greater than, or Greater than or + Equal, the term is assumed to be numerical, and a + standard regular expression is constructed to match the given + expression. + If Relation is Relevance, + the standard natural-language query processor is invoked. + + + + For the Truncation attribute, + No Truncation is the default. + Left Truncation is not supported. + Process # in search term is supported, as is + Regxp-1. + Regxp-2 enables the fault-tolerant (fuzzy) + search. By default, a single error (deletion, insertion, + replacement) is accepted when terms are matched against the register + contents. + + + + + Regular expressions + + + Each term in a query is interpreted as a regular expression if + the truncation value is either Regxp-1 (102) + or Regxp-2 (103). + Both query types follow the same syntax with the operands: + + + + x + + + Matches the character x. + + + + + . + + + Matches any character. + + + + + [..] + + + Matches the set of characters specified, + such as [abc] or [a-c]. + + + + + and the operators: + + + + x* + + + Matches x zero or more times. Priority: high. + + + + + x+ + + + Matches x one or more times. Priority: high. + + + + + x? + + + Matches x zero times or once. Priority: high. + + + + + xy + + + Matches x, then y. + Priority: medium. + + + + + x|y + + + Matches either x or y. + Priority: low.
+ + + + + The order of evaluation may be changed by using parentheses. + + + + If the first character of the Regxp-2 query + is a plus character (+), it marks the + beginning of a section with non-standard specifiers. + The next plus character marks the end of the section. + Currently Zebra only supports one specifier, the error tolerance, + which consists of one digit. + + + + Since the plus operator is normally a suffix operator, the addition to + the query syntax doesn't violate the syntax for standard regular + expressions. + + + + + + Query examples + + + Phrase search for information retrieval in + the title register: + + @attr 1=4 "information retrieval" + + + + + Ranked search for the same thing: + + @attr 1=4 @attr 2=102 "Information retrieval" + + + + + Phrase search with a regular expression: + + @attr 1=4 @attr 5=102 "informat.* retrieval" + + + + + Ranked search with a regular expression: + + @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval" + + + + + In the GILS schema (gils.abs), the + west-bounding-coordinate is indexed as type n, + and is therefore searched by specifying + structure=Numeric String. + To match all those records with west-bounding-coordinate greater + than -114, we use the following query: + + @attr 4=109 @attr 2=5 @attr gils 1=2038 -114 + + + + + + + + + + + + + Server Side CQL to PQF Query Translation + + Using the + <cql2rpn>l2rpn.txt</cql2rpn> + YAZ Frontend Virtual + Hosts option, one can configure + the YAZ Frontend CQL-to-PQF + converter, specifying the interpretation of various + CQL + indexes, relations, etc. in terms of Type-1 query attributes.
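The real converter is driven by the configuration file named in the cql2rpn option and is specified in the YAZ manual. Purely to illustrate the idea of mapping CQL indexes and relations onto Type-1 attributes, the Python sketch below translates a single CQL search clause into PQF. The index table (title, text), the regular expression and the function name are hypothetical; the Relation values are the Bib-1 ones (2=5 for greater than and 2=102 for relevance also appear in the query examples above). Boolean operators such as and are not handled.

```python
import re
from typing import Dict

# Hypothetical index table; a real setup reads this from the
# converter's configuration file instead of hard-coding it.
INDEX_MAP: Dict[str, str] = {
    "title": "1=4",    # Bib-1 Use attribute Title
    "text": "1=1016",  # Bib-1 Use attribute Any
}

# Bib-1 Relation (type 2) values; 102 is relevance.
RELATION_MAP: Dict[str, str] = {
    "<": "2=1", "<=": "2=2", "=": "2=3",
    ">=": "2=4", ">": "2=5", "<>": "2=6",
}
RELEVANCE = "2=102"

# index, relation (two-character forms tried first), optional /relevant, term
CLAUSE = re.compile(r"^\s*(\w+)\s*(<=|>=|<>|=|<|>)\s*(/relevant\s+)?(.+?)\s*$")


def cql_clause_to_pqf(clause: str) -> str:
    """Translate one 'index relation [/relevant] term' clause to PQF."""
    match = CLAUSE.match(clause)
    if match is None:
        raise ValueError(f"not a simple CQL search clause: {clause!r}")
    index, relation, relevant, term = match.groups()
    if index not in INDEX_MAP:
        raise ValueError(f"no mapping for CQL index {index!r}")
    relation_attr = RELEVANCE if relevant else RELATION_MAP[relation]
    term = term.strip('"')
    if " " in term:
        term = f'"{term}"'  # PQF needs quotes around multi-word terms
    return f"@attr {INDEX_MAP[index]} @attr {relation_attr} {term}"


print(cql_clause_to_pqf("text = /relevant soil"))
# prints: @attr 1=1016 @attr 2=102 soil
```

Listing the two-character relations before = in the regular expression keeps <= or <> from being read as < followed by a term that starts with = or >.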
+ + + + For example, using server-side CQL-to-PQF conversion, one might + query a zebra server like this: + + querytype cql + Z> find text=(plant and soil) + ]]> + + and - if properly configured - even static relevance ranking can + be performed using CQL query syntax: + + find text = /relevant (plant and soil) + ]]> + + + + + By the way, the same configuration can be used to + search using client-side CQL-to-PQF conversion: + (the only difference is querytype cql2rpn + instead of + querytype cql, and the call specifying a local + conversion file) + + querytype cql2rpn + Z> find text=(plant and soil) + ]]> + + + + + Exhaustive information can be found in the + Section "Specification of CQL to RPN mappings" in the YAZ manual. + + http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map, + and shall therefore not be repeated here. + + + + + + + + + + + diff --git a/doc/server.xml b/doc/server.xml index 4fa607b..4113196 100644 --- a/doc/server.xml +++ b/doc/server.xml @@ -1,5 +1,5 @@ - + The Z39.50 Server @@ -242,243 +242,6 @@ also the following section). - - Use attributes are interpreted according to the - attribute sets which have been loaded in the - zebra.cfg file, and are matched against specific - fields as specified in the .abs file which - describes the profile of the records which have been loaded. - If no Use attribute is provided, a default of Bib-1 Any is assumed. - - - - If a Structure attribute of - Phrase is used in conjunction with a - Completeness attribute of - Complete (Sub)field, the term is matched - against the contents of the phrase (long word) register, if one - exists for the given Use attribute. - A phrase register is created for those fields in the - .abs file that contains a - p-specifier. 
- - - - - If Structure=Phrase is - used in conjunction with Incomplete Field - the - default value for Completeness, the - search is directed against the normal word registers, but if the term - contains multiple words, the term will only match if all of the words - are found immediately adjacent, and in the given order. - The word search is performed on those fields that are indexed as - type w in the .abs file. - - - - If the Structure attribute is - Word List, - Free-form Text, or - Document Text, the term is treated as a - natural-language, relevance-ranked query. - This search type uses the word register, i.e. those fields - that are indexed as type w in the - .abs file. - - - - If the Structure attribute is - Numeric String the term is treated as an integer. - The search is performed on those fields that are indexed - as type n in the .abs file. - - - - If the Structure attribute is - URx the term is treated as a URX (URL) entity. - The search is performed on those fields that are indexed as type - u in the .abs file. - - - - If the Structure attribute is - Local Number the term is treated as - native Zebra Record Identifier. - - - - If the Relation attribute is - Equals (default), the term is matched - in a normal fashion (modulo truncation and processing of - individual words, if required). - If Relation is Less Than, - Less Than or Equal, - Greater than, or Greater than or - Equal, the term is assumed to be numerical, and a - standard regular expression is constructed to match the given - expression. - If Relation is Relevance, - the standard natural-language query processor is invoked. - - - - For the Truncation attribute, - No Truncation is the default. - Left Truncation is not supported. - Process # in search term is supported, as is - Regxp-1. - Regxp-2 enables the fault-tolerant (fuzzy) - search. As a default, a single error (deletion, insertion, - replacement) is accepted when terms are matched against the register - contents. 
- - - - Regular expressions - - - Each term in a query is interpreted as a regular expression if - the truncation value is either Regxp-1 (102) - or Regxp-2 (103). - Both query types follow the same syntax with the operands: - - - - x - - - Matches the character x. - - - - - . - - - Matches any character. - - - - - [..] - - - Matches the set of characters specified; - such as [abc] or [a-c]. - - - - - and the operators: - - - - x* - - - Matches x zero or more times. Priority: high. - - - - - x+ - - - Matches x one or more times. Priority: high. - - - - - x? - - - Matches x zero or once. Priority: high. - - - - - xy - - - Matches x, then y. - Priority: medium. - - - - - x|y - - - Matches either x or y. - Priority: low. - - - - - The order of evaluation may be changed by using parentheses. - - - - If the first character of the Regxp-2 query - is a plus character (+) it marks the - beginning of a section with non-standard specifiers. - The next plus character marks the end of the section. - Currently Zebra only supports one specifier, the error tolerance, - which consists one digit. - - - - Since the plus operator is normally a suffix operator the addition to - the query syntax doesn't violate the syntax for standard regular - expressions. - - - - - - Query examples - - - Phrase search for information retrieval in - the title-register: - - @attr 1=4 "information retrieval" - - - - - Ranked search for the same thing: - - @attr 1=4 @attr 2=102 "Information retrieval" - - - - - Phrase search with a regular expression: - - @attr 1=4 @attr 5=102 "informat.* retrieval" - - - - - Ranked search with a regular expression: - - @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval" - - - - - In the GILS schema (gils.abs), the - west-bounding-coordinate is indexed as type n, - and is therefore searched by specifying - structure=Numeric String. 
- To match all those records with west-bounding-coordinate greater - than -114 we use the following query: - - @attr 4=109 @attr 2=5 @attr gils 1=2038 -114 - - - @@ -569,96 +332,6 @@ will not be searchable. - - The following Explain categories are supported: - CategoryList, TargetInfo, - DatabaseInfo, AttributeDetails. - - - The following Explain search atributes are supported: - ExplainCategory (@attr 1=1), - DatabaseName (@attr 1=3), - DateAdded (@attr 1=9), - DateChanged(@ayyt 1=10). - See tab/explain.att for more information. - - - - Example searches with yaz-client - - - - List supported categories to find out which explain commands are - supported: - - Z> base IR-Explain-1 - Z> @attr exp1 1=1 categorylist - Z> form sutrs - Z> show 1+2 - - - - - Get target info, that is, investigate which databases exist at - this server endpoint: - - Z> base IR-Explain-1 - Z> @attr exp1 1=1 targetinfo - Z> form xml - Z> show 1+1 - Z> form grs-1 - Z> show 1+1 - Z> form sutrs - Z> show 1+1 - - - - - List all supported databases, the number of hits - is the number of databases found, which most commonly are the - following two: - the Default and the - IR-Explain-1 databases. - - Z> base IR-Explain-1 - Z> f @attr exp1 1=1 databaseinfo - Z> form sutrs - Z> show 1+2 - - - - - Get database info record for database Default. - - Z> base IR-Explain-1 - Z> @and @attr exp1 1=1 databaseinfo @attr exp1 1=3 Default - - Identical query with explicitly specified attribute set: - - Z> base IR-Explain-1 - Z> @attrset exp1 @and @attr 1=1 databaseinfo @attr 1=3 Default - - - - - Get attribute details record for database - Default. - This query is very useful to study the internal Zebra indexes. - If records have been indexed using the alvis - XSLT filter, the string representation names of the known indexes can be - found. 
- - Z> base IR-Explain-1 - Z> @and @attr exp1 1=1 attributedetails @attr exp1 1=3 Default - - Identical query with explicitly specified attribute set: - - Z> base IR-Explain-1 - Z> @attrset exp1 @and @attr 1=1 attributedetails @attr 1=3 Default - - - - diff --git a/doc/zebra.xml b/doc/zebra.xml index 7ff943f..7047a6f 100644 --- a/doc/zebra.xml +++ b/doc/zebra.xml @@ -9,7 +9,7 @@ %common; ]> - + Zebra - User's Guide and Reference @@ -65,6 +65,7 @@ &chap-quickstart; &chap-examples; &chap-architecture; + &chap-querymodel; &chap-administration; &chap-recordmodel-grs; &chap-recordmodel-alvisxslt; -- 1.7.10.4