From 08b7f3f08d97b9efdd5a3aef7992a359b71910d6 Mon Sep 17 00:00:00 2001 From: mike Date: Sun, 3 Nov 2002 16:49:37 +0000 Subject: [PATCH] Lots of changes, mostly to documentation, towards initial release. Also, add the random testing script. --- README | 106 ++++++++++++++++------------- docs/README | 2 + etc/generate.properties | 3 +- lib/.cvsignore | 1 + lib/README | 5 ++ src/org/z3950/zing/cql/CQLGenerator.java | 25 ++++--- src/org/z3950/zing/cql/CQLParser.java | 107 +++++++++++++++++++++--------- src/org/z3950/zing/cql/Makefile | 13 ++-- test/Makefile | 18 +++-- test/README | 12 ++-- test/queries.raw | 86 ++++++++++++++++++++++++ test/random/README | 15 +++++ test/random/mkrandom | 21 ++++++ test/raw | 87 ------------------------ 14 files changed, 310 insertions(+), 191 deletions(-) create mode 100644 lib/.cvsignore create mode 100644 lib/README create mode 100644 test/queries.raw create mode 100644 test/random/README create mode 100755 test/random/mkrandom delete mode 100644 test/raw diff --git a/README b/README index 9fa8487..c3553a4 100644 --- a/README +++ b/README @@ -1,15 +1,22 @@ -$Id: README,v 1.9 2002-11-02 01:24:41 mike Exp $ +$Id: README,v 1.10 2002-11-03 16:49:37 mike Exp $ -cql-java -- a free CQL compiler for Java +cql-java - a free CQL compiler, and other CQL tools, for Java -This project provides a set of classes for representing a CQL parse -tree (CQLBooleanNode, CQLTermNode, etc.) and a CQLCompiler class which -builds a parse tree given a CQL query as input. It also provides -compiler back-ends to render out the parse tree as XCQL (the XML -representation), as PQF (Yaz-style Prefix Query Format) and as CQL -(i.e. decompiling the parse-tree). Oh, and there's a random query -generator, too. +INTRODUCTION +------------ + +cql-java is a Free Software project that provides: + +* A set of classes for representing a CQL parse tree (a base CQLNode + class, CQLBooleanNode and its subclasses, CQLTermNode, etc.) 
+* A CQLCompiler class (and its lexer) which builds a parse tree given + a CQL query as input. +* A selection of compiler back-ends to render out the parse tree as: + * XCQL (the standard XML representation) + * CQL (i.e. decompiling the parse-tree) + * PQF (Yaz-style Prefix Query Format) [### NOT YET] +* A random query generator, useful for testing. CQL is "Common Query Language", a new query language designed under the umbrella of the ZING initiative (Z39.59-International Next @@ -24,22 +31,41 @@ which is supposed to be easier to parse. More information at But if you didn't know that, why are you even reading this? :-) +What's what in this distribution? + + README This file + src Source-code for the cql-java library + lib The compiled library file, "cql-java.jar" + bin Simple shell-scripts to invoke the test-harnesses + docs Documentation automatically generated by "javadoc" + test Various testing and sanity-checking frameworks + etc Other files: CQL Grammar, generator properties, etc. + +"Installation" of this package would consist of putting the bin +directory on your PATH and the lib directory on your CLASSPATH. 
+ + SYNOPSIS -------- -Test-harness: +Using the test-harnesses: - $ echo "foo and (bar or baz)" | java org.z3950.zing.cql.CQLParser + $ CQLParser 'title=foo and author=(bar or baz)' + $ CQLLexer 'title=foo and author=(bar or baz)' + (not very interesting unless you're debugging) + $ CQLGenerator etc/generate.properties seed 18 -Library: +Using the library in your own applications: import org.z3950.zing.cql.* // Building a parse-tree by hand - CQLNode n1 = new CQLTermNode("dc.author", "=", "kernighan"); - CQLNode n2 = new CQLTermNode("dc.title", "all", "elements style"); + CQLNode n1 = new CQLTermNode("dc.author", new CQLRelation("="), + "kernighan"); + CQLNode n2 = new CQLTermNode("dc.title", new CQLRelation("all"), + "elements style"); CQLNode root = new CQLAndNode(n1, n2); - System.out.println(root.toXCQL(3)); + System.out.println(root.toXCQL(0)); // Parsing a CQL query CQLParser parser = new CQLParser(); @@ -60,29 +86,20 @@ subdirectory. (It's not all there yet, but it's coming.) AUTHOR ------ -Mike Taylor -http://www.miketaylor.org.uk +All code and documentation by Mike Taylor + http://www.miketaylor.org.uk +Please email me with bug-reports, wishlist items, patches, deployment +stories and, of course, large cash donations. LICENCE ------- -This software is open source, but I've not yet decided exactly what +This software is Open Source, but I've not yet decided exactly what licence to use. Be good. Assume I'm going with the GPL (most -restrictive) until I say otherwise. - - -TESTING -------- - -Ways of testing the parser and other components include: - -* Generate a random tree with CQLGenerate, take a copy, and - canonicalise it with CQLparser -c. Since the CQLGenerate output is - in canonical form anyway, the before-and-after versions should be - identical. - -* ... others :-) +restrictive) until I say otherwise. 
For what it's worth, I think the
+most likely licence is the LGPL (GNU's Lesser General Public Licence)
+which lets you deploy cql-java as a part of a non-free larger work.
 
 
 SEE ALSO
@@ -93,23 +110,23 @@ Rob Sanderson's CQL compiler, written in Python.
 All the other free CQL compilers everyone's going to write :-)
 
 
-TO DO
------
+THINGS TO DO
+------------
 
-* ### Fix bug where "9x" is parsed as two tokens, a NUMBER and a
-  WORD. (And why is "x9" OK?)
-
-* Allow CQLGenerate test-harness to take some of its configuration
-  parameters on the command-line as well as or instead of a file.
+* ### Fix bug where "9x" is parsed as two tokens, a TT_NUMBER followed
+  by a TT_WORD. The problem here is that I don't think it's actually
+  possible to fix this without throwing out StreamTokenizer and
+  rolling our own, which we absolutely _don't_ want to do.
 
 * Some niceties for the cql-decompiling back-end:
 	* don't emit redundant parentheses.
 	* don't put spaces around relations that don't need them.
 
-* Write pqn-generating back-end (will need to be driven from a
-  configuation file specifying how to represent the qualifiers,
+* Write the PQN-generating back-end. This will need to be driven from
+  a configuration file specifying how to represent the qualifiers,
   relations, relation modifiers and wildcard characters as z39.50
-  attributes.)
+  attributes. I think Ray has such a thing, though perhaps not yet in
+  a form sufficiently rigorous to be computer-readable.
 
 * Consider the utility of yet another back-end that translates a
   CQLNode tree into a Type-1 query tree using the JZKit data
@@ -117,7 +134,7 @@ TO DO
   query-type; but you could achieve the same effect by generating
   PQN, and running that through JZKit's existing PQN-to-Type-1 compiler.
 
-* Refinements to random query generator: +* Many refinements to the random query generator: * Generate relation modifiers * Proximity support * Don't always generate qualifier/relation for terms @@ -127,6 +144,3 @@ TO DO * Generate multi-word terms * Write fuller "javadoc" comments. - -* Write generic test suite. - diff --git a/docs/README b/docs/README index 3b47757..1b55b71 100644 --- a/docs/README +++ b/docs/README @@ -1,2 +1,4 @@ +$Id: README,v 1.2 2002-11-03 16:49:38 mike Exp $ + Automatically-generated documentation should appear here. cd ../src/org/z3950/zing/cql && make javadocs diff --git a/etc/generate.properties b/etc/generate.properties index 7ef2b65..0494623 100644 --- a/etc/generate.properties +++ b/etc/generate.properties @@ -1,10 +1,9 @@ -# $Id: generate.properties,v 1.1 2002-10-30 09:19:26 mike Exp $ +# $Id: generate.properties,v 1.2 2002-11-03 16:49:38 mike Exp $ # # Propeties file to drive the org.z3950.zing.cql.CQLGenerator # test-harness. See that class's documentation for the semantics of # these properties. # -#seed=18398 complexQuery=0.4 complexClause=0.4 equalsRelation=0.5 diff --git a/lib/.cvsignore b/lib/.cvsignore new file mode 100644 index 0000000..9950dc8 --- /dev/null +++ b/lib/.cvsignore @@ -0,0 +1 @@ +cql-java.jar diff --git a/lib/README b/lib/README new file mode 100644 index 0000000..6b38df4 --- /dev/null +++ b/lib/README @@ -0,0 +1,5 @@ +$Id: README,v 1.1 2002-11-03 16:49:38 mike Exp $ + +The library file "cql-java.jar" will appear here when you do a build +in ../src/org/z3950/zing/cql. Put it on your CLASSPATH to use the +cql-java utilities. 
diff --git a/src/org/z3950/zing/cql/CQLGenerator.java b/src/org/z3950/zing/cql/CQLGenerator.java
index 67e70d1..991f60d 100644
--- a/src/org/z3950/zing/cql/CQLGenerator.java
+++ b/src/org/z3950/zing/cql/CQLGenerator.java
@@ -1,4 +1,4 @@
-// $Id: CQLGenerator.java,v 1.2 2002-10-30 11:13:18 mike Exp $
+// $Id: CQLGenerator.java,v 1.3 2002-11-03 16:49:38 mike Exp $
 
 package org.z3950.zing.cql;
 import java.util.Properties;
@@ -22,7 +22,7 @@ import java.io.FileNotFoundException;
  * this distribution - there is a generate_x() method
  * for each grammar element X.
  *
- * @version $Id: CQLGenerator.java,v 1.2 2002-10-30 11:13:18 mike Exp $
+ * @version $Id: CQLGenerator.java,v 1.3 2002-11-03 16:49:38 mike Exp $
  * @see http://zing.z3950.org/cql/index.html
  */
 
@@ -260,13 +260,14 @@ public class CQLGenerator {
      * A simple test-harness for the generator.
      *

      * It generates a single random query using the parameters
-     * specified in a nominated properties file, and decompiles it
-     * into CQL which is written to standard output.
+     * specified in a nominated properties file, plus any additional
+     * name value pairs provided on the command-line, and
+     * decompiles it into CQL which is written to standard output.
      *

      * For example,
-     * java org.z3950.zing.cql.CQLGenerator etc/generate.properties
+     * java org.z3950.zing.cql.CQLGenerator
+     * etc/generate.properties seed 18398,
      * where the file generate.properties contains:

-     *	seed=18398
      *	complexQuery=0.4
      *	complexClause=0.4
      *	equalsRelation=0.5
@@ -281,13 +282,19 @@ public class CQLGenerator {
      * @param configFile
      *	The name of a properties file from which to read the
      *	configuration parameters (see above).
+     * @param name
+     *	The name of a configuration parameter.
+     * @param value
+     *	The value to assign to the configuration parameter named in
+     *	the immediately preceding command-line argument.
      * @return
      *	A CQL query expressed in a form that should be comprehensible
      *	to all conformant CQL compilers.
      */
     public static void main (String[] args) throws Exception {
-	if (args.length != 1) {
-	    System.err.println("Usage: CQLGenerator ");
+	if (args.length % 2 != 1) {
+	    System.err.println("Usage: CQLGenerator  "+
+			       "[ ]...");
 	    System.exit(1);
 	}
 
@@ -300,6 +307,8 @@ public class CQLGenerator {
 	Properties params = new Properties();
 	params.load(f);
 	f.close();
+	for (int i = 1; i < args.length; i += 2)
+	    params.setProperty(args[i], args[i+1]);
 
 	CQLGenerator generator = new CQLGenerator(params);
 	CQLNode tree = generator.generate();
diff --git a/src/org/z3950/zing/cql/CQLParser.java b/src/org/z3950/zing/cql/CQLParser.java
index 96e67f0..79abc31 100644
--- a/src/org/z3950/zing/cql/CQLParser.java
+++ b/src/org/z3950/zing/cql/CQLParser.java
@@ -1,4 +1,4 @@
-// $Id: CQLParser.java,v 1.13 2002-11-02 01:24:14 mike Exp $
+// $Id: CQLParser.java,v 1.14 2002-11-03 16:49:38 mike Exp $
 
 package org.z3950.zing.cql;
 import java.io.IOException;
@@ -6,10 +6,9 @@ import java.util.Vector;
 
 
 /**
- * Compiles a CQL string into a parse tree.
- * ##
+ * Compiles CQL strings into parse trees of CQLNode subtypes.
  *
- * @version	$Id: CQLParser.java,v 1.13 2002-11-02 01:24:14 mike Exp $
+ * @version	$Id: CQLParser.java,v 1.14 2002-11-03 16:49:38 mike Exp $
  * @see		http://zing.z3950.org/cql/index.html
  */
@@ -23,6 +22,20 @@ public class CQLParser {
 	    System.err.println("PARSEDEBUG: " + str);
     }
 
+    /**
+     * Compiles a CQL query.
+     * 

+ * The resulting parse tree may be further processed by hand (see + * the individual node-types' documentation for details on the + * data structure) or, more often, simply rendered out in the + * desired form using one of the back-ends. toCQL() + * returns a decompiled CQL query equivalent to the one that was + * compiled in the first place; and toXCQL() returns an + * XML snippet representing the query. + * + * @param cql The query + * @return A CQLNode object which is the root of a parse + * tree representing the query. */ public CQLNode parse(String cql) throws CQLParseException, IOException { lexer = new CQLLexer(cql, LEXDEBUG); @@ -188,7 +201,7 @@ public class CQLParser { match(lexer.ttype); } - boolean isBaseRelation() { + private boolean isBaseRelation() { debug("isBaseRelation: checking ttype=" + lexer.ttype + " (" + lexer.render() + ")"); return (isProxRelation() || @@ -197,7 +210,7 @@ public class CQLParser { lexer.ttype == lexer.TT_EXACT); } - boolean isProxRelation() { + private boolean isProxRelation() { debug("isProxRelation: checking ttype=" + lexer.ttype + " (" + lexer.render() + ")"); return (lexer.ttype == '<' || @@ -222,33 +235,61 @@ public class CQLParser { } - // Test harness. - // - // e.g. echo '(au=Kerninghan or au=Ritchie) and ti=Unix' | - // java org.z3950.zing.cql.CQLParser - // yields: - // - // and - // - // or - // - // au - // = - // Kerninghan - // - // - // au - // = - // Ritchie - // - // - // - // ti - // = - // Unix - // - // - // + /** + * Simple test-harness for the CQLParser class. + *

+     * Reads a CQL query either from its command-line argument, if
+     * there is one, or standard input otherwise. So these two
+     * invocations are equivalent:
+     *

+     *  CQLParser 'au=(Kerninghan or Ritchie) and ti=Unix'
+     *  echo 'au=(Kerninghan or Ritchie) and ti=Unix' | CQLParser
+     * 
+     * The test-harness parses the supplied query and renders it as
+     * XCQL, so that both of the invocations above produce the
+     * following output:
+     *
+     *	<triple>
+     *	  <boolean>
+     *	    <value>and</value>
+     *	  </boolean>
+     *	  <triple>
+     *	    <boolean>
+     *	      <value>or</value>
+     *	    </boolean>
+     *	    <searchClause>
+     *	      <index>au</index>
+     *	      <relation>
+     *	        <value>=</value>
+     *	      </relation>
+     *	      <term>Kerninghan</term>
+     *	    </searchClause>
+     *	    <searchClause>
+     *	      <index>au</index>
+     *	      <relation>
+     *	        <value>=</value>
+     *	      </relation>
+     *	      <term>Ritchie</term>
+     *	    </searchClause>
+     *	  </triple>
+     *	  <searchClause>
+     *	    <index>ti</index>
+     *	    <relation>
+     *	      <value>=</value>
+     *	    </relation>
+     *	    <term>Unix</term>
+     *	  </searchClause>
+     *	</triple>
+     * 
+ *

+ * @param -c + * Causes the output to be written in CQL rather than XCQL - that + * is, a query equivalent to that which was input, is output. In + * effect, the test harness acts as a query canonicaliser. + * @return + * The input query, either as XCQL [default] or CQL [if the + * -c option is supplied]. + */ public static void main (String[] args) { boolean canonicalise = false; Vector argv = new Vector(); diff --git a/src/org/z3950/zing/cql/Makefile b/src/org/z3950/zing/cql/Makefile index e91d911..b0f160a 100644 --- a/src/org/z3950/zing/cql/Makefile +++ b/src/org/z3950/zing/cql/Makefile @@ -1,13 +1,18 @@ -# $Id: Makefile,v 1.3 2002-10-31 22:22:01 mike Exp $ +# $Id: Makefile,v 1.4 2002-11-03 16:49:38 mike Exp $ -all: Utils.class \ +OBJ = Utils.class \ CQLNode.class CQLTermNode.class CQLBooleanNode.class \ CQLAndNode.class CQLOrNode.class CQLNotNode.class \ CQLRelation.class CQLProxNode.class ModifierSet.class \ CQLParser.class CQLLexer.class CQLParseException.class \ CQLGenerator.class ParameterMissingException.class -docs: +../../../../../lib/cql-java.jar: $(OBJ) + cd ../../../..; jar cf ../lib/cql-java.jar org/z3950/zing/cql/*.class + +docs: ../../../../../docs/overview-tree.html + +../../../../../docs/overview-tree.html: *.java nice javadoc -d ../../../../../docs -author -version \ -windowtitle cql-java org.z3950.zing.cql @@ -15,7 +20,7 @@ docs: javac $< clean: - rm -f *.class + rm -f $(CLASS) cleandocs: rm -r docs/* diff --git a/test/Makefile b/test/Makefile index 75aade5..530b2cd 100644 --- a/test/Makefile +++ b/test/Makefile @@ -1,15 +1,19 @@ -# $Id: Makefile,v 1.2 2002-11-02 01:19:23 mike Exp $ - -tests: sections/01/01.xcql - ./runtests CQLParser cat +# $Id: Makefile,v 1.3 2002-11-03 16:49:38 mike Exp $ sections/01/01.xcql: sections - ./mkanswers ../../srw/cql/cqlparse3 + ./mkanswers CQLParser +# OR ./mkanswers ../../srw/cql/cqlparse3 # OR ./mkanswers ../../rob/CQLParser.py -sections: mktests raw +sections: mktests queries.raw rm -rf sections - ./mktests 
raw + ./mktests queries.raw + +adam-tests: sections/01/01.xcql + ./runtests ../../srw/cql/cqlparse3 + +rob-tests: sections/01/01.xcql + ./runtests ../../rob/CQLParser.py clean: find sections -name '*.xcql' -print | xargs rm -f diff --git a/test/README b/test/README index 567e0cb..614a2ca 100644 --- a/test/README +++ b/test/README @@ -1,6 +1,6 @@ -$Id: README,v 1.2 2002-11-02 01:19:23 mike Exp $ +$Id: README,v 1.3 2002-11-03 16:49:38 mike Exp $ -"raw" is the file of test queries as provided by Rob. +"queries.raw" is the file of test queries as provided by Rob. "mktests" parses the raw file into sections and individual queries "sections" is the top-level directory created by that program. "01", "02" etc. represent the sections within the raw file @@ -21,12 +21,16 @@ reliable, and you want to test my parser, cql-java's CQLParser class, against its results, do this: rm -rf sections - ./mktests raw + ./mktests queries.raw ./mkanswers CQLParser.py ./runtests CQLParser sgmlnorm (Except that sgmlnorm is useless -- gotta find something better.) -Also: there's a nasty hacl here called "showtest" which, when run like +Also: there's a nasty hack here called "showtest" which, when run like ``./showtest 07/03'', will show you the ways in which my output differs from Adam's. I'll probably delete it soon. + +Also: there's a subdirectory "random" which tests in a completely +different way. That ought to be a sister directory with this one, and +will be when I move the rest of this stuff down a level. 
diff --git a/test/queries.raw b/test/queries.raw new file mode 100644 index 0000000..fd42750 --- /dev/null +++ b/test/queries.raw @@ -0,0 +1,86 @@ +# Simple + +cat +"cat" +comp.os.linux +xml:element +"" +"=" +"prox/word/>=/5" +("cat") +((dog)) + +# index relation term + +title = "fish" +title exact fish +title any fish +title all fish +title > 9 +title >= 23 +dc.title any "fish chips" +dc.title any/stem fish +dc.fish all/stem/fuzzy "fish chips" +(title any frog) +((dc.title any/stem "frog pond")) + +# Simple Boolean + +cat or dog +cat and fish +cat not frog +(cat not frog) +"cat" not "fish food" +xml and "prox/word/" +a or b and c not d + +# I/R/T plus Boolean + +bath.author any fish and dc.title all "cat dog" +(title any/stem "fish dog" or "and") + +# Prox + +cat prox hat +cat prox/word/=/3/ordered hat +cat prox///3 hat +"fish food" prox/sentence "and" +title all "chips frog" prox/word//5 "any" +(dc.author exact "jones" prox///5 title >= "smith") +((cat prox hat)) + +# Special characters +(cat^) +"cat" +"^cat says \"fish\"" +"cat*fish" +cat?dog +(("^cat*fishdog\"horse?")) + +# Nesting Parens + +(((cat or dog) or horse) and frog) +(cat and dog) or (horse and frog) +(cat and (horse or frog)) and chips + +# Lame searches + +"any" or "all:stem" and "all" exact "any" prox/word "prox"="fuzzy" +((((((((("any"))))))))) + + +# Invalid searches [should error] + +> +=== +cat or +index any +index any/wrong term +a prox/wrong b +() +(a +index any fish) +(cat any dog or ()) +fred and any +((fred or all)) +sorry = (mike) diff --git a/test/random/README b/test/random/README new file mode 100644 index 0000000..0b09089 --- /dev/null +++ b/test/random/README @@ -0,0 +1,15 @@ +$Id: README,v 1.1 2002-11-03 16:49:38 mike Exp $ + +In this directory, we test the integrity of the cql-java tools as +follows: + +* Generate a random tree with CQLGenerate +* Take a copy +* Canonicalise it with CQLparser -c. +* Compare the before-and-after versions. 
+ + Since the CQLGenerate output is in canonical form anyway, the + before-and-after versions should be identical. This process + exercises the comprehensiveness and bullet-proofing of the parser, + as well as the accuracy of the rendering. + diff --git a/test/random/mkrandom b/test/random/mkrandom new file mode 100755 index 0000000..1543e08 --- /dev/null +++ b/test/random/mkrandom @@ -0,0 +1,21 @@ +#!/usr/bin/perl -w + +use strict; + +my $n = 1; +if (@ARGV > 1) { + print STDERR "Usage: $0 []\n"; + exit 1; +} elsif (@ARGV == 1) { + $n = $ARGV[0]; +} + +for (my $i = 0; $i < $n; $i++) { + print $i+1, " of $n -- "; + my $query=`CQLGenerator ../../etc/generate.properties`; + print $query; + my $canon=`CQLParser -c '$query'`; + if ($canon ne $query) { + print "ERROR: canonicalised query differs from original\n"; + } +} diff --git a/test/raw b/test/raw deleted file mode 100644 index 12a16b7..0000000 --- a/test/raw +++ /dev/null @@ -1,87 +0,0 @@ -# Simple - -cat -"cat" -comp.os.linux -xml:element -"" -"=" -"prox/word/>=/5" -("cat") -((dog)) - -# index relation term - -title = "fish" -title exact fish -title any fish -title all fish -title > 9 -title >= 23 -dc.title any "fish chips" -dc.title any/stem fish -dc.fish all/stem/fuzzy "fish chips" -(title any frog) -((dc.title any/stem "frog pond")) - -# Simple Boolean - -cat or dog -cat and fish -cat not frog -(cat not frog) -"cat" not "fish food" -xml and "prox/word/" -a or b and c not d - -# I/R/T plus Boolean - -bath.author any fish and dc.title all "cat dog" -(title any/stem "fish dog" or "and") - -# Prox - -cat prox hat -cat prox/word/=/3/ordered hat -cat prox///3 hat -"fish food" prox/sentence "and" -title all "chips frog" prox/word//5 "any" -(dc.author exact "jones" prox///5 title >= "smith") -((cat prox hat)) - -# Special characters -(cat^) -"cat" -"^cat says \"fish\"" -"cat*fish" -cat?dog -(("^cat*fishdog\"horse?")) - -# Nesting Parens - -(((cat or dog) or horse) and frog) -(cat and dog) or (horse and frog) -(cat and 
(horse or frog)) and chips - -# Lame searches - -any or all:stem and all exact any prox/word prox=fuzzy -(((((((((any))))))))) - - -# Invalid searches [should error] - -^ -> -=== -cat or -index any -index any/wrong term -a prox/wrong b -() -(a -index any fish) -(cat any dog or ()) -sorry = (mike) -fred and any -((fred or all)) -- 1.7.10.4
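
Note on the CQLGenerator command-line change: the new main() loads the
properties file named by its first argument and then applies any
trailing name/value pairs on top, so a pair on the command line wins
over the same name in the file. What follows is only a minimal sketch
of that overlay logic, not the code from this patch. The class name
OverrideSketch, the method overlay() and the wording of the exception
message are hypothetical, and the file contents are supplied inline
rather than read from disk.

```java
import java.util.Properties;

// Hypothetical illustration only; OverrideSketch and overlay() are not
// part of cql-java.
public class OverrideSketch {
    // Returns a copy of base with the name/value pairs from args[1..]
    // applied on top.  args[0] is assumed to be the properties-file
    // name, so a valid argument list always has odd length.
    static Properties overlay(Properties base, String[] args) {
        if (args.length % 2 != 1)
            throw new IllegalArgumentException(
                "expected <file> [<name> <value>]...");
        Properties params = new Properties();
        params.putAll(base);
        for (int i = 1; i < args.length; i += 2)
            params.setProperty(args[i], args[i+1]);
        return params;
    }

    public static void main(String[] args) {
        // Stand-in for the values that would be loaded from the file.
        Properties fromFile = new Properties();
        fromFile.setProperty("complexQuery", "0.4");
        fromFile.setProperty("seed", "1");

        String[] argv = { "etc/generate.properties", "seed", "18" };
        Properties params = overlay(fromFile, argv);
        System.out.println("seed=" + params.getProperty("seed"));
        System.out.println("complexQuery=" + params.getProperty("complexQuery"));
    }
}
```

Copying into a fresh Properties object leaves the file-derived values
untouched, which is a choice made for this sketch; the patch itself
sets the overrides directly on the loaded object.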