X-Git-Url: http://jsfdemo.indexdata.com/?a=blobdiff_plain;f=doc%2Fbook.xml;h=26fd0a5ba76102ef45101a54c69f50b86731e173;hb=9ac41f74e33f58fbbb507f0b3ae9ccdce306f525;hp=13abe5ef088fc2b0431b56ccf6e766db106158dc;hpb=84364c07e4830831703c5bca5f900878753b6ac7;p=metaproxy-moved-to-github.git diff --git a/doc/book.xml b/doc/book.xml index 13abe5e..26fd0a5 100644 --- a/doc/book.xml +++ b/doc/book.xml @@ -1,11 +1,34 @@ - + + + + + + %common; + + + + +]> + + Metaproxy - User's Guide and Reference - MikeTaylor + AdamDickmeiss - AdamDickmeiss + MarcCromme + + + MikeTaylor 2006 @@ -32,10 +55,10 @@ using the filter API. - The terms under which Metaproxy will be distributed have yet to be - established, but it will not necessarily be open source; so users - should not at this stage redistribute the code without explicit - written permission from the copyright holders, Index Data ApS. + Metaproxy is not open-source software, but + may be freely downloaded, unpacked, inspected, built and run for + evaluation purposes. Deployment requires a separate commercial + license. @@ -56,7 +79,7 @@ Metaproxy - is a standalone program that acts as a universal router, proxy and + is a stand-alone program that acts as a universal router, proxy and encapsulated metasearcher for information retrieval protocols such as Z39.50, and in the future SRU and SRW. @@ -84,13 +107,13 @@ being more powerful, flexible, configurable and extensible. Among its many advantages over the older, more pedestrian work are support for multiplexing (encapsulated metasearching), routing by - database name, authentication and authorisation and serving local + database name, authentication and authorization and serving local files via HTTP. Equally significant, its modular architecture facilitites the creation of pluggable modules implementing further functionality.
- This manual will briefly describe Metaproxy's licensing situation + This manual will describe how to install Metaproxy before giving an overview of its architecture, then discussing the key concept of a filter in some depth and giving an overview of the various filter types, then discussing the configuration file @@ -105,26 +128,60 @@ - - - - The Metaproxy Licence - - - No decision has yet been made on the terms under which - Metaproxy will be distributed. - - It is possible that, unlike - other Index Data products, metaproxy may not be released under a - free-software licence such as the GNU GPL. Until a decision is - made and a public statement made, then, and unless it has been - delivered to you other specific terms, please treat Metaproxy as - though it were proprietary software. - The code should not be redistributed without explicit - written permission from the copyright holders, Index Data ApS. - + + The Metaproxy License + + + + You are allowed to download this software for evaluation purposes. + You can unpack it, build it, run it, see how it works and how it fits + your needs, all at zero cost. + + + + + You may NOT deploy the software. For the purposes of this license, + deployment means running it for any purpose other than evaluation, + whether or not you or anyone else makes a profit from doing so. If + you wish to deploy the software, you must first contact Index Data and + arrange to purchase a DEPLOYMENT LICENCE. If you are unsure + whether or not your proposed use of the software constitutes + deployment, email us at info@indexdata.com + for clarification. + + + + + You may modify your copy of the software (fix bugs, add features) + if you need to. We encourage you to send your changes back to us for + integration into the master copy, but you are not obliged to do so. You + may NOT pass your changes on to any other party. + + + + + There is NO WARRANTY for this software, to the extent permitted by + applicable law. 
We provide the software ``as is'' without warranty of + any kind, either expressed or implied, including, but not limited to, the + implied warranties of MERCHANTABILITY and FITNESS FOR A + PARTICULAR PURPOSE. The entire risk as to the quality and + performance of the software is with you. Should the software prove + defective, you assume the cost of all necessary servicing, repair or + correction. In no event unless required by applicable law will we be + liable to you for damages, arising out of the use of the software, + including but not limited to loss of data or data being rendered + inaccurate. + + + + + All rights to the software are reserved by Index Data except where + this license explicitly says otherwise. + + + - + Installation @@ -164,7 +221,7 @@ for more information. - We have succesfully used Metaproxy with Boost using the compilers + We have successfully built Metaproxy using the compilers GCC version 4.0 and Microsoft Visual Studio 2003/2005. @@ -241,47 +298,87 @@
- Installation on Debian + Installation on Debian GNU/Linux - ### To be written + All dependencies for Metaproxy are available as + Debian + packages for the sarge (stable in 2005) and etch (testing in 2005) + distributions. - (Of course, since Debian is a Unix system, the instructions in the - previous section can be used.) + The procedures for Debian-based systems, such as + Ubuntu, are probably similar. -
+ + There is currently no official Debian package for YAZ++, + and the Debian package for YAZ is probably too old. + Update /etc/apt/sources.list + to include the Index Data repository. + See the YAZ Download Debian page + for more information. + + + apt-get install libxslt1-dev + apt-get install libyazpp-dev + apt-get install libboost-dev + apt-get install libboost-thread-dev + apt-get install libboost-date-time-dev + apt-get install libboost-program-options-dev + apt-get install libboost-test-dev + + + With these packages installed, the usual configure + make + procedure can be used for Metaproxy as outlined in + . + 
Installation on Windows - Compilation of Metaproxy can be done using - Microsoft Visual Studio. - We know Version 2003 works. We expect Version 2005 to - work as well. + Metaproxy can be compiled with Microsoft + Visual Studio. + Versions 2003 (C 7.1) and 2005 (C 8.0) are known to work.
Boost Get Boost from its home page. You also need Boost Jam (an alternative to make). - That's also available from this - home page. The files download are called something like: - boost_1_33-1.exe + That's also available from the Boost home page. + The files to be downloaded are called something like: + boost_1_33-1.exe and - boost-jam-3.1.12-1-ntx86.zip. - Unpack Boost Jam first. Put bjam.exe + boost-jam-3.1.12-1-ntx86.zip. + Unpack Boost Jam first. Put bjam.exe in your system path. Make a command prompt and ensure it can be found automatically. If not check the PATH. The Boost .exe is a self-extracting exe with complete source for Boost. Compile that source with Boost Jam (An alternative to Make). The compilation takes a while. - By default, the Boost build process puts the resulting + For Visual Studio 2003, use + + bjam "-sTOOLS=vc-7_1" + + Here vc-7_1 refers to a "Toolset" (compiler system). + For Visual Studio 2005, use + + bjam "-sTOOLS=vc-8_0" + + To install the libraries in a common place, use + + bjam "-sTOOLS=vc-7_1" install + + (or vc-8_0 for VS 2005). + + + By default, the Boost build process installs the resulting libraries + header files in \boost\lib, \boost\include. - For more informatation about installing Boost refer to the + For more information about installing Boost refer to the getting started pages. @@ -295,7 +392,7 @@ here. - Libxslt has other dependencies, but thes can all be downloaded + Libxslt has other dependencies, but these can all be downloaded from the same site. Get the following: iconv, zlib, libxml2, libxslt. @@ -327,18 +424,71 @@
Metaproxy - Metaproxy is shipped with NMAKE makfiles as well - similar + Metaproxy is also shipped with NMAKE makefiles, similar to those found in the YAZ++/YAZ packages. Adjust this Makefile to point to the proper locations of Boost, Libxslt, Libxml2, zlib, iconv, yaz and yazpp. + + + DEBUG + + If set to 1, the software is + compiled with debugging libraries (code generation is + multi-threaded debug DLL). + If set to 0, the software is compiled with release libraries + (code generation is multi-threaded DLL). + + + + + BOOST + + + Boost install location. + + + + + + BOOST_VERSION + + + Boost version, with each . replaced by _ (for example, 1_33_1). + + + + + + BOOST_TOOLSET + + + Boost toolset. + + + + + + LIBXSLT_DIR, + LIBXML2_DIR .. + + + Specify the locations of Libxslt, libiconv and + Libxml2. + + + + + + - After succesful compilation you'll find + After successful compilation you'll find metaproxy.exe in the bin directory.
+
@@ -369,7 +519,7 @@ In general, packages are doctored as they pass through Metaproxy. For example, when the proxy performs authentication - and authorisation on a Z39.50 Init request, it removes the + and authorization on a Z39.50 Init request, it removes the authentication credentials from the package so that they are not passed onto the back-end server; and when search-response packages are obtained from multiple servers, they are merged @@ -408,7 +558,7 @@ The word ``filter'' is sometimes used rather loosely, in two different ways: it may be used to mean a particular type of filter, as when we speak of ``the - auth_simplefilter'' or ``the multi filter''; or it may be used + auth_simple filter'' or ``the multi filter''; or it may be used to be a specific instance of a filter within a Metaproxy configuration. For example, a single configuration will often contain multiple instances of the @@ -465,7 +615,7 @@ as part of Metaproxy, and others may be provided by third parties and dynamically loaded. They all conform to the same simple API of essentially two methods: configure() is - called at startup time, and is passed a DOM tree representing that + called at startup time, and is passed an XML DOM tree representing that part of the configuration file that pertains to this filter instance: it is expected to walk that tree extracting relevant information; and process() is called every @@ -479,6 +629,7 @@ others are sinks: they consume packages and return a result (z3950_client, backend_test, + bounce, http_file); the others are true filters, that read, process and pass on the packages they are fed @@ -486,7 +637,9 @@ log, multi, query_rewrite, + record_transform, session_shared, + sru_z3950, template, virt_db). @@ -498,7 +651,7 @@ We now briefly consider each of the types of filter supported by the core Metaproxy binary. 
This overview is intended to give a - flavour of the available functionality; more detailed information + flavor of the available functionality; more detailed information about each type of filter is included below in the reference guide to Metaproxy filters. @@ -516,11 +669,34 @@ The filters are here listed in alphabetical order: + +
<literal>auth_simple</literal> (mp::filter::AuthSimple) - Simple authentication and authorisation. The configuration + Simple authentication and authorization. The configuration specifies the name of a file that is the user register, which lists username:password pairs, one per line, colon separated. When a session begins, it @@ -539,7 +715,7 @@ <literal>backend_test</literal> (mp::filter::Backend_test) - A sink that provides dummy responses in the manner of the + A partial sink that provides dummy responses in the manner of the yaz-ztest Z39.50 server. This is useful only for testing. Seriously, you don't need this. Pretend you didn't even read this section. @@ -547,13 +723,31 @@
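The auth_simple user register described above is a plain text file of username:password pairs, one per line. A minimal Python sketch of reading and checking such a file follows. It is purely illustrative. `parse_user_register` and `authenticate` are hypothetical helpers, not part of Metaproxy. Ignoring blank lines and trimming surrounding whitespace are assumptions. The real filter may treat edge cases differently.

```python
def parse_user_register(text: str) -> dict[str, str]:
    """Parse 'username:password' lines into a username -> password dict."""
    users: dict[str, str] = {}
    for number, raw in enumerate(text.splitlines(), start=1):
        # Assumption: surrounding whitespace is not significant
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are simply ignored
        user, sep, password = line.partition(":")
        if not sep or not user:
            raise ValueError(f"line {number}: expected username:password")
        users[user] = password  # split at the first colon only
    return users


def authenticate(users: dict[str, str], user: str, password: str) -> bool:
    """True only if the user is registered with exactly this password."""
    return user in users and users[user] == password


if __name__ == "__main__":
    register = "mike:beast\nadam:secret\n"  # hypothetical register contents
    users = parse_user_register(register)
    print(authenticate(users, "mike", "beast"), authenticate(users, "mike", "x"))
```

Because the split happens at the first colon only, a password may itself contain colons in this sketch. Whether the real user register allows that is not stated in this guide.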
+ <literal>bounce</literal> + (mp::filter::Bounce) + + A sink that swallows all packages + and returns them almost unprocessed. + It never sends a package of any type further down the route; instead it + turns Z39.50 packages into Z_Close packages and HTTP_Request packages into + HTTP_Response packages with error code 400, in each case adding a + suitable bounce message. + The bounce filter is usually added at the end of each route in + config.xml. Without it, a package that no other filter answers, for + example an HTTP request in a route whose only sink is the + z3950_client partial sink, would hang indefinitely. +
+ +
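The bounce behavior described above fits in a few lines of code. This is a hypothetical Python sketch, not Metaproxy's C++ implementation. The `Package` class, the kind labels `"z3950"` and `"http_request"`, and the bounce message text are all invented for illustration. The handling of other package types is an assumption, because the description covers only Z39.50 and HTTP packages.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    kind: str  # hypothetical labels: "z3950", "http_request", "http_response"
    body: dict = field(default_factory=dict)


BOUNCE_MESSAGE = "bounced by the final filter in the route"  # hypothetical text


def bounce(package: Package) -> Package:
    """Toy bounce sink: never forwards a package, always answers it."""
    if package.kind == "z3950":
        # Z39.50 packages become a Close, carrying a bounce message
        return Package("z3950", {"apdu": "Z_Close", "message": BOUNCE_MESSAGE})
    if package.kind == "http_request":
        # HTTP requests become an HTTP response with error code 400
        return Package("http_response", {"status": 400, "message": BOUNCE_MESSAGE})
    # Assumption: any other package type is answered with just the message
    return Package(package.kind, {"message": BOUNCE_MESSAGE})


if __name__ == "__main__":
    print(bounce(Package("http_request", {"path": "/index.html"})))
```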
<literal>frontend_net</literal> (mp::filter::FrontendNet) A source that accepts Z39.50 connections from a port specified in the configuration, reads protocol units, and feeds them into the next filter in the route. When the result is - revceived, it is returned to the original origin. + received, it is returned to the original origin.
<literal>http_file</literal> (mp::filter::HttpFile) - A sink that returns the contents of files from the local - filesystem in response to HTTP requests. (Yes, Virginia, this + A partial sink that swallows only HTTP_Request packages and + returns the contents of files from the local filesystem in response to HTTP requests. + It passes Z39.50 packages, and any other package types added in future, + through untouched. + (Yes, Virginia, this does mean that Metaproxy is also a Web-server in its spare time. So far it does not contain either an email-reader or a Lisp interpreter, but that day is surely coming.) @@ -574,7 +772,8 @@ (mp::filter::Log) Writes logging information to standard output, and passes on - the package unchanged. + the package unchanged. A log file name can also be specified, as can + several different logging formats.
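The sketch below contrasts a full sink such as bounce with partial sinks such as http_file and z3950_client. It is a simplified Python model under stated assumptions. Each filter is a function that either returns a response or returns `None` to let the package continue. This flattens Metaproxy's real down-and-back-up package flow into a simple loop. The dictionary keys and the stubbed file serving and back-end calls are hypothetical.

```python
from typing import Callable, Optional

# A handler returns a response dict, or None to pass the package on.
Handler = Callable[[dict], Optional[dict]]


def http_file(pkg: dict) -> Optional[dict]:
    """Partial sink: answers HTTP requests only (file reading is stubbed)."""
    if pkg["type"] == "http_request":
        return {"type": "http_response", "status": 200, "served": pkg["path"]}
    return None  # Z39.50 and any other packages pass through untouched


def z3950_client(pkg: dict) -> Optional[dict]:
    """Partial sink: answers Z39.50 packages only (back-end call is stubbed)."""
    if pkg["type"] == "z3950":
        return {"type": "z3950", "apdu": pkg["apdu"] + "Response"}
    return None  # HTTP requests and any other packages pass through


def bounce(pkg: dict) -> dict:
    """Full sink: answers everything that reaches it.

    Simplification: anything that is not HTTP is treated as Z39.50 here.
    """
    if pkg["type"] == "http_request":
        return {"type": "http_response", "status": 400}
    return {"type": "z3950", "apdu": "Z_Close"}


def run_route(route: list[Handler], pkg: dict) -> dict:
    """Offer the package to each filter in turn until one answers it."""
    for handler in route:
        response = handler(pkg)
        if response is not None:
            return response
    # Without a full sink at the end, nothing answers: the "hang" case
    raise RuntimeError("no filter in the route answered the package")


if __name__ == "__main__":
    route = [http_file, z3950_client, bounce]
    print(run_route(route, {"type": "http_request", "path": "/index.html"}))
    print(run_route(route, {"type": "z3950", "apdu": "search"}))
```

Placing the full sink last means every package gets some answer, while each partial sink only claims the package types it understands.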
@@ -602,6 +801,21 @@ + +
+ <literal>record_transform</literal> + (mp::filter::RecordTransform) + + This filter acts only on Z39.50 Present requests, and lets all + other types of packages and requests pass untouched. Its use is + twofold: blocking Z39.50 Present requests that the back-end + server does not understand and cannot honor; and rewriting + the requested record syntax and element set name according to the + specified rules, so that only record formats that actually exist are + fetched, which are then transformed on the fly into the requested record syntax. +
+
<literal>session_shared</literal> (mp::filter::SessionShared) @@ -619,6 +833,16 @@
+ +
+ <literal>sru_z3950</literal> + (mp::filter::SRUtoZ3950) + + This filter transforms valid + SRU/GET or SRU/SOAP requests into Z39.50 requests, and wraps the + received hit counts and XML records in suitable SRU response messages. +
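Here is a hypothetical Python sketch of the first half of that job. It pulls the standard SRU searchRetrieve parameters (`operation`, `query`, `startRecord`, `maximumRecords`) out of an SRU/GET URL. It is not the filter's actual code. Three details are assumptions: the output dictionary, using the URL path as the database name, and defaulting `maximumRecords` to 0. The real work, translating the CQL query into a Z39.50 (RPN) query, is deliberately left out.

```python
from urllib.parse import parse_qs, urlparse


def sru_get_to_search(url: str) -> dict:
    """Extract the searchRetrieve parameters from an SRU/GET URL (sketch)."""
    parsed = urlparse(url)
    # parse_qs returns lists; SRU parameters are single-valued
    params = {name: values[0] for name, values in parse_qs(parsed.query).items()}
    if params.get("operation") != "searchRetrieve":
        raise ValueError("this sketch handles only the searchRetrieve operation")
    if "query" not in params:
        raise ValueError("searchRetrieve requires a query parameter")
    return {
        "database": parsed.path.lstrip("/"),  # assumption: path names the database
        "cql": params["query"],  # CQL text; translation to Z39.50 RPN not shown
        "start": int(params.get("startRecord", "1")),  # SRU counts from 1
        "count": int(params.get("maximumRecords", "0")),  # assumption: default 0
    }


if __name__ == "__main__":
    print(sru_get_to_search(
        "http://localhost:9000/Default?version=1.1&operation=searchRetrieve"
        "&query=dc.title%3Ddinosaur&maximumRecords=5"))
```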
<literal>template</literal> @@ -635,7 +859,7 @@ <section> <title><literal>virt_db</literal> - (mp::filter::Virt_db) + (mp::filter::VirtualDB) Performs virtual database selection: based on the name of the database in the search request, a server is selected, and its @@ -652,13 +876,16 @@ <literal>z3950_client</literal> (mp::filter::Z3950Client) - Performs Z39.50 searching and retrieval by proxying the + A partial sink that swallows only Z39.50 packages. + It performs Z39.50 searching and retrieval by proxying the packages that are passed to it. Init requests are sent to the address specified in the VAL_PROXY otherInfo attached to the request: this may have been specified by client, or generated by a virt_db filter earlier in the route. Subsequent requests are sent to the same address, which is remembered at Init time in a Session object. + HTTP_Request packages, and any other package types added in future, + are passed through untouched.
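The division of labor between virt_db and z3950_client can be sketched as a toy Python model. This is not Metaproxy's API. Requests are plain dictionaries, and the `val_proxy` key stands in for the VAL_PROXY otherInfo. The database table is a hypothetical example that reuses the `marc` address from the configuration examples in this guide. One simplification: Metaproxy holds the Init back until the first Search names a database, and this sketch skips that step.

```python
def virt_db(request: dict, databases: dict[str, str]) -> dict:
    """Toy virt_db: choose a back-end address from the database name."""
    address = databases[request["database"]]  # KeyError: unknown database
    # Attach the address the way a VAL_PROXY otherInfo would carry it
    return {**request, "val_proxy": address}


class Z3950Client:
    """Toy z3950_client: the Init fixes the back-end for the whole session."""

    def __init__(self) -> None:
        self.sessions: dict[str, str] = {}  # session id -> remembered address

    def target_for(self, session_id: str, request: dict) -> str:
        if request["kind"] == "init":
            # Init requests go to the address given in VAL_PROXY
            self.sessions[session_id] = request["val_proxy"]
        # Later requests in the session reuse the address remembered at Init
        return self.sessions[session_id]


if __name__ == "__main__":
    databases = {"marc": "indexdata.com/marc"}
    client = Z3950Client()
    init = virt_db({"kind": "init", "database": "marc"}, databases)
    print(client.target_for("s1", init))
    print(client.target_for("s1", {"kind": "present"}))
```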
@@ -683,34 +910,10 @@ - frontend_sru (source) - - - Receive SRU (and perhaps SRW) requests. - - - - - sru2z3950 (filter) - - - Translate SRU requests into Z39.50 requests. - - - - sru_client (sink) - SRU searching and retrieval. - - - - - srw_client (sink) - - - SRW searching and retrieval. + SRU/GET and SRU/SOAP searching and retrieval. @@ -748,37 +951,21 @@ implementation detail - they could just as well have been written in YAML or Lisp-like S-expressions, or in a custom syntax.) - - Since XML has been chosen, an XML schema, - config.xsd, is provided for validating - configuration files. This file is supplied in the - etc directory of the Metaproxy distribution. It - can be used by (among other tools) the xmllint - program supplied as part of the libxml2 - distribution: - - - xmllint --noout --schema etc/config.xsd my-config-file.xml - - - (A recent version of libxml2 is required, as - support for XML Schemas is a relatively recent addition.) -
- Overview of XML structure + Overview of the config file XML structure All elements and attributes are in the namespace - . + . This is most easily achieved by setting the default namespace on the top-level element, as here: - <yp2 xmlns="http://indexdata.dk/yp2/config/1"> + <metaproxy xmlns="http://indexdata.com/metaproxy" version="1.0"> - The top-level element is <yp2>. This contains a + The top-level element is <metaproxy>. This contains a <start> element, a <filters> element and a <routes> element, in that order. <filters> is optional; the other two are mandatory. All three are @@ -827,14 +1014,14 @@ The following is a small, but complete, Metaproxy configuration file (included in the distribution as - metaproxy/etc/config0.xml). + metaproxy/etc/config1.xml). This file defines a very simple configuration that simply proxies to whatever back-end server the client requests, but logs each request and response. This can be useful for debugging complex client-server dialogues. - + @@ -848,13 +1035,14 @@ + - + ]]> It works by defining a single route, called - start, which consists of a sequence of three + start, which consists of a sequence of four filters. The first and last of these are included by reference: their <filter> elements have refid attributes that refer to filters defined @@ -862,18 +1050,51 @@ middle filter is included inline in the route. - The three filters in the route are as follows: first, a + The four filters in the route are as follows: first, a frontend_net filter accepts Z39.50 requests from any host on port 9000; then these requests are passed through a log filter that emits a message for each request; they are then fed into a z3950_client - filter, which forwards the requests to the client-specified - back-end Z39.509 server. When the response arrives, it is handed + filter, which forwards all Z39.50 requests to the client-specified + back-end Z39.50 server.
Those Z39.50 packages are returned by the + z3950_client filter, with the response data + filled in by the targeted external Z39.50 server. + All non-Z39.50 packages are passed through to the + bounce filter, which definitely bounces + everything, including fish, bananas, cold pyjamas, + mutton, beef and trout packages. + When the response arrives, it is handed back to the log filter, which emits another - message; and then to the front-end filter, which returns the - response to the client. + message; and then to the frontend_net filter, + which returns the response to the client.
+
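Python's standard `xml.etree.ElementTree` module can illustrate how a route mixes filters included by reference with filters defined inline. This is a hedged sketch with several stated assumptions:

- `resolve_route` is a hypothetical helper, not part of Metaproxy.
- The sample configuration is trimmed (for instance, the frontend_net port setting is omitted) and is not the contents of `config1.xml`.
- The `route` attribute on `<start>` is an assumption about how the starting route is named.

```python
import xml.etree.ElementTree as ET

NS = {"mp": "http://indexdata.com/metaproxy"}

# Illustrative configuration only; the start/@route attribute is assumed
SAMPLE = """<metaproxy xmlns="http://indexdata.com/metaproxy" version="1.0">
  <start route="start"/>
  <filters>
    <filter id="frontend" type="frontend_net"/>
    <filter id="backend" type="z3950_client"/>
  </filters>
  <routes>
    <route id="start">
      <filter refid="frontend"/>
      <filter type="log"/>
      <filter refid="backend"/>
      <filter type="bounce"/>
    </route>
  </routes>
</metaproxy>"""


def resolve_route(config_xml: str, route_id: str) -> list[str]:
    """List the filter types of a route, following refid references."""
    root = ET.fromstring(config_xml)
    # Filters declared once under <filters>, keyed by their id attribute
    declared = {f.get("id"): f.get("type")
                for f in root.findall("mp:filters/mp:filter", NS)}
    route = root.find(f"mp:routes/mp:route[@id='{route_id}']", NS)
    if route is None:
        raise KeyError(f"no route with id {route_id!r}")
    types = []
    for f in route.findall("mp:filter", NS):
        ref = f.get("refid")
        # By reference: look up the declared filter; inline: use its own type
        types.append(declared[ref] if ref is not None else f.get("type"))
    return types


if __name__ == "__main__":
    start = ET.fromstring(SAMPLE).find("mp:start", NS).get("route")
    print(resolve_route(SAMPLE, start))
```

Declaring a filter once and referring to it by `refid` lets several routes share the same filter instance. Filters defined inline belong only to the route that contains them.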
+ Config file syntax checking + + The distribution contains RelaxNG Compact and XML syntax checking + files, as well as XML Schema files. These are found in the + distribution paths + + xml/schema/metaproxy.rnc + xml/schema/metaproxy.rng + xml/schema/metaproxy.xsd + + and can be used to verify or debug the XML structure of + configuration files. For example, using the utility + xmllint, syntax checking is done like this: + + xmllint --noout --schema xml/schema/metaproxy.xsd etc/config-local.xml + xmllint --noout --relaxng xml/schema/metaproxy.rng etc/config-local.xml + + (A recent version of libxml2 is required, as + support for XML Schemas is a relatively recent addition.) + + + You can of course use any other RelaxNG or XML Schema compliant tool + you wish. + +
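The schema files above are the authoritative check. For a quick sanity check of just the top-level layout, a short Python script using the standard library is enough. The expected layout is a `<metaproxy>` root in the Metaproxy namespace containing `<start>`, an optional `<filters>` and `<routes>`, in that order. `check_layout` is a hypothetical helper. It validates nothing below the top level and is no substitute for `xmllint` with the supplied schemas.

```python
import xml.etree.ElementTree as ET

MP = "{http://indexdata.com/metaproxy}"
# <filters> is optional; <start> and <routes> are mandatory, in this order
ALLOWED_LAYOUTS = (["start", "filters", "routes"], ["start", "routes"])


def check_layout(config_xml: str) -> list[str]:
    """Return a list of problems with the top-level configuration layout."""
    root = ET.fromstring(config_xml)  # XML comments are dropped by default
    if root.tag != MP + "metaproxy":
        return [f"root element is {root.tag}, expected {MP}metaproxy"]
    problems = []
    names = []
    for child in root:
        if not child.tag.startswith(MP):
            problems.append(f"{child.tag} is not in the Metaproxy namespace")
        names.append(child.tag.removeprefix(MP))
    if names not in ALLOWED_LAYOUTS:
        problems.append(f"children are {names}; expected start, [filters,] routes")
    return problems


if __name__ == "__main__":
    sample = ('<metaproxy xmlns="http://indexdata.com/metaproxy" version="1.0">'
              '<start/><filters/><routes/></metaproxy>')
    print(check_layout(sample) or "layout OK")
```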
@@ -896,14 +1117,14 @@ The interaction between these two filters is necessarily complex: it reflects the real, irreducible complexity of multi-database searching in a protocol such - as Z39.50 that separates initialisation from searching, and in - which the database to be searched is not known at initialisation + as Z39.50 that separates initialization from searching, and in + which the database to be searched is not known at initialization time. It's possible to use these filters without understanding the details of their functioning and the interaction between them; the - next two sections of this chapter are ``HOWTO'' guides for doing + next two sections of this chapter are ``HOW-TO'' guides for doing just that. However, debugging complex configurations will require a deeper understanding, which the last two sections of this chapters attempt to provide. @@ -941,7 +1162,7 @@ marc - indexdata.dk/marc + indexdata.com/marc ]]> @@ -966,7 +1187,7 @@ Index Data's tiny testing database of MARC records: - + @@ -981,21 +1202,22 @@ marc - indexdata.dk/marc + indexdata.com/marc all z3950.loc.gov:7090/voyager - indexdata.dk/marc + indexdata.com/marc 30 + -]]> +]]> (Using a virt_db @@ -1091,6 +1313,27 @@ Z> be metasearched in this way: issues of resource usage and administrative complexity dictate the practical limits. + + What happens when one of the databases doesn't respond? By default, + the entire multi-database search fails, and the appropriate + diagnostic is returned to the client. This is usually appropriate + during development, when technicians need maximum information, but + can be inconvenient in deployment, when users typically don't want + to be bothered with problems of this kind and prefer just to get + the records from the databases that are available. 
To obtain this + latter behavior add an empty + <hideunavailable> + element inside the + multi filter: + + + + ]]> + + Under this regime, an error is reported to the client only if + all the databases in a multi-database search + are unavailable. + @@ -1128,15 +1371,18 @@ Z> >the HTTP 1.1 specification. - The role of the virt_db filter is to rewrite - this otherInfo packet dependent on the virtual database that the - client wants to search. + Within Metaproxy, Search requests that are part of the same + session as an Init request that carries a + VAL_PROXY otherInfo are also annotated with the + same information. The role of the virt_db + filter is to rewrite this otherInfo packet dependent on the + virtual database that the client wants to search. When Metaproxy receives a Z39.50 Init request from a client, it doesn't immediately forward that request to the back-end server. Why not? Because it doesn't know which - back-end server to forward it to until the client sends a search + back-end server to forward it to until the client sends a Search request that specifies the database that it wants to search in. Instead, it just treasures the Init request up in its heart; and, later, the first time the client does a search on one of the @@ -1161,9 +1407,35 @@ Z> through it. - ### Describe the use of multiple VAL_PROXY - otherInfos, added by virt_db and used by - multi. + It is possible for a virt_db filter to contain + multiple + <target> + elements. What does this mean? Only that the filter will add + multiple VAL_PROXY otherInfo packets to the + Search requests that pass through it. That's because the virtual + DB filter is dumb, and does exactly what it's told - no more, no + less. + If a Search request with multiple VAL_PROXY + otherInfo packets reaches a z3950_client + filter, this is an error. That filter doesn't know how to deal + with multiple targets, so it will either just pick one and search + in it, or (better) fail with an error message. 
+ + The multi filter comes to the rescue! This is + the only filter that knows how to deal with multiple + VAL_PROXY otherInfo packets, and it does so by + making multiple copies of the entire Search request: one for each + VAL_PROXY. Each of these new copies is then + passed down through the remaining filters in the route. (The + copies are handled in parallel through the + spawning of new threads.) Since the copies each have only one + VAL_PROXY otherInfo, they can be handled by the + z3950_client filter, which happily deals with + each one individually. When the results of the individual + searches come back up to the multi filter, it + merges them into a single Search response, which is what + eventually makes it back to the client. @@ -1173,7 +1445,7 @@ Z> - + @@ -1184,9 +1456,8 @@ Z> [Here there should be a diagram showing the progress of packages through the filters during a simple virtual-database search and a multi-database search, but is seems that your - toolchain has not been able to include the diagram in this - document. This is because of LaTeX suckage. Time to move to - OpenOffice. Yes, really.] + tool chain has not been able to include the diagram in this + document.]
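The multi filter's fan-out and merge can be sketched in Python with a thread pool, together with the `<hideunavailable>` option from the HOW-TO. This is a hypothetical model, not the filter's implementation:

- `search_one` stands in for the copy of the Search request that travels down the rest of the route to one target.
- `Unavailable` is an invented exception for a target that cannot be reached.
- Merging is reduced to summing hit counts, whereas the real filter merges complete Search responses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


class Unavailable(Exception):
    """Hypothetical error raised when one back-end target cannot be reached."""


def multi_search(query: str, targets: list[str],
                 search_one: Callable[[str, str], int],
                 hide_unavailable: bool = False) -> dict:
    """Search every target in parallel and merge the hit counts."""
    # One copy of the search per target, each handled in its own thread
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        futures = {t: pool.submit(search_one, t, query) for t in targets}
    hits: dict[str, int] = {}
    failed: list[str] = []
    for target, future in futures.items():
        try:
            hits[target] = future.result()
        except Unavailable:
            failed.append(target)
    # Default: any failure fails the whole search. With hide_unavailable,
    # an error is reported only if every target failed.
    if failed and (not hide_unavailable or not hits):
        raise Unavailable("unavailable: " + ", ".join(failed))
    return {"hits": sum(hits.values()), "per_target": hits}


if __name__ == "__main__":
    stand_in_hits = {"indexdata.com/marc": 3}  # made-up numbers for the demo

    def fake_search(target: str, query: str) -> int:
        if target not in stand_in_hits:
            raise Unavailable(target)  # pretend this target is down
        return stand_in_hits[target]

    print(multi_search("dinosaur", ["indexdata.com/marc", "z3950.loc.gov:7090/voyager"],
                       fake_search, hide_unavailable=True))
```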