X-Git-Url: http://jsfdemo.indexdata.com/?a=blobdiff_plain;ds=sidebyside;f=doc%2Fbook.xml;h=4dcb2f47e3928c9950c1a8580472379f7ad63edf;hb=271eaaa60ec419d64669cf0e9b5753d05365b798;hp=21f0ac7843696c7358a6cfb5341b7001fd086383;hpb=270fb3d6cf7a5ab91e1c10371488e73652ba90da;p=metaproxy-moved-to-github.git diff --git a/doc/book.xml b/doc/book.xml index 21f0ac7..4dcb2f4 100644 --- a/doc/book.xml +++ b/doc/book.xml @@ -1,4 +1,4 @@ - + Metaproxy - User's Guide and Reference @@ -9,16 +9,20 @@ 2006 - Index Data + Index Data ApS Metaproxy is a universal router, proxy and encapsulated metasearcher for information retrieval protocols. It accepts, processes, interprets and redirects requests from IR clients using - standard protocols such as ANSI/NISO Z39.50 (and in the future SRU - and SRW), as well as functioning as a limited - HTTP server. Metaproxy is configured by an XML file which + standard protocols such as + ANSI/NISO Z39.50 + (and in the future SRU + and SRW), as + well as functioning as a limited + HTTP server. + Metaproxy is configured by an XML file which specifies how the software should function in terms of routes that the request packets can take through the proxy, each step on a route being an instantiation of a filter. Filters come in many @@ -33,6 +37,16 @@ should not at this stage redistribute the code without explicit written permission from the copyright holders, Index Data ApS. + + + + + + + + + + @@ -40,39 +54,55 @@ Introduction - - Metaproxy - is a standalone program that acts as a universal router, proxy and - encapsulated metasearcher for information retrieval protocols such - as Z39.50, and in the future SRU and SRW. To clients, it acts as a - server of these - protocols: it can be searched, records can be retrieved from it, - etc. To servers, it acts as a client: it searches in them, - retrieves records from them, etc. 
it satisfies its clients' - requests by transforming them, multiplexing them, forwarding them - on to zero or more servers, merging the results, transforming - them, and delivering them back to the client. In addition, it - acts as a simple HTTP server; support for further protocols can be - added in a module fashion, through the creation of new filters. - - - Anything goes in! - Anything goes out! - Cold bananas, fish, pyjamas, - Mutton, beef and trout! + + Metaproxy + is a standalone program that acts as a universal router, proxy and + encapsulated metasearcher for information retrieval protocols such + as Z39.50, and in the future + SRU and SRW. + To clients, it acts as a server of these protocols: it can be searched, + records can be retrieved from it, etc. + To servers, it acts as a client: it searches in them, + retrieves records from them, etc. it satisfies its clients' + requests by transforming them, multiplexing them, forwarding them + on to zero or more servers, merging the results, transforming + them, and delivering them back to the client. In addition, it + acts as a simple HTTP server; support + for further protocols can be added in a modular fashion, through the + creation of new filters. + + + Anything goes in! + Anything goes out! + Cold bananas, fish, pyjamas, + Mutton, beef and trout! - attributed to Cole Porter. - - - Metaproxy is a more capable alternative to - YAZ Proxy, - being more powerful, flexible, configurable and extensible. Among - its many advantages over the older, more pedestrian work are - support for multiplexing (encapsulated metasearching), routing by - database name, authentication and authorisation and serving local - files via HTTP. Equally significant, its modular architecture - facilitites the creation of pluggable modules implementing further - functionality. - + + + Metaproxy is a more capable alternative to + YAZ Proxy, + being more powerful, flexible, configurable and extensible. 
Among + its many advantages over the older, more pedestrian work are + support for multiplexing (encapsulated metasearching), routing by + database name, authentication and authorisation and serving local + files via HTTP. Equally significant, its modular architecture + facilitates the creation of pluggable modules implementing further + functionality. + + + This manual will briefly describe Metaproxy's licensing situation + before giving an overview of its architecture, then discussing the + key concept of a filter in some depth and giving an overview of + the various filter types, then discussing the configuration file + format. After this come several optional chapters which may be + freely skipped: a detailed discussion of virtual databases and + multi-database searching, some notes on writing extensions + (additional filter types) and a high-level description of the + source code. Finally comes the reference guide, which contains + instructions for invoking the metaproxy + program, and detailed information on each type of filter, + including examples. + @@ -81,8 +111,8 @@ The Metaproxy Licence - No decision has yet been made on the terms under which - Metaproxy will be distributed. + No decision has yet been made on the terms under which + Metaproxy will be distributed. It is possible that, unlike other Index Data products, metaproxy may not be released under a @@ -95,8 +125,134 @@ + + Installation + + Metaproxy depends on the following tools/libraries: + + YAZ++ + + + This is a C++ library based on YAZ. + + + + Libxslt + + This is an XSLT processor - based on + Libxml2. Both Libxml2 and + Libxslt must be installed with the development components. + + + + Boost + + + The popular C++ library. + + + + + + + In order to compile Metaproxy a modern C++ compiler is + required. Boost, in particular, requires a C++ compiler + that supports recent language features. Refer to Boost + Compiler Status + for more information. 
+ + + We have successfully used Metaproxy with Boost using the compilers + GCC version 4.0 and + Microsoft Visual Studio 2003/2005. + + +
+ Installation on Unix (from Source) + + Here is a quick step-by-step guide on how to compile all the + tools that Metaproxy uses. Most systems provide at least some of the required + tools as binary packages. If, for example, Libxml2/libxslt are already + installed as development packages, use those (and omit compilation). + + + + Libxml2/libxslt: + + + gunzip -c libxml2-version.tar.gz|tar xf - + cd libxml2-version + ./configure + make + su + make install + + + gunzip -c libxslt-version.tar.gz|tar xf - + cd libxslt-version + ./configure + make + su + make install + + + YAZ/YAZ++: + + + gunzip -c yaz-version.tar.gz|tar xf - + cd yaz-version + ./configure + make + su + make install + + + gunzip -c yazpp-version.tar.gz|tar xf - + cd yazpp-version + ./configure + make + su + make install + + + Boost: + + + gunzip -c boost-version.tar.gz|tar xf - + cd boost-version + ./configure + make + su + make install + + + Metaproxy: + + + gunzip -c metaproxy-version.tar.gz|tar xf - + cd metaproxy-version + ./configure + make + su + make install + +
+
+ Installation on Debian + + ### To be written + +
+
+ Installation on Windows + + ### To be written + +
+
+ The Metaproxy Architecture @@ -248,14 +404,15 @@ -
+
Overview of filter types We now briefly consider each of the types of filter supported by the core Metaproxy binary. This overview is intended to give a flavour of the available functionality; more detailed information - about each type of filter is included below in the Module - Reference. + about each type of filter is included below in + the reference guide to Metaproxy filters. The filters are here named by the string that is used as the @@ -418,7 +575,7 @@
-
+
Future directions Some other filters that do not yet exist, but which would be @@ -482,32 +639,6 @@ - - Virtual databases and multi-database searching - - -
- Introductory notes - - Two of Metaproxy's filters are concerned with multiple-database - operations. Of these, virt_db can work alone - to control the routing of searches to one of a number of servers, - while multi can work with the output of - virt_db to perform multicast searching, merging - the results into a unified result-set. The interaction between - these two filters is necessarily complex, reflecting the real - complexity of multicast searching in a protocol such as Z39.50 - that separates initialisation from searching, with the database to - search known only during the latter operation. - - - ### Much, much more to say! - -
-
- - - Configuration: the Metaproxy configuration file format @@ -519,7 +650,9 @@ its configuration file can be thought of as a program for that interpreter. Configuration is by means of a single file, the name of which is supplied as the sole command-line argument to the - yp2 program. + metaproxy program. (See + the reference guide + below for more information on invoking Metaproxy.)
The configuration files are written in XML. (But that's just an @@ -544,7 +677,7 @@
-
+
+ Overview of XML structure All elements and attributes are in the namespace @@ -565,15 +698,19 @@ The <start> element is empty, but carries a route attribute, whose value is the name of - route at which to start running - analogouse to the name of the + route at which to start running - analogous to the name of the start production in a formal grammar. If present, <filters> contains zero or more <filter> - elements; filters carry a type attribute and - contain various elements that provide suitable configuration for - filters of that type. The filter-specific elements are described - below. Filters defined in this part of the file must carry an + elements. Each filter carries a type attribute + which specifies what kind of filter is being defined + (frontend_net, log, etc.) + and contains various elements that provide suitable configuration + for a filter of its type. The filter-specific elements are + described in + the reference guide below. + Filters defined in this part of the file must carry an id attribute so that they can be referenced from elsewhere. @@ -589,159 +726,183 @@ <filters> section. Alternatively, a filter within a route may omit the refid attribute, but contain configuration elements similar to those used for filters defined - in the <filters> section. + in the <filters> section. (In other words, each filter in a + route may be included either by reference or by physical + inclusion.)
-
- Filter configuration +
+ An example configuration - All <filter> elements have in common that they must carry a - type attribute whose value is one of the - supported ones, listed in the schema file and discussed below. In - additional, <filters>s occurring the <filters> section - must have an id attribute, and those occurring - within a route must have either a refid - attribute referencing a previously defined filter or contain its - own configuration information. + The following is a small, but complete, Metaproxy configuration + file (included in the distribution as + metaproxy/etc/config0.xml). + This file defines a minimal configuration that proxies + to whatever backend server the client requests, but logs each + request and response. This can be useful for debugging complex + client-server dialogues. + + + + + + @:9000 + + + + + + + + + + + + +]]> - In general, each filter recognises different configuration - elements within its element, as each filter has different - functionality. These are as follows: + It works by defining a single route, called + start, which consists of a sequence of three + filters. The first and last of these are included by reference: + their <filter> elements have + refid attributes that refer to filters defined + within the prior <filters> section. The + middle filter is included inline in the route. + + The three filters in the route are as follows: first, a + frontend_net filter accepts Z39.50 requests + from any host on port 9000; then these requests are passed through + a log filter that emits a message for each + request; they are then fed into a z3950_client + filter, which forwards the requests to the client-specified + backend Z39.50 server. When the response arrives, it is handed + back to the log filter, which emits another + message; and then to the front-end filter, which returns the + response to the client. + +
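The route described above can be sketched as the following configuration. This is only an illustration assembled from the prose, not a copy of the distributed file: the root element name, the omitted namespace declaration, the <routes> wrapper, the attribute that names the route, and the filter ids "frontend" and "backend" are all assumptions and should be checked against the schema shipped with Metaproxy.

```xml
<?xml version="1.0"?>
<!-- Sketch only. The root element name is a placeholder and the
     namespace declaration is deliberately left out: take both from
     the schema file in the distribution. -->
<metaproxy>
  <!-- Processing begins at the route named "start". -->
  <start route="start"/>
  <filters>
    <!-- Hypothetical ids; any unique names would do. -->
    <filter id="frontend" type="frontend_net">
      <port>@:9000</port>
    </filter>
    <filter id="backend" type="z3950_client"/>
  </filters>
  <!-- Wrapper element and naming attribute are assumptions. -->
  <routes>
    <route id="start">
      <filter refid="frontend"/>  <!-- included by reference -->
      <filter type="log"/>        <!-- defined inline in the route -->
      <filter refid="backend"/>   <!-- included by reference -->
    </route>
  </routes>
</metaproxy>
```

A request enters at the first filter and travels down the list; the response travels back up through the same filters in reverse order, which is why the inline log filter sees both the request and the response.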
+ -
- <literal>auth_simple</literal> - - <filter type="auth_simple"> - <userRegister>../etc/example.simple-auth</userRegister> - </filter> - -
- -
- <literal>backend_test</literal> - - <filter type="backend_test"/> - -
- -
- <literal>frontend_net</literal> - - <filter type="frontend_net"> - <threads>10</threads> - <port>@:9000</port> - </filter> - -
- -
- <literal>http_file</literal> - - <filter type="http_file"> - <mimetypes>/etc/mime.types</mimetypes> - <area> - <documentroot>.</documentroot> - <prefix>/etc</prefix> - </area> - </filter> - -
- -
- <literal>log</literal> - - <filter type="log"> - <message>B</message> - </filter> - -
- -
- <literal>multi</literal> - - <filter type="multi"/> - -
- -
- <literal>query_rewrite</literal> - - <filter type="query_rewrite"> - <xslt>pqf2pqf.xsl</xslt> - </filter> - -
-
- <literal>session_shared</literal> - - <filter type="session_shared"> - ### Not yet defined - </filter> - -
-
- <literal>template</literal> - - <filter type="template"/> - -
+ + Virtual databases and multi-database searching -
- <literal>virt_db</literal> - - <filter type="virt_db"> - <virtual> - <database>loc</database> - <target>z3950.loc.gov:7090/voyager</target> - </virtual> - <virtual> - <database>idgils</database> - <target>indexdata.dk/gils</target> - </virtual> - </filter> - -
-
- <literal>z3950_client</literal> - - <filter type="z3950_client"> - <timeout>30</timeout> - </filter> - -
+
+ Introductory notes + + Lark's vomit + + This chapter goes into a level of technical detail that is + probably not necessary in order to configure and use Metaproxy. + It is provided only for those who like to know how things work. + You should feel free to skip on to the next section if this one + doesn't seem like fun. + + + + Two of Metaproxy's filters are concerned with multiple-database + operations. Of these, virt_db can work alone + to control the routing of searches to one of a number of servers, + while multi can work with the output of + virt_db to perform multicast searching, merging + the results into a unified result-set. The interaction between + these two filters is necessarily complex: it reflects the real, + irreducible complexity of multicast searching in a protocol such + as Z39.50 that separates initialisation from searching, and in + which the database to be searched is not known at initialisation + time. + + + Hold on tight - this may get a little hairy. + + + In the general course of things, a Z39.50 Init request may carry + with it an otherInfo packet of type VAL_PROXY, + whose value indicates the address of a Z39.50 server to which the + ultimate connection is to be made. (This otherInfo packet is + supported by YAZ-based Z39.50 clients and servers, but has not yet + been ratified by the Maintenance Agency and so is not widely used + in non-Index Data software. We're working on it.) + The VAL_PROXY packet functions + analogously to the absoluteURI-style Request-URI used with the GET + method when a web browser asks a proxy to forward its request: see + the + Request-URI + section of + the HTTP 1.1 specification. + + + The role of the virt_db filter is to rewrite + this otherInfo packet depending on the virtual database that the + client wants to search. 
For example, a virt_db + filter could be set up so that searches in the virtual database + ``lc'' are forwarded to the Library of Congress server, and + searches in the virtual database ``id'' are forwarded to the toy + GILS database that Index Data hosts for testing purposes. A + virt_db configuration to make this switch would + look like this: + + + lc + z3950.loc.gov:7090/Voyager + + + id + indexdata.dk/gils + + ]]> + When Metaproxy receives a Z39.50 Init request from a client, it + doesn't immediately forward that request to the back-end server. + Why not? Because it doesn't know which + back-end server to forward it to until the client sends a search + request that specifies the database that it wants to search in. + Instead, it just treasures the Init request up in its heart; and, + later, the first time the client does a search on one of the + specified virtual databases, a connection is forged to the + appropriate server and the Init request is forwarded to it. If, + later in the session, the same client searches in a different + virtual database, then a connection is forged to the server that + hosts it, and the same cached Init request is forwarded there, + too. + + + All of this clever Init-delaying is done by the + frontend_net filter. The + virt_db filter knows nothing about it; in + fact, because the Init request that is received from the client + doesn't get forwarded until a Search request is received, the + virt_db filter (and the + z3950_client filter behind it) doesn't even get + invoked at Init time. The only thing that a + virt_db filter ever does is rewrite the + VAL_PROXY otherInfo in the requests that pass + through it. +
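As a sketch of the virt_db mapping described in this section: the element names below (<virtual>, <database>, <target>) follow the virt_db filter example from the earlier revision of this manual, and the database names and targets are the ones quoted in the prose. Treat it as an illustration of the shape of the configuration rather than a verified file.

```xml
<!-- Sketch: each <virtual> block maps one virtual database name to
     the back-end target that searches in it are routed to. -->
<filter type="virt_db">
  <virtual>
    <!-- Searches in "lc" go to the Library of Congress server. -->
    <database>lc</database>
    <target>z3950.loc.gov:7090/Voyager</target>
  </virtual>
  <virtual>
    <!-- Searches in "id" go to Index Data's test GILS database. -->
    <database>id</database>
    <target>indexdata.dk/gils</target>
  </virtual>
</filter>
```

With a mapping like this in place, the filter only has to rewrite the VAL_PROXY otherInfo to the matching target; opening the connection and replaying the cached Init request are left to the other filters in the route.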
- - Metaproxy invocation - - The material in this chapter includes the man pages material. - - &progref; - - - - Reference guide to Metaproxy filters - - The material in this chapter includes the man pages material. - - &manref; - - Writing extensions for Metaproxy - ### + ### To be written + + + Classes in the Metaproxy source code @@ -750,7 +911,18 @@ Introductory notes Stop! Do not read this! - You won't enjoy it at all. + You won't enjoy it at all. You should just skip ahead to + the reference guide, + which tells + + you things you really need to know, like the fact that the + fabulously beautiful planet Bethselamin is now so worried about + the cumulative erosion by ten billion visiting tourists a year + that any net imbalance between the amount you eat and the amount + you excrete whilst on the planet is surgically removed from your + bodyweight when you leave: so every time you go to the lavatory it + is vitally important to get a receipt. This chapter contains documentation of the Metaproxy source code, and is @@ -773,7 +945,7 @@
-
+
Individual classes The classes making up the Metaproxy application are here listed by @@ -806,7 +978,7 @@ structures, which are listed in its constructor. Merely instantiating this class registers all the static classes. It is for the benefit of this class that struct - yp2_filter_struct exists, and that all the filter + metaproxy_1_filter_struct exists, and that all the filter classes provide a static object of that type.
@@ -900,7 +1072,7 @@ <literal>mp::RouterChain</literal> (<filename>router_chain.cpp</filename>) - ### + ### to be written
@@ -908,7 +1080,7 @@ <literal>mp::RouterFleXML</literal> (<filename>router_flexml.cpp</filename>) - ### + ### to be written
@@ -916,7 +1088,7 @@ <literal>mp::Session</literal> (<filename>session.cpp</filename>) - ### + ### to be written
@@ -924,7 +1096,7 @@ <literal>mp::ThreadPoolSocketObserver</literal> (<filename>thread_pool_observer.cpp</filename>) - ### + ### to be written
@@ -950,7 +1122,7 @@ -
+
Other Source Files In addition to the Metaproxy source files that define the classes @@ -962,7 +1134,7 @@ metaproxy_prog.cpp - The main function of the yp2 program. + The main function of the metaproxy program. @@ -990,35 +1162,35 @@ plainfile.cpp, tstdl.cpp. - - - - - -- - - - - - - - -
- + + + + Reference guide + + The material in this chapter is drawn directly from the individual + manual entries. In particular, the Metaproxy invocation section is + available using man metaproxy, and the section + on each individual filter is available using the name of the filter + as the argument to the man command. + + + +
+ Metaproxy invocation + &progref; +
+ + +
+ Reference guide to Metaproxy filters + &manref; +
+
+ \ No newline at end of file