X-Git-Url: http://jsfdemo.indexdata.com/?a=blobdiff_plain;f=doc%2Fbook.xml;h=1db13c54aec60a18fad6ec00f5298eefa23e403e;hb=284f44ae4942dc7e6eae6c696674c2738da8b2a4;hp=ad11c218fc3f17929b86e71fdd890f8019ac3533;hpb=4b5158c40900ee67ec56fc6933fb5cd2dcca72b9;p=metaproxy-moved-to-github.git diff --git a/doc/book.xml b/doc/book.xml index ad11c21..1db13c5 100644 --- a/doc/book.xml +++ b/doc/book.xml @@ -1,4 +1,4 @@ - + Metaproxy - User's Guide and Reference @@ -9,14 +9,15 @@ 2006 - Index Data + Index Data ApS - Metaproxy is a universal Z39.50/SRU router, proxy and encapsulated - metasearcher. It accepts, processes, interprets and redirects - requests from IR clients using standard protocols such as - ANSI/NISO Z39.50, SRU and SRW, as well as functioning as a limited + Metaproxy is a universal router, proxy and encapsulated + metasearcher for information retrieval protocols. It accepts, + processes, interprets and redirects requests from IR clients using + standard protocols such as ANSI/NISO Z39.50 (and in the future SRU + and SRW), as well as functioning as a limited HTTP server. Metaproxy is configured by an XML file which specifies how the software should function in terms of routes that the request packets can take through the proxy, each step on a @@ -40,10 +41,11 @@ - Metaproxy + Metaproxy is a standalone program that acts as a universal router, proxy and encapsulated metasearcher for information retrieval protocols such - as Z39.50 and SRU/SRW. To clients, it acts as a server of these + as Z39.50, and in the future SRU and SRW. To clients, it acts as a + server of these protocols: it can be searched, records can be retrieved from it, etc. To servers, it acts as a client: it searches in them, retrieves records from them, etc. it satisfies its clients' @@ -51,7 +53,7 @@ on to zero or more servers, merging the results, transforming them, and delivering them back to the client. 
In addition, it acts as a simple HTTP server; support for further protocols can be - added in a module fashion, through the creation of new filters. + added in a modular fashion, through the creation of new filters. Anything goes in! @@ -62,7 +64,7 @@ Metaproxy is a more capable alternative to - YAZ Proxy, + YAZ Proxy, being more powerful, flexible, configurable and extensible. Among its many advantages over the older, more pedestrian work are support for multiplexing (encapsulated metasearching), routing by @@ -71,6 +73,20 @@ facilitites the creation of pluggable modules implementing further functionality. + + This manual will briefly describe Metaproxy's licensing situation + before giving an overview of its architecture, then discussing the + key concept of a filter in some depth and giving an overview of + the various filter types, then discussing the configuration file + format. After this come several optional chapters which may be + freely skipped: a detailed discussion of virtual databases and + multi-database searching, some notes on writing extensions + (additional filter types) and a high-level description of the + source code. Finally comes the reference guide, which contains + instructions for invoking the metaproxy + program, and detailed information on each type of filter, + including examples. + @@ -203,7 +219,7 @@ complex data type, namely the ``package''. - A package represents a Z39.50 or SRW/U request (whether for Init, + A package represents a Z39.50 or SRU/W request (whether for Init, Search, Scan, etc.) together with information about where it came from. Packages are created by front-end filters such as frontend_net (see below), which reads them from @@ -246,14 +262,15 @@ -
+
Overview of filter types We now briefly consider each of the types of filter supported by the core Metaproxy binary. This overview is intended to give a flavour of the available functionality; more detailed information - about each type of filter is included below in the Module - Reference. + about each type of filter is included below in + the reference guide to Metaproxy filters. The filters are here named by the string that is used as the @@ -302,7 +319,7 @@ <literal>frontend_net</literal> (mp::filter::FrontendNet) - A source that accepts Z39.50 and SRW connections from a port + A source that accepts Z39.50 connections from a port specified in the configuration, reads protocol units, and feeds them into the next filter in the route. When the result is revceived, it is returned to the original origin. @@ -416,7 +433,7 @@
-
+
Future directions Some other filters that do not yet exist, but which would be @@ -435,19 +452,19 @@ - srw2z3950 (filter) + frontend_sru (source) - Translate SRW requests into Z39.50 requests. + Receive SRU (and perhaps SRW) requests. - srw_client (sink) + sru2z3950 (filter) - SRW searching and retrieval. - + Translate SRU requests into Z39.50 requests. + @@ -459,6 +476,14 @@ + srw_client (sink) + + + SRW searching and retrieval. + + + + opensearch_client (sink) @@ -472,32 +497,6 @@ - - Virtual databases and multi-database searching - - -
- Introductory notes - - Two of Metaproxy's filters are concerned with multiple-database - operations. Of these, virt_db can work alone - to control the routing of searches to one of a number of servers, - while multi can work with the output of - virt_db to perform multicast searching, merging - the results into a unified result-set. The interaction between - these two filters is necessarily complex, reflecting the real - complexity of multicast searching in a protocol such as Z39.50 - that separates initialisation from searching, with the database to - search known only during the latter operation. - - - ### Much, much more to say! - -
-
- - - Configuration: the Metaproxy configuration file format @@ -509,7 +508,9 @@ its configuration file can be thought of as a program for that interpreter. Configuration is by means of a single file, the name of which is supplied as the sole command-line argument to the - yp2 program. + metaproxy program. (See + the reference guide + below for more information on invoking Metaproxy.)
The configuration files are written in XML. (But that's just an @@ -534,7 +535,7 @@
-
+
Overview of XML structure All elements and attributes are in the namespace @@ -555,15 +556,19 @@ The <start> element is empty, but carries a route attribute, whose value is the name of - route at which to start running - analogouse to the name of the + route at which to start running - analogous to the name of the start production in a formal grammar. If present, <filters> contains zero or more <filter> - elements; filters carry a type attribute and - contain various elements that provide suitable configuration for - filters of that type. The filter-specific elements are described - below. Filters defined in this part of the file must carry an + elements. Each filter carries a type attribute + which specifies what kind of filter is being defined + (frontend_net, log, etc.) + and contain various elements that provide suitable configuration + for a filter of its type. The filter-specific elements are + described in + the reference guide below. + Filters defined in this part of the file must carry an id attribute so that they can be referenced from elsewhere. @@ -579,151 +584,183 @@ <filters> section. Alternatively, a route within a filter may omit the refid attribute, but contain configuration elements similar to those used for filters defined - in the <filters> section. + in the <filters> section. (In other words, each filter in a + route may be included either by reference or by physical + inclusion.)
-
- Filter configuration +
+ An example configuration - All <filter> elements have in common that they must carry a - type attribute whose value is one of the - supported ones, listed in the schema file and discussed below. In - additional, <filters>s occurring the <filters> section - must have an id attribute, and those occurring - within a route must have either a refid - attribute referencing a previously defined filter or contain its - own configuration information. + The following is a small, but complete, Metaproxy configuration + file (included in the distribution as + metaproxy/etc/config0.xml). + This file defines a very simple configuration that simply proxies + to whatever backend server the client requests, but logs each + request and response. This can be useful for debugging complex + client-server dialogues. + + + + + + @:9000 + + + + + + + + + + + + +]]> - In general, each filter recognises different configuration - elements within its element, as each filter has different - functionality. These are as follows: + It works by defining a single route, called + start, which consists of a sequence of three + filters. The first and last of these are included by reference: + their <filter> elements have + refid attributes that refer to filters defined + within the prior <filters> section. The + middle filter is included inline in the route. + + The three filters in the route are as follows: first, a + frontend_net filter accepts Z39.50 requests + from any host on port 9000; then these requests are passed through + a log filter that emits a message for each + request; they are then fed into a z3950_client + filter, which forwards the requests to the client-specified + backend Z39.50 server. When the response arrives, it is handed + back to the log filter, which emits another + message; and then to the front-end filter, which returns the + response to the client. +
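The markup of the example configuration did not survive in this rendering of the diff (only the port value remains), so the following is a minimal sketch of a file that fits the description above, not a copy of config0.xml. The root element name `metaproxy_config`, the filter ids `frontend` and `backend`, and the `routes`/`route` wrapper elements are assumptions made for illustration; the real root element and its namespace are defined by the schema and are omitted here because they cannot be recovered from this text.

```xml
<?xml version="1.0"?>
<!-- Hypothetical root element: the real name and namespace come from the schema -->
<metaproxy_config>
  <!-- Name of the route at which processing starts -->
  <start route="start"/>
  <filters>
    <!-- Filters defined here carry an id so that routes can refer to them -->
    <filter id="frontend" type="frontend_net">
      <port>@:9000</port>
    </filter>
    <filter id="backend" type="z3950_client"/>
  </filters>
  <!-- Assumed wrapper for the route definitions -->
  <routes>
    <route id="start">
      <filter refid="frontend"/>  <!-- included by reference -->
      <filter type="log"/>        <!-- included inline -->
      <filter refid="backend"/>   <!-- included by reference -->
    </route>
  </routes>
</metaproxy_config>
```

Read top to bottom, the route mirrors the path of a request: accepted by `frontend_net`, logged, then forwarded by `z3950_client`; the response travels back through the same filters in reverse order.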
+ -
- <literal>auth_simple</literal> - - <filter type="auth_simple"> - <userRegister>../etc/example.simple-auth</userRegister> - </filter> - -
- -
- <literal>backend_test</literal> - - <filter type="backend_test"/> - -
- -
- <literal>frontend_net</literal> - - <filter type="frontend_net"> - <threads>10</threads> - <port>@:9000</port> - </filter> - -
- -
- <literal>http_file</literal> - - <filter type="http_file"> - <mimetypes>/etc/mime.types</mimetypes> - <area> - <documentroot>.</documentroot> - <prefix>/etc</prefix> - </area> - </filter> - -
- -
- <literal>log</literal> - - <filter type="log"> - <message>B</message> - </filter> - -
- -
- <literal>multi</literal> - - <filter type="multi"/> - -
- -
- <literal>query_rewrite</literal> - - <filter type="query_rewrite"> - <xslt>pqf2pqf.xsl</xslt> - </filter> - -
-
- <literal>session_shared</literal> - - <filter type="session_shared"> - ### Not yet defined - </filter> - -
-
- <literal>template</literal> - - <filter type="template"/> - -
+ + Virtual databases and multi-database searching -
- <literal>virt_db</literal> - - <filter type="virt_db"> - <virtual> - <database>loc</database> - <target>z3950.loc.gov:7090/voyager</target> - </virtual> - <virtual> - <database>idgils</database> - <target>indexdata.dk/gils</target> - </virtual> - </filter> - -
-
- <literal>z3950_client</literal> - - <filter type="z3950_client"> - <timeout>30</timeout> - </filter> - -
+
+ Introductory notes + + Lark's vomit + + This chapter goes into a level of technical detail that is + probably not necessary in order to configure and use Metaproxy. + It is provided only for those who like to know how things work. + You should feel free to skip on to the next section if this one + doesn't seem like fun. + + + + Two of Metaproxy's filters are concerned with multiple-database + operations. Of these, virt_db can work alone + to control the routing of searches to one of a number of servers, + while multi can work with the output of + virt_db to perform multicast searching, merging + the results into a unified result-set. The interaction between + these two filters is necessarily complex: it reflects the real, + irreducible complexity of multicast searching in a protocol such + as Z39.50 that separates initialisation from searching, and in + which the database to be searched is not known at initialisation + time. + + + Hold on tight - this may get a little hairy. + + + In the general course of things, a Z39.50 Init request may carry + with it an otherInfo packet of type VAL_PROXY, + whose value indicates the address of a Z39.50 server to which the + ultimate connection is to be made. (This otherInfo packet is + supported by YAZ-based Z39.50 clients and servers, but has not yet + been ratified by the Maintenance Agency and so is not widely used + in non-Index Data software. We're working on it.) + The VAL_PROXY packet functions + analogously to the absoluteURI-style Request-URI used with the GET + method when a web browser asks a proxy to forward its request: see + the + Request-URI + section of + the HTTP 1.1 specification. + + + The role of the virt_db filter is to rewrite + this otherInfo packet depending on the virtual database that the + client wants to search.
For example, a virt_db + filter could be set up so that searches in the virtual database + ``lc'' are forwarded to the Library of Congress server, and + searches in the virtual database ``id'' are forwarded to the toy + GILS database that Index Data hosts for testing purposes. A + virt_db configuration to make this switch would + look like this: +
+ <filter type="virt_db">
+   <virtual>
+     <database>lc</database>
+     <target>z3950.loc.gov:7090/Voyager</target>
+   </virtual>
+   <virtual>
+     <database>id</database>
+     <target>indexdata.dk/gils</target>
+   </virtual>
+ </filter>
+ + When Metaproxy receives a Z39.50 Init request from a client, it + doesn't immediately forward that request to the back-end server. + Why not? Because it doesn't know which + back-end server to forward it to until the client sends a search + request that specifies the database that it wants to search in. + Instead, it just treasures the Init request up in its heart; and, + later, the first time the client does a search on one of the + specified virtual databases, a connection is forged to the + appropriate server and the Init request is forwarded to it. If, + later in the session, the same client searches in a different + virtual database, then a connection is forged to the server that + hosts it, and the same cached Init request is forwarded there, + too. + + + All of this clever Init-delaying is done by the + frontend_net filter. The + virt_db filter knows nothing about it; in + fact, because the Init request that is received from the client + doesn't get forwarded until a Search request is received, the + virt_db filter (and the + z3950_client filter behind it) doesn't even get + invoked at Init time. The only thing that a + virt_db filter ever does is rewrite the + VAL_PROXY otherInfo in the requests that pass + through it. +
- - Module Reference - - The material in this chapter includes the man pages material - - &manref; - - Writing extensions for Metaproxy - ### + ### To be written + + + Classes in the Metaproxy source code @@ -732,7 +769,18 @@ Introductory notes Stop! Do not read this! - You won't enjoy it at all. + You won't enjoy it at all. You should just skip ahead to + the reference guide, + which tells + + you things you really need to know, like the fact that the + fabulously beautiful planet Bethselamin is now so worried about + the cumulative erosion by ten billion visiting tourists a year + that any net imbalance between the amount you eat and the amount + you excrete whilst on the planet is surgically removed from your + bodyweight when you leave: so every time you go to the lavatory it + is vitally important to get a receipt. This chapter contains documentation of the Metaproxy source code, and is @@ -755,7 +803,7 @@
-
+
Individual classes The classes making up the Metaproxy application are here listed by @@ -788,7 +836,7 @@ structures, which are listed in its constructor. Merely instantiating this class registers all the static classes. It is for the benefit of this class that struct - yp2_filter_struct exists, and that all the filter + metaproxy_1_filter_struct exists, and that all the filter classes provide a static object of that type.
@@ -882,7 +930,7 @@ <literal>mp::RouterChain</literal> (<filename>router_chain.cpp</filename>) - ### + ### to be written
@@ -890,7 +938,7 @@ <literal>mp::RouterFleXML</literal> (<filename>router_flexml.cpp</filename>) - ### + ### to be written
@@ -898,7 +946,7 @@ <literal>mp::Session</literal> (<filename>session.cpp</filename>) - ### + ### to be written
@@ -906,7 +954,7 @@ <literal>mp::ThreadPoolSocketObserver</literal> (<filename>thread_pool_observer.cpp</filename>) - ### + ### to be written
@@ -932,7 +980,7 @@ -
+
Other Source Files In addition to the Metaproxy source files that define the classes @@ -944,7 +992,7 @@ metaproxy_prog.cpp - The main function of the yp2 program. + The main function of the metaproxy program. @@ -972,23 +1020,36 @@ plainfile.cpp, tstdl.cpp. - - - - - -- - - - - - - - -
+ + + + Reference guide + + The material in this chapter is drawn directly from the individual + manual entries. In particular, the Metaproxy invocation section is + available using man metaproxy, and the section + on each individual filter is available using the name of the filter + as the argument to the man command. + + + +
+ Metaproxy invocation + &progref; +
+ + +
+ Reference guide to Metaproxy filters + &manref; +
+
+ + +