1 <!doctype linuxdoc system>
4 $Id: egate.sgml,v 1.7 1995/07/20 08:14:47 adam Exp $
8 <title>Email/Z39.50 gateway guide
9 <author>Europagate, 1995
10 <date>$Revision: 1.7 $
This document describes an email server that provides access to the
21 This document describes an email server subsystem developed
22 within the EUROPAGATE project. The first part of this document
serves as an administrator's guide, while the second part is a
follow-up on the Design deliverable (WP4.1) that outlines the
25 deviations from the design. Also, the second part contains
26 a quick overview of the source code.
31 An ANSI C compiler is required in order to compile the ES software.
The ES can use either CNIDR's Zdist package or the YAZ package from
Index Data to interface with the Z39.50 protocol, so you need to obtain
37 The Zdist package can be found in:
39 <url url="ftp://ftp.cnidr.org/pub/NIDR.tools/zdist/zdist102b1-1.tar.Z" >
41 The Zdist package doesn't support result-set references. Also, it has a few
42 bugs. Therefore we've included a patch <tt/zdist.patch/ which fixes
44 Run patch in the directory above <tt/zdist102b1-1/:
49 The ES server only depends on <tt>libz3950.a</tt> so you only need
50 to build the Zdist software in the directory <tt/libz3950/.
54 <url url="ftp://ftp.algonet.se/pub/index/yaz/">.
The ES also uses GNU's regex package to parse regular expressions.
57 The ES has been tested with regex-0.12. Some systems, such as Linux,
58 come with the regex package preinstalled.
60 Unpack <tt>egate.tar.gz</tt> and edit the top level <tt/Makefile/. Specify
61 where the GNU regex package is located and specify whether you use
YAZ or Zdist. On some systems, you may have to set the <tt/NETLIB/ as
65 The shell variables <tt/CC/ and <tt/CFLAGS/ are used by the
66 <tt/Makefile/ so you may modify these before compiling.
73 If the compilation succeeds, you should install the software.
74 Edit the <tt/Makefile/ and set the LIBDIR to the installation
directory. Since the ES is executed by the mail system, and not by a
user, this directory shouldn't be globally executable.
78 When satisfied, type <tt/make install/.
80 Three executables are installed in LIBDIR:
82 <tag/eti/ The email transport interface. This program receives
83 incoming mail, identifies the user, and delivers the mail request
84 to the monitor or kernel (depending on configuration).
<tag/monitor/ The monitor
is an optional component. The main objective
of the monitor is to limit the number of simultaneously running kernel
<tag/kernel/ The kernel process is the core of the ES. It parses
the user's requests and interfaces with the Z39.50 protocol.
93 The <tt/sendmail/ or a similar program delivers the mail to the
94 <tt/eti/ program. The <tt/sendmail/ program usually runs as user
95 <tt/mail/ or some other special user name. We strongly suggest that
96 you create a special user and group for the ES software. In this case
you should use <tt/chmod/ to set the 'set user ID on execution'
98 bits on the executable files and give that user read/write/execute
99 permissions in LIBDIR.
101 The mail system needs to know about the ES. Pick some name that serves
102 as the ES user and edit <tt/aliases/ used by your mail system (usually
<tt>/usr/lib/aliases</tt>). Now add the following line:
105 <tt>es:"|/usr/local/lib/es/eti </tt><em>options</em><tt>"</tt>
107 In this example the mail user name is <tt/es/ and the LIBDIR is
108 <tt>/usr/local/lib/es</tt>.
110 The ES system can operate with or without the monitor. When using
the monitor, the number of simultaneously running kernels can be
112 controlled. If the <tt>eti</tt> program is started with
113 two dashes (<tt>--</tt>) it will operate without the monitor and
114 the options specified after the two dashes are transferred to the
117 <sect1>Running with the monitor
120 The monitor must be running at all times in this mode. You should
121 start the monitor in one of your boot scripts (rc). For example this
122 might be put in a boot script:
125 (cd /usr/local/lib/es; ./monitor -d -l mon.log -- -d -l kernel.log &)
128 Here the monitor is started with the options <tt>-d -l mon.log</tt>
129 and the options after the two dashes are transferred to the
130 kernel. In this mode, the eti should contact the monitor (and not
131 the kernel), so the following might be put in the aliases file:
134 es:"|/usr/local/lib/es/eti -c /usr/local/lib/es"
137 The eti sets current directory to the path specified by option <tt>-c</tt>.
139 <sect1>Running without the monitor
142 In this mode you should never start the monitor.
143 The eti will contact the kernel directly. The following line could
144 be put in your aliases file:
147 es:"|/usr/local/lib/es/eti -c /usr/local/lib/es -- -d -l kernel.log"
153 The eti program accepts the following options:
155 <tag><tt>-l </tt>log</tag> The log file. If absent stderr is used.
156 <tag><tt>-d</tt></tag> Turns on debugging.
157 <tag><tt>-c </tt>dir</tag> Sets current directory to dir.
158 <tag><tt>-H</tt></tag> Help message.
159 <tag><tt>--</tt></tag> Indicates that the eti program should contact the
kernel (and not the monitor). All options after this one are transferred
167 The monitor program accepts the following command line options:
169 <tag><tt>-l </tt>log</tag> The log file. If absent stderr is used.
170 <tag><tt>-d</tt></tag> Turns on debugging.
171 <tag><tt>-H</tt></tag> Help message.
172 <tag><tt>--</tt></tag> Precedes options that are transferred to the kernel
175 The monitor normally reads the resource <tt>default.res</tt> in
176 current directory. You can change this behaviour by specifying an
177 alternate file on the command line.
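
The <tt/--/ convention used by <tt/eti/ and <tt/monitor/ can be
illustrated with a short sketch. This is not the ES source code (which
is written in C); it is a minimal Python illustration, and the function
name <tt/split_at_double_dash/ is hypothetical. It assumes that
everything before the first <tt/--/ belongs to the program itself and
everything after it is passed on to the kernel unchanged.

```python
def split_at_double_dash(argv):
    """Split an argument list at the first "--".

    Returns (own_options, forwarded_options, saw_separator).
    This helper is hypothetical and only illustrates the convention.
    """
    if "--" in argv:
        i = argv.index("--")
        return argv[:i], argv[i + 1:], True
    return list(argv), [], False


# Arguments as in the aliases example for running without the monitor.
own, forwarded, direct = split_at_double_dash(
    ["-c", "/usr/local/lib/es", "--", "-d", "-l", "kernel.log"]
)
print(own)        # ['-c', '/usr/local/lib/es']
print(forwarded)  # ['-d', '-l', 'kernel.log']
print(direct)     # True
```

For <tt/eti/, the presence of the separator also selects the mode of
operation: with it, the kernel is contacted directly; without it, the
monitor is contacted.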
182 List of options observed by the kernel:
184 <tag><tt>-d</tt></tag> Turns on debugging.
185 <tag><tt>-t </tt>target</tag> Opens connection to target (for testing only).
186 <tag><tt>-g </tt>lang</tag> Set language name.
187 <tag><tt>-o </tt>res</tag> Overriding resource file name. These
188 resources override both <tt>default.res</tt> and all user resources.
189 <tag><tt>-h </tt>host</tag> Override host name (for testing only).
190 <tag><tt>-p </tt>port</tag> Override port no (for testing only).
191 <tag><tt>-l </tt>log</tag> Specify log file.
192 <tag><tt>-H</tt></tag> Help message.
195 The kernel normally reads the resource <tt>default.res</tt> in
196 current directory. You can change this behaviour by specifying an
197 alternate file on the command line.
199 <sect>Managing the system
201 <sect1>Summary of files
204 To maintain the ES you need to know the files it uses. These are:
206 <tag>*.res</tag> Resource files with several settings that control
207 how the system operates, such as definition of targets, messages, etc.
208 <tag>*.bib</tag> Bib-1 attribute mapping files. These files describe
209 the mapping between CCL and the RPN query.
210 <tag>user.db</tag> Database of users. Only the eti process accesses
212 <tag>user.*.r</tag> Resource file for a user — accessed by the kernel
213 — only created when the user uses the <tt>def</tt> command.
214 <tag>user.*.p</tag> Persistency file for a user — accessed by
219 The ES system is mostly managed by resource files. The following
are example resource files that come with the ES:
222 <tag><tt>default.res</tt></tag> General resource with reasonable defaults.
223 This file is read by the monitor and the kernel.
224 <tag><tt>loc.res</tt></tag> Resource file for Library of Congress test
226 <tag><tt>drewdb.res</tt></tag> Resource file for Data Research's test
<tag><tt>lang.uk.res</tt></tag> Resource file for English conversation.
<tag><tt>lang.dk.res</tt></tag> Resource file for Danish conversation.
235 Most general resources should be set in the file <tt>default.res</tt>.
236 Some of the resources may be changed (overridden) by the user, while
others may be overridden by individual target definitions.
238 The complete scenario is depicted below:
|<---------| "target.res" |
|<---------| user.x.res   |
|<---------| "lang.res"   |
|<---------| "override"   |
261 The following describes the general resources:
263 <tag>gw.reply.mta</tag> Name of MTA program — default
264 <tt>/usr/lib/sendmail</tt>.
265 <tag>gw.reply.tmp.prefix</tag> Prefix of temporary files used by the ES.
266 <tag>gw.reply.tmp.dir</tag> Name of directory with temporary files.
267 <tag>gw.marc.log</tag> If this resource is specified, retrieved MARC
268 records will be appended to this file.
269 <tag>gw.timeout</tag> Idle time before the kernel exits. When the
270 kernel exits, the Z39.50 persistency layer will reconnect when
272 <tag>gw.resultset</tag> If this setting is 1, the Z39.50 client will
273 use named result sets. If 0, the Z39.50 system will always use
274 <tt/Default/ as result-set name.
<tag>gw.persist</tag> If this setting is 1, the persistency is enabled;
<tag>gw.max.process</tag> This setting is the maximum number of
278 simultaneous kernel processes — only used by the monitor.
<tag>gw.ignore.which</tag> Some targets don't indicate whether
a record is a diagnostic message or a database record. If this
281 setting is 1, the ES will always try to interpret the record as a
282 database record in ISO2709 format. If 0, the ES will use the
284 <tag>gw.default.show</tag> Default number of records to retrieve and display
285 when using the show command. This setting may be changed by the user
286 with the <tt>def defaultshow</tt> command.
287 <tag>gw.max.show</tag> This setting specifies the maximum number of
288 records the user may retrieve in one show command — default 100.
289 <tag>gw.autoshow</tag> Number of records to retrieve in a find
290 command — default 0. This setting may be changed by the user by
291 the <tt>def autoshow</tt> command.
292 <tag>gw.display.format</tag> Default display format. This setting may
293 be changed by the user by the <tt>def f</tt> command.
294 <tag>gw.language</tag> Current language. This setting may be
295 changed by the user with the <tt>def lang</tt> command. When the
language is set to something, say x, then the resource gw.lang.x
297 should hold a name of a resource file read by the kernel.
298 <tag>gw.lang.<em/x/</tag> Specifies name of resource file for
300 <tag>gw.target.<em/name/ </tag> Name of resource file of target
302 <tag>gw.portno</tag> Z39.50 target port number — default 210.
303 <tag>gw.hostname</tag> Z39.50 target host name.
304 <tag>gw.bibset</tag> Name of file with Bib-1 attribute mapping.
305 <tag>gw.databases</tag> Available databases on target.
306 <tag>gw.description</tag> Description of a target. This message
307 is returned to the user when the connection is established with the
309 <tag>gw.account</tag> Z39.50 Authentication string — default
316 There are several resource settings that deal with language
317 dependencies. These fall into the following categories that
318 depend on the resource name prefixes:
320 <tag>gw.msg</tag> Miscellaneous messages.
321 <tag>gw.err</tag> Error messages.
322 <tag>gw.bib1.diag.<em/no/</tag> Diagnostic error message indicated by
324 <tag>gw.help</tag> Help/description of various commands.
325 <tag>ccl.command</tag> CCL command names.
<tag>ccl.token</tag> CCL token names.
329 Refer to the sample files, <tt>default.res</tt>, <tt>lang.uk.res</tt>
330 and <tt>lang.dk.res</tt> for all available settings.
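
The way later resource files override earlier ones can be sketched as a
chain of lookups. The following Python fragment is only an illustration:
it assumes the precedence order <tt>default.res</tt>, target resource
file, user resource file, language resource file and finally the
overriding resource file (option <tt/-o/), and it uses plain
dictionaries instead of the ES resource file format. All values shown,
the name <tt/gw.msg.hello/ and the host name are hypothetical.

```python
from collections import ChainMap

# Hypothetical contents of each resource layer (not real ES defaults).
default_res = {"gw.portno": "210", "gw.default.show": "20", "gw.language": "uk"}
target_res = {"gw.hostname": "z3950.example.org", "gw.portno": "2100"}
user_res = {"gw.default.show": "5"}
lang_res = {"gw.msg.hello": "Welcome"}
override_res = {"gw.hostname": "localhost"}

# ChainMap searches its mappings from left to right, so the layer with
# the highest assumed precedence is listed first.
resources = ChainMap(override_res, lang_res, user_res, target_res, default_res)

print(resources["gw.hostname"])      # localhost (override wins over target)
print(resources["gw.portno"])        # 2100 (target wins over default)
print(resources["gw.default.show"])  # 5 (user wins over default)
print(resources["gw.language"])      # uk (only set in the defaults)
```

A lookup falls through to <tt>default.res</tt> only when no later layer
defines the resource, which is why that file is the place for general
settings.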
332 <sect1>Target definitions
335 To add a target definition called <em/mytarget/ you need to make a resource
336 entry in <tt>default.res</tt> called <tt>gw.target.</tt><em>mytarget</em>.
337 The value of this resource is the name of a resource file — for
338 example <em>mytarget</em><tt>.res</tt>. The resource file should at least
339 define the resources: <tt/gw.hostname/, <tt/gw.databases/ and
340 <tt/gw.description/. You might also consider specifying
341 <tt/gw.account/, <tt/gw.bibset/, <tt/gw.resultset/ and <tt/gw.portno/
342 in the target resource file. The user only needs to use the command
343 <tt>target </tt><em>mytarget</em> to use the target. Also, since we
344 already specified database names, the user doesn't need to use the
347 <sect1>CCL to RPN mapping
The mappings between CCL queries and RPN are stored in files,
normally with the suffix <tt>.bib</tt>. We will refer to these
files as bibset-files. You might consult the file <tt/default.bib/
to see an example of such a file.
The mapping is necessary because targets usually only support a small
356 subset of the Bib-1 attribute set and because the CCL qualifiers
357 (field names) are not standardized. A bibset-file is specified
358 by the <tt/gw.bibset/ resource.
Column zero of a bib-file line either holds a hash character (<tt/#/)
361 indicating a comment in which case the rest of the line is
362 ignored; or a CCL qualifier.
364 The name of the CCL qualifier is up to you. However, the special
365 qualifier name <tt/term/ applies to the case where no qualifier
366 is specified in CCL. The CCL qualifier is
367 followed by one or more mapping specifications. A mapping
368 specification takes the form:
370 <em/type/<tt/=/<em/value/<tt/,/<em/value/...
372 The type is simply one of the six Bib-1 attribute query types:
374 <tag/u/ Use attribute. Value is an integer.
375 <tag/t/ Truncation attribute. Value is an integer; or the
376 value is a combination of:
378 <tag/l/ This character indicates that the CCL parser should allow
379 left truncation (2) if indicated by a <tt/?/ on the left side
381 <tag/r/ This character indicates that the CCL parser should allow
382 right truncation (1) if indicated by a <tt/?/ on the right side
384 <tag/b/ This character indicates that the CCL parser should allow
385 both left and right (3) truncation indicated by a <tt/?/ on both
386 left and right side of a term.
387 <tag/n/ This character indicates that the CCL parser should announce
388 no truncation (100) if no truncation was specified.
<tag/p/ Position attribute. Value is an integer.
391 <tag/s/ Structure attribute. Value is an integer; or the
392 value is <tt/pw/ in which case the CCL parser announces word (2) or
393 phrase (1) depending on the number of adjacent terms.
394 <tag/r/ Relation attribute. Value is an integer; or the value is
395 <tt/o/ in which case, the CCL parser will select <em/less than/,
396 <em/less than or equal/, ... <em/greater than/ — depending on
397 the relation specified in CCL.
<tag/c/ Completeness attribute. Value is an integer.
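
The line format described above can be illustrated with a small parser.
This is a Python sketch and not the code in <tt>cclqfile.c</tt>; the
function name <tt/parse_bibset_line/ and the sample lines are
hypothetical. It assumes that the qualifier and the mapping
specifications are separated by white space and that several values for
one type are separated by commas.

```python
def parse_bibset_line(line):
    """Parse one bibset line.

    Returns (qualifier, {type: [values]}), or None for a comment or
    blank line. Hypothetical helper, for illustration only.
    """
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.split()
    qualifier, specs = fields[0], fields[1:]
    mapping = {}
    for spec in specs:
        # Each specification has the form type=value,value...
        attr_type, _, values = spec.partition("=")
        mapping[attr_type] = values.split(",")
    return qualifier, mapping


# Hypothetical sample lines; they are not taken from default.bib.
print(parse_bibset_line("# a comment"))
# None
print(parse_bibset_line("term t=l,r s=pw"))
# ('term', {'t': ['l', 'r'], 's': ['pw']})
print(parse_bibset_line("au u=1003 s=pw"))
# ('au', {'u': ['1003'], 's': ['pw']})
```

The values are kept as strings here because a value may be either an
integer or one of the special letter codes.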
401 Consider these bibset-lines:
The first line describes the mapping used when no qualifiers are
412 In this case the right truncation is enabled and the structure is
415 The second line is used in this search:
419 where the use attribute is <em/author/ and the structure is <em/word/.
421 The third line is used in:
425 where the use attribute is <em/date/ and the relation is <em/greater than/.
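
How the special truncation and structure values might be resolved for a
search term can be sketched as follows. The attribute values (1, 2, 3
and 100 for truncation; 1 and 2 for structure) are those listed above.
Everything else is an assumption made for illustration: the function
names are hypothetical, and raising an error for a truncation that the
mapping does not allow is one possible policy, not necessarily what the
ES parser does.

```python
def truncation_attribute(term, allowed):
    """Return (bare_term, truncation_value) for a CCL term.

    `allowed` is the set of letters from a t= specification,
    for example {"l", "r"}. Hypothetical helper.
    """
    left = term.startswith("?")
    right = term.endswith("?") and len(term) > 1
    bare = term.strip("?")
    if left and right:
        kind, value = "b", 3      # both left and right truncation
    elif left:
        kind, value = "l", 2      # left truncation
    elif right:
        kind, value = "r", 1      # right truncation
    else:
        kind, value = "n", 100    # no truncation
    if kind not in allowed:
        if kind == "n":
            return bare, None     # nothing is announced
        raise ValueError("truncation not allowed for this qualifier")
    return bare, value


def structure_attribute(terms):
    """For s=pw: phrase (1) for adjacent terms, word (2) for one term."""
    return 1 if len(terms) > 1 else 2


print(truncation_attribute("comput?", {"l", "r"}))  # ('comput', 1)
print(truncation_attribute("computer", {"r", "n"}))  # ('computer', 100)
print(truncation_attribute("computer", {"r"}))       # ('computer', None)
print(structure_attribute(["computer"]))             # 2
print(structure_attribute(["computer", "science"]))  # 1
```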
430 The implementation of the email server includes all the modules described
431 in the design deliverable.
433 The work was roughly carried out as follows:
435 <item>The logging facilities and resource management utilities were
436 implemented — virtually all other modules depend on these
438 <item>A minimal ES was implemented — including a high-level
439 API to the Z39.50 sub-system and a CCL parser with a few
440 commands, such as FIND and SHOW. This version displayed MARC
records in a raw format. This version served as the base for the URP.
442 <item>The first version of the MARC display formatting tool, FML,
443 was implemented and included in the ES.
444 <item>The ETI program was implemented along with the IPC
445 (interprocess communication) utilities based on FIFOs. Facilities
to keep connections alive (to Z39.50 targets) were implemented.
To identify a user, a file-resident symbol table (small database) was
implemented which maps an email username to a unique integer (email userid).
449 <item>The protocol persistency was implemented and more CCL commands
451 <item>The monitor program was implemented.
454 The following sections cover the most important modules in the ES and
455 deviations from the design.
457 <sect1>Z39.50 Interface layer
460 The design report specified that the Zdist toolkit from CNIDR would
461 be used in the ES to provide access to the Z39.50 protocol. The package
was chosen because it is easy to use and, more importantly, we felt
that the API would be reasonably stable and supported.
Nevertheless it turned out that CNIDR chose to change the API
466 completely around January 1995 and announced a new version
469 <em>Note: As of this date the newest version of Zdist is still
zdist102b1-1. CNIDR seems to concentrate on their Isite package
471 which also includes a Zdist package presumably similar to the
472 standalone Zdist package</em>
474 During the work with the Zdist package a few bugs were discovered.
475 Fortunately, they could be solved within a few days. We also
476 discovered that the package lacks result-set references.
We posted the bug fixes to Kevin Gamiel, who is responsible for
the package, but we didn't get a response. So, eventually, we weren't
satisfied with the package after all.
481 In February some of us began the development of a new Z39.50 package
482 called YAZ — in retrospect somewhat motivated by the
483 experiences with existing Z39.50/SR toolkits.
485 To support result-set references we chose to incorporate a YAZ
486 interface in the ES also. And we designed and implemented a
487 simple high-level Z39.50 origin API that supported both Zdist and YAZ.
489 The protocol persistency module was implemented on top of
490 the high-level API and not on top of Zdist. The obvious
491 advantage is that the persistency module is not tied to one
492 particular Z39.50/SR package.
494 Persistency information stored for each user is simply:
496 <item>hostname and port number.
497 <item>authentication string
498 <item>selected database(s)
499 <item>next result set number
500 <item>next result set position
501 <item>result set information
504 Information about each result set includes:
507 <item>size (number of hits)
512 A persistency file is removed each time a new target is selected.
It is our experience that the persistency files are very small.
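
The persistency information listed above can be pictured as a small
record per user. The following Python sketch is not the layout used in
<tt>persist.c</tt>; the class names, field names and starting values
are hypothetical, and only the items named in the lists above are
modelled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResultSetInfo:
    set_no: int   # hypothetical identifier for the result set
    size: int     # number of hits


@dataclass
class PersistState:
    hostname: str
    port: int
    auth: str                 # authentication string
    databases: List[str]      # selected database(s)
    next_set_no: int = 1      # next result set number (assumed to start at 1)
    next_position: int = 1    # next result set position (assumed to start at 1)
    result_sets: List[ResultSetInfo] = field(default_factory=list)

    def record_search(self, hits):
        """Remember a new result set and advance the set counter."""
        info = ResultSetInfo(set_no=self.next_set_no, size=hits)
        self.result_sets.append(info)
        self.next_set_no += 1
        return info


state = PersistState("z3950.example.org", 210, "", ["books"])
state.record_search(42)
print(state.next_set_no)          # 2
print(state.result_sets[0].size)  # 42
```

A record of this kind holds only a few short fields per result set,
which is consistent with the observation that the files stay small.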
518 The CCL was implemented as described in the design. A CCL utility
519 was made as a separate module which implements a tokenization
520 package and a parser which translates from FIND to RPN. The
data structure used to represent the RPN query is also used in
the Z39.50 search API on top of YAZ or Zdist.
524 The CCL parser is quite configurable. Token names can be redefined to
525 one or more names (aliases). Also, the specification of mapping
526 between CCL field names (qualifiers) and Bib-1 attributes can be
527 specified in either the C API or a file.
529 Although the Z39.50 system in the ES uses the Bib-1 attribute set, the
530 CCL parser itself is not tied to Bib-1.
535 The FML system is used to handle the presentation of MARC
records. There are some deviations from the design report, however.
537 The most important changes are:
539 <item>The <tt/expr/ function is not implemented. Instead arithmetic
540 operators <tt/plus/, <tt/minus/, <tt/mult/ and <tt/div/ are
541 implemented. Also relational operators <tt/gt/, <tt/lt/ ... are
543 <item>The <tt/lindex/ function is called <tt/index/ and it is a binary
544 operator where the left operand is the list and the right operand is
546 <item>The MARC extraction routines are not implemented.
547 Instead, a MARC record is transferred as an argument
548 to a formatting-routine (in list notation). The formatting
549 routine then extracts fields from the list by list/string
550 manipulation functions.
551 <item>A new statement, <tt/bin/, is implemented to define
552 binary operators (functions).
558 As described in the design, FIFOs are used to communicate between
559 the ETI, monitor and kernel. The ES can run without the monitor,
560 however. The primary reason for the presence of the monitor was
561 to assure that the kernel releases the resources used by the
562 persistency layer. But, since the persistency layer did turn out to
563 use virtually no disk space at all, there was no point in starting
564 a kernel process to remove its files — hence this facility
was not implemented. The only remaining purpose of the monitor is to keep the
number of running kernels below a maximum level, and even that
is probably of little use since most Unix systems will swap kernel processes
571 before a kernel exits and saves its persistency file is not
572 controlled by the monitor. Saving the persistency file and
573 keeping it is usually a good approach — even when a
574 user doesn't reference/show old result-sets since the user
575 has a notion of <em/current target/ and database.
580 In this section a short description of each source module is
581 given. Each module is implemented in a separate sub directory.
582 Any public headers are located in the <tt/include/ directory.
585 <tag/res+log/ is an implementation of the logging system
586 and the resource management sub system. Note that the
587 resource module depends on the logging facility. Logging
588 is implemented in <tt>gw-log.c</tt> and <tt/gw-log.h/. The
file <tt>gw-log-test.c</tt> is a small test program for the
590 logging system. The core of the resource management is implemented
591 in <tt>gw-res.c</tt>. The files <tt>gw-res-bool.c</tt> and
<tt>gw-res-int.c</tt> implement two utility routines
593 on top of the resource management. The header file
594 <tt>gw-resp.h</tt> is a private header file and <tt>gw-res.h</tt>
595 is a public header file.
597 <tag/ccl/ implements CCL to RPN mapping and a tokenization
598 utility for other CCL commands. The mapping function is
599 implemented in <tt>cclfind.c</tt>. Qualifiers are handled in
600 <tt>cclqual.c</tt> while reading of qualifier mappings from a
601 file is implemented in <tt>cclqfile.c</tt>. Scanning is implemented
602 in <tt>ccltoken.c</tt>. String utilities, which might be changed if
other character sets are needed, are implemented in
<tt>cclstr.c</tt>. A table of error messages is implemented in
607 <tag/util/ implements various utilities:
609 <tag>MARC utility</tag> implemented in <tt>iso2709</tt>...
610 <tag>Database utility</tag> implemented in <tt>gw-db.[ch]</tt>. This
611 utility is used to map a user (email) to an integer.
612 <tag>String queue utility</tag> implemented in <tt>strqueue.[ch]</tt>. This
utility is used to queue incoming mail in the ETI, kernel and
615 <tag>Pretty printer</tag> implemented in <tt>ttyemit.[ch]</tt>
616 — used by the URP.
<tag>FIFO IPC utility</tag> implemented in <tt>gip*.[ch]</tt>,
618 used by the ETI, kernel and monitor.
621 <tag/fml/ implements FML. The top level functions are implemented
622 in <tt>fml.c</tt>, <tt>fmlcall.c</tt> and <tt>fmlcalls.c</tt>.
623 Scanning is implemented in <tt>fmltoken.c</tt>.
624 Memory management is implemented in <tt>fmlmem.c</tt>.
625 Arithmetic operators are implemented in <tt>fmlarit.c</tt>.
626 String manipulation functions are implemented in <tt>fmlstr.c</tt>.
627 Relational operators are implemented in <tt>fmlrel.c</tt>.
List manipulations are performed in <tt>fmllist.c</tt>.
629 FML symbol table management is implemented in <tt>fmlsym.c</tt>.
630 Conversion from ISO2709 to list notation is implemented in
633 <tag/zlayer-zdist/ implements the high-level Z39.50 API on top
634 of Zdist. This task is implemented in <tt>zaccess.c</tt>. The
635 public header file is called <tt>zaccess.h</tt>.
637 <tag/zlayer-yaz/ implements the high-level Z39.50 API on top
638 of YAZ. This task is implemented in <tt>zaccess.c</tt>. The
639 public header file is called <tt>zaccess.h</tt>.
641 <tag/kernel/ implements the ETI, kernel and monitor. The kernel
642 itself is implemented in <tt>main.c</tt>, <tt>urp.c</tt> and
643 <tt>persist.c</tt>. The ETI is implemented in <tt>eti.c</tt> and
the monitor is implemented in <tt>monitor.c</tt>.
650 Copyright © 1995, the EUROPAGATE consortium (see below).
652 The EUROPAGATE consortium members are:
655 <item>University College Dublin
656 <item>Danmarks Teknologiske Videnscenter
657 <item>An Chomhairle Leabharlanna
658 <item>Consejo Superior de Investigaciones Cientificas
661 Permission to use, copy, modify, distribute, and sell this software and
662 its documentation, in whole or in part, for any purpose, is hereby granted,
665 1. This copyright and permission notice appear in all copies of the
666 software and its documentation. Notices of copyright or attribution
667 which appear at the beginning of any file must remain unchanged.
669 2. The names of EUROPAGATE or the project partners may not be used to
670 endorse or promote products derived from this software without specific
671 prior written permission.
673 3. Users of this software (implementors and gateway operators) agree to
674 inform the EUROPAGATE consortium of their use of the software. This
675 information will be used to evaluate the EUROPAGATE project and the
676 software, and to plan further developments. The consortium may use
677 the information in later publications.
679 4. Users of this software agree to make their best efforts, when
680 documenting their use of the software, to acknowledge the EUROPAGATE
681 consortium, and the role played by the software in their work.
683 THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT WARRANTY OF ANY KIND,
684 EXPRESS, IMPLIED, OR OTHERWISE, INCLUDING WITHOUT LIMITATION, ANY
685 WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
686 IN NO EVENT SHALL THE EUROPAGATE CONSORTIUM OR ITS MEMBERS BE LIABLE
687 FOR ANY SPECIAL, INCIDENTAL, INDIRECT OR CONSEQUENTIAL DAMAGES OF
688 ANY KIND, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA
689 OR PROFITS, WHETHER OR NOT ADVISED OF THE POSSIBILITY OF DAMAGE, AND
690 ON ANY THEORY OF LIABILITY, ARISING OUT OF OR IN CONNECTION WITH THE
691 USE OR PERFORMANCE OF THIS SOFTWARE.