<chapter id="record-model-alvisxslt">
- <!-- $Id: recordmodel-alvisxslt.xml,v 1.13 2007-02-01 21:26:30 marc Exp $ -->
- <title>ALVIS XML Record Model and Filter Module</title>
+ <!-- $Id: recordmodel-alvisxslt.xml,v 1.14 2007-02-02 09:58:39 marc Exp $ -->
+ <title>ALVIS &xml; Record Model and Filter Module</title>
<para>
The record model described in this chapter applies to the fundamental,
- structured XML
+ structured &xml;
record type <literal>alvis</literal>, introduced in
- <xref linkend="componentmodulesalvis"/>. The ALVIS XML record model
+ <xref linkend="componentmodulesalvis"/>. The ALVIS &xml; record model
is experimental, and its inner workings might change in future
- releases of the Zebra Information Server.
+ releases of the &zebra; Information Server.
</para>
<para> This filter has been developed under the
<section id="record-model-alvisxslt-filter">
<title>ALVIS Record Filter</title>
<para>
- The experimental, loadable Alvis XML/XSLT filter module
+ The experimental, loadable Alvis &xml;/XSLT filter module
<literal>mod-alvis.so</literal> is packaged in the GNU/Debian package
<literal>libidzebra1.4-mod-alvis</literal>.
It is invoked by the <filename>zebra.cfg</filename> configuration statement
path <filename>db/filter_alvis_conf.xml</filename>.
</para>
<para>The Alvis XSLT filter configuration file must be
- valid XML. It might look like this (This example is
+ valid &xml;. It might look like this (this example is
used for indexing and display of OAI harvested records):
<screen>
<?xml version="1.0" encoding="UTF-8"?>
</para>
<para>
The <literal>&lt;split level="2"/&gt;</literal> decides where the
- XML Reader shall split the
+ &xml; Reader shall split the
collections of records into individual records, which then are
loaded into DOM, and have the indexing XSLT stylesheet applied.
</para>
<section id="record-model-alvisxslt-internal">
<title>ALVIS Internal Record Representation</title>
- <para>When indexing, an XML Reader is invoked to split the input
- files into suitable record XML pieces. Each record piece is then
- transformed to an XML DOM structure, which is essentially the
+ <para>When indexing, an &xml; Reader is invoked to split the input
+ files into suitable record &xml; pieces. Each record piece is then
+ transformed to an &xml; DOM structure, which is essentially the
record model. Only XSLT transformations can be applied during
indexing, search and retrieval. Consequently, output formats are
- restricted to whatever XSLT can deliver from the record XML
- structure, be it other XML formats, HTML, or plain text. In case
+ restricted to whatever XSLT can deliver from the record &xml;
+ structure, be it other &xml; formats, HTML, or plain text. In case
you have <literal>libxslt1</literal> running with EXSLT support,
you can use this functionality inside the Alvis
filter configuration XSLT stylesheets.
</z:record>
</screen>
</para>
- <para>This means the following: From the original XML file
- <literal>one-record.xml</literal> (or from the XML record DOM of the
+ <para>This means the following: From the original &xml; file
+ <literal>one-record.xml</literal> (or from the &xml; record DOM of the
same form coming from a split input file), the indexing
- stylesheet produces an indexing XML record, which is defined by
+ stylesheet produces an indexing &xml; record, which is defined by
the <literal>record</literal> element in the magic namespace
<literal>xmlns:z="http://indexdata.dk/zebra/xslt/1"</literal>.
- Zebra uses the content of
+ &zebra; uses the content of
<literal>z:id="oai:JTRS:CP-3290---Volume-I"</literal> as internal
record ID, and - in case static ranking is set - the content of
<literal>z:rank="47896"</literal> as static rank. Following the
<para>
As mentioned above, there can be only one indexing
stylesheet, and configuration of the indexing process is synonymous
- of writing an XSLT stylesheet which produces XML output containing the
+ with writing an XSLT stylesheet which produces &xml; output containing the
magic elements discussed in
<xref linkend="record-model-alvisxslt-internal"/>.
Obviously, there are millions of different ways to accomplish this
<para>
Stylesheets can be written in the <emphasis>pull</emphasis> or
the <emphasis>push</emphasis> style: <emphasis>pull</emphasis>
- means that the output XML structure is taken as starting point of
+ means that the output &xml; structure is taken as starting point of
the internal structure of the XSLT stylesheet, and portions of
- the input XML are <emphasis>pulled</emphasis> out and inserted
- into the right spots of the output XML structure. On the other
+ the input &xml; are <emphasis>pulled</emphasis> out and inserted
+ into the right spots of the output &xml; structure. On the other
hand, <emphasis>push</emphasis> XSLT stylesheets are recursively
calling their template definitions, a process which is commanded
- by the input XML structure, and avake to produce some output XML
+ by the input &xml; structure, and are triggered to produce some output &xml;
whenever some special conditions in the input stylesheets are
met. The <emphasis>pull</emphasis> type is well-suited for input
- XML with strong and well-defined structure and semantcs, like the
+ &xml; with strong and well-defined structure and semantics, like the
following OAI indexing example, whereas the
<emphasis>push</emphasis> type might be the only possible way to
- sort out deeply recursive input XML formats.
+ sort out deeply recursive input &xml; formats.
</para>
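<para>
As a minimal sketch of the <emphasis>pull</emphasis> style (an
illustration, not a stylesheet from the distribution): the output
record is written out literally, and values are pulled into it from
fixed input paths. The input element <literal>doc</literal>, its
<literal>id</literal> attribute and its <literal>title</literal>
child are hypothetical; the <literal>z:</literal> namespace and the
<literal>title</literal> index of type <literal>w</literal> follow
the conventions used elsewhere in this chapter.
<screen>
<![CDATA[
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:z="http://indexdata.dk/zebra/xslt/1">

  <!-- Pull style: the output skeleton is fixed, and input values
       are fetched into it by explicit path. -->
  <xsl:template match="/">
    <!-- /doc/@id is an assumed input attribute -->
    <z:record z:id="{/doc/@id}">
      <z:index name="title" type="w">
        <!-- /doc/title is an assumed input element -->
        <xsl:value-of select="/doc/title"/>
      </z:index>
    </z:record>
  </xsl:template>

</xsl:stylesheet>
]]>
</screen>
The shape of the output is visible at a glance here, which is why
this style suits input with a known, stable structure.
</para>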
<para>
A <emphasis>pull</emphasis> stylesheet example used to index
Notice also,
that the names and types of the indexes can be defined in the
indexing XSLT stylesheet <emphasis>dynamically according to
- content in the original XML records</emphasis>, which has
+ content in the original &xml; records</emphasis>, which has
opportunities for great power and wizardry as well as grand
disaster.
</para>
<para>
The following excerpt of a <emphasis>push</emphasis> stylesheet
<emphasis>might</emphasis>
- be a good idea according to your strict control of the XML
+ be a good idea according to your strict control of the &xml;
input format (due to rigorous checking against well-defined and
- tight RelaxNG or XML Schema's, for example):
+ tight RelaxNG or &xml; Schemas, for example):
<screen>
<![CDATA[
<xsl:template name="element-name-indexes">
]]>
</screen>
This template creates indexes which have the name of the working
- node of any input XML file, and assigns a '1' to the index.
+ node of any input &xml; file, and assigns a '1' to the index.
The example query
<literal>find @attr 1=xyz 1</literal>
finds all files which contain at least one
- <literal>xyz</literal> XML element. In case you can not control
+ <literal>xyz</literal> &xml; element. In case you cannot control
which element names the input files contain, you might ask for
disaster and bad karma using this technique.
</para>
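<para>
To make the recursion of the <emphasis>push</emphasis> style
concrete, here is a hedged, self-contained sketch of a complete
stylesheet that emits one index entry per element name. The
template bodies, the fixed <literal>z:id</literal> value and the
index type <literal>w</literal> are assumptions made for
illustration and are not the shipped code.
<screen>
<![CDATA[
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:z="http://indexdata.dk/zebra/xslt/1">

  <!-- Wrap the whole output in one indexing record.
       The id "example-1" is a placeholder. -->
  <xsl:template match="/">
    <z:record z:id="example-1">
      <xsl:apply-templates select="*"/>
    </z:record>
  </xsl:template>

  <!-- Push style: every input element triggers this template,
       which emits an index named after the element and then
       recurses into the child elements. -->
  <xsl:template match="*">
    <z:index name="{local-name()}" type="w">
      <xsl:text>1</xsl:text>
    </z:index>
    <xsl:apply-templates select="*"/>
  </xsl:template>

</xsl:stylesheet>
]]>
</screen>
This sketch uses <literal>local-name()</literal> so that namespace
prefixes do not end up in index names; whether that is desirable
depends on the input.
</para>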
XSLT transformation, as long as the stylesheet is registered in
the main Alvis XSLT filter configuration file, see
<xref linkend="record-model-alvisxslt-filter"/>.
- In principle anything that can be expressed in XML, HTML, and
+ In principle anything that can be expressed in &xml;, HTML, and
TEXT can be the output of a <literal>schema</literal> or
<literal>element set</literal> directive during search, as long as
the information comes from the
- <emphasis>original input record XML DOM tree</emphasis>
- (and not the transformed and <emphasis>indexed</emphasis> XML!!).
+ <emphasis>original input record &xml; DOM tree</emphasis>
+ (and not the transformed and <emphasis>indexed</emphasis> &xml;!!).
</para>
<para>
- In addition, internal administrative information from the Zebra
+ In addition, internal administrative information from the &zebra;
indexer can be accessed during record retrieval. The following
example is a summary of the possibilities:
<screen>
see: http://www.indexdata.com/yaz/doc/tools.tkl#tools.cql.map
- in db/ an indexing XSLT stylesheet. This is a PULL-type XSLT thing,
- as it constructs the new XML structure by pulling data out of the
+ as it constructs the new &xml; structure by pulling data out of the
respective elements/attributes of the old structure.
Notice the special zebra namespace, and the special elements in this
indicates that a new record with given id and static rank has to be updated.
<z:index name="title" type="w">
- encloses all the text/XML which shall be indexed in the index named
+ encloses all the text/&xml; which shall be indexed in the index named
"title" and of index type "w" (see file default.idx in your zebra
installation)