<sect1 id="marc"><title>MARC</title>
<para>
- YAZ provides a fast utility that decodes MARC records and
- encodes to a varity of output formats. The MARC records must
- be encoded in ISO2709.
+ YAZ provides a fast utility for working with MARC records.
+ Early versions of the MARC utility only allowed decoding of ISO2709.
+ Today the utility can both encode and decode a variety of formats.
</para>
<synopsis><![CDATA[
#include <yaz/marcdisp.h>
#define YAZ_MARC_MARCXML 3
#define YAZ_MARC_ISO2709 4
#define YAZ_MARC_XCHANGE 5
+ #define YAZ_MARC_CHECK 6
+ #define YAZ_MARC_TURBOMARC 7
/* supply iconv handle for character set conversion .. */
void yaz_marc_iconv(yaz_marc_t mt, yaz_iconv_t cd);
/* decode MARC in buf of size bsize. Returns >0 on success; <=0 on failure.
On success, result in *result with size *rsize. */
- int yaz_marc_decode_buf (yaz_marc_t mt, const char *buf, int bsize,
- char **result, int *rsize);
+ int yaz_marc_decode_buf(yaz_marc_t mt, const char *buf, int bsize,
+ const char **result, size_t *rsize);
/* decode MARC in buf of size bsize. Returns >0 on success; <=0 on failure.
On success, result in WRBUF */
- int yaz_marc_decode_wrbuf (yaz_marc_t mt, const char *buf,
- int bsize, WRBUF wrbuf);
+ int yaz_marc_decode_wrbuf(yaz_marc_t mt, const char *buf,
+ int bsize, WRBUF wrbuf);
]]>
</synopsis>
+ <note>
+ <para>
+ The synopsis is just a basic subset of all functionality. Refer
+ to the actual header file <filename>marcdisp.h</filename> for
+ details.
+ </para>
+ </note>
<para>
A MARC conversion handle must be created by using
<function>yaz_marc_create</function> and destroyed
<term>YAZ_MARC_MARCXML</term>
<listitem>
<para>
- The resulting record is converted to MARCXML.
+ <ulink url="&url.marcxml;">MARCXML</ulink>.
</para>
</listitem>
</varlistentry>
<term>YAZ_MARC_ISO2709</term>
<listitem>
<para>
- The resulting record is converted to ISO2709 (MARC).
+ ISO2709 (sometimes just referred to as "MARC").
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>YAZ_MARC_XCHANGE</term>
+ <listitem>
+ <para>
+ <ulink url="&url.marcxchange;">MarcXchange</ulink>.
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term>YAZ_MARC_CHECK</term>
+ <listitem>
+ <para>
+ Pseudo format for validation only. Does not generate
+ any real output except diagnostics.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>YAZ_MARC_TURBOMARC</term>
+ <listitem>
+ <para>
+ XML format with the same semantics as MARCXML but more compact
+ and geared towards fast processing with XSLT. Refer to
+ <xref linkend="tools.turbomarc"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</para>
<para>
<example id="example.marc.display">
<title>Display of MARC record</title>
<para>
- The followint program snippet illustrates how the MARC API may
+ The following program snippet illustrates how the MARC API may
be used to convert a MARC record to the line-by-line format:
<programlisting><![CDATA[
void print_marc(const char *marc_buf, int marc_buf_size)
{
- char *result; /* for result buf */
+ const char *result; /* for result buf */
- int result_len; /* for size of result */
+ size_t result_len; /* for size of result */
yaz_marc_t mt = yaz_marc_create();
yaz_marc_xml(mt, YAZ_MARC_LINE);
yaz_marc_decode_buf(mt, marc_buf, marc_buf_size,
</programlisting>
</para>
</example>
+ <sect2 id="tools.turbomarc">
+ <title>TurboMARC</title>
+ <para>
+ TurboMARC is yet another XML encoding of a MARC record. The format
+ was designed for fast processing with XSLT.
+ </para>
+ <para>
+ Applications like
+ Pazpar2 use XSLT to convert an XML-encoded MARC record to an internal
+ representation. This conversion mostly checks the tag of a MARC field
+ to determine the basic rules in the conversion. This check is
+ costly when the tag is encoded as an attribute, as in MARCXML.
+ Having the tag value as the element name instead makes processing
+ many times faster (at least for Libxslt).
+ </para>
+ <para>
+ TurboMARC is encoded as follows:
+ <itemizedlist>
+ <listitem><para>
+ Record elements are part of the namespace
+ "<literal>http://www.indexdata.com/MARC21/turboxml</literal>".
+ </para></listitem>
+ <listitem><para>
+ A record is enclosed in element <literal>r</literal>.
+ </para></listitem>
+ <listitem><para>
+ A collection of records is enclosed in element
+ <literal>collection</literal>.
+ </para></listitem>
+ <listitem><para>
+ The leader is encoded as element <literal>l</literal> with the
+ leader content as its (text) value.
+ </para></listitem>
+ <listitem><para>
+ A control field is encoded as element <literal>c</literal> concatenated
+ with the tag value of the control field if the tag value
+ matches the regular expression <literal>[a-zA-Z0-9]*</literal>.
+ If the tag value does not match the regular expression
+ <literal>[a-zA-Z0-9]*</literal>, the control field is encoded
+ as element <literal>c</literal> and attribute <literal>code</literal>
+ will hold the tag value.
+ This rule ensures that in the rare cases where a tag value might
+ result in non-well-formed XML, YAZ encodes it as a coded attribute
+ (as in MARCXML).
+ </para>
+ <para>
+ The control field content is the text value of this element.
+ Indicators are encoded as attributes named
+ <literal>i1</literal>, <literal>i2</literal>, etc., each holding
+ the value of the corresponding indicator.
+ </para></listitem>
+ <listitem><para>
+ A data field is encoded as element <literal>d</literal> concatenated
+ with the tag value of the data field or using the attribute
+ <literal>code</literal> as described in the rules for control fields.
+ The children of the data field element are subfield elements.
+ Each subfield element is encoded as <literal>s</literal>
+ concatenated with the subfield code.
+ The text of the subfield element is the contents of the subfield.
+ Indicators are encoded as attributes for the data field element similar
+ to the encoding for control fields.
+ </para></listitem>
+ </itemizedlist>
+ </para>
+ </sect2>
</sect1>
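The element naming rule for TurboMARC control and data fields described above can be sketched in C as follows. This is only an illustration of the documented rule, not code from YAZ: the function names `tag_is_alnum` and `turbomarc_element_name` are hypothetical, and the sketch only derives the element name, leaving the actual XML output (including escaping of the `code` attribute value) to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if tag consists only of ASCII letters
   and digits (the [a-zA-Z0-9]* rule from the text), 0 otherwise. */
int tag_is_alnum(const char *tag)
{
    for (; *tag; tag++)
    {
        char ch = *tag;
        if (!((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z') ||
              (ch >= '0' && ch <= '9')))
            return 0;
    }
    return 1;
}

/* Hypothetical helper: builds the element name for a field.
   prefix is 'c' (control field), 'd' (data field) or 's' (subfield).
   Returns 0 if the tag was folded into the element name (e.g. "d245"),
   1 if the name is just the prefix and the caller must emit the tag
   in a "code" attribute, and -1 if out is too small. */
int turbomarc_element_name(char prefix, const char *tag,
                           char *out, size_t out_size)
{
    size_t tag_len = strlen(tag);

    assert(out != NULL);
    if (tag_is_alnum(tag))
    {
        /* prefix + tag + terminating NUL must fit */
        if (tag_len + 2 > out_size)
            return -1;
        out[0] = prefix;
        memcpy(out + 1, tag, tag_len + 1);
        return 0;
    }
    /* unsafe tag: element name is the bare prefix */
    if (out_size < 2)
        return -1;
    out[0] = prefix;
    out[1] = '\0';
    return 1;
}
```

With this sketch, a data field with tag `245` would get the element name `d245`, while a tag containing a space or punctuation would fall back to the bare element `d` with the tag carried in the `code` attribute.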
<sect1 id="tools.retrieval">
Specifies input format. Must be one of
<literal>marcxml</literal>, <literal>marc</literal> (ISO2709),
<literal>marcxchange</literal> (ISO25577),
- <literal>line</literal> (line mode MARC).
+ <literal>line</literal> (line mode MARC),
+ or <literal>turbomarc</literal> (Turbo MARC).
</para></listitem>
</varlistentry>
Specifies output format. Must be one of
<literal>marcxml</literal>, <literal>marc</literal> (ISO2709),
<literal>marcxchange</literal> (ISO25577),
- <literal>line</literal> (line mode MARC).
+ <literal>line</literal> (line mode MARC),
+ or <literal>turbomarc</literal> (Turbo MARC).
</para></listitem>
</varlistentry>
yaz-marcdump -f MARC-8 -t UTF-8 -o marcxml marc21.raw >marcxml.xml
</screen>
</para>
+
+ <para>
+ Turbo MARC is a compact XML notation with the same semantics as
+ MARCXML, but which allows for faster processing via XSLT. In order
+ to generate Turbo MARC records encoded in UTF-8 from MARC21 (ISO2709), one
+ could use:
+ <screen>
+ yaz-marcdump -f MARC8 -t UTF8 -o turbomarc -i marc marc21.raw >out.xml
+ </screen>
+ </para>
</refsect1>
<refsect1><title>FILES</title>
/** \brief Output format: MarcXchange (ISO25577) */
#define YAZ_MARC_XCHANGE 5
/** \brief Output format: check only (no marc output) */
-#define YAZ_MARC_CHECK 6
-/** \brief Output format: Turbo MARCXML Index Data format*/
-#define YAZ_MARC_TMARCXML 7
+#define YAZ_MARC_CHECK 6
+/** \brief Output format: Turbo MARC Index Data format (XML based) */
+#define YAZ_MARC_TURBOMARC 7
/** \brief set iconv handle for character set conversion */
YAZ_EXPORT void yaz_marc_iconv(yaz_marc_t mt, yaz_iconv_t cd);
void *client_data);
#if YAZ_HAVE_XML2
-/** \brief parses MARCXML/MarcXchange record from xmlNode pointer
+/** \brief parses MARCXML/MarcXchange/TurboMARC record from xmlNode pointer
\param mt handle
\param ptr is a pointer to root xml node
\retval 0 OK
*/
YAZ_EXPORT int yaz_marc_write_marcxml(yaz_marc_t mt, WRBUF wrbuf);
-/** \brief writes record in TMARCXML format
+/** \brief writes record in TurboMARC format
\param mt handle
\param wrbuf WRBUF for output
\retval 0 OK
\retval -1 ERROR
*/
-YAZ_EXPORT int yaz_marc_write_turbo_xml(yaz_marc_t mt, WRBUF wrbuf);
+YAZ_EXPORT int yaz_marc_write_turbomarc(yaz_marc_t mt, WRBUF wrbuf);
/** \brief writes record in MarcXchange XML (ISO25577)
\param mt handle
}
else if (!strcmp((const char *) ptr->name, "r"))
{
- format = YAZ_MARC_TMARCXML;
+ format = YAZ_MARC_TURBOMARC;
break;
}
else
{
case YAZ_MARC_MARCXML:
return yaz_marc_read_xml_fields(mt, ptr->next);
- case YAZ_MARC_TMARCXML:
+ case YAZ_MARC_TURBOMARC:
return yaz_marc_read_turbo_xml_fields(mt, ptr->next);
}
return -1;
switch(mt->output_format)
{
case YAZ_MARC_MARCXML:
- case YAZ_MARC_TMARCXML:
+ case YAZ_MARC_TURBOMARC:
wrbuf_printf(wr, "</collection>\n");
break;
case YAZ_MARC_XCHANGE:
return yaz_marc_write_line(mt, wr);
case YAZ_MARC_MARCXML:
return yaz_marc_write_marcxml(mt, wr);
- case YAZ_MARC_TMARCXML:
- return yaz_marc_write_turbo_xml(mt, wr);
+ case YAZ_MARC_TURBOMARC:
+ return yaz_marc_write_turbomarc(mt, wr);
case YAZ_MARC_XCHANGE:
return yaz_marc_write_marcxchange(mt, wr, 0, 0); /* no format, type */
case YAZ_MARC_ISO2709:
0, 0, 0);
}
-int yaz_marc_write_turbo_xml(yaz_marc_t mt, WRBUF wr)
+int yaz_marc_write_turbomarc(yaz_marc_t mt, WRBUF wr)
{
/* set leader 09 to 'a' for UNICODE */
/* http://www.loc.gov/marc/bibliographic/ecbdldrd.html#mrcblea */
mode = YAZ_MARC_ISO2709;
if (!strcmp(arg, "marcxml"))
mode = YAZ_MARC_MARCXML;
- if (!strcmp(arg, "tmarcxml"))
- mode = YAZ_MARC_TMARCXML;
+ if (!strcmp(arg, "turbomarc"))
+ mode = YAZ_MARC_TURBOMARC;
if (!strcmp(arg, "marcxchange"))
mode = YAZ_MARC_XCHANGE;
if (!strcmp(arg, "line"))
if (input_charset && !output_charset)
output_charset = "utf-8";
}
- else if (!strcmp(output_format, "tmarcxml"))
+ else if (!strcmp(output_format, "turbomarc"))
{
- output_format_mode = YAZ_MARC_TMARCXML;
+ output_format_mode = YAZ_MARC_TURBOMARC;
if (input_charset && !output_charset)
output_charset = "utf-8";
}
ret = -1;
}
else if (r->u.marc.input_format == YAZ_MARC_MARCXML ||
- r->u.marc.input_format == YAZ_MARC_TMARCXML)
+ r->u.marc.input_format == YAZ_MARC_TURBOMARC)
{
xmlDocPtr doc = xmlParseMemory(wrbuf_buf(record),
wrbuf_len(record));
}
else if (!strcmp(type, "txml"))
{
- return get_record_format(rec, len, npr, YAZ_MARC_TMARCXML, charset,
+ return get_record_format(rec, len, npr, YAZ_MARC_TURBOMARC, charset,
format);
}
else if (!strcmp(type, "raw"))
echo "binmarc -> marcxml(libxml2): $?"
fi
-binmarc_convert "tmarcxml" "tmarcxml" "t"
-echo "binmarc -> tmarcxml: $?"
+binmarc_convert "turbomarc" "turbomarc" "t"
+echo "binmarc -> turbomarc: $?"
if test -z "$noxmlwrite"; then
-binmarc_convert "xml,tmarcxml" "tmarcxml" "xml2t"
-echo "binmarc -> tmarcxml(libxml2): $?"
+binmarc_convert "xml,turbomarc" "turbomarc" "xml2t"
+echo "binmarc -> turbomarc(libxml2): $?"
fi
exit $ecode
yaz_marc_write_using_libxml2(mt, write_using_libxml2);
yaz_marc_debug(mt, verbose);
- if (input_format == YAZ_MARC_MARCXML || input_format == YAZ_MARC_TMARCXML || input_format == YAZ_MARC_XCHANGE)
+ if (input_format == YAZ_MARC_MARCXML || input_format == YAZ_MARC_TURBOMARC || input_format == YAZ_MARC_XCHANGE)
{
#if YAZ_HAVE_XML2
marcdump_read_xml(mt, fname);