1 <!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook V4.1//EN"
2 "http://www.oasis-open.org/docbook/xml/4.1/docbookx.dtd"
4 <!ENTITY % local SYSTEM "local.ent">
6 <!ENTITY % entities SYSTEM "entities.ent">
8 <!ENTITY % idcommon SYSTEM "common/common.ent">
11 <!-- $Id: yaz-icu-man.xml,v 1.1 2007-11-12 11:13:05 adam Exp $ -->
12 <refentry id="zoomsh">
14 <productname>YAZ</productname>
15 <productnumber>&version;</productnumber>
19 <refentrytitle>yaz-icu</refentrytitle>
20 <manvolnum>1</manvolnum>
24 <refname>yaz-icu</refname>
25 <refpurpose>YAZ ICU utility</refpurpose>
30 <command>yaz-icu</command>
31 <arg choice="opt" rep="repeat">commands</arg>
32 <arg>-c <replaceable>config</replaceable></arg>
33 <arg>-p <replaceable>type</replaceable></arg>
38 <refsect1><title>DESCRIPTION</title>
40 <command>yaz-icu</command> is a utility which demonstrates
41 the ICU chain module of YAZ (<filename>yaz/icu.h</filename>).
45 <refsect1><title>OPTIONS</title>
48 <term>-c <replaceable>config</replaceable></term>
50 Specifies the file containing ICU chain configuration
56 <term>-p <replaceable>type</replaceable></term>
58 Specifies extra information to be printed about the ICU system.
59 If <replaceable>type</replaceable> is <literal>c</literal>
60 then ICU converters are printed.
61 If <replaceable>type</replaceable> is <literal>l</literal>
62 then available locales are printed.
63 If <replaceable>type</replaceable> is <literal>t</literal>
64 then available transliterators are printed.
69 <term>-x <replaceable>config</replaceable></term>
71 Specifies that output should be XML based rather than
78 <refsect1><title>ICU chain configuration</title>
80 The ICU chain configuration specifies one or more rules to convert
81 text data into tokens. The configuration format is XML-based.
84 The top-level element must be named <literal>icu_chain</literal>.
85 The <literal>icu_chain</literal> element has one required attribute
86 <literal>locale</literal> which specifies the ICU locale to be used
87 in the conversion steps.
90 The <literal>icu_chain</literal> element must include child elements,
91 each of which specifies a conversion step. The conversion steps are performed
92 in the order in which they are specified.
93 Each conversion element takes one attribute, <literal>rule</literal>,
94 which serves as the argument to the conversion step.
97 The following conversion elements are available:
103 Converts case; the <literal>rule</literal> attribute specifies how:
109 <para>Lower case using ICU function u_strToLower.</para>
116 <para>Upper case using ICU function u_strToUpper.</para>
123 <para>Title case using ICU function u_strToTitle.</para>
130 <para>Fold case using ICU function u_strFoldCase.</para>
141 This is a meta step which specifies that a term/token is to
142 be displayed. This term is retrieved in an application
143 using the function icu_chain_token_display (<filename>yaz/icu.h</filename>).
148 <term>transform</term>
150 Specifies an ICU transform rule. The rule attribute is the
151 custom transformation rule to be used. This is a text-based format
152 offered by the ICU transform system. See
153 <ulink url="&url.icu.transform;">ICU Transforms</ulink> for
159 <term>tokenize</term>
161 Breaks (tokenizes) a string into components using the
162 ICU break iterator functions ubrk_open, ubrk_setText, etc. The rule is
168 <para>Line. ICU: UBRK_LINE.</para>
175 <para>Sentence. ICU: UBRK_SENTENCE.</para>
182 <para>Word. ICU: UBRK_WORD.</para>
189 <para>Character. ICU: UBRK_CHARACTER.</para>
196 <para>Title. ICU: UBRK_TITLE.</para>
208 <refsect1><title>EXAMPLES</title>
210 The following command analyzes text in file <filename>text</filename>
211 using ICU chain configuration <filename>chain.xml</filename>:
213 cat text | yaz-icu -c chain.xml
215 The file <filename>chain.xml</filename> might look as follows:
217 <icu_chain locale="en">
218 <transform rule="[:Control:] Any-Remove"/>
220 <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
228 <refsect1><title>SEE ALSO</title>
231 <refentrytitle>yaz</refentrytitle>
232 <manvolnum>7</manvolnum>
236 <ulink url="&url.icu;">ICU Home</ulink>
239 <ulink url="&url.icu.transform;">ICU Transforms</ulink>
244 <!-- Keep this comment at the end of the file
249 sgml-minimize-attributes:nil
250 sgml-always-quote-attributes:t
253 sgml-parent-document:nil
254 sgml-local-catalogs: nil
255 sgml-namecase-general:t