<chapter id="fields-and-charsets">
- <!-- $Id: field-structure.xml,v 1.5 2006-11-17 14:54:00 marc Exp $ -->
+ <!-- $Id: field-structure.xml,v 1.6 2006-11-23 09:03:50 marc Exp $ -->
<title>Field Structure and Character Sets
</title>
would both produce the same results.
</para>
</section>
- <section id="default-idx-debug">
- <title>Field structure debugging using the special
- <literal>zebra::index::</literal> element set</title>
+
+ <section id="default-idx-zebra">
+ <title>Accessing Zebra internal record data using
+ the <literal>zebra::</literal> element sets</title>
+ <para>
+ Starting with <literal>Zebra</literal> version
+ <literal>2.0.4-2</literal>, the special
+ <literal>zebra::data</literal>,
+ <literal>zebra::meta</literal> and
+ <literal>zebra::index</literal> element set names are available.
+ </para>
+ <note>
+ <para>
+ The <literal>zebra::</literal> element sets access
+ record data directly from the internal storage, and
+ therefore work exactly the same way, irrespective of the indexing
+ filter used.
+ </para>
+ <para>
+ These element set names are optimized for retrieval speed, and
+ perform better than, for example, XSLT-based extraction of small
+ parts of the records with the <literal>alvis</literal> filter.
+ </para>
+ </note>
+ <para>
+ For example, to fetch the raw binary record data stored in the
+ Zebra internal storage, or on the filesystem, the following
+ commands can be issued:
+ <screen>
+ Z> f @attr 1=title my
+ Z> format xml
+ Z> elements zebra::data
+ Z> s 1+1
+ Z> format sutrs
+ Z> s 1+1
+ Z> format usmarc
+ Z> s 1+1
+ </screen>
+ </para>
+ <note>
+ <para>
+ The special
+ <literal>zebra::data</literal> element set name is
+ defined for any record syntax, but will always fetch
+ the raw record data in exactly the original form. No record syntax
+ specific transformations will be applied to the raw record data.
+ </para>
+ </note>
<para>
- At some time, it is very hard to figure out what exactly has been
+ Also, Zebra internal metadata about the record can be accessed:
+ <screen>
+ Z> f @attr 1=title my
+ Z> format xml
+ Z> elements zebra::meta::sysno
+ Z> s 1+1
+ </screen>
+ displays only the internal record system number, in
+ <literal>XML</literal> record syntax, whereas
+ <screen>
+ Z> f @attr 1=title my
+ Z> format xml
+ Z> elements zebra::meta
+ Z> s 1+1
+ </screen>
+ displays all available metadata on the record. These include the system
+ number, database name, indexed filename, filter used for indexing,
+ score and static ranking information, and finally the size of the
+ record in bytes.
+ </para>
+ <note>
+ <para>
+ The special
+ <literal>zebra::meta</literal> element set names are only
+ defined for
+ <literal>SUTRS</literal> and <literal>XML</literal> record
+ syntaxes.
+ </para>
+ </note>
+ <para>
+ Sometimes, it is very hard to figure out what exactly has been
indexed how and in which indexes. Using the indexing stylesheet of
the Alvis filter, one can at least see which portion of the record
went into which index, but a similar aid does not exist for all
other indexing filters.
</para>
<para>
- Starting with <literal>Zebra</literal> version
- <literal>2.0.4-2</literal> or newer, one has the possibility to
- use the special
- <literal>zebra::index::</literal> element set name, which is only defined for
- the <literal>SUTRS</literal> and <literal>XML</literal> record
- formats.
+ The special
+ <literal>zebra::index</literal> element set names are provided to
+ access information about the indexed fields of each record. For
+ example, the queries
<screen>
- Z> f @attr 1=dc_all minutter
+ Z> f @attr 1=title my
Z> format sutrs
- Z> elements zebra::index::
+ Z> elements zebra::index
Z> s 1+1
</screen>
will display all indexed tokens from all indexed fields of the
first record, and it will display in <literal>SUTRS</literal>
record syntax, whereas
<screen>
- Z> f @attr 1=dc_all minutter
+ Z> f @attr 1=title my
Z> format xml
- Z> elements zebra::index::dc_publisher
+ Z> elements zebra::index::title
Z> s 1+1
- Z> elements zebra::index::dc_publisher:p
+ Z> elements zebra::index::title:p
Z> s 1+1
</screen>
displays in <literal>XML</literal> record syntax only the content
- of the zebra string index <literal>dc_publisher</literal>, or
+ of the Zebra string index <literal>title</literal>, or
even only the type <literal>p</literal> phrase indexed part of it.
</para>
+ <note>
+ <para>
+ The special <literal>zebra::index</literal>
+ element set names are only
+ defined for
+ <literal>SUTRS</literal> and <literal>XML</literal> record
+ syntaxes.
+ </para>
+ <para>
+ Trying to access numeric <literal>Bib-1</literal> use
+ attributes, or non-existent Zebra internal string
+ access points, results in the diagnostic
+ <literal>Diagnostic [25]: Specified element set name not valid for specified database</literal>.
+ </para>
+ </note>
</section>
</chapter>
<!-- Keep this comment at the end of the file
-/* $Id: retrieve.c,v 1.55 2006-11-21 22:17:49 adam Exp $
+/* $Id: retrieve.c,v 1.56 2006-11-23 09:03:51 marc Exp $
Copyright (C) 1995-2006
Index Data ApS
-static void parse_zebra_elem(const char *elem,
+static int parse_zebra_elem(const char *elem,
const char **index, size_t *index_len,
const char **type, size_t *type_len)
{
- *type = 0;
- *type_len = 0;
-
*index = 0;
*index_len = 0;
+ *type = 0;
+ *type_len = 0;
+
if (elem && *elem)
{
- const char *cp = strchr(elem, ':');
+ char *cp;
+ /* verify that '::' is in the beginning of *elem
+ and something more follows */
+ if (':' != *elem
+ || !(elem +1) || ':' != *(elem +1)
+ || !(elem +2) || '\0' == *(elem +2))
+ return 0;
+
+ /* pick out info from string after '::' */
+ elem = elem + 2;
+ cp = strchr(elem, ':');
- if (!cp) /* no colon */
+ if (!cp) /* index, no colon, no type */
{
*index = elem;
*index_len = strlen(elem);
}
- else if (cp[1] == '\0') /* 'index:' */
+ else if (cp[1] == '\0') /* colon, but no following type */
{
- *index = elem;
- *index_len = cp - elem;
+ return 0;
}
- else
+ else /* index, colon and type */
{
*index = elem;
*index_len = cp - elem;
*type_len = strlen(cp+1);
}
}
+ return 1;
}
return YAZ_BIB1_NO_SYNTAXES_AVAILABLE_FOR_THIS_REQUEST;
}
- parse_zebra_elem(elemsetname,
+ if (!parse_zebra_elem(elemsetname,
&retrieval_index, &retrieval_index_len,
- &retrieval_type, &retrieval_type_len);
+ &retrieval_type, &retrieval_type_len))
+ return YAZ_BIB1_SPECIFIED_ELEMENT_SET_NAME_NOT_VALID_FOR_SPECIFIED_;
if (retrieval_type_len != 0 && retrieval_type_len != 1)
{
(retrieval_type_len == 0 ? -1 :
retrieval_type[0]),
retrieval_index_cstr) == -1)
- {
return YAZ_BIB1_SPECIFIED_ELEMENT_SET_NAME_NOT_VALID_FOR_SPECIFIED_;
- }
}
}
"<record xmlns="
"\"http://www.indexdata.com/zebra/\""
" sysno=\"" ZINT_FORMAT "\""
- " set=\"zebra::index::%s/\">\n",
+ " set=\"zebra::index%s/\">\n",
sysno, elemsetname);
}
else if (input_format == VAL_SUTRS)
zebraExplain_lookup_ord(zh->reg->zei, ord, &index_type, &db,
&string_index);
string_index_len = strlen(string_index);
+
+ /* process only if index is not defined,
+ or if defined and matching */
if (retrieval_index == 0
|| (string_index_len == retrieval_index_len
&& !memcmp(string_index, retrieval_index,
string_index_len))){
-
+
+ /* process only if type is not defined, or is matching */
if (retrieval_type == 0
|| (retrieval_type_len == 1
&& retrieval_type[0] == index_type)){
- if (input_format == VAL_TEXT_XML){
- wrbuf_printf(wrbuf, " <index name=\"%s\"",
- string_index);
-
- wrbuf_printf(wrbuf, " type=\"%c\"", index_type);
+
+ zebra_term_untrans(zh, index_type, dst_buf, str);
+ if (strlen(dst_buf)){
+
+ if (input_format == VAL_TEXT_XML){
+ wrbuf_printf(wrbuf, " <index name=\"%s\"",
+ string_index);
+
+ wrbuf_printf(wrbuf, " type=\"%c\"", index_type);
+
+ wrbuf_printf(wrbuf, " seq=\"" ZINT_FORMAT "\">",
+ key_in.mem[key_in.len -1]);
- wrbuf_printf(wrbuf, " seq=\"" ZINT_FORMAT "\">",
- key_in.mem[key_in.len -1]);
+ wrbuf_xmlputs(wrbuf, dst_buf);
+ wrbuf_printf(wrbuf, "</index>\n");
+ }
+ else if (input_format == VAL_SUTRS){
+ wrbuf_printf(wrbuf, "%s ", string_index);
+
+ wrbuf_printf(wrbuf, "%c", index_type);
+
+ for (i = 1; i < key_in.len; i++)
+ wrbuf_printf(wrbuf, " " ZINT_FORMAT,
+ key_in.mem[i]);
+
+ /* zebra_term_untrans(zh, index_type, dst_buf, str); */
+ wrbuf_printf(wrbuf, " %s", dst_buf);
- zebra_term_untrans(zh, index_type, dst_buf, str);
- wrbuf_xmlputs(wrbuf, dst_buf);
- wrbuf_printf(wrbuf, "</index>\n");
+ wrbuf_printf(wrbuf, "\n");
+ }
}
- else if (input_format == VAL_SUTRS){
- wrbuf_printf(wrbuf, "%s ", string_index);
- wrbuf_printf(wrbuf, "%c", index_type);
-
- for (i = 1; i < key_in.len; i++)
- wrbuf_printf(wrbuf, " " ZINT_FORMAT,
- key_in.mem[i]);
-
- zebra_term_untrans(zh, index_type, dst_buf, str);
- wrbuf_printf(wrbuf, " %s", dst_buf);
-
- wrbuf_printf(wrbuf, "\n");
- }
}
}
}
}
-int zebra_special_fetch(ZebraHandle zh, zint sysno, ODR odr,
+
+int zebra_special_fetch(ZebraHandle zh, zint sysno, int score, ODR odr,
const char *elemsetname,
oid_value input_format,
oid_value *output_format,
return YAZ_BIB1_SYSTEM_ERROR_IN_PRESENTING_RECORDS;
}
+ /* processing special elementsetnames zebra::meta:: */
+ if (elemsetname && 0 == strcmp(elemsetname, "meta")){
+ int ret = 0;
+ char rec_str[1024];
+ RecordAttr *recordAttr = rec_init_attr(zh->reg->zei, rec);
+
+ if (input_format == VAL_TEXT_XML){
+ *output_format = VAL_TEXT_XML;
+
+ sprintf(rec_str,
+ "<record xmlns="
+ "\"http://www.indexdata.com/zebra/\""
+ " sysno=\"" ZINT_FORMAT "\""
+ " base=\"%s\""
+ " file=\"%s\""
+ " type=\"%s\""
+ " score=\"%i\""
+ " rank=\"" ZINT_FORMAT "\""
+ " size=\"%i\""
+ " set=\"zebra::%s/\">\n",
+ sysno,
+ rec->info[recInfo_databaseName],
+ rec->info[recInfo_filename],
+ rec->info[recInfo_fileType],
+ score,
+ recordAttr->staticrank,
+ recordAttr->recordSize,
+ elemsetname);
+ }
+ else if (input_format == VAL_SUTRS){
+ *output_format = VAL_SUTRS;
+ sprintf(rec_str,
+ "sysno " ZINT_FORMAT "\n"
+ "base %s\n"
+ "file %s\n"
+ "type %s\n"
+ "score %i\n"
+ "rank " ZINT_FORMAT "\n"
+ "size %i\n"
+ "set zebra::%s\n",
+ sysno,
+ rec->info[recInfo_databaseName],
+ rec->info[recInfo_filename],
+ rec->info[recInfo_fileType],
+ score,
+ recordAttr->staticrank,
+ recordAttr->recordSize,
+ elemsetname);
+ }
+
+
+ *rec_lenp = strlen(rec_str);
+ if (*rec_lenp){
+ *rec_bufp = odr_strdup(odr, rec_str);
+ ret = 0;
+ } else {
+ ret = YAZ_BIB1_SYSTEM_ERROR_IN_PRESENTING_RECORDS;
+ }
+
+ rec_free(&rec);
+ return ret;
+ }
+
/* processing special elementsetnames zebra::index:: */
- if (elemsetname && 0 == strncmp(elemsetname, "index::", 7)){
+ if (elemsetname && 0 == strncmp(elemsetname, "index", 5)){
int ret = zebra_special_index_fetch(zh, sysno, odr, rec,
- elemsetname + 7,
+ elemsetname + 5,
input_format, output_format,
rec_bufp, rec_lenp);
/* processing zebra special elementset names of form 'zebra:: */
if (elemsetname && 0 == strncmp(elemsetname, "zebra::", 7))
- return zebra_special_fetch(zh, sysno, odr,
+ return zebra_special_fetch(zh, sysno, score, odr,
elemsetname + 7,
input_format, output_format,
rec_bufp, rec_lenp);
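
The suffix rules that this change introduces for `zebra::index` element set names (nothing, `::index`, or `::index:type`) can be tried in isolation with a small stand-alone sketch. It is not the Zebra source: the function name `parse_index_suffix` is hypothetical, the input is assumed to be the part of the element set name that follows `zebra::index`, and the assignment of `*type` in the last branch is an assumption, since that line is not visible in the hunks above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-alone variant of the suffix parser.
   `elem` is assumed to be whatever follows "zebra::index".
   Returns 1 if the suffix is acceptable, 0 if it is malformed. */
static int parse_index_suffix(const char *elem,
                              const char **index, size_t *index_len,
                              const char **type, size_t *type_len)
{
    const char *cp;

    *index = 0;
    *index_len = 0;
    *type = 0;
    *type_len = 0;

    /* plain "zebra::index": no index or type restriction */
    if (!elem || !*elem)
        return 1;

    /* the suffix must start with "::" and something must follow it */
    if (elem[0] != ':' || elem[1] != ':' || elem[2] == '\0')
        return 0;

    elem += 2;
    cp = strchr(elem, ':');
    if (!cp)                    /* index only, no type */
    {
        *index = elem;
        *index_len = strlen(elem);
        return 1;
    }
    if (cp[1] == '\0')          /* trailing colon without a type */
        return 0;

    *index = elem;              /* index, colon and type */
    *index_len = (size_t) (cp - elem);
    *type = cp + 1;             /* assumption: type starts after the colon */
    *type_len = strlen(cp + 1);
    return 1;
}
```

With these rules, `::title` selects one index, `::title:p` selects one index and one type, and `::title:` is rejected, which corresponds to the cases that return diagnostic 25 in the change above.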