implemented full metadata display of fast element set names

author Marc Cromme <marc@indexdata.dk>

Thu, 23 Nov 2006 09:03:50 +0000 (09:03 +0000)

committer Marc Cromme <marc@indexdata.dk>

Thu, 23 Nov 2006 09:03:50 +0000 (09:03 +0000)
diff --git a/doc/field-structure.xml b/doc/field-structure.xml
index 67619d2..6318dbe 100644
--- a/doc/field-structure.xml
+++ b/doc/field-structure.xml
@@ -1,5 +1,5 @@
   <chapter id="fields-and-charsets">
-  <!-- $Id: field-structure.xml,v 1.5 2006-11-17 14:54:00 marc Exp $ -->
+  <!-- $Id: field-structure.xml,v 1.6 2006-11-23 09:03:50 marc Exp $ -->
    <title>Field Structure and Character Sets
    </title>
    
@@ -261,44 +261,134 @@
      would both produce the same results.
     </para>
    </section>
-  <section id="default-idx-debug">
-   <title>Field structure debugging using the special 
-          <literal>zebra::index::</literal> element set</title>
+
+  <section id="default-idx-zebra">
+   <title>Accessing Zebra internal record data using 
+    the <literal>zebra::</literal> element sets</title>
+   <para>
+    Starting with <literal>Zebra</literal> version
+    <literal>2.0.4-2</literal>, it is possible to
+    use the special
+    <literal>zebra::data</literal>,
+    <literal>zebra::meta</literal> and 
+    <literal>zebra::index</literal> element set names.
+   </para>
+   <note>
+    <para>
+     The <literal>zebra::</literal> element sets access
+     record data directly from the internal storage, and
+     therefore work exactly the same way, regardless of the indexing
+     filter used.
+    </para>
+    <para>
+     These element set names are optimized for retrieval speed, and
+     will perform better than, for example, extracting small parts of
+     the records with the XSLT stylesheets of the
+     <literal>alvis</literal> filter.
+    </para>
+   </note>
+   <para>
+    For example, to fetch the raw binary record data stored in the
+    Zebra internal storage, or on the filesystem, the following
+    commands can be issued:
+    <screen>
+      Z> f @attr 1=title my
+      Z> format xml
+      Z> elements zebra::data
+      Z> s 1+1
+      Z> format sutrs
+      Z> s 1+1
+      Z> format usmarc
+      Z> s 1+1
+    </screen>
+    </para>
+   <note>
+    <para>
+     The special 
+     <literal>zebra::data</literal> element set name is 
+     defined for any record syntax, but will always fetch  
+     the raw record data in exactly the original form. No record syntax
+     specific transformations will be applied to the raw record data. 
+    </para>
+   </note>
     <para>
-    At some time, it is very hard to figure out what exactly has been
+    Also, Zebra internal metadata about the record can be accessed: 
+    <screen>
+      Z> f @attr 1=title my
+      Z> format xml
+      Z> elements zebra::meta::sysno
+      Z> s 1+1
+    </screen> 
+    displays in <literal>XML</literal> record syntax only the internal
+    record system number, whereas 
+    <screen>
+      Z> f @attr 1=title my
+      Z> format xml
+      Z> elements zebra::meta
+      Z> s 1+1
+    </screen> 
+    displays all available metadata on the record. These include the system
+      number, database name, indexed filename, filter used for indexing,
+      score and static ranking information, and finally the record size in bytes.
+   </para>
+   <note>
+    <para>
+     The special 
+     <literal>zebra::meta</literal> element set names are only 
+     defined for
+     <literal>SUTRS</literal> and <literal>XML</literal> record
+     syntaxes. 
+    </para>
+   </note>
+   <para>
+    Sometimes, it is very hard to figure out what exactly has been
      indexed how and in which indexes. Using the indexing stylesheet of
      the Alvis filter, one can at least see which portion of the record
      went into which index, but a similar aid does not exist for all
      other indexing filters.  
     </para>
     <para>
-    Starting with <literal>Zebra</literal> version
-    <literal>2.0.4-2</literal> or newer, one has the possibility to
-    use the special
-    <literal>zebra::index::</literal> element set name, which is only defined for
-    the <literal>SUTRS</literal> and <literal>XML</literal> record
-    formats.
+    The special
+    <literal>zebra::index</literal> element set names are provided to
+    access information on the indexed fields of each record. For example, the
+    queries 
      <screen>
-      Z> f @attr 1=dc_all minutter
+      Z> f @attr 1=title my
        Z> format sutrs
-      Z> elements zebra::index::
+      Z> elements zebra::index
        Z> s 1+1
      </screen>
      will display all indexed tokens from all indexed fields of the
      first record, and it will display in <literal>SUTRS</literal>
      record syntax, whereas 
      <screen>
-      Z> f @attr 1=dc_all minutter
+      Z> f @attr 1=title my
        Z> format xml
-      Z> elements zebra::index::dc_publisher
+      Z> elements zebra::index::title
        Z> s 1+1
-      Z> elements zebra::index::dc_publisher:p
+      Z> elements zebra::index::title:p
        Z> s 1+1
      </screen> 
      displays in <literal>XML</literal> record syntax only the content
-      of the zebra string index <literal>dc_publisher</literal>, or
+      of the zebra string index <literal>title</literal>, or
        even only the type <literal>p</literal> phrase indexed part of it.
     </para>
+   <note>
+    <para>
+     The special <literal>zebra::index</literal> 
+     element set names are only 
+     defined for
+     <literal>SUTRS</literal> and <literal>XML</literal> record
+     syntaxes. 
+    </para>
+    <para> Trying to access numeric <literal>Bib-1</literal> use
+    attributes or trying to access non-existent Zebra internal string
+    access points will result in a
+    <literal>
+     Diagnostic [25]: Specified element set name not valid for specified database
+    </literal>.
+    </para>
+   </note>
    </section>
   </chapter>
   <!-- Keep this comment at the end of the file
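The naming rules documented in this chapter can be summarised as a small client-side check. This is a sketch only: `classify_elemset`, `syntax_allowed` and the `ZEBRA_ES_*` constants are hypothetical names, not part of the Zebra or YAZ APIs, and record syntaxes are passed as plain strings for illustration. It encodes the documented rule that `zebra::data` is defined for any record syntax, while `zebra::meta` and `zebra::index` are only defined for SUTRS and XML. Whether a particular index name exists can only be decided by the server, which otherwise answers with diagnostic 25.

```c
#include <string.h>

/* Kinds of zebra:: element set names described in this chapter
   (hypothetical enumeration, for illustration only). */
enum zebra_elemset_kind {
    ZEBRA_ES_NONE,   /* not one of the documented zebra:: names */
    ZEBRA_ES_DATA,   /* zebra::data */
    ZEBRA_ES_META,   /* zebra::meta or zebra::meta::<field> */
    ZEBRA_ES_INDEX   /* zebra::index or zebra::index::<index>[:<type>] */
};

/* Nonzero if name equals base, or is base followed by "::". */
static int has_family(const char *name, const char *base)
{
    size_t n = strlen(base);
    if (strncmp(name, base, n) != 0)
        return 0;
    return name[n] == '\0' || strncmp(name + n, "::", 2) == 0;
}

static enum zebra_elemset_kind classify_elemset(const char *name)
{
    if (!name)
        return ZEBRA_ES_NONE;
    if (strcmp(name, "zebra::data") == 0)
        return ZEBRA_ES_DATA;
    if (has_family(name, "zebra::meta"))
        return ZEBRA_ES_META;
    if (has_family(name, "zebra::index"))
        return ZEBRA_ES_INDEX;
    return ZEBRA_ES_NONE;
}

/* Documented rule: zebra::data works with any record syntax,
   zebra::meta and zebra::index only with SUTRS and XML. */
static int syntax_allowed(enum zebra_elemset_kind kind, const char *syntax)
{
    switch (kind) {
    case ZEBRA_ES_DATA:
        return 1;
    case ZEBRA_ES_META:
    case ZEBRA_ES_INDEX:
        return syntax != NULL
            && (strcmp(syntax, "xml") == 0 || strcmp(syntax, "sutrs") == 0);
    default:
        return 0;
    }
}
```

For example, `classify_elemset("zebra::index::title:p")` yields `ZEBRA_ES_INDEX`, and `syntax_allowed(ZEBRA_ES_INDEX, "usmarc")` is 0. The prefix test is deliberately strict: a name such as `zebra::metadata` is not treated as a `zebra::meta` variant.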
diff --git a/index/retrieve.c b/index/retrieve.c
index 64c3dcc..bbd2fc9 100644
--- a/index/retrieve.c
+++ b/index/retrieve.c
@@ -1,4 +1,4 @@
-/* $Id: retrieve.c,v 1.55 2006-11-21 22:17:49 adam Exp $
+/* $Id: retrieve.c,v 1.56 2006-11-23 09:03:51 marc Exp $
     Copyright (C) 1995-2006
     Index Data ApS
  
@@ -74,31 +74,40 @@ static int zebra_create_record_stream(ZebraHandle zh,
      
  
  
-static void parse_zebra_elem(const char *elem,
+static int parse_zebra_elem(const char *elem,
                               const char **index, size_t *index_len,
                               const char **type, size_t *type_len)
  {
-    *type = 0;
-    *type_len = 0;
-
      *index = 0;
      *index_len = 0;
  
+    *type = 0;
+    *type_len = 0;
+
      if (elem && *elem)
      {
-        const char *cp = strchr(elem, ':');
+        char *cp;
+        /* verify that '::' is in the beginning of *elem 
+           and something more follows */
+        if (':' != *elem
+            || ':' != *(elem +1)
+            || '\0' == *(elem +2))
+            return 0;
+ 
+        /* pick out info from string after '::' */
+        elem = elem + 2;
+        cp = strchr(elem, ':');
  
-        if (!cp) /* no colon */
+        if (!cp) /* index, no colon, no type */
          {
              *index = elem;
              *index_len = strlen(elem);
          }
-        else if (cp[1] == '\0') /* 'index:' */
+        else if (cp[1] == '\0') /* colon, but no following type */
          {
-            *index = elem;
-            *index_len = cp - elem;
+            return 0;
          }
-        else
+        else  /* index, colon and type */
          {
              *index = elem;
              *index_len = cp - elem;
@@ -106,6 +115,7 @@ static void parse_zebra_elem(const char *elem,
              *type_len = strlen(cp+1);
          }
      }
+    return 1;
  }
  
  
@@ -135,9 +145,10 @@ int zebra_special_index_fetch(ZebraHandle zh, zint sysno, ODR odr,
          return YAZ_BIB1_NO_SYNTAXES_AVAILABLE_FOR_THIS_REQUEST;
      }
  
-    parse_zebra_elem(elemsetname,
+    if (!parse_zebra_elem(elemsetname,
                       &retrieval_index, &retrieval_index_len,
-                     &retrieval_type,  &retrieval_type_len);
+                     &retrieval_type,  &retrieval_type_len))
+        return YAZ_BIB1_SPECIFIED_ELEMENT_SET_NAME_NOT_VALID_FOR_SPECIFIED_;
  
      if (retrieval_type_len != 0 && retrieval_type_len != 1)
      {
@@ -158,9 +169,7 @@ int zebra_special_index_fetch(ZebraHandle zh, zint sysno, ODR odr,
                                               (retrieval_type_len == 0 ? -1 : 
                                                retrieval_type[0]),
                                               retrieval_index_cstr) == -1)
-            {
                  return YAZ_BIB1_SPECIFIED_ELEMENT_SET_NAME_NOT_VALID_FOR_SPECIFIED_;
-            }
          }
      }
  
@@ -183,7 +192,7 @@ int zebra_special_index_fetch(ZebraHandle zh, zint sysno, ODR odr,
                           "<record xmlns="
                           "\"http://www.indexdata.com/zebra/\""
                           " sysno=\"" ZINT_FORMAT "\""
-                         " set=\"zebra::index::%s/\">\n",
+                         " set=\"zebra::index%s/\">\n",
                           sysno, elemsetname);
          }
          else if (input_format == VAL_SUTRS)
@@ -202,42 +211,51 @@ int zebra_special_index_fetch(ZebraHandle zh, zint sysno, ODR odr,
              zebraExplain_lookup_ord(zh->reg->zei, ord, &index_type, &db,
                                      &string_index);
              string_index_len = strlen(string_index);
+
+            /* process only if index is not defined, 
+               or if defined and matching */
              if (retrieval_index == 0 
                  || (string_index_len == retrieval_index_len 
                      && !memcmp(string_index, retrieval_index,
                                 string_index_len))){
-                
+               
+                /* process only if type is not defined, or is matching */
                  if (retrieval_type == 0 
                      || (retrieval_type_len == 1 
                          && retrieval_type[0] == index_type)){
                      
-                    if (input_format == VAL_TEXT_XML){
-                        wrbuf_printf(wrbuf, "  <index name=\"%s\"", 
-                                     string_index);
-                        
-                        wrbuf_printf(wrbuf, " type=\"%c\"", index_type);
+
+                    zebra_term_untrans(zh, index_type, dst_buf, str);
+                    if (strlen(dst_buf)){
+
+                        if (input_format == VAL_TEXT_XML){
+                            wrbuf_printf(wrbuf, "  <index name=\"%s\"", 
+                                         string_index);
+                            
+                            wrbuf_printf(wrbuf, " type=\"%c\"", index_type);
+                            
+                            wrbuf_printf(wrbuf, " seq=\"" ZINT_FORMAT "\">", 
+                                         key_in.mem[key_in.len -1]);
                          
-                        wrbuf_printf(wrbuf, " seq=\"" ZINT_FORMAT "\">", 
-                                     key_in.mem[key_in.len -1]);
+                            wrbuf_xmlputs(wrbuf, dst_buf);
+                            wrbuf_printf(wrbuf, "</index>\n");
+                        }
+                        else if (input_format == VAL_SUTRS){
+                            wrbuf_printf(wrbuf, "%s ", string_index);
+                            
+                            wrbuf_printf(wrbuf, "%c", index_type);
+                            
+                            for (i = 1; i < key_in.len; i++)
+                                wrbuf_printf(wrbuf, " " ZINT_FORMAT, 
+                                             key_in.mem[i]);
+
+                        /* zebra_term_untrans(zh, index_type, dst_buf, str); */
+                            wrbuf_printf(wrbuf, " %s", dst_buf);
                          
-                        zebra_term_untrans(zh, index_type, dst_buf, str);
-                        wrbuf_xmlputs(wrbuf, dst_buf);
-                        wrbuf_printf(wrbuf, "</index>\n");
+                            wrbuf_printf(wrbuf, "\n");
+                        }
                      }
-                    else if (input_format == VAL_SUTRS){
-                        wrbuf_printf(wrbuf, "%s ", string_index);
                      
-                        wrbuf_printf(wrbuf, "%c", index_type);
-                    
-                        for (i = 1; i < key_in.len; i++)
-                            wrbuf_printf(wrbuf, " " ZINT_FORMAT, 
-                                         key_in.mem[i]);
-
-                        zebra_term_untrans(zh, index_type, dst_buf, str);
-                        wrbuf_printf(wrbuf, " %s", dst_buf);
-                        
-                        wrbuf_printf(wrbuf, "\n");
-                    }
                  }
              }
          }
@@ -253,7 +271,8 @@ int zebra_special_index_fetch(ZebraHandle zh, zint sysno, ODR odr,
  }
  
  
-int zebra_special_fetch(ZebraHandle zh, zint sysno, ODR odr,
+
+int zebra_special_fetch(ZebraHandle zh, zint sysno, int score, ODR odr,
                             const char *elemsetname,
                             oid_value input_format,
                             oid_value *output_format,
@@ -304,11 +323,74 @@ int zebra_special_fetch(ZebraHandle zh, zint sysno, ODR odr,
          return YAZ_BIB1_SYSTEM_ERROR_IN_PRESENTING_RECORDS;
      }
  
+    /* processing special elementsetnames zebra::meta:: */
+    if (elemsetname && 0 == strcmp(elemsetname, "meta")){
+        int ret = 0;
+        char rec_str[1024] = "";
+        RecordAttr *recordAttr = rec_init_attr(zh->reg->zei, rec); 
+
+        if (input_format == VAL_TEXT_XML){
+            *output_format = VAL_TEXT_XML;
+
+             sprintf(rec_str, 
+                     "<record xmlns="
+                     "\"http://www.indexdata.com/zebra/\""
+                     " sysno=\"" ZINT_FORMAT "\""
+                     " base=\"%s\""
+                     " file=\"%s\""
+                     " type=\"%s\""
+                     " score=\"%i\""
+                     " rank=\"" ZINT_FORMAT "\""
+                     " size=\"%i\""
+                     " set=\"zebra::%s\"/>\n",
+                     sysno, 
+                     rec->info[recInfo_databaseName],
+                     rec->info[recInfo_filename],
+                     rec->info[recInfo_fileType],
+                     score,
+                     recordAttr->staticrank,
+                     recordAttr->recordSize,
+                     elemsetname);
+        }
+        else if (input_format == VAL_SUTRS){
+            *output_format = VAL_SUTRS;
+             sprintf(rec_str, 
+                     "sysno " ZINT_FORMAT "\n"
+                     "base %s\n"
+                     "file %s\n"
+                     "type %s\n"
+                     "score %i\n"
+                     "rank " ZINT_FORMAT "\n"
+                     "size %i\n"
+                     "set zebra::%s\n",
+                     sysno, 
+                     rec->info[recInfo_databaseName],
+                     rec->info[recInfo_filename],
+                     rec->info[recInfo_fileType],
+                     score,
+                     recordAttr->staticrank,
+                     recordAttr->recordSize,
+                     elemsetname);
+        }
+        
+        
+        *rec_lenp = strlen(rec_str);
+        if (*rec_lenp){
+            *rec_bufp = odr_strdup(odr, rec_str);
+            ret = 0;
+        } else {
+            ret = YAZ_BIB1_SYSTEM_ERROR_IN_PRESENTING_RECORDS;
+        }
+
+        rec_free(&rec);
+        return ret;
+    }
+
      /* processing special elementsetnames zebra::index:: */
-    if (elemsetname && 0 == strncmp(elemsetname, "index::", 7)){
+    if (elemsetname && 0 == strncmp(elemsetname, "index", 5)){
          
          int ret = zebra_special_index_fetch(zh, sysno, odr, rec,
-                                            elemsetname + 7,
+                                            elemsetname + 5,
                                              input_format, output_format,
                                              rec_bufp, rec_lenp);
          
@@ -357,7 +439,7 @@ int zebra_record_fetch(ZebraHandle zh, zint sysno, int score,
  
      /* processing zebra special elementset names of form 'zebra:: */
      if (elemsetname && 0 == strncmp(elemsetname, "zebra::", 7))
-        return  zebra_special_fetch(zh, sysno, odr,
+        return  zebra_special_fetch(zh, sysno, score, odr,
                                      elemsetname + 7,
                                      input_format, output_format,
                                      rec_bufp, rec_lenp);
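In the `zebra::meta` branch of `zebra_special_fetch`, the XML header is built with `sprintf` into a fixed 1024-byte buffer, so a long database name or filename could overflow it. A bounded variant is sketched below using C99 `snprintf`. `struct meta_fields`, `or_empty` and `format_meta_xml` are hypothetical stand-ins for the values the patch reads from `rec->info[]` and `RecordAttr`, and `%lld` is assumed in place of the platform-dependent `ZINT_FORMAT`. The sketch writes a self-closing `<record .../>` element so the output is well-formed XML on its own, and substitutes an empty string for missing values; like the original, it does not XML-escape them.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the values read from rec->info[] and
   RecordAttr; these are not Zebra's real types. */
struct meta_fields {
    long long sysno;        /* zint in Zebra; assumed to fit long long */
    const char *base;       /* database name */
    const char *file;       /* indexed filename, may be NULL */
    const char *type;       /* filter used for indexing */
    int score;
    long long staticrank;
    int size;               /* record size in bytes */
};

/* Passing NULL to %s is undefined behaviour, so map it to "". */
static const char *or_empty(const char *s)
{
    return s ? s : "";
}

/* Formats the zebra::meta XML record into buf (capacity cap).
   Returns the number of characters written, or -1 if the output
   would not fit, instead of overflowing the buffer. */
static int format_meta_xml(char *buf, size_t cap,
                           const struct meta_fields *m, const char *set)
{
    int n = snprintf(buf, cap,
                     "<record xmlns=\"http://www.indexdata.com/zebra/\""
                     " sysno=\"%lld\" base=\"%s\" file=\"%s\" type=\"%s\""
                     " score=\"%d\" rank=\"%lld\" size=\"%d\""
                     " set=\"zebra::%s\"/>\n",
                     m->sysno, or_empty(m->base), or_empty(m->file),
                     or_empty(m->type), m->score, m->staticrank, m->size,
                     or_empty(set));
    if (n < 0 || (size_t) n >= cap)
        return -1;   /* encoding error or truncated output */
    return n;
}
```

A caller could use the returned length as the record length and map -1 to `YAZ_BIB1_SYSTEM_ERROR_IN_PRESENTING_RECORDS`, which is what the patch returns when no record string was produced.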
diff --git a/test/api/t16.c b/test/api/t16.c
index 974b703..3070302 100644
--- a/test/api/t16.c
+++ b/test/api/t16.c
@@ -1,4 +1,4 @@
-/* $Id: t16.c,v 1.5 2006-11-22 10:26:12 adam Exp $
+/* $Id: t16.c,v 1.6 2006-11-23 09:03:51 marc Exp $
     Copyright (C) 1995-2006
     Index Data ApS
  
@@ -117,6 +117,9 @@ static void tst(int argc, char **argv)
      YAZ_CHECK_EQ(fetch_first_compare(zh, "zebra::meta::sysno", VAL_TEXT_XML,
                                       zebra_xml_sysno), ZEBRA_OK);
      
+    YAZ_CHECK_EQ(fetch_first_compare(zh, "zebra::meta", VAL_TEXT_XML,
+                                     "definitely not this"), ZEBRA_FAIL);
+    
      YAZ_CHECK_EQ(fetch_first_compare(zh, "zebra::index::title:p", 
                                       VAL_TEXT_XML,
                                       zebra_xml_index_title_p), ZEBRA_OK);
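The test above fetches `zebra::index::title:p`. After `zebra_record_fetch` strips the `zebra::` prefix and `zebra_special_fetch` strips `index`, the patched `parse_zebra_elem` receives the remainder `::title:p`. The sketch below restates its parsing rules in standalone form so they can be exercised without a Zebra build. `parse_index_elem` is a hypothetical name, and, as in `zebra_special_index_fetch`, the check that the type is exactly one character is left to the caller.

```c
#include <stddef.h>
#include <string.h>

/* Standalone restatement of the patched parse_zebra_elem rules.
   Input is what remains after "zebra::index" has been stripped:
     "" or NULL  -> accepted, no index and no type (all indexes)
     "::idx"     -> accepted, index "idx", no type
     "::idx:t"   -> accepted, index "idx", type "t"
   Anything else ("idx", "::", "::idx:") is rejected.
   Returns 1 on success, 0 on a malformed name. */
static int parse_index_elem(const char *elem,
                            const char **index, size_t *index_len,
                            const char **type, size_t *type_len)
{
    const char *cp;

    *index = NULL;
    *index_len = 0;
    *type = NULL;
    *type_len = 0;

    if (!elem || !*elem)
        return 1;                       /* plain zebra::index */

    /* require a leading "::" followed by at least one character;
       short-circuit evaluation never reads past the terminator */
    if (elem[0] != ':' || elem[1] != ':' || elem[2] == '\0')
        return 0;

    elem += 2;
    cp = strchr(elem, ':');
    if (!cp) {                          /* index, no type */
        *index = elem;
        *index_len = strlen(elem);
    } else if (cp[1] == '\0') {         /* "idx:" with no type after it */
        return 0;
    } else {                            /* index and type */
        *index = elem;
        *index_len = (size_t) (cp - elem);
        *type = cp + 1;
        *type_len = strlen(cp + 1);
    }
    return 1;
}
```

For `::title:p` the sketch reports index `title` (length 5) and type `p`; `::title:` and `title` are rejected, which the caller turns into diagnostic 25 (`YAZ_BIB1_SPECIFIED_ELEMENT_SET_NAME_NOT_VALID_FOR_SPECIFIED_`).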