Adam Dickmeiss [Tue, 8 May 2007 14:27:23 +0000 (14:27 +0000)]
Display match string if log level "extract" is used.
Adam Dickmeiss [Tue, 8 May 2007 12:50:03 +0000 (12:50 +0000)]
Use Odr_oid for OIDs. Require YAZ 3.0.2 or later.
Adam Dickmeiss [Thu, 3 May 2007 07:20:19 +0000 (07:20 +0000)]
Require YAZ 3
Adam Dickmeiss [Wed, 25 Apr 2007 09:38:21 +0000 (09:38 +0000)]
Allow safari filter to specify index type.
Adam Dickmeiss [Wed, 25 Apr 2007 08:22:01 +0000 (08:22 +0000)]
Log optimized on level 'extract'; details in 'details'
Adam Dickmeiss [Wed, 25 Apr 2007 08:18:01 +0000 (08:18 +0000)]
Return proper EOF for safari filter
Adam Dickmeiss [Wed, 18 Apr 2007 11:37:39 +0000 (11:37 +0000)]
Don't use nmem_init, nmem_exit
Adam Dickmeiss [Tue, 17 Apr 2007 20:27:14 +0000 (20:27 +0000)]
Update for YAZ 3's libyaz_server.la
Adam Dickmeiss [Mon, 16 Apr 2007 21:54:37 +0000 (21:54 +0000)]
Another, and hopefully last, YAZ OID DB update
Adam Dickmeiss [Mon, 16 Apr 2007 08:44:31 +0000 (08:44 +0000)]
Update for YAZ 3's new OID system
Adam Dickmeiss [Sat, 7 Apr 2007 22:26:27 +0000 (22:26 +0000)]
Changed extract code so that it optimizes updates of records where content
is almost identical to previous version of record. This makes updating of
the internal explain database faster too. Also fixed memory leak that
occurred for each deleted record.
Adam Dickmeiss [Sat, 7 Apr 2007 22:24:12 +0000 (22:24 +0000)]
Fixed bad memory reference that could occur if empty key block was to
be sorted.
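A minimal sketch of the kind of guard this fix implies, not the actual Zebra code: return early before sorting when the block holds no keys, so nothing is dereferenced. The names sort_key_block and cmp_keys are hypothetical, and keys are modelled as plain C strings for illustration only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Comparison callback for qsort: each element is a 'const char *'. */
static int cmp_keys(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Hypothetical helper: sort a block of keys in place.
 * Returns the number of keys sorted; an empty or missing block
 * is left untouched and yields 0. */
static size_t sort_key_block(const char **keys, size_t count)
{
    if (keys == NULL || count == 0)
        return 0;               /* nothing to sort, nothing to touch */
    qsort(keys, count, sizeof *keys, cmp_keys);
    return count;
}
```

Calling sort_key_block(NULL, 0) is then a harmless no-op, while a non-empty block is sorted as usual.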
Adam Dickmeiss [Sat, 7 Apr 2007 22:18:46 +0000 (22:18 +0000)]
Remove leading blank line
Adam Dickmeiss [Tue, 3 Apr 2007 16:54:46 +0000 (16:54 +0000)]
Fixed bug #1017: assert failure in isamb for delete of records. Problem
was that root ptr of sort ISAMB was not properly flushed to disk when it
changed.
Adam Dickmeiss [Tue, 3 Apr 2007 15:26:14 +0000 (15:26 +0000)]
Make directory config if it is not there
Adam Dickmeiss [Mon, 2 Apr 2007 16:57:08 +0000 (16:57 +0000)]
Removed a few YLOG_LOG messages. This could be enough to fix bug #1012.
Adam Dickmeiss [Wed, 21 Mar 2007 19:37:15 +0000 (19:37 +0000)]
Update with latest changes.
Adam Dickmeiss [Wed, 21 Mar 2007 19:37:00 +0000 (19:37 +0000)]
Describe the @type action for DOM filter
Adam Dickmeiss [Wed, 21 Mar 2007 19:36:47 +0000 (19:36 +0000)]
Minor change in link to CQL material in YAZ
Adam Dickmeiss [Wed, 21 Mar 2007 13:47:12 +0000 (13:47 +0000)]
For RPN queries the index type (w,p,..) may be specified verbatim
as structure attribute with string value, e.g. @attr 4=w .
Adam Dickmeiss [Tue, 20 Mar 2007 22:42:19 +0000 (22:42 +0000)]
ChangeLog in dist
Adam Dickmeiss [Tue, 20 Mar 2007 22:07:35 +0000 (22:07 +0000)]
Use yaz_iconv flushing.
Adam Dickmeiss [Tue, 20 Mar 2007 22:07:21 +0000 (22:07 +0000)]
Remove debug msg
Adam Dickmeiss [Mon, 19 Mar 2007 21:57:25 +0000 (21:57 +0000)]
Use non-const char return value for strtok work
Adam Dickmeiss [Mon, 19 Mar 2007 21:50:39 +0000 (21:50 +0000)]
WRBUF updates.
Adam Dickmeiss [Wed, 14 Mar 2007 14:16:14 +0000 (14:16 +0000)]
Changed some types in mod_dom.c; mostly 'xmlChar *' to 'const char *'.
The use of const is more appropriate than non-const because these
string references point to xmlNode content - and we are not allowed
to change that. Added buffer safe PI attribute reading for mod_dom.c by
implementing function attr_content_pi. Function index_value_of still has
potential buffer overflows. The record extraction system now has a new member,
action, which may be modified by a record filter to signal
delete/replace/insert. This is only honoured if update is used (in which
case the outer system already has said "we don't care whether it's insert
or replace anyway"). Added mod_dom test for the use of @type=delete.
Adam Dickmeiss [Wed, 14 Mar 2007 11:48:31 +0000 (11:48 +0000)]
Changed record update API. It is now handled by function
zebra_record_update which does insert/replace/delete/update of
records. This function replaces zebra_record_{insert,delete} and
zebra_admin_exchange_record.
Adam Dickmeiss [Tue, 13 Mar 2007 13:46:11 +0000 (13:46 +0000)]
Fixed bug #944: Allow extraction of multiple records per ES update.
Based on patch from Hans-Werner Hilse.
Adam Dickmeiss [Thu, 8 Mar 2007 21:07:45 +0000 (21:07 +0000)]
Debian package 2.0.13-1
Marc Cromme [Thu, 8 Mar 2007 17:19:12 +0000 (17:19 +0000)]
changed <dom> and <input> parser such that the following conditions actually work:
1) no <input> element at all
2) empty <input> element
3) <input> element starting with an <xslt> instruction (that is, <xmlreader> and/or <marc> are not mandatory any more).
Needed to make new define DOM_INPUT_DOM besides DOM_INPUT_MARC and DOM_INPUT_XMLREADER
Still missing detection of <xmlreader> or <marc> after all <xslt> nodes.
And more important: when finding errors here, it's kind of lame just to emit a warning; one should stop processing!
Adam Dickmeiss [Thu, 8 Mar 2007 13:18:35 +0000 (13:18 +0000)]
For MARC indexing, skip until record separator is met.
Adam Dickmeiss [Thu, 8 Mar 2007 12:57:35 +0000 (12:57 +0000)]
Bump to 2.0.13
Marc Cromme [Thu, 8 Mar 2007 11:29:16 +0000 (11:29 +0000)]
corrected typo
Marc Cromme [Thu, 8 Mar 2007 11:24:50 +0000 (11:24 +0000)]
added example of MARCXML indexing with chopping of sort indexes according to 'ind2' field containing integer
Adam Dickmeiss [Wed, 7 Mar 2007 21:25:29 +0000 (21:25 +0000)]
Added mod_dom to win32 makefile
Adam Dickmeiss [Wed, 7 Mar 2007 21:14:15 +0000 (21:14 +0000)]
Towards 2.0.12
Adam Dickmeiss [Wed, 7 Mar 2007 21:08:36 +0000 (21:08 +0000)]
Fixed bug with indexing of attributes for rec.grs-class of filters. If
xpath was enabled xelm a/@b would be ignored.
Marc Cromme [Wed, 7 Mar 2007 14:18:35 +0000 (14:18 +0000)]
Always add the XML parsing flag XML_PARSE_NONET to any XML_PARSE_XINCLUDE, to avoid Zebra being spoofed into fetching megabytes from an external xincluded URL. A pretty normal safety thing to do; we just forgot it before.
Marc Cromme [Wed, 7 Mar 2007 13:05:20 +0000 (13:05 +0000)]
removed documentation of non-working 'insert', 'update', 'delete' functionality in Alvis filter
removed 'update' instruction from example OAI indexing stylesheet
Adam Dickmeiss [Tue, 6 Mar 2007 12:40:18 +0000 (12:40 +0000)]
Fixed bug #931: lem 'zebra::index::field' hangs if not specified 'storeKeys: 1' in zebra.cfg.
Adam Dickmeiss [Tue, 6 Mar 2007 12:21:04 +0000 (12:21 +0000)]
Fixed bug #943: Searches with localid always find a hit.
Adam Dickmeiss [Tue, 6 Mar 2007 12:09:44 +0000 (12:09 +0000)]
Avoid mixed stmt/var declare
Marc Cromme [Tue, 6 Mar 2007 09:24:34 +0000 (09:24 +0000)]
added missing extra dist target
Adam Dickmeiss [Tue, 6 Mar 2007 08:48:57 +0000 (08:48 +0000)]
Fixed bug #946: Coredump on MARC display.
Adam Dickmeiss [Tue, 6 Mar 2007 08:23:24 +0000 (08:23 +0000)]
Added missing xsl for dom1 test.
Marc Cromme [Mon, 5 Mar 2007 13:02:11 +0000 (13:02 +0000)]
added tests for bug #883 'Need an 'ignore' value for the z:type
attribute in the canonical indexing format'
resolved bug #883
tested as well on gutenberg collection
zebra-setup/gutenberg
case closed, see
http://bugzilla.indexdata.dk/show_bug.cgi?id=883
Adam Dickmeiss [Sat, 3 Mar 2007 21:39:10 +0000 (21:39 +0000)]
Fixes for perform_convert: use xmlParseMemory instead of xmlParseMemory
to avoid reading beyond end of buffer. Ensure conversions are stopped
if an XSLT conversion fails.
Marc Cromme [Thu, 1 Mar 2007 11:21:20 +0000 (11:21 +0000)]
removed section on special record retrieval features, which need a rewrite - only commented out.
added section on debugging of DOM filter configurations
added a bullet point on semantics of DOM filter explaining that records not emerging record and index instructions are discarded, i.e. dropped on the floor. This meets Seb's wishes for the gutenberg collection
Marc Cromme [Thu, 1 Mar 2007 11:18:40 +0000 (11:18 +0000)]
removed quick start and examples, which are very GRS-1 centric.
These need re-writing in terms of the DOM filter
Adam Dickmeiss [Thu, 1 Mar 2007 10:35:46 +0000 (10:35 +0000)]
Allow record filters to return 'skip' this record (RECCTRL_EXTRACT_SKIP).
Make dom filter return 'skip' if no zebra 'record' node exists in
indexing document. Bug #883.
Adam Dickmeiss [Wed, 28 Feb 2007 18:43:06 +0000 (18:43 +0000)]
Fix handling of record retrieval in the case of open failure of external
record file (storedata:0).
Marc Cromme [Wed, 28 Feb 2007 16:46:19 +0000 (16:46 +0000)]
added nice debug output of all xmlreader and xslt XML stuff when running with
zebra/index/zebraidx -c zebra.cfg -s update water.rdf
Don't do this on huge data - the logs will be at least 4-6 times the size of the input data!
Marc Cromme [Wed, 28 Feb 2007 14:46:41 +0000 (14:46 +0000)]
closing bug #928 by dropping DOM document to xmlbuffer and re-reading into DOM each time an XSLT transform occurred. Yes, ugly, ugly, but no other possibility.
Added output of XML after each transformation on YLOG_DEBUG level, run indexer with '-v debug' to see all transformations
Marc Cromme [Wed, 28 Feb 2007 13:16:24 +0000 (13:16 +0000)]
removed general warning log of indexing process. this can be seen by running the indexer with '-v debug' anyhow.
Adam Dickmeiss [Mon, 26 Feb 2007 16:12:24 +0000 (16:12 +0000)]
Avoid sprintf with NULL %s value (Solaris dislikes it)
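Passing a NULL pointer for a %s conversion is undefined behaviour in C; some C libraries print "(null)", others crash. A minimal sketch of one common way to avoid it, assuming a small substitution helper. The names str_or and format_db_name are hypothetical and are not taken from the Zebra source.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return s, or fallback when s is NULL. */
static const char *str_or(const char *s, const char *fallback)
{
    return s ? s : fallback;
}

/* Hypothetical formatter: never hands a NULL pointer to %s.
 * Returns the value of snprintf (length of the full message). */
static int format_db_name(char *buf, size_t size, const char *name)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "database=%s", str_or(name, "(null)"));
}
```

With this helper a NULL name is rendered as an explicit placeholder on every platform instead of depending on what the local printf implementation does.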
Adam Dickmeiss [Sat, 24 Feb 2007 17:05:40 +0000 (17:05 +0000)]
Fixed bug #929: Unfinished transaction in non-shadow does not get a
warn.
Adam Dickmeiss [Sat, 24 Feb 2007 16:47:16 +0000 (16:47 +0000)]
Deal with two common places for corrupt Explain database
Adam Dickmeiss [Sat, 24 Feb 2007 16:46:22 +0000 (16:46 +0000)]
Proper cleanup (isamb_close) for bad headers
Adam Dickmeiss [Fri, 23 Feb 2007 14:59:12 +0000 (14:59 +0000)]
Use xmlGetLineNo instead of xmlGetNodePath for errors/warnings
Adam Dickmeiss [Fri, 23 Feb 2007 11:35:08 +0000 (11:35 +0000)]
For each element macro.
Adam Dickmeiss [Fri, 23 Feb 2007 11:16:39 +0000 (11:16 +0000)]
For dom filter, in input element construct, parse @inputcharset instead
of @charset .
Adam Dickmeiss [Fri, 23 Feb 2007 11:10:37 +0000 (11:10 +0000)]
Wrap log messages for dom filter. This uses yaz_vsnprintf. Requires
YAZ 2.1.49 or later.
Adam Dickmeiss [Fri, 23 Feb 2007 09:35:17 +0000 (09:35 +0000)]
Fix dist: do not put domfilter.eps in dist.
Marc Cromme [Thu, 22 Feb 2007 15:44:19 +0000 (15:44 +0000)]
added more instructions to DOM filter docs, spell checked both DOM and Alvis filter docs
Marc Cromme [Thu, 22 Feb 2007 12:22:04 +0000 (12:22 +0000)]
added missing dependency of index.html to all PNG files
Marc Cromme [Thu, 22 Feb 2007 12:10:09 +0000 (12:10 +0000)]
added missing domfilter.eps to make rules, such that it is included in the distribution tarball
Adam Dickmeiss [Thu, 22 Feb 2007 08:59:30 +0000 (08:59 +0000)]
Remove PDF files from EXTRA_DIST/doc_DATA (as done for yaz, metaproxy
for quite some time). Avoid rule option '--export-area-drawing' for
inkscape for generating .png (it doesn't work with sarge). Bug #916.
Adam Dickmeiss [Wed, 21 Feb 2007 17:03:23 +0000 (17:03 +0000)]
zebra.pdf depends on domfilter.pdf
Marc Cromme [Wed, 21 Feb 2007 15:03:30 +0000 (15:03 +0000)]
more info on DOM filter config
Marc Cromme [Wed, 21 Feb 2007 14:15:45 +0000 (14:15 +0000)]
added domfilter.svg to distribution tarball, now make dist runs again
Marc Cromme [Wed, 21 Feb 2007 14:15:07 +0000 (14:15 +0000)]
added more content on dom filter pipelines
Marc Cromme [Wed, 21 Feb 2007 13:38:22 +0000 (13:38 +0000)]
started explaining each dom filter pipeline
Marc Cromme [Wed, 21 Feb 2007 12:29:52 +0000 (12:29 +0000)]
added figure of workflow on DOM XML filter
Marc Cromme [Tue, 20 Feb 2007 15:02:18 +0000 (15:02 +0000)]
small changes to format
Marc Cromme [Tue, 20 Feb 2007 14:57:00 +0000 (14:57 +0000)]
added proper namespace in example config
Marc Cromme [Tue, 20 Feb 2007 14:53:25 +0000 (14:53 +0000)]
some more changes, more to come
Marc Cromme [Tue, 20 Feb 2007 14:28:31 +0000 (14:28 +0000)]
added initial DOM XML filter documentation. Much is missing yet ...
Adam Dickmeiss [Sun, 18 Feb 2007 21:53:22 +0000 (21:53 +0000)]
Fixed bug #898: xslt tests fails on several platforms. Problem was
that test for zs:index node crashed for absent namespace (href==NULL).
Added all .xslt-files in use in est/xslt tests.
Also fixed memory leak in use of xmlGetNodePath.
Adam Dickmeiss [Sun, 18 Feb 2007 21:50:52 +0000 (21:50 +0000)]
Fixed minor memory leak
Marc Cromme [Thu, 15 Feb 2007 15:41:16 +0000 (15:41 +0000)]
changed to respect correct index instructions in new DOM filter
Marc Cromme [Thu, 15 Feb 2007 15:08:41 +0000 (15:08 +0000)]
optimized code such that the RecWord structure recword is only
initialized once for each to-be-indexed record, and not once for each
to-be-indexed term - at the expense of a bit of pointer passing when
recursively traversing the XML DOM tree
Marc Cromme [Thu, 15 Feb 2007 14:44:48 +0000 (14:44 +0000)]
removed dead code pieces which are remnants of the original
alvis-style parsing and indexing stuff. Now only new dom indexing code
is present.
Marc Cromme [Thu, 15 Feb 2007 14:33:41 +0000 (14:33 +0000)]
pretty formatting warning messages, always giving the file name and
the XML node path as informative parameters along
Marc Cromme [Thu, 15 Feb 2007 13:01:00 +0000 (13:01 +0000)]
rewritten mod_dom instruction parsing code hooked into mod_dom indexing
new stylesheets added, one for PI based indexing, and one for <z:index> based indexing
segmentation fault traced and fixed
test framework updated to use new mod_dom parsing
Marc Cromme [Wed, 14 Feb 2007 16:43:37 +0000 (16:43 +0000)]
added 'static' declaration to function definitions
Marc Cromme [Wed, 14 Feb 2007 16:38:41 +0000 (16:38 +0000)]
changing attribute 'action' to 'type' for better conformance with Alvis
filter syntax
Marc Cromme [Wed, 14 Feb 2007 16:31:37 +0000 (16:31 +0000)]
indenting entire file according to the rules stated in the very end of
the file, using emacs M-x indent-region, and manual line breaking afterwards
Marc Cromme [Wed, 14 Feb 2007 16:16:15 +0000 (16:16 +0000)]
continued hooking in tinfo and recctr, still need to do real indexing
Marc Cromme [Wed, 14 Feb 2007 15:42:24 +0000 (15:42 +0000)]
removed warnings by zillions of (const char *) casts and the like
Marc Cromme [Wed, 14 Feb 2007 15:23:33 +0000 (15:23 +0000)]
removed the crappy PI and <z:index> parsing code committed yesterday
replaced with clean parsing logic developed outside mod_dom.c
needs to take care of all new warnings due to stricter compile flags
finally, needs to be hooked into actual indexing of records
Marc Cromme [Tue, 13 Feb 2007 12:19:37 +0000 (12:19 +0000)]
removed unnecessary out-commented code lines
Marc Cromme [Tue, 13 Feb 2007 11:37:02 +0000 (11:37 +0000)]
factored DOM XML indexing code out into function
static void extract_doc_alvis(struct filter_info *tinfo,
struct recExtractCtrl *recctr,
xmlDocPtr doc)
This is the function to be re-written using both PI and <z:index> instructions,
and also fixing the bug of index type 'p' and '0' chop-over of merged content.
Marc Cromme [Mon, 12 Feb 2007 14:00:20 +0000 (14:00 +0000)]
experimental processing-instruction based indexing XSLT added
Marc Cromme [Mon, 12 Feb 2007 13:58:12 +0000 (13:58 +0000)]
avoiding unnecessary unused namespace declarations in output documents
Marc Cromme [Mon, 12 Feb 2007 13:24:31 +0000 (13:24 +0000)]
added parsing function 'parse_pi_zebra_20' for processing-instruction parsing and 'format_pi_zebra_err' for error or warning formatting. Those are not yet called, and need to be built into the XML parsing in the DOM module.
Adam Dickmeiss [Mon, 12 Feb 2007 10:33:50 +0000 (10:33 +0000)]
Fixed bug #884: Entity declarations in input are lost at retrieval time.
Adam Dickmeiss [Sat, 10 Feb 2007 18:37:42 +0000 (18:37 +0000)]
Fixed serious bug in mf_open which made it fail to see an already existing
metafile. The bug was introduced in mfile 1.70.
Adam Dickmeiss [Sat, 10 Feb 2007 12:46:54 +0000 (12:46 +0000)]
buildconf.sh part of dist.
Marc Cromme [Wed, 7 Feb 2007 13:33:17 +0000 (13:33 +0000)]
corrected DEPRECIATED to DEPRECATED
Marc Cromme [Wed, 7 Feb 2007 13:19:35 +0000 (13:19 +0000)]
added debian libidzebra-2.0-mod-dom package