Marc Cromme [Fri, 25 May 2007 12:30:27 +0000 (12:30 +0000)]
added ICU urls and a section on ICU tokenization and normalization
Marc Cromme [Fri, 25 May 2007 10:32:55 +0000 (10:32 +0000)]
removed debug print statements
Adam Dickmeiss [Fri, 25 May 2007 06:51:35 +0000 (06:51 +0000)]
Fixed non-icu compilation
Sebastian Hammer [Fri, 25 May 2007 03:58:04 +0000 (03:58 +0000)]
took sizeof of the wrong variable.. darnit
Adam Dickmeiss [Thu, 24 May 2007 11:10:26 +0000 (11:10 +0000)]
Similar to pazpar2.cfg.dist but with ICU stuff in it. Seems to
work quite well at this time.
Adam Dickmeiss [Thu, 24 May 2007 11:09:27 +0000 (11:09 +0000)]
Minor rearrangements. Prints normalized terms to console in
ICU mode (must be removed or controlled later).
Adam Dickmeiss [Thu, 24 May 2007 10:57:38 +0000 (10:57 +0000)]
Cosmetic
Adam Dickmeiss [Thu, 24 May 2007 10:56:38 +0000 (10:56 +0000)]
Removed unneeded initialization of more_tokens and need_new_token.
Simplified initialization of buf16 chain member.
Adam Dickmeiss [Thu, 24 May 2007 10:52:36 +0000 (10:52 +0000)]
Test for bug 1140 now passes
Adam Dickmeiss [Thu, 24 May 2007 10:51:36 +0000 (10:51 +0000)]
Fixed bug #1140: Unexpected EOF for icu_chain_next_token. The
need_new_token was left in an uninitialized state (from last round).
Adam Dickmeiss [Thu, 24 May 2007 10:35:21 +0000 (10:35 +0000)]
Added test case for bug #1140 in routine test_bug_1140. Is not currently
called from main (#if 0 block).
Adam Dickmeiss [Wed, 23 May 2007 21:58:28 +0000 (21:58 +0000)]
New pazpar2 option -X which puts pazpar2 in debug (insecure) mode.
At this point (-X) only affects the session ID creation.
Marc Cromme [Wed, 23 May 2007 14:44:18 +0000 (14:44 +0000)]
First ICU chain integration in relevance ranking of pazpar2.
Tokenization not working correctly, need more debugging.
Marc Cromme [Wed, 23 May 2007 11:19:31 +0000 (11:19 +0000)]
ICU chain loaded under configuration of server. This happens only if ICU support is compiled in and, additionally, an ICU chain config section is present in the main pazpar2.cfg XML file.
Adam Dickmeiss [Wed, 23 May 2007 10:04:55 +0000 (10:04 +0000)]
Ignore executable
Adam Dickmeiss [Wed, 23 May 2007 09:57:54 +0000 (09:57 +0000)]
HTTP error handler yaz_logs and uses yaz_snprintf to prevent buffer
overflows. Queries are checked for UTF-8 correctness. This works best
with YAZ 3.0.5 or later.
Jakub Skoczen [Wed, 23 May 2007 09:18:10 +0000 (09:18 +0000)]
Removed bug with improper utf-8 query encoding.
Marc Cromme [Wed, 23 May 2007 09:08:10 +0000 (09:08 +0000)]
use <form accept-charset='UTF-8'
Marc Cromme [Wed, 23 May 2007 06:42:25 +0000 (06:42 +0000)]
protect against a missing ICU development environment with #ifdef HAVE_ICU, so that the build process can continue on platforms missing ICU
Marc Cromme [Tue, 22 May 2007 21:20:10 +0000 (21:20 +0000)]
finished stand-alone ICU test program for benchmarking of ICU tokenization and normalization. Works quite well: benchmarking on the James English Bible from Project Gutenberg (4.5 MB plain text consisting of 870,000 individual tokens) took 3.5 seconds on a laptop. More testing/benchmarking is needed.
Marc Cromme [Tue, 22 May 2007 08:26:59 +0000 (08:26 +0000)]
started stand-alone ICU test
Adam Dickmeiss [Tue, 22 May 2007 07:51:45 +0000 (07:51 +0000)]
Added a few frees here and there
Marc Cromme [Mon, 21 May 2007 10:14:08 +0000 (10:14 +0000)]
ICU chain XML configuration up and running, used in unit test as well.
Jakub Skoczen [Mon, 21 May 2007 10:10:41 +0000 (10:10 +0000)]
Counters and timers cd.
Jakub Skoczen [Mon, 21 May 2007 09:23:14 +0000 (09:23 +0000)]
No longer relevant.
Jakub Skoczen [Mon, 21 May 2007 09:07:43 +0000 (09:07 +0000)]
Corrected timers and counters.
Jakub Skoczen [Mon, 21 May 2007 08:21:33 +0000 (08:21 +0000)]
Creating symlinks for the pz2.js during the configure process
Marc Cromme [Sun, 20 May 2007 19:00:17 +0000 (19:00 +0000)]
ICU chain working correctly with tokenizer, normalizer and casemap operations, and data extraction in normalform, sortkey form and display form all possible. New unit test added
Sebastian Hammer [Fri, 18 May 2007 19:52:52 +0000 (19:52 +0000)]
Initialize ID setting earlier -- Should alleviate need for explicitly setting id
Jakub Skoczen [Fri, 18 May 2007 17:16:05 +0000 (17:16 +0000)]
Last touch :).
Jakub Skoczen [Fri, 18 May 2007 15:16:18 +0000 (15:16 +0000)]
Added simple stylesheet to the example.
Jakub Skoczen [Fri, 18 May 2007 13:00:14 +0000 (13:00 +0000)]
Cleaning styles.
Jakub Skoczen [Fri, 18 May 2007 12:38:48 +0000 (12:38 +0000)]
Handling situation when the location node is empty.
Jakub Skoczen [Fri, 18 May 2007 11:44:44 +0000 (11:44 +0000)]
Removing obsolete stuff.
Jakub Skoczen [Fri, 18 May 2007 11:36:39 +0000 (11:36 +0000)]
Added simple example of the pz2.js usage.
Jakub Skoczen [Thu, 17 May 2007 22:56:41 +0000 (22:56 +0000)]
Temporary fix to make the target filter work.
Jakub Skoczen [Thu, 17 May 2007 21:00:09 +0000 (21:00 +0000)]
Bug in pzHttpRequest->get
Jakub Skoczen [Wed, 16 May 2007 20:54:17 +0000 (20:54 +0000)]
Removed a bug which caused a malfunction of XMLHttpRequest in some browsers.
Marc Cromme [Wed, 16 May 2007 19:50:01 +0000 (19:50 +0000)]
ICU chain passes directives display, norm, sort, and normalize. Directives tokenize and charmap need more work yet.
Marc Cromme [Wed, 16 May 2007 19:12:00 +0000 (19:12 +0000)]
corrected ICU normalizer functions so that the unit test runs without segfault.
icu_buf_utf16_copy function corrected to set utf16_len right
Sebastian Hammer [Wed, 16 May 2007 17:16:21 +0000 (17:16 +0000)]
pz:cclmap:* settings were not recognized by settings/init command
Adam Dickmeiss [Wed, 16 May 2007 13:07:18 +0000 (13:07 +0000)]
Use lynx if that's an alternative to wget.
Marc Cromme [Wed, 16 May 2007 12:39:49 +0000 (12:39 +0000)]
temporarily commented faulty transliterator test out
progress on ICU chain test, but need to fix transliterator test first
Adam Dickmeiss [Wed, 16 May 2007 09:37:34 +0000 (09:37 +0000)]
Skip test if wget is not found
Adam Dickmeiss [Wed, 16 May 2007 08:31:17 +0000 (08:31 +0000)]
Fire test against z3950.indexdata.com/marc instead.
Jakub Skoczen [Wed, 16 May 2007 07:53:31 +0000 (07:53 +0000)]
pz2.js:
removed jquery dependency
added xsl stylesheet support for detailed record view
merged with pzQuery.js
added pzHttpRequest class
client.js:
updated to use the new library
Adam Dickmeiss [Tue, 15 May 2007 21:40:57 +0000 (21:40 +0000)]
Fix check for yaz-ztest. Fixed make distcheck.
Adam Dickmeiss [Tue, 15 May 2007 21:28:36 +0000 (21:28 +0000)]
Use -l for pazpar2.
Adam Dickmeiss [Tue, 15 May 2007 21:27:55 +0000 (21:27 +0000)]
Added pazpar2 option -l to specify logfile. Removed usage msg
and removed description for no longer supported options.
Adam Dickmeiss [Tue, 15 May 2007 15:50:47 +0000 (15:50 +0000)]
Regression test, test_http.sh, moved to sub directory test. The test
makes a session, tries stat, search and show on a local yaz-ztest.
Routine make_sessionid modified to return deterministic session ID.
If that is considered a problem an option or configuration must be
added to Pazpar2 so this can be tuned.
Marc Cromme [Tue, 15 May 2007 15:11:42 +0000 (15:11 +0000)]
continuing work on ICU chain of command pattern, not finished yet
Adam Dickmeiss [Tue, 15 May 2007 08:56:03 +0000 (08:56 +0000)]
Begin work on PP2 WS HTTP test.
Adam Dickmeiss [Tue, 15 May 2007 08:52:35 +0000 (08:52 +0000)]
Simplify: use wrbuf_cstr to get a NUL-terminated string out. xfree works
fine on a NULL ptr.
Adam Dickmeiss [Tue, 15 May 2007 08:51:49 +0000 (08:51 +0000)]
Exit when address is already in use (HTTP binding).
Marc Cromme [Mon, 14 May 2007 13:51:24 +0000 (13:51 +0000)]
ICU chain of normalizers and tokenizers half-way implemented
Jakub Skoczen [Mon, 14 May 2007 12:57:43 +0000 (12:57 +0000)]
Minor changes to allow logging out in the client.
Marc Cromme [Mon, 14 May 2007 10:07:48 +0000 (10:07 +0000)]
initial version of ICU chain XML config test file
Marc Cromme [Mon, 14 May 2007 08:01:39 +0000 (08:01 +0000)]
removed dead code from this file
Marc Cromme [Fri, 11 May 2007 22:59:36 +0000 (22:59 +0000)]
free-ing memory to avoid memory leakage in test program
Marc Cromme [Fri, 11 May 2007 22:23:33 +0000 (22:23 +0000)]
checked in very nice ICU normalization examples
Sebastian Hammer [Fri, 11 May 2007 16:57:42 +0000 (16:57 +0000)]
Ignore targets with no name associated -- this is one way to eliminate
'ghost' targets without settings -- at least a name must be set before
a search can proceed.
Marc Cromme [Fri, 11 May 2007 10:38:42 +0000 (10:38 +0000)]
Added icu_buf_utf8_copy() and icu_buf_utf16_copy() functions.
Finished wrapping ICU transliterator in new icu_normalizator object including constructor, destructor, and normalize work functions. Needs more testing, though.
Marc Cromme [Fri, 11 May 2007 09:35:50 +0000 (09:35 +0000)]
constructor and destructor wrappers for ICU transliterator services added
Marc Cromme [Fri, 11 May 2007 08:41:07 +0000 (08:41 +0000)]
non-compiling tests temporarily removed with #if 0 ... #endif
Marc Cromme [Fri, 11 May 2007 08:27:29 +0000 (08:27 +0000)]
added first examples of ICU transliterator token normalization
Adam Dickmeiss [Fri, 11 May 2007 06:48:32 +0000 (06:48 +0000)]
test_icu_I18N.c
Marc Cromme [Thu, 10 May 2007 12:11:42 +0000 (12:11 +0000)]
started ICU transliterator integration for more complex normalization rules than lowercasing
Marc Cromme [Thu, 10 May 2007 11:53:47 +0000 (11:53 +0000)]
Danish tokenization unit test added, counting error in tokenizer corrected
Adam Dickmeiss [Thu, 10 May 2007 11:46:09 +0000 (11:46 +0000)]
Factor relevance charset normalization out to a separate implementation
in charsets.c.
Marc Cromme [Thu, 10 May 2007 10:29:58 +0000 (10:29 +0000)]
fixed tokenization counting error, added more English tokenization
unit tests
Adam Dickmeiss [Thu, 10 May 2007 09:26:19 +0000 (09:26 +0000)]
Replacing trie with linear search using linked list. The trie is
both overkill and does not handle null-terminated strings. This change
is one step towards a configurable character set system (which may
use ICU as driver).
Adam Dickmeiss [Thu, 10 May 2007 09:24:32 +0000 (09:24 +0000)]
Changed string chop right; problem is that a pointer could point to
one element before the start of an array (only one element after is
portable).
Marc Cromme [Wed, 9 May 2007 14:01:21 +0000 (14:01 +0000)]
ICU tokenizer works now
Jakub Skoczen [Wed, 9 May 2007 11:54:04 +0000 (11:54 +0000)]
Updated query handling.
Marc Cromme [Mon, 7 May 2007 13:10:00 +0000 (13:10 +0000)]
removed now superfluous experimental file. useful content moved to icu_I18N.c
Marc Cromme [Mon, 7 May 2007 13:08:26 +0000 (13:08 +0000)]
remove now unnecessary ICU bug experimenting file, useful content moved into icu_I18N.c
Marc Cromme [Mon, 7 May 2007 12:52:04 +0000 (12:52 +0000)]
pretty-formatted all ICU code and removed dead code sections
Marc Cromme [Mon, 7 May 2007 12:18:34 +0000 (12:18 +0000)]
updated ICU casemap wrappers to use dynamic buffers, all ICU tests succeed
Marc Cromme [Mon, 7 May 2007 09:31:36 +0000 (09:31 +0000)]
moved working ICU sorting into YAZ unittest test_icu_I18N.c
commented casemapping out for the time being, need to integrate with new dynamic ICU buffers
Marc Cromme [Mon, 7 May 2007 08:42:45 +0000 (08:42 +0000)]
updated error reporting to only report when strings are actually
sorted wrongly
Marc Cromme [Mon, 7 May 2007 08:15:34 +0000 (08:15 +0000)]
corrected error handling in UErrorCode icu_utf16_from_utf8() to
mirror the error handling in UErrorCode icu_utf16_from_utf8_cstr();
Marc Cromme [Mon, 7 May 2007 08:02:03 +0000 (08:02 +0000)]
unnecessary comments and print statements removed
Marc Cromme [Mon, 7 May 2007 07:58:31 +0000 (07:58 +0000)]
ICU sorting works correctly now. Had forgotten to pass on the correct length of the destination buffer. Dynamic destination buffer resizing works as well.
Marc Cromme [Thu, 3 May 2007 11:53:12 +0000 (11:53 +0000)]
buffer stuff ok now, correct resizing
added printout of sort keys to see what goes wrong here ..
Marc Cromme [Thu, 3 May 2007 11:35:33 +0000 (11:35 +0000)]
changed error handling, which had nasty side effects
Marc Cromme [Thu, 3 May 2007 09:36:33 +0000 (09:36 +0000)]
tweaking .. using dynamically allocated buffers. Now sorting fails again, but using static buffers as in icu_bug.c it works ..
Jakub Skoczen [Wed, 2 May 2007 19:32:13 +0000 (19:32 +0000)]
Minor changes to make it easier for the server-side script to init session.
Marc Cromme [Wed, 2 May 2007 14:03:03 +0000 (14:03 +0000)]
added ICU experiment which sorts correctly given all locales tried.
Marc Cromme [Wed, 2 May 2007 14:01:36 +0000 (14:01 +0000)]
tweaking, still no good results with Danish sorting
Marc Cromme [Tue, 1 May 2007 13:27:32 +0000 (13:27 +0000)]
Added some more locales which fail. Something is very rotten in the kingdom of Denmark! Need to find out what goes wrong ...
Marc Cromme [Tue, 1 May 2007 13:16:09 +0000 (13:16 +0000)]
Added sorting test for ICU - only used in test_icu_I18N.c so far.
English and German sorting tests perform fine (including German special characters), but sorting of Danish special characters fails. Very suspect. Needs more investigation! See test_icu_I18N_sortmap() in test_icu_I18N.c for details
Marc Cromme [Tue, 1 May 2007 08:17:05 +0000 (08:17 +0000)]
moved ICU helper function declarations from icu_I18N.h header file to icu_I18N.c source file
Marc Cromme [Tue, 1 May 2007 08:10:26 +0000 (08:10 +0000)]
cleaned ICU case folding/mapping tests
Adam Dickmeiss [Tue, 1 May 2007 07:58:43 +0000 (07:58 +0000)]
Fixed compilation of test test_icu_I18N (syntax error).
Sebastian Hammer [Tue, 1 May 2007 05:04:53 +0000 (05:04 +0000)]
Handle situation where IDF becomes 0 because all records contain a term (occurs
frequently when records result from a search).
This actually suggests that there may be a better technique than IDF for balancing
our TF, but I'll be darned if I know what it is.
Sebastian Hammer [Tue, 1 May 2007 05:02:54 +0000 (05:02 +0000)]
Handle records with null-value for string sortkey
Sebastian Hammer [Mon, 30 Apr 2007 14:29:48 +0000 (14:29 +0000)]
Added Paratext to demo
Sebastian Hammer [Mon, 30 Apr 2007 14:29:12 +0000 (14:29 +0000)]
Added new resources. Configuration changes
Sebastian Hammer [Mon, 30 Apr 2007 14:28:09 +0000 (14:28 +0000)]
Various display changes to MK demo
Marc Cromme [Mon, 30 Apr 2007 13:56:52 +0000 (13:56 +0000)]
checked in test for ICU uppercase, lowercase, titlecase and foldcase char mapping