From: Marc Cromme
Date: Mon, 5 Feb 2007 13:35:12 +0000 (+0000)
Subject: feature table updated
X-Git-Tag: ZEBRA.2.0.12~74
X-Git-Url: http://jsfdemo.indexdata.com/cgi-bin?a=commitdiff_plain;h=c614b7ba9397183d925e7372c884d8e466141a4d;p=idzebra-moved-to-github.git

feature table updated
---
diff --git a/doc/introduction.xml b/doc/introduction.xml
index 9778008..b9cda85 100644
--- a/doc/introduction.xml
+++ b/doc/introduction.xml
@@ -1,5 +1,5 @@ - + Introduction
@@ -95,8 +95,11 @@ --> - - &zebra; networked protocols +
+ &zebra; Document Model + +
+ &zebra; document model @@ -108,38 +111,41 @@ - Fundamental operation types - &z3950;/&sru; explain, search, and scan - - + Complex semi-structured Documents + &xml; and &grs1; Documents + Both &xml; and &grs1; documents exhibit a &dom;-like internal + representation allowing for complex indexing and display rules + and + - &z3950; protocol support - yes - Protocol facilities supported are: - Init, Search, Present (retrieval), - Segmentation (support for very large records), Delete, Scan - (index browsing), Sort, Close and support for the ``update'' - Extended Service to add or replace an existing &xml; - record. Piggy-backed presents are honored in the search - request. Named result sets are supported. - + Input document formats + &xml;, &sgml;, Text, ISO2709 (&marc;) + + A system of input filters driven by + regular expressions allows most ASCII-based + data formats to be easily processed. + &sgml;, &xml;, ISO2709 (&marc;), and raw text are also + supported. + - Web Service support - &sru_gps; - The protocol operations explain, - searchRetrieve and scan - are supported. &cql; to internal - query model &rpn; - conversion is supported. Extended RPN queries - for search/retrieve and scan are supported. - + Document storage + Index-only, Key storage, Document storage + Data can be, and usually is, imported + into &zebra;'s own storage, but &zebra; can also refer to + external files, building and maintaining indexes of "live" + collections. + 
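To make the idea of an input filter "driven by regular expressions" concrete, here is a minimal Python sketch. It is only a toy model: the two-letter tags, the `RULES` table and `parse_record` are hypothetical and do not reflect the rule syntax of Zebra's real filters. Each rule pairs a pattern with the index field that receives the captured text.

```python
import re

# Hypothetical rule table: (pattern, target field). The first rule that
# matches a line wins. Toy model only, not Zebra's filter configuration.
RULES = [
    (re.compile(r"^TI\s+(.+)$"), "title"),
    (re.compile(r"^AU\s+(.+)$"), "author"),
    (re.compile(r"^PY\s+(\d{4})$"), "year"),
]


def parse_record(text):
    """Turn a line-oriented ASCII record into a field -> [values] mapping."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        for pattern, field in RULES:
            match = pattern.match(line)
            if match:
                fields.setdefault(field, []).append(match.group(1))
                break  # stop at the first matching rule
    return fields


record = """TI Introduction to Zebra
AU Smith, Anna
AU Doe, Jane
PY 2007
XX line that no rule matches"""

print(parse_record(record))
```

Lines that match no rule are simply skipped, which is one plausible way a filter can tolerate loosely structured input.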
+
+ +
+ &zebra; Index Scanning &zebra; index scanning @@ -231,10 +249,14 @@ Scan - yes + term suggestions Scan on a given named index returns all the - indexed terms in lexicographical order near the given start term. - + indexed terms in lexicographical order near the given start + term. This can be used to create drop-down menus and search + suggestions. + and + + Facetted browsing @@ -242,20 +264,24 @@ &zebra; supports scan inside a hit set from a previous search, thus reducing the listed terms to the - subset of terms found in the documents/records of the hit set. - + subset of terms found in the documents/records of the hit + set. + Drill-down or refine-search partially scanning in result sets can be used to implement drill-down in search clients - 
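The scan behaviour described in the table (terms listed in lexicographical order around a start term, optionally restricted to the records of an earlier hit set) can be modelled in a few lines of Python. This is a sketch under assumptions. The in-memory list of `(term, doc_ids)` pairs and the `scan` function are hypothetical stand-ins for Zebra's on-disk index. The `position` argument only loosely corresponds to the `responsePosition` parameter of an SRU scan request. The binary search shows why locating the start term can stay logarithmic in the size of the term dictionary.

```python
from bisect import bisect_left


def scan(terms, start_term, number=5, position=1, hits=None):
    """Return up to `number` (term, count) pairs around `start_term`.

    terms    -- list of (term, set_of_doc_ids), sorted by term
    position -- 1-based slot where the first term >= start_term appears
    hits     -- optional set of doc IDs from an earlier search; if given,
                only terms occurring in those documents are listed
    """
    if hits is not None:
        # Facet-style scan: keep only terms found in the hit set
        # (a linear filter in this toy model).
        terms = [(t, ids & hits) for t, ids in terms if ids & hits]
    keys = [t for t, _ in terms]
    i = bisect_left(keys, start_term)  # binary search for the start term
    first = max(0, i - (position - 1))
    return [(t, len(ids)) for t, ids in terms[first:first + number]]


index = [
    ("apple", {1, 2}),
    ("banana", {2}),
    ("cherry", {1, 3}),
    ("date", {3}),
    ("elder", {2, 3}),
]
print(scan(index, "c", number=3, position=2))
# [('banana', 1), ('cherry', 2), ('date', 1)]
print(scan(index, "c", number=3, hits={3}))
# [('cherry', 1), ('date', 1), ('elder', 1)]
```

The per-term counts are what a client would typically show next to each suggestion or facet value.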
+
+
+ &zebra; Document Presentation &zebra; document presentation @@ -275,49 +301,80 @@ Search results include at any time the total hit count of a given query, either exact computed, or approximative, in case that the hit count exceeds a possible pre-defined hit set truncation - level. - - + level. + + and + + Paged result sets yes - Paging of search requests and present/display request can return any - successive number of records from any start position in the hit set, - i.e. it is trivial to provide search results in successive pages of - any size. - + Paging of search requests and present/display request + can return any successive number of records from any start + position in the hit set, i.e. it is trivial to provide search + results in successive pages of any size. + - &xml;ocument transformations + &xml; document transformations &xslt; based - Record presentation can be performed in many pre-defined &xml; data + Record presentation can be performed in many + pre-defined &xml; data formats, where the original &xml; records are on-the-fly transformed through any preconfigured &xslt; transformation. It is therefore trivial to present records in short/full &xml; views, transforming to RSS, Dublin Core, or other &xml; based data formats, or transform records to XHTML snippets ready for inserting in XHTML pages. - + + Binary record transformations &marc;, &usmarc;, &marc21; and &marcxml; + post-filter record transformations - Record Syntaxes Multiple record syntaxes for data retrieval: &grs1;, &sutrs;, - &xml;, ISO2709 (&marc;), etc. Records can be mapped between record syntaxes - and schemas on the fly. - + &xml;, ISO2709 (&marc;), etc. Records can be mapped between + record syntaxes and schemas on the fly. + + + + &zebra; internal metadata + yes + &zebra; internal document metadata can be fetched in + &sutrs; and &xml; record syntaxes. Those are useful in client + applications. 
+ + + &zebra; internal raw record data + yes + &zebra; internal raw, binary record data can be fetched in + &sutrs; and &xml; record syntaxes, which allows &zebra; to be + used as a binary storage system. + + + + &zebra; internal record field data + yes + &zebra; internal record field data can be fetched in + &sutrs; and &xml; record syntaxes. This makes very fast minimal + record data displays possible. 
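The paging arithmetic behind "any successive number of records from any start position" is easy to sketch. The helper `page_window` below is hypothetical. It only shows how a client might turn a page number into the 1-based start position and record count it sends in a present request (SRU names these `startRecord` and `maximumRecords`), clamping the count at the end of the hit set.

```python
def page_window(page, page_size, hit_count):
    """Map a 1-based page number to (start, count) for a present request.

    Result-set positions are 1-based. The count is 0 when the page
    lies entirely beyond the end of the hit set.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    start = (page - 1) * page_size + 1
    count = max(0, min(page_size, hit_count - start + 1))
    return start, count


# 23 hits shown 10 per page: the third page holds records 21 to 23.
print(page_window(3, 10, 23))  # (21, 3)
```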
+
+
+ &zebra; Sorting and Ranking &zebra; sorting and ranking @@ -335,9 +392,10 @@ Sortnumeric, lexicographicSorting on the basis of alpha-numeric and numeric data - is supported. Alphanumeric sorts can be configured for different data encodings - and locales for European languages. - + is supported. Alphanumeric sorts can be configured for + different data encodings and locales for European languages. + and + Combined sorting @@ -345,29 +403,35 @@ Sorting on the basis of combined sorts ­ e.g. combinations of ascending/descending sorts of lexicographical/numeric/date field data is supported - + Relevance ranking TF-IDF like Relevance-ranking of free-text queries is supported using a TF-IDF like algorithm. - + - Relevence ranking - TDF-IDF like - - + Static pre-ranking + yes + Enables pre-index time ranking of documents where hit + lists are ordered first by ascending static rank, then by + ascending document ID. +
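The ordering rules in the sorting and ranking table can be illustrated with a short Python sketch. It uses the textbook TF-IDF weighting, tf(t, d) * log(N / df(t)); the table only says Zebra's ranking is "TF-IDF like", so the real weights may differ. The function names and the dictionary record layout are hypothetical.

```python
import math


def tf_idf_scores(query_terms, docs):
    """Rank documents by a plain TF-IDF sum (textbook formula).

    docs maps doc_id -> list of tokens. Returns (doc_id, score) pairs,
    highest score first, ties broken by ascending doc_id.
    """
    n = len(docs)
    scores = {}
    for term in query_terms:
        df = sum(1 for tokens in docs.values() if term in tokens)
        if df == 0:
            continue  # term occurs nowhere: contributes nothing
        idf = math.log(n / df)
        for doc_id, tokens in docs.items():
            tf = tokens.count(term)
            if tf:
                scores[doc_id] = scores.get(doc_id, 0.0) + tf * idf
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def static_rank_order(hits):
    """Static pre-ranking: ascending static rank, then ascending doc ID."""
    return sorted(hits, key=lambda h: (h["static_rank"], h["doc_id"]))


def combined_sort(records):
    """Combined sort: year descending (numeric), then title ascending.

    Negating the key only works for numeric fields; a descending
    lexicographic key would need a second stable sort or reverse=True.
    """
    return sorted(records, key=lambda r: (-r["year"], r["title"]))


docs = {
    1: ["zebra", "index", "zebra"],
    2: ["index", "scan"],
    3: ["zebra", "sort"],
}
print(tf_idf_scores(["zebra", "scan"], docs))
```

Here document 2 ranks first because "scan" is rarer than "zebra" and therefore carries a higher inverse document frequency.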
+
+
+ &zebra; Live Updates - - &zebra; document model + +
+ &zebra; live updates @@ -379,38 +443,95 @@ - Complex semi-structured Documents - &xml; and &grs1; Documents - Both &xml; and &grs1; documents exhibit a &dom; like internal - representation allowing for complex indexing and display rules - + Incremental and batch updates + + It is possible to schedule record inserts/updates/deletes in any + quantity, from individually handled records to batch updates + of any size, as well as total re-indexing of all records + from the file system. + - Input document formats - &xml;, &sgml;, Text, ISO2709 (&marc;) - - A system of input filters driven by - regular expressions allows most ASCII-based - data formats to be easily processed. - &sgml;, &xml;, ISO2709 (&marc;), and raw text are also - supported. - + Remote updates + &z3950; extended services + Updates can be performed from remote locations using the + &z3950; extended services. Access to extended services can be + login-password protected. + and + - Document storage - Index-only, Key storage, Document storage - Data can be, and usually is, imported - into &zebra;'s own storage, but &zebra; can also refer to - external files, building and maintaining indexes of "live" - collections. - + Live updates + transaction based + Data updates are transaction based and can be performed + on running &zebra; systems. Full searchability is preserved + during live data updates due to the use of shadow disk areas for + update operations. Multiple simultaneous update transactions + are queued and performed one after another. Data + integrity is preserved. + 
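The live-update description (searchable data during updates thanks to shadow areas, and concurrent transactions performed one after another) can be pictured with an in-memory toy model. This sketch is an illustration of the general shadow-copy idea under stated assumptions; `ShadowStore` is hypothetical and is not how Zebra manages its shadow register files on disk.

```python
import threading


class ShadowStore:
    """Toy model of transaction-based updates via a shadow copy.

    Readers always see the last committed snapshot. A writer applies its
    changes to a private shadow copy and publishes it with one reference
    swap on commit. A lock serializes writers, so concurrent update
    transactions run one after another.
    """

    def __init__(self):
        self._committed = {}
        self._write_lock = threading.Lock()

    def get(self, key):
        # Readers take no lock: they read whichever snapshot is current.
        return self._committed.get(key)

    def update(self, inserts=None, deletes=()):
        with self._write_lock:  # queue writers one at a time
            shadow = dict(self._committed)  # work on a copy, not live data
            shadow.update(inserts or {})
            for key in deletes:
                shadow.pop(key, None)
            self._committed = shadow  # commit: publish the new snapshot


store = ShadowStore()
store.update({"rec1": "<record>first</record>"})
store.update({"rec2": "<record>second</record>"}, deletes=["rec1"])
print(store.get("rec1"), store.get("rec2"))
```

Because each commit replaces the whole snapshot at once, a reader never observes a half-applied transaction.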
+
+
+ &zebra; Networked Protocols + + + &zebra; networked protocols + + + + Feature + Availability + Notes + Reference + + + + + Fundamental operations + &z3950;/&sru; explain, + search, scan, and + update + + + + + &z3950; protocol support + yes + Protocol facilities supported are: + init, search, + present (retrieval), + Segmentation (support for very large records), + delete, scan + (index browsing), sort, + close and support for the update + Extended Service to add or replace an existing &xml; + record. Piggy-backed presents are honored in the search + request. Named result sets are supported. + + + + Web Service support + &sru_gps; + The protocol operations explain, + searchRetrieve and scan + are supported. &cql; to internal + query model &rpn; + conversion is supported. Extended RPN queries + for search/retrieve and scan are supported. + + + + +
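For the &sru; web service operations listed above, a client only needs to build HTTP GET URLs. The following sketch uses the standard SRU 1.1 parameter names (`operation`, `query`, `startRecord`, `maximumRecords`, `recordSchema`, `scanClause`, `responsePosition`, `maximumTerms`). The base URL, port and database name are placeholders assumed for illustration; adjust them to the actual server configuration.

```python
from urllib.parse import urlencode

# Assumed endpoint: placeholder host, port and database name.
BASE_URL = "http://localhost:9999/Default"


def sru_search_url(cql, start=1, maximum=10, schema=None):
    """Build an SRU 1.1 searchRetrieve URL for a CQL query."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": cql,
        "startRecord": start,
        "maximumRecords": maximum,
    }
    if schema:
        params["recordSchema"] = schema
    return BASE_URL + "?" + urlencode(params)


def sru_scan_url(scan_clause, position=1, maximum=10):
    """Build an SRU 1.1 scan URL (index browsing around a start term)."""
    params = {
        "version": "1.1",
        "operation": "scan",
        "scanClause": scan_clause,
        "responsePosition": position,
        "maximumTerms": maximum,
    }
    return BASE_URL + "?" + urlencode(params)


print(sru_search_url('title = "zebra" and year > 2000', maximum=5))
print(sru_scan_url("title = zebra"))
```

`urlencode` takes care of escaping the CQL characters such as spaces, quotes and `=` inside the query string.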
+
+
+ &zebra; Data Size and Scalability &zebra; data size and scalability @@ -428,25 +549,20 @@ No of records40-60 million - + Data size 100 GB of record data + &zebra;-based applications have successfully indexed up + to 100 GB of record data - - - - File pointers - 64 bit - - Scale out multiple discs - + Performance @@ -456,7 +572,7 @@ where N is the total database size, and by O(n), where n is the specific query hit set size. - + Average search times @@ -466,79 +582,24 @@ provided that the boolean queries are constructed sufficiently precise to result in hit sets of the order of 1000 to 5.000 documents. - + Large databases + 64 bit file pointers 64 file pointers assure that register files can extend the 2 GB limit. Logical files can be automatically partitioned over multiple disks, thus allowing for large databases.
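The "Large databases" row rests on two simple facts: a signed 32-bit file pointer tops out just under 2 GB, and one logical file can be spread over several disks. The sketch below shows both. The helper `locate` and the disk capacities are made-up examples; the code illustrates the mapping idea and is not Zebra's register layout.

```python
# Largest byte offset a signed 32-bit file pointer can address (about 2 GB).
MAX_OFFSET_32 = 2**31 - 1
# A signed 64-bit pointer raises the ceiling to about 8 EiB.
MAX_OFFSET_64 = 2**63 - 1


def locate(offset, partitions):
    """Map a logical file offset to (disk index, offset within that disk).

    partitions lists each disk area's capacity in bytes, in order.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    for disk, capacity in enumerate(partitions):
        if offset < capacity:
            return disk, offset
        offset -= capacity
    raise ValueError("offset beyond total capacity")


GB = 2**30
disks = [2 * GB, 2 * GB, 4 * GB]  # hypothetical: 8 GB over three disks
print(locate(5 * GB, disks))      # (2, 1073741824)
print(5 * GB > MAX_OFFSET_32)     # True: needs a 64-bit pointer
```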
+
- - - &zebra; live updates - - - - Feature - Availability - Notes - Reference - - - - - Batch updates - - It is possible to schedule record inserts/updates/deletes in any - quantity, from single individual handled records to batch updates - in strikes of any size, as well as total re-indexing of all records - from file system. - - - - Incremental updates - - - - - - Remote updates - &z3950; extended services - - - - - Live updates - - Data updates are transaction based and can be performed on running - &zebra; systems. Full searchability is preserved during life data update due to use - of shadow disk areas for update operations. Multiple update transactions at the same time are lined up, to be - performed one after each other. Data integrity is preserved. - - - - Database updates - live, incremental updates - Robust updating - records can be added and deleted ``on the fly'' - without rebuilding the index from scratch. - Records can be safely updated even while users are accessing - the server. - The update procedure is tolerant to crashes or hard interrupts - during database updating - data can be reconstructed following - a crash. - - - - -
+
+ &zebra; Supported Platforms &zebra; supported platforms @@ -555,35 +616,31 @@ Linux - GNU Linux (32 and 64bit), journaling Reiser or (better) JFS filesystem - on disks. GNU/Debian Linux packages are available - + GNU Linux (32 and 64 bit), journaling Reiser or (better) + JFS filesystem + on disks. NFS filesystems are not supported. + GNU/Debian Linux packages are available. + Unix tarball - Usual tarball install possible on many major Unix systems - + &zebra; is written in portable C, so it runs on most + Unix-like systems. + A standard tarball installation is possible on many major Unix systems. + Windows - - Windows installer packages available - - - - Supported Platforms - UNIX, Linux, Windows (NT/2000/2003/XP) - &zebra; is written in portable C, so it runs on most - Unix-like systems as well as Windows (NT/2000/2003/XP). Binary - distributions are - available for GNU/Debian Linux and Windows - + NT/2000/2003/XP + &zebra; also runs on Windows (NT/2000/2003/XP). + Windows installer packages are available. + 
- +