# $Id: WebService.pod,v 1.5 2007-01-24 09:28:02 mike Exp $

package ZOOM::IRSpy::WebService;

ZOOM::IRSpy::WebService - Accessing the IRSpy database as a Web Service

Because IRSpy keeps its information about targets as ZeeRex records in
a Zebra database, that information is available via the SRU and SRW
web services. These two services are very closely related: the former
is REST-like, based on HTTP GET URLs, and the latter is SOAP-based.
Both use the same query language (CQL) and the same XML-based result
format.

(In addition, Zebra provides ANSI/NISO Z39.50 services, but these are
not further discussed here.)

Here is an example SRU URL that accesses the IRSpy database of the live
system (although it will not be accessible to most clients due to
firewall issues). It is broken across lines for clarity:

    http://irspy.indexdata.com:8018/IR-Explain---1?
        version=1.1&
        operation=searchRetrieve&
        query=net.port=3950&
        maximumRecords=10&
        recordSchema=zeerex

# http://irspy.indexdata.com:8018/IR-Explain---1?version=1.1&operation=searchRetrieve&query=net.port=3950&maximumRecords=10&recordSchema=zeerex

It is beyond the scope of this document to provide a full SRU
tutorial, but briefly, the URL above consists of the following parts:

=over 4

=item http://irspy.indexdata.com:8018

The base-URL of the SRU server.

=item IR-Explain---1

The name of the SRU database.

=item version=1.1, operation=searchRetrieve, etc.

SRU parameters specifying the operation requested.

=back

The parameters are as follows:

=over 4

=item version=1.1

Mandatory - SRU requests must contain an explicit version identifier,
and Zebra supports only version 1.1.

=item operation=searchRetrieve

Mandatory - SRU requests must contain an operation. Zebra supports
several, as discussed below.

=item query=net.port=3950

When the operation is C<searchRetrieve>, a query must be specified.
The query is always expressed in CQL (Common Query Language), which
Zebra's IRSpy database supports as described below.

=item maximumRecords=10

Optional. Specifies how many records to include in a search
response. When omitted, defaults to zero: the response includes a
hit-count but no records.

=item recordSchema=zeerex

Optional. Specifies what format the included XML records, if any,
should be in. If omitted, defaults to "dc" (Dublin Core). Zebra's
IRSpy database supports several schemas as described below.

=back

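The parameters described above can be assembled programmatically. The following is a minimal Python sketch, not part of IRSpy itself: the function name C<search_retrieve_url> and its defaults are illustrative assumptions, while the base URL, database name and parameter names are those of the example URL.

```python
from urllib.parse import urlencode

# Base URL and database name are taken from the example URL above.
BASE_URL = "http://irspy.indexdata.com:8018"
DATABASE = "IR-Explain---1"


def search_retrieve_url(query, maximum_records=10, record_schema="zeerex"):
    """Build an SRU 1.1 searchRetrieve URL (hypothetical helper)."""
    params = {
        "version": "1.1",               # mandatory
        "operation": "searchRetrieve",  # mandatory
        "query": query,                 # CQL query
        "maximumRecords": str(maximum_records),
        "recordSchema": record_schema,
    }
    # urlencode percent-encodes reserved characters in the CQL query,
    # so "net.port=3950" is sent as "net.port%3D3950".
    return f"{BASE_URL}/{DATABASE}?{urlencode(params)}"


url = search_retrieve_url("net.port=3950")
print(url)
```

Percent-encoding the C<=> inside the query value is ordinary URL encoding; a server decodes it back to the same CQL query as in the hand-written URL.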
=head2 SUPPORTED OPERATIONS

Zebra supports the following SRU operations:

=over 4

=item explain

This operation requires no further parameters, and returns a ZeeRex
record describing the IRSpy database itself.

=item searchRetrieve

This is the principal operation of SRU, combining searching of the
database and retrieval of the records that are found. Its behaviour
is specified primarily by the C<query> parameter, support for which is
described below, but also by parameters such as C<startRecord> and
C<maximumRecords>.

=item scan

This operation scans an index of the database and returns a list of
candidate search terms for that index, including hit-counts. Its
behaviour is specified primarily by the C<scanClause> parameter, but
also by C<maximumTerms> and C<responsePosition>.

Here is an example SRU Scan URL:

    http://irspy.indexdata.com:8018/IR-Explain---1?
        version=1.1&
        operation=scan&
        scanClause=dc.title=fish

This lists all words occurring in titles, in alphabetical order,
beginning with "fish" or, if that word does not occur in any title,
the word that immediately follows it alphabetically.

The C<scanClause> parameter is a tiny query, consisting only of an
index-name, a relation (usually "=") and a term. The supported index
names are the same as those listed below.

=back

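A scan request can be built in the same way as a search request. The sketch below is illustrative only: C<scan_url> is a hypothetical helper, and the values passed for C<maximumTerms> and C<responsePosition> are arbitrary. It composes the scan clause from an index name, a relation and a term, as described above.

```python
from urllib.parse import urlencode

# Endpoint taken from the example scan URL above.
SRU_ENDPOINT = "http://irspy.indexdata.com:8018/IR-Explain---1"


def scan_url(index, term, relation="=", maximum_terms=None,
             response_position=None):
    """Build an SRU 1.1 scan URL (hypothetical helper)."""
    params = {
        "version": "1.1",
        "operation": "scan",
        # The scan clause is index name, relation and term, in that order.
        "scanClause": f"{index}{relation}{term}",
    }
    # Optional parameters are sent only when the caller supplies them.
    if maximum_terms is not None:
        params["maximumTerms"] = str(maximum_terms)
    if response_position is not None:
        params["responsePosition"] = str(response_position)
    return f"{SRU_ENDPOINT}?{urlencode(params)}"


scan = scan_url("dc.title", "fish", maximum_terms=20, response_position=1)
print(scan)
```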
The following CQL context sets are supported, and are recognised in
queries by the specified prefixes:

http://www.loc.gov/standards/sru/cql/cql-context-set.html

The Record Metadata context set.
http://srw.cheshire3.org/contextSets/rec/1.1/

The Network context set.
http://srw.cheshire3.org/contextSets/net/

The Dublin Core context set.
http://www.loc.gov/standards/sru/cql/dc-context-set.html

The ZeeRex context set.
http://srw.cheshire3.org/contextSets/ZeeRex/

Within those sets, the following indexes are supported:

=over 4

=item zeerex.numberOfRecords

=item zeerex.attributeType

=item zeerex.attributeValue

=item zeerex.recordSyntax

=item zeerex.supports_relation

=item zeerex.supports_relationModifier

=item zeerex.supports_maskingCharacter

=item zeerex.default_contextSet

=item zeerex.default_index

=back

These indexes may in general be used with all the relations

although of course not all combinations of index and relation make
sense.

The masking characters

may be used in all appropriate circumstances, as may the
word-anchoring character C<^>.

Finally, sorting criteria may be specified within the query itself.
Since YAZ's CQL parser does not yet implement the recently approved
CQL 1.2 sorting extension described at
http://zing.z3950.org/cql/sorting.html, a different scheme is used
involving special relation modifiers, C<sort>, C<sort-desc> and
C<numeric>.

When a search-term that carries either the C<sort> or C<sort-desc>
relation-modifier is C<or>'d with a query, the results of that query
are sorted according to the value associated with the specified index
- for example, sorted by title if the query is C<or>'d with
C<dc.title=/sort 0>. In such sort-specification query terms, the term
itself (C<0> in this example) is the precedence of the sort-key, with
zero being highest. Further less significant sort keys may also be
specified, using higher-valued terms. By default, sorting is
lexicographical (alphabetical); however, if the additional relation
modifier C<numeric> is also specified, then numeric sorting is used.

For example, the query:

    net.host = *.edu and dc.title=^a* or net.port=/sort/numeric 0

finds records describing services hosted in the C<.edu> domain and
whose titles' first words begin with the letter C<a>, and sorts the
results in numeric order of the port number that they run on. And the
query:

    net.host = *.edu or net.port=/sort/numeric 0 or net.path=/sort-desc 1

sorts all the C<.edu>-hosted services numerically by port; and further
sorts each equivalence class of services running on the same port
alphabetically, descending, by database name.

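The sort-specification scheme above lends itself to a small query builder. The following Python sketch is an illustration only: C<sort_term> and C<with_sort> are hypothetical names, not part of IRSpy or YAZ. It assigns precedence 0 to the first sort key, 1 to the next, and so on, and C<or>s the resulting terms onto the query.

```python
def sort_term(index, precedence, numeric=False, descending=False):
    """Render one sort-specification term, e.g. 'net.port=/sort/numeric 0'."""
    modifiers = ["sort-desc" if descending else "sort"]
    if numeric:
        modifiers.append("numeric")
    return f"{index}=/{'/'.join(modifiers)} {precedence}"


def with_sort(query, sort_keys):
    """Append sort terms to a CQL query.

    sort_keys is a list of (index, numeric, descending) tuples, most
    significant first; list position becomes the precedence value.
    """
    terms = [
        sort_term(index, precedence, numeric, descending)
        for precedence, (index, numeric, descending) in enumerate(sort_keys)
    ]
    return " or ".join([query, *terms])


sorted_query = with_sort(
    "net.host = *.edu",
    [("net.port", True, False), ("net.path", False, True)],
)
print(sorted_query)
```

Called as shown, the builder reproduces the second example query above.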
=head2 RECORD SCHEMAS

The IRSpy Zebra database supports record retrieval using the following
schemas:

Dublin Core records (title, creator, description, etc.)

ZeeRex records, the definitive version of the information that drives
the database. These records use an extended version of the ZeeRex 2.0
schema that also includes an <irspy:status> element at the end of the
record.

An XML format that prescribes how the record is indexed for
searching. This is useful for debugging, but not likely to be very
exciting for casual passers-by.

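Whichever schema is requested, the records arrive wrapped in an SRU response envelope that also carries the hit-count. The sketch below shows one way a client might read that envelope in Python. It rests on assumptions not stated in this document: the element names and the C<http://www.loc.gov/zing/srw/> namespace are those of the SRU 1.1 response format, and the sample response is hand-written for illustration, not output captured from IRSpy.

```python
import xml.etree.ElementTree as ET

# Namespace of the SRU 1.1 response envelope (assumption from the SRU spec).
SRW_NS = {"srw": "http://www.loc.gov/zing/srw/"}

# Hand-written sample response, for illustration only.
SAMPLE_RESPONSE = """<?xml version="1.0"?>
<srw:searchRetrieveResponse xmlns:srw="http://www.loc.gov/zing/srw/">
  <srw:version>1.1</srw:version>
  <srw:numberOfRecords>2</srw:numberOfRecords>
  <srw:records>
    <srw:record>
      <srw:recordSchema>dc</srw:recordSchema>
      <srw:recordPacking>xml</srw:recordPacking>
      <srw:recordData><title>Example target</title></srw:recordData>
      <srw:recordPosition>1</srw:recordPosition>
    </srw:record>
  </srw:records>
</srw:searchRetrieveResponse>
"""


def summarise(response_xml):
    """Return (hit_count, [schema of each included record])."""
    root = ET.fromstring(response_xml)
    hits = int(root.findtext("srw:numberOfRecords", default="0",
                             namespaces=SRW_NS))
    schemas = [
        record.findtext("srw:recordSchema", namespaces=SRW_NS)
        for record in root.iterfind("srw:records/srw:record", SRW_NS)
    ]
    return hits, schemas


hits, schemas = summarise(SAMPLE_RESPONSE)
print(hits, schemas)
```

The hit-count can exceed the number of records actually included, since C<maximumRecords> limits only the latter.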
The specifications for SRU (REST-like Web Service) at
http://www.loc.gov/sru

The specifications for SRW (SOAP-based Web Service) at
http://www.loc.gov/srw

The Z39.50 specifications at
http://lcweb.loc.gov/z3950/agency/

The ZeeRex specifications at
http://explain.z3950.org/

The Zebra database at
http://indexdata.com/zebra

Mike Taylor, E<lt>mike@indexdata.comE<gt>

=head1 COPYRIGHT AND LICENSE

Copyright (C) 2006 by Index Data ApS.

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.8.7 or,
at your option, any later version of Perl 5 you may have available.