2 package ZOOM::IRSpy::WebService;
6 ZOOM::IRSpy::WebService - Accessing the IRSpy database as a Web Service
10 Because IRSpy keeps its information about targets as ZeeRex records in
11 a Zebra database, that information is available via the SRU and SRW
web services. These two services are very closely related: the former
is REST-like, based on HTTP GET URLs, and the latter is SOAP-based. Both
14 use the same query language (CQL) and the same XML-based result
17 (In addition, Zebra provides ANSI/NISO Z39.50 services, but these are
18 not further discussed here.)
Here is an example SRU URL that accesses the IRSpy database of the live
system (although it will not be accessible to most clients due to
firewall issues). It is broken across lines for clarity:
26 http://irspy.indexdata.com:8018/IR-Explain---1?
28 operation=searchRetrieve&
35 # http://irspy.indexdata.com:8018/IR-Explain---1?version=1.1&operation=searchRetrieve&query=net.port=3950&maximumRecords=10&recordSchema=zeerex
39 It is beyond the scope of this document to provide a full SRU
40 tutorial, but briefly, the URL above consists of the following parts:
44 =item http://irspy.indexdata.com:8018
46 The base-URL of the SRU server.
50 The name of the SRU database.
52 =item version=1.1, operation=searchRetrieve, etc.
54 SRU parameters specifying the operation requested.
58 The parameters are as follows:
64 Mandatory - SRU requests must contain an explicit version identifier,
65 and Zebra supports only version 1.1.
67 =item operation=searchRetrieve
69 Mandatory - SRU requests must contain an operation. Zebra supports
70 several, as discussed below.
72 =item query=net.port=3950
74 When the operation is C<searchRetrieve>, a query must be specified.
75 The query is always expressed in CQL (Common Query Language), which
76 Zebra's IRSpy database supports as described below.
78 =item maximumRecords=10
80 Optional. Specifies how many records to include in a search
81 response. When omitted, defaults to zero: the response includes a
82 hit-count but no records.
84 =item recordSchema=zeerex
86 Optional. Specifies what format the included XML records, if any,
87 should be in. If omitted, defaults to "dc" (Dublin Core). Zebra's
88 IRSpy database supports several schemas as described below.
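
The parameters described above can also be assembled into a request
URL programmatically. The following is a minimal sketch rather than
part of IRSpy itself: the helper names C<uri_escape_simple> and
C<sru_url> are hypothetical, the percent-encoding is hand-rolled to
avoid non-core modules, and the base URL and database name are simply
those of the example above.

```perl
use strict;
use warnings;

# Hypothetical helper: percent-encode everything except the RFC 3986
# unreserved characters. Assumes plain ASCII input.
sub uri_escape_simple {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf("%%%02X", ord($1))/ge;
    return $text;
}

# Hypothetical helper: join a base URL, a database name and SRU
# parameters. Parameters are passed as a flat list of name/value
# pairs so that their order is preserved in the URL.
sub sru_url {
    my ($base, $database, @params) = @_;
    my @pairs;
    while (my ($name, $value) = splice(@params, 0, 2)) {
        push @pairs, uri_escape_simple($name) . '=' . uri_escape_simple($value);
    }
    return "$base/$database?" . join('&', @pairs);
}

my $url = sru_url(
    'http://irspy.indexdata.com:8018', 'IR-Explain---1',
    version        => '1.1',
    operation      => 'searchRetrieve',
    query          => 'net.port=3950',
    maximumRecords => 10,
    recordSchema   => 'zeerex',
);
print "$url\n";
```

Note that this sketch encodes the C<=> inside the CQL query as C<%3D>,
which is equivalent to the unescaped form shown above once the server
has decoded the query string.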
94 =head2 SUPPORTED OPERATIONS
96 Zebra supports the following SRU operations:
102 This operation requires no further parameters, and returns a ZeeRex
103 record describing the IRSpy database itself.
This is the principal operation of SRU, combining searching of the
108 database and retrieval of the records that are found. Its behaviour
109 is specified primarily by the C<query> parameter, support for which is
110 described below, but also by C<startRecord>, C<maximumRecords> and
115 This operation scans an index of the database and returns a list of
116 candidate search terms for that index, including hit-counts. Its
117 behaviour is specified primarily by the C<scanClause> parameter, but
118 also by C<maximumTerms> and C<responsePosition>.
120 Here is an example SRU Scan URL:
122 http://irspy.indexdata.com:8018/IR-Explain---1?
125 scanClause=dc.title=fish
127 This lists all words occurring in titles, in alphabetical order,
128 beginning with "fish" or, if that word does not occur in any title,
129 the word that immediately follows it alphabetically.
The C<scanClause> parameter is a tiny query, consisting only of an
132 index-name, a relation (usually "=") and a term. The supported index
133 names are the same as those listed below.
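
To illustrate the three-part structure of a scan clause, here is a
minimal sketch that splits one into its index name, relation and
term. The helper name C<parse_scan_clause> is hypothetical, and the
sketch assumes that only symbolic relations such as C<=> are used; it
is not a full CQL parser and is not how Zebra itself processes the
parameter.

```perl
use strict;
use warnings;

# Hypothetical helper: split a scan clause into index, relation and
# term. Only symbolic relations are recognised; this is an assumption
# made to keep the sketch short.
sub parse_scan_clause {
    my ($clause) = @_;
    if ($clause =~ /^\s*([\w.]+)\s*(==|<>|<=|>=|=|<|>)\s*(.+?)\s*$/) {
        return { index => $1, relation => $2, term => $3 };
    }
    return;    # not a recognisable scan clause
}

my $parts = parse_scan_clause('dc.title=fish');
printf "index=%s relation=%s term=%s\n", @{$parts}{qw(index relation term)};
```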
139 The following CQL context sets are supported, and are recognised in
140 queries by the specified prefixes:
147 http://www.loc.gov/standards/sru/cql/cql-context-set.html
151 The Record Metadata context set.
152 http://srw.cheshire3.org/contextSets/rec/1.1/
156 The Network context set.
157 http://srw.cheshire3.org/contextSets/net/
161 The Dublin Core context set.
162 http://www.loc.gov/standards/sru/cql/dc-context-set.html
166 The ZeeRex context set.
167 http://srw.cheshire3.org/contextSets/ZeeRex/
171 Within those sets, the following indexes are supported:
197 =item zeerex.numberOfRecords
203 =item zeerex.attributeType
205 =item zeerex.attributeValue
209 =item zeerex.recordSyntax
211 =item zeerex.supports_relation
213 =item zeerex.supports_relationModifier
215 =item zeerex.supports_maskingCharacter
217 =item zeerex.default_contextSet
219 =item zeerex.default_index
223 These indexes may in general be used with all the relations
232 although of course not all combinations of index and relation make
234 The masking characters
238 may be used in all appropriate circumstances, as may the
239 word-anchoring character C<^>.
241 Finally, sorting criteria may be specified within the query itself.
242 Since YAZ's CQL parser does not yet implement the recently approved
243 CQL 1.2 sorting extension described at
http://zing.z3950.org/cql/sorting.html, a different scheme is used
245 involving special relation modifiers, C<sort>, C<sort-desc> and
248 When a search-term that carries either the C<sort> or C<sort-desc>
249 relation-modifier is C<or>'d with a query, the results of that query
250 are sorted according to the value associated with the specified index
251 - for example, sorted by title if the query is C<or>'d with
252 C<dc.title=/sort 0>. In such sort-specification query terms, the term
itself (C<0> in this example) is the precedence of the sort-key, with
254 zero being highest. Further less significant sort keys may also be
255 specified, using higher-valued terms. By default, sorting is
256 lexicographical (alphabetical); however, if the additional relation
modifier C<numeric> is also specified, then numeric sorting is used.
259 For example, the query:
261 net.host = *.edu and dc.title=^a* or net.port=/sort/numeric 0
finds records describing services hosted in the C<.edu> domain and
264 whose titles' first words begin with the letter C<a>, and sorts the
265 results in numeric order of the port number that they run on. And the
268 net.host = *.edu or net.port=/sort/numeric 0 or net.path=/sort-desc 1
sorts all the C<.edu>-hosted services numerically by port; and further
sorts each equivalence class of services running on the same port
272 alphabetically, descending, by database name.
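
The sort-specification scheme described above can be sketched as a
small helper that appends sort terms to an existing CQL query. The
helper name C<add_sort_keys> and the hash keys C<index>,
C<descending> and C<numeric> are hypothetical, chosen only for this
illustration; the sketch also assumes that the base query can safely
be extended with a trailing C<or> without extra parentheses.

```perl
use strict;
use warnings;

# Hypothetical helper: append sort-specification terms to a CQL query.
# Each sort key is a hash reference naming an index, with optional
# "descending" and "numeric" flags. A key's position in the list
# becomes its precedence, with 0 the most significant.
sub add_sort_keys {
    my ($query, @keys) = @_;
    my $precedence = 0;
    for my $key (@keys) {
        my $modifiers = $key->{descending} ? '/sort-desc' : '/sort';
        $modifiers .= '/numeric' if $key->{numeric};
        $query .= " or $key->{index}=$modifiers " . $precedence++;
    }
    return $query;
}

my $sorted = add_sort_keys(
    'net.host = *.edu',
    { index => 'net.port', numeric    => 1 },
    { index => 'net.path', descending => 1 },
);
print "$sorted\n";
```

With these arguments the helper reproduces the second example query
above.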
274 =head2 RECORD SCHEMAS
276 The IRSpy Zebra database supports record retrieval using the following
283 Dublin Core records (title, creator, description, etc.)
287 ZeeRex records, the definitive version of the information that drives
288 the database. These records use an extended version of the ZeeRex 2.0
289 schema that also includes an <irspy:status> element at the end of the
294 An XML format that prescribes how the record is indexed for
295 searching. This is useful for debugging, but not likely to be very
296 exciting for casual passers-by.
304 The specifications for SRU (REST-like Web Service) at
305 http://www.loc.gov/sru
307 The specifications for SRW (SOAP-based Web Service) at
308 http://www.loc.gov/srw
310 The Z39.50 specifications at
311 http://lcweb.loc.gov/z3950/agency/
313 The ZeeRex specifications at
314 http://explain.z3950.org/
316 The Zebra database at
317 http://indexdata.com/zebra
321 Mike Taylor, E<lt>mike@indexdata.comE<gt>
323 =head1 COPYRIGHT AND LICENSE
325 Copyright (C) 2006 by Index Data ApS.
327 This library is free software; you can redistribute it and/or modify
328 it under the same terms as Perl itself, either Perl version 5.8.7 or,
329 at your option, any later version of Perl 5 you may have available.