java.lang.Object
  org.marc4j.MarcXmlWriter
public class MarcXmlWriter extends Object implements MarcWriter
Class for writing MARC record objects in MARCXML format. This class outputs a SAX event stream to the given OutputStream or Result object, so it can be used in a SAX pipeline to post-process the result. By default this class uses a null (identity) transform. It is strongly recommended to use a dedicated XML serializer.
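To illustrate what "a SAX event stream sent through a null transform into a Result" means, here is a minimal sketch that uses only the JDK's built-in JAXP classes. It is not MarcXmlWriter's actual implementation: the class name `NullTransformSketch` and the leader value are hypothetical placeholders, and the MARCXML namespace `http://www.loc.gov/MARC21/slim` is assumed as the target namespace.

```java
import java.io.StringWriter;
import javax.xml.transform.Result;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical helper class, for illustration only.
final class NullTransformSketch {
    // MARCXML (MARC 21 slim) namespace
    static final String NS = "http://www.loc.gov/MARC21/slim";

    /** Pushes a tiny hand-built SAX event stream through an identity transform into the Result. */
    static void emitMinimalCollection(Result result) {
        try {
            SAXTransformerFactory factory = (SAXTransformerFactory) TransformerFactory.newInstance();
            // newTransformerHandler() without a stylesheet yields an identity ("null") transform
            TransformerHandler handler = factory.newTransformerHandler();
            handler.setResult(result);

            AttributesImpl noAtts = new AttributesImpl();
            handler.startDocument();
            handler.startPrefixMapping("", NS);
            handler.startElement(NS, "collection", "collection", noAtts);
            handler.startElement(NS, "record", "record", noAtts);
            handler.startElement(NS, "leader", "leader", noAtts);
            char[] leader = "00000nam a2200000 a 4500".toCharArray(); // placeholder leader value
            handler.characters(leader, 0, leader.length);
            handler.endElement(NS, "leader", "leader");
            handler.endElement(NS, "record", "record");
            handler.endElement(NS, "collection", "collection");
            handler.endPrefixMapping("");
            handler.endDocument();
        } catch (TransformerConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Convenience wrapper: serializes the events to a String via a StreamResult. */
    static String emitToString() {
        StringWriter out = new StringWriter();
        emitMinimalCollection(new StreamResult(out));
        return out.toString();
    }
}
```

Because the events go to a `Result`, the same method works unchanged with a `StreamResult`, `DOMResult` or `SAXResult`.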
This class requires a JAXP-compliant XML parser and XSLT processor. The underlying SAX2 parser should be namespace aware. In addition, this class requires ICU4J to perform Unicode normalization; a stripped-down version of ICU4J 2.6 originating from the XOM project is included in this distribution.
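A minimal sketch of obtaining a namespace-aware SAX2 parser through plain JAXP, assuming nothing beyond the JDK. The class and method names are hypothetical; the second method simply shows that a namespace-aware parser reports the MARCXML namespace URI for a prefixed element.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
final class NamespaceAwareParserSketch {
    /** Returns a JAXP SAX parser with namespace processing switched on. */
    static SAXParser newNamespaceAwareParser() {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // JAXP factories are not namespace aware by default
        try {
            return factory.newSAXParser();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Parses the XML and returns the namespace URI reported for its first element. */
    static String firstElementNamespace(String xml) {
        final String[] uri = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String ns, String localName, String qName, Attributes atts) {
                if (uri[0] == null) {
                    uri[0] = ns; // a non-namespace-aware parser would typically report "" here
                }
            }
        };
        try {
            newNamespaceAwareParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return uri[0];
    }
}
```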
The following example reads a file with MARC records and writes MARCXML records in UTF-8 encoding to the console:
    InputStream input = new FileInputStream("input.mrc");
    MarcReader reader = new MarcStreamReader(input);
    // true activates indentation of the XML output
    MarcWriter writer = new MarcXmlWriter(System.out, true);
    while (reader.hasNext()) {
        Record record = reader.next();
        writer.write(record);
    }
    writer.close();
To perform a character conversion such as MARC-8 to UCS/Unicode, register a CharConverter:
writer.setConverter(new AnselToUnicode());
In addition, you can perform Unicode normalization, which the MARC-8 to UCS/Unicode converter, for example, does not do. With Unicode normalization, text is transformed into the canonical composed form. For example, "a" followed by a combining acute accent (U+0301) and "bc" is normalized to "ábc", where "á" is the single precomposed character U+00E1. To perform normalization, set Unicode normalization to true:
writer.setUnicodeNormalization(true);
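The composition step can be reproduced in isolation. This sketch uses the JDK's `java.text.Normalizer` rather than the ICU4J copy that MarcXmlWriter relies on, so it only demonstrates the NFC transformation itself; the class name `NfcSketch` is hypothetical.

```java
import java.text.Normalizer;

// Hypothetical helper class, for illustration only.
final class NfcSketch {
    /** Canonical composition (NFC): base letter + combining mark becomes one precomposed character. */
    static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    /** Canonical decomposition (NFD): the reverse direction, splitting precomposed characters. */
    static String toNfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }
}
```

For example, `NfcSketch.toNfc("a\u0301bc")` yields the three-character string `"\u00e1bc"`, while `toNfd("\u00e1")` gives back the two-character `"a\u0301"`.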
Please note that converting normalized Unicode back to MARC-8 encoding using UnicodeToAnsel is not guaranteed to work.
This class provides only very basic formatting options. For more advanced options, create an instance of this class with a SAXResult containing a ContentHandler derived from a dedicated XML serializer.
The following example uses org.apache.xml.serialize.XMLSerializer to write MARC records to XML using MARC-8 to UCS/Unicode conversion and Unicode normalization:
    InputStream input = new FileInputStream("input.mrc");
    MarcReader reader = new MarcStreamReader(input);
    OutputFormat format = new OutputFormat("xml", "UTF-8", true);
    OutputStream out = new FileOutputStream("output.xml");
    XMLSerializer serializer = new XMLSerializer(out, format);
    Result result = new SAXResult(serializer.asContentHandler());
    MarcXmlWriter writer = new MarcXmlWriter(result);
    writer.setConverter(new AnselToUnicode());
    writer.setUnicodeNormalization(true);
    while (reader.hasNext()) {
        Record record = reader.next();
        writer.write(record);
    }
    writer.close();
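Xerces' `XMLSerializer` is not part of the JDK. As an assumption-labelled alternative, the sketch below builds a `SAXResult` whose `ContentHandler` is the JDK's own identity `TransformerHandler` configured to indent its output. The class name and the `demo()` method are hypothetical; exact indentation width depends on the JDK's serializer.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical helper class, for illustration only.
final class IndentingSaxResultSketch {
    /** Builds a SAXResult whose ContentHandler serializes indented UTF-8 XML into the writer. */
    static SAXResult indentingResult(StringWriter out) {
        try {
            SAXTransformerFactory factory = (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler serializer = factory.newTransformerHandler(); // identity = plain serializer
            serializer.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
            serializer.getTransformer().setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            serializer.setResult(new StreamResult(out));
            return new SAXResult(serializer);
        } catch (TransformerConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Feeds a few SAX events to the handler, the way a SAX producer would, and returns the XML. */
    static String demo() {
        StringWriter out = new StringWriter();
        ContentHandler h = indentingResult(out).getHandler();
        try {
            AttributesImpl none = new AttributesImpl();
            h.startDocument();
            h.startElement("", "collection", "collection", none);
            h.startElement("", "record", "record", none);
            h.endElement("", "record", "record");
            h.endElement("", "collection", "collection");
            h.endDocument();
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

Since `SAXResult` is a `Result`, the object returned by `indentingResult(...)` should be usable wherever the constructor `MarcXmlWriter(Result result)` expects one.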
You can post-process the result using a Source object pointing to a stylesheet resource and a Result object to hold the transformation result tree. The example below converts MARC to MARCXML and transforms the result tree to MODS using the stylesheet provided by the Library of Congress:
    String stylesheetUrl = "http://www.loc.gov/standards/mods/v3/MARC21slim2MODS3.xsl";
    Source stylesheet = new StreamSource(stylesheetUrl);
    Result result = new StreamResult(System.out);
    InputStream input = new FileInputStream("input.mrc");
    MarcReader reader = new MarcStreamReader(input);
    MarcXmlWriter writer = new MarcXmlWriter(result, stylesheet);
    writer.setConverter(new AnselToUnicode());
    while (reader.hasNext()) {
        Record record = reader.next();
        writer.write(record);
    }
    writer.close();
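The stylesheet step can be sketched with plain JAXP: a `TransformerHandler` created from a stylesheet `Source` receives SAX events and writes the transformed tree to a `Result`. To keep the sketch offline, a tiny inline stylesheet stands in for MARC21slim2MODS3.xsl; it and the class name are hypothetical, and this is not MarcXmlWriter's own code.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical helper class, for illustration only.
final class StylesheetPipelineSketch {
    static final String NS = "http://www.loc.gov/MARC21/slim";

    // Toy stylesheet standing in for the MODS one: copies each marc:leader into <leaderCopy>
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:marc='" + NS + "' exclude-result-prefixes='marc'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<leaders><xsl:for-each select='//marc:leader'>"
      + "<leaderCopy><xsl:value-of select='.'/></leaderCopy>"
      + "</xsl:for-each></leaders>"
      + "</xsl:template></xsl:stylesheet>";

    /** Sends one record's SAX events through the stylesheet and returns the transformed XML. */
    static String transformLeader(String leaderValue) {
        Source stylesheet = new StreamSource(new StringReader(XSL));
        StringWriter out = new StringWriter();
        Result result = new StreamResult(out);
        try {
            SAXTransformerFactory f = (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler handler = f.newTransformerHandler(stylesheet);
            handler.setResult(result); // must be set before the first SAX event
            AttributesImpl none = new AttributesImpl();
            handler.startDocument();
            handler.startPrefixMapping("marc", NS);
            handler.startElement(NS, "record", "marc:record", none);
            handler.startElement(NS, "leader", "marc:leader", none);
            handler.characters(leaderValue.toCharArray(), 0, leaderValue.length());
            handler.endElement(NS, "leader", "marc:leader");
            handler.endElement(NS, "record", "marc:record");
            handler.endPrefixMapping("marc");
            handler.endDocument(); // the transformation runs once the input tree is complete
        } catch (TransformerConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```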
It is also possible to write the result into a DOM Node:
    InputStream input = new FileInputStream("input.mrc");
    MarcReader reader = new MarcStreamReader(input);
    DOMResult result = new DOMResult();
    MarcXmlWriter writer = new MarcXmlWriter(result);
    writer.setConverter(new AnselToUnicode());
    while (reader.hasNext()) {
        Record record = reader.next();
        writer.write(record);
    }
    writer.close();
    Document doc = (Document) result.getNode();
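The DOM case relies on a standard JAXP behaviour: a `DOMResult` created without a node receives a new `Document` from the transformer. A minimal sketch with JDK classes only; the class name is hypothetical and the SAX events are hand-built rather than produced by MarcXmlWriter.

```java
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical helper class, for illustration only.
final class DomResultSketch {
    static final String NS = "http://www.loc.gov/MARC21/slim";

    /** Builds a DOM Document from SAX events sent through an identity transform. */
    static Document buildCollectionDocument() {
        DOMResult result = new DOMResult(); // no node supplied: the transformer creates a Document
        try {
            SAXTransformerFactory f = (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler handler = f.newTransformerHandler();
            handler.setResult(result);
            AttributesImpl none = new AttributesImpl();
            handler.startDocument();
            handler.startPrefixMapping("marc", NS);
            handler.startElement(NS, "collection", "marc:collection", none);
            handler.startElement(NS, "record", "marc:record", none);
            handler.endElement(NS, "record", "marc:record");
            handler.endElement(NS, "collection", "marc:collection");
            handler.endPrefixMapping("marc");
            handler.endDocument();
        } catch (TransformerConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
        return (Document) result.getNode();
    }
}
```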
Field Summary

| Modifier and Type | Field |
|---|---|
| protected static String | COLLECTION |
| protected static String | CONTROL_FIELD |
| protected static String | DATA_FIELD |
| protected static String | LEADER |
| protected static String | RECORD |
| protected static String | SUBFIELD |
Constructor Summary

| Constructor | Description |
|---|---|
| MarcXmlWriter(OutputStream out) | Constructs an instance with the specified output stream. |
| MarcXmlWriter(OutputStream out, boolean indent) | Constructs an instance with the specified output stream and indentation. |
| MarcXmlWriter(OutputStream out, String encoding) | Constructs an instance with the specified output stream and character encoding. |
| MarcXmlWriter(OutputStream out, String encoding, boolean indent) | Constructs an instance with the specified output stream, character encoding and indentation. |
| MarcXmlWriter(Result result) | Constructs an instance with the specified result. |
| MarcXmlWriter(Result result, Source stylesheet) | Constructs an instance with the specified stylesheet source and result. |
| MarcXmlWriter(Result result, String stylesheetUrl) | Constructs an instance with the specified stylesheet location and result. |
Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| void | close() | Closes the writer. |
| CharConverter | getConverter() | Returns the character converter. |
| protected char[] | getDataElement(String data) | |
| boolean | getUnicodeNormalization() | Returns true if this writer will perform Unicode normalization, false otherwise. |
| boolean | hasIndent() | Returns true if indentation is active, false otherwise. |
| void | setConverter(CharConverter converter) | Sets the character converter. |
| protected void | setHandler(Result result, Source stylesheet) | |
| void | setIndent(boolean indent) | Activates or deactivates indentation. |
| void | setUnicodeNormalization(boolean normalize) | If set to true, this writer will perform Unicode normalization on data elements using normalization form C (NFC). |
| protected void | toXml(Record record) | |
| void | write(Record record) | Writes a Record object to the result. |
| protected void | writeEndDocument() | Writes the root end tag to the result. |
| protected void | writeStartDocument() | Writes the root start tag to the result. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail
protected static final String CONTROL_FIELD
protected static final String DATA_FIELD
protected static final String SUBFIELD
protected static final String COLLECTION
protected static final String RECORD
protected static final String LEADER
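Constructor Detail

public MarcXmlWriter(OutputStream out)
Constructs an instance with the specified output stream.
Throws: MarcException

public MarcXmlWriter(OutputStream out, boolean indent)
Constructs an instance with the specified output stream and indentation.
Throws: MarcException

public MarcXmlWriter(OutputStream out, String encoding)
Constructs an instance with the specified output stream and character encoding.
Throws: MarcException

public MarcXmlWriter(OutputStream out, String encoding, boolean indent)
Constructs an instance with the specified output stream, character encoding and indentation.
Throws: MarcException

public MarcXmlWriter(Result result)
Constructs an instance with the specified result.
Parameters: result
Throws: SAXException

public MarcXmlWriter(Result result, String stylesheetUrl)
Constructs an instance with the specified stylesheet location and result.
Parameters: result
Throws: SAXException

public MarcXmlWriter(Result result, Source stylesheet)
Constructs an instance with the specified stylesheet source and result.
Parameters: result
Throws: SAXException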
Method Detail

public void close()
Closes the writer.
Specified by: close in interface MarcWriter
public CharConverter getConverter()
Returns the character converter.
Specified by: getConverter in interface MarcWriter
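public void setConverter(CharConverter converter)
Sets the character converter.
Specified by: setConverter in interface MarcWriter
Parameters: converter - the character converter

public void setUnicodeNormalization(boolean normalize)
If set to true, this writer will perform Unicode normalization on data elements using normalization form C (NFC).
Parameters: normalize - true if this writer performs Unicode normalization, false otherwise

public boolean getUnicodeNormalization()
Returns true if this writer will perform Unicode normalization, false otherwise.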
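protected void setHandler(Result result, Source stylesheet) throws MarcException
Throws: MarcException

protected void writeStartDocument()
Writes the root start tag to the result.
Throws: SAXException

protected void writeEndDocument()
Writes the root end tag to the result.
Throws: SAXException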
public void write(Record record)
Writes a Record object to the result.
Specified by: write in interface MarcWriter
Parameters: record - the Record object
Throws: SAXException
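public boolean hasIndent()
Returns true if indentation is active, false otherwise.

public void setIndent(boolean indent)
Activates or deactivates indentation.
Parameters: indent

protected void toXml(Record record) throws SAXException
Throws: SAXException

protected char[] getDataElement(String data)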