http://zorba.io/modules/xml


This module provides functions for parsing XML from string inputs. It can read well-formed XML documents as well as well-formed external parsed entities, as described by XML 1.0 Well-Formed Parsed Entities. The functions can also perform Schema and DTD validation of the input documents.

The following example parses a sequence of XML elements and returns them in a streaming fashion, one at a time:

 import module namespace x = "http://zorba.io/modules/xml";
 import schema namespace opt = "http://zorba.io/modules/xml-options";
 x:parse(
   "<from1>Jani</from1><from2>Jani</from2><from3>Jani</from3>",
   <opt:options>
     <opt:parse-external-parsed-entity/>
   </opt:options>
 )
 

Another useful option lets you skip an arbitrary number of levels before returning a sequence of nodes, as shown in the following example:

 import module namespace x = "http://zorba.io/modules/xml";
 import schema namespace opt = "http://zorba.io/modules/xml-options";
 x:parse(
   "<root>
     <from1>Jani1</from1>
     <from2>Jani2</from2>
     <from3>Jani3</from3>
   </root>",
   <opt:options>
     <opt:parse-external-parsed-entity opt:skip-root-nodes="1"/>
   </opt:options>
 )
 

Function Summary

canonicalize ($xml-string as xs:string) as xs:string

A function to canonicalize the given XML string, that is, transform it into Canonical XML as defined by Canonical XML.

canonicalize ($xml-string as xs:string, $options as element(opt:options)) as xs:string

A function to canonicalize the given XML string, that is, transform it into Canonical XML as defined by Canonical XML.

parse ($xml-string as xs:string?, $options as element(opt:options)?) as node()* external

A function to parse XML files and fragments (i.e. external general parsed entities).

Functions

canonicalize#1

declare  function x:canonicalize($xml-string as xs:string) as xs:string

A function to canonicalize the given XML string, that is, transform it into Canonical XML as defined by Canonical XML.

Note: This function is not streamable. If a streamable string is used as input for the function it will be materialized.

Note: This function sets the XML_PARSE_NOERROR option when parsing the XML input.

Parameters

xml-string as xs:string
a string representation of a well-formed XML document to canonicalize. XML fragments are not allowed.

Returns

xs:string
the canonicalized XML string.
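
A minimal sketch of a call to the one-argument form. The input string is made up for illustration. Canonical XML normalizes details such as attribute order, quoting, and the serialization of empty elements, so the returned string may differ from the input even when both describe the same document:

 import module namespace x = "http://zorba.io/modules/xml";
 (: hypothetical input: single-quoted, unsorted attributes and an empty element :)
 x:canonicalize("<doc b='2' a='1'><empty/></doc>")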

canonicalize#2

declare  function x:canonicalize($xml-string as xs:string, $options as element(opt:options)) as xs:string

A function to canonicalize the given XML string, that is, transform it into Canonical XML as defined by Canonical XML.

This version of the function allows specifying certain options to be used when initially parsing the XML string. These are of the same form as the options to x:parse#2(), although the following options are currently ignored for this function:

  • <opt:no-error/>
  • <opt:base-uri/>
  • <opt:schema-validate/>
  • <opt:parse-external-parsed-entity/>

Note: This function is not streamable. If a streamable string is used as input for the function it will be materialized.

Note: This function sets the XML_PARSE_NOERROR option when parsing the XML input.

Parameters

xml-string as xs:string
a string representation of a well-formed XML document to canonicalize. XML fragments are not allowed.
options as element(opt:options)
an XML element containing options for the canonicalize function.

Returns

xs:string
the canonicalized XML string.
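
A minimal sketch of the two-argument form. The input string and the choice of <opt:remove-redundant-ns/> are illustrative assumptions. The example only shows the shape of the call, with an option that is not in the ignored list above:

 import module namespace x = "http://zorba.io/modules/xml";
 import schema namespace opt = "http://zorba.io/modules/xml-options";
 x:canonicalize(
   (: hypothetical input with a repeated namespace declaration :)
   "<a xmlns:p='urn:x'><p:b xmlns:p='urn:x'/></a>",
   <opt:options>
     <opt:remove-redundant-ns/>
   </opt:options>
 )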

parse#2

declare  function x:parse($xml-string as xs:string?, $options as element(opt:options)?) as node()* external

A function to parse XML files and fragments (i.e. external general parsed entities).

The function takes two arguments: the first is the string to be parsed and the second is an <options/> element that passes a list of options to the parsing function. The options are described below. The options element must conform to the xml-options:options element type from the xml-options.xsd schema. Some of the options are passed to the underlying library (LibXml2), and further documentation for them can be found at LibXml2 parser.

The list of available options:
  • <base-uri/> - the element must have a "value" attribute, which provides the base URI for every node returned by this function.
  • <no-error/> - if present, the option will disable fatal error processing. Any failure to parse or validate the input in the requested manner will result in the function returning an empty sequence, and no error will be raised.
  • <schema-validate/> - if present, it will request that the input string be Schema validated. The element accepts an attribute named "mode" which can have one of two values: "strict" and "lax". Enabling the option will produce a result that is equivalent to processing the input with the option disabled, and then copying the result using the XQuery "validate strict|lax" expression. This option cannot be used together with either the <DTD-validate/> or the <parse-external-parsed-entity/> option. Doing so will raise a zerr:ZXQD0003 error.
  • <DTD-validate/> - the option will enable DTD-based validation. If this option is enabled and the input references a DTD, then the input must be a well-formed and DTD-valid XML document. The <DTD-load/> option must be used for external DTD files to be loaded. If the option is enabled and the input does not reference a DTD, then the option is ignored. If the option is disabled, the input is not required to reference a DTD, and if it does reference a DTD then the DTD is ignored for validation purposes. This option cannot be used together with either the <schema-validate/> or the <parse-external-parsed-entity/> option. Doing so will raise a zerr:ZXQD0003 error.
  • <DTD-load/> - if present, it will enable loading of external DTD files.
  • <default-DTD-attributes/> - if present, it will enable the default DTD attributes.
  • <parse-external-parsed-entity/> - if present, it will enable the processing of XML external parsed entities. If the option is enabled, the input must conform to the syntax extParsedEnt (production [78] in XML 1.0, see Well-Formed Parsed Entities). In addition, by default a DOCTYPE declaration is allowed, as described by the doctypedecl production (production [28], see Document Type Definition). A parameter is available to forbid the appearance of the DOCTYPE. The result of the function call is a list of nodes corresponding to the top-level components of the content of the external entity: that is, elements, processing instructions, comments, and text nodes. CDATA sections and character references are expanded, and adjacent characters are merged so the result contains no adjacent text nodes. If the option is disabled, the input must be a well-formed XML document conforming to the document production (production [1] in XML 1.0). This option cannot be used together with either the <schema-validate/> or the <DTD-validate/> option. Doing so will raise a zerr:ZXQD0003 error. The <parse-external-parsed-entity/> option has three parameters, given by attributes:
      • "skip-root-nodes" - takes a non-negative value. Specifying the parameter tells the parser to skip the given number of root nodes and return only their children. E.g. skip-root-nodes="1" is equivalent to parse-xml($xml-string)/node()/node(), skip-root-nodes="2" is equivalent to parse-xml($xml-string)/node()/node()/node(), etc.
      • "skip-top-level-text-nodes" - takes a boolean value. Specifying "true" tells the parser to skip top-level text nodes, returning only the top-level elements, comments, PIs, etc. This parameter works in combination with the "skip-root-nodes" parameter: top-level text nodes are skipped after "skip-root-nodes" has been applied.
      • "error-on-doctype" - generates an error if a DOCTYPE declaration appears in the input, which by default is allowed.
  • <substitute-entities/> - if present, it will enable XML entity substitution.
  • <remove-redundant-ns/> - if present, the parser will remove redundant namespace declarations.
  • <no-CDATA/> - if present, the parser will merge CDATA nodes into text nodes.
  • <xinclude-substitutions/> - if present, it will enable the XInclude substitutions.
  • <no-xinclude-nodes/> - if present, the parser will not generate XInclude START/END nodes.
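
The two skip parameters of <opt:parse-external-parsed-entity/> can be combined. The following is a minimal sketch, assuming that "skip-top-level-text-nodes" is written with the opt: prefix in the same way as opt:skip-root-nodes; the input string is illustrative. The root element is skipped first, and the whitespace text nodes between its children are then dropped:

   import module namespace x = "http://zorba.io/modules/xml";
   import schema namespace opt = "http://zorba.io/modules/xml-options";
   x:parse(
     "<root>
       <from1>Jani1</from1>
       <from2>Jani2</from2>
     </root>",
     <opt:options>
       (: assumption: both attributes are qualified with the opt: prefix :)
       <opt:parse-external-parsed-entity
         opt:skip-root-nodes="1"
         opt:skip-top-level-text-nodes="true"/>
     </opt:options>
   )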

An example that sets the base-uri of the parsed external entities:

   import module namespace x = "http://zorba.io/modules/xml";
   import schema namespace opt = "http://zorba.io/modules/xml-options";
   x:parse("<from1>Jani</from1><from2>Jani</from2><from3>Jani</from3>",
     <opt:options>
       <opt:base-uri opt:value="urn:test"/>
       <opt:parse-external-parsed-entity/>
     </opt:options>
   )
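
A minimal sketch of the <opt:no-error/> option, using a deliberately malformed, made-up input. According to the option's description, a parse failure should then yield an empty sequence instead of raising an error, which the query checks with fn:empty:

   import module namespace x = "http://zorba.io/modules/xml";
   import schema namespace opt = "http://zorba.io/modules/xml-options";
   (: hypothetical malformed input: the element is never closed :)
   let $result := x:parse(
       "<unclosed>",
       <opt:options>
         <opt:no-error/>
       </opt:options>
     )
   return empty($result)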
 

Parameters

xml-string as xs:string?
The string that holds the XML to be parsed. If empty, the function will return an empty sequence.
options as element(opt:options)?
The options for the parsing.

Returns

node()*
The parsed XML as a document node or a list of nodes, or an empty sequence.
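
A minimal sketch for inspecting the shape of the result. It assumes that passing the empty sequence as $options, which the signature permits, parses the input as a complete document with default settings. That behavior is inferred from the signature and is not confirmed here. The query tests whether a document node came back rather than a list of top-level nodes:

   import module namespace x = "http://zorba.io/modules/xml";
   (: assumption: () for $options selects the default parsing behavior :)
   let $result := x:parse("<root><child/></root>", ())
   return $result instance of document-node()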