http://zorba.io/modules/data-cleaning/normalization


This library module provides data normalization functions for processing calendar dates, temporal values, currency values, units of measurement, location names and postal addresses. These functions are particularly useful for converting different data representations into canonical formats.

The logic contained in this module is not specific to any particular XQuery implementation.

Function Summary

normalize-address ($addr as xs:string*) as xs:string*

Uses an address normalization Web service to convert a postal address given as input into a canonical representation format.

normalize-phone ($addr as xs:string*) as xs:string*

Uses a phone number normalization Web service to convert a phone number given as input into a canonical representation.

to-date ($sd as xs:string, $format as xs:string?) as xs:string

Converts a given string representation of a date value into a date representation valid according to the corresponding XML Schema type.

to-dateTime ($sd as xs:string, $format as xs:string?) as xs:string

Converts a given string representation of a dateTime value into a dateTime representation valid according to the corresponding XML Schema type.

to-time ($sd as xs:string, $format as xs:string?) as xs:string?

Converts a given string representation of a time value into a time representation valid according to the corresponding XML Schema type.

Functions

normalize-address#1

declare  %an:nondeterministic function normalization:normalize-address($addr as xs:string*) as xs:string*

Uses an address normalization Web service to convert a postal address given as input into a canonical representation format.

Parameters

addr as xs:string*
A sequence of strings encoding an address, where each string in the sequence corresponds to a different component (e.g., street, city, country) of the address.

Returns

xs:string*
A sequence of strings with the address encoded in a canonical format, where each string in the sequence corresponds to a different component (e.g., street, city, country) of the address.
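
As a minimal usage sketch, the query below imports the module under the namespace URI shown at the top of this page and passes an address as a sequence of component strings. The address values are illustrative assumptions, not data from this documentation. Because the function is declared %an:nondeterministic and relies on a Web service, the call presumably needs network access and the result may vary between invocations.

```xquery
import module namespace normalization =
  "http://zorba.io/modules/data-cleaning/normalization";

(: Illustrative input: one string per address component. :)
let $address := ("Marques de Pombal", "Lisboa")
(: The result is again a sequence with one string per component. :)
return normalization:normalize-address($address)
```

The number and order of the returned components are determined by the Web service and are not specified here, so callers should avoid relying on fixed positions in the result.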

normalize-phone#1

declare  function normalization:normalize-phone($addr as xs:string*) as xs:string*

Uses a phone number normalization Web service to convert a phone number given as input into a canonical representation.

Parameters

addr as xs:string*

Returns

xs:string*
A string with the phone number encoded in a canonical format.

Attention: This function is not yet implemented.

to-date#2

declare  function normalization:to-date($sd as xs:string, $format as xs:string?) as xs:string

Converts a given string representation of a date value into a date representation valid according to the corresponding XML Schema type.

Parameters

sd as xs:string
The string representation of the date.
format as xs:string?
An optional parameter denoting the format used to represent the date in the string, according to a sequence of conversion specifications. In the format string, a conversion specification is introduced by '%', usually followed by a single letter or 'O' or 'E' and then a single letter. Any character in the format string that is not part of a conversion specification is interpreted literally, and the string '%%' gives '%'. The supported conversion specifications are as follows:
'%b' Abbreviated month name in the current locale.
'%B' Full month name in the current locale.
'%d' Day of the month as decimal number (01-31).
'%m' Month as decimal number (01-12).
'%x' Date, locale-specific.
'%y' Year without century (00-99).
'%Y' Year with century.
'%C' Century (00-99): the integer part of the year divided by 100.
'%D' Locale-specific date format such as '%m/%d/%y'.
'%e' Day of the month as decimal number (1-31), with a leading space for a single-digit number.
'%F' Equivalent to '%Y-%m-%d' (the ISO 8601 date format).
'%h' Equivalent to '%b'.

Returns

xs:string
The date value resulting from the conversion.
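
The sketch below shows how a format string built from the conversion specifications above might be used. It assumes a Zorba processor with this module available. The input string and the format '%d/%m/%Y' are illustrative choices, not taken from this documentation. Only numeric specifications are used so that the example does not depend on the current locale.

```xquery
import module namespace normalization =
  "http://zorba.io/modules/data-cleaning/normalization";

(: '%d', '%m' and '%Y' match day, month and four-digit year;
   the '/' characters in the format are matched literally. :)
normalization:to-date("24/10/2002", "%d/%m/%Y")
```

If the conversion behaves as described, the returned string should be in the lexical form of the XML Schema date type (YYYY-MM-DD).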

to-dateTime#2

declare  function normalization:to-dateTime($sd as xs:string, $format as xs:string?) as xs:string

Converts a given string representation of a dateTime value into a dateTime representation valid according to the corresponding XML Schema type.

Parameters

sd as xs:string
The string representation for the dateTime.
format as xs:string?
An optional parameter denoting the format used to represent the dateTime in the string, according to a sequence of conversion specifications. In the format string, a conversion specification is introduced by '%', usually followed by a single letter or 'O' or 'E' and then a single letter. Any character in the format string that is not part of a conversion specification is interpreted literally, and the string '%%' gives '%'. The supported conversion specifications are as follows:

'%b' Abbreviated month name in the current locale.
'%B' Full month name in the current locale.
'%c' Date and time, locale-specific.
'%C' Century (00-99): the integer part of the year divided by 100.
'%d' Day of the month as decimal number (01-31).
'%H' Hours as decimal number (00-23).
'%I' Hours as decimal number (01-12).
'%j' Day of year as decimal number (001-366).
'%m' Month as decimal number (01-12).
'%M' Minute as decimal number (00-59).
'%p' AM/PM indicator in the locale. Used in conjunction with '%I' and *not* with '%H'.
'%S' Second as decimal number (00-61), allowing for up to two leap-seconds.
'%x' Date, locale-specific.
'%X' Time, locale-specific.
'%y' Year without century (00-99).
'%Y' Year with century.
'%z' Offset from Greenwich, so '-0900' is 9 hours west of Greenwich.
'%Z' Time zone as a character string.
'%D' Locale-specific date format such as '%m/%d/%y': ISO C99 says it should be that exact format.
'%e' Day of the month as decimal number (1-31), with a leading space for a single-digit number.
'%F' Equivalent to '%Y-%m-%d' (the ISO 8601 date format).
'%g' The last two digits of the week-based year (see '%V').
'%G' The week-based year (see '%V') as a decimal number.
'%h' Equivalent to '%b'.
'%k' The 24-hour clock time with single digits preceded by a blank.
'%l' The 12-hour clock time with single digits preceded by a blank.
'%r' The 12-hour clock time (using the locale's AM or PM).
'%R' Equivalent to '%H:%M'.
'%T' Equivalent to '%H:%M:%S'.

Returns

xs:string
The dateTime value resulting from the conversion.
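
As a hedged illustration of combining date and time specifications in one format string, the query below parses a day/month/year date followed by a 24-hour time. The input value and format are assumptions made for this example, and the query presumes a Zorba processor with this module available.

```xquery
import module namespace normalization =
  "http://zorba.io/modules/data-cleaning/normalization";

(: Date part: '%d/%m/%Y'. Time part: '%H:%M:%S' (24-hour clock).
   The space and punctuation in the format are matched literally. :)
normalization:to-dateTime("24/10/2002 21:22:05", "%d/%m/%Y %H:%M:%S")
```

If the conversion behaves as described, the returned string should be in the lexical form of the XML Schema dateTime type (YYYY-MM-DDThh:mm:ss).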

to-time#2

declare  function normalization:to-time($sd as xs:string, $format as xs:string?) as xs:string?

Converts a given string representation of a time value into a time representation valid according to the corresponding XML Schema type.

Parameters

sd as xs:string
The string representation for the time.
format as xs:string?
An optional parameter denoting the format used to represent the time in the string, according to a sequence of conversion specifications. In the format string, a conversion specification is introduced by '%', usually followed by a single letter or 'O' or 'E' and then a single letter. Any character in the format string that is not part of a conversion specification is interpreted literally, and the string '%%' gives '%'. The supported conversion specifications are as follows:

'%H' Hours as decimal number (00-23).
'%I' Hours as decimal number (01-12).
'%M' Minute as decimal number (00-59).
'%p' AM/PM indicator in the locale. Used in conjunction with '%I' and *not* with '%H'.
'%S' Second as decimal number (00-61), allowing for up to two leap-seconds.
'%X' Time, locale-specific.
'%z' Offset from Greenwich, so '-0900' is 9 hours west of Greenwich.
'%Z' Time zone as a character string.
'%k' The 24-hour clock time with single digits preceded by a blank.
'%l' The 12-hour clock time with single digits preceded by a blank.
'%r' The 12-hour clock time (using the locale's AM or PM).
'%R' Equivalent to '%H:%M'.
'%T' Equivalent to '%H:%M:%S'.

Returns

xs:string?
The time value resulting from the conversion.
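
A small sketch of calling to-time follows. The input string, the format, and the fallback text are illustrative assumptions. The declared return type is xs:string?, so the query guards against an empty result. This documentation does not state under which conditions an empty sequence is returned, so the guard is a precaution rather than documented behaviour.

```xquery
import module namespace normalization =
  "http://zorba.io/modules/data-cleaning/normalization";

(: '%H', '%M' and '%S' match the 24-hour clock fields;
   the '.' separators in the format are matched literally. :)
let $time := normalization:to-time("21.22.05", "%H.%M.%S")
return
  if (empty($time))
  then "no time value returned"  (: hypothetical fallback text :)
  else $time
```

If the conversion behaves as described, a non-empty result should be in the lexical form of the XML Schema time type (hh:mm:ss).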