This library module provides similarity functions for comparing sets of XML nodes (e.g., elements or attributes) or of atomic values.
These functions are particularly useful for matching near-duplicate sets of XML nodes.
The logic contained in this module is not specific to any particular XQuery implementation.
Returns the union of two sets, using the deep-equal() function to compare their items, so that each distinct item is returned only once.
Example usage:
deep-union ( ( "a", "b", "c") , ( "a", "a",) )
The function invocation in the example above returns:
("a", "b", "c",)
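The deduplicating union can be sketched in Python as follows. This is a minimal illustration of the documented behaviour, not the module's XQuery code: the hypothetical `deep_equal` helper only approximates XQuery's deep-equal() for `xml.etree.ElementTree` elements (tag, attributes, text, and children in order, including the text that follows each child) and falls back to `==` for atomic values such as strings.

```python
import xml.etree.ElementTree as ET
from typing import Any, List


def deep_equal(x: Any, y: Any) -> bool:
    """Approximation of XQuery deep-equal() (hypothetical helper).

    Elements match when tag, attributes, text and children (including the
    text that follows each child) match; other values are compared with ==.
    """
    x_is_elem = isinstance(x, ET.Element)
    y_is_elem = isinstance(y, ET.Element)
    if x_is_elem and y_is_elem:
        return (
            x.tag == y.tag
            and x.attrib == y.attrib
            and (x.text or "") == (y.text or "")
            and len(x) == len(y)
            and all(
                deep_equal(cx, cy) and (cx.tail or "") == (cy.tail or "")
                for cx, cy in zip(x, y)
            )
        )
    if x_is_elem or y_is_elem:
        # An element never equals an atomic value.
        return False
    return x == y


def deep_union(s1: List[Any], s2: List[Any]) -> List[Any]:
    """Union of two sequences treated as sets.

    Items are taken in order from s1 followed by s2; an item is kept only
    if it is not deep-equal to an item that was already kept.
    """
    result: List[Any] = []
    for item in [*s1, *s2]:
        if not any(deep_equal(item, kept) for kept in result):
            result.append(item)
    return result


# Two separately parsed but structurally identical elements collapse to one.
first = ET.fromstring('<x y="1">t</x>')
second = ET.fromstring('<x y="1">t</x>')
merged = deep_union(["a", "b"], ["b", first, second])
print(len(merged))  # 3
```

A plain `set` would not work here: ElementTree elements hash and compare by identity, so `first` and `second` would both survive even though they are structurally identical.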
Returns the intersection of two sets, using the deep-equal() function to compare their items, so that each shared item is returned only once.
Example usage:
deep-intersect ( ( "a", "b", "c") , ( "a", "a",) )
The function invocation in the example above returns:
("a")
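A Python sketch of the same behaviour, under the simplifying assumption that the items are atomic values such as strings. Python's `in` operator (which uses `==`) stands in for deep-equal() here; that would not be enough for XML elements, because ElementTree elements compare by identity. The name `deep_intersect` is illustrative only.

```python
from typing import Any, List


def deep_intersect(s1: List[Any], s2: List[Any]) -> List[Any]:
    """Items of s1 that also occur in s2, each returned once.

    Order follows the first occurrence in s1. `in` (i.e. ==) is used as a
    stand-in for deep-equal(), which is adequate for atomic values only.
    """
    result: List[Any] = []
    for item in s1:
        # Keep the item if s2 contains it and it has not been emitted yet.
        if item in s2 and item not in result:
            result.append(item)
    return result


print(deep_intersect(["a", "b", "c"], ["a", "a"]))  # ['a']
```

Duplicates in either input do not multiply the result: `"a"` appears twice in the second sequence but once in the output.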
Removes exact duplicates from a set, using the deep-equal() function to compare its items.
Example usage:
distinct ( ( "a", "a", ) )
The function invocation in the example above returns:
("a", )
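Because deep equality is structural, duplicate removal is most naturally done with pairwise comparisons rather than hashing. The Python sketch below keeps the first occurrence of each item and takes the equality predicate as a parameter; the default `operator.eq` is an assumed stand-in for deep-equal() that is adequate for atomic values and nested lists, and a structural element comparison could be passed instead.

```python
import operator
from typing import Any, Callable, List


def distinct(
    items: List[Any], same: Callable[[Any, Any], bool] = operator.eq
) -> List[Any]:
    """Return items with duplicates removed, keeping first occurrences.

    `same` is the equality predicate; operator.eq is a stand-in for
    deep-equal() that is adequate for atomic values and nested lists.
    """
    result: List[Any] = []
    for item in items:
        # Pairwise check: works even for unhashable items such as lists.
        if not any(same(item, kept) for kept in result):
            result.append(item)
    return result


print(distinct(["a", "a"]))                   # ['a']
print(distinct([["x", 1], ["x", 1], ["y"]]))  # [['x', 1], ['y']]
```

The pairwise scan is quadratic in the number of items. For hashable atomic values only, `list(dict.fromkeys(items))` gives the same result in linear time.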
Returns the overlap coefficient between two sets of XML nodes.
The overlap coefficient is defined as the shared information between the input sets (i.e., the size of their intersection) divided by the size of the smaller input set.
Example usage:
overlap ( ( "a", "b",) , ( "a", "a", "b" ) )
The function invocation in the example above returns:
1.0
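The overlap formula can be written out in Python as below. This sketch assumes that each input is first reduced to a set (duplicates removed, with `==` standing in for deep-equal()), so "size" means the number of distinct items, and that an empty input yields 0.0; the text covers neither point, so both are assumptions.

```python
from typing import Any, List


def _distinct(items: List[Any]) -> List[Any]:
    """Drop duplicates, keeping first occurrences (== stands in for deep-equal())."""
    result: List[Any] = []
    for item in items:
        if item not in result:
            result.append(item)
    return result


def overlap(s1: List[Any], s2: List[Any]) -> float:
    """|A ∩ B| / min(|A|, |B|) over the distinct items of each input."""
    a, b = _distinct(s1), _distinct(s2)
    if not a or not b:
        return 0.0  # assumption: an empty input gives no overlap
    shared = sum(1 for item in a if item in b)
    return shared / min(len(a), len(b))


print(overlap(["a", "b"], ["a", "a", "b"]))       # 1.0
print(overlap(["a", "b", "c"], ["b", "c", "d"]))  # 0.6666666666666666
```

Because the denominator is the smaller set, the coefficient reaches 1.0 whenever one set is contained in the other, as in the first call.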
Returns the Dice similarity coefficient between two sets of XML nodes.
The Dice coefficient is defined as twice the shared information between the input sets (i.e., twice the size of their intersection) divided by the sum of the cardinalities of the input sets.
Example usage:
dice ( ( "a", "b",) , ( "a", "a", "d") )
The function invocation in the example above returns:
0.4
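A Python sketch of the Dice formula, under the same assumptions as a set-based reading of the definition: cardinality is the number of distinct items in each input (with `==` standing in for deep-equal()), and two empty inputs yield 0.0, a case the text does not cover.

```python
from typing import Any, List


def _distinct(items: List[Any]) -> List[Any]:
    """Drop duplicates, keeping first occurrences (== stands in for deep-equal())."""
    result: List[Any] = []
    for item in items:
        if item not in result:
            result.append(item)
    return result


def dice(s1: List[Any], s2: List[Any]) -> float:
    """2 * |A ∩ B| / (|A| + |B|) over the distinct items of each input."""
    a, b = _distinct(s1), _distinct(s2)
    if not a and not b:
        return 0.0  # assumption: two empty sets get 0.0
    shared = sum(1 for item in a if item in b)
    return 2 * shared / (len(a) + len(b))


print(dice(["a", "b", "c"], ["b", "c", "d"]))  # 0.6666666666666666
```

Doubling the intersection keeps the result in the range 0.0 to 1.0: identical sets give 2|A| / 2|A| = 1.0.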
Returns the Jaccard similarity coefficient between two sets of XML nodes.
The Jaccard coefficient is defined as the size of the intersection divided by the size of the union of the input sets.
Example usage:
jaccard ( ( "a", "b",) , ( "a", "a", "d") )
The function invocation in the example above returns:
0.25
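A Python sketch of the Jaccard formula, assuming as before that each input is treated as a set of distinct items (with `==` standing in for deep-equal()) and that two empty inputs yield 0.0; both are assumptions, not statements about the module.

```python
from typing import Any, List


def _distinct(items: List[Any]) -> List[Any]:
    """Drop duplicates, keeping first occurrences (== stands in for deep-equal())."""
    result: List[Any] = []
    for item in items:
        if item not in result:
            result.append(item)
    return result


def jaccard(s1: List[Any], s2: List[Any]) -> float:
    """|A ∩ B| / |A ∪ B| over the distinct items of the inputs."""
    a, b = _distinct(s1), _distinct(s2)
    union = _distinct(a + b)
    if not union:
        return 0.0  # assumption: two empty sets get 0.0
    shared = sum(1 for item in a if item in b)
    return shared / len(union)


print(jaccard(["a", "b", "c"], ["b", "c", "d"]))  # 0.5
```

For the same two sets, the Dice and Jaccard coefficients are related by dice = 2 * jaccard / (1 + jaccard): here 2 * 0.5 / 1.5 gives 2/3, the Dice coefficient of these sets. The values in the Dice and Jaccard examples above satisfy the same relation (2 * 0.25 / 1.25 = 0.4).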