XML: The What, Why, Where and Hows Explained by a Newbie

If you’re looking for an in-depth, super-technical, ultra-descriptive piece about what XML is and what it does then (if I were you) I’d probably keep searching. However, if (like me) you’ve just found out that XML is not just a representation of someone who doesn’t know their alphabet, then let me enlighten you on the very basics.

Firstly, XML stands for eXtensible Markup Language. The key word there is ‘Extensible’. This means it is a markup language that can be extended by its users and still be usable by the application that is displaying it. Okay, so I’m getting ahead of myself. I should explain that XML doesn’t actually do anything. It is just information surrounded by tags. Software is needed to store, display, send and receive it.

So you’re probably asking “If it doesn’t do anything, then what’s the point of it?” Well, let me explain. One of XML’s main features is that it is self-descriptive, meaning it can be read by both humans and machines. Many systems today hold data in incompatible formats, so large amounts of data need converting, which is time-consuming and often loses some data along the way. Because XML stores data in a plain text format, new, old and upgraded systems can all read the same information, which makes conversion quicker and less likely to lose data.

So what does this magical XML look like? Below is an example of some simple XML code:

XML Example:

<email>
   <date> 26/07/2012 </date>
   <time> 17:07 </time>
   <from> Janet </from>
   <to> James </to>
   <subject> Just saying hi </subject>
   <body> Hi James, had a lovely chat with you today. You must come over soon. </body>
</email>

From the example above, it is easy to see that this is an email addressed to James from Janet, sent on the 26th July 2012 at 5:07pm. The subject and body of the email are also included. As you can see, the code is made up of distinct parts: each element has an opening tag, some content and a matching closing tag.


As in HTML, attributes can also be used in XML (see below).

<person gender="female">
   <name> Janet Smith </name>
   <age> 43 </age>
</person>

Hopefully, you could work out that the text above describes Janet Smith, giving information about her gender and age. Here, however, the gender is shown as an attribute. Attributes are designed to contain data related to a specific element. Attribute values must be quoted, with either double or single quotes.

<person gender="female">
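To make the “software must be used” point concrete, here is a minimal sketch in Python (my choice of language; nothing in XML requires it) that reads the person example with the standard library’s xml.etree.ElementTree module. The function name describe_person is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The person example from above, held in a Python string.
PERSON_XML = """
<person gender="female">
   <name> Janet Smith </name>
   <age> 43 </age>
</person>
"""


def describe_person(xml_text):
    """Return the gender attribute plus the name and age element values."""
    person = ET.fromstring(xml_text.strip())
    return {
        "gender": person.get("gender"),           # attribute on <person>
        "name": person.findtext("name").strip(),  # text inside <name>
        "age": int(person.findtext("age")),       # text inside <age>, as a number
    }


if __name__ == "__main__":
    print(describe_person(PERSON_XML))
```

The attribute is read with get(), while the child elements are read with findtext(); that difference in access is the practical consequence of choosing an attribute over an element.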

Unlike HTML, which relies on predefined tags, XML’s tags are solely created by the user. However, these two markup languages are often in partnership. Put simply: XML stores and transports the data, while HTML formats and presents it.

The Rules of XML

Like all things, though, XML has rules. These are known as the syntax rules. Let’s go through three of the most common:

  1. All XML documents must contain a root element which is the parent of all the other elements.

    <root>
       <child>
          <subchild>………..</subchild>
       </child>
    </root>

  2. All XML elements must have a closing tag.

    <yes> Am I doing it right </yes>

    <no> Am I doing it right

  3. XML tags are case sensitive. So the tag <email> is different to <Email> and these are both different to <EMAIL>. Opening and closing tags must be the same.

    <yes> Am I doing it right </yes>

    <Yes> Am I doing it right </Yes>

    <no> Am I doing it right </No>

Of course there are many more, but like the above they are all pretty simple. 
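If you would rather let a machine check these rules, a parser will do it for you. The following is a small sketch using Python’s standard xml.etree.ElementTree module; the helper name is_well_formed is my own, and the test strings are the yes/no examples from the rules above plus one case with two root elements.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it raises ParseError."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


EXAMPLES = {
    "closed tag": "<yes> Am I doing it right </yes>",
    "missing closing tag": "<no> Am I doing it right",
    "mismatched case": "<no> Am I doing it right </No>",
    "two root elements": "<a>one</a><b>two</b>",
}

if __name__ == "__main__":
    for label, text in EXAMPLES.items():
        print(f"{label}: {is_well_formed(text)}")
```

Only the first example passes; the other three each break one of the rules listed above.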

However, as mentioned, XML’s tags and elements are made up by the user. Because of this, it can lack structure, and it may be hard to find software that translates the code into the format one wishes. Content models such as DocBook help with this. DocBook is a collection of standards and tools for technical publishing. Originally created by software companies as a standard for computer documentation, it can now be used for other kinds of content and has been adapted for many purposes. DocBook provides a number of tags that allow the user to easily publish documents in other forms such as PDF and HTML. Other content models include DITA (Darwin Information Typing Architecture), S1000D, XBRL and more.

All in all, XML may seem a bit complex to understand, but it is being used in many different areas for its simplicity. It chooses brains over beauty, focusing more on the logical information than on how it is formatted. Therefore, it is used in various professions and industries, including finance, publishing, medicine, science and many more.


XML In Smart Cities? Really? How?

I can imagine you have heard the term ‘smart city’ thrown around in the media a lot recently, but what on earth does this term mean? Well, from what I can establish there is no universally accepted definition of this term… I know, helpful right? Let’s have a stab at it.

Dr Sam Musa, a faculty member of the Computer Networks and Cybersecurity Department at the University of Maryland suggests a smart city is a city which “integrates multiple technological solutions in a secure fashion to manage the city’s assets – the city’s assets include, but are not limited to, local department information systems, schools, libraries, transportation systems, hospitals, power plants, law enforcement, and other community services” (1). Boy, that’s a lot of data!

I know what you are thinking: what is this guy rambling on about? So let me put it into context… After reading the article “Smart cities will be necessary for our survival” by Madhumita Venkataramanan (2), and bearing in mind the type of company I work for and the industry it operates in (XML), I began asking the question: could XML be used to manage the data produced by smart cities?

So after a bit of digging, the short answer is: yes, but it’s a bit complicated to explain as you can imagine. So if I haven’t bored you already I would encourage you to jump ship while you can! You have been warned!

Sounds a bit daunting right?… And it can be, but let’s not get tangled up in the details!

To start let’s look at XML at a high level, described perfectly by my colleague Martin Bluck, “XML is a language used to describe the structure of data inside a document”. The key word here is structure. So when we look at big data, structure is critical to ensure data is easily understood, read and managed by not only computers but also humans. XML allows you to achieve this (3,4). This isn’t the only reason Smart Cities and big data should use XML but it’s the most obvious, and to avoid boring you to tears I shall not delve any deeper than that. If you want to find out more check out the origins section of the XML specification from W3C.

How does this relate to Smart Cities then, you may ask? So let’s talk about network data! I’m fairly certain I heard someone’s head hit a desk… don’t fall asleep on me yet. Let’s try something else… In Madhumita’s article she describes a scenario which is actually happening in Singapore where local government has installed over 1,000 sensors with the purpose of monitoring the cleanliness of public spaces. As you can imagine the data collected would be vast and highly complex and in many normal circumstances, indecipherable. However, if the data collected was encoded in XML, a review of that data could quickly identify key structural components of the data and thus identify the data itself.

Let me attempt to give you an example of what this would look like, keeping in mind I am not a technical author of XML:

<Sensor_Location name="City-Marina Area">
   <TimeFrame>
     <Date>07/04/2016</Date>
     <Time>14:00-15:00</Time>
   </TimeFrame>
   <Environmental_Data>
     <CO2>0.1</CO2>
     <NOx>0.2</NOx>
   </Environmental_Data>
   <Vehicle_Counted>
     <Cars>10</Cars>
     <Light_Goods_Vehicle>50</Light_Goods_Vehicle>
   </Vehicle_Counted>
</Sensor_Location>

This is what the structure could look like. Would you agree, this is fairly easy to understand? You can easily see the structural and substructural  components and most importantly the data. This makes it very easy for someone to analyse this data and also ensure complex data can be processed by machines quickly.
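To show the “processed by machines quickly” part, here is a rough sketch in Python using the standard xml.etree.ElementTree module. It works on a well-formed variant of the sensor sample, with the element names tidied slightly and the location held in a name attribute; both are my assumptions about the intent, not a real sensor schema. The function name summarise is also invented for the example.

```python
import xml.etree.ElementTree as ET

# A well-formed variant of the sensor sample (names and layout are assumptions).
SENSOR_XML = """
<Sensor_Location name="City-Marina Area">
   <TimeFrame>
     <Date>07/04/2016</Date>
     <Time>14:00-15:00</Time>
   </TimeFrame>
   <Environmental_Data>
     <CO2>0.1</CO2>
     <NOx>0.2</NOx>
   </Environmental_Data>
   <Vehicle_Counted>
     <Cars>10</Cars>
     <Light_Goods_Vehicle>50</Light_Goods_Vehicle>
   </Vehicle_Counted>
</Sensor_Location>
"""


def summarise(xml_text):
    """Pull the location, date, readings and total vehicle count out of one record."""
    root = ET.fromstring(xml_text.strip())
    # Iterating over an element yields its child elements in document order.
    readings = {el.tag: float(el.text) for el in root.find("Environmental_Data")}
    vehicles = {el.tag: int(el.text) for el in root.find("Vehicle_Counted")}
    return {
        "location": root.get("name"),
        "date": root.findtext("TimeFrame/Date"),
        "readings": readings,
        "total_vehicles": sum(vehicles.values()),
    }


if __name__ == "__main__":
    print(summarise(SENSOR_XML))
```

Because the tags name each value, the code can ask for “Environmental_Data” or “Vehicle_Counted” directly instead of guessing at column positions.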

This is merely a simple example, but in my experience there are many others across a variety of sectors, and in time I imagine more organisations will see the benefit of this format. They will see the value, in terms of both time and cost savings, of using XML when processing, managing and reviewing big data.

I would love to hear your thoughts, comments and criticisms for this article in the comments below!


References

(1) Dr Sam Musa (January 2016) Smart City Road Map. University of Maryland. http://www.academia.edu/21181336/Smart_City_Roadmap

(2) Madhumita Venkataramanan (11 January 2016) Smart cities will be necessary for our survival. Wired. http://www.wired.co.uk/news/archive/2016-01/11/smart-city-planning-permission

(3) Sarah O’Keefe (11 January 2016) Top eight signs it’s time to move to XML. Scriptorium. http://www.scriptorium.com/2016/01/top-eight-signs-its-time-to-move-to-xml/

(4) W3C (26 November 2008) Extensible Markup Language (XML) 1.0 (Fifth Edition). W3C. https://www.w3.org/TR/REC-xml/#sec-origin-goals

 

An XSLT linear feedback shift register

The Linear Feedback Shift Register (LFSR) provides a simple yet very versatile solution for the generation of pseudo-random sequences of bits.

This concept can be efficiently emulated using XSLT 2.0 or 3.0. The sample shown here uses XSLT 3.0 because it is quite neat to use a closure to maintain the state of the function that manipulates the shift-register.

The XSLT snippet below shows the low-level emulation of the shift-register. This takes a boolean sequence as an input argument, which is then shifted right by a single ‘bit’. The result of an XOR operation on the output bit (the right-most bit) and the bits at 2 tap points is then fed back to the left-most bit of the ‘register’.

<xsl:function name="fn:shift-sequence" as="xs:boolean*">
    <xsl:param name="sequence" as="xs:boolean+"/>
    
    <xsl:variable name="output-bit" as="xs:boolean" select="$sequence[last()]"/>
    <xsl:variable name="bit1" as="xs:boolean" select="$sequence[$tap-point-1]"/>
    <xsl:variable name="bit2" as="xs:boolean" select="$sequence[$tap-point-2]"/>
    <xsl:variable name="new-bit" as="xs:boolean" 
      select="fn:xor($bit2, fn:xor($output-bit, $bit1))"/>
    
    <xsl:sequence select="
      ($new-bit,
      subsequence($sequence, 1, count($sequence) - 1)
      )"/>
    
  </xsl:function>
  
  <xsl:function name="fn:xor" as="xs:boolean">
    <xsl:param name="operand1"/>
    <xsl:param name="operand2"/>
    <xsl:sequence select="($operand1 and not($operand2)) or (not($operand1) and $operand2)"/>
  </xsl:function>
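For readers who find the shift easier to follow outside XSLT, here is a rough Python equivalent of fn:shift-sequence. It is only a sketch: the bits are held as a list of 0/1 integers rather than booleans, and the default tap points (15 and 14) and the seed used in the demo are the values declared in the complete stylesheet in this post. The helper names from_string and to_string are my own.

```python
def shift_sequence(bits, tap_point_1=15, tap_point_2=14):
    """Perform one shift-register step on a list of 0/1 values.

    Tap points are 1-based, as in the XPath original, so 1 is subtracted
    before indexing the Python list.
    """
    output_bit = bits[-1]                 # right-most bit
    bit1 = bits[tap_point_1 - 1]
    bit2 = bits[tap_point_2 - 1]
    new_bit = bit2 ^ (output_bit ^ bit1)  # XOR feedback
    # New bit enters on the left; the old right-most bit falls off the end.
    return [new_bit] + bits[:-1]


def from_string(text):
    """Convert a string such as '101' to [1, 0, 1]."""
    return [int(ch) for ch in text]


def to_string(bits):
    """Convert [1, 0, 1] back to '101'."""
    return "".join(str(b) for b in bits)


if __name__ == "__main__":
    seed = from_string("1011101101010111")
    print(to_string(shift_sequence(seed)))
```

With that seed, the three bits that feed the XOR are all 1, so the new left-most bit is 1 and the register becomes 1101110110101011.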

Now that we have this low-level function, we need a way to maintain the state of the shift-register for each subsequent bit in the generated sequence. The approach I’ve taken is to use the following function that takes a bit sequence seed as an argument and returns an updated version of itself, along with the current output bit:

<!-- 
    creates a function that will generate:
    1. a new generator-function with modified sequence as seed
    2. the last bit of the modified sequence as the output bit
  -->
  <xsl:function name="fn:generator-function" as="function(*)">
    <xsl:param name="seed" as="xs:boolean*"/>
    <xsl:variable name="new-sequence" select="fn:shift-sequence($seed)"/>
    <xsl:sequence select="function() as item()* {
      (
      fn:generator-function($new-sequence),
      $new-sequence[last()])
      }"/>
  </xsl:function>

We can now use this function in any number of ways, but this example shows how xsl:iterate can be used to generate a specific number of bits:

<!-- 
   generate a pseudo-random bit sequence by using an iterator that
   takes a 'generator-function' as its input 
 -->
  <xsl:function name="fn:generate-bit-sequence" as="xs:boolean*">
    <xsl:param name="generator-function" as="function(*)"/>
    <xsl:param name="count" as="xs:integer"/>     
    
    <xsl:iterate select="1 to $count">
      <xsl:param name="bit-generator" as="function(*)+" select="$generator-function"/>
      <xsl:variable name="generator-result" as="item()*" select="$bit-generator()"/>
      <xsl:sequence select="$generator-result[$BIT_INDEX]"/>
      <xsl:next-iteration>
        <xsl:with-param name="bit-generator" select="$generator-result[$FN_INDEX]"/>
      </xsl:next-iteration>    
    </xsl:iterate>
    
  </xsl:function>

So, that is really all the XSLT we need to generate a pseudo-random sequence of bits. I have then just added some extra functionality to make the result more readable by showing bit-sequences as 8-bit words with ‘1’ and ‘0’ characters. The final complete test XSLT can be run against itself:

Input seed: 1011101101010111

Output sequence:

11010101 10111011 00101000 00100001 11011000 11100101 00001010 10111011 00110101 00100001
10001011 11100100 10110011 10111111 00011010 00111101 01000110 10110010 11010001 00011100
00110111 01010100 10000100 10101111 10011111 01001110 11011100 11101000 00010110 10011000
01100001 11001001 00100101 01111111 11111010 01111111 11100111 01111111 10110100 01111110
00001101 01111010 00100010 01100110 11101111 00110000 10001101 10010011 10100000 11111010
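As a cross-check in another language, the closure idea can be sketched in Python. This is not the stylesheet itself, just an emulation under the same assumptions: the seed and tap points are the ones declared in the complete XSLT, generator_function mirrors fn:generator-function by returning a zero-argument function that yields the next generator together with the output bit, and generate_bits plays the role of the xsl:iterate loop. The Python names are my own.

```python
TAP_POINT_1 = 15  # 1-based, as in the stylesheet
TAP_POINT_2 = 14


def shift_sequence(bits):
    """One shift-register step: XOR feedback enters on the left."""
    new_bit = bits[TAP_POINT_2 - 1] ^ (bits[-1] ^ bits[TAP_POINT_1 - 1])
    return [new_bit] + bits[:-1]


def generator_function(seed):
    """Return a closure yielding (next generator, output bit) when called."""
    new_sequence = shift_sequence(seed)

    def step():
        # The closure captures new_sequence, so no mutable state is needed.
        return generator_function(new_sequence), new_sequence[-1]

    return step


def generate_bits(generator, count):
    """Call the generator chain `count` times and collect the output bits."""
    bits = []
    for _ in range(count):
        generator, bit = generator()
        bits.append(bit)
    return bits


def to_words(bits, word_length=8):
    """Render bits as space-separated words of '1' and '0' characters."""
    text = "".join(str(b) for b in bits)
    return " ".join(text[i:i + word_length] for i in range(0, len(text), word_length))


if __name__ == "__main__":
    seed = [int(ch) for ch in "1011101101010111"]
    print(to_words(generate_bits(generator_function(seed), 32)))
```

Asking for 32 bits gives 11010101 10111011 00101000 00100001, which matches the first four words of the output listed above.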

The complete XSLT for generating this sequence is shown below. The XSLT processor used here was Saxon-PE 9.7.0.3 from Saxonica:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:fn="urn.internal.lrs.functions"
  exclude-result-prefixes="xs fn"
  expand-text="yes"
  version="3.0">
  
  <xsl:output indent="yes"/>
  
  <xsl:variable name="seed-bits" as="xs:boolean*" 
    select="fn:string-to-boolean-sequence('1011101101010111')"/>
  <xsl:variable name="tap-point-1" as="xs:integer" select="15"/>
  <xsl:variable name="tap-point-2" as="xs:integer" select="14"/>
  <xsl:variable name="bits-required" select="8 * 50"/>
  
  <xsl:variable name="FN_INDEX" as="xs:integer" select="1"/>
  <xsl:variable name="BIT_INDEX" as="xs:integer" select="2"/>
  
  <xsl:variable name="initial-generator-function" as="function(*)"
    select="fn:generator-function($seed-bits)"/>
      
 
  <!-- shift bit-sequence right and xor output-bit with tap-points
       to produce new left-most bit in the 'register'
  --> 
  <xsl:function name="fn:shift-sequence" as="xs:boolean*">
    <xsl:param name="sequence" as="xs:boolean+"/>
    
    <xsl:variable name="output-bit" as="xs:boolean" select="$sequence[last()]"/>
    <xsl:variable name="bit1" as="xs:boolean" select="$sequence[$tap-point-1]"/>
    <xsl:variable name="bit2" as="xs:boolean" select="$sequence[$tap-point-2]"/>
    <xsl:variable name="new-bit" as="xs:boolean" 
      select="fn:xor($bit2, fn:xor($output-bit, $bit1))"/>
    
    <xsl:sequence select="
      ($new-bit,
      subsequence($sequence, 1, count($sequence) - 1)
      )"/>
    
  </xsl:function>
  
  <xsl:function name="fn:xor" as="xs:boolean">
    <xsl:param name="operand1"/>
    <xsl:param name="operand2"/>
    <xsl:sequence select="($operand1 and not($operand2)) or (not($operand1) and $operand2)"/>
  </xsl:function>
  
  <!-- 
    creates a function that will generate:
    1. a new generator-function with modified sequence as seed
    2. the last bit of the modified sequence as the output bit
  -->
  <xsl:function name="fn:generator-function" as="function(*)">
    <xsl:param name="seed" as="xs:boolean*"/>
    <xsl:variable name="new-sequence" select="fn:shift-sequence($seed)"/>
    <xsl:sequence select="function() as item()* {
      (
      fn:generator-function($new-sequence),
      $new-sequence[last()])
      }"/>
  </xsl:function>
  
  <xsl:template match="/">   
    <root>
      <xsl:variable name="result-bits" as="xs:boolean*" 
        select="fn:generate-bit-sequence($initial-generator-function, $bits-required)"/>
      <xsl:sequence select="fn:boolean-sequence-to-binary-string($result-bits)"/>        
    </root>
  </xsl:template>
    
 <!-- 
   generate a pseudo-random bit sequence by using an iterator that
   takes a 'generator-function' as its input 
 -->
  <xsl:function name="fn:generate-bit-sequence" as="xs:boolean*">
    <xsl:param name="generator-function" as="function(*)"/>
    <xsl:param name="count" as="xs:integer"/>     
    
    <xsl:iterate select="1 to $count">
      <xsl:param name="bit-generator" as="function(*)+" select="$generator-function"/>
      <xsl:variable name="generator-result" as="item()*" select="$bit-generator()"/>
      <xsl:sequence select="$generator-result[$BIT_INDEX]"/>
      <xsl:next-iteration>
        <xsl:with-param name="bit-generator" select="$generator-result[$FN_INDEX]"/>
      </xsl:next-iteration>    
    </xsl:iterate>
    
  </xsl:function>
  
  <!-- 
            convenience functions for handling bit-sequences 
            as strings of '1' and '0' chars 
  -->
  
  <!-- renders result: e.g. '1011011 1011111 1111011 1010011 -->
  <xsl:function name="fn:boolean-sequence-to-binary-string" as="xs:string">
    <xsl:param name="sequence" as="xs:boolean*"/>
    <xsl:variable name="count" as="xs:integer" select="count($sequence)"/>
    <xsl:variable name="displayed-register-length" as="xs:integer" select="8"/>
    <xsl:sequence select="
      string-join(('
',
      for $x in 1 to $count return 
      (if ($sequence[$x]) then '1' else '0',
      if ($x mod 80 eq 0) then '
' else if ($x mod $displayed-register-length eq 0) then ' ' else ''
      )
      ),'')"/>        
  </xsl:function>
  
  <!-- 
    complements fn:boolean-sequence-to-binary-string,
    e.g. converts '101' to (true(), false(), true())
  -->
  <xsl:function name="fn:string-to-boolean-sequence" as="xs:boolean*">
    <xsl:param name="text" as="xs:string"/>
    <xsl:variable name="codepoint-for-one" as="xs:integer" select="string-to-codepoints('1')"/>
    <xsl:variable name="codepoints" as="xs:integer*" select="string-to-codepoints($text)"/>
    <xsl:variable name="count-codepoints" as="xs:integer" select="count($codepoints)"/>
    <xsl:sequence select="for $x in $codepoints return $x eq $codepoint-for-one"/>
  </xsl:function>
  
</xsl:stylesheet>

Conclusion

Other approaches to random number generation include the fn:random-number-generator function in XPath 3.1. Dimitre Novatchev’s FXSL also has functions for random number generation.

The main motivation when I started this was to explore higher order functions in XSLT 3.0, but the approach I’ve outlined here for generating pseudo-random bit sequences with XSLT will hopefully also be of interest for certain applications.

Launch of XMLFlow – softening the scream?

Judging not only from my own experience but from many social media posts such as Tina Henderson’s tweet shown below, the review and release process for the creation of content for articles or books can be a painful experience:

Pardon me while I go screaming to the next room


The version control approach

To solve the document review and maintenance problem, many software developers are now looking at how we can use modern resources such as distributed version control systems, like Git or Mercurial, to improve things for writers. Many of these efforts come from developers who have gone on to become authors covering specialist content in the area they work in.

Because code is managed as well-formatted lines of text, the line-by-line comparison methods used for managing different versions can be reasonably effective for version control, so the thinking is – why can’t it work for narrative content also?
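To illustrate why the line-by-line approach can struggle with narrative text, here is a small sketch using Python’s standard difflib module. The paragraph and its wrapping are invented for the example: one word is added and the editor re-wraps the lines. The function names changed_lines and changed_words are my own.

```python
import difflib

ORIGINAL = [
    "The quick brown fox jumps over",
    "the lazy dog and then runs",
    "back home before dark.",
]

# One word ("very") is added and the paragraph is re-wrapped by the editor.
EDITED = [
    "The very quick brown fox jumps",
    "over the lazy dog and then",
    "runs back home before dark.",
]


def changed_lines(before, after):
    """Count (removed, added) lines as a line-based comparison sees them."""
    matcher = difflib.SequenceMatcher(None, before, after)
    removed = added = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            removed += i2 - i1
            added += j2 - j1
    return removed, added


def changed_words(before, after):
    """List the non-equal word-level operations, ignoring line breaks."""
    a = " ".join(before).split()
    b = " ".join(after).split()
    matcher = difflib.SequenceMatcher(None, a, b)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    print("lines removed/added:", changed_lines(ORIGINAL, EDITED))
    print("word-level changes:", changed_words(ORIGINAL, EDITED))
```

The line view reports all three lines as replaced, while the word view reports a single inserted word. Source code rarely gets re-wrapped like this, which is one reason the same tooling feels more comfortable for code than for prose.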

This is probably great for other developers who know all about feature branches and merges and understand repos and clones, and choose a line-based format such as one of the markdown variants, but (as a former writer turned coder) I’m not convinced it is the way forward for writers.

As an example, here’s a link to ‘writing, the future, take 2’ by @bbirdman, promoting one approach in response to ‘Living the Future of Technical Writing’ by Scott Chacon (from GitHub).

Writers in general won’t see content creation as something akin to code writing. They, or their publishers, are also likely to want structure and semantics that do not easily break down into lines of text. Moreover, they know that most errors arising from edits have to be spotted by eye; they cannot be picked up by a test-suite, a style-checker or a code compiler.

Writing with DITA

DITA, an XML-based document markup format, is a total contrast to the markdown-variant approach for authoring linked to earlier, but it is becoming increasingly popular in many publishing environments. Its main emphasis is on providing structure along with flexible capabilities for the reuse of content.

XMLFlow – a ‘living version’ approach

[Update: XMLFlow was demonstrated at the XMLLondon 2015 conference – see video]


XMLFlow is a web app I’m developing (and first released today) that exploits DeltaXML’s DITA Merge product running on a host web server. It can merge multiple DITA documents that have been modified ‘offline’ back into a single valid DITA document ready for further processing or publishing.

This kind of document merge is frequently difficult for the person tasked with resolving the changes back into a single document. This is because of the high probability of conflicting changes occurring when the document has been modified concurrently (not necessarily the most effective way to work, but inevitable in many situations).

By pulling all changes into a single simplified document view (not full WYSIWYG), and showing dynamically where changes come from and how they can be characterised, XMLFlow aims to greatly improve the efficiency and quality of this process. Each change is accepted, rejected or deferred in a ‘Working Merge’ that can be stored for sharing or working on later.

Once all changes have been resolved they are finalized back into a ‘pure’ DITA document – with the final version of the working merge containing a history of all review decisions – to meet the needs of any formal document review process.

Next Steps

The direction this app takes will depend on feedback or discussion, so please post any comments here. XMLFlow mainly serves as a study on how one important and critical task in the document review process can be improved, but in a way that fits in with existing practices.

XMLFlow is predominantly a demonstration of one of many possible front-end solutions for exploiting what DeltaXML’s DITA Merge product can do to assist in the DITA writer’s workflow. But to be effective in this role, I’ve tried to include all the basic functionality to make it usable in its own right. I’ve concentrated on DITA, because that’s what DITA Merge specialises in; however, support for more document formats is possible in future.

This first ‘cloud-aware’ release of XMLFlow can be tried out on the new XMLFlow link.

Three into Two: Three way merge with a two way result

Introduction

Since the introduction of our new Merge product (4.0 in February this year) we have seen a number of customers and potential users ask about user interfaces and in particular options for reusing existing interfaces based around two way change such as accept/reject systems.

We have also seen some customers who have been trying to apply merge to the branch merge problem as typically found in a software version control system. This blog post is written for these users: software developers familiar with systems such as Subversion, Git or Mercurial who are looking at how DeltaXML Merge will work in a similar setting.

Software version control systems are often based around ‘diff3’, a line-based three way system that takes three inputs (two branches and their common ancestor). In some cases the merge results are presented side by side, with the left side being my branch or the current branch and the right-hand side the other branch, their branch or the remote branch. In other cases, merge systems have extra content display panels, perhaps to show the ancestor content or the in-progress result.

Our three-to-two processing system is orientated towards this branch merge use-case. We will ‘rotate’ the result perspective slightly to show changes in the context of two branches. So if content exists in the other branch and not mine it has been added and vice versa. In some cases (such as a three way conflict) we will lose change information.

Merge direction and Symmetry

In the paragraph above we talked about ‘rotating’ results.  This is best illustrated with a diagram and a small example.


Merge direction examples

In the diagram above, M is the common ancestor. Concurrent merge allows us to do an n-way merge. This is illustrated by the red arrow. We can take the P, Q, R and S inputs and combine them together in a symmetrical manner: none of them is treated any differently to the others.

For our example let’s consider a simple sentence of text that exists in M and which is edited in P1 and Q1:

M: The quick brown fox jumps over the lazy dog

P1: The very quick brown fox jumps over the lazy dog

Q1: The extremely quick brown fox jumps over the lazy dog

From the perspective of M, what’s happened is that there are two additional words here and our result will look like this:

The [+very+] [+extremely+] quick brown fox jumps over the lazy dog.

We see two additions, marked here as [+very+] and [+extremely+]; they came from different inputs (very from P1, extremely from Q1). In an editor we may have tool tips to identify who did the adding. In accept/reject systems these words may have different authors.

Now let’s consider this from a different perspective.  I am on branch Q.  I’ve added the word extremely in the Q1 edit and I want to merge in what’s been happening on the P branch.  I can do this with a three way merge operation that is depicted by the blue arrow in the diagram above.

From my perspective on Q the P1 version does not have the word extremely in the text. How can this be represented in a two way form? Most accept/reject style interfaces work in terms of addition and deletion (either of which you can accept or reject). If extremely doesn’t exist in the other (P) branch then it should be a delete rather than an add after the merge. Similarly, the word very didn’t exist in my Q branch and so it has been added (additions are marked [+…+] and deletions [-…-]):

The [+very+] [-extremely-] quick brown fox jumps over the lazy dog.

When using accept/reject interfaces I can accept both changes and I get a result equivalent to one of the branches.  Likewise, if I reject both I get the other branch.

The yellow arrow in the diagram illustrates the merge in the other direction – merging Q onto P.  What was an add is now a delete and vice-versa.  With our APIs you can have both of these scenarios by swapping the order of the versions in the merge() methods or the order of addVersion() method calls.

This is a simple example of the distinction between the symmetrical three way merge and how it differs from the branch merge case where we can use a two-way representation of the result.
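The flip between add and delete can be mimicked with a few lines of Python. To be clear about what this is: a plain two-way word comparison built on the standard difflib module, not the Merge product or its API, and it does not use the ancestor M at all. For this one simple sentence it happens to give the same picture as the branch-merge view described above. The function names and the [+…+] / [-…-] markers are my own.

```python
import difflib

P1 = "The very quick brown fox jumps over the lazy dog"
Q1 = "The extremely quick brown fox jumps over the lazy dog"


def two_way_view(mine, theirs):
    """Return (marker, word) pairs from the point of view of `mine`.

    '=' means the word is in both, '-' means only in mine (a deletion if
    I take theirs), '+' means only in theirs (an addition).
    """
    a, b = mine.split(), theirs.split()
    view = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
        if tag == "equal":
            view += [("=", word) for word in a[i1:i2]]
        else:
            # replace, delete and insert all reduce to deletions then additions
            view += [("-", word) for word in a[i1:i2]]
            view += [("+", word) for word in b[j1:j2]]
    return view


def render(view):
    """Render the view as text with [+word+] and [-word-] markers."""
    return " ".join(w if m == "=" else f"[{m}{w}{m}]" for m, w in view)


if __name__ == "__main__":
    print("On Q, merging P:", render(two_way_view(Q1, P1)))
    print("On P, merging Q:", render(two_way_view(P1, Q1)))
```

Swapping the argument order swaps which word is the addition and which is the deletion, the same effect the post describes for swapping the order of versions passed to the merge.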

Use cases

As users of software version control systems we did find the merge process frustrating on occasion.  We are presented with conflicts, but in order to understand the conflicts it would help if we could understand some of the other changes that are being merged.  The diff3 algorithm automatically processes or accepts the non-conflicting changes and leaves just the conflicts for the user to deal with.  Our underlying n-way representation and the rule processing system we have developed allows us to do better and we’ve worked on three use cases which we believe will be useful/relevant.

Case 1: Conflicting Changes

This use case is modelled on software version control and diff3. Simple non-conflicting changes, such as element adds and deletes and text modification are automatically processed and the user only has to deal with the conflicts.

Case 2: All Changes

This differs from case 1 above by removing the automatic processing of non-conflicting changes.  We will show all of the changes including the simple ones.

Case 3: Their Changes

This use case is based on the assumption that the user doing a merge on the current branch remembers and understands what has happened on that branch but is interested in any conflicts and also any changes on the other, remote or ‘their’ branch.

A worked example of these use cases is included in the merge documentation.  Please take a look at: Three To Two Merge Use Cases

Nested Change

One thing not seen in a line-based software version control merge is the concept of nested change. Suppose that in a document one user changes a word in a paragraph while another user deletes the entire paragraph. Visually this would look like this (this is how our n-way merge and oXygen plugin represent this case):

Track change and accept/reject systems cannot represent this particular case. Indeed our two-way deltaV2 representation cannot either. It’s a consequence of both hierarchical change and three or more inputs.

If our goal is to produce a two-way result for this case we need to make some compromises and lose some granularity. In this case the change has happened at the paragraph deletion point. We have two paragraphs that differ by one word, ‘suddenly’. From the perspective of the branch we can use either of these two paragraphs when representing the change, but as the example below demonstrates, the user does not see the change to the word ‘suddenly’.


Nested change after two way processing and accept/reject conversion

The Two-Way Argument

We’ve been asked more than once: Why not simply do a two-way comparison between branches?

There are two answers to this question:

  1. Firstly, using the ancestor information provides better alignment of the information. The ancestor is a common frame of reference for the changes that have been made in both branches, which gives a better overall alignment.
  2. This leads onto the second answer which is that without an ancestor it is not possible to identify conflict.  With a two way comparison we only know that one branch has made a change and that it is different to the change in the other branch. For example, if we have a word that appears in P1 and not in Q1 we do not know if P1 has added it or if Q1 has deleted it.   We have no way of knowing that two changes overlap in the case of nested change.

Both of these arguments apply equally to diff3 used in software version control by the way!
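The second answer can be shown with a toy example in Python. It works on whole words and ignores position, so it is far cruder than any real merge, and the function name classify is invented for the illustration. The point is only that the same two-way difference gets a different explanation depending on the ancestor.

```python
def classify(word, ancestor, mine, theirs):
    """Say why `word` is in one branch but not the other, using the ancestor."""
    in_ancestor = word in ancestor.split()
    in_mine = word in mine.split()
    in_theirs = word in theirs.split()
    if in_mine == in_theirs:
        return "no difference between branches"
    if in_ancestor:
        # The word was there to begin with, so one branch removed it.
        return "deleted in theirs" if in_mine else "deleted in mine"
    # The word was not in the ancestor, so one branch introduced it.
    return "added in mine" if in_mine else "added in theirs"


MINE = "The very quick brown fox jumps over the lazy dog"
THEIRS = "The quick brown fox jumps over the lazy dog"

if __name__ == "__main__":
    # Same two branches, two possible histories.
    print(classify("very", ancestor=THEIRS, mine=MINE, theirs=THEIRS))
    print(classify("very", ancestor=MINE, mine=MINE, theirs=THEIRS))
```

A two-way comparison of MINE and THEIRS sees the same single-word difference in both runs; only the ancestor tells us whether it was an addition on my branch or a deletion on theirs.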

Summary

We can generate a good two way representation of a three-way merge. But it’s not a perfect solution: information is lost in some cases. However, it presents the results in terms of the very familiar two-way metaphor. We’ve demonstrated interfaces for nested change, but these require a new type of interface to manage change and conflicts; the two-way results should provide easier adoption through the use of familiar interfaces for this common branch merge process.

Ignoring Whitespace When Comparing XQuery

In this, my third post on XQuery code comparison, I look at the issue of ignoring whitespace changes where they are not significant (see previous posts: Comparing XQuery with DeltaXML Core and Adding Structure to an XQuery Comparison).

Here’s the ‘A’ version of the XQuery:

input1

Now for the ‘B’ version of the code, with some extra whitespace added, most of which is not significant. You might also notice that the local:summary-full() and local-summary:short() functions are swapped over:

input2

Let’s now compare these files using the same DXP pipeline as developed over my previous 2 blog posts on this. The pipeline converts the XQuery to XML token elements and then adds wrapper elements and keys for the functions, which are also marked as non-ordered:

problems

This result (shown above) is fine, except for a couple of whitespace problems, which are highlighted. This extra whitespace is a distraction and causes extra effort when performing a code merge. Fortunately, DeltaXML Core comes with ‘Ignore Changes’ output XSLT filters that we can add to the pipeline; all that I need to do is insert a further XSLT filter ahead of these to mark the changes that can be ignored.

Here’s the ‘mark-ignore-changes.xsl’ output XSLT filter:

<?xml version="1.0" encoding="utf-8"?>
<!-- Copyright (c) 2005-2010 DeltaXML Ltd. All rights reserved -->
<!-- $Id$ -->

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:dxa="http://www.deltaxml.com/ns/non-namespaced-attribute"
                xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
                xmlns="http://www.w3.org/1999/xhtml"
                xpath-default-namespace="http://www.w3.org/1999/xhtml">

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Mark whitespace tokens found in XQuery expressions -->
  <xsl:template match="span[@class eq 'whitespace'][@deltaxml:deltaV2]">
    <xsl:copy>
      <xsl:attribute name="deltaxml:ignore-changes" select="'true'"/>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Mark whitespace-only text-nodes found in XQuery element constructors -->
  <xsl:template match="span[@class eq 'txt'][@deltaxml:deltaV2]">
    <xsl:choose>
      <xsl:when test="string-length(normalize-space(.)) eq 0">
        <xsl:copy>
          <xsl:attribute name="deltaxml:ignore-changes" select="'true'"/>
          <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
      </xsl:when>
      <xsl:otherwise>
        <xsl:copy-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>

The above filter is an ‘identity transform’ with two added templates designed to match the two types of whitespace change that we wish to ignore. The tokens of interest (span elements) have ‘txt’ and ‘whitespace’ class attributes; a further check is required for ‘txt’ tokens to ensure that only whitespace-only tokens of this type are marked. Now that the filter has been created, we need to add it to the DXP pipeline along with the built-in ‘ignore changes’ filters, as shown below:

<!DOCTYPE comparatorPipeline SYSTEM "../dxp/dxp.dtd">
<!-- $Id$ -->
<comparatorPipeline description="compare xquery" id="xquery">

  <inputFilters>
    <filter>
      <file path="xquery2xml.xsl" relBase="dxp"/>
    </filter>
    <filter>
      <file path="key-xquery.xsl" relBase="dxp"/>
    </filter>
  </inputFilters>

  <outputFilters>
    <!-- The following filter is where the change to be ignored is marked -->
    <filter>
      <file path="mark-ignore-changes.xsl" relBase="dxp"/>
    </filter>

    <!-- The following two filters are included as part of the release, and
         are general purpose. They update the delta based on the marks added
         by the previous filter.
    -->
    <filter>
      <resource name="/xsl/apply-ignore-changes.xsl"/>
    </filter>
    <filter>
      <resource name="/xsl/propagate-ignore-changes.xsl"/>
    </filter>
    <filter>
      <file path="xquery-tokens2html.xsl" relBase="dxp"></file>
    </filter>
  </outputFilters>

  <outputProperties>
    <property name="indent" literalValue="no"/>
  </outputProperties>

  <comparatorFeatures>
    <feature name="http://deltaxml.com/api/feature/isFullDelta" literalValue="true"/>
    <feature name="http://deltaxml.com/api/feature/enhancedMatch1" literalValue="true"/>
  </comparatorFeatures>

</comparatorPipeline>

With the DXP pipeline now modified as above to ignore whitespace changes, this is the result of running DeltaXML Core:

desired-result

The result (above) is what we wanted: the whitespace added in the ‘B’ version of the XQuery code is in the result, but is not marked as a change.
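For readers less familiar with XSLT, the marking rule used by mark-ignore-changes.xsl can be sketched in a few lines of Python with the standard library’s ElementTree. This is only an illustration of the rule, not part of the pipeline; the function name mark_ignorable and the deltaV2 attribute values in the sample are my own assumptions.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
DELTA = "http://www.deltaxml.com/ns/well-formed-delta-v1"
XML_WHITESPACE = " \t\r\n"  # the characters that normalize-space() collapses


def mark_ignorable(root):
    """Add deltaxml:ignore-changes='true' to whitespace tokens in a delta."""
    marked = 0
    for span in root.iter(f"{{{XHTML}}}span"):
        if f"{{{DELTA}}}deltaV2" not in span.attrib:
            continue  # only tokens carrying a delta attribute are candidates
        cls = span.get("class")
        text = "".join(span.itertext())
        whitespace_only = text.strip(XML_WHITESPACE) == ""
        if cls == "whitespace" or (cls == "txt" and whitespace_only):
            span.set(f"{{{DELTA}}}ignore-changes", "true")
            marked += 1
    return marked


# Hypothetical sample delta: the deltaV2 values are illustrative only.
sample = ET.fromstring(
    f'<pre xmlns="{XHTML}" xmlns:deltaxml="{DELTA}">'
    '<span class="whitespace" deltaxml:deltaV2="A!=B">  </span>'
    '<span class="txt" deltaxml:deltaV2="A!=B">\n  </span>'
    '<span class="txt" deltaxml:deltaV2="A!=B">hello</span>'
    '<span class="whitespace"> </span>'
    '</pre>'
)
print(mark_ignorable(sample))  # 2
```

Only the first two spans are marked: the third ‘txt’ token has real content, and the last whitespace token carries no delta attribute, so there is no change to ignore.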

Conclusion

It has proved relatively simple to refine the XQuery code comparison pipeline I built previously so that certain whitespace changes are ignored. This is one of the great strengths of using a transform pipeline – the capabilities of the comparison can gradually be improved as new requirements for our comparison arise, and we can also easily exploit filters that come bundled with DeltaXML Core. The main motive for this exercise was really to investigate how non-XML could be converted to XML within a Core pipeline to allow a comparison, but in the process we’ve already built a code comparison solution that I’ve found to be considerably more robust than many off-the-shelf equivalents.

Before finishing this blog series, there’s just one more fix I’d like to add: currently a function that is moved is compared correctly, but, because it’s treated as orderless, the new position of the function is not shown in the result. There’s a ‘HandleMoves’ filter included with Core that I will probably be using for this, but I’ll save this work for another day.

Adding Structure to an XQuery Comparison

In my first post on Comparing XQuery with DeltaXML Core, we used a flat sequence of span elements, where each element represented an XQuery language token, with a class attribute showing the token type. Whilst this proved simple and effective, in this post I take things a stage further to allow a more accurate and robust comparison of the XQuery: I add a further XSLT template to the input XSLT filter to give some structure to this set of tokens.

The original XQuery source (input ‘A’) used for this exercise is shown below. Each recognized token type is represented by a different foreground color. Note that there are also many ‘whitespace’ tokens interspersed with the visible tokens.

code-wrapped

In the code above I’ve used rectangles to highlight the structure that we are going to add so that they enclose each set of tokens to be wrapped. Specifically, the language structures that we’re looking to wrap are:

  • function declarations
  • element constructors with separate start and end tags
  • self-closed element constructor tags

The new ‘wrap-spans’ XSLT template is added to the existing XSLT filter, xquery2xml.xsl and then called from the entry-point template, as shown below:

<xsl:template match="/">
  <xsl:variable name="text-file-uri" select="f:path-to-uri(normalize-space(text-file))"/>
  <xsl:message>xquery2xml transform on: <xsl:value-of select="$text-file-uri"/></xsl:message>
  <xsl:variable name="file-content" as="xs:string" select="unparsed-text($text-file-uri)"/>
  <xsl:variable name="tokens" as="element()*" select="xqf:show-xquery($file-content)"/>

  <pre>
    <xsl:call-template name="wrap-spans">
      <xsl:with-param name="spans" select="$tokens"/>
      <xsl:with-param name="index" select="1"/>
    </xsl:call-template>
  </pre>
</xsl:template>

The required output XML for this template is the same as the input XML but with added wrapper spans, as shown below (there are a large number of tokens, so to save space I’ve only shown the tokens for the first function and haven’t shown the input XML). Each wrapper span has a class attribute with the value ‘wrapper’, but there’s also a data-open-id attribute with a value that starts with ‘ey’ for an element wrapper or ‘fy’ for a function declaration wrapper. The number appended to this value is the position of the start token. I’ve added this, and the ‘pos’ attributes on other tokens, just for diagnostic purposes (these are removed later).

<?xml version="1.0" encoding="UTF-8"?>
<pre xmlns="http://www.w3.org/1999/xhtml">
  
<span pos="1" class="open"/>
  
<span class="wrapper" data-open-id="fy2">
    
<span class="prolog">declare function</span>
    
<span pos="3" class="whitespace"> </span>
    
<span pos="4" class="function">local:summary-full</span>
    
<span pos="5" class="parenthesis">(</span>
    
<span pos="6" class="variable">$emps</span>
    
<span pos="7" class="whitespace"> </span>
    
<span pos="8" class="op">as</span>
    
<span pos="9" class="whitespace"> </span>
    
<span pos="10" class="node-type">element</span>
    
<span pos="11" class="parenthesis">(</span>
    
<span pos="12" class="qname">employee</span>
    
<span pos="13" class="parenthesis">)</span>
    
<span pos="14" class="quantifier">*</span>
    
<span pos="15" class="parenthesis">)</span>
    
<span pos="16" class="whitespace">
    
</span>
    
<span pos="17" class="op">as</span>
    
<span pos="18" class="whitespace"> </span>
    
<span pos="19" class="node-type">element</span>
    
<span pos="20" class="parenthesis">(</span>
    
<span pos="21" class="qname">dept</span>
    
<span pos="22" class="parenthesis">)</span>
    
<span pos="23" class="quantifier">*</span>
    
<span pos="24" class="whitespace">
    
</span>
    
<span pos="25" class="op">{</span>
    
<span pos="26" class="whitespace">
    
</span>
    
<span pos="27" class="higher">for</span>
    
<span pos="28" class="whitespace"> </span>
    
<span pos="29" class="variable">$d</span>
    
<span pos="30" class="whitespace"> </span>
    
<span pos="31" class="op">in</span>
    
<span pos="32" class="whitespace"> </span>
    
<span pos="33" class="function">fn:distinct-values</span>
    
<span pos="34" class="parenthesis">(</span>
    
<span pos="35" class="variable">$emps</span>
    
<span pos="36" class="step">/</span>
    
<span pos="37" class="qname">deptno</span>
    
<span pos="38" class="parenthesis">)</span>
    
<span pos="39" class="whitespace">
    
</span>
    
<span pos="40" class="higher">let</span>
    
<span pos="41" class="whitespace"> </span>
    
<span pos="42" class="variable">$e</span>
    
<span pos="43" class="whitespace"> </span>
    
<span pos="44" class="op">:=</span>
    
<span pos="45" class="whitespace"> </span>
    
<span pos="46" class="variable">$emps</span>
    
<span pos="47" class="filter">[</span>
    
<span pos="48" class="qname">deptno</span>
    
<span pos="49" class="whitespace"> </span>
    
<span pos="50" class="op">=</span>
    
<span pos="51" class="whitespace"> </span>
    
<span pos="52" class="variable">$d</span>
    
<span pos="53" class="filter">]</span>
    
<span pos="54" class="whitespace">
    
</span>
    
<span pos="55" class="op">return</span>
    
<span pos="56" class="whitespace">
    
</span>
    
<span class="wrapper" data-open-id="ey57">
      
<span class="es">&lt;</span>
      
<span pos="58" class="en">dept</span>
      
<span pos="59" class="z">&gt;</span>
      
<span pos="60" class="txt">
      
</span>
      
<span class="wrapper" data-open-id="ey61">
        
<span class="es">&lt;</span>
        
<span pos="62" class="en">full</span>
        
<span data-close-id="sz63" class="z">/&gt;</span>
      
</span>
      
<span pos="64" class="txt">
      
</span>
      
<span class="wrapper" data-open-id="ey65">
        
<span class="es">&lt;</span>
        
<span pos="66" class="en">deptno</span>
        
<span pos="67" class="z">&gt;</span>
        
<span pos="68" class="op">{</span>
        
<span pos="69" class="variable">$d</span>
        
<span pos="70" class="op">}</span>
        
<span pos="71" class="sc">&lt;/</span>
        
<span pos="72" class="cl">deptno</span>
        
<span data-close-id="ez73" class="z">&gt;</span>
      
</span>
      
<span pos="74" class="txt">
      
</span>
      
<span class="wrapper" data-open-id="ey75">
        
<span class="es">&lt;</span>
        
<span pos="76" class="en">headcount</span>
        
<span pos="77" class="z">&gt;</span>
        
<span pos="78" class="txt"> </span>
        
<span pos="79" class="op">{</span>
        
<span pos="80" class="function">fn:count</span>
        
<span pos="81" class="parenthesis">(</span>
        
<span pos="82" class="variable">$e</span>
        
<span pos="83" class="parenthesis">)</span>
        
<span pos="84" class="op">}</span>
        
<span pos="85" class="txt"> </span>
        
<span pos="86" class="sc">&lt;/</span>
        
<span pos="87" class="cl">headcount</span>
        
<span data-close-id="ez88" class="z">&gt;</span>
      
</span>
      
<span pos="89" class="txt">
      
</span>
      
<span class="wrapper" data-open-id="ey90">
        
<span class="es">&lt;</span>
        
<span pos="91" class="en">payroll</span>
        
<span pos="92" class="z">&gt;</span>
        
<span pos="93" class="txt"> </span>
        
<span pos="94" class="op">{</span>
        
<span pos="95" class="function">fn:sum</span>
        
<span pos="96" class="parenthesis">(</span>
        
<span pos="97" class="variable">$e</span>
        
<span pos="98" class="step">/</span>
        
<span pos="99" class="qname">salary</span>
        
<span pos="100" class="parenthesis">)</span>
        
<span pos="101" class="op">}</span>
        
<span pos="102" class="txt"> </span>
        
<span pos="103" class="sc">&lt;/</span>
        
<span pos="104" class="cl">payroll</span>
        
<span data-close-id="ez105" class="z">&gt;</span>
      
</span>
      
<span pos="106" class="txt">
      
</span>
      
<span pos="107" class="sc">&lt;/</span>
      
<span pos="108" class="cl">dept</span>
      
<span data-close-id="ez109" class="z">&gt;</span>
    
</span>
    
<span pos="110" class="open"/>
    
<span pos="111" class="whitespace">
    
</span>
    
<span pos="112" class="op">}</span>
    
<span data-close-id="fz113" class="op">;</span>
  
</span>
  
<span pos="114" class="whitespace">
    
  
</span>
</pre>

In the XML above we can see that the start of each function declaration is marked by a ‘prolog’ token with the value ‘declare function’ (in practice, the whitespace separator may vary). The start of an element constructor is marked by an ‘es’ or ‘esx’ token (note that the token types used by the XMLSpectrum tokenizer do not correspond directly to the language specification for XQuery). The ‘wrap-spans’ template uses this information to set a ‘wrap-open’ variable that detects the start of a wrapped sequence of tokens. The XSLT excerpt for this is shown below:

<xsl:variable name="is-fn-declaration" as="xs:boolean"
              select="$span/@class eq 'prolog' and $prolog-tokens[1] eq 'declare' and $prolog-tokens[2] eq 'function'"/>

<xsl:variable name="wrap-open" as="xs:string"
              select="if ($span/@class = ('es','esx'))
                      then 'ey'                          (: ey - element start :)
                      else if ($is-fn-declaration)
                      then 'fy'                          (: fy - function declaration start :)
                      else ''"/>

A ‘wrap-close’ variable is used to detect the close of a wrapped token sequence in a similar manner to ‘wrap-open’. The full code for the template is shown below:

<xsl:template name="wrap-spans">
  <xsl:param name="spans" as="node()*"/>
  <xsl:param name="index" as="xs:integer"/>

  <xsl:variable name="span" as="node()?" select="$spans[$index]"/>
  <xsl:variable name="prev-span" as="node()?" select="$spans[$index - 1]"/>
  <xsl:variable name="next-span" as="node()?" select="$spans[$index + 1]"/>

  <xsl:variable name="prolog-tokens" as="xs:string*" select="tokenize($span, '\s+')"/>

  <xsl:variable name="is-fn-declaration" as="xs:boolean"
                select="$span/@class eq 'prolog' and $prolog-tokens[1] eq 'declare' and $prolog-tokens[2] eq 'function'"/>

  <xsl:variable name="wrap-open" as="xs:string"
                select="if ($span/@class = ('es','esx'))
                        then 'ey'                          (: ey - element start :)
                        else if ($is-fn-declaration)
                        then 'fy'                          (: fy - function declaration start :)
                        else ''"/>

  <xsl:variable name="wrap-close"
                select="
                        if ($span/@class eq 'op' and $span eq ';'
                            and $prev-span/@class eq 'op'
                            and $prev-span eq '}')
                        then 'fz'                          (: fz - closed function declaration :)
                        else if (ends-with($span, '/&gt;') and $prev-span/@class = ('en', 'atn'))
                        then 'sz'                          (: sz - self-closed element :)
                        else if ($span eq '&gt;' and $prev-span/@class eq 'cl')
                        then 'ez'                          (: ez - closed element :)
                        else ''"/>

  <xsl:choose>
    <!-- No token at this index: end of input -->
    <xsl:when test="empty($span)"/>
    <xsl:when test="$wrap-open ne ''">
      <!-- Recurse to collect everything up to the matching close token -->
      <xsl:variable name="span-children" as="node()*">
        <xsl:call-template name="wrap-spans">
          <xsl:with-param name="spans" select="$spans"/>
          <xsl:with-param name="index" select="$index + 1"/>
        </xsl:call-template>
      </xsl:variable>

      <span class="wrapper" data-open-id="{$wrap-open}{$index}">
        <xsl:sequence select="$span"/>
        <xsl:sequence select="$span-children"/>
      </span>

      <!-- Resume after the close token, whose index is held in data-close-id -->
      <xsl:if test="exists($span-children[last()]/@data-close-id)">
        <xsl:variable name="prev-index" select="xs:integer(substring($span-children[last()]/@data-close-id, 3))"/>
        <xsl:call-template name="wrap-spans">
          <xsl:with-param name="spans" select="$spans"/>
          <xsl:with-param name="index" select="$prev-index + 1"/>
        </xsl:call-template>
      </xsl:if>
    </xsl:when>
    <xsl:when test="$wrap-close ne ''">
      <!-- Emit the close token and stop, ending the current wrapped sequence -->
      <span data-close-id="{$wrap-close}{$index}">
        <xsl:copy-of select="$span/@*|$span/node()"/>
      </span>
    </xsl:when>
    <xsl:otherwise>
      <xsl:apply-templates select="$span" mode="wrapping">
        <xsl:with-param name="pos" select="$index"/>
      </xsl:apply-templates>
      <xsl:call-template name="wrap-spans">
        <xsl:with-param name="spans" select="$spans"/>
        <xsl:with-param name="index" select="$index + 1"/>
      </xsl:call-template>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

The most important aspect of this template is its recursive nature. An xsl:choose instruction controls the recursive template call according to three cases: a ‘wrap-open’ is detected and a new wrapper element must be added; a ‘wrap-close’ is detected and the currently wrapped sequence must be closed; or no open/close markers are found and the current sequence should therefore continue.

Note that on a ‘wrap-open’ event, we need to continue processing tokens immediately after the ‘wrap-close’ event, which will occur within the recursive call. The ‘wrap-close’ event therefore adds a ‘data-close-id’ attribute to the closing token; this is then read within the calling ‘wrap-open’ event to get the next token index.
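The shape of this recursion may be easier to see outside XSLT. Below is a simplified Python sketch of the same idea, covering element constructors only (function declarations are left out for brevity). The wrap function and the (class, text) tuple representation are assumptions made for this illustration. One difference from the XSLT is worth noting: a Python function can return the next index alongside the nodes, so no ‘data-close-id’ attribute is needed to pass it back to the caller.

```python
def wrap(tokens, index=0):
    """Return (nodes, next_index); nested lists stand in for wrapper spans."""
    nodes = []
    while index < len(tokens):
        cls, text = tokens[index]
        prev_cls = tokens[index - 1][0] if index > 0 else None
        if cls in ("es", "esx"):              # wrap-open: element start
            children, index = wrap(tokens, index + 1)
            nodes.append([(cls, text)] + children)
            continue                          # resume after the close token
        nodes.append((cls, text))
        index += 1
        self_closed = text.endswith("/>") and prev_cls in ("en", "atn")
        end_closed = text == ">" and prev_cls == "cl"
        if self_closed or end_closed:         # wrap-close: end this wrapper
            return nodes, index
    return nodes, index


# Tokens for: <dept><full/></dept> followed by a newline
tokens = [
    ("es", "<"), ("en", "dept"), ("z", ">"),
    ("es", "<"), ("en", "full"), ("z", "/>"),
    ("sc", "</"), ("cl", "dept"), ("z", ">"),
    ("whitespace", "\n"),
]
tree, next_index = wrap(tokens)
print(len(tree))     # 2: one 'dept' wrapper plus the trailing whitespace token
print(tree[0][3])    # [('es', '<'), ('en', 'full'), ('z', '/>')]
```

As in the XSLT, the sketch assumes well-formed input: a stray close token at the top level would end processing early.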

Orderless Elements

Now that we’ve added some structure to our XQuery code tokens, we can make code comparisons more resilient. In XQuery, the order of functions declared in the prolog is not significant. As declared functions now have a dedicated ‘wrapper’ element, we can exploit DeltaXML Core’s ‘orderless’ comparison feature (described in Comparing Orderless Elements). Orderless comparisons are performed on elements when deltaxml:ordered attributes with a value of ‘false’ are found. A deltaxml:key attribute can also be added to elements to ensure that alignment is achieved most efficiently and reliably. We would normally use the QName for the function (represented using {uri}local-name) together with an arity value as the key for each function declaration, but to simplify things slightly I will use the prefixed function name only. To add these attributes we just need one further template, but because this functionality is separate from the initial token wrapping I’ll write a separate XSLT filter for this, key-xquery.xsl:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns="http://www.w3.org/1999/xhtml"
                xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0" xpath-default-namespace="http://www.w3.org/1999/xhtml">

  <xsl:output method="xml" indent="no"/>

  <xsl:template match="/pre">
    <pre xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1" deltaxml:ordered="false">
      <xsl:apply-templates select="*" mode="top"/>
    </pre>
  </xsl:template>

  <xsl:template match="@* | node()" mode="lower">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" mode="lower"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="span" mode="top">
    <span deltaxml:key="p-{position()}">
      <xsl:apply-templates select="@class | node()" mode="lower"/>
    </span>
  </xsl:template>

  <xsl:template match="span" mode="lower">
    <span>
      <xsl:apply-templates select="@class | node()" mode="lower"/>
    </span>
  </xsl:template>

  <xsl:template match="span[@class eq 'wrapper'][starts-with(@data-open-id, 'fy')]" mode="top">
    <span deltaxml:key="{span[@class eq 'function'][1]}">
      <xsl:apply-templates select="@class | node()" mode="lower"/>
    </span>
  </xsl:template>

</xsl:stylesheet>

This simple filter adds the required deltaxml:ordered="false" attribute to the ‘pre’ element and then adds keys to each top-level element. All that’s left is to add this ‘key-xquery’ filter to our Pipeline Configuration file (DXP) and run the comparison. Here’s the new DXP file with the added filter:

<!DOCTYPE comparatorPipeline SYSTEM "../dxp/dxp.dtd">
<!-- $Id$ -->
<comparatorPipeline description="compare xquery" id="xquery">

  <inputFilters>
    <filter>
      <file path="xquery2xml.xsl" relBase="dxp"/>
    </filter>
    <filter>
      <file path="key-xquery.xsl" relBase="dxp"/>
    </filter>
  </inputFilters>

  <outputFilters>
    <filter>
      <file path="xquery-tokens2html.xsl" relBase="dxp"></file>
    </filter>
  </outputFilters>

  <outputProperties>
    <property name="indent" literalValue="no"/>
  </outputProperties>

  <comparatorFeatures>
    <feature name="http://deltaxml.com/api/feature/isFullDelta" literalValue="true"/>
    <feature name="http://deltaxml.com/api/feature/enhancedMatch1" literalValue="true"/>
  </comparatorFeatures>

</comparatorPipeline>

It’s now time to compare this XQuery with a modified version. These are the changes I made:

  1. Swapped the order of the local:summary-full and local:summary-short function declarations
  2. In local:summary-full(): Added ‘company’ to $emps/deptno to give $emps/company/deptno
  3. In local:summary-full(): Added a ‘serial’ attribute to the ‘dept’ element constructor
  4. In local:summary-full(): Removed the ‘full’ child element from the ‘dept’ element constructor
  5. In local:summary-short(): Changed ‘$emps[deptno = $d]’ to be ‘$emps[deptno ne $d]’

And here is the result of using DeltaXML Core to compare this modified version with the original – using orderless comparison with keys:

keyed-result

As we can see above, the changes have been marked clearly and there is no issue with the different order of the function declarations (the result shows the order of the original by default). It’s now time to look at the output from DeltaXML Core from the same input, but this time when keyed comparison is not used:

unkeyed-result

The above result shows that, when using ordered comparison, correct changes are not picked up and that incorrect changes are reported, simply because the correct functions have not been aligned and therefore the wrong functions have been compared with each other.
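The difference between the two results comes down to which declarations get paired before their contents are compared. The following Python sketch illustrates that pairing step only; it says nothing about how DeltaXML Core is implemented, and the function names and body strings are placeholders I have made up for the example.

```python
def align_by_position(a, b):
    """Ordered comparison: pair the n-th declaration of A with the n-th of B."""
    return [(name_a, name_b) for (name_a, _), (name_b, _) in zip(a, b)]


def align_by_key(a, b):
    """Keyed, orderless comparison: pair declarations that share a key."""
    keys_in_b = {name for name, _ in b}
    return [(name, name if name in keys_in_b else None) for name, _ in a]


# Each declaration is (key, body); the key is the prefixed function name.
version_a = [("local:summary-full", "full body, version A"),
             ("local:summary-short", "short body, version A")]
version_b = [("local:summary-short", "short body, version B"),
             ("local:summary-full", "full body, version B")]

print(align_by_position(version_a, version_b))
# [('local:summary-full', 'local:summary-short'),
#  ('local:summary-short', 'local:summary-full')]
print(align_by_key(version_a, version_b))
# [('local:summary-full', 'local:summary-full'),
#  ('local:summary-short', 'local:summary-short')]
```

With positional pairing, each function is compared against the wrong partner, so every reported change is misleading. With keys, the swap in order has no effect on which bodies are compared.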

Conclusion

By creating a filter to add wrapper elements with unique keys to XQuery function declarations, we were able to exploit DeltaXML Core’s orderless comparison and ignore changes in the order of function declarations within the XQuery prolog – the example comparison showed that this allowed an accurate result where other comparators would fail. It would be relatively straightforward to extend the ‘wrap-spans’ template and ‘key-xquery.xsl’ filter to handle other orderless language constructs such as variable declarations.

Future Enhancements

One feature that I think could be useful would be a ‘ghost’ image showing a faint rendering of the place where the code block occurred in the other version (see below). This could probably be done best with an extra input/output filter that uses a ‘placeholder’ element as a reference. Hopefully I will get the chance to try this out in a further blog post.

placeholder-result