XmlDiff TUTORIAL
================

:Author: Sylvain Thénault
:Organization: Logilab
:Version: $Revision: 1.4 $
:Date: $Date: 2003-10-08 09:34:12 $

.. contents::

Synopsis
--------
::

    xmldiff [Options] from_file to_file
    xmldiff [Options] [-r] from_directory to_directory

    Options:
      -h, --help
         display this help message and exit
      -V, --version
         display version number and exit
      -H, --html
         input files are HTML instead of XML
      -r, --recursive
         when comparing directories, recursively compare any
         subdirectories found
      -e encoding, --encoding=encoding
         specify the encoding to use for output (default is UTF-8)
      -n, --not-normalize-spaces
         do not normalize spaces and new lines in text and comment nodes
      -c, --exclude-comments
         do not process comment nodes
      -g, --ext-ges
         include all external general (text) entities
      -p, --ext-pes
         include all external parameter entities, including the external DTD
         subset

      --profile=file
         display an execution profile (xmldiff runs slower with this option);
         the profile is saved to file (binary form)
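
When xmldiff is driven from a script, the command line above can be assembled
programmatically. The following is only a minimal sketch: the helper name
``build_xmldiff_command`` and its keyword arguments are hypothetical, and it
uses only options listed in the synopsis.

.. code-block:: python

    def build_xmldiff_command(from_path, to_path, recursive=False, html=False,
                              encoding=None, exclude_comments=False):
        """Return an argument list for the xmldiff command line.

        Hypothetical helper: only flags documented in the synopsis are used.
        """
        argv = ["xmldiff"]
        if html:
            argv.append("--html")              # inputs are HTML, not XML
        if recursive:
            argv.append("--recursive")         # descend into subdirectories
        if exclude_comments:
            argv.append("--exclude-comments")  # ignore comment nodes
        if encoding is not None:
            argv.append("--encoding=" + encoding)
        argv.extend([from_path, to_path])
        return argv

    # Example: compare two directories recursively, ignoring comments.
    command = build_xmldiff_command("old_docs", "new_docs",
                                    recursive=True, exclude_comments=True)

The resulting list can then be handed to ``subprocess.run``, assuming the
``xmldiff`` executable is available on the ``PATH``.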



Detailed example
----------------

If you process two files, file1 and file2, which respectively contain::

    <memory>
      <mailbox path="/var/spool/mail/almaster"/>
      <server-socket port="7776" recipe="pia.PDA"/>
      <server-socket port="7777" recipe="proxy.Web proxy"/>
      <email_addr mine="yes">almaster@logilab.org</email_addr>
      <junkbuster-method value="18" />
      <spoken-languages>
       <language name="italian" code="it" />
       <language name="english" code="fr" />
       <language name="english" code="en" />
      </spoken-languages>
    </memory>

and

::

    <memory>
      <box path="/var/spool/mail/almaster"/>
      <server-socket port="7776" recipe="pia.PDA"/>
      <server-socket port="7797" recipe="proxy.Web proxy"/>
      <email_addr mine="yes">syt@logilab.org</email_addr>
      <junkbuster-method val="18">
       <newson/>
      </junkbuster-method>
      <spoken-languages new="new attribute">
       <language name="english" code="fr" />
       <language code="it" name="italian" />
      </spoken-languages>
      <test>
       <!-- this is an append test -->
	hoye!
      </test>
    </memory>


Executing *xmldiff file1 file2* gives the following result::

    [rename_node, /memory[1]/mailbox[1], box]
    [insert-after, /memory[1]/junkbuster-method[1],
    <spoken-languages new="new attribute">
      <language code="it" name="italian"/>
    </spoken-languages>
    ]
    [insert-after, /memory[1]/spoken-languages[1],
    <test>
      <!-- this is an append test -->
    hoye!
    </test>
    ]
    [update, /memory[1]/email_addr[1]/text()[1], syt@logilab.org]
    [rename_node, /memory[1]/junkbuster-method[1]@value, val]
    [append-first, /memory[1]/junkbuster-method[1],
    <newson/>
    ]
    [move-first, /memory[1]/spoken-languages[2]/language[2], /memory[1]/spoken-languages[1]]
    [update, /memory[1]/server-socket[2]@port, 7797]
    [remove, /memory[1]/spoken-languages[2]]



This gives you a list of primitives to apply to file1 to obtain file2
(you should obtain file2 once the whole script has been executed). See
[4] and [5] for more information.
The script above tells you the 9 actions to apply to file1:

* insert the XML subtree below after the node /memory/spoken-languages[0]::

      <test>
      <!-- this is an append test -->
      hoye!
      </test>

* rename node /memory/mailbox[0] to "box"

* append a node <newson> to the node /memory[0]/junkbuster-method[0]

* append an attribute named "new" with value "new attribute" to the
  node /memory/spoken-languages[0]

* update attribute /memory/server-socket[1]@port value to "7797"

* update text /memory/email_addr/text()[0] to "syt@logilab.org"

* rename attribute /memory/junkbuster-method[0]@value to "val"

* move the attributes "code" and "name" from
  /memory[0]/spoken-languages[0]/language[1] to
  /memory[0]/spoken-languages[0]/language[0]
  and rename them to LogilabXmldiffTmpAttr:code and
  LogilabXmldiffTmpAttr:name

* move the attributes "code" and "name" from
  /memory[0]/spoken-languages[0]/language[0] to
  /memory[0]/spoken-languages[0]/language[1]
  and rename them to LogilabXmldiffTmpAttr:code and
  LogilabXmldiffTmpAttr:name

* remove node /memory/spoken-languages/language[2]

* rename attributes LogilabXmldiffTmpAttr:code and
  LogilabXmldiffTmpAttr:name of /memory/spoken-languages/language[0]
  to name and code

* rename attributes LogilabXmldiffTmpAttr:code and
  LogilabXmldiffTmpAttr:name of /memory/spoken-languages/language[1]
  to name and code

Note that all XPath expressions are relative to file1 with the previous
steps applied.
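
To make the meaning of these primitives concrete, the sketch below applies a
few entries of the script by hand with Python's standard
``xml.etree.ElementTree`` module. This is not how xmldiff itself applies a
script. It is only an illustration, run on a shortened copy of file1, of what
each primitive does to the tree.

.. code-block:: python

    import xml.etree.ElementTree as ET

    # Shortened copy of file1 (the spoken-languages subtree is left out).
    FILE1 = """<memory>
      <mailbox path="/var/spool/mail/almaster"/>
      <server-socket port="7776" recipe="pia.PDA"/>
      <server-socket port="7777" recipe="proxy.Web proxy"/>
      <email_addr mine="yes">almaster@logilab.org</email_addr>
      <junkbuster-method value="18" />
    </memory>"""

    root = ET.fromstring(FILE1)  # root is the <memory> element

    # [rename_node, /memory[1]/mailbox[1], box]
    root.find("./mailbox[1]").tag = "box"

    # [update, /memory[1]/email_addr[1]/text()[1], syt@logilab.org]
    root.find("./email_addr[1]").text = "syt@logilab.org"

    # [rename_node, /memory[1]/junkbuster-method[1]@value, val]
    node = root.find("./junkbuster-method[1]")
    node.set("val", node.attrib.pop("value"))

    # [append-first, /memory[1]/junkbuster-method[1], <newson/>]
    node.insert(0, ET.Element("newson"))

    # [update, /memory[1]/server-socket[2]@port, 7797]
    root.find("./server-socket[2]").set("port", "7797")

    print(ET.tostring(root, encoding="unicode"))

Each step is applied to the tree left by the previous one, which is why the
order of the entries in the script matters.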


Warnings
--------

* This version of xmldiff doesn't process DTD, CDATA and processing
  instruction nodes, so if there is a difference between two
  documents in one of those nodes, xmldiff won't see it.

* Furthermore, XML namespaces are disabled:
  <xsl:transform  xmlns:xsl="..."/> and
  <xslt:transform xmlns:xslt="..."/>
  are seen as different nodes.

* Comparing documents larger than 200 KB can take a few minutes (during
  tests, it took about 25 seconds to diff two versions of a 130 KB
  document on a Celeron 533 box with 256 MB of RAM).

* The execution time grows with the number of differences between
  the documents being compared.

* Finally, a few assumptions have been made to obtain a faster
  algorithm:

  - there is an ordering <_l on the labels in the schema such that a node
    with a label l1 can appear as the descendant of a node with a label l2
    only if l1 <_l l2

  - for any leaf x from T1, there is at most one leaf y from T2 which
    can be mapped with x (internally, two nodes may be mapped together if
    their LCS (longest common subsequence) ratio is greater than 0.6)
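
The 0.6 threshold mentioned in the last assumption can be illustrated with a
small sketch. The exact ratio xmldiff computes internally is not given here,
so the formula below (twice the LCS length divided by the sum of both
lengths) is an assumption, and the function names are hypothetical.

.. code-block:: python

    def lcs_length(a, b):
        """Length of the longest common subsequence of two sequences."""
        previous = [0] * (len(b) + 1)
        for x in a:
            current = [0]
            for j, y in enumerate(b, start=1):
                if x == y:
                    current.append(previous[j - 1] + 1)
                else:
                    current.append(max(previous[j], current[j - 1]))
            previous = current
        return previous[-1]

    def lcs_ratio(a, b):
        """Assumed similarity measure: 2 * LCS / (len(a) + len(b))."""
        if not a and not b:
            return 1.0
        return 2.0 * lcs_length(a, b) / (len(a) + len(b))

    def may_be_mapped(a, b, threshold=0.6):
        """True if the two values are similar enough to be matched."""
        return lcs_ratio(a, b) > threshold

    # The two e-mail addresses from the detailed example are close enough to
    # be treated as one updated text node rather than a removal and an insert.
    print(may_be_mapped("almaster@logilab.org", "syt@logilab.org"))

Under this assumed formula, the two e-mail addresses share a common
subsequence of 14 characters out of 20 and 15, which gives a ratio of 0.8.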


References
----------

1. "Change detection in hierarchically structured information"
   by S. Chawathe, A. Rajaraman, H. Garcia-Molina, J. Widom
   Stanford University, 1996
   The Fast Match / Edit Script algorithm (fmes), used by default

2. http://www.w3.org/TR/2000/REC-xml-20001006
   XML 1.0 W3C recommendation

3. http://www.w3.org/TR/xpath
   XML path language 1.0 W3C recommendation


Feedback
--------

xmldiff discussion should take place on the xml-logilab mailing list.
Please check http://lists.logilab.org/mailman/listinfo/xml-projects for
information on subscribing and the mailing list archives.
