GedML: Genealogical Data in XML
These pages describe GedML, a way of encoding genealogical data
sets in XML. It combines the well-established GEDCOM data model
with the XML standard for encoding complex information.
The result is a representation that can easily be converted to
and from GEDCOM, but can be manipulated much more easily using
standard tools: notably, by using an XSLT processor such as Saxon.
What's available?
After a couple of years during which the GedML site was incomplete and dormant, I have
put together a very basic set of useful tools.
The software supplied contains four Java classes (in source and compiled form).
These are:
- GedcomParser: This class implements the SAX2 XMLReader interface, so it pretends
to be an XML parser, but actually it is parsing GEDCOM files. Each GEDCOM tag is presented
as an element. The names of tags are unchanged from the original GEDCOM. The entire file
is wrapped in a <GED> element, and the TRLR record is removed. Record identifiers are
included in the file as ID attributes, while cross-references to other records are included
as REF attributes. The GedcomParser can be used with any software that expects to take input
from an XML parser.
- AnselInputStreamReader: GEDCOM files use a rather unusual character encoding which
is not supported by most Java VMs. This class performs the conversion from ANSEL characters
to Unicode. The GedcomParser uses this class to read the input file. As supplied, the GedcomParser
can only handle input in ANSEL or ASCII; it cannot handle the so-called "ANSI" input (actually
ISO-8859-1) which is supported by many genealogy packages, though not by the GEDCOM standard.
- GedcomOutputter: This is the reverse of GedcomParser; it acts as a SAX2 ContentHandler
which serializes a SAX event stream in the form of a GEDCOM file. It can be used as the back-end
for an XSLT transformation that is required to produce GEDCOM output.
- AnselOutputStreamWriter: This is the reverse of AnselInputStreamReader: it converts
Unicode characters to ANSEL, and is used to write the output file by the GedcomOutputter.
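To make the GEDCOM-to-XML mapping concrete, here is a minimal sketch of the idea behind GedcomParser. It is not the GedcomParser source, and it builds XML text rather than firing SAX events. The class name GedcomToXmlSketch is hypothetical. The sketch assumes that each line has the form "level [@xref@] TAG [value]", that a plain line value becomes the element's text content, and that the @ delimiters are dropped from ID and REF values.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch, not the GedcomParser source. Assumptions: a line is
// "level [@xref@] TAG [value]"; the xref becomes an ID attribute; a value of the
// form @X@ becomes a REF attribute; any other value becomes text content.
final class GedcomToXmlSketch {

    static String convert(String gedcom) {
        StringBuilder out = new StringBuilder("<GED>");  // the whole file is wrapped in <GED>
        Deque<String> open = new ArrayDeque<>();          // tags of the currently open elements
        for (String raw : gedcom.split("\r?\n")) {
            String line = raw.trim();
            if (line.isEmpty()) continue;
            String[] parts = line.split(" ", 3);          // level, xref-or-tag, rest
            int level = Integer.parseInt(parts[0]);
            String id = null;
            String tag;
            String value;
            if (parts[1].startsWith("@")) {               // e.g. "0 @I1@ INDI": record with an identifier
                id = parts[1].substring(1, parts[1].length() - 1);
                String[] rest = parts.length > 2 ? parts[2].split(" ", 2) : new String[] {""};
                tag = rest[0];
                value = rest.length > 1 ? rest[1] : null;
            } else {
                tag = parts[1];
                value = parts.length > 2 ? parts[2] : null;
            }
            // A line at level n closes every open element deeper than n.
            while (open.size() > level) {
                out.append("</").append(open.pop()).append('>');
            }
            if (tag.equals("TRLR")) continue;             // the trailer record is removed
            out.append('<').append(tag);
            if (id != null) out.append(" ID=\"").append(escape(id)).append('"');
            if (value != null && value.matches("@[^@]+@")) {  // cross-reference to another record
                out.append(" REF=\"").append(escape(value.substring(1, value.length() - 1))).append('"');
                value = null;
            }
            out.append('>');
            if (value != null) out.append(escape(value));
            open.push(tag);
        }
        while (!open.isEmpty()) out.append("</").append(open.pop()).append('>');
        return out.append("</GED>").toString();
    }

    // Escapes the characters that cannot appear literally in XML text or attributes.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

The level number is what makes GEDCOM map so naturally onto an XML tree: a stack of open elements is enough to turn the flat sequence of lines into properly nested elements.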
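The main complication an ANSEL reader has to handle is ordering: ANSEL puts a combining diacritic byte before the letter it modifies, whereas Unicode puts the combining mark after the base character. The sketch below shows that reordering for a handful of diacritics only. The class name AnselSketch is hypothetical and this is not the AnselInputStreamReader source; the rest of the ANSEL table (including its non-combining special characters) is omitted, and the final NFC normalization is a choice made for this sketch.

```java
import java.text.Normalizer;
import java.util.Map;

// Hypothetical sketch, not the AnselInputStreamReader source. Only a few ANSEL
// combining diacritics are mapped; bytes below 0x80 are treated as ASCII.
final class AnselSketch {

    // Partial table: ANSEL combining byte -> Unicode combining mark.
    private static final Map<Integer, Character> COMBINING = Map.of(
            0xE1, '\u0300',   // grave
            0xE2, '\u0301',   // acute
            0xE3, '\u0302',   // circumflex
            0xE4, '\u0303',   // tilde
            0xE8, '\u0308',   // umlaut (diaeresis)
            0xF0, '\u0327');  // cedilla

    static String decode(byte[] ansel) {
        StringBuilder out = new StringBuilder();
        StringBuilder pendingMarks = new StringBuilder();
        for (byte b : ansel) {
            int code = b & 0xFF;                 // Java bytes are signed
            Character mark = COMBINING.get(code);
            if (mark != null) {
                pendingMarks.append(mark);       // hold the mark until its base character arrives
            } else if (code < 0x80) {
                out.append((char) code).append(pendingMarks);  // base first, then its marks
                pendingMarks.setLength(0);
            } else {
                throw new IllegalArgumentException(
                        "byte not handled by this sketch: 0x" + Integer.toHexString(code));
            }
        }
        out.append(pendingMarks);                // marks with no following base character
        // Compose base + combining mark into a precomposed character where Unicode has one.
        return Normalizer.normalize(out, Normalizer.Form.NFC);
    }
}
```

For example, the ANSEL bytes for "Jos", 0xE2, "e" decode to "José".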
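GedcomOutputter works in the opposite direction. A minimal sketch of that approach, assuming the same ID/REF conventions, is a SAX ContentHandler that tracks element depth below the <GED> wrapper and writes one GEDCOM line per element. The class name GedcomWriterSketch is hypothetical and this is not the GedcomOutputter source: it writes a Java String rather than an ANSEL stream, simplifies mixed content, and assumes the TRLR record should be added back at the end.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, not the GedcomOutputter source. Element depth below <GED>
// gives the GEDCOM level; ID becomes an "@xref@" prefix, REF an "@xref@" value.
// Simplification: only text that precedes an element's first child becomes its value.
final class GedcomWriterSketch extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    private final StringBuilder text = new StringBuilder();
    private int depth = -1;            // <GED> is depth 0 and produces no line of its own
    private boolean lineOpen = false;  // a line has been started but not yet terminated

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        flushLine();
        depth++;
        if (depth == 0) return;        // skip the <GED> wrapper
        out.append(depth - 1).append(' ');
        String id = atts.getValue("ID");
        if (id != null) out.append('@').append(id).append("@ ");
        out.append(qName);
        String ref = atts.getValue("REF");
        if (ref != null) out.append(" @").append(ref).append('@');
        lineOpen = true;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (lineOpen) text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        flushLine();
        if (depth == 0) out.append("0 TRLR\n");  // assumed: restore the trailer removed on input
        depth--;
    }

    // Finishes the current line, appending any collected text as its value.
    private void flushLine() {
        if (!lineOpen) return;
        String value = text.toString().trim();
        if (!value.isEmpty()) out.append(' ').append(value);
        out.append('\n');
        text.setLength(0);
        lineOpen = false;
    }

    static String toGedcom(String xml) {
        GedcomWriterSketch handler = new GedcomWriterSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
        return handler.out.toString();
    }
}
```

Because the writer is just a ContentHandler, the same class could in principle sit at the end of any SAX pipeline, which is what lets GedcomOutputter act as the back-end of an XSLT transformation.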
Also included are a number of stylesheets:
- GedcomToXml.xsl performs an identity transformation; if GedcomParser is used as the
input parser, the effect is to convert from GEDCOM encoding to XML. To achieve this with Saxon, the
command line is:
java com.icl.saxon.StyleSheet -x GedcomParser my.ged GedcomToXml.xsl >my.xml
- XmlToGedcom.xsl also performs an identity transformation, but this time it is configured
to use GedcomOutputter to produce the output in GEDCOM format. Note that this uses a feature that
is specific to the Saxon XSLT processor.
- GedcomToHtml.xsl produces an HTML rendition of the GEDCOM file. Use this as a starting point
to display your GEDCOM files in whatever way you want.
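The -x option on the Saxon command line tells Saxon to read the source document through a different SAX parser. The same plumbing is available through the standard JAXP API by wrapping an XMLReader in a SAXSource. In this sketch the class name IdentityWithReaderSketch is hypothetical, the JDK's built-in identity transformer (no stylesheet) stands in for GedcomToXml.xsl, and the JDK's own XML parser stands in for GedcomParser so that the example runs on its own.

```java
import java.io.StringWriter;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Hypothetical sketch: run an identity transformation whose input is read by a
// caller-supplied XMLReader, which is what "-x classname" arranges in Saxon.
final class IdentityWithReaderSketch {

    static String transform(XMLReader reader, InputSource input) {
        try {
            // newTransformer() with no stylesheet gives the identity transformation.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            // SAXSource pairs the reader with its input; the transformer drives the parse.
            identity.transform(new SAXSource(reader, input), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    // The JDK's own XML parser, used here as a stand-in for GedcomParser.
    static XMLReader defaultReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("no XML parser available", e);
        }
    }
}
```

To read GEDCOM instead, you would pass a GedcomParser instance as the reader, assuming it has a public no-argument constructor and accepts an InputSource that points at the .ged file.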
The download file also includes demonstration files kennedy.ged and kennedy.xml
to give you something to test the programs on.
The software can be downloaded here. There are no restrictions on its use,
but equally, it is provided with no warranty or support.
Michael H. Kay
2 April 2002