Adding XML capabilities to C programs with libxml2

By David Turover

This document is in the public domain.

For brevity's sake, the code in this document contains no error checking.

In real life, you will want to check for NULL pointers and function returns.

Introduction

libxml2 is a library of functions for handling XML data.

A simple example:

#include <stdio.h>
#include <libxml/parser.h>
#include <libxml/tree.h>

int main(void){
    xmlDocPtr doc;
    xmlNodePtr nodeLevel1;
    xmlNodePtr nodeLevel2;

    /* Load the file into a tree of nodes */
    doc = xmlParseFile("xmlfile.xml");

    /* Walk the top-level nodes and, for each one, its immediate children */
    for(nodeLevel1 = doc->children;
        nodeLevel1 != NULL;
        nodeLevel1 = nodeLevel1->next)
    {
        printf("%s\n",nodeLevel1->name);
        for(nodeLevel2 = nodeLevel1->children;
            nodeLevel2 != NULL;
            nodeLevel2 = nodeLevel2->next)
        {
            printf("\t%s\n",nodeLevel2->name);
        }
    }

    /* Write the tree back out under a new name, then release it */
    xmlSaveFile("xmlfile_copy.xml", doc);
    xmlFreeDoc(doc);
    return 0;
}

The above code, compiled with -lxml2, should print the names of the nodes in the top two levels of an XML file and save a copy of the file.

Explanation of Introduction

The xmlDocPtr is a pointer to an xmlDoc structure. It represents an XML data source.

You load an XML file with the xmlParseFile() function, which takes as a parameter the name of an XML file and returns a pointer to a new xmlDoc structure (or NULL on failure). When done, you release this memory with the xmlFreeDoc() function. You can export an xmlDocPtr's data as an XML file with the xmlSaveFile() function.

The xmlNodePtr points to a single element or node of an XML document. Each xmlNode has a .children member which is an xmlNodePtr to the first of this node's children. Each xmlNode has a .name member which is a string containing the name of the element it represents or the word "text" for a text node.

The xmlNodePtr is the basic structure used to traverse an XML document with libxml2. It contains several xmlNodePtrs which can be used to move around the document. If there is no other node in a particular direction, the pointer is NULL.

xmlNodePtr->children    The first child of the node
xmlNodePtr->last        The node's last child
xmlNodePtr->parent      The current node's parent node
xmlNodePtr->next        The next sibling node, i.e. the next child of the parent node
xmlNodePtr->prev        The previous sibling node
xmlNodePtr->doc         The xmlDocPtr for the document containing this node
<node>
    <node>                          ->parent
        <node>                      ->prev
            <node>
            </node>
            <node>
            </node>
        </node>
        <node>                      You Are Here
            <node>                  ->children
            </node>
            <node>
            </node>
            <node>                  ->last
            </node>
        </node>
        <node>                      ->next
            <node>
            </node>
        </node>
    </node>
</node>

Although the above diagram suggests that nodes are simply elements, be aware that areas of whitespace between elements are also nodes. This makes the above example faulty if it is taken literally, because the child and neighbour node pointers would point to the whitespace between elements.

Checking for text nodes

You can easily check to see what type of xmlNode you have by looking at the xmlNodePtr->type member, which is an integer with one of the following values:

XML_ELEMENT_NODE

XML_ATTRIBUTE_NODE

XML_TEXT_NODE

XML_CDATA_SECTION_NODE

XML_ENTITY_REF_NODE

XML_ENTITY_NODE

XML_PI_NODE

XML_COMMENT_NODE

XML_DOCUMENT_NODE

XML_DOCUMENT_TYPE_NODE

XML_DOCUMENT_FRAG_NODE

XML_NOTATION_NODE

XML_HTML_DOCUMENT_NODE

XML_DTD_NODE

XML_ELEMENT_DECL

XML_ATTRIBUTE_DECL

XML_ENTITY_DECL

XML_NAMESPACE_DECL

XML_XINCLUDE_START

XML_XINCLUDE_END

XML_DOCB_DOCUMENT_NODE

The only ones you need to care about right now are XML_TEXT_NODE and XML_ELEMENT_NODE.

Handling a Node

An XML node generally looks like this:

<this_is_a_node attribute1="abcdefg" attribute2="12345">

<this_is_a_child_node>Hello World</this_is_a_child_node>

</this_is_a_node>

The things you can manipulate are the node itself, the node's attributes, and the node's contents.

Attributes

Working with attributes of a node is fairly straightforward: You use the xmlGetProp() function to get an attribute's value and the xmlSetProp() function to change an attribute's value. If you want to know if an attribute exists, you use the xmlHasProp() function. If you want to completely remove an attribute, use xmlUnsetProp().

xmlSetProp(xmlNodePtr node, xmlChar *name, xmlChar *value);

xmlGetProp(xmlNodePtr node, xmlChar *name);

xmlHasProp(xmlNodePtr node, xmlChar *name);

xmlUnsetProp(xmlNodePtr node, xmlChar *name);

xmlGetProp returns a string that must be freed with the xmlFree() function when you are done with it, or else your program will have a memory leak.

Content

Working with content is less intuitive. The content of a node is not simply what a node contains, but is the text of a node and its children with the element tags stripped out. Thus the content of <this_is_a_node> from the above example would be "Hello World", with the child element <this_is_a_child_node> nowhere to be seen. If you try adding element tags to a node's content, libxml2 will escape their < and > characters as &lt; and &gt;.

To work with content, then, you use the xmlNodeSetContent() and xmlNodeGetContent() functions to set or retrieve a node's content, or the xmlNodeAddContent() function to append to a node's content.

xmlNodeSetContent(xmlNodePtr node, xmlChar *content);

xmlNodeAddContent(xmlNodePtr node, xmlChar *content);

xmlNodeGetContent(xmlNodePtr node);

As with xmlGetProp(), you must use xmlFree() on the result of xmlNodeGetContent() or else you will have a memory leak.

To print everything an element contains, not simply its content, use xmlElemDump():

xmlElemDump(FILE * output, xmlDocPtr doc, xmlNodePtr node);

Strings: xmlChar* versus char*

xmlChar* is the string type used by libxml2.

You can easily cast between char* and xmlChar*.

Creating a New Node

To create a node from scratch and add it to a document:

xmlNodePtr node = xmlNewNode(NULL, (xmlChar *)"name");
xmlNodePtr nodeParent = doc->children;
xmlNodePtr nodeCopy = xmlDocCopyNode(node, doc, 1);
xmlFreeNode(node);
xmlAddChild(nodeParent, nodeCopy);

The xmlNewNode() function allocates memory for a new node. When you are done, you must free the node with xmlFreeNode() unless the node has been added to another structure which will itself be freed (as the copy has here). The NULL in xmlNewNode() is where an xmlNsPtr namespace pointer would be if the node were going to be assigned to a particular namespace; we are not using namespaces right now, so it is left as NULL.

The xmlDocCopyNode() function does not add the node to the target document's tree. Instead, it returns a new copy of the node whose doc pointer refers to that document, so that the copy believes it is part of the document. The original node is left untouched, which is why it is freed above. To add the copy to the document, you must then use another function such as xmlAddChild(), xmlAddSibling(), xmlAddNextSibling(), or xmlAddPrevSibling().

Summary of xmlNode Members and Simple Interface Functions

type        Node type (usually XML_ELEMENT_NODE or XML_TEXT_NODE)
name        String containing element's name, or "text" if a text node
children    First child of node
last        Last child of node
parent      Parent node
next        Next sibling node
prev        Previous sibling node
doc         The document containing this node

xmlSetProp(xmlNodePtr node, const xmlChar *name, const xmlChar *value);

xmlGetProp(xmlNodePtr node, const xmlChar *name);

xmlHasProp(xmlNodePtr node, const xmlChar *name);

xmlUnsetProp(xmlNodePtr node, const xmlChar *name);

xmlNodeSetContent(xmlNodePtr cur, const xmlChar *content);

xmlNodeAddContent(xmlNodePtr cur, const xmlChar *content);

xmlNodeGetContent(xmlNodePtr cur);

xmlElemDump(FILE * output, xmlDocPtr doc, xmlNodePtr node);

For more information, read the API docs at:

Addendum: Custom Functions

These are a couple of functions written for the semester project.

/** srSeekChildNodeNamed() : Get a pointer to the child with the given name
 * Returns a pointer to the data, not a copy of it
 * Requires <string.h> for strcmp()
 */
xmlNodePtr srSeekChildNodeNamed(xmlNodePtr p, char * name){
    if(p == NULL || name == NULL) return NULL;
    for(p=p->children; p!= NULL; p=p->next){
        /* Logical AND: only compare names when the node has one */
        if(p->name && (strcmp((char*)p->name,name) == 0)){
            return p;
        }
    }
    return NULL;
}

/* srXPath(): Avoid having to use contexts in your code
 * Returns an xmlXPathObjectPtr that you must free with xmlXPathFreeObject()
 */
xmlXPathObjectPtr srXPath(xmlChar * str, xmlDocPtr doc){
    xmlXPathContextPtr xpContext;
    xmlXPathObjectPtr xpResult;

    if(str == NULL){
        printf("Error: srXPath(): NULL received for xpath string\n");
        return NULL;
    }
    if(doc == NULL){
        printf("Error: srXPath(): xmlDocPtr is NULL\n");
        return NULL;
    }

    xpContext = xmlXPathNewContext(doc);
    if(xpContext == NULL){
        printf("Error: srXPath(): Failed to create xpath context\n");
        return NULL;
    }

    xpResult = xmlXPathEvalExpression(str, xpContext);
    xmlXPathFreeContext(xpContext);
    return xpResult;
}

Addendum: Using LibXSLT

Example:

#include <libxml/parser.h>
#include <libxml/tree.h>
#include <libxslt/xsltInternals.h>
#include <libxslt/transform.h>

int main(void){
    xmlDocPtr doc = xmlParseFile("xmlfile.xml");
    xsltStylesheetPtr xsl = xsltParseStylesheetFile((const xmlChar *)"xslfile.xsl");

    /* Apply the stylesheet to the document and save the result */
    xmlDocPtr result = xsltApplyStylesheet(xsl, doc, NULL);
    xmlSaveFile("stylesheet_output.xml", result);

    xmlFreeDoc(doc);
    xmlFreeDoc(result);
    xsltFreeStylesheet(xsl);
    return 0;
}

The XSL file is available, as an xmlDocPtr, as xsltStylesheetPtr->doc.

The NULL in xsltApplyStylesheet() is where you would give parameters to the XSL parser. That is beyond the scope of this document.

If you want to add nodes to an XSL stylesheet, you will need to give the nodes an XSL namespace:

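/* xsl is the xsltStylesheetPtr from the example above */
xmlDocPtr xslDoc = xsl->doc;
xmlNsPtr xslNamespace = xmlNewNs(NULL,
    (xmlChar *)"http://www.w3.org/1999/XSL/Transform",
    (xmlChar *)"xsl");
xmlNodePtr varNode = xmlNewNode(xslNamespace, (xmlChar *)"variable");
xmlNodeSetContent(varNode, (xmlChar *)"forty-two");
xmlSetProp(varNode, (xmlChar *)"name", (xmlChar *)"my_variable");
/* Mark the node as belonging to the stylesheet document */
xmlSetTreeDoc(varNode, xslDoc);
xmlAddPrevSibling(xslDoc->children->next, varNode);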

This creates an <xsl:variable> element named my_variable, which contains the value "forty-two".

See also:

Addendum: Using XPath

#include <stdio.h>
#include <libxml/parser.h>
#include <libxml/tree.h>
#include <libxml/xpath.h>

int main(void){
    xmlChar * xpath = (xmlChar *)"/foo[@bar='baz']";
    xmlDocPtr doc = xmlParseFile("xmlfile.xml");
    xmlXPathContextPtr context = xmlXPathNewContext(doc);
    xmlXPathObjectPtr result =
        xmlXPathEvalExpression(xpath, context);

    if(xmlXPathNodeSetIsEmpty(result->nodesetval)){
        printf("No result\n");
    }

    /* Free the XPath objects before the document they point into */
    xmlXPathFreeObject(result);
    xmlXPathFreeContext(context);
    xmlFreeDoc(doc);
    return 0;
}

The xmlXPathEvalExpression() function returns NULL if the xmlChar* is not a valid XPath statement. Otherwise, it returns an xmlXPathObjectPtr.

xmlXPathObjectPtr->nodesetval is an xmlNodeSetPtr, a new data structure introduced with XPath.

The xmlNodeSetPtr contains an array of pointers to the nodes that matched the XPath statement.

xmlNodeSetPtr->nodeTab is an array of xmlNodePtr to the results.

xmlNodeSetPtr->nodeNr is the length of the ->nodeTab array.

An xmlXPathObject must be freed with the xmlXPathFreeObject() function.

See also: