Converting Data to and from XML [1]

Water Import

Most data in the world is not in XML format. Water Import can convert many data sources into XML objects.

Each type of data format has a conversion method that takes a string and returns an XML object, e.g.,

  • delimited string → XML object
  • CSV file → XML objects

The first conversion method is <to_objects/>, which takes a string of multiple lines and converts each line into an object (by default, a vector of fields):

<string>
Paul,Bryan,Charlotte,NC
Richard,Reid,East Lansing,MI
Brian,Rogers,Chapel Hill,NC
</string>.<to_objects/>

Let’s assign the resulting objects to the variable TAs, then access some fields from them. Which expression returns “East Lansing”? TAs.1.2, because indexing starts at zero: row 1 is Richard’s, and field 2 is the city.

By default, <to_objects/> assumes that rows are separated by line breaks and fields within a row are separated by commas. It’s possible to change the field separators; see Ch. 8 of the Water book.
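For readers more familiar with other languages, the default behavior described above can be approximated in Python. This is only a rough analogue, not Water's implementation; the helper name to_objects and its field_sep parameter are made up for the illustration.

```python
def to_objects(text, field_sep=","):
    """Hypothetical helper: one list of fields per non-empty line."""
    rows = [line for line in text.splitlines() if line.strip()]
    return [row.split(field_sep) for row in rows]


data = """
Paul,Bryan,Charlotte,NC
Richard,Reid,East Lansing,MI
Brian,Rogers,Chapel Hill,NC
"""

tas = to_objects(data)
# Row 1, field 2, both counted from zero (the analogue of TAs.1.2).
print(tas[1][2])
```

Passing a different field_sep (for example ";") mimics changing the field separator.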

We can also use <for_each> to return all of the hometowns of the TAs:

TAs.<for_each>
  value.2
</>

Why doesn’t this work? Only the last hometown is returned.

What change do we need to make? Add combiner=insert or returns="all" to the <for_each> call.
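The difference between the two behaviors can be modeled in a few lines of Python. This is a sketch of the observed behavior only; the function names for_each_last and for_each_all are invented here and do not correspond to anything in Water.

```python
tas = [
    ["Paul", "Bryan", "Charlotte", "NC"],
    ["Richard", "Reid", "East Lansing", "MI"],
    ["Brian", "Rogers", "Chapel Hill", "NC"],
]


def for_each_last(items, body):
    """Models the plain loop: each iteration overwrites the result."""
    result = None
    for value in items:
        result = body(value)
    return result


def for_each_all(items, body):
    """Models collecting every iteration's value into one result."""
    return [body(value) for value in items]


last_only = for_each_last(tas, lambda value: value[2])
all_towns = for_each_all(tas, lambda value: value[2])
print(last_only)
print(all_towns)
```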

The most basic data structures that we can return are vectors. But it’s much easier to work with our data if we build instances of Water classes.

We can do that by naming the class that we want to create as a parameter of <to_objects/>.

<defclass person first last city state/>

<set TAs=<string>
Paul,Bryan,Charlotte,NC
Richard,Reid,East Lansing,MI
Brian,Rogers,Chapel Hill,NC
</string>.<to_objects maker=person/> />
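The idea of handing a class to the conversion routine can be sketched in Python with a dataclass. This is an analogy under stated assumptions: the maker parameter below simply mirrors the name used in the Water call, and the Person class and to_objects helper are illustrative, not part of Water.

```python
from dataclasses import dataclass


@dataclass
class Person:
    first: str
    last: str
    city: str
    state: str


def to_objects(text, maker):
    """Hypothetical helper: build one `maker` instance per non-empty line."""
    rows = [line.split(",") for line in text.splitlines() if line.strip()]
    return [maker(*row) for row in rows]


data = """
Paul,Bryan,Charlotte,NC
Richard,Reid,East Lansing,MI
Brian,Rogers,Chapel Hill,NC
"""

tas = to_objects(data, Person)
# Fields are now reached by name instead of by position.
print(tas[1].city)
```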

Let’s write an expression that returns the first names of all the TAs. (Hint: use <for_each> with combiner=insert, as above, and access the first field by name.)

CSV files can be imported too, using the <csv_to_objects/> method:

<set TAs=
<string>
Paul,"Bry
an","Charlotte, NC"
Richard,Reid,"East Lansing, MI"
Brian,Rogers,"Chapel Hill, NC"
</string>.<csv_to_objects/>
/>

<csv_to_objects/> follows the usual CSV conventions: a field is enclosed in quotes if it contains a quote, comma, or newline, and any quote characters within the data are escaped.
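The same quoting rules can be seen at work in Python's standard csv module, shown here purely as a point of comparison. Note that Python escapes a quote inside a quoted field by doubling it; whether Water uses exactly that escape is an assumption this sketch does not settle.

```python
import csv
import io

data = '''Paul,"Bry
an","Charlotte, NC"
Richard,Reid,"East Lansing, MI"
Brian,Rogers,"Chapel Hill, NC"
'''

# newline="" leaves line endings untouched so that the csv module can
# handle newlines embedded in quoted fields itself.
rows = list(csv.reader(io.StringIO(data, newline="")))

for row in rows:
    print(row)

# A doubled quote inside a quoted field stands for one literal quote.
escaped = next(csv.reader(['"He said ""hi""",NC']))
print(escaped)
```

The first record spans two physical lines but is still read as a single row of three fields.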

Water can also import data in XML format.

If a string contains valid Water code, we can execute the string at run time:

"<vector 0 'ABC' />".<execute/>

This code segment returns a vector with the elements 0 and “ABC”.

Often, when dealing with Web services, we are attempting to import XML data that is not valid Water code.

We can import XML data by specifying the execution_kind in the <execute/> method.

<set TAs=<string>
<ta title="Lead">
  <name>
    <first>Paul</first>
    <last>Bryan</last>
  </name>
  <home>
    <city>Charlotte</city>
    <state>NC</state>
  </home>
</ta>
</>.<execute execution_kind="ek_data"/> />

If we inspect the resulting object, all of our data is there, but it is difficult to access. For example, to reach the TA’s city, we must follow the path:

TAs._content.1._content.0._content.0

Fortunately, Water provides a <normalize/> method that simplifies the parsing of XML data:

<normalize TAs/>

If we inspect the resulting object, we see that it is structured the way we would expect, so the city can be reached by name:

<normalize TAs/>.home.city
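The contrast between positional access and normalized, name-based access can be illustrated in Python with xml.etree.ElementTree. The normalize function below is a hypothetical stand-in written for this example; it is not Water's <normalize/> and makes simplifying assumptions (repeated child tags overwrite one another, and attributes on leaf elements are dropped).

```python
import xml.etree.ElementTree as ET

doc = """
<ta title="Lead">
  <name><first>Paul</first><last>Bryan</last></name>
  <home><city>Charlotte</city><state>NC</state></home>
</ta>
"""


def normalize(element):
    """Hypothetical helper: turn an element into nested dicts keyed by tag."""
    children = list(element)
    if not children:
        return (element.text or "").strip()
    result = dict(element.attrib)  # keep attributes such as title
    for child in children:
        result[child.tag] = normalize(child)
    return result


root = ET.fromstring(doc)

# Positional access: child 1 (home), then its child 0 (city).
raw_city = root[1][0].text

# Name-based access after normalizing.
ta = normalize(root)
print(raw_city, ta["home"]["city"], ta["title"])
```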

We can also parse XML data from on-line sources. For example, the following Water program collects news stories from cnet.com:

<defclass cnet_viewer>
  <defmethod htm_large_class>
    <set cnet_rss=<normalize
      <file "..."/>.<execute execution_kind="ek_data"/>
    /> />
    <html>
      <body
        <h1>c|net News Items</>
        cnet_rss.channel.item_vector.<for_each combiner=insert>
          <p
            <a href=value.link 0=value.title/>
            " "
            value.description
          />
        </>
      />
    </>
  </>
</>

<server root=cnet_viewer/>

First, the cnet_viewer class accesses news.com using the <file/> method.

Each time the page is requested, we use the <execute/> and <normalize/> methods to create an easy-to-use Water object, which we store in the variable cnet_rss.

Once we have a Water object, we create HTML output by using a <for_each> loop to pull the data out of cnet_rss.
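The parse-then-render flow described above can be sketched in Python. To keep the example self-contained it works on a small inline RSS string rather than fetching a live feed; the sample feed, its URLs, and the render_items function are all invented for illustration.

```python
import html
import xml.etree.ElementTree as ET

# Made-up sample data standing in for a downloaded feed.
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>Example feed</title>
<item><title>First story</title><link>http://example.com/1</link>
<description>Summary one</description></item>
<item><title>Second &amp; last</title><link>http://example.com/2</link>
<description>Summary two</description></item>
</channel></rss>"""


def render_items(rss_text):
    """Build a simple HTML page with one paragraph per feed item."""
    channel = ET.fromstring(rss_text).find("channel")
    parts = ["<html><body>", "<h1>News Items</h1>"]
    for item in channel.findall("item"):
        link = html.escape(item.findtext("link", default=""), quote=True)
        title = html.escape(item.findtext("title", default=""))
        desc = html.escape(item.findtext("description", default=""))
        parts.append(f'<p><a href="{link}">{title}</a> {desc}</p>')
    parts.append("</body></html>")
    return "\n".join(parts)


page = render_items(SAMPLE_RSS)
print(page)
```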

Water Export

We can take one of the objects we created before and translate it back to XML:

<defclass person first last city state/>

<vector
  <person first="Paul" last="Bryan"
          city="Charlotte" state="NC"/>
  <person first="Richard" last="Reid"
          city="East Lansing" state="MI"/>
  <person first="Brian" last="Rogers"
          city="Chapel Hill" state="NC"/>
/>.<to_xml/>

We can also convert the object with <to_concise_xml/>, which gives us back almost the same representation we began with.

We can convert it with <to_csv/> as well, which returns:

"person
first,last,city,state
Paul,Bryan,Charlotte,NC
Richard,Reid,East Lansing,MI
Brian,Rogers,Chapel Hill,NC"

The header row can be suppressed by passing header=false:

<to_csv header=false/>
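A comparable export with an optional header row can be sketched in Python. The to_csv function and its header flag are written for this illustration and only mirror the name of the Water option; unlike the Water output above, this sketch does not emit the leading class-name line.

```python
import csv
import io
from dataclasses import astuple, dataclass, fields


@dataclass
class Person:
    first: str
    last: str
    city: str
    state: str


def to_csv(people, header=True):
    """Hypothetical helper: write one CSV row per person."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    if header:
        writer.writerow([f.name for f in fields(Person)])
    for person in people:
        writer.writerow(astuple(person))
    return buffer.getvalue()


people = [
    Person("Paul", "Bryan", "Charlotte", "NC"),
    Person("Richard", "Reid", "East Lansing", "MI"),
    Person("Brian", "Rogers", "Chapel Hill", "NC"),
]

print(to_csv(people))
print(to_csv(people, header=False))
```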

We can easily write a <to_table/> method to create a table of simple objects:

<defclass person
  first_name
  last_name
  employee_number=optional
  last_updated=optional
/>

<set a_data=
  <vector
    <person "Mike" "Plusch" 23 <date 2002 10 2/> />
    <person "Christopher" "Fry" 19 />
  />
/>

vector.<defmethod to_table>
  <table width="100%" border=1
    _subject.<for_each returns='all'>
      <tr
        value.<for_each returns='all'>
          <td
            value.<to_html/>
          />
        </>
      />
    </>
  />
</defmethod>

a_data.<to_table/>

We add a <to_table/> method to the vector class. It uses an outer <for_each> to generate a table row (<tr>) for each object in the vector, and an inner <for_each> to generate a cell (<td>) for each field of that object.
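The nested-loop structure of this method translates directly into other languages. Here is a minimal Python sketch of the same idea, assuming each object is represented as a plain tuple of field values; the to_table function is illustrative and not derived from Water's implementation.

```python
import html
from datetime import date


def to_table(rows):
    """One <tr> per row (outer loop), one <td> per field (inner loop)."""
    lines = ['<table width="100%" border="1">']
    for row in rows:
        cells = "".join(f"<td>{html.escape(str(value))}</td>" for value in row)
        lines.append(f"<tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


a_data = [
    ("Mike", "Plusch", 23, date(2002, 10, 2)),
    ("Christopher", "Fry", 19),  # optional last field omitted
]

table = to_table(a_data)
print(table)
```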

Summary

In this lecture we have surveyed many of the methods that Water provides for importing and exporting data.

References

  1. Plusch, Mike. Water: Simplified Web Services and XML Programming.