Nothing makes you want Groovy more than XML

I’m in Delaware this week teaching a course in Java Web Services using RAD7. The materials include a chapter on basic XML parsing using Java. An exercise at the end of the chapter presented the students with a trivial XML file, similar to:


<library>
  <book isbn="1932394842">
    <title>Groovy in Action</title>
    <author>Dierk Koenig</author>
  </book>
  <book isbn="1590597583">
    <title>Definitive Guide to Grails</title>
    <author>Graeme Rocher</author>
  </book>
  <book isbn="0978739299">
    <title>Groovy Recipes</title>
    <author>Scott Davis</author>
  </book>
</library>

(with different books, of course) and asked the students to find the book with a particular ISBN and print its title and author.

I sighed and went to work, producing a solution roughly like this:


import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ParseLibrary {
    public static void main(String[] args) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Document doc = null;
        try {
            DocumentBuilder builder = factory.newDocumentBuilder();
            doc = builder.parse("books.xml");
        } catch (Exception e) {
            e.printStackTrace();
            return;
        }
        NodeList books = doc.getElementsByTagName("book");
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            if (book.getAttribute("isbn").equals("1932394842")) {
                NodeList children = book.getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    if (child.getNodeType() == Node.ELEMENT_NODE) {
                        if (child.getNodeName().equals("title")) {
                            System.out.println("Title: "
                                + child.getFirstChild().getNodeValue());
                        } else if (child.getNodeName().equals("author")) {
                            System.out.println("Author: "
                                + child.getFirstChild().getNodeValue());
                        }
                    }
                }
            }
        }
    }
}

The materials didn’t supply a DTD, so I didn’t have any ID attributes to make it easier to get to the book I wanted. That meant I was reduced to continually using getElementsByTagName(String). I certainly didn’t want to traverse the tree, what with all those whitespace nodes containing the carriage-return/line-feed characters. So I found the book nodes, cast them to Element (because only Elements have attributes), found the book I wanted, got all of its children, found the title and author child elements, then grabbed their text values, remembering to go to the element’s first child before doing so.

What an unsightly mess. The only way to simplify it significantly would be to use a third-party library, which the students didn’t have, and even then it would still be pretty ugly.
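The whitespace text nodes are easy to see for yourself. The following is a minimal sketch, not code from the course materials. It parses a single indented book element from a string, so it runs without books.xml, and counts the element's child nodes. It then uses a hypothetical `childText` helper, built on `getElementsByTagName` and `getTextContent()` (DOM Level 3, available since Java 5), to skip the `getFirstChild()` step. It is still plain DOM, and still verbose.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class WhitespaceNodes {
    // One indented <book> element, laid out the way it appears in books.xml
    static final String BOOK_XML =
        "<book isbn=\"1932394842\">\n"
        + "    <title>Groovy in Action</title>\n"
        + "    <author>Dierk Koenig</author>\n"
        + "  </book>";

    // Parses the string and returns its root element; checked parser
    // exceptions are wrapped so callers need no try/catch of their own
    static Element parseBook(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Counts only element children, skipping the whitespace text nodes
    static int countElementChildren(Element parent) {
        int count = 0;
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    // Hypothetical helper: text of the first descendant element named tag,
    // or null if there is none. getTextContent() replaces the
    // getFirstChild().getNodeValue() step.
    static String childText(Element parent, String tag) {
        NodeList matches = parent.getElementsByTagName(tag);
        return matches.getLength() > 0 ? matches.item(0).getTextContent() : null;
    }

    public static void main(String[] args) {
        Element book = parseBook(BOOK_XML);
        // Whitespace text, <title>, whitespace, <author>, whitespace
        System.out.println("All child nodes: " + book.getChildNodes().getLength()); // 5
        System.out.println("Element children: " + countElementChildren(book));      // 2
        System.out.println("Title: " + childText(book, "title"));
        System.out.println("Author: " + childText(book, "author"));
    }
}
```

Three of the book's five child nodes are the line-break-and-indent text between the tags. This is why the original loop has to check `getNodeType()` before looking at node names.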

One of the students said, “I kept waiting for you to say, ‘this is the hard way, now for the easy way,’ but you never did.”

I couldn’t resist replying, “Well, if I had Groovy available, the whole program reduces to:


def library = new XmlSlurper().parse('books.xml')
def book = library.book.find { it.@isbn == '1932394842' }
println "Title: ${book.title}\nAuthor: ${book.author}"

“and I could probably shorten that if I thought about it. How’s that for easy?”

On the bright side, as a result I may have sold another Groovy course. :) For all of Groovy’s advantages over raw Java (and I keep finding more all the time), nothing sells it to Java developers like dealing with XML.

About Kenneth Kousen
I teach software development training courses. I specialize in all areas of Java and XML, from EJB3 to web services to open source projects like Spring, Hibernate, Groovy, and Grails. Find me on Google+. I am the author of "Making Java Groovy", a Java/Groovy integration book published by Manning in the fall of 2013.

23 Responses to Nothing makes you want Groovy more than XML

  1. Brett Knights says:

    Well if they can’t find and install Jaxen it’s unlikely they’re going to find and install Groovy.

    Also for the task at hand your code is way wordy. What’s below is shorter and could still benefit from a couple of methods to make the main body more readable. It’s not quite as efficient as yours but if you’re going to go to Groovy efficiency isn’t your primary driver anyway.

    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;

    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;

    public class ParseLibrary {
        public static void main(String[] args) throws Exception {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse("books.xml");

            NodeList books = doc.getElementsByTagName("book");
            for (int i = 0; i < books.getLength(); i++) {
                Element book = (Element) books.item(i);
                if (book.getAttribute("isbn").equals("1932394842")) {
                    NodeList titles = book.getElementsByTagName("title");
                    if (titles != null) for (int t = 0; t < titles.getLength(); t++)
                        System.out.println("Title: " + titles.item(t).getFirstChild().getNodeValue());

                    NodeList authors = book.getElementsByTagName("author");
                    if (authors != null) for (int a = 0; a < authors.getLength(); a++)
                        System.out.println("Author: " + authors.item(a).getFirstChild().getNodeValue());

                    break; // just one book per isbn
                }
            }
        }
    }

  2. Brett Knights says:

    Or it would be more readable if your comments let me format it properly.

  3. Ken Kousen says:

    Hi Brett,

    Yes, your code is somewhat shorter, but I’d still take the Groovy solution any time. And as for Jaxen, yes, that helps a lot, but Groovy not only makes XML easier, it makes everything easier.

    One thing is indisputable, though. As much as I like the overall product, WordPress is a truly lousy way to display source code.

    Thanks for commenting, though. :)

  4. Jon Chase says:

    I hear ya – I just had the same Groovy XML experience:

    The original Java code (and a so-so Groovy impl): http://www.juliesoft.com/blog/jon/index.php/2008/03/09/groovy-is-coming/

    The final Groovy code: http://www.juliesoft.com/blog/jon/index.php/2008/03/12/groovy-micro-benchmark-revisited-groovy-is-fast/

  5. Jon Chase says:

    And yes, WordPress’s formatting could be better (that’s why I just use screenshots for my code!!).
    :)

  6. Ken Kousen says:

    Jon, those are very interesting results. I’m glad you found a way to get the efficiency back. Personally, I worry a lot less about efficiency in a technology as new as Groovy, figuring that’ll come automatically with time. I’ve heard many reports of progress in that area already.

    And Brett, you’re right, I should think about doing screen shots for my code. What a pain, though. My current system is to paste in the code, then go to code view and add tabs and sprinkle in <pre> and <code> tags as necessary. It’s a really lousy system.

  7. Pingback: Groovy on Grails » Blog Archive » Nothing Makes You Want Groovy More Than XML (Ken Kousen)

  8. Jim says:

    Good article. BTW,

    library.books.find { it.@isbn == '1932394842' }

    should be

    library.book.find { it.@isbn == '1932394842' } // 'book' should be singular

  9. Ken Kousen says:

    Of course, you’re right. I really am going to have to start pasting in images of my source code rather than trying to just type it into WordPress.

    Thanks for catching that.

  10. Pavan Sibal says:

    I too like Groovy, but I also think XPath expressions can extract a particular node just as easily as Groovy expressions can.

  11. Michael Mellinger says:

    This line:
    def book = library.books.find { it.@isbn == '1932394842' }

    Should be:
    def book = library.book.find { it.@isbn == '1932394842' }

    def library = new XmlSlurper().parse('books.xml')
    def book = library.book.find { it.@isbn == '1932394842' }
    println "Title: ${book.title}\nAuthor: ${book.author}"

  12. Ken Kousen says:

    Thanks for the typo catch. Entering code in WordPress is really annoying. :)

  13. Paul says:

    For displaying source code in wordpress, use the syntaxhighlighter plugin: http://wordpress.org/extend/plugins/syntaxhighlighter/

    Just wrap your code in [sourcecode language='css']code here[/sourcecode]. Languages are defined on the plugin’s homepage. (Even though not all languages are implemented, you can still use ‘java’, and it does a good job with Groovy.)

    Have a look here to see it in action:
    http://www.javathinking.com/?p=95

  14. Paul says:

    Hey, it looks like you do have the plugin installed, because my comment is rendering code using it! It should say:

    |sourcecode language='css'|code here|/sourcecode|

    where | should really be [ and ]

  15. Ken Kousen says:

    Paul, that is so sweet! I had no idea the plugin was installed here, at WordPress. I guess maybe it should have been obvious, but I didn’t see it documented anywhere.

    Chalk that up as yet another thing that I wish I’d realized years ago. :)

    Thanks!

  16. Well, I’m afraid you’ve succeeded at selling another Groovy course, but not at teaching them good Java programming. Maybe next time you could consider presenting your students with something cleaner? ;-) For example:

    import java.io.FileInputStream;
    
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathExpression;
    import javax.xml.xpath.XPathFactory;
    
    import org.xml.sax.InputSource;
    
    public class ParseLibrary {
        public static void main( String[] args ) {
     
            XPathFactory xpathFactory = XPathFactory.newInstance();
            XPath xpath = xpathFactory.newXPath();
            XPathExpression xpathExpression = null;
            try {
                xpathExpression = xpath.compile( "/library/book[@isbn = '1932394842']/title" );
            InputSource is = new InputSource( new FileInputStream( "books.xml" ) );
                String title = xpathExpression.evaluate( is );
                System.out.println( "The title is: " + title );
            } catch( Exception e ) {
                e.printStackTrace();
            }
        }
    }
    
  17. But still the Groovy-way beats the hell out of Java…

  18. Pingback: Groovy, The Gateway Drug | Should Be Simple

  19. Love the way Groovy allows me to work with XML! Thanks for this post!

  20. Mohammad K says:

    Thanks for the helpful post!
    I was thinking if you could help me with my issue?
    What would be the best way to programmatically remove all the nodes from the whole XML document, so the xml looks like this:

    Dierk Koenig

    Graeme Rocher

    Scott Davis

    I would like to do that using Groovy. Any help?

  21. I am so sold by the “Groovy-way” that I wrote a smallish Java library to largely (somewhat?) mimic it. Please check it out at https://github.com/MorganConrad/xen. It’s still *very* preliminary, but if you check out test/GeocoderDemo.java it more or less matches the example from “Making Java Groovy”.

  22. Ken Kousen says:

    Very impressive. :) I’ll have to give it a try next time I have to deal with XML.
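Pavan and comment 16 both point to XPath, which ships with the JDK as `javax.xml.xpath` (since Java 5). Comment 16's version prints only the title. The sketch below returns both the title and the author for a given ISBN. It assumes the sample library is inlined as a string; the `XPathLookup` class and `lookup` method are hypothetical names chosen for illustration. It is one way to do this, not a definitive implementation.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class XPathLookup {
    // Inline copy of the sample books.xml, without indentation
    static final String LIBRARY_XML =
        "<library>"
        + "<book isbn=\"1932394842\"><title>Groovy in Action</title>"
        + "<author>Dierk Koenig</author></book>"
        + "<book isbn=\"1590597583\"><title>Definitive Guide to Grails</title>"
        + "<author>Graeme Rocher</author></book>"
        + "<book isbn=\"0978739299\"><title>Groovy Recipes</title>"
        + "<author>Scott Davis</author></book>"
        + "</library>";

    // Returns {title, author} for the book with the given ISBN, or null if none.
    // The ISBN is pasted into the expression, so it must not contain quotes.
    static String[] lookup(String xml, String isbn) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Select the <book> element whose isbn attribute matches
            Node book = (Node) xpath.evaluate(
                "/library/book[@isbn='" + isbn + "']", doc, XPathConstants.NODE);
            if (book == null) {
                return null;
            }
            // Relative expressions are evaluated against the selected <book>
            return new String[] { xpath.evaluate("title", book), xpath.evaluate("author", book) };
        } catch (Exception e) {
            throw new IllegalStateException("Lookup failed for ISBN " + isbn, e);
        }
    }

    public static void main(String[] args) {
        String[] result = lookup(LIBRARY_XML, "1932394842");
        System.out.println("Title: " + result[0] + "\nAuthor: " + result[1]);
    }
}
```

Building the expression by string concatenation keeps the sketch short. For untrusted input, registering an `XPathVariableResolver` and referring to `$isbn` in the expression would avoid quoting problems.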
