Traversing an XML Document
Problem
How do I work with XML documents?
Solution
Clojure has powerful libraries for processing XML documents. One low-level approach is to use the function clojure.xml/parse to read a document and parse it into a map representing the root element, with its child elements nested within it. parse accepts a File, an InputStream, or a String containing a URI as its argument.
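As a minimal sketch of the InputStream case, the following parses XML text held in a string by wrapping its bytes in a ByteArrayInputStream. The helper name parse-xml-string is made up for this illustration and is not part of clojure.xml.

```clojure
(require '[clojure.xml :as xml])
(import 'java.io.ByteArrayInputStream)

;; Hypothetical helper (not part of clojure.xml): parse XML text held in a
;; string. A bare string passed to parse would be treated as a URI, so the
;; text is wrapped in an InputStream instead.
(defn parse-xml-string [s]
  (xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8"))))

(parse-xml-string "<day>13</day>")
;; => {:tag :day, :attrs nil, :content ["13"]}
```

This is handy for experimenting at the REPL without creating a file first.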
Suppose the following XML document is located in a file named "calendar.xml":
<?xml version="1.0"?>
<calendar>
  <holiday type="International">
    <name>International Lefthanders Day</name>
    <date>
      <month>August</month>
      <day>13</day>
    </date>
  </holiday>
  <holiday type="Personal">
    <name>Rover's birthday</name>
    <date>
      <month>October</month>
      <day>12</day>
    </date>
  </holiday>
  <holiday type="National">
    <name>Groundhog Day</name>
    <date>
      <month>February</month>
      <day>2</day>
    </date>
  </holiday>
  <holiday type="State">
    <name>Kamehameha Day</name>
    <date>
      <month>June</month>
      <day>11</day>
    </date>
  </holiday>
</calendar>
parse returns a map with three keys:
(use '[clojure.xml :only (parse)])
(import 'java.io.File)
(def xml-doc (parse (File. "calendar.xml")))
(keys xml-doc) => (:tag :attrs :content)
The :tag of the root element:
(:tag xml-doc) => :calendar
It has no attributes but contains 4 child elements:
(:attrs xml-doc) => nil
(count (:content xml-doc)) => 4
The first child element is a <holiday> element:
(def holiday (first (:content xml-doc)))
(:tag holiday) => :holiday
(:attrs holiday) => {:type "International"}
The holiday contains 2 children of its own, a <name> element and a <date> element:
(:content holiday) =>
[{:tag :name, :attrs nil, :content ["International Lefthanders Day"]}
 {:tag :date, :attrs nil,
  :content [{:tag :month, :attrs nil, :content ["August"]}
            {:tag :day, :attrs nil, :content ["13"]}]}]
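Because attributes are stored as an ordinary map under :attrs, child elements can be selected by attribute value with the usual sequence functions. The sketch below assumes a cut-down, two-holiday version of the calendar document; parse-xml-string, small-doc, and holidays-of-type are hypothetical names introduced only for this example.

```clojure
(require '[clojure.xml :as xml])
(import 'java.io.ByteArrayInputStream)

;; Hypothetical helper: parse XML text held in a string.
(defn parse-xml-string [s]
  (xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8"))))

;; A cut-down version of the calendar document (two holidays, names only).
(def small-doc
  (parse-xml-string
   (str "<calendar>"
        "<holiday type=\"International\">"
        "<name>International Lefthanders Day</name></holiday>"
        "<holiday type=\"National\">"
        "<name>Groundhog Day</name></holiday>"
        "</calendar>")))

;; Hypothetical helper: direct children of doc whose type attribute equals t.
(defn holidays-of-type [doc t]
  (filter #(= t (get-in % [:attrs :type])) (:content doc)))

;; The name is the text of the first child of each matching holiday.
(map #(first (:content (first (:content %))))
     (holidays-of-type small-doc "National"))
;; => ("Groundhog Day")
```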
Walking the nested maps by hand quickly becomes tedious. A more convenient, higher-level approach is to pass the parsed document to clojure.core/xml-seq, which returns a lazy sequence of every node in the tree (elements and text strings alike) in depth-first order:
(map (fn [elt] (or (:tag elt) elt)) (xml-seq xml-doc)) =>
(:calendar
:holiday :name "International Lefthanders Day" :date :month "August" :day "13"
:holiday :name "Rover's birthday" :date :month "October" :day "12"
:holiday :name "Groundhog Day" :date :month "February" :day "2"
:holiday :name "Kamehameha Day" :date :month "June" :day "11")
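To make the traversal less magical, here is a sketch of how the same sequence can be produced with tree-seq: every non-string node is treated as a branch, and its children are whatever sits under :content. The function my-xml-seq and the hand-written date-elt map are illustrations only; date-elt mimics the shape that parse produces.

```clojure
;; A hand-written element in the same shape that parse produces.
(def date-elt
  {:tag :date, :attrs nil,
   :content [{:tag :month, :attrs nil, :content ["August"]}
             {:tag :day, :attrs nil, :content ["13"]}]})

;; Hypothetical stand-in for xml-seq, written for illustration:
;; strings are leaves, everything else is a branch with :content children.
(defn my-xml-seq [root]
  (tree-seq (complement string?) (comp seq :content) root))

(map (fn [elt] (or (:tag elt) elt)) (my-xml-seq date-elt))
;; => (:date :month "August" :day "13")

(= (my-xml-seq date-elt) (xml-seq date-elt))
;; => true
```

Each element appears before its children, which is why :date precedes :month and :day in the output.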
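We can use a list comprehension (for) to extract the name, month, and day of each holiday. The helper functions rely on the position of each child element within its parent: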
(defn holiday-name [holiday]
  (first (:content (first (:content holiday)))))
(defn holiday-month [holiday]
  (first (:content (first (:content (second (:content holiday)))))))
(defn holiday-day [holiday]
  (first (:content (second (:content (second (:content holiday)))))))
(for [elt (xml-seq xml-doc) :when (= :holiday (:tag elt))]
  [(holiday-name elt) (holiday-month elt) (holiday-day elt)]) =>
(["International Lefthanders Day" "August" "13"]
["Rover's birthday" "October" "12"]
["Groundhog Day" "February" "2"]
["Kamehameha Day" "June" "11"])
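The positional helpers above break if the order of child elements changes. One possible alternative, sketched here under the assumption that each tag appears at most once per parent, is to look children up by tag instead. The names child, text-at, and sample-holiday are hypothetical and introduced only for this example; sample-holiday is written by hand in the shape that parse returns.

```clojure
;; A hand-written holiday element in the shape that parse returns.
(def sample-holiday
  {:tag :holiday, :attrs {:type "International"},
   :content [{:tag :name, :attrs nil,
              :content ["International Lefthanders Day"]}
             {:tag :date, :attrs nil,
              :content [{:tag :month, :attrs nil, :content ["August"]}
                        {:tag :day, :attrs nil, :content ["13"]}]}]})

;; Hypothetical helper: first direct child of elt with the given tag, or nil.
(defn child [elt tag]
  (first (filter #(= tag (:tag %)) (:content elt))))

;; Hypothetical helper: follow a path of tags down from elt and return the
;; first piece of content of the element found there (nil if the path fails).
(defn text-at [elt & tags]
  (first (:content (reduce child elt tags))))

(text-at sample-holiday :name)        ;; => "International Lefthanders Day"
(text-at sample-holiday :date :month) ;; => "August"
(text-at sample-holiday :date :day)   ;; => "13"
```

A missing tag yields nil rather than an exception, which is usually easier to handle when documents vary in structure.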
Used this to help parse XML data for one of my courses.
Could have done it with Python, but I wanted to find more excuses to use Clojure.
Thanks for writing this!