Update (07.08.2013): I wrote some additional removal operations for fast-zip which can be found in this Gist.
I’ve recently had a fair share of zipper usage in Clojure whilst working on rewrite-clj (which utilizes zippers to traverse Clojure code/EDN trees). One thing that really causes tons and tons of problems for me – or at least unpleasant surprises – is the design of remove (doesn’t matter whether in Clojure’s standard zippers or fast-zip‘s) whose documentation reads as follows:
Removes the node at loc, returning the loc that would have preceded it in a depth-first walk.
Okay, so when I remove a node I’ll end up somewhere before it. But where exactly? Well, it might be directly left (on the same level) of the removed node in some cases but it might also happen that removal takes you deep (actually as deep as possible) into a left-adjacent branch. Let’s have some XML (which can be traversed using zippers) to illustrate the two cases:
<list> <first>1</first> <second>2</second> </list>
Let’s say the resulting zipper is stored in xml-loc and tag returns the XML tag name as a keyword. We would observe the following:
(-> xml-loc ;; at 'list' z/down ;; at 'first' z/right ;; at 'second' z/remove ;; where are we? tag) ;; => :first
That matches a sane person’s expectations. Even the following does not surprise too much:
(-> xml-loc ;; at 'list' z/down ;; at 'first' z/remove ;; where are we? tag) ;; => :list
We removed the leftmost element (which we can know beforehand), so going up one level in the tree is reasonable since we can predict that perfectly. Let’s alter the XML document a little bit:
<list> <first> <a>0</a> <b> <c>0.5</c> <d>1</d> </b> </first> <second>2</second> </list>
The same traversal as before will now let us end up deep inside the document.
(-> xml-doc ;; at 'list' z/down ;; at 'first' z/right ;; at 'second' z/remove ;; where are we? tag) ;; => :d
How about that? We are now two levels deeper than before the removal which – obviously, now – is not an operation to easily reason about in complex zipper manipulations. But why is it designed that way?
Well – glad you asked – it makes depth-first tree walking quite comfortable since calling next on the nodes produced by any of edit, replace or remove will always result in a not-yet-visited location. Take the following generic iterative pre-order walk, for example:
(def walk-zip [zloc f] (->> zloc (iterate (comp z/next f)) (drop-while (complement z/end?)) (first)))
The only requirement is that f doesn’t move within the zipper. For example, to remove all “p” tags from an XML document, it’s enough to write:
(-> xml-doc (walk-zip (fn [loc] (if (= (tag loc) :p) (z/remove loc) loc))) ...)
But there are cases where remove‘s behaviour is less than optimal. Say, you want to have a generic function that removes the node to the right of the current location and then moves back to the original position. Should be simple, right?
(defn remove-right [zloc] (if-let [rloc (z/right zloc)] (-> rloc ;; at the right sibling z/remove ;; right sibling removed, now we are... err... somewhere z/next ;; this should bring us back to the original level, right? z/left) ;; go back to the original node zloc))
This is already better than trusting
(-> rloc z/remove) to automatically go back to the original node, but it will fail if the removed node was the rightmost one of its branch. For example, in our XML document above (the one with the nested “first” element), the following will happen:
(-> xml-doc ;; at 'list' z/down ;; at 'first' remove-right ;; where are we? tag) ;; => :c
The problem is that once “second” is removed, the “d” element is the last one a depth-first traversal would reach. next will thus not modify the zipper location and the call to left will move it to “c”. Fantastic.
We have to handle the removal of the rightmost node by actually counting how deep inside the tree we currently are. Then, after the removal we have to move upwards to reach the so-found level.
(defn depth [zloc] (->> zloc (iterate z/up) ;; go up ... (take-while z/up) ;; ... until you are at the top count dec)) (defn remove-right [zloc] (if-let [rloc (z/right zloc)] (let [nloc (z/remove rloc) delta (- (depth nloc) (depth zloc))] (reduce (fn [loc _] (z/up loc)) nloc (range delta))) zloc))
Removing nodes is an excellent way of getting lost inside a tree, unless you’re doing it as part of a generic walk over the tree nodes. In my opinion the zipper implementations should offer one or more of the following (and I’d be glad to participate in implementing those features if they are desired and accepted):
- a function that remembers the initial position of the zipper, then applies the desired modifications/movements, before finally reverting to the initial location,
- a variant of remove that doesn’t go back to prev but either left or up (see this unanswered post),
- predefined remove-right and remove-left functions.
It might reduce headaches.