xscDev

Something Something Development

Removing Nodes in Zippers or How To Get Lost

with 5 comments

Update (07.08.2013): I wrote some additional removal operations for fast-zip which can be found in this Gist.

I’ve recently had a fair share of zipper usage in Clojure whilst working on rewrite-clj (which utilizes zippers to traverse Clojure code/EDN trees). One thing that really causes tons and tons of problems for me – or at least unpleasant surprises – is the design of remove (doesn’t matter whether in Clojure’s standard zippers or fast-zip‘s) whose documentation reads as follows:

Removes the node at loc, returning the loc that would have preceded
it in a depth-first walk.

Okay, so when I remove a node I’ll end up somewhere before it. But where exactly? Well, it might be directly left (on the same level) of the removed node in some cases but it might also happen that removal takes you deep (actually as deep as possible) into a left-adjacent branch. Let’s have some XML (which can be traversed using zippers) to illustrate the two cases:

<list>
  <first>1</first>
  <second>2</second>
</list>

Let’s say the resulting zipper is stored in xml-loc and tag returns the XML tag name as a keyword. We would observe the following:

(-> xml-loc  ;; at 'list'
  z/down     ;; at 'first'
  z/right    ;; at 'second'
  z/remove   ;; where are we?
  tag)       ;;   => :first

That matches a sane person’s expectations. Even the following does not surprise too much:

(-> xml-loc  ;; at 'list'
  z/down     ;; at 'first'
  z/remove   ;; where are we?
  tag)       ;;   => :list

We removed the leftmost element (which we can know beforehand), so going up one level in the tree is reasonable since we can predict that perfectly. Let’s alter the XML document a little bit:

<list>
  <first>
    <a>0</a>
    <b>
      <c>0.5</c>
      <d>1</d>
    </b>
  </first>
  <second>2</second>
</list>

The same traversal as before will now let us end up deep inside the document.

(-> xml-doc    ;; at 'list'
  z/down       ;; at 'first'
  z/right      ;; at 'second'
  z/remove     ;; where are we?
  tag)         ;;   => :d

How about that? We are now two levels deeper than before the removal which – obviously, now – is not an operation to easily reason about in complex zipper manipulations. But why is it designed that way?

Well – glad you asked – it makes depth-first tree walking quite comfortable since calling next on the nodes produced by any of edit, replace or remove will always result in a not-yet-visited location. Take the following generic iterative pre-order walk, for example:

(def walk-zip
  [zloc f]
  (->> zloc
    (iterate (comp z/next f))
    (drop-while (complement z/end?))
    (first)))

The only requirement is that f doesn’t move within the zipper. For example, to remove all “p” tags from an XML document, it’s enough to write:

(-> xml-doc
  (walk-zip
    (fn [loc]
      (if (= (tag loc) :p)
        (z/remove loc)
        loc)))
  ...)

But there are cases where remove‘s behaviour is less than optimal. Say, you want to have a generic function that removes the node to the right of the current location and then moves back to the original position. Should be simple, right?

(defn remove-right
  [zloc]
  (if-let [rloc (z/right zloc)]
    (-> rloc   ;; at the right sibling
      z/remove ;; right sibling removed, now we are... err... somewhere
      z/next   ;; this should bring us back to the original level, right?
      z/left)  ;; go back to the original node
    zloc))

This is already better than trusting (-> rloc z/remove) to automatically go back to the original node, but it will fail if the removed node was the rightmost one of its branch. For example, in our XML document above (the one with the nested “first” element), the following will happen:

(-> xml-doc    ;; at 'list'
  z/down       ;; at 'first'
  remove-right ;; where are we?
  tag)         ;;   => :c

The problem is that once “second” is removed, the “d” element is the last one a depth-first traversal would reach. next will thus not modify the zipper location and the call to left will move it to “c”. Fantastic.

We have to handle the removal of the rightmost node by actually counting how deep inside the tree we currently are. Then, after the removal we have to move upwards to reach the so-found level.

(defn depth
  [zloc]
  (->> zloc
    (iterate z/up)    ;; go up ...
    (take-while z/up) ;; ... until you are at the top
    count
    dec))

(defn remove-right
  [zloc]
  (if-let [rloc (z/right zloc)]
    (let [nloc (z/remove rloc)
          delta (- (depth nloc) (depth zloc))]
      (reduce
        (fn [loc _]
          (z/up loc))
        nloc (range delta)))
    zloc))

Removing nodes is an excellent way of getting lost inside a tree, unless you’re doing it as part of a generic walk over the tree nodes. In my opinion the zipper implementations should offer one or more of the following (and I’d be glad to participate in implementing those features if they are desired and accepted):

  • a function that remembers the initial position of the zipper, then applies the desired modifications/movements, before finally reverting to the initial location,
  • a variant of remove that doesn’t go back to prev but either left or up (see this unanswered post),
  • predefined remove-right and remove-left functions.

It might reduce headaches.

Written by Yannick

August 6th, 2013 at 3:47 pm

Posted in Clojure

Tagged with , ,

  • Malcolm Sparks

    You’re absolutely right. Removing nodes with zippers is not easy, there should be better support for it.

  • darth10

    Interesting read.
    Maybe the simplest solution is to bubble all the way up to the root when you delete? I know it sounds inane, but this technique has zero non-determinism.

    • http://dev.xscheme.de/ Yannick

      It’s not like the current “remove” is nondeterministic but I know what you mean. The go-back-to-root solution would be easier to reason about, I’d just hate to go down a tree (maybe looking for a specific pattern), perform the removal and then execute the same traversal again to continue manipulating that specific subtree. And there are probably cases where the removal makes it impossible to recognize said subtree again.

      Of course, you’re not talking about replacing the current behaviour of “remove” but offering an alternative; and it certainly has its applications but one might not (and probably shouldn’t) feel content with a solution including that amount of unnecessary movement.

      • darth10

        >And there are probably cases where the removal makes it impossible to recognize said subtree again.

        This really makes sense. Maybe there are more different deletion strategies too; and maybe deletion behavior should be a parameter.

        remove-left and remove-right are also really elegant when you think about it. So instead of navigating to a node to delete, navigate to the parent before calling delete via remove-left/remove-right. No more guessing the current node; as you’re still on the same node! But then how would you empty a tree with at least one node?

        Well…. I must be rambling again.

        • http://dev.xscheme.de/ Yannick

          Ramble away, that’s what these comment sections are for. :)

          I really like remove-right and remove-left, too, since they might not be too hard to implement efficiently when you look at the data structures used for zippers (they manage right and left siblings).

          > But then how would you empty a tree with at least one node?

          Well, the original “remove” is still available (it really _is_ useful in walking a tree), so we can check if the current node is the only child of its parent (e.g. does it have neither left nor right siblings?) then use that, I guess.

          Or an additional removal function that moves left/right if siblings are available and up otherwise (as mentioned in the post). Maybe let it only remove-then-move if there are siblings available and return nil otherwise, prompting the user to actually handle the case of a single child without having to actually check for it manually? Would make for some concise if-else statements, I think.

          Rambling is fun.