xscDev

Something Something Development

base64-clj: Fast Base64 Encoding/Decoding for Clojure

with 2 comments

TL;DR: I wrote a rather fast Base64 encoding/decoding library for Clojure.

Update: Alexander Taggart mentioned his library data.codec in the comments. It is about 3-5 times faster than the Apache Commons implementation and I should thus have used it as my reference point. By using (set! *unchecked-math* true), which I saw there, I could at least lift my own implementation above the Apache one, performance-wise. I can say with confidence that data.codec is, however, really as fast as it’s gonna get.

So, starting with this post I tried to implement Base64 encoding and decoding in Clojure. I used seqs to represent the input data, had a vector as encoding alphabet and a map for decoding. Very clojure-y, as you can see, but slow.

Taking the Apache Commons Base64 Codec as a reference point, my intuitive and short solution took about 100.000 times as long as the Apache one (~300 milliseconds vs. ~3 microseconds for the example quote given on the Wiki page) on a 4-core i5 machine (4GB RAM) I had access to at that time. Surprisingly, using my laptop – an i7 with 8 cores and 8GB RAM, it sped up to around 270 microseconds –  only 0.9% of the original time. I really can’t grasp which kind of optimization the JVM is performing on the faster machine to achieve that stunning result; maybe some kind of built-in parallellization on the lazy input seqs? Or maybe it’s not as stunning as I think it is, maybe the computing power just is that much higher? Whatever the answer, I’d like to hear some opinions on that. (The code I was benchmarking can be found here.)

Anyway, I started from scratch, trying to squeeze out as much performance as possible:

  • use a String to store the encode alphabet and employ charAt for lookup;
  • create the decode alphabet at compile time using a macro;
  • mark encode and decode alphabet as ^:const;
  • put nearly everything in macros to avoid function calls;
  • use unchecked integer arithmetic and a lot of manual casts (int, byte, …) to avoid reflection calls;
  • compare integers with == instead of =;
  • avoid seqs and lazyness – I do not map over the input data but access it using indices;
  • use a ByteBuffer for result generation.

So, what’s the result of my efforts? Well, you can have a look at base64-clj on Github and judge for yourself.

First of all, ugly. I like Clojure so much, it really hurts to see it be all Java-like and verbose. A simple “+” is now “unchecked-add-int” and there is a cast to “int” on basically every turn of the code.

Secondly, it’s fast. I’m running benchmarks with Criterium and my implementation is only around 1.5 to 2.5 times slower than Apache’s pure-Java one. Have a look at the results.

System

amd64 Linux 3.2.0-41-generic 8 cpu(s)
Java HotSpot(TM) 64-Bit Server VM 23.21-b01
Runtime arguments: -XX:+TieredCompilation -Xmx1g -Dclojure.compile.path=/git/
base64-clj/target/classes -Dbase64-clj.version=0.1.1-SNAPSHOT -Dfile.encoding=UTF-8 
-Dclojure.debug=false

Apache Commons Base64 Codec

Encoding: ~3.64 microseconds
Decoding: ~3.61 microseconds

Benchmarking `apache-base64-encode` ...
Evaluation count : 16451460 in 60 samples of 274191 calls.
      Execution time sample mean : 3.639461 µs
             Execution time mean : 3.639549 µs
Execution time sample std-deviation : 47.434118 ns
    Execution time std-deviation : 47.654358 ns
   Execution time lower quantile : 3.564183 µs ( 2.5%)
   Execution time upper quantile : 3.736487 µs (97.5%)
                   Overhead used : 1.040998 ns

Benchmarking `apache-base64-decode` ...
Evaluation count : 17019600 in 60 samples of 283660 calls.
      Execution time sample mean : 3.607791 µs
             Execution time mean : 3.607120 µs
Execution time sample std-deviation : 67.940161 ns
    Execution time std-deviation : 68.545808 ns
   Execution time lower quantile : 3.489565 µs ( 2.5%)
   Execution time upper quantile : 3.749907 µs (97.5%)
                   Overhead used : 1.040998 ns

base64-clj

Encoding: ~7.88 microseconds (x2.16)
Decoding: ~6.90 microseconds (x1.91)

Benchmarking `encode` ...
Evaluation count : 7829460 in 60 samples of 130491 calls.
      Execution time sample mean : 7.881315 µs
             Execution time mean : 7.880720 µs
Execution time sample std-deviation : 141.536183 ns
    Execution time std-deviation : 143.438947 ns
   Execution time lower quantile : 7.672325 µs ( 2.5%)
   Execution time upper quantile : 8.204788 µs (97.5%)
                   Overhead used : 1.036409 ns

Benchmarking `decode` ...
Evaluation count : 8894580 in 60 samples of 148243 calls.
      Execution time sample mean : 6.901715 µs
             Execution time mean : 6.901556 µs
Execution time sample std-deviation : 99.689121 ns
    Execution time std-deviation : 100.470285 ns
   Execution time lower quantile : 6.726199 µs ( 2.5%)
   Execution time upper quantile : 7.088927 µs (97.5%)
                   Overhead used : 1.036409 ns

Well, that’s as fast as I’m gonna get, I think. Thank you for reading.

Written by Yannick

May 7th, 2013 at 1:23 pm

  • Alexander Taggart

    Can you please compare it with:

    https://github.com/clojure/data.codec

    • http://dev.xscheme.de/ Yannick

      Woah. I really don’t know how I could miss the first result on Google for “clojure base64″, but I have to say: Good job! I used “(set! *unchecked-math* true)” in my code after seeing it in yours and now its even faster than the Apache Codec.

      But still nowhere close to your implementation. I modified the benchmark to omit string/byte-array conversions and got the following numbers (enc/dec):

      – Apache Commons Codec: 2.65µs/2.27µs
      – base64-clj: 2.32µs/1.76µs
      – clojure.data.codec.base64: 895ns/747ns

      I will update the article accordingly. Cheers!