Friday, April 20, 2012

IntervalMap 0.2.3.2 released

I added some benchmarks using criterion. There is a benchmark section in the cabal spec, so you can run the benchmarks using cabal bench (more info in Johan Tibell's blog entry). Here is a sample report from a run on my machine.

Next steps: implement a hedge-union algorithm, and look at delete performance.

IntervalMap home page, hackage

Monday, February 13, 2012

CoffeScript performance hints


This applies only if you have tight loops in your code - if you are manipulating the DOM or making network requests, ignore this.

1. Specify a direction for counting loops

In a range such as [m..n] CoffeeScript can't know if n is larger or smaller then m, so it generates code like this:

n = 10
for i in [1..n]
  foo i

translates to

for (i = 1; 1 <= n ? i <= n : i >= n; 1 <= n ? i++ : i--) {
  foo(i);
}

Add a "by 1" after the range, and you get

for (i = 1; i <= n; i += 1) {
  foo(i);
}

2. No return value might be bad

CoffeeScript will return a list of expression values from a for loop, even if you do not care about the return value.

foo = (n) ->
  for x in [1..n] by 1
    bar x

translates to

foo = function(n) {
  var x, _results;
  _results = [];
  for (x = 1; x <= n; x += 1) {
    _results.push(bar(x));
  }
  return _results;
};

CoffeeScript allocates a list to hold the result from each iteration! If your code does not otherwise allocate memory, this could easily get you a factor of 10 slowdown.

If you add an explicit return value, all is well:

foo = (n) ->
  for x in [1..n] by 1
    bar x
  false

translates to

foo = function(n) {
  var x;
  for (x = 1; x <= n; x += 1) {
    bar(x);
  }
  return false;
};

Monday, January 2, 2012

Haskell Interval Maps

I just released my first package on Hackage: a library similar to Data.Map, but taking intervals as keys and allowing efficient search for all keys containing a point or contained in an interval, for example.

I needed a data structure - in Java - to search in a set of overlapping appointments, that is, intervals of timestamps. I found nothing suitable on the web, and so I started to implement it on my own. Wikipedia yielded augmented search trees as the basic idea. I often use Haskell for prototyping stuff I'll eventually have to implement in Java, C, or whatever.

After that I had a leftover Haskell implementation, and thought that completing it might be a good project to get a bit more into Haskell again. I've been following Haskell for a long time (since before there were type classes, actually), but never used it for larger projects, and I've not kept up with recent developments.

I used How to Write a Haskell Program as a guideline.

The first thing to do was to implement a reasonably full-featured interface. Data.Map was the obvious guideline to use, and I was dismayed by the number of functions it offers. It really seems a bit too rich. It's certainly nice to have all these functions, but on the other hand the pure mass of them also makes it harder to use the library (now, which function should I use...) and to read code using it. In the end I did implement most of them, but it was really a tough decision not to go for a leaner interface.

Next step was going for better test coverage. QuickCheck is great, so that did not take too long (but my test code looks ugly and could use some refactoring). What did take much longer was fixing the bugs this uncovered, especially in the delete... functions.

Next was the standard Haskell build system: cabal. This also took some time. Despite the good user manual, I still had trouble with many aspects. For starts: which version of base do I need? To my surprise, I did not find reasonable documentation about this anywhere. Integrating the tests was also a bit tricky.

I also wrote some performance tests using criterion, but found that the results are very unreliable on my system. But at least I was able to see that the performance was reasonably close to Data.Map for the basic functions, and that fromDistinctAscList seemed indeed to be linear.

Here's the projects home page: http://www.chr-breitkopf.de/comp/IntervalMap/
Hackage: http://hackage.haskell.org/package/IntervalMap-0.2.0

Wednesday, September 28, 2011

Bitten by Implicits

I just wasted half a day tracking down a NullPointerException in a mixed Scala/Java project.

The original problem was a NPE iterating through a list returned by a java method:

    import scala.collection.JavaConversions._

    for (x <- javaMethod())
       ...

I found the reason quickly enough: the java method was still null-infected - it returned null instead of the empty list when no results were found.
I was in a bit of a hurry, so instead of refactoring the Java method, I went though a helper:

for (x <- fixNull(javaMethod()))
       ...

    def fixNull(xs: Iterable[String]): Iterable[String] = {
        if (xs == null) Nil else xs
    }

To my utter astonishment, this did not fix the NPE. The reason is probably obvious to more seasoned scala programmers, but took me quite some time to see. Actually, after some futile debugging experiments, I just gave up and went home. Quite typically, the critical insight came 5 minutes after leaving the office. Work is just not conductive to creative thinking :-)

Actually, this is a type problem. javaMethod returns a java.util.List[String], but fixNulls wants a Scala Iterable. So the Scala compiler inserts an implict conversion from JavaConversions. Unfortunately, the conversion wraps a returned null into an object, so xs == null always returns false!

Changing the argument type fixes the problem:

    def fixNull(xs: java.util.List[String]): Iterable[String] = ...

I think that this would have been easier to find if the JavaConversions wrappers would throw a NPE when asked to wrap a null. Then the problem would have been obvious from the stack trace.

For me, that is another example why having "invisible" things in your code is problematic. Scala implicits can be extremely helpful, but you must be aware of them when debugging.

Of course, the best solution is to refactor the Java code to return an empty list instead of null.

Monday, August 8, 2011

A short-lived misfortune

Last night, there was break-in into my employers offices, and some notebooks were stolen, among them my Lemote Yeeloong.

When the police arrived, they mentioned that some notebooks had been found in a garbage can at a nearby kebab place. They even had a list of models and serial numbers.

Given that the Lemote is more of a curiosity here in Germany, it was fairly obvious that it was mine even without checking the serial number. I was directed to the appropriate police office, just a few hundred meters away.
After a short wait, and signing a receipt, I had my notebook back. The battery was empty, but otherwise, not a scratch (and, thankfully, no kebab).

All in all, it took just about 3 hours from the discovery of the theft, to having the notebook back on my desk. The was some luck involved, but still, I was quite impressed by the effectiveness of the police.

Sunday, June 19, 2011

ICFP Programming Contest 2011

Whew! Finally over. I liked the task, because of much prior experience with lambda calculus and SK combinators (but not MTG). That didn't help much though - I did not find a way to compose functions until a few hours ago. By then I was too tired to try and implement it. More detailed write-up soon.


Ok, here it goes:

The contest started at 2:00 in the night, so I decided get a decent sleep and looked at the task only some 6 hours after the start. I liked the task at first sight: lambda calculus and combinators! Cool card images, too.

To be able react to the game state during play, it was necessary to implement the game mechanics, so I wrote a straightforward implementation in Haskell. For the logic, I first wrote code to generate a number in a given slot, using the zero, succ, and dbl cards. For a start into strategy, I used that to decrement the opponents slot with the lowest vitality. Obviously not a winning strategy, but I wasn't ready to tackle help and attack yet.

Before the submission to the unofficial tournament server, I dutifully tested the executable on my Debian Squeeze VM. It did not run, because of shared library problems. So I tried static linking, but to my surprise discovered that ghc does not support that on Linux :-( So I had to compile on the Debian VM. A bit of a hassle, but not that bad.

I submitted my server, and soon got some "submission failed" problems. Not too many - about one failure every few games. Some of those were due to timeouts, so I did some code tuning. I also noted a problem with the zero card not being converted to a number in some places (0-ary functions are bad), and fixed those, too.

After that, I was quite optimistic that my submission would work. But no such luck - I still got "invalid format" errors. Lacking a more informative error message, at this point I could only review all pattern matches that ended with a _ -> error ("can't happen") clause, but I did not find the cause of the problem.

Now, the best resource would probably have been to try harder to reproduce the bug, but what I did instead (out of desperation, really) was to re-implement my player in another language: C. This did not take long, and I actually discovered the mistakes in the Haskell version while coding. As it turns out, being mentally in "pure mode", I had not faithfully implemented the order of side-effects resulting from the call-by-value evaluation. Also, there was a problem in my implementation of the help and attack cards, where I would produce an error if the j parameter was not a valid slot number too early, without first decrementing the vitality of slot i. While I thought that the actual game logic would be easier to implement in Haskell, the core of the C version looked much cleaner at that point, so I continued developing that version. It worked without problems on the first attempt. Another nice thing was that I could submit statically linked executables directly from my 32-bit desktop system instead of having to use the 64-bit VM on my notebook. Nicer for my eyes and back, that is. Even though I allocated memory for some applications, I did not release it, figuring that 500M go a long way (with a value struct being 12 bytes in size in 32-bit mode). I could always use the Boehm GC if necessary.

At that point is was Saturday evening, and I had made no progress toward a good game strategy. So I finally sat down with pen and paper, and tried to drag up my old SKI and lambda-calculus knowledge. That used to be quite good - I still have a toy functional language implemented by combinator reduction lying around, and once knew S, K, I and friends (B, C, B', C' and so on) by heart. But, as more than one contest participant has noticed, doing the implementation first and the creative "brain-work" later in the contest is often the wrong order - by the time you've finished the necessary groundwork and maybe a few tools, you're likely out of sleep and not in the best frame of mind for creative thinking.

I was stumped for a long time trying to find a general way to compose functions. That I solved a few hours before the contest's end (using slot 0 and K _, S _, _ get, _ zero).

As for strategy, it seemed obvious, that one should use help to build up vitality and use that to attack the opponent. I did not manage to implement such a scheme in the remaining time.

My submission.


After the contest

Given that I had prior experience with combinators, I was quite frustrated at my bad performance. In retrospect, what has held me up most was, I think, that the contest theme got me in a pure functional mindset, and I mostly blanked out the side-effects arising from call-by-value semantics. Combinators just scream "lazy" and "pure" to me, which caused my mind to ignore the glaringly obvious possibility of executing several vitality-affecting operations in one action (simple example: _ inc, S _, _ inc, _ zero). That, together with get to make copies for later use, would have got me in the right direction for a better result. That still would not have been a top result, as I doubt that I would have figured out how to use zombie effectively, but it would have been into the "ok" instead of "also running" category. The same problem, thinking pure, was also responsible for the bugs in my earlier Haskell implementation. So getting that right from the start would have saved me at least 12 hours of debugging and the C implementation, too.

I also completely ignored the approach NOT to do a simulation. Thus, I missed implementing some simple players that would later have helped me to debug my non-working implementation. In particular, the echo player, the random Player are absurdly easy to implement and might have helped in debugging. Also, writing and evaluating a few functions by hand might have gotten me around my mental "only one side-effect per action" block.

But as usual, I had much fun and greatly enjoyed the contest. Many thank to the organizers for the interesting task! I'm looking forward to see all the cool strategies the top contestants are sure to have devised. In particular, I'm curious what really can be done with zombie.

Tuesday, May 24, 2011

Lemote Yeelong Benchmarks

A few weeks ago, I got my Lemote Yeeloong netbook. After a veritable odyssey of OS upgrades / rescues and reinstallations, caused by stupidity (mine), unfamiliarity with Debian, and unstable to non-existent connectivity to the Chinese download sites, I was finally able to run some benchmarks.

According to the Loongson 2F website, the CPU should "Outperform 1GHz Pentium III".

I do not have access to a 1GHz PIII, so I used an Asus EeePc 701 for comparison. It has an Intel Celeron-M ULV 353 underclocked to 630 MHz. Taking these SuperPI benchmark results, which are probably best case for frequency scaling, would place the CPUs about equal (Pentium III 1 GHz at 130 sec., and Celeron-M 630 MHz at 126 sec.)

System comparison:

System CPU MHz L2 Cache Memory OS
Lemote Yeeloong 8089 Loongson 2F 800 512 KB 1 GB Fedora 13
EeePc 701 Celeron-M ULV 353 630 512 KB 2 GB Scientific Linux 6

I used Fedora instead of Debian on the Lemote because it uses the MIPS n32 ABI, for potentially much better performance.

Stream TRIAD bandwidth

Stream results are rather bad:

System TRIAD MB/sec
Yeeloong 601
EeePc 1220

bzip2 compression

As an integer benchmark, I used bzip2 to decompress and compress the gcc 4.6.0 source tarball (71,579,535 bytes). Time is user time in seconds.

System Decompress Compress bzip2 version
Yeeloong 145.69 767.74 from OS
Yeeloong 140.90 521.15 bzip 1.0.6, compiled with gcc 4.6.0 with default optimization
EeePc 114.09 507.09 from OS

Radiance rendering

Using Mark Stock's benchmark test. Unfortunately, on the Yeeloong, -O3 -ffast-math delivered invalid results, so I had to compile with the flags listed below. I'll try to rerun this with a newer version of gcc, as this gives a big improvement on some platforms (AMD64, for example, but not on the Celeron-M).

System User time Compiler Flags
Yeeloong 17915 gcc 4.4.4 -march=native -O3 -fno-math-errno -funsafe-math-optimizations
EeePc 15884 gcc 4.7.0 20110501 -march=native -Ofast

Hint

The systems are almost identical at about 67 MQuips, but, as indicated by the Stream result, the Loongson is hampered by bad main memory and, apparently, L2 cache, performance.


Updates:
2011-05-25 updated bzip2 results