Yacpb: 2009

Wednesday, October 28, 2009

A good article on the problems in the Finnish startup environment

Finland is Missing the Bowling Alley or How to Find the First Pin

The linked article discusses well why it is hard for Finnish software startups to become major worldwide players.

Tuesday, October 27, 2009

A correct implementation for the 'halts' function in the Buddhist context

Yields a correct answer for all programs P.

halts P = 無

Monday, October 26, 2009

Flu^3

We all have noticed the mass media hysteria around 'swine flu'. Give it a cool name and it feels more dangerous. Make it dangerous and sell more papers. Were there a high profile war going on, the press would have other things in mind. Now it is the flu, however.

What a coincidence that I just got my third flu virus infection this month only. Normally I get flu once/twice a year. Haven't caught the SWINE (it intimidates more in caps, doesn't it) version yet, though - so they say. I'm going to write to Guinness if I get my fourth infection.

All this even though I use Linux.

Sunday, October 25, 2009

Feynman on the feeling of inadequacy

The linked quote is great, well worth linking!

You have no responsibility to live up to it!

Saturday, October 24, 2009

Book: Founders at Work

I'm only halfway reading Jessica Livingston's Founders at Work - stories of startups' early days, but I can already say that this will be one of the top books I have read this year.

Did you know that Yahoo, then already one of the biggest sites in the world, was running on diesel fuel for several days during a power outage?

Friday, October 23, 2009

New startup and a word about types

It's been a while. I joined a new tech startup in Helsinki and have had no time to write to my blog. But I'm always returning! See?

My summer adventures at TKK went smoothly and I'm happy that I had the opportunity to work in a very interesting project and enhance my C/C++ skills there. Now I'm writing mostly Python and it feels great after months of C, almost like if somebody just imported antigravity. I have to admit, I have long been a strong proponent of explicit typing as it is a form of self-documentation, but when you just need to build something quickly, as in startups, explicit types are simply slowing you down. I need to add that I'd still love static typing, something Python does not offer.

Days are already dark in Helsinki and it's getting cold. Now we have to face six months of darkness, which is always so... intriguing. I have always wondered though how quickly it passes. Now that I have Python with me, the spring will surely return in no time.

Have you ever wondered why your relatives are always asking for your help with computers? It's not that they are stupid, but the computers are still way too difficult to use. That's going to change in the next decade(s). I bet the whole platonic idea of the computer is going to change. It'll be more like that everything is a computer, and we will start calling these devices something else that more accurately describes their function.

Wednesday, September 9, 2009

Adding a license text to a set of files using sed

The problem: I needed to add (prepend) a license text to a set of source files.

I thought sed might be a good tool for this task. Having not done anything but trivial sed scripts in the past, I took a look in the web for advice.

I was faced with the following script (I already lost the original URL, I'm sorry):

1{h; r [file] D;} 2{x; G; }

It needs some explaining. Sed works by maintaining two data buffers: the pattern space and the hold space. A sed execution cycle is performed for each input line. First the input line is placed in the pattern space. Then the commands are (conditionally) executed. Finally, the contents of the pattern space are printed to stdout (if not explicitly prevented). The hold space maintains its contents between two cycles. The pattern space on the other hand is cleared between two cycles.

The GNU sed manual explains the used commands as follows.

h - "Replace the contents of the hold space with the contents of the pattern space."

r [file] - "Queue the contents of filename to be read and inserted into the output stream at the end of the current cycle, or when the next input line is read. Note that if filename cannot be read, it is treated as if it were an empty file, without any error indication."

D - "Delete text in the pattern space up to the first newline. If any text is left, restart cycle with the resultant pattern space (without reading a new line of input), otherwise start a normal new cycle."

x - "Exchange the contents of the hold and pattern spaces."

G - "Append a newline to the contents of the pattern space, and then append the contents of the hold space to that of the pattern space. "

So the program is performing the job as follows. The first set of commands is executed only for the first line of the input. The first line is copied to the hold buffer. After that, the file [file] is queued to be printed to stderr in the end of the current cycle. Finally, the contents of the pattern space is deleted (so that the first line won't be printed just yet).

Now the file [file] (containing the license text) has been printed. Nothing else has been printed. Also, the first line has been saved to the hold space. Looks good. The second set of commands do the rest. These commands are executed for the second line of the input only. First the contents of the hold space and the pattern space are switched. Now the hold space contains the second line of the input and the pattern space contains the first line of the input. Now a newline and the contents of the hold space are appended to the pattern space. The pattern space now contains the first two lines of the input in correct order. Next, the cycle ends and the contents of the pattern space are printed.

No commands are executed for the remaining lines of the input. They are just printed by default. We have printed the license text, the first line of the input, the second line of the input and the rest of the input in correct order - effectively prepending the license text on the input files.

Now considering the license text is located at /tmp/license, the following code is exactly what is needed to do the job for each *.c and *.h files in the current directory.

sed -i '1{h; r /tmp/license D;} 2{x; G; }' *.[ch]

Monday, August 24, 2009

Google's 3D view over Vallila/Helsinki

And so Google Earth got 3D view of Helsinki [digitoday.fi]. Here's a picture depicting Vallila, that is, my current home.

Friday, August 14, 2009

LLVM assembly and integer signedness

LLVM assembly does not encode integer signedness which can be a nuisance when reading the assembly code. On the other hand the policy simplifies the representation, as the sign does not make a difference in most of the operations (such as integer addition). In some operations (such as boolean comparisons on integers <, <=, >, >=) signedness is significant, however. For completeness, let's remember that the (dis)equality operator does not care about the sign.

Let's have a look on the following C program block:

if (x < UINT_MAX-1)
    puts("foo");
else
    puts("bar")

Piping clangs's output to llvm-dis gives us the following LLVM assembly representation:

%1 = load i32* %x, align 4
%2 = icmp ult i32 %1, -2
br i1 %2, label %bb, label %bb1

bb:
%3 = call i32 @puts(i8* getelementptr ([4 x i8]* @.str, i32 0, i32 0)) nounwind
br label %bb2

bb1:
%4 = call i32 @puts(i8* getelementptr ([4 x i8]* @.str1, i32 0, i32 0)) nounwind
br label %bb2

bb2:

Now, in the assembly code, it can be read that the %1 register is compared with -2, and the operands are to be interpreted as unsigned 32-bit integers. On the bit level, that is of course correct. However, a human needs to make a conversion in her head to understand on what concrete values of %1 the first branch is taken.

In the two's complement, the first bit represents the sign. For negative numbers, the sign bit is set, for positive, it is unset. The number zero is represented as a value where all bits are unset. The smallest negative value of an n-bit integer is represented as one followed by n-1 zeros. Semantically it can be thought as if the rightmost n-1 bits represent a positive value that is summed with the smallest negative value available (-2^(n-1)), and the result is something between -2^(n-1)..-1.

Assuming 4-bit integers, -2 is represented as 1110 (-2^3 + 6). When 1110 is interpreted as an unsigned integer, that is 14 (2^3 + 2^2 + 2^1). However, in our example, we have 32-bit integers (the i32 type in LLVM), where the min value for int is -2147483648. In the 32-bit world, -2 is represented by a number where there are 31 ones followed by a zero. Interpreted as unsigned, that is 2^31 + 2^30 ... 2^2 + 2^1 = 4294967294.

So we came to the conclusion, for values of %1 < 4294967294, the first branch is taken. Otherwise the second branch is taken. That was not obvious directly from the assembly code. We had to consider the integer width to deduce the concrete unsigned value.

It should be noted that one can decide whether to write integer literals as the llvm tools do or not. Since the bit representation is identical for signed i32 -2 and unsigned i32 4294967294, it does not matter which way it is written. I would still stick to the signed interpretation for consistency's sake. Otherwise it becomes a hopeless mess.

Tuesday, August 4, 2009

Sex differences

I wanted to see what kind of questions we are asking about the opposite sex. Google. Let us assume that questions asked about men are usually asked by women and vice versa.

Top suggestions for 'how do women...':
1. get pregnant
2. think
3. get yeast infections
4. flirt
5. come

Top suggestions for 'how do men...':
1. think
2. fall in love
3. flirt
4. get hpv(*)
5. get yeast infections

(*) human papillomavirus

Comparing the first matches makes me sad.

Friday, July 24, 2009

Emil finds its first defect!

I earlier described the nature of my work at TKK tcslab. My job is to write a tool for automatically testing C programs using a technique called dynamic symbolic execution basing the implementation on LLVM. I call my tool Emil (that is ucfirst(reverse(lower(Lime))) after a similar tool called Lime for testing Java programs, developed here as well).

Emil is currently at 2.9 KLOC, and for the first time, I wrote a test case that should fail, and it failed on the first attempt! This is somewhat a milestone so I now celebrate it by showing the test case and the failing inputs found by Emil. This is of course all trivial and useless, but I believe Emil is going to the right direction.

==divzero.c==


int input(void);
int main() {
    int i = input();
    int j = input();
    if (i > j) {
        i = i / (0*j); // ouch
    } else {
        i = j;
    }
    return i;
}

==Values for i, j by Emil that will lead to division by zero==
{i: 892648572, j: -1285173836}

I hope this is just the beginning for Emil. The future of Emil will be out of my hands, though, because I will (in my current knowledge) leave Emil in the hands of others before next year and be searching for new challenges in the software industry.

Sunday, July 19, 2009

Back to work

Ah, my week-long summer vacation is now over. I mostly spent the time in a friend's summer cottage in Sysmä and in the beautiful archipelago of Sipoo. Last two days I was in Porvoo. Vacation reading included works from Hesse and Solzhenitsyn, but I especially want to raise Erlend Loe's Naive, Super that I fell in love with when sitting in a bus.

As for work, I'm now familiar with LLVM optimization pass architecture and implementation. It all feels intuitive. On the negative side, C++ frequently demonstrates its clumsiness in basic tasks. Also, the LLVM API is poorly documented, so I constantly find myself grepping (actually, ack'ing) around the LLVM tree and reading code rather than the doxygen (huh) generated API. However, all in all I'm definitely thankful for LLVM.org/docs.

Wednesday, June 17, 2009

C++ name mangling

Ever wondered what the 'extern "C" { ... }' block stands for in C++? In C and C++ the 'extern' keyword is used to declare external variables that are instantiated in other translation units. Using 'extern' tells the compiler not to allocate space for the variable as space is being allocated elsewhere. The keyword is also used within variable instantiation. In that case its semantics are to denote that the variable may be referenced from other translation units.

The 'extern "C" { ... }' block, however, has additional semantics. Let's first explain what name mangling is and why it is required.

In C functions cannot be overloaded by type. Thus we cannot declare, for example, functions with prototypes 'int foo(void)' and 'int foo(int)' in the same program. In C++, however, such overloading is possible. Due to the tight coupling of C and C++, particularly in the past (C++ was first implemented by compiling C++ code to C code and then using a regular C compiler to produce machine code), C++ compilers must use distinct symbols for overloaded functions to be compatible with the linker. Name mangling is the act of adding type information in the function names to separate overloaded functions from each other.

For example, the GNU C compiler mangles the first 'foo' function [int foo(void)] to '_Z3foov' and the second 'foo' function [int foo(int)] to '_Z3fooi'. The '_Z' is just a prefix that is a unique identifier in C/C++ that prevents conflicts with user defined identifiers. The number is the length of the original name, three in case of 'foo', and the following letters encode argument types (here 'i' for int and 'v' for void, respectively).

Now, suppose there is a C++ function 'bar' that we would want to use in our C module. We would declare a prototype for 'bar' in the C module that references the function and link the C module with the C++ module, that provides the definition for 'bar'. Let's try just that:


-------module.c--------
int bar(void);
int hooray() {
    return bar() + 42;
}
-----------------------


-------bar.cpp--------
int bar(void) {
    return 7;
}
-----------------------

Let's first compile our modules:


$ g++ -c -O0 -o bar.o bar.cpp
$ gcc -c -O0 -o module.o module.c

Then link them together to produce an executable


$ gcc -O0 bar.o module.o


 module.o: In function `hooray':
 module.c:(.text+0x7): undefined reference to `bar'
 collect2: ld returned 1 exit status

The linker could not find 'bar' from the symbol table. And that is precisely due to C++ name mangling. Let's have a look inside bar.o:

$ objdump -D bar.o


 -- [clip] --
00000000 <_Z3barv>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   b8 07 00 00 00          mov    $0x7,%eax
   8:   5d                      pop    %ebp
   9:   c3                      ret
 -- [/clip] --

There it is again, the function is called '_Z3barv' instead of 'bar'. This is the point where the extern "C" block comes to the rescue. It declares that the symbols declared inside the block are to be referenced from C context, and thus the compiler cannot mangle the names.

If the code in bar.cpp is wrapped inside the extern "C" block, everything will work:


-------bar.cpp--------
extern "C" {

int bar(void) {
    return 7;
}

}
-----------------------

Recompiling bar.cpp:


$ g++ -c -O0 -o bar.o bar.cpp

.. and relinking the modules:


$ gcc -O0 bar.o module.o

No errors. If you now disassemble bar.o you will see that the name of the bar function is now 'bar' instead of '_Z3barv', and thus the C module is able to reference the C++ function.

Friday, June 5, 2009

Software recommendations

It took a while but I have now replaced grep with ack to satisfy my daily grepping needs.

Another tip of the day is that if you are a Vim user you will want to start using ctags to move around in your codebase.

Wednesday, May 20, 2009

Graduation thoughts and leisure time

I returned my master's thesis for inspection last Friday. Finally there's some leisure time ahead.

We'll stay in Sanna's summer cottage on a Sipoo island from today until Sunday. I need mail access for checking if my inspectors have sent any comments about my thesis. My cellphone's email application will suffice. I cannot afford to miss any comments because we are unsure if we can get the inspection process done before the deadline of May 28th. It seems improbable now. Missing the deadline will move my graduation date to September.

Not graduating this spring has both cons and pros. If I had my degree now, the TKK summer trainee salary would be higher than for not having the degree. On the other hand, should I graduate next autumn, I think I may pay my HYY membership fee for the whole student year 2009-2010, which in turn would grant me the public transportation discount of 50% for one additional year, which means a lot in savings! But having the degree now, again, would perhaps help me in search for the next job - a quest that I must begin during the summer.

Ok, that's enough pondering. Now, in which books I want to lose myself on the island? I'll take two works from Pekka Himanen: HIMeros and "The hacker ethic", Steppenwolf from H. Hesse and Hofstadter's GEB, that I've been slowly reading for a couple of years.

Saturday, May 9, 2009

"Did you mean 'mature'?"

Google is sometimes amusing. Google proposes 'mature' for 'mathurl' that I discussed in my last post. Also, the sponsored link is funny in this context at least if you understand Finnish.

One more thing about the search. Peculiarly the mathURL site itself isn't appearing as the top result but my post about it.

Friday, May 8, 2009

mathURL

This is the greatest tool for a long time. Writing complicated equations in LaTeX can be syntactically tricky. Reinforcements have arrived. MathURL auto-compiles the equations while you write them and shows the result on top of the source code.

Additionally, you are permitted to embed LaTeX equations in your web page. You are provided with a URI to the image. For example, <img src="http://mathurl.com/?img=5euwuy" alt="" /> will emerge as

$img=5euwuy$

Thursday, May 7, 2009

About the summer job

Just a few notes to document my initial feelings about the approaching summer job research project at TKK.

I'm going to work in a group where the broad subject of interest is computer aided verification and testing, CAV. My work will be based on Koushik Sen's doctoral thesis Scalable automated methods for dynamic program analysis. More specifically, based on Sen's thesis, friends at TKK (and elsewhere) have developed an automated testing tool LIME which is used to build useful automatic test cases for Java programs.

My job is to experimentate if we can easily build a similar tool for programs written in C - using LLVM. Incidently, or not, the very same tool I'm describing in my master's thesis.

I'm definitely looking forward to the job. I also believe LLVM is the right choice for implementing the tool due to its extendable C++ interface for analysing and transforming LLVM code.

Thursday, April 23, 2009

Busy times etc.

Lately I have been working full time with my LLVM thesis. Also, in terms of life, it is harsh times currently. Everybody is mortal, unfortunately - or fortunately?

Now something completely different: If a Finn ever happens to read my blog, and is interested in Mika Waltari's life and work, then I must recommend Panu Rajala's biography of Waltari: Unio Mystica. It is a refreshing story of a creative, turbulent, storming mind.

Now, I want to graduate; afterwards: work hard, learn, sense, feel, enjoy and suffer.

Thursday, April 9, 2009

Intuitive Clojure

I just downloaded and tried Clojure for the first time. With a little help from clojure dot org, I managed to write the following snippet that demonstrates the use of tail calls, Java interoperation, exceptions and a bit of I/O. Of course, as is typical for lisps, it just worked the first time.

The program simplifies div expressions. The input is given in format <numerator> <denominator>, for example, 4/2, is given as 4 2.

 1 (defn gcd [a b]
 2     (if (= a b)
 3         a
 4         (if (> a b)
 5            (recur (- a b) b)
 6            (recur a (- b a)))))
 7 
 8 (defn run []
 9     (let [line (read-line)
10           args (.split line " ")
11           len (alength args)]
12         (if (< len 2)
13            (println "Expected two arguments.")
14            (try (let [a (Integer/parseInt (aget args 0))
15                       b (Integer/parseInt (aget args 1))
16                       gcd (gcd a b)]
17                    (println a "/" b "=" (/ a gcd) "/" (/ b gcd)))
18                (catch NumberFormatException
19                    ex
20                    (println "Expected natural numbers."))))))
21 (run)

Wednesday, April 8, 2009

Lessons on JNI and dynamic libraries in Linux

I was experimenting for the first time with two things: The Java Native Library (JNI) and home-made dynamically linked libraries in Linux. It took some awkward moments until it all worked due to amateurish handling of the subject, and some bad attention. Still, it was rather amusing, and shall be documented.

By the Sun JNI example, I began by creating a Java class (C_Gcd.java) and writing the method declaration for the native method called gcd. Finally I added a static code block to the class that loads the library with System.loadLibrary, which I passed the library name (gcd) as an argument.

Next I generated the C headers (C_Gcd.h) from my Java source file using javah. After that, I created a .cpp file (I decided to use C++) in which the implementation of the native method would be. Next I wrote the actual gcd implementation.

Okay, that was easy. Next we compile the C++ code and make a shared library out of it. Here is what went wrong: I forgot that all dynamic library names are prefixed with 'lib'. I created a file called gcd.so.

Now, thinking everything was going fine, I wrote another Java class that would be using the native gcd method of the C_Gcd class. Now, I noticed that I want the gcd method to be static to avoid instatiating a C_Gcd object. So I happily went and added the static keyword to the gcd method declaration in the Java source file. Here was the second mistake: I forgot that changing the Java method declaration may, and probably will, lead to changes in the output of javah.

Now, believing it should run fine, I compiled everything again and run the Java application but got:

Exception in thread "main" java.lang.UnsatisfiedLinkError: no gcd in java.library.path

Google reminded me of LD_LIBRARY_PATH, but setting it to the current path did not help. I kept having the same error.

Next I went to investigate the system with ldconfig. At some point I realised that not only is it a convention that all libraries are prefixed with lib, they must be! Unfortunately I went and changed the argument of System.loadLibrary to libgcd, which is of course wrong. So here was the third mistake: we refer to a library libfoo with mere foo, but I expressed, using the same example, liblibfoo. I still got the very same error.

After a moment I corrected the lattest glitch, and now, the error message changed:

Exception in thread "main" java.lang.UnsatisfiedLinkError: Gcd.gcd(II)I

It was still an UnsatisfiedLinkError, but it dropped the talk about java.library.path.

Next, feeling desparete, I decided to take a look on the Java bytecode using javap. It looked completely healthy. I decided to take a break.

After the break, I opened the C++ sources again and began reading with no prejudice -- and there it was -- I finally saw that the C function declaration and its definition had been in conflict all the time. Corrected it, recompiled, and, uh, it finally worked!

And here we summarize on how to get JNI working painlessly under Linux:

Make sure to have LD_LIBRARY_PATH point to the library path. (*)
Make sure to name the library libFoo.so.
Make sure to refer to the library with mere Foo.
Be careful when changing the native method signatures.

(*) or launch the JVM with: java -Djava.library.path=. YourApplication in case your library is in the current directory.

For completeness, here's the source (with class names cleaned a bit):

== Gcd.java ==


public class Gcd {
    public static native int gcd(int a, int b) throws IllegalArgumentException;
    static {
        System.loadLibrary("Gcd");
    }   
}

== Gcd.h ==


/* DO NOT EDIT THIS FILE - it is machine generated */
#include <jni.h>
/* Header for class Gcd */
    
#ifndef _Included_Gcd
#define _Included_Gcd
#ifdef __cplusplus
extern "C" {
#endif
/*
 * Class:     Gcd
 * Method:    gcd
 * Signature: (II)I
 */
JNIEXPORT jint JNICALL Java_Gcd_gcd
  (JNIEnv *, jclass, jint, jint);

#ifdef __cplusplus
}
#endif
#endif

== Gcd.cpp ==


#include "Gcd.h"

JNIEXPORT jint JNICALL
Java_Gcd_gcd(JNIEnv *env, jclass cl, jint a, jint b)
{   
    if (a < 0 || b < 0) {
        jclass ex = env->FindClass( 
                "java/lang/IllegalArgumentException");
        env->ThrowNew(ex, "Both arguments must be positive.");
        env->DeleteLocalRef(ex);
        return -1;
    }
    while (a != b) {
        if (a > b) {
            a -= b;
        } else {
            b -= a;
        }
    }
    return a;
}

== Test.java ==


public class Test {
    public static void main(String[] args) {
        System.out.println("gcd(12,24) = " + Gcd.gcd(12, 24));
    }
}

== Makefile ==


CC=g++
INCLUDE=-I/usr/lib/jvm/java-1.5.0-sun-1.5.0.16/include/linux/ -I/usr/lib/jvm/java-1.5.0-sun-1.5.0.16/include

libGcd.so: Gcd.o
        $(CC) -shared -o libGcd.so Gcd.o

Gcd.o: Gcd.cpp 
        $(CC) $(INCLUDE) -O3 -c Gcd.cpp

clean:
        rm -f libGcd.so Gcd.o

Saturday, March 21, 2009

Thesis working hours

A Python script follows, that shows the working hour statistics.


import sys
from subprocess import *

p1 = Popen(["git", "log"], stdout=PIPE)
p2 = Popen(["grep", "^Date"], stdin=p1.stdout, stdout=PIPE)
p3 = Popen(["awk", "{printf \"%d \", $5}"], stdin=p2.stdout, stdout=PIPE)
output = p3.communicate()[0]
hours = output.split()
stats = {}
for h in range(0,24):
    stats[h] = 0
for h in hours:
    stats[int(h)] += 1
for k in stats.keys():
    print "%02d: %s" % (k, "*"*stats[k])

And the results...


00: *****
01: ***
02:
03: **
04:
05:
06:
07:
08:
09:
10: **
11: *****
12: ************
13: ************
14: **********************************
15: **********************
16: ************
17: **********
18: **
19: ***
20: ***
21: **
22: *
23:

Wednesday, March 11, 2009

LaTeX: How to change the bibliography heading title

A few months ago I needed this information and now, certainly too late, I found the answer from Ki-Joo Kim's BibTeX guide. It is simply

\renewcommand\bibname{Foo}

Another thing: why must the scripting language used in BibTeX style files (.bst) be so confusing? Anyone knows what this language is, or is this just an adhoc language created for that single purpose? Anyhow, I'd like to find a manual for it.

Tuesday, March 3, 2009

Thesis development

Today's result of playing with git-archive, shell scripting and gnuplot:

f(time)=wc -w

Saturday, February 28, 2009

Vim users: subscribe to this blog

Regular Vim tips and tricks: http://dailyvim.blogspot.com/. It already saved me once as I had forgotten the very useful @@ sequence that repeats last used macro.

Nokia Ovi barely usable

What is going on with the world's largest mobile phone company? Their strategy is to become big on the net but, man, for me their Ovi is a huge disappointment.

OK, it works fine from my N82 but with Firefox 3 (Linux) some features don't work at all! Yesterday I was trying to define my mobile phone model for Ovi (which should be automatically determined since I had already created an account using my phone's web browser) and the UI just didn't work. I couldn't press "next". The button wasn't there. And another thing, the UI was _really_ slow - not far from unusable.

So I tried the newest Opera and this time the button appeared and I managed to set my mobile model for the system. Still, the system was painfully slow.

What was positive I managed to upload an image to the system from my phone which was my initial goal. And the sync feature worked.

Today I logged in to Ovi again from my desktop - maybe the slowness was just a temporary problem? Nah, it's still slow.

Considering Ovi, Nokia's web technology is insufficient. They should take example from Google on how to make web apps that are platform independent (in my experience, having experimented Google apps from several software and hardware platforms with no problems) and user friendly in easiness of use and responsiveness.

Friday, February 27, 2009

Vim cheatsheet wallpaper

I decided to put a beautiful vim cheatsheet in good use. Usually the right display is enough for my apps (irssi, shell, browser) and the left display is quite empty. So why not use this gorgerous cheatsheet as the wallpaper?

Thursday, February 26, 2009

Basic's not the answer, write in C

This made me laugh, and so I must link it here.

Sunday, February 22, 2009

A CPS question and answer

Continuation passing style is a hard concept. Recently one fellow pasted the following piece of Haskell code in IRC and asked for its meaning:

cfold’ f z [] = z
cfold’ f z (x:xs) = f x z (\y -> cfold’ f y xs)

cfold f z l = cfold’ (\x t g -> f x (g t)) z l

It is a fold implemented in continuation passing style. Let's see what we can say about the type of cfold'. The function cfold' takes a function f of three arguments. The first argument of f is of some type t, the second argument is of some type t1, and the third argument is another function. The other function takes one argument of type t1 and its return value is the same as cfold's, that is, t1. Finally the return type of f is t1 as well. The second argument z of cfold' is something of type t1. The third argument of cfold' is a list of ts that is pattern matched in the definitions of cfold'. Thus, the type signature of the whole function is cfold' :: (t -> t1 -> (t1 -> t1) -> t1) -> t1 -> [t] -> t1.

Next, what can we say about the definition of cfold'? When the third argument, that is a list, is an empty list, it returns what ever is the value of its second argument of type t1. If the list is non-empty, we return what is returned by calling f when it is given x -- the head of the list (of type t), then z of type t1, and a function from t1 to t1.

At this point we start calling the first argument of cfold', that is f, a continuation. In the second definition of cfold' the control is passed to a continuation f. The continuation is a recipe of what to do next. The continuation is given the head of the list and a base value for the fold. The trailing anonymous function from t1 to t1 allows the continuation to get the result of the fold for the tail of the list.

Now, given the fold implemented in CPS, how do we, for example, multiply the elements of a list together? The continuation must contain that logic. Our continuation was a function of type (t -> t1 -> (t1 -> t1) -> t1). The first argument was the head of the list, the second argument was the base value for the fold, and the third argument was the way to get result of the fold for the tail of the list. So the definition for the continuation must be something like this:

(\x t g -> (*) x (g t))

It says that given the head of the list, x, a base value for the fold, t, and the function g that yields the result of the fold for the tail of the list when it is given t, the result of the whole fold is the list head multiplied by the value of the fold for the list tail.

Now cfold' alone is not describing much. We define cfold that passes the continuation, that does the actual folding, to cfold'.

If we try cfold with the sum function, a base value of zero and a one element list containing the number one, the expression reduces as below:

cfold (+) 0 [1]
(\f z l -> cfold' (\x t g -> f x (g t)) z l) (+) 0 [1]
cfold' (\x t g -> (+) x (g t)) 0 [1]
(\f z (x:xs) -> f x z (\y -> cfold' f y xs)) (\x t g -> (+) x (g t)) 0 [1]
(\x t g -> (+) x (g t)) 1 0 (\y -> cfold' (\x t g -> (+) x (g t)) y [])
(+) 1 ((\y -> cfold' (\x t g -> (+) x (g t)) y []) 0)
(+) 1 0
1

So for a non-empty list, cfold' calls the continuation, that performs the folding, and for an empty list it returns the base value.

Hello, autojump

Now everybody go and install autojump! It has quickly proved itself by improving my directory browsing experience in the shell.

Description:

* I remember the software name was "auto..." something, and all my code is in ~/koodi.

Before:

$ cd ~/koodi/auto<tab> <enter>

Now:

$ j auto <enter>

Tuesday, February 17, 2009

Benchmarking llvm-gcc and gcc

I just compiled llvm-gcc (4.2.1 with LLVM 2.6svn) and did some benchmarking. Given a naive implementation of the Fibonacci sequence (fib.c):

#include <stdio.h>

unsigned long int fib(int n) {
   if (n < 3)
       return 1;    
   return fib(n-2) + fib(n-1);
} 

int main() {
   int i;
   for (i=1; i < 46; i++)
       printf("fib(%d) = %ld\n", i, fib(i));
   return 0;
}

Then we compile with -O3:

llvm-gcc -O3 fib.c -o fib-llvm-gcc
gcc -O3 fib.c -o fib-gcc

And the results:

time ./fib-llvm-gcc
real 0m3.053s
user 0m3.052s
sys 0m0.000s

time ./fib-gcc
real 0m11.176s
user 0m11.173s
sys 0m0.004s

That is something.

What comes to code size (both in x86 assembly), the faster is also the smaller:

wc fib-llvm-gcc.S
108 288 1900 fib-llvm-gcc.S

wc fib-gcc.S
291 738 4485 fib-gcc.S

Wednesday, February 11, 2009

Job Interview Failure

I have been fortunate enough in my career to avoid IT job interviews up to this point. It has been that I have been asked not the other way around. It is quite amusing that I have worked as an interviewer but never been interviewed myself. So yesterday I went to my first IT job interview to see what it is like to be on the other end of the table.

Now I'm going to write what are generally the three most important human features in hiring context.

The most important part of yourself to show in an interview is that you are exceptionally passionate about your subject. There are a lot of people who know how to design and write programs competently, but those with passion are the ones with the greatest chance to stand up and achieve something remarkable. Real passion is something very rare.

The second most important part of yourself is that you are specialized. The greatest companies do not seek people that are generally competent in every aspect, but the people who are the best in some aspect. Usually the passionate are also the most specialized.

The third most important part of yourself is to show that you are not a hoax, that you know what you are talking about. This is the easiest part to do. Everybody with the slightest sophistication can show this. With the passionate this is self-evident.

What comes to my first interview yesterday, I think I emerged as an enthusiast and not as a complete clown. But I made one terrible mistake. I drew myself as a generalist - not a specialist. I did not convince the interviewer, nor did I deserve it. Albeit knowing all this, I somehow managed to fail.

---

So I accepted the research trainee position in TKK.

Friday, February 6, 2009

A summer trainee offer from TKK

My search for a job seems to have paid off. TKK is offering a summer trainee position in a research group that develops computer aided verification and testing techniques. I have five days do decide whether or not I'm accepting the position there.

The position was my first choice from the positions offered by TKK, but I'm going to see whether one certain company would like to hire me within five days - if their hiring machinery is even efficient enough for it to be possible.

Thursday, February 5, 2009

Nondeterminism in Haskell

In nondeterministic programming an expression may have more than one value. In SICP we are given the following logic puzzle with its Scheme solution:

"Baker, Cooper, Fletcher, Miller, and Smith live on different floors of an apartment house that contains only five floors. Baker does not live on the top floor. Cooper does not live on the bottom floor. Fletcher does not live on either the top or the bottom floor. Miller lives on a higher floor than does Cooper. Smith does not live on a floor adjacent to Fletcher's. Fletcher does not live on a floor adjacent to Cooper's. Where does everyone live?" (SICP, pp. 418)

A Haskell version of the (almost) same solution follows (Control.Monad and Data.List are needed):

type Name = String
type Floor = Int

adjacent :: Floor -> Floor -> Bool
adjacent a b = abs (a-b) == 1

distinct :: [Floor] -> Bool
distinct l = l == nub l

dinesman :: [[(Name,Floor)]]
dinesman = do
  baker <- [1..5]         
  cooper <- [1..5]         
  fletcher <- [1..5]
  miller <- [1..5]         
  smith <- [1..5]         
  guard (distinct [baker,cooper,fletcher,miller,smith])         
  guard (baker /= 5)         
  guard (cooper /= 1)         
  guard (not (fletcher == 1) || fletcher == 5))         
  guard (miller > cooper)
  guard (not (adjacent smith fletcher))
  guard (not (adjacent fletcher cooper))
  return [("Baker", baker),
          ("Cooper", cooper),
          ("Fletcher", fletcher),
          ("Miller", miller),
          ("Smith", smith)]

Nondeterminism allows us to omit writing the loops that try all possible execution paths. Nondeterministic computing can be imagined as if the computer simultaneously tried all the possible branches of a computation, where the computation branches every time there are multiple choices to proceed, and stops the branches that yield no result, and finally, returns all results from the succeeded branches.

Our nondeterministic process proceeds as follows. Initially we state that each person may live in any five possible floors. That makes 3125 different initial branches to take. The first restriction drops all branches where two or more persons would share the same room. That leaves 120 active branches. The next restriction drops all branches where Baker is living on the top floor. The next restriction drops all branches where Cooper is living on the first floor, etc. Finally what we have is one remaining branch that is the only solution to the problem.

To understand the code, let's take a look on Haskell's MonadPlus class. The class defines two operations:

class Monad m => MonadPlus m where
mzero :: m a
mplus :: m a -> m a -> m a

The operation mzero denotes failure while the operation mplus denotes choice or combination. Here are the semantics of the operations:

mzero >>= f = mzero
a `mplus` mzero = a
mzero `mplus` b = b
a `mplus` (b `mplus` c) = (a `mplus` b) `mplus` c

The first reads that if a failure is applied on some action, the result is still a failure. The second reads that if something other than a failure is added to a failure, the result is the non-failure thing. The third reads that if a failure is added to something other than a failure, the result is again the non-failure thing. The two previous definitions are thus analogous to logical OR. The last one reads that the mplus binary operation is associative.

It appears that lists are instances of MonadPlus, defined as below:

instance MonadPlus [] where
mzero = []
a `mplus` b = a ++ b

The previous means that with lists, the empty list represents a failure and appending represents combination.

Let's go back to the code. The guard operation which we use to make restrictions on branches is defined as follows:

guard :: MonadPlus m => Bool -> m ()
guard True = return ()
guard False = mzero

So guard is an operation that takes a boolean expression and returns a () value meaning "no information" if the boolean argument evaluates to true, and mzero if the boolean argument evaluates to false.

So when a single branch runs a guard operation in our code, at that point, we have some combination of the floor numbers for each person. If the predicate is true, the branch will continue normally. If the predicate is false, the guard yields a failure. From the definition of MonadPlus, we remember that a failure applied on anything yields a failure. It means that the first failing guard will fail the whole branch. Succeeding branches will return a list of name-floor pairs. The return value of the function is just a combination of the results of all succeeded branches.

Does it work?
>> dinesman
[[("Baker", 3), ("Cooper", 2), ("Fletcher", 4), ("Miller", 5), ("Smith", 1)]]

Saturday, January 31, 2009

On self study plans, again

Referring to my previous post, I have now, in the past few days, advanced deep into SICP's interpreter chapter (ch 4). Today I began reading the DIY Scheme interpreter tutorial previously mentioned. I managed to read the first five chapters, and I'm very satisfied so far. The tutorial does not try to teach elementary functional programming concepts, but maintains its scope by just showing how to write real software in Haskell.

Nowadays I'm rather busy writing my LLVM thesis and thus I have not yet got an opportunity to start reading RWH, a promise made a few posts ago.

Friday, January 30, 2009

Job needed

The job market is not the most active right now. However, I'm soon going to send some applications to certain promising places. Both University of Helsinki and Helsinki University of Technology are providing some research group and software developer summer trainee positions. I'm applying there but the number of available positions isn't plenty and I assume people are applying there en masse due to current economy.

Fortunately I'm still unfinished with my thesis so my current unempoyment is not harmful -- it's actually hastening my graduation. But I miss working as a software developer. Past four years in software business have been fun and I want to be there again, soon.

Thursday, January 29, 2009

An apt definition of functional programming

While reading Jukka Paakki's paper on attribute grammars I found out that it contains a pretty apt definition of functional programming that I have been learning over the past year. So here it is for the public's delight (OCR'd, so the possible typos might not be original):

"The functional style of programming can
be summarized with the following characteristics:

referential transparency
higher-order functions
lazy evaluation
pattern matching

Referential transparency is the term used
traditionally to express that the value of
an expression depends only on the values
of its subexpressions, and that the value
of each occurrence of an object is the
same, irrespective of the context. In other
words, side effects are ruled out. Functions
being higher-order means that they are
first-class values and can, for instance,
be stored in data structures and
returned as result of function calls. Lazy
evaluation is tantamount to "nonstrictness,"
meaning that the value of an object is not
computed until it is actually needed, and
thus the value of an argument may be
undefined when calling and executing the
function. Pattern matching is a general
term used to express the mechanism applied
when trying to make X and Y identical in an
equation X = Y where X and Y are expressions.
Pattern matching is restricted in functional
programming to introduce bindings only to
the left-hand side X when solving such an
equation."
(Paakki, J. 1995. Attribute grammar paradigms—a high-level methodology in language implementation. ACM Comput. Surv. 27, 2 (Jun. 1995), page 236.)

What could be added is the concept of recursion.

Saturday, January 24, 2009

Yaht is done

Referring to my previous post, I have now finished with Yet Another Haskell Tutorial. My intention was to do all of the exercises. I think I did over 90% of them which is good enough.

As mentioned earlier, my Haskell journey now continues with the Scheme interpreter tutorial. I'm also going to read through RWH as a side-side-project.

Friday, January 23, 2009

My favourite introduction to monads

It is said that the most difficult concept to grasp in Haskell is that of monads. I'm not trying to tell what monads are because it is already been done so many times and I'm not even expert enough to try that. Instead, I'm going to talk about my personal monad-learning experience.

A highly heterogenous set of monad tutorials have been written. Brent Yorgey recently posted a blog entry titled Abstraction, intuition, and the "monad tutorial fallacy" where he pointed out that the numerous metaphors given to monads, each of them purpoted to be _the_ methaphor that should make monads clear for everybody, are not helping but actually harming the learning process. The metaphors are actually hiding essential properties of the concept and thus making learning harder.

I didn't understand monads when I read that they are just like burritos either. I'm still uncomfortable with monads, but after reading several tutorials on the subject, I think I'm now starting to understand what they are useful for in pure, functional programming.

No single monad tutorial can be given the glory of having positively affected most my learning process. I think that after reading several examples of monad usage, I'm gradually grasping the idea of the abstraction they provide. The text that I feel is describing the subject in a very clear way, and that does not hide essential concepts, is Wadler's 1995 paper Monads for functional programming. The paper first describes several examples of tasks that are implemented unwieldy in a pure functional language without side effects. Then it proceeds to show how the same problems can be solved better using monads. The three monadic laws are given, as they should be, and in the end a parser implemented with monads is described.

My point is that people shouldn't be afraid of the original papers. Sometimes they really are useful in _learning_ the subject although occasionally they get lost in their excessively rigorous handling (for being educational for laymen) of the subject.

Thursday, January 22, 2009

Recursion with Y

The following is written in Literate Haskell style.

Let us first write a recursive Haskell function that tells if a natural number is even:

> isEven n = if (==) n 0 then True else (if isEven (pred n) then False else True)

From Lambda calculus, we know the Y combinator, which returns the fixed point function for any function f:

> y f = f (y f)

The definition above satisfies the fixed point definition of f(x) = x.

Now let us define the isEven function again now using Y:

> isEven' = y (\f n -> if (==) n 0 then True else (if f (pred n) then False else True))

Above, Y binds its function argument on f, which results in a recursive call:

y (\f n -> if (==) n 0 then True else (if f (pred n) then False else True)) = (\f n -> if (==) n 0 then True else (if f (pred n) then False else True)) (y (\f n -> if (==) n 0 then True else (if f (pred n) then False else True)))

Partial application results in:

(\n -> if (==) n 0 then True else (if (y (\f n -> if (==) n 0 then True else (if f (pred n) then False else True))) (pred n) then False else True))

Now let's ensure that the halting conditon for the recursion works. Let's pass it a zero:

(\n -> if (==) n 0 then True else (if (y (\f n -> if (==) n 0 then True else (if f (pred n) then False else True))) (pred n) then False else True)) 0 = if (==) 0 0 then True else (if (y (\f n -> if (==) n 0 then True else (if f (pred 0) then False else True))) (pred n) then False else True)

The first condition evaluates to True and thus the whole expression evaluates to True.

If we pass an argument larger than zero, call it $, we get:

if (y (\f n -> if (==) n 0 then True else (if f (pred n) then False else True))) (pred $) then False else True)

The first condition above is the recursive function itself, which returns True for zero. So we pass it the predecessor of n. If the predecessor of n is even, n itself must be odd. Otherwise n is even.

Now we can test that it works for some small values:

map (\n -> (n, isEven' n)) [0..5]
[(0,True),(1,False),(2,True),(3,False),(4,True),(5,False)]

Friday, January 9, 2009

Self study plans

Self studying is always fun. Currently, I'm learning Haskell by reading Yaht and doing all the exercises in it. Almost done. In paraller, I'm reading through SICP which Santa brought kindly to me. The idea is to kill two birds with one stone (that's a terrible, violent idiom!) by writing a Scheme interpreter in Haskell. There is a promising tutorial to do just that in Wikibooks. After Yaht is done and the interpreter chapter been read in SICP, I plan to start the interpreter project.

And the bad news: the taocp project is currently halted :(

Thursday, January 8, 2009

Saviour VCS

I have been writing my master's thesis for a few months now. I'm using git. Today it came to my rescue, really. I rm'd the wrong file, namely, the .tex file that contains my thesis as a whole. After a millisecond in shock, I recalled that I really am using a vcs. 1) man git-checkout 2) *magic* 3) back in business.