NLTK’s dispersion_plot on Mac OS X

Posted: February 28th, 2010 | Author: Michel Rijnders | Filed under: Books, Mac, NLTK, Python | No Comments »

While reading “Natural Language Processing with Python” I ran into problems on my Mac with examples that were using the dispersion_plot function: calls to the function returned immediately without displaying anything.

Turns out matplotlib’s back-end wasn’t configured properly. To fix this I had to add a rc file (matplotlibrc) to my ~/.matplotlib directory. The rc file contains the following:

backend: TkAgg

And, hey presto:
Screen shot 2010-02-28 at 12.26.21 PM

(disclaimer: “Works on my machine!”)


My Reading List for 2010

Posted: January 9th, 2010 | Author: Michel Rijnders | Filed under: Books, Programming Language Theory | No Comments »

One of the suggestions of “The Pragmatic Programmer” is that you should learn at least one new programming language every year. This is a great suggestion, but after a couple of years its usefulness diminishes, e.g. if one already knows Perl and Python, then the payback on learning Ruby is rather small. Therefore I’m going to concentrate on the foundations of programming languages this year. Here’s my tentative reading list:

Suggestions welcome.


Ruby Quiz, Haskell Solution: LCD Numbers

Posted: December 17th, 2009 | Author: Michel Rijnders | Filed under: Haskell, Ruby Quiz, Uncategorized | 2 Comments »

A solution to Ruby Quiz #14 in literate Haskell:

LCD Numbers
===========

Problem
-------

[original source](http://rubyquiz.com/quiz14.html)

This week's quiz is to write a program that displays LCD style numbers
at adjustable sizes.

The digits to be displayed will be passed as an argument to the
program. Size should be controlled with the command-line option -s
follow up by a positive integer. The default value for -s is 2.

For example, if your program is called with:

    $ lcd.rb 012345

The correct display is:

     --        --   --        --
    |  |    |    |    | |  | |
    |  |    |    |    | |  | |
               --   --   --   --
    |  |    | |       |    |    |
    |  |    | |       |    |    |
     --        --   --        -- 

And for:

    $ lcd.rb -s 1 6789

Your program should print:

     -   -   -   -
    |     | | | | |
     -       -   -
    | |   | | |   |
     -       -   - 

Note the single column of space between digits in both examples. For
other values of -s, simply lengthen the - and | bars.

Solution
--------

Module declaration and imports:

> module Main where
>
> import Data.Char (digitToInt)
> import Data.List (intersperse)
> import System.Console.GetOpt
> import System.Environment (getArgs)

First we define the numbers at size 1:

> n0 = [ " - "
>      , "| |"
>      , "   "
>      , "| |"
>      , " - "
>      ]
>
> n1 = [ "   "
>      , "  |"
>      , "   "
>      , "  |"
>      , "   "
>      ]
>
> n2 = [ " - "
>      , "  |"
>      , " - "
>      , "|  "
>      , " - "
>      ]
>
> n3 = [ " - "
>      , "  |"
>      , " - "
>      , "  |"
>      , " - "
>      ]
>
> n4 = [ "   "
>      , "| |"
>      , " - "
>      , "  |"
>      , "   "
>      ]
>
> n5 = [ " - "
>      , "|  "
>      , " - "
>      , "  |"
>      , " - "
>      ]
>
> n6 = [ " - "
>      , "|  "
>      , " - "
>      , "| |"
>      , " - "
>      ]
>
> n7 = [ " - "
>      , "  |"
>      , "   "
>      , "  |"
>      , "   "
>      ]
>
> n8 = [ " - "
>      , "| |"
>      , " - "
>      , "| |"
>      , " - "
>      ]
>
> n9 = [ " - "
>      , "| |"
>      , " - "
>      , "  |"
>      , " - "
>      ]
>

Put the numbers in  a list:

> numbers = [n0,n1,n2,n3,n4,n5,n6,n7,n8,n9]

Horizontal scaling function, given a string replicate the second
character n times:

> hscale n cs = head cs : replicate n (cs!!1) ++ [last cs]

Vertical scaling function, repeat the second and fourth row n times:

> vscale n css = head css : replicate n cs1 ++ [cs2] ++ replicate n cs3 ++ [cs4]
>   where cs1 = css !! 1
>         cs2 = css !! 2
>         cs3 = css !! 3
>         cs4 = last css

Scale function; note this function scales a single number:

> scale n = vscale n . map (hscale n)

Function that converts a list of numbers to a string of LCD numbers:

> lcd n = concat .
>         intersperse "\n" .
>         foldr1 (zipWith (++)) .
>         intersperse (replicate (3 + 2*n) " ") .
>         map (scale n . (numbers !!))

`main` function:

> main = do
>   args <- getArgs
>   let (n, digits) = parseArgs args
>   putStrLn $ lcd n $ map digitToInt digits

Command-line argument parsing:

> data Flag = Scale Int
>             deriving Eq
>
> options = [Option "s" [] (ReqArg (Scale . read) "") ""]
>
> parseArgs args =
>   case parse args of
>    (_, [], _)              -> error "Usage: lcd [-s n] digits"
>    ([], digits, [])        -> (2, head digits)
>    ([Scale n], digits, []) -> (n, head digits)
>    (_, _, _)               -> error "Usage: lcd [-s n] digits"
>   where
>     parse = getOpt RequireOrder options

The Myth of the Page Fold

Posted: November 9th, 2009 | Author: Michel Rijnders | Filed under: Web Development | No Comments »

Nice article dispelling the myth of the page fold being a impenetrable barrier for users.

Update: page fold: myth or reality?

The story continues: Scrolling and Attention


Slides Haskell Workshop

Posted: November 8th, 2009 | Author: Michel Rijnders | Filed under: Haskell | Tags: , , | 1 Comment »

Haskell Workshop

The slides for the workshop on Haskell and functional programming I gave yesterday at Devnology’s Community Day.


Ruby Quiz, Haskell Solution: Sampling

Posted: September 27th, 2009 | Author: Michel Rijnders | Filed under: Haskell, Ruby Quiz | No Comments »

The Quiz

A classic sampling problem: write a program sample which takes two integers n and m as input. n is the size of the sample. m is the size of the population. The program should print out n random unique indices. Two example runs:

$ ./sample 3 10
0
2
8
$ ./sample 3 10
1
2
9

The output must be sorted. The complete, original quiz is here.

A Haskell Solution

Take One

My first (naïve) attempt uses a list of integers to represent the pool still available (i.e. the population not sampled yet). When it has to draw a sample it takes a random number i between 0 and the length of the list and removes the element at index i from the list, thus guaranteeing the uniqueness of the generated indices. It works correctly but it runs out of memory for the "big sample" (n= 5,000,000 and m = 1,000,000,000) mentioned in the original quiz, not very suprising since it keeps both the current samples as well as the pool still availabe in memory. It is also quite slow because of the use of a plain list.

module Main where

import Control.Monad.State
import Data.List (delete, sort)
import System (getArgs)
import System.Random

main :: IO ()
main = do
  args <- getArgs
  let n = read (args !! 0) ::Int
      m = read (args !! 1) :: Int
  gen <- getStdGen
  let init = RandomPool [0..m] gen
      result = evalState (sample n) init
  mapM_ print (sort result)

data RandomPool = RandomPool { pool :: [Int], gen :: StdGen }

type StateRP = State RandomPool

sample :: Int -> StateRP [Int]
sample 0 = return []
sample n = do
  st <- get
  let hi = length (pool st) - 1
      (i, gen') = randomR (0, hi) (gen st)
      x = pool st !!i
      pool' = delete x (pool st)
  put RandomPool { pool = pool', gen = gen' }
  xs <- sample (n - 1)
  return (x:xs)

Take Two

My second attempt solves the memory problem by keeping only the current samples in memory. When it has to draw a sample it takes a random number x between 0 and m and checks if that number has already been used. If the number has been used it tries agian. This solution also uses the Data.Set module for increased performance.

module Main where

import Control.Monad.State
import Data.List (sort)
import Data.Set as S
import System (getArgs)
import System.Random

main :: IO ()
main = do
  args <- getArgs
  let n = read (args !! 0) ::Int
      m = read (args !! 1) :: Int
  gen <- getStdGen
  let init = RandomSet S.empty gen
      result = evalState (sample m n) init
  mapM_ print (sort result)

data RandomSet = RandomSet { set :: S.Set Int , gen :: StdGen }

type StateRS = State RandomSet

sample :: Int -> Int -> StateRS [Int]
sample hi n =
  if n == 0
    then do st <- get
            return (toList (set st))
    else do draw hi
            sample hi (n - 1)

draw :: Int -> StateRS ()
draw hi = do
  st <- get
  let (x, gen') = randomR (0, hi - 1) (gen st)
  put st { gen = gen' }
  if x `S.member` set st
     then draw hi
     else do
       put st { set = insert x (set st) }
       return ()

Here's an example run for the big sample. Note that I have to increase the maximum stack size for individual threads (+RTS -K250m) to prevent a stack space overflow:

$ time ./sample 5000000 1000000000 +RTS -K250m > big_sample.txt 

real    23m24.355s
user    23m1.658s
sys     0m9.548s
$ ls -l big_sample.txt
-rw-r--r--  1 mies  staff  49483467 Sep 27 17:13 big_sample.txt
$ head big_sample.txt
243
280
416
494
556
602
804
909
970
1126
$ tail big_sample.txt
999998483
999998863
999999002
999999028
999999052
999999053
999999115
999999291
999999853
999999870

The code plus solutions to other quizes is available on GitHub.


Ruby Quiz, Haskell Solution: Maximum Sub-Array

Posted: August 30th, 2009 | Author: Michel Rijnders | Filed under: Haskell, Ruby Quiz | No Comments »

The Quiz

Given an array of integers, find the sub-array with maximum sum. (The complete, original quiz is here.)

A Haskell Solution

module Main where

import Data.List (inits, maximumBy, tails)
import System (getArgs)

maxSubArray :: [Int] -> [Int]
maxSubArray =
  maximumBy (\ x y -> compare (sum x) (sum y)) . concatMap inits . tails

main :: IO ()
main = do
  args <- getArgs
  print (maxSubArray (read (head args) :: [Int]))

Ruby Quiz, Haskell Solution: Happy Numbers

Posted: August 24th, 2009 | Author: Michel Rijnders | Filed under: Haskell, Ruby Quiz | No Comments »

The Quiz

Write a program that tells whether a given integer is happy. A happy number is found using the following process: Take the sum of the squares of its digits, and continue iterating this process until it yields 1, or produces an infinite loop. (The complete, original quiz is here.)

A Haskell Solution

module Main where

import System (getArgs)

digits :: Int -> [Int]
digits = map (\c -> read [c] :: Int) . show

happy :: Int -> Bool
happy n = happy' n []
  where
    s = sum . map (\x -> x * x) . digits
    happy' n ns
      | s n == 1      = True
      | s n `elem` ns = False
      | otherwise     = happy' (s n) (n : ns)

main :: IO ()
main = do
  args <- getArgs
  if happy (read (head args) :: Int)
    then putStrLn ":-)"
    else putStrLn ":-("
  return ()

Cake’s Progress

Posted: August 24th, 2009 | Author: Michel Rijnders | Filed under: Cake, Open Source Projects | No Comments »

Cake is progressing rather slowly at the moment. Its next release (0.2) is about receiving requests, and thus will involve lots of networking code. Since I know very little about network programming I am first working my way through "UNIX Network Programming", Volume 1.

In the meantime, to still do some Haskell coding, I’m trying to solve some Perl and Ruby quizzes in Haskell. I’ll post my solutions plus explanations here.


Perl Quiz, Haskell Solution: Plusified Equations

Posted: August 19th, 2009 | Author: Michel Rijnders | Filed under: Haskell, Perl Quiz of the Week | 1 Comment »

The Quiz

We are given two positive integers $L and $R we need to find Plusified
expressions of both for which Eval($E_L) == Eval($E_R). So what is a plusified
expression? It is an expression where we can choose whether to add a single
"+" between any consecutive digit. So for example the number 123 has the
following plusified expression:

  • 123
  • 12+3
  • 1+23
  • 1+2+3

So if we are given 123 and 96 we can form the following plusified equation:

  • 12+3 == 9+6

Your mission is to write a Perl program (or an equivalent program in any
programming language) that will find all solutions to the plusified equation
of two numbers given as input. To normalise the output we’ll rule that:

  1. The equations should be given one at each line.
  2. They will be sorted so consecutive digits will take precedence over "+"'s.
  3. A "+" has no surrounding spaces.
  4. The = sign does have a preceding and following space.

A Haskell Solution

Listing of the complete solution:

module Main where

import Data.List (iterate)
import System (getArgs)

split :: Char -> String -> [String]
split delim s =
  let (s', s'') = break (== delim) s
  in s' : case s'' of
        delim : _ -> split delim (tail s'')
            otherwise -> []

plusify :: String -> [String]
plusify ""       = [""]
plusify (c : "") = [c : ""]
plusify (c : cs) = concatMap (\cs' -> [c : cs', c : '+' : cs']) (plusify cs)

combis :: String -> String -> [(String, String)]
combis x y =
  [(l, r) | l <- plusify x, r <- plusify y, sum' l == sum' r]
    where sum' = sum . map (\s -> read s :: Int) . split '+'

main :: IO ()
main = do
  args <- getArgs
  mapM_ (putStrLn . \(l, r) -> l ++ " = " ++ r) (combis (args!!0) (args!!1))
  return ()