A different approach to web form design

Posted: March 2nd, 2010 | Author: Thijs Oppermann | Filed under: Design, Forms, Usability, Web Development | No Comments »

An interesting article about a different way to look at forms: “Mad Libs” Style Form Increases Conversion 25-40%.

The article’s conclusion seems to be that conversion increases considerably when the form is presented as a narrative. I don’t know whether the small test they ran really supports that, but I’m inclined to think they might be on to something.


NLTK’s dispersion_plot on Mac OS X

Posted: February 28th, 2010 | Author: Michel Rijnders | Filed under: Books, Mac, NLTK, Python | No Comments »

While reading “Natural Language Processing with Python” I ran into problems on my Mac with examples that were using the dispersion_plot function: calls to the function returned immediately without displaying anything.

Turns out matplotlib’s back-end wasn’t configured properly. To fix this I had to add an rc file (matplotlibrc) to my ~/.matplotlib directory. The rc file contains the following:

backend: TkAgg

And, hey presto:
[Screenshot, 2010-02-28: the plot window now appears]

(disclaimer: “Works on my machine!”)
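
If you would rather script the fix than create the file by hand, something like the following should do. It is only a minimal sketch: the helper name write_matplotlibrc is made up for this post, and it overwrites any matplotlibrc already present in the target directory.

```python
from pathlib import Path


def write_matplotlibrc(config_dir, backend="TkAgg"):
    """Write a one-line matplotlibrc selecting `backend` and return its path.

    Hypothetical helper, not part of matplotlib. An existing matplotlibrc
    in `config_dir` is overwritten.
    """
    config_dir = Path(config_dir).expanduser()
    config_dir.mkdir(parents=True, exist_ok=True)  # create the directory if missing
    rc_file = config_dir / "matplotlibrc"
    rc_file.write_text(f"backend: {backend}\n")
    return rc_file


# Example call, commented out so that running this file changes nothing:
# write_matplotlibrc("~/.matplotlib")
```

After running the call once, start a fresh Python session so that matplotlib reads the new rc file.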


A distributed index setup for Sphinx

Posted: February 9th, 2010 | Author: Thijs Oppermann | Filed under: Sphinx search | No Comments »

Sphinx search is a powerful search engine. Recently we released it (version 0.9.9-rc2) as the backend for most of the searches on one of our high-volume websites. This site has about 360,000 visitors a day, who generate about 4,500 search queries per minute for the Sphinx backend on average, peaking at nearly 9,000 per minute when the site gets busy. To handle that many requests we currently run Sphinx on four dedicated servers.

A problem with having more than one Sphinx server is that you need to make sure the results from the different servers are nearly identical. Two consecutive searches can be handled by different servers (on the site in question a search can also be a browsing action, for example moving from one page of results to the next), so it would be very confusing if the search results differed.

With Sphinx there are a number of ways to solve this problem. The most commonly used solutions are:

  • run the indexer on one server and make the resulting index files available to all the other servers (through scp, rsync, or a shared filesystem)
  • use a distributed index setup

The first should work, but is actually not recommended by the makers of Sphinx. We went for the second solution: a distributed index setup.


Simple ranked text search for MongoDB

Posted: February 8th, 2010 | Author: Ward Bekker | Filed under: Open Source Projects, Ruby, Software Engineering | No Comments »

In this code snippet you can see how to do a basic ranked text search for MongoDB. The code relies on two simple map reduce operations: one to create an inverted index from some demo text, and a second to score the matching documents based on query term hits.
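
To give an idea of the two steps without a database at hand, here is a rough in-memory sketch in plain Python. It is not the linked snippet and does not use MongoDB’s map reduce API; the function names, the whitespace tokenizer and the demo documents are all assumptions made for illustration.

```python
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it on whitespace."""
    return text.lower().split()


def build_inverted_index(documents):
    """Step 1: map every term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):      # "map": emit (term, doc_id)
            index[term].add(doc_id)      # "reduce": collect the ids per term
    return index


def ranked_search(index, query):
    """Step 2: score documents by the number of distinct query terms they contain."""
    scores = defaultdict(int)
    for term in set(tokenize(query)):
        for doc_id in index.get(term, ()):  # "map": emit (doc_id, 1)
            scores[doc_id] += 1             # "reduce": sum the hits
    # Highest score first; ties are ordered by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick brown dogs are rare",
}
index = build_inverted_index(docs)
print(ranked_search(index, "quick brown fox"))  # prints [(1, 3), (3, 2)]
```

Document 1 matches all three query terms and document 3 matches two, so they are returned in that order; document 2 matches none and is left out.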


MongoDB first impressions

Posted: February 8th, 2010 | Author: Ward Bekker | Filed under: Uncategorized | 4 Comments »

For a customer we have developed log analytics software. It currently uses MySQL as the database backend. The system reads in an hourly log file and calculates all kinds of fancy statistics. I wanted to see how the system would work if I used MongoDB, a schema-less document DB, instead of MySQL. My impressions, in no particular order:

  • Importing log data is much easier than with MySQL because MongoDB is schema-less. Just create a collection (=bucket) and insert every log line into it as a hash. For log files that don’t have a fixed number of fields, it’s a great fit.
  • As with MySQL, you do need to create indexes to make searching fast(er).
  • MongoDB supports map reduce operations. They made some of the calculations much more elegant and more readable than the code that was written for MySQL.
  • Chaining of map reduce operations is supported, and works as you would expect.
  • Queries are written in JavaScript. I’m happy that they didn’t invent yet another ‘scripting’ language. JavaScript looks capable enough.
  • Map reduce operations are not particularly fast. They are upgrading their JavaScript engine to V8 to improve the execution speed.
  • The MongoDB community is nowhere near the size of MySQL’s. Don’t expect a lot of Google results for a specific MongoDB issue. The moderated Google group is currently a better place to go.
  • I liked the API. Calls are not verbose and their intended use is easy to understand.
  • Although quite capable, MongoDB is still a young project. I need to spend more time with it before using it on a customer project.
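
To illustrate the first point, here is a small sketch of turning log lines with a varying number of fields into hashes. The key=value log format and the function name are invented for this example (our real log format is different), and the actual insert into a collection is left out.

```python
def log_line_to_document(line):
    """Turn 'key=value key=value ...' into a dict; the fields may differ per line."""
    document = {}
    for field in line.split():
        key, sep, value = field.partition("=")
        if sep:  # skip tokens that are not key=value pairs
            document[key] = value
    return document


lines = [
    "ts=2010-02-08T10:00 status=200 path=/index.html",
    "ts=2010-02-08T10:01 status=404 path=/missing referrer=example.org",
]
documents = [log_line_to_document(line) for line in lines]
for document in documents:
    print(document)
```

The second document simply has one more key than the first. In a schema-less store both can go into the same collection as they are, whereas a relational table would need an extra nullable column or a schema change.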

My Reading List for 2010

Posted: January 9th, 2010 | Author: Michel Rijnders | Filed under: Books, Programming Language Theory | No Comments »

One of the suggestions of “The Pragmatic Programmer” is that you should learn at least one new programming language every year. This is a great suggestion, but after a couple of years its usefulness diminishes, e.g. if one already knows Perl and Python, then the payback on learning Ruby is rather small. Therefore I’m going to concentrate on the foundations of programming languages this year. Here’s my tentative reading list:

Suggestions welcome.


Ruby Quiz, Haskell Solution: LCD Numbers

Posted: December 17th, 2009 | Author: Michel Rijnders | Filed under: Haskell, Ruby Quiz, Uncategorized | 2 Comments »

A solution to Ruby Quiz #14 in literate Haskell:

LCD Numbers
===========

Problem
-------

[original source](http://rubyquiz.com/quiz14.html)

This week's quiz is to write a program that displays LCD style numbers
at adjustable sizes.

The digits to be displayed will be passed as an argument to the
program. Size should be controlled with the command-line option -s
followed by a positive integer. The default value for -s is 2.

For example, if your program is called with:

    $ lcd.rb 012345

The correct display is:

     --        --   --        --
    |  |    |    |    | |  | |
    |  |    |    |    | |  | |
               --   --   --   --
    |  |    | |       |    |    |
    |  |    | |       |    |    |
     --        --   --        -- 

And for:

    $ lcd.rb -s 1 6789

Your program should print:

     -   -   -   -
    |     | | | | |
     -       -   -
    | |   | | |   |
     -       -   - 

Note the single column of space between digits in both examples. For
other values of -s, simply lengthen the - and | bars.

Solution
--------

Module declaration and imports:

> module Main where
>
> import Data.Char (digitToInt)
> import Data.List (intersperse)
> import System.Console.GetOpt
> import System.Environment (getArgs)

First we define the numbers at size 1:

> n0 = [ " - "
>      , "| |"
>      , "   "
>      , "| |"
>      , " - "
>      ]
>
> n1 = [ "   "
>      , "  |"
>      , "   "
>      , "  |"
>      , "   "
>      ]
>
> n2 = [ " - "
>      , "  |"
>      , " - "
>      , "|  "
>      , " - "
>      ]
>
> n3 = [ " - "
>      , "  |"
>      , " - "
>      , "  |"
>      , " - "
>      ]
>
> n4 = [ "   "
>      , "| |"
>      , " - "
>      , "  |"
>      , "   "
>      ]
>
> n5 = [ " - "
>      , "|  "
>      , " - "
>      , "  |"
>      , " - "
>      ]
>
> n6 = [ " - "
>      , "|  "
>      , " - "
>      , "| |"
>      , " - "
>      ]
>
> n7 = [ " - "
>      , "  |"
>      , "   "
>      , "  |"
>      , "   "
>      ]
>
> n8 = [ " - "
>      , "| |"
>      , " - "
>      , "| |"
>      , " - "
>      ]
>
> n9 = [ " - "
>      , "| |"
>      , " - "
>      , "  |"
>      , " - "
>      ]
>

Put the numbers in a list:

> numbers = [n0,n1,n2,n3,n4,n5,n6,n7,n8,n9]

Horizontal scaling function: given a three-character string, replicate
the second character n times:

> hscale n cs = head cs : replicate n (cs!!1) ++ [last cs]

Vertical scaling function, repeat the second and fourth row n times:

> vscale n css = head css : replicate n cs1 ++ [cs2] ++ replicate n cs3 ++ [cs4]
>   where cs1 = css !! 1
>         cs2 = css !! 2
>         cs3 = css !! 3
>         cs4 = last css

Scale function; note this function scales a single number:

> scale n = vscale n . map (hscale n)

Function that converts a list of numbers to a string of LCD numbers:

> lcd n = concat .
>         intersperse "\n" .
>         foldr1 (zipWith (++)) .
>         intersperse (replicate (3 + 2*n) " ") .
>         map (scale n . (numbers !!))

`main` function:

> main = do
>   args <- getArgs
>   let (n, digits) = parseArgs args
>   putStrLn $ lcd n $ map digitToInt digits

Command-line argument parsing:

> data Flag = Scale Int
>             deriving Eq
>
> options = [Option "s" [] (ReqArg (Scale . read) "") ""]
>
> parseArgs args =
>   case parse args of
>    (_, [], _)              -> error "Usage: lcd [-s n] digits"
>    ([], digits, [])        -> (2, head digits)
>    ([Scale n], digits, []) -> (n, head digits)
>    (_, _, _)               -> error "Usage: lcd [-s n] digits"
>   where
>     parse = getOpt RequireOrder options

Compiling Apache from source on Ubuntu 9.10 (Karmic Koala)

Posted: November 11th, 2009 | Author: Henry Snoek | Filed under: Open Source Projects | 1 Comment »

Last week my OS got upgraded to Ubuntu 9.10. After that I wanted to compile Apache from source. Unfortunately I got this build error:

htpasswd.c:101: error: conflicting types for ‘getline’
/usr/include/stdio.h:651: note: previous declaration of ‘getline’ was here
make[2]: *** [htpasswd.o] Error 1

I fixed this by replacing getline with parseline on line 651 of /usr/include/stdio.h.

Kudos to HowtoForge for pointing this out.
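
Changing a system header affects everything else compiled on the machine. The error message shows that htpasswd.c declares a getline of its own, so an alternative is to rename that function in the Apache source instead. Below is a minimal sketch of such a rename; the helper names are made up, and the sample text only imitates what the declaration and a call might look like, it is not copied from the Apache source.

```python
import re
from pathlib import Path


def rename_identifier(source, old="getline", new="parseline"):
    """Replace whole-word occurrences of `old`; return (new_text, count)."""
    pattern = r"\b" + re.escape(old) + r"\b"
    return re.subn(pattern, new, source)


def patch_file(path, old="getline", new="parseline"):
    """Rewrite the file at `path` in place; return the number of replacements."""
    path = Path(path)
    patched, count = rename_identifier(path.read_text(), old, new)
    if count:
        path.write_text(patched)
    return count


# Illustrative input only, not the real contents of htpasswd.c.
sample = (
    "static int getline(char *s, int n, FILE *f);\n"
    "while (!(getline(line, MAX, f))) { }\n"
    "/* my_getline_helper is left alone */\n"
)
patched, count = rename_identifier(sample)
print(count)  # prints 2
```

The word boundaries make sure that longer identifiers which merely contain getline are not touched. To use it for real you would call patch_file on your copy of htpasswd.c and then run make again.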


The Myth of the Page Fold

Posted: November 9th, 2009 | Author: Michel Rijnders | Filed under: Web Development | No Comments »

Nice article dispelling the myth of the page fold being an impenetrable barrier for users.

Update: page fold: myth or reality?



Slides Haskell Workshop

Posted: November 8th, 2009 | Author: Michel Rijnders | Filed under: Haskell | 1 Comment »

Haskell Workshop

Here are the slides for the workshop on Haskell and functional programming I gave yesterday at Devnology’s Community Day.