Ruby Quiz, Haskell Solution: Sampling

Posted: September 27th, 2009 | Author: Michel Rijnders | Filed under: Haskell, Ruby Quiz | No Comments »

The Quiz

A classic sampling problem: write a program sample which takes two integers n and m as input. n is the size of the sample. m is the size of the population. The program should print out n random unique indices. Two example runs:

$ ./sample 3 10
0
2
8
$ ./sample 3 10
1
2
9

The output must be sorted. The complete, original quiz is here.

A Haskell Solution

Take One

My first (naïve) attempt uses a list of integers to represent the pool still available (i.e. the population not sampled yet). When it has to draw a sample it takes a random number i between 0 and the length of the list and removes the element at index i from the list, thus guaranteeing the uniqueness of the generated indices. It works correctly but it runs out of memory for the "big sample" (n= 5,000,000 and m = 1,000,000,000) mentioned in the original quiz, not very suprising since it keeps both the current samples as well as the pool still availabe in memory. It is also quite slow because of the use of a plain list.

module Main where

import Control.Monad.State
import Data.List (delete, sort)
import System (getArgs)
import System.Random

main :: IO ()
main = do
  args <- getArgs
  let n = read (args !! 0) ::Int
      m = read (args !! 1) :: Int
  gen <- getStdGen
  let init = RandomPool [0..m] gen
      result = evalState (sample n) init
  mapM_ print (sort result)

data RandomPool = RandomPool { pool :: [Int], gen :: StdGen }

type StateRP = State RandomPool

sample :: Int -> StateRP [Int]
sample 0 = return []
sample n = do
  st <- get
  let hi = length (pool st) - 1
      (i, gen') = randomR (0, hi) (gen st)
      x = pool st !!i
      pool' = delete x (pool st)
  put RandomPool { pool = pool', gen = gen' }
  xs <- sample (n - 1)
  return (x:xs)

Take Two

My second attempt solves the memory problem by keeping only the current samples in memory. When it has to draw a sample it takes a random number x between 0 and m and checks if that number has already been used. If the number has been used it tries agian. This solution also uses the Data.Set module for increased performance.

module Main where

import Control.Monad.State
import Data.List (sort)
import Data.Set as S
import System (getArgs)
import System.Random

main :: IO ()
main = do
  args <- getArgs
  let n = read (args !! 0) ::Int
      m = read (args !! 1) :: Int
  gen <- getStdGen
  let init = RandomSet S.empty gen
      result = evalState (sample m n) init
  mapM_ print (sort result)

data RandomSet = RandomSet { set :: S.Set Int , gen :: StdGen }

type StateRS = State RandomSet

sample :: Int -> Int -> StateRS [Int]
sample hi n =
  if n == 0
    then do st <- get
            return (toList (set st))
    else do draw hi
            sample hi (n - 1)

draw :: Int -> StateRS ()
draw hi = do
  st <- get
  let (x, gen') = randomR (0, hi - 1) (gen st)
  put st { gen = gen' }
  if x `S.member` set st
     then draw hi
     else do
       put st { set = insert x (set st) }
       return ()

Here's an example run for the big sample. Note that I have to increase the maximum stack size for individual threads (+RTS -K250m) to prevent a stack space overflow:

$ time ./sample 5000000 1000000000 +RTS -K250m > big_sample.txt 

real    23m24.355s
user    23m1.658s
sys     0m9.548s
$ ls -l big_sample.txt
-rw-r--r--  1 mies  staff  49483467 Sep 27 17:13 big_sample.txt
$ head big_sample.txt
243
280
416
494
556
602
804
909
970
1126
$ tail big_sample.txt
999998483
999998863
999999002
999999028
999999052
999999053
999999115
999999291
999999853
999999870

The code plus solutions to other quizes is available on GitHub.


100% completeness-fu

Posted: September 23rd, 2009 | Author: Josh Kalderimis | Filed under: Open Source Projects, Rails, Ruby, Web Development | Tags: , , | 4 Comments »

The age of completeness-fu is upon us!

Sometimes validations just don’t cut the mustard and all you want to do is to grade an instance based on how complete its information is. For example, a Location has a title and a description but no address, thus its only 60% complete. Or maybe title is worth more than description and address so its 80% complete. Whatever the case, this is not a new problem and recreating the wheel is a bit unnecessary, so welcome to completeness-fu.

The dsl is based on the thinking-sphinx configuration, which is nice, clean and simple, but very effective.

Here is a sample of the config code used to define a set of checks for a completeness score:

define_completeness_scoring do
  check :title,       lambda { |per| per.title.present? },  :high
  check :description, lambda { |per| per.description.present? }, :medium
  check :main_image,  lambda { |per| per.main_image? },     :low
end

It still needs some more tlc, but its a nice start and a simple solution for a common problem.

So please, have a play around with it, fork the code, make some improvements/enhancements and let me know what you think.


Using Nginx + Passenger as your development environment

Posted: September 17th, 2009 | Author: Josh Kalderimis | Filed under: Mac, Rails, Ruby, Web Development | Tags: , , , | 3 Comments »

As a rails developer you are blessed early on with the fantastic script/server for starting a local development server. Rails is smart enough that it will even suggest you install the Mongrel gem as it is a faster alternative to the basic stock standard WEBrick. But as time passes and your skills improve and the amount of projects you are working on increases, you may find yourself looking for a simpler solution than having to start up an individual script/server on different ports for each project. Or you may just want to have an app run in the background waiting for you to access one of the sites and start it up automatically. What ever the case, there are some very nice solutions available.

99% percent of people who have deployed a rails app have undoubtedly come across Passenger (modrails)  from the fantastic guys at Phusion. Simple put, this allows you to config and run your rails app with (initially) Apache or (as of lately) Nginx while also taking advantage of their supplier static assets serving capabilities.

So why would you choose Nginx over Apache in you development environment? For me the reasons for using Nginx was simple and quick configuration, very very very low memory usage, and it mimics my deployment server setup.

So waffle aside, how do we install and setup Nginx, including for you development environment

installing nginx and passenger

Passenger is nice enough to offer to install Nginx for you automagically, including downloading Nginx 0.7.61, but this is already an old version, so the plan is to :

  1. download the latest stable version of nginx
  2. extract to /usr/local/src/nginx
  3. install the latest version of passenger via gem
  4. have passenger configure, compile and install nginx for us
  5. tweak the nginx configs
  6. putting it all together

so lets get started….

1. and 2. download the latest stable version of nginx and extract it

cd /usr/local
sudo mkdir src
cd src
wget http://sysoev.ru/nginx/nginx-0.7.62.tar.gz (latest verion at time of writing)
tar -zxvf nginx-0.7.62.tar.gz

3. and 4. install passenger via gem and configure, compile and install nginx

sudo gem install passenger (or sudo gem update passenger if already installed)
sudo passenger-install-nginx-module

during the installer program enter the following information

When asked: ‘Where is your Nginx source code located?’
answer /usr/local/src/nginx-0.7.62
When asked: ‘Where do you want to install Nginx to?’
answer /usr/local/nginx
When asked about: ‘Extra arguments to pass to configure script:’
answer --with-http_ssl_module
When asked to ‘Confirm configure flags’
answer yes

Ok, now nginx and passenger are installed with ssl support baked in, what to do from here…

5. tweak the nginx configs

The default nginx config file is pretty basic, which is excellent, because its all you really need, but a few tweaks here and there can make a great thing even better. Slicehost has an excellent write up on some recommended changes here, but as this is your development environment and not production, I suggest not changing worker_processes or keepalive_timeout.

In the end, this is what my server nginx.conf looks like:

worker_processes  1;

events {
    worker_connections  1024;
}

http {
    passenger_root /opt/local/lib/ruby/gems/1.8/gems/passenger-2.2.5;
    passenger_ruby /opt/local/bin/ruby;

    include          mime.types;
    default_type  application/octet-stream;

    sendfile           on;
    tcp_nopush     on;
    tcp_nodelay     off;

    keepalive_timeout  65;

    gzip              on;
    gzip_comp_level   2;
    gzip_proxied      any;
    gzip_types        text/plain text/css application/x-javascript text/xml application/xml application/xml+rss text/javascript;

    # All the virtual hosts exist here
    include /usr/local/nginx/sites-enabled/*;
}

(I have taken out all the commented out lines)

You will notice the passenger_root and passenger_ruby properties/directives in the conf file. These are required for passenger to start, but you don’t need to worry about inserting them as the passenger nginx installer does it for you.

This is what my virtual server file looks like:

server {
    listen       80;
    server_name  lotsoffunstuff.local;
   
    root /Users/me/Development/ruby-workspace/pet-projects/lotsoffunstuff.com/public;
   
    passenger_enabled on;
    rails_env development;
}

And thats it, an nginx server + one virtual host all ready to run via:

sudo /usr/local/nginx/sbin/nginx

I also added an alias to me ~/.profile file

alias nginx='sudo /usr/local/nginx/sbin/nginx'
alias stopnginx='sudo /usr/local/nginx/sbin/nginx -s stop'

now you can just use nginx and stopnginx.

6. putting it all together

ok, the title is a little deceptive as it all seems to be together, but there is one very important change yet to be made, making sure passenger can reach and read your source code.

I ran into this problem when I was setting up my environment, and it all has to do with how passenger and nginx works. As per any good webserver, you need to start it as root so it can access the right ports (80) and directories (pid files), but its worker processes should run as nobody or www-data to restrict unneeded access to other resources. For Nginx to know if the server is a rails app or not, its worker process needs to be able to access the document root, in this case public, and every single one of its parent directories. As I keep my development files within my home directory, nginx would throw a 403 error and add a non-descriptive error message to the error.log file.

Two options are available to fix this:

  1. add read access to all the parent directories to everyone (chmod o+r -R .)(I think)
  2. have the nginx worker processes run as a privileged user which can access all the directories in the path

I choose option two and had the nginx worker processes run as myself. Although you could argue this is insecure, as I only have the server running when I need to, and nginx and passenger have a great track record, I think this is better than setting my home directory to read for everyone.

And there we are, all set up and ready to develop! And all in under 30 mins!

some good links and tips

important for deployment : rails maintenance pages done right
init script for ubuntu : nginx-init-ubuntu
1.9 + nginx + passenger : ruby-rails-nginx-passenger
excellent config details : ubuntu-intrepid-nginx-configuration
nginx + vhosts : ubuntu-intrepid-nginx-virtual-hosts
docs galore : http://nginx.net/ and http://wiki.nginx.net/

special mention to slicehost for all the fantastic server and service related articles known to man


Perl Quiz, Perl Solution: Plusified Equations

Posted: September 11th, 2009 | Author: Eduard Lohmann | Filed under: Perl, Perl Quiz of the Week | No Comments »

For problem definition see: The Haskell solution.

Because this was a test for an applicant, I felt I should solve it too in order to get a good idea of how difficult it is.

My approach was so generate the plussified expressions from smallest to largest sum. That way I can discard any expressions from one list that have a smaller sum then the least sum of the other list.

The Algorithm

  1. Generate a list of plussified expressions for each of the integers, sorted smallest to largest sum.
  2. Compare the sums of the first elements of both lists. If one is less then the other remove it and repeat.
  3. If the sums are equal you have found an answer.

    You could now drop the first expression of each list and continue,
    but you would miss the answers where you link the current first element of one list with
    expressions in the other list that are not first, but have the same sum.
    Therefore I also explicitly look for those answers with a recursive call.

Questions welcome at eduard@tty.nl .

#!/usr/bin/env perl
use strict;
use warnings;

main( @ARGV );

sub main {
    /^\d+$/ or die("'$_' is not a valid string of digits.") for @_;
    2 == @_ or die("Supply exactly 2 strings of digits seperated by a space");

    my @answers = find_answers( [ plusify(shift) ], [ plusify(shift) ] );
    print join("\n", @answers ), "\nFound " . scalar(@answers) . " answers.\n";

    return;
}

sub plusify  {
    my ($num) = @_;
    my ($first_digit, $rest) = $num =~ m/(\d)(\d+)?/;
    return ($first_digit) if not defined $rest;
    my @ways = map {  $first_digit . $_ , $first_digit . '+' . $_ } plusify($rest);

    return sort { sum_expression($a) <=> sum_expression($b) } @ways;
}


sub sum_expression  {
    my ($expression) = @_;
    my $sum = 0;
    $sum += $_ for split(qr{\+}, $expression);

    return $sum;
}

sub find_answers  {
    my ( $expressions_1, $expressions_2 ) = @_;
    my @expressions_1 = @{ $expressions_1 };
    my @expressions_2 = @{ $expressions_2 };
    my @answers = ();
    while ( @expressions_1 and @expressions_2 ) {
        my $head_1 = shift(@expressions_1);
        my $head_2 = shift(@expressions_2);
        my $sum_1 = sum_expression($head_1);
        my $sum_2 = sum_expression($head_2);
        if ( $sum_1 == $sum_2 ) {
            push(@answers,"$head_1 = $sum_1 = $head_2" ) ;
            push(@answers, find_answers( [ $head_1 ],  [ @expressions_2 ] ) );
            push(@answers, find_answers( [ @expressions_1 ], [ $head_2 ] ) );
        };

        # Effectively only shift the lowest of the heads
        unshift(@expressions_2, $head_2) if $sum_1 < $sum_2;
        unshift(@expressions_1, $head_1) if $sum_2 < $sum_1;
    };

    return @answers;
}

1;

Ruby Quiz, Haskell Solution: Maximum Sub-Array

Posted: August 30th, 2009 | Author: Michel Rijnders | Filed under: Haskell, Ruby Quiz | No Comments »

The Quiz

Given an array of integers, find the sub-array with maximum sum. (The complete, original quiz is here.)

A Haskell Solution

module Main where

import Data.List (inits, maximumBy, tails)
import System (getArgs)

maxSubArray :: [Int] -> [Int]
maxSubArray =
  maximumBy (\ x y -> compare (sum x) (sum y)) . concatMap inits . tails

main :: IO ()
main = do
  args <- getArgs
  print (maxSubArray (read (head args) :: [Int]))

5 Must-have iPhone libraries

Posted: August 28th, 2009 | Author: Ward Bekker | Filed under: iPhone development | 1 Comment »

A list of  5 must-have iPhone libraries that we use at tty.nl for our iPhone dev needs:

#1 ASIHTTPRequest

The network API of Apple’s Cocoa Touch framework is not a pretty sight. It’s verbose and hard to put into use. The ASIHTTPRequest library by All-Seeing Interactive is high quality wrapper around the CFNetwork API that makes http communication trivial to implement. Love it!

#2 Json-framework

REST & JSON are here to stay. (Bye bye SOAP, won’t miss you!). The json-framework library enables your killer iPhone app to parse & generate strict JSON.

#3 Touchcode

Cocoa’s NSXML API is much too complex for the 95% of the use cases. For that 95% you should just use touchcode. It’s based on the Open Source libxml2 and supports most XPATH queries.

#4 Pinch analytics

Pinch analytics is best described as Google Analytics for your iPhone app. It allows you to track the amount of unique users, the time spent per user, device types and much more. And just like Google’s package it’s totally free.

#5 RegexKitLite

I was shocked to discover the Cocoa Touch framework doesn’t support Regular Expressions out of the box. No worries though: RegexKitLite allows you to continue to use your carefully crafted regular expressions.


Ruby Quiz, Haskell Solution: Happy Numbers

Posted: August 24th, 2009 | Author: Michel Rijnders | Filed under: Haskell, Ruby Quiz | No Comments »

The Quiz

Write a program that tells whether a given integer is happy. A happy number is found using the following process: Take the sum of the squares of its digits, and continue iterating this process until it yields 1, or produces an infinite loop. (The complete, original quiz is here.)

A Haskell Solution

module Main where

import System (getArgs)

digits :: Int -> [Int]
digits = map (\c -> read [c] :: Int) . show

happy :: Int -> Bool
happy n = happy' n []
  where
    s = sum . map (\x -> x * x) . digits
    happy' n ns
      | s n == 1      = True
      | s n `elem` ns = False
      | otherwise     = happy' (s n) (n : ns)

main :: IO ()
main = do
  args <- getArgs
  if happy (read (head args) :: Int)
    then putStrLn ":-)"
    else putStrLn ":-("
  return ()

Cake’s Progress

Posted: August 24th, 2009 | Author: Michel Rijnders | Filed under: Cake, Open Source Projects | No Comments »

Cake is progressing rather slowly at the moment. Its next release (0.2) is about receiving requests, and thus will involve lots of networking code. Since I know very little about network programming I am first working my way through "UNIX Network Programming", Volume 1.

In the meantime, to still do some Haskell coding, I’m trying to solve some Perl and Ruby quizzes in Haskell. I’ll post my solutions plus explanations here.


Perl Quiz, Haskell Solution: Plusified Equations

Posted: August 19th, 2009 | Author: Michel Rijnders | Filed under: Haskell, Perl Quiz of the Week | 1 Comment »

The Quiz

We are given two positive integers $L and $R we need to find Plusified
expressions of both for which Eval($E_L) == Eval($E_R). So what is a plusified
expression? It is an expression where we can choose whether to add a single
"+" between any consecutive digit. So for example the number 123 has the
following plusified expression:

  • 123
  • 12+3
  • 1+23
  • 1+2+3

So if we are given 123 and 96 we can form the following plusified equation:

  • 12+3 == 9+6

Your mission is to write a Perl program (or an equivalent program in any
programming language) that will find all solutions to the plusified equation
of two numbers given as input. To normalise the output we’ll rule that:

  1. The equations should be given one at each line.
  2. They will be sorted so consecutive digits will take precedence over "+"'s.
  3. A "+" has no surrounding spaces.
  4. The = sign does have a preceding and following space.

A Haskell Solution

Listing of the complete solution:

module Main where

import Data.List (iterate)
import System (getArgs)

split :: Char -> String -> [String]
split delim s =
  let (s', s'') = break (== delim) s
  in s' : case s'' of
        delim : _ -> split delim (tail s'')
            otherwise -> []

plusify :: String -> [String]
plusify ""       = [""]
plusify (c : "") = [c : ""]
plusify (c : cs) = concatMap (\cs' -> [c : cs', c : '+' : cs']) (plusify cs)

combis :: String -> String -> [(String, String)]
combis x y =
  [(l, r) | l <- plusify x, r <- plusify y, sum' l == sum' r]
    where sum' = sum . map (\s -> read s :: Int) . split '+'

main :: IO ()
main = do
  args <- getArgs
  mapM_ (putStrLn . \(l, r) -> l ++ " = " ++ r) (combis (args!!0) (args!!1))
  return ()

Paperclip I18n call to arms

Posted: July 22nd, 2009 | Author: Josh Kalderimis | Filed under: Ruby | Tags: , , | 3 Comments »

Paperclip is a fantastic library used by many many many developers, its github project page shows 1500 watchers but I have a feeling even more than that use the plugin in some shape or form.

The one thing that has been missing for me is proper I18n support and not basic interpolations using the :message option. I recently chanced upon an issue in the github issues registry (#14) which included a fork that implemented a basic version of I18n support but did not go far enough in my opinion. So after taking some advice from validates_timeliness, I have created a fork with goes a little bit further and adds two new messages for the attachment size.

The only thing now is to get some support for this fork so it gets patched into paperclip.

So without further ado, please support this change by leaving a message in the google group thread regarding this change:

Google Group Message – I18n changes and additions

And voting for this issue to be resolved:

Github Issue – Support for I18n

Also, if you have any questions, comments or advice regarding the fork, please don’t hesitate to leave a message below.