Posted: February 8th, 2010 | Author: Ward Bekker | Filed under: Open Source Projects, Ruby, Software Engineering | No Comments »
In this code snippet you can see how to do a basic ranked text search for MongoDB. The code relies on two simple map-reduce operations: one to create an inverted index from some demo text, and a second one to score the matching documents based on query term hits.
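To make the two passes concrete, here is a minimal in-memory sketch of the same idea in Haskell. It is not the MongoDB code from the snippet: the names `tokenize`, `invertedIndex` and `search` are made up for illustration, and the scoring rule (one point per occurrence of a query term) is an assumption.

```haskell
import Data.Char (isAlphaNum, toLower)
import Data.List (sortBy)
import qualified Data.Map.Strict as M
import Data.Ord (Down (..), comparing)

type DocId = Int

-- Lowercase the text and split it into alphanumeric terms.
tokenize :: String -> [String]
tokenize = words . map (\c -> if isAlphaNum c then toLower c else ' ')

-- First pass: "map" emits (term, docId) pairs, "reduce" collects the ids per term.
invertedIndex :: [(DocId, String)] -> M.Map String [DocId]
invertedIndex docs =
  M.fromListWith (++) [ (term, [docId]) | (docId, text) <- docs, term <- tokenize text ]

-- Second pass: one hit per occurrence of a query term (an assumed scoring rule),
-- summed per document and sorted by descending score.
search :: M.Map String [DocId] -> String -> [(DocId, Int)]
search index query = sortBy (comparing (Down . snd)) (M.toList scores)
  where
    hits   = concat [ M.findWithDefault [] term index | term <- tokenize query ]
    scores = M.fromListWith (+) [ (docId, 1 :: Int) | docId <- hits ]
```

With the documents `[(1, "the quick brown fox"), (2, "the lazy dog"), (3, "quick quick dog")]`, searching for `"quick dog"` ranks document 3 first with three hits, followed by documents 1 and 2 with one hit each.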
Posted: December 17th, 2009 | Author: Michel Rijnders | Filed under: Haskell, Ruby Quiz, Uncategorized | 2 Comments »
A solution to Ruby Quiz #14 in literate Haskell:
LCD Numbers
===========
Problem
-------
[original source](http://rubyquiz.com/quiz14.html)
This week's quiz is to write a program that displays LCD style numbers
at adjustable sizes.
The digits to be displayed will be passed as an argument to the
program. Size should be controlled with the command-line option -s
followed by a positive integer. The default value for -s is 2.
For example, if your program is called with:
    $ lcd.rb 012345
The correct display is:
     --        --   --        --
    |  |    |    |    | |  | |
    |  |    |    |    | |  | |
               --   --   --   --
    |  |    | |       |    |    |
    |  |    | |       |    |    |
     --        --   --        --
And for:
    $ lcd.rb -s 1 6789
Your program should print:
     -   -   -   -
    |     | | | | |
     -       -   -
    | |   | | |   |
     -       -   -
Note the single column of space between digits in both examples. For
other values of -s, simply lengthen the - and | bars.
Solution
--------
Module declaration and imports:
> module Main where
>
> import Data.Char (digitToInt)
> import Data.List (intersperse)
> import System.Console.GetOpt
> import System.Environment (getArgs)
First we define the numbers at size 1:
> n0 = [ " - "
>      , "| |"
>      , "   "
>      , "| |"
>      , " - "
>      ]
>
> n1 = [ "   "
>      , "  |"
>      , "   "
>      , "  |"
>      , "   "
>      ]
>
> n2 = [ " - "
>      , "  |"
>      , " - "
>      , "|  "
>      , " - "
>      ]
>
> n3 = [ " - "
>      , "  |"
>      , " - "
>      , "  |"
>      , " - "
>      ]
>
> n4 = [ "   "
>      , "| |"
>      , " - "
>      , "  |"
>      , "   "
>      ]
>
> n5 = [ " - "
>      , "|  "
>      , " - "
>      , "  |"
>      , " - "
>      ]
>
> n6 = [ " - "
>      , "|  "
>      , " - "
>      , "| |"
>      , " - "
>      ]
>
> n7 = [ " - "
>      , "  |"
>      , "   "
>      , "  |"
>      , "   "
>      ]
>
> n8 = [ " - "
>      , "| |"
>      , " - "
>      , "| |"
>      , " - "
>      ]
>
> n9 = [ " - "
>      , "| |"
>      , " - "
>      , "  |"
>      , " - "
>      ]
>
Put the numbers in a list:
> numbers = [n0,n1,n2,n3,n4,n5,n6,n7,n8,n9]
Horizontal scaling function: given a string, replicate the second
character n times:
> hscale n cs = head cs : replicate n (cs!!1) ++ [last cs]
Vertical scaling function, repeat the second and fourth row n times:
> vscale n css = head css : replicate n cs1 ++ [cs2] ++ replicate n cs3 ++ [cs4]
>   where cs1 = css !! 1
>         cs2 = css !! 2
>         cs3 = css !! 3
>         cs4 = last css
Scale function; note this function scales a single number:
> scale n = vscale n . map (hscale n)
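As a worked example of the scaling step, the sketch below repeats the three scaling functions from above in a standalone snippet (type signatures added, the `where` bindings inlined) and applies them to the size-1 pattern for the digit 2. It sits in a fenced block, so it is not part of the literate program itself.

```haskell
-- Size-1 pattern for the digit 2: five rows of three characters each.
n2 :: [String]
n2 = [" - ", "  |", " - ", "|  ", " - "]

-- Widen one row: keep the outer columns, repeat the middle one n times.
hscale :: Int -> String -> String
hscale n cs = head cs : replicate n (cs !! 1) ++ [last cs]

-- Lengthen one digit: repeat the two rows that hold the vertical bars n times.
vscale :: Int -> [String] -> [String]
vscale n css =
  head css : replicate n (css !! 1) ++ [css !! 2] ++ replicate n (css !! 3) ++ [last css]

-- Scale a single digit in both directions.
scale :: Int -> [String] -> [String]
scale n = vscale n . map (hscale n)
```

Here `hscale 2 " - "` gives `" -- "`, and `scale 2 n2` gives the seven rows `[" -- ", "   |", "   |", " -- ", "|   ", "|   ", " -- "]`.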
Function that converts a list of numbers to a string of LCD numbers:
> lcd n = concat .
>         intersperse "\n" .
>         foldr1 (zipWith (++)) .
>         intersperse (replicate (3 + 2*n) " ") .
>         map (scale n . (numbers !!))
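The `foldr1 (zipWith (++))` step is the least obvious part of `lcd`: it glues digits together row by row. Here is a small standalone sketch of just that step. The name `sideBySide` is made up for illustration, and it assumes a non-empty list of digits that all have the same number of rows.

```haskell
import Data.List (intersperse)

-- Two size-1 digits, each a list of rows: 1 and 7.
one, seven :: [String]
one   = ["   ", "  |", "   ", "  |", "   "]
seven = [" - ", "  |", "   ", "  |", "   "]

-- Put a one-column spacer between the digits, then concatenate matching rows.
-- Assumes a non-empty list of digits of equal height.
sideBySide :: [[String]] -> [String]
sideBySide digits = foldr1 (zipWith (++)) (intersperse spacer digits)
  where
    spacer = replicate (length (head digits)) " "
```

`sideBySide [one, seven]` yields five rows of seven characters: `["     - ", "  |   |", "       ", "  |   |", "       "]`. In `lcd` the spacer height is written as `3 + 2*n` because a digit scaled to size n has that many rows.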
`main` function:
> main = do
>   args <- getArgs
>   let (n, digits) = parseArgs args
>   putStrLn $ lcd n $ map digitToInt digits
Command-line argument parsing:
> data Flag = Scale Int
>           deriving Eq
>
> options = [Option "s" [] (ReqArg (Scale . read) "") ""]
>
> parseArgs args =
>   case parse args of
>     (_, [], _)              -> error "Usage: lcd [-s n] digits"
>     ([], digits, [])        -> (2, head digits)
>     ([Scale n], digits, []) -> (n, head digits)
>     (_, _, _)               -> error "Usage: lcd [-s n] digits"
>   where
>     parse = getOpt RequireOrder options
Posted: September 27th, 2009 | Author: Michel Rijnders | Filed under: Haskell, Ruby Quiz | No Comments »
The Quiz
A classic sampling problem: write a program sample which takes two integers n and m as input. n is the size of the sample. m is the size of the population. The program should print out n random unique indices. Two example runs:
$ ./sample 3 10
0
2
8
$ ./sample 3 10
1
2
9
The output must be sorted. The complete, original quiz is here.
A Haskell Solution
Take One
My first (naïve) attempt uses a list of integers to represent the pool still available (i.e. the population not sampled yet). When it has to draw a sample it takes a random index i into the list and removes the element at index i from the list, thus guaranteeing the uniqueness of the generated indices. It works correctly but it runs out of memory for the "big sample" (n = 5,000,000 and m = 1,000,000,000) mentioned in the original quiz. That is not very surprising, since it keeps both the current samples and the pool still available in memory. It is also quite slow because of the use of a plain list.
module Main where

import Control.Monad.State
import Data.List (delete, sort)
import System.Environment (getArgs) -- replaces the Haskell 98 System module
import System.Random

main :: IO ()
main = do
  args <- getArgs
  let n = read (args !! 0) :: Int
      m = read (args !! 1) :: Int
  gen <- getStdGen
  -- a population of size m has the indices 0 .. m - 1
  let init = RandomPool [0 .. m - 1] gen
      result = evalState (sample n) init
  mapM_ print (sort result)

data RandomPool = RandomPool { pool :: [Int], gen :: StdGen }

type StateRP = State RandomPool

sample :: Int -> StateRP [Int]
sample 0 = return []
sample n = do
  st <- get
  let hi = length (pool st) - 1
      (i, gen') = randomR (0, hi) (gen st)
      x = pool st !! i
      pool' = delete x (pool st)
  put RandomPool { pool = pool', gen = gen' }
  xs <- sample (n - 1)
  return (x:xs)
Take Two
My second attempt solves the memory problem by keeping only the current samples in memory. When it has to draw a sample it takes a random number x between 0 and m - 1 and checks if that number has already been used. If the number has been used it tries again. This solution also uses the Data.Set module for increased performance.
module Main where

import Control.Monad.State
import Data.List (sort)
import qualified Data.Set as S
import System.Environment (getArgs) -- replaces the Haskell 98 System module
import System.Random

main :: IO ()
main = do
  args <- getArgs
  let n = read (args !! 0) :: Int
      m = read (args !! 1) :: Int
  gen <- getStdGen
  let init = RandomSet S.empty gen
      result = evalState (sample m n) init
  mapM_ print (sort result)

data RandomSet = RandomSet { set :: S.Set Int, gen :: StdGen }

type StateRS = State RandomSet

sample :: Int -> Int -> StateRS [Int]
sample hi n =
  if n == 0
    then do st <- get
            return (S.toList (set st))
    else do draw hi
            sample hi (n - 1)

draw :: Int -> StateRS ()
draw hi = do
  st <- get
  let (x, gen') = randomR (0, hi - 1) (gen st)
  if x `S.member` set st
    then do put st { gen = gen' }
            draw hi
    -- store the advanced generator too, otherwise the next draw repeats x
    else put st { set = S.insert x (set st), gen = gen' }
Here's an example run for the big sample. Note that I have to increase the maximum stack size for individual threads (+RTS -K250m) to prevent a stack space overflow:
$ time ./sample 5000000 1000000000 +RTS -K250m > big_sample.txt
real 23m24.355s
user 23m1.658s
sys 0m9.548s
$ ls -l big_sample.txt
-rw-r--r-- 1 mies staff 49483467 Sep 27 17:13 big_sample.txt
$ head big_sample.txt
243
280
416
494
556
602
804
909
970
1126
$ tail big_sample.txt
999998483
999998863
999999002
999999028
999999052
999999053
999999115
999999291
999999853
999999870
The code plus solutions to other quizzes is available on GitHub.
Posted: September 23rd, 2009 | Author: Josh Kalderimis | Filed under: Open Source Projects, Rails, Ruby, Web Development | Tags: development, Rails, Ruby | 4 Comments »
The age of completeness-fu is upon us!
Sometimes validations just don’t cut the mustard and all you want to do is grade an instance based on how complete its information is. For example, a Location has a title and a description but no address, thus it’s only 60% complete. Or maybe title is worth more than description and address, so it’s 80% complete. Whatever the case, this is not a new problem and reinventing the wheel is a bit unnecessary, so welcome to completeness-fu.
The DSL is based on the thinking-sphinx configuration, which is nice, clean and simple, but very effective.
Here is a sample of the config code used to define a set of checks for a completeness score:
define_completeness_scoring do
  check :title, lambda { |per| per.title.present? }, :high
  check :description, lambda { |per| per.description.present? }, :medium
  check :main_image, lambda { |per| per.main_image? }, :low
end
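To illustrate the arithmetic behind a weighted completeness score, here is a small language-neutral sketch in Haskell. It is not how completeness-fu is implemented: the point values for high/medium/low, the `Profile` record and all function names are assumptions made up for the example.

```haskell
-- Assumed weights; the gem's real values for :high / :medium / :low may differ.
data Weight = Low | Medium | High

points :: Weight -> Int
points Low    = 1
points Medium = 2
points High   = 3

-- A check is a predicate on the record plus a weight.
data Check a = Check { passes :: a -> Bool, weight :: Weight }

-- Percentage of the available points earned by the record (rounded down).
completeness :: [Check a] -> a -> Int
completeness checks x
  | total == 0 = 0
  | otherwise  = (100 * earned) `div` total
  where
    total  = sum [ points (weight c) | c <- checks ]
    earned = sum [ points (weight c) | c <- checks, passes c x ]

-- Hypothetical record mirroring the three checks in the sample config.
data Profile = Profile { title :: String, description :: String, mainImage :: Maybe String }

profileChecks :: [Check Profile]
profileChecks =
  [ Check (not . null . title) High
  , Check (not . null . description) Medium
  , Check (maybe False (const True) . mainImage) Low
  ]
```

Under these assumed weights, a profile with a title and a description but no main image earns 5 of 6 points, so `completeness profileChecks (Profile "Office" "Head office" Nothing)` is 83.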
It still needs some more TLC, but it’s a nice start and a simple solution for a common problem.
So please, have a play around with it, fork the code, make some improvements/enhancements and let me know what you think.
Posted: September 17th, 2009 | Author: Josh Kalderimis | Filed under: Mac, Rails, Ruby, Web Development | Tags: development, Rails, Ruby, server | 3 Comments »
As a Rails developer you are blessed early on with the fantastic script/server for starting a local development server. Rails is even smart enough to suggest you install the Mongrel gem, as it is a faster alternative to the stock-standard WEBrick. But as time passes, your skills improve and the number of projects you are working on increases, you may find yourself looking for a simpler solution than starting an individual script/server on a different port for each project. Or you may just want an app to sit in the background and start up automatically when you access one of the sites. Whatever the case, there are some very nice solutions available.
99% of people who have deployed a Rails app have undoubtedly come across Passenger (mod_rails) from the fantastic guys at Phusion. Simply put, it allows you to configure and run your Rails app with (initially) Apache or (as of late) Nginx, while also taking advantage of their superior static asset serving capabilities.
So why would you choose Nginx over Apache in your development environment? For me the reasons for using Nginx were simple and quick configuration, very, very low memory usage, and the fact that it mimics my deployment server setup.
So, waffle aside, how do we install and set up Nginx, including for your development environment?
installing nginx and passenger
Passenger is nice enough to offer to install Nginx for you automagically, including downloading Nginx 0.7.61, but this is already an old version, so the plan is to:
1. download the latest stable version of nginx
2. extract it under /usr/local/src
3. install the latest version of passenger via gem
4. have passenger configure, compile and install nginx for us
5. tweak the nginx configs
6. put it all together
So let’s get started.
1. and 2. download the latest stable version of nginx and extract it
cd /usr/local
sudo mkdir src
cd src
# nginx 0.7.62 was the latest version at the time of writing;
# sudo is needed because src was created by root
sudo wget http://sysoev.ru/nginx/nginx-0.7.62.tar.gz
sudo tar -zxvf nginx-0.7.62.tar.gz
3. and 4. install passenger via gem and configure, compile and install nginx
sudo gem install passenger   # or: sudo gem update passenger, if already installed
sudo passenger-install-nginx-module
During the installer program, enter the following information:
When asked: ‘Where is your Nginx source code located?’
answer /usr/local/src/nginx-0.7.62
When asked: ‘Where do you want to install Nginx to?’
answer /usr/local/nginx
When asked about: ‘Extra arguments to pass to configure script:’
answer --with-http_ssl_module
When asked to ‘Confirm configure flags’
answer yes
OK, now Nginx and Passenger are installed with SSL support baked in. What to do from here?
5. tweak the nginx configs
The default nginx config file is pretty basic, which is excellent, because it’s all you really need, but a few tweaks here and there can make a great thing even better. Slicehost has an excellent write-up on some recommended changes here, but as this is your development environment and not production, I suggest not changing worker_processes or keepalive_timeout.
In the end, this is what my nginx.conf looks like:
worker_processes 1;

events {
    worker_connections 1024;
}

http {
    passenger_root /opt/local/lib/ruby/gems/1.8/gems/passenger-2.2.5;
    passenger_ruby /opt/local/bin/ruby;

    include mime.types;
    default_type application/octet-stream;

    sendfile on;
    tcp_nopush on;
    tcp_nodelay off;

    keepalive_timeout 65;

    gzip on;
    gzip_comp_level 2;
    gzip_proxied any;
    gzip_types text/plain text/css application/x-javascript text/xml application/xml application/xml+rss text/javascript;

    # All the virtual hosts exist here
    include /usr/local/nginx/sites-enabled/*;
}
(I have taken out all the commented out lines)
You will notice the passenger_root and passenger_ruby properties/directives in the conf file. These are required for passenger to start, but you don’t need to worry about inserting them as the passenger nginx installer does it for you.
This is what my virtual server file looks like:
server {
    listen 80;
    server_name lotsoffunstuff.local;
    root /Users/me/Development/ruby-workspace/pet-projects/lotsoffunstuff.com/public;
    passenger_enabled on;
    rails_env development;
}
And that’s it: an nginx server plus one virtual host, all ready to run via:
sudo /usr/local/nginx/sbin/nginx
I also added two aliases to my ~/.profile file:
alias nginx='sudo /usr/local/nginx/sbin/nginx'
alias stopnginx='sudo /usr/local/nginx/sbin/nginx -s stop'
Now you can just use nginx and stopnginx.
6. putting it all together
OK, the title is a little deceptive, as it all seems to be together already, but there is one very important change yet to be made: making sure Passenger can reach and read your source code.
I ran into this problem when I was setting up my environment, and it all has to do with how Passenger and Nginx work. As with any good web server, you need to start it as root so it can access the right ports (80) and directories (pid files), but its worker processes should run as nobody or www-data to restrict unneeded access to other resources. For Nginx to know whether the server is a Rails app or not, its worker process needs to be able to access the document root, in this case public, and every single one of its parent directories. As I keep my development files within my home directory, Nginx would throw a 403 error and add a non-descriptive error message to the error.log file.
Two options are available to fix this:
- give everyone access to all the parent directories (chmod o+x on each directory in the path, since traversing a directory needs the execute bit and not just read)
- have the nginx worker processes run as a user which can access all the directories in the path
I chose option two and had the nginx worker processes run as myself. Although you could argue this is insecure, I only have the server running when I need it, and nginx and passenger have a great track record, so I think this is better than opening up my home directory to everyone.
And there we are, all set up and ready to develop! And all in under 30 mins!
some good links and tips
important for deployment : rails maintenance pages done right
init script for ubuntu : nginx-init-ubuntu
1.9 + nginx + passenger : ruby-rails-nginx-passenger
excellent config details : ubuntu-intrepid-nginx-configuration
nginx + vhosts : ubuntu-intrepid-nginx-virtual-hosts
docs galore : http://nginx.net/ and http://wiki.nginx.net/
special mention to slicehost for all the fantastic server and service related articles known to man
Posted: August 30th, 2009 | Author: Michel Rijnders | Filed under: Haskell, Ruby Quiz | No Comments »
The Quiz
Given an array of integers, find the sub-array with maximum sum. (The complete, original quiz is here.)
A Haskell Solution
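module Main where

import Data.List (inits, maximumBy, tails)
import System.Environment (getArgs) -- replaces the Haskell 98 System module

-- Try every contiguous sub-list (all prefixes of all suffixes)
-- and keep the one with the largest sum.
maxSubArray :: [Int] -> [Int]
maxSubArray =
  maximumBy (\ x y -> compare (sum x) (sum y)) . concatMap inits . tails

main :: IO ()
main = do
  args <- getArgs
  print (maxSubArray (read (head args) :: [Int]))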
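To see what the one-liner enumerates, here is a small standalone sketch that names the intermediate step. `subArrays` is a name introduced here for illustration; the logic is the same brute-force search as in the solution.

```haskell
import Data.List (inits, maximumBy, tails)
import Data.Ord (comparing)

-- Every contiguous sub-list: all prefixes (inits) of every suffix (tails).
-- The empty list shows up once per suffix.
subArrays :: [a] -> [[a]]
subArrays = concatMap inits . tails

-- Keep the candidate with the largest sum.
maxSubArray :: [Int] -> [Int]
maxSubArray = maximumBy (comparing sum) . subArrays
```

`subArrays [1,2]` is `[[],[1],[1,2],[],[2],[]]`, and `maxSubArray [-1,2,5,-1,3,-2,1]` is `[2,5,-1,3]` (sum 9). Because the empty sub-list is always a candidate, an all-negative input such as `[-3,-1]` yields `[]`.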
Posted: August 24th, 2009 | Author: Michel Rijnders | Filed under: Haskell, Ruby Quiz | No Comments »
The Quiz
Write a program that tells whether a given integer is happy. A happy number is found using the following process: Take the sum of the squares of its digits, and continue iterating this process until it yields 1, or produces an infinite loop. (The complete, original quiz is here.)
A Haskell Solution
module Main where

import System.Environment (getArgs) -- replaces the Haskell 98 System module

digits :: Int -> [Int]
digits = map (\c -> read [c] :: Int) . show

happy :: Int -> Bool
happy n = happy' n []
  where
    -- s: sum of the squares of the digits
    s = sum . map (\x -> x * x) . digits
    -- ns holds the numbers seen so far; meeting one again means a loop
    happy' n ns
      | s n == 1      = True
      | s n `elem` ns = False
      | otherwise     = happy' (s n) (n : ns)

main :: IO ()
main = do
  args <- getArgs
  if happy (read (head args) :: Int)
    then putStrLn ":-)"
    else putStrLn ":-("
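To watch the process at work, the following standalone sketch returns the whole chain of numbers instead of just a Bool. The names `step`, `trail` and `isHappy` are introduced here for illustration, and the input is assumed to be non-negative.

```haskell
import Data.Char (digitToInt)

-- One step of the process: the sum of the squares of the decimal digits.
-- Assumes a non-negative input.
step :: Int -> Int
step = sum . map ((^ 2) . digitToInt) . show

-- The numbers visited from n, stopping at 1 or at the first repeated value.
trail :: Int -> [Int]
trail = go []
  where
    go seen n
      | n == 1 || n `elem` seen = [n]
      | otherwise               = n : go (n : seen) (step n)

-- A number is happy when its trail ends in 1.
isHappy :: Int -> Bool
isHappy = (== 1) . last . trail
```

`trail 7` is `[7,49,97,130,10,1]`, so 7 is happy, while `trail 4` is `[4,16,37,58,89,145,42,20,4]`: the chain returns to 4 and would loop forever.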
Posted: July 22nd, 2009 | Author: Josh Kalderimis | Filed under: Ruby | Tags: i18n, open source, paperclip | 3 Comments »
Paperclip is a fantastic library used by many, many developers. Its GitHub project page shows 1500 watchers, but I have a feeling even more than that use the plugin in some shape or form.
The one thing that has been missing for me is proper I18n support and not basic interpolations using the :message option. I recently chanced upon an issue in the GitHub issues registry (#14) which included a fork that implemented a basic version of I18n support but did not go far enough in my opinion. So after taking some advice from validates_timeliness, I have created a fork which goes a little bit further and adds two new messages for the attachment size.
The only thing now is to get some support for this fork so it gets patched into paperclip.
So without further ado, please support this change by leaving a message in the google group thread regarding this change:
Google Group Message – I18n changes and additions
And voting for this issue to be resolved:
Github Issue – Support for I18n
Also, if you have any questions, comments or advice regarding the fork, please don’t hesitate to leave a message below.
Posted: July 16th, 2009 | Author: Josh Kalderimis | Filed under: Ruby, Web Development | No Comments »
One of the most popular open-source indexers for web development has to be Sphinx, an SQL full-text search engine. Sphinx has been used by many high-profile sites, including The Pirate Bay, Craigslist, and Netlog, to name the big ones (more here).
But this post is not about Sphinx, but its Ruby helper library, Thinking Sphinx. In fact, it’s not just about Thinking Sphinx; it’s about finding and fixing a problem yourself.
Thinking Sphinx is an amazing open source gem/plugin/library for Rails, masterminded by the great Pat Allan (@pat), and available to all on GitHub. In fact, with 89 contributors and climbing, this library has had some tender love and care from many caring developers. If you are a Ruby developer and you are interested in indexing, have a look at Thinking Sphinx today: it is simple to install and use, fast and effective, and has some excellent documentation and a very active Google group.
Recently, while working on a new project for a client of ours, I encountered an annoying issue/problem/bug with Thinking Sphinx, more specifically with DateTime multi-value attributes. A multi-value attribute (MVA) is like specifying an array of values for a specific field, for example employee names. The trouble is that MVAs can only be integers, so strings need to be converted (to CRC32 values) and datetimes need to be converted to Unix timestamps. In Thinking Sphinx, MVAs are usually defined through the relationships a model might hold, e.g. an office has many employees.
Thinking Sphinx had no problem with me specifying MVAs of integers, but datetimes were a bit of a problem for two reasons: one is the converting, and the other is the group concat that joins them all together with commas (like an array). Now, to say that it was an issue makes it sound worse than it really was, because by breaking the issue down into its two parts, converting and concatenating, two improvements could be made at the same time.
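The two parts are easy to picture in isolation. The sketch below is only an illustration in Haskell, not the SQL that Thinking Sphinx generates: `toUnix`, `mvaValue` and `midnight` are made-up names, and it relies on the `time` package that ships with GHC.

```haskell
import Data.List (intercalate)
import Data.Time.Calendar (fromGregorian)
import Data.Time.Clock (UTCTime (..), secondsToDiffTime)
import Data.Time.Clock.POSIX (utcTimeToPOSIXSeconds)

-- Part 1, converting: a datetime becomes an integer Unix timestamp (whole seconds).
toUnix :: UTCTime -> Integer
toUnix = floor . utcTimeToPOSIXSeconds

-- Part 2, concatenating: the integers are joined with commas, like an array.
mvaValue :: [UTCTime] -> String
mvaValue = intercalate "," . map (show . toUnix)

-- Helper for the example: midnight UTC on the given date.
midnight :: Integer -> Int -> Int -> UTCTime
midnight y m d = UTCTime (fromGregorian y m d) (secondsToDiffTime 0)
```

For instance, `mvaValue [midnight 1970 1 1, midnight 1970 1 2]` is `"0,86400"`.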
So after a brief chat with Pat on the Google group to confirm the issue, and on a perfect Sunday to be inside (hungover and not great weather), I forked Thinking Sphinx on GitHub (gotta love GitHub for making forking projects so easy) and created a four-commit patch which solved the MVA datetime issue for everyone. This patch has since been pulled into the master Thinking Sphinx project so others can use datetime columns as MVAs without having to think about it.
The whole experience of finding this bug, fixing it, and submitting a patch has reinforced a very important lesson for all developers. If you run into a problem which you do not believe is expected behavior, do not run away from it and try to find a workaround; get into the code and try to find out where and why the problem is occurring. It’s very easy to use libraries and gems and plugins without really knowing how they work or what they do, but if you never get your hands dirty and investigate what is going on then you never learn.
If you can’t find the problem, ask for help on the message boards or by emailing the maintainer. On the other hand, if you do find the problem, check with the maintainer if he can confirm your findings.
Next, create a patch which fixes the issue and does not break or alter current functionality (unless it really does need altering).
And last but not least, share the patch with the community. (make the world a better place)
By the time you have finished with the patch you will not only know the code a lot better, but you will have learnt a truckload about the library, which in the long run is a huge advantage, as you are not blindly trusting code written by a horde of other developers but know first-hand what is going on.