Splitting a database dump

This is a relatively easy way to split a SQL dump into one file per table.

Ideally this would be a single pipeline, zcat FILE.gz | csplit -ftable - "/DROP TABLE/" '{*}',
but csplit has a bug where reading a lot of data from standard input does not work.
So unzip your file first instead.

gunzip FILE.gz
csplit -ftable FILE "/DROP TABLE/" '{*}'

Then to give the files meaningful names:

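for FILE in table*; do
     # The table name is the text between the first pair of backticks.
     NAME=$(head -n1 "$FILE" | cut -s -d$'\x60' -f2)
     # Skip pieces without a backtick-quoted name, such as the dump header.
     [ -n "$NAME" ] && mv -- "$FILE" "$NAME.sql"
done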

If the table sections in your dump do not start with DROP TABLE IF EXISTS `name`,
you will have to change the csplit match expression a little.
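
If csplit is not an option at all, the same split can be sketched in a few lines of Python. This is only a sketch, not a csplit replacement: the function name split_dump and the header.sql file name are made up for this example, and it assumes every table section starts with a DROP TABLE line that has the table name in backticks. It reads the gzip file directly, so the standard input problem does not come up.

```python
import gzip
import re
from pathlib import Path

# Matches lines such as: DROP TABLE IF EXISTS `users`;
DROP_RE = re.compile(rb"DROP TABLE[^`]*`([^`]+)`")


def split_dump(dump_path, out_dir):
    """Write one NAME.sql file per table found in a gzipped SQL dump.

    Hypothetical helper: lines before the first DROP TABLE go to header.sql.
    Returns the table names in the order they were found.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    names = []
    out = open(out_dir / "header.sql", "wb")
    try:
        with gzip.open(dump_path, "rb") as dump:
            for line in dump:
                match = DROP_RE.search(line)
                if match:
                    # A new table section starts: switch to a new output file.
                    out.close()
                    name = match.group(1).decode("utf-8")
                    names.append(name)
                    out = open(out_dir / (name + ".sql"), "wb")
                out.write(line)
    finally:
        out.close()
    return names
```

Calling split_dump("FILE.gz", "tables") would leave one file per table in the tables directory, named after the table.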

Hope this is useful to someone.

Setting up Emacs to compare two git tags

Introduction

This is a setup for using Emacs to compare two git tags. It is based on Ediff mode and Ediff Trees. Ediff Trees is a useful front-end for comparing large trees of files. To make things easier we keep two git clones, one for each tag, which we call before and after.

Needed Code

Create a file named ediff-trees.el with the following contents and place it in your Emacs load path:

;;; ediff-trees.el --- Recursively ediff two directory trees
;;;----------------------------------------------------------------------
;; Author: Joao Cachopo <joao.cachopo@inesc-id.pt>
;; Created on: Wed May 10 17:30:49 2006
;; Keywords: ediff, comparing
;; Version: 20071126.1
;;
;; Copyright (C) 2006 Joao Cachopo

;; This program is not part of GNU Emacs

;; This program is free software; you can redistribute it and/or
;; modify it under the terms of the GNU General Public License as
;; published by the Free Software Foundation; either version 2, or (at
;; your option) any later version.

;; This program is distributed in the hope that it will be useful, but
;; WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
;; General Public License for more details.

;; You should have received a copy of the GNU General Public License
;; along with GNU Emacs; see the file COPYING.  If not, write to the
;; Free Software Foundation, 675 Massachusettes Ave, Cambridge, MA
;; 02139, USA.

;;; Commentary:

;; The ediff-trees package is a simple frontend to the emacs' ediff
;; package to allow a simpler comparison of two similar directory
;; trees.

;; I wrote this package because I often need to compare two different
;; versions of the same directory tree and ediff-directories is not
;; very helpful in this case.  Specially when the directory trees to
;; compare are deep and only a few files have changed.
;; Typically, that occurs when I create a copy of some project
;; directory tree either to make some experiments myself or to send to
;; someone else that will return a modified directory tree to me
;; later.  (Yes, I heard of version control systems, and I use them
;; regularly.  Yet, for several reasons, sometimes that is not an
;; option.)

;; Later, when I want to integrate the modified directory tree with
;; the original tree, I want to see the differences to the original
;; version, so that I may decide whether to accept the changes or not.
;; This is where this package kicks in...

;; To use it, just call `ediff-trees', which will ask for two
;; directories to compare.  Usually, I give the original directory as
;; the first one and the modified directory as the second one.

;; ediff-trees recursively descends both directories, collecting the
;; pairs of files that are worth "comparing": either files that
;; changed, or that appear in one of the two directory trees but not
;; in the other.  Then, it shows the first "change" using ediff.

;; In fact, ediff-trees either uses ediff to compare a file with its
;; changed version, or simply opens a file that occurs in only one of
;; the trees.

;; The user can then navigate backward and forward in the set of
;; changes by using `ediff-trees-examine-next' and
;; `ediff-trees-examine-previous', respectively.  These functions move
;; from one change (quiting the current ediff session or killing the
;; current file buffer) to another.  Therefore, by repeatedly using
;; these functions we can go through all the changes.  I usually use
;; some global bindings for these functions.  Something like this:
;;
;;   (global-set-key (kbd "s-SPC") 'ediff-trees-examine-next)
;;   (global-set-key (kbd "S-s-SPC") 'ediff-trees-examine-previous)
;;   (global-set-key (kbd "C-s-SPC") 'ediff-trees-examine-next-regexp)
;;   (global-set-key (kbd "C-S-s-SPC") 'ediff-trees-examine-previous-regexp)

;; The `ediff-trees-examine-next-regexp' and
;; `ediff-trees-examine-previous-regexp' skip over the list of changes
;; to a file with a filename that matches a given regexp.

;; This package allows for some customization.  Please, see the
;; ediff-trees group under customize.

;; Finally, to deal with small changes in the white space I often find
;; it useful to configure ediff like this:
;;
;;   (setq ediff-diff-options "-w")
;;   (setq-default ediff-ignore-similar-regions t)

;;; Code:


(require 'ediff)

(defgroup ediff-trees nil
  "Extend ediff to allow comparing two trees recursively."
  :tag "Ediff Trees"
  :group 'ediff)


(defface ediff-trees-deleted-original-face
  '((((class color))
     (:background "Pink"))
    (t (:inverse-video t)))
  "Face for highlighting the buffer when it was deleted from the original tree."
  :group 'ediff-trees)

(defcustom ediff-trees-file-ignore-regexp
  "\\`\\(\\.?#.*\\|.*,v\\|.*~\\|CVS\\|_darcs\\)\\'"
  "A regexp matching either files or directories to be ignored
when comparing two trees.  If a directory matches the regexp,
then its contents is not scanned by `ediff-trees'."
  :type 'regexp
  :group 'ediff-trees)


(defcustom ediff-trees-sort-order-regexps nil
  "*Specifies a list of regexps that determine the order in which
files will be presented during the ediff-trees session.  Files
with filenames matching former regexps appear earlier in the
session.  If a filename matches more than one regexp, the first
one wins."
  :type '(repeat regexp)
  :group 'ediff-trees)



(defun ediff-trees (root1 root2)
  "Starts a new ediff session that recursively compares two
trees."
  (interactive
   (let ((dir-A (ediff-get-default-directory-name))
         f)
     (list (setq f (ediff-read-file-name "Directory A to compare:" dir-A nil))
       (ediff-read-file-name "Directory B to compare:"
                 (if ediff-use-last-dir
                     ediff-last-dir-B
                   (ediff-strip-last-dir f))
                 nil))))
  (ediff-trees-internal root1 root2))


;;; Internal variables, used during an ediff-trees session
(defvar ediff-trees-current-file nil)
(defvar ediff-trees-remaining-files (list))
(defvar ediff-trees-examined-files (list))


(defun ediff-trees-internal (root1 root2)
  (let ((files-changed (ediff-trees-collect-files root1 root2)))
    (if (not (null files-changed))
        (progn
          (setq ediff-trees-remaining-files files-changed)
          (setq ediff-trees-examined-files (list))
          (ediff-trees-examine-next 1))
      (message "There are no changes between the trees!"))))

(defun ediff-trees-collect-files (root1 root2)
  (ediff-trees-sort-files
   (nconc (ediff-trees-collect-changed-files root1 root2)
          (mapcar (lambda (el) (cons el nil))
                  (ediff-trees-collect-new-files root1 root2))
          (mapcar (lambda (el) (cons nil el))
                  (ediff-trees-collect-new-files root2 root1)))))


(defun ediff-trees-sort-files (files)
  (let ((tagged-files (mapcar (lambda (pair)
                                (cons (ediff-trees-get-sort-order (or (car pair) (cdr pair)))
                                      pair))
                              files)))
    (mapcar #'cdr
            (sort tagged-files
                  (lambda (tf1 tf2)
                    (let ((order1 (car tf1))
                          (order2 (car tf2)))
                      (or (< order1 order2)
                          (and (= order1 order2)
                               (let ((el1 (or (cadr tf1) (cddr tf1)))
                                     (el2 (or (cadr tf2) (cddr tf2))))
                                 (string< el1 el2))))))))))


(defun ediff-trees-get-sort-order (pathname)
  (let ((order 0)
        (sorting-regexps ediff-trees-sort-order-regexps))
    (while (and (not (null sorting-regexps))
                (not (string-match (pop sorting-regexps) pathname)))
      (setq order (+ order 1)))
    order))



(defun ediff-trees-collect-changed-files (root1 root2)
  (let ((changed (list)))
    (dolist (filename (directory-files root1))
      (unless (ediff-trees-skip-file-p filename)
        (let ((file1 (expand-file-name filename root1))
              (file2 (expand-file-name filename root2)))
          (when (and (file-exists-p file1) (file-exists-p file2))
            (if (eql (file-directory-p file1)
                     (file-directory-p file2))
                (cond ((file-directory-p file1)
                       (setq changed (nconc changed (ediff-trees-collect-changed-files file1 file2))))
                      ((not (ediff-same-file-contents file1 file2))
                       (push (cons file1 file2) changed)))
              (let ((msg (format "I cannot compare a directory, '%s', with a file.  Continue? "
                                 (if (file-directory-p file1) file1 file2))))
                (if (not (y-or-n-p msg))
                    (error "Aborting ediff-trees"))))))))
    changed))


(defun ediff-trees-collect-new-files (root1 root2)
  "Collect files from root1 that do not appear at root2."
  (let ((new-files (list)))
    (dolist (filename (directory-files root1))
      (unless (ediff-trees-skip-file-p filename)
        (let ((file1 (expand-file-name filename root1))
              (file2 (and root2 (expand-file-name filename root2))))
          (when (file-exists-p file1)
            (cond ((file-directory-p file1)
                   (setq new-files
                         (nconc new-files
                                (ediff-trees-collect-new-files file1
                                                               (and (stringp file2)
                                                                    (file-directory-p file2)
                                                                    file2)))))
                  ((or (null file2) (not (file-exists-p file2)))
                   (push file1 new-files)))))))
    new-files))

(defun ediff-trees-skip-file-p (filename)
  ;; always ignore . and ..
  (or (string= filename ".")
      (string= filename "..")
      (string-match ediff-trees-file-ignore-regexp filename)))


(defun ediff-trees-examine-next (num)
  (interactive "p")
  (if (< num 0)
    (ediff-trees-examine-previous (- num))
    (ediff-trees-examine-file
     (lambda (file) (zerop (setq num (- num 1))))
     (lambda (file) (push file ediff-trees-examined-files))
     (lambda () (pop ediff-trees-remaining-files)))))


(defun ediff-trees-examine-previous (num)
  (interactive "p")
  (if (< num 0)
    (ediff-trees-examine-next (- num))
    (ediff-trees-examine-file
     (lambda (file) (zerop (setq num (- num 1))))
     (lambda (file) (push file ediff-trees-remaining-files))
     (lambda () (pop ediff-trees-examined-files)))))


(defun ediff-trees-examine-next-regexp (regexp)
  (interactive "sSearch for (regexp): ")
  (ediff-trees-examine-file
   (lambda (file) (string-match regexp (or (car file) (cdr file))))
   (lambda (file) (push file ediff-trees-examined-files))
   (lambda () (pop ediff-trees-remaining-files))))


(defun ediff-trees-examine-previous-regexp (regexp)
  (interactive "sSearch for (regexp): ")
  (ediff-trees-examine-file
   (lambda (file) (string-match regexp (or (car file) (cdr file))))
   (lambda (file) (push file ediff-trees-remaining-files))
   (lambda () (pop ediff-trees-examined-files))))


(defun ediff-trees-examine-file (pred save-current-file-fn get-next-file-fn)
  (when (eq (current-buffer) ediff-control-buffer)
    (ediff-really-quit nil))
  (unless (null ediff-trees-current-file)
    (funcall save-current-file-fn ediff-trees-current-file)
    (when (car ediff-trees-current-file)
      (kill-buffer (find-buffer-visiting (car ediff-trees-current-file))))
    (when (cdr ediff-trees-current-file)
      (kill-buffer (find-buffer-visiting (cdr ediff-trees-current-file))))
    (setq ediff-trees-current-file nil))
  (let ((next-file (ediff-trees-get-next-file pred save-current-file-fn get-next-file-fn)))
    (if (null next-file)
        (message "No more files.")
      (progn
        (setq ediff-trees-current-file next-file)
        (if (and (car next-file) (cdr next-file))
            (ediff-files (car next-file) (cdr next-file))
          (progn
            (delete-other-windows)
            (find-file-read-only (or (car next-file) (cdr next-file)))
            (when (null (cdr next-file))
              (let ((overlay (make-overlay 0 (point-max))))
                (overlay-put overlay 'face 'ediff-trees-deleted-original-face)))))))))


(defun ediff-trees-get-next-file (pred save-current-file-fn get-next-file-fn)
  (let ((return-value 'not-found))
    (while (eq return-value 'not-found)
      (let ((next-file (funcall get-next-file-fn)))
        (cond ((null next-file)
               (setq return-value nil))
              ((funcall pred next-file)
               (setq return-value next-file))
              (t
               (funcall save-current-file-fn next-file)))))
    return-value))


(provide 'ediff-trees)

This code can be downloaded from the EmacsWiki. We changed two functions, ediff-trees-collect-changed-files and ediff-trees-collect-new-files, adding the condition (file-exists-p file1) to skip symbolic links that do not refer to an existing file.

Create another file with the following contents, place it in your Emacs load path and load it:

(require 'cl-lib) ; provides cl-assert
(cl-assert (and (boundp 'ediff-git-root-before) (stringp ediff-git-root-before)))
(cl-assert (and (boundp 'ediff-git-root-after)  (stringp ediff-git-root-after)))

(require 'ediff-trees)

(setq-default ediff-ignore-similar-regions t)
(setq-default ediff-split-window-function 'split-window-horizontally)
(setq-default ediff-trees-file-ignore-regexp "^[.]?#\\|~$\\|^[.]git$")

(defun next-error-capable-buffer () "Return a 'next-error' capable buffer."
  (ignore-errors (next-error-find-buffer))
)

(defun kill-all-next-error-capable-buffers () "Kill all 'next-error' capable buffers."
  (interactive)
  (let ((buffer (next-error-capable-buffer)))
    (when buffer
      (message "Killing buffer '%s'" (buffer-name buffer))
      (kill-buffer buffer)
      (kill-all-next-error-capable-buffers)
    )
  )
)

(defun ediff-git-commits (commits) "Start a new ediff session that recursively compares 'before' and 'after'."
  (interactive "sEnter commit(s): ")
  (let* ((whitespace-chars " \f\n\r\t")
         (cc-whitespace (concat "[" whitespace-chars "]"))
         (re-commit (concat "\\([^" whitespace-chars "]+\\)"))
        )
    (if (string-match (concat "\\`" cc-whitespace "*" re-commit "\\(?:" cc-whitespace "+" re-commit "\\)?" cc-whitespace "*" "\\'") commits)
      (let ((commit-before (match-string 1 commits)) (commit-after (match-string 2 commits)))
        (unless commit-after
          (if (y-or-n-p (format "Compare %s with parent? " commit-before))
            (setq commit-after commit-before commit-before (concat commit-after "^"))
            (let ((commit (read-string "Enter commit for 'after': ")))
              (if (string-match (concat "\\`" cc-whitespace "*" re-commit cc-whitespace "*" "\\'") commit)
                (setq commit-after (match-string 1 commit))
                (error "Not a valid commit")
              )
            )
          )
        )
        (let* ((case-fold-search nil) (ok "ok") (re-command-ok (concat cc-whitespace (regexp-quote ok) "\\'"))
               (command-template (concat "unset CDPATH && cd %s && git checkout . && git fetch && git checkout %s && echo -n " ok))
               (command-before (format command-template ediff-git-root-before commit-before))
               (command-after  (format command-template ediff-git-root-after commit-after))
              )
          (message "Checking out 'before'...")
          (unless (string-match re-command-ok (shell-command-to-string command-before))
            (error "Error checking out 'before' (%s)" command-before)
          )
          (message "Checking out 'after'...")
          (unless (string-match re-command-ok (shell-command-to-string command-after))
            (error "Error checking out 'after' (%s)" command-after)
          )
        )
      )
      (unless (and (string-equal commits "") (y-or-n-p "No commit(s) supplied. Keep current checkouts? "))
        (error "No valid commit(s) supplied")
      )
    )
    (kill-all-next-error-capable-buffers)
    (message "Comparing...")
    (ediff-trees ediff-git-root-before ediff-git-root-after)
  )
)

(defun visit-next (arg) "In an ediff session, visit next file, else visit next 'next-error' message."
  (interactive "p")
  (if (or (next-error-capable-buffer) (null ediff-trees-current-file))
    (next-error arg)
    (ediff-trees-examine-next arg)
  )
)

(defun visit-previous (arg) "In an ediff session, visit previous file, else visit previous 'next-error' message."
  (interactive "p")
  (if (or (next-error-capable-buffer) (null ediff-trees-current-file))
    (previous-error arg)
    (ediff-trees-examine-previous arg)
  )
)

Make sure you have constants ediff-git-root-before and ediff-git-root-after defined in your config.
Example:

(defconst ediff-git-root-before "/home/eduard/before")
(defconst ediff-git-root-after "/home/eduard/after")

Usage

To start a new ediff session, execute ediff-git-commits (M-x ediff-git-commits) and enter zero, one or two commits. When the comparison is done, execute visit-next (or bind it to a key you like) to go to the next pair of differing files.

Harro & Eduard

Quest for the perfect Erlang development environment

Erlang is great, and there is a lot of dev tooling available. Unfortunately the best practices around it are not easy to find for Erlang newbies like me. So I’ll start writing them down here and grow the list as I move up the Dreyfus model.

Get command history for `erl`, the Erlang shell

  1. Install rlwrap. On my Mac using Homebrew:
    brew install rlwrap
  2. Add the alias
    alias erl='rlwrap -a dummy erl'

    to your Bash profile. On my Mac it’s located at ~/.profile. Reload your profile like this:

    bash$ source ~/.profile

Automatic reloading of re-compiled modules

  1. Grab reloader.erl from the Mochiweb source, compile it and put the beam file here:

    ~/bin/reloader.beam
  2. Create or edit ~/.erlang and add the line

    code:load_abs("[YOUR_HOME_DIR_PLZ_REPLACE]/bin/reloader").

  3. From now on, when you have a module loaded in the Erlang shell and re-compile it outside your shell, the new version will be reloaded automatically.

Some nice utility functions for your Erlang shell

** user extended commands **
dbgtc(File)   -- use dbg:trace_client() to read data from File
dbgon(M)      -- enable dbg tracer on all funs in module M
dbgon(M,Fun)  -- enable dbg tracer for module M and function F
dbgon(M,File) -- enable dbg tracer for module M and log to File
dbgadd(M)     -- enable call tracer for module M
dbgadd(M,F)   -- enable call tracer for function M:F
dbgdel(M)     -- disable call tracer for module M
dbgdel(M,F)   -- disable call tracer for function M:F
dbgoff()      -- disable dbg tracer (calls dbg:stop/0)
l()           -- load all changed modules
la()          -- load all modules
mm()          -- list modified modules

These commands are added by:

  1. Compile user_default.erl and move the beam file to

    ~/bin/user_default.beam
  2. Create or edit ~/.erlang and add the line

    code:load_abs("[YOUR_HOME_DIR_PLZ_REPLACE]/bin/user_default").

  3. Feel free to add your own shortcuts to your user_default.erl.

There are multiple versions of user_default.erl floating around on the interwebs, so pick the one that feels right.

Practical Erlang testing techniques

Watch the Practical Erlang testing techniques presentation from Mr. Bob Ippolito for a quick rundown of useful testing libs.

Thanks to @andrzejsliwa for the tips! Please add your tips to the comments and I’ll update the post.

Disabling resuming of apps in OSX Lion

The new application resume feature of OSX Lion annoys me big time. To disable it, you need to do the following steps:

1) Disable the `Restore windows when quitting and re-opening apps` checkbox in the General Preference window.


2) Clean out and write-protect the Resume database from the command line:

rm -rf ~/Library/Saved\ Application\ State/*
chmod -w ~/Library/Saved\ Application\ State/

Thanks to @andrzejsliwa for the tip!

A Basic Full Text Search Server in Erlang

This post explains how to build a basic full text search server in Erlang. The server has the following features:

  • indexing
  • stemming
  • ranking
  • faceting
  • asynchronous search results
  • web frontend using websockets

Familiarity with the OTP design principles is recommended.

The sample application (built with help from my colleague Michel Rijnders <mies@tty.nl>) uses the Creative Commons Data Dump from StackExchange as demo data.

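We cover the following subjects:

  • Running the Sample Application
  • OTP Supervision Tree
  • Demo Data Import
  • Indexing
  • Stemming
  • Faceting
  • Querying and Relevance Ranking
  • Displaying the Search Results
  • Improvements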

Running the Sample Application

Clone the source from GitHub:

 git clone git://github.com/tty/async_search.git

And start the application:

$ rebar get-deps compile && erl -pa `pwd`/ebin `pwd`/deps/*/ebin +P 134217727
Eshell> application:start(async).
Eshell> stackoverflow_importer_ser:import().

Visit http://localhost:3000, where you should see the following page:

[Screenshot: the search page at http://localhost:3000/]

Sample ranked search output for the query erlang armstrong:

[Screenshot: ranked search results]

Sample tags facets output for the query java:

[Screenshot: tag facet results]

OTP Supervision Tree

[Figure: supervisor tree]

Looking at the OTP application supervision tree is a good way to understand the architecture of an OTP application.

The application supervisor async_sup starts up the following supervisors:

  • keyword_sup. A keyword_ser process is created for every unique word in the StackExchange posts. This keyword_ser is linked to the keyword_sup supervisor (a simple_one_for_one supervisor). The keyword_ser child process maintains a list of document positions of a keyword (an inverted index).
  • facet_sup. A facet_ser process is created for every unique facet category in the StackExchange posts. This facet_ser process is linked to the facet_sup supervisor (a simple_one_for_one supervisor as well). The facet_ser child process maintains a list of facet values with the IDs of the documents the facets appear in.

The application supervisor also starts the following gen_server singleton processes:

  • stackoverflow_importer_ser. This server imports the demo Stack Overflow data.
  • document_ser. This server holds a copy of all documents, so it can return the original title and body of matching Stack Overflow posts in the results.
  • query_ser. This server's task is to run the actual query and return results.
  • websocket_ser. This server provides an HTTP frontend for the search engine.

No attention is given to fault tolerance (apart from the basic restart strategies), thus parts of the search index are lost if a keyword_ser process terminates.

Demo Data Import

The StackExchange data is provided as XML. Since some of the documents are quite large, loading the full XML documents into memory is not recommended. The solution is to use a SAX parser, which treats an XML file as a stream and triggers events when new elements are discovered. The search server uses the excellent SAX parser from the Erlsom library by Willem de Jong.

The importer uses erlsom:parse_sax, which reads the XML file from FilePath and calls the function sax_event whenever an XML element is found.

When the element is a row element (i.e. a post element), attributes like Id, Title and Body are stored in a dictionary. For every post, a copy of all the attributes is saved in document_ser. This copy is used for returning the actual posts for a query match. After that the add_attribute_tokens function is called.

The add_attribute_tokens function does two things. It calls add_facet (discussed later) and it creates a list of tuples with all the words and their position in the document. This process is called tokenization. Each token/position tuple is then submitted to the add_keyword_position function of the keyword_ser for indexing.
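
The tokenization step can be illustrated with a short sketch. It is written in Python rather than Erlang and is not the code from the sample application. The function name tokenize and the choice to treat runs of letters and digits as words are assumptions made for this illustration.

```python
import re

# Assumption: a "word" is a run of lowercase letters and digits.
WORD_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Return a list of (word, position) tuples; positions count from 1."""
    words = WORD_RE.findall(text.lower())
    return list(zip(words, range(1, len(words) + 1)))


# tokenize("Erlang is great, Erlang scales") returns
# [("erlang", 1), ("is", 2), ("great", 3), ("erlang", 4), ("scales", 5)]
```

Each tuple in the result corresponds to one token/position pair that would be handed to the indexer.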

Indexing

Indexing of the tuples, or keywords, is handled by the keyword_ser. For every unique word a keyword_ser process is started if not already present. The state of a keyword_ser process is a dictionary with the document ID as key and a list of positions as value. The document ID corresponds to the ID of the Stack Overflow post.
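
To make the data structure concrete, here is a minimal Python sketch of such an inverted index. It is not the application's code: a single KeywordIndex object (a made-up name) stands in for the whole set of keyword_ser processes, and the lookup method is an assumption added for illustration.

```python
from collections import defaultdict


class KeywordIndex:
    """Hypothetical stand-in for all keyword_ser processes together."""

    def __init__(self):
        # keyword -> {document ID -> [positions of the keyword in that document]}
        self._postings = defaultdict(lambda: defaultdict(list))

    def add_keyword_position(self, keyword, doc_id, position):
        """Record that keyword occurs at position in document doc_id."""
        self._postings[keyword][doc_id].append(position)

    def lookup(self, keyword):
        """Return {document ID: [positions]}, or an empty dict if unknown."""
        if keyword not in self._postings:
            return {}
        return {doc: list(pos) for doc, pos in self._postings[keyword].items()}
```

In the Erlang application the inner dictionary is the state of one process, and the outer mapping is replaced by looking up the registered process name.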

The keyword_server_name function generates a unique name under which the keyword_ser process is registered, so the module can check if a keyword already has a process or a new process needs to be created.

Stemming

Stemming is the process of reducing inflected words to their base form. Computing and computer are both stemmed to comput. So when a user searches for computing, the query also matches text that contains computer. This makes it possible to return results that are relevant but do not exactly match the query.

In our sample application all keywords are stemmed using the popular Porter Algorithm. The Erlang implementation by Alden Dima is used in the application.

erlang:phash2 is used to transform the stemmed name to a hash, to make sure the registered process name is valid.
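
The idea of stemming first and hashing second can be shown with a toy Python sketch. Note the assumptions: naive_stem is a deliberately crude suffix stripper, not the Porter algorithm, and the "keyword_" prefix with a CRC32 checksum is a made-up naming scheme that only plays the role erlang:phash2 has in the application.

```python
import zlib

# Toy suffix list, checked in this order. Porter uses far more careful rules.
SUFFIXES = ("ing", "ers", "er", "ed", "s")


def naive_stem(word):
    """Strip one common suffix, keeping a stem of at least 3 characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def keyword_server_name(word):
    """Hypothetical name: hash the stem so any word yields a valid name."""
    stem = naive_stem(word)
    return "keyword_%d" % zlib.crc32(stem.encode("utf-8"))
```

With this sketch, computing and computer both stem to comput and therefore end up with the same name, which is what lets one search match the other.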

Faceting

Faceted search is an important navigation feature for search engines. A user can drill down the search results by filtering on pre-defined attributes, like in this example of a digital camera search on CNET:

[Figure: faceted search example]

As mentioned above, during the data import the function add_attribute_tokens also calls the add_facet function. Using pattern matching, the Tags and Creationdate attributes are selected for faceting. Tags is a so-called multivalue facet, as a Stack Overflow post can have one or more tags assigned. For every tag and creation date the facet_ser:add_facet_value function is called.

facet_ser works very similarly to keyword_ser. For every facet category, Tag or Creationdate in our case, a facet_ser process is started. The state of a facet_ser is a dictionary with the Tag or Creationdate values as keys and the IDs of the matching documents as values.
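
A rough Python sketch of this facet structure follows. It is an illustration under assumptions, not the application's code: the class name FacetIndex is made up, and the argument lists of add_facet_value and get_facets are guesses that only borrow the function names mentioned in this post.

```python
from collections import defaultdict


class FacetIndex:
    """Hypothetical stand-in for all facet_ser processes together."""

    def __init__(self):
        # facet category -> {facet value -> set of document IDs}
        self._facets = defaultdict(lambda: defaultdict(set))

    def add_facet_value(self, category, value, doc_id):
        """Record that document doc_id has this value for the category."""
        self._facets[category][value].add(doc_id)

    def get_facets(self, category, doc_ids):
        """Count, per facet value, how many of doc_ids carry that value."""
        wanted = set(doc_ids)
        counts = {}
        for value, ids in self._facets.get(category, {}).items():
            hits = len(ids & wanted)
            if hits:
                counts[value] = hits
        return counts
```

Because a post can carry several tags, the same document ID may appear under more than one value of the Tags category.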

Querying and Relevance Ranking

The previous sections showed:

  • how the XML demo data is parsed.
  • how this data is stemmed and indexed by creating a keyword_ser process for every unique keyword.
  • how this data is indexed for faceted search by creating a facet_ser process for every facet category.

With the function stackoverflow_importer_ser:import() these steps are executed, and your Erlang node is now ready for querying. So how does that work?

Querying

Querying is handled by passing the user's query terms to the function do_async_query of the singleton query_ser server. When calling this function you need to specify the module, function and optional reference attribute which will be called when query results are available.

In the handle_cast the following steps are executed:

  • keyword_ser:do_query returns all document IDs that contain one or more of the user's query terms, including the relevance ranking score, which will be discussed below.
  • All original documents are stored during indexing in a document_ser process. All matching documents are collected.
  • The callback function is invoked with the matching documents and their ranking scores as arguments.
  • Facet results are retrieved for any FacetCategories that are specified by calling facet_ser:get_facets.
  • And the callback function is invoked a second time with the facet results as arguments.
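
The flow of these steps, with the callback invoked twice, can be sketched in Python. This is an illustration only: a thread stands in for the gen_server cast, and the search and facets parameters are hypothetical hooks injected to keep the sketch self-contained. The "documents" and "facets" labels are also made up for this example.

```python
import threading


def do_async_query(terms, callback, ref, search, facets):
    """Run a query in the background and report results through callback.

    search(terms)   -> list of (doc_id, score) tuples   (hypothetical hook)
    facets(doc_ids) -> dict with facet results          (hypothetical hook)
    callback(ref, kind, payload) is called twice: first with the matching
    documents, then with the facet results.
    """
    def worker():
        matches = search(terms)
        callback(ref, "documents", matches)
        doc_ids = [doc_id for doc_id, _score in matches]
        callback(ref, "facets", facets(doc_ids))

    thread = threading.Thread(target=worker)
    thread.start()
    return thread
```

The caller gets control back immediately and receives the documents before the facets, so the page can show matches while the facet counts are still being collected.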

Relevance Ranking

Relevance in this context denotes how well a retrieved document matches the user's search query. Many full-text search engines use the BM25 algorithm to determine the ranking score of each document, so let's use that too.

BM25 calculates a ranking score based on the query term frequency in each document.

See async_bm25.erl for the implementation.
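
For readers who want to see the formula at work, here is a small Python sketch of the usual Okapi BM25 scoring function. It is not a port of async_bm25.erl, which may differ in details. The parameter defaults k1=1.2 and b=0.75 are common choices, and the smoothed IDF variant that never goes negative is an assumption of this sketch.

```python
import math


def bm25_score(query_terms, doc_terms, doc_freq, n_docs, avg_doc_len,
               k1=1.2, b=0.75):
    """Score one document against a query with Okapi BM25.

    query_terms: list of query words
    doc_terms:   list of the words in the document
    doc_freq:    dict word -> number of documents containing the word
    n_docs:      total number of documents in the index
    avg_doc_len: average document length in words
    """
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        tf = doc_terms.count(term)  # term frequency in this document
        if tf == 0:
            continue
        n = doc_freq.get(term, 0)
        # Rare terms weigh more; the "+ 1.0" keeps the IDF non-negative.
        idf = math.log((n_docs - n + 0.5) / (n + 0.5) + 1.0)
        # Long documents are penalised relative to the average length.
        norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
        score += idf * tf * (k1 + 1.0) / (tf + norm)
    return score
```

A document scores higher when a query term occurs more often in it, and lower when the same number of occurrences is spread over a longer text.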

Displaying the Search Results

As discussed, query_ser:do_async_query can be called to query our full-text search engine. To allow users to send queries and see the results, the websocket_ser module was created. This singleton gen_server starts up a Misultin HTTP server on port 3000. If you browse to http://localhost:3000 you will see a search box. Communication with the search engine is done through websockets.

So, when a user posts a query, this message is received by the receive block in websocket_ser:handle_websocket. The query_ser:do_async_query function is called, and the query results are expected on the websocket_ser:query_results function.

The query_results function formats the results as HTML and sends this through the websocket. When received, the HTML is appended to the user's page.

A similar process is executed when the facet results are received.

Improvements

Some obvious features that are lacking from this sample application:

  • The author of this post is an Erlang newbie. Corrections/suggestions to the code are most welcome. You can send them to <ward@tty.nl>
  • Pretty much no attention is given to performance / memory usage.
  • Fault tolerance for the index data. When a server containing index state dies, it will not be revived.
  • Tuple structures passed between modules are not specified. Would be nice to use record syntax for it.
  • No unit/quickcheck/common test added.
  • No function/type specifications.
  • etc.

So, that's why it's called a sample application ;-)

Erlang Factory Lite Amsterdam Talks Announced

See http://www.erlang-factory.com/conference/amsterdam for more details and free registration

Travis CI – Distributed, Continuous Integration for the open source community.

By Ward Bekker / TTY Internet Solutions – Travis CI is a new continuous integration service for the open source community. It started out with a Ruby focus and became an instant success. Recently Erlang support was added. A few well known projects, like eTorrent, Mochiweb, Meck and Elixir, already started using it. In this presentation you will learn how the system works, the vision behind it, the upcoming features the team is working on and how to add your own Erlang projects.

The Erlang trace facility

By Jeroen Koops – In this talk, I’ll show how to use Erlang’s low-level trace facility, and the higher-level dbg module that is built on top of it. Finally, I’ll demonstrate how to build a simple tool using the primitives provided by the trace-facility.

Let’s jabber about ejabberd

By Ahmed Omar / Nimbuzz – Just a quick jabber about ejabberd

Zotonic, the Erlang web framework, at MaxClass

By Marc Worrell / MaxClass – Zotonic is both an easy to use content management system and a powerful web framework. It’s built on some of the best pieces of Erlang open source software, by experienced web developers. Zotonic comes with an incredible speed out-of-the-box, an extensible infrastructure and most of all, a friendly community. In the first part of this talk, we summarize the history and development of Zotonic, and give a short introduction to the data model and the architecture.

Travis now available in the Erlang flavor

Travis, the very popular and open distributed build system for the Ruby community, has diversified. It now also features first class Erlang support. It came together with help from former colleague Josh Kaldermis and the other Travis devs. Thx guys! Also many thanks to TTY Internet Solutions for providing a server for hosting the workers.

Currently we provide Erlang/OTP releases R14B01, R14B02 and R14B03. Older versions will be added in the near future. Builds are managed with the excellent Rebar tool from the Basho folks.

Projects

A selection of projects that were added to Travis at the time of writing:

So, why not add your Erlang project now?

The near future

Currently only eunit tests are run. We are going to add support for:

Most of these tests can already be run by customizing the script element in the .travis.yml, but we want to make it as convenient as possible.

Oh, and did you add your Erlang project already?

Questions?

Questions or need help? Join #travis on freenode or contact me on twitter

Parallel testing: make your CPU cores sweat

All my fellow team-mates have fast workstations: quad core, 8 gigs of memory. Yay! BUT… running our full test suite takes about 45 minutes. Boo! It’s a mix of Cucumber+webrat integration tests and unit tests. If you look at the CPU activity, it doesn’t even spike a single core during the test. Memory consumption stays practically flat. That’s an extremely poor use of all that computing power. No wonder: all tests are run sequentially. In our multi-core age that’s soo 90s.

The solution is obvious: you need to parallelize the tests. Every integration test needs a dedicated environment to be able to get predictable results. For most integration tests this means exclusive access to resources like your database (MySQL), memory caching (Memcached) and/or full text search solutions (Sphinx | Solr). You can design your tests to be collision free, but like most multi-threaded programming that uses shared resources it’s quite difficult to get right. And debugging weird threading issues will make you want to put pencils in your eyes. Trust me on that.

A more efficient way of creating a dedicated environment for every test is the use of virtual machines (VMs). You replicate your integration test environment on a VM. Make several clones and you now have a pool of VMs that can run your tests in parallel with guaranteed exclusivity.
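
The pool idea can be sketched in Python as a work queue with one worker per VM. This is only an illustration under assumptions: run_in_parallel is a made-up name, and run_on_vm is a hypothetical hook that would start a test inside the given VM (over SSH, for example) and report whether it passed.

```python
import queue
import threading


def run_in_parallel(tests, vms, run_on_vm):
    """Run each test on exactly one VM; every VM handles one test at a time.

    Returns a dict mapping each test to a (vm, passed) tuple.
    """
    pending = queue.Queue()
    for test in tests:
        pending.put(test)
    results = {}
    lock = threading.Lock()

    def worker(vm):
        # Keep taking tests until the queue is empty.
        while True:
            try:
                test = pending.get_nowait()
            except queue.Empty:
                return
            passed = run_on_vm(vm, test)
            with lock:
                results[test] = (vm, passed)

    threads = [threading.Thread(target=worker, args=(vm,)) for vm in vms]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Since a VM only takes a new test after finishing the previous one, no two tests ever share a database or cache, and a slow test on one VM does not hold up the others.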

The hard parts of this solution:

  • Cucumber and the unit test runner need to be modified to run tests distributed.
  • Hosted virtualisation systems like VirtualBox and VMware Server introduce a significant performance overhead. Bare-metal hypervisors require a dedicated box.
  • Provisioning of virtual machines can be a chore. Solutions like Vagrant can help with that.

But it will be worth it. Your CPU cores are worth it.

Using VisualVM to fix live Tomcat and JVM problems

You have done all your Java implementation, unit testing and perhaps integration testing. You met all specs and passed the acceptance phase, so you’re going to deploy your .war file to the live environment and install it on Tomcat. All goes well and you continue with other work… A few hours later the system administrator calls and asks why the quad core processor has reached 400% CPU (where normally it’s around 100% spread over 4 cores). You did the best you could with testing, but a problem like this can still get through. Now the challenge starts: how to find what causes this problem!

CPU load is far higher than normal. Note the platforms in the graph!
