Quest for the perfect Erlang development environment

Erlang is great, and there is a lot of dev tooling available. Unfortunately, the best practices are not easy to find for Erlang newbies like me. So I’ll start writing them down here and grow the list as I move up the Dreyfus model.

Get command history for `erl`, the Erlang shell

  1. Install rlwrap. On my Mac using Homebrew:
    brew install rlwrap
  2. Add the alias
    alias erl='rlwrap -a dummy erl'

    to your Bash profile. On my Mac it’s located at ~/.profile. Reload your profile like this:

    bash$ source ~/.profile

Automatic reloading of re-compiled modules

  1. Grab Mochiweb’s reloader.erl from the Mochiweb source, compile it and put the beam file here:

    ~/bin/reloader.beam
  2. Create or edit ~/.erlang and add the line

    code:load_abs("[YOUR_HOME_DIR_PLZ_REPLACE]/bin/reloader").

  3. From now on, when you have a module loaded in the Erlang shell and re-compile it outside your shell, the new version will be reloaded automatically. If nothing is reloaded, check whether the reloader process is running; it may need to be started with reloader:start().

Some nice utility functions for your Erlang shell

** user extended commands **
dbgtc(File)   -- use dbg:trace_client() to read data from File
dbgon(M)      -- enable dbg tracer on all funs in module M
dbgon(M,Fun)  -- enable dbg tracer for module M and function Fun
dbgon(M,File) -- enable dbg tracer for module M and log to File
dbgadd(M)     -- enable call tracer for module M
dbgadd(M,F)   -- enable call tracer for function M:F
dbgdel(M)     -- disable call tracer for module M
dbgdel(M,F)   -- disable call tracer for function M:F
dbgoff()      -- disable dbg tracer (calls dbg:stop/0)
l()           -- load all changed modules
la()          -- load all modules
mm()          -- list modified modules

These commands are added by:

  1. Compile user_default.erl and move the beam file to
    ~/bin/user_default.beam
  2. Create or edit ~/.erlang and add the line

    code:load_abs("[YOUR_HOME_DIR_PLZ_REPLACE]/bin/user_default").

  3. Feel free to add your own shortcuts to your user_default.erl.

There are multiple versions of user_default.erl floating around on the interwebs, so pick the one that feels right.
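
To give an idea of what such a file looks like, here is a minimal sketch of a user_default.erl with only two of the commands listed above. It is not one of the versions floating around; the match spec and the choice to trace all processes are my own assumptions.

```erlang
-module(user_default).
-export([dbgon/1, dbgoff/0]).

%% Enable the dbg tracer on all functions in module M.
%% The return value of dbg:tracer/0 is ignored on purpose, so calling
%% dbgon/1 twice does not crash when a tracer is already running.
dbgon(M) ->
    dbg:tracer(),
    dbg:p(all, call),
    %% Assumption: trace local and exported functions and also print
    %% the return values.
    dbg:tpl(M, [{'_', [], [{return_trace}]}]).

%% Disable the dbg tracer again.
dbgoff() ->
    dbg:stop().
```

Once the beam file is loaded, typing dbgon(lists). in the shell switches tracing on and dbgoff(). switches it off again.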

Practical Erlang testing techniques

Watch the Practical Erlang testing techniques presentation from Mr. Bob Ippolito for a quick rundown of useful testing libs.

Thanks to @andrzejsliwa for the tips! Please add your tips to the comments and I’ll update the post.

Disabling resuming of apps in OSX Lion

The new application resume feature of OS X Lion annoys me big time. To disable it, take the following steps:

1) Uncheck the `Restore windows when quitting and re-opening apps` checkbox in the General pane of System Preferences.

2) Clean out and write-protect the Resume database from the command line:

rm -rf ~/Library/Saved\ Application\ State/*
chmod -w ~/Library/Saved\ Application\ State/

Thanks to @andrzejsliwa for the tip!

A Basic Full Text Search Server in Erlang

This post explains how to build a basic full text search server in Erlang. The server has the following features:

  • indexing
  • stemming
  • ranking
  • faceting
  • asynchronous search results
  • web frontend using websockets

Familiarity with the OTP design principles is recommended.

The sample application (built with help from my colleague Michel Rijnders <mies@tty.nl>) uses the Creative Commons Data Dump from StackExchange as demo data.

Running the Sample Application

Clone the source from GitHub:

 git clone git://github.com/tty/async_search.git

And start the application:

$ rebar get-deps compile && erl -pa `pwd`/ebin `pwd`/deps/*/ebin +P 134217727
Eshell> application:start(async).
Eshell> stackoverflow_importer_ser:import().

Visit http://localhost:3000 and you should see the search page.

[Screenshot: search page at http://localhost:3000/]

[Screenshot: ranked search output for the query erlang armstrong]

[Screenshot: tags facets output for the query java]

OTP Supervision Tree

[Figure: supervisor tree]

Looking at the OTP application supervision tree is a good way to understand the architecture of an OTP application.

The application supervisor async_sup starts up the following supervisors:

  • keyword_sup. A keyword_ser process is created for every unique word in the StackExchange posts. This keyword_ser is linked to the keyword_sup supervisor (a simple_one_for_one supervisor). The keyword_ser child process maintains a list of document positions of a keyword (an inverted index).
  • facet_sup. A facet_ser process is created for every unique facet category in the StackExchange posts. This facet_ser process is linked to the facet_sup supervisor (a simple_one_for_one supervisor as well). The facet_ser child process maintains a list of facet values with the IDs of the documents the facets appear in.

The application supervisor also starts the following gen_server singleton processes:

  • stackoverflow_importer_ser. This server imports the demo Stack Overflow data.
  • document_ser. This server holds a copy of all documents, so it can return the original title and body of matching Stack Overflow posts in the results.
  • query_ser. This server's task is to run the actual query and return results.
  • websocket_ser. This server provides an HTTP frontend for the search engine.

No attention is given to fault tolerance (apart from the basic restart strategies), thus parts of the search index are lost if a keyword_ser process terminates.

Demo Data Import

The StackExchange data is provided as XML. Since some of the documents are quite large, it's not recommended to load the full XML documents in memory. The solution is to use a SAX parser, which treats an XML file as a stream and triggers events when new elements are discovered. The search server uses the excellent SAX parser from the Erlsom library by Willem de Jong.

In the importer, erlsom:parse_sax reads the XML file from FilePath and calls the function sax_event whenever an XML element is found.

When the element is a row element (i.e. a post element), attributes like Id, Title and Body are stored in a dictionary. For every post a copy of all the attributes is saved in document_ser. This copy is used for returning the actual posts for a query match. After that the add_attribute_tokens function is called.

The add_attribute_tokens function does two things. It calls add_facet (discussed later) and it creates a list of tuples with all the words and their position in the document. This process is called tokenization. Each token/position tuple is then submitted to the add_keyword_position function of the keyword_ser for indexing.
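
Tokenization can be sketched as follows. This is not the code from the sample application: the module name is made up, a position is assumed to be the 1-based index of the word in the text, and only ASCII letters and digits are treated as word characters.

```erlang
-module(tokenizer_sketch).
-export([tokenize/1]).

%% Split Text (a string) into lowercase words and pair every word
%% with its position in the text.
tokenize(Text) ->
    Lower = string:lowercase(Text),
    Words = case re:run(Lower, "[a-z0-9]+",
                        [global, {capture, first, list}]) of
                {match, Matches} -> [W || [W] <- Matches];
                nomatch -> []
            end,
    lists:zip(Words, lists:seq(1, length(Words))).
```

Each {Word, Position} tuple in the result is the kind of value that would be handed to the keyword_ser for indexing.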

Indexing

Indexing of the tuples, or keywords, is handled by the keyword_ser. For every unique word a keyword_ser process is started if not already present. The state of a keyword_ser process is a dictionary with the document ID as key and a list of positions as value. The document ID corresponds to the ID of the Stack Overflow post.
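
The state described above can be sketched with plain functions around a dict. The module and function names here are hypothetical; in the real application this state lives inside a keyword_ser gen_server.

```erlang
-module(keyword_index_sketch).
-export([new/0, add_position/3, positions/2]).

%% Empty state for one keyword: document ID -> list of positions.
new() ->
    dict:new().

%% Record that the keyword occurs at Position in document DocId.
add_position(DocId, Position, Index) ->
    dict:append(DocId, Position, Index).

%% All positions of the keyword in document DocId ([] if none).
positions(DocId, Index) ->
    case dict:find(DocId, Index) of
        {ok, Positions} -> Positions;
        error -> []
    end.
```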

The keyword_server_name function generates a unique name under which the keyword_ser process is registered, so the module can check if a keyword already has a process or a new process needs to be created.

Stemming

Stemming is the process of reducing inflected words to their base form. Computing and computer are both stemmed to comput. So when a user searches for computing, it also matches text that contains computer. This makes it possible to return results that are relevant, but do not exactly match the query.

In our sample application all keywords are stemmed using the popular Porter Algorithm. The Erlang implementation by Alden Dima is used in the application.

erlang:phash2 is used to transform the stemmed name to a hash, to make sure the registered process name is valid.
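
A rough sketch of how such a name could be generated is shown below. The module name, the keyword_ prefix and the StemFun argument are assumptions; StemFun stands in for the Porter stemmer so that the example does not depend on it.

```erlang
-module(keyword_name_sketch).
-export([server_name/2]).

%% Build the registered name for the process that indexes Word.
%% StemFun is a fun that maps a lowercase word to its stem.
server_name(Word, StemFun) ->
    Stem = StemFun(string:lowercase(Word)),
    %% Hash the stem, so that any word yields a valid atom.
    Hash = erlang:phash2(Stem),
    list_to_atom("keyword_" ++ integer_to_list(Hash)).
```

Words with the same stem end up with the same name, so whereis/1 on that name tells you whether the keyword already has a process. Keep in mind that atoms are never garbage collected and that different stems can in theory hash to the same value.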

Faceting

Faceted search is an important navigation feature for search engines. A user can drill down the search results by filtering on pre-defined attributes, like in this example of a digital camera search on CNET:

[Figure: faceted search example]

As mentioned above, during the data import the add_attribute_tokens function also calls the add_facet function. Using pattern matching, the Tags and the Creationdate attributes are selected for faceting. Tags is a so-called multivalue facet, as a Stack Overflow post can have one or more tags assigned. For every tag and creation date the facet_ser:add_facet_value function is called.

facet_ser works very similarly to keyword_ser. For every facet category, Tag or Creationdate in our case, a facet_ser process is started. The state of a facet_ser is a dictionary with the Tag or Creationdate values as keys and their document IDs as dictionary values.
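
The facet state can be sketched in the same way as the keyword state. Again the names are hypothetical and the real state lives in a facet_ser process; the counts/1 function is my own addition to show how facet counts could be derived from it.

```erlang
-module(facet_index_sketch).
-export([new/0, add_facet_value/3, counts/1]).

%% Empty state for one facet category: facet value -> document IDs.
new() ->
    dict:new().

%% Record that document DocId carries facet value Value.
%% Adding the same pair twice is counted twice in this sketch.
add_facet_value(Value, DocId, Facets) ->
    dict:append(Value, DocId, Facets).

%% List of {Value, NumberOfDocuments}, highest count first.
counts(Facets) ->
    Counts = [{Value, length(DocIds)}
              || {Value, DocIds} <- dict:to_list(Facets)],
    lists:reverse(lists:keysort(2, Counts)).
```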

Querying and Relevance Ranking

The previous sections showed:

  • how the XML demo data is parsed.
  • how this data is stemmed and indexed by creating a keyword_ser process for every unique keyword.
  • how this data is indexed for faceted search by creating a facet_ser process for every facet category.

With the function stackoverflow_importer_ser:import() these steps are executed, and your Erlang node is now ready for querying. So how does that work?

Querying

Querying is handled by passing the user's query terms to the function do_async_query of the singleton query_ser server. When calling this function you need to specify the module, function and optional reference attribute which will be called when query results are available.

In the handle_cast the following steps are executed:

  • keyword_ser:do_query returns the IDs of all documents that contain one or more of the user's query terms, including the relevance ranking score, which will be discussed below.
  • The matching documents are collected from the document_ser process, which stored all original documents during indexing.
  • The callback function is invoked with the matching documents and their ranking scores as arguments.
  • Facet results are retrieved for any FacetCategories that are specified by calling facet_ser:get_facets.
  • And the callback function is invoked a second time with the facet results as arguments.
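
The two-step callback flow above can be sketched like this. It is a simplification, not the query_ser code: a spawned process replaces the handle_cast, SearchFun and FacetFun are stand-ins for the keyword_ser and facet_ser lookups, and the argument order of the callback (reference first, payload second) is an assumption.

```erlang
-module(async_query_sketch).
-export([do_async_query/4]).

%% Run a query in the background and report back through M:F.
%% SearchFun(Terms) returns [{DocId, Score}].
%% FacetFun(DocIds) returns the facet results for those documents.
do_async_query(Terms, SearchFun, FacetFun, {M, F, Ref}) ->
    spawn(fun() ->
                  Docs = SearchFun(Terms),
                  %% First callback: the ranked documents.
                  M:F(Ref, {documents, Docs}),
                  DocIds = [DocId || {DocId, _Score} <- Docs],
                  %% Second callback: the facet results.
                  M:F(Ref, {facets, FacetFun(DocIds)})
          end),
    ok.
```

The caller gets ok back immediately and receives the documents first and the facets later, which is what allows the frontend to show results as they come in.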

Relevance Ranking

Relevance in this context denotes how well a retrieved document matches the user's search query. Many full-text search engines use the BM25 algorithm to determine the ranking score of each document, so let's use that too.

BM25 calculates a ranking score based on the frequency of the query terms in each document, weighted by how rare each term is across all documents and normalized by document length.

See async_bm25.erl for the implementation.
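
For reference, a textbook BM25 score can be sketched as below. This is not a copy of async_bm25.erl: the module name, the constants k1 = 1.2 and b = 0.75 and the "+ 1" variant of the inverse document frequency (which keeps the weight positive) are assumptions.

```erlang
-module(bm25_sketch).
-export([idf/2, term_score/5, score/1]).

-define(K1, 1.2).   %% term frequency saturation (assumed value)
-define(B, 0.75).   %% document length normalization (assumed value)

%% Inverse document frequency: N is the total number of documents,
%% Nq the number of documents that contain the term.
idf(N, Nq) ->
    math:log((N - Nq + 0.5) / (Nq + 0.5) + 1).

%% Score of a single query term for a single document.
%% Tf is how often the term occurs in the document.
term_score(Tf, DocLen, AvgDocLen, N, Nq) ->
    Norm = 1 - ?B + ?B * DocLen / AvgDocLen,
    idf(N, Nq) * (Tf * (?K1 + 1)) / (Tf + ?K1 * Norm).

%% Score of a document: the sum over all query terms.
%% TermStats is a list of {Tf, DocLen, AvgDocLen, N, Nq} tuples.
score(TermStats) ->
    lists:sum([term_score(Tf, DocLen, AvgDocLen, N, Nq)
               || {Tf, DocLen, AvgDocLen, N, Nq} <- TermStats]).
```

A document scores higher when a query term occurs more often in it, when the term is rare in the collection, and when the document is shorter than average.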

Displaying the Search Results

As discussed, query_ser:do_async_query can be called to query our full-text search engine. To allow users to send queries and see the results, the websocket_ser module was created. This singleton gen_server starts up a Misultin HTTP server on port 3000. If you browse to http://localhost:3000 you will see a search box. Communication with the search engine is done through websockets.

So, when a user posts a query, this message is received by the receive block in websocket_ser:handle_websocket. The query_ser:do_async_query function is called and the query results are delivered to the websocket_ser:query_results function.

The query_results function formats the results as HTML and sends this through the websocket. When received, the HTML is appended to the user's page.

A similar process is executed when the facet results are received.

Improvements

Some obvious features that are lacking from this sample application:

  • The author of this post is an Erlang newbie. Corrections/suggestions to the code are most welcome. You can send them to <ward@tty.nl>
  • Pretty much no attention is given to performance / memory usage.
  • Fault tolerance for the index data. When a server containing index state dies, its state will not be recovered.
  • Tuple structures passed between modules are not specified. Would be nice to use record syntax for it.
  • No unit, QuickCheck or Common Test tests added.
  • No function/type specifications.
  • etc.

So, that's why it's called a sample application ;-)

Erlang Factory Lite Amsterdam Talks Announced

See http://www.erlang-factory.com/conference/amsterdam for more details and free registration

Travis CI – Distributed, Continuous Integration for the open source community.

By Ward Bekker / TTY Internet Solutions – Travis CI is a new continuous integration service for the open source community. It started out with a Ruby focus and became an instant success. Recently Erlang support was added. A few well known projects, like eTorrent, Mochiweb, Meck and Elixir, already started using it. In this presentation you will learn how the system works, the vision behind it, the upcoming features the team is working on and how to add your own Erlang projects.

The Erlang trace facility

By Jeroen Koops – In this talk, I’ll show how to use Erlang’s low-level trace facility, and the higher-level dbg module that is built on top of it. Finally, I’ll demonstrate how to build a simple tool using the primitives provided by the trace-facility.

Let’s jabber about ejabberd

By Ahmed Omar / Nimbuzz – Just a quick jabber about ejabberd

Zotonic, the Erlang web framework, at MaxClass

By Marc Worrell / MaxClass – Zotonic is both an easy to use content management system and a powerful web framework. It’s built on some of the best pieces of Erlang open source software, by experienced web developers. Zotonic comes with an incredible speed out-of-the-box, an extensible infrastructure and most of all, a friendly community. In the first part of this talk, we summarize the history and development of Zotonic, and give a short introduction to the data model and the architecture.

Travis now available in the Erlang flavor

Travis, the very popular and open distributed build system for the Ruby community, has diversified. It now also features first class Erlang support. It came together with help from former colleague Josh Kaldermis and the other Travis devs. Thx guys! Also many thanks to TTY Internet Solutions for providing a server for hosting the workers.

Currently we provide Erlang/OTP releases R14B01, R14B02 and R14B03. Older versions will be added in the near future. Builds are managed with the excellent Rebar tool from the Basho folks.

Projects

A selection of projects that were added to Travis at the time of writing:

So, why not add your Erlang project now?

The near future

Currently only eunit tests are run. We are going to add support for:

Most of these tests can already be run by customizing the script element in the .travis.yml, but we want to make it as convenient as possible.

Oh, and did you add your Erlang project already?

Questions?

Questions or need help? Join #travis on freenode or contact me on Twitter.

Parallel testing: make your CPU cores sweat

All my fellow team-mates have fast workstations: quad core, 8 gigs of memory. Yay! BUT... running our full test suite takes about 45 minutes. Boo! It’s a mix of Cucumber+webrat integration tests and unit tests. If you look at the CPU activity, it doesn’t even max out a single core during the test. Memory consumption stays practically flat. That’s an extremely poor use of all that computing power. No wonder, all tests are run sequentially. In our multi-core age that’s soo 90’s.

The solution is obvious: you need to parallelize the tests. Every integration test needs a dedicated environment to be able to get predictable results. For most integration tests this means exclusive access to resources like your database (MySQL), memory caching (Memcached) and/or full text search solutions (Sphinx | Solr). You can design your tests to be collision free, but like most multi-threaded programming that uses shared resources it’s quite difficult to get it right. And debugging weird threading issues will make you want to put pencils in your eyes. Trust me on that.

A more efficient way of creating a dedicated environment for every test is the use of virtual machines (vm). You replicate your integration test environment on a vm. Make several clones and you now have a pool of vm’s that can run your tests in parallel and with guaranteed exclusivity.

The hard parts of this solution:

  • Cucumber and the unit test runner need to be modified to run tests distributed.
  • Hosted (type 2) virtualisation systems like VirtualBox and VMware Server introduce a significant performance overhead. Bare-metal (type 1) hypervisors require a dedicated box.
  • Provisioning of virtual machines can be a chore. Solutions like Vagrant can help with that.

But it will be worth it. Your CPU cores are worth it.

Solr DataImportHandler issue: positive integers indexed for string ‘nested’ fields

A quick note about a Solr issue that took me some time to solve.

If this sounds familiar….

  • You are using the DataImportHandler for Solr
  • You have an entity with a field whose values come from a related entity.
  • After an import it looks like Solr only indexed even positive integers if you look at the schema browser.

….You probably have a ‘nested’ field whose name is the same as its entity name. See the code below: entity name = regio and field name = regio. Changing the field name to something else (regions) solved the issue. When you think about it, it’s somewhat logical that a field is not allowed to have the same name as its entity. A schema exception during indexing would have been nice though.

<dataConfig>
    <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/nvb?zeroDateTimeBehavior=convertToNull" user="****" password="****"/>
    <document name="vacatures">
        <entity name="vacature" query="select * from vacature">
            <field column="id" name="id" />
            <!-- Problem: the nested entity and its field are both named "regio" -->
            <entity name="regio" query="SELECT regio from foo where vacature='${vacature.id}'">
                <field name="regio" column="regio" />
            </entity>
        </entity>
    </document>
</dataConfig>

Zend SOAP Server Webservice quickstart

Below is a quick write-up of my first impressions of building a basic Zend SOAP web service. I invite you to add spelling and grammar corrections in the comments for my education.

Starting point

  • My team needs to implement a SOAP service for mass posting of vacancies to a job board system.
  • The SOAP service is based on a WSDL of an existing service. So we’ll use these specifications as a starting point for the proof of concept.
  • On a side note: I prefer REST over SOAP, because of its elegant simplicity. But it wouldn’t make a lot of business sense in this case, because a lot of paying consumers of the new service have working code for the old service. Adapting to a slightly changed SOAP service will be much easier than a switch to a brand new REST API.

Available SOAP Server extensions for PHP

There are several frameworks / extensions / toolkits for creating a SOAP server for PHP:

  • Pear SOAP package. Probably an orphaned package, as it has not been updated since 2008 and has a beta status. You probably want to look at the alternatives.
  • NuSoap SOAP toolkit. Started in 2002 and still under active development, as the last release was just a few months ago at the time of writing.
  • PHP 5 SOAP extensions. The official SOAP extension for PHP since version 5.
  • Zend SOAP Server. Part of the Zend Framework, so probably not very useful if that’s not your current PHP framework.

As we use the Zend Framework for this project, it was a natural choice to use its SOAP server implementation. We might opt for one of the alternatives if we slam into a brick wall later down the line.

Testing the waters

The steps I’ve taken to get a basic Zend SOAP Server based on the WSDL up and running:

  • I copied the WSDL to the /public directory of the Zend Framework application, making it publicly accessible under http://example.org/jobtool.wsdl
  • I created a new controller under application/controllers/soapController.php with a public indexAction function.
  • The new SOAP service is now available under http://example.org/soap
  • Next step: actually handle SOAP requests. Handling of the SOAP request is as expected: SOAP method arguments are passed as function arguments. Complex types are represented as stdClass objects, which basically behave like associative arrays. Nested complex types are translated to nested stdClass instances. You don’t get any warnings or exceptions if your argument count is different from the one specified in the SOAP request. IMHO that’s undesirable. I’d rather have big fat ugly exceptions in that case than subtle bugs. The associative array you return is translated to the complex type as specified in the WSDL and returned to the client.
  • To test the SOAP service without the need for a full-blown client I’ve used the free soapUI tool. You point this tool to the WSDL and it automatically creates fake SOAP requests that you can use to test your brand new SOAP services. Make sure you specified the correct URLs in the soapAction attributes in the WSDL.

Closing Thoughts

I hope this post saved you some time when building your first SOAP web service using Zend Framework. I don’t know yet from experience if the Zend SOAP Server will handle more advanced scenarios. Only time will tell. Let me know how it works for you.