One of the most popular open-source indexers for web development has to be Sphinx, an SQL full-text search engine. Sphinx has been used by many high-profile sites, including The Pirate Bay, Craigslist, and Netlog, to name the big ones (more here).
But this post is not about Sphinx; it is about its Ruby helper library, Thinking Sphinx. In fact, it's not just about Thinking Sphinx, but about finding and fixing a problem yourself.
Thinking Sphinx is an amazing open-source gem/plugin/library for Rails, masterminded by the great Pat Allan (@pat), and available to all on GitHub. In fact, with 89 contributors and climbing, this library has had some tender love and care from many caring developers. If you are a Ruby developer and you are interested in indexing, have a look at Thinking Sphinx today. It is simple to install and use, fast and effective, and has some excellent documentation and a very active Google group.
Recently, while working on a new project for a client of ours, I encountered an annoying bug in Thinking Sphinx, specifically with DateTime multi-value attributes. A multi-value attribute (MVA) is like an array of values for a single field, for example, employee names. The trouble is that MVAs can only hold integers, so strings need to be converted to CRC32 values and datetimes need to be converted to Unix timestamps. In Thinking Sphinx, MVAs are usually defined through the relationships a model might hold, e.g. an office has many employees.
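To make the integer-only rule concrete, here is a minimal plain-Ruby sketch of the kind of conversion involved. It is not Thinking Sphinx's own code: the helpers `mva_integer` and `mva_list` are hypothetical names made up for illustration, and only Ruby's standard `Zlib.crc32` and `Time#to_i` are assumed.

```ruby
require "zlib"

# Hypothetical helper (not part of Thinking Sphinx): turn a single value
# into an integer that an MVA could hold.
def mva_integer(value)
  case value
  when Integer then value
  when String  then Zlib.crc32(value) # CRC32 checksum of the string
  when Time    then value.to_i        # seconds since the Unix epoch
  else raise ArgumentError, "unsupported MVA value: #{value.inspect}"
  end
end

# Hypothetical helper: convert every value, then join the integers with
# commas, which is roughly the shape of a multi-value list.
def mva_list(values)
  values.map { |value| mva_integer(value) }.join(",")
end

# Illustrative data only.
start_dates = [Time.utc(2009, 1, 5), Time.utc(2009, 3, 16)]
puts mva_list(start_dates)
puts mva_list(["Alice", "Bob"])
```

Note that the CRC32 conversion is one-way: you can filter on the checksum of a name, but you cannot get the name back out of the attribute.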
Thinking Sphinx had no problem with me specifying MVAs of integers, but datetimes were a bit of a problem for two reasons: one is the conversion, and the other is the group concat that joins all the values together with commas (like an array). Now, to say that it was a problem makes it sound worse than it really was, because by breaking the issue down into its two parts, converting and concatenating, two improvements could be made at the same time.
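The two parts can be sketched as two small steps that build a SQL fragment. This is only an illustration under assumptions, not the actual patch: it assumes a MySQL database (where `UNIX_TIMESTAMP` and `GROUP_CONCAT` are built-in functions), and the helper names and the `employees.started_at` column are hypothetical.

```ruby
# Hypothetical helper: step one, convert a datetime column to a Unix
# timestamp (assumes MySQL's UNIX_TIMESTAMP function).
def timestamp_sql(column)
  "UNIX_TIMESTAMP(#{column})"
end

# Hypothetical helper: step two, concatenate all values for a row into one
# comma-separated string (assumes MySQL's GROUP_CONCAT function).
def group_concat_sql(expression)
  "GROUP_CONCAT(DISTINCT #{expression} SEPARATOR ',')"
end

# Combine the two steps for a datetime MVA.
def mva_datetime_sql(column)
  group_concat_sql(timestamp_sql(column))
end

# The column name is made up for illustration.
puts mva_datetime_sql("employees.started_at")
```

Keeping the conversion and the concatenation separate means each step can be improved or reused on its own, which is the point of splitting the issue in two.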
So after a brief chat with Pat on the Google group to confirm the issue, and on a perfect Sunday to be inside (hungover and not great weather), I forked Thinking Sphinx on GitHub (gotta love GitHub for making forking projects so easy) and created a four-commit patch which solved the MVA datetime issue for everyone. This patch has since been pulled into the master Thinking Sphinx project, so others can use datetime columns as MVAs without having to think about it.
The whole experience of finding this bug, fixing it, and submitting a patch has reinforced a very important lesson for all developers. If you run into a problem which you do not believe is expected behavior, do not run away from it and try to find a workaround. Get into the code and try to find out where and why the problem is occurring. It's very easy to use libraries, gems, and plugins without really knowing how they work or what they do, but if you never get your hands dirty and investigate what is going on, then you never learn.
If you can’t find the problem, ask for help on the message boards or by emailing the maintainer. On the other hand, if you do find the problem, ask the maintainer to confirm your findings.
Next, create a patch which fixes the issue and does not break or alter current functionality (unless it really does need altering).
And last but not least, share the patch with the community (and make the world a better place).
By the time you have finished with the patch, you will not only know the code a lot better, but you will have learnt a truckload about the library. In the long run that is a huge advantage, as you are not blindly trusting code written by a horde of other developers but know firsthand what is going on.