Constructing the Bazaar: Taking advantage of the open-source development model in your project
The open-source development model can be powerful and incredibly beneficial for any software development project. Yet most open-source projects do not follow this style of development. Today, I want to talk a bit about why this is the case, and what can be done to set your project on the right path to enjoy the advantages of the open-source development process.
In 1997, Eric S. Raymond wrote his seminal essay The Cathedral and the Bazaar. He discussed his observations of the Linux kernel development process, contrasting it with the traditional approach to software design. He likened the teeming and seemingly arbitrary community effort of Linux to a bazaar, and the deliberate, traditional top-down approach to the construction of a cathedral.
The Bazaar in action
In his opinion, the bazaar approach is able to tackle even the development of extraordinarily complex systems effectively and in a superior manner. According to Raymond, a large number of beta testers and users, who ideally are also developers, allow for any bug to be quickly found, characterized and fixed. He contends that debugging is parallelizable and therefore adding more people to the project can be effective.
Ever since Brooks’ The Mythical Man-Month, it has been accepted wisdom that ‘adding more people to a late project will make it later’. Brooks’ observation rests on the fact that the communication overhead in a team grows quadratically with its size, eventually letting any work grind to a halt. In his essay, Raymond describes the typical organization of an open-source project: a very small team of core developers, and a larger number of halo developers. Therefore, he contends, the communication structure in such a project does not have to be a complete graph. Communication tends to take place among smaller groups, interacting only when needed via paths established by the core developers. As a result, the system as a whole, no matter how large, does not suffer from the full Brooksian overhead.
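To make the difference between a complete communication graph and the core/halo structure concrete, here is a rough back-of-the-envelope sketch. The model is my own simplification, not something taken from Raymond or Brooks: I assume the core developers all talk to each other, each halo group talks freely internally, and each group needs just one path to the core. The function names and the team sizes are purely illustrative.

```python
def full_mesh_channels(n: int) -> int:
    """Pairwise channels when everyone talks to everyone: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def core_halo_channels(core: int, group_sizes: list[int]) -> int:
    """Channels under the simplified core/halo model (an assumption):
    the core is a full mesh, every halo group is a full mesh internally,
    and every group has exactly one link to the core."""
    total = full_mesh_channels(core)
    for size in group_sizes:
        total += full_mesh_channels(size) + 1
    return total


if __name__ == "__main__":
    # 100 developers in total: 5 core developers plus 19 halo groups of 5.
    print(full_mesh_channels(100))            # 4950
    print(core_halo_channels(5, [5] * 19))    # 219
```

Under these assumptions, the same 100 people need 219 channels instead of 4950. The exact figures matter less than the shape: the structured project grows roughly linearly with the number of groups, not quadratically with the number of people.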
Cathedrals in open-source
Contrary to what one might think, bazaar-style development as demonstrated successfully by Linux is still quite uncommon, even in the world of open-source. Most projects are either private efforts of one or a few individuals, or professionally managed software engineering projects run by commercial entities. While there are certainly exceptions, both types of projects typically receive little or no outside development contribution. In both cases, the software design and architecture tend to be much more reminiscent of the cathedral style.
The last few years saw the emergence of a number of commercial open-source projects. In some cases an established, proprietary software vendor decided to open-source some or all of its code. In others, companies were explicitly founded as open-source companies from the start. In both cases, ‘open-source’ as a business model is the important consideration here. However, the management, architecture and the entire organizational setup often remained largely unchanged from that of the traditional software engineering process or company: Software is designed by a few and implemented largely by regularly employed developers, who are managed accordingly, just like in any other traditional software engineering effort. That is a far cry from the powerful bazaar style of development, no matter how open the source might be in any other respect.
Looking for the reasons: Modularity and APIs again
Open-source has the inherent ability to quickly gather a large number of users. If the user community building effort (a formidable challenge in itself) is successful then, regardless of how the software is engineered, the project can at least benefit from Raymond’s observation that larger numbers of users will make it easier to discover bugs. This is true even if most users are not developers and will not be examining the source code.
But shouldn’t a replication of the Linux engineering success be the holy grail for any open-source project, commercial or not? I am sure that most projects would love to see a large, vibrant and magically self-organizing community of developers. What does it take to get there? Why do most open-source projects simply not have bazaar-style development?
Besides good developer-community building efforts and a sexy technology or problem to solve, there are a number of things that an open-source project can do to aid the development of the ‘bazaar’.
In this context, it is also interesting to consider the description of the Linux kernel development process given by Alan Cox, one of its core developers. He talks about many practical aspects of Linux development, which confirm Raymond’s observations. He particularly stresses fine-grained modularity as a key factor of success, which permits small teams to work effectively on their module, only requiring communication with a few neighbouring modules (facilitated by core developers), thus helping to keep communication overhead low.
Furthermore, if modularity is achieved via good APIs amongst modules, then this allows changes within modules to take place without the risk of breaking other parts of the system. Likewise, this same approach allows a module to break up into two or more modules if the module grows too much in complexity. Cox points out that most module teams are smaller than six individuals. Whenever a group gets too big, it naturally tends to split in order to reduce the complexity and communication overheads again. He attributes this to the desire of ‘not having to know too much’.
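As a small illustration of that point, here is a minimal sketch of a module that is split in two behind an unchanged API. Everything in it is hypothetical and invented for this example (the `Storage` interface, both implementations and the `remember_greeting` caller); it is not taken from the Linux kernel or from any real project.

```python
from typing import Optional, Protocol


class Storage(Protocol):
    """The agreed API between modules (hypothetical)."""

    def put(self, key: str, value: str) -> None: ...
    def get(self, key: str) -> Optional[str]: ...


class DictStorage:
    """Original, single-module implementation."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


class ShardedStorage:
    """The module after it grew and was split into two internal parts.
    The public API is unchanged, so callers do not notice the split."""

    def __init__(self) -> None:
        self._shards = (DictStorage(), DictStorage())

    def _shard_for(self, key: str) -> DictStorage:
        # Arbitrary, deterministic routing rule chosen for the example.
        return self._shards[len(key) % 2]

    def put(self, key: str, value: str) -> None:
        self._shard_for(key).put(key, value)

    def get(self, key: str) -> Optional[str]:
        return self._shard_for(key).get(key)


def remember_greeting(storage: Storage, user: str) -> str:
    """Code in a neighbouring module: it only knows the Storage API."""
    storage.put(user, f"hello, {user}")
    greeting = storage.get(user)
    assert greeting is not None
    return greeting
```

The caller works identically with `DictStorage()` and `ShardedStorage()`. The team that owns the storage module can reorganize it, or split into two teams, without anyone else having to learn about it.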
Fascinating stuff, indeed.
I wrote about the importance of APIs before, but that was in the context of plugins, allowing people to start contributing without having to change or modify the existing system. Here, however, we are talking about the effectiveness of APIs and modularity in the actual source of your system, allowing willing and capable contributors to easily and confidently modify the system itself.
Distributed teams are good for you
After Linus Torvalds’ publication of the initial 10,000-line kernel, volunteers from around the world jumped at the chance to work on ‘something interesting’. As anyone who is involved in distributed teams knows, even with the help of mailing lists and chat rooms, blogs, wikis and instant messaging, there is nothing quite like face-to-face communication. Being able to quickly pop your head over the cubicle wall to find out how to use someone else’s code can be very helpful.
However, there is also a danger in that convenience. The closeness and immediacy of communication in co-located teams can lead to more direct usage of other code and lessens the need for thoughtfully designed interfaces. In other words, unless we are careful we arrive at lower cohesion and higher coupling, the antithesis of good software design. Out of sheer necessity, a distributed development team is well advised to spend more time on defining the boundaries of those pieces of code that individual team members will be working on. Most successful distributed projects will be those that heeded this advice.
Therefore, while lacking the benefits of immediate and direct communication, distributed development teams are forced to think about modularity and APIs between modules from the very start. Consider that Richard Stallman used Unix as the reference design for his GNU system for exactly this reason (among others): Well-defined modules, facilitating a distributed development effort. It was a conscious choice for GNU, and it was a necessity for Linux: Without an emphasis on modularity and APIs, a distributed development effort becomes unnecessarily difficult, if not impossible.
The dangers of familiarity
Some commercially run open-source projects also have to deal with distributed teams. SnapLogic, located in California, is still a small team. Nevertheless, we already have to deal with a distributed development organization: One developer is in New Zealand, another on the East Coast of the US. Several team members frequently work from home. We also find heavily distributed teams in companies like MySQL or Alfresco, where up to 70% of the developers work from home.
But even if your hired developers are distributed, they still have a chance to become ‘too’ familiar with the system they are working on. After all, they are working full time on your project, attend regular meetings and can always easily reach other developers. Therefore, to truly enable the bazaar it is important to keep the part-time contributors in mind. The motivated, smart individuals who may contribute to your project at the end of their work days, on their own time. Those who cannot participate in meetings, will not get or prepare status updates and will not always be able to follow the water-cooler talk on a daily basis. Modules have to be simple, and stand on their own. APIs and coding conventions must be designed to allow contributions with only limited or localized understanding of the system architecture.
Think distributed and think part-time to enable the Bazaar
And this then provides us with the answer to our initial question: Why do so many open-source projects not have the active community of external contributors they are hoping for? Because they have been largely developed by co-located teams of hired software engineers, 100% dedicated to the project, managed and organized like any traditional software development effort. This seems to be especially true for the new crop of ‘custom-built’ open-source companies, which would like to take advantage of the open-source business model. They might hope to also enjoy the advantages of the open-source development model one day, but achieving that requires a conscious effort.
Here are a few things to keep in mind then:
- Do not be seduced by the simplicity and ease of communication that you have in your co-located team.
- Always consider what it would take for any new, external developer to jump in quickly and become productive, even if this developer can only use much less direct communication channels.
- Consider that external developers can only work part-time on your project, and do not have the time to follow long discussions or become truly familiar with your system down to the last module.
- Always remind yourself to focus on the distinct and simple (!) modules in your design and reflect them in your code accordingly. External developers may only have time to dive deep into one or two of the modules, not the entire system. Minimize extensive system understanding as a prerequisite.
- Always make sure the modules and their APIs are defined and documented clearly.
- Use code reviews to prevent any more direct usage of a module’s code that would circumvent the APIs in any way.
- Enforcing encapsulation via the programming language is not sufficient in this situation: Particularly insidious and hard to spot is the code that relies on knowledge about the internals of another module, even if it does not take short-cuts around the APIs. This problem can be caused by APIs that are designed with the assumption that the user of the API knows a few internals of the module. In co-located teams with effortless and direct communication, this is a trap one can fall into easily. Therefore, we need to strive to build our code with true knowledge isolation in mind.
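The last point in the list is the hardest to picture, so here is a minimal sketch of what I mean. Both classes are hypothetical and made up for this example. The first one is perfectly encapsulated as far as the language is concerned, yet its API quietly assumes the caller knows that names are stored in lowercase. The second one keeps that knowledge inside the module.

```python
from typing import Optional


class LeakyUserIndex:
    """Encapsulated on paper, but the API assumes insider knowledge:
    names are stored lowercased, and lookups must already be lowercased."""

    def __init__(self) -> None:
        self._emails: dict[str, str] = {}

    def add(self, name: str, email: str) -> None:
        self._emails[name.lower()] = email

    def get(self, key: str) -> Optional[str]:
        # The caller is expected to "just know" to pass a lowercase key.
        return self._emails.get(key)


class IsolatedUserIndex:
    """Same job, but the normalization rule never leaves the module."""

    def __init__(self) -> None:
        self._emails: dict[str, str] = {}

    @staticmethod
    def _normalize(name: str) -> str:
        return name.strip().lower()

    def add(self, name: str, email: str) -> None:
        self._emails[self._normalize(name)] = email

    def get(self, name: str) -> Optional[str]:
        return self._emails.get(self._normalize(name))


if __name__ == "__main__":
    leaky = LeakyUserIndex()
    leaky.add("Alice", "alice@example.org")
    print(leaky.get("Alice"))      # None: the newcomer did not know the rule

    isolated = IsolatedUserIndex()
    isolated.add("Alice", "alice@example.org")
    print(isolated.get("Alice"))   # alice@example.org
```

In a co-located team, the first version survives for years because everybody has heard about the lowercase rule at some point. A part-time contributor hits it on day one, and no compiler or access modifier will flag it.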
If these points are not considered on a daily basis, the code will not have the level of modularity and the simple APIs that are needed to facilitate bazaar-style development. Even the best intentions of new, external developers can be frustrated by unnecessary prerequisites. And as anyone trying to build a community knows: The most important ingredient is to make it easy to join and get started. In our case here this means: Make it easy to be productive and achieve the rewarding feeling of success.
The funny thing is: The importance of modularization, APIs and encapsulation is taught in every entry-level class for software design. We all know that already. And yet, unless we internalize and continuously remember the lessons that successful distributed projects can teach us, it remains exceedingly easy for our co-located, 100% committed teams to end up producing code that is just not suitable for a bazaar.