It is standard teaching and widely accepted best practice that the interfaces between system modules should be thoroughly defined during the design phase, before implementation commences. Doing so enables different developers to work on those modules independently, safe in the knowledge that the interfaces are set and reliable. The defined APIs are all they really have to worry about in order to ensure successful integration with other components of the system.
Re-evaluating the best practice
With dynamic languages, such as Python, I believe that this approach should be reconsidered and modified. In reality, most projects will not achieve a complete interface design before implementation starts anyway. Some changes always have to be made. They are then usually considered unfortunate, maybe even unavoidable: something a development organization will have to ‘deal with’. I would like to suggest that with Python and similar languages such incompleteness in API design is not something to dread, but instead something to embrace and take advantage of.
I am a big proponent of well-defined modules and APIs within a system. As I wrote before, they are essential to enable efficient distributed development, which is especially important in an open source project. But as anyone who has done this before on a reasonably sized system knows, thoroughly defining those interfaces and APIs is by no means an easy task. A complete specification for any non-trivial system can keep your team busy for a long time. Consider that all possible corner cases need to be covered, along with all possible uses of and interactions between the various classes and objects. Throw in the constant threat of changing requirements and the potentially necessary redesign of interfaces and APIs, and you can see that this is not a very efficient way to operate.
Duck typing in action
This is particularly true if you work with static languages, such as C++ or Java. The class hierarchies and abstract interfaces need to be laid out carefully. Refactoring those hierarchies is possible, but painful. For illustration, say that you have classes A and B. Each of those has a method x(). You want to be able to place objects of class A and B into the same collection, iterate over it and call x() on each object. In Java, for example, you would now have to define a common super-class or at least an interface that declares x(). This results in more source files, more definitions and more documentation. If you choose the common super-class approach then you have to be mindful of Java’s restriction to a single super-class per class. If you work in C++ then there is no dedicated interface construct; you would use an abstract base class instead, and you probably end up having to deal with the complications that multiple inheritance can cause.
Refactoring in such an environment may cause changes that ripple down a number of branches in your class hierarchy. If you add or change a method in a super-class or interface, for example, you need to consider whether this method is appropriate for all classes deriving from the super-class or implementing that interface. Changes like this are therefore something you want to think twice about, considering the impact they may have across the system.
But in a dynamic language like Python this can be different: using duck typing, it is possible to deal with classes as if they had the same super-class or abstract interface, even though they do not. As long as classes A and B both happen to have a method x(), you can place their objects in the same collection and call x() on all of them, even if the classes are not related in any way. Thus, if it becomes apparent during the implementation phase that A and B need to be treated that way, then the addition of x() to one or both of those classes is a purely local change. No change in super-classes or abstract interfaces is required.
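As a minimal sketch of the A and B situation described above: the method bodies and the helper name call_all are made up for illustration, and the only thing the two classes share is that each happens to define x().

```python
class A:
    def x(self):
        return "A.x"


class B:
    # B shares no super-class or interface with A (beyond object).
    def x(self):
        return "B.x"


def call_all(items):
    # Works for any objects that provide x(); no common type is declared.
    return [item.x() for item in items]


results = call_all([A(), B()])
print(results)
```

If a third, unrelated class later needs to go into the same collection, giving it an x() method is the only change required.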
An example from our own development here at SnapLogic: We had a class that was working on a file object, using the normal read() and write() methods. We then realized that we also needed that class to work on a different type of producer/consumer object we have in our system, which is not related to files at all. Since we used Python and since that other object already had read() and write() methods, we could simply pass it to objects of the worker class, without having to change anything in the code. Had we used a static language, we would have had to provide common wrapper classes, implement file interfaces or do any number of other things, all of which would have resulted in more or less complex changes to our code.
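The following is not our actual code, just a rough sketch of the same idea. The names Worker and QueueChannel are hypothetical, and io.StringIO stands in for a real file. The worker only ever calls read() and write(), so it does not care which kind of object it is given.

```python
import io
from collections import deque


class Worker:
    """Reads everything from a source, upper-cases it, writes it to a sink."""

    def __init__(self, source, sink):
        self.source = source
        self.sink = sink

    def run(self):
        # The only requirements: source has read(), sink has write().
        self.sink.write(self.source.read().upper())


class QueueChannel:
    """Hypothetical producer/consumer object, unrelated to files."""

    def __init__(self, chunks=()):
        self._chunks = deque(chunks)

    def read(self):
        # Drain all queued chunks and return them as one string.
        data = "".join(self._chunks)
        self._chunks.clear()
        return data

    def write(self, data):
        self._chunks.append(data)


# The worker used with file-like objects ...
file_sink = io.StringIO()
Worker(io.StringIO("hello"), file_sink).run()

# ... and, unchanged, with the unrelated channel objects.
channel_sink = QueueChannel()
Worker(QueueChannel(["he", "llo"]), channel_sink).run()
```

Neither Worker nor QueueChannel had to inherit from, or even know about, any file type.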
Squishy design for improved time-to-code
It occurs to me that we can take advantage of the flexibility provided by duck typing in order to simplify and also accelerate the design phase of inter-module interfaces and APIs. I maintain that well-defined and reasonably stable APIs are necessary to enable the bazaar, as I wrote before. However, when you are in the design phase for those interfaces you have two choices: (a) sit down and specify the interfaces to the last detail before you start writing any code, or (b) finalize the interface design during the implementation phase.
With languages like Python, (b) seems to provide better time-to-code. Python itself has been described as ‘executable pseudo code’. Consider, therefore, that instead of writing the interface definition down in a design document, method signature by method signature, you instead use your Python code as the documentation of the interface design.
For one thing, you save yourself from writing down the interface definitions twice, just in different forms. Nice, but a relatively minor benefit. More importantly, though: seeing these interfaces in code not only provides a concrete implementation for the other developers to look at, it also allows for early integration testing. Any shortcomings in the interface definitions, anything that should be adjusted, is more reliably identified when one tries to actually use those interfaces in one’s own code. Nothing beats the ‘aha!’ experience when you write your code, want to use someone else’s API and realize that the data structure you need to pass in is actually quite complex to construct, and that a slightly modified API could have made a big difference. These are the kinds of things that are more difficult to detect when the API is designed purely on paper.
So, defining very early iterations of the suggested APIs in code, rather than on paper, benefits development teams by detecting API shortcomings sooner, and by providing code to work with for early integration.
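To make this concrete, here is a small sketch of what such an early iteration might look like. Everything in it (InMemoryStore, put, get, register_user) is a hypothetical example, not an API from a real system: one developer publishes a first draft of the interface as a trivial working class, and another developer writes code against it right away.

```python
class InMemoryStore:
    """First-draft storage API, written as code rather than as a document."""

    def __init__(self):
        self._records = {}

    def put(self, key, record):
        """Store a copy of the record (a dict) under the given key."""
        self._records[key] = dict(record)

    def get(self, key):
        """Return the record stored under key; raises KeyError if absent."""
        return self._records[key]


def register_user(store, name, email):
    # An early consumer of the draft API. Writing this shows immediately
    # whether building the record by hand is convenient enough.
    store.put(email, {"name": name, "email": email})
    return store.get(email)


store = InMemoryStore()
user = register_user(store, "Ada", "ada@example.com")
print(user)
```

The docstrings serve as the interface documentation, and the consumer doubles as a first integration test.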
Frobbing until it fits
And what if a problem with the API has been found that way? Well, that’s exactly where the benefits of languages such as Python come into play: As I explained earlier, if you work with Python, rather than static languages, you often find that any necessary adjustments to the API can be made quickly and locally. Those changes tend to impact much less code than in static languages. With Python, if we see that an API needs adjustment, we can frob the API until we are happy.
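A sketch of such a local adjustment, under assumed names (Mailer and send are hypothetical): suppose the first draft of send() required callers to build a message dictionary, and early users found that awkward. The method can be changed to also accept keyword arguments, and existing callers keep working untouched.

```python
class Mailer:
    """Hypothetical API whose send() was adjusted after early feedback."""

    REQUIRED = {"to", "body"}

    def __init__(self):
        self.outbox = []

    def send(self, message=None, **fields):
        # Draft 1 only accepted a prepared dict as 'message'.
        # Draft 2 also accepts the fields directly as keyword arguments.
        merged = dict(message or {})
        merged.update(fields)
        missing = self.REQUIRED - set(merged)
        if missing:
            raise ValueError("missing fields: " + ", ".join(sorted(missing)))
        self.outbox.append(merged)
        return merged


mailer = Mailer()
mailer.send({"to": "ops@example.com", "body": "old calling style"})
mailer.send(to="dev@example.com", body="new calling style")
```

The change is confined to one method; no super-class, interface declaration or caller had to be touched.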
This is why I call this squishy design: even if the APIs don’t exactly fit all the user’s or module’s requirements, we can easily poke and prod and adjust their shape until they do. The code feels more flexible and fluid. Disparate pieces of code can more easily be made to fit together: pieces can be squeezed together, and their ‘shape’ readily adapts for a seamless fit. And since in squishy design the last stages of design are already represented in the early versions of the code, the flexibility of the code represents a flexibility in design.
Rather than hammering out all APIs down to the last detail on paper ahead of time, you can get a head start on, and a very tangible feel for, your final design stages by doing them straight in code. This is possible in static languages as well, but works much better in dynamic languages, such as Python.