Shahzad Bhatti Welcome to my ramblings and rants!

March 6, 2018

Tips from the second edition of “Release It!”

Filed under: Design,Methodologies,Technology — admin @ 4:19 pm

The first edition of “Release It!” has been one of the most influential books I have read; it introduced a number of techniques for writing fault-tolerant systems, such as the circuit-breaker and bulkhead patterns, so I was excited to read the second edition when it came out. Here are a few tips from the book:

In the first chapter, the author defines stability in terms of robustness: “A robust system keeps processing transactions, even when transient impulses, persistent stresses, or component failures disrupt normal processing.” He recommends focusing on longevity bugs such as resource leaks, e.g. setting timeouts on network and database operations and detecting dead connections before reading or writing. The author cautions against tightly coupled systems, which easily propagate failures to other parts of the system; similarly, a high number of integration points increases the probability that one of those dependencies fails. He suggests scrutinizing blocking calls that can lead to deadlocks when using multiple threads, and resource pools that get exhausted by blocking operations. Another cause of instability is a chain reaction, where the failure of one server increases load on the remaining servers; this can be remedied with the bulkhead or circuit-breaker patterns. High memory usage on a server can also constrain resources, and the author recommends offloading to external caching systems such as Redis, Memcached, etc.

To monitor the health of the system, the author suggests running a mock transaction to ensure it is working as expected, and keeping metrics on errors (especially login errors) and high-latency warnings. One common self-inflicted failure is the self-denial attack, e.g. a marketing campaign that drives a sudden surge of traffic; this can be mitigated with a shared-nothing architecture or by reducing fan-in on shared resources. Servers can also be subjected to the dogpile effect, where load spikes after an upgrade, cron job, or configuration change because many clients act at once; this can be mitigated by adding random clock skew, random jitter, and exponential backoff. Finally, the author recommends monitoring slow responses, removing unbounded result sets, and failing fast.
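The backoff-and-jitter remedy for the dogpile effect can be sketched in a few lines. This is a minimal illustration, not code from the book; the function and parameter names are my own:

```python
import random

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter: the ceiling doubles on each
    attempt (up to `cap`), and the actual delay is drawn uniformly below
    that ceiling so retrying clients spread out instead of hammering a
    recovering server in lockstep."""
    return random.uniform(0.0, min(cap, base * 2 ** attempt))

# Successive retries wait roughly 0-0.5s, 0-1s, 0-2s, 0-4s, ...
delays = [backoff_delay(n) for n in range(4)]
```

The jitter matters as much as the exponential growth: without it, all clients that failed at the same moment retry at the same moment too.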

Here is a summary of stability patterns that the author recommends:

  • Apply timeouts at integration points, and retry with a delay when a failure occurs.
  • Apply circuit breakers (together with timeouts) to prevent cascading failures.
  • Apply the bulkhead pattern to partition the system and contain chain-reaction failures.
  • Maintain a steady state: purge accumulated data and cap log files so resources do not grow without bound.
  • Fail fast, restart fast, and reintegrate.
  • Let it crash, with limited granularity, e.g. at the boundary of an actor.
  • Use supervision for monitoring, restarts, etc.
  • Shed load when overwhelmed.
  • Apply back-pressure: queues must be finite for response times to be finite.
  • Add a governor to slow the rate of automated actions so that humans can review them.
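To make the circuit-breaker pattern from the list above concrete, here is a minimal sketch of one (my own simplification, not the book's code): after a threshold of consecutive failures the breaker opens and fails fast; after a reset timeout it lets one trial call through to decide whether to close again.

```python
import time

class CircuitBreaker:
    """Closed -> open after `threshold` consecutive failures; after
    `reset_timeout` seconds one trial call is allowed (half-open) and
    its outcome closes or re-opens the breaker."""

    def __init__(self, threshold: int = 3, reset_timeout: float = 30.0):
        self.threshold = threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                # Fail fast instead of waiting on a known-bad dependency.
                raise RuntimeError("circuit open: failing fast")
            # Timeout elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        else:
            self.failures = 0
            self.opened_at = None  # success closes the circuit
            return result
```

A production breaker would also distinguish failure types and expose its state for monitoring, but the state machine above is the core of the pattern.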

The next few chapters focus on networks, machines, and processes for building and deploying code. The author offers several mechanisms for scaling, such as load balancing with DNS and using a service registry for upgrades and fail-over, along with configuration, transparency, collecting logs and metrics, etc. He recommends shedding load when under high stress, or returning HTTP 503 to notify the load balancer. For example, a bound on queue length can be calculated as (max-wait-time / mean-processing-time + 1) * processing-threads * 1.5, and once the listen queue is full you can return a 503 error so that clients back off instead of reconnecting immediately. The control-plane chapter recommends tools for administration. It suggests a postmortem template (what happened, apologize, commit to improvement) and emphasizes system failures as opposed to human errors. The author recommends adding indicators such as traffic, business transactions, users, resource-pool health, database-connection health, data consumption, integration-point health, cache health, etc.
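The queue-length formula above is easy to misread in prose, so here it is as code with a worked example (the function name is mine; the 1.5 multiplier is the safety factor from the formula):

```python
def max_queue_length(max_wait_time: float,
                     mean_processing_time: float,
                     processing_threads: int) -> float:
    """Bound on queue length so that a newly enqueued request can still
    be served within max_wait_time, with a 1.5x safety factor."""
    return (max_wait_time / mean_processing_time + 1) * processing_threads * 1.5

# E.g. a 10s wait budget, 250ms mean service time, 8 worker threads:
# (10 / 0.25 + 1) * 8 * 1.5 = 41 * 8 * 1.5 = 492
bound = max_queue_length(10.0, 0.25, 8)
```

Requests beyond this bound should be rejected (e.g. with a 503) rather than queued, since they would time out before being processed anyway.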

The security chapter offers standard best practices from OWASP, such as using parameterized queries to protect against SQL injection, and using high-entropy random session IDs (exchanged via cookies) to protect against session hijacking and fixation. To protect against XSS (where a user’s input is rendered into HTML without escaping), it recommends filtering input and escaping it when rendering. The author recommends a random nonce and a strict SameSite cookie policy to protect against CSRF, along with the principle of least privilege, access control, etc. Admin tools can offer ways to reset circuit breakers, adjust connection-pool sizes, disable specific outbound integrations, reload configuration, stop accepting load, and toggle feature flags.
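A quick sketch of the parameterized-query advice, using Python's built-in sqlite3 module (the table and payload are made up for illustration): the `?` placeholder keeps user input as data, so a classic injection payload matches nothing instead of rewriting the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("alice", "alice@example.com"))

user_input = "alice' OR '1'='1"  # classic SQL-injection payload

# Parameterized: the payload is bound as a literal string, never parsed
# as SQL, so it simply fails to match any row.
rows = conn.execute("SELECT email FROM users WHERE name = ?",
                    (user_input,)).fetchall()
print(rows)  # []
```

Had the query been built by string concatenation, the same payload would have returned every row in the table.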

For ease of deployment, the author recommends automation, immutable infrastructure, continuous deployment, and rolling changes incrementally.

The author offers several recommendations on process and organization, such as the OODA loop for fast learning, functional/autonomous teams, evolutionary architecture, asynchrony patterns, loose clustering, and creating options for the future.

Lastly, the author offers chaos engineering as a way to test the resilience of your system, using the Simian Army or writing your own chaos monkey. In the end, the new edition adds chapters on scaling, deployment, security, and chaos engineering, plus more war stories from the author’s consulting work.

February 24, 2009

Does software quality matter?

Filed under: Methodologies — admin @ 9:24 pm

In the last couple of weeks, I followed an interesting debate between Bob Martin and Joel Spolsky/Jeff Atwood (listen to the Hanselminutes episodes) on the issue of SOLID principles. The SOLID acronym stands for five principles:

  • Single Responsibility Principle – based on the concept of cohesion from Tom DeMarco’s book on structured analysis and design; it mandates that a class should have one, and only one, reason to change.
  • Open Closed Principle – based on Bertrand Meyer’s Object-Oriented Software Construction; it says that a class should be open for extension but closed for modification.
  • Liskov Substitution Principle – introduced by Barbara Liskov; it says that derived classes must be substitutable for their base classes.
  • Dependency Inversion Principle – states that you should depend on abstractions, not on concrete implementations.
  • Interface Segregation Principle – states that you should make fine-grained interfaces that are client specific.
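Of these, the Dependency Inversion Principle is the easiest to show in code. Here is a minimal sketch (the class names are hypothetical): the high-level ReportService depends on a Storage abstraction, not on any concrete store, so the store can be swapped without touching the service.

```python
from abc import ABC, abstractmethod

class Storage(ABC):
    """Abstraction the high-level code depends on."""
    @abstractmethod
    def save(self, key: str, data: str) -> None: ...

class InMemoryStorage(Storage):
    """One concrete implementation; a database-backed one could replace it."""
    def __init__(self):
        self.items = {}
    def save(self, key: str, data: str) -> None:
        self.items[key] = data

class ReportService:
    def __init__(self, storage: Storage):  # abstraction injected, not created
        self.storage = storage
    def publish(self, name: str, body: str) -> None:
        self.storage.save(name, body)

store = InMemoryStorage()
ReportService(store).publish("q1", "revenue up")
print(store.items)  # {'q1': 'revenue up'}
```

The inversion is in the direction of the dependency arrow: both the service and the concrete store point at the abstraction, rather than the service pointing at the store.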

I learned these principles many years ago and also watched Bob Martin at the Best Practices 2004 conference, where he talked about them. These principles sound good, though in reality they should be treated as general guidelines rather than laws. But the heat of the debate came from the way these principles have been evangelized by Bob Martin, who insisted on using them all the time. Over the years, I have seen similar debates between Cedric Beust and Bob Martin over the use of test-driven development, and a debate on the topic of TDD and architecture between Bob Martin and Jim Coplien. Overall, I find that the issues from these debates boil down to the following:

Usable software vs quality

One of the controversial points that Jeff Atwood raised was that quality does not matter if no one is using your application. I find there is a lot of truth to that. In fact, this is the route most startups take when developing a new idea or trying something new. For example, everyone was blaming Twitter for choosing Rails when it had scalability and reliability issues. However, Twitter would not have existed if it had been written in the most scalable and reliable language or framework but had taken two more years, or if the application itself hadn’t been as useful. I tend to follow the age-old advice: first make it work, then make it right, and then make it fast. There is a reason rapid prototyping frameworks such as Rails, Django, Zend Framework, etc. are popular: they allow you to focus more on business functionality and reduce time to market. So I agree that the first goal should be to make software that solves real problems or adds value. Nevertheless, if the first implementation is horrible, it takes a herculean effort to make it right, and some companies, like Friendster, never recover.

External design vs internal design

One of the earliest pieces of advice I got on software development was to write the manual before writing the code. It forces you to solve the business problem of the customer rather than starting from a top-down architecture, and it is similar in spirit to behavior-driven development. I find that the most useful software is developed bottom-up and driven by its users. Kent Beck often says that you can’t hide a bad architecture with a good GUI. Nevertheless, for an average user, usability matters a lot. I remember back in the early 90s, IBM’s OS/2 was a superior operating system to Windows but largely lost the market due to usability (and marketing) issues. The same could be said of why people think Macs are better than PCs. Rails is also a good example: it became popular because you could whip up a webapp in ten minutes, despite the fact that its code has been plagued with maintenance issues from its monolithic architecture and tricks like chains of aliased methods. Other examples include WordPress and Drupal, both written in PHP, which lead the blogging and content management areas due to their usability rather than the quality of their code.

Again, as your software crosses some threshold of users, it may have to be redesigned; e.g., Rails recently announced that it will merge with another framework, Merb, in version 3.0, because Merb was designed with a micro-kernel, pluggability-based architecture. This also reminds me of the merge between Struts and WebWork, which turned out to be a failure. Joel Spolsky cautions about software rewrites in his blog and book, and I have also blogged about software rewrites earlier. In the end, you want to extend your application incrementally, which is not an easy thing to do. Ultimately, I think software development is all about people rather than technology, principles, or best practices. If you have good people, they can take your application to the next step regardless of whether it was written in PHP, Java, or Erlang.

Evolutionary architecture vs up front architecture

This was brought up in the debate between Jim Coplien and Bob Martin, where Jim took the position of more upfront design and architecture and Bob took the position of evolutionary architecture. I have a lot of respect for Jim Coplien; I still have a copy of Advanced C++ that I bought back in ’92, and it introduced me to the concepts of abstraction and the body/handle and envelope/letter idioms, which are early relatives of DIP. In the interview with Bob Martin, Jim Coplien raised a lot of good points about how YAGNI and test-driven, bottom-up design can lead to architectural meltdown. Though I believe good software is developed bottom-up, I also believe some upfront architecture and domain analysis is beneficial. I am not necessarily suggesting BDUF (big design up front) or BRUF (big requirements up front), but iteration-zero-style architecture and domain analysis when solving a complex problem. For example, domain-driven design by Eric Evans and responsibility-driven design by Rebecca Wirfs-Brock require working closely with domain experts to analyze the business problems and capture essential knowledge that may not be obvious. Any investment in proper domain analysis simplifies the rest of development and makes your application more extensible. A number of agile methodologies, such as feature-driven development and DSDM, encourage some upfront architecture and domain analysis, which I find beneficial.

Extensibility and maintenance

Once your product is a hit and loved by users, your next task becomes extending it and adding more features. At this stage, all the -ilities, such as scalability, performance, security, and extensibility, become more important. This is also more common in corporate environments, where most developers spend their entire careers maintaining code written by others. Each team should decide which practices and principles are appropriate and follow them. Agile methodologies encourage collective ownership and pair programming, which can spread knowledge and skills, though there are alternatives such as code reviews and mentoring. I also think that having a technical lead who ensures overall quality and keeps the bar high for the rest of the developers helps with extensibility and maintenance.

Test driven Development

Bob Martin has been an adamant proponent of test-driven development with his three laws of TDD. I blogged about the original debate between Cedric Beust and Bob Martin back in June 2006 and showed how Bob Martin’s position was unpragmatic. This reaction has been echoed by Joel, Jeff, Cedric, and Jim, who agree that 100% coverage is unrealistic, and lately more and more people are joining this group; I recently watched an interview with Luke Francl who expressed similar sentiments. In an earlier blog post, I wrote about the various types of testing required for writing reliable, enterprise-level software. One of the selling points of unit testing has been the ability to refactor with confidence, but I have also found that too many unit tests stifle refactoring because they require changing both the code and the tests. I have found that testing only public interfaces, or testing at a slightly higher level, can produce reliable software while being less prone to breakage during refactoring. No one disputes the value of testing, but it needs to be practical.
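The point about testing public interfaces can be shown with a toy example (the function names are hypothetical): the test pins down only the public function, so the private helper is free to be renamed, inlined, or rewritten during refactoring without breaking the test.

```python
def _normalize(name: str) -> str:
    # Private detail: tests should not reach in here, so this helper
    # can be changed or removed during refactoring.
    return name.strip().lower()

def is_valid_username(name: str) -> bool:
    """Public interface: this behavior is what the tests pin down."""
    n = _normalize(name)
    return n.isalnum() and 3 <= len(n) <= 20

# Tests written against the public function survive internal refactoring:
assert is_valid_username("  Alice ")
assert not is_valid_username("a")        # too short
assert not is_valid_username("bad name") # whitespace inside
```

A test that asserted on `_normalize` directly would have to change whenever the internals did, which is exactly the brittleness described above.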

Dependency management

A number of SOLID principles, along with extended principles such as the reuse and dependency principles, mandate removing or minimizing dependencies between packages. Those principles were conceived in a C++ environment that historically had problems with managing dependencies. Again, the problem comes from a religious attitude toward these principles. For example, I have worked in companies where different teams shared their code via Java jar files, and it created dependency hell. I used JDepend to reduce such dependencies, but it is not needed in every situation. I am currently building a common platform that will be used by dozens of teams, and I intentionally removed or minimized static dependencies and used services instead.


Unfortunately, in the software industry buzzwords are often used to foster consulting or sell books. There are lots of books on principles, best practices, etc. The entire agile movement has turned into prescriptive methodologies as opposed to being pragmatic and flexible based on the environment. People like Uncle Bob apply these principles in a draconian fashion, such as always stressing 100% test coverage or always keeping interfaces separate from implementations (DIP). Such advice may bring in more money from consulting work or publications, but it is disingenuous for practical use. In the end, software development is about people, not technology or principles, and you need good judgment to pick the practices and solutions that suit the problem.

October 14, 2008

Observe, Orient, Decide and Act

Filed under: Methodologies — admin @ 8:49 pm

I have heard agile evangelists talk about the influence of John Boyd on agile methodologies. I have also read about his story in books like Release It! and, more recently, in Simple Architectures for Complex Enterprises, which I am currently reading. You can read more about Boyd’s background from the Wikipedia link above, but in short he came up with the iterative cycle of observe, orient, decide, and act (OODA). One of the key findings he made was that a shorter feedback or iteration loop of OODA, even at lower quality, was better than a longer or more tiring OODA cycle at higher quality. Despite the fact that everyone calls their organization agile, this feedback loop is the real essence of agility.

July 7, 2008

Reaction vs Preparedness

Filed under: Methodologies — admin @ 8:36 pm

I have struggled with a culture of fire-fighting and heroism and have discussed these issues several times in my blog [1], [2], [3], [4]. Over the weekend, I saw Bruce Schneier’s talk on security. He talked about Hurricane Katrina’s devastation and said that politicians don’t like to invest in infrastructure because it does not get noticed; he showed how reaction is considered more important than preparedness. Another thing he mentioned was that investing in new projects is received better than investing in existing projects. I find that IT culture has a very similar value system, where heroes or martyrs who react (on a daily basis) to emerging crises and fires are noticed by managers and receive promotions and accolades. The same culture does not value investing in better preparation for disasters or emergencies. Unfortunately, once people get promoted, they move on and leave the pile of shit for someone else. Of course, the next team will rewrite rather than keep the old system. In the last two years, I had to rewrite two major systems where the previous systems had been written less than two years earlier. Maybe they will be rewritten by someone else when I leave. The cycle will continue…

May 19, 2008

Sprint like Hell!

Filed under: Methodologies — admin @ 9:44 pm

I think “sprint” is the wrong metaphor for an iteration, because managers seem to think it means developers will run like hell to finish the work. Though my project has adopted Scrum recently, it is far from the true practices, principles, and values of agility. For the past many months, we have been developing like hell, and as a result our quality has suffered.
