Shahzad Bhatti Welcome to my ramblings and rants!

July 28, 2009

Cut the scope and make your life easy

Filed under: Project Management — admin @ 10:45 am

I have been developing software for over twenty years, and in every project you have to grapple with the iron triangle of schedule/cost/functionality, sometimes referred to as cost/quality/schedule or cost/resources/schedule. In my experience, cutting the scope produces better results than adding more resources or extending the deadline. Slashing the scope also has other beneficial side effects: it reduces the complexity of the software, eases the learning curve for users, lowers training/support costs and improves communication among team members.

You can reduce the scope by focusing on essential features using the Pareto principle (80-20 rule); companies like Apple or 37Signals produce great products that are not only more useful but also much simpler to use. However, this is not easy, as the project manager or product owner has to say NO. Too often, I see project managers say YES to everything to please upper management and users. In the end, the team is overwhelmed and under stress. Also, a big pile of features that all have the same importance (priority) is the biggest reason for death-march projects.
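
One way to apply the 80-20 rule to a backlog is to rank features by estimated value and keep only the top ones until they cover about 80% of the total. The sketch below is a hypothetical illustration: the feature names, the value scores and the `pareto_cut` helper are all assumptions made up for this example, and real prioritization also has to weigh cost and risk.

```python
def pareto_cut(features, percent=80):
    """Pick the highest-value features until they cover `percent` of the total value.

    `features` maps a feature name to a (hypothetical) estimated value score.
    Integer arithmetic avoids floating-point surprises at the threshold.
    """
    total = sum(features.values())
    selected, covered = [], 0
    for name, value in sorted(features.items(), key=lambda kv: kv[1], reverse=True):
        if covered * 100 >= percent * total:
            break  # the features picked so far already cover enough value
        selected.append(name)
        covered += value
    return selected


backlog = {  # hypothetical value scores
    "search": 50,
    "checkout": 30,
    "wishlist": 8,
    "themes": 5,
    "social sharing": 4,
    "dark mode": 3,
}
print(pareto_cut(backlog))  # ['search', 'checkout']
```

Everything left out of the returned list is a candidate for saying NO (or "not now").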

Working with a small number of features reduces complexity, whether essential, cyclomatic or accidental, because your codebase is smaller. You still have to apply good software engineering principles such as domain-driven design, unit testing and refactoring, but maintenance becomes easier with a smaller codebase. A small codebase also has fewer bugs, as there are no bugs in code you never write. Fewer bugs means lower support costs when a user complains about a bug or the system crashes in the middle of the night.

With a small set of features, the user interface becomes simpler, which in turn provides better usability. I have often seen users get confused when they have to work with complex software that has a lot of features. This is often remedied by providing training or support, which adds a lot of overhead to the project. Again, a better user interface does not come automatically with a small set of features, but usability problems become easier to solve with fewer features.

Finally, a small number of features and a small codebase mean your team can remain small, so communication among team members becomes easier. I like to work with teams of 5 plus/minus 2 people, as the number of communication links grows quadratically, n(n-1)/2, as you add more members. Also, smaller teams that are colocated get better osmotic communication, which Alistair Cockburn talks about. At Amazon, we have “2-Pizza” teams, i.e., teams small enough to have a team lunch with just two pizzas. Another factor when building teams is whether they are cross-functional (vertical) or focused on a single expertise such as systems, database or UI. I prefer working with cross-functional teams that focus on a single service or application, as communication and priorities within a single team are much easier to manage than between different teams.
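
To make the team-size argument concrete: if everyone on a team may need to talk to everyone else, the number of pairwise channels is n(n-1)/2. The small helper below (a name chosen just for this example) prints that count for a few team sizes.

```python
def communication_links(team_size: int) -> int:
    """Pairwise channels in a team where everyone talks to everyone: n(n-1)/2."""
    return team_size * (team_size - 1) // 2


for n in (3, 5, 7, 10, 20):
    print(f"{n:>2} people -> {communication_links(n):>3} links")
```

Going from 5 to 7 people roughly doubles the channels (10 to 21), and a 20-person team has 190 of them, which is why small teams stay so much easier to coordinate.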

In a nutshell, reducing scope not only helps you deliver the software on time and delight your users but also prepares you better to maintain and support the software. Complexity is the number one killer of software and results in buggy and bloated software. Watch out for “Wouldn’t it be cool if it did X?” kind of feature requests; I often see developers treat these as a challenge or an opportunity to learn or apply a new technology. However, each new feature takes a toll on your existing features, software maintenance and your team.

July 26, 2009

Day 5 at #oscon 2009

Filed under: Computing — admin @ 11:00 am

July 24, 2009, which was Day 5 of OSCON 2009 for me, started with yet another talk by Gunnar Hellekson on using open source to build government projects. This was followed by a very entertaining talk by Erik Meijer on “Fundamentalist Functional Programming”. He talked about side-effect-free programming and how most functional languages are not pure. He briefly described monads in Haskell and how LINQ is influenced by them. Finally, there were keynotes by Karl Schroeder and Mark Surman, which were not very inspiring.
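
For readers who have not seen monads, the list monad is probably the easiest way to see the LINQ connection: `bind` applies a function to each element and flattens the results, which is the role `SelectMany` plays when a C# query has two `from` clauses. This is a minimal Python sketch of that idea, not Haskell's or LINQ's actual implementation; `unit` and `bind` are names chosen for the example.

```python
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def unit(value: A) -> List[A]:
    """Wrap a plain value in the list monad (Haskell's `return`)."""
    return [value]


def bind(values: List[A], f: Callable[[A], List[B]]) -> List[B]:
    """Apply f to every element and flatten the results (flatMap / SelectMany)."""
    return [result for value in values for result in f(value)]


# Roughly what `from x in xs from y in ys select (x, y)` expands to:
pairs = bind([1, 2], lambda x: bind(["a", "b"], lambda y: unit((x, y))))
print(pairs)  # [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
```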

There were not many sessions on the last day of OSCON, so I decided to attend The HTML 5 Experiments to learn a bit about the new HTML5 tags. Bruce Lawson showed how he implemented his blog using some of the HTML5 tags such as header, footer, section, article, time, etc. He also mentioned the canvas feature, which was interesting but is difficult for people who rely on assistive technology. Finally, the video tag won’t be usable anytime soon due to problems with proprietary codecs.

I then skipped the next session and headed to the Tech Museum, which is a must-see if you are visiting San Jose. I then headed to the airport and flew back to Seattle in the evening. Overall, I enjoyed OSCON 2009, though I wished there were more talks on functional programming and was disappointed when the Haskell talk was cancelled. I also wish more talks had been hands-on, like the talk on CouchDB that showed examples of how to use the system instead of just listing its features.

Day 4 at #oscon 2009

Filed under: Computing — admin @ 9:29 am

Thursday, July 23, 2009, which was Day 4 for me at OSCON 2009, started with a keynote by Kirrily Robert, where she deplored the poor acceptance of women in open source projects. This was followed by a lame keynote by Tony Hey from Microsoft, who showed bits of Microsoft's open source contributions. Finally, Simon Wardley gave a pretty entertaining talk on cloud computing. I then attended the talk on JRuby on Google App Engine, which didn’t quite live up to its name, as a lot of the talk focused on persistence. I attended the talk on Eucalyptus, an open source project for building private EC2-based clouds. This was something of a marketing talk, but I got a couple of things out of it, such as how Amazon throttles network traffic within a datacenter to 500mb/sec and between zones to 200mb/sec.

I then attended A Survey of Concurrency Constructs, which presented common constructs for concurrency such as locks, transactional memory, message passing, dataflow, futures, I-structures, etc. I liked dataflow due to its deterministic nature, but it is difficult to implement. I-structures are also interesting, but are non-deterministic and require ports, which make them similar to actors. I also like Linda, as it can simulate dataflow, actors and CSP. Finally, message passing and the actor model are popular these days due to their implementations in Erlang and Scala. Ted mentioned how most of these solutions are 20-30 years old; you can read the history of most of them in his slides. This was a bleak talk, as none of the options presented was satisfactory, though his bias was towards JVM-based technology and he was impressed with Jonas Bonér’s work on Akka.
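
To illustrate the message-passing/actor style mentioned above, here is a minimal Python sketch: an actor owns private state and a mailbox, and a single thread processes messages one at a time, so no lock is needed around the state. The `CounterActor` class and its message format are hypothetical; Erlang, Scala and Akka actors add supervision, distribution and much more.

```python
import queue
import threading


class CounterActor:
    """Minimal actor: private state plus a mailbox drained by one thread.
    Only that thread touches self._count, so no lock is needed."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        self._mailbox.put(message)

    def ask_count(self):
        # Request/reply: pass a private reply queue inside the message.
        reply = queue.Queue()
        self.send(("get", reply))
        return reply.get()

    def stop(self):
        self.send(("stop", None))
        self._thread.join()

    def _run(self):
        while True:
            kind, payload = self._mailbox.get()
            if kind == "add":
                self._count += payload
            elif kind == "get":
                payload.put(self._count)
            elif kind == "stop":
                return


actor = CounterActor()


def producer():
    for _ in range(250):
        actor.send(("add", 1))


producers = [threading.Thread(target=producer) for _ in range(4)]
for t in producers:
    t.start()
for t in producers:
    t.join()

total = actor.ask_count()  # the mailbox is FIFO, so every "add" is handled before this "get"
actor.stop()
print(total)  # 1000
```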

Next, I attended the talk on Clojure: Functional Concurrency for the JVM, which described the functional nature of Clojure and gave a brief overview of its features and syntax. I found calling Java code from Clojure a little verbose, especially when you are using method chaining, e.g.

                 factory.newSAXParser().parse(src, handler)
 becomes
                 (.. factory newSAXParser (parse src handler))
 

Other interesting features of Clojure are its persistent data structures and lazy evaluation. Finally, Clojure supports transactional memory for building concurrent applications, but there is little empirical data on its performance and usability. In fact, Ted Leung mentioned that porting some open source applications to use transactional memory resulted in deadlocks, so I am waiting for a little more evidence.
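
The idea behind persistent data structures is that "modifying" a collection returns a new version that shares most of its structure with the old one, so old versions stay valid and can be shared between threads without copying. Below is a minimal Python sketch of a persistent singly linked list (Clojure's list is also a persistent linked list; its vectors and maps use wider trees built on the same sharing idea), followed by a lazy sequence using a generator. The `Cons`, `conj` and `to_pylist` names are made up for this example.

```python
from itertools import count, islice
from typing import Any, NamedTuple, Optional


class Cons(NamedTuple):
    """An immutable list cell; adding builds a new cell that points at the old list."""
    head: Any
    tail: Optional["Cons"]


def conj(lst: Optional[Cons], value: Any) -> Cons:
    # O(1): the existing list is shared, not copied.
    return Cons(value, lst)


def to_pylist(lst: Optional[Cons]) -> list:
    out = []
    while lst is not None:
        out.append(lst.head)
        lst = lst.tail
    return out


base = conj(conj(None, 2), 1)    # (1 2)
left = conj(base, 0)             # (0 1 2)
right = conj(base, 99)           # (99 1 2)
print(to_pylist(base), to_pylist(left), to_pylist(right))
print(left.tail is right.tail)   # True: both new versions share `base`

# Lazy evaluation: an infinite sequence is fine as long as we only take a prefix.
squares = (n * n for n in count())
print(list(islice(squares, 5)))  # [0, 1, 4, 9, 16]
```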

Next, I attended the talk on Cassandra: Open Source Bigtable + Dynamo. Cassandra is another DHT, similar to Dynomite, Redis, Tokyo Tyrant, Voldemort, HBase, etc. It is an implementation of a DHT based on the Amazon Dynamo paper and supports consistent hashing, gossip, failure detection, cluster state, partitioning and replication. I liked the fact that there is no single master as in BigTable, so it is easier to scale, and that it uses Bloom filters to quickly tell whether a key might be present. You can read more about its features in the slides.
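
Consistent hashing is what lets a Dynamo-style store add or remove nodes without reshuffling every key: nodes and keys are hashed onto the same ring, and each key belongs to the first node position clockwise from it. The sketch below uses MD5 and virtual nodes (several ring positions per node, as described in the Dynamo paper); it is an illustration under those assumptions, not Cassandra's actual partitioner code, and the class and node names are hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._owner = {}      # ring position -> node
        self._positions = []  # sorted ring positions
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node gets several "virtual" positions to spread load evenly.
        for i in range(self.vnodes):
            pos = _hash(f"{node}#{i}")
            self._owner[pos] = node
            bisect.insort(self._positions, pos)

    def remove_node(self, node):
        self._positions = [p for p in self._positions if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key):
        # First position clockwise from the key's hash, wrapping around the ring.
        idx = bisect.bisect_right(self._positions, _hash(key)) % len(self._positions)
        return self._owner[self._positions[idx]]


ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.remove_node("node-b")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "keys moved; all of them previously lived on node-b")
```

Only the keys owned by the removed node move; with naive `hash(key) % number_of_nodes` placement, most keys would move instead.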

The last session I attended was “Design Patterns” in Dynamic Languages, where Neal Ford showed how GOF design patterns were created to overcome deficiencies of C++, and described how dynamic languages like Ruby and Groovy make these patterns trivial to use without all the ceremony. Neal showed how method_missing can be used to implement the builder pattern (though I prefer not to use method_missing). He showed how the each method on arrays is easier than an iterator, and how closures can be used to implement the command and strategy patterns. Neal then showed how internal DSLs can be used to implement the interpreter pattern. Other examples included the decorator and adapter patterns, which used Groovy's invokeMethod feature to delegate invocation. Finally, he showed using the null object pattern for a consistent interface and an aridifier to keep your code DRY. You can read more in his slides.
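
Neal's examples were in Ruby and Groovy; the same point carries over to Python, where functions are first-class values. The sketch below shows the strategy pattern as a plain function argument (with a closure capturing the discount percentage) and the command pattern as a list of zero-argument closures. The pricing functions and commands are hypothetical examples, not code from the talk.

```python
from typing import Callable, Iterable

Pricing = Callable[[int], int]


def percent_off(percent: int) -> Pricing:
    """Strategy factory: returns a closure that remembers `percent`."""
    def apply(price: int) -> int:
        return price * (100 - percent) // 100
    return apply


def full_price(price: int) -> int:
    return price


def checkout(prices: Iterable[int], pricing: Pricing) -> int:
    # The strategy is just a function argument; no Strategy interface or subclasses.
    return sum(pricing(p) for p in prices)


print(checkout([100, 50], full_price))       # 150
print(checkout([100, 50], percent_off(20)))  # 120

# Command pattern: a queue of zero-argument closures, executed later.
log = []
commands = [lambda: log.append("save"), lambda: log.append("notify")]
for command in commands:
    command()
print(log)  # ['save', 'notify']
```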

July 25, 2009

Day 3 at #oscon 2009

Filed under: Computing — admin @ 9:44 pm

On the third day (Wednesday, July 22, 2009), the real conference started. The day began with a couple of keynotes. First, Tim O’Reilly talked about Government 2.0, data.gov and other open source organizations that are building applications on the newly opened data. This turned out to be the theme of a number of keynote speakers, and there was a lot of interest in Sunlight Labs, http://opensourceforamerica.org/ and http://www.gov2summit.com/. Then Dirk Hohndel talked about netbooks and some of Intel's innovations to improve boot time. He deplored the state of graphics on Linux, which has changed little in the last twenty years. Finally, Mike Lopp, author of the Rands in Repose blog, talked about how well-intentioned evil people can ruin companies, using Borland as an example.

I started the sessions with Testing iPhone apps with Ruby and Cucumber, which should have been called Testing iPhone GUI apps with Ruby and Cucumber. It was half decent, but the framework had a lot of dependencies that we didn’t go into. I would like to give it a try, as testing in Objective-C sucks. I then attended Introduction to Animation and OpenGL on the Android SDK, which moved too fast, and the presenter rambled on about miscellaneous OpenGL APIs that I could not follow.

In the second half, I started with the talk on Automating System Builds and Maintenance with Cobbler and Puppet. This was somewhat useful, and I learned a bit about using Cobbler to create system images and Puppet for configuration. This was followed by Best practices for ‘scripting’ with Python 3. This was a good talk that described some good principles for writing scripts (as opposed to Python applications). These principles included using optparse for parsing arguments, layering I/O to help testing (e.g. with StringIO), using generators for performance and, finally, using templates for packaging, as setup is hard to configure from scratch. I then attended Using Hadoop for Big Data Analysis, which was something of a marketing talk from the Cloudera CEO and presented a few projects that use Hadoop, such as log processing at Rackspace, monitoring the electrical grid and the Large Hadron Collider. Finally, I attended Distributed Applications with CouchDB, which was a really good talk on CouchDB by J Chris Anderson from couch.io. It described the architecture and features of CouchDB. Chris also gave out the password for the private beta of http://hosting.couch.io, which was “booom-couch”. You can read detailed examples in his slides.
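
The scripting principles from the Python 3 talk fit together nicely in a small example: parse options with optparse, route all I/O through parameters so tests can use StringIO, and process input with a generator so large files are streamed. This is a hypothetical grep-like script written to illustrate those points, not code from the talk; `matching_lines`, `run` and `main` are names invented here.

```python
import io
import sys
from optparse import OptionParser


def matching_lines(lines, pattern):
    """Generator: yields matching lines one at a time, so big inputs are streamed."""
    for line in lines:
        if pattern in line:
            yield line


def run(argv, stdin, stdout):
    """All I/O goes through the stdin/stdout parameters, so tests can pass StringIO."""
    parser = OptionParser(usage="%prog [options] PATTERN")
    parser.add_option("-c", "--count", action="store_true", default=False,
                      help="print only the number of matching lines")
    options, args = parser.parse_args(argv)
    if len(args) != 1:
        parser.error("expected exactly one PATTERN")
    matches = matching_lines(stdin, args[0])
    if options.count:
        stdout.write(f"{sum(1 for _ in matches)}\n")
    else:
        for line in matches:
            stdout.write(line)
    return 0


def main():
    return run(sys.argv[1:], sys.stdin, sys.stdout)


# Exercising the script entirely in memory:
out = io.StringIO()
run(["--count", "ERROR"], io.StringIO("ok\nERROR a\nERROR b\n"), out)
print(out.getvalue(), end="")  # 2
```

A real script would end with `sys.exit(main())` under an `if __name__ == "__main__":` guard; keeping `run` free of global I/O is what makes the in-memory test possible.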

July 23, 2009

Day 2 at #oscon 2009

Filed under: Computing — admin @ 2:34 pm

On the second day of OSCON 2009, I started with the PhoneGap tutorial. PhoneGap is an ambitious project that provides unified JavaScript-based APIs to develop mobile applications for a variety of mobile platforms such as iPhone, BlackBerry, Android, Windows Mobile, Nokia, Palm, etc. (most of those are not yet supported, but version 1.0, expected in a few months, should support most of them). It competes with a number of other open source projects such as Joyent Smart platform, Big five, Corona, Nimblekit, Appcelerator, Rhodes, etc. PhoneGap uses HTML, CSS and JavaScript for development and relies on WebKit and HTML5 technologies and standards. Many mobile platforms such as iPhone, Android and Palm Pre support WebKit, though BlackBerry and Windows Mobile are exceptions. PhoneGap uses a number of HTML5 features such as caching, CSS transformations, fonts, local storage, etc. PhoneGap uses XUI, which is a subset of jQuery, as some platforms such as the iPhone provide limited caching (25K) for JavaScript. It uses selectors and CSS for animations. The session introduced the Dashcode tool that comes with Xcode for building web applications, and then showed how to convert those web applications into native applications using PhoneGap. The presentation for this session is available from http://presentations.sintaxi.com/oscon/

For the second half, I decided to attend “Scalable Internet Architectures” (more than 10 million consumers/day). It was an interesting talk that discussed building scalable architectures from a hardware and networking perspective. It emphasized awareness of the end-to-end architecture, including JavaScript, application, database, network and machines, and stressed the importance of including people from operations in the architecture of the system. The presenter suggested using a CDN for static content and using peer-based HA instead of load balancers, as this eliminates load balancers as a point of contention or failure. The speaker also suggested using a reverse proxy cache such as Varnish or Squid. He also suggested setting up multiple DNS servers for each data center and registering local servers with local DNS so that they take advantage of shortest-path routes and talk to local servers. Other suggestions included caching, avoiding 302 redirects, separating OLTP and OLAP databases, and using a DHT. The speaker also pointed to a number of networking techniques, such as using MAC-based filtering to isolate the network for different services, which prevents bandwidth starvation when one service floods the network with a high data load.
The speaker mentioned a number of usability techniques for offloading expensive operations or hinting to users when something is going on in the background. He mentioned using queuing technology to offload processing. Finally, the speaker talked about a number of lessons learned from scaling and some big WTF moments from his consulting work. Overall, this talk summarized a lot of existing knowledge on building scalable applications (such as Steve Souders' work), with a couple of new networking techniques to tackle a slashdotting or a denial-of-service attack. The slides from this talk are available at http://www.slideshare.net/postwait/scalable-internet-architecture.
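
Caching came up repeatedly in this talk. As a small illustration of the read-through pattern an application-level cache follows, here is a Python sketch: serve a stored value until it expires, otherwise call the expensive loader (for example, a database query) and remember the result. The `TTLCache` class and `load_profile` loader are hypothetical; a production system would use something like memcached or a reverse proxy cache, with eviction and size limits this sketch omits. The clock is injectable so expiry can be demonstrated without sleeping.

```python
import time


class TTLCache:
    """Read-through cache: serve a stored value until it is `ttl` seconds old,
    otherwise call `loader` and remember the fresh result."""

    def __init__(self, loader, ttl, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]              # cache hit
        value = self._loader(key)        # miss or expired: do the expensive work
        self._entries[key] = (value, now + self._ttl)
        return value


calls = []


def load_profile(user_id):  # stands in for an expensive database query
    calls.append(user_id)
    return {"id": user_id}


fake_now = [0.0]
cache = TTLCache(load_profile, ttl=60, clock=lambda: fake_now[0])
cache.get(42)
cache.get(42)        # served from the cache, loader not called
fake_now[0] = 61.0
cache.get(42)        # expired, so the loader runs again
print(calls)         # [42, 42]
```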

July 22, 2009

Day 1 at #oscon 2009

Filed under: Computing — admin @ 5:25 pm

The first day of OSCON 2009 covered a number of tutorials, and I decided to attend the Google App Engine tutorial for the first half of the day. The Google App Engine API follows the CGI model of web development, i.e., it uses stdin and stdout and assumes stateless applications. There is a limit of 10MB response size and 30 requests per second, and it does not allow streaming. The tutorial started pretty slowly, and we spent the first hour just installing the SDK and the tutorial. The Google App Engine SDK is available from http://code.google.com/appengine/downloads.html. I downloaded the Mac image and then dragged the image to my hard drive. I then double-clicked the app icon for the Google App Engine SDK, which installed the SDK under /usr/local/google_appengine. Once the SDK is installed, you have to install the Google App Engine tutorials from http://code.google.com/p/app-engine-tutorial/.

After installing the SDK and the tutorial, I copied all the tutorial directories (named tutorial1, tutorial2, etc.) under the SDK directory. The rest of the session covered those tutorials one by one, though we ran out of time in the end and completed only up to tutorial7. To run the first tutorial, I went into the tutorial1 directory, e.g.

 cd /usr/local/google_appengine/tutorial1
 

Then I started the local app server as follows:

 python ../dev_appserver.py .
 

When I pointed my browser to http://localhost:8080, I was able to see “Hello World!”.

Next, I registered at http://appspot.com. After registering, I received an SMS message for confirmation and was able to complete the registration after entering the confirmation number. Next, I created an application id on Google App Engine. You can only create 10 app ids and you cannot delete them, so be careful when choosing ids. You can also use your own domain instead of appspot.com. For testing purposes, I chose the id “shahbhat”.

Next, I changed app.yaml inside my local tutorial1 directory; this file describes how your application is configured. You may also notice index.yaml, which lists the indexes in the datastore, though Google App Engine can figure out which queries are being used and create indexes automatically. I changed the application name in app.yaml to “shahbhat”, e.g.

 application: shahbhat
 

I then pushed my application to Google App Engine by typing

 python ../appcfg.py update .
 

I was then able to go to http://shahbhat.appspot.com/ and see my application. Voila! You can also see your application's usage at http://appengine.google.com/dashboard?app_id=shahbhat (change the app_id parameter to your own application id).

Unfortunately, a lot of people had problems getting to that point, so we wasted another half hour on a break while other folks sorted out configuration and deployment issues.
Next, I went through another tutorial to turn on authentication by setting:

 login: required
 

in the app.yaml file. Next, I added caching by adding expiration options to app.yaml. I was also able to use curl to test my application and inspect the headers to verify caching, e.g.

 curl --include http://localhost:8080
 

This showed the following when caching was not configured:

 Cache-Control: no-cache
 

When I set the expiration to 2d (two days), I was able to see:

 Cache-Control: public, max-age=172800
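
The max-age value is just the expiration converted to seconds: 2 days × 86,400 seconds/day = 172,800. The hypothetical helper below does that conversion for expiration strings of the form "2d" or "4d 5h" (days, hours, minutes, seconds). It assumes that format, as we understand the app.yaml expiration syntax; it is not part of the App Engine SDK.

```python
import re

_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def expiration_to_max_age(expiration: str) -> int:
    """Convert an expiration such as "2d" or "4d 5h 30m" into seconds,
    i.e. the number that ends up in Cache-Control: max-age."""
    tokens = expiration.split()
    if not tokens:
        raise ValueError("empty expiration")
    total = 0
    for token in tokens:
        match = re.fullmatch(r"(\d+)([dhms])", token)
        if match is None:
            raise ValueError(f"bad expiration token: {token!r}")
        total += int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
    return total


print(f"Cache-Control: public, max-age={expiration_to_max_age('2d')}")
# Cache-Control: public, max-age=172800
```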
 

The Google App Engine SDK also includes a development console that you can view by going to:

 http://localhost:8080/_ah/admin
 

Google App Engine supports Django-based templates, e.g.

   #!/usr/bin/env python
   
   import os
   from google.appengine.ext.webapp import template
   
   def main():
       template_values = {"foo" : [1,2,3]}
       template_file = os.path.join(
                    os.path.dirname(__file__), "index.html")
       body = template.render(
         template_file, template_values)
       # CGI model: write the headers, a blank line, then the body to stdout
       print "Status: 200 OK"
       print "Content-type: text/html"
       print
       print body
   
   if __name__ == '__main__':
       main()
 

In addition, Google App Engine supports the WSGI standard (PEP 333), e.g.

   import os
   import wsgiref.handlers
   
   from google.appengine.ext import webapp
   from google.appengine.ext.webapp import template
   
   class IndexHandler(webapp.RequestHandler):
   
     def get(self):
       template_values = {"foo" : 1}
   
       template_file = os.path.join(os.path.dirname(__file__), "index.html")
       self.response.out.write(template.render(template_file, template_values))
   
   
   def main():
     application = webapp.WSGIApplication([('/', IndexHandler)], debug=True)
     wsgiref.handlers.CGIHandler().run(application)
   
   
   if __name__ == '__main__':
     main()
 

Other tutorials covered the authentication APIs in the google.appengine.api.users module, such as

 create_login_url(dest_url)
 create_logout_url(dest_url)
 get_current_user()
 is_current_user_admin()
 

The SDK also includes a decorator to add authentication automatically:

 from google.appengine.ext.webapp.util import login_required
 ...
     @login_required
     def get(self):
 

Finally, we went over the datastore APIs for persistence, e.g.

   import os
   import wsgiref.handlers
   
   from google.appengine.ext import webapp
   from google.appengine.ext.webapp import template
   from google.appengine.ext import db
   
   class ToDoModel(db.Model):
     description = db.StringProperty()
     created = db.DateTimeProperty(auto_now_add=True)
     foo = db.FloatProperty(default=3.14)
     bar = db.IntegerProperty()
     baz = db.BooleanProperty(default=False)
     N = db.IntegerProperty()
     l = db.ListProperty(str, default=["foo", "bar"])
   
   
   class IndexHandler(webapp.RequestHandler):
   
     def get(self):
       todo = ToDoModel(description = "Hello World", bar=1, baz=True)
       todo.put()
   
   def main():
     application = webapp.WSGIApplication([('/', IndexHandler)], debug=True)
     wsgiref.handlers.CGIHandler().run(application)
   
   if __name__ == '__main__':
     main()
 

You can view the model by going to http://localhost:8080/_ah/admin/datastore. The datastore supports a number of types such as string, boolean, blob, list, time and text. However, there are some limitations, e.g. a StringProperty can only store up to 500 characters, and although Google App Engine will create indexes as needed, it won’t create an index on a TextProperty. For each entity, the datastore assigns a numeric id that becomes part of its key, though you can provide your own key name. Also, an entity cannot exceed 1MB.
Unfortunately, we ran out of time at this point, so I had to go to http://code.google.com/appengine/docs for further documentation. Overall, I thought it was a good introduction to Google App Engine, but I was disappointed that the instructor wasted a lot of time on setup that could have been used to cover the rest of the tutorials.

For the second half of the day, I attended the session on “Building applications with XMPP”. This was an interesting session that showed the use of XMPP for a number of use cases such as IM, gaming, real-time social networking, monitoring, etc. The session started with the history of XMPP (Extensible Messaging and Presence Protocol), its Jabber roots and its use of streaming XML. A number of factors contributed to the popularity of XMPP, such as open source, XML, a federated network, low latency, etc. XMPP is also very extensible and supports audio/video via Jingle, geolocation, TLS, SASL, etc. The XMPP architecture is client-server, where servers are decentralized and federated. XMPP identifies a user with a Jabber ID that looks like an email address and consists of a local part, a domain and a resource, e.g. alice@wonderland.lit/TeaParty, where the domain is mandatory but the local part and resource are optional. There are three types of XMPP stanzas, i.e., presence, IQ and message. The presence stanza is asynchronous, but an IQ stanza requires a response. As opposed to the web architecture, which is based on short-lived connections and statelessness, XMPP uses one long-lived session and events are received asynchronously.
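
The Jabber ID structure described above, [local@]domain[/resource], is easy to split apart. The Python sketch below is a simplified illustration (the `JID` type and `parse_jid` function are hypothetical names): the resource is everything after the first '/', so it may itself contain '@' or '/'. It skips the Unicode normalization and length checks that a real XMPP library performs.

```python
from typing import NamedTuple, Optional


class JID(NamedTuple):
    local: Optional[str]
    domain: str
    resource: Optional[str]


def parse_jid(jid: str) -> JID:
    """Split a Jabber ID of the form [local@]domain[/resource]."""
    bare, sep, resource = jid.partition("/")  # resource = everything after the first '/'
    local: Optional[str] = None
    domain = bare
    if "@" in bare:
        local, _, domain = bare.partition("@")
        if not local:
            raise ValueError(f"empty local part in {jid!r}")
    if not domain:
        raise ValueError(f"missing domain in {jid!r}")
    if sep and not resource:
        raise ValueError(f"empty resource in {jid!r}")
    return JID(local, domain, resource if sep else None)


print(parse_jid("alice@wonderland.lit/TeaParty"))
# JID(local='alice', domain='wonderland.lit', resource='TeaParty')
print(parse_jid("wonderland.lit"))
# JID(local=None, domain='wonderland.lit', resource=None)
```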

Next, the tutorial showed how to build Jabber clients using SleekXMPP and the Strophe JavaScript library (alternatively, you can use Twisted Words). The example used the BOSH protocol to carry XMPP over HTTP. Unfortunately, there was a lot of fast typing and I could not follow all of it. I am waiting for the presenters to post the slides online so that I can use those examples in my own applications.
