Challenges of multicore programming

May 30, 2008

Challenges of multicore programming

Filed under: Computing — admin @ 12:28 pm

The multicore processors have put parallel and concurrent programming at the forefront. In A Fundamental Turn Toward Concurrency in Software article in Dr Dobb’s warned programmers that the free lunch is over. This has spurred ongoing debates about future languages and how they will rescue software development. Here are a few features that are being postulated as the panacea of multicores:

Multi-threading

Java, C++, C# camp has had support for native threads for a while and they claim that these native threads will allow programmers to take advantage of the multitude of cores. After having done multi-threading for over ten years, I must admit multi-threading is not easy. Concurrent programming based on threads and locks is error prone and can lead to nasty problems of deadlocks, starvation, idle spinning, etc. As number of cores reaches thousands or millions, the shared memory will become single point of bottleneck.

Software Transactional Memory

Languages like Haskell and Clojure are adding support for STM, which treat memory like database and use optimistic transactions. Instead of locking, each thread can change any data, but when it commits it verifies the data has not been changed and retries transaction if the data is changed. This area is relative new, but resemebles more like shared memory so it probably will face same scalability issues.

Actor Based Model with Message Passing

I learned about Actor based programming from reading Agha Gul’s papers in school. In this model each process owns a private mailbox and processes communicate to each other by sending messages to each other. Languages like Erlang and more recently Scala use Actor based model for parallel programming. This model is very scalable because there is no locking involved. Also, the messages are immutable in Erlang (though may not be in Scala), so data corruption is not likely.The only drawback of this model is that spliting your application or algorithm into independent processes that communicate via message passing is not trivial.

TupleSpace based model

In tuple space, processes communicate to each other by storing messages in tuple spaces. The tuple space provides simple APIs like get, put, read and eval. Where get/read are blocking operations to implement dataflow based application and eval creates a new process to execute a task and store result in the tuple space. I built a similar system called JavaNOW for my Ph.D. project. Though, there are a number of open source and commercial frameworks availeble such as JavaSpaces, GigaSpaces, Blitz.

Fork/Join

This is standard pattern from grid computing, also known as Master-Slave, Leader/Follower, JobJar, and somewhat similar to Map/Reduce where a master process adds work in a queue and workers pick up the work and store result in another queue and master process picks up the result. It also uses other constructs of concurrent/parallel programming such as barriers, futures, etc. There are plenty of libraries and frameworks available for this such as Globus, Hadoop, MPI/MPL, PVM. Java 7 will also have Fork/Join framework.

Messaging Middleware(ESB)

Messaging middlewares allow another way to build actor based model where a thread listens on queue and other threads/processes communicate via pub/sub. Java’s JMS has a lot of support for this and frameworks like Mule, Camel, ServiceMix can help build applications that use SEDA (Staged Event-driven Architecture). Though, this is more prominent in service oriented architectures but there is not reason why it can’t be used for leveraging multicores.

Compiler based parallelism

High performance Fortran or Parallel Fortran use data flow analysis to create parallel applications. Though, this might be simplest solution but often the code generated is not as efficient as hand coded algorithm.

Conclusion

Though, parallel and concurrent programming is new model for vast majority of programmers, but this problem is not new to the high performance computing area. They have already tackled this problem and know what works and what does not work. In early 90s, I worked on Silicon Graphics machine that built large computers based on NUMA architecture and provided shared memory model for computing. Due to inherent nature of shared memory, they were not as scalable. As opposed to those, IBM built SP1 and SP2 systems that used messaging passing based APIs (MPL similar to MPI), where you could add as many nodes as you need. These systems were much more scalable. These two different systems quite nicely show difference between shared based model and messaging based model of programming. A key advantage of message passing is that it avoids any kind of locking. This is the reason I like Erlang because it supports immutability and efficient message passing. However, biggest challenge I found with parallel programming was to breaking an algorithm or application down to smaller parts that run in parallel. Amdahl’s law shows that speedup in an application from multiple processors is limited by its serial nature. Donuld Knuth in recent interview pointed that vast majority of the programs are serial in nature. In the end, there are no simple solutions in the horizon, though there are some proven techniques from HPC and functional langages that can help.

Shahzad Bhatti Welcome to my ramblings and rants!

May 30, 2008