Shahzad Bhatti

November 15, 2016

Tips from “Algorithms to Live By”

Filed under: Algorithms,Computing — admin @ 10:51 pm

“Algorithms to Live By” by Brian Christian and Tom Griffiths describes computer algorithms from several domains and illustrates practical ways to apply them to real-life problems. Here is a list of some of the algorithms that I found very useful:

1. Optimal Stopping

This class of problems determines the optimal time to stop further processing when searching or selecting an option. Here are a few examples:

Secretary Hiring Problem
This is a famous math problem, defined by the mathematician Merrill Flood, whose solution is the “Look-Then-Leap” rule: review the first 37% of the candidates without hiring any, then hire the first candidate who is better than all of the candidates seen so far. There are several other applications of this algorithm, such as finding a life partner or apartment hunting. The problem assumes that you cannot go back to a candidate once you reject them, but there are variations of the algorithm that allow it, in case the selected candidate rejects your offer.
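For illustration, here is a minimal Java simulation of the Look-Then-Leap rule (my own sketch, not from the book): skip the first 37% of candidates, then take the first candidate who beats everyone seen so far. Over many trials the success rate converges to roughly 1/e, or about 37%.

 import java.util.Random;
 
 public class SecretarySimulation {
   // Returns true if the 37% rule ends up picking the single best candidate.
   static boolean lookThenLeap(double[] scores) {
     int n = scores.length;
     int lookPhase = (int) (n / Math.E); // review ~37% without hiring
     double bestSeen = Double.NEGATIVE_INFINITY;
     for (int i = 0; i < lookPhase; i++) {
       bestSeen = Math.max(bestSeen, scores[i]);
     }
     double best = Double.NEGATIVE_INFINITY;
     for (double s : scores) best = Math.max(best, s);
     for (int i = lookPhase; i < n; i++) {
       if (scores[i] > bestSeen) {
         return scores[i] == best; // leap at the first candidate better than all seen
       }
     }
     return false; // the best candidate was in the look phase, so we never leapt
   }
 
   public static void main(String[] args) {
     Random rnd = new Random();
     int trials = 100000, wins = 0, n = 100;
     for (int t = 0; t < trials; t++) {
       double[] scores = new double[n];
       for (int i = 0; i < n; i++) scores[i] = rnd.nextDouble();
       if (lookThenLeap(scores)) wins++;
     }
     System.out.println("success rate: " + (double) wins / trials); // ~0.37
   }
 }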

Selling a House
When selling a house, you need to determine the range of expected offers and the cost of waiting for a better offer.

Finding a Parking Spot
Given the percentage of parking spots that are vacant, you determine how many vacant spots can be passed up before a certain distance, after which you take the first available spot.

2. Explore/Exploit

In this chapter, the authors describe several algorithms for exploring the available options and then exploiting the best one. Here is a sampling of the approaches based on explore/exploit:
Multi-armed bandit
Given the expected value of a slot machine (winnings divided by number of pulls), you need to maximize winnings. There are several approaches, such as:

  • Win-Stay
You keep playing a slot machine as long as you are winning and switch to a different machine when you lose.
  • Gittins Index
It is named after Gittins, a professor at Oxford. It maximizes future payoffs by calculating a Gittins index for every slot machine and then selecting the machine with the highest index.
  • Regret and optimism
Many problems in life can be framed in terms of regret and optimism, e.g. by imagining yourself on your deathbed and thinking of the decisions you could have made differently.
  • Upper Confidence Bound
It is also referred to as optimism in the face of uncertainty: you choose your actions as if the environment is as nice as is plausibly possible. Given a range of plausible values, you pick the option with the highest upper confidence bound (a minimal sketch appears below).
  • A/B Testing
    It is often used to test new features by offering the new features to a subset of the customers.

One insight the authors present is that people often explore longer than they should, favoring the newest option over the best older option.
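To make the Upper Confidence Bound idea concrete, here is a minimal UCB1 sketch in Java (my own illustration; the payout probabilities are made up). Each pull selects the machine whose observed mean plus confidence term is highest, so under-explored machines keep getting tried until the evidence against them builds up.

 import java.util.Random;
 
 public class UCB1Bandit {
   public static void main(String[] args) {
     double[] winProbability = {0.3, 0.5, 0.7}; // hypothetical machines
     double[] totalReward = new double[3];
     int[] pulls = new int[3];
     Random rnd = new Random();
 
     for (int t = 1; t <= 10000; t++) {
       int chosen = 0;
       double bestScore = Double.NEGATIVE_INFINITY;
       for (int m = 0; m < 3; m++) {
         // Pull each machine once before applying the UCB1 formula.
         double score = pulls[m] == 0 ? Double.POSITIVE_INFINITY
             : totalReward[m] / pulls[m] + Math.sqrt(2 * Math.log(t) / pulls[m]);
         if (score > bestScore) { bestScore = score; chosen = m; }
       }
       pulls[chosen]++;
       totalReward[chosen] += rnd.nextDouble() < winProbability[chosen] ? 1 : 0;
     }
     // Most pulls should concentrate on the best machine (index 2).
     for (int m = 0; m < 3; m++) {
       System.out.println("machine " + m + ": " + pulls[m] + " pulls");
     }
   }
 }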

3. Sorting

In this chapter, the authors describe several sorting algorithms and their computing cost in O-notation. O-notation is generally used to indicate an algorithm's worst-case performance, such as:

  • O(1): Constant cost
  • O(N): Linear cost
  • O(N^2): Quadratic cost
  • O(2^N): Exponential cost
  • O(N!): Factorial cost

Merge-Sort
This algorithm recursively breaks the data into smaller sets until each set contains a single element. It then merges those subsets to create a new sorted list.
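For illustration, here is a minimal recursive merge sort in Java:

 import java.util.Arrays;
 
 public class MergeSort {
   static int[] sort(int[] a) {
     if (a.length <= 1) return a; // a single element is already sorted
     int[] left = sort(Arrays.copyOfRange(a, 0, a.length / 2));
     int[] right = sort(Arrays.copyOfRange(a, a.length / 2, a.length));
     int[] merged = new int[a.length];
     int i = 0, j = 0, k = 0;
     // merge the two sorted halves into a new sorted array
     while (i < left.length && j < right.length) {
       merged[k++] = left[i] <= right[j] ? left[i++] : right[j++];
     }
     while (i < left.length) merged[k++] = left[i++];
     while (j < right.length) merged[k++] = right[j++];
     return merged;
   }
 
   public static void main(String[] args) {
     System.out.println(Arrays.toString(sort(new int[]{5, 3, 8, 1, 9, 2})));
   }
 }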

Bucket-Sort
A group of n items can be grouped into m buckets in O(nm) time; bucket sort uses this insight to group items into a number of sorted buckets. For example, you can use this approach to load returned books into carts based on their shelf numbers.

Sorting is a prerequisite for searching, and there are many practical applications of sorting, such as creating matchups between teams. For example, a round-robin format where each team plays every other team would result in a lot of matches (O(N^2)). Instead, competitions such as March Madness use a Merge-Sort-style bracket to move from 64 teams to 32, 16, 8, 4, and the finals. However, it doesn't produce a full sort, as there are only 63 games in the season instead of 192.

4. Caching

In computer design, John von Neumann proposed the memory hierarchy to improve lookup performance; caching was first used in IBM 360 mainframes. Other computer researchers, such as Belady, designed algorithms for page faults, which load data from disk into memory. There are several cache-eviction algorithms, such as First-In-First-Out and Least-Recently-Used.
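For example, a minimal Least-Recently-Used cache can be sketched in Java on top of LinkedHashMap, which supports access-order eviction out of the box:

 import java.util.LinkedHashMap;
 import java.util.Map;
 
 public class LruCache<K, V> extends LinkedHashMap<K, V> {
   private final int capacity;
 
   public LruCache(int capacity) {
     super(16, 0.75f, true); // true = order entries by access, not insertion
     this.capacity = capacity;
   }
 
   @Override
   protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
     return size() > capacity; // evict the least-recently-used entry
   }
 
   public static void main(String[] args) {
     LruCache<String, Integer> cache = new LruCache<>(2);
     cache.put("a", 1);
     cache.put("b", 2);
     cache.get("a");    // touch "a" so "b" becomes least recently used
     cache.put("c", 3); // evicts "b"
     System.out.println(cache.keySet()); // prints [a, c]
   }
 }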

5. Scheduling

Here are a few of the scheduling algorithms described in this chapter:
Earliest Due Date Strategy
It minimizes maximum lateness by choosing the task with the earliest due date first.

Moore’s algorithm
It is similar to Earliest Due Date, but whenever the next job can't be completed by its due date, it throws out the biggest task, which minimizes the number of late tasks.

The authors give the example of the Getting Things Done (GTD) technique for time management, where small tasks are handled first. Tasks can also have a weight or priority; the scheduler then minimizes the sum of weighted completion times by dividing each task's weight by its length and selecting the task with the highest density, as the sketch below shows.
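Here is that weighted strategy as a small Java sketch (the tasks and weights are my own made-up example): compute each task's density as weight divided by length, and work in decreasing density order.

 import java.util.ArrayList;
 import java.util.Comparator;
 import java.util.List;
 
 public class WeightedScheduling {
   static class Task {
     final String name;
     final double weight, length;
     Task(String name, double weight, double length) {
       this.name = name; this.weight = weight; this.length = length;
     }
   }
 
   public static void main(String[] args) {
     List<Task> tasks = new ArrayList<>();
     tasks.add(new Task("report", 3, 6));
     tasks.add(new Task("email", 1, 1));
     tasks.add(new Task("review", 4, 2));
     // Highest density (weight / length) first minimizes weighted completion time.
     tasks.sort(Comparator.comparingDouble((Task t) -> t.weight / t.length).reversed());
     double time = 0, weightedCompletion = 0;
     for (Task t : tasks) {
       time += t.length;
       weightedCompletion += t.weight * time;
       System.out.println(t.name + " finishes at " + time);
     }
     System.out.println("sum of weighted completion times: " + weightedCompletion);
   }
 }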

Here are a few issues that can arise with priority based tasks:

  • Priority Inversion – when a low-priority task holds a resource while the scheduler runs a higher-priority task that cannot make progress without it. One way to address this issue is to let the low-priority task inherit the priority of the higher-priority task so it can complete.
  • Thrashing – occurs when a system grinds to a halt because work cannot be completed due to lack of resources.
  • Context switching – modern operating systems use context switching to work on multiple tasks, but each slice of time needs to be big enough for a task to make progress. One technique to minimize context switching is interrupt coalescing, which delays hardware interrupts. A similar technique is batching small tasks, e.g. the Getting Things Done technique encourages setting aside a chunk of time to handle similar tasks such as checking emails, making phone calls, etc.

6. Bayes’s Rule

The Reverend Thomas Bayes postulated Bayes's Rule by reasoning backwards from observed winning and losing tickets to the makeup of the overall ticket pool. It was later proved by Pierre-Simon Laplace, and the resulting estimate is commonly referred to as Laplace's law. Laplace also worked out how Bayes's Rule uses prior knowledge in prediction problems.
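Laplace's law estimates the probability of success after observing w successes in n attempts as (w + 1) / (n + 2), which gives sensible answers even with very little data. A tiny Java illustration:

 public class LaplaceLaw {
   // Probability estimate after w successes in n trials: (w + 1) / (n + 2).
   static double estimate(int wins, int trials) {
     return (wins + 1.0) / (trials + 2.0);
   }
 
   public static void main(String[] args) {
     System.out.println(estimate(0, 0)); // no data: uniform prior of 0.5
     System.out.println(estimate(1, 3)); // 1 win in 3 tries: 0.4, not the raw 1/3
   }
 }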

Copernican Principle
Richard Gott hypothesized that the moment you observe something, you are likely to be seeing it in the middle of its lifetime.

Normal or Gaussian distribution
It has a bell curve and can be used to predict average life span.

Power-law distribution
It applies to quantities that range over many scales, such as the populations of cities or the incomes of people.

Multiplicative Rule
It multiplies the quantity observed so far by a constant factor.

Average Rule
It uses the distribution’s natural average.

Additive Rule
It predicts that things will go on just a constant amount longer, such as the “five more minutes” rule.

7. Overfitting

In machine learning, overfitting occurs when a model fits its training data so tightly that it cannot accurately predict the outcome for data it has not observed.

Cross Validation
Overfitting can be mitigated with cross-validation, by assessing the model not just against the training data but also against unseen data.

Regularization
It uses constraints to penalize complexity.

Lasso
It penalizes the total weight of the different factors, driving many of them toward zero to minimize complexity.

8. Relaxation

In constraint-optimization problems, you need to find the best arrangement of a set of variables given a set of rules and a scoring mechanism, e.g. the traveling salesman problem (O(N!)). Using constraint relaxation, you remove some of the problem's constraints; for example, you can create a minimum spanning tree that connects all nodes in O(N^2) time. Techniques such as Lagrangian Relaxation remove some of the constraints and add them to the scoring system instead.

9. Randomness

This chapter describes examples of algorithms that are based on random numbers such as:

Monte Carlo Method
It uses random samples to handle qualitatively unmanageable problems.

Hill Climbing
It takes a solution and tries to improve it by permuting some of its factors, accepting a change only if it results in an improvement. However, it may not find the globally optimal solution.

Jitter
It makes small random changes and accepts them even if they don't improve the solution, in order to escape local optima and find a better solution.

Metropolis algorithm
It uses the Monte Carlo Method and accepts both good and bad tweaks with some probability while trying different solutions.

Simulated Annealing
It optimizes problems the way metal is annealed: start hot, accepting many random changes, and slowly cool off, becoming more conservative, as sketched below.
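Here is a compact Java sketch of simulated annealing on a made-up one-dimensional function: worse moves are accepted with probability e^(-delta/temperature), and the temperature cools a little each step.

 import java.util.Random;
 
 public class SimulatedAnnealing {
   // Toy objective with local minima; we search for a low value of f.
   static double f(double x) {
     return x * x + 10 * Math.sin(x);
   }
 
   public static void main(String[] args) {
     Random rnd = new Random();
     double x = 10, temperature = 10.0;
     while (temperature > 1e-4) {
       double candidate = x + rnd.nextGaussian(); // small random tweak
       double delta = f(candidate) - f(x);
       // Always accept improvements; accept worse moves with a probability
       // that shrinks as the temperature cools (the Metropolis criterion).
       if (delta < 0 || rnd.nextDouble() < Math.exp(-delta / temperature)) {
         x = candidate;
       }
       temperature *= 0.999; // slow cooling
     }
     System.out.println("found x=" + x + ", f(x)=" + f(x));
   }
 }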

10. Networking

This chapter describes algorithms used in computer networks, such as:

Packet switching
One of the key ideas of the Internet is packet switching, where TCP/IP sends data packets over a number of shared connections, as opposed to the dedicated lines or circuit switching used by phone companies.

Acknowledgement
It is used to let the sender know that a packet was received. TCP uses the three-way handshake to establish a connection, and the sender resends packets if an ACK is not received.

Exponential Backoff
It increases the average delay after each successive failure, as sketched below.
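Here is a hedged Java sketch of retrying with exponential backoff; the initial delay and the cap are illustrative choices, and real implementations usually add random jitter so that clients don't retry in lockstep.

 import java.util.concurrent.Callable;
 
 public class ExponentialBackoff {
   static <T> T retry(Callable<T> call, int maxAttempts) throws Exception {
     long delayMillis = 100; // illustrative initial delay
     for (int attempt = 1; ; attempt++) {
       try {
         return call.call();
       } catch (Exception e) {
         if (attempt >= maxAttempts) throw e;
         Thread.sleep(delayMillis);
         delayMillis = Math.min(delayMillis * 2, 10000); // double, capped at 10s
       }
     }
   }
 
   public static void main(String[] args) throws Exception {
     // Fails twice, then succeeds; sleeps 100ms and 200ms in between.
     final int[] calls = {0};
     String result = retry(() -> {
       if (++calls[0] < 3) throw new RuntimeException("transient failure");
       return "ok after " + calls[0] + " attempts";
     }, 5);
     System.out.println(result);
   }
 }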

Flow Control
TCP uses Additive Increase Multiplicative Decrease (AIMD): it steadily increases the number of packets in flight and cuts the transmission rate in half when an ACK is not received.

Bufferbloat
A buffer is a queue that stores outgoing packets, but when the queue grows too long, it can delay ACKs enough to trigger redelivery. Explicit Congestion Notification can be used to address this issue.

11. Game Theory

In this chapter, the authors discuss several problems from game theory, such as:

Halting problem
This problem was first posed by Alan Turing, who showed that a program can never tell, in general, whether another program will take forever to compute something.

Prisoner’s dilemma
It is based on two prisoners who are caught and must each choose to either cooperate with or defect against the other. In general, defection is the dominant strategy.

Nash Equilibrium
It is a set of strategies where neither player would change their own play given the opponent's strategy.

The Tragedy of the Commons
It involves a shared-resource system where individuals can act independently and selfishly, contrary to the common good of all participants, e.g. voluntary environmental laws where companies are not required to limit emission levels.

Information cascade
An information cascade occurs when an individual abandons their own information in favor of other people's actions. One application of this class of problems is auction systems. Here are a few variations of auction systems:

  • Sealed-bid – bidders are unaware of the other bid prices, so they have to predict the prices other bidders will use.
  • Dutch or descending auction – bidding starts at a high price that is slowly lowered until someone accepts it.
  • English or ascending auction – bidding starts at a low price and is then increased.
  • Vickrey auction – similar to sealed-bid, but the winner pays the second-place bid. It results in better valuations, as bidders are incentivized to bid their true value.

Summary

This book presents several domains of algorithms and encourages computational kindness through applying these algorithms in real life. For example, we can add constraints or reduce the number of available options when making a decision, which lowers the mental labor involved.

February 6, 2016

Building a Generic Data Service

Filed under: Web Services — admin @ 10:44 pm

As REST-based microservices have become prevalent, I often find that web and mobile clients have to connect to many different services to gather data. You may have to call dozens of services to display the data on a single screen or page. Also, you may only need a subset of the data from each service, but you still pay the bandwidth and parsing cost for all of it.

I created a new Java framework, PlexDataProviders, for aggregating and querying data from various underlying sources, which can be used to build a general-purpose data service. PlexDataProviders is a lightweight Java framework that abstracts access to various data providers, such as databases, files, and web services, and allows aggregation of data across those providers.

The PlexDataProviders framework is divided into two components:

  • Data Provider – This component defines interfaces that are implemented to access data sources such as database or web services.
  • Query Engine – This component is used for querying and aggregating data.

The query engine can determine the dependencies between providers, and it also allows you to use the output of one data provider as input to another. For example, let's assume:

  • data-provider A requires input-a1, input-a2 and produces output-a1, output-a2
  • data-provider B requires input-b1 and output-a1 and produces output-b1, output-b2

Then you can pass input-a1 and input-a2 to the query engine and request the output-a1, output-a2, output-b1, and output-b2 data fields; the engine resolves the provider dependencies, as sketched below.
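Conceptually, this resembles computing a dependency ordering of providers from their input/output fields. The following is only an illustration of that resolution idea in plain Java, not the actual PlexDataProviders API:

 import java.util.*;
 
 public class ProviderResolution {
   // Conceptual provider: consumes some fields, produces others.
   static class Provider {
     final String name;
     final Set<String> inputs, outputs;
     Provider(String name, Set<String> inputs, Set<String> outputs) {
       this.name = name; this.inputs = inputs; this.outputs = outputs;
     }
   }
 
   // Greedily pick providers whose inputs are satisfied until the
   // requested output fields are all available (a topological ordering).
   static List<Provider> resolve(Set<String> available, Set<String> requested,
                                 List<Provider> providers) {
     List<Provider> plan = new ArrayList<>();
     Set<String> have = new HashSet<>(available);
     boolean progress = true;
     while (!have.containsAll(requested) && progress) {
       progress = false;
       for (Provider p : providers) {
         if (!plan.contains(p) && have.containsAll(p.inputs)) {
           plan.add(p);
           have.addAll(p.outputs);
           progress = true;
         }
       }
     }
     return plan;
   }
 
   static Set<String> setOf(String... fields) {
     return new HashSet<>(Arrays.asList(fields));
   }
 
   public static void main(String[] args) {
     Provider a = new Provider("A", setOf("input-a1", "input-a2"), setOf("output-a1", "output-a2"));
     Provider b = new Provider("B", setOf("input-b1", "output-a1"), setOf("output-b1", "output-b2"));
     List<Provider> plan = resolve(setOf("input-a1", "input-a2", "input-b1"),
         setOf("output-b1"), Arrays.asList(b, a));
     for (Provider p : plan) System.out.println(p.name); // prints A then B
   }
 }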

Benefits

PlexDataProviders offers the following benefits:

  • It provides a unified way to search data and abstracts integration to underlying data sources.
  • It helps simplify client-side logic, as clients can use a single data service to query all data instead of multiple data services.
  • It also helps with managing endpoints, as you expose only a single endpoint instead of having clients connect to multiple web services.
  • As clients can specify exactly the data they need, it reduces payload size and network bandwidth.
  • Clients only need to create a single data parser, which keeps the JSON parsing logic simple.
  • As PlexDataProviders supports multi-threading, it also helps with the latency of data-fetch requests.
  • It supports partial failure, so a failure in a single data provider doesn't affect other data providers and the data service can still return partial results.
  • It supports timeouts, so clients receive whatever data completes within the given timeout interval.

Data Structure

The primary data structures are:

  • MetaField – This class defines meta information for each data field such as name, kind, type, etc.
  • MetaFieldType – This enum defines the supported data types, i.e.
    • SCALAR_TEXT – simple text
    • SCALAR_INTEGER – integer numbers
    • SCALAR_DECIMAL – decimal numbers
    • SCALAR_DATE – dates
    • SCALAR_BOOLEAN – boolean
    • VECTOR_TEXT – array of text
    • VECTOR_INTEGER – array of integers
    • VECTOR_DECIMAL – array of decimals
    • VECTOR_DATE – array of dates
    • VECTOR_BOOLEAN – array of boolean
    • BINARY – binary data
    • ROWSET – nested data rowsets
  • Metadata – This class defines a set of MetaFields used in DataRow/DataRowSet
  • DataRow – This class abstracts a row of data fields
  • DataRowSet – This class abstracts a set of rows

PlexDataProviders also supports nested structures, where a data field in a DataRow can be an instance of DataRowSet.

Adding a Data Provider

A data provider implements the following two interfaces:

public interface DataProducer {
    void produce(DataRowSet requestFields, DataRowSet responseFields,
            QueryConfiguration config) throws DataProviderException;
}

Note that QueryConfiguration defines additional parameters such as:

  • pagination parameters
  • ordering/grouping
  • filtering parameters
  • timeout parameters

The timeout parameter can be used to return all available data within the defined time, e.g. the query engine may invoke the underlying data providers on multiple threads, and if an underlying query takes too long, the engine returns whatever data is available.

public interface DataProvider extends DataProducer, Comparable<DataProvider> {
    String getName();
 
    int getRank();
 
    Metadata getMandatoryRequestMetadata();
 
    Metadata getOptionalRequestMetadata();
 
    Metadata getResponseMetadata();
 
    TaskGranularity getTaskGranularity();
}

Each provider defines a name, a rank (or priority, used when matching the best provider), and sets of mandatory/optional input and output data fields. A data provider can also declare its granularity as coarse-grained or fine-grained, and the implementation may execute those providers on different threads.

PlexDataProviders also provides interfaces for converting data from domain objects to DataRowSet. Here is an example provider implementation:

public class SecuritiesBySymbolsProvider extends BaseProvider {
    private static Metadata parameterMeta = Metadata.from(SharedMeta.symbol);
    private static Metadata optionalMeta = Metadata.from();
    private static SecurityMarshaller marshaller = new SecurityMarshaller();
 
    public SecuritiesBySymbolsProvider() {
        super("SecuritiesBySymbolsProvider", parameterMeta, optionalMeta,
                marshaller.getMetadata());
    }
 
    @Override
    public void produce(DataRowSet parameter, DataRowSet response,
            QueryConfiguration config) throws DataProviderException {
        final String id = parameter.getValueAsText(SharedMeta.symbol, 0);
        Map<String, Object> criteria = new HashMap<>();
        criteria.put("symbol", id.toUpperCase());
        Collection<Security> securities = DaoLocator.securityDao.query(criteria);
        DataRowSet rowset = marshaller.marshal(securities);
        addRowSet(response, rowset, 0);
    }
}

Typically, you will create a data provider for each kind of query you want to support. Each data provider specifies the sets of required and optional data fields that it uses to generate its output data fields.

Here is an example of marshalling data from Security domain objects to a DataRowSet:

public DataRowSet marshal(Security security) {
    DataRowSet rowset = new DataRowSet(responseMeta);
    marshal(rowset, security, 0);
    return rowset;
}
 
public DataRowSet marshal(Collection<Security> securities) {
    DataRowSet rowset = new DataRowSet(responseMeta);
    for (Security security : securities) {
        marshal(rowset, security, rowset.size());
    }
    return rowset;
}
...

PlexDataProviders provides the DataProviderLocator interface for registering and looking up providers, e.g.

public interface DataProviderLocator {
    void register(DataProvider provider);
 
    Collection<DataProvider> locate(Metadata requestFields, Metadata responseFields);
...
}

PlexDataProviders comes with a small application that provides data services by implementing various data providers. It uses the PlexService framework for defining the service, e.g.

@WebService
@Path("/data")
public class DataServiceImpl implements DataService {
    private DataProviderLocator dataProviderLocator = new DataProviderLocatorImpl();
    private QueryEngine queryEngine = new QueryEngineImpl(dataProviderLocator);
 
    public DataServiceImpl() {
        dataProviderLocator.register(new AccountsByIdsProvider());
        dataProviderLocator.register(new AccountsByUseridProvider());
        dataProviderLocator.register(new CompaniesBySymbolsProvider());
        dataProviderLocator.register(new OrdersByAccountIdsProvider());
        dataProviderLocator.register(new PositionGroupsBySymbolsProvider());
        dataProviderLocator.register(new PositionsBySymbolsProvider());
        dataProviderLocator.register(new QuotesBySymbolsProvider());
        dataProviderLocator.register(new SecuritiesBySymbolsProvider());
        dataProviderLocator.register(new UsersByIdsProvider());
        dataProviderLocator.register(new WatchlistByUserProvider());
        dataProviderLocator.register(new SymbolsProvider());
        dataProviderLocator.register(new UsersProvider());
        dataProviderLocator.register(new SymbolSearchProvider());
    }
 
    @Override
    @GET
    public DataResponse query(Request webRequest) {
        final DataRequest dataRequest = DataRequest.from(webRequest.getProperties());
        return queryEngine.query(dataRequest);
    }
}

As you can see, the data service simply builds a DataRequest from the input data fields and sends the response back to the clients.

Here is an example client that passes a search-query data field and requests quote data fields along with company details:

public void testGetQuoteBySearch() throws Throwable {
    String jsonResp = TestWebUtils.httpGet("http://localhost:" + DEFAULT_PORT
                    + "/data?responseFields=exchange,symbol,quote.bidPrice,quote.askPrice,quote.sales,company.name&symbolQuery=AAPL");
    ...

Note that the above request uses three data providers: first, SymbolSearchProvider searches for symbols matching the given query; the symbol data field is then used to request company and quote data fields from QuotesBySymbolsProvider and CompaniesBySymbolsProvider. The PlexDataProviders framework takes care of all dependency management between the providers.

Here is an example JSON response from the data service:

{
    "queryResponse": {
        "fields": [
            [{
                "symbol": "AAPL_X"
            }, {
                "quote.sales": [
                    [{
                        "symbol": "AAPL_X"
                    }, {
                        "timeOfSale.volume": 56
                    }, {
                        "timeOfSale.exchange": "DOW"
                    }, {
                        "timeOfSale.date": 1455426008762
                    }, {
                        "timeOfSale.price": 69.49132317180353
                    }],
                    [{
                        "symbol": "AAPL_X"
                    }, {
                        "timeOfSale.volume": 54
                    }, {
                        "timeOfSale.exchange": "NYSE"
                    }, {
                        "timeOfSale.date": 1455426008762
                    }, {
                        "timeOfSale.price": 16.677774132458076
                    }],
                    [{
                        "symbol": "AAPL_X"
                    }, {
                        "timeOfSale.volume": 99
                    }, {
                        "timeOfSale.exchange": "NASDAQ"
                    }, {
                        "timeOfSale.date": 1455426008762
                    }, {
                        "timeOfSale.price": 42.17891320885568
                    }],
                    [{
                        "symbol": "AAPL_X"
                    }, {
                        "timeOfSale.volume": 49
                    }, {
                        "timeOfSale.exchange": "DOW"
                    }, {
                        "timeOfSale.date": 1455426008762
                    }, {
                        "timeOfSale.price": 69.61680149649729
                    }],
                    [{
                        "symbol": "AAPL_X"
                    }, {
                        "timeOfSale.volume": 69
                    }, {
                        "timeOfSale.exchange": "NYSE"
                    }, {
                        "timeOfSale.date": 1455426008762
                    }, {
                        "timeOfSale.price": 25.353316897552833
                    }]
                ]
            }, {
                "quote.askPrice": 54.99300665695502
            }, {
                "quote.bidPrice": 26.935682182171643
            }, {
                "exchange": "DOW"
            }, {
                "company.name": "AAPL - name"
            }],
            [{
                "symbol": "AAPL"
            }, {
                "exchange": "NASDAQ"
            }]
        ],
        "errorsByProviderName": {},
        "providers": ["QuotesBySymbolsProvider", "SymbolSearchProvider", "CompaniesBySymbolsProvider"]
    }
}
PlexDataProviders is available on GitHub and is licensed under the liberal MIT license. It also comes with a small sample application for demo purposes. Feel free to send me your suggestions.

 

April 23, 2014

Implementing Reactive Extensions (RX) using Java 8

Filed under: Java — admin @ 11:52 pm

In my last blog post, I described the new lambda support in Java 8. In order to try the new Java features in more depth, I implemented Reactive Extensions in Java 8. In short, Reactive Extensions allow processing synchronous and asynchronous data in a uniform manner, providing unified interfaces that can be used as an iterator or as callbacks for asynchronous processing. Microsoft's Rx library is huge, so I implemented only the core features and focused on the Observable API. Here is a brief overview of the implementation:

Creating Observable from Collection

Here is how you can create Observable from a collection:

    List<String> names = Arrays.asList("One", "Two", "Three", "Four", "Five"); 
    Observable.from(names).subscribe(System.out::println, 
       Throwable::printStackTrace, () -> System.out.println("done"));
 

In Microsoft's version of Rx, Observable takes an Observer for subscription, which defines three methods: onNext, onError, and onCompleted. onNext is invoked to push the next element of data, onError is used to notify of errors, and onCompleted is called when all data has been processed. In my implementation, the Observable interface defines two overloaded subscribe methods: the first takes callback functions for onNext and onError, and the second takes three callback functions, including onCompleted. I chose separate function parameters instead of a single interface so that callers can pass inline lambda functions rather than implementing an Observer interface.
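Based on the usage in these examples, the two subscribe overloads have roughly the following shape (a sketch of the signatures implied by the calls, not the exact source):

 import java.util.function.Consumer;
 
 // Sketch of the subscription surface implied by the examples:
 public interface ObservableSketch<T> {
   // onNext and onError callbacks only
   void subscribe(Consumer<T> onNext, Consumer<Throwable> onError);
 
   // onNext, onError, plus an onCompleted callback
   void subscribe(Consumer<T> onNext, Consumer<Throwable> onError, Runnable onCompleted);
 }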

Creating Observable from Array of objects

Here is how you can create an Observable from an array of objects:

    Observable.from("Erica", "Matt", "John", "Mike").subscribe(System.out::println, 
         Throwable::printStackTrace, () -> System.out.println("done"));
 

Creating Observable from Stream

Here is how you can create Observable from stream:

    Stream<String> names = Stream.of("One", "Two", "Three", "Four", "Five"); 
    // note third argument for onComplete is optional
    Observable.from(names).subscribe(name -> System.out.println(name), 
       error -> error.printStackTrace());
 

Creating Observable from Iterator

Here is how you can create Observable from iterator:

    Stream<String> names = Stream.of("One", "Two", "Three", "Four", "Five"); 
    Observable.from(names.iterator()).subscribe(name -> System.out.println(name), 
       error -> error.printStackTrace());
 

Creating Observable from Spliterator

Here is how you can create Observable from spliterator:

    List<String> names = Arrays.asList("One", "Two", "Three", "Four", "Five"); 
    Observable.from(names.spliterator()).subscribe(System.out::println, 
       Throwable::printStackTrace);
 

Creating Observable from a single object

Here is how you can create Observable from a single object:

    Observable.just("value").subscribe(v -> System.out.println(v), 
       error -> error.printStackTrace());
    // if the single object is a collection, it is treated as a single entity, e.g.
    Observable.just(Arrays.asList(1, 2, 3)).subscribe( num -> System.out.println(num), 
       error -> error.printStackTrace());
 

Creating Observable for an error

Here is how you can create Observable that would return an error:

    Observable.throwing(new Error("test error")).subscribe(System.out::println, 
       error -> System.err.println(error));
    // this will print error 
 

Creating Observable from a consumer function

Here is how you can create an Observable from a user-supplied function that invokes the onNext, onError, and onCompleted callbacks:

    Observable.create(observer -> {
       for (String name : names) {
          observer.onNext(name);
       }
       observer.onCompleted();
    }).subscribe(System.out::println, Throwable::printStackTrace);
 

Creating Observable from range

Here is how you can create an Observable that produces numbers from a start value (inclusive) to an end value (exclusive):

    // Creates a range of numbers starting at 'from' until it reaches 'to', exclusively
    Observable.range(4, 8).subscribe(num -> System.out.println(num), 
       error -> error.printStackTrace());
    // will print 4, 5, 6, 7
 

Creating empty Observable

It would call onCompleted right away:

    Observable.empty().subscribe(System.out::println, 
       Throwable::printStackTrace, () -> System.out.println("Completed"));
 

Creating never Observable

It does not call any of the callback methods:

    Observable.never().subscribe(System.out::println, Throwable::printStackTrace);
 
 

Changing Scheduler

By default, Observable notifies the observer asynchronously using a thread-pool scheduler, but you can change the default scheduler as follows:

Using thread-pool scheduler

    Stream<String> names = Stream.of("One", "Two", "Three", "Four", "Five"); 
    Observable.from(names).subscribeOn(Scheduler.getThreadPoolScheduler()).
       subscribe(System.out::println, Throwable::printStackTrace);
 

Using new-thread scheduler

It will create new thread

    Stream<String> names = Stream.of("One", "Two", "Three", "Four", "Five"); 
    Observable.from(names).subscribeOn(Scheduler.getNewThreadScheduler()).
       subscribe(System.out::println, Throwable::printStackTrace);
 

Using timer thread with interval

It will notify at each interval

    Stream<String> names = Stream.of("One", "Two", "Three", "Four", "Five"); 
    Observable.from(names).subscribeOn(Scheduler.getTimerSchedulerWithMilliInterval(1000)).
       subscribe(System.out::println, Throwable::printStackTrace);
    // this will print each name every second
 

Using immediate scheduler

This scheduler invokes the callback functions right away on the same thread. You can use it if your data is synchronous and you don't want to create another thread. On the downside, you cannot unsubscribe with this scheduler.

    Stream<String> names = Stream.of("One", "Two", "Three", "Four", "Five"); 
    Observable.from(names).subscribeOn(Scheduler.getImmediateScheduler()).
       subscribe(System.out::println, Throwable::printStackTrace);
 

Transforming

Observables keep their sequence of items as streams, and they support the map/flatMap operations provided by the standard Stream class, e.g.

Map

    Stream<String> names = Stream.of("One", "Two", "Three", "Four", "Five"); 
    Observable.from(names).map(name -> name.hashCode()).
       subscribe(System.out::println, Throwable::printStackTrace);
 

FlatMap

    Stream<List<Integer>> integerListStream = Stream.of( Arrays.asList(1, 2), 
       Arrays.asList(3, 4), Arrays.asList(5));
    Observable.from(integerListStream).flatMap(integerList -> integerList.stream()).
       subscribe(System.out::println, Throwable::printStackTrace);
 

Filtering

Observables support the basic filtering operations provided by Java Streams, e.g.

Filter

    Stream<String> names = Stream.of("One", "Two", "Three", "Four", 
       "Five"); 
    Observable.from(names).filter(name -> name.startsWith("T")).
       subscribe(System.out::println, Throwable::printStackTrace);
    // This will only print Two and Three
 

Skip

It skips the given number of elements:

    Stream<String> names = Stream.of("One", "Two", "Three", "Four", "Five"); 
    Observable.from(names).skip(2).subscribe(System.out::println, 
       Throwable::printStackTrace);
    // This will skip One and Two
 

Limit

    Stream<String> names = Stream.of("One", "Two", "Three", "Four", "Five"); 
    Observable.from(names).limit(2).subscribe(System.out::println, 
       Throwable::printStackTrace);
    // This will only print first two strings
 

Distinct

    Stream<String> names = Stream.of("One", "Two", "Three", "One");
    Observable.from(names).distinct().subscribe(System.out::println, 
       Throwable::printStackTrace);
    // This will print One only once
 

Merge

This concatenates the data of two Observables:

    Observable<Integer> observable1 = Observable.from(Stream.of(1, 2, 3));
    Observable<Integer> observable2 = Observable.from(Stream.of(4, 5, 6));
    observable1.merge(observable2).subscribe(System.out::println, 
       Throwable::printStackTrace);
    // This will print 1, 2, 3, 4, 5, 6
 

Summary

In summary, as Java 8 already supports a lot of functional primitives, adding support for reactive extensions was quite straightforward. For example, Netflix's implementation of reactive extensions in Java consists of over 80K lines of code, but it took only a few hundred lines to implement the core features with Java 8. You can download or fork the code from https://github.com/bhatti/RxJava8.


April 17, 2014

Introduction to Java 8 Lambda and Stream Syntax

Filed under: Java — admin @ 11:10 pm

Introduction

Java 8 was released in March 2014 with the most significant language-level enhancements since Java 5 back in 2004. The biggest new feature is the introduction of lambdas. A lambda, or closure, is a block of code that you can pass to other methods or return from methods. Previously, Java supported a form of closure via the anonymous class syntax, e.g.

 import java.awt.event.ActionEvent;
 import java.awt.event.ActionListener;
 import java.awt.Dimension;
 import java.awt.FlowLayout;
 import javax.swing.JButton;
 import javax.swing.JFrame;
 
 public class SwingExample extends JFrame {
   public SwingExample() {
     this.getContentPane().setLayout(new FlowLayout());
     final JButton btn = new JButton("Click Me");
     btn.setPreferredSize(new Dimension(400,200));
     add(btn);
     btn.addActionListener(new ActionListener() {
       public void actionPerformed(ActionEvent e) {
         btn.setText("Clicked");
         btn.setEnabled(false);
       }
     });
   }
 
   private static void createAndShowGUI() {
     JFrame frame = new SwingExample();
     frame.pack();
     frame.setVisible(true);
     frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
 
   }
 
   public static void main(String[] args) {
     javax.swing.SwingUtilities.invokeLater(new Runnable() {
       public void run() {
         createAndShowGUI(); 
       }
     });
   }
 }
 

In the above example, the anonymous class can use locally defined data from the method that declares it, as long as that data is declared final. Here is how the example looks with Java 8 syntax:

 import java.awt.event.ActionEvent;
 import java.awt.event.ActionListener;
 import java.awt.Dimension;
 import java.awt.FlowLayout;
 import javax.swing.JButton;
 import javax.swing.JFrame;
 
 public class SwingExample8 extends JFrame {
   public SwingExample8() {
     this.getContentPane().setLayout(new FlowLayout());
     JButton btn = new JButton("Click Me");
     btn.setPreferredSize(new Dimension(400,200));
     add(btn);
     btn.addActionListener(e -> {
         btn.setText("Clicked");
         btn.setEnabled(false);
       }
     );
   }
 
   private static void createAndShowGUI() {
     JFrame frame = new SwingExample8();
     frame.pack();
     frame.setVisible(true);
     frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
 
   }
 
   public static void main(String[] args) {
     javax.swing.SwingUtilities.invokeLater(new Runnable() {
       public void run() {
         createAndShowGUI(); 
       }
     });
   }
 }
 

As you can see, the lambda syntax is very minimal. In addition, lambda syntax doesn't require that you declare externally accessible data as final, though it still cannot be changed (it must be effectively final). Java lambdas also add type inference so that you don't have to declare the types of arguments. The lambda features are implemented using the “invokedynamic” instruction to dispatch method calls, which was added in Java 7 to support dynamic languages. For example, let's take a simple example:

 public class Run8 {
   public static void main(String[] args) {
     Runnable r = () -> System.out.println("hello there");
     r.run();
   }
 }
 

If you decompile it using

 javap -p Run8
 

you will see that it generated a lambda$main$0 method, e.g.

 public class Run8 {
   public Run8();
   public static void main(java.lang.String[]);
   private static void lambda$main$0();
 }
 

You can see real byte code using

 javap -p -c Run8
 

and you will see new byte codes:

 public class Run8 {
   public Run8();
     Code:
        0: aload_0       
        1: invokespecial #1                  // Method java/lang/Object
        4: return        
 
   public static void main(java.lang.String[]);
     Code:
        0: invokedynamic #2,  0              // InvokeDynamic #0:run:()Ljava/lang/Runnable;
        5: astore_1      
        6: aload_1       
        7: invokeinterface #3,  1            // InterfaceMethod java/lang/Runnable.run:()V
       12: return        
 
   private static void lambda$main$0();
     Code:
        0: getstatic     #4                  // Field java/lang/System.out:Ljava/io/PrintStream;
        3: ldc           #5                  // String hello there
        5: invokevirtual #6                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
        8: return        
 }
 

This means lambdas don't have to keep a reference to the enclosing class, and “this” inside a lambda does not create a new scope.

Types of Functions

Java 8 provides predefined functions (see http://docs.oracle.com/javase/8/docs/api/java/util/function/package-summary.html); there are four major types:

                 Supplier: () -> T
                 Consumer: T -> ()
                 Predicate: T -> boolean
                 Function: T -> R
 

The supplier takes no arguments and produces an object, the consumer takes an argument for consumption, the predicate evaluates a given argument and returns true/false, and the function maps an argument of type T to an object of type R.

Supplier example

 import java.util.function.*;
 
 public class Supply8 {
   public static void main(String[] args) {
      Supplier<Double> random1 = Math::random;
     System.out.println(random1.get());
     //
     DoubleSupplier random2 = Math::random;
     System.out.println(random2.getAsDouble());
   }
 }
 

Note: Java 8 provides specialized functions for primitive types that you can use instead of the wrapper classes. Here is another example that shows how you can write efficient log messages:

 import java.util.function.*;
 
 public class SupplyLog {
   private static boolean debugEnabled; 
   public static void debug(Supplier<String> msg) {
     if (debugEnabled) {
       System.out.println(msg.get());
     }
   }
   public static void main(String[] args) {
     debug(() -> "this will not be printed");
     debugEnabled = true;
     debug(() -> "this will be printed");
   }
 }
 

Consumer example

 import java.util.function.*;
 
 public class Consume8 {
   public static void main(String[] args) {
     Consumer<String> consumer = s -> System.out.println(s);
     consumer.accept("hello there");
     consumer.andThen(consumer).accept("this will be printed twice");
   }
 }
 

Predicate example

 import java.util.function.*;
 
 public class Predicate8 {
   public static void main(String[] args) {
     Predicate<Integer> gradeA = score -> score >= 90;
     System.out.println(gradeA.test(80));
     System.out.println(gradeA.test(90));
   }
 }
 

In addition to the test method, you can also use the and, negate, and or methods to combine predicates.

Function example

 import java.util.function.*;
 
 public class Function8 {
   public static void main(String[] args) {
     BinaryOperator<Integer> adder = (n1, n2) -> n1 + n2;
     System.out.println("sum " + adder.apply(4, 5));
      Function<Double,Double> square = x -> x * x;
     System.out.println("square " + square.apply(5.0));
   }
 }
 

Custom Functions

In addition to the predefined functions, you can define your own interfaces for functions as long as a single abstract method is declared. You can optionally annotate the interface with @FunctionalInterface so that the compiler can verify it, e.g.

 public class CustomFunction8 {
   @FunctionalInterface
   interface Command<T> {
     void execute(T obj);
   }
 
   private static <T> void invoke(Command<T> cmd, T arg) {
     cmd.execute(arg);
   }
 
 
   public static void main(String[] args) {
     Command<Integer> cmd = arg -> System.out.println(arg);
     invoke(cmd, 5);
   }
 }
 

Method Reference

In addition to passing lambdas, you can also pass instance or static methods as closures using method references. There are four kinds of method references, illustrated after the list below:

  • Reference to a static method ContainingClass::staticMethodName
  • Reference to an instance method of a particular object ContainingObject::instanceMethodName
  • Reference to an instance method of an arbitrary object of a particular type ContainingType::methodName
  • Reference to a constructor ClassName::new
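Here is a quick example of each kind:

 import java.util.function.BiFunction;
 import java.util.function.Function;
 import java.util.function.Supplier;
 
 public class MethodRefs {
   public static void main(String[] args) {
     // 1. Static method: ContainingClass::staticMethodName
     Function<String, Integer> parse = Integer::parseInt;
     // 2. Instance method of a particular object
     String greeting = "hello";
     Supplier<Integer> len = greeting::length;
     // 3. Instance method of an arbitrary object of a particular type
     BiFunction<String, String, Boolean> eq = String::equalsIgnoreCase;
     // 4. Constructor reference: ClassName::new
     Supplier<StringBuilder> builder = StringBuilder::new;
 
     System.out.println(parse.apply("42"));        // 42
     System.out.println(len.get());                // 5
     System.out.println(eq.apply("Java", "JAVA")); // true
     System.out.println(builder.get().append("ok"));
   }
 }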

Streams

In addition to lambda support, Java 8 has updated the collection classes to support streams. Streams don't really store anything; they behave as pipes for lazy computation. Collections have a finite size, but streams can be unlimited, and they can only be consumed once. Streams can be obtained from collections using the stream() and parallelStream() methods or from an array via Arrays.stream(Object[]). There are also static factory methods on the stream classes, such as Stream.of(Object[]), IntStream.range(int, int), etc. Here are common intermediate methods, used as pipes, that you can invoke on streams:

  • filter()
  • distinct()
  • limit()
  • map()
  • peek()
  • sorted()
  • unordered()

Of the above, sorted and distinct are stateful, whereas filter and map are stateless (limit is short-circuiting and stateful). And here are terminal operations that trigger evaluation of streams:

  • findFirst()
  • min()
  • max()
  • reduce()
  • sum()

You can implement your own streams by using the helper methods in the StreamSupport class, e.g.
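For example, you can wrap any Iterable's spliterator into a stream using StreamSupport.stream, whose second argument controls whether the stream is parallel:

 import java.util.Arrays;
 import java.util.stream.Stream;
 import java.util.stream.StreamSupport;
 
 public class CustomStream {
   public static void main(String[] args) {
     Iterable<String> iterable = Arrays.asList("AAPL", "MSFT", "ORCL");
     // false = sequential stream; true would request a parallel stream
     Stream<String> stream = StreamSupport.stream(iterable.spliterator(), false);
     stream.forEach(System.out::println);
   }
 }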

Iterating

Java 8 streams support the forEach method for iterating, which takes a consumer function, e.g.

 import java.util.stream.Stream;
 
 public class StreamForEach {
   public static void main(String[] args) {
     Stream<String> symbols = Stream.of("AAPL", "MSFT", "ORCL", "NFLX", "TSLA");
     symbols.forEach(System.out::println);
   }
 }
 

Parallel iteration:

 import java.util.Arrays;
 import java.util.List;
 import java.util.stream.Stream;
 
 public class ParStreamForEach {
   public static void main(String[] args) {
     List<String> symbols = Arrays.asList("AAPL", "MSFT", "ORCL", "NFLX", "TSLA");
     System.out.println("unordered");
     symbols.parallelStream().forEach(System.out::println);
     System.out.println("ordered");
     symbols.parallelStream().forEachOrdered(System.out::println);
   }
 }
 

Note that iterating a parallel stream is unordered by default, but you can force ordered iteration by using the forEachOrdered method instead of forEach.

Filtering

We already saw predicate functions; filtering support in streams allows extracting the elements of a collection that evaluate true for a given predicate. Let's create a couple of classes that we will use later:

 import java.util.ArrayList;
 import java.util.Collection;
 import java.util.function.IntPredicate;
 
 
 public class Game implements IntPredicate {
   enum Type {
     AGE_ALL,
     AGE_13_OR_ABOVE,
     AGE_18_OR_ABOVE
   }
 
   public final String name;
   public final Type type;
   public final Collection<Player> players = new ArrayList<>();
 
 
   public Game(String name, Type type) {
     this.name = name;
     this.type = type;
   }
 
   public boolean suitableForAll() {
     return type == Type.AGE_ALL;
   }
   public void add(Player player) {
     if (test(player.age)) {
       this.players.add(player);
     }
   }
 
   @Override
   public boolean test(int age) {
     switch (type) {
       case AGE_18_OR_ABOVE:
         return age >= 18;
       case AGE_13_OR_ABOVE:
         return age >= 13;
       default:
         return true;
     }
   }
   @Override 
   public String toString() {
     return name;
   }
 }
 
 public class Player {
   public final String name;
   public final int age;
   public Player(String name, int age) {
     this.name = name;
     this.age = age;
   }
   @Override 
   public String toString() {
     return name;
   }
 }
 

Now let’s create a class that will filter games by types:

 import java.util.Arrays;
 import java.util.Collection;
 import static java.util.stream.Collectors.*;
 import java.util.stream.Stream;
 
 
 public class GameFilter {
   public static void main(String[] args) {
      Collection<Game> games = Arrays.asList(new Game("Birdie", Game.Type.AGE_ALL), new Game("Draw", Game.Type.AGE_ALL), new Game("Poker", Game.Type.AGE_18_OR_ABOVE), new Game("Torpedo", Game.Type.AGE_13_OR_ABOVE));
      Collection<Game> suitableForAll = games.stream().filter(Game::suitableForAll).collect(toList());
      System.out.println("suitable for all");
      suitableForAll.stream().forEach(System.out::println);
      Collection<Game> adultOnly = games.stream().filter(game -> game.type == Game.Type.AGE_18_OR_ABOVE).limit(10).collect(toList());
     System.out.println("suitable for adults only");
     adultOnly.stream().forEach(System.out::println);
   }
 }
 

As you can see, filter can accept a lambda or a method reference.

Map

The map operation on streams applies a given function to transform each element in the stream and produces another stream with the transformed elements.

 import java.util.Arrays;
 import java.util.Collection;
 import static java.util.stream.Collectors.*;
 import java.util.stream.Stream;
 
 
 public class GameMap {
   public static void main(String[] args) {
      Collection<Game> games = Arrays.asList(new Game("Birdie", Game.Type.AGE_ALL), new Game("Draw", Game.Type.AGE_ALL), new Game("Poker", Game.Type.AGE_18_OR_ABOVE), new Game("Torpedo", Game.Type.AGE_13_OR_ABOVE));
     Collection<Player> players = Arrays.asList(new Player("John", 10), new Player("David", 15), new Player("Matt", 20), new Player("Dan", 30), new Player("Erica", 5));
     for (Game game : games) {
       for (Player player : players) {
         game.add(player);
       }
     }
     //
      Collection<Game.Type> types = games.stream().map(game -> game.type).collect(toList());
     System.out.println("types:");
     types.stream().forEach(System.out::println);
     Collection<Player> allPlayers = games.stream().flatMap(game -> game.players.stream()).collect(toList());
     System.out.println("\nplayers:");
      allPlayers.stream().forEach(System.out::println);
   }
 }
 
 

Note that flatMap takes a collection of objects for each input element, flattens them, and produces a single stream. Java streams also provide map methods for primitive types, such as mapToLong, mapToDouble, etc.

Sorting

Previously, you had to implement the Comparable interface or provide a Comparator for sorting, but now you can pass a lambda for comparison, e.g.

 import java.util.Arrays;
 import java.util.List;
 import static java.util.stream.Collectors.*;
 import java.util.stream.Stream;
 import java.util.Comparator;
 
 
 public class GameSort {
   public static void main(String[] args) {
     List<Player> players = Arrays.asList(new Player("John", 10), new Player("David", 15), new Player("Matt", 20), new Player("Dan", 30), new Player("Erica", 5));
     players.sort(Comparator.comparing(player -> player.age));
     System.out.println(players);
   }
 }
 

Min/Max

Java streams provide helper methods for calculating min/max, e.g.

 import java.util.Arrays;
 import java.util.Collection;
 import static java.util.stream.Collectors.*;
 import java.util.stream.Stream;
 import java.util.Comparator;
 
 
 public class GameMinMax {
   public static void main(String[] args) {
     Collection<Player> players = Arrays.asList(new Player("John", 10), new Player("David", 15), new Player("Matt", 20), new Player("Dan", 30), new Player("Erica", 5));
     Player min = players.stream().min(Comparator.comparing(player -> player.age)).get();
     Player max = players.stream().max(Comparator.comparing(player -> player.age)).get();
     System.out.println("min " + min + ", max " + max);
   }
 }
 
 

Reduce/Fold

Reduce or fold generalizes the problem of computing a single value from a collection, e.g.

 import java.util.Arrays;
 import java.util.Collection;
 import static java.util.stream.Collectors.*;
 import java.util.stream.Stream;
 
 
 public class GameReduce {
   public static void main(String[] args) {
     Collection<Player> players = Arrays.asList(new Player("John", 10), new Player("David", 15), new Player("Matt", 20), new Player("Dan", 30), new Player("Erica", 5));
     double averageAge1 = players.stream().mapToInt(player -> player.age).average().getAsDouble();
     double averageAge2 = players.stream().mapToInt(player -> player.age).reduce(0, Integer::sum) / players.size();
     double averageAge3 = players.stream().mapToInt(player -> player.age).reduce(0, (sum, age) -> sum + age) / players.size();
     System.out.println("average age " + averageAge1 + ", " + averageAge2 + ", or " + averageAge3);
   }
 }
 

Grouping/Partitioning

The groupingBy method of Collectors allows grouping a collection, e.g.

 import java.util.Arrays;
 import java.util.Collection;
 import java.util.List;
 import java.util.Map;
 import static java.util.stream.Collectors.*;
 import java.util.stream.Stream;
 
 
 public class GameGrouping {
   public static void main(String[] args) {
     Collection<Player> players = Arrays.asList(new Player("John", 10), new Player("David", 15), new Player("Matt", 20), new Player("Dan", 30), new Player("Erica", 5));
     Map<Integer, List<Player>> playersByAge = players.stream().collect(groupingBy(player -> player.age));
     System.out.println(playersByAge);
   }
 }
 

partitioningBy splits a collection into two groups based on a predicate, e.g.

 import java.util.Arrays;
 import java.util.Collection;
 import java.util.List;
 import java.util.Map;
 import static java.util.stream.Collectors.*;
 import java.util.stream.Stream;
 
 
 public class GamePartition {
   public static void main(String[] args) {
     Collection<Player> players = Arrays.asList(new Player("John", 10), new Player("David", 15), new Player("Matt", 20), new Player("Dan", 30), new Player("Erica", 5));
      Map<Boolean, List<Player>> playersByAge = players.stream().collect(partitioningBy(player -> player.age >= 18));
     System.out.println(playersByAge);
   }
 }
 

String joining

Here is an example of creating a string from collection:

 import static java.util.stream.Collectors.*;
 import java.util.stream.Stream;
 
 
 public class GameJoin {
   public static void main(String[] args) {
     Stream<Player> players = Stream.of(new Player("John", 10), new Player("David", 15), new Player("Matt", 20), new Player("Dan", 30), new Player("Erica", 5));
     System.out.println(players.map(Player::toString).collect(joining(",", "[", "]")));
   }
 }
 

Lazy Evaluation

The Map interface now supports lazy computation, adding an object to the map when the key is not present:

 import java.util.HashMap;
 import java.util.Map;
 
 public class Fib {
   private final static Map<Integer,Long> cache = new HashMap<Integer, Long>() {{
     put(0,0L);
     put(1,1L);
   }};
   public static long fib(int x) {
     return cache.computeIfAbsent(x, n -> fib(n-1) + fib(n-2));
   }
 
   public static void main(String[] args) {
     System.out.println(fib(10));
   }
 }
 

Parallel Streams

Streams process elements serially by default, but you can switch from a collection's stream() method to parallelStream() to take advantage of parallel processing. Parallel streams use the common ForkJoinPool by default and use as many threads as you have processors (Runtime.getRuntime().availableProcessors()), e.g.

 import java.util.Arrays;
 import java.util.List;
 import java.util.stream.IntStream;
 import java.util.stream.Stream;
 
 public class ParStreamPrime {
   private static boolean isPrime(int n) {
     if (n%2==0) return false;
     for(int i=3;i*i<=n;i+=2) {
       if(n%i==0) return false;
     }
     return true;
   }
   private static long serialTest(int max) {
     long started = System.currentTimeMillis();
     IntStream.rangeClosed(1, max).forEach(num -> isPrime(num));
     return System.currentTimeMillis() - started;
   }
   private static long parallelTest(int max) {
     long started = System.currentTimeMillis();
     IntStream.rangeClosed(1, max).parallel().forEach(num -> isPrime(num));
     return System.currentTimeMillis() - started;
   }
   //
   public static void main(String[] args) {
     int max = 1000000;
     System.out.println("Serial " + serialTest(max));
     System.out.println("Parallel " + parallelTest(max));
   }
 }
 

If you need to customize the thread-pool size, you can run the parallel stream inside your own fork-join pool, e.g.

   private static void parallelTest(final int max) throws Exception {
     ForkJoinPool forkJoinPool = new ForkJoinPool(2);
     forkJoinPool.submit(() ->
        IntStream.rangeClosed(1, max).parallel().forEach(num -> isPrime(num))
     ).get();
   }
 

Default methods and Mixins

Anyone who has to maintain interfaces used by multiple clients knows the frustration of adding new methods, because it requires updating all implementations. You can now add default methods and static methods to interfaces, e.g.

 interface Vehicle {
   float getMaxSpeedMPH();
   public static String getType(Vehicle v) {
     return v.getClass().getSimpleName();
   }
 }
 
 interface Car extends Vehicle {
   void drive();
   public default float getMaxSpeedMPH() {
     return 200;
   }
 }
 interface Boat extends Vehicle {
   void row();
   public default float getMaxSpeedMPH() {
     return 100;
   }
 }
 
 interface Plane extends Vehicle {
   void fly();
   public default float getMaxSpeedMPH() {
     return 500;
   }
 }
 
 public class AmphiFlyCar implements Car, Boat, Plane {
   @Override
   public void drive() {
     System.out.println("drive");
   }
   @Override
   public void row() {
     System.out.println("row");
   }
   @Override
   public void fly() {
     System.out.println("fly");
   }
   public float getMaxSpeedMPH() {
     return Plane.super.getMaxSpeedMPH();
   }
   public static void main(String[] args) {
     AmphiFlyCar v = new AmphiFlyCar();
     System.out.println(Vehicle.getType(v) + ": " + v.getMaxSpeedMPH());
   }
 }
 

Optional

One of the biggest banes of Java applications is NullPointerException; you can get rid of many of them using Optional, which acts like the Maybe monad in other languages, e.g.

 import java.util.HashMap;
 import java.util.Map;
 import static java.util.stream.Collectors.*;
 import java.util.stream.Stream;
 import java.util.Optional;
 
 
 public class OptionalExample {
   private static Map<String, Player> players = new HashMap<String, Player>() {{
     put("John", new Player("John", 10));
     put("David", new Player("David", 15));
     put("Matt", new Player("Matt", 20));
     put("Erica", new Player("Erica", 25));
   }};
 
   private static Optional<Player> findPlayerByName(String name) {
     Player player = players.get(name);
     return player == null ? Optional.empty() : Optional.of(player);
   }
 
   private static Integer getAge(Player player) {
     return player.age;
   }
 
 
   public static void main(String[] args) {
     findPlayerByName("John").ifPresent(System.out::println);
     Player player = findPlayerByName("Jeff").orElse(new Player("Jeff", 40));
     System.out.println("orElse " + player);
     Integer age = findPlayerByName("Jeff").map(OptionalExample::getAge).orElse(-1);
     System.out.println("Jeff age " + age);
   }
 }
 

CompletableFuture

Java has long supported futures, but previously you had to call the blocking get() to retrieve the result. With Java 8, you can use CompletableFuture to define the behavior to run when asynchronous processing completes, e.g.

 private static final ExecutorService executor = Executors.newFixedThreadPool(2);
 public static CompletableFuture<Player> getPlayer(String name) {
    // stubbed lookup for illustration
    return CompletableFuture.supplyAsync(() -> new Player(name, 0), executor);
 }
 getPlayer("name").thenAccept(player -> System.out.println(player));
 
 

Summary

Over the past few years, functional programming languages have become mainstream, and Java 8 brings many of those capabilities, despite being late to the party. These new features help you write better and more concise code. You will have to change existing code to make more use of immutable objects and to use streams where possible. As far as stability goes, I found the Java 8 compiler on Linux a bit buggy; it crashed often, so it may take a little while before Java 8 can be used in production.


September 12, 2013

NFJS Seattle 2013 Review

Filed under: Computing — admin @ 7:38 pm

I attended the NFJS conference over the weekend. It was a short conference, from Friday, Sep 6 to Sunday, Sep 8, with a number of sessions on Java 8, Javascript, mobile, and functional programming. Here are some of the sessions that I enjoyed:

Java 8 Language Capabilities

This session gave a brief overview of new features of Java 8 mainly new closure/lambda syntax. Venkat is a great speaker and he was coding live throughout the session.

Concurrency without Pain in Pure Java

This was related to the first talk by Venkat Subramaniam and covered a number of concurrency patterns such as Actors and STM.

Functional SOLID

In this talk, Matt Stine gave an overview of the SOLID principles and how functional programming makes it easier to apply them. This was a more abstract talk and didn't go into examples of those patterns.

Programming with Immutability

This was a more practical talk by Matt Stine; he gave examples of immutability and functional programming in Java and Groovy with live coding. He mentioned a number of tools that make it easier to build mutable and immutable pairs of classes and showed how immutable classes can be used with frameworks such as Hibernate.

Rich Web Apps with Angular

This was a short introduction to Angular by Raju Gandhi.

Vagrant: Virtualized Development Environments Made Simple

This was a great introduction to Vagrant and how to set up complete development, testing, and production environments on your local desktop.

Simulation Testing with Simulant

This was a short introduction to the Simulant testing library, which Stuart Halloway has been using to test the Datomic database. The library can be used for functional and load testing; it saves all data in a Datomic database and can be populated from existing data.

Generative Testing

This was another session on a testing framework by Stuart Halloway. The framework provides great support for generating test data and is somewhat similar to QuickCheck, though it doesn't offer reduction (shrinking) of failing cases.

Summary

I didn't go to OSCON this year and enjoyed the smaller venue that NFJS provided. There were a couple of dud sessions, but overall I enjoyed it.


August 8, 2012

Back from OSCON 2012

Filed under: Computing — admin @ 5:58 pm

I went back to OSCON last month (July 2012), which was held in Portland again. I saw the biggest crowd this year; over 3000 folks attended the conference. Here are some of the tutorials and sessions I attended:

R

I attended the R tutorial on the first day, which I found quite informative. There has been a lot of interest in data science, and R is a great tool for statistical analysis and graphs.

Scala

On the second half of Monday, I attended the Scala Koans session, which was disappointing. First, they could not get us started because the WiFi died on us and they didn't have much of a backup plan. We finally started after waiting for an hour and were then mostly left on our own to finish the koans; I could have done that myself by downloading them.

Android-Fu

On Tuesday, I attended Android-Fu, which was somewhat useful. I have done quite a bit of Android development at work, but I still picked up a few pointers.

Android Testing

The second half of Tuesday, I attended Android Testing, which was useful, but the presenter was incredibly boring and I had a hard time staying awake. [Slides]

Go

The real conference started on Wednesday, and I attended the Go session, which was mostly about the governance behind the Go language.

Storm

I then attended a session on Storm, which was informative; it described the differences between Hadoop and Storm and showed some actual code.

MongoDB

I then attended a session on Running MongoDB for High Availability, which showed some of the weak areas and lessons learned while scaling MongoDB. [Slides]

Apache Zookeeper

This session was also very informative and showed various uses of Zookeeper.

Disruptor

I then attended a session on Disruptor, which is an incredibly fast concurrency framework. [Slides]

Building Functional Hybrid Apps For The iPhone And Android

I then attended Building Functional Hybrid Apps For The iPhone And Android, which was mostly a marketing talk for the WebSphere IDE; the speakers ignored the value of PhoneGap (Apache Cordova), which was behind their demo.

Node.js

On Thursday, I attended a session on Node.js, which was a nice introduction with some code samples. [Slides]

I attended another Node.js session, Node.js in Production: Postmortem Debugging and Performance Analysis, which showed a number of ways to debug your Node.js applications. [Slides]

Advanced MySQL Replication Architectures

This was also a very informative session, and you can take a look at the slides.

The Art of Organizational Manipulation

This was an entertaining talk about how to build influence in the workplace.

High Performance Network Programming on the JVM

This was another informative talk about building network applications in Java; you would find the slides very helpful.

Summary

I found very few sessions on mobile this year; there were a couple of sessions on Android, and I wanted to see more. There were a lot of vendor-sponsored sessions on private clouds, and many vendors in the exhibition hall were promoting frameworks such as OpenStack, CloudStack, Eucalyptus, and others. There were also quite a few sessions on distributed frameworks such as Hadoop, HBase, MongoDB, Zookeeper, Storm, Disruptor, etc., which I enjoyed. I didn't see as many sessions on functional languages as in the past, and I wanted to see some sessions on Clojure (which were cancelled). A few sessions covered NoSQL databases such as HBase, MongoDB, and Cassandra. There was a lot of enthusiasm for HTML5 again, and a lot of those sessions were sold out; having attended similar sessions in the past, I skipped most of them. Overall, I had a good time, but I am not sure if I will be back next year. You can read the presentation slides at http://www.oscon.com/oscon2012/public/schedule/proceedings.

June 1, 2011

Deploying Rails 3.0 App on Amazon EC2

Filed under: EC2 — admin @ 12:56 pm

It's been a few years since I wrote a short HOW-TO on working with EC2, but recently I migrated the backend of Trading Floor, a Facebook and iOS game I have been developing, to EC2. So I am documenting the steps for setting up EC2 for Rails 3.0.

Pre-requisites

I assume you have already signed up for EC2; otherwise, go to http://aws.amazon.com/ec2/ to sign up. Also, you will need Java 5.0 or above, which you can download from Oracle.

Download EC2 API Tools

First, download the EC2 API tools from http://developer.amazonwebservices.com/connect/entry.jspa?externalID=351 and uncompress them in your home directory.

Create a X.509 Certificate

Next, create an X.509 certificate from the AWS Account section. You can then download your certificate and private key; I saved them in a .ec2 directory under my home directory. Note that you will not be able to download the key again, so don't lose it.

Environment Variables

Next, I set the following variables in my shell:

 export EC2_HOME=~/ec2-api-tools-1.4.2.4
 export PATH=$PATH:$EC2_HOME/bin
 export EC2_KEY_DIR=~/.ec2
 export EC2_PRIVATE_KEY=$EC2_KEY_DIR/pk-HFG55OCFPZARA6YHW5JGIE6JFD7EQE72.pem
 export EC2_CERT=$EC2_KEY_DIR/cert-HFG55OCFPZARA6YHW5JGIE6JFD7EQE72.pem
 

Where EC2_PRIVATE_KEY and EC2_CERT point to the X.509 key and certificate I downloaded from Amazon.

Create a Key-Pair

Then I created a key-pair as:

 ec2-add-keypair plexobject
 

Create a Security Group

I then created a security group for the server:

 ec2-add-group web -d 'Web Server'
 ec2-authorize web -P tcp -p 22 -s 0.0.0.0/0
 ec2-authorize web -P tcp -p 80 -s 0.0.0.0/0
 ec2-authorize web -P tcp -p 443 -s 0.0.0.0/0
 

Finding a basic Ubuntu based AMI

Previously I used S3-based AMIs, but Amazon now supports EBS-based AMIs, which have the advantage that any changes to the root volume survive instance stops and restarts. I launched an EC2 instance with a basic Ubuntu 11.04 Natty image from http://alestic.com/ as:

 ec2-run-instances ami-06ad526f --instance-count 1 --instance-type m1.small \
 --key plexobject --group web -z us-east-1d -m
 

Where -z specifies the availability zone and -m turns on monitoring.

Installing Ubuntu Packages

I then proceeded to install basic packages such as Java, curl, Git, build tools, Ruby (1.8.7), and Rails (3.0.3) based on the Rails/Ubuntu docs:

 sudo apt-get install openjdk-6-jdk
 sudo apt-get install mercurial
 sudo apt-get install curl git-core build-essential zlib1g-dev libssl-dev libreadline5-dev
 sudo apt-get install libcurl4-openssl-dev
 sudo apt-get install ruby
 sudo apt-get install rubygems1.8
 sudo gem install rubygems-update
 sudo update_rubygems
 sudo gem install rails
 

I then edited /etc/profile and /etc/bash.bashrc and added these environment variables:

 export PATH=$PATH:/var/lib/gems/1.8/bin/
 export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk/
 

Next, I installed Sqlite and Mysql:

 sudo apt-get install sqlite3 libsqlite3-dev
 sudo gem install sqlite3-ruby
 sudo apt-get install mysql-server mysql-client
 sudo apt-get install libmysql-ruby libmysqlclient-dev
 

Next, I installed Apache and Passenger:

 sudo apt-get install apache2 apache2-mpm-prefork apache2-prefork-dev
 sudo apt-get install apache2-dev libapr1-dev libaprutil1-dev
 sudo gem install passenger
 sudo /var/lib/gems/1.8/gems/passenger-3.0.7/bin/passenger-install-apache2-module*
 

I then edited /etc/apache2/apache2.conf and added:

 LoadModule passenger_module /var/lib/gems/1.8/gems/passenger-3.0.7/ext/apache2/mod_passenger.so
 PassengerRoot /var/lib/gems/1.8/gems/passenger-3.0.7
 PassengerRuby /usr/bin/ruby1.8
 

and then restarted apache

 /etc/init.d/apache2 restart
 

Creating EBS Volume for Data

Next, I created an EBS volume to store all data such as database tables and Rails application from the AWS Console and then attached it to the instance as:

 ec2-stop-instances i-73ab181d 
 ec2-attach-volume  vol-612eaa0a -i i-73ab181d -d /dev/sdf
 ec2-start-instances i-73ab181d 
 

Note that you have to create the EBS volume in the same availability zone as your instance. I then logged into my machine using

 ssh -i plexobject.pem ubuntu@ec2-50-19-134-251.compute-1.amazonaws.com
 

and then formatted the newly built EBS volume as:

 sudo fdisk -l
 sudo mkfs -t ext4 /dev/xvdf
 

I then edited /etc/fstab and added

 /dev/xvdf       /data   auto    defaults,nobootwait,noatime     0       0
 

and then rebooted machine

 sudo reboot
 

Moving Mysql Data Directory

MySQL installs its data directory on the root volume at /var/lib/mysql, which I wanted to move to the newly created volume. So I created a /data/mysql directory and stopped MySQL:

 sudo /etc/init.d/mysql stop
 

I then copied the mysql data directory as follows:

 sudo cp -R -p /var/lib/mysql/mysql /data/mysql
 sudo chown -R mysql:mysql /data/mysql/
 

I didn't copy the entire MySQL data directory, only the mysql subdirectory. Next, I edited /etc/mysql/my.cnf and changed datadir to the /data/mysql directory, and then edited /etc/apparmor.d/usr.sbin.mysqld and changed all occurrences of /var/lib/mysql to /data/mysql. Finally, I reloaded the AppArmor profiles as:

 sudo /etc/init.d/apparmor reload
 

Then restarted mysql:

 sudo /etc/init.d/mysql restart
 

I changed my root password and created a local mysql user as

 mysql> SET PASSWORD FOR 'root'@'localhost' = PASSWORD('mypass');
 mysql> GRANT ALL PRIVILEGES ON *.* TO 'tfuser'@'localhost' IDENTIFIED BY 'mypass' WITH GRANT OPTION;
 mysql> GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, INDEX, ALTER, CREATE TEMPORARY TABLES, LOCK TABLES ON dbname.* TO 'tfuser'@'localhost' IDENTIFIED BY 'mypass';
 

I copied my app to /data/trading_floor and changed the ownership of all files to www-data:

 sudo chown -R www-data:www-data /data/trading_floor
 

Then I created /etc/apache2/sites-available/tf with:

 
 <VirtualHost *:80>
         ServerAdmin shahbhat@gmail.com
         ServerName tf.plexobject.com
         DocumentRoot /data/trading_floor/public/
         <Directory /data/trading_floor/public/>
                 AllowOverride all
                 Options -MultiViews
                 RailsEnv production
         </Directory>
 </VirtualHost>
 
 

Finally, I restarted apache

 /etc/init.d/apache2 restart
 

Creating Elastic IP address

I wanted a permanent IP address for the server so I created EIP using AWS Console. Then associated my instance with the new IP address:

 ec2-associate-address 50.19.248.7 -i i-73ab181d
 

This rebooted the machine with the new IP address. I then changed my DNS zone and pointed tf.plexobject.com to 50.19.248.7 (this may take hours to propagate). Next, I changed my Facebook app's configuration and iOS app's configuration to point to tf.plexobject.com.

Creating my EBS Image

Once I was happy with the server configuration, I created an EBS image for future use. First, I detached the data volume and then created the image as follows:

 ec2-stop-instances i-f97ca197
 ec2-detach-volume vol-3f65d954
 ec2-create-image i-f97ca197 -n tf-20110601 -d 'Trading Floor Application Server'
 

I terminated my previous instance as

 ec2-terminate-instances -i  i-f97ca197 
 

and created an instance with the new image

 ec2-run-instances ami-80837ae9 --instance-count 1 --instance-type m1.small --key tf --group web -z us-east-1d -m
 

After the launch, you would have to reattach the data volume

 ec2-stop-instances i-73ab181d 
 ec2-attach-volume  vol-612eaa0a -i i-73ab181d -d /dev/sdf
 ec2-start-instances i-73ab181d 
 

Summary

Voila, I had my game application running on the cloud. In order to cut the per-hour cost, I reserved instances for the entire year. I am not quite done with my server; I am now working on application-specific configuration and adding better monitoring and backup. As we learned from the recent Amazon cloud outage, deploying your app on the cloud is only half the work; making it performant, scalable, and fault tolerant is the other half, and that is still manual work. Finally, I plan to release the Facebook app for Trading Floor and submit the iOS app in a couple of weeks; be sure to try it and send me your suggestions.

September 22, 2010

An implementation of Virtual Node Router with Consistent Hash algorithm

Filed under: Java — admin @ 1:02 pm

Since the Dynamo paper was published a few years ago, DHTs and consistent hashing have become mainstream. Here is my implementation of a virtual node router that uses the consistent hash algorithm to route requests to virtual nodes:

 import java.nio.ByteBuffer;
 import java.security.MessageDigest;
 import java.util.Comparator;
 import java.util.Iterator;
 import java.util.Map;
 import java.util.SortedMap;
 import java.util.TreeMap;
 
 public class VirtualNodeRouter {
     interface HashCalculator {
         long calculateHash(String key);
     }
 
     private static class VirtualNode {
         final String nodeName;
         final int replicaNumber;
 
         VirtualNode(final String nodeName, final int replicaNumber) {
             this.nodeName = nodeName.toLowerCase();
             this.replicaNumber = replicaNumber;
         }
 
         boolean matches(String host) {
             return nodeName.equalsIgnoreCase(host);
         }
 
         @Override
         public String toString() {
             return nodeName + ":" + replicaNumber;
         }
     }
 
     private final HashCalculator hashFunction;
     private final SortedMap<Long, VirtualNode> virtualNodePoolByHash = new TreeMap<Long, VirtualNode>(
             new Comparator<Long>() {
                 public int compare(Long i, Long j) {
                     if (i > j) {
                         return 1;
                     } else if (i < j) {
                         return -1;
                     } else {
                         return 0;
                     }
                 }
             });
 
     public VirtualNodeRouter() {
         this(new HashCalculator() {
             public long calculateHash(String key) {
                 try {
                     MessageDigest sha1 = MessageDigest.getInstance("SHA1");
                     sha1.update(key.getBytes());
                     byte[] digest = sha1.digest();
                     return bytesToLong(digest);
                 } catch (Exception e) {
                     throw new RuntimeException(e);
                 }
             }
         });
     }
 
     public VirtualNodeRouter(final HashCalculator f) {
         this.hashFunction = f;
     }
 
     /**
      * Adds a node with one replica
      * 
      * @param node
      *            - node name
      */
     public void add(String node) {
         add(node, 1);
     }
 
     /**
      * Adds a node to the available pool
      * 
      * @param node
      *            - node name
      * @param replicas
      *            - # of replicas - increase # of replicas based on the
      *            computing power of the machine
      */
     public void add(String node, int replicas) {
         // Note: You can call this method incrementally by adding more replicas,
         // so that you don't cause DOS on
         // your own services
         int existingReplicas = getReplicas(node);
 
         for (int i = 0; i < replicas; i++) {
             VirtualNode virtualNode = new VirtualNode(node, i
                     + existingReplicas);
             virtualNodePoolByHash.put(hashFunction.calculateHash(virtualNode
                     .toString()), virtualNode);
         }
     }
 
     /**
      * remove the node from available pool
      * 
      * @param node
      */
     public void remove(String node) {
         Iterator<Long> it = virtualNodePoolByHash.keySet().iterator();
         while (it.hasNext()) {
             Long key = it.next();
             VirtualNode virtualNode = virtualNodePoolByHash.get(key);
             if (virtualNode.matches(node)) {
                 it.remove();
             }
         }
     }
 
    public String getNode(String key) {
        if (virtualNodePoolByHash.isEmpty()) {
            return null;
        }
        long hash = hashFunction.calculateHash(key);
        // find the first virtual node at or after the key's hash on the
        // ring, wrapping around to the lowest hash if we fall off the end
        SortedMap<Long, VirtualNode> tailMap = virtualNodePoolByHash.tailMap(hash);
        hash = tailMap.isEmpty() ? virtualNodePoolByHash.firstKey() : tailMap.firstKey();
        return virtualNodePoolByHash.get(hash).nodeName;
    }
 
     public void dump() {
         for (Map.Entry<Long, VirtualNode> e : virtualNodePoolByHash.entrySet()) {
             System.out.println("  " + e.getKey() + " => " + e.getValue());
         }
     }
 
     public int getReplicas(String nodeName) {
         int replicas = 0;
         for (VirtualNode node : virtualNodePoolByHash.values()) {
             if (node.matches(nodeName)) {
                 replicas++;
             }
         }
         return replicas;
     }
 
     private static long bytesToLong(byte[] b) {
         ByteBuffer bb = ByteBuffer.wrap(b);
         return bb.getLong();
     }
 }
 

The virtual nodes are added by specifying the node name and the number of replicas; you can add more virtual nodes for the more powerful machines than for the low-end machines. You call the remove method when a node goes down or becomes unavailable, and getNode when a request needs to be routed. For example, if you are caching the results of a service, you can use the cache key to find the virtual node and then store the cached value on that node.
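
Here is a minimal usage sketch; the node names and replica counts below are made up for illustration:

 VirtualNodeRouter router = new VirtualNodeRouter();
 router.add("cache1.example.com", 100);      // powerful machine, more replicas
 router.add("cache2.example.com", 50);       // low-end machine, fewer replicas
 String node = router.getNode("user:1234");  // pick the node for this cache key
 System.out.println("store on " + node);
 router.remove("cache2.example.com");        // node went down, its keys remap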



September 16, 2010

My impression of Diaspora codebase

Filed under: Computing — admin @ 5:07 pm

I briefly reviewed the Diaspora source code, which is all written in Ruby on Rails 3.0. Here are my initial thoughts on the codebase:

What I liked:

  • It’s all open source — YES!
  • Diaspora uses the latest Rails 3.0 APIs, such as global respond_to in controllers, thin controllers, and fat models with the new query APIs. The code uses RSpec for tests and Factory Girl for creating test objects instead of fixtures, which are much more easily managed. There are a few Selenium tests, but most of the tests are in RSpec.
  • Diaspora is built using latest technologies and standards such as PKI (OpenSSL), HTML5, VCard, Websockets, Microformat, XRDS, PubSubHubbub along with popular libraries such as EventMachine, HAML, jQuery, Fancybox, Sprinkle, Bundler, Blueprint, etc.
  • Deployment scripts and documentation – Though there isn't any documentation on the overall architecture or comments in the code, I found the installation documentation very helpful. Also, the deployment rake tasks and configurations are all included in the source code.

What I disliked:

Though I found the code fairly easy to read and consistent in style, I found the following problems related to performance, scalability, and modularity:

  • Service API – Diaspora uses Rails controllers for serving HTML as well as JSON, XML, and XRDS requests, but I would have preferred separate services for the API with a much more defined contract.
  • Pagination – I found sporadic use of pagination in the code, but a number of classes use Rails' built-in relationships without any pagination. It's like asking for a banana and getting the gorilla holding the banana. In my experience, this has been a problem with all O/R mapping tools, which give you nice syntax for fetching related objects until your server runs out of memory or your database dies on you.
  • Before/after filters – I found a number of such filters in the code, which I have found to be another common scalability issue when the filters require a lot of overhead.
  • Asynchronous messaging – I didn't see any use of asynchronous messaging, even though a lot of requests, such as adding or removing a friend, can be handled asynchronously.
  • Modularity – Diaspora uses modules and models for the business logic, but I found a couple of models, such as user, to be too big; they could be divided into suitable modules.
  • MongoDB – I have nothing against MongoDB, but I found that a lot of code depends on MongoDB directly. I would have preferred a data access service that completely encapsulates the underlying storage technology and allows you to replace it without modifying all the code.

Conclusion

Despite Knuth's advice, "Forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil," you need an architecture for scalability and performance when you are building a platform to replace Facebook. As social networking tools depend much more on network effects (Metcalfe's law) than on the best technology, I hope the early release of the software allows it to capture more users. I was somewhat disappointed that identi.ca has not gained much traction as an open source alternative to Twitter, and I hope that Diaspora succeeds in becoming a good alternative to Facebook.

July 25, 2010

Tutorial days from OSCON 2010

Filed under: Computing — admin @ 10:21 pm

I had fun at OSCON last year, so I decided to go back this year. It's an incredible experience being among hundreds of developers (more than 2500) who are excited about open source and upcoming technologies. Though there were a number of tracks, I was mainly interested in Mobile Computing, Cloud Computing, NoSQL, and Scala. There were also interesting sessions on Emerging Languages and Hardware Hacking with Arduino, but I didn't get a chance to attend them.

Tutorials Day One

I arrived in Portland on Sunday night, and the first session I attended on Monday morning was on Android. I saw a fair number of Android phones at the conference, and people were excited about Android. There was a lot of Apple bashing, but I won't get into it. Here are some of the highlights from the Android session:

Android for Java Developers

Android is a specially designed OS based on Linux that uses a number of open source libraries and a custom framework for building mobile applications. It even comes with a command line tool, adb, that opens a shell directly on the device or emulator. Though Android applications are written in Java, the byte-codes are converted into the Dalvik instruction set (.dex files). Dalvik is a register-based Java VM, as opposed to Sun's stack-based VM, and uses Java SE minus the Swing/AWT APIs. You use the Dalvik-generated executable and resources (images, audio, configurations, etc.) to build the application package (APK), which is then signed with a self-generated certificate. The best part of Android development is deployment, which is an order of magnitude easier compared to Apple iOS.

Android SDK

Before developing, you need to download the Eclipse IDE and the Android SDK, and then install the Eclipse plugin from https://dl-ssl.google.com/android/eclipse/.

Creating Android Device

One of the downsides of Android is the variation in capabilities across Android devices; however, you don't need to own every variation. All you need to do is create an Android virtual device by giving it a name, target version, SD card, skin, and other hardware constraints. Android comes with an emulator, as opposed to the simulator in the iPhone SDK, and it follows the hardware much more closely. On the downside, the Android emulator takes a long time to start, but on the upside you can emulate calls and SMS and launch multiple emulators, which can call each other. Note that the Android virtual device definitions go into the ~/.android/avd directory.

Hello World

You can create a new project and specify the build target, application name, package name, and main activity class. An activity class represents a controller, which you define for each screen. Android generates a layout binding file (the R file) from your UI resources. If your application needs resources such as images and property files, you store them in the res directory. You can also create resource files for storing strings to make your application easy to localize for various languages.

Activities

Activities represent screens and are managed by the activity manager. An activity can be in one of five states: starting, running, paused, stopped, and destroyed. The Activity class defines callback methods such as onCreate, onStart, onRestoreInstanceState, and onResume that are invoked when the state changes.
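
For example, a minimal activity sketch might override a few of these callbacks (the layout resource is hypothetical):

 public class MainActivity extends Activity {
   @Override
   public void onCreate(Bundle savedInstanceState) {
     super.onCreate(savedInstanceState);  // starting
     setContentView(R.layout.main);       // hypothetical layout resource
   }

   @Override
   protected void onResume() {
     super.onResume();                    // running in the foreground
   }

   @Override
   protected void onPause() {
     super.onPause();                     // another activity is taking focus
   }
 }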

Intent

An intent represents an event or action, and it can be explicit or implicit.
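
For example (SettingsActivity is a hypothetical activity in the same application):

 // explicit intent: names the target activity class directly
 startActivity(new Intent(this, SettingsActivity.class));

 // implicit intent: describes an action and lets Android resolve a handler
 startActivity(new Intent(Intent.ACTION_VIEW, Uri.parse("http://example.com")));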

Services

One of the key advantages of Android has been its support for multi-tasking, and you can do background processing using services. Services also have a lifecycle, but a simpler one than activities.
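
A minimal sketch of a background service (the actual polling work is elided):

 public class PollService extends Service {
   @Override
   public int onStartCommand(Intent intent, int flags, int startId) {
     // kick off background work here, e.g. poll for new tweets
     return START_STICKY;  // recreate the service if it gets killed
   }

   @Override
   public IBinder onBind(Intent intent) {
     return null;  // this service is not bound
   }
 }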

Content Providers

Content providers allow applications to share data, such as contacts, the media store, settings, etc.
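
For example, a sketch of reading contact names through a content provider (this assumes the READ_CONTACTS permission has been declared):

 Cursor cursor = getContentResolver().query(
     ContactsContract.Contacts.CONTENT_URI, null, null, null, null);
 while (cursor.moveToNext()) {
   String name = cursor.getString(
       cursor.getColumnIndex(ContactsContract.Contacts.DISPLAY_NAME));
   Log.d("Contacts", "found: " + name);
 }
 cursor.close();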

Broadcast Receivers

These provide a pub-sub mechanism for system events such as incoming SMS messages.
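
A sketch of such a receiver (it must also be registered for the broadcast in the manifest or at runtime):

 public class SmsReceiver extends BroadcastReceiver {
   @Override
   public void onReceive(Context context, Intent intent) {
     Log.d("SmsReceiver", "received: " + intent.getAction());
   }
 }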

Architecture of Android applications

Here are some of the tips that Marko Gargenta gave when designing application:

  • Isolate I/O operations, such as network or disk access, into separate tasks or background services, which use either notifications or a database to communicate with the interactive application. For example, in his sample Twitter application, he used a background service to poll tweets and store them in the database. The activities then polled tweets from the database and also subscribed to a notification for when a new tweet arrives.
  • Use layouts for screen design, as they are more declarative; separate text values from the layout and use string resources.
  • Due to the variation among Android devices, use layout weights and density-independent pixels (dp or sp) instead of fixed pixels (px) for components.
  • The Android SDK provides a GUI tool for designing layouts, and you will need to bind the UI components back to the activity classes, so use a consistent naming convention in both the layout file and the activity file, e.g.
       public class Twitter extends Activity implements View.OnClickListener {
         EditText editTweet;
         Button buttonUpdate;

         @Override
         public void onCreate(Bundle savedInstanceState) {
           super.onCreate(savedInstanceState);
           setContentView(R.layout.main);
           // field names match the ids declared in the layout file
           editTweet = (EditText) findViewById(R.id.editTweet);
           buttonUpdate = (Button) findViewById(R.id.buttonUpdate);
           buttonUpdate.setOnClickListener(this);
         }

         @Override
         public void onClick(View v) {
           String tweet = editTweet.getText().toString();
           // post the tweet ...
         }
       }
     

Adapters

Adapters make it easy to bind large datasets, such as arrays, to views that display them on the screen.
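
A minimal sketch, assuming the layout declares a ListView with the id list:

 ListView list = (ListView) findViewById(R.id.list);
 String[] tweets = { "first tweet", "second tweet", "third tweet" };
 // the adapter feeds the dataset to the list one row at a time
 list.setAdapter(new ArrayAdapter<String>(
     this, android.R.layout.simple_list_item_1, tweets));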

Logging

Android uses a custom libc and Java-level logging support, and you can add logging as:

   Log.d("Twitter", "button clicked");   // Log.d(tag, message)
 

You can use the "adb logcat" command to view logs, e.g. "adb logcat Twitter:V *:S" to see only logs with the Twitter tag.

Security Permissions

Any permissions that the application needs must be declared in the manifest file.

Advanced components

Android comes with a number of advanced components such as maps, menus, graphics, animations, multimedia, preferences, SQLite databases, etc.

Cloud to Device Push

This is a new service similar to iPhone push notification.

Debugging

Marko suggested use of Eclipse debugger, logcat, hierarchy viewer and traceview for debugging.

For more information on the Android session, download the slides.

The Seductions of Scala

For the second half of the day, I attended Dean Wampler's session on Scala. I have long been interested in Scala and have done a little development on my own. As OSCON offered a lot of sessions on Scala, I took the opportunity to learn more. Dean highlighted concurrency, concise code, correctness, and a better object model as the major benefits of Scala.

Introduction

Scala comes with a number of features for concise code, such as implicit type conversion, properties with the uniform access principle, and optional semicolons and parentheses when a function takes one argument or none. Scala allows objects to act as functions using the apply method, e.g.

 class Logger(val level: Level) {
   def apply(message: String) = {
     log(level, message)
   }
 }

 val error = new Logger(ERROR) ...
 error("Network error.")
 

Also, Scala treats primitive types as objects, but they are compiled down to primitives. Scala also treats functions as objects, and you can create a list or map without new:

 val list = List(1, 2, 3, 4, 5)
 val map = Map("name" -> "Dean", "age" -> 39)
 

The list syntax above is the same as 1::2::3::4::5::Nil. Scala also allows almost any symbol in a function name, so you can define functions that look like operator overloading. Scala uses infix operator notation; for example, the following two expressions are equivalent:

 "hello" + "world" 
 "hello".+("world")
 

Scala gives you higher-level operations such as map, filter, and fold/reduce, e.g.

 val list = "a" :: "b" :: Nil
 list map {
         s => s.toUpperCase
 }
 

Generics

Martin Odersky added the limited generics support in Java, but he added fully functional generics support in Scala, e.g.

 class List[A] {
   ...
   def map[B](f: A => B): List[B]
   ...
 }
 

Traits

One of the nice features of Scala is its support for traits, which are interfaces with implementations and are similar to Ruby mixins. Here is an example:

 trait Logger {
   def log(level: Level, message: String) = {
     Log.log(level, message)
   }
 }

 val dean = new Person(...) with Logger
 dean.log(ERROR, "Bozo alert!!")
 

Scala also defines traits for functions to convert them into objects, e.g.

 trait Function1[A, R] extends AnyRef {
   def apply(a: A): R
   ...
 }
 

User-defined factory methods

You can define functions as factory-methods to instantiate objects, e.g.

  val persons = Map("dean" -> deanPerson, "alex" -> alexPerson)
 

DSL

Scala offers powerful semantics for defining internal DSLs; for example, you can create your own control structures:

 import java.io._

 object Loop {
   def loop(file: File, f: (Int, String) => Unit) = {
     ...
   }
 }

 loop(new File("...")) { (n, line) =>
   ...
 }
 

Options as alternative to Null

Scala avoids NullPointerExceptions by wrapping possibly missing values in options, e.g.

 abstract class Option[T] {...}
 case class Some[T](t: T) extends Option[T] {...}
 case object None extends Option[Nothing] {...}
 

Case Classes

Case classes provide a succinct syntax for creating JavaBean-like classes.

For comprehensions

Scala provides for comprehensions, which are similar to Python generators, e.g.

 val l = List( Some("a"), None, Some("b"), None, Some("c"))
 for (Some(s) <- l) yield s
 

Actors

Scala provides actor-based concurrency similar to Erlang, though there are multiple implementations, and Akka seems to provide a better implementation than the one that ships with Scala. Here is an example:

 case class Point(x: Double, y: Double)

 abstract class Shape {
   def draw()
 }
 ....

 package shapes

 import scala.actors._, Actor._

 object ShapeDrawingActor extends Actor {
   def act() {
     loop {
       receive {
         case s: Shape => s.draw()
         ...
       }
     }
   }
 }
 

Tutorials Day Two

On day two, I attended the all-day Scala summit, which covered various topics for practical Scala.

Why Scala?

The summit started with a session on "Why Scala?" by Alex Payne and Dean Wampler. Dean repeated some of the same concepts from Monday's session about Scala's conciseness, concurrency, correctness, infix operators, type inference, case classes, etc. Dean then gave some examples of actors using Akka, where he called multiple services using actors and then gathered the results, e.g.

 val futures = for {
   s <- services
   server <- allServersFor(s)
 } yield (server !!! HeartBeat)

 Futures.awaitAll(futures)

 val results = for {
   future <- futures
   result <- future.result
 } yield result

 val all = results reduceLeft ((r1, r2) => r1 merge r2)
 compact(render(all))
 

Akka: Simpler Scalability, Fault-Tolerance, Concurrency & Remoting through Actors

Jonas Bonér then gave a brief overview of Akka, which provides a number of abstractions for concurrency such as actors, STM, and agents. Jonas gave an introduction to actors, which provide concurrency based on message-passing, shared-nothing semantics, and mailboxes. In Akka, actors can be thread-based or event-based; event-based actors are very lightweight and you can create millions of them (each actor takes about 600 bytes, as opposed to 300 bytes in Erlang).

Factory Methods

Akka uses factory methods to hide the type of actors, e.g.

 val counter = actorOf[Counter]  // counter is an ActorRef
 counter.start
 counter.stop
 
 

Jonas suggested using actorOf as opposed to the "new Counter" syntax, as it avoids calling methods on the object directly. Akka uses the !, !! and !!! notations to send messages, where ! is fire-and-forget, !! collects the result using a Future under the hood, and !!! returns a Future, e.g.

 counter ! Tick // send message -- fire-forget
 val result  = (actor !! Message).as[String] // uses Future under the hood with timeout
 val resultOption = actor !! Message
 val result = resultOption.getOrElse(defaultResult)
 val result = resultOption.getOrElse(throw new Exception("time out"))
 val future = actor !!! Message
 future.await
 val result = future.get
 
 Futures.awaitOne(List(fut1, ..))
 Futures.awaitAll(List(fut1, ..))
 
 

You use self.reply to reply to the sender, and you can access the sender using self.sender or self.senderFuture.

Immutable Messages

In order to keep actors free of side effects, messages must be immutable; use case classes, tuples, or lists, e.g.

 case class Register(user: User)
 actor ! Register(user)
 actor ! (username, password)
 actor ! List("xxx", "yy")
 

Dispatchers

Akka comes with a number of dispatchers, such as event-based, thread-based, reactor, etc. See the Dispatchers class for more information.

Queues

Akka also comes with various queue types such as unbounded LinkedBlockingQueue, bounded LinkedBlockingQueue, etc.

ActorRegistry

ActorRegistry provides lookup methods for actors such as ActorRegistry.actorsFor.

Fault tolerance

Akka borrows the concept of a hierarchy of supervisors for managing actors or processes from Erlang. Erlang's philosophy for fault tolerance is to let it crash and have the supervisor automatically restart the failed process or group of processes. You use link(actor), unlink(actor), and startLink(actor) to connect actors with supervisors, and you trap events using trapExit = List(classOf[ServiceException], classOf[PersistentException]), e.g.

 class Supervisor extends Actor {
   import self._
   trapExit = List(classOf[Throwable])
 }
 
 class FaultTolerantService extends Actor {
   override def preRestart(reason: Throwable) { ... }
   override def postRestart(reason: Throwable) { ... }
 }
 

Remote Actors

You can start a node that is remotely accessible using:

 RemoteNode.start("localhost", 9999)
 spawnLinkRemote[MyActor]("darkstar", 9999)
 
 RemoteNode.register("service:id", )
 

STM, Transactors, Modules, Camel, Storage

We ran out of time for the rest of the content, but you can read more in the slides.

Simple Build Tool

Next, Mark Harrah presented SBT, which everyone at the summit raved about. SBT uses a Scala-based DSL for writing build scripts and internally uses Ivy for managing dependencies. You can create a new project by creating a new directory and typing sbt. You can set various properties in the sbt shell, such as the target Scala version, e.g.

 set build.scala.versions 2.8.0
 reload
 

You can easily create custom tasks in sbt by extending DefaultProject, e.g.

 import sbt._
 class MyProject(info: ProjectInfo) extends DefaultProject(info) {
   lazy val hi = task { println("Hi"); None}
   lazy val goodbye = task { println("Bye"); None} dependsOn(hi)
 }
 

You can also declare library dependencies, optionally scoping them to just the test configuration:

 import sbt._
 class MyProject(info: ProjectInfo) extends DefaultProject(info) {
   val sc = "org.scala-tools.testing" %% "scalacheck" % "1.7" % "test"
 }
 OR
 import sbt._
 class MyProject(info: ProjectInfo) extends DefaultProject(info) {
   val sc = "org.scala-tools.sbt" %% "launcher-interface" % "0.74"
   val tt = "org.scala-tools.sbt" %% "launcher-interface" % "0.74" % "test"
 }
 

You can define main application as follows:

 import xsbti._
 class HW extends AppMain {
   def run(config: AppConfiguration): MainResult = {config.arguments foreach println; new Exit {def code = 0}}
 }
 

You generate an executable jar by typing publish-local in the sbt shell. You can define plugins as follows:

 class Plugins(info: ProjectInfo) extends PluginDefinition(info) {
   val android = "org.scala-tools.sbt" % "sbt-android-plugin" % "0.5.0"
 }
 

Finally, sbt allows you to create processors, which behave like scaffolding in Rails, e.g.

 import xsbti._
 import processor._
 class HW extends BasicProcessor {
   def apply(project: Project, args: String) {
     import project._
        val contents = "This is " + name + " " + version + "\n" + args + "\n"
         FileUtilities.write(info.projectPath / "README" asFile, contents, log)
   }
 }
 

When you type publish, it will create a README file for the project. That was pretty much the introduction to sbt.

Specs & Scala, Tips and Tricks for a Friendly DSL Syntax

Eric Torreborre talked about specs, a BDD-based testing tool for Scala. specs provides support for BDD, structures, matchers, ScalaCheck, mocks, runners, and databases. You use matchers to compare strings or XML contents, e.g.

 class Reverse2Spec extends Specification {
   reverse("") must_== ""
 ...
 
 

You can restrict the scope by defining a tag method, e.g.

 class Example(des : String) {
   def in(arg: Any) = expectations
   def tag(t : String) = this
 }
 

Scala DSL

specs uses a number of tricks to simplify the syntax, such as implicit parameters, operators, and lazy evaluation.

 "With a 3 tests ok" in {
   prop 
 }
 //Some paraemeters can be implicit
 implicit val defaultparams = new Params
 
 "this should not explode" in {
   error("boom")
 }
 def in(e: => Any) =                                     // parameters are evaluated lazily when you use it
 

It also applies principles such as the add principle, extending behavior by adding new functionality, e.g.

 result + 1
 result.pp + 1
 

specs also supports tables, similar to FIT and FitNesse, for writing concise tests. Overall, I was impressed with the wide set of tools for writing tests.

Lift: Quick and Fun

Lift is a Scala-based web framework for writing secure, typesafe, concise, and interactive (desktop-like) applications. It abstracts much of the plumbing of HTTP, which I personally don't like, as I have found that web frameworks that do so end up with leaky abstractions. Lift also uses stateful web applications, which require sticky sessions, another area that I have found problematic for scalability and upgrades. Here is an example of a Lift chat server:

 package code.comet

 import net.liftweb._
 import http._
 import actor._
 import scala.xml._

 object ChatServer extends LiftActor with ListenerManager {
   private var msgs = List("Welcome")
   def createUpdate = msgs
   override def lowPriority = {
     case s: String => msgs ::= s; updateListeners()
   }
 }

 class Chat extends CometActor with CometListener {
   private var msgs: List[String] = Nil
   def registerWith = ChatServer
   override def lowPriority = {
     case l: List[String] => msgs = l; reRender(false)  // don't use reRender
   }
   def line(in: NodeSeq): NodeSeq = msgs.reverse.flatMap(m => bind("chat", in, "item" -> m))
   def render = bind("chat", "line" -> line _)
 }
 

In Lift, every component has a GUID and the version that was used to render it; the browser sets up a long poll and receives deltas (every 100ms). You can use sbt to run Jetty and prepare a war file, e.g.

 sbt
 >jetty-run
 >prepareWeb                                                     // uses JRebel to reload classes
 

Rewiring Android with Scala

This was another interesting talk, by Nathan Hamblen, about using Scala to write Android applications. The key selling point of Scala has been the conciseness of the language; you can write simple code such as:

 dialog.setOnShowListener { di: DialogInterface => 
   runSomeCode() 
 }
 

instead of

 dialog.setOnShowListener(
   new DialogInterface.OnShowListener() {
     public void onShow(DialogInterface interface) {
       runSomeCode();
     }
   }
 );
 

or

 future { runSomeCode(myObject) }
 

instead of

 new AsyncTask<MyObject, Void, Integer>() {
   protected Integer doInBackground(MyObject... objs) {
     runSomeCode(objs[0]);
     return 0;
   }
 }.execute(myObject);
 
 

Nathan showed how you can define Scala traits to add lazy handlers to Android code, e.g.

 trait ScalaActivity extends Activity {
   ...
   lazy val handler = new Handler
   def post(block: => Unit) {
     handler.post(new Runnable {
       def run { block }
     })
   }
 }
 

Or you can extend APIs, e.g.

 implicit def f2cancel(block: DialogInterface => Unit) = 
   new DialogInterface.OnCancelListener {
     def onCancel(dialog: DialogInterface) { 
       block(dialog) 
     }
   }
 ...
 new AlertDialog.Builder(this)
   .setOnCancelListener { 
     di: DialogInterface => finish() 
   }
 

Nathan also showed a plugin (sbt-android-plugin) that creates type-safe layouts instead of using the R.java file generated by Android; you can get it via git clone git://github.com/meetup/meetabout.git. On the downside, Scala-based Android applications must include the Scala jar files, so the application size becomes considerably larger. Though you can use tools to extract only the classes you need, the result would still be larger than the equivalent Java code.

Scala in Practice

Alex Payne and Coda Hale had a section on Scala in practice, but it was only a Q&A session. I was a bit disappointed that they didn't come prepared with actual usage or war stories from their work environments.

High Wizardry in the Land of Scala

The last section of the day touched on type and category theory, which was interesting but mostly theoretical. Daniel Spiewak explained the difference between the kind and type systems. The one tip I took from the session was that values are to types as types are to kinds. Finally, Daniel explained that the newly released 2.8.0 version of Scala supports continuations, but it's all broken and useless.

Summary

Overall, I found the sessions on both Android and Scala well worth the time, and they piqued my interest in both. I think the Scala ecosystem has matured and there is better tool support with the new version (2.8). I am going to try to influence my co-workers to use it for new development. I am also going to start an Android project pretty soon, but I am a bit hesitant about writing it in Scala due to the increased application size.

