Shahzad Bhatti

February 6, 2016

Building a Generic Data Service

Filed under: Web Services — admin @ 10:44 pm

As REST based Micro-Services have become prevalent, I often find that web and mobile clients have to connect to different services for gathering data. You may have to call dozens of services to display data on a single screen or page. Also, you may only need subset of data from each service but you still have to pay for the bandwidth and parsing cost.

I created a new Java framework PlexDataProviders for aggregating and querying data from various underlying sources, which can be used to build a general-purpose data service. PlexDataProviders is a light-weight Java framework that abstract access to various data providers such as databases, files, web services, etc. It allows aggregation of data from various data providers.

The PlexDataProviders framework is divided into two components:

  • Data Provider – This component defines interfaces that are implemented to access data sources such as database or web services.
  • Query Engine – This component is used for querying and aggregating data.

The query engine can determine dependency between providers and it also allow you to use output of one of the data provider as input to another data provider. For example, let’s assume:

  • data-provider A requires input-a1, input-a2 and produces output-a1, output-a2
  • data-provider B requires input-b1 and output-a1 and produces output-b1, output-b2

Then you can pass input-a1, input-a2 to the query engine and request output-a1, output-a2, output-b1, output-b2 output data fields.

Benefits

PlexDataProviders provides offers following benefits:

  • It provides a unified way to search data and abstracts integration to underlying data sources.
  • It helps simplifying client side logic as they can use a single data service to query all data instead of using multiple data services.
  • This also help with managing end-points as you only a single end-point instead of connecting to multiple web services.
  • As clients can specify the data they need, this helps with payload size and network bandwidth.
  • The clients only need to create a single data parser so it keeps JSON parsing logic simple.
  • As PlexDataProviders supports multi-threading, it also helps with latency of the data fetch requests.
  • It partial failure so that a failure in a single data provider doesn’t effect other data providers and the data service can still return partial results. User
  • It supports timeout so that clients can receive available data that completes in given timeout interval

Data Structure

Following are primary data structures:

  • MetaField – This class defines meta information for each data field such as name, kind, type, etc.
  • MetaFieldType – This enum class supports primitive data types supported, i.e.
    • SCALAR_TEXT – simple text
    • SCALAR_INTEGER – integer numbers
    • SCALAR_DECIMAL – decimal numbers
    • SCALAR_DATE – dates
    • SCALAR_BOOLEAN – boolean
    • VECTOR_TEXT – array of text
    • VECTOR_INTEGER – array of integers
    • VECTOR_DECIMAL – array of decimals
    • VECTOR_DATE – array of dates
    • VECTOR_BOOLEAN – array of boolean
    • BINARY – binary data
    • ROWSET – nested data rowsets
  • Metadata – This class defines a set of MetaFields used in DataRow/DataRowSet
  • DataRow – This class abstracts a row of data fields
  • DataRowSet – This class abstracts a set of rows

PlexDataProviders also supports nested structures where a data field in DataRow can be instance of DataRowSet.

Adding a Data Provider

The data provider implements following two interfaces

public interface DataProducer {
    void produce(DataRowSet requestFields, DataRowSet responseFields,
            QueryConfiguration config) throws DataProviderException;
}

Note that QueryConfiguration defines additional parameters such as:

  • pagination parameters
  • ordering/grouping
  • filtering parameters
  • timeout parameters

The timeout parameter can be used to return all available data within defined time, e.g. query engine may invoke underlying data providers in multiple threads and if underlying query takes a long time then it would return available data.

public interface DataProvider extends DataProducer, Comparable<DataProvider> {
    String getName();
 
    int getRank();
 
    Metadata getMandatoryRequestMetadata();
 
    Metadata getOptionalRequestMetadata();
 
    Metadata getResponseMetadata();
 
    TaskGranularity getTaskGranularity();
}

Each provider defines name, rank (or priority when matching for best provider), set of mandatory/optional input and output data fields. The data provider can also define granularity as coarse grain or fine grain and the implementation may execute those providers on different threads.

PlexDataProviders also provides interfaces for converting data from domain objects to DataRowSet. Here is an example of provider implementation:

public class SecuritiesBySymbolsProvider extends BaseProvider {
    private static Metadata parameterMeta = Metadata.from(SharedMeta.symbol);
    private static Metadata optionalMeta = Metadata.from();
    private static SecurityMarshaller marshaller = new SecurityMarshaller();
 
    public SecuritiesBySymbolsProvider() {
        super("SecuritiesBySymbolsProvider", parameterMeta, optionalMeta,
                marshaller.getMetadata());
    }
 
    @Override
    public void produce(DataRowSet parameter, DataRowSet response,
            QueryConfiguration config) throws DataProviderException {
        final String id = parameter.getValueAsText(SharedMeta.symbol, 0);
        Map<String, Object> criteria = new HashMap<>();
        criteria.put("symbol", id.toUpperCase());
        Collection<Security> securities = DaoLocator.securityDao.query(criteria);
        DataRowSet rowset = marshaller.marshal(securities);
        addRowSet(response, rowset, 0);
    }
}

Typically, you will create data-provider for each different kind of query that you want to support. Each data provider specifies set of required and optional data fields that can be used to generate output data fields.

Here is an example of marshalling data from Securty domain objects to DataRowSet:

public DataRowSet marshal(Security security) {
    DataRowSet rowset = new DataRowSet(responseMeta);
    marshal(rowset, security, 0);
    return rowset;
}
 
public DataRowSet marshal(Collection<Security> securities) {
    DataRowSet rowset = new DataRowSet(responseMeta);
    for (Security security : securities) {
        marshal(rowset, security, rowset.size());
    }
    return rowset;
}
...

PlexDataProviders provides DataProviderLocator interface for registering and looking up provider, e.g.

public interface DataProviderLocator {
    void register(DataProvider provider);
 
    Collection<DataProvider> locate(Metadata requestFields, Metadata responseFields);
...
}

PlexDataProviders comes with a small application that provides data services by implementing various data providers. It uses PlexService framework for defining the service, e.g.

@WebService
@Path("/data")
public class DataServiceImpl implements DataService {
    private DataProviderLocator dataProviderLocator = new DataProviderLocatorImpl();
    private QueryEngine queryEngine = new QueryEngineImpl(dataProviderLocator);
 
    public DataServiceImpl() {
        dataProviderLocator.register(new AccountsByIdsProvider());
        dataProviderLocator.register(new AccountsByUseridProvider());
        dataProviderLocator.register(new CompaniesBySymbolsProvider());
        dataProviderLocator.register(new OrdersByAccountIdsProvider());
        dataProviderLocator.register(new PositionGroupsBySymbolsProvider());
        dataProviderLocator.register(new PositionsBySymbolsProvider());
        dataProviderLocator.register(new QuotesBySymbolsProvider());
        dataProviderLocator.register(new SecuritiesBySymbolsProvider());
        dataProviderLocator.register(new UsersByIdsProvider());
        dataProviderLocator.register(new WatchlistByUserProvider());
        dataProviderLocator.register(new SymbolsProvider());
        dataProviderLocator.register(new UsersProvider());
        dataProviderLocator.register(new SymbolSearchProvider());
    }
 
    @Override
    @GET
    public DataResponse query(Request webRequest) {
        final DataRequest dataRequest = DataRequest.from(webRequest .getProperties());
        return queryEngine.query(dataRequest);
    }
}

As you can see the data service simply builds DataRequest with input data fields and sends back response back to clients.

Here is an example client that passes a search query data field and requests quote data fields with company details

public void testGetQuoteBySearch() throws Throwable {
    String jsonResp = TestWebUtils.httpGet("http://localhost:" + DEFAULT_PORT
                    + "/data?responseFields=exchange,symbol,quote.bidPrice,quote.askPrice,quote.sales,company.name&symbolQuery=AAPL");
    ...

Note that above request will use three data providers, first it uses SymbolSearchProvider provider to search for matching symbols with given query. It then uses the symbol data field to request company and quote data fields from QuotesBySymbolsProvider and CompaniesBySymbolsProvider. The PlexDataProviders framework will take care of all dependency management for providers.

Here is an example JSON response from the data service:

{
    "queryResponse": {
        "fields": [
            [{
                "symbol": "AAPL_X"
            }, {
                "quote.sales": [
                    [{
                        "symbol": "AAPL_X"
                    }, {
                        "timeOfSale.volume": 56
                    }, {
                        "timeOfSale.exchange": "DOW"
                    }, {
                        "timeOfSale.date": 1455426008762
                    }, {
                        "timeOfSale.price": 69.49132317180353
                    }],
                    [{
                        "symbol": "AAPL_X"
                    }, {
                        "timeOfSale.volume": 54
                    }, {
                        "timeOfSale.exchange": "NYSE"
                    }, {
                        "timeOfSale.date": 1455426008762
                    }, {
                        "timeOfSale.price": 16.677774132458076
                    }],
                    [{
                        "symbol": "AAPL_X"
                    }, {
                        "timeOfSale.volume": 99
                    }, {
                        "timeOfSale.exchange": "NASDAQ"
                    }, {
                        "timeOfSale.date": 1455426008762
                    }, {
                        "timeOfSale.price": 42.17891320885568
                    }],
                    [{
                        "symbol": "AAPL_X"
                    }, {
                        "timeOfSale.volume": 49
                    }, {
                        "timeOfSale.exchange": "DOW"
                    }, {
                        "timeOfSale.date": 1455426008762
                    }, {
                        "timeOfSale.price": 69.61680149649729
                    }],
                    [{
                        "symbol": "AAPL_X"
                    }, {
                        "timeOfSale.volume": 69
                    }, {
                        "timeOfSale.exchange": "NYSE"
                    }, {
                        "timeOfSale.date": 1455426008762
                    }, {
                        "timeOfSale.price": 25.353316897552833
                    }]
                ]
            }, {
                "quote.askPrice": 54.99300665695502
            }, {
                "quote.bidPrice": 26.935682182171643
            }, {
                "exchange": "DOW"
            }, {
                "company.name": "AAPL - name"
            }],
            [{
                "symbol": "AAPL"
            }, {
                "exchange": "NASDAQ"
            }]
        ],
        "errorsByProviderName": {},
        "providers": ["QuotesBySymbolsProvider", "SymbolSearchProvider", "CompaniesBySymbolsProvider"]
    }
}
PlexDataProviders is available from github and is licensed under liberal MIT license. It also comes with a small sample application for demo purpose. Feel free to send me your suggestions.

 

December 23, 2007

Released ErlSDB 0.1

Filed under: Erlang,SimpleDB,Web Services — admin @ 7:09 pm

I started working on an Erlang library to access Amazon’s SimpleDB web service and I released an early version of the library this weekend. Here are some notes on its usage:
Installing

svn checkout http://erlsdb.googlecode.com/svn/trunk/ erlsdb-read-only

Building

make

Testing

edit Makefile and add access key and secret key, then type make test

Usage

Take a look at test/erlsdb_test.erl to learn usage, here is a sample code

Starting Server

erlsdb:start(type,
    	[#sdb_state{
		access_key = "YourAccessKey",
		secret_key = "YourSecretKey",
		domain = "YourDomain"
		}
	])

Creating Domain

    erlsdb:create_domain()

Note that the server will use the domain that was passed during initialization.

Listing all Domains

    {ok, List, _} = erlsdb:list_domains()

Deleting Domain

    erlsdb:delete_domain()

Adding an item

    Attributes = lists:sort([
	["StreetAddress", "705 5th Ave"],
        ["City", "Seattle"],
        ["State", "WA"],
        ["Zip", "98101"]
	]),
    erlsdb:put_attributes("TccAddr", Attributes)

Retrieving an item

    {ok, UnsortedAttrs} = erlsdb:get_attributes("TccAddr")

Deleting an item

    erlsdb:delete_attributes("TccAddr"),

Powered by WordPress