Shahzad Bhatti Welcome to my ramblings and rants!

February 3, 2010

A few recipes for reprocessing messages in Dead-Letter-Queue using ActiveMQ

Filed under: Computing — admin @ 2:42 pm

Messaging based asynchronous processing is a key component of any complex software system, especially in transactional environments. There are a number of solutions that provide high-performance and reliable messaging in the Java space, such as ActiveMQ, FUSE broker, JBossMQ, SonicMQ, WebLogic, WebSphere, Fiorano, etc. These providers support the JMS specification, which provides abstractions for queues, message producers and message consumers. In this blog, I will go over some recipes for recovering messages from the dead letter queue when using ActiveMQ.

What is a Dead Letter Queue

Generally, when a consumer fails to process a message within a transaction or does not send an acknowledgement back to the broker, the message is put back on the queue. The message is then redelivered up to a certain number of times based on configuration, and when that limit is exceeded it is finally moved to the dead letter queue. The ActiveMQ documentation recommends the following settings for defining dead letter queues:

<broker...>
	<destinationPolicy>
		<policyMap>
			<policyEntries>
				<!-- Set the following policy on all queues using the '>' wildcard -->
				<policyEntry queue=">">
					<deadLetterStrategy>
						<individualDeadLetterStrategy queuePrefix="DLQ." useQueueForQueueMessages="true" />
					</deadLetterStrategy>
				</policyEntry>
			</policyEntries>
		</policyMap>
	</destinationPolicy> ... 
</broker>

and you can control the redelivery policy as follows:

RedeliveryPolicy policy = connection.getRedeliveryPolicy();
policy.setInitialRedeliveryDelay(500);
policy.setBackOffMultiplier(2);
policy.setUseExponentialBackOff(true);

policy.setMaximumRedeliveries(2);

It is important that you create a DLQ per queue; otherwise ActiveMQ puts all dead-lettered messages into a single dead letter queue.

Getting a handle to QueueViewMBean

ActiveMQ provides QueueViewMBean to invoke administration APIs on queues. The easiest way to get this handle is to use the BrokerFacadeSupport class, which is extended by RemoteJMXBrokerFacade and LocalBrokerFacade. You can use RemoteJMXBrokerFacade if you are connecting to a remote ActiveMQ server; for example, here is the Spring configuration for setting it up:

<bean id="brokerQuery" class="org.apache.activemq.web.RemoteJMXBrokerFacade" autowire="constructor" destroy-method="shutdown">
	<property name="configuration">
		<bean class="org.apache.activemq.web.config.SystemPropertiesConfiguration"/>
	</property>
	<property name="brokerName">
		<null/>
	</property>
</bean>

Alternatively, you can use LocalBrokerFacade if you are running an embedded ActiveMQ server; for example, below is the Spring configuration for it:

<bean id="brokerQuery" class="org.apache.activemq.web.LocalBrokerFacade" autowire="constructor" scope="prototype"/>

Getting number of messages from the queue

Once you have a handle to QueueViewMBean, you can use the following method to find the number of messages in the queue:

public long getQueueSize(final String dest) {
  try {
    return brokerQuery.getQueue(dest).getQueueSize();
  } catch (Exception e) {
    throw new RuntimeException(e);
  }
}

Copying Messages using JMS APIs

The JMS specification provides APIs to browse a queue in read-only mode, and you can then send the messages to another queue, e.g.

import org.apache.activemq.web.BrokerFacadeSupport;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jms.core.JmsTemplate;
import org.springframework.jms.core.MessageCreator;

import javax.jms.*;
import java.util.Enumeration;

public class DlqReprocessor {
    @Autowired
    private JmsTemplate jmsTemplate;
    @Autowired
    BrokerFacadeSupport brokerQuery;
    @Autowired
    ConnectionFactory connectionFactory;

    @SuppressWarnings("unchecked")
    void redeliverDLQUsingJms(
            final String brokerName,
            final String from,
            final String to) {
        Connection connection = null;
        Session session = null;
        try {
            connection = connectionFactory.createConnection();
            connection.start();
            session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue dlq = session.createQueue(from);
            QueueBrowser browser = session.createBrowser(dlq);
            Enumeration<Message> e = browser.getEnumeration();
            while (e.hasMoreElements()) {
                Message message = e.nextElement();
                final String messageBody = ((TextMessage) message).getText();
                jmsTemplate.send(to, new MessageCreator() {
                    @Override
                    public Message createMessage(final Session session) throws JMSException {
                        return session.createTextMessage(messageBody);
                    }
                });
            }
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        } finally {
            try {
                session.close();
            } catch (Exception ex) {
            }
            try {
                connection.close();
            } catch (Exception ex) {
            }
        }
    }
}

The downside of the above approach is that it leaves the original messages in the dead letter queue.
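
If you do use one of the copy-based recipes, you can remove the originals afterwards through the same QueueViewMBean handle described above. The sketch below is only an illustration; it assumes the method lives in the DlqReprocessor class and that you collected the JMSMessageID of each message you copied while browsing (QueueViewMBean also exposes a purge() operation, but that deletes everything in the queue indiscriminately):

    // Hedged sketch: remove already-copied messages from the DLQ via JMX.
    // Requires org.apache.activemq.broker.jmx.QueueViewMBean on the classpath.
    void removeCopiedMessages(final String dlq, final java.util.List<String> copiedMessageIds) {
        try {
            final QueueViewMBean queue = brokerQuery.getQueue(dlq);
            for (String messageId : copiedMessageIds) {
                queue.removeMessage(messageId); // delete the original once its copy was sent
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }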

Copying Messages using Spring’s JmsTemplate APIs

You can do effectively the same thing with Spring's JmsTemplate with a bit less code, e.g.

    void redeliverDLQUsingJmsTemplateBrowse(
            final String from,
            final String to) {
        try {
            jmsTemplate.browse(from, new BrowserCallback() {
                @SuppressWarnings("unchecked")
                @Override
                public Object doInJms(Session session, QueueBrowser browser) throws JMSException {
                    Enumeration<Message> e = browser.getEnumeration();
                    while (e.hasMoreElements()) {
                        Message message = e.nextElement();
                        final String messageBody = ((TextMessage) message).getText();
                        jmsTemplate.send(to, new MessageCreator() {
                            @Override
                            public Message createMessage(final Session session) throws JMSException {
                                return session.createTextMessage(messageBody);
                            }
                        });
                    }
                    return null;
                }
            });
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

Moving Messages using receive/send APIs

As I mentioned, the above approaches leave messages in the DLQ, which may not be what you want. Thus, another simple approach is to consume messages from the dead letter queue and send them to another queue, e.g.

    public void redeliverDLQUsingJmsTemplateReceive(
            final String from,
            final String to) {
        try {
            jmsTemplate.setReceiveTimeout(100);
            Message message = null;
            while ((message = jmsTemplate.receive(from)) != null) {
                final String messageBody = ((TextMessage) message).getText();
                jmsTemplate.send(to, new MessageCreator() {
                    @Override
                    public Message createMessage(final Session session)
                            throws JMSException {
                        return session.createTextMessage(messageBody);
                    }
                });
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
  

Moving Messages using ActiveMQ’s API

Finally, the best approach I found was to use ActiveMQ's own APIs to move messages, e.g.

    public void redeliverDLQUsingJMX(
            final String brokerName, final String from,
            final String to) {
        try {
            final QueueViewMBean queue = brokerQuery.getQueue(from);
            for (int i = 0; i < 10 && queue.getQueueSize() > 0; i++) {
                CompositeData[] compdatalist = queue.browse();
                for (CompositeData cdata : compdatalist) {
                    String messageID = (String) cdata.get("JMSMessageID");
                    queue.moveMessageTo(messageID, to);
                }
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

I have been using this last approach and have found it to be reliable for reprocessing the dead letter queue, though these techniques can also be used for general queues. I am sure there are tons of alternatives, including using a full-fledged enterprise service bus route. Let me know if you have interesting solutions to this problem.

January 20, 2010

PlexRBAC: an open source project for providing powerful role based security (II)

Filed under: Computing — admin @ 1:50 pm

This is a continuation of my previous blog on my open source project PlexRBAC for managing role based access control. Last time, I covered the REST APIs, and in this blog I will cover the internal domain model, the RBAC APIs in Java, and examples of instance based (dynamic) security.

Layers

PlexRBAC consists of the following layers:

Business Domain Layer

This layer defines core classes that are part of the RBAC based security domain such as:

  • Domain – As described previously, the domain allows you to support multiple applications or realms.
  • Subject – The subject represents users who are defined in an application.
  • Role – A role represents job title or function.
  • Permission – A permission is composed of operation, target and an expression that is used for dynamic or instance based security.
  • SecurityError – Upon a permission failure, you can choose to store them in the database using SecurityError.

Repository Layer

This layer is responsible for storing and retrieving the above objects in the database. PlexRBAC uses Berkeley DB for persistence, and each domain is stored as a separate database, which allows you to segregate permissions and roles for distinct domains. Following is the list of repositories supported by PlexRBAC:

  • DomainRepository – provides database access for Domains.
  • PermissionRepository – provides database access for Permissions.
  • SubjectRepository – provides database access for Subjects.
  • SecurityErrorRepository – provides database access for SecurityErrors.
  • RoleRepository – provides database access for Roles.
  • SecurityMappingRepository – provides APIs to map permissions to roles and subjects to roles.
  • RepositoryFactory – provides factory methods to create the above repositories.

Security Layer

This layer defines the PermissionManager class for authorizing permissions.

Evaluation Layer

This layer provides the evaluation engine for instance based security.
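
The actual JavascriptEvaluator used later in this post is not shown here, but as a rough sketch, such an engine can be built on Java 6's javax.script API: it binds the runtime objects from the permission context into the script engine and evaluates the permission's expression (the class name below is hypothetical, not PlexRBAC's implementation):

import java.util.Map;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

public class SimpleJavascriptEvaluator {
    private final ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");

    // Evaluates an expression such as "employee.getRegion().equals(customer.getRegion())"
    // against a context of named runtime objects; only Boolean.TRUE grants the permission.
    public boolean evaluate(String expression, Map<String, Object> context) {
        try {
            for (Map.Entry<String, Object> entry : context.entrySet()) {
                engine.put(entry.getKey(), entry.getValue());
            }
            return Boolean.TRUE.equals(engine.eval(expression));
        } catch (Exception e) {
            return false; // treat evaluation failures as a denied permission
        }
    }
}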

Service Layer

This layer defines REST services such as:

  • DomainService – this service provides REST APIs for accessing Domains.
  • PermissionService – this service provides REST APIs for accessing Permissions.
  • SubjectService – this service provides REST APIs for accessing Subjects.
  • RoleService – this service provides REST APIs for accessing Roles.
  • AuthenticationService – this service provides REST APIs for authenticating users.
  • AuthorizationService – this service provides REST APIs for authorizing permissions.
  • RolePermissionService – this service provides REST APIs for mapping permissions with roles.
  • SubjectRolesService – this service provides REST APIs for mapping subjects with roles.

JMX Layer

This layer defines JMX helper classes for managing services and configuration remotely.

Caching Layer

This layer provides caching of security permissions to improve performance.

Metrics Layer

This layer provides performance measurement classes such as Timing class to measure method invocation benchmarks.

Utility Layer

This layer provides helper classes.

Web Layer

This layer provides filters for enforcing authentication and authorization when accessing REST APIs.
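
The filter classes themselves are not shown in this post; the following sketch only illustrates the idea. The way the HTTP request is mapped onto a PermissionRequest (domain, subject, operation, target) is my own assumption here, not necessarily how PlexRBAC does it:

import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class AuthorizationFilter implements Filter {
    private PermissionManager permissionManager; // initialized in init() or injected

    public void init(FilterConfig config) throws ServletException {
    }

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;
        try {
            // hypothetical mapping: HTTP method becomes the operation, path the target
            permissionManager.check(new PermissionRequest("banking",
                    request.getRemoteUser(), request.getMethod().toLowerCase(),
                    request.getPathInfo(), null));
            chain.doFilter(req, res);
        } catch (SecurityException e) {
            response.sendError(HttpServletResponse.SC_UNAUTHORIZED);
        }
    }

    public void destroy() {
    }
}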

Example

Let’s use the same example that we described last time, but with the addition of instance based security. Let’s assume there are five roles: Teller, Customer-Service-Representative (CSR), Accountant, AccountingManager and LoanOfficer, where

  • A teller can modify customer deposit accounts — but only if the customer and teller live in the same region
  • A customer service representative can create or delete customer deposit accounts — but only if the customer and teller live in the same region
  • An accountant can create general ledger reports — but only if the year is the current year
  • An accounting manager can modify ledger-posting rules — but only if the year is the current year
  • A loan officer can create and modify loan accounts – but only if the account balance is < 10000

In addition, the following classes will be used to add domain specific security:

class User {

    private String id;
    private String region;

    User() {
    }

    public User(String id, String region) {
        this.id = id;
        this.region = region;
    }

    public void setRegion(String region) {
        this.region = region;
    }

    public String getRegion() {
        return region;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getId() {
        return id;
    }
}

class Customer extends User {

    public Customer(String id, String region) {
        super(id, region);
    }
}

class Employee extends User {

    public Employee(String id, String region) {
        super(id, region);
    }
}

class Account {

    private String id;
    private double balance;

    Account() {
    }

    public Account(String id, double balance) {
        this.id = id;
        this.balance = balance;
    }

    /**
     * @return the id
     */
    public String getId() {
        return id;
    }

    /**
     * @param id
     *            the id to set
     */
    public void setId(String id) {
        this.id = id;
    }

    public void setBalance(double balance) {
        this.balance = balance;
    }

    public double getBalance() {
        return balance;
    }
}

Bootstrapping

Let’s create a handle to the repository factory:

    private static final String TEST_DB_DIR = "test_db_dir_perms";
    RepositoryFactory repositoryFactory = new RepositoryFactoryImpl(TEST_DB_DIR);

And an instance of the permission manager:

PermissionManager permissionManager = new PermissionManagerImpl(repositoryFactory,
            new JavascriptEvaluator());

Creating a domain

Now, let’s create a domain for banking:

    private static final String BANKING = "banking";
    repositoryFactory.getDomainRepository().save(new Domain(BANKING, ""));

Creating Users

The next step is to create users for the domain or application, so let's define accounts for tom, cassy, ali, mike and larry:

        final SubjectRepository subjectRepo = repositoryFactory
                .getSubjectRepository(BANKING);
        Subject tom = subjectRepo.save(new Subject("tom", "pass"));
        Subject cassy = subjectRepo.save(new Subject("cassy", "pass"));
        Subject ali = subjectRepo.save(new Subject("ali", "pass"));
        Subject mike = subjectRepo.save(new Subject("mike", "pass"));
        Subject larry = subjectRepo.save(new Subject("larry", "pass"));

Creating Roles

Now we will create the roles: a base Employee role plus Teller, CSR, Accountant, AccountingManager and LoanOfficer:

        final RoleRepository roleRepo = repositoryFactory
                .getRoleRepository(BANKING);
        Role employee = roleRepo.save(new Role("Employee"));
        Role teller = roleRepo.save(new Role("Teller", employee));
        Role csr = roleRepo.save(new Role("CSR", teller));
        Role accountant = roleRepo.save(new Role("Accountant", employee));
        Role accountantMgr = roleRepo.save(new Role("AccountingManager",
                accountant));
        Role loanOfficer = roleRepo
                .save(new Role("LoanOfficer", accountantMgr));

Creating Permissions

We can then create new permissions and save them in the database as follows:

        final PermissionRepository permRepo = repositoryFactory
                .getPermissionRepository(BANKING);
        Permission cdDeposit = permRepo.save(new Permission("(create|delete)",
                "DepositAccount",
                "employee.getRegion().equals(customer.getRegion())")); // 1
        Permission ruDeposit = permRepo.save(new Permission("(read|modify)",
                "DepositAccount",
                "employee.getRegion().equals(customer.getRegion())")); // 2
        Permission cdLoan = permRepo.save(new Permission("(create|delete)",
                "LoanAccount", "account.getBalance() < 10000")); // 3
        Permission ruLoan = permRepo.save(new Permission("(read|modify)",
                "LoanAccount", "account.getBalance() < 10000")); // 4

        Permission rdLedger = permRepo.save(new Permission("(read|create)",
                "GeneralLedger", "year == new Date().getFullYear()")); // 5

        Permission rGlpr = permRepo
                .save(new Permission("read", "GeneralLedgerPostingRules",
                        "year == new Date().getFullYear()")); // 6

        Permission cmdGlpr = permRepo.save(new Permission(
                "(create|modify|delete)", "GeneralLedgerPostingRules",
                "year == new Date().getFullYear()")); // 7

Mapping Subjects/Permissions to Roles

Now we will map subjects to roles as follows:

        final SecurityMappingRepository smr = repositoryFactory
                .getSecurityMappingRepository(BANKING);

        // Mapping Users to Roles
        smr.addRolesToSubject(tom, teller);
        smr.addRolesToSubject(cassy, csr);
        smr.addRolesToSubject(ali, accountant);
        smr.addRolesToSubject(mike, accountantMgr);
        smr.addRolesToSubject(larry, loanOfficer);

Then we will map permissions to roles as follows:

        smr.addPermissionsToRole(teller, ruDeposit);
        smr.addPermissionsToRole(csr, cdDeposit);
        smr.addPermissionsToRole(accountant, rdLedger);
        smr.addPermissionsToRole(accountant, ruLoan);
        smr.addPermissionsToRole(accountantMgr, cdLoan);
        smr.addPermissionsToRole(accountantMgr, rGlpr);
        smr.addPermissionsToRole(loanOfficer, cmdGlpr);

Authorization

Now for the fun part, authorization: let's check if user “tom” can view deposit accounts, e.g.

    public static Map<String, Object> toMap(final Object... keyValues) {
        Map<String, Object> map = new HashMap<String, Object>();
        for (int i = 0; i < keyValues.length - 1; i += 2) {
            map.put(keyValues[i].toString(), keyValues[i + 1]);
        }
        return map;
    }

    @Test
    public void testReadDepositByTeller() {
        initDatabase();
        permissionManager.check(new PermissionRequest(BANKING, "tom", "read",
                "DepositAccount", toMap("employee", new Employee("tom",
                        "west"), "customer", new Customer("zak", "west"))));
    }

Note that the above test method builds a PermissionRequest that encapsulates the domain, subject, operation, target and context, and then calls the check method of PermissionManager, which throws a SecurityException if the permission check fails.

Then we check if tom, the teller, can delete a deposit account, e.g.

    @Test(expected = SecurityException.class)
    public void testDeleteByTeller() {
        initDatabase();
        permissionManager.check(new PermissionRequest(BANKING, "tom", "delete",
                "DepositAccount", toMap("employee", new Employee("tom",
                        "west"), "customer", new Customer("zak", "west"))));
    }

Which throws a SecurityException.

Now let's check if cassy, the CSR, can delete a deposit account, e.g.

    @Test
    public void testDeleteByCsr() {
        initDatabase();
        permissionManager.check(new PermissionRequest(BANKING, "cassy",
                "delete", "DepositAccount", toMap("employee",
                        new Employee("cassy", "west"), "customer",
                        new Customer("zak", "west"))));
    }

Which works, as the CSR role has permission to delete deposit accounts. Now, let's check if ali, the accountant, can view the general ledger, e.g.

    @Test
    public void testReadLedgerByAccountant() {
        initDatabase();
        permissionManager.check(new PermissionRequest(BANKING, "ali", "read",
                "GeneralLedger", toMap("year", 2010, "account",
                        new Account("zak", 500))));
    }

Which works as expected. Next, we check if ali can delete the general ledger:

    @Test(expected = SecurityException.class)
    public void testDeleteLedgerByAccountant() {
        initDatabase();
        permissionManager.check(new PermissionRequest(BANKING, "ali", "delete",
                "GeneralLedger", toMap("year", 2010, "account",
                        new Account("zak", 500))));
    }

Which fails, since the accountant role is not granted the delete operation on the general ledger. Next, we check if mike, the accounting manager, can create a general ledger, e.g.

    @Test
    public void testCreateLedgerByAccountantManager() {
        initDatabase();
        permissionManager.check(new PermissionRequest(BANKING, "mike",
                "create", "GeneralLedger", toMap("year", 2010,
                        "account", new Account("zak", 500))));
    }

Which works as expected, since AccountingManager inherits the Accountant permissions. Now we check if mike can create posting rules for the general ledger, e.g.

    @Test(expected = SecurityException.class)
    public void testPostLedgingRulesByAccountantManager() {
        initDatabase();
        permissionManager.check(new PermissionRequest(BANKING, "mike",
                "create", "GeneralLedgerPostingRules", toMap("year",
                        2010, "account", new Account("zak", 500))));
    }

Which fails authorization, since AccountingManager only has read access to the posting rules. Then we check if larry, the loan officer, can create posting rules for the general ledger, e.g.

    @Test
    public void testPostLedgingRulesByLoanManager() {
        initDatabase();
        permissionManager.check(new PermissionRequest(BANKING, "larry",
                "create", "GeneralLedgerPostingRules", toMap("year",
                        2010, "account", new Account("zak", 500))));
    }

Which works as expected. Now, let's check the same permission but with a different year, e.g.

    @Test(expected = SecurityException.class)
    public void testPostLedgingRulesByLoanManagerWithExceededAmount() {
        initDatabase();
        permissionManager.check(new PermissionRequest(BANKING, "larry",
                "create", "GeneralLedgerPostingRules", IDUtils.toMap("year",
                        2011)));
    }

Which fails because the year doesn't match the current year.

Summary

The above examples demonstrate how the PlexRBAC API can be used along with instance based (dynamic) security. In the next post, I will describe caching and how PlexRBAC can be integrated with J2EE and Spring Security.

January 10, 2010

PlexRBAC: an open source project for providing powerful role based security (I)

Filed under: Computing — admin @ 7:45 pm

Overview

In my last blog I described the core pieces of a security system and mentioned a new open source project, PlexRBAC, that I recently started to provide role based security both as a REST service and as a Java library. In this post, I will go over some of the features that are now available. This project is based on my experience with a number of home-built solutions for RBAC and standard J2EE solutions. However, a key differentiator is that it adds instance based (or context based) security, which provides dynamic access control. The role based security consists of the following components:

Domain

Though a domain is not strictly part of role based security, PlexRBAC segregates security policies by domain, where a domain can represent a security realm or an application.

Subject

The subject represents users who are defined in an application.

Role

A role represents a job title or function. A subject or user belongs to one or more roles. One of the key features of PlexRBAC is that roles support inheritance, where a role can have one or more parent roles. This helps define security policies that follow the "don't repeat yourself" (DRY) principle.

Permission

A permission consists of two sub-parts: operation and target, where the operation is a "verb" that describes the action and the target represents the "object" that is acted upon. All permissions are assigned to roles. In PlexRBAC, a permission also contains an expression that is evaluated to check dynamic security. PlexRBAC allows Javascript based expressions and provides access to runtime request parameters. Finally, PlexRBAC supports regular expressions for both operations and targets, so you can define operations like "(read|write|create|delete)" or "read*", etc.

The following diagram shows the relationship between these components:

Getting Started

PlexRBAC depends on Java 1.6+ and Maven 2.0+. You can download the project using git:

 git clone git@github.com:bhatti/PlexRBAC.git
 

Then you can start the REST based web service within Jetty by typing:

 mvn jetty:run-war
 

The service will listen on port 8080 and you can test it with curl.

Authentication

Though PlexRBAC is not designed for authentication, it provides Basic authentication, and all administration APIs are protected by it. By default, it uses an account "super_admin" with password "changeme", which you can modify via configuration. Also, as PlexRBAC uses domains to segregate security policies, subjects are restricted to the domains where they are defined.

REST APIs

Following are the APIs defined in PlexRBAC:

Domains

  • GET /api/security/domains – returns list of all domains in JSON format.
  • GET /api/security/domains/{domain-id} – returns details of given domain in JSON format.
  • PUT /api/security/domains/{domain-id} with body of domain details in JSON format.
  • DELETE /api/security/domains – deletes all domains.
  • DELETE /api/security/domains/{domain-id} – deletes domain identified by domain-id.

Subjects

  • GET /api/security/subjects/{domain-id} – returns list of all subjects in domain identified by domain-id in JSON format.
  • GET /api/security/subjects/{domain-id}/{id} – returns details of given subject identified by id in given domain.
  • PUT /api/security/subjects/{domain-id}/{id} with body of subject details in JSON format.
  • DELETE /api/security/subjects/{domain-id} – deletes all subjects in given domain.
  • DELETE /api/security/subjects/{domain-id}/{id} – deletes subject identified by id.

Roles

  • GET /api/security/roles/{domain-id} – returns list of all roles in domain identified by domain-id in JSON format.
  • GET /api/security/roles/{domain-id}/{id} – returns details of given role identified by id in given domain.
  • PUT /api/security/roles/{domain-id}/{id} with body of role details in JSON format.
  • DELETE /api/security/roles/{domain-id} – deletes all roles in given domain.
  • DELETE /api/security/roles/{domain-id}/{id} – deletes role identified by id.

Permissions

  • GET /api/security/permissions/{domain-id} – returns list of all permissions in domain identified by domain-id in JSON format.
  • GET /api/security/permissions/{domain-id}/{id} – returns details of given permission identified by id in given domain.
  • POST /api/security/permissions/{domain-id} with body of permission details in JSON format. Note that this API uses POST instead of PUT as the id will be assigned by the server.
  • DELETE /api/security/permissions/{domain-id} – deletes all permissions in given domain.
  • DELETE /api/security/permissions/{domain-id}/{id} – deletes permission identified by id.

Mapping of Roles and Permissions

  • PUT /api/security/role_perms/{domain-id}/{role-id} – adds permissions identified by permissionIds that stores list of permission-ids in JSON format. Note that permissionIds is passed as a form parameter.
  • DELETE /api/security/role_perms/{domain-id}/{role-id} – removes permissions identified by permissionIds that stores list of permission-ids in JSON format. Note that permissionIds is passed as a form parameter.

Mapping of Subjects and Roles

  • PUT /api/security/subject_roles/{domain-id}/{subject-id} – adds roles identified by rolenames that stores list of role-ids in JSON format. Note that rolenames is passed as a form parameter.
  • DELETE /api/security/subject_roles/{domain-id}/{subject-id} – removes roles identified by rolenames that stores list of role-ids in JSON format. Note that rolenames is passed as a form parameter.

Authorization

  • GET /api/security/authorize/{domain-id} – with query parameter of operation and target.

Example

Let's start with a banking example where a bank object can be an account, a general-ledger report or ledger-posting rules, and an account is further grouped into a customer (deposit) account or a loan account.

Let's assume there are five roles: Teller, Customer-Service-Representative (CSR), Accountant, AccountingManager and LoanOfficer, where

  • A teller can modify customer deposit accounts.
  • A customer service representative can create or delete customer deposit accounts.
  • An accountant can create general ledger reports.
  • An accounting manager can modify ledger-posting rules.
  • A loan officer can create and modify loan accounts.

Creating a domain

The first thing to do is to create a security domain for your application. As we are dealing with the banking domain, let's call our domain "banking".

 curl -H "Content-Type: application/json" --user "default:super_admin:changeme" -X PUT "http://localhost:8080/api/security/domains/banking" -d '{"id":"banking"}'
 

It will return response:

 {"id":"banking","ownerSubjectNames":"super_admin"}
 

The first thing to note is that we are passing the user and password using Basic authentication, as all access to the administration APIs requires login. Now, you can find out the available domains via

 curl -v --user "super_admin:changeme" "http://localhost:8080/api/security/domains"
 

which would return something like:

 [{"id":"banking","ownerSubjectNames":"super_admin"},{"description":"default","id":"default","ownerSubjectNames":"super_admin"}]
 

Creating Users

The next step is to create users for the domain or application, so let's define accounts for tom, cassy, ali, mike and larry:

 curl -H "Content-Type: application/json" --user "default:super_admin:changeme" -X PUT "http://localhost:8080/api/security/subjects/banking" -d '{"id":"tom","credentials":"pass"}'
 curl -H "Content-Type: application/json" --user "default:super_admin:changeme" -X PUT "http://localhost:8080/api/security/subjects/banking" -d '{"id":"cassy","credentials":"pass"}'
 curl -H "Content-Type: application/json" --user "default:super_admin:changeme" -X PUT "http://localhost:8080/api/security/subjects/banking" -d '{"id":"ali","credentials":"pass"}'
 curl -H "Content-Type: application/json" --user "default:super_admin:changeme" -X PUT "http://localhost:8080/api/security/subjects/banking" -d '{"id":"mike","credentials":"pass"}'
 curl -H "Content-Type: application/json" --user "default:super_admin:changeme" -X PUT "http://localhost:8080/api/security/subjects/banking" -d '{"id":"larry","credentials":"pass"}'
 

Note that each user is identified by an id (or username) and credentials, and in the above examples usernames or subject-ids are prefixed with domain-ids, e.g. "default:super_admin".

Creating Roles

As I mentioned, a role represents a job title or responsibilities, and each role can have one or more parents. By default, PlexRBAC defines an "anonymous" role, which is used for users who are not logged in, and all user-defined roles extend the "anonymous" role.

First, we create a role for bank employee called “Employee”:

 curl -H "Content-Type: application/json" --user "default:super_admin:changeme" -X PUT "http://localhost:8080/api/security/roles/banking" -d '{"id":"Employee"}'
 

which returns

 {"id":"Employee","parentIds":["anonymous"]}
 

As you can see, the "Employee" role is created with a parent of "anonymous". Next, we create the "Teller" role:

 curl -H "Content-Type: application/json" --user "default:super_admin:changeme" -X PUT "http://localhost:8080/api/security/roles/banking" -d '{"id":"Teller","parentIds":["Employee"]}'
 

which returns:

 {"id":"Teller","parentIds":["Employee"]}
 

Then we create a role for customer-service-representative called "CSR" that extends Teller, e.g.

 curl -H "Content-Type: application/json" --user "default:super_admin:changeme" -X PUT "http://localhost:8080/api/security/roles/banking" -d '{"id":"CSR","parentIds":["Teller"]}' 
 

which returns:

 {"id":"CSR","parentIds":["Teller"]}
 

Then we create a role for “Accountant”:

 curl -H "Content-Type: application/json" --user "default:super_admin:changeme" -X PUT "http://localhost:8080/api/security/roles/banking" -d '{"id":"Accountant","parentIds":["Employee"]}' 
 

which returns:

 {"id":"Accountant","parentIds":["Employee"]}
 

Then we create a role for "AccountingManager", which extends "Accountant", e.g.

 curl -H "Content-Type: application/json" --user "default:super_admin:changeme" -X PUT "http://localhost:8080/api/security/roles/banking" -d '{"id":"AccountingManager","parentIds":["Accountant"]}' 
 

which returns:

 {"id":"AccountingManager","parentIds":["Accountant"]}
 

Finally, we create a role for “LoanOfficer”, e.g.

 curl -H "Content-Type: application/json" --user "default:super_admin:changeme" -X PUT "http://localhost:8080/api/security/roles/banking" -d '{"id":"LoanOfficer","parentIds":["Employee"]}' 
 

which returns:

 {"id":"LoanOfficer","parentIds":["Employee"]}
 

Creating Permissions

As described above, a permission is composed of an operation, target and expression, where the operation and target can be any regular expression and the expression can be any Javascript expression. However, the following permissions don't define any expressions, for simplicity. First, we create a permission to create or delete a deposit account, e.g.

 curl -H "Content-Type: application/json" --user "default:super_admin:changeme" -X POST "http://localhost:8080/api/security/permissions/banking" -d '{"operation":"(create|delete)","target":"DepositAccount","expression":""}' 
 

which returns:

 {"expression":"","id":"1","operation":"(create|delete)","target":"DepositAccount"}
 

Each permission is automatically assigned a unique numeric id. Next, we create a permission to read or modify deposit-account, e.g.

 curl -H "Content-Type: application/json" --user "default:super_admin:changeme" -X POST "http://localhost:8080/api/security/permissions/banking" -d '{"operation":"(read|modify)","target":"DepositAccount","expression":""}' 
 

which returns:

 {"expression":"","id":"2","operation":"(read|modify)","target":"DepositAccount"}
 

Then, we create a permission to create or delete loan-account

 curl -H "Content-Type: application/json" --user "default:super_admin:changeme" -X POST "http://localhost:8080/api/security/permissions/banking" -d '{"operation":"(create|delete)","target":"LoanAccount","expression":""}' 
 

which returns:

 {"expression":"","id":"3","operation":"(create|delete)","target":"LoanAccount"}
 

Then we create a permission to read or modify loan-account, e.g.

 curl -H "Content-Type: application/json" --user "default:super_admin:changeme" -X POST "http://localhost:8080/api/security/permissions/banking" -d '{"operation":"(read|modify)","target":"LoanAccount","expression":""}' 
 

which returns:

 {"expression":"","id":"4","operation":"(read|modify)","target":"LoanAccount"}
 

Then we create a permission to view and create the general ledger, e.g.

 curl -H "Content-Type: application/json" --user "default:super_admin:changeme" -X POST "http://localhost:8080/api/security/permissions/banking" -d '{"operation":"(read|create)","target":"GeneralLedger","expression":""}' 
 

which returns:

 {"expression":"","id":"5","operation":"(read|create)","target":"GeneralLedger"}
 

Finally, we create a permission for modifying posting rules of general-ledger, e.g.

 curl -H "Content-Type: application/json" --user "default:super_admin:changeme" -X POST "http://localhost:8080/api/security/permissions/banking" -d '{"operation":"(read|create|modify|delete)","target":"GeneralLedgerPostingRules","expression":""}' 
 

which returns:

 {"expression":"","id":"6","operation":"(read|create|modify|delete)","target":"GeneralLedgerPostingRules"}
 

Mapping Permissions to Roles

The next task is to map permissions to roles. First, we assign the permission to view or modify customer deposit accounts to the Teller role:

 curl -H "Content-Type: application/json" --user "default:super_admin:changeme" -X PUT "http://localhost:8080/api/security/role_perms/banking/Teller" -d 'permissionIds=["2"]'
 

which returns all permission-ids for given role, e.g.

 ["2"]
 

Then we assign the permission to create or delete customer deposit accounts to CSR (as CSR extends Teller, it automatically gets all of Teller's permissions, so it can also view and modify them):

 curl -H "Content-Type: application/json" --user "default:super_admin:changeme" -X PUT "http://localhost:8080/api/security/role_perms/banking/CSR" -d 'permissionIds=["1"]'
 

Then we assign the permission to read or create the general ledger to Accountant:

 curl -H "Content-Type: application/json" --user "default:super_admin:changeme" -X PUT "http://localhost:8080/api/security/role_perms/banking/Accountant" -d 'permissionIds=["5"]'
 

Then we assign permission to modify ledger-posting rules to AccountingManager:

 curl -H "Content-Type: application/json" --user "default:super_admin:changeme" -X PUT "http://localhost:8080/api/security/role_perms/banking/AccountingManager" -d 'permissionIds=["6"]' 
 

Mapping Users to Roles

A role is associated with one or more permissions, and each user is assigned one or more roles. First, we assign subject "tom" to the Teller role:

 curl -H "Content-Type: application/json" --user "default:super_admin:changeme" -X PUT "http://localhost:8080/api/security/subject_roles/banking/tom" -d 'rolenames=["Teller"]'
 

which returns list of all roles for given subject or user, e.g.

 ["Teller"]
 

Then we assign subject “cassy” to CSR role:

 curl -H "Content-Type: application/json" --user "default:super_admin:changeme" -X PUT "http://localhost:8080/api/security/subject_roles/banking/cassy" -d 'rolenames=["CSR"]'
 

Next we assign subject “ali” to role of Accountant:

 curl -H "Content-Type: application/json" --user "default:super_admin:changeme" -X PUT "http://localhost:8080/api/security/subject_roles/banking/ali" -d 'rolenames=["Accountant"]'
 

Then we assign role AccountingManager to “mike”:

 curl -H "Content-Type: application/json" --user "default:super_admin:changeme" -X PUT "http://localhost:8080/api/security/subject_roles/banking/mike" -d 'rolenames=["AccountingManager"]'
 

Finally we assign subject “larry” to LoanOfficer role:

 curl -H "Content-Type: application/json" --user "default:super_admin:changeme" -X PUT "http://localhost:8080/api/security/subject_roles/banking/larry" -d 'rolenames=["LoanOfficer"]'
 

Authorization

Now we are ready to validate authorization based on the above security policies. For example, let's check if user "tom" can view deposit accounts, e.g.

 curl -v --user "banking:tom:pass" "http://localhost:8080/api/authorize/banking?operation=read&target=DepositAccount"
 

On successful authorization, the API returns a 200 HTTP response code and on failure it returns a 401 HTTP response code, e.g.

 < HTTP/1.1 200 OK
 

Then we check if tom, the teller, can delete a deposit account, e.g.

 curl -v --user "banking:tom:pass" "http://localhost:8080/api/authorize/banking?operation=delete&target=DepositAccount"
 

which returns http-response-code 401, e.g.

 < HTTP/1.1 401 Unauthorized
 

Then we check if cassy, the CSR, can delete a deposit account, e.g.

 curl -v --user "banking:cassy:pass" "http://localhost:8080/api/authorize/banking?operation=delete&target=DepositAccount"
 

which returns:

 < HTTP/1.1 200 OK
 

Then we check if ali, the accountant can view general-ledger, e.g.

 curl -v --user "banking:ali:pass" "http://localhost:8080/api/authorize/banking?operation=read&target=GeneralLedger"
 

which returns:

 < HTTP/1.1 200 OK
 

Next we check if mike, the accounting-manager can create general-ledger, e.g.

 curl -v --user "banking:mike:pass" "http://localhost:8080/api/authorize/banking?operation=create&target=GeneralLedger"
 

which returns:

 < HTTP/1.1 200 OK
 

Then we check if mike, the accounting manager, can create posting rules for the general ledger, e.g.

 curl -v --user "banking:mike:pass" "http://localhost:8080/api/authorize/banking?operation=create&target=GeneralLedgerPostingRules"
 

which returns:

 < HTTP/1.1 200 OK
 

Next, ali tries to create posting rules via

 curl -v --user "banking:ali:pass" "http://localhost:8080/api/authorize/banking?operation=create&target=GeneralLedgerPostingRules"
 

which is denied:

 < HTTP/1.1 401 Unauthorized
 

Summary

The above examples demonstrate how PlexRBAC can be used to define and enforce flexible security policies. In the next post, I will describe instance based security, regular expressions and the Java APIs for PlexRBAC.

December 27, 2009

Building Security Systems

Filed under: Computing — admin @ 11:20 pm

Having been a software developer for over eighteen years, I have observed a number of recurring problems, and one of them is building a security system. Most systems you build will require some kind of security, so in this post I will go over the core concepts to consider when adding security to your system.

User Registration

A prerequisite for any security system is to allow users to register with the system and to store those users in some database, LDAP, Active Directory, or other storage system. For an internal application, this step may be unnecessary.

Authentication

Authentication allows systems to validate users based on a password or another form of verification. For internal applications within a company, users may have to use multiple applications, each with its own authentication, and each external website would also require unique authentication. This quickly becomes burdensome for both users and applications, as users have to remember the passwords and systems have to maintain them. Thus, many companies employ some form of Single Sign-On, and I have used many solutions such as SiteMinder, iChain, Kerberos, OpenSSO, Central Authentication Service (CAS), or other home-built solutions. These Single Sign-On systems use reverse proxy servers that sit in front of the application, intercept all requests and automatically redirect users to a login page if they are not authenticated. When an internal system consists of multiple tiers such as services, it is often necessary to pass authentication tokens to those services. In J2EE systems, you can use the Common Secure Interoperability (CSIv2) protocol to pass the authentication to other tiers, which uses the Security Attribute Service (SAS) protocol to perform client authentication and impersonation.

For external systems, OpenID is the way to go, and I have used RPX to integrate OpenID for a number of sites I have developed, such as http://wazil.com/, http://dealredhot.com/, etc.

There are a number of factors that make authentication a bit tricky. For example, when part of your system does not require authentication, you have to ensure the authentication policy is still applied correctly. Also, authentication generally requires HTTPS instead of HTTP, so you have to ensure that the site uses those protocols consistently. In general, static content such as CSS, Javascript and images does not require authentication, but it is often put behind authentication by mistake.

Another factor related to authentication is session management. A session determines how long the user can access the system without logging in again. Many systems provide a remember-me feature, but sessions often require system resources on the server. It's essential to keep the session small, as it can affect scalability if it's stored on the server. I generally prefer keeping the session very small and storing only the user-id and a couple of other database ids such as a shopping-cart-id, request-id, etc. If they are small, they can also be stored in cookies, which makes the system stateless so you can scale easily.
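
As a minimal sketch of that idea, the ids can be packed into a single cookie value and protected with an HMAC so the server can trust them without holding any state (the class and cookie format below are made up for illustration):

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class SessionCookieSigner {
    private final SecretKeySpec key;

    public SessionCookieSigner(byte[] secret) {
        this.key = new SecretKeySpec(secret, "HmacSHA1");
    }

    // Produces a cookie value like "userId|cartId|signature" so the server can stay stateless.
    public String sign(String userId, String cartId) throws Exception {
        String payload = userId + "|" + cartId;
        return payload + "|" + hmac(payload);
    }

    // Verifies that the cookie value was not tampered with before trusting its ids.
    public boolean verify(String cookieValue) throws Exception {
        int last = cookieValue.lastIndexOf('|');
        if (last < 0) {
            return false;
        }
        String payload = cookieValue.substring(0, last);
        String signature = cookieValue.substring(last + 1);
        return hmac(payload).equals(signature);
    }

    private String hmac(String payload) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(key);
        StringBuilder hex = new StringBuilder();
        for (byte b : mac.doFinal(payload.getBytes("UTF-8"))) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}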

Authorization

Not all users are the same in most systems, so authorization allows you to limit usage based on permissions and access control. There are a number of ways to define authorization, such as access control lists, role-based access control, capability-based security, etc. In most systems, I have used J2EE/EJB security, Java web security, JAAS, Acegi (which is now part of Spring) and home-built systems. As security is a cross-cutting concern, I prefer to define it declaratively in a common security file or with annotations. There is nothing worse than sporadic security code mixed in with your business logic.
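
For example, the standard JSR-250 annotations keep the rule next to the method without mixing security code into its body; enforcement is left to the container or an interceptor (the service class below is made up for illustration):

import javax.annotation.security.RolesAllowed; // JSR-250 API, provided by a Java EE container or the jsr250-api jar

public class PurchaseOrderService {

    // Declarative rule: only callers in the POApprover role may approve.
    // The annotation alone does nothing until a container or framework
    // (EJB 3, Spring Security's JSR-250 support, etc.) enforces it.
    @RolesAllowed("POApprover")
    public void approve(String purchaseOrderId) {
        // pure business logic, no security checks sprinkled here
    }
}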

One feature I have found lacking in most open source and commercial tools is support for instance based security, or dynamic security that verifies runtime properties. For example, in most RBAC systems you can define a rule that a purchase order can be approved by the role "POApprover", but you cannot express that "POApprover" can only approve if the user is from the same department or if the amount is less than $10,000.

UI or Resource Protection

When users have various levels of access, it is essential to hide the UI elements and resources that are not accessible. I have seen some systems employ security by obscurity and only hide the resources without actually enforcing the permissions, but that is a bad idea. This can get complicated when the access level is very fine grained, such as when a single form shows fields based on roles and permissions.

Database Security

Security must be enforced in depth, across the UI, business and database tiers. Database operations must use security to prevent access to unauthorized data. For example, if a user can post and edit blogs, it is essential that the database only allows the user to modify his/her own blog entries. Also, it is critical that any kind of sensitive data such as passwords or personal identification is stored with encryption. This is another reason I like OpenID or SSO solutions, because you don't need to maintain passwords yourself.
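
A minimal sketch of enforcing that kind of ownership at the database tier is to scope every write by the owner's id in the query itself (table and column names below are made up):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BlogDao {

    // Updates a blog entry only if it belongs to the given user; the WHERE clause
    // enforces ownership in the database instead of trusting the caller.
    public boolean updateBlog(Connection conn, long blogId, long authorId, String body)
            throws SQLException {
        PreparedStatement stmt = conn.prepareStatement(
                "UPDATE blogs SET body = ? WHERE id = ? AND author_id = ?");
        try {
            stmt.setString(1, body);
            stmt.setLong(2, blogId);
            stmt.setLong(3, authorId);
            return stmt.executeUpdate() == 1; // zero rows means not the owner (or missing)
        } finally {
            stmt.close();
        }
    }
}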

Method/Message Security

Message security ensures that a user only invokes the operations that he/she is authorized to invoke. For example, Acegi provides an annotation based mechanism to protect unauthorized method calls.

Data Integrity

Any communication based system may need to use a message authentication code (MAC) to detect changes to the data.

Confidentiality

Any communication based systems may need to encrypt sensitive data with HTTPS.

Non-repudiation

The system must audit users' actions so that they cannot repudiate them.

Summary

As achieving a high level of security can be difficult and expensive, you need to treat security as a risk and employ the level of security that suits the underlying system. Finally, as I have found most RBAC systems lacking, I have started my own open source project, PlexRBAC, to provide instance based security. If you are interested in assisting with the effort, you are welcome to join the project.

December 13, 2009

Dynamic Inheritance and Composition

Filed under: Languages — admin @ 3:55 pm

Static Inheritance

Inheritance is a core feature of object oriented languages that is used to model closely related real-world objects and to build reusable code. The inheritance relationship is defined statically in class specifications, and it comes in various flavors such as:

Single Inheritance

It allows a class to be extended by just one other class.

Multiple Inheritance

It allows a class to be derived from multiple classes. Historically it has been difficult to maintain and has been a source of the diamond inheritance problem in C++, though other languages use an ordering, such as Method Resolution Order (MRO) in Python, to avoid those issues.

Interfaces

Interfaces are used in C# and Java to define methods without implementation, and a class can implement multiple interfaces without the downsides of multiple inheritance.

Mixins

Mixins are available in languages such as Ruby and D, which use them for code reuse. Mixins are similar to interfaces with implementations, except they aggregate methods and attributes at runtime.

Traits

Traits are available in Squeak and Scala and are conceptually similar to mixins, except traits do not allow attributes.

Dynamic Inheritance

As opposed to static inheritance, dynamic inheritance can be added at runtime using the Object Extension Pattern, which I first learned about from Erich Gamma et al.'s GoF patterns. In the late 90s, I used the Voyager ORB for building distributed systems, which used this pattern. The following example shows how this pattern can be used.

Let’s define a marker interface Extension in Java such as:

package ext;

public interface Extension {

}

Then create a factory class such as

package ext;

public class ExtensionsFactory {
    // registers extension classes so that instances can be created per subject (see Main below)
    public void register(final Class<?> subject, final Class<? extends Extension>... exts) {/* ... */}
    public <T> T get(final Object subject, final Class<T> extClass) { /* ... */ return …;}
}

The subject is the object that needs to be extended with extensions. For example, let's assume you have a User class and you need to add hobbies; you can do it as follows:

package domain;

public class User {
    //...
}

And you then define Hobbies as follows:

package domain;

public class Hobbies implements ext.Extension {
    public Hobbies(User user) {
        // ...
    }
}

At runtime, you can register Hobbies with User and use it as follows:

package test;

public class Main {
    public static void main(String[] args) {
        ExtensionsFactory f = new ExtensionsFactory();

        f.register(User.class, Hobbies.class);

        //
        User user = new User();
        Hobbies hobbies = f.get(user, Hobbies.class);
    }

}
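
The factory skeleton above leaves register and get empty. Here is a minimal sketch of one way to fill them in, registering one extension class per subject class and instantiating it reflectively; this is my own assumption, not Voyager's or the pattern's canonical implementation, and it assumes each extension has a constructor taking the subject, as Hobbies does:

package ext;

import java.util.HashMap;
import java.util.Map;

public class SimpleExtensionsFactory {
    // one registered extension class per subject class, for brevity
    private final Map<Class<?>, Class<? extends Extension>> registry =
            new HashMap<Class<?>, Class<? extends Extension>>();

    public void register(final Class<?> subjectClass, final Class<? extends Extension> extClass) {
        registry.put(subjectClass, extClass);
    }

    @SuppressWarnings("unchecked")
    public <T> T get(final Object subject, final Class<T> extClass) {
        try {
            Class<? extends Extension> registered = registry.get(subject.getClass());
            // instantiate the extension reflectively, passing the subject to its constructor
            return (T) registered.getConstructor(subject.getClass()).newInstance(subject);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}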

Dynamic inheritance allows you to follow the open-closed principle by extending classes without modifying existing classes, and it allows you to choose the features that you need at runtime. Of course, dynamic languages such as Ruby make this a lot easier, as you can extend classes or objects with modules at runtime, e.g.

### defining Hobbies extension
module Hobbies
  def hobbies
  end
end

### defining User class
class User
end

user = User.new.extend(Hobbies)

puts user.singleton_methods   #["hobbies"]

## or
### binding Hobbies with User at runtime
class << User
  include Hobbies
end
puts User.singleton_methods   # ["hobbies"]

In real life, the inheritance relationship can be difficult to get right, and you often have to apply the Liskov Substitution Principle to ensure the base class can be replaced by a derived class in all uses of the base class. However, dynamic inheritance acts more like composition, so the above technique can also be used to implement dynamic composition. Dynamic inheritance or composition allows you to mix and match the features you need at runtime and build extendable systems; this technique has been a key to the success and evolution of the Eclipse IDE. It also goes nicely with the Adaptive Object Modeling technique I described in my last post for building easily extendable systems.

November 16, 2009

Applying Adaptive Object Model using dynamic languages and schema-less databases

Filed under: Java — admin @ 3:10 pm

Introduction to Adaptive/Active Object Model

Adaptive (or Active) Object Model is a design pattern used in domains that require dynamic manipulation of meta information. Though it is quite an extensive topic of research, the general idea from the original paper by Ralph Johnson is to treat meta information such as attributes, rules and relationships as data. It is usually used when the number of sub-classes is huge or unknown upfront and the system requires adding new functionality without downtime. For example, let's say we are working in the automobile domain and we need to model different types of vehicles. Using an object oriented design would result in a vehicle hierarchy such as the following:

In the above example, the entire type hierarchy is predefined, and each class within the hierarchy defines its attributes and operations. Adaptive Object Modeling, on the other hand, uses the Type Object pattern, which treats classes like objects. The basic Adaptive Object Model uses a type square model such as:

In the above diagram, the EntityType class represents all classes, and an instance of this class defines the actual attributes and operations supported by a class. Similarly, PropertyType defines the names and types of all attributes. Finally, an instance of the Entity class is the actual object instance, which stores a collection of properties and refers to its EntityType.

Java Implementation

Let's assume we only need to model the Vehicle class from the above vehicle hierarchy. In a typical object oriented language such as Java, the Vehicle class would be defined as follows:

/*
 * Simple Vehicle class
 * 
 */
package com.plexobject.aom;

import java.util.Date;

public class Vehicle {

    private String maker;
    private String model;
    private Date yearCreated;
    private double speed;
    private long miles;
    //... other attributes, accessors, setters

    public void drive() {
        //
    }

    public void stop() {
        //
    }

    public void performMaintenance() {
        //
    }
    //... other methods
}

As you can see, all attributes and operations are defined within the Vehicle class. The Adaptive Object Model instead uses meta classes such as Entity, EntityType, Property and PropertyType to build the Vehicle metaclass. The following Java code defines the core classes of the type square model.

The Property class defines the type and value of each attribute of a class:

/*
 * Property class defines attribute type and value
 * 
 */
package com.plexobject.aom;

public class Property {

    private PropertyType propertyType;
    private Object value;

    public Property(PropertyType propertyType, Object value) {
        this.propertyType = propertyType;
        this.value = value;
    }

    public PropertyType getPropertyType() {
        return propertyType;
    }

    public Object getValue() {
        return value;
    }
    //... other methods
}

The PropertyType class defines the type information for each attribute of a class:

/*
 * PropertyType class defines type information
 * 
 */
package com.plexobject.aom;

public class PropertyType {

    private String propertyName;
    private String type;

    public PropertyType(String propertyName, String type) {
        this.propertyName = propertyName;
        this.type = type;
    }

    public String getPropertyName() {
        return propertyName;
    }

    public String getType() {
        return type;
    }
    //... other methods
}

The EntityType class defines the type of an entity:

/*
 * EntityType class defines attribute types and operations
 * 
 */
package com.plexobject.aom;

import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

public class EntityType {

    private String typeName;
    private Map<String, PropertyType> propertyTypes = new HashMap<String, PropertyType>();
    private Map<String, Operation> operations = new HashMap<String, Operation>();

    public EntityType(String typeName) {
        this.typeName = typeName;
    }

    public String getTypeName() {
        return typeName;
    }

    public void addPropertyType(PropertyType propertyType) {
        propertyTypes.put(propertyType.getPropertyName(),
                propertyType);
    }

    public Collection<PropertyType> getPropertyTypes() {
        return propertyTypes.values();
    }

    public PropertyType getPropertyType(String propertyName) {
        return propertyTypes.get(propertyName);
    }

    public void addOperation(String operationName, Operation operation) {
        operations.put(operationName, operation);
    }

    public Operation getOperation(String name) {
        return operations.get(name);
    }

    public Collection<Operation> getOperations() {
        return operations.values();
    }
    //... other methods
}

The Entity class represents the object instance itself:

/*
 * Entity class represents instance of actual metaclass
 */
package com.plexobject.aom;

import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;

public class Entity {

    private EntityType entityType;
    // initialized here so that addProperty does not throw a NullPointerException
    private Collection<Property> properties = new ArrayList<Property>();

    public Entity(EntityType entityType) {
        this.entityType = entityType;
    }

    public EntityType getEntityType() {
        return entityType;
    }

    public void addProperty(Property property) {
        properties.add(property);
    }

    public Collection<Property> getProperties() {
        return Collections.unmodifiableCollection(properties);
    }

    public Object perform(String operationName, Object[] args) {
        return entityType.getOperation(operationName).perform(this, args);
    }
    //... other methods
}

The Operation interface is used to implement behavior using the Command pattern:

/*
 * Operation interface defines behavior
 */
package com.plexobject.aom;

public interface Operation {

    Object perform(Entity entity, Object[] args);
}

The above meta classes would be used to create classes and objects. For example, the type information of the Vehicle class would be defined with EntityType and PropertyType, and the instance would be defined using the Entity and Property classes as follows. In real applications the type binding would be stored in an XML configuration or defined in some DSL, but here I am binding it programmatically:

/*
 * an example of binding attributes and operations of Vehicle
 */
package com.plexobject.aom;

import java.util.Calendar;
import java.util.GregorianCalendar;

public class Initializer {

    public void bind() {
        EntityType vehicleType = new EntityType("Vehicle");
        vehicleType.addPropertyType(new PropertyType("maker",
                "java.lang.String"));
        vehicleType.addPropertyType(new PropertyType("model",
                "java.lang.String"));
        vehicleType.addPropertyType(new PropertyType("yearCreated",
                "java.util.Date"));
        vehicleType.addPropertyType(new PropertyType("speed",
                "java.lang.Double"));
        vehicleType.addPropertyType(new PropertyType("miles",
                "java.lang.Long"));
        vehicleType.addOperation("drive", new Operation() {
            public Object perform(Entity entity, Object[] args) {
                return "driving";
            }
        });
        vehicleType.addOperation("stop", new Operation() {
            public Object perform(Entity entity, Object[] args) {
                return "stopping";
            }
        });
        vehicleType.addOperation("performMaintenance", new VehicleMaintenanceOperation());

        // now creating an instance of Vehicle
        Entity vehicle = new Entity(vehicleType);
        vehicle.addProperty(new Property(vehicleType.getPropertyType("maker"),
                "Toyota"));
        vehicle.addProperty(new Property(vehicleType.getPropertyType("model"),
                "Highlander"));
        vehicle.addProperty(new Property(vehicleType.getPropertyType("yearCreated"),
                new GregorianCalendar(2003, Calendar.JANUARY, 1).getTime()));
        vehicle.addProperty(new Property(vehicleType.getPropertyType("speed"), new Double(120)));
        vehicle.addProperty(new Property(vehicleType.getPropertyType("miles"), new Long(3000)));
        vehicle.perform("drive", null);
    }
}

The operations define the runtime behavior of the class and can be defined as closures (anonymous classes) or as external implementations such as VehicleMaintenanceOperation:

/*
 * an example of operation
 */
package com.plexobject.aom;

class VehicleMaintenanceOperation implements Operation {

    public VehicleMaintenanceOperation() {
    }

    public Object perform(Entity entity, Object[] args) {
        return "maintenance";
    }
}

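To make the indirection concrete, here is a minimal usage sketch of the meta classes above; it is my own illustration rather than part of the original design, and the AomExample class and the findPropertyValue helper are hypothetical:

/*
 * A minimal usage sketch of the meta classes above (illustration only).
 */
package com.plexobject.aom;

public class AomExample {

    // hypothetical helper: looks up a property value by name on an Entity
    private static Object findPropertyValue(Entity entity, String propertyName) {
        for (Property property : entity.getProperties()) {
            if (property.getPropertyType().getPropertyName().equals(propertyName)) {
                return property.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // define the type at runtime
        EntityType vehicleType = new EntityType("Vehicle");
        vehicleType.addPropertyType(new PropertyType("maker", "java.lang.String"));
        vehicleType.addOperation("drive", new Operation() {
            public Object perform(Entity entity, Object[] operationArgs) {
                return "driving";
            }
        });

        // create an instance and populate its properties
        Entity vehicle = new Entity(vehicleType);
        vehicle.addProperty(new Property(vehicleType.getPropertyType("maker"), "Toyota"));

        // every read and every call goes through the meta model
        System.out.println(findPropertyValue(vehicle, "maker"));   // Toyota
        System.out.println(vehicle.perform("drive", null));        // driving
    }
}

Notice that nothing here is compiled into a Vehicle class; the attributes and operations exist only as EntityType, PropertyType and Operation instances created at runtime.
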
In real applications, you would also have meta classes for business rules, relationships, strategies, validations, etc., defined as instances in the same way. As you can see, AOM provides a powerful way to adapt to new business requirements, and I have seen it used successfully while working as a consultant. On the downside, it requires a lot of plumbing and tooling support, such as XML based configurations or GUI tools to manipulate the meta data. I have also found it difficult to optimize with relational databases, because each attribute and operation is stored in a separate row, which results in excessive joins when building the object. There are a number of alternatives to the Adaptive Object Model such as code generators, generative techniques, metamodeling, and table-driven systems. These techniques are much easier with dynamic languages due to their support for metaprogramming, higher-order functions and generative programming. Also, over the last few years, a number of schema-less databases such as CouchDB, MongoDB, Redis, Cassandra, Tokyo Cabinet, Riak, etc. have become popular due to their ease of use and scalability. These databases remove the excessive-join limitation of relational databases and allow applications to evolve in a way similar to the Adaptive Object Model. They are also much more scalable than traditional databases. The combination of dynamic languages and schema-less databases provides a simple way to add Adaptive Object Model features without a lot of plumbing code.

Javascript Implementation

Let’s try the above example in Javascript, which supports higher-order functions and prototype-based inheritance. First, we will need to add some helper methods to Javascript (adapted from Douglas Crockford’s “Javascript: The Good Parts”), e.g.

if (typeof Object.beget !== 'function') {
    Object.beget = function(o) {
        var F = function() {};
        F.prototype = o;
        return new F();
    }
}

Function.prototype.method = function (name, func) {
    this.prototype[name] = func;
    return this;
};

Function.method('new', function() {
    // creating new object that inherits from constructor's prototype
    var that = Object.beget(this.prototype);
    // invoke the constructor, binding -this- to new object
    var other = this.apply(that, arguments);
    // if its return value isn't an object substitute the new object
    return (typeof other === 'object' && other) || that;
});

Function.method('inherits', function(Parent) {
    this.prototype = new Parent();
    return this;
});

Function.method('bind', function(that) {
    var method = this;
    var slice = Array.prototype.slice;
    var args = slice.apply(arguments, [1]);
    return function() {
        return method.apply(that, args.concat(slice.apply(arguments, [0])));
    };
});

// as typeof is broken in Javascript, trying to get type from the constructor
Object.prototype.typeName = function() {
    return typeof(this) === 'object' ? this.constructor.toString().split(/[\s\(]/)[1] : typeof(this);
};

There is no need to define the Operation interface, Property or PropertyType thanks to higher-order functions and dynamic typing. The following Javascript code defines the core functionality of the Entity and EntityType classes:

var EntityType = function(typeName, propertyNamesAndTypes) {
    this.typeName = typeName;
    this.propertyNamesAndTypes = propertyNamesAndTypes;
    this.getPropertyTypesAndNames = function() {
        return this.propertyNamesAndTypes;
    };
    this.getPropertyType = function(propertyName) {
        return this.propertyNamesAndTypes[propertyName];
    };
    this.getTypeName = function() {
        return this.typeName;
    };
    var that = this;
    // create an accessor on the type for each property name
    for (var propertyTypesAndName in propertyNamesAndTypes) {
        that[propertyTypesAndName] = function(name) {
            return function() {
                return propertyNamesAndTypes[name];
            };
        }(propertyTypesAndName);
    }
};


var Entity = function(entityType, properties) {
    this.entityType = entityType;
    this.properties = properties;
    this.getEntityType = function() {
        return this.entityType;
    };
    var that = this;
    // create a combined getter/setter on the instance for each property
    for (var propertyTypesAndName in entityType.getPropertyTypesAndNames()) {
        that[propertyTypesAndName] = function(name) {
            return function() {
                if (arguments.length == 0) {
                    return that.properties[name];
                } else {
                    var oldValue = that.properties[name];
                    that.properties[name] = arguments[0];
                    return oldValue;
                }
            };
        }(propertyTypesAndName);
    }
};

The following Javascript code shows the binding and an example of usage (again, in a real application the binding would be stored in configuration):

var vehicleType = new EntityType('Vehicle', {
    'maker' : 'String',              // name -> typeName
    'model' : 'String',
    'yearCreated' : 'Date',
    'speed' : 'Number',
    'miles' : 'Number'
});

var vehicle = new Entity(vehicleType, {
    'maker' : 'Toyota',
    'model' : 'Highlander',
    'yearCreated' : new Date(2003, 0, 1),
    'speed' : 120,
    'miles' : 3000
});

vehicle.drive = function() {
    }.bind(vehicle);

vehicle.stop = function() {
    }.bind(vehicle);

vehicle.performMaintenance = function() {
    }.bind(vehicle);

A big difference with dynamic languages is that you can bind properties and operations to the objects at runtime and invoke them as if they were native. For example, you can invoke vehicle.maker() to read the maker property of the vehicle object (or vehicleType.maker() to see its declared type), or call vehicle.drive() to invoke an operation on the vehicle object. Another difference is that a lot of plumbing code disappears with dynamic languages.

Ruby Implementation

Similarly, the above example in Ruby may look like this:

require 'date'
require 'forwardable'

class EntityType
  attr_accessor :type_name
  attr_accessor :property_names_and_types
  def initialize(type_name, property_names_and_types)
    @type_name = type_name
    @property_names_and_types = property_names_and_types
  end
  def property_type(property_name)
    @property_names_and_types[property_name]
  end
end


class Entity
  attr_accessor :entity_type
  attr_accessor :properties
  def initialize(entity_type, attrs = {})
    @entity_type = entity_type
    bind_properties(entity_type.property_names_and_types)
    attrs.each do |name, value|
      instance_variable_set("@#{name}", value)
    end
  end
  def bind_properties(property_names_and_types)
    (class << self; self; end).module_eval do
      property_names_and_types.each do |name, type|
        # reader for the property
        define_method name.to_sym do
          instance_variable_get("@#{name}")
        end
        # writer for the property
        define_method "#{name}=".to_sym do |value|
          instance_variable_set("@#{name}", value)
        end
      end
    end
  end
end

We can then use singleton classes, lambdas and the metaprogramming features of Ruby to add Adaptive Object Model support, e.g.

vehicle_type = EntityType.new('Vehicle', {
    'maker' => 'String',             # class.name
    'model' => 'String',
    'yearCreated' => 'Time',
    'speed' => 'Fixnum',
    'miles' => 'Float'});


vehicle = Entity.new(vehicle_type, {
    'maker' => 'Toyota',
    'model' => 'Highlander',
    'yearCreated' => DateTime.parse('1-1-2003'),
    'speed' => 120,
    'miles' => 3000});

class << vehicle
  def drive
    "driving"
  end
  def stop
    "stopping"
  end
  def perform_maintenance
    "performing maintenance"
  end
end

The Ruby code is a lot more succinct, and as Ruby supports adding or removing methods dynamically, you can invoke properties and operations directly on the objects. For example, you can call vehicle.maker to read the maker property or vehicle.drive to invoke an operation on the vehicle object. Also, Ruby provides a lot more options for this kind of dynamic behavior, such as monkey patching, lambdas/procs/methods, send, delegates/forwardables, etc. Finally, Ruby provides powerful generative capabilities to build DSLs that can bind all properties and operations at runtime, similar to how the Rails framework works.

Schema-less Databases

Now, the second half of the equation for the Adaptive Object Model is persistence, which I have found to be a challenge with relational databases. However, since I have been using schema-less databases such as CouchDB, it has become trivial to store the meta information alongside the plain data. For example, to store this vehicle in CouchDB, all I have to do is create databases for vehicles and vehicle types (similar to Single Table Inheritance, all kinds of vehicles can live in the same vehicles database):

 curl -XPUT http://localhost:5984/vehicles
 curl -XPUT http://localhost:5984/vehicle_types
 

and then add the vehicle type as follows:

 curl -XPOST http://localhost:5984/vehicle_types/ -d '{"maker":"String", "model":"String", "yearCreated":"Date", "speed":"Number", "miles":"Number"}'
 

which returns

 {"ok":true,"id":"bb70f95e43c3786f72cb46b372a2808f","rev":"1-3976038079"}
 

Now we can use the id of the vehicle type and add a vehicle as follows:

 curl -XPOST http://localhost:5984/vehicles/ -d '{"vehicle_type_id":"bb70f95e43c3786f72cb46b372a2808f", "maker":"Toyota", "model":"Highlander", "yearCreated":"2003", "speed":120, "miles":3000}'
 

which returns the id of the newly created vehicle as follows:

 {"ok":true,"id":"259237d7c041c405f0671d6774bfa57a","rev":"1-367618940"}
 

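To tie the Java meta classes back to this schema-less storage, here is a naive sketch of how an Entity could be flattened into the kind of JSON document shown above. It is an illustration only, with no escaping, date formatting or JSON library; the EntityJsonWriter class and its toJson method are hypothetical:

/*
 * Naive sketch: flatten an Entity into a CouchDB-style JSON document.
 * Illustration only -- no escaping, no nested values, no date handling.
 */
package com.plexobject.aom;

public class EntityJsonWriter {

    public static String toJson(Entity entity, String vehicleTypeId) {
        StringBuilder json = new StringBuilder("{");
        // keep a reference back to the type document, as in the example above
        json.append("\"vehicle_type_id\":\"").append(vehicleTypeId).append("\"");
        for (Property property : entity.getProperties()) {
            json.append(",\"")
                .append(property.getPropertyType().getPropertyName())
                .append("\":");
            Object value = property.getValue();
            if (value instanceof Number) {
                json.append(value);                        // numbers are unquoted
            } else {
                json.append("\"").append(value).append("\"");
            }
        }
        return json.append("}").toString();
    }
}

The resulting string can then be POSTed to the vehicles database exactly like the curl example above.
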
Summary

It is often said in software development that you can solve any problem with another level of indirection. The Adaptive Object Model uses another level of indirection to create powerful applications that meet ever-changing requirements. When it is combined with dynamic languages that support metaprogramming and generative programming, it can be used to build systems that evolve easily with minimal changes and downtime. Also, schema-less databases eliminate the drawback of many AOM implementations that suffer from poor performance due to excessive joins in relational databases.

October 14, 2009

Querying and Indexing CouchDB documents using Lucene

Filed under: Uncategorized — admin @ 9:32 pm

I have been playing with CouchDB lately and was looking for a way to index documents stored in CouchDB. So, I started an open source project, DocuSearch. It includes both POJO based and REST based services for indexing and searching, hosted in a Jetty server.

Getting Started

To get started download the source using:

 svn checkout http://docusearch.googlecode.com/svn/trunk/ docusearch-read-only  
 or
 git clone git://github.com/bhatti/DocuSearch.git
 

You will need to install Java 1.6, Maven 2.0+ and CouchDB before you start using the services. On a Mac, you can install CouchDB via:

 sudo port install couchdb
 

Then manually start CouchDB using:

 sudo /opt/local/bin/couchdb
 

You can verify that CouchDB is running by visiting http://localhost:5984/_utils/index.html.

Building

Type “mvn” to build the project. Maven will download a bunch of files, which may take a few minutes, cache them locally, and then proceed to compile, test and build the war file.

Populating Database

You are free to choose your favorite way to add or import data into CouchDB, though DocuSearch includes some ETL programs to load comma or tab delimited data into CouchDB. For example, let’s say you want to find authorized e-file providers for the IRS, so you download some data from the IRS that has the following format:

 business_name,street_address_1,street_address_2,city,state,zip,zip_4,contact_first_name,contact_middle_name,contact_last_name,phone,flag1,flag2,flag3,flag4
 

You can import it into CouchDB using:

 mvn exec:java -Dexec.mainClass="com.plexobject.docusearch.etl.DocumentLoader" \
 -Dexec.args="efile_providers data/wa.txt none business_name,street_address_1,street_address_2,city,state,zip,zip_4,contact_first_name,contact_middle_name,contact_last_name,phone"
 

This command takes the following arguments:

  • name-of-database, e.g. efile_providers
  • name of comma delimited file, e.g. data/wa.txt
  • id-column or none if database ids will automatically be generated
  • comma-delimited list of fields to be imported

Once the data is loaded, you can create the Lucene index, but first you will have to specify the index policy, which is just another CouchDB document. The index policy specifies the fields to be indexed, whether they should be stored in the index, and the score and boost values. These policy configurations are stored in the the_config database and you can add the policy using:

 curl -X PUT http://127.0.0.1:5984/the_config/index_policy_for_efile_providers -d \
 '{"_id":"index_policy_for_efile_providers","dbname":"the_config","score":0,"boost":0,"fields":[{"name":"business_name", "storeInIndex":"true"},{"name":"street_address_1"},{"name":"city"},{"name":"zip"},{"name":"contact_first_name"},{"name":"contact_last_name"}]}'
 

It will return

 {"ok":true,"id":"index_policy_for_efile_providers","rev":"1-0fd2f5b2e2012f898df677c68daf4592"}
 

Note that you will need to pass the “_rev” parameter if you need to update the index policy. Later, you can retrieve the policy using:

 curl http://localhost:5984/the_config/index_policy_for_efile_providers
 

Now you are ready to build the index, but let’s first start Jetty with the REST based services via:

 mvn jetty:run-war
 

Now hop on to a browser and point it to:

 http://localhost:8080
 

Finally, you can use curl to build the index via:

 curl -vX POST http://localhost:8080/api/index/primary/efile_providers
 

Before you can query, you will have to specify a query policy, which is also stored in CouchDB and specifies the list of fields to be searched, e.g.

 curl -X PUT http://127.0.0.1:5984/the_config/query_policy_for_efile_providers -d \
 '{"_id":"query_policy_for_efile_providers","dynamo":"the_config","fields":[{"name":"efile_providers.business_name", "boost":2},{"name":"efile_providers.street_address_1"},{"name":"efile_providers.city"},{"name":"efile_providers.zip"},{"name":"efile_providers.contact_first_name"},{"name":"efile_providers.contact_last_name"}]}'
 

Which will return

 {"ok":true,"id":"query_policy_for_efile_providers","rev":"1-618703c1fd66996f23b89c4414dd0842"}
 

Again, you will need to pass the “_rev” parameter when updating the query policy. Next, you can search the contents of the index via:

 curl "http://localhost:8080/api/search/efile_providers?keywords=mike"
 

Which will return

 {"suggestions":[],"keywords":"mike","start":0,"limit":0,"totalHits":7,"docs":[{"_id":"0352d18145532a05714bfec2e1e649dd","dbname":"efile_providers","indexDate":"20091121","doc":"53","score":"0.0","owner":"*","efile_providers.business_name":"Mr Tax Man"},{"_id":"062d548eb394db3534782c5b6ded0529","dbname":"efile_providers","indexDate":"20091121","doc":"96","score":"0.0","owner":"*","efile_providers.business_name":"Liberty Tax Service"},{"_id":"1ddc6006a2315dd0b0119c0dbc22c1a7","dbname":"efile_providers","indexDate":"20091121","doc":"450","score":"0.0","owner":"*","efile_providers.business_name":"1040 PLUS INC"},{"_id":"3621cc7edde5f191bcc5f3a41160f61e","dbname":"efile_providers","indexDate":"20091121","doc":"793","score":"0.0","owner":"*","efile_providers.business_name":"MIKE A PASSECK CPA"},{"_id":"37a2a152ff120ac293ea67daac1a11aa","dbname":"efile_providers","indexDate":"20091121","doc":"811","score":"0.0","owner":"*","efile_providers.business_name":"Liberty Tax Service"},{"_id":"be0fd60800b9eed6d418601f8cba06f3","dbname":"efile_providers","indexDate":"20091121","doc":"2856","score":"0.0","owner":"*","efile_providers.business_name":"Liberty Tax Service"},{"_id":"dfa948236e87d0c6ba90c612cb166635","dbname":"efile_providers","indexDate":"20091121","doc":"3395","score":"0.0","owner":"*","efile_providers.business_name":"MIKE FOLEYS TAX SERVICE"}]}
 

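If you would rather call the search service programmatically than through curl, the same REST endpoint can be hit with nothing but the JDK. Below is a rough sketch; it assumes only the URL shown above, not any DocuSearch classes, and the SearchClient class name is hypothetical:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class SearchClient {
    public static void main(String[] args) throws Exception {
        String keywords = URLEncoder.encode("mike", "UTF-8");
        URL url = new URL("http://localhost:8080/api/search/efile_providers?keywords=" + keywords);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
        String line;
        while ((line = reader.readLine()) != null) {
            System.out.println(line);   // raw JSON response from the search service
        }
        reader.close();
        conn.disconnect();
    }
}
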
This query functionality can also be tested through a simple HTML based interface by pointing your browser to http://localhost:8080/.

The index stores the id of each indexed document, so you can also retrieve the details of each result using:

 http://localhost:8080/api/storage/efile_providers/0352d18145532a05714bfec2e1e649dd
 

This feature can be tested from the HTML interface by clicking on the details link.

You can also debug why certain results are showing up using the following API:

 http://localhost:8080/api/search/explain/efile_providers?keywords=mike
 

This feature can be tested from the HTML interface by clicking on the explain button.

Next, you can also find the top terms used in the index using:

 http://localhost:8080/api/search/rank/efile_providers?limit=1000
 

Again, this feature can be tested from the HTML interface by clicking on the top terms button.

You can also find documents similar to a particular result using:

 http://localhost:8080/api/search/similar/efile_providers?externalId=37a2a152ff120ac293ea67daac1a11aa&luceneId=811&detailedResults=true
 

Which will return

 {"externalId":"37a2a152ff120ac293ea67daac1a11aa","luceneId":811,"start":0,"limit":0,"totalHits":973,"docs":[{"zip":"98107","phone":"206\/782-2772","contact_first_name":"TOR","street_address_2":"","street_address_1":"5919 NW 15TH AVE","state":"WA","city":"SEATTLE","_rev":"1-6f14e2e9d2092e63173002cd95785963","business_name":"LIBERTY TAX SERVICE","_id":"00684037657ef8960ede2f155339420e","contact_middle_name":"","zip_4":"","dbname":"efile_providers","contact_last_name":"SLINNING"},{"zip":"98118","phone":"206\/850-0505","contact_first_name":"ANDREW","street_address_2":"","street_address_1":"5021 SOUTH BARTON","state":"WA","city":"SEATTLE","_rev":"1-f910f4736db188d05f24751a68070b86","business_name":"H&A TAX PREPARATION SVCS","_id":"00b6c05dd24c30c4740b7aa1257ef308","contact_middle_name":"H","zip_4":"5336","dbname":"efile_providers","contact_last_name":"HODGE"},{"zip":"98682","phone":"360\/891-6701","contact_first_name":"MARILYN","street_address_2":"","street_address_1":"5101 NE 121ST AVE #50","state":"WA","city":"VANCOUVER","_rev":"1-6ab3f77b03eee2c529f910c559236eb3","business_name":"AFFORDABLE BOOKKEEPING & TAX SERVIC","_id":"00e35b6bbfbfe68db8e962bc41ec6c99","contact_middle_name":"C","zip_4":"","dbname":"efile_providers","contact_last_name":"BOON"},{"zip":"98406","phone":"206\/322-2226","contact_first_name":"MAN","street_address_2":"","street_address_1":"602 6TH AVE","state":"WA","city":"TACOMA","_rev":"1-4efeefbdaadaf9dde2a49f7246f884b5","business_name":"INSTANT TAX PRO","_id":"00fe1df22fe5731e01515cada787efd2","contact_middle_name":"V","zip_4":"","dbname":"efile_providers","contact_last_name":"SAM"},{"zip":"98208","phone":"425\/338-0118","contact_first_name":"STEPHEN","street_address_2":"","street_address_1":"3615 100TH ST SE","state":"WA","city":"EVERETT","_rev":"1-381ea9171f405bf40f78597a91730588","business_name":"ADSUM TAX & BOOKKEEPING LLC","_id":"014548ee3e23e5d56d4521b76de8434a","contact_middle_name":"D","zip_4":"","dbname":"efile_providers","contact_last_name":"TANGEN"},{"zip":"99116","phone":"509\/633-3829","contact_first_name":"RICHARD","street_address_2":"","street_address_1":"102 STEVENS","state":"WA","city":"COULEE DAM","_rev":"1-1b767048def4829db756f04014733681","business_name":"MEYER TAX SERVICE","_id":"016a700252b54fc170ffc0f69c60ce93","contact_middle_name":"W","zip_4":"","dbname":"efile_providers","contact_last_name":"AVEY"},{"zip":"98391","phone":"253\/862-5573","contact_first_name":"Tim","street_address_2":"","street_address_1":"20616 SR 410 E","state":"WA","city":"Bonney Lake","_rev":"1-5b6ee0d167743b1679c8c3f84f16d78b","business_name":"Barrans Tax Service","_id":"017a48b555806eda9f7999b426b00d14","contact_middle_name":"","zip_4":"","dbname":"efile_providers","contact_last_name":"Barrans"},{"zip":"98503","phone":"360\/456-5084","contact_first_name":"THOMAS","street_address_2":"","street_address_1":"4440 PACIFIC AVE SE","state":"WA","city":"LACEY","_rev":"1-dcbfcb5c112e3ef1e109f7bbfd410e9a","business_name":"TAX CENTERS OF AMERICA","_id":"01a024fe186a0df2f313191a951dbb1c","contact_middle_name":"B","zip_4":"","dbname":"efile_providers","contact_last_name":"OTT"},{"zip":"WA","phone":"Stevenson","contact_first_name":"","street_address_2":"924 West S Circle","street_address_1":"LLC","state":"Washougal","city":"","_rev":"1-9a7678e5c46651998fb7c0c83c9018b1","business_name":"Columbia 
Tax","_id":"01aa9791195e915093ee207518e6bf34","contact_middle_name":"Gina","zip_4":"98671","dbname":"efile_providers","contact_last_name":"A"},{"zip":"98188","phone":"303\/888-1040","contact_first_name":"CARL","street_address_2":"","street_address_1":"17600 PACIFIC HWY S","state":"WA","city":"SEATTLE","_rev":"1-976ac1c5ba46f59a42b57786af76e9b2","business_name":"NEXT DAY TAX CASH","_id":"01b21a8653b83789b561040887be7a28","contact_middle_name":"","zip_4":"","dbname":"efile_providers","contact_last_name":"PALMER"},{"zip":"98032","phone":"253\/852-6182","contact_first_name":"TOM","street_address_2":"# A-148","street_address_1":"1819 CENTRAL AVE S","state":"WA","city":"KENT","_rev":"1-1d0a9dbc409944bfc4618f598541a97f","business_name":"TAX GALLERY\/ TOM COKE ASSOCIATES","_id":"02051caa9f2faa9ca8386792c9653ff6","contact_middle_name":"C","zip_4":"","dbname":"efile_providers","contact_last_name":"ARMON"},{"zip":"98686","phone":"702\/320-0727","contact_first_name":"ARMOGAST","street_address_2":"","street_address_1":"14605 NE 20TH AVE","state":"WA","city":"VANCOUVER","_rev":"1-adf4be610b7fa43794d4d7dd3f8dc7de","business_name":"SUPREME BOOKKEEPING & TAX LLC.","_id":"0220b0609b83dd5621a90d9f7fe342ca","contact_middle_name":"J","zip_4":"","dbname":"efile_providers","contact_last_name":"MWASHIGHADI"},{"zip":"98665","phone":"360\/896-9897","contact_first_name":"GERALD","street_address_2":"","street_address_1":"7700 HWY 99","state":"WA","city":"VANCOUVER","_rev":"1-78234fab75e3e44d79648dab756a7791","business_name":"JACKSON HEWITT TAX SERVICE","_id":"02c2c5983ca8b4a00173dca208cc86de","contact_middle_name":"D","zip_4":"","dbname":"efile_providers","contact_last_name":"BREUNIG"},{"zip":"98531","phone":"360\/556-4906","contact_first_name":"David","street_address_2":"SUITE A","street_address_1":"417 W. 
MAIN ST.","state":"WA","city":"CENTRALIA","_rev":"1-fcb886c533f0a223f835397a0d5cf773","business_name":"Liberty Tax Service","_id":"02c466c8097ee00a1ae2d27aafd808aa","contact_middle_name":"C","zip_4":"","dbname":"efile_providers","contact_last_name":"Dunsmore"},{"zip":"98626","phone":"909\/849-1174","contact_first_name":"CINDY","street_address_2":"","street_address_1":"2640 ROBERT CT","state":"WA","city":"Kelso","_rev":"1-5513b1057c7882a7657a79a3e888b21d","business_name":"THE TAX WARD","_id":"032a14eaca19a1548364609fd480a1b9","contact_middle_name":"J","zip_4":"","dbname":"efile_providers","contact_last_name":"WARD"},{"zip":"98036","phone":"425\/774-6633","contact_first_name":"Mike","street_address_2":"","street_address_1":"20015 HIGHWAY 99","state":"WA","city":"LYNNWOOD","_rev":"1-d7f17073757afa70e869e543099a7bf5","business_name":"Mr Tax Man","_id":"0352d18145532a05714bfec2e1e649dd","contact_middle_name":"C","zip_4":"6073","dbname":"efile_providers","contact_last_name":"McKinnon"},{"zip":"98284","phone":"360\/595-9138","contact_first_name":"LAURA","street_address_2":"","street_address_1":"765 SUMERSET WAY","state":"WA","city":"SEDRO WOOLLEY","_rev":"1-b2ebc210bbf481c2ed37751a45a2249e","business_name":"CAIN LAKE TAX SERVICE","_id":"03cc2bcb150f981519a8d93093a015ca","contact_middle_name":"L","zip_4":"","dbname":"efile_providers","contact_last_name":"COZZA"},{"zip":"99350","phone":"509\/786-1269","contact_first_name":"ERNEST","street_address_2":"","street_address_1":"1002 LILLIAN","state":"WA","city":"PROSSER","_rev":"1-cd626358950c19bd2519859fbd50bbce","business_name":"E & R TAX SERVICE","_id":"03f620e0ae8ac148af79cb7848c3bf41","contact_middle_name":"W","zip_4":"","dbname":"efile_providers","contact_last_name":"TROEMEL"},{"zip":"99301","phone":"509\/851-8808","contact_first_name":"Aaron","street_address_2":"SUITE E","street_address_1":"5024 NORTH ROAD 68","state":"WA","city":"PASCO","_rev":"1-e9c9843292f9233f46e07628105ee72c","business_name":"Liberty Tax Service of West Pasco","_id":"03fe45d715110d0aa2d3022bbe7325e7","contact_middle_name":"J","zip_4":"","dbname":"efile_providers","contact_last_name":"Welles"},{"zip":"98329","phone":"253\/884-3566","contact_first_name":"ROY","street_address_2":"","street_address_1":"13215 139TH AVE KPN","state":"WA","city":"GIG HARBOR","_rev":"1-70d070d37c72fb13b9162ba4523ab70f","business_name":"MYR-MAR ACCOUNTING SERVICE INC","_id":"040ab87550b5a4a200c97d2e6a6b96a7","contact_middle_name":"M","zip_4":"","dbname":"efile_providers","contact_last_name":"KEIZUR"}]}
 

This feature can be tested from the HTML interface by clicking on the similar link.

Conclusion

DocuSearch makes it easy to query documents stored in CouchDB; however, I have also started adding support for Berkeley DB if you prefer it. I found that CouchDB wastes a lot of space and is a bit slow, so Berkeley DB may be an alternative option for some. I also plan to add n-gram and stem based analyzers to create a better search experience. I welcome you to join the project: add yourself at http://code.google.com/p/docusearch/ or http://github.com/bhatti/DocuSearch/ and start contributing.

September 21, 2009

Installing Ubuntu Remix and Troubleshooting Network connections

Filed under: Computing — admin @ 10:00 am

I recently ordered an ASUS Eee PC 1005HA netbook that actually got lost in the mail, so I had to reorder it. Anyway, I finally received it this weekend. It comes with Windows XP, which I decided to replace with Ubuntu. There is a special distribution of Ubuntu for netbooks called Ubuntu Netbook Remix (UNR), but netbook support in Ubuntu is still a work in progress, so the setup took longer than I expected. Here are the steps I went through to install and set up UNR on my ASUS netbook:

Download Ubuntu Remix

This was easy: I downloaded UNR from http://www.ubuntu.com/GetUbuntu/download-netbook and saved the .img file on my netbook (which was running XP at that time).

Download USB Imager

Then I downloaded a USB disk imager for Windows.

Creating UNR Image

After downloading the imager, I opened the application, inserted my USB drive and copied the image onto it. So far so good.

Changing BIOS to boot from USB

The ASUS boots from the hard disk by default, so I had to change the BIOS settings. I shut down the machine completely, then started it while holding F2. That brought up the BIOS settings, where I changed the boot sequence to boot from USB and saved the settings with F10.

Installing UNR

After rebooting, UNR loaded from the USB drive. First, I played with it without installing and quickly figured out that the network wasn’t working. I decided to install UNR despite these issues. I allocated about half of the disk space, roughly 70G, to Linux and left the Windows partition alone in case the install failed. I then allocated swap space and proceeded with the installation, which was fairly standard. After installation, I rebooted the machine and the GRUB loader showed me both the Windows and UNR options.

Troubleshooting Network

Now the fun started. Neither my wired nor my wireless network was working. I found a number of forums describing similar problems. I tried:

 iwconfig
 iwlist scan
 lsmod
 

to see what’s installed and available but didn’t see the drivers. Also, “dmesg” wasn’t helpful and

  sudo /etc/init.d/networking restart
 

didn’t help either. I then typed

 lspci
 

Which showed

 02:00.0 Network controller: Atheros Communications Inc. AR9285 Wireless Network Adapter (PCI-Express) (rev 01)
 

I then switched to my Mac and looked for a driver for the AR9285. I found a good resource at http://partner.atheros.com/Drivers.aspx, downloaded the Linux driver, and copied it to another USB drive. I built the driver with:

 tar -xzf 
 cd src
 make 
 sudo make install
 sudo insmod atl1e.ko
 

After rebooting, the wired network was working, and I could then use it to continue troubleshooting the wireless. I tried following the instructions from http://wireless.kernel.org/en/users/Download, which suggested:

 sudo apt-get install linux-backports-modules-jaunty
 

But it didn’t work for me. I then tried

 apt-get install linux-backports-modules-$(uname -r)
 

And that didn’t work. Finally, I decided to upgrade to Karmic Koala by issuing this command:

 sudo do-release-upgrade -d
 

It took a while to download all the packages; it then removed a bunch of obsolete packages, and after the reboot it complained about a number of old configurations that are no longer compatible. Nevertheless, my wireless started working. Next, I am going to install regdb, CRDA, and iw to track down any other wireless issues.

I still left the dual boot option on my netbook, but I am definitely going to live in UNR for the most part.

September 2, 2009

Introduction to CouchDB

Filed under: Computing — admin @ 6:51 pm

I have been following the growth and popularity of CouchDB for a while and even attended an excellent talk by J Chris Anderson of http://couch.io. However, only recently have I gotten the chance to actually use it. I am building an internal search engine based on Lucene, but I am storing the documents in CouchDB. CouchDB is pretty easy to set up, but its documentation is sporadic. Here are the basic steps to get it running:

Installation and Launch

I installed CouchDB on my MacBook Pro using:

 sudo port install couchdb
 

CouchDB is available for Linux distributions and you can use yum or apt to install it, though official binaries are not available for Windows. You can also set it up to load at startup on the Mac using:

 sudo launchctl load -w /opt/local/Library/LaunchDaemons/org.apache.couchdb.plist
 

Once you have installed it, you can start the CouchDB server using:

 sudo /opt/local/bin/couchdb
 

Alternatively, you can skip the installation and launch steps and instead use the hosting solution from http://hosting.couch.io, using the “booom-couch” password for the private beta.

Verify Installation

Once CouchDB is started, you can point your browser to http://127.0.0.1:5984/ or type:

 curl http://127.0.0.1:5984/
 

As CouchDB uses JSON for communication, it will show something like:

 {"couchdb":"Welcome","version":"0.9.0"}
 


Creating a database

CouchDB is a REST based service, and you can review all of the APIs at http://wiki.apache.org/couchdb/HTTP_Document_API. CouchDB uses the PUT operation to create a database, e.g.

 curl -X PUT http://127.0.0.1:5984/guestbook
 

It will return

 {"ok":true}
 

Based on REST principles, PUT is used when adding new data where the resource identifier is specified by the client. However, if you call this API again with the same arguments, it will return an error, e.g.:

 {"error":"file_exists","reason":"The database could not be created, the file already exists."}
 

Adding documents

Each document is a JSON object that consists of name-value pairs. Also, each document is identified by a unique identifier, or UUID. You can generate the UUID in your application or get it from the CouchDB server. For example, to generate 10 UUIDs, call:

 curl -X GET http://127.0.0.1:5984/_uuids?count=10
 

and it will return something like:

 {"uuids":["152019530472f7b0b364367bc2ec571d","cba55d13244afe7b924265760deccced","41a8d0d7093ac11827b3147565a08a80","281dc15503fffee17c9da332748e9288","90613ae77c78c8bd81849b728d648055","23c320522473bdd47071d56b72667172","bb8b72a9dc391e95ffd5e155d8bf7011","87b8da3e3cf0c16110e030a711dc26b3","cfdf87adc2cf4593a92e4edf38f2f557","dc80745c5cb478de48230e48efaf5ede"]}
 

You can then add a document using:

 curl -X PUT http://127.0.0.1:5984/guestbook/152019530472f7b0b364367bc2ec571d -d '{"name":"Sally", "message":"hi there"}'
 

It will return a verification message:

 {"ok":true,"id":"152019530472f7b0b364367bc2ec571d","rev":"1-3525253587"}
 

Note that it also generated a revision of the document. Alternatively, you can use a POST request to add a document with a server-generated UUID, e.g.

 curl -X POST http://127.0.0.1:5984/guestbook -d '{"name":"John", "message":"hi there"}'
 

That returns the UUID and revision of the newly created document, e.g.

 {"ok":true,"id":"b4bb85ab50271f3d12d25feb219cb66e","rev":"1-657551114"}
 

You can also add binary attachments such as images to CouchDB, e.g.

 curl -vX PUT http://127.0.0.1:5984/guestbook/6e1295ed6c29495e54cc05947f18c8af/image.jpg?rev=2-2739352689 -d@image.jpg -H "Content-Type: image/jpg"
 

Reading documents

CouchDB uses the GET operation to read a document; you pass the id of the document, e.g.

 curl -X GET http://127.0.0.1:5984/guestbook/152019530472f7b0b364367bc2ec571d
 

which returns

 {"_id":"152019530472f7b0b364367bc2ec571d","_rev":"1-3525253587","name":"Sally","message":"hi there"}
 

Updating documents

CouchDB uses optimistic locking to update documents, so the revision number must be passed when we update a document. Also, CouchDB is an append-only database, so it creates a new revision of the document on each update. For example, if you run the same PUT command as before (without a revision), you would see:

 {"error":"conflict","reason":"Document update conflict."}
 

In order to update the document, the revision must be specified, e.g.

 curl -X PUT http://127.0.0.1:5984/guestbook/152019530472f7b0b364367bc2ec571d -d '{"_rev":"1-3525253587", "name":"Sally", "message":"hi there", "date":"September 5, 2009"}'
 

This will, in turn, create a new revision and return:

 {"ok":true,"id":"152019530472f7b0b364367bc2ec571d","rev":"2-1805813096"}
 

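The read-then-update cycle can also be scripted. Below is a rough Java sketch using only JDK classes, with naive string handling in place of a JSON library; the class name and the updated message are just illustrations:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

public class CouchUpdateExample {

    public static void main(String[] args) throws Exception {
        String docUrl = "http://127.0.0.1:5984/guestbook/152019530472f7b0b364367bc2ec571d";

        // 1. read the document to obtain its current revision
        String current = readAll(new URL(docUrl).openConnection());
        String rev = extract(current, "\"_rev\":\"");

        // 2. PUT the updated document, passing the revision for optimistic locking
        HttpURLConnection conn = (HttpURLConnection) new URL(docUrl).openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("PUT");
        conn.setRequestProperty("Content-Type", "application/json");
        OutputStreamWriter writer = new OutputStreamWriter(conn.getOutputStream(), "UTF-8");
        writer.write("{\"_rev\":\"" + rev + "\", \"name\":\"Sally\", \"message\":\"hello again\"}");
        writer.close();
        System.out.println(readAll(conn));   // e.g. {"ok":true,"id":"...","rev":"..."}
    }

    // read the whole response body as a string
    private static String readAll(URLConnection conn) throws Exception {
        BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
        StringBuilder body = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            body.append(line);
        }
        reader.close();
        return body.toString();
    }

    // naive extraction of the quoted value that follows the given prefix
    private static String extract(String json, String prefix) {
        int start = json.indexOf(prefix) + prefix.length();
        return json.substring(start, json.indexOf('"', start));
    }
}

If another writer updated the document between the read and the PUT, CouchDB would reject the PUT with the same conflict error shown earlier.
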
Deleting document/database

You can delete a document using the DELETE operation, passing the revision as a query parameter, e.g.

 curl -X DELETE 'http://127.0.0.1:5984/guestbook/b4bb85ab50271f3d12d25feb219cb66e?rev=1-657551114'
 

Similarly, you can delete a database using:

 curl -X DELETE http://127.0.0.1:5984/guestbook
 

Querying Documents

CouchDB uses Javascript based map and reduce functions to query and view documents. The map function takes a document object and returns (emits) attributes from the document. Here is the simplest map function, which returns the entire document:

 function(doc) {
       emit(null, doc);
 }
 

Here is another example that returns the names of people who posted to the guestbook:

 function(doc) {
     if (doc.Type == "guestbook") {
         emit(null, {name: doc.name});
     }
 }
 

The reduce function is similar to the aggregation functions in most relational databases. For example, to count all names you could define the map function as:

 function (doc) {
     if (doc.Type == "guestbook") {
         emit(doc.name, 1);
     }
 }
 

and the reduce function as:

 function (keys, values) {
     var sum = 0;
     for (var i = 0; i < values.length; i++) {
         sum += values[i];
     }
     return sum;
 }
 

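To actually run these functions, they are stored in a design document. As a rough sketch, assuming a recent CouchDB release where views live under /db/_design/<name> and are queried via /db/_design/<name>/_view/<view>, the following Java snippet (JDK classes only, with an arbitrary _design/stats name) uploads the count view above and then queries it grouped by name:

import java.io.OutputStreamWriter;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Scanner;

public class GuestbookViews {
    public static void main(String[] args) throws Exception {
        // design document containing the map and reduce functions shown above
        String designDoc = "{\"language\":\"javascript\",\"views\":{\"count_by_name\":{"
                + "\"map\":\"function(doc){ if (doc.Type == 'guestbook') { emit(doc.name, 1); } }\","
                + "\"reduce\":\"function(keys, values){ var sum = 0; for (var i = 0; i < values.length; i++) { sum += values[i]; } return sum; }\"}}}";

        // create the design document (re-running this requires passing the current _rev)
        HttpURLConnection put = (HttpURLConnection) new URL(
                "http://127.0.0.1:5984/guestbook/_design/stats").openConnection();
        put.setDoOutput(true);
        put.setRequestMethod("PUT");
        put.setRequestProperty("Content-Type", "application/json");
        OutputStreamWriter writer = new OutputStreamWriter(put.getOutputStream(), "UTF-8");
        writer.write(designDoc);
        writer.close();
        System.out.println(new Scanner(put.getInputStream(), "UTF-8").useDelimiter("\\A").next());

        // query the view, grouped by name
        URL view = new URL("http://127.0.0.1:5984/guestbook/_design/stats/_view/count_by_name?group=true");
        System.out.println(new Scanner(view.openStream(), "UTF-8").useDelimiter("\\A").next());
    }
}
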
All Databases

You can list the names of all databases using:

 curl -X GET http://127.0.0.1:5984/_all_dbs
 

You can also get all documents for a particular database (guestbook):

 curl -X GET http://127.0.0.1:5984/guestbook/_all_docs
 

CouchDB also comes with a web based Futon application to create, update, and list databases and documents; simply go to http://127.0.0.1:5984/_utils/ and you will see all the databases in the system. You can also control replication from that UI, which is pretty handy. In addition, you can poll database changes using:

 curl -X GET 'http://127.0.0.1:5984/guestbook/_changes?feed=longpoll&since=2'
 

Also, you can get statistics using:

 curl -X GET http://127.0.0.1:5984/_stats/
 

And the configuration via:

 curl -X GET http://127.0.0.1:5984/_config
 

Replication

CouchDB is written in Erlang and has replication support built in. In order to replicate, just create a database on another server, e.g.

 curl -X PUT http://127.0.0.1:5984/guestbook-replica
 

Then replicate using:

 curl -X POST http://127.0.0.1:5984/_replicate -H 'Content-Type: application/json' -d '{"source":"guestbook", "target":"http://127.0.0.1:5984/guestbook-replica"}'
 

Security

You can add user/password based basic authentication by editing the /opt/local/etc/couchdb/local.ini file. You will then need to pass the user/password when accessing the CouchDB server, e.g.

 
 curl --basic -u 'user:pass' -X PUT http://127.0.0.1:5984/guestbook
 

Summary

I have just started using CouchDB and I am still learning its more advanced features and its capabilities in an enterprise environment. It looks very promising, but I am keeping Berkeley DB in my back pocket in case I run into severe issues.

August 15, 2009

Releasing Wazil.com

Filed under: Computing — admin @ 11:30 am

I just finished a brand new website, Wazil.com, and a companion Facebook app for posting yellow pages and classifieds. I am working on starting local communities for this website that will show local search results based on your location. Please give it a try and send me your comments and suggestions.
