Shahzad Bhatti Welcome to my ramblings and rants!

September 2, 2009

Introduction to CouchDB

Filed under: Computing — admin @ 6:51 pm

I have been following the growth and popularity of CouchDB for a while and even attended an excellent talk by J Chris Anderson of http://couch.io. However, only recently have I had the chance to actually use it. I am building an internal search engine based on Lucene, but I am storing the documents in CouchDB. Although CouchDB is pretty easy to set up, its documentation is patchy. Here are the basic steps to get it running:

Installation and Launch

I installed CouchDB on my MacBook Pro notebook using MacPorts:

 sudo port install couchdb
 

CouchDB is also available for Linux distributions, where you can use yum or apt to install it, though official binaries are not available for Windows. On the Mac, you can also set it up to load at startup using:

 sudo launchctl load -w /opt/local/Library/LaunchDaemons/org.apache.couchdb.plist
 

Once you have installed it, you can start the CouchDB server using:

 sudo /opt/local/bin/couchdb
 

Alternatively, you can skip the installation and launch steps and instead use the hosting solution from http://hosting.couch.io, using the “booom-couch” password for the private beta.

Verify Installation

Once CouchDB has started, you can point your browser to http://127.0.0.1:5984/ or run:

 curl http://127.0.0.1:5984/
 

Because CouchDB communicates in JSON, it will return something like:

 {"couchdb":"Welcome","version":"0.9.0"}
 

The rest of this post uses curl to communicate with the CouchDB server.

Creating a database

CouchDB is a REST-based service, and you can review all of its APIs at http://wiki.apache.org/couchdb/HTTP_Document_API. CouchDB uses the PUT operation to create a database, e.g.

 curl -X PUT http://127.0.0.1:5984/guestbook
 

It will return

 {"ok":true}
 

Based on REST principles, PUT is used to add new data when the resource name is specified by the client. However, if you call this API again with the same arguments, it will return an error, e.g.:

 {"error":"file_exists","reason":"The database could not be created, the file already exists."}
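
If your setup script may run more than once, you probably want to treat that error as harmless. Here is a minimal sketch that classifies the parsed JSON body of the PUT response. The function name classifyCreateResponse is my own invention for illustration, and it relies only on the two response bodies shown above.

```javascript
// Sketch only: classifyCreateResponse is a hypothetical helper, not part of CouchDB.
// It takes the parsed JSON body returned by "PUT /<database>".
function classifyCreateResponse(body) {
  if (body.ok === true) {
    return "created";
  }
  if (body.error === "file_exists") {
    // The database was already there, which is fine for an idempotent setup step.
    return "exists";
  }
  // Anything else is unexpected, so surface it to the caller.
  throw new Error("unexpected response: " + JSON.stringify(body));
}

console.log(classifyCreateResponse({ ok: true }));
console.log(
  classifyCreateResponse({
    error: "file_exists",
    reason: "The database could not be created, the file already exists.",
  })
);
```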
 

Adding documents

Each document is a JSON object that consists of name/value pairs. Each document is also identified by a unique identifier, or UUID. You can generate UUIDs in your application or get them from the CouchDB server. For example, to generate 10 UUIDs, call:

 curl -X GET 'http://127.0.0.1:5984/_uuids?count=10'
 

and it will return something like:

 {"uuids":["152019530472f7b0b364367bc2ec571d","cba55d13244afe7b924265760deccced","41a8d0d7093ac11827b3147565a08a80","281dc15503fffee17c9da332748e9288","90613ae77c78c8bd81849b728d648055","23c320522473bdd47071d56b72667172","bb8b72a9dc391e95ffd5e155d8bf7011","87b8da3e3cf0c16110e030a711dc26b3","cfdf87adc2cf4593a92e4edf38f2f557","dc80745c5cb478de48230e48efaf5ede"]}
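
If you would rather generate the identifier in your application, here is a minimal sketch using Node's crypto module. It only produces a random 32-character hex string of the same shape as the ones above; I am not claiming this is the algorithm the server uses, and the name newDocId is made up for this example.

```javascript
const crypto = require("crypto");

// Returns a 32-character lowercase hex string, the same shape as the
// identifiers returned by _uuids. Assumption: any unique string works as
// a document id, so a random 128-bit value is good enough here.
function newDocId() {
  return crypto.randomBytes(16).toString("hex");
}

console.log(newDocId());
```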
 

You can then add a document using:

 curl -X PUT http://127.0.0.1:5984/guestbook/152019530472f7b0b364367bc2ec571d -d '{"name":"Sally", "message":"hi there"}'
 

It will return a confirmation message:

 {"ok":true,"id":"152019530472f7b0b364367bc2ec571d","rev":"1-3525253587"}
 

Note that it generated a revision ("rev") for the document. Alternatively, you can use a POST request to add a document with a server-generated UUID, e.g.

 curl -X POST http://127.0.0.1:5984/guestbook -d '{"name":"John", "message":"hi there"}'
 

That returns the UUID and revision of the newly created document, e.g.

 {"ok":true,"id":"b4bb85ab50271f3d12d25feb219cb66e","rev":"1-657551114"}
 

You can also attach binaries such as images to a document in CouchDB by passing the document's current revision, e.g.

 curl -vX PUT 'http://127.0.0.1:5984/guestbook/6e1295ed6c29495e54cc05947f18c8af/image.jpg?rev=2-2739352689' --data-binary @image.jpg -H "Content-Type: image/jpeg"
 

Reading documents

CouchDB uses the GET operation to read a document, and you pass the id of the document in the URL, e.g.

 curl -X GET http://127.0.0.1:5984/guestbook/152019530472f7b0b364367bc2ec571d
 

which returns

 {"_id":"152019530472f7b0b364367bc2ec571d","_rev":"1-3525253587","name":"Sally","message":"hi there"}
 

Updating documents

CouchDB uses optimistic locking, so the document's current revision must be passed with every update. CouchDB is also an append-only database, so each update creates a new revision of the document rather than overwriting the old one. For example, if you repeat the earlier PUT for Sally's document without a revision, you will see:

 {"error":"conflict","reason":"Document update conflict."}
 

To update the document, its current revision must be specified in the _rev field, e.g.

 curl -X PUT http://127.0.0.1:5984/guestbook/152019530472f7b0b364367bc2ec571d -d '{"_rev":"1-3525253587", "name":"Sally", "message":"hi there", "date":"September 5, 2009"}'
 

This will, in turn, create a new revision and return:

 {"ok":true,"id":"152019530472f7b0b364367bc2ec571d","rev":"2-1805813096"}
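
To make the revision check concrete, here is a toy in-memory model of the behaviour described above. It is only a sketch under my own assumptions: TinyStore is a made-up class, the revision suffix is just a counter, and none of this is how CouchDB is implemented internally.

```javascript
// Toy in-memory model of revision checking. This is not CouchDB's
// implementation; the revision suffix here is just a counter.
class TinyStore {
  constructor() {
    this.docs = new Map(); // id -> { rev, body }
    this.counter = 0;
  }

  put(id, doc) {
    const { _rev, ...body } = doc;
    const current = this.docs.get(id);
    // An update must name the revision it was based on.
    if (current && current.rev !== _rev) {
      return { error: "conflict", reason: "Document update conflict." };
    }
    // First write is generation 1, each accepted update bumps it by one.
    const generation = current ? parseInt(current.rev, 10) + 1 : 1;
    const rev = generation + "-" + this.counter++;
    this.docs.set(id, { rev, body });
    return { ok: true, id, rev };
  }
}

const store = new TinyStore();
const first = store.put("sally", { name: "Sally", message: "hi there" });
// Same write again without a revision: rejected as a conflict.
const stale = store.put("sally", { name: "Sally", message: "hi there" });
// Write that names the current revision: accepted, new revision created.
const second = store.put("sally", { _rev: first.rev, name: "Sally", date: "September 5, 2009" });
console.log(first, stale, second);
```

The point of the design is that a writer who read an old revision cannot silently overwrite someone else's newer change; it gets a conflict and has to re-read first.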
 

Deleting document/database

You can delete a document using the DELETE operation, passing the document's current revision as the rev query parameter, e.g.

 curl -X DELETE 'http://127.0.0.1:5984/guestbook/b4bb85ab50271f3d12d25feb219cb66e?rev=1-657551114'
 

Similarly, you can delete a database using:

 curl -X DELETE http://127.0.0.1:5984/guestbook
 

Querying Documents

CouchDB uses JavaScript-based map and reduce functions to query documents through views. The map function takes a document object and emits key/value pairs built from its attributes. Here is the simplest map function, which emits the entire document:

 function(doc) {
     emit(null, doc);
 }
 

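Here is another example that returns the names of people who posted to the guestbook. It assumes each entry carries a Type field set to "guestbook", which the sample documents added earlier do not have: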

 function(doc) {
     if (doc.Type == "guestbook") {
         emit(null, {name: doc.name});
     }
 }
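
To see what a map function emits without deploying a view, you can run it locally. Here is a minimal sketch of such a harness. The runMap helper and the sample documents are made up for this example, and this is not CouchDB's view server: it does not sort rows or store an index, it just calls the function once per document and collects the emitted rows.

```javascript
// Minimal local harness, not CouchDB's view server. The map function is
// passed as source text and evaluated with a local emit() in scope.
function runMap(mapSource, docs) {
  const rows = [];
  let currentId = null;
  const emit = (key, value) => rows.push({ id: currentId, key, value });
  // Wrap the source so that its free variable "emit" resolves to ours.
  const map = new Function("emit", "return (" + mapSource + ");")(emit);
  for (const doc of docs) {
    currentId = doc._id;
    map(doc);
  }
  return rows;
}

const namesMap = `function(doc) {
  if (doc.Type == "guestbook") {
    emit(null, {name: doc.name});
  }
}`;

// Hypothetical sample documents; note the Type field the map function checks.
const docs = [
  { _id: "a", Type: "guestbook", name: "Sally", message: "hi there" },
  { _id: "b", Type: "guestbook", name: "John", message: "hi there" },
  { _id: "c", Type: "note", name: "ignored" },
];

console.log(runMap(namesMap, docs));
```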
 

A reduce function is similar to the aggregation functions in most relational databases. For example, to count names, you could define the map function as:

 function (doc) {
     if (doc.Type == "guestbook") {
         emit(doc.name, 1);
     }
 }
 

and the reduce function as:

 function (keys, counts) {
     // counts holds the values emitted by the map function (all 1s here)
     var sum = 0;
     for (var i = 0; i < counts.length; i++) {
         sum += counts[i];
     }
     return sum;
 }
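
Here is a local sketch of how the map and reduce steps fit together when the results are grouped by key, which is roughly what querying the view with group=true gives you. The countByName function and the sample documents are made up for illustration, and the grouping logic is my simplification rather than CouchDB's engine.

```javascript
// Local sketch of a grouped map/reduce query; not CouchDB's engine.
function countByName(docs) {
  const rows = [];
  const emit = (key, value) => rows.push({ key, value });

  // Map step: emit (name, 1) for every guestbook entry.
  const map = (doc) => {
    if (doc.Type == "guestbook") {
      emit(doc.name, 1);
    }
  };

  // Reduce step: sum the emitted counts.
  const reduce = (keys, counts) => {
    let sum = 0;
    for (let i = 0; i < counts.length; i++) {
      sum += counts[i];
    }
    return sum;
  };

  docs.forEach(map);

  // Group the emitted values by key, then reduce each group separately.
  const groups = new Map();
  for (const { key, value } of rows) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  const result = {};
  for (const [key, values] of groups) {
    result[key] = reduce([key], values);
  }
  return result;
}

// Hypothetical sample documents.
const entries = [
  { Type: "guestbook", name: "Sally" },
  { Type: "guestbook", name: "John" },
  { Type: "guestbook", name: "Sally" },
  { Type: "note", name: "ignored" },
];
console.log(countByName(entries));
```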
 

All Databases

You can list the names of all databases using:

 curl -X GET http://127.0.0.1:5984/_all_dbs
 

You can also get all documents for a particular database (guestbook):

 curl -X GET http://127.0.0.1:5984/guestbook/_all_docs
 

CouchDB also comes with a web-based application called Futon for creating, updating, and listing databases and documents. Simply go to http://127.0.0.1:5984/_utils/ and you will see all the databases in the system.
You can also control replication from that UI, which is pretty handy. In addition, you can poll for database changes using:

 curl -X GET 'http://127.0.0.1:5984/guestbook/_changes?feed=longpoll&since=2'
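
When polling in a loop, each request should pick up where the previous one left off. Here is a minimal sketch that builds the next URL from the previous response. It assumes the response body has the shape {"results": [...], "last_seq": N}; nextChangesUrl and the sample response are made up for illustration.

```javascript
// Builds the URL for the next long-poll request from the previous response.
// Assumes the response has the shape {"results": [...], "last_seq": N}.
function nextChangesUrl(baseUrl, db, previousResponse) {
  const url = new URL("/" + encodeURIComponent(db) + "/_changes", baseUrl);
  url.searchParams.set("feed", "longpoll");
  url.searchParams.set("since", String(previousResponse.last_seq));
  return url.toString();
}

// Hypothetical response, for illustration only.
const sample = {
  results: [{ seq: 3, id: "abc", changes: [{ rev: "1-123" }] }],
  last_seq: 3,
};
console.log(nextChangesUrl("http://127.0.0.1:5984", "guestbook", sample));
```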
 

Also, you can get statistics using:

 curl -X GET http://127.0.0.1:5984/_stats/
 

And the configuration via:

 curl -X GET http://127.0.0.1:5984/_config
 

Replication

CouchDB is written in Erlang, but replication is a CouchDB feature in its own right: it works over the same HTTP API and does not depend on Mnesia. To replicate, first create the target database, which can live on another server (this example uses the same one), e.g.

 curl -X PUT http://127.0.0.1:5984/guestbook-replica
 

Then replicate using:

 curl -X POST http://127.0.0.1:5984/_replicate -H 'Content-Type: application/json' -d '{"source":"guestbook", "target":"http://127.0.0.1:5984/guestbook-replica"}'
 

Security

You can add user/password based basic authentication by editing the /opt/local/etc/couchdb/local.ini file. You will then need to pass the user and password when accessing the CouchDB server, e.g.

 curl --basic -u 'user:pass' -X PUT http://127.0.0.1:5984/guestbook
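
If you call CouchDB from code rather than curl, basic authentication comes down to one request header. Here is a minimal sketch of building it in Node; basicAuthHeader is a name I made up, and 'user' and 'pass' are the same placeholder credentials as above.

```javascript
// Builds the value of the HTTP "Authorization" header for basic auth:
// the word "Basic" followed by base64("user:password").
function basicAuthHeader(user, password) {
  const token = Buffer.from(user + ":" + password, "utf8").toString("base64");
  return "Basic " + token;
}

console.log(basicAuthHeader("user", "pass")); // prints "Basic dXNlcjpwYXNz"
```

Keep in mind that base64 is an encoding, not encryption, so this is only safe over a trusted network or HTTPS.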
 

Summary

I have just started using CouchDB, and I am still learning its more advanced features and how it holds up in an enterprise-level environment. It looks very promising, but I am keeping Berkeley DB in my back pocket in case I run into severe issues.
