JavaOne 2010 – Keeping your options open if the Cloud is not
September 23rd, 2010
This was one of the best presentations at JavaOne, probably due to the oratory talents of Doug Tidwell.
Doug presented libcloud and the Simple Cloud API, respectively a common library for interacting with the popular cloud server providers and controlling their VMs (reboot/create/destroy/list/images) and a common interface for the three most common cloud application services (File Storage, Document Storage and Simple Queues). The whole idea is to code to those common APIs rather than using the vendor’s own API directly in an effort to increase portability (the ability to move a piece of code from one vendor API to the other) and interoperability (the ability to run the same piece of code across multiple vendors’ APIs). Common sense dictates that portability is harder to achieve for vendor APIs that provide the most functionality. Here is a list of some vendors for which the common interfaces were built:
- Amazon Relational Database Service (RDS)
- Amazon SimpleDB
- Amazon Machine Image (AMI) hosting an application container
- Microsoft Azure
- Google App Engine
Before proceeding I would like to mention two other efforts in this area: jclouds has an interesting API, and Apache Nuvem (from the folks who brought you Tuscany) focuses more on the application layer but is at an early stage of development.
Of course the community has been hotly debating the two initiatives (libcloud and Simple Cloud API) arguing that it may be too early to begin standardizing access to the Cloud for fear of locking their design while people are still trying to figure out the best way to accomplish some tasks. Doug, on the other hand, argues that we already know what needs to be done to manage a VM (start, stop, provision, etc…) and interoperability is the real issue that needs to be addressed so it’s not too early for some standard to emerge. I do think that the common interface approach makes sense, especially since we don’t know how many of those vendors will be around/bought/merged/etc… and some large systems will need to interoperate.
The path to libcloud was described as follows:
- First we worried about APIs that deal with the XML and the JSON that goes into the Request and Response message i.e. the focus was on the wire
- Then we considered language-specific features to handle SOAP or REST messages
- Then we considered service-specific features while thinking about our organization’s needs, and that’s the right level; the common interfaces should handle all the plumbing underneath the messages that get sent to the various Clouds
Here is a purposefully boring-looking list of libcloud APIs to give a sense of 1) what the project tries to achieve (the methods are self-descriptive) and 2) the effort required to build such an API, since a deep dive into the various vendors’ APIs shows that not all capabilities are equally supported: Driver.getName(), Driver.listImages(), Driver.listLocations(), Driver.listNodes(), Driver.listSizes(), Driver.getId(), Driver.getPrivateId(), Driver.getPublicId(), Driver.getUuid() and Driver.getState().
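To give a feel for what coding against such a common driver looks like, here is a minimal sketch in Java. It is not libcloud's actual API: the `Driver` interface below only borrows three of the method names listed above, and `FakeDriver` and `Inventory` are hypothetical names I made up so the example runs without any cloud account.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical vendor-neutral driver; only a subset of the methods named above.
interface Driver {
    String getName();
    List<String> listNodes();
    List<String> listImages();
}

// In-memory stand-in for a real vendor driver (assumption, for illustration only).
class FakeDriver implements Driver {
    private final String name;
    private final List<String> nodes;

    FakeDriver(String name, List<String> nodes) {
        this.name = name;
        this.nodes = new ArrayList<>(nodes);
    }

    public String getName() { return name; }
    public List<String> listNodes() { return new ArrayList<>(nodes); }
    public List<String> listImages() { return new ArrayList<>(); }
}

class Inventory {
    // Collects "driverName/nodeName" across all drivers without any vendor-specific code.
    static List<String> allNodes(List<Driver> drivers) {
        List<String> out = new ArrayList<>();
        for (Driver d : drivers) {
            for (String node : d.listNodes()) {
                out.add(d.getName() + "/" + node);
            }
        }
        return out;
    }
}
```

The point of the sketch is that `Inventory.allNodes` never needs to know which vendor sits behind a given `Driver`; supporting another cloud means writing another implementation, not touching the calling code.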
The Simple Cloud API is a joint effort between Zend, GoGrid, IBM, Microsoft, Nirvanix and Rackspace. It makes heavy use of Dependency Injection and configuration files to keep the code free of vendors’ specifics. For File Storage you would use a StorageAdapter to invoke common operations such as storeItem(), fetchItem(), deleteItem(), copyItem(), moveItem(), renameItem(), listFolders(), listItems(), storeMetadata(), fetchMetadata() and deleteMetadata(). However, not all File Storage systems support, for example, renaming files, so proper exception flows must be put in place when using the API. Doug pointed out that the best way for the common interface to handle these cases is still being debated: Introspection, instanceof, XSLT-style?
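As an illustration of the unsupported-operation problem, here is a minimal sketch in Java of one possible approach, an introspection-style capability check with a copy-then-delete fallback. This is my own assumption, not the Simple Cloud API's design: the interface reuses a few of the method names above, while `supportsRename()`, `NoRenameAdapter` and `StorageFallback` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical common interface, loosely modelled on the operations named above.
interface StorageAdapter {
    void storeItem(String path, String data);
    String fetchItem(String path);
    void deleteItem(String path);
    // Optional operation: adapters that cannot rename throw UnsupportedOperationException.
    void renameItem(String from, String to);
    // Introspection-style check (assumption) so callers can ask before they call.
    boolean supportsRename();
}

// In-memory stand-in for a vendor that has no native rename.
class NoRenameAdapter implements StorageAdapter {
    private final Map<String, String> items = new HashMap<>();

    public void storeItem(String path, String data) { items.put(path, data); }
    public String fetchItem(String path) { return items.get(path); }
    public void deleteItem(String path) { items.remove(path); }
    public void renameItem(String from, String to) {
        throw new UnsupportedOperationException("renameItem is not supported");
    }
    public boolean supportsRename() { return false; }
}

class StorageFallback {
    // Renames natively when possible, otherwise falls back to copy-then-delete.
    static void rename(StorageAdapter adapter, String from, String to) {
        if (adapter.supportsRename()) {
            adapter.renameItem(from, to);
        } else {
            adapter.storeItem(to, adapter.fetchItem(from));
            adapter.deleteItem(from);
        }
    }
}
```

The trade-off is that the fallback is not atomic and costs an extra round trip, which is part of why the right way to surface these differences is still an open question.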
For Document Storage a common interface is trickier: Some data stores are relational, most aren’t, some support schema, most don’t, some support concurrency, most don’t, etc… Other issues arise from the nature of the Clouds’ Document Storage implementations: They are designed to scale to infinity and as such have no concept of indexed keys (you need to create one yourself, using Java’s UUID class for example), they tend to be de-normalized (Amazon has no joins on tables), etc… I think the name Document Storage signals a paradigm shift: it is generic enough to suggest that it most probably does not refer to a relational database.
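The two points about keys and de-normalization can be sketched in a few lines of Java. This is only an illustration under my own assumptions: a plain `Map` stands in for the cloud document store, and `DocumentKeys`, `newKey()` and `storeOrder()` are hypothetical names. Only `java.util.UUID` is a real library class.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

class DocumentKeys {
    // Generates a random (version 4) UUID string to use as the document key,
    // since the store will not hand out an auto-incremented key for us.
    static String newKey() {
        return UUID.randomUUID().toString();
    }

    // Stores a de-normalized order: the customer name is copied into the
    // document instead of being looked up later through a join.
    static String storeOrder(Map<String, Map<String, String>> store,
                             String customerName, String item) {
        Map<String, String> doc = new HashMap<>();
        doc.put("customerName", customerName);
        doc.put("item", item);
        String key = newKey();
        store.put(key, doc);
        return key;
    }
}
```

The caller has to keep the returned key around (or store it in another document), because there is no indexed column to search on afterwards.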
Finally Simple Queue: Right now it supports Amazon and Azure queues through a simple API: createQueue(), deleteQueue(), listQueues(), sendMessage(), receiveMessage(), deleteMessage(). SQS lets you peek at the Queue, Azure does not. Cloud queues are known to experience large delays and I think that there are two ways to look at it. Either a delay of around 30 seconds (between a Producer and a Consumer) is so large that Cloud Queues are not worth looking at (unreasonable), or a delay of around 30 seconds is quite large but Queue-based systems should be decoupled anyway, so Cloud Queues simply should not be used for low-latency situations (more reasonable).
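Here is a minimal sketch in Java of the receive, process, then delete pattern that such a queue API implies. It is an assumption-laden toy, not the Simple Cloud API itself: the `SimpleQueue` interface borrows four of the method names above with signatures I invented, and `InMemoryQueue` is a hypothetical stand-in that ignores real-world concerns such as visibility timeouts and delivery delays.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical common queue interface using a subset of the method names above.
interface SimpleQueue {
    void createQueue(String name);
    void sendMessage(String queue, String body);
    // Returns the oldest message without removing it, or null if the queue is empty.
    String receiveMessage(String queue);
    // Removes a message once the consumer has finished processing it.
    void deleteMessage(String queue, String body);
}

// In-memory stand-in for a vendor queue (assumption, for illustration only).
class InMemoryQueue implements SimpleQueue {
    private final Map<String, Deque<String>> queues = new HashMap<>();

    public void createQueue(String name) {
        queues.putIfAbsent(name, new ArrayDeque<>());
    }
    public void sendMessage(String queue, String body) {
        queues.get(queue).addLast(body);
    }
    public String receiveMessage(String queue) {
        return queues.get(queue).peekFirst();
    }
    public void deleteMessage(String queue, String body) {
        queues.get(queue).removeFirstOccurrence(body);
    }
}
```

Deleting only after processing means a consumer that crashes mid-way leaves the message on the queue for a retry, which fits the decoupled, not-low-latency view of Cloud Queues.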
Doug wrapped up with a demo in which a Producer, an Order Processing system running across many VMs on one Cloud, created order details and placed them in a “Cloud DB”, then created an Order Message and placed it in a “Cloud Queue”. A second Order Processing system running on another cloud picked the message from the “Cloud Queue”, got the order’s details from the “Cloud DB”, stored the invoice in a “Cloud Storage” and deleted the message from the “Cloud Queue”. As you guessed, the expressions in quotes refer to common interfaces, and the two clouds were from different vendors. It worked like a charm and I am eager to experiment with this setup.
In conclusion Cloud vendors are providing drastically different services so it makes sense to experiment with those two common interfaces; in particular libcloud should prove to be a productivity tool. Even the Simple Cloud API is an interesting abstraction layer that development teams would probably consider building in-house anyway if it did not exist, but I think that it will evolve quite a bit.