JavaOne 2010 – Developing Composite Applications for the Cloud with Apache Tuscany (SCA)

September 25th, 2010

A great take on a great technology, SCA (Service Component Architecture), was given by Jean-Sebastien Delfino (IBM) and Luciano Resende (Shutterfly). Can Tuscany (an SCA implementation) shield you from the complexities of component assembly, component integration, deployment, inter-component communication, client protocols and so on in the Cloud? It turns out that many of these problems are not specific to the Cloud; they apply to any distributed environment. Still, the live demo made a point of showing how SCA works in a Cloud.

First they quickly reviewed the SCA goals:

  1. Abstract out the APIs, protocols and QoS (as in addressing, authentication levels, etc…) that define a service
  2. A structure for application componentization (refer to my earlier post Patterns for Modularity)
  3. A way to assemble/wire/re-wire components
  4. Encourage Open Source implementations (Tuscany, Fabric3, etc…)
  5. Encourage product implementations (IBM WebSphere Process Server, Oracle has an implementation inside its BPEL Process Manager, Tibco ActiveMatrix, SAP NetWeaver Platform, etc…)
  6. Target SOA, multi-language applications, application integration

But how can SCA help you in the Cloud? To answer this we need to look at the SCA Composite. Briefly, in the SCA Assembly Model a Composite is the main unit of work: it provides Components to implement the business function, exposes Services through various bindings such as WS or JMS, and lists its dependencies on other Composites through References. A Composite is defined in an SCDL (Service Component Definition Language) file. The Components can be coded in Java (POJOs or Spring Beans, for example), BPEL, JEE, Groovy, XQuery, etc… The multi-language support means that a Component can target different Cloud vendors’ run-times. The Service bindings are similarly diversified (WS, JMS, JCA, SLSB or Stateless Session Bean, HTTP, JSON, etc…), which means that various Cloud clients can access the same Component through various bindings. Finally, one Composite can refer to another co-located or remotely located Composite, which makes Composites suitable for distributed environments such as the Cloud.

Let’s take a look at the Apache Tuscany project run-time. It comes in two flavors: the Java run-time, which is the most common, and the SCA native run-time (Components can be written in Python for potential deployments in the Google App Engine Cloud, or in C++ for situations requiring a small footprint). Scheme support for Components is currently experimental.

The demo was quite convincing: An SCA Composite was written in Java and deployed to Amazon EC2. Another Java Composite, part of the same application, was pushed to Google App Engine. That same Composite’s Component was then re-written in Python and moved to the Google App Engine Cloud, and the application behaved as before. Finally that same Composite’s Component was re-written in C++ and pushed to Amazon EC2, and again the application behaved as before. What’s worth mentioning is the minimal effort required to move from one Component implementation to the other: just one line of change in the SCDL .composite file.
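
To make the "one line of change" point concrete, here is a minimal plain-Java sketch of the idea. It is not Tuscany's API and it does not parse a .composite file: the Catalog contract, the two implementations and the wire() helper are all hypothetical names I made up for illustration. The client codes against the contract only, and a single setting decides which implementation backs it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Plain-Java illustration only: a real SCA run-time reads the wiring from the .composite file.
final class CompositeWiringSketch {

    // The service contract that clients of the Component depend on.
    interface Catalog {
        String priceOf(String item);
    }

    // Two interchangeable implementations, standing in for e.g. a Java and a Python Component.
    static final class FirstCatalog implements Catalog {
        @Override
        public String priceOf(String item) {
            return item + ": 10 USD";
        }
    }

    static final class SecondCatalog implements Catalog {
        @Override
        public String priceOf(String item) {
            return item + ": 10 USD";
        }
    }

    // Hypothetical table of available implementations, keyed by name.
    private static final Map<String, Supplier<Catalog>> IMPLEMENTATIONS = new HashMap<>();

    static {
        IMPLEMENTATIONS.put("first", FirstCatalog::new);
        IMPLEMENTATIONS.put("second", SecondCatalog::new);
    }

    // The single setting that changes when the implementation is swapped.
    static Catalog wire(String implementationName) {
        Supplier<Catalog> factory = IMPLEMENTATIONS.get(implementationName);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown implementation: " + implementationName);
        }
        return factory.get();
    }

    public static void main(String[] args) {
        // Swapping "first" for "second" leaves the client code below untouched.
        Catalog catalog = wire("first");
        System.out.println(catalog.priceOf("book"));
    }
}
```

In SCA the equivalent of the string passed to wire() lives in the composite definition rather than in code, which is what makes the swap a configuration change.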

So far so good, but as mentioned before not really Cloud-specific. The Tuscany folks have recently forked another project, Nuvem, to address some Cloud-specific issues. Various utilities standardize access to Cloud services such as User Authentication and Authorization, Distributed Cache, Data Store and Queuing. This nicely complements the libcloud effort described in my earlier post, where we saw common functionality to manage the various Clouds (list, reboot, create, destroy, etc…) getting standardized, but Nuvem seems redundant with some areas of the Simple Cloud API (such as Queuing). Right now Nuvem is concentrating on Amazon and Google App Engine. What I liked about Nuvem was the REST approach: REST itself is Cloud-friendly and all operations are described with 4 simple verbs, which should help simplify the API. But no demo was given of the Nuvem capabilities; I guess the project is still too young. Jean-Sebastien actually shared a wish list for Nuvem: XMPP (for EC2, it’s there for GAE), Maven, OpenID, hierarchical cache, key/value store, RDBMS, etc…

In conclusion, developing a distributed application for deployment in your data center should not be fundamentally different from developing that same application for deployment in the Cloud. It’s the plumbing that differs, but that’s transparent to the application. So the Cloud in itself should not be enough to entice you to adopt Tuscany or another SCA implementation. What should entice you is the complexity of the application, the desire to make it modular and service-oriented, the need to expose its business logic through many bindings and the option to support many component models (read many languages/frameworks).

JavaOne 2010 – Keeping your options open if the Cloud is not

September 23rd, 2010

This was one of the best presentations at JavaOne, probably due to the oratory talents of Doug Tidwell.

Doug presented libcloud and the Simple Cloud API, respectively a common library for interacting with the popular cloud server providers and controlling their VMs (reboot/create/destroy/list/images) and a  common interface for the three most common cloud application services (File Storage, Document Storage and Simple Queues). The whole idea is to code to those common APIs rather than using the vendor’s own API directly in an effort to increase portability (the ability to move a piece of code from one vendor API to the other) and interoperability (the ability to run the same piece of code across multiple vendors’ APIs). Common sense dictates that portability is harder to achieve for vendor APIs that provide the most functionality. Here is a list of some vendors for which the common interfaces were built:

  • Amazon Relational Database Service (RDS)
  • Amazon SimpleDB
  • Amazon Machine Image (AMI) hosting an application container
  • Microsoft Azure
  • Google App Engine

Before proceeding I would like to mention two other efforts in this area: jclouds has an interesting API, and Apache Nuvem (from the folks who brought you Tuscany) focuses more on the application layer but is at an early development stage.

Of course the community has been hotly debating the two initiatives (libcloud and Simple Cloud API) arguing that it may be too early to begin standardizing access to the Cloud for fear of locking their design while people are still trying to figure out the best way to accomplish some tasks. Doug, on the other hand, argues that we already know what needs to be done to manage a VM (start, stop, provision, etc…) and interoperability is the real issue that needs to be addressed so it’s not too early for some standard to emerge. I do think that the common interface approach makes sense, especially since we don’t know how many of those vendors will be around/bought/merged/etc… and some large systems will need to interoperate.

The path to libcloud was described as follows:

  1. First we worried about APIs that deal with the XML and the JSON that goes into the Request and Response message i.e. the focus was on the wire
  2. Then we considered language-specific features to handle SOAP or REST messages
  3. Then we considered service-specific features while thinking about our organization’s needs, and that’s the right level; the common interfaces should handle all the plumbing underneath the messages that get sent to the various Clouds

I will list here a purposefully boring-looking list of libcloud APIs to give a sense of 1) what the project tries to achieve (the methods are self-descriptive) and 2) the effort required to build such an API, since a deep dive into the various vendors’ APIs will show that not all capabilities are equally supported: Driver.getName(), Driver.listImages(), Driver.listLocations(), Driver.listNodes(), Driver.listSizes(), Driver.getId(), Driver.getPrivateId(), Driver.getPublicId(), Driver.getUuid() and Driver.getState().

The Simple Cloud API is a joint effort between Zend, GoGrid, IBM, Microsoft, Nirvanix and Rackspace. It makes heavy use of Dependency Injection and configuration files to keep the code free of vendors’ specifics. For File Storage you would use a StorageAdapter to invoke common operations such as storeItem(), fetchItem(), deleteItem(), copyItem(), moveItem(), renameItem(), listFolders(), listItems(), storeMetadata(), fetchMetadata() and deleteMetadata(). However not all File Storage systems support, for example, renaming files, so proper exception flows must be put in place when using the API. Doug pointed out that the best way for the common interface to handle these cases is still being debated: Introspection, instanceof, XSLT-style?
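
Here is what such an exception flow could look like, as a minimal Java sketch. This is my own hypothetical rendering, not the Simple Cloud API itself: the interface keeps only four of the method names listed above, the signatures are assumptions, and NoRenameAdapter is an in-memory stand-in for a vendor whose storage has no native rename.

```java
import java.util.HashMap;
import java.util.Map;

final class StorageAdapterSketch {

    // Hypothetical Java rendering of a few of the common File Storage operations.
    interface StorageAdapter {
        void storeItem(String path, String data);

        String fetchItem(String path);

        void deleteItem(String path);

        // Optional operation: an adapter may throw UnsupportedOperationException.
        void renameItem(String from, String to);
    }

    // In-memory stand-in for a vendor whose storage cannot rename items.
    static final class NoRenameAdapter implements StorageAdapter {
        private final Map<String, String> items = new HashMap<>();

        @Override
        public void storeItem(String path, String data) {
            items.put(path, data);
        }

        @Override
        public String fetchItem(String path) {
            return items.get(path);
        }

        @Override
        public void deleteItem(String path) {
            items.remove(path);
        }

        @Override
        public void renameItem(String from, String to) {
            throw new UnsupportedOperationException("rename is not supported by this adapter");
        }
    }

    // Caller-side exception flow: try the native rename, else emulate it with fetch + store + delete.
    static void renameWithFallback(StorageAdapter adapter, String from, String to) {
        try {
            adapter.renameItem(from, to);
        } catch (UnsupportedOperationException e) {
            String data = adapter.fetchItem(from);
            if (data == null) {
                throw new IllegalArgumentException("No such item: " + from);
            }
            adapter.storeItem(to, data);
            adapter.deleteItem(from);
        }
    }
}
```

Catching an exception is only one of the options under debate; a capability query on the adapter would avoid using exceptions for control flow, at the cost of a wider interface.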

For Document Storage a common interface is trickier: Some data stores are relational, most aren’t; some support schemas, most don’t; some support concurrency, most don’t; etc… The other issues arise from the nature of Clouds’ Document Storage implementations: They are designed to scale to infinity and as such have no concept of indexed keys (you need to create one using a Java UUID, for example), they tend to be de-normalized (Amazon has no joins on tables), etc… I think that a paradigm shift is expected with Document Storage; indeed the name is generic enough to indicate that it most probably does not refer to a relational database.
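
Creating such a key on the client side is a one-liner with the standard java.util.UUID class. The class and method names below (DocumentKeySketch, newDocumentKey) are mine; only UUID.randomUUID() comes from the JDK.

```java
import java.util.UUID;

final class DocumentKeySketch {

    // Generates a random (version 4) UUID to use as the key of a new document.
    static String newDocumentKey() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        // Each call yields a different 36-character key.
        System.out.println(newDocumentKey());
    }
}
```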

Finally Simple Queue: Right now it supports Amazon and Azure queues through a simple API: createQueue(), deleteQueue(), listQueues(), sendMessage(), receiveMessage(), deleteMessage(). SQS lets you peek at the Queue, Azure does not. Cloud queues are known to experience large delays and I think that there are probably two ways to look at it: A delay of around 30 seconds (between a Producer and a Consumer) is so large that Cloud Queues are not worth looking at (unreasonable), or a delay of around 30 seconds is quite large but then again Queue-based systems should be decoupled and Cloud Queues should not be used for low-latency situations (more reasonable).

Doug wrapped up with a demo in which a Producer running an Order Processing system across many VMs on one Cloud created order details and placed them in a “Cloud DB”, then created an Order Message and placed it in a “Cloud Queue”. A second Order Processing system running on another cloud picked the message from the “Cloud Queue”, got the order’s details from the “Cloud DB”, stored the invoice in a “Cloud Storage” and deleted the message from the “Cloud Queue”. As you guessed, expressions between quotes refer to common interfaces and the two clouds were from different vendors. It worked like a charm and I will be eager to experiment with this setup.
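
The flow of that demo can be sketched in a few lines of Java. Everything here is a hypothetical in-memory stand-in (two maps and a deque), not the Simple Cloud API and not Doug's code; the point is only the order of operations, in particular that the consumer deletes the message last, once the invoice is safely stored.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

final class OrderFlowSketch {

    // Hypothetical in-memory stand-ins for the three common interfaces.
    static final Map<String, String> cloudDb = new HashMap<>();      // order id -> order details
    static final Deque<String> cloudQueue = new ArrayDeque<>();      // pending order ids
    static final Map<String, String> cloudStorage = new HashMap<>(); // invoice name -> invoice body

    // Producer: save the order details, then announce the order on the queue.
    static void placeOrder(String orderId, String details) {
        cloudDb.put(orderId, details);
        cloudQueue.addLast(orderId);
    }

    // Consumer: receive a message, process it, and delete it only once the invoice is stored.
    static boolean processNextOrder() {
        String orderId = cloudQueue.peekFirst(); // receive without removing
        if (orderId == null) {
            return false; // nothing to process
        }
        String details = cloudDb.get(orderId);
        cloudStorage.put("invoice-" + orderId, "Invoice for " + details);
        cloudQueue.removeFirst(); // delete the message
        return true;
    }

    public static void main(String[] args) {
        placeOrder("42", "2 tickets");
        processNextOrder();
        System.out.println(cloudStorage.get("invoice-42"));
    }
}
```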

In conclusion Cloud vendors are providing drastically different services so it makes sense to experiment with those two common interfaces; in particular libcloud should prove to be a productivity tool. Even the Simple Cloud API is an interesting abstraction layer that development teams would probably consider building in-house anyway if it did not exist, but I think that it will evolve quite a bit.

JavaOne 2010 – Effective XML: Leveraging JAXB and SDO

September 22nd, 2010

Blaise Doughan (Team lead for the TopLink and the EclipseLink JAXB & SDO projects) gave a very informative talk comparing and contrasting two mapping technologies: JAXB and SDO. By the way, let me make it clear that mapping and binding are two distinct things: In this context, a mapping framework maps a Java Class to an XML Schema or an XML Document (if a schema is not available) and vice versa, while a binding framework maintains a live connection between the Java Object and the XML document, and that’s a very powerful concept (I know it’s a simplistic definition; Mark Hansen gives a more detailed description in his book SOA Using Java Web Services). JAXB is a mapping and a binding framework, SDO is a mapping framework.

Blaise quickly explained the mechanics behind JAXB and SDO binding mechanisms and pointed out that the most important thing, actually, is not their API for manipulating XML and bridging the gap between the two domains (Java and XML), rather it is their ability to play nicely with other frameworks to persist this data and represent/expose this data. In other words data binding is not an end in itself but a transitional (and extremely important) phase before data gets manipulated by the likes of JPA, JAX-WS, JAX-RS and SCA.

Now let’s look at how JAXB and SDO differ:

  • At design time JAXB generates annotated POJOs while SDO generates annotated or non-annotated interfaces. Because JAXB generates plain POJOs, serialization (say for RMI) and reuse are not an issue. SDO, on the other hand, uses a DataObject with a much richer meta-model (Change Summary, Type and Sequence). To generate a GUI representation from a JAXB POJO you would need to use reflection, while an SDO DataObject directly exposes attributes such as name, uri, baseType, dataType, open and sequenced
  • The rich meta-data model of SDO is worth looking at since that’s where the designers spent most of their time: An SDO DataObject was designed to be language-agnostic and supports sophisticated object relationships at run-time. The open property, for example, refers to a set of attributes that apply to certain instances of the Interface and not others (useful for versioning)
  • Change Tracking: JAXB has none while SDO has a nice Change Summary to track created/modified/deleted DataObject(s), get the original values and undo changes. That’s all great but it duplicates many of the JPA capabilities and when used with JPA it’s just overhead.
  • Run-time model: The JAXB model is quite simple, you deal with only a handful of objects: JAXBContext, Marshaller, Unmarshaller and the Binder (see the earlier discussion on mapping v/s binding; briefly, the Binder preserves the Infoset). The SDO run-time is much more complex due to the fact that DataObject is a very generic entity. You start with a HelperContext and invoke on an as-needed basis an XSDHelper, an XMLHelper, a CopyHelper, a JavaHelper, an EqualityHelper, a TypeHelper, a DataFactory and a DataHelper. You get the picture…
  • Switching between run-times: Now that’s an argument which really makes a difference. In JAXB you simply change one line in the jaxb.properties file! In SDO it’s not quite so simple because no two implementations will generate the same interfaces with the same exact names; the specs are much looser. In my own experience I have switched during development between many different implementations because one would be more optimized for a flat schema, another would be better at handling a large DOM and another would simply have bugs!
  • XML-to-Java: JAXB has an edge because it was meant for Java while SDO can handle any Source

The session concluded with a case study: How to build a Data Access Service (DAS) that reads/writes from/to a data source and exposes the resulting objects through JSON or XML. JPA is used for persistence. The JAXB story is a straightforward one and you end up/start with properly annotated classes (@XmlRootElement, @Entity, etc…). The SDO story is not so simple since it was not meant to emit POJOs, so you could use JAXB to generate POJOs from the Schema, wrap the POJOs as DataObjects and use SDO to link the DataObjects to the Schema.

In conclusion it seems that JAXB is more straightforward to use with an ecosystem comprised of a Java Web Services Stack (JAX-WS and JAX-RS) and a standard persistence mechanism (JPA). SDO has an edge when used with SCA (Service Component Architecture, a very powerful and modular component architecture) and when a rich meta-model is expected from the Java Objects. JAXB is the more popular of the two mechanisms and with the growing popularity of REST (and JAX-RS) it definitely has an edge. The only caveat is that not all implementations are created equal. You might want to check out the most popular ones and test them for yourself; it’s as simple as changing one line in the jaxb.properties file.

JavaOne 2010 – Enabling Transformation Through the Cloud (a non-IT perspective)

September 22nd, 2010

This round-table gathered four KPMG consultants (Steve Hill, Egidio Zarello, John Cummings and Mark Foreman) to discuss the adoption of the Cloud from a business perspective. A few good points were made during the hour that the roundtable lasted, although these points could have easily been delivered in half an hour without any loss of information.

The main theme was that the Cloud is being embraced more and more by the business units, as opposed to the IT departments, and that the panelists don’t see this trend slowing any time soon. I have to say that I completely agree with their point of view; our own experience at Lab49 corroborates this trend, and not just in the Cloud adoption area. End-users are pushing for the adoption of a business enabling technology and IT can and should have a role in it. When asked to define that role, John Cummings clearly said that it was a governance one: defining the privacy model and the deployment model, negotiating the SLAs, etc… What is meant exactly by “business enabling technology”? If the cloud is a technology allowing self-procurement and data-center virtualization then surely it can also virtualize business processes.

The discussion then veered towards the inhibitors: the macro-economy as a whole (which may prevent certain places from experimenting with the cloud), security concerns, transparency and control over the data. But when one thinks hard about each one of those aspects it quickly becomes clear that they should not stand in the way of cloud adoption. The tough economic environment is actually a driver for Cloud adoption as a way to reduce the cost of data-centers; security concerns can be addressed with the proper governance model; transparency can also be addressed with the proper SLA in place; privacy is already a concern for corporations conducting cross-border transactions. What’s left is data control. The roundtable did not address this issue and I think that most architects would favor a hybrid solution right now.

Finally the Cloud is being adopted all over Asia, with the most obvious examples being China and India. China Mobile alone will spend 58 billion USD over three years (that’s not a typo) on its cloud infrastructure, while India sees the Cloud as a way to “virtualize the nation”: Think about it, hundreds of millions of people have become urbanized over the past couple of decades (now that’s the real revolution of the end of the twentieth century, not the Internet, as Michel Serres has explained at length) and Infrastructure as a Service (IaaS) is seen as the way to go to give them cheap access to technology. The US government is doing its share to push for Cloud adoption across various agencies: many agencies can now go and buy SaaS with a credit card instead of waiting for the two-year cycle of budget approval by Congress.

JavaOne 2010 – Patterns for modularity

September 22nd, 2010


This BOF featured Jaroslav Tulach, founder of NetBeans, along with Anton Epple and Zoran Sevarac. It was not really about new technology but about formalizing the approach and terminology for building modular systems. The talk targeted both desktop and server developers. The premise was that OO alone did not deliver on code reuse, hence the need to apply patterns, similar to the GoF patterns, to modules.

The speakers did make it clear that patterns exist within a context, i.e. some patterns might not be applicable to a given language, for example, since that language might already have constructs to provide the solution to the common problem addressed by the pattern; having said that the discussion centered exclusively on Java.

Anton defined his “Five Criteria for Modularity Patterns” even though last I checked there were six (they were still fixing the slides as we entered the room…):

  1. Maximize re-use
  2. Minimize coupling
  3. Deal with change (i.e. a smaller system deals with change more easily than a larger one; JDevelop, for example, can make changes to its core more easily than, say, Eclipse)
  4. Ease of maintenance (each release should have a theme, such as release X of NetBeans will support OSGI; this well-defined theme should not affect the existing system)
  5. Ease of extensibility (how powerful and simple is your plug-in architecture)
  6. Save resources (your modules should not affect the start-up time, especially true for desktop systems, and the memory footprint should be kept manageable)

Another set of definitions followed to formalize the management of dependency relationships between modules:

  1. A dependency is said to be direct if Module 2 depends directly on Module 1: M2 -> M1
  2. A dependency is said to be indirect if Module 3 depends on Module 2 which depends on Module 1: M3 -> M2 -> M1
  3. A dependency is said to be cyclical if Module 2 depends directly on Module 1 and Module 1 depends directly on Module 2 (no need to draw a picture)
  4. Dependencies can be classified as Incoming (M1 -> M2 <- M3), which make M2 hard to change, or Outgoing (M1 <- M2 -> M3), which make M2 easy to change
  5. Finally, dependencies can be designed using three classical patterns: The Adapter pattern (Adapter Interface between 2 modules to introduce an Indirect dependency; it forwards method calls from M3 to M1), the Mediator pattern (sits in between 2 or more modules, aka the Bridge pattern in NetBeans) and the Facade pattern (provides a front Interface for a set of two or more modules)
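
These definitions translate almost word for word into a small directed graph. The sketch below is my own illustration, not something shown in the talk: ModuleGraphSketch and its method names are hypothetical, and modules are reduced to plain strings.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class ModuleGraphSketch {

    // module -> modules it depends on directly (its outgoing dependencies)
    private final Map<String, Set<String>> dependsOn = new HashMap<>();

    // Records that module 'from' depends directly on module 'to'.
    void addDependency(String from, String to) {
        dependsOn.computeIfAbsent(from, k -> new HashSet<>()).add(to);
    }

    private Set<String> directDependencies(String module) {
        Set<String> deps = dependsOn.get(module);
        return deps == null ? Collections.<String>emptySet() : deps;
    }

    boolean isDirect(String from, String to) {
        return directDependencies(from).contains(to);
    }

    // True if 'to' can be reached from 'from' by following one or more dependencies.
    boolean reaches(String from, String to) {
        Deque<String> toVisit = new ArrayDeque<>(directDependencies(from));
        Set<String> seen = new HashSet<>();
        while (!toVisit.isEmpty()) {
            String module = toVisit.pop();
            if (module.equals(to)) {
                return true;
            }
            if (seen.add(module)) {
                toVisit.addAll(directDependencies(module));
            }
        }
        return false;
    }

    // Indirect: some direct dependency of 'from' leads on to 'to'.
    boolean isIndirect(String from, String to) {
        for (String middle : directDependencies(from)) {
            if (reaches(middle, to)) {
                return true;
            }
        }
        return false;
    }

    // Cyclical: following the dependencies of a module leads back to the module itself.
    boolean isCyclical(String module) {
        return reaches(module, module);
    }

    // Number of modules that depend directly on 'module' (its incoming dependencies).
    int incomingCount(String module) {
        int count = 0;
        for (Set<String> deps : dependsOn.values()) {
            if (deps.contains(module)) {
                count++;
            }
        }
        return count;
    }
}
```

A module with a high incomingCount() is the one you will hesitate to change, which matches the Incoming/Outgoing rule of thumb above.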

The last portion was the most practical one as it provided an overview of the existing tools in the Java space to solve the problem of Reducing Communication Dependencies. The problem of Communication Dependencies can simply be stated as follows: Given an Interface TextFilter, an implementation class UpperCaseTextFilter and a client Class Editor, how can Editor get an implementation of TextFilter? Ideally it should know nothing about UpperCaseTextFilter at design time. The ideal run-time solution should provide the following (now we’re getting to the heart of system modularity):

  1. Register a Service
  2. Retrieve a Service
  3. Disable a Service
  4. Replace a Service
  5. Order a Service (as in providing some sort of ranking)
  6. Declarative support for a Service (as in meta data)
  7. Codeless Service (as in configuration)
  8. Availability of required Services
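
To make the first few requirements tangible (register, retrieve, disable, order), here is a toy registry in plain Java built around the TextFilter, UpperCaseTextFilter and Editor names from the problem statement. The registry class itself and its method signatures are hypothetical; none of the five solutions discussed in the session looks exactly like this.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class ServiceRegistrySketch {

    interface TextFilter {
        String filter(String text);
    }

    static final class UpperCaseTextFilter implements TextFilter {
        @Override
        public String filter(String text) {
            return text.toUpperCase();
        }
    }

    // One registration: the contract, the implementation, its rank and whether it is enabled.
    private static final class Entry {
        final Class<?> type;
        final Object service;
        final int rank;
        boolean enabled = true;

        Entry(Class<?> type, Object service, int rank) {
            this.type = type;
            this.service = service;
            this.rank = rank;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Register a service under its contract; a higher rank wins at lookup time.
    <T> void register(Class<T> type, T service, int rank) {
        entries.add(new Entry(type, service, rank));
    }

    // Disable a service without unregistering it.
    void disable(Object service) {
        for (Entry e : entries) {
            if (e.service == service) {
                e.enabled = false;
            }
        }
    }

    // Retrieve the enabled service with the highest rank, if any.
    <T> Optional<T> lookup(Class<T> type) {
        return entries.stream()
                .filter(e -> e.enabled && e.type == type)
                .max(Comparator.comparingInt(e -> e.rank))
                .map(e -> type.cast(e.service));
    }

    // The client only knows the TextFilter contract, never UpperCaseTextFilter.
    static final class Editor {
        private final ServiceRegistrySketch registry;

        Editor(ServiceRegistrySketch registry) {
            this.registry = registry;
        }

        String render(String text) {
            return registry.lookup(TextFilter.class).map(f -> f.filter(text)).orElse(text);
        }
    }
}
```

Replacing a service amounts to disabling the old one and registering a new one. The declarative and codeless requirements are exactly what this sketch lacks, since every registration here is a line of code.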

Five solutions were described; I would like to point out that SCA was left out, which is a shame because it is a well-thought-out technology and Apache Tuscany is only one of its many implementations.

  1. The JDK 1.6 service loader mechanism: it is declarative (in META-INF/services) and returns an iterable typed collection of services, but it is way too simple, it is not dynamic (you can’t react to a situation where the client uninstalls a plug-in), it can be dangerous as it loads all services at start-up and it does not provide factory methods
  2. NetBeans solution: Uses Lookup and XML files. This one is declarative and dynamic, allows for ordering, for lazy loading, has factory methods and codeless extensions
  3. OSGI Service Registry: Services are registered in code using bundleContext.registerService(). It is dynamic, it has factory methods and it filters services, but it is configured with code, which means that you now have dependencies on the OSGI framework in your code; eager creation can slow down start-up times and it is not type safe
  4. OSGI Declarative Services (OSGI II if you prefer): It’s better than Service Registry in the sense that it is declarative (XML configs)
  5. Dependency Injection: Spring offers an alternative solution using @Autowired; it is declarative in nature (the wiring is specified in your beans.xml) and the framework is usually transparent to your code
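
As an illustration of the first solution, this is roughly how a client would use java.util.ServiceLoader to obtain a TextFilter. The surrounding class, the IdentityTextFilter fallback and findTextFilter() are hypothetical names of mine; only the ServiceLoader.load() call and its iterator are JDK API.

```java
import java.util.Iterator;
import java.util.ServiceLoader;

final class ServiceLoaderSketch {

    public interface TextFilter {
        String filter(String text);
    }

    // Fallback used when no provider is declared under META-INF/services.
    static final class IdentityTextFilter implements TextFilter {
        @Override
        public String filter(String text) {
            return text;
        }
    }

    // Returns the first declared provider, or the fallback if none is declared.
    static TextFilter findTextFilter() {
        // Providers are listed in a META-INF/services file named after the interface.
        Iterator<TextFilter> providers = ServiceLoader.load(TextFilter.class).iterator();
        return providers.hasNext() ? providers.next() : new IdentityTextFilter();
    }

    public static void main(String[] args) {
        System.out.println(findTextFilter().filter("hello"));
    }
}
```

Nothing in the API lets the client rank, disable or replace a provider, which is the "way too simple" part of the criticism.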

Jaroslav then discussed the hotly debated issue: Are Singletons evil? And the answer is “it really depends on the context” (recall the statement earlier in this post). In a Dependency Injection solution there are many contexts: the Application context, the Request context and the Session context. Singletons would then be viewed as bad given all these contexts and the false sense of security that they provide. But at the module level (say a jar) they can be helpful if they are carefully designed:

  • The Singleton must be ready for use: No initialization code should be required prior to requesting a Singleton
  • The Singleton must be injectable: We should be able to inject different Singleton implementations at run-time depending on the context (say DEV v/s PROD)
  • Singletons are OK when used with the proper Service Loader/Lookup mechanism
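
A minimal sketch of a Singleton that satisfies the first two rules could look like this. The Settings contract and the SettingsHolder class are hypothetical examples of mine, not code from the session: the holder ships with a working default, so no initialization call is needed, and it exposes a single injection point to swap the implementation.

```java
import java.util.Objects;

final class InjectableSingletonSketch {

    interface Settings {
        String environment();
    }

    static final class SettingsHolder {

        // Ready for use: a default implementation is in place before anyone asks for it.
        private static volatile Settings instance = () -> "PROD";

        private SettingsHolder() {
        }

        static Settings get() {
            return instance;
        }

        // Injection point: swap the implementation at run-time (say DEV v/s PROD).
        static void inject(Settings replacement) {
            instance = Objects.requireNonNull(replacement, "replacement");
        }
    }

    public static void main(String[] args) {
        System.out.println(SettingsHolder.get().environment());
        SettingsHolder.inject(() -> "DEV");
        System.out.println(SettingsHolder.get().environment());
    }
}
```

In a real module the inject() call would typically be replaced by a lookup through one of the service mechanisms listed earlier, which is the third rule.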

Finally a few thoughts on performance were given: Since modular applications tend to be so large, start-up time becomes critical, so one should avoid calling bundle.start() in OSGI for each jar, because it is inefficient and because oftentimes the jar is a 3rd party library that can’t be trusted. You are better off using a declarative registration method (as in an XML config file: interpret it and cache it), which is why JSR 198 falls short in the performance area. JSR 198 is indeed declarative (XML), but you need to create a handler for each service, which slows the start-up time.

Again, this session did not break any new ground but it helped organize the ideas around modules, what to look for when evaluating different solutions and last but not least how to learn from a large system such as NetBeans that has been continuously evolving for the last 13 years.

JavaOne 2010 – OpenJDK BOF

September 22nd, 2010


The OpenJDK BOF was an informal Q&A session: attendees were free to ask JDK-related questions and Kelly O’Hair, Dalibor Topic and Mark Reinhold were there to answer. I think that this setup was appropriate for such a sensitive topic given the degree of anxiety of the Java community and probably the state of mind of the former Sun employees themselves. I will try to capture the most relevant Questions and Answers.

  1. Will the JRockit VM get open-sourced? No, the plans are to keep HotSpot open-sourced as OpenJDK 7 and add to it some of JRockit’s unique features by mid 2011.
  2. Will Oracle keep going through the JCP? The plans are to keep using the JCP for OpenJDK 7 features such as the Lambda project and even for Java SE 8. No other guarantees were made beyond that.
  3. Comment on JDK 7 v/s OpenJDK 7: Sun used to provide the OpenJDK + some of its own proprietary binaries (themselves largely based on the OpenJDK, such as plugins) gratis (as in free beer). Oracle will continue doing so in OpenJDK 7 and at the same time provide JDK 7 which should be 98% identical to OpenJDK 7.
  4. Will Oracle make some performance-sensitive features (such as sockets, io) part of JDK 7 as opposed to OpenJDK 7? No, because Oracle has little/nothing to gain from such a model. It would only fragment the code base and make the merge back into OpenJDK a nightmare. On the other hand the JRockit Mission Control API will remain proprietary but some of its features will make it to OpenJDK, such as the pluggable verbose logging and the JMX management tool (it can handle port numbers while the JMX on HotSpot currently can’t).
  5. Will the deterministic GC of JRockit make it to OpenJDK? No because there are currently paying customers that Oracle would like to keep as such. On the other hand certain aspects of the JRockit GC could make it such as incremental sliding compaction.
  6. Will the Da Vinci code continue to thrive? Yes, but unfortunately not all of John Rose’s great ideas can/will make it to OpenJDK 7. For now JSR 292 (invokedynamic) will make it. The rest, most notably support for tail recursion, will not.
  7. What’s the state of Jigsaw? It’s in a state of flux, we’re not at a point where we can readily start portions of the JDK. But it is actively worked on for OpenJDK 7.
  8. What about Project Verrazano? For now it is a research project (it takes a JAR and a platform spec, modularizes the classes themselves to reduce their code size to an optimal one).
  9. What about OpenJDK 6? Actually the effort was started after the one for OpenJDK 7 and so it did not start with the Java SE 6 code base; rather it started with the OpenJDK 7 code base and engineers started removing features to match JDK 6. It slightly differs from JDK 6, depending on the particular repository, but the main features such as CORBA, JAXB, JAXP and JAX-WS, when they differ, do so very little, and usually only in terms of exact licensing. Currently there are only four features that differ between JDK 6 and OpenJDK 6: Graphics, Fonts (the most prominent one), SNMP and Color Management.
  10. Speaking of repositories, which distributed SCM do you recommend? The Solaris team opted for Mercurial (Python-based), it’s great, we could have gone with Git but not sure what to do about the added complexity.

So there you have it, the latest on OpenJDK 7. The session started slow with few questions, but as time went by the audience asked more and more questions, mostly I think to get reassured about the open-source fate of OpenJDK. The message was indeed a reassuring one, but one thing that remains to be seen is: What will happen to Java SE post 2012?

JavaOne 2010 – Enterprise Service Bus, Lessons from the field

September 21st, 2010


Good presentation about the ESB adoption for a major web site, nfl.com. The two presenters, Earl Nolan and Monal Daxini, were eager to share their pain points during the adoption of Mule as the ESB (Enterprise Service Bus). Unfortunately Mule was the only ESB discussed, but during the Q&A the presenters admitted that Spring Integration would have been considered for a smaller-scale effort. They did not look at Apache ServiceMix because three years ago it was not quite as stable or feature-rich as it is today, so were they to evaluate the offerings today ServiceMix might have been adopted.

They started off by (aptly) saying what an ESB is not: an ESB is not JMS or a messaging middleware platform and an ESB is not a heavy web services stack. They (also aptly) gave a simple definition of an ESB: A solution for integration problems. Put simply, any time 3 or more applications need to integrate you have the potential for an ESB adoption. The work of Gregor Hohpe was cited quite often during the course of the talk.

So why adopt an ESB to solve integration problems? I can think of a couple of other ways myself where an ESB is not warranted/desirable. There are quite a few cases where an SOA solution or an OSGI-based solution is more desirable, but it is clear that an ESB is relevant for problems where a large number of applications need to talk to each other and along the way transform/enrich the data in some way. An ESB allows you to decouple the integration logic from the business logic (more on that later, but think for now configuration over coding) and evolve from a hub-and-spoke or point-to-point communication model towards a more distributed model. Finally an ESB typically has an impressive list of connectors that can alone justify its adoption (we are not talking JCA here…)

So why choose one ESB vendor over another? Here are a few tips from this session:

  • Go for Configuration over Coding – actually that’s the whole idea of an ESB, eschew fancy APIs in favor of fancy configurations. Later we’ll talk about when you might want to use the API
  • Go for a lightweight ESB (quick start-up time and easier to test)
  • Go for an ESB that favors (and properly documents) an incremental adoption of the product – that’s just common sense but oftentimes the documentation offers few clues on how to partially get started with an ESB
  • Scalable and distributed (had to mention this one…. one more bullet… but hey, it’s really important)
  • Embeddable – what’s meant here is that the ESB should be able to reside (maybe in a first phase) inside your existing process to ease deployment and simplify dealing with the IT ops team
  • And finally go for some type of SEDA architecture – we’ll go over this point in detail

What’s SEDA? It stands for Staged Event-Driven Architecture (come to think about it, it’s quite reminiscent of the EDA model I encounter in CEP applications) and it’s a programming paradigm where the application is decomposed into different stages; each stage is fronted by a queue and each stage (except the last) feeds into the next one. An analogy can be made with microprocessor stages and their pipelines: Stage 1 queues new inputs if stage 2 is not done processing yet. This approach avoids complex concurrency problems. Mule has wholeheartedly adopted the SEDA model: a Queue Controller sits at the input of the system, the system is made up of stages, each stage contains an Event Handler, and the developer is only responsible for implementing the event handlers. The thread pool and the Queue Controllers (including the one that provides feedback to slow down/stop incoming messages) are implemented by Mule, which instantiates as many Event Handlers as required. The queues themselves enable modeling and capacity planning.
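
To make the staged model concrete, here is a minimal sketch in plain Java. It is not Mule code: the class names, the STOP marker and the queue size are all made up for illustration, and each stage gets a single worker thread where a real ESB would manage a pool. Two stages are chained through bounded queues, and the only "business" code is the two small handlers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Hypothetical two-stage SEDA pipeline; none of these names come from Mule. */
final class SedaSketch {

    /** Marker event that tells a stage to shut down after passing it along. */
    private static final String STOP = "__STOP__";

    /** One stage: a bounded inbound queue drained by a worker that runs the event handler. */
    static final class Stage {
        final BlockingQueue<String> inbound = new ArrayBlockingQueue<>(4);

        Stage(ExecutorService pool, Function<String, String> handler, BlockingQueue<String> next) {
            pool.execute(() -> {
                try {
                    while (true) {
                        String event = inbound.take();      // wait for the previous stage
                        if (event.equals(STOP)) {
                            next.put(STOP);
                            return;
                        }
                        next.put(handler.apply(event));     // blocks if the next queue is full
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
    }

    /** Pushes the inputs through an upper-casing stage and then a decorating stage. */
    static List<String> run(List<String> inputs) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        BlockingQueue<String> sink = new ArrayBlockingQueue<>(inputs.size() + 1);
        Stage second = new Stage(pool, s -> s + "!", sink);
        Stage first = new Stage(pool, String::toUpperCase, second.inbound);
        List<String> results = new ArrayList<>();
        try {
            for (String input : inputs) {
                first.inbound.put(input);                   // a full queue slows the producer down
            }
            first.inbound.put(STOP);
            for (String out = sink.take(); !out.equals(STOP); out = sink.take()) {
                results.add(out);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow();
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("order-1", "order-2")));
    }
}
```

The bounded queues are what provide the feedback: when a stage falls behind, `put` blocks and the stage in front of it slows down instead of piling up work.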

Next the speakers described the stage itself since it is at the heart of the SEDA model: Each stage is fronted by an input (queued) channel and outputs data through a (queued) outbound channel; inside the stage data goes successively through an inbound router, a service component and an outbound router; transformations are handled by the routers. What’s interesting to note is the number of protocols that an ESB such as Mule can handle at the channels: JMS, FTP, File system, HTTP, UDP, TCP, IPoAC, etc…

Next was the paradigm shift introduced by ESB; this is actually similar to any mentality shift that a developer must undergo when adopting a new technology, and again Gregor Hohpe was quoted: Your coffee shop doesn’t use 2-phase commit! Mainly, don’t fall into the Leaky Abstraction trap and do make use of the ESB components such as splitters (the net effect for the developer is the ability to deal with a single thread).

Next was a series of don’ts:

  • ESB is not a pass-through proxy: what they meant is that given all the layers involved in a typical ESB it does not make sense to use the ESB as a simple pass-through if there is no value added (i.e. transformation, projection, etc…). It simply complicates the stack and you will have to think of a myriad of problems such as caching, host down-time, etc… You are better off using a CDN for that.
  • ESB is not a glorified CRON job scheduler – use a cron tool for that (this applies to shops with a heavy reliance on batch jobs)
  • ESB is not an application glue; use a Dependency Injection framework instead

Finally a few best practices were presented such as the separation of validation from transformation (use an ETL plug-in if needed for complex transformations – I am not sure about this advice, ETL left me with a bad taste even when used topically) and the enforcement of data canonicalization – I think that this last point applies to most platforms, whether developing web services, ESB or OSGi. It’s worth repeating: Time spent defining a canonical model for your data is time well spent and will save you from redundant validation/transformation/exception flows down the line.
About data validation: The actual way to go about it is quite controversial. Do you go with a strict (as in XSD schema) model or with a more relaxed model (RELAX NG and Schematron)? I don’t think that there is a clear-cut answer; it really depends on how dynamic your environment is and how many external dependencies you have (for Internet-facing applications the relaxed model is probably better).
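
As an illustration of the strict route, here is a small sketch that validates a message against an XSD before any transformation is attempted, using the standard `javax.xml.validation` API. The `score` element and its attributes are a made-up canonical model, not something shown in the talk.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

/** Hypothetical "validate first, transform later" step with a strict XSD model. */
final class StrictValidationSketch {

    // Made-up canonical schema: a <score> element with two required attributes.
    private static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='score'>"
        + "<xs:complexType>"
        + "<xs:attribute name='team' type='xs:string' use='required'/>"
        + "<xs:attribute name='points' type='xs:nonNegativeInteger' use='required'/>"
        + "</xs:complexType>"
        + "</xs:element>"
        + "</xs:schema>";

    private static final Schema SCHEMA = compileSchema();

    private static Schema compileSchema() {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is broken", e);
        }
    }

    /** Returns true only if the message is well-formed and matches the schema. */
    static boolean isValid(String xml) {
        try {
            // A Validator is not thread-safe, so create one per message.
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;                       // rejected by the strict model
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<score team='SF' points='21'/>"));
        System.out.println(isValid("<score team='SF' points='-3'/>"));
    }
}
```

Anything the schema does not anticipate is rejected outright, which is exactly the trade-off to weigh against the relaxed model when external partners keep changing their feeds.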

About your event model: Again two models were discussed, push v/s pull, but here it’s easy to see why a push model would be preferred unless you have very stringent reliability requirements (I can hear the Nirvana folks scream since they do have buffering/replay capability). The low latency/low cycles consumption features of the push model make it a clear winner but it is not always enforceable when dealing with 3rd party data providers.
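
The difference between the two models fits in a few lines of Java. This is only a sketch with invented class names: in the push case the feed calls the consumer the moment an event is published, while in the pull case the feed buffers events until the consumer comes asking.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

/** Hypothetical feeds contrasting the push and pull event models. */
final class PushPullSketch {

    /** Push: the feed calls every subscriber as soon as an event is published. */
    static final class PushFeed {
        private final List<Consumer<String>> listeners = new ArrayList<>();

        void subscribe(Consumer<String> listener) {
            listeners.add(listener);
        }

        void publish(String event) {
            listeners.forEach(listener -> listener.accept(event));
        }
    }

    /** Pull: the feed buffers events until the consumer asks for the next one. */
    static final class PullFeed {
        private final Queue<String> buffer = new ArrayDeque<>();

        void publish(String event) {
            buffer.add(event);
        }

        /** Returns the oldest buffered event, or null when nothing is waiting. */
        String poll() {
            return buffer.poll();
        }
    }

    public static void main(String[] args) {
        PushFeed push = new PushFeed();
        push.subscribe(event -> System.out.println("pushed: " + event));
        push.publish("touchdown");

        PullFeed pull = new PullFeed();
        pull.publish("touchdown");
        System.out.println("pulled: " + pull.poll());
    }
}
```

The pull consumer spends cycles asking even when nothing is there, which is the latency and CPU cost mentioned above; the buffer is also what gives it a chance to catch up after an outage.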

The session wrapped up with security and deployment issues.

It’s interesting to note that most recommendations are applicable to most development situations, and that’s what is reassuring about the ESB adoption: There is nothing fundamentally awkward about the model, just a formalized way of doing data integration that forces you to modularize your aspects, make the proper architecture choices (validation, push/pull, transactions or not) and decouple the integration logic from the business logic. All in all an entertaining talk, one of many that dealt with ESB, showing that this technology is very much relevant whether you operate in an EDA environment, an SOA stack or a more traditional back-end system.

JavaOne 2010 – JAX-WS.Next: Future Directions and Community Input

September 21st, 2010


I thought that I should mention an interesting BOF: JAX-WS.Next: Future Directions and Community Input. JAX-WS, as you know, is the worthy successor of JAX-RPC, improving on it in many ways and it has become increasingly important since most app servers are supporting web profiles. It is pretty much the standard way of doing web services in the Java EE/light EE world.
This session presented many ideas being explored by Sun/Oracle engineers in the RI v2.2.2, most notably how JAX-WS will now take advantage of the Servlet 3.0 spec (1 request can be serviced by many threads) and the WSDL pluggability (what you see on Tomcat 7 would become portable to other containers). It was stressed quite a few times that the ideas discussed in this BOF still need final approval from the JCP.
Some of the features being proposed:

  • Support for stateful web services for H/A (support for broken HTTP connections)
  • Schema validation: That’s a welcome addition as most people do it (not necessarily in the production environment) one way or another in one-off ways; the class would be annotated with @SchemaValidation; this would ensure that input/output are properly validated
  • Official support for doc/literal wrapper style
  • The ability to call Proxy.close() and have a chance to clean up resources
  • WSDL 1.1 binding extensions for SOAP 1.2; this would allow the developer to run with the -extension
  • MTOM Policy support via @MTOM to allow for the optimized serialization of WSDL/messages; the policy itself gets published in the WSDL
  • Addressing policy: long-running operations would send the response in another HTTP Connection; also allow for anonymous/non-anonymous response mechanisms
  • Finally (and most important) support for asynchronous behavior on the server side; the client models would remain the same with a choice of polling or callback but the server side invoke method returns void (i.e. immediately) and does not block

Most of these features are already in Glassfish, WebSphere and WebLogic.
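
For readers who have not used the two client models mentioned in the last bullet, here is a rough analogy using only `java.util.concurrent`. It is deliberately not JAX-WS code: `quoteAsync` is a made-up stand-in for a generated asynchronous port method, and the canned reply replaces a real remote call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.function.Consumer;

/** Hypothetical illustration of the polling and callback client models. */
final class AsyncClientModelsSketch {

    /** Stand-in for a remote call; returns at once, the reply arrives later. */
    static CompletableFuture<String> quoteAsync(String symbol) {
        return CompletableFuture.supplyAsync(() -> symbol + "=42");
    }

    /** Polling model: the client keeps checking the response handle until it is done. */
    static String viaPolling(String symbol) {
        Future<String> response = quoteAsync(symbol);
        try {
            while (!response.isDone()) {
                Thread.sleep(1);                // the client is free to do other work here
            }
            return response.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    /** Callback model: the client hands over a handler that runs when the reply arrives. */
    static CompletableFuture<Void> viaCallback(String symbol, Consumer<String> handler) {
        return quoteAsync(symbol).thenAccept(handler);
    }

    public static void main(String[] args) {
        System.out.println("polled: " + viaPolling("ORCL"));
        viaCallback("ORCL", reply -> System.out.println("called back: " + reply)).join();
    }
}
```

In both cases the call that starts the operation returns immediately, which is the behavior the proposal wants to bring to the server side as well.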

JavaOne 2010 – KeyNote

September 21st, 2010


I decided to attend the JavaOne KeyNote hoping to hear some important announcements even if the price to pay was pretty steep; you do have to sit after all in a huge auditorium and stoically listen to executives going through an incredibly boring, extremely well rehearsed (to the point of being comically predictable) and amazingly unassuming (a 5-minute intro concocted by Oracle lawyers warns you that nothing in this session should be considered as a commitment, rather these are just forward looking statements and all the assurances about deliveries/roadmaps/future versions are nothing but hopeful wishes) presentation. But there were a few points that could be taken away from this keynote.
So let’s start with the different JVMs: It was made clear that Sun HotSpot was the JVM of choice and actually a JRockit engineer demonstrated a “flight-recorder” type of tool that records the past n seconds of all events in your VM so they can be replayed and analyzed before a dramatic event. The JRockit Flight-Recorder itself targets the Sun HotSpot as well as the JRockit VM. You get the feeling that the two will converge and that HotSpot has the edge.
On the much anticipated JDK 7 issue Oracle promised two releases, one in 2011 and one in 2012, but again these dates should be taken with a grain of salt given all the legal disclaimers. Three projects were prominently listed: Project Coin (to increase productivity), Project Jigsaw (to modularize the JDK which has grown too huge – start-up, for example, would be faster) and Project Lambda (to add Lambda expressions, aka closures, to the Java language).
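
To give a feel for what Coin and Lambda mean for day-to-day code, here is a small sketch. The syntax is the form these proposals took when they later shipped (Coin in Java SE 7, lambdas in Java SE 8), not necessarily what was on the keynote slides, and the class and method names are invented.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

/** Hypothetical examples of Project Coin and Project Lambda syntax. */
final class CoinAndLambdaSketch {

    /** Coin: a switch statement over strings. */
    static String describe(String project) {
        switch (project) {
            case "coin":   return "small language changes";
            case "jigsaw": return "a modular JDK";
            case "lambda": return "closures";
            default:       return "unknown";
        }
    }

    /** Coin: the diamond operator and try-with-resources. */
    static List<String> readLines(String text) {
        List<String> lines = new ArrayList<>();              // no repeated type argument
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }                                                    // reader is closed automatically
        return lines;
    }

    /** Lambda: behavior passed around as a value. */
    static int applyTwice(IntUnaryOperator f, int x) {
        return f.applyAsInt(f.applyAsInt(x));
    }

    public static void main(String[] args) {
        System.out.println(describe("lambda"));
        System.out.println(readLines("first\nsecond"));
        System.out.println(applyTwice(n -> n + 3, 1));
    }
}
```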

Oracle seemed very eager to stress their efforts to develop Java on all three platforms: The desktop, the server and mobile devices. On the server side a couple of interesting announcements: Continuing the effort to support multiple languages (although project Da Vinci was not explicitly mentioned), efforts to simplify (again) EJBs (Web Beans 1.0), efforts to take JAX-WS further (to support server-side asynchronicity) and Dependency Injection to further the convergence with Spring (Rod Johnson and Bob Lee are the spec leads).

On the GUI front there was a JavaFX demo that was supposed to showcase its power as a 2D and 3D graphics platform; unfortunately the demo itself was pretty lame, featuring an air hockey game with looks circa 1990, a fuming Java coffee cup and an animation built on top of a video. Adobe Flex, Nokia Qt and Microsoft WPF are probably not having nightmares over it as we speak. What’s worth noting, though, is that the JavaFX API will provide a uniform API for coders to produce desktop/native and web browser applications (i.e. produce HTML5, CSS and JavaScript code). The latter is probably an admission of the success of Google GWT.

Bioware (the maker of Star Wars The Old Republic) was brought on-stage and the screens displayed dazzling graphics of the game being played. The funny thing was that Bioware does use Java (“Glassfish” and “JDK” were mentioned) but not for the sexy graphics: They use it mostly for players’ authentication and billing (sure someone needs to get paid).

All in all it seems that Oracle has grandiose plans for Java (the mobile platform with its billions of devices, from regular Java-enabled cell phones to smart cards, was emphasized over and over). Oracle also wanted to reassure the community about its commitment to open source (JDK7, the JavaFX controls, etc…) and finally Oracle wanted to prove that they own the full Java development stack, from the close partnership with Intel, which produces code and GC profilers, to the various platforms’ JVMs to the development tools (NetBeans was cited a few times). It looks good in presentations but it remains to be seen whether they can deliver on such an aggressive roadmap and whether the community will not be scared away by their licensing tactics. Many in the audience had this dual feeling: They desperately wanted to embrace the message but at the same time were thinking of alternatives. If Oracle delivers on its open-source promises, though, the Java platform can look forward to great days ahead.

JavaOne 2010 – about Mission-Critical Enterprise/Cloud Applications

September 20th, 2010

I attended this morning Mission Critical Enterprise Cloud Applications presented by a cheerful Eugene Ciurana; the presentation can be found on his site and Eugene managed to make it entertaining. I will not repeat here the contents of the presentation but I will try to capture its spirit and what made it particularly interesting. Eugene was not really after explaining what a Cloud is or why you should be adopting the Cloud in the enterprise, rather he focused on the classical usage of the Cloud in a hybrid architecture. In the hybrid case, part of your application is pushed to the cloud and part is hosted in your data center. The cloud could take over the data center but that’s not necessarily happening in the immediate/medium term future for reasons outlined here:

  • SLA: As long as your SLA (Service Level Agreement) is reasonable (say four nines as in 99.99% availability) the cloud makes financial sense; beyond that the cloud becomes an expensive proposition
  • Uptime is not the same thing as availability! The cloud may give you the impression of excellent up-time but your overall system availability may have dependencies on critical components that are better left (for the time being) to the data center
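
A quick back-of-the-envelope calculation shows what these numbers mean. Four nines leave about 52.6 minutes of downtime per year, and if a 99.99% cloud service depends on a component that is only 99.9% available, the combined figure drops to roughly 99.89%, or about 9.6 hours per year. The sketch below does the arithmetic; the 99.9% figure and the assumption that the two fail independently are mine, not the speaker’s.

```java
/** Hypothetical helper for downtime and chained-availability arithmetic. */
final class AvailabilitySketch {

    /** 365 days of 24 hours of 60 minutes, ignoring leap years. */
    static final double MINUTES_PER_YEAR = 365 * 24 * 60;

    /** Yearly downtime allowed by an availability such as 0.9999. */
    static double downtimeMinutesPerYear(double availability) {
        return (1.0 - availability) * MINUTES_PER_YEAR;
    }

    /** Availability of a chain where every component must be up (independent failures assumed). */
    static double serial(double... availabilities) {
        double total = 1.0;
        for (double availability : availabilities) {
            total *= availability;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.printf("four nines: %.2f minutes/year%n", downtimeMinutesPerYear(0.9999));
        double chained = serial(0.9999, 0.999);
        System.out.printf("cloud + dependency: %.4f%% available, %.1f hours/year down%n",
                chained * 100, downtimeMinutesPerYear(chained) / 60);
    }
}
```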

Two important questions to ask yourself before embarking on an adventure with a Cloud vendor:

  • What is the impact on business if the Cloud becomes unavailable?
  • How can I/the vendor recover from a disaster?

On the other hand cloud architectures are quite diversified: Eugene mentioned PaaS (Platform as a Service), SaaS (Software as a Service), IaaS (Infrastructure as a Service) and finally Private Clouds (built on top of VMware, Eucalyptus, etc…)

He noted that event-based applications tend to scale better in the Cloud, but I think that this is a general statement that’s true even outside the Cloud. Event-based applications are simply more decoupled, the producer need not know anything about the consumer and vice-versa. He also noted that all Cloud implementations seem to have the following four characteristics:

  1. Quick deployment of pre-packaged applications (typically an image that gets deployed again and again)
  2. Commoditized H/W : consider Amazon EC2 and S3, Google App Engine, Rackspace
  3. Pay as you consume billing system -> this brought up an interesting point from a CFO’s point of view: clouds become an operational expense
  4. Horizontal scalability is highly touted

The most interesting part of the presentation was a real-world study of a complex, un-maintainable and un-scalable application that became hybridized with some functionality ported to the Cloud. The main feature of that refactoring effort was actually the introduction of an ESB (Mule) in the data center to allow the services to scale without being actually tied to the physical databases: all JDBC calls are placed on the bus and memcache is used to alleviate performance issues. The (calculated) side effect was the ability to easily accomplish data mining by intercepting all calls going through the ESB. In the cloud a NoSQL datastore (such as S3) is used for write-once/consult type of access. A healthy mix of Java and Jython was introduced to speed up development time. The final stack was Tomcat – Mule – Spring.
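
The talk did not go into how the cache was wired in, so the following is only a guess at the general idea, usually called cache-aside: look in the cache first and go to the database only on a miss. A `ConcurrentHashMap` stands in for memcache and a plain function stands in for the JDBC call placed on the bus; all names are invented.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical cache-aside lookup in front of a slow data source. */
final class CacheAsideSketch {

    private final Map<String, String> cache = new ConcurrentHashMap<>();  // stand-in for memcache
    private final Function<String, String> database;                      // stand-in for a JDBC call
    final AtomicInteger databaseHits = new AtomicInteger();

    CacheAsideSketch(Function<String, String> database) {
        this.database = database;
    }

    /** Serves from the cache when possible; otherwise loads once and remembers the result. */
    String lookup(String key) {
        return cache.computeIfAbsent(key, k -> {
            databaseHits.incrementAndGet();
            return database.apply(k);
        });
    }

    public static void main(String[] args) {
        CacheAsideSketch service = new CacheAsideSketch(key -> "row-for-" + key);
        service.lookup("player-12");
        service.lookup("player-12");
        System.out.println("database hits: " + service.databaseHits.get());
    }
}
```

A real deployment would also need expiry and invalidation, which this sketch leaves out entirely.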

As for load balancing,  it really depends on the vendor: Google App Engine uses a “mother-of-all-Servlets” for natural balancing while Amazon provides an explicit Elastic Load Balancer.

In conclusion it seemed that a hybrid solution represents the best compromise right now for most enterprises: Stateless/Computationally intensive services can safely reside in the cloud while your data can stay in your data center. As vendors start offering more legally-binding and stringent SLAs enterprises can start thinking of moving their infrastructure to the Cloud.

Where Am I?

You are currently viewing the archives for September, 2010 at Computing Thoughts by Roger Rached.
