Amit Piplani

Wednesday, January 13, 2010

Does Google App Engine provide first-degree multi-tenancy?

Phil Wainewright, in his blog post ‘Degrees of Multi-tenancy’ dated June 8th, 2009, defines first-degree multi-tenancy as follows: "First-degree Multi-tenancy is a purist model where every layer/component of the architecture is shared all the way down to the database."
Google App Engine provides a scalable runtime platform for applications running on Google's infrastructure. The runtime environment is an abstraction above the operating system that allows App Engine to manage resource allocation, computation, request handling, scaling and load distribution without the application's involvement. So the platform provides multi-tenancy at the runtime environment level.

The next thing to determine is whether the datastore (not a database) provides first-degree multi-tenancy. The App Engine datastore most closely resembles an object database and is built on top of Bigtable. Its design is an abstraction that lets App Engine handle the details of distributing and scaling the application's data. Datastore entities are schemaless, whereas most definitions of multi-tenancy were written with relational databases in mind.

Every datastore entity has a unique key that is either provided by the application or generated by App Engine. The key is not a property but an independent aspect of the entity. When storing an entity, App Engine adds the app-id defined in the application's configuration to the key (this is not visible to the application), and the entities themselves are spread across distributed storage. So it cannot be said that all entities of a given kind created by a given application are stored in the same place. When entities are read, however, the app-id is always enforced, so entities created by one application are never shared with another.

In that sense, Google App Engine can be called a first-degree multi-tenant platform for applications. But applications built on Google App Engine are not multi-tenant by default; see http://apps.gepportal.com/products-getting-started/isv for more details. Support for multi-tenant applications is in progress, but for now applications can get part of the way there by using hooks in the datastore to namespace all of the entities for a particular user/tenant.
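The idea of scoping every key by an owner id (the app-id at platform level, or a tenant id at application level) can be sketched in plain Java. This is only an illustration under stated assumptions: SharedStore and its methods are hypothetical names, a HashMap stands in for the datastore, and none of this is the App Engine API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a shared datastore; not the App Engine API.
class SharedStore {
    private final Map<String, String> rows = new HashMap<>();

    // The store, not the caller, builds the full key from the owner id.
    // Assumes ids, kinds and names do not themselves contain '/'.
    private static String fullKey(String ownerId, String kind, String name) {
        return ownerId + "/" + kind + "/" + name;
    }

    void put(String ownerId, String kind, String name, String value) {
        rows.put(fullKey(ownerId, kind, name), value);
    }

    // Reads are always scoped by owner id, so one owner never sees another's rows.
    String get(String ownerId, String kind, String name) {
        return rows.get(fullKey(ownerId, kind, name));
    }

    public static void main(String[] args) {
        SharedStore store = new SharedStore();
        store.put("app-a", "Customer", "42", "Alice");
        store.put("app-b", "Customer", "42", "Bob");
        System.out.println(store.get("app-a", "Customer", "42"));
        System.out.println(store.get("app-b", "Customer", "42"));
        System.out.println(store.get("app-c", "Customer", "42")); // no such owner
    }
}
```

Both owners use the same kind and name, yet each read only returns the row written under its own id, which is the isolation property described above.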

Friday, January 8, 2010

Evaluating Software Quality Attributes for Applications developed using Google App Engine

By Amit Piplani

Google App Engine (GAE) is a web application hosting service. GAE can serve traditional website content too (such as documents and images), but the environment is designed especially for real-time dynamic web applications.
This article uses the FURPS model to evaluate the following software quality attributes (functional and non-functional) for applications developed on the App Engine Java runtime:
1. Functionality - Feature set, Capabilities, Generality, Security
2. Usability - Human factors, Aesthetics, Consistency, Documentation
3. Reliability - Frequency/severity of failure, Recoverability, Predictability, Accuracy, Mean time to failure
4. Performance - Speed, Efficiency, Resource consumption, Throughput, Response time, Scalability
5. Supportability - Testability, Extensibility, Adaptability, Maintainability, Compatibility, Configurability, Serviceability, Installability, Localizability, Portability
1. Functionality

Feature Set - Irrespective of the platform, the feature set holds the prime spot in any software development project and is the primary concern of the architect, designer and developer. Any web application, whatever the platform, must be able to meet its requirements.
Capabilities & Generality - An architect/designer also needs to check whether the required capabilities can be met by Google App Engine's core components (the sandbox environment, the datastore and the services) before choosing it as a platform. Applications that rely on multi-threading, heavy network bandwidth, relational database triggers or access to in-network data need particular scrutiny. Google App Engine costs nothing to get started, so the capital expenditure (CapEx) at the start of the project is negligible (considering the deployment servers only). In effect, this gives a server-less topology for the on-premise network.
Each App Engine resource is measured against one of two kinds of quota: a billable quota or a fixed quota. Every application gets an amount of each billable quota for free. The customer is charged only for the resources the web app actually uses, and only for the amount used above the free quota thresholds. Fixed quotas are resource maximums set by App Engine to ensure the integrity of the system. These resources describe the boundaries of the architecture, and all applications are expected to run within the same limits. They ensure that another app that is consuming too many resources will not affect the performance of your app.
The quotas and limits of the Google App Engine platform need to be checked before making a selection; they can be found on the App Engine site at http://code.google.com/appengine/docs/quotas.html . Operational expenditure (OpEx) is therefore directly proportional to the amount of resources used in a given calendar day. Note also that only web apps (with limited cron functionality) can be developed on the Google App Engine platform.
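The billing rule described above (pay only for usage above the free quota, and never exceed a fixed quota) can be sketched in a few lines of Java. The class, method names and all figures are made up for illustration; real quotas and rates are on the quotas page linked above.

```java
// Hypothetical helper: the figures below are illustrative, not real App Engine rates.
class QuotaBilling {
    // Only usage above the free quota is billable; amounts are whole units and cents.
    static long chargeCents(long unitsUsed, long freeUnits, long centsPerUnit) {
        long billableUnits = Math.max(0L, unitsUsed - freeUnits);
        return billableUnits * centsPerUnit;
    }

    // A fixed quota is a hard ceiling that every application must stay within.
    static boolean withinFixedQuota(long unitsUsed, long fixedLimit) {
        return unitsUsed <= fixedLimit;
    }

    public static void main(String[] args) {
        System.out.println(chargeCents(5, 10, 12));   // usage below the free quota
        System.out.println(chargeCents(25, 10, 12));  // 15 units above the free quota
        System.out.println(withinFixedQuota(25, 20)); // usage above the fixed limit
    }
}
```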
Security –
The sandbox environment improves the security of a web app because application code cannot access the server on which it is running in the traditional sense. An application can read its own files from the file system, but it cannot write to files and cannot read files that belong to other applications. An application can see the environment variables set by App Engine, but changes to these variables do not necessarily persist between requests. An application cannot access the networking facilities of the server hardware. App Engine features integration with Google Accounts, so an application can use Google's authentication capabilities with minimal coding/configuration. Also, if the application authenticates via Google Accounts, App Engine includes HTTP headers in the response that give information about the resources used by a given request.
So the sandbox environment is a mixed blessing: it provides a secure wrapper around the web application but limits its capabilities, and it needs to be checked carefully to see whether it really meets the needs of the required web application.

2. Usability
This attribute carries much the same weight whether or not the web application is developed on the Google App Engine platform.
Google Web Toolkit (GWT) is by far the quickest way to deploy a full-stack web application with a better UI experience, using some of the best technologies available (Google's scalable infrastructure, asynchronous HTTP, image bundling, monolithic JavaScript compilation, easy RPC). Google App Engine provides a separate set of static file servers dedicated to delivering static files (HTML, CSS and JavaScript). These servers are optimized to handle requests for static resources. The static files for a given application are uploaded along with the application code, together with configuration for how they are served: the URLs for static files, content types, and instructions for browsers to keep copies of the files in a cache for a given amount of time, which reduces traffic and speeds up rendering of the page.
3. Reliability
This attribute is primarily related to the datastore capabilities provided by Google App Engine. The App Engine datastore is designed for applications that need to read data quickly while ensuring that the data remains in a consistent state. Unlike traditional databases, the datastore uses a distributed architecture to manage scaling to very large data sets.
The update of a single entity occurs in a transaction, and each transaction is atomic: it either succeeds completely or fails completely, and cannot succeed or fail in smaller pieces. This leaves the entity in a consistent state. The App Engine datastore natively supports local transactions. An application can read or update multiple entities in a single transaction, but it must tell App Engine which entities will be updated together when it creates them. The application does this by creating entities in entity groups. App Engine uses optimistic concurrency control: if a user tries to update an entity while another user's update of that entity is in progress, the datastore API returns immediately with a concurrency failure exception. Reading an entity never fails due to concurrency; the application simply gets the entity in its most recent stable state. Multiple reads can be performed in a transaction to make sure that the data read is current and self-consistent. If the application is getting a lot of concurrency failures, it is important to revisit the design of its entity groups.
So application downtime will be due either to a failure of the Google servers or to the application taking too long to process a request (more than 30 seconds). The first comes down to the availability of Google's infrastructure, whose uptime is hard to question at present; the second comes down to the performance of the application (discussed in the next section).
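Optimistic concurrency control as described above can be modelled with a version number: a write commits only if the entity has not changed since it was read, and the caller retries on failure. The following is a minimal plain-Java sketch of that idea, not the datastore API; OptimisticDemo, VersionedEntity and updateWithRetry are hypothetical names.

```java
import java.util.ConcurrentModificationException;

// Plain-Java model of optimistic concurrency; all names here are hypothetical.
class OptimisticDemo {
    static class VersionedEntity {
        private long version = 0;
        private String value;

        VersionedEntity(String value) { this.value = value; }

        synchronized long version() { return version; }
        synchronized String value() { return value; }

        // Commit succeeds only if nobody else committed since readVersion was read.
        synchronized void commit(long readVersion, String newValue) {
            if (readVersion != version) {
                throw new ConcurrentModificationException("entity changed since it was read");
            }
            value = newValue;
            version++;
        }
    }

    // Retry loop: on a concurrency failure, re-read the latest state and try again.
    static void updateWithRetry(VersionedEntity e, String suffix, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            long seen = e.version();
            String next = e.value() + suffix;
            try {
                e.commit(seen, next);
                return;
            } catch (ConcurrentModificationException ex) {
                // Another writer won; loop to re-read and re-apply.
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        VersionedEntity e = new VersionedEntity("a");
        long staleVersion = e.version();
        e.commit(staleVersion, "ab");         // first writer commits
        try {
            e.commit(staleVersion, "ax");     // second writer still holds the old version
        } catch (ConcurrentModificationException ex) {
            System.out.println("concurrency failure: " + ex.getMessage());
        }
        updateWithRetry(e, "c", 3);
        System.out.println(e.value());
    }
}
```

If many writers contend for the same entity, the retry loop runs more often, which is the symptom that should prompt a rethink of how entities are grouped.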
4. Performance
This quality attribute is mainly determined by Google App Engine's main components: the sandbox environment, the datastore and the services. When the App Engine request handler receives a request and identifies the application from the domain name of the address, it selects one of many possible servers to process the request, based on which server is most likely to provide the fastest response. App Engine mitigates the cost of starting the application for every web request by keeping the application in memory as long as possible; when a server needs to reclaim resources, it purges the least recently used app. So the distributed, scalable architecture comes at the expense of a little performance degradation.
When the application creates new entities or updates existing ones using the datastore API, the call returns with success or failure only after the entities and every corresponding index have been updated. This makes queries very fast at the expense of entity updates. To offset this, an application can use the memcache service to cache the results of frequently performed queries or calculations. The application checks for a cached value; if the value is not present in the cache, it performs the query or calculation and stores the value in the cache for future use.
The runtime environment also limits the amount of clock time, CPU use and memory a single request can take. App Engine keeps these limits flexible and applies stricter limits to applications that use up more resources, to protect shared resources from “runaway” applications. The response time of an application can also determine the number of requests it can handle dynamically. These figures can be checked against the quotas and limits defined by the Google platform, available via the URLs in the References section.
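The "purge the least recently used app" policy can be illustrated with the standard library's LinkedHashMap in access order. This is only a model of the eviction rule, not how App Engine is implemented; LoadedApps and the capacity of two are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of a server that keeps a bounded number of apps in memory.
class LoadedApps extends LinkedHashMap<String, String> {
    private final int capacity;

    LoadedApps(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: entries are ordered by last use
        this.capacity = capacity;
    }

    // Called by LinkedHashMap after each insert; returning true drops the
    // least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        return size() > capacity;
    }

    public static void main(String[] args) {
        LoadedApps apps = new LoadedApps(2);
        apps.put("app-a", "loaded");
        apps.put("app-b", "loaded");
        apps.get("app-a");           // app-a handles a request, so it is now the most recent
        apps.put("app-c", "loaded"); // no room left: app-b is purged
        System.out.println(apps.keySet());
    }
}
```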
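The check-then-compute-then-store sequence described above is the cache-aside pattern. A minimal single-threaded sketch in plain Java follows; a HashMap stands in for the memcache service, and CacheAside, its loader function and the loads counter are hypothetical names used only to show that the expensive work runs once per key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache-aside helper; a HashMap stands in for the memcache service.
class CacheAside<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> loader; // the expensive query or calculation
    private int loads = 0;               // how many times the loader actually ran

    CacheAside(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        V cached = cache.get(key);
        if (cached != null) {
            return cached;               // cache hit: skip the expensive work
        }
        V fresh = loader.apply(key);     // cache miss: run the query or calculation
        loads++;
        cache.put(key, fresh);           // store for future requests
        return fresh;
    }

    int loads() {
        return loads;
    }

    public static void main(String[] args) {
        CacheAside<String, Integer> lengths = new CacheAside<>(String::length);
        System.out.println(lengths.get("datastore"));
        System.out.println(lengths.get("datastore")); // served from the cache
        System.out.println("loader ran " + lengths.loads() + " time(s)");
    }
}
```

A real cache would also need expiry and invalidation when the underlying entities change; those are left out to keep the sketch short.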

The URL Fetch service, task queues and cron jobs are also provided so that a web application can respond to web requests quickly, which improves its performance.
The URL Fetch service is used by App Engine applications to access other web services. It makes HTTP requests to other servers to retrieve pages or interact with web services, and URLs can be fetched in the background so that the request handler can continue processing the request. The fetch must, however, complete within the lifetime of the request handler.
Task queues let request handlers describe work to be done at a later time, outside the scope of a web request. Queues ensure that every task gets completed eventually, and the rate at which a queue is processed can be configured to spread the workload throughout the day. A queue performs a task by calling a request handler. A task can include a data payload, provided by the code that created the task, which is delivered to the task's handler as an HTTP request.
App Engine has a Cron Jobs service for executing tasks at specific times of the day. A scheduled task invokes a request handler at a specific time of the day, week or month, based on the schedule provided in configuration.
Scalability is considered one of the selling points for web applications developed on Google App Engine, as the applications scale automatically. All three components (the sandbox environment, the scalable datastore and the services) scale independently of each other.
5. Supportability
Testing –
JUnit test cases can be used for testing the services, datastore and task queues. Google App Engine also provides access to different versions of the same application at the same time, and they can be tested in parallel by hitting different URLs (which include the version-id). This is very useful because a newer version of the application can be tested completely before it is launched to the outside world.
Manageability –
Very little is needed, as management of the resources used by the application is done by Google itself, and reports on resource usage can be accessed via the Administration Console.
Configurability -
web.xml is the default configuration file for any Java web application. In addition, App Engine's appengine-web.xml must be provided before the application is deployed or uploaded onto Google's infrastructure. appengine-web.xml specifies the app's registered application ID and the version identifier of the latest code, and identifies which files in the app's WAR are static files (like images) and which are resource files used by the application.
Installability –
Installing an application onto Google's infrastructure is as easy as it gets, using the admin console (via a browser) or the plug-in for the development environment.
Portability & Migration –
This can be one of the attributes that does not work in favor of most platform providers, as they impose vendor lock-in for data. Existing applications (those using java.io, EJBs, multi-threading, etc.) are not good candidates for migration to the Google App Engine platform, as they are limited by the platform's capabilities. There is also vendor lock-in for the data stored in the datastore: it needs to be retrieved in some format before the application can be hosted on on-premise infrastructure instead of the Google platform.
Some open-source applications have been written to synchronize data between the Google App Engine datastore and a relational database. For example, AppRocket is an open-source replication engine that synchronizes the Google App Engine datastore and a MySQL database.
App Engine includes a tool for uploading and downloading data via the remote API. The tool can create new datastore entities using data from a comma-separated values (CSV) file and can also create CSV files with data from the app's datastore. The remote API comes in two parts: the remote API request handler, and the tools and libraries that call the handler. The remote API handlers are part of both the Java and Python runtime environments, whereas the remote access tools and libraries are available only for Python. These tools and libraries can be used with a Java application via the Java remote API request handler, and Google is working on providing them as part of the Java runtime too, which should make migrating and porting Java applications onto Google's infrastructure easier.

So the selection of Google App Engine as a platform is mainly driven by no capital cost, a pay-per-use model for resources used beyond the free quota, scalability, manageability and a server-less on-premise infrastructure. It is limited by its sandbox capabilities and quotas, its web-application-only support, and the ongoing work to add features to the Java runtime.
References
1. http://en.wikipedia.org/wiki/FURPS - FURPS Model
2. http://code.google.com/appengine/docs/quotas.html - Billable Quotas and Fixed Quotas
3. http://code.google.com/p/approcket/ - AppRocket provides live synchronization between Google App Datastore and MySQL

Sunday, September 27, 2009

WS-Policy v/s WS-Security

Posting an article written in Sept 2008 regarding WS-Security and WS-Policy comparison after going through Policy Driven SOA by Sreedhar Kajeepeta.

1. Definition

WS-Policy - Describes the capabilities and constraints of the security (and other business) policies on intermediaries and endpoints (e.g. required security tokens, supported encryption algorithms, privacy rules).

WS-Security - Describes how to attach signature and encryption headers to SOAP messages. In addition, it describes how to attach security tokens, including binary security tokens such as X.509 certificates and Kerberos tickets, to messages.

2. Details
WS-Policy - WS-Policy will describe how senders and receivers can specify their requirements and capabilities. WS-Policy will be fully extensible and will not place limits on the types of requirements and capabilities that may be described; however, the specification will likely identify several basic service attributes including privacy attributes, encoding formats, security token requirements, and supported algorithms. This specification will define a generic SOAP policy format, which can support more than just security policies. This specification will also define a mechanism for attaching service policies to SOAP messages.

WS-Security - WS-Security describes enhancements to SOAP messaging to provide quality of protection through message integrity and message confidentiality. Message integrity is provided by leveraging XML Signature in conjunction with security tokens (which may contain or imply key data) to ensure that messages are transmitted without modifications. Similarly, message confidentiality is provided by leveraging XML Encryption in conjunction with security tokens to keep portions of SOAP messages confidential. Finally, WS-Security describes a mechanism for encoding binary security tokens.

3. Defining a Policy

WS-Policy - Policies are formulated through the use of different elements and document-level subjects provided by the various specifications under the WS-Policy Framework.

WS-Security - Coded as a message handler using the SAAJ APIs.

4. Integrating policies with services

WS-Policy - Policies may be integrated with services by adding metadata, either directly through WS-PolicyAttachment or indirectly by adding reusable policy definitions to a registry/repository and then referring to them through registry key references in the business service definition.

WS-Security - Message handlers that handle security are configured in the SOAP message handler chain using webservices.xml (the web services deployment descriptor).

5. Policy Enforcement Points (PEP)

WS-Policy - A policy enforcement tool references the registry/repository to determine which policies should be enforced for a given service. There are two ways to enforce policies: using agents and using a gateway.

WS-Security - The SOAP message handler chain enforces the security policy using SOAP headers. No additional tool is required to enforce policy.

6. Policy Aware clients

WS-Policy - Yes, they can retrieve information about the policies through WS-MetadataExchange, and perform dynamic bindings with the endpoints, which satisfy the given criteria.

WS-Security - No. Once the contract is defined between service provider and service consumer, SAAJ message handlers need to be coded to enforce the WS-Security specification. The policy is still not defined in the WSDL or in a repository.

7. Overhead of Frameworks

WS-Policy - Yes. You need to know policy expressions and policy assertions to define a policy, WS-PolicyAttachment to integrate the policy into WSDL, and WS-MetadataExchange to get information about the policy. Policy enforcement tools are required to enforce policies.

WS-Security - Only SAAJ is required to implement the message handler.

8. Can policies be centrally managed

WS-Policy - Yes and can be implemented using XML network structure.

WS-Security - No

9. Policy registry/repository

WS-Policy - Policies can be stored centrally in a registry/repository, which can also be used by policy-aware clients to gather information about a policy.

WS-Security - No. Implemented as part of the web services deployment descriptor.

10. Centralized Management

WS-Policy - Yes, possible using policy management tools.

11. Monitoring and Alerting

WS-Policy - Yes, possible using monitoring tools.

12. Message Validation and Compliance

WS-Policy - Yes, can be done using an XML gateway (hardware) that reads the policies.

WS-Security - A message interceptor gateway can be coded using the message handler.

13. Access protection

WS-Policy - As part of web services infrastructure security, direct access to all service endpoints can be disabled by using an XML firewall or web-proxy infrastructure that masks all the underlying service endpoints and communicates through network address translation (NAT) or URL rewriting mechanisms.

WS-Security - No

--Amit G --

Wednesday, September 23, 2009

First Shot at Service Component Architecture

Look at the recent buzzwords and key technologies in the IT industry: web services, Service Oriented Architecture, Software as a Service, Platform as a Service, etc. Services are the common factor in all of them, and going forward the IT industry could be renamed the Services industry.
I am trying to write about the Service Component Architecture (SCA) specifications. I heard about them from my SOA Knowledge Bank colleague Ravi Venkidapathy. The first impression of this specification: SCA attempts to simplify building of Service Oriented Architectures by focusing on the composition, assembly and securitized deployment aspects of SOA. Whether I agree or disagree with my colleague's definition will be presented in the latter half of this blog.

High Level View of Specifications

The SCA specifications define how to create components and how to combine those components into complete applications. They work on the primary concept of breaking an application/process down into components and defining how these components work together. SCA introduces terms like Service, Component, Composite, Domain and Contribution, and a new XML configuration file format called SCDL (Service Component Definition Language). Diagram 1 shows this relationship.

Within an SCA application, components can be written in a language-independent way, and the application can even be accessed from the non-SCA world. SCA is about defining components and is not primarily meant for UI and data services, but it provides good integration APIs for integrating with the UI and data tiers.
Defining components as remotable or local is reminiscent of remote and local EJBs, which exist to avoid network latency. This information goes in the component but not in SCDL. This can be considered a possible enhancement to the specifications: remotable/local component configuration should be set in SCDL, where composites are configured.
To allow cross-vendor interoperability, all remotable interfaces must be expressible in WSDL, which means they can potentially be accessed via SOAP.

A composite can also expose one or more services, where these services are implemented by components within the composite. This is known as promotion of services.

Shot at SCDL

SCDL takes its name by analogy with WSDL and its contents from Spring's bean configuration. SCDL defines a component's references to other services and its properties. Spring-style dependency injection is used to set the values into the component using constructor-level injection, setter-method injection or property-level injection. SCDL also holds the definitions of composites and components (their names and implementation classes) and the bindings for the services defined. Bindings can be assigned to services and to references, and each one specifies a particular protocol.
Because bindings separate how a component communicates from what it does, they let the component's business logic be largely divorced from the details of communication. A single service or reference can have multiple bindings, allowing different remote software to communicate with it in different ways. Separating these independent concerns can make life simpler for application designers and developers.
The bindings a service or reference relies on are either chosen by the runtime, for intra-domain communication, or set explicitly in a component's SCDL configuration file (with URI information).

Domain-Runtime dependency

The runtime is used for wiring components (when they have references defined) by creating a proxy object for each of the component's references. It's up to the SCA runtime to generate WSDL interfaces from the Java interfaces (for services defined with a WS binding), fix up the service to be callable via SOAP, and do everything else required to let the component communicate via web services.
The concept of wiring is similar to the one introduced by Spring: "A wire is an abstract representation of the relationship between a reference and some service that meets the needs of that reference. Exactly what kind of communication a wire provides can vary: it depends on the specific runtime that’s used, what bindings are specified (if any), and other things."
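To make the idea of a runtime wiring a reference to a service through a proxy more concrete, here is a small sketch using only the JDK's java.lang.reflect.Proxy. It is not SCA API and not how any particular SCA runtime works; WiringSketch, QuoteService, OrderComponent and wire are hypothetical names, and the proxy simply forwards calls in-process where a real runtime would pick a binding.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Hypothetical sketch of wiring a reference to a service; not the SCA API.
class WiringSketch {
    // The service contract that the reference depends on.
    interface QuoteService {
        int quote(String symbol);
    }

    // One possible implementation of the service.
    static class FixedQuoteService implements QuoteService {
        public int quote(String symbol) {
            return symbol.length() * 10;
        }
    }

    // A component with a reference; it never learns how the call is delivered.
    static class OrderComponent {
        private QuoteService quotes;

        void setQuotes(QuoteService quotes) {
            this.quotes = quotes;
        }

        int cost(String symbol, int quantity) {
            return quotes.quote(symbol) * quantity;
        }
    }

    // The "runtime" creates a proxy for the reference and forwards each call.
    static <T> T wire(Class<T> contract, T target) {
        InvocationHandler forward = (proxy, method, args) -> method.invoke(target, args);
        Object wired = Proxy.newProxyInstance(
                contract.getClassLoader(), new Class<?>[] {contract}, forward);
        return contract.cast(wired);
    }

    public static void main(String[] args) {
        OrderComponent order = new OrderComponent();
        order.setQuotes(wire(QuoteService.class, new FixedQuoteService()));
        System.out.println(order.cost("ABC", 2));
    }
}
```

Because the component only sees the QuoteService interface, the forwarding logic inside the proxy could be swapped for a remote call without touching the component, which is the separation the wire abstraction is after.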

Even though an SCA composite runs in a single-vendor environment, it can still communicate with applications outside its own domain. All of the communication between components and composites within each domain is done in a vendor-specific way. An SCA application communicating with another SCA application in a different domain sees that application just like a non-SCA application; its use of SCA isn’t visible outside its domain.

Every SCA runtime also provides an SCA binding. This binding is only used when a service and its client are both running in the same domain.

SCA defines a policy association framework that allows policies and policy subjects specified using WS-Policy and WS-PolicyAttachment, as well as other policy languages (like WS-ReliableMessaging), to be associated with SCA components. These policies can be interaction policies (which affect the interaction at runtime) or implementation policies (which affect how components behave at runtime). A policy set can be attached to a component or composite in SCDL and is enforced by the SCA runtime.

Diagram 2 tells about SCA runtime relationship with other containers.

Bottom Line -

1. A primary goal of SCA composites is to provide a consistent way to assemble components built with different technologies (Spring, BPEL, Java) into coherent applications, and so make this diversity more manageable.
However, Spring and JEE (EJB, JAX-WS) already provide a lot of these features and are widely used by the industry, and very few runtimes (Tuscany, Fabric3) are available for SCA, so SCA has a long road ahead.
Even though SCA is used as one of the frameworks to build SOA, that is not the intent of SCA (according to the specifications), and I don't agree with Ravi's view on this.
2. If a component's implementation class is changed from Java to BPEL, then with small changes in the SCDL configuration file the runtime will behave differently to execute the BPEL component (without changing the component definition).
3. Integration with the policy framework for components in SCDL.
4. At present, a component cannot span multiple domains, so there is also vendor lock-in to the SCA runtime for a component.
5. SCA runtimes are working on adding extensions to the runtime using OSGi.

So it looks like the SCA specifications have taken a lot of points from existing technologies (Spring, EJB, JAX-WS) and applied configurable policies and bindings on top.

Wait for the next post for more details on SCA.

--Amit G Piplani--

Reference -
SCA v/s JBI Article by Ravi Venkidapathy
Introducing SCA by David Chappell

Attachments in SOAP Messages

Web services rely on SOAP, an XML-based communication protocol for exchanging messages between computers regardless of their operating systems or programming environments. SOAP is the de facto standard messaging protocol used by web services and codifies the use of XML as an encoding scheme for request and response parameters, using HTTP as a means of transport. Attachments are used to reduce memory requirements, message size and processing time for SOAP messages: they prevent large volumes of data from being sent inside the SOAP message itself and allow non-XML data (e.g. media files) to accompany it. The following approaches have been used to send attachments in SOAP messages:
1. WS-Attachments over DIME (Direct Internet Message Encapsulation)
DIME is a packaging mechanism that allows multiple records of arbitrarily formatted data to be streamed together. Records are serialized into the stream one after the other and are delineated with an efficient binary header. For large records or records where the size of the data is not initially known, DIME defines a "record chunk". WS-Attachments indicates that the primary SOAP message part (the main message) must be contained in the first record of a DIME message. WS-Attachments defines the use of the HREF attribute for making a reference to an attachment. For the most part it is similar to simply sending the primary SOAP message part on its own, except that the HTTP Content-Type header must be set to "application/dime" and the body of the HTTP request is the DIME message instead of the SOAP message.
2. SOAP with Attachments (SwA)
SwA defines a way of binding attachments to a SOAP envelope using the multipart/related MIME type. MIME cannot be represented as an XML Infoset – this effectively breaks the web services model since attachments cannot be secured using WS-Security.
3. Message Transmission Optimization Mechanism (MTOM)
MTOM is built on MIME and includes attachments as part of the Infoset (since SOAP 1.2 is built around the Infoset), thus making the SOAP 1.2 processing model applicable to the attachments as well. MTOM combines the composability of Base64 encoding with the transport efficiency of SOAP with Attachments. Non-XML data is processed just as it is with SwA: the data is simply streamed as binary data in one of the MIME message parts.
MTOM is composed of three distinct specifications:
• MTOM CORE describes an abstract feature for optimizing the transmission and/or wire format of a SOAP 1.2 message by selectively encoding portions of the message, while still presenting an XML Infoset to the SOAP application.
• XOP (XML-binary Optimization Packaging) specifies the method for serializing XML Infosets with non-XML content into MIME packages.
• The Resource Representation SOAP Header Block specification defines a SOAP header block that can carry resource representations within SOAP messages.
MTOM attachments are streamed as binary data within a MIME message part, making it fairly easy to pass MTOM attachments to SwA or receive SwA attachments into an MTOM implementation. Image 1 describes the attachment using MTOM.
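To see why sending binary data in its own MIME part is more efficient on the wire than inlining it in the XML as Base64 text, it helps to measure how much Base64 inflates a payload. The sketch below uses only java.util.Base64 from the JDK; Base64Overhead is a hypothetical class name and the payload sizes are arbitrary.

```java
import java.util.Base64;

// Measures the size cost of inlining binary data as Base64 text.
class Base64Overhead {
    // Length in bytes of the standard (padded) Base64 encoding of raw.
    static int encodedLength(byte[] raw) {
        return Base64.getEncoder().encode(raw).length;
    }

    public static void main(String[] args) {
        byte[] payload = new byte[3000]; // stand-in for an image or media file
        int encoded = encodedLength(payload);
        System.out.println("raw bytes:     " + payload.length);
        System.out.println("Base64 bytes:  " + encoded);
        System.out.println("overhead in %: " + (100.0 * (encoded - payload.length) / payload.length));
    }
}
```

Base64 emits four characters for every three input bytes, so inlined binary content grows by about a third before any XML parsing cost is counted; streaming the raw bytes in a MIME part avoids that growth.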

The MTOM specification is part of the Messaging component (which includes SOAP, WS-Addressing and MTOM) of WSIT and of the Web Services Enhancements (WSE) for Microsoft .NET. Image 2 compares the different attachment approaches in SOAP messages.

MTOM support in the Metro distribution is fully interoperable with .NET clients and servers, and MTOM is currently the best option for sending attachments in SOAP messages.

Hibernate Search ---- Bridging Lucene and Hibernate

Hibernate Search is a project that complements Hibernate Core by providing the ability to run full-text search queries on persistent domain models. This article introduces Lucene and Hibernate, describes the difficulties of integrating a Lucene-based full-text search engine with a domain model, and shows how Hibernate Search overcomes them.

Lucene is a powerful full-text search engine library hosted at the Apache Software Foundation (http://lucene.apache.org/java). It has rapidly become the de facto standard for implementing full-text search solutions in Java. Lucene consists of core APIs that allow indexing and searching of text.

Hibernate Core is probably the most famous and most used ORM tool in the Java industry. An ORM lets you express your domain model in a pure object-oriented paradigm, and it persists this model to a relational database transparently for you. Hibernate Core lets you express queries in an object-oriented way through the use of its own portable SQL extension (HQL), an object-oriented criteria API, or a plain native SQL query. Typically, ORMs such as Hibernate Core apply optimization techniques that a hand-coded SQL solution would not: transactional write-behind, batch processing, and first- and second-level caching.

Many Web 2.0 applications provide extensive text-based search functionality to end users. A simple text search on a single column can be implemented using the Hibernate Criteria API or HQL. But as search gets more complicated, involving multiple column values and results ordered by rank, applications have started using Lucene as a full-text search engine on the domain model. The difficulties of integrating Lucene into a Java application centered on a domain model and using Hibernate or Java Persistence to persist data are:

• Structural mismatch—How to convert the object domain into the text-only index; how to deal with relations between objects in the index. How to manage the type conversion to String, which is the form Lucene uses to store the index.
• Synchronization mismatch—How to keep the database and the index synchronized all the time.
• Retrieval mismatch—How to get a seamless integration between the domain model-centric data-retrieval methods and full-text search.

Hibernate Search leverages the Hibernate ORM and Apache Lucene (full-text search engine) technologies to address these mismatches. Hibernate Search is a bridge that brings Lucene features to the Hibernate world. Hibernate Search hides the low-level and sometimes complex Lucene API usage, applies the necessary options under the hood, and lets you index and retrieve the Hibernate persistent domain model with minimal work.

So let’s see how Hibernate Search works and what needs to be done on top of the Hibernate configuration to index the domain model.
Hibernate Search works on top of both Hibernate and JPA. First, hibernate-search.jar and lucene-core.jar need to be added to the classpath, and the “hibernate.search.default.indexBase” property in the Hibernate configuration needs to be set to indicate where index files are stored on the file system. If the application uses only Hibernate Core (not Hibernate Annotations), additional configuration is needed to register the Hibernate event listeners. Whenever an insert, update or delete happens on an entity, these listeners update the index, so the index stays synchronized with the database state automatically and transparently for the application. This feature helps to overcome the synchronization mismatch.
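The listener idea can be sketched in plain Java. The snippet below is not the Hibernate event listener API; `WriteListener`, `Store` and `TextIndex` are hypothetical names for a toy database, a toy index, and the callback that keeps them in step.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class IndexSyncSketch {
    /** Callback invoked after every write; a stand-in for a persistence event listener. */
    interface WriteListener {
        void onSaved(long id, String text);
        void onDeleted(long id);
    }

    /** Toy "database" that notifies its listeners on every change. */
    static final class Store {
        private final Map<Long, String> rows = new HashMap<>();
        private final List<WriteListener> listeners = new ArrayList<>();

        void addListener(WriteListener listener) {
            listeners.add(listener);
        }

        void save(long id, String text) { // insert or update
            rows.put(id, text);
            for (WriteListener l : listeners) l.onSaved(id, text);
        }

        void delete(long id) {
            rows.remove(id);
            for (WriteListener l : listeners) l.onDeleted(id);
        }
    }

    /** Toy "index" that stores lower-cased searchable text per id. */
    static final class TextIndex implements WriteListener {
        final Map<Long, String> docs = new HashMap<>();

        @Override
        public void onSaved(long id, String text) {
            docs.put(id, text.toLowerCase(Locale.ROOT));
        }

        @Override
        public void onDeleted(long id) {
            docs.remove(id);
        }
    }
}
```

Because the application only ever talks to `Store`, it never has to remember to update the index itself, which is the property the article describes as transparent synchronization.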

Once the configuration is done, we map the object model to the index model. First, searchable entities should be annotated with @Indexed. Hibernate Search gathers the list of indexed entities from the persistent entities marked with the @Indexed annotation and stores their indexes in the directory set in the configuration file. The second thing to do is to add a @DocumentId on the entity’s identity property. Hibernate Search uses this property to make the link between a database entry and an index entry. This document ID helps in updating the document (in the index) when the entity object is updated. To index a property, we need to use a @Field annotation. This annotation tells Hibernate Search that the property needs to be indexed in the Lucene document. This helps us to overcome the structural mismatch problem.
Once the domain model has been set up for indexing, let’s dive into the core features of indexing and searching. Hibernate Search extends the Hibernate Core main API to provide access to Lucene capabilities. A FullTextSession is a subinterface of Session. Similarly, a FullTextEntityManager is a subinterface of EntityManager. These two subinterfaces have the ability to manually index an object. Hibernate Search provides a helper class (org.hibernate.search.jpa.Search) to retrieve a FullTextEntityManager from an EntityManager, as well as a helper class (org.hibernate.search.Search) to retrieve a FullTextSession from a Session. The following snippet indexes a list of Employee objects:
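As a rough illustration of what such a mapping has to do, the sketch below flattens an annotated object into a map of String fields, which is the shape a text-only index needs. The annotations `DocId` and `IndexedField` are local stand-ins defined in the snippet itself, not the real Hibernate Search annotations, and `toDocument` is a hypothetical helper.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

final class MappingSketch {
    /** Local stand-in marking the identity property (not Hibernate Search's annotation). */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface DocId {}

    /** Local stand-in marking a property that should be indexed. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface IndexedField {}

    static final class Employee {
        @DocId final long id;
        @IndexedField final String title;
        @IndexedField final int yearsOfExperience;
        final String internalNote; // not annotated, so it stays out of the index

        Employee(long id, String title, int yearsOfExperience, String internalNote) {
            this.id = id;
            this.title = title;
            this.yearsOfExperience = yearsOfExperience;
            this.internalNote = internalNote;
        }
    }

    /** Converts every annotated field of the entity to a String entry of a "document". */
    static Map<String, String> toDocument(Object entity) {
        Map<String, String> doc = new LinkedHashMap<>();
        for (Field f : entity.getClass().getDeclaredFields()) {
            if (f.isAnnotationPresent(DocId.class) || f.isAnnotationPresent(IndexedField.class)) {
                try {
                    f.setAccessible(true);
                    doc.put(f.getName(), String.valueOf(f.get(entity)));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("Cannot read field " + f.getName(), e);
                }
            }
        }
        return doc;
    }
}
```

Note how the numeric `yearsOfExperience` ends up as the String "5" in the document: this is the type conversion that the structural mismatch refers to.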
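// Wrap the JPA EntityManager to get access to the full-text API
FullTextEntityManager ftem = Search.getFullTextEntityManager(em);
ftem.getTransaction().begin();
// The untyped JPA query returns a raw List, hence the unchecked assignment
@SuppressWarnings("unchecked")
List<Employee> emps = em.createQuery("select i from Employee i").getResultList();
for (Employee emp : emps) {
    ftem.index(emp); // force (re)indexing of this managed entity
}
ftem.getTransaction().commit();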

The Lucene index will thus contain the necessary information to execute full-text queries matching these employees. Hibernate Search’s query facility integrates directly into the Hibernate query API, and it returns Hibernate-managed objects from the persistence context after running the search. The following snippet builds a Hibernate Search query on top of a Lucene query and executes it:
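// Quote the phrase so that it is matched as a phrase rather than as two separate terms
String searchQuery = "title:\"Software Engineer\" OR speciality:Java";
// "title" is the default field for terms that carry no field prefix
QueryParser parser = new QueryParser("title", new StandardAnalyzer());
org.apache.lucene.search.Query luceneQuery = parser.parse(searchQuery);

// Wrap the Lucene query so that it returns Employee entities
FullTextSession ftSession = Search.getFullTextSession(session);
org.hibernate.Query query = ftSession.createFullTextQuery(luceneQuery, Employee.class);
List results = query.list();
// Iterate over the matching employees in the list.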
The query objects are of type org.hibernate.Query or javax.persistence.Query respectively, but the returned results are objects from the domain model, not Documents from the Lucene API.
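The idea of searching in the index but returning domain objects can be sketched without either library. In the snippet below, which is an illustration under assumed names and not how Hibernate Search is implemented internally, the index only stores entity ids per term, and every hit is resolved back to the entity held on the "database" side.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class RetrievalSketch {
    static final class Employee {
        final long id;
        final String title;

        Employee(long id, String title) {
            this.id = id;
            this.title = title;
        }
    }

    /** Index side: term -> ids of the entities whose title contains the term. */
    private final Map<String, Set<Long>> postings = new HashMap<>();
    /** Database side: id -> entity. */
    private final Map<Long, Employee> table = new HashMap<>();

    void persistAndIndex(Employee e) {
        table.put(e.id, e);
        for (String term : e.title.toLowerCase(Locale.ROOT).split("\\s+")) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(e.id);
        }
    }

    /** Looks the term up in the index, then resolves each hit id to its entity. */
    List<Employee> search(String term) {
        Set<Long> ids = postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
        List<Employee> result = new ArrayList<>();
        for (Long id : ids) {
            result.add(table.get(id));
        }
        return result;
    }
}
```

The caller of `search` never sees ids or index entries, only `Employee` objects, which mirrors the retrieval behaviour described above.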

By focusing on ease of use and smooth integration with existing Hibernate applications, Hibernate Search makes the benefits of Lucene’s full-text search easy and affordable to bring into any web application.

--Amit G Piplani--

Complex Event Processing (CEP) – Part 1

Both Service Oriented Architecture (SOA) and Event Driven Architecture (EDA) are architecture styles which promote the concept of loose coupling through distributed computing. Although SOA helps deliver a loosely coupled solution, the resulting solution is generally synchronous in nature. In contrast, EDA provides loose coupling using the asynchronous publish-and-subscribe pattern. Having said that, SOA and EDA are not mutually exclusive and they bring complementary features to the table. Event-driven architecture can complement service-oriented architecture because services can be activated by triggers fired on incoming events.

So, what is an event? And how is it different from the existing units of work in other architectures? What differentiates event processing from the other architectural paradigms? Our goal is to shed some light on these questions in this article and, hopefully, to go further in the forthcoming part.

An event can be defined as a significant change in the state of a system, and a complex event is an abstraction of other events. What differentiates EDA from other paradigms is that the system is described as a succession of events and the subsequent processing/handling of these events. At a minimum, we are then looking at “events” and “event handlers”. There are three well-defined styles of event processing in EDA:
• Simple Event Processing (SEP) - is concerned with simple events that are directly related to specific, measurable changes of condition. The common example of SEP is the typical publish-and-subscribe pattern used in industry.
• Event Stream Processing (ESP) - deals with the task of processing multiple streams of event data with the goal of identifying the meaningful events within those streams.
• Complex Event Processing (CEP) - deals with the task of processing multiple events with the goal of identifying the meaningful events within an event cloud (range of events generated from multiple systems).
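The notions of event, event handler and derived (complex) event can be sketched in a few lines. The snippet is a toy under stated assumptions, not the API of any of the engines discussed here: `EventBus` is a hypothetical publish-and-subscribe dispatcher, and `BurstDetector` is a hypothetical handler that publishes a new, higher-level event when a given number of simple events with the same key arrive within a time window.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

final class EventSketch {
    /** A simple event: something of a given type happened for a key at a point in time. */
    static final class Event {
        final String type;
        final String key;
        final long timestampMillis;

        Event(String type, String key, long timestampMillis) {
            this.type = type;
            this.key = key;
            this.timestampMillis = timestampMillis;
        }
    }

    /** Simple event processing: handlers subscribe to an event type and receive each event. */
    static final class EventBus {
        private final Map<String, List<Consumer<Event>>> handlers = new HashMap<>();

        void subscribe(String type, Consumer<Event> handler) {
            handlers.computeIfAbsent(type, t -> new ArrayList<>()).add(handler);
        }

        void publish(Event e) {
            List<Consumer<Event>> subscribed = handlers.getOrDefault(e.type, Collections.emptyList());
            for (Consumer<Event> h : subscribed) h.accept(e);
        }
    }

    /** Derives a complex event when `threshold` events with the same key fall within the window. */
    static final class BurstDetector implements Consumer<Event> {
        private final Map<String, Deque<Long>> recent = new HashMap<>();
        private final EventBus bus;
        private final String derivedType;
        private final int threshold;
        private final long windowMillis;

        BurstDetector(EventBus bus, String derivedType, int threshold, long windowMillis) {
            this.bus = bus;
            this.derivedType = derivedType;
            this.threshold = threshold;
            this.windowMillis = windowMillis;
        }

        @Override
        public void accept(Event e) {
            Deque<Long> times = recent.computeIfAbsent(e.key, k -> new ArrayDeque<>());
            times.addLast(e.timestampMillis);
            // Drop timestamps that have slid out of the window
            while (e.timestampMillis - times.peekFirst() > windowMillis) times.removeFirst();
            if (times.size() >= threshold) {
                times.clear();
                bus.publish(new Event(derivedType, e.key, e.timestampMillis));
            }
        }
    }
}
```

For example, subscribing `new BurstDetector(bus, "login.suspicious", 3, 1000L)` to a hypothetical "login.failed" event type would turn three failed logins by the same user within one second into a single "login.suspicious" event that other handlers can subscribe to.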
Then comes the perennial question: how is ESP different from CEP? The following illustration brings out some of the differences between ESP and CEP processing:

So, how is CEP changing the development/tools landscape? SOA middleware vendors have expanded their CEP capabilities in order to offer event-driven architecture as an alternative or supplement to SOA. More vendors are working to “CEP-enable” their BPM environments and BAM tools to support split-second response to changing business conditions. Moreover, enterprise service bus (ESB) vendors are investing in CEP to provide a user-friendly event aggregation, correlation and visualization overlay to their publish-and-subscribe environments. Aleri, Apama (Progress), AptSoft (IBM), Coral8, StreamBase, TIBCO BusinessEvents, BEA Event Server and IBM’s InfoSphere Streams are a few of the commercial ESP/CEP engines, whereas Esper is an open source ESP/CEP engine.

An EDA is a core requirement for most CEP applications. When an organization has implemented an EDA and event-enabled its business-sensory information, it can consider deploying CEP functionality in the form of high-speed rules engines, neural networks, Bayesian networks, and other analytical models. In upcoming articles, we will concentrate on a functional reference architecture for complex event processing and on event stream processing engines for event-driven architectures.

--Amit G Piplani--