Wednesday, September 23, 2009

Hibernate Search ---- Bridging Lucene and Hibernate


Hibernate Search is a project that complements Hibernate Core by providing the ability to run full-text search queries on persistent domain models. This article introduces Lucene and Hibernate, describes the difficulties of integrating a Lucene-based full-text search engine with a domain model, and shows how Hibernate Search overcomes them.

Lucene is a powerful full-text search engine library hosted at the Apache Software Foundation (http://lucene.apache.org/java). It has rapidly become the de facto standard for implementing full-text search solutions in Java. Lucene consists of core APIs that allow indexing and searching of text.

Hibernate Core is probably the best-known and most widely used ORM tool in the Java industry. An ORM lets you express your domain model in a pure object-oriented paradigm, and it persists this model to a relational database transparently for you. Hibernate Core lets you express queries in an object-oriented way through its own portable SQL extension (HQL), an object-oriented criteria API, or a plain native SQL query. Typically, ORMs such as Hibernate Core apply optimization techniques that a handcoded SQL solution would not: transactional write-behind, batch processing, and first- and second-level caching.

Many Web 2.0 applications provide extensive text-based search functionality to their end users. A simple text search on a single column can be implemented with the Hibernate criteria API or HQL. But as search requirements grew more complicated, involving multiple column values and results displayed in order of rank, applications started using Lucene as a full-text search engine over the domain model. Integrating Lucene into a Java application that is centered on a domain model and uses Hibernate or Java Persistence to persist data raises three difficulties:

• Structural mismatch: how to convert the object domain into the text-only index, how to deal with relations between objects in the index, and how to manage type conversion to String, which is the form Lucene uses to store the index.
• Synchronization mismatch: how to keep the database and the index synchronized all the time.
• Retrieval mismatch: how to get a seamless integration between the domain model-centric data-retrieval methods and full-text search.
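
To make the type-conversion part of the structural mismatch concrete, here is a minimal sketch of one common approach for numbers: zero-padding, so that the string order in the index matches the numeric order. This is an illustration of the problem only, not Hibernate Search's own bridge code; the class name PaddedNumberSketch and the width of 10 digits are assumptions made for the example.

```java
import java.util.Locale;

// Illustrative only: shows why converting values to String needs care.
final class PaddedNumberSketch {
    // Assumed fixed width; 10 digits is enough for any non-negative int.
    private static final int WIDTH = 10;

    // Converts a non-negative int to a fixed-width, zero-padded string.
    static String toIndexString(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not handled in this sketch");
        }
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", value);
    }

    // Converts the padded string back to the original number.
    static int fromIndexString(String text) {
        return Integer.parseInt(text);
    }

    public static void main(String[] args) {
        // Unpadded, "9" sorts after "10" because strings compare character by character.
        System.out.println("9".compareTo("10") > 0);
        // Padded, the string order agrees with the numeric order.
        System.out.println(toIndexString(9).compareTo(toIndexString(10)) < 0);
    }
}
```

Without some convention like this, a range or sort over text in the index would not behave the way it does over numbers in the database.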

Hibernate Search leverages the Hibernate ORM and Apache Lucene (full-text search engine) technologies to address these mismatches. Hibernate Search is a bridge that brings Lucene features to the Hibernate world. Hibernate Search hides the low-level and sometimes complex Lucene API usage, applies the necessary options under the hood, and lets you index and retrieve the Hibernate persistent domain model with minimal work.

So let’s see how Hibernate Search works and what needs to be done on top of the Hibernate configuration to index the domain model.
Hibernate Search works on top of both Hibernate and JPA. First, hibernate-search.jar and lucene-core.jar need to be added to the classpath. Then set the “hibernate.search.default.indexBase” property in the Hibernate configuration to indicate where the index files are stored on the file system. If the application uses only Hibernate Core (without Hibernate Annotations), additional configuration is needed to register the Hibernate Search event listeners. Through these listeners, whenever an entity is inserted, updated, or deleted, the index stays synchronized with the database state automatically and transparently for the application. This feature overcomes the synchronization mismatch.
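
As a rough sketch of the configuration step, the properties can be collected in a plain java.util.Properties object. Only hibernate.search.default.indexBase comes from the text above. The directory_provider entry, its value, and the path /var/lucene/indexes are assumptions for illustration (the value shown is the file-system provider class name as I understand it for the 3.x releases), so check them against the version you use.

```java
import java.util.Properties;

// Sketch: builds the Hibernate Search settings discussed above.
final class SearchConfigSketch {

    static Properties searchProperties(String indexDirectory) {
        Properties props = new Properties();
        // From the article: base directory where the Lucene index files are stored.
        props.setProperty("hibernate.search.default.indexBase", indexDirectory);
        // Assumption: file-system directory provider of Hibernate Search 3.x.
        props.setProperty("hibernate.search.default.directory_provider",
                "org.hibernate.search.store.FSDirectoryProvider");
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical path, used only for this example.
        Properties props = searchProperties("/var/lucene/indexes");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The same key/value pairs would normally live in hibernate.cfg.xml, hibernate.properties, or persistence.xml rather than in code; building them programmatically here just keeps the sketch self-contained.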

Once the configuration is done, the next step is to map the object model to the index model. First, searchable entities should be annotated with @Indexed. Hibernate Search gathers the list of indexed entities from the persistent entities marked with this annotation and stores their indexes in the directory set in the configuration file. The second thing to do is to add @DocumentId on the entity’s identity property. Hibernate Search uses this property to make the link between a database entry and an index entry. This document id is what allows the matching document in the index to be updated when the entity object is updated. To index a property, use the @Field annotation, which tells Hibernate Search that the property needs to be indexed in the Lucene document. These annotations address the structural mismatch.
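
The mapping described above can be sketched as follows. To keep the example compilable without any library on the classpath, the three annotations are declared locally as stand-ins; in a real project they would be imported from org.hibernate.search.annotations instead. The toDocument method is a toy, hypothetical version of the object-to-document step, written only to show the idea, and is not how Hibernate Search is implemented. The Employee fields are assumptions chosen to match the query example later in the article.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.LinkedHashMap;
import java.util.Map;

final class MappingSketch {
    // Local stand-ins for the real Hibernate Search annotations.
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE) @interface Indexed {}
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD) @interface DocumentId {}
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD) @interface Field {}

    @Indexed                          // marks the entity as searchable
    static class Employee {
        @DocumentId Long id;          // links the index entry to the database row
        @Field String title;          // indexed property
        @Field String speciality;     // indexed property
        String password;              // not annotated, so never copied to the index

        Employee(Long id, String title, String speciality, String password) {
            this.id = id;
            this.title = title;
            this.speciality = speciality;
            this.password = password;
        }
    }

    // Toy conversion: copies annotated fields into a name -> String map,
    // mimicking how an entity becomes a text-only document.
    static Map<String, String> toDocument(Object entity) {
        Class<?> type = entity.getClass();
        if (!type.isAnnotationPresent(Indexed.class)) {
            throw new IllegalArgumentException(type.getName() + " is not @Indexed");
        }
        Map<String, String> document = new LinkedHashMap<>();
        for (java.lang.reflect.Field f : type.getDeclaredFields()) {
            if (f.isAnnotationPresent(DocumentId.class) || f.isAnnotationPresent(Field.class)) {
                try {
                    f.setAccessible(true);
                    document.put(f.getName(), String.valueOf(f.get(entity)));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return document;
    }

    public static void main(String[] args) {
        Employee emp = new Employee(1L, "Software Engineer", "Java", "secret");
        System.out.println(toDocument(emp));
    }
}
```

Every value ends up as a String, and only the annotated properties reach the document, which is the essence of the object-to-index mapping.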
Once the domain model has been set up for indexing, let’s dive into the core features of indexing and searching. Hibernate Search extends the Hibernate Core main API to provide access to Lucene capabilities. A FullTextSession is a subinterface of Session. Similarly, a FullTextEntityManager is a subinterface of EntityManager. These two subinterfaces can manually index an object. Hibernate Search provides a helper class (org.hibernate.search.jpa.Search) to retrieve a FullTextEntityManager from an EntityManager, as well as a helper class (org.hibernate.search.Search) to retrieve a FullTextSession from a Session. The following snippet indexes a list of Employee objects:
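// em is an existing javax.persistence.EntityManager
FullTextEntityManager ftem = Search.getFullTextEntityManager(em);
ftem.getTransaction().begin();
// load the entities to index; the typed list lets the for-each loop compile
@SuppressWarnings("unchecked")
List<Employee> emps = em.createQuery("select i from Employee i").getResultList();
for (Employee emp : emps) {
    ftem.index(emp); // manually (re)index this entity
}
// the index changes are applied when the transaction commits
ftem.getTransaction().commit();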

The Lucene index will thus contain the information needed to execute full-text queries matching these employees. Hibernate Search’s query facility integrates directly into the Hibernate query API and returns Hibernate-managed objects from the persistence context after running the search. The following snippet builds a Hibernate Search query on top of a Lucene query and executes it:
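// quote the phrase so both words are matched together in the title field
String searchQuery = "title:\"Software Engineer\" OR speciality:Java";
// "title" is the default field for terms without an explicit field prefix
QueryParser parser = new QueryParser("title", new StandardAnalyzer());
// parse() throws ParseException if the query string is malformed
org.apache.lucene.search.Query luceneQuery = parser.parse(searchQuery);

// session is an existing org.hibernate.Session
FullTextSession ftSession = Search.getFullTextSession(session);
org.hibernate.Query query = ftSession.createFullTextQuery(luceneQuery, Employee.class);
List results = query.list();
// iterate over results to read the matching Employee objects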
The query objects are of type org.hibernate.Query or javax.persistence.Query respectively, but the results are objects from the domain model, not Documents from the Lucene API. This addresses the retrieval mismatch.

By focusing on ease of use and smooth integration with existing Hibernate applications, Hibernate Search makes the benefits of Lucene full-text search easy and affordable to bring to any web application.


--Amit G Piplani--

1 comment:

  1. If we store a file, say a Word file or a PDF, on our file system, we can search it using Lucene. But if we are to store such files in a database, what options do we have to allow full-text searches to pick them up?
