Sunday, February 26, 2012

Aperture + Lucene

Though Lucene provides most of the features needed to create an index and build a search application, it does not include an out-of-the-box crawler for pulling content from a repository. I found Aperture very good for crawling and indexing. Incremental indexing works well when a persistent RDF repository is used.

Once you have an overview of Aperture's general structure, you will see that all you need is a Lucene handler and the proper configuration for the Aperture web crawler.
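To make the handler idea concrete, here is a minimal, self-contained sketch of the pattern. It does not use the Aperture or Lucene APIs. The names `SketchHandler`, `RecordingHandler`, `IncrementalCrawlSketch` and the `example` URLs are all hypothetical and exist only for illustration. The in-memory map stands in for the persistent repository that remembers what was seen on the previous crawl, and the handler only records the index operation it would perform. A real handler would receive the extracted text and metadata and write them to a Lucene index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical callback interface (an assumption, much smaller than a real crawler handler).
interface SketchHandler {
    void objectNew(String url);
    void objectChanged(String url);
    void objectRemoved(String url);
}

// Stand-in for a Lucene handler: records the index operation it would perform.
class RecordingHandler implements SketchHandler {
    final List<String> actions = new ArrayList<>();

    public void objectNew(String url)     { actions.add("add " + url); }
    public void objectChanged(String url) { actions.add("update " + url); }
    public void objectRemoved(String url) { actions.add("delete " + url); }
}

// Toy crawler: compares the current pages with what it saw on the previous crawl.
class IncrementalCrawlSketch {
    // url -> modification stamp from the previous crawl.
    // In a real setup this state would be persisted between runs.
    private final Map<String, Long> seen = new HashMap<>();

    void crawl(Map<String, Long> pages, SketchHandler handler) {
        // Sorted iteration keeps the callback order deterministic.
        for (Map.Entry<String, Long> page : new TreeMap<>(pages).entrySet()) {
            Long previous = seen.get(page.getKey());
            if (previous == null) {
                handler.objectNew(page.getKey());
            } else if (!previous.equals(page.getValue())) {
                handler.objectChanged(page.getKey());
            }
            // Unchanged pages trigger no callback, so they are not re-indexed.
        }

        // Anything seen last time but missing now has been removed.
        Set<String> gone = new TreeSet<>(seen.keySet());
        gone.removeAll(pages.keySet());
        for (String url : gone) {
            handler.objectRemoved(url);
        }

        // Remember the current state for the next crawl.
        seen.clear();
        seen.putAll(pages);
    }
}
```

On the first crawl every page is reported as new. On later crawls only new, changed, or removed pages reach the handler, which is what keeps incremental indexing cheap.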

Listing: LuceneHandler.java


Listing: IntranetCrawler.java
