Though Lucene provides most of the features needed to create an index and build a search application, it does not provide an out-of-the-box crawler for pulling content from a repository. I found Aperture very good for crawling and indexing. Incremental indexing works great when a persistent RDF repository is used.
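Incremental indexing depends on the crawler remembering what it saw on the previous run. The sketch below shows that bookkeeping under one assumption: each URI's last-modified timestamp is stored between crawls. A plain `HashMap` stands in for the persistent RDF repository, and the names `IncrementalCrawlDemo` and `CrawlState` are hypothetical, not part of Aperture. Comparing stored and current timestamps lets a crawl skip unchanged pages and only reindex or delete what actually changed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class IncrementalCrawlDemo {

    /** Outcome of comparing one crawled URI against the stored crawl state. */
    enum Change { NEW, CHANGED, UNCHANGED }

    /**
     * Hypothetical stand-in for a persistent crawl-state store such as an RDF repository.
     * This in-memory map does not survive restarts; a real store would.
     */
    static class CrawlState {
        private final Map<String, Long> lastModified = new HashMap<>();

        /** Records the current timestamp and reports how the URI differs from the last crawl. */
        Change update(String uri, long modified) {
            Long previous = lastModified.put(uri, modified);
            if (previous == null) {
                return Change.NEW;
            }
            return previous == modified ? Change.UNCHANGED : Change.CHANGED;
        }

        /** Returns URIs known from earlier crawls that were not seen this time, and forgets them. */
        Set<String> removeUnseen(Set<String> seen) {
            Set<String> removed = new HashSet<>(lastModified.keySet());
            removed.removeAll(seen);
            lastModified.keySet().removeAll(removed);
            return removed;
        }
    }

    /** Runs one simulated crawl and prints what an indexer would need to do. */
    static void crawl(CrawlState state, Map<String, Long> pages) {
        for (Map.Entry<String, Long> page : pages.entrySet()) {
            Change change = state.update(page.getKey(), page.getValue());
            if (change != Change.UNCHANGED) {
                System.out.println(change + ": (re)index " + page.getKey());
            }
        }
        for (String uri : state.removeUnseen(pages.keySet())) {
            System.out.println("REMOVED: delete " + uri + " from the index");
        }
    }

    public static void main(String[] args) {
        CrawlState state = new CrawlState();
        // First crawl: both pages are new.
        crawl(state, Map.of("http://intranet/a", 100L, "http://intranet/b", 200L));
        // Second crawl: a is unchanged (skipped), b was modified, c is new.
        crawl(state, Map.of("http://intranet/a", 100L, "http://intranet/b", 250L,
                "http://intranet/c", 300L));
        // Third crawl: b disappeared from the site, so it must be deleted from the index.
        crawl(state, Map.of("http://intranet/a", 100L, "http://intranet/c", 300L));
    }
}
```

Only the second and third crawls benefit from the stored state: unchanged pages produce no output at all, which is where the savings of incremental indexing come from.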
Once you get an overview of Aperture's general structure, you will see that all you need is a Lucene handler and the proper configuration for the Aperture web crawler.
Listing: LuceneHandler.java
Listing: IntranetCrawler.java
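The role of the Lucene handler is to turn crawl events into index updates. The following stdlib-only sketch illustrates that idea. Its callbacks are modeled loosely on the new/changed/removed notifications a crawler handler receives, but the `CrawlEventHandler` interface, its method signatures and the `ToyIndexHandler` class are hypothetical and do not reproduce Aperture's API. A toy in-memory inverted index replaces Lucene. In a real handler, the "changed" and "removed" cases would map to Lucene's `IndexWriter.updateDocument` and `IndexWriter.deleteDocuments`.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class HandlerSketch {

    /** Hypothetical crawl callbacks; NOT Aperture's actual CrawlerHandler interface. */
    interface CrawlEventHandler {
        void objectNew(String uri, String text);
        void objectChanged(String uri, String text);
        void objectRemoved(String uri);
    }

    /** Toy inverted index standing in for a Lucene index (term -> URIs containing it). */
    static class ToyIndexHandler implements CrawlEventHandler {
        private final Map<String, Set<String>> postings = new HashMap<>();
        private final Map<String, Set<String>> termsByUri = new HashMap<>();

        @Override
        public void objectNew(String uri, String text) {
            // Very simple analysis: lower-case and split on non-word characters.
            Set<String> terms = new HashSet<>();
            for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!token.isEmpty()) {
                    terms.add(token);
                }
            }
            termsByUri.put(uri, terms);
            for (String term : terms) {
                postings.computeIfAbsent(term, t -> new HashSet<>()).add(uri);
            }
        }

        @Override
        public void objectChanged(String uri, String text) {
            // Delete-then-add, analogous to what Lucene's IndexWriter.updateDocument does.
            objectRemoved(uri);
            objectNew(uri, text);
        }

        @Override
        public void objectRemoved(String uri) {
            Set<String> terms = termsByUri.remove(uri);
            if (terms == null) {
                return;
            }
            for (String term : terms) {
                Set<String> uris = postings.get(term);
                uris.remove(uri);
                if (uris.isEmpty()) {
                    postings.remove(term);
                }
            }
        }

        /** Returns a copy of the URIs whose text contained the term (case-insensitive). */
        Set<String> search(String term) {
            return Set.copyOf(postings.getOrDefault(term.toLowerCase(Locale.ROOT), Set.of()));
        }
    }

    public static void main(String[] args) {
        ToyIndexHandler handler = new ToyIndexHandler();
        handler.objectNew("http://intranet/policy", "Travel policy for employees");
        handler.objectNew("http://intranet/news", "Company news and travel updates");
        System.out.println(handler.search("travel"));   // both URIs, order not guaranteed
        handler.objectChanged("http://intranet/news", "Company news");
        System.out.println(handler.search("travel"));   // [http://intranet/policy]
        handler.objectRemoved("http://intranet/policy");
        System.out.println(handler.search("travel"));   // []
    }
}
```

Keeping the handler separate from the crawler configuration means the same indexing code can serve a web crawler, a file-system crawler or any other source that emits the same three kinds of events.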