Lucene: Getting Started

Some notes on how to get started with Lucene. The first step is to clone the repo. The best way to get started is to run the demo programs: org.apache.lucene.demo.IndexFiles, which indexes files, and org.apache.lucene.demo.SearchFiles, which searches them. In what follows we are working from the lucene subdirectory of the project root.

To see all available tasks:

../gradlew tasks

By running:

../gradlew publishJarsPublicationToMavenLocal

I was able to install the jars into my local Maven repository (~/.m2).

../gradlew assemble

is roughly the equivalent of mvn package with tests skipped: it compiles the sources and builds the jars.

But how do we run the program?

You can run it with java, but you need to give it the full classpath. How do we get that?

Add the following to demo/build.gradle:

plugins {
  id 'com.github.johnrengelman.shadow' version '7.1.2'
}

task printClasspath {
  doLast {
    println configurations.runtimeClasspath.asPath
  }
}

Now you can get the classpath by running:

../gradlew printClasspath

In my case this gave me:

/Users/xxx/code/lucene/lucene/facet/build/libs/lucene-facet-9.10.0-SNAPSHOT.jar:/Users/xxx/code/lucene/lucene/queryparser/build/libs/lucene-queryparser-9.10.0-SNAPSHOT.jar:/Users/xxx/code/lucene/lucene/sandbox/build/libs/lucene-sandbox-9.10.0-SNAPSHOT.jar:/Users/xxx/code/lucene/lucene/queries/build/libs/lucene-queries-9.10.0-SNAPSHOT.jar:/Users/xxx/code/lucene/lucene/analysis/common/build/libs/lucene-analysis-common-9.10.0-SNAPSHOT.jar:/Users/xxx/code/lucene/lucene/expressions/build/libs/lucene-expressions-9.10.0-SNAPSHOT.jar:/Users/xxx/code/lucene/lucene/codecs/build/libs/lucene-codecs-9.10.0-SNAPSHOT.jar:/Users/xxx/code/lucene/lucene/core/build/libs/lucene-core-9.10.0-SNAPSHOT.jar:/Users/xxx/.gradle/caches/modules-2/files-2.1/com.carrotsearch/hppc/0.9.1/4bf4c51e06aec600894d841c4c004566b20dd357/hppc-0.9.1.jar:/Users/xxx/.gradle/caches/modules-2/files-2.1/org.antlr/antlr4-runtime/4.11.1/69214c1de1960040729702eb58deac8827135e7/antlr4-runtime-4.11.1.jar:/Users/xxx/.gradle/caches/modules-2/files-2.1/org.ow2.asm/asm-commons/7.2/ca2954e8d92a05bacc28ff465b25c70e0f512497/asm-commons-7.2.jar:/Users/xxx/.gradle/caches/modules-2/files-2.1/org.ow2.asm/asm-analysis/7.2/b6e6abe057f23630113f4167c34bda7086691258/asm-analysis-7.2.jar:/Users/xxx/.gradle/caches/modules-2/files-2.1/org.ow2.asm/asm-tree/7.2/3a23cc36edaf8fc5a89cb100182758ccb5991487/asm-tree-7.2.jar:/Users/xxx/.gradle/caches/modules-2/files-2.1/org.ow2.asm/asm/7.2/fa637eb67eb7628c915d73762b681ae7ff0b9731/asm-7.2.jar
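
A classpath string this long is hard to read. As a small sketch (the class name ClasspathLister is made up for illustration and is not part of Lucene), the following program splits a classpath on the platform path separator and prints only the file names. That makes it easier to check whether a particular module jar is on the list.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Lucene or its build.
class ClasspathLister {

  // Returns the last path segment (usually the jar name) of every classpath entry.
  static List<String> entryNames(String classpath) {
    List<String> names = new ArrayList<>();
    for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
      if (entry.isEmpty()) {
        continue; // skip empty entries such as a trailing separator
      }
      Path fileName = Paths.get(entry).getFileName();
      // getFileName() is null for a root path; fall back to the raw entry
      names.add(fileName == null ? entry : fileName.toString());
    }
    return names;
  }

  public static void main(String[] args) {
    // Use the first argument if given, otherwise this JVM's own classpath.
    String classpath = args.length > 0 ? args[0] : System.getProperty("java.class.path");
    entryNames(classpath).forEach(System.out::println);
  }
}
```

Passing the output of printClasspath as the single argument prints one jar name per line.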

Then, with CLASSPATH set to the output above and CWD set to the lucene directory we are working in, you can run the demo program like this:

java \
-cp $CLASSPATH:$CWD/demo/build/classes/java/main \
org.apache.lucene.demo.IndexFiles \
-docs $DOCS_DIR
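
The same command can also be assembled programmatically, which avoids quoting problems with such a long classpath. This is only a sketch: the class DemoCommand and its method names are invented here, and it assumes the environment variables CLASSPATH, CWD and DOCS_DIR are set as in the command above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical launcher for illustration; not part of the Lucene demo module.
class DemoCommand {

  // Builds: java -cp <classpath><sep><luceneDir>/demo/build/classes/java/main <mainClass> <args...>
  static List<String> build(String classpath, String luceneDir, String mainClass, String... programArgs) {
    String demoClasses = luceneDir + "/demo/build/classes/java/main";
    List<String> command = new ArrayList<>();
    command.add("java");
    command.add("-cp");
    command.add(classpath + File.pathSeparator + demoClasses);
    command.add(mainClass);
    Collections.addAll(command, programArgs);
    return command;
  }

  // Reads an environment variable and fails early if it is missing.
  static String requireEnv(String name) {
    String value = System.getenv(name);
    if (value == null) {
      throw new IllegalStateException("Environment variable " + name + " is not set");
    }
    return value;
  }

  public static void main(String[] args) throws Exception {
    List<String> command = build(
        requireEnv("CLASSPATH"),
        requireEnv("CWD"),
        "org.apache.lucene.demo.IndexFiles",
        "-docs", requireEnv("DOCS_DIR"));
    // inheritIO() forwards the demo's output to this process's console.
    int exitCode = new ProcessBuilder(command).inheritIO().start().waitFor();
    System.exit(exitCode);
  }
}
```

Swapping the main class for org.apache.lucene.demo.SearchFiles and dropping the -docs argument gives the search command.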

This will index the files in DOCS_DIR. To search the files, run:

java \
-cp $CLASSPATH:$CWD/demo/build/classes/java/main \
org.apache.lucene.demo.SearchFiles

This will just print the filenames of the matching files. Normally, you also want to see the matching text. To do that, we use the following code:

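import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;

Analyzer analyzer = new StandardAnalyzer();
SimpleHTMLFormatter formatter = new SimpleHTMLFormatter("", ""); // empty tags: no markup around matches
Highlighter highlighter = new Highlighter(formatter, new QueryScorer(query)); // query is the parsed search query
...
String text = readAllText(path); // reads all text in the file
// getBestFragment throws IOException and InvalidTokenOffsetsException
String highlightedText = highlighter.getBestFragment(analyzer, "contents", text); // get text that matched the query
if (highlightedText != null) {
  System.out.println(highlightedText);
}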
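
The snippet above calls readAllText without showing it. A minimal sketch of such a helper could look like the following; the class name TextFiles and the choice of UTF-8 are assumptions made here, not something taken from the demo code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; assumes the indexed files are UTF-8 text.
class TextFiles {

  // Reads the whole file into a String (Files.readString needs Java 11 or later).
  static String readAllText(Path path) {
    try {
      return Files.readString(path, StandardCharsets.UTF_8);
    } catch (IOException e) {
      // Also reached when the file is not valid UTF-8 (MalformedInputException).
      throw new UncheckedIOException("Could not read " + path, e);
    }
  }
}
```

Reading the whole file into memory is fine for small documents; for very large files the highlighter would need a different approach.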

We also need access to the package org.apache.lucene.search.highlight, which lives in the highlighter module. In demo/src/java/module-info.java, add:

requires org.apache.lucene.highlighter;