lucene – Make Me Engineer

How to implement auto suggest using Lucene’s new AnalyzingInfixSuggester API?

June 1, 2023 by Tarik

I’ll give you a pretty complete example that shows you how to use AnalyzingInfixSuggester. In this example we’ll pretend that we’re Amazon, and we want to autocomplete a product search field. We’ll take advantage of features of the Lucene suggestion system to implement the following: Ranked results: We will suggest the most popular matching products …
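To make the behaviour concrete without pulling in Lucene, here is a plain-Java sketch of what an infix, popularity-ranked suggester does. Every query token must match the start of some word in a product title (so "pho" finds "Samsung Phone Charger" but not "iPhone"), and matches are ordered by a weight such as sales. This is not AnalyzingInfixSuggester's API or implementation: it scans a list linearly and simplifies Lucene's matching rules, and the class, method, and weight names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/**
 * Plain-Java illustration of infix, popularity-ranked suggestions.
 * NOT Lucene's AnalyzingInfixSuggester; all names here are hypothetical.
 */
class InfixSuggestSketch {

    /** A product title plus a hypothetical popularity weight (e.g. units sold). */
    static final class Product {
        private final String title;
        private final long weight;

        Product(String title, long weight) {
            this.title = title;
            this.weight = weight;
        }

        String title() { return title; }

        long weight() { return weight; }
    }

    /**
     * Returns up to {@code limit} titles in which every whitespace-separated
     * query token is a prefix of some word, most popular first.
     */
    static List<String> suggest(List<Product> products, String query, int limit) {
        String[] queryTokens = query.toLowerCase(Locale.ROOT).trim().split("\\s+");
        List<Product> matches = new ArrayList<>();
        for (Product p : products) {
            String[] words = p.title().toLowerCase(Locale.ROOT).split("\\s+");
            if (allTokensMatch(queryTokens, words)) {
                matches.add(p);
            }
        }
        // Rank by popularity, highest weight first.
        matches.sort(Comparator.comparingLong(Product::weight).reversed());
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(limit, matches.size()); i++) {
            result.add(matches.get(i).title());
        }
        return result;
    }

    /** True if each query token is a prefix of at least one word in the title. */
    private static boolean allTokensMatch(String[] queryTokens, String[] words) {
        for (String token : queryTokens) {
            boolean found = false;
            for (String word : words) {
                if (word.startsWith(token)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }
}
```

With weights 80 for "Samsung Phone Charger" and 10 for "Phone Stand", suggest(products, "ph", 5) lists the charger first. Lucene's suggester instead builds an index up front, so lookups don't have to scan every product on each keystroke.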

Is it possible to iterate through documents stored in Lucene Index?

May 14, 2023 by Tarik

    IndexReader reader = // create IndexReader
    for (int i = 0; i < reader.maxDoc(); i++) {
        if (reader.isDeleted(i))
            continue;  // skip documents that are marked as deleted

        Document doc = reader.document(i);
        String docId = doc.get("docId");

        // do something with docId here...
    }
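The loop relies on two properties of a Lucene index: maxDoc() counts every document slot, including deleted ones (numDocs() counts only live documents), and a deletion only flags a document until a segment merge reclaims it. That is why the loop must skip deleted slots. The snippet uses the Lucene 3.x API; in Lucene 4.0 and later isDeleted no longer exists and deletions are exposed as a Bits of live documents (for a composite reader, MultiFields.getLiveDocs in 4.x, which returns null when nothing is deleted; the helper has moved in later releases, so check the javadocs for your version). The plain-Java model below, with hypothetical names and a java.util.BitSet standing in for Lucene's live docs, shows the same loop shape.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/**
 * Toy model of deletion tracking: documents keep their numbers 0..maxDoc-1
 * and deleted ones are only flagged. Not the Lucene API; names are hypothetical.
 */
class LiveDocsSketch {
    private final List<String> docIds = new ArrayList<>(); // stored "docId" value per doc number
    private final BitSet liveDocs = new BitSet();           // bit set = document is live

    int addDocument(String docId) {
        int docNum = docIds.size();
        docIds.add(docId);
        liveDocs.set(docNum);
        return docNum;
    }

    void deleteDocument(int docNum) {
        liveDocs.clear(docNum); // flag only: the slot stays, so maxDoc() is unchanged
    }

    int maxDoc() { return docIds.size(); }           // includes deleted documents

    int numDocs() { return liveDocs.cardinality(); } // live documents only

    /** Same shape as the loop above: visit every slot, skip the deleted ones. */
    List<String> liveDocIds() {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < maxDoc(); i++) {
            if (!liveDocs.get(i)) {
                continue; // corresponds to reader.isDeleted(i) in Lucene 3.x
            }
            result.add(docIds.get(i));
        }
        return result;
    }
}
```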

Elasticsearch vs Cassandra vs Elasticsearch with Cassandra

May 8, 2023 by Tarik

One of our applications uses data that is stored in both Cassandra and ElasticSearch. We use Cassandra to access those records whenever we can, and have data duplicated into query tables designed to adhere to specific application-side requests. For a more liberal search than our query tables can allow, ElasticSearch performs that functionality nicely. We …

NoSQL (MongoDB) vs Lucene (or Solr) as your database [closed]

May 7, 2023 by Tarik

This is a great question, something I have pondered over quite a bit. I will summarize my lessons learned: You can easily use Lucene/Solr in lieu of MongoDB for pretty much all situations, but not vice versa. Grant Ingersoll’s post sums it up here. MongoDB etc. seem to serve a purpose where there is no …

How to control indexing a field in Lucene 4.0

May 6, 2023 by Tarik

Constructors taking Field.Index arguments are available but are deprecated in 4.0, and should not be used. Instead, you should look to subclasses of Field to control how a field is indexed. StringField is the standard un-analyzed indexed field: the field is indexed as a single token. It is appropriate for things like identifiers, for which you …
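To see why an un-analyzed field suits identifiers, compare what each style puts into the index. In Lucene 4.x the two styles correspond to StringField (whole value as one token) and TextField (value run through an Analyzer). The sketch below models only that difference in plain Java; the crude tokenizer is a hypothetical stand-in for an Analyzer, and none of the methods are Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Plain-Java model of the two indexing styles, not Lucene code.
 * In Lucene 4.x these correspond to, for example:
 *   new StringField("sku", value, Field.Store.YES)  // one un-analyzed token
 *   new TextField("title", value, Field.Store.YES)  // analyzed into many tokens
 */
class FieldIndexingSketch {

    /** StringField-style: the whole value becomes exactly one term, unchanged. */
    static List<String> untokenizedTerms(String value) {
        return List.of(value);
    }

    /** TextField-style: lowercase and split on anything that is not a letter or digit. */
    static List<String> analyzedTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** A term lookup succeeds only if that exact term was indexed. */
    static boolean termMatches(List<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }
}
```

An identifier such as "ABC-123" stays findable as one exact term in the un-analyzed style, whereas the analyzed style splits it into "abc" and "123", so an exact lookup for "ABC-123" no longer matches.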

How to do query auto-completion/suggestions in Lucene?

November 10, 2022 by Tarik

Based on @Alexandre Victoor’s answer, I wrote a little class based on the Lucene Spellchecker in the contrib package (and using the LuceneDictionary included in it) that does exactly what I want. This allows re-indexing from a single source index with a single field, and provides suggestions for terms. Results are sorted by the number …
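The core of term auto-completion is a prefix scan over a sorted term dictionary: seek to the first term that is greater than or equal to the prefix, then walk forward until terms stop starting with it. Below is a plain-Java sketch of that idea, using a TreeMap in place of the index's term dictionary. It is not the answer's class or the Lucene SpellChecker API, and the per-term count used for ranking is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Prefix completion over a sorted term dictionary (hypothetical names and data). */
class TermCompletionSketch {
    // Sorted term -> count; stands in for an index's term dictionary.
    private final TreeMap<String, Integer> termCounts = new TreeMap<>();

    void addTerm(String term, int count) {
        termCounts.merge(term, count, Integer::sum);
    }

    List<String> complete(String prefix, int limit) {
        List<Map.Entry<String, Integer>> hits = new ArrayList<>();
        // Seek to the first term >= prefix, then walk forward in sorted order.
        for (Map.Entry<String, Integer> e : termCounts.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // terms are sorted, so no later term can share the prefix
            }
            hits.add(e);
        }
        // Highest count first; List.sort is stable, so ties stay alphabetical.
        hits.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(limit, hits.size()); i++) {
            result.add(hits.get(i).getKey());
        }
        return result;
    }
}
```

The early break is what keeps this cheap: all terms sharing a prefix form one contiguous run in sorted order, so only that run is visited.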

Can a raw Lucene index be loaded by Solr?

November 1, 2022 by Tarik

Success! With Pascal’s suggestion of changes to schema.xml I got it working in no time. Thanks! Here are my complete steps for anyone interested:

1. Downloaded Solr and copied dist/apache-solr-1.4.0.war to tomcat/webapps
2. Copied example/solr/conf to /usr/local/solr/
3. Copied pre-existing Lucene index files to /usr/local/solr/data/index
4. Set solr.home to /usr/local/solr
5. In solrconfig.xml, changed dataDir to /usr/local/solr/data (Solr looks for …

N-gram generation from a sentence

October 11, 2022 by Tarik

I believe this would do what you want:

    import java.util.*;

    public class Test {

        public static List<String> ngrams(int n, String str) {
            List<String> ngrams = new ArrayList<String>();
            String[] words = str.split(" ");
            for (int i = 0; i < words.length - n + 1; i++)
                ngrams.add(concat(words, i, i+n));
            return ngrams;
        }

        public static String concat(String[] …
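The listing stops before the concat helper, so here is a self-contained sketch of the same word n-gram loop. The joining step (String.join over Arrays.copyOfRange) stands in for the elided concat and may differ from the original; splitting on runs of whitespace rather than a single space is a deliberate change so that double spaces don't produce empty words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NGramsSketch {
    /** Returns all word n-grams of {@code str}, in order, each joined by a single space. */
    static List<String> ngrams(int n, String str) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> result = new ArrayList<>();
        if (str.trim().isEmpty()) {
            return result; // no words, no n-grams
        }
        String[] words = str.trim().split("\\s+");
        // Same bound as the answer: the last n-gram starts at words.length - n.
        for (int i = 0; i < words.length - n + 1; i++) {
            result.add(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
        }
        return result;
    }
}
```

For the sentence "This is my car." and n = 2 this yields "This is", "is my", "my car."; to collect every n-gram up to some maximum length, call it once per n.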

Get cosine similarity between two documents in Lucene

October 10, 2022 by Tarik

As Julia points out, Sujit Pal’s example is very useful, but the Lucene 4 API has substantial changes. Here is a version rewritten for Lucene 4.

    import java.io.IOException;
    import java.util.*;
    import org.apache.commons.math3.linear.*;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.core.SimpleAnalyzer;
    import org.apache.lucene.document.*;
    import org.apache.lucene.document.Field.Store;
    import org.apache.lucene.index.*;
    import org.apache.lucene.store.*;
    import org.apache.lucene.util.*;

    public class CosineDocumentSimilarity {

        public static final String CONTENT …
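The class is cut off here, but judging by its imports it turns each document into a term vector and compares the vectors with Commons Math. The underlying formula is cos(A, B) = (A · B) / (|A| |B|). Below is a minimal plain-Java sketch of that computation on raw term counts. It uses no Lucene and no tf-idf weighting (both simplifications), and a crude lowercase, split-on-non-word-characters tokenizer stands in for SimpleAnalyzer.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class CosineSketch {
    /** Term-frequency vector (term -> count) from a crude lowercase, non-word-character split. */
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) {
                tf.merge(t, 1, Integer::sum);
            }
        }
        return tf;
    }

    /** cos(a, b) = (a . b) / (|a| |b|); returns 0 if either vector is empty. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey()); // terms missing from b contribute 0
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }

    /** Euclidean length of a term-count vector. */
    private static double norm(Map<String, Integer> v) {
        double sum = 0;
        for (int x : v.values()) {
            sum += (double) x * x;
        }
        return Math.sqrt(sum);
    }
}
```

Identical texts score 1 (up to floating-point rounding) and texts with no terms in common score 0.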

Filename search with ElasticSearch

October 7, 2022 by Tarik

You have various problems with what you pasted:

1) Incorrect mapping

When creating the index, you specify:

    "mappings": {
        "files": {

But your type is actually file, not files. If you checked the mapping, you would see that immediately:

    curl -XGET 'http://127.0.0.1:9200/files/_mapping?pretty=1'

    # {
    #    "files" : {
    #       "files" : {
    #          "properties" : …