How to implement auto suggest using Lucene’s new AnalyzingInfixSuggester API?

I’ll give you a pretty complete example that shows you how to use AnalyzingInfixSuggester. In this example we’ll pretend that we’re Amazon, and we want to autocomplete a product search field. We’ll take advantage of features of the Lucene suggestion system to implement the following: Ranked results: We will suggest the most popular matching products … Read more

Elasticsearch vs Cassandra vs Elasticsearch with Cassandra

One of our applications uses data that is stored in both Cassandra and ElasticSearch. We use Cassandra to access those records whenever we can, and have the data duplicated into query tables designed for specific application-side requests. For searches more liberal than our query tables allow, ElasticSearch does the job nicely. We … Read more

How to control indexing of a field in Lucene 4.0

Constructors taking Field.Index arguments are available, but are deprecated in 4.0, and should not be used. Instead, you should look to subclasses of Field to control how a field is indexed. StringField is the standard un-analyzed indexed field. The field is indexed as a single token. It is appropriate for things like identifiers, for which you … Read more

Can a raw Lucene index be loaded by Solr?

Success! With Pascal’s suggestion of changes to schema.xml I got it working in no time. Thanks! Here are my complete steps for anyone interested: Downloaded Solr and copied dist/apache-solr-1.4.0.war to tomcat/webapps Copied example/solr/conf to /usr/local/solr/ Copied pre-existing Lucene index files to /usr/local/solr/data/index Set solr.home to /usr/local/solr In solrconfig.xml, changed dataDir to /usr/local/solr/data (Solr looks for … Read more

N-gram generation from a sentence

I believe this would do what you want: import java.util.*; public class Test { public static List<String> ngrams(int n, String str) { List<String> ngrams = new ArrayList<String>(); String[] words = str.split(" "); for (int i = 0; i < words.length - n + 1; i++) ngrams.add(concat(words, i, i+n)); return ngrams; } public static String concat(String[] … Read more
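For readers who want something runnable without following the link, here is a minimal self-contained sketch of the same word-level n-gram idea. It is not the original answer's full code: the class name `NGrams` and the body of the `concat` helper are assumptions made for illustration, and the sketch splits on any run of whitespace rather than on a single space.

```java
import java.util.ArrayList;
import java.util.List;

public class NGrams {

    /** Returns all word-level n-grams of the sentence, in order of appearance. */
    public static List<String> ngrams(int n, String str) {
        List<String> result = new ArrayList<>();
        String trimmed = str.trim();
        if (n <= 0 || trimmed.isEmpty()) {
            return result; // nothing to generate
        }
        // Assumption: split on runs of whitespace (the excerpt splits on a single space).
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i + n <= words.length; i++) {
            result.add(concat(words, i, i + n));
        }
        return result;
    }

    /** Hypothetical helper: joins words[start..end) with single spaces. */
    static String concat(String[] words, int start, int end) {
        StringBuilder sb = new StringBuilder();
        for (int i = start; i < end; i++) {
            if (i > start) {
                sb.append(' ');
            }
            sb.append(words[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints [This is, is my, my car]
        System.out.println(ngrams(2, "This is my car"));
    }
}
```

If n is larger than the number of words, the loop body never runs and the result is an empty list.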

Get cosine similarity between two documents in Lucene

As Julia points out, Sujit Pal’s example is very useful, but the Lucene 4 API has substantial changes. Here is a version rewritten for Lucene 4. import java.io.IOException; import java.util.*; import org.apache.commons.math3.linear.*; import org.apache.lucene.analysis.Analyzer; import org.apache.lucene.analysis.core.SimpleAnalyzer; import org.apache.lucene.document.*; import org.apache.lucene.document.Field.Store; import org.apache.lucene.index.*; import org.apache.lucene.store.*; import org.apache.lucene.util.*; public class CosineDocumentSimilarity { public static final String CONTENT … Read more
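The underlying computation can be illustrated without Lucene at all. The sketch below is not the linked answer's code: it is a plain-Java stand-in that builds raw term-frequency vectors from two strings and takes the cosine of the angle between them. The class and method names (`CosineSketch`, `termFrequencies`, `cosine`) are hypothetical, and the tokenization (lowercase, split on non-letters) only roughly approximates what `SimpleAnalyzer` does.

```java
import java.util.HashMap;
import java.util.Map;

public class CosineSketch {

    /** Builds a term -> count map; lowercases and splits on non-letter characters. */
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    /** Cosine similarity: dot(a, b) / (|a| * |b|), or 0 if either vector is empty. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    /** Euclidean length of a term-frequency vector. */
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int count : v.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        Map<String, Integer> d1 = termFrequencies("the quick brown fox");
        Map<String, Integer> d2 = termFrequencies("the lazy brown dog");
        System.out.println(cosine(d1, d2));
    }
}
```

In a Lucene-based version the counts would come from the index's stored term vectors instead of from re-tokenizing the text, but the dot-product-over-norms step is the same.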

Filename search with ElasticSearch

You have various problems with what you pasted: 1) Incorrect mapping When creating the index, you specify: "mappings": { "files": { But your type is actually file, not files. If you checked the mapping, you would see that immediately: curl -XGET 'http://127.0.0.1:9200/files/_mapping?pretty=1' # { # "files" : { # "files" : { # "properties" : … Read more