Search Engine – Lucene or Solr

Lucene: Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search.

Solr: Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, …

Apache Solr string field or text field?

The default field types defined in the Solr schema behave very differently. String stores a word or sentence as an exact string, without tokenization or other processing. It is commonly used for exact matches, e.g. for faceting. Text typically performs tokenization and secondary processing (such as lower-casing). It is useful whenever we want to match part of …
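To make the difference concrete, here is a minimal Java sketch, not Solr code. The string-field behaviour keeps the whole value as one unmodified term. The text-field behaviour is approximated by splitting on non-word characters and lower-casing. That split rule and all method names are assumptions for illustration; real Solr text types run whatever tokenizer and filter chain the schema configures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class StringVsTextSketch {
    // String field: the whole value becomes one term, kept exactly as written.
    static List<String> analyzeAsString(String value) {
        return List.of(value);
    }

    // Simplified text field (assumption): split on non-word characters, then lower-case.
    // Real Solr text types use a configurable tokenizer + filter chain instead.
    static List<String> analyzeAsText(String value) {
        List<String> terms = new ArrayList<>();
        for (String token : value.split("\\W+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // A document matches when every (analyzed) query term is among its indexed terms.
    static boolean matches(List<String> indexedTerms, List<String> queryTerms) {
        return indexedTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        String value = "New York City";
        List<String> asString = analyzeAsString(value);
        List<String> asText = analyzeAsText(value);
        System.out.println(asString); // [New York City]
        System.out.println(asText);   // [new, york, city]

        // Partial, differently-cased query: only the text version matches.
        System.out.println(matches(asString, analyzeAsString("york"))); // false
        System.out.println(matches(asText, analyzeAsText("york")));     // true

        // The exact full value matches the string version.
        System.out.println(matches(asString, analyzeAsString("New York City"))); // true
    }
}
```

This is also why string fields suit faceting: each facet bucket is the complete original value rather than individual words.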

Solr Custom Similarity

I figured it out on my own. I stored my own implementation of DefaultSimilarity under the /dist/ folder in Solr, then added

    <lib dir="../../../dist/org/apache/lucene/search/similarities/" regex=".*\.jar"/>

to my solrconfig.xml, and everything works fine.

    package;
    import org.apache.lucene.index.FieldInvertState;
    import;

    public class MyNewSimilarityClass extends DefaultSimilarity {
        @Override
        public float coord(int overlap, int maxOverlap) {
            return 1.0f;
        …
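Some context on what that override changes. In Lucene's classic TF-IDF scoring, DefaultSimilarity's coord(overlap, maxOverlap) returns overlap / maxOverlap, the fraction of query terms a document matched, and that factor multiplies the document's score. Returning 1.0f switches off the penalty for matching only some terms. The plain-Java model below is a simplified sketch. The score helper is hypothetical and ignores the other TF-IDF factors. Later Lucene versions dropped coord altogether, so this kind of override only applies to the older similarity API.

```java
import java.util.List;

class CoordSketch {
    // Classic TF-IDF coord: fraction of the query's terms that this document matched.
    static float defaultCoord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // The override from the answer: ignore how many query terms matched.
    static float flatCoord(int overlap, int maxOverlap) {
        return 1.0f;
    }

    // Hypothetical, heavily simplified scorer: sum of matched term scores times coord.
    static float score(List<Float> matchedTermScores, int queryTermCount, boolean flat) {
        float sum = 0f;
        for (float termScore : matchedTermScores) {
            sum += termScore;
        }
        int overlap = matchedTermScores.size();
        float coord = flat ? flatCoord(overlap, queryTermCount) : defaultCoord(overlap, queryTermCount);
        return sum * coord;
    }

    public static void main(String[] args) {
        // A document matching one of two query terms, with a term score of 2.0.
        List<Float> oneOfTwo = List.of(2.0f);
        System.out.println(score(oneOfTwo, 2, false)); // 1.0 (halved by coord)
        System.out.println(score(oneOfTwo, 2, true));  // 2.0 (no penalty)
    }
}
```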

Solr exact word search

I presume your field is a TextField. By default Solr analyzes (tokenizes) this field, so a query matches individual terms rather than the whole value, which is a looser match than you want. What you want is to set up your field as a string field, which has no tokenizer; then you'll get an exact match. You can even combine the exact search with the looser tokenized search and use DisMax to …
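A rough sketch of why DisMax suits the exact-plus-tokenized combination: a disjunction-max query scores a document by its best-scoring field, plus a tie-breaker fraction of the other fields' scores. A strong exact hit therefore wins without the field scores simply adding up. In Solr the request might look roughly like defType=dismax with qf=title_exact^10 title and tie=0.1. Here title_exact (a string field) and title (a text field) are hypothetical names, for example both filled from the same source via copyField. Multi-word values need care, since the query parser may split the query on whitespace before it reaches the string field. The Java below only models the scoring formula, with made-up per-field scores.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DisMaxSketch {
    // Disjunction-max scoring: best field score + tie * (sum of the other field scores).
    // Assumes scores are non-negative, as relevance scores are.
    static double disMax(Map<String, Double> fieldScores, double tie) {
        double max = 0.0;
        double sum = 0.0;
        for (double score : fieldScores.values()) {
            max = Math.max(max, score);
            sum += score;
        }
        return max + tie * (sum - max);
    }

    public static void main(String[] args) {
        // Hypothetical per-field scores for a document matching both the exact
        // (string) field and the tokenized (text) field.
        Map<String, Double> exactAndText = new LinkedHashMap<>();
        exactAndText.put("title_exact", 5.0);
        exactAndText.put("title", 1.0);

        // A document that only matched the tokenized field.
        Map<String, Double> textOnly = new LinkedHashMap<>();
        textOnly.put("title", 1.2);

        System.out.println(disMax(exactAndText, 0.1)); // 5.1 (approximately)
        System.out.println(disMax(textOnly, 0.1));     // 1.2
    }
}
```

With tie set to 0.0 this is a pure "best field wins" score. Raising tie toward 1.0 moves it toward a plain sum across fields.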

Solr index vs stored

That is correct. Typically you will want your field to be either indexed or stored or both. If you set both to false, that field will not be available in your Solr docs (either for searching or for displaying). See Alexandre’s answer for the special cases when you will want to set both to false. …
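A toy model can show why the two flags matter independently. In the plain-Java sketch below, all names and the sample schema are hypothetical. An indexed field's terms go into an inverted index, and search consults only that index. A stored field's original value goes into a per-document map, and only that map is returned for display. A field with both flags false would end up in neither.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class IndexedVsStoredSketch {
    static final class TinyIndex {
        private final Set<String> indexedFields;
        private final Set<String> storedFields;
        // indexed="true": field -> term -> ids of documents containing the term
        private final Map<String, Map<String, Set<Integer>>> inverted = new HashMap<>();
        // stored="true": doc id -> field -> original value, returned with results
        private final Map<Integer, Map<String, String>> stored = new HashMap<>();

        TinyIndex(Set<String> indexedFields, Set<String> storedFields) {
            this.indexedFields = indexedFields;
            this.storedFields = storedFields;
        }

        void add(int docId, Map<String, String> doc) {
            for (Map.Entry<String, String> field : doc.entrySet()) {
                String name = field.getKey();
                String value = field.getValue();
                if (indexedFields.contains(name)) {
                    for (String term : value.toLowerCase(Locale.ROOT).split("\\W+")) {
                        if (term.isEmpty()) continue;
                        inverted.computeIfAbsent(name, n -> new HashMap<>())
                                .computeIfAbsent(term, t -> new TreeSet<>())
                                .add(docId);
                    }
                }
                if (storedFields.contains(name)) {
                    stored.computeIfAbsent(docId, id -> new HashMap<>()).put(name, value);
                }
            }
        }

        // Search only looks at the inverted index, so only indexed fields are searchable.
        Set<Integer> search(String field, String term) {
            return inverted.getOrDefault(field, Map.of())
                    .getOrDefault(term.toLowerCase(Locale.ROOT), Set.of());
        }

        // Display only looks at stored values, so only stored fields come back.
        Map<String, String> fetch(int docId) {
            return stored.getOrDefault(docId, Map.of());
        }
    }

    // Hypothetical schema: title is indexed + stored, body is indexed only, price is stored only.
    static TinyIndex demoIndex() {
        TinyIndex index = new TinyIndex(Set.of("title", "body"), Set.of("title", "price"));
        index.add(1, Map.of("title", "Solr in Action", "body", "faceting and highlighting", "price", "30"));
        return index;
    }

    public static void main(String[] args) {
        TinyIndex index = demoIndex();
        System.out.println(index.search("body", "faceting")); // [1]
        System.out.println(index.search("price", "30"));      // [] (price is not indexed)
        System.out.println(index.fetch(1).keySet());          // title and price, but not body
    }
}
```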

How to select distinct field values using Solr?

Faceting would get you a result set that contains the distinct values for a field. E.g.

    http://localhost:8983/solr/select/?q=*%3A*&rows=0&facet=on&facet.field=txt

You should get something back like this:

    <response>
      <responseHeader><status>0</status><QTime>2</QTime></responseHeader>
      <result numFound="4" start="0"/>
      <lst name="facet_counts">
        <lst name="facet_queries"/>
        <lst name="facet_fields">
          <lst name="txt">
            <int name="value">100</int>
            <int name="value1">80</int>
            <int name="value2">5</int>
            <int name="value3">2</int>
            <int name="value4">1</int>
          </lst>
        </lst>
      </lst>
    </response>

Check out the wiki for …
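Reading the distinct values out of that response needs only the JDK's built-in DOM and XPath APIs. This sketch assumes the XML response format shown above. Recent Solr versions return JSON by default, so you may need to add wt=xml to the request. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class FacetDistinctValues {
    // The sample response from the answer, with straight quotes.
    static final String SAMPLE =
        "<response>"
        + "<responseHeader><status>0</status><QTime>2</QTime></responseHeader>"
        + "<result numFound=\"4\" start=\"0\"/>"
        + "<lst name=\"facet_counts\"><lst name=\"facet_queries\"/>"
        + "<lst name=\"facet_fields\"><lst name=\"txt\">"
        + "<int name=\"value\">100</int><int name=\"value1\">80</int>"
        + "<int name=\"value2\">5</int><int name=\"value3\">2</int>"
        + "<int name=\"value4\">1</int>"
        + "</lst></lst></lst></response>";

    // Rebuilds the faceting request from the answer for a given field.
    static String facetUrl(String baseUrl, String field) {
        return baseUrl + "/select/?q=" + URLEncoder.encode("*:*", StandardCharsets.UTF_8)
            + "&rows=0&facet=on&facet.field=" + URLEncoder.encode(field, StandardCharsets.UTF_8);
    }

    // Returns distinct value -> document count for one facet field, in response order.
    static Map<String, Long> distinctValues(String xml, String field) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            String path = "/response/lst[@name='facet_counts']/lst[@name='facet_fields']"
                + "/lst[@name='" + field + "']/int";
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(path, doc, XPathConstants.NODESET);
            Map<String, Long> counts = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element entry = (Element) nodes.item(i);
                counts.put(entry.getAttribute("name"), Long.parseLong(entry.getTextContent().trim()));
            }
            return counts;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read facet counts from response", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(facetUrl("http://localhost:8983/solr", "txt"));
        // http://localhost:8983/solr/select/?q=*%3A*&rows=0&facet=on&facet.field=txt
        System.out.println(distinctValues(SAMPLE, "txt"));
        // {value=100, value1=80, value2=5, value3=2, value4=1}
    }
}
```

facet.limit defaults to 100, so a field with more distinct values than that needs facet.limit=-1 to return all of them. Adding facet.mincount=1 drops values with a zero count.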