Kuldev Bisht

Apache Solr : Nuisance with "Mixed Case"

Oct 08, 2010

Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, and a web administration interface. It runs in a Java servlet container such as Tomcat.

Apache Solr is a powerful search server that many developers recommend for medium to large websites. Drupal has been a front-runner in providing a module that integrates with the Solr service. When we present the power of Solr to our clients, they are more than happy to accept it as the default search for their site. The major reasons Solr is so well loved are:

  • Incredible indexing features
  • Blazing-fast speed
  • Faceted search
  • Content recommendation
  • Spelling suggestions

The amazing part is that all of this works out of the box. However, when it comes to tuning Apache Solr for performance or making it behave the way a developer wants, it can quickly become a nightmare. Here is one of the problems that I faced in a recent project.

No results showed up before the tweak for the keyword "rUral".

With its default schema, Apache Solr does not handle mixed-case searches well. A search for "rUral" returns no results, in spite of the fact that the keyword "rural" has several matches tucked away in the index. How do we make it work, then? Here is what I found out after a grueling four hours of research (OK, I may be exaggerating a bit!).

If you want Apache Solr to work with case transitions (mixed case), you just need to change the position of one filter inside the <analyzer type="query"> tag in schema.xml.

That is, in the '<analyzer type="query">' tag, find '<filter class="solr.LowerCaseFilterFactory"/>' and move it so that it sits just before '<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1".....'
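For anyone who would rather see the reordering spelled out, here is a minimal sketch in Python. It works on a simplified stand-in for the query analyzer, not a real schema.xml: the tokenizer line and the single `generateWordParts` attribute are assumptions made for illustration, and the helper names `move_lowercase_first` and `filter_order` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the query analyzer in schema.xml (an assumption:
# a real schema lists more filters and more attributes on each of them).
SAMPLE_ANALYZER = """
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
"""

LOWERCASE = "solr.LowerCaseFilterFactory"
WORD_DELIMITER = "solr.WordDelimiterFilterFactory"


def move_lowercase_first(analyzer):
    """Move the lowercase filter so it sits just before the word delimiter filter."""
    children = list(analyzer)
    classes = [child.get("class") for child in children]
    if LOWERCASE not in classes or WORD_DELIMITER not in classes:
        return analyzer  # nothing to reorder
    lower_idx = classes.index(LOWERCASE)
    delim_idx = classes.index(WORD_DELIMITER)
    if lower_idx > delim_idx:
        lower = children[lower_idx]
        analyzer.remove(lower)
        # The word delimiter filter keeps its index, because the element
        # we removed came after it.
        analyzer.insert(delim_idx, lower)
    return analyzer


def filter_order(analyzer):
    """Return the filter class names in the order they would run."""
    return [f.get("class") for f in analyzer.findall("filter")]


analyzer = ET.fromstring(SAMPLE_ANALYZER)
print("before:", filter_order(analyzer))
move_lowercase_first(analyzer)
print("after: ", filter_order(analyzer))
```

The sketch only touches an in-memory string. For the real schema.xml, editing by hand is probably safer, since rewriting the file with ElementTree would drop its comments.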

7 results showed up after the tweak was done.

After changing this configuration, restart the Apache Solr server and voilà! The search will now show the same results for both 'exPeRt IndiA' and 'expert india'.

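Why does this work? The 'WordDelimiterFilterFactory' filter splits a word into subwords by applying several rules, one of which is "case transitions". Filters run in the order in which they are listed. In the original order, a query such as "rUral" is split at the case transition before it is lowercased, so it no longer matches the indexed term "rural". With the lowercase filter first, the word delimiter filter sees no case transitions and leaves the query intact. Hence, by changing the order, we achieve what seemed difficult without any additional patchwork.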
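The effect of filter order can be illustrated with a toy model in Python. This is not Solr's implementation: `split_on_case_change` is a hypothetical helper that mimics only the case-transition rule (splitting where a lowercase letter is followed by an uppercase one), and the whitespace split stands in for the tokenizer.

```python
import re


def split_on_case_change(token):
    """Toy case-transition rule: split where a lowercase letter is
    followed by an uppercase one, e.g. rUral -> r, Ural."""
    return re.sub(r"([a-z])([A-Z])", r"\1 \2", token).split()


def word_delimiter(tokens):
    """Apply the toy case-transition split to every token."""
    return [part for token in tokens for part in split_on_case_change(token)]


def lowercase(tokens):
    """Lowercase every token."""
    return [token.lower() for token in tokens]


def analyze(text, filters):
    """Split on whitespace, then run the filters in the order given."""
    tokens = text.split()
    for f in filters:
        tokens = f(tokens)
    return tokens


# Original order: the split happens first, so the query falls apart.
print(analyze("rUral", [word_delimiter, lowercase]))   # ['r', 'ural']

# Tweaked order: lowercasing first leaves no case transition to split on.
print(analyze("rUral", [lowercase, word_delimiter]))   # ['rural']
print(analyze("exPeRt IndiA", [lowercase, word_delimiter]))  # ['expert', 'india']
```

In this model, the tweaked order turns 'exPeRt IndiA' and 'expert india' into the same tokens, which is why both searches return the same results.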

I hope this helps developers understand how the Apache Solr schema works and how some minor problems can be solved by tweaking the schema itself.