Getting data points using facets

From a user’s perspective, a facet (called Dynamic Navigation in Google Search Appliance) is a component of the result set that appears when the search output is split into multiple categories (much like grouping the search results by certain parameters) and displayed to the end user along with the document count associated with each component. Facets let users narrow their search conveniently. Moreover, faceting works on the indexed terms of a document, not on its stored content.

In order to understand facets better, let us consider a simple example of an eStore that sells digital cameras manufactured by various companies. Typically, information such as manufacturer, resolution, zoom capability, price and description should be sufficient to describe a camera. Thus, we can categorize the products based on the manufacturer, resolution, zoom range and price range fields as demonstrated in the following screenshot:

Facet values and facet count

In the preceding screenshot, the complete set of products (in this case, cameras) has been classified into four categories (Manufacturer, Resolution, Zoom range, and Price range), which are called facets. Each facet consists of facet values, and each facet value is associated with the total number of products that match it, termed the facet count (for instance, our store has seven cameras from the manufacturer Nikon). If the user clicks on or selects any facet value, the search is filtered further, taking this added criterion into account.

Faceting in Solr is easy and doesn’t require any additional configuration. Solr provides the following faceting types:

  • Field faceting – It retrieves the counts of all terms, or just the top terms, in a specific indexed field.
  • Query faceting – It returns the number of documents that match a given query.
  • Range faceting – It returns the number of documents that fall within a certain range. This can be a date range, a price range, and so on.

Implementing faceted search is simple. We just append the faceting parameters to our standard Solr query request; the result set then contains the document count associated with each facet value, along with the results we would get without the faceting parameters.

Let us discuss the preceding faceting types one by one.

Field faceting

Field faceting is used when we wish to categorize results based on the values of one or more fields. As an example, suppose we want to facet on the manufacturer field (field name manufr). We assume that this field has been defined in the schema and indexed as a single token.

When a user types a search keyword (say, camera) in the search box, our non-faceted query would look like the following:

http://localhost:8983/solr/query?q=camera

In order to retrieve the facet count associated with each facet value, we append the following parameters to our preceding Solr query:

&facet=true
&facet.field=manufr

Before we look into the response, let us increase the complexity a bit by appending the resolution field (field name resol) to our faceting query. So, the following parameters need to be appended to our normal Solr query:

&facet=true
&facet.field=manufr
&facet.field=resol

Likewise, we can keep adding the fields that we want faceted. The response to our latest faceted query would look like the following:

"facet_fields" : {
  "manufr" : [
    "Canon" , 15,
    "Sony" , 4,
    "Nikon" , 7 ],
  "resol" : [
    "4 megapixels" , 6,
    "6 megapixels" , 8,
    "8 megapixels" , 12 ]
}
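Note that each field's facet values and counts arrive as one flat, alternating list. As a minimal sketch of how a client might pair them up, assuming the response has already been parsed into a Python dictionary (the helper name pair_facet_counts is hypothetical, not part of Solr):

```python
def pair_facet_counts(flat):
    """Turn ["Canon", 15, "Sony", 4] into [("Canon", 15), ("Sony", 4)]."""
    # Even positions hold the facet values, odd positions hold the counts.
    return list(zip(flat[0::2], flat[1::2]))


# The "facet_fields" section of the sample response shown above.
facet_fields = {
    "manufr": ["Canon", 15, "Sony", 4, "Nikon", 7],
    "resol": ["4 megapixels", 6, "6 megapixels", 8, "8 megapixels", 12],
}

# One list of (value, count) pairs per faceted field.
facets = {field: pair_facet_counts(flat) for field, flat in facet_fields.items()}

for field, pairs in facets.items():
    for value, count in pairs:
        print(f"{field}: {value} ({count})")
```

Keeping the pairs in a list rather than a dictionary preserves the order in which Solr returned them, which is usually the order we want to display.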

Query and Range faceting

If we apply field faceting to the price field, we get a document count for each individual price along with the normal result set. How about categorizing the search results by price range instead of by individual prices? One workaround is to index another field that holds only the price range and facet on that field. However, a better solution is query faceting, which lets us get the facet count for each price range.

Let us assume the following price ranges: $300 and less, $400 to $600, $800 to $1000, $1000 to $1200, and $1200 and greater. To achieve the desired result, we append the following facet query parameters to our normal query:

&facet=true
&facet.query=price:[* TO 300]
&facet.query=price:[400 TO 600]
&facet.query=price:[800 TO 1000]
&facet.query=price:[1000 TO 1200]
&facet.query=price:[1200 TO *]

The response to our faceting query would look something like the following; along with the output expected from the normal query, it contains the facet count associated with each of the specified price ranges:

"facet_queries" : {
  "price:[* TO 300]" : 0,
  "price:[400 TO 600]" : 12,
  "price:[800 TO 1000]" : 6,
  "price:[1000 TO 1200]" : 8,
  "price:[1200 TO *]" : 0
}
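Writing the facet.query values by hand is error-prone once there are several ranges. The following is a small sketch of generating them from a list of bounds; the helper name price_range_query and the use of None for an open-ended bound are assumptions made for illustration:

```python
def price_range_query(low=None, high=None, field="price"):
    """Build a range query string such as "price:[400 TO 600]".

    A bound of None is written as "*", meaning the range is open on that side.
    """
    low_text = "*" if low is None else str(low)
    high_text = "*" if high is None else str(high)
    return f"{field}:[{low_text} TO {high_text}]"


# The price ranges used in the text; None marks an open-ended bound.
ranges = [(None, 300), (400, 600), (800, 1000), (1000, 1200), (1200, None)]
facet_queries = [price_range_query(low, high) for low, high in ranges]

for query in facet_queries:
    print("&facet.query=" + query)
```

Square brackets make both ends of a range inclusive, so a camera priced at exactly $1000 or $1200 is counted in two of these ranges.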

By now, we have learned how to retrieve facet counts. How about allowing the end user to drill down into the search results to filter them further? We can achieve this with standard Solr filter queries, wherein the result set is filtered by any number of arbitrary queries.

Let us again assume that the user types camera as the search keyword. Our query would look like the following one:

http://localhost:8983/solr/query?q=camera
  &facet=on
  &facet.field=manufr
  &facet.field=resol
  &facet.query=price:[* TO 300]
  &facet.query=price:[400 TO 600]
  &facet.query=price:[800 TO 1000]
  &facet.query=price:[1000 TO 1200]
  &facet.query=price:[1200 TO *]
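The query above is shown unencoded for readability; in an actual request, the spaces, brackets, and colons need to be URL-encoded. As a rough sketch, assuming the same host and path used in the text, the standard library can assemble the request while keeping the repeated parameters:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Host and path as used in the text; adjust for your own Solr instance.
BASE_URL = "http://localhost:8983/solr/query"

# A list of pairs (rather than a dict) lets a parameter name repeat.
params = [
    ("q", "camera"),
    ("facet", "on"),
    ("facet.field", "manufr"),
    ("facet.field", "resol"),
    ("facet.query", "price:[* TO 300]"),
    ("facet.query", "price:[400 TO 600]"),
    ("facet.query", "price:[800 TO 1000]"),
    ("facet.query", "price:[1000 TO 1200]"),
    ("facet.query", "price:[1200 TO *]"),
]

# urlencode percent-encodes the reserved characters in each value.
url = f"{BASE_URL}?{urlencode(params)}"
print(url)

# Decoding the query string again recovers every repeated parameter.
decoded = parse_qs(urlsplit(url).query)
print(decoded["facet.field"])
```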

Let us imagine a situation wherein the user has a limited budget and is mainly interested in the manufacturer and resolution of cameras that fall in the price range of $400 to $600. In order to retrieve the relevant, restricted search results, we use the fq (filter query) parameter, which allows us to filter by a query. Since we also want the facet counts that match the user’s criteria, we keep the appropriate faceting parameters as well. Our updated Solr query request would be as follows:

http://localhost:8983/solr/query?q=camera
  &facet=on&facet.field=manufr&facet.field=resol&fq=price:[400 TO 600]

We can include as many fq parameters as we like in the request, and their order doesn’t matter. Notice that we haven’t included the facet queries for the price ranges in the preceding query. This is because, by specifying the price range in the fq parameter, we have already restricted the results to fall within this range, so the other ranges would have a facet count of zero. Along with the expected results, the page can show a $400-$600 breadcrumb, which the end user can remove if he or she no longer needs it.

Now, let us extend this example by adding one more restriction, which means adding another fq parameter. Along with the price range filter ($400-$600), we assume that the user is only interested in cameras that have a resolution of 8 megapixels. Here is our updated query:

http://localhost:8983/solr/query?q=camera
  &facet=on&facet.field=manufr
  &fq=price:[400 TO 600]
  &fq=resol:"8 megapixels"

Once Solr responds to this request, we can add an additional breadcrumb for 8 megapixels in the same way as we did for the price range $400-$600.
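One way to keep breadcrumbs and filter queries in step is to store each breadcrumb label next to the fq value it stands for, and to rebuild the request whenever a breadcrumb is added or removed. The sketch below assumes the host and path used in the text; the function build_search_url and the breadcrumbs dictionary are hypothetical names chosen for illustration. The resolution value is written in double quotes, which is how Solr's standard query parser delimits a phrase.

```python
from urllib.parse import urlencode

# Host and path as used in the text; adjust for your own Solr instance.
BASE_URL = "http://localhost:8983/solr/query"


def build_search_url(keyword, facet_fields, filters):
    """Assemble a faceted Solr request with one fq parameter per filter."""
    params = [("q", keyword), ("facet", "on")]
    params += [("facet.field", name) for name in facet_fields]
    params += [("fq", fq) for fq in filters]
    return f"{BASE_URL}?{urlencode(params)}"


# Each breadcrumb label maps to the filter query it applies.
breadcrumbs = {
    "$400-$600": "price:[400 TO 600]",
    "8 megapixels": 'resol:"8 megapixels"',
}

# Both filters active, as in the last query of the text.
url_both = build_search_url("camera", ["manufr"], list(breadcrumbs.values()))
print(url_both)

# The user removes the price breadcrumb, so only the resolution filter remains.
del breadcrumbs["$400-$600"]
url_resol_only = build_search_url("camera", ["manufr"], list(breadcrumbs.values()))
print(url_resol_only)
```

Because every fq is an independent filter, removing a breadcrumb only drops the matching parameter; the rest of the request stays the same.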