Understanding the ApacheSolr CCK API

Robert Douglass

UPDATE April 20, 2010: This post has been updated for the DRUPAL-6--2 branch.

In this article I will show you how a small amount of code can expose new fields and facets for searching with the ApacheSolr module and Acquia Search. Using Acquia Drupal, we'll write an example module that takes the file type from CCK file and image fields and turns it into its own search field. This lets us filter search results by file type. The code covers the case where you want, for example, to find a specific post that has a JPEG image, or all of the posts with PDFs that match a particular keyword.

To start you may want to download the PDF file of screenshots that trace all of the steps I took to set up Acquia Drupal, Acquia Search, and the custom module. The broad steps are to:

  1. Sign up for a free trial.
  2. Download and install Acquia Drupal.
  3. Follow along with the code examples below to create the example.module.
  4. Set up a content type to have image and file fields. Create some content and upload a variety of files.
  5. Run cron to make sure your content has been indexed.
  6. Enable the new filters and blocks that the example.module is responsible for having created.
  7. Search!

What are facets?

The ApacheSolr module is a revolution in Drupal search. It allows you to search for the most general keyword that applies to what you're looking for, and then use the provided facet links to drill down to exactly the right content. An example would be searching for "Drupal search" on Drupal.org, then filtering by project and robertDouglass to get just the modules that I have written that deal with search. Facets, in other words, are the better version of advanced search forms, which, to be honest, suck.

What facets are available?

By default, ApacheSolr makes facets available for content type, author, language, taxonomy terms, and all CCK fields that are text fields with option widgets (select, radio, checkbox). This article assumes that you want some different facets, and you're using CCK fields. The file field and image field both have some interesting information that would make a great facet - their file type. Every file you upload has a distinct type: pdf, png, gif, tiff, doc, and so forth. Wouldn't it be nice to have this available as a facet? Of course!

What needs to happen to add new facets?

The ApacheSolr module comes equipped with an API for extending what gets indexed and how searching works. One of the important hooks in this API is hook_apachesolr_cck_fields_alter(&$mappings). We're going to write a module that implements this hook, and use it to tell ApacheSolr how to make facets out of the file type on file and image fields.

To do this the hook is only going to have to tell ApacheSolr four things:

  1. What data type should be used in the index.
  2. Which CCK widget types to look for during indexing.
  3. An indexing callback function to use for extracting the data from the CCK field while indexing. We write this function ourselves.
  4. A display callback function to use for displaying the data from the CCK field during searches. We write this function ourselves.

The callback function that we write will then receive each node and each field name as they are being indexed. From that it must extract or generate whatever information interests us. In this case we're just extracting the file type, which is already present in the field. We could, however, return any amount of data doing any arbitrary processing that we care to. See the code example below to understand the structure of the array that the callback has to return.
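As a rough illustration of the "arbitrary processing" idea, here is a minimal sketch of the kind of array an indexing callback hands back: one 'key'/'value' pair per field value. The helper name example_build_index_values(), the index key 'sm_example', and the choice to keep only the part of the MIME type before the slash are all assumptions made up for this demonstration; they are not part of the ApacheSolr module.

```php
<?php
/**
 * Hypothetical helper: builds the array an indexing callback returns.
 * Each entry pairs the Solr index key with one value to index.
 */
function example_build_index_values($index_key, array $mime_types) {
  $fields = array();
  foreach ($mime_types as $mime) {
    // Arbitrary processing: keep only the part before the slash,
    // so "image/jpeg" becomes "image".
    $parts = explode('/', $mime, 2);
    $fields[] = array(
      'key' => $index_key,
      'value' => $parts[0],
    );
  }
  return $fields;
}

// 'sm_example' is a made-up index key used only for this demonstration.
$values = example_build_index_values('sm_example', array('image/jpeg', 'application/pdf'));
print_r($values);
```

With a facet built on values like these, you would filter by broad category (image, application) rather than by exact MIME type.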

The example module implementing hook_apachesolr_cck_fields_alter(&$mappings)

The first step in writing any module is to create an .info file. Here's ours:

; file example/example.info
name = Apache Solr CCK Example
description = Example module showing custom CCK facets.
core = 6.x
package = Apache Solr

The next step is to have a module file. This is the example/example.module file:

<?php
/**
 * Implementation of hook_apachesolr_cck_fields_alter
 *
 * @param $mappings
 *   The existing array, received by reference, of the cck mappings.
 */
function example_apachesolr_cck_fields_alter(&$mappings) {
  // 'filefield' is here the CCK field_type. Correlates to $field['field_type']
  $mappings['filefield'] = array(
    // Each widget type gets its own array. Filefield_widget and imagefield_widget
    // are the two we concern ourselves with.
    'filefield_widget' => array(
      // This function will get called at index time to prepare the data
      // into a key and value for inclusion in the Solr index.
      'indexing_callback' => 'example_filefield_indexing_callback',
      // This function will get called when the facet links are displayed
      // in facet blocks at search time.
      'display_callback' => 'example_filefield_display_callback',
      'index_type' => 'string',
    ),
    'imagefield_widget' => array(
      'indexing_callback' => 'example_filefield_indexing_callback',
      'display_callback' => 'example_filefield_display_callback',
      'index_type' => 'string',
    ),
  );
}

/**
 * A callback function that returns key value pairs (can be more than one pair)
 * at index time. This is how CCK data gets put into the Solr index.
 *
 * @param $node
 *   The node being indexed
 * @param $field_name
 *   The CCK field name
 * @param $cck_info
 *   The rest of the CCK field info
 */
function example_filefield_indexing_callback($node, $field_name, $cck_info) {
  // $fields is an array because we send back a key => value pair for every
  // value of the field.
  $fields = array();
  // Don't do anything if this node doesn't have this field.
  if (isset($node->$field_name)) {
    // Get the index key based on the $cck_info.
    $index_key = apachesolr_index_key($cck_info);
    foreach ($node->$field_name as $field) {
      // For every field, add a key => value pair to the $fields array.
      $fields[] = array(
        'key' => $index_key,
        // The actual value we're indexing is MIME type.
        'value' => $field['filemime'],
      );
    }
  }
  return $fields;
}

/**
 * Determines what to display in facet links for this field.
 *
 * @param $facet
 *   The raw value (from the index) of this facet field. This corresponds to
 *   'value' from the indexing callback.
 * @param $options
 *   CCK info. In this example it will be an array with a "delta" key with either a
 *   "filefield" or "imagefield" value. Here we ignore it and just return the $facet.
 */
function example_filefield_display_callback($facet, $options) {
  return $facet;
}
?>

The example_apachesolr_cck_fields_alter(&$mappings) function adds an entry to the $mappings array (received by reference) that says "for any filefield CCK fields (this includes imagefields), use the function example_filefield_indexing_callback() while indexing, store the data as strings, use the function example_filefield_display_callback() when showing facet links, and apply these instructions to filefield_widgets and imagefield_widgets".

The example_filefield_indexing_callback($node, $field_name, $cck_info) function, which we specified as the indexing callback, gets called with the $node, $field_name, and $cck_info during indexing. We use that information to dig around and get the file type, which is found in $field['filemime'].
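To get a feel for what the indexing callback produces without a Drupal site or Solr server at hand, the following sketch runs the same logic against a fake node. Everything prefixed demo_ is an assumption for illustration only: demo_index_key() is a stand-in for the module's apachesolr_index_key() and returns an invented key, and the field name field_attachment and the shape of $cck_info are made up.

```php
<?php
// Stand-in for the module's apachesolr_index_key(). The real function is not
// reproduced here; this return value is invented for the demo.
function demo_index_key($cck_info) {
  return 'demo_' . $cck_info['field_name'];
}

// Same logic as example_filefield_indexing_callback(), using the stand-in.
function demo_filefield_indexing_callback($node, $field_name, $cck_info) {
  $fields = array();
  // Skip nodes that do not have this field at all.
  if (isset($node->$field_name)) {
    $index_key = demo_index_key($cck_info);
    foreach ($node->$field_name as $field) {
      $fields[] = array(
        'key' => $index_key,
        'value' => $field['filemime'],
      );
    }
  }
  return $fields;
}

// A fake node with one hypothetical file field holding two uploads.
$node = new stdClass();
$node->field_attachment = array(
  array('filemime' => 'application/pdf'),
  array('filemime' => 'image/jpeg'),
);
$cck_info = array('field_name' => 'field_attachment');

// One key => value pair per uploaded file.
$result = demo_filefield_indexing_callback($node, 'field_attachment', $cck_info);
print_r($result);

// A field the node does not have yields an empty array.
$missing = demo_filefield_indexing_callback($node, 'field_other', $cck_info);
```

Each entry in $result becomes one value in the index, which is why a node with several files can show up under several facet links.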

Results

Now when we search, we have two new facet blocks available letting us drill down into the search results based on the type of files that are uploaded to each one. Not bad for 15 lines of code (excluding comments)!

Searching using file type facets

Attachment: apachesolr_cck.pdf (2.97 MB)

Comments

Robert Douglass
Acquia Staff

Peter Wolanin points out

Posted on April 7, 2009 - 1:22pm by Robert Douglass.

Peter Wolanin points out that, at the moment, if you try to use a variation of this code on textfields, they get clobbered by the existing optionwidgets definitions in the ApacheSolr module itself. Just a warning for those of you with talents for discovering edge cases =)

Robert Douglass
Senior Drupal Advisor, Acquia

Robert, it is amazing how

Posted on April 7, 2009 - 10:59pm by Tim Archambault.

Robert, it is amazing how far solr has come with Drupal in the last 2 years. I started researching this back at the open source CMS conference at Yahoo! and you had also just begun to work on this. I gave up and you didn't so thank you. Wish I was more of a developer I guess.

Anyhow, your outsourced solr hosting is a fantastic idea. Eased the implementation process greatly for those who can't do Tomcat, etc.

Again, thank you.

1) Using horse height as a

Posted on June 15, 2009 - 3:44pm by Paul Birnie.

1) Using horse height as a facet example (these comments are from my first drupal-solr project, so please feel free to correct them): a simpler way to create a custom facet, rather than using the above function, is to

* define a new text field, say field_horseheight_class (assuming your actual horse height is stored in field_horseheight)
* and in the Drupal GUI define the 'possible values', e.g. 'Upto 12 hands', '12-14 hands'
* At the point where you are creating / updating the node, you can set the value in code:

        // ------
        // CCK stores field values per delta, so read the first value.
        $height = $node->field_horseheight[0]['value'];
        $horseheightClass = 'Upto 12 hands';

        if ($height < 12) {
            $horseheightClass = 'Upto 12 hands';
        }
        else if ($height >= 12 && $height <= 14) {
            $horseheightClass = '12-14 hands';
        }
...
        $node->field_horseheight_class = array('0' => array('value' => $horseheightClass));

* set the widget type for field_horseheight_class to "check box/radio buttons"; the apachesolr config panel will then automatically display the option to enable this as a facet

2) You may want to look at the apachesolr_og module. It's a nice example of how to create your own facet (without using the apachesolr_cck_field_mappings function above): simply cut and paste the apachesolr_og class and replace with your values.

3) None of the documentation points this out, but I think it's important to explain that the way apachesolr learns about new fields without needing a change to schema.xml is by using dynamic fields and a naming convention. You can see the dynamic fields defined in schema.xml, for example:

   <!-- Dynamic field definitions.  If a field name is not found, dynamicFields
        will be used if the name matches any of the patterns.
        RESTRICTION: the glob-like pattern in the name attribute must have
        a "*" only at the start or the end.
        EXAMPLE:  name="*_i" will match any field ending in _i (like myid_i, z_i)
        Longer patterns will be matched first.  if equal size patterns
        both match, the first appearing in the schema will be used.  -->
   <dynamicField name="is_*"  type="integer" indexed="true"  stored="true" multiValued="false"/>
   <dynamicField name="im_*"  type="integer" indexed="true"  stored="true" multiValued="true"/>
   <dynamicField name="sis_*" type="sint"    indexed="true"  stored="true" multiValued="false"/>
   <dynamicField name="sim_*" type="sint"    indexed="true"  stored="true" multiValued="true"/>

4) You can see the dynamic fields that have been created by going to your solr admin port - by default http://localhost:8983/solr/ and clicking on "schema browser" -> "dynamic fields" and then "ss_*" or one of the other dynamic field prefixes

5) By default - apachesolr integration will take all the fields associated with your node and dump them into the document 'body' that is sent for indexing to apachesolr. This means that if you have

A) A private field that you do not want indexed or
B) Some field that you are using for admin purposes (not displaying to the users)
it will match on searches

For example: let's say you have a private field containing the data "evil company", and a user searches for "evil"; the node will match.

One way to prevent certain fields from being indexed is to implement hook_nodeapi():

function apachesolr_myproject_nodeapi(&$node, $op, $a3 = NULL, $a4 = NULL){

    // This function gets called a _lot_
    if ( $op == 'view' ){

        // we are in the process of running  the cron.php to build the search index
        if ( isset($node) && ($node->build_mode == NODE_BUILD_SEARCH_INDEX) ){
           
            if ( isset( $node->content['body'] ) ){

                // we don't want fields like field_video_rights to go into the search index
                // (because they match the search and appear in the snippet that is in the search result),
                // so we remove all of them and only allow body to go through to solr
                $content = (array) $node->content;
               
                $body = $content['body'];
                $node->content = array( 'body' => $body );                           
            }
        }
    }
}

6) During development of your search functionality, it really pays to cut the data in your dev database down to two or three nodes per content type. This saves a LOT of time in reindexing cycles. The simplest way to do this is to identify the nodes you want to keep and then, on your dev database, run

delete from node where nid not in ( 1111, 1112, .... etc )

and then don't forget to flush the cache

7) Also, just to point out: in general, if you want a field to be listed in the apachesolr panel as a possible facet, you should set the widget type to "check box/radio buttons" and NOT "single on/off checkbox"
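The dynamic-field naming convention described in point 3 of the comment above can be sketched roughly as follows. This is a hypothetical illustration, not the module's own code: the function name demo_dynamic_field_name() is invented, the integer and sint prefixes are taken from the schema.xml excerpt (is_*, im_*, sis_*, sim_*), and treating 'string' as the "s" prefix behind ss_* is an assumption.

```php
<?php
/**
 * Hypothetical illustration of the prefix naming convention; not the
 * module's own code. Returns NULL for index types this sketch does not map.
 */
function demo_dynamic_field_name($index_type, $multiple, $name) {
  $type_prefixes = array(
    // From the schema.xml excerpt: is_*, im_*, sis_*, sim_*.
    'integer' => 'i',
    'sint' => 'si',
    // Assumption: ss_* is the single-valued string pattern.
    'string' => 's',
  );
  if (!isset($type_prefixes[$index_type])) {
    return NULL;
  }
  // "s" marks a single-valued field, "m" a multi-valued one.
  $cardinality = $multiple ? 'm' : 's';
  return $type_prefixes[$index_type] . $cardinality . '_' . $name;
}

// 'height' and 'tags' are made-up field names.
$single = demo_dynamic_field_name('integer', FALSE, 'height');
$multi = demo_dynamic_field_name('sint', TRUE, 'tags');
print $single . "\n";
print $multi . "\n";
```

Because Solr matches such names against the glob patterns in schema.xml, a new field only needs the right prefix to be indexed with the right type; no schema change is required.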

updated example code for

Posted on August 29, 2009 - 5:25pm by Fredrik Lassen.

updated example code for apachesolr-6.1-rc2

hi robert,

thanks for the great apachesolr project and this article! I noticed though that the code in this article isn't compatible with the current rc2-api/code. i have adjusted the code snippets that should work w/ rc2 (see below & please review :) ). perhaps you could make a brief note in the article or update the examples.

thanks again for the amazing effort - best, fredrik

1. adjust hook name and structure of return array

/**
* Implementation of hook_apachesolr_cck_fields_alter
*/
function example_apachesolr_cck_fields_alter(&$mappings) {
  // either for all CCK of a given field_type and widget option
  // 'filefield' is here the CCK field_type. Correlates to $field['field_type']
  $mappings['filefield'] = array(
    'filefield_widget' => array('callback' => 'example_callback', 'index_type' => 'string'),
    'imagefield_widget' => array('callback' => 'example_callback', 'index_type' => 'string')
  );
  // or per-field indexing assuming field_example is a filefield cck field
  $mappings['per-field']['field_example'] = array(
    // The callback function gets called at indexing time to get the values.
    'callback' => 'example_callback',
    // Common types are 'text', 'string', 'integer',
    // 'double', 'float', 'date', 'boolean'
    'index_type' => 'string',
  );
}

2. return 'value'=>'XXXX' instead of 'safe'=>check_plain('XXXX') as apachesolr_clean_text() is called on the value in apachesolr_node_to_document()?

/**
* A function that gets called during indexing.
* @node The current node being indexed
* @fieldname The current field being indexed
*
* @return an array of arrays. Each inner array is a value, and must be
* keyed 'value' => $value
*/
function example_callback($node, $fieldname) {
  $fields = array();
  foreach ($node->$fieldname as $field) {
    // In this case we are indexing the filemime type. While this technically
    // makes it possible that we could search for nodes based on the mime type
    // of their file fields, the real purpose is to have facet blocks during
    // searching.
    $fields[] = array('value' => $field['filemime']);
  }
  return $fields;
}
Robert Douglass
Acquia Staff

Thanks for the updated code,

Posted on August 30, 2009 - 1:46am by Robert Douglass.

Thanks for the updated code, fredrik. I've added a note to the article referencing it.

Robert Douglass
Senior Drupal Advisor, Acquia

robert i tried all of codes

Posted on September 13, 2009 - 5:52am by william carter.

robert i tried all of codes and they are working wonderfull. before i wasnt understand here but i asked some friend and he tell me this is very easy..

>> * keyed 'safe' => $value
*/
function example_callback($node, $fieldname) {
  $fields = array();
  foreach ($node->$fieldname as $field) {
    // In this case we are indexing the filemime type. While this technically
    // makes it possible that we could search for nodes based on the mime type
    // of their file fields, the real purpose is to have facet blocks during
    // searching.
    $fields[] = array('safe' => check_plain($field['filemime']));
  }
  return $fields;
}

but i understand now and i know very good thank you robert

I've created a cck filed of

Posted on September 25, 2009 - 7:07am by prasanth m.

I've created a CCK field of type textarea with name filed_desc. How do I get this field indexed in Solr?

I have tried this, but it is not indexing the field. Can somebody help?

<?php
// $Id$
/**
 * Implementation of hook_apachesolr_cck_fields_alter
 */
function example_apachesolr_cck_fields_alter(&$mappings) {
  // either for all CCK of a given field_type and widget option
  // 'filefield' is here the CCK field_type. Correlates to $field['field_type']
  $mappings['text'] = array(
    'text_textarea' => array('callback' => 'example_callback', 'index_type' => 'string'),
  );
}

/**
 * A function that gets called during indexing.
 * @node The current node being indexed
 * @fieldname The current field being indexed
 *
 * @return an array of arrays. Each inner array is a value, and must be
 * keyed 'value' => $value
 */
function example_callback($node, $fieldname) {
  $fields = array();
  foreach ($node->$fieldname as $field) {
    // In this case we are indexing the filemime type. While this technically
    // makes it possible that we could search for nodes based on the mime type
    // of their file fields, the real purpose is to have facet blocks during
    // searching.
    $fields[] = array('value' => $field['field_desc']);
  }
  return $fields;
}
?>
