Drupal's Search Framework: The execution of a search

Drupal's ambitious search module provides a framework for building searches of all kinds. By isolating the tasks involved in searching, and allowing the actual search implementations to be handled by other modules, the search framework sets the stage for all sorts of creative search applications. This article, which applies to Drupal 6, explores the structure of the search framework by following the steps needed to execute a search.
Structure of a search
Here are the basic steps involved in searching:
- Build a search index.
- Build a search form.
- Accept a POST request from the form.
- Redirect POST to GET with search query values expressed in the URL.
- Parse search query values.
- Construct search based on query values.
- Return formatted results.
Build a search index
The search module's API for indexing HTML content is very simple.
<?php
search_index($sid, $type, $text);
?>
Example 1: search_index is the way you put stuff into the search index.
$sid is the unique id for a piece of content, $type corresponds to the name of the search implementation (see 'name' $op for hook_search), and $text is the HTML that is to be indexed.
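In practice, search_index is usually called from a module's hook_update_index implementation, which the search module invokes during cron runs. The following is a minimal sketch for a hypothetical module named mymodule. The helper mymodule_items_to_index() and the id, title and body properties are assumptions made for illustration and are not part of core.

```php
<?php
/**
 * Implementation of hook_update_index() for a hypothetical "mymodule".
 */
function mymodule_update_index() {
  // mymodule_items_to_index() is a hypothetical helper that returns the
  // items still waiting to be indexed as objects with id, title and body.
  foreach (mymodule_items_to_index() as $item) {
    // Wrapping the title in <h1> follows the pattern node.module uses, so
    // that words in the title count for more than words in the body.
    $text = '<h1>'. check_plain($item->title) .'</h1>'. $item->body;
    // $sid is the item id, $type is the name of the search implementation.
    search_index($item->id, 'mymodule', $text);
  }
}
```

A real implementation would also need to limit how many items it indexes per cron run and keep track of which items have already been handled.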
Build a search form
The basic search form is simply a text field with a submit button. This form is available by default to every module that implements search using hook_search. The main method for extending this form is through hook_form_alter (see node.module, node_form_alter). It is also possible to build search functionality using other tools that don't rely totally on the search framework. See views_fastsearch for one such hybrid approach.
Accept a POST request from the form
The search_menu and the search_view functions cooperate to make sure that any incoming POST requests for search get redirected to the same path using GET, only with the POST information expressed as part of the GET request.
<?php
// In search.module
function search_menu() {
  // [...snip...]
  foreach (module_implements('search') as $name) {
    $items['search/'. $name .'/%menu_tail'] = array(
      'page callback' => 'search_view',
      'page arguments' => array($name),
      'type' => MENU_LOCAL_TASK,
      'parent' => 'search',
    );
  }
  return $items;
}

// In search.pages.inc
/**
 * Menu callback; presents the search form and/or search results.
 */
function search_view($type = 'node') {
  // Search form submits with POST but redirects to GET. This way we can keep
  // the search query URL clean as a whistle:
  // search/type/keyword+keyword
  if (!isset($_POST['form_id'])) {
    if ($type == '') {
      // Note: search/node can not be a default tab because it would take on the
      // path of its parent (search). It would prevent remembering keywords when
      // switching tabs. This is why we drupal_goto to it from the parent instead.
      drupal_goto('search/node');
    }
    // [...snip...]
    // Do the search and build the form, expressed as $output
    return $output;
  }
  return drupal_get_form('search_form', NULL, empty($keys) ? '' : $keys, $type);
}
?>
Example 2: search_menu and search_view.
In search_menu it can be seen how a path is being built for every search module that implements a search. You can see this in action on any Drupal installation with search.module enabled at the path http://example.com/search. The Content and the Users tabs come from the node and user modules' search implementations, respectively. An interesting and important detail is the path description: $items['search/'. $name .'/%menu_tail']. The %menu_tail bit passes everything it matches as a parameter, verbatim, without splitting it into further segments. This is important if you want to be able to search for the string "foo/bar", for example. Normally that would be split into path segments based on the forward slash, but %menu_tail prevents the splitting.

Figure 1: A basic search form with Content and User tabs.
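To make the effect of %menu_tail concrete, the following plain-PHP sketch mimics it outside of Drupal. The function name example_menu_tail is hypothetical, and the code only illustrates the behavior described above. It is not the menu system's actual implementation.

```php
<?php
/**
 * Rejoins every path segment from position $index onwards, so that
 * slashes inside the keywords survive.
 */
function example_menu_tail($path, $index) {
  $segments = explode('/', $path);
  return implode('/', array_slice($segments, $index));
}

// For search/node/foo/bar the keywords start at segment 2.
$keys = example_menu_tail('search/node/foo/bar', 2);
// $keys is now 'foo/bar' rather than just 'foo'.
```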
Redirect POST to GET with search query values expressed in the URL
The first lines of search_view promise to redirect to GET, but the mechanism for doing this isn't visible in the code:
<?php
// Search form submits with POST but redirects to GET. This way we can keep
// the search query URL clean as a whistle:
// search/type/keyword+keyword
if (!isset($_POST['form_id'])) {
?>
Example 2 (detail): The opening of search_view. Only do something on GET.
Clearly the search itself doesn't happen unless there is no POST form_id value (i.e. it is a GET request), but how does the redirect happen? The answer lies deep within the handling of the search form:
<?php
// From search.module
/**
 * As the search form collates keys from other modules hooked in via
 * hook_form_alter, the validation takes place in _submit.
 * search_form_validate() is used solely to set the 'processed_keys' form
 * value for the basic search form.
 */
function search_form_validate($form, &$form_state) {
  form_set_value($form['basic']['inline']['processed_keys'], trim($form_state['values']['keys']), $form_state);
}

/**
 * Process a search form submission.
 */
function search_form_submit($form, &$form_state) {
  $keys = $form_state['values']['processed_keys'];
  if ($keys == '') {
    form_set_error('keys', t('Please enter some keywords.'));
    // Fall through to the drupal_goto() call.
  }
  $type = $form_state['values']['module'] ? $form_state['values']['module'] : 'node';
  $form_state['redirect'] = 'search/'. $type .'/'. $keys;
  return;
}
?>
Example 3: Search form validation and submission.
When the search form is submitted, the values first go to search_form_validate(). The sole purpose of this validation step is to make sure that the processed search keys (the trimmed keywords from the form submission) are passed on to the submit handler. The submit handler, search_form_submit(), then does the unusual job of validating the form by checking whether any keys were actually submitted. One can debate whether that check belongs in search_form_validate() instead. More interesting to us, however, is the setting of $form_state['redirect']. This is how POSTed search forms get redirected via GET with the search query in the URL. The Forms API performs the redirect after the submit handler has finished.
This process is one of the first mysteries of the search module that often confuses people when they attempt to understand its inner workings. Despite being somewhat mystical in its behavior, the POST -> GET redirect has a very practical advantage: search result pages can be bookmarked.
Parse search query values
The one thing that virtually every function needs in the process of doing a search is the $keys variable that contains the search query. In Drupal 6, the entire search query is represented as a string. The function search_get_keys() can be used to fetch this string, and it is a simple function that looks first to the path, and then to the submitted form values in order to find a keyword query. Whatever is found is stored statically in the function and cannot be changed during the lifetime of the request.
Management of this keyword query string is an interesting issue, especially in the context of the advanced search form. The search module offers two functions, search_query_insert($keys, $option, $value = '') and search_query_extract($keys, $option), which aid in the manipulation of the query string. If you call search_query_extract("foo nid:4711", "nid"), you get the value 4711 in return. If you call search_query_insert("bar", "uid", 42), you get "bar uid:42" in return. Neither of these functions actually interacts with search_get_keys, however, so they cannot be used to fetch or manipulate the statically cached keys. See node_form_alter and the $op = 'search' part of node_search for usage examples of these functions. Note in particular how the form is always used as the storage mechanism for the search query string.
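The following sketch shows the two functions working together on a query string. So that it can also be run outside of Drupal, it defines stand-ins for both functions when they are not already available. Those stand-ins are written from memory of the Drupal 6 implementation and should be treated as an approximation, not as core's exact code.

```php
<?php
if (!function_exists('search_query_extract')) {
  // Stand-in (approximation of core): returns the value of "option:value".
  function search_query_extract($keys, $option) {
    if (preg_match('/(^| )'. $option .':([^ ]*)( |$)/i', $keys, $matches)) {
      return $matches[2];
    }
  }
  // Stand-in (approximation of core): replaces, adds or removes an option.
  function search_query_insert($keys, $option, $value = '') {
    if (search_query_extract($keys, $option)) {
      $keys = trim(preg_replace('/(^| )'. $option .':[^ ]*/i', '', $keys));
    }
    if ($value != '') {
      $keys .= ' '. $option .':'. $value;
    }
    return $keys;
  }
}

$keys = 'drupal search';
// Add a field query to the plain keywords.
$keys = search_query_insert($keys, 'type', 'story');
// Read it back out.
$type = search_query_extract($keys, 'type');
// Inserting with an empty value removes the option again.
$plain = search_query_insert($keys, 'type');
```

After these calls, $keys holds "drupal search type:story", $type holds "story" and $plain is back to "drupal search".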
Construct search based on query values
The search framework expects modules to use the parsed search query string to do a search for values and return a structured array of results. This process gets triggered in search_view, which calls search_data, which is a wrapper first and foremost for this code: $results = module_invoke($type, 'search', 'search', $keys); In other words, the $op = 'search' phase of hook_search is initiated. The other responsibility of search_data is to theme the results page, either by invoking the hook_search_page implementation for the module doing the search, or by defaulting to theme('search_results').
Despite the fact that the search framework expects modules to do their own searches, it also provides a mechanism for searching the search index (see step #1). The function do_search, in its simplest form, is a breeze to use. Take a list of keywords and specify a type ('node' for searching node content), and get search results like this:
<?php
$results = do_search('foo bar baz', 'node');
?>
Example 4: Using do_search to find content.
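The $results will be an array of result objects for the top-ranking nodes that match the keywords "foo", "bar" and "baz". Each object carries the node id in its sid property, along with a score. Note that keywords are combined with AND by default; an OR search has to be requested explicitly in the query string. As this function is one of the core API functions of the search module, you can feel free to call it for your own purposes any time you want. For example, call it from within a block, taking keywords from taxonomy terms or a user's profile interests, and use the returned results as a form of content recommendation.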
The full function signature for do_search, however, is quite intimidating:
<?php
function do_search($keywords, $type, $join1 = '', $where1 = '1', $arguments1 = array(),
  $columns2 = 'i.relevance AS score', $join2 = '', $arguments2 = array(),
  $sort_parameters = 'ORDER BY score DESC') {
  // [...snip...]
}
?>
Example 5: do_search, Search's API function for finding content.
Discussing all the possible values for the parameters is outside the scope of this article, but the plethora of options is there so that calling code can interact with two distinct queries by injecting JOIN and WHERE clauses into each of them. Sorts can be specified as well, although I don't recall ever seeing this feature utilized.
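As an illustration of the first pair of JOIN and WHERE parameters, the sketch below restricts a node search to one content type and returns the matching node ids, which could feed the kind of recommendation block mentioned earlier. The function name mymodule_related_nids is hypothetical. The JOIN against {node}, the i alias for the index table and the sid property on each result are assumptions based on how node.module calls do_search in Drupal 6, so check them against your version of core.

```php
<?php
/**
 * Returns ids of published nodes of one type that match $keywords.
 */
function mymodule_related_nids($keywords, $node_type) {
  $find = do_search(
    $keywords,
    'node',
    // $join1: make the node table available to the first query.
    'INNER JOIN {node} n ON n.nid = i.sid',
    // $where1: placeholders are filled from $arguments1.
    "n.status = 1 AND n.type = '%s'",
    // $arguments1
    array($node_type)
  );
  $nids = array();
  foreach ($find as $item) {
    $nids[] = $item->sid;
  }
  return $nids;
}
```

Calling mymodule_related_nids('foo bar', 'story') would then yield the ids of published story nodes matching both keywords.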
Return formatted results
If you want to utilize the search module's standard formatting for search results, your hook_search('search') has to build a structured array of results where each result follows the format:
Required keys:
- link: The URL of the item.
- type: The translated type, e.g. "Blog entry".
Optional keys:
- title: The title of the result.
- user: The themed username of the user who created the search result (i.e. the node author).
- date: The timestamp associated with the search result.
- snippet: An excerpt of text that gives the context of the keywords that were found in the search result. The search module provides a function, search_excerpt(), which can be used to highlight the keywords within this snippet, but you must call it yourself while building the search result.
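Putting these keys together, a hook_search implementation for a hypothetical module named mymodule might look like the sketch below. The module name, the mymodule_load() helper, the mymodule/ID path and the title and body properties are all assumptions for illustration. The 'name' and 'search' operations and the result keys are the ones described in this article.

```php
<?php
/**
 * Implementation of hook_search() for a hypothetical "mymodule".
 */
function mymodule_search($op = 'search', $keys = NULL) {
  switch ($op) {
    case 'name':
      // Label of this module's tab on the search page.
      return t('My items');

    case 'search':
      $results = array();
      // Look the keywords up in the index entries written for 'mymodule'.
      foreach (do_search($keys, 'mymodule') as $item) {
        // mymodule_load() is a hypothetical loader returning an object
        // with title and body properties.
        $record = mymodule_load($item->sid);
        $results[] = array(
          'link' => url('mymodule/'. $item->sid, array('absolute' => TRUE)),
          'type' => t('My item'),
          'title' => $record->title,
          // search_excerpt() has to be called explicitly.
          'snippet' => search_excerpt($keys, $record->body),
        );
      }
      return $results;
  }
}
```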
Conclusion
There are potentially many steps that go into doing a search and displaying the results. The search module provides a framework for managing all of these steps, and an API for accessing the various bits and pieces even outside of the context of a traditional search page. The functions search_excerpt, search_index and do_search, in particular, can be called by modules outside of the traditional hook_search context.
Comments
Excellent article. One
Excellent article. One aspect of the search API that is a bit limiting is that it creates a separate tab in the search results for other modules' invocation of hook_search. It would be nice if module developers could override that behavior and integrate the results from their module's search into the same tab as the results from search.module.
The reason I am pointing this out is that I am the maintainer of the search_attachments module, and the most requested feature from its users is the ability to put the hits on regular node content (i.e., those found by search.module) and hits on files (i.e., those found by search_attachments) in the same tab. In responding to a user request at http://drupal.org/node/242748 I've started to think about ways of doing this but haven't dug too far in so far. Any suggestions?
Robert Douglass
Mark, the first suggestion
Mark, the first suggestion is to open up a feature request against Drupal 7 and add it to the list of issues here: http://groups.drupal.org/node/10569
There is a lot of activity going on with search at the moment, and every bit of help counts. Thanks for your awesome contribution (the search_attachments mod). Look forward to discussing "Unified search across implementations" with you in the search group.
Thanks a lot, will do.
Thanks a lot, will do.
I'm trying to hack in
I'm trying to hack in search-by-date fields (published before and published after fields) into the advanced search of nodes on my D6 site but I'm running into trouble. I've found how to add the necessary fields in node_form_alter() and the code to add these parameters to the search query in node_search() but then the submitted data from those fields don't make it to the generated $keys in search_form_submit(). I've been trying for the past few hours to figure out what's going on when search_form_validate(), form_set_value(), and finally _form_set_value() are called, but at this point in the day I'm getting totally lost.
I've pretty much confirmed that search_form_validate() is the point at which it breaks. If I manually type the URL with the appropriate GET query, the search works just fine.
Can I get a little help? :-)
Robert Douglass
You’re on the right track.
You're on the right track. If you're doing this in a module you need to add a validation function. If you're hacking to make a core patch, you need to update node_search_validate. Whichever option you choose, you need to study node_search_validate to see how it rebuilds the string with the $keys using search_query_insert and then packs that string into the form like this:
<?php
if (!empty($keys)) {
  form_set_value($form['basic']['inline']['processed_keys'], trim($keys), $form_state);
}
?>
This is awkward and I hope that we will soon come up with a nicer paradigm for building this (and other) advanced search forms.
I didn't think of looking at
I didn't think of looking at node_search_validate()! Thanks. Now I'm confident that my new search parameters are getting through, but now I'm having a problem with the query results. I'm getting zero results with search parameters that I'm sure should yield at least one result.
I have the following code in node_search():
if ($start = search_query_extract($keys, 'after')) {
  $conditions1 .= ' AND nz.created >= %d';
  $arguments1[] = intval(date('U', strtotime($start)));
}
if ($end = search_query_extract($keys, 'before')) {
  $conditions1 .= ' AND n.created <= %d';
  $arguments1[] = intval(date('U', strtotime($end)));
}
Is there anything I missed?
Woops, the third line should
Woops, the third line should have n.created. I intentionally made it "nz" so I could see the generated query in the error. :-P
Robert Douglass
I'd have to see the actual
I'd have to see the actual query being generated before I could say. Make sure to use the devel module and turn query logging on so that you can see all of the queries getting executed, and analyze the query being built, comparing it to the query you expected.
Hi, simple question. After
Hi, simple question. After the post back, the keys query string is set textually in the "Enter your keywords:" text box. So, if my keys value is "somesearch xvalue:something", then my text box has this entire string instead of just "somesearch".
How can we ensure the proper value is set at postback?
thanks!
Robert Douglass
This depends on what you
This depends on what you mean by "proper value". In the ApacheSolr module I decided that no matter what comes in as the URL or POST, any field queries (like nid:5) would not be displayed in the form. This is because the module relies heavily on faceted searching and if you click 5 facets to drill down, with their somewhat long, non-human friendly names, the form will become overpopulated with all sorts of trash. So in the ApacheSolr module all of this extra information is stored in a special singleton object. Look at the apachesolr_form_alter function and the get_query_basic function to see how it is done.
In Drupal's core search, the field values are passed on in the $form. Look at node_form_alter, node_search_validate, and node_search (in that order) to see how the values of the field queries are persisted.
Claudio Cicali
Hi Robert, I'm using your
Hi Robert,
I'm using your great ApacheSolr module in a pre-production site (looking forward to 1.0 and reading all the open issues so far, particularly about the DISMAX query).
One thing that puzzles me is how to change the default behaviour of the search form *block*. I'm on D6, and I'd like it to take the search directly to Solr and not to the default Drupal one (which, in production, I'll hide from the users), where I need to click the "Search" tab.
Thank you!
Robert Douglass
Hi Claudio, look under
Hi Claudio,
look under /admin/settings/apachesolr/settings for the Advanced Settings fieldset. In there you can find "Make Apache Solr Search the default". This should solve your problems.
Robert Douglass
Senior Drupal Advisor, Acquia
Hi Robert I'm trying to
Hi Robert
I'm trying to create a custom search but getting stuck.
What I want is to have a drop-down box so the user can choose where to search.
Each option can map to one or more content types.
So if the user chooses option A, then the search will look in node types P, Q and R.
But it should not return those nodes as results, only the uids, which will then be themed to gather specific data for that user.
To make it a little bit clearer: suppose I want to look for people. The search will then look for the keywords in 2 content profile types (nodes), giving back the user (from $node->uid).
I started with creating a form with a text field and the drop-down box.
Then, in the submit handler, I created the keys and redirected to another page with those keys as a tail. This page has been defined in the menu hook, just like how search does it.
After that I want to call hook_view to do the actual search by calling node_search, and give back the results.
I really would like to know if I am on the right track.
Is this the way to create a custom search?
Thx for your help.
Here's the code for some clarity:
<?php
// $Id$
/**
 * @file
 * Searches on Project, Person, Portfolio or Group.
 */

/**
 * returns an array of menu items
 * @return array of menu items
 */
function vm_search_menu() {
  $subjects = _vm_search_get_subjects();
  foreach ($subjects as $name => $description) {
    $items['zoek/'. $name .'/%menu_tail'] = array(
      'page callback' => 'vm_search_view',
      'page arguments' => array($name),
      'type' => MENU_LOCAL_TASK,
    );
  }
  return $items;
}

/**
 * create a block to put the form into.
 * @param $op
 * @param $delta
 * @param $edit
 * @return mixed
 */
function vm_search_block($op = 'list', $delta = 0, $edit = array()) {
  switch ($op) {
    case 'list':
      $blocks[0]['info'] = t('Algemene zoek');
      return $blocks;
    case 'view':
      if (0 == $delta) {
        $block['subject'] = t('');
        $block['content'] = drupal_get_form('vm_search_general_form');
      }
      return $block;
  }
}

/**
 * Define the form.
 */
function vm_search_general_form() {
  $subjects = _vm_search_get_subjects();
  foreach ($subjects as $key => $subject) {
    $options[$key] = $subject['desc'];
  }
  $form['subjects'] = array(
    '#type' => 'select',
    '#options' => $options,
    '#required' => TRUE,
  );
  $form['keys'] = array(
    '#type' => 'textfield',
    '#required' => TRUE,
  );
  $form['submit'] = array(
    '#type' => 'submit',
    '#value' => t('Zoek'),
  );
  return $form;
}

function vm_search_general_form_submit($form, &$form_state) {
  $subjects = _vm_search_get_subjects();
  $keys = $form_state['values']['keys']; //the search keys
  //the content types to search in
  $keys .= ' type:' . implode(',', $subjects[$form_state['values']['subjects']]['types']);
  //redirect to the page, where vm_search_view will handle the actual search
  $form_state['redirect'] = 'zoek/'. $form_state['values']['subjects'] .'/'. $keys;
}

/**
 * Menu callback; presents the search results.
 */
function vm_search_view($type = 'node') {
  // Search form submits with POST but redirects to GET. This way we can keep
  // the search query URL clean as a whistle:
  // search/type/keyword+keyword
  if (!isset($_POST['form_id'])) {
    if ($type == '') {
      // Note: search/node can not be a default tab because it would take on the
      // path of its parent (search). It would prevent remembering keywords when
      // switching tabs. This is why we drupal_goto to it from the parent instead.
      drupal_goto($front_page);
    }
    $keys = search_get_keys();
    // Only perform search if there is non-whitespace search term:
    $results = '';
    if (trim($keys)) {
      // Log the search keys:
      watchdog('vm_search', '%keys (@type).', array('%keys' => $keys, '@type' => $type));
      // Collect the search results:
      $results = node_search('search', $type);
      if ($results) {
        $results = theme('box', t('Zoek resultaten'), $results);
      }
      else {
        $results = theme('box', t('Je zoek heeft geen resultaten opgeleverd.'));
      }
    }
  }
  return $results;
}

/**
 * returns array where to look for
 * @return array
 */
function _vm_search_get_subjects() {
  $subjects['opdracht'] = array(
    'desc' => t('Zoek opdracht'),
    'types' => array('project'),
  );
  $subjects['persoon'] = array(
    'desc' => t('Zoek persoon'),
    'types' => array('types_specialisatie', 'smaak_en_interesses'),
  );
  $subjects['groep'] = array(
    'desc' => t('Zoek groep'),
    'types' => array('Villamedia_groep'),
  );
  $subjects['portfolio'] = array(
    'desc' => t('Zoek portfolio'),
    'types' => array('artikel'),
  );
  return $subjects;
}
?>
How to change the number of
How to change the number of search results per page?
I've been looking everywhere, and it seems that the number of search results per page is hardcoded to 10 in the do_search() function.
Can I override this value without hacking core? How?
Robert Douglass
It's fairly senselessly
It's fairly senselessly hardcoded into the core search module. It is, of course, a big shortcoming. It's one more thing the Apache Solr module and Acquia Search give you control over.
Robert Douglass
Senior Drupal Advisor, Acquia