You can do a simple keyword search using the LIKE operator. Unfortunately, this can be slow. Fortunately, an efficient keyword search is available in many systems.
CREATE TABLE story ( author varchar(100), body varchar(1000) ); INSERT INTO story (author,body) VALUES('Atzeni' ,'Many database systems, through the use of SQL,↵ are wonderful at collating...'); INSERT INTO story (author,body) VALUES('Adams' ,'The definitions involved in understanding SQL databases are big.↵ You may have thought the distance from your chair to the fridge↵ was big, but that''s peanuts compared to standard definitions.'); INSERT INTO story (author,body) VALUES('Russell and Cumming'↵ ,'Often you must store large chunks of text in a table.');
If you wanted to find out which body has the phrase “database system” in it, you could do the following:
SELECT author FROM story WHERE body LIKE '%database system%'
This accurately returns matches where the exact match is found.
However, for some text searches, partial matches would also be useful,
as well as common roots (such as “system” and “systems”) and
result weighting (a higher score for “database
system” than for “the database used a system”).
SELECT author FROM story WHERE LOWER(body) LIKE '%database system%'
LIKE clause forces the
database system to do a linear scan of the text fields in order to find
the words of interest, and therefore performance will be slow. What you
really need is an index on the words in these text strings. The
FULLTEXT construct supports this type of
indexing. It has other advantages as well: it can use a natural language
engine to aid the matching algorithm, and it can return the quality of
the match (rather than just
ALTER TABLE story ADD FULLTEXT(body);
Now that the index is created, you can perform the query:
SELECT author FROM story WHERE MATCH (body) AGAINST ('database systems');
MATCH returns a
floating-point number, where 0.0 is irrelevant and higher numbers
indicate an increasingly better match quality. You can specify the
match quality in the
SELECT author, MATCH (body) AGAINST ('database systems')->
ORDER BY 2 DESC;+---------------------+-------------------------------------------+ | author | MATCH (body) AGAINST ('database systems') | +---------------------+-------------------------------------------+ | Atzeni | 1.3253291845322 | | Adams | 0 | | Russell and Cumming | 0 | +---------------------+-------------------------------------------+
By default, words are not indexed unless they are at least four
characters long, are composed of characters from only a particular
range, and are not too popular (popular words are those that appear in
more than 50 percent of the rows). Also, query elements using “filler
words” are silently ignored. All of these limits are administrator
MySQL also has an
MODE text-searching capability.
This can work without a
index (but it might be much slower if you use it this way). It
modifies the behavior of
allow Boolean tests to be defined. It does not give scores other than
1 or 0. You could use this to perform the search:
MATCH (body) AGAINST ('+database +systems' IN BOOLEAN MODE)->
ORDER BY 2 DESC;+---------------------+-------+ | author | SCORE | +---------------------+-------+ | Atzeni | 1 | | Adams | 0 | | Russell and Cumming | 0 | +---------------------+-------+
To get full text searching in PostgreSQL, you need to use the Tsearch2 module. A more detailed guide on how to do this
is available from devx (
To install Tsearch2 (from a source-code install) go to your source directory for PostgreSQL and type the following at the Linux or Unix shell prompt (you may need to be root for the install step):
To use Tsearch2 in a particular database, you need to issue this command:
tsearch2.sql should be in
your install directory’s
directory (for instance,
If you encounter permission errors you might be better off using the
Postgres user account for this procedure.
The script creates a number of helper tables, all of which
GRANTs to allow the
required users to access the tables. These tables are
pg_ts_parser. If you want to try things out
you can just continue to use the Postgres user account.
To use this new searching capability, you need to add a column to the tables to be searched (to hold some system vector data concerning the field to be searched), add an index, and prepare the new column for searching:
ALTER TABLE story ADD COLUMN vectors tsvector; CREATE INDEX story_index ON story USING gist(vectors); SELECT set_curcfg('default'); UPDATE story SET vectors = to_tsvector(body);
Finally, you can perform your search:
SELECT author,rank (vectors,q)dbname->
FROM story, to_tsquery('database&systems') AS qdbname->
ORDER BY rank(vectors,q) DESC;author | rank ---------------------+----------- Atzeni | 0.0991032 Adams | 1e-20 Russell and Cumming | 1e-20
Implementation of full text searching in SQL Server utilizes the Microsoft Search Engine. This is external to the database and has to be configured separately first. Part of this configuration requires you to specify a location for saving the full text indexes. Because these indexes are stored outside the normal database structure, you need to remember to back these up separately when you are doing a database backup.
Make sure you have the Microsoft Search Engine installed on your machine. Then, using SQL Server Management Studio (which is the current name for the Enterprise Manager), expand the nodes by selecting Databases→your database name→ Storage. Right-click on Full-Text Catalog and select New Full-Text Catalog. You will be requested for the filename and location to use for the new catalog.
If you are using the Express edition of SQL Server, you will have to download SQL Server Management Studio separately.
To build and use the index on a table, right-click on the table and choose Full-Text Index Table and then Define Full-Text Indexing on a Table. The Full-Text Wizard will start, which requires you to specify the following: a unique index name, the columns to index, the catalog in which to store the index, and the schedule on which you want the index to be rebuilt. Note that if you did not create a catalog as described earlier, you can create one from this wizard.
With the index defined, you still need to populate the index. Right-click on the table, and choose Full-Text Index Table and then Start Full Population; the index will be built using the table’s current data set. You have to repeat this whenever the table is modified.
With the index built, you can query it with
FREETEXT. Alternatively, you can use
FREETEXTTABLE, which will return
the answer as a table object. You use
FREETEXT as an operator, along with the
TABLE version, if you want to use
the result of the search directly in a
SELECT author FROM story WHERE FREETEXT(body,'database systems')
You can use
CONTAINS (and the
CONTAINSTABLE) as an
FREETEXT. It offers
more flexibility and a considerable number of extended options,
allowing a wide range of ways to weight the match. This includes how
far or near a search term is from another term. The details are
available at the MSDN library at
In Oracle, you have many different extensions and options
available for text string indexing. You can find one of the best,
simple tutorials on this at
To use the indexing in Oracle, the user who will be maintaining
the index must have
(permission to use Oracle’s text features). He also needs to set up a
lexer preference. Different
lexers are available for different languages (such as English and
GRANT ctxapp to andrew; BEGIN CTX_DDL.CREATE_PREFERENCE( 'english_lexer','basic_lexer'); CTX_DDL.SET_ATTRIBUTE( 'english_lexer','index_themes', 'no');
Once this is complete, the user can create an index for
CREATE INDEX song_index ON story(body) INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS('LEXER english_lexer STOPLIST ctxsys.default_stoplist');
You can use this index in your SQL query:
Learn more about this topic from SQL Hacks.
Whether you're running Access, MySQL, SQL Server, Oracle, or PostgreSQL, this book will help you push the limits of traditional SQL to squeeze data effectively from your database. SQL Hacks offers 100 hacks -- unique tips and tools -- that bring you the knowledge of experts who apply what they know in the real world to help you take full advantage of the expressive power of SQL. You'll find practical techniques to address complex data manipulation problems.