Diffstat (limited to 'doc/src/sgml/textsearch.sgml')
-rw-r--r-- | doc/src/sgml/textsearch.sgml | 41 |
1 files changed, 21 insertions, 20 deletions
diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
index d3e7a148ea5..547c0153ac8 100644
--- a/doc/src/sgml/textsearch.sgml
+++ b/doc/src/sgml/textsearch.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/textsearch.sgml,v 1.51 2009/04/27 16:27:36 momjian Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/textsearch.sgml,v 1.52 2009/06/17 21:58:49 tgl Exp $ -->
 <chapter id="textsearch">
 <title id="textsearch-title">Full Text Search</title>
@@ -389,7 +389,7 @@ text @@ text
 <para>
 Text search parsers and templates are built from low-level C functions;
- therefore C programming ability is required to develop new ones, and
+ therefore it requires C programming ability to develop new ones, and
 superuser privileges to install one into a database. (There are examples of add-on parsers and templates in the <filename>contrib/</> area of the <productname>PostgreSQL</> distribution.) Since dictionaries and
@@ -519,7 +519,7 @@ CREATE INDEX pgweb_idx ON pgweb USING gin(to_tsvector(config_name, body));
 recording which configuration was used for each index entry. This would be useful, for example, if the document collection contained documents in different languages. Again,
- queries that wish to use the index must be phrased to match, e.g.,
+ queries that are meant to use the index must be phrased to match, e.g.,
 <literal>WHERE to_tsvector(config_name, body) @@ 'a & b'</>.
 </para>
@@ -860,7 +860,8 @@ SELECT plainto_tsquery('english', 'The Fat & Rats:C');
 <term>
 <synopsis>
- ts_rank(<optional> <replaceable class="PARAMETER">weights</replaceable> <type>float4[]</>, </optional> <replaceable class="PARAMETER">vector</replaceable> <type>tsvector</>, <replaceable class="PARAMETER">query</replaceable> <type>tsquery</> <optional>, <replaceable class="PARAMETER">normalization</replaceable> <type>integer</> </optional>) returns <type>float4</>
+ ts_rank(<optional> <replaceable class="PARAMETER">weights</replaceable> <type>float4[]</>, </optional> <replaceable class="PARAMETER">vector</replaceable> <type>tsvector</>,
+ <replaceable class="PARAMETER">query</replaceable> <type>tsquery</> <optional>, <replaceable class="PARAMETER">normalization</replaceable> <type>integer</> </optional>) returns <type>float4</>
 </synopsis>
 </term>
@@ -1042,7 +1043,7 @@ LIMIT 10;
 Ranking can be expensive since it requires consulting the <type>tsvector</type> of each matching document, which can be I/O bound and therefore slow. Unfortunately, it is almost impossible to avoid since
- practical queries often result in a large number of matches.
+ practical queries often result in large numbers of matches.
 </para>
 </sect2>
@@ -1068,7 +1069,7 @@ LIMIT 10;
 <para>
 <function>ts_headline</function> accepts a document along
- with a query, and returns an excerpt of
+ with a query, and returns an excerpt from
 the document in which terms from the query are highlighted.
 The configuration to be used to parse the document can be specified by <replaceable>config</replaceable>; if <replaceable>config</replaceable>
@@ -1085,8 +1086,8 @@ LIMIT 10;
 <itemizedlist spacing="compact" mark="bullet">
 <listitem>
 <para>
- <literal>StartSel</>, <literal>StopSel</literal>: the strings to delimit
- query words appearing in the document, to distinguish
+ <literal>StartSel</>, <literal>StopSel</literal>: the strings with
+ which to delimit query words appearing in the document, to distinguish
 them from other excerpted words. You must double-quote these strings if they contain spaces or commas.
 </para>
@@ -1188,7 +1189,7 @@ SELECT id, ts_headline(body, q), rank
 FROM (SELECT id, body, q, ts_rank_cd(ti, q) AS rank
 FROM apod, to_tsquery('stars') q
 WHERE ti @@ q
- ORDER BY rank DESC
+ ORDER BY rank DESC
 LIMIT 10) AS foo;
 </programlisting>
 </para>
@@ -1678,9 +1679,9 @@ SELECT title, body FROM messages WHERE tsv @@ to_tsquery('title & body');
 </para>
 <para>
- A limitation of built-in triggers is that they treat all the
+ A limitation of these built-in triggers is that they treat all the
 input columns alike. To process columns differently — for
- example, to weigh title differently from body — it is necessary
+ example, to weight title differently from body — it is necessary
 to write a custom trigger.
 Here is an example using <application>PL/pgSQL</application> as the trigger language:
@@ -1722,8 +1723,8 @@ ON messages FOR EACH ROW EXECUTE PROCEDURE messages_trigger();
 </para>
 <synopsis>
- ts_stat(<replaceable class="PARAMETER">sqlquery</replaceable> <type>text</>, <optional> <replaceable class="PARAMETER">weights</replaceable> <type>text</>,
- </optional> OUT <replaceable class="PARAMETER">word</replaceable> <type>text</>, OUT <replaceable class="PARAMETER">ndoc</replaceable> <type>integer</>,
+ ts_stat(<replaceable class="PARAMETER">sqlquery</replaceable> <type>text</>, <optional> <replaceable class="PARAMETER">weights</replaceable> <type>text</>, </optional>
+ OUT <replaceable class="PARAMETER">word</replaceable> <type>text</>, OUT <replaceable class="PARAMETER">ndoc</replaceable> <type>integer</>,
 OUT <replaceable class="PARAMETER">nentry</replaceable> <type>integer</>) returns <type>setof record</>
 </synopsis>
@@ -2087,7 +2088,7 @@ SELECT alias, description, token FROM ts_debug('http://example.com/stuff/index.h
 by the parser, each dictionary in the list is consulted in turn, until some dictionary recognizes it as a known word. If it is identified as a stop word, or if no dictionary recognizes the token, it will be
- discarded and not indexed or searched.
+ discarded and not indexed or searched for.
 The general rule for configuring a list of dictionaries is to place first the most narrow, most specific dictionary, then the more general dictionaries, finishing with a very general dictionary, like
@@ -2439,7 +2440,7 @@ CREATE TEXT SEARCH DICTIONARY thesaurus_simple (
 <programlisting>
 ALTER TEXT SEARCH CONFIGURATION russian
- ALTER MAPPING FOR asciiword, asciihword, hword_asciipart
+ ALTER MAPPING FOR asciiword, asciihword, hword_asciipart
 WITH thesaurus_simple;
 </programlisting>
 </para>
@@ -2679,9 +2680,9 @@ CREATE TEXT SEARCH DICTIONARY english_stem (
 </para>
 <para>
- As an example, we will create a configuration
- <literal>pg</literal> by duplicating the built-in
- <literal>english</> configuration.
+ As an example we will create a configuration
+ <literal>pg</literal>, starting by duplicating the built-in
+ <literal>english</> configuration:
 <programlisting>
 CREATE TEXT SEARCH CONFIGURATION public.pg ( COPY = pg_catalog.english );
@@ -3137,7 +3138,7 @@ SELECT plainto_tsquery('supernovae stars');
 </indexterm>
 <para>
- There are two kinds of indexes which can be used to speed up full text
+ There are two kinds of indexes that can be used to speed up full text
 searches. Note that indexes are not mandatory for full text searching, but in cases where a column is searched on a regular basis, an index is
@@ -3204,7 +3205,7 @@ SELECT plainto_tsquery('supernovae stars');
 to check the actual table row to eliminate such false matches. (<productname>PostgreSQL</productname> does this automatically when needed.) GiST indexes are lossy because each document is represented in the
- index using a fixed-length signature. The signature is generated by hashing
+ index by a fixed-length signature. The signature is generated by hashing
 each word into a random bit in an n-bit string, with all these bits OR-ed together to produce an n-bit document signature. When two words hash to the same bit position there will be a false match. If all words in
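The final hunk describes how a GiST text search index represents each document by a fixed-length signature. Each word is hashed to one bit of an n-bit string, and the bits are OR-ed together. The Python sketch below illustrates that idea under stated assumptions:

* SHA-256 stands in for whatever hash PostgreSQL actually uses.
* The 64-bit signature length is arbitrary.
* `word_bit`, `signature`, `may_contain` and `find_false_match` are hypothetical helpers, not PostgreSQL functions.

The sketch shows why a signature test can report a false match but never misses a word that is really present.

```python
import hashlib

# Assumption: 64 bits is an arbitrary illustrative signature length,
# not the value PostgreSQL uses.
SIG_BITS = 64


def word_bit(word: str, n: int = SIG_BITS) -> int:
    """Hash a word to one bit position in an n-bit string.

    SHA-256 is a stand-in hash chosen for this sketch only.
    """
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n


def signature(words, n: int = SIG_BITS) -> int:
    """OR together one bit per word to produce the n-bit document signature."""
    sig = 0
    for w in words:
        sig |= 1 << word_bit(w, n)
    return sig


def may_contain(doc_sig: int, query_words, n: int = SIG_BITS) -> bool:
    """Hypothetical signature test: every query word's bit must be set.

    True can be a false match (another word set the same bit), so the
    real row must still be rechecked; False is always correct.
    """
    q = signature(query_words, n)
    return (doc_sig & q) == q


def find_false_match(n: int = 8):
    """Return two distinct words that hash to the same bit position.

    Among n + 1 words at least two must share one of the n positions
    (pigeonhole principle), so the loop always returns.
    """
    seen = {}
    for i in range(n + 1):
        w = f"word{i}"
        b = word_bit(w, n)
        if b in seen:
            return seen[b], w
        seen[b] = w
    raise AssertionError("unreachable: n + 1 words cannot occupy n distinct bits")


if __name__ == "__main__":
    doc_sig = signature(["fat", "rat", "star"])
    print(may_contain(doc_sig, ["star"]))  # a word really present: always True
    a, b = find_false_match(8)
    # b is absent from the one-word document [a], yet the signature test passes.
    print(a, b, may_contain(signature([a], 8), [b], 8))
```

Because `may_contain` can only rule documents out, a lookup over such signatures yields candidates that still have to be rechecked against the table row. The hunk notes that PostgreSQL performs this recheck automatically.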