diff options
author | Tom Lane <tgl@sss.pgh.pa.us> | 2014-05-11 12:06:04 -0400 |
---|---|---|
committer | Tom Lane <tgl@sss.pgh.pa.us> | 2014-05-11 12:06:04 -0400 |
commit | 12e611d43e6efbf0e36014a3055ed47366facee7 (patch) | |
tree | 9c012d4a86bb7aed5cd0fcdf971a28a2bd82c8e0 /doc/src | |
parent | e136271a947d142eaa7093a789e6fa9ceda18c17 (diff) | |
download | postgresql-12e611d43e6efbf0e36014a3055ed47366facee7.tar.gz postgresql-12e611d43e6efbf0e36014a3055ed47366facee7.zip |
Rename jsonb_hash_ops to jsonb_path_ops.
There's no longer much pressure to switch the default GIN opclass for
jsonb, but there was still some unhappiness with the name "jsonb_hash_ops",
since hashing is no longer a distinguishing property of that opclass,
and anyway it seems like a relatively minor detail. At the suggestion of
Heikki Linnakangas, we'll use "jsonb_path_ops" instead; that captures the
important characteristic that each index entry depends on the entire path
from the document root to the indexed value.
Also add a user-facing explanation of the implementation properties of
these two opclasses.
Diffstat (limited to 'doc/src')
-rw-r--r-- | doc/src/sgml/gin.sgml | 4 | ||||
-rw-r--r-- | doc/src/sgml/json.sgml | 45 |
2 files changed, 42 insertions, 7 deletions
diff --git a/doc/src/sgml/gin.sgml b/doc/src/sgml/gin.sgml index 0b3d6eeb633..1cbc73c70cf 100644 --- a/doc/src/sgml/gin.sgml +++ b/doc/src/sgml/gin.sgml @@ -395,7 +395,7 @@ </entry> </row> <row> - <entry><literal>jsonb_hash_ops</></entry> + <entry><literal>jsonb_path_ops</></entry> <entry><type>jsonb</></entry> <entry> <literal>@></> @@ -415,7 +415,7 @@ <para> Of the two operator classes for type <type>jsonb</>, <literal>jsonb_ops</> - is the default. <literal>jsonb_hash_ops</> supports fewer operators but + is the default. <literal>jsonb_path_ops</> supports fewer operators but offers better performance for those operators. See <xref linkend="json-indexing"> for details. </para> diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml index 518fe63873e..66426189ca5 100644 --- a/doc/src/sgml/json.sgml +++ b/doc/src/sgml/json.sgml @@ -156,7 +156,7 @@ </table> <sect2 id="json-keys-elements"> - <title><type>jsonb</> Input and Output Syntax</title> + <title>JSON Input and Output Syntax</title> <para> The input/output syntax for the JSON data types is as specified in <acronym>RFC</> 7159. @@ -366,11 +366,11 @@ SELECT '"foo"'::jsonb ? 'foo'; <programlisting> CREATE INDEX idxgin ON api USING gin (jdoc); </programlisting> - The non-default GIN operator class <literal>jsonb_hash_ops</> + The non-default GIN operator class <literal>jsonb_path_ops</> supports indexing the <literal>@></> operator only. An example of creating an index with this operator class is: <programlisting> -CREATE INDEX idxginh ON api USING gin (jdoc jsonb_hash_ops); +CREATE INDEX idxginp ON api USING gin (jdoc jsonb_path_ops); </programlisting> </para> @@ -444,10 +444,10 @@ SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @> '{"tags": ["qu </para> <para> - Although the <literal>jsonb_hash_ops</literal> operator class supports + Although the <literal>jsonb_path_ops</literal> operator class supports only queries with the <literal>@></> operator, it has notable performance advantages over the default operator - class <literal>jsonb_ops</literal>. A <literal>jsonb_hash_ops</literal> + class <literal>jsonb_ops</literal>. A <literal>jsonb_path_ops</literal> index is usually much smaller than a <literal>jsonb_ops</literal> index over the same data, and the specificity of searches is better, particularly when queries contain keys that appear frequently in the @@ -456,6 +456,41 @@ SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @> '{"tags": ["qu </para> <para> + The technical difference between a <literal>jsonb_ops</literal> + and a <literal>jsonb_path_ops</literal> GIN index is that the former + creates independent index items for each key and value in the data, + while the latter creates index items only for each value in the + data.<footnote><para>For this purpose, the term <quote>value</> + includes array elements, though JSON terminology sometimes considers + array elements distinct from values within objects.</para></footnote> + But in <literal>jsonb_path_ops</literal>, each index item is a hash + of both the value and the key(s) leading to it; for example to index + <literal>{"foo": {"bar": "baz"}}</literal>, a single index item would + be created incorporating all three of <literal>foo</>, <literal>bar</>, + and <literal>baz</> into the hash value. Thus a containment query + looking for this structure would result in an extremely specific index + search; but there is no way at all to find out whether <literal>foo</> + appears as a key. On the other hand, a <literal>jsonb_ops</literal> + index would create three index items representing <literal>foo</>, + <literal>bar</>, and <literal>baz</> separately; then to do the + containment query, it would look for rows containing all three of + these items. While GIN indexes can perform such an AND search fairly + efficiently, it will still be less specific and slower than the + equivalent <literal>jsonb_path_ops</literal> search, especially if + there are a very large number of rows containing any single one of the + three index items. + </para> + + <para> + A disadvantage of the <literal>jsonb_path_ops</literal> approach is + that it produces no index entries for JSON structures not containing + any values, such as <literal>{"a": {}}</literal>. If a search for + documents containing such a structure is requested, it will require a + full-index scan, which is quite slow. <literal>jsonb_path_ops</> is + therefore ill-suited for applications that often perform such searches. + </para> + + <para> <type>jsonb</> also supports <literal>btree</> and <literal>hash</> indexes. These are usually useful only if it's important to check equality of complete JSON documents. |