author: Tom Lane <tgl@sss.pgh.pa.us> 2019-04-01 17:37:26 -0400
committer: Tom Lane <tgl@sss.pgh.pa.us> 2019-04-01 17:37:34 -0400
commit: 26a76cb64072df6fa5585c2c15df39970ccdce01
tree: 80a79706dff2d1a97c2e790619f887b38fd65383 /doc/src
parent: 4fd05bb55b40a3c9dde2b19942f275fc31b5225a
Restrict pgbench's zipfian parameter to ensure good performance.
Remove the code that supported zipfian distribution parameters less than 1.0, as it had undocumented performance hazards, and it's not clear that the case is useful enough to justify either fixing or documenting those hazards.

Also, since the code path for parameter > 1.0 could perform badly for values very close to 1.0, establish a minimum allowed value of 1.001. This solution seems superior to the previous vague documentation warning about small values not performing well.

Fabien Coelho, per a gripe from Tomas Vondra

Discussion: https://postgr.es/m/b5e172e9-ad22-48a3-86a3-589afa20e8f7@2ndquadrant.com
Diffstat (limited to 'doc/src')
-rw-r--r-- doc/src/sgml/ref/pgbench.sgml | 27
1 files changed, 11 insertions, 16 deletions
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index f11d36620d6..ee2501be552 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -1543,29 +1543,17 @@ f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) /
middle quarter (1.0 / 4.0) of the interval (i.e. from
<literal>3.0 / 8.0</literal> to <literal>5.0 / 8.0</literal>) and 95% from
the middle half (<literal>2.0 / 4.0</literal>) of the interval (second and third
- quartiles). The minimum <replaceable>parameter</replaceable> is 2.0 for performance
- of the Box-Muller transform.
+ quartiles). The minimum allowed <replaceable>parameter</replaceable>
+ value is 2.0.
</para>
</listitem>
<listitem>
<para>
- <literal>random_zipfian</literal> generates an approximated bounded Zipfian
- distribution. For <replaceable>parameter</replaceable> in (0, 1), an
- approximated algorithm is taken from
- "Quickly Generating Billion-Record Synthetic Databases",
- Jim Gray et al, SIGMOD 1994. For <replaceable>parameter</replaceable>
- in (1, 1000), a rejection method is used, based on
- "Non-Uniform Random Variate Generation", Luc Devroye, p. 550-551,
- Springer 1986. The distribution is not defined when the parameter's
- value is 1.0. The function's performance is poor for parameter values
- close and above 1.0 and on a small range.
- </para>
- <para>
+ <literal>random_zipfian</literal> generates a bounded Zipfian
+ distribution.
<replaceable>parameter</replaceable> defines how skewed the distribution
is. The larger the <replaceable>parameter</replaceable>, the more
frequently values closer to the beginning of the interval are drawn.
- The closer to 0 <replaceable>parameter</replaceable> is,
- the flatter (more uniform) the output distribution.
The distribution is such that, assuming the range starts from 1,
the ratio of the probability of drawing <replaceable>k</replaceable>
versus drawing <replaceable>k+1</replaceable> is
@@ -1576,6 +1564,13 @@ f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) /
itself is produced <literal>(3/2)*2.5 = 2.76</literal> times more
frequently than <literal>3</literal>, and so on.
</para>
+ <para>
+ <application>pgbench</application>'s implementation is based on
+ "Non-Uniform Random Variate Generation", Luc Devroye, p. 550-551,
+ Springer 1986. Due to limitations of that algorithm,
+ the <replaceable>parameter</replaceable> value is restricted to
+ the range [1.001, 1000].
+ </para>
</listitem>
</itemizedlist>
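
Illustrative note, not part of the patch: the Python sketch below mirrors the two facts the revised text states, namely that the parameter is restricted to [1.001, 1000] and that lower values are drawn more often as the parameter grows. The ratio formula itself falls between the two hunks and is not shown in this diff; the sketch assumes it is ((k+1)/k) raised to the power of the parameter, which agrees with the 2.76 figure in the context lines. The names check_zipfian_parameter and draw_ratio are hypothetical and do not appear in pgbench.

```python
# Bounds taken from the patched documentation text: [1.001, 1000].
MIN_ZIPFIAN_PARAM = 1.001
MAX_ZIPFIAN_PARAM = 1000.0


def check_zipfian_parameter(parameter: float) -> float:
    """Reject parameters outside the documented range (hypothetical helper)."""
    if not (MIN_ZIPFIAN_PARAM <= parameter <= MAX_ZIPFIAN_PARAM):
        raise ValueError(
            f"zipfian parameter must be in "
            f"[{MIN_ZIPFIAN_PARAM}, {MAX_ZIPFIAN_PARAM}], got {parameter}"
        )
    return parameter


def draw_ratio(k: int, parameter: float) -> float:
    """How many times more often k is drawn than k+1, for a range starting at 1.

    Assumes the ratio is ((k+1)/k) ** parameter; the formula is not visible
    in the diff hunks and is inferred from the worked figures.
    """
    check_zipfian_parameter(parameter)
    return ((k + 1) / k) ** parameter


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, round(draw_ratio(k, 2.5), 2))
```

Under that assumption, a parameter of 2.5 gives a ratio of about 2.76 between the values 2 and 3, matching the figure quoted in the documentation, while parameters such as 0.5 or 1.0 that the old text discussed are now rejected outright.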