From ed802e7dc36059efbc6669b4bfeebad43f0898c1 Mon Sep 17 00:00:00 2001
From: Robert Haas
Date: Wed, 30 Jul 2014 13:22:08 -0400
Subject: pgbench: Allow \setrandom to generate Gaussian/exponential distributions.

Mitsumasa KONDO and Fabien COELHO, with further wordsmithing by me.
---
 doc/src/sgml/pgbench.sgml | 61 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 58 insertions(+), 3 deletions(-)

(limited to 'doc/src')

diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml
index f264c245ec0..b7d88f30005 100644
--- a/doc/src/sgml/pgbench.sgml
+++ b/doc/src/sgml/pgbench.sgml
@@ -748,8 +748,8 @@ pgbench options dbname
-     \setrandom varname min max
-
+     \setrandom varname min max [ uniform | [ { gaussian | exponential } threshold ] ]
+
@@ -760,10 +760,65 @@ pgbench options dbname
       having an integer value.
+
+      By default, or when uniform is specified, all values in the
+      range are drawn with equal probability.  Specifying gaussian
+      or exponential options modifies this behavior; each
+      requires a mandatory threshold which determines the precise shape of the
+      distribution.
+
+
+
+      For a Gaussian distribution, the interval is mapped onto a standard
+      normal distribution (the classical bell-shaped Gaussian curve) truncated
+      at -threshold on the left and +threshold
+      on the right.
+      To be precise, if PHI(x) is the cumulative distribution
+      function of the standard normal distribution, with mean mu
+      defined as (max + min) / 2.0, then value i
+      between min and max inclusive is drawn
+      with probability:
+
+ (PHI(2.0 * threshold * (i - min - mu + 0.5) / (max - min + 1)) -
+  PHI(2.0 * threshold * (i - min - mu - 0.5) / (max - min + 1))) /
+  (2.0 * PHI(threshold) - 1.0)
+
+      Intuitively, the larger the threshold, the more
+      frequently values close to the middle of the interval are drawn, and the
+      less frequently values close to the min and
+      max bounds.
+      About 67% of values are drawn from the middle 1.0 / threshold
+      and 95% in the middle 2.0 / threshold; for instance, if
+      threshold is 4.0, 67% of values are drawn from the middle
+      quarter and 95% from the middle half of the interval.
+      The minimum threshold is 2.0 for performance of
+      the Box-Muller transform.
+
+
+
+      For an exponential distribution, the threshold
+      parameter controls the distribution by truncating a quickly-decreasing
+      exponential distribution at threshold, and then
+      projecting onto integers between the bounds.
+      To be precise, value i between min and
+      max inclusive is drawn with probability:
+      (exp(-threshold*(i-min)/(max+1-min)) -
+       exp(-threshold*(i+1-min)/(max+1-min))) / (1.0 - exp(-threshold)).
+      Intuitively, the larger the threshold, the more
+      frequently values close to min are accessed, and the
+      less frequently values close to max are accessed.
+      The closer to 0 the threshold, the flatter (more uniform) the access
+      distribution.
+      A crude approximation of the distribution is that the most frequent 1%
+      values in the range, close to min, are drawn
+      threshold% of the time.
+      The threshold value must be strictly positive.
+
+
       Example:
-\setrandom aid 1 :naccounts
+\setrandom aid 1 :naccounts gaussian 5.0
-- 
cgit v1.2.3
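
The two probability formulas in the added documentation can be checked numerically. The following is a minimal Python sketch, not code from the patch. The function names (`phi`, `gaussian_prob`, `exponential_prob`) and the `lo`/`hi` parameter names are hypothetical. The Gaussian centering term is assumed to be `i - mu` with `mu = (max + min) / 2.0`, since that reading makes the probabilities sum to 1 for any `min`.

```python
import math


def phi(x):
    """CDF of the standard normal distribution, written via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gaussian_prob(i, lo, hi, threshold):
    """Probability of drawing integer i in [lo, hi] under the truncated
    Gaussian described above.

    Assumption: the value is centered as (i - mu), mu = (hi + lo) / 2.0.
    """
    n = hi - lo + 1
    mu = (hi + lo) / 2.0
    upper = phi(2.0 * threshold * (i - mu + 0.5) / n)
    lower = phi(2.0 * threshold * (i - mu - 0.5) / n)
    return (upper - lower) / (2.0 * phi(threshold) - 1.0)


def exponential_prob(i, lo, hi, threshold):
    """Probability of drawing integer i in [lo, hi] under the truncated
    exponential described above."""
    n = hi + 1 - lo
    upper = math.exp(-threshold * (i - lo) / n)
    lower = math.exp(-threshold * (i + 1 - lo) / n)
    return (upper - lower) / (1.0 - math.exp(-threshold))


if __name__ == "__main__":
    lo, hi = 1, 1000
    # Total mass of each distribution over the whole range.
    print(sum(gaussian_prob(i, lo, hi, 4.0) for i in range(lo, hi + 1)))
    print(sum(exponential_prob(i, lo, hi, 5.0) for i in range(lo, hi + 1)))
    # Mass in the middle quarter of the range for threshold 4.0.
    print(sum(gaussian_prob(i, lo, hi, 4.0) for i in range(376, 626)))
```

Both totals should come out at 1 up to floating-point rounding, and the middle-quarter mass for a threshold of 4.0 lands near 0.68, in line with the "about 67%" figure in the text.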