Commit 2c5b57e

pgbench: Change terminology from "threshold" to "parameter".

Per a recommendation from Tomas Vondra, it's more helpful to refer to the value that determines how skewed a Gaussian or exponential distribution is as a parameter rather than a threshold. Since it's not quite too late to get this right in 9.5, where it was introduced, back-patch this. Most of the patch changes only comments and documentation, but a few pgbench messages are altered to match.

Fabien Coelho, reviewed by Michael Paquier and by me.
1 parent 550e9c2 commit 2c5b57e

2 files changed: +78 -60 lines changed
doc/src/sgml/ref/pgbench.sgml

Lines changed: 38 additions & 29 deletions
@@ -776,7 +776,7 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>

    <varlistentry>
     <term>
-     <literal>\setrandom <replaceable>varname</> <replaceable>min</> <replaceable>max</> [ uniform | { gaussian | exponential } <replaceable>threshold</> ]</literal>
+     <literal>\setrandom <replaceable>varname</> <replaceable>min</> <replaceable>max</> [ uniform | { gaussian | exponential } <replaceable>parameter</> ]</literal>
     </term>

     <listitem>
@@ -792,54 +792,63 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
       By default, or when <literal>uniform</> is specified, all values in the
       range are drawn with equal probability. Specifying <literal>gaussian</>
       or <literal>exponential</> options modifies this behavior; each
-      requires a mandatory threshold which determines the precise shape of the
+      requires a mandatory parameter which determines the precise shape of the
       distribution.
      </para>

      <para>
       For a Gaussian distribution, the interval is mapped onto a standard
       normal distribution (the classical bell-shaped Gaussian curve) truncated
-      at <literal>-threshold</> on the left and <literal>+threshold</>
+      at <literal>-parameter</> on the left and <literal>+parameter</>
       on the right.
+      Values in the middle of the interval are more likely to be drawn.
       To be precise, if <literal>PHI(x)</> is the cumulative distribution
       function of the standard normal distribution, with mean <literal>mu</>
-      defined as <literal>(max + min) / 2.0</>, then value <replaceable>i</>
-      between <replaceable>min</> and <replaceable>max</> inclusive is drawn
-      with probability:
-      <literal>
-        (PHI(2.0 * threshold * (i - min - mu + 0.5) / (max - min + 1)) -
-         PHI(2.0 * threshold * (i - min - mu - 0.5) / (max - min + 1))) /
-         (2.0 * PHI(threshold) - 1.0)</>.
-      Intuitively, the larger the <replaceable>threshold</>, the more
+      defined as <literal>(max + min) / 2.0</>, with
+<literallayout>
+ f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) /
+        (2.0 * PHI(parameter) - 1.0)
+</literallayout>
+      then value <replaceable>i</> between <replaceable>min</> and
+      <replaceable>max</> inclusive is drawn with probability:
+      <literal>f(i + 0.5) - f(i - 0.5)</>.
+      Intuitively, the larger <replaceable>parameter</>, the more
       frequently values close to the middle of the interval are drawn, and the
       less frequently values close to the <replaceable>min</> and
-      <replaceable>max</> bounds.
-      About 67% of values are drawn from the middle <literal>1.0 / threshold</>
-      and 95% in the middle <literal>2.0 / threshold</>; for instance, if
-      <replaceable>threshold</> is 4.0, 67% of values are drawn from the middle
-      quarter and 95% from the middle half of the interval.
-      The minimum <replaceable>threshold</> is 2.0 for performance of
-      the Box-Muller transform.
+      <replaceable>max</> bounds. About 67% of values are drawn from the
+      middle <literal>1.0 / parameter</>, that is a relative
+      <literal>0.5 / parameter</> around the mean, and 95% in the middle
+      <literal>2.0 / parameter</>, that is a relative
+      <literal>1.0 / parameter</> around the mean; for instance, if
+      <replaceable>parameter</> is 4.0, 67% of values are drawn from the
+      middle quarter (1.0 / 4.0) of the interval (i.e. from
+      <literal>3.0 / 8.0</> to <literal>5.0 / 8.0</>) and 95% from
+      the middle half (<literal>2.0 / 4.0</>) of the interval (second and
+      third quartiles). The minimum <replaceable>parameter</> is 2.0 for
+      performance of the Box-Muller transform.
      </para>

      <para>
-      For an exponential distribution, the <replaceable>threshold</>
-      parameter controls the distribution by truncating a quickly-decreasing
-      exponential distribution at <replaceable>threshold</>, and then
+      For an exponential distribution, <replaceable>parameter</>
+      controls the distribution by truncating a quickly-decreasing
+      exponential distribution at <replaceable>parameter</>, and then
       projecting onto integers between the bounds.
-      To be precise, value <replaceable>i</> between <replaceable>min</> and
+      To be precise, with
+<literallayout>
+f(x) = exp(-parameter * (x - min) / (max - min + 1)) / (1.0 - exp(-parameter))
+</literallayout>
+      Then value <replaceable>i</> between <replaceable>min</> and
       <replaceable>max</> inclusive is drawn with probability:
-      <literal>(exp(-threshold*(i-min)/(max+1-min)) -
-       exp(-threshold*(i+1-min)/(max+1-min))) / (1.0 - exp(-threshold))</>.
-      Intuitively, the larger the <replaceable>threshold</>, the more
+      <literal>f(x) - f(x + 1)</>.
+      Intuitively, the larger <replaceable>parameter</>, the more
       frequently values close to <replaceable>min</> are accessed, and the
       less frequently values close to <replaceable>max</> are accessed.
-      The closer to 0 the threshold, the flatter (more uniform) the access
-      distribution.
+      The closer to 0 <replaceable>parameter</>, the flatter (more uniform)
+      the access distribution.
       A crude approximation of the distribution is that the most frequent 1%
       values in the range, close to <replaceable>min</>, are drawn
-      <replaceable>threshold</>% of the time.
-      The <replaceable>threshold</> value must be strictly positive.
+      <replaceable>parameter</>% of the time.
+      <replaceable>parameter</> value must be strictly positive.
      </para>

      <para>

src/bin/pgbench/pgbench.c

Lines changed: 40 additions & 31 deletions
@@ -100,7 +100,7 @@ static int pthread_join(pthread_t th, void **thread_return);
 #define LOG_STEP_SECONDS 5 /* seconds between log messages */
 #define DEFAULT_NXACTS 10 /* default nxacts */

-#define MIN_GAUSSIAN_THRESHOLD 2.0 /* minimum threshold for gauss */
+#define MIN_GAUSSIAN_PARAM 2.0 /* minimum parameter for gauss */

 int nxacts = 0; /* number of transactions per client */
 int duration = 0; /* duration in seconds */
@@ -503,47 +503,47 @@ getrand(TState *thread, int64 min, int64 max)

 /*
  * random number generator: exponential distribution from min to max inclusive.
- * the threshold is so that the density of probability for the last cut-off max
- * value is exp(-threshold).
+ * the parameter is so that the density of probability for the last cut-off max
+ * value is exp(-parameter).
  */
 static int64
-getExponentialRand(TState *thread, int64 min, int64 max, double threshold)
+getExponentialRand(TState *thread, int64 min, int64 max, double parameter)
 {
     double cut,
            uniform,
            rand;

-    Assert(threshold > 0.0);
-    cut = exp(-threshold);
+    Assert(parameter > 0.0);
+    cut = exp(-parameter);
     /* erand in [0, 1), uniform in (0, 1] */
     uniform = 1.0 - pg_erand48(thread->random_state);

     /*
-     * inner expresion in (cut, 1] (if threshold > 0), rand in [0, 1)
+     * inner expresion in (cut, 1] (if parameter > 0), rand in [0, 1)
      */
     Assert((1.0 - cut) != 0.0);
-    rand = -log(cut + (1.0 - cut) * uniform) / threshold;
+    rand = -log(cut + (1.0 - cut) * uniform) / parameter;
     /* return int64 random number within between min and max */
     return min + (int64) ((max - min + 1) * rand);
 }

 /* random number generator: gaussian distribution from min to max inclusive */
 static int64
-getGaussianRand(TState *thread, int64 min, int64 max, double threshold)
+getGaussianRand(TState *thread, int64 min, int64 max, double parameter)
 {
     double stdev;
     double rand;

     /*
-     * Get user specified random number from this loop, with -threshold <
-     * stdev <= threshold
+     * Get user specified random number from this loop,
+     * with -parameter < stdev <= parameter
      *
      * This loop is executed until the number is in the expected range.
      *
-     * As the minimum threshold is 2.0, the probability of looping is low:
+     * As the minimum parameter is 2.0, the probability of looping is low:
      * sqrt(-2 ln(r)) <= 2 => r >= e^{-2} ~ 0.135, then when taking the
      * average sinus multiplier as 2/pi, we have a 8.6% looping probability in
-     * the worst case. For a 5.0 threshold value, the looping probability is
+     * the worst case. For a parameter value of 5.0, the looping probability is
      * about e^{-5} * 2 / pi ~ 0.43%.
      */
     do
@@ -568,10 +568,10 @@ getGaussianRand(TState *thread, int64 min, int64 max, double threshold)
          * over.
          */
     }
-    while (stdev < -threshold || stdev >= threshold);
+    while (stdev < -parameter || stdev >= parameter);

-    /* stdev is in [-threshold, threshold), normalization to [0,1) */
-    rand = (stdev + threshold) / (threshold * 2.0);
+    /* stdev is in [-parameter, parameter), normalization to [0,1) */
+    rand = (stdev + parameter) / (parameter * 2.0);

     /* return int64 random number within between min and max */
     return min + (int64) ((max - min + 1) * rand);
@@ -1498,7 +1498,7 @@ doCustom(TState *thread, CState *st, instr_time *conn_time, FILE *logfile, AggVa
             char *var;
             int64 min,
                   max;
-            double threshold = 0;
+            double parameter = 0;
             char res[64];

             if (*argv[2] == ':')
@@ -1569,41 +1569,49 @@ doCustom(TState *thread, CState *st, instr_time *conn_time, FILE *logfile, AggVa
                 {
                     if ((var = getVariable(st, argv[5] + 1)) == NULL)
                     {
-                        fprintf(stderr, "%s: invalid threshold number: \"%s\"\n",
+                        fprintf(stderr, "%s: invalid parameter: \"%s\"\n",
                                 argv[0], argv[5]);
                         st->ecnt++;
                         return true;
                     }
-                    threshold = strtod(var, NULL);
+                    parameter = strtod(var, NULL);
                 }
                 else
-                    threshold = strtod(argv[5], NULL);
+                    parameter = strtod(argv[5], NULL);

                 if (pg_strcasecmp(argv[4], "gaussian") == 0)
                 {
-                    if (threshold < MIN_GAUSSIAN_THRESHOLD)
+                    if (parameter < MIN_GAUSSIAN_PARAM)
                     {
-                        fprintf(stderr, "gaussian threshold must be at least %f (not \"%s\")\n", MIN_GAUSSIAN_THRESHOLD, argv[5]);
+                        fprintf(stderr, "gaussian parameter must be at least %f (not \"%s\")\n", MIN_GAUSSIAN_PARAM, argv[5]);
                         st->ecnt++;
                         return true;
                     }
 #ifdef DEBUG
-                    printf("min: " INT64_FORMAT " max: " INT64_FORMAT " random: " INT64_FORMAT "\n", min, max, getGaussianRand(thread, min, max, threshold));
+                    printf("min: " INT64_FORMAT " max: " INT64_FORMAT " random: " INT64_FORMAT "\n",
+                           min, max,
+                           getGaussianRand(thread, min, max, parameter));
 #endif
-                    snprintf(res, sizeof(res), INT64_FORMAT, getGaussianRand(thread, min, max, threshold));
+                    snprintf(res, sizeof(res), INT64_FORMAT,
+                             getGaussianRand(thread, min, max, parameter));
                 }
                 else if (pg_strcasecmp(argv[4], "exponential") == 0)
                 {
-                    if (threshold <= 0.0)
+                    if (parameter <= 0.0)
                     {
-                        fprintf(stderr, "exponential threshold must be greater than zero (not \"%s\")\n", argv[5]);
+                        fprintf(stderr,
+                                "exponential parameter must be greater than zero (not \"%s\")\n",
+                                argv[5]);
                         st->ecnt++;
                         return true;
                     }
 #ifdef DEBUG
-                    printf("min: " INT64_FORMAT " max: " INT64_FORMAT " random: " INT64_FORMAT "\n", min, max, getExponentialRand(thread, min, max, threshold));
+                    printf("min: " INT64_FORMAT " max: " INT64_FORMAT " random: " INT64_FORMAT "\n",
+                           min, max,
+                           getExponentialRand(thread, min, max, parameter));
 #endif
-                    snprintf(res, sizeof(res), INT64_FORMAT, getExponentialRand(thread, min, max, threshold));
+                    snprintf(res, sizeof(res), INT64_FORMAT,
+                             getExponentialRand(thread, min, max, parameter));
                 }
             }
             else /* this means an error somewhere in the parsing phase... */
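
The range checks in this hunk boil down to two rules: a Gaussian parameter must be at least 2.0, and an exponential parameter must be strictly positive. The sketch below restates them as a standalone function for illustration only. `sketch_check_parameter` and `SKETCH_MIN_GAUSSIAN_PARAM` are hypothetical names, plain `strcmp` stands in for the case-insensitive `pg_strcasecmp` used by pgbench, and the `strtod` end-pointer check is an extra assumption: the code in the diff passes `NULL` as the end pointer and so does not reject trailing garbage.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MIN_GAUSSIAN_PARAM 2.0   /* mirrors MIN_GAUSSIAN_PARAM */

/*
 * Parse text as the distribution parameter for kind ("gaussian" or
 * "exponential"). Returns 0 and stores the value in *out on success,
 * or -1 if the text is not a number or is out of range for that kind.
 */
static int
sketch_check_parameter(const char *kind, const char *text, double *out)
{
    char *end;
    double value;

    assert(kind != NULL && text != NULL && out != NULL);

    value = strtod(text, &end);
    /* stricter than the diff: reject empty input and trailing characters */
    if (end == text || *end != '\0')
        return -1;

    if (strcmp(kind, "gaussian") == 0)
    {
        if (value < SKETCH_MIN_GAUSSIAN_PARAM)
            return -1;          /* "gaussian parameter must be at least ..." */
    }
    else if (strcmp(kind, "exponential") == 0)
    {
        if (value <= 0.0)
            return -1;          /* "... must be greater than zero" */
    }
    else
        return -1;              /* unknown distribution name */

    *out = value;
    return 0;
}
```

For instance, `sketch_check_parameter("gaussian", "1.5", &v)` fails while `sketch_check_parameter("exponential", "0.1", &v)` succeeds, matching the two error messages renamed by the commit.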
@@ -2297,8 +2305,9 @@ process_commands(char *buf, const char *source, const int lineno)
         if (pg_strcasecmp(my_commands->argv[0], "setrandom") == 0)
         {
             /*
-             * parsing: \setrandom variable min max [uniform] \setrandom
-             * variable min max (gaussian|exponential) threshold
+             * parsing:
+             * \setrandom variable min max [uniform]
+             * \setrandom variable min max (gaussian|exponential) parameter
              */

             if (my_commands->argc < 4)
@@ -2323,7 +2332,7 @@ process_commands(char *buf, const char *source, const int lineno)
                 if (my_commands->argc < 6)
                 {
                     syntax_error(source, lineno, my_commands->line, my_commands->argv[0],
-                                 "missing threshold argument", my_commands->argv[4], -1);
+                                 "missing parameter", my_commands->argv[4], -1);
                 }
                 else if (my_commands->argc > 6)
                 {
