The Histogram Library (histogram)

Definitions of relations for creating histograms in Rel.

Module: histograms

Contains functions for creating single-column histograms. The histogram is created on the last argument of a given relation and statistics are computed on each bucket of the histogram. By default, min, max, and count statistics are computed and returned by all the functions of the module. The supported types of histograms are equi-width and equi-depth. In an equi-width histogram, domain values are bucketized on intervals of equal ranges. In an equi-depth histogram, the buckets have equal size (in terms of the number of tuples contained in the bucket). The equi-depth histograms are constructed on a best-effort basis and do not guarantee an optimal bucketization.

The module offers functions of 2 types with different interfaces:

Hardcoded number of statistic functions

Here, the number of requested statistics is part of the function name, e.g. histogram_equiwidth_2[R, 10, F1, F2] where

  • R is a relation
  • 10 is the number of buckets
  • F1 and F2 are two aggregate statistic functions to be applied to the values of the tuples contained in the bucket, such as average or sum.

The output is a relation with arity $4+i$, where $i$ is the number of requested statistics: (bucket_id, min_value, max_value, count, F1_stat, ..., Fi_stat) where

  • bucket_id is an integer identifier for the bucket starting from 1
  • min_value is the minimum value among tuples contained in the bucket
  • max_value is the maximum value among tuples contained in the bucket
  • count is the number of tuples contained in the bucket
  • The remaining arguments are the values of the requested statistics, in the order that they were specified.

The maximum supported number of statistic functions that can be requested via this interface is 3.

Example 1

def T = {1; 2; 3; 4; 6; 11}

with histograms use histogram_equiwidth_2

def output = histogram_equiwidth_2[T, 2, average, sum]
(1, 1, 4, 4, 2.5, 10)
(2, 6, 11, 2, 8.5, 17)

Example 2

def T = {1; 2; 3; 4; 6; 11}

with histograms use histogram_equidepth_1

def output = histogram_equidepth_1[T, 2, average]
(1, 1, 3, 3, 2.0)
(2, 4, 11, 3, 7.0)

Arbitrary number of statistic functions given as a module

With this interface, any number of statistics can be requested, by wrapping them inside a module. For example: histogram_equiwidth[R, 10, STATS] where

  • R is a relation
  • 10 is the number of buckets
  • STATS is a module that contains definitions of statistic functions.

For example:

@inline
module STATS
def myAvg[F] = average[F]
def mySum[F] = sum[F]
end

Because there is no order between the statistic functions of the module, the returned relation is ternary, with the middle argument specifying the name of the statistic function as a string: (bucket_id, stat_name, stat_value) where

  • bucket_id is an integer identifier for the bucket starting from 1
  • stat_name is a string that specifies the statistic function
  • stat_value is the value of the statistic function applied on the bucket

Note that min, max, and count are returned by default via this interface too, identified by the strings "min", "max", and "count" respectively.

Example 1

def T = {1; 2; 3; 4; 6; 11}

with histograms use histogram_equiwidth

@inline
module STATS
def myAvg[F] = average[F]
def mySum[F] = sum[F]
end

def output = histogram_equiwidth[T, 2, STATS]
(1, "myAvg", 2.5)
(2, "myAvg", 8.5)
(1, "max", 4)
(1, "min", 1)
(1, "count", 4)
(1, "mySum", 10)
(2, "max", 11)
(2, "min", 6)
(2, "count", 2)
(2, "mySum", 17)

Example 2

def T = {1; 2; 3; 4; 6; 11}

with histograms use histogram_equidepth

@inline
module STATS
def myAvg[F] = average[F]
end

def output = histogram_equidepth[T, 2, STATS]
(1, "max", 3)
(1, "min", 1)
(1, "count", 3)
(2, "max", 11)
(2, "min", 4)
(2, "count", 3)
(1, "myAvg", 2.0)
(2, "myAvg", 7.0)

histogram_equiwidth_0

Computes an equi-width histogram on (the last argument of) the relation R with n buckets. min, max, count are returned by default.

Definition
def histogram_equiwidth_0[R, n](b, lower, upper, cnt) =
lower = min[Bw[R, n][b]] and upper = max[Bw[R, n][b]] and
cnt = count[Bw[R, n][b]]

histogram_equiwidth_1

Computes an equi-width histogram on (the last argument of) the relation R with n buckets and computes 1 statistic function F1 on each bucket. min, max, count are returned by default.

Definition
def histogram_equiwidth_1[R, n, F1](b, lower, upper, cnt, f1) =
lower = min[Bw[R, n][b]] and upper = max[Bw[R, n][b]] and
cnt = count[Bw[R, n][b]] and
f1 = F1[Bw[R, n][b]]

histogram_equiwidth_2

Computes an equi-width histogram on (the last argument of) the relation R with n buckets and computes 2 statistic functions F1 and F2 on each bucket. min, max, count are returned by default.

Definition
def histogram_equiwidth_2[R, n, F1, F2](b, lower, upper, cnt, f1, f2) =
lower = min[Bw[R, n][b]] and upper = max[Bw[R, n][b]] and
cnt = count[Bw[R, n][b]] and
f1 = F1[Bw[R, n][b]] and f2 = F2[Bw[R, n][b]]

histogram_equiwidth_3

Computes an equi-width histogram on (the last argument of) the relation R with n buckets and computes 3 statistic functions F1, F2, and F3 on each bucket. min, max, count are returned by default.

Definition
def histogram_equiwidth_3[R, n, F1, F2, F3](b, lower, upper, cnt, f1, f2, f3) =
lower = min[Bw[R, n][b]] and upper = max[Bw[R, n][b]] and
cnt = count[Bw[R, n][b]] and
f1 = F1[Bw[R, n][b]] and f2 = F2[Bw[R, n][b]] and f3 = F3[Bw[R, n][b]]

histogram_equiwidth

Computes an equi-width histogram on the last argument of R with n buckets and computes all the statistic functions contained in the module A on each bucket. min, max, count are returned by default.

Definition
def histogram_equiwidth[R, n, A](b, stat_name, stat_val) =
(stat_name = "min" and stat_val = min[Bw[R, n][b]]) or
(stat_name = "max" and stat_val = max[Bw[R, n][b]]) or
(stat_name = "count" and stat_val = count[Bw[R, n][b]]) or
(stat_name = relname_string[Stat] and stat_val = A[Stat, Bw[R, n][b]] from Stat)

histogram_equidepth_0

Computes an equi-depth histogram on the last argument of R with n buckets. min, max, count are returned by default.

Definition
def histogram_equidepth_0[R, n](b, lower, upper, cnt) =
lower = min[Bd[R, n][b]] and upper = max[Bd[R, n][b]] and
cnt = count[Bd[R, n][b]]

histogram_equidepth_1

Computes an equi-depth histogram on the last argument of R with n buckets and computes 1 statistic function F1 on each bucket. min, max, count are returned by default.

Definition
def histogram_equidepth_1[R, n, F1](b, lower, upper, cnt, f1) =
lower = min[Bd[R, n][b]] and upper = max[Bd[R, n][b]] and
cnt = count[Bd[R, n][b]] and
f1 = F1[Bd[R, n][b]]

histogram_equidepth_2

Computes an equi-depth histogram on the last argument of R with n buckets and computes 2 statistic functions F1 and F2 on each bucket. min, max, count are returned by default.

Definition
def histogram_equidepth_2[R, n, F1, F2](b, lower, upper, cnt, f1, f2) =
lower = min[Bd[R, n][b]] and upper = max[Bd[R, n][b]] and
cnt = count[Bd[R, n][b]] and
f1 = F1[Bd[R, n][b]] and f2 = F2[Bd[R, n][b]]

histogram_equidepth_3

Computes an equi-depth histogram on the last argument of R with n buckets and computes 3 statistic functions F1, F2, and F3 on each bucket. min, max, count are returned by default.

Definition
def histogram_equidepth_3[R, n, F1, F2, F3](b, lower, upper, cnt, f1, f2, f3) =
lower = min[Bd[R, n][b]] and upper = max[Bd[R, n][b]] and
cnt = count[Bd[R, n][b]] and
f1 = F1[Bd[R, n][b]] and f2 = F2[Bd[R, n][b]] and f3 = F3[Bd[R, n][b]]

histogram_equidepth

Computes an equi-depth histogram on the last argument of R with n buckets and computes all the statistic functions contained in the module A on each bucket. min, max, count are returned by default.

Definition
def histogram_equidepth[R, n, A](b, stat_name, stat_val) =
(stat_name = "min" and stat_val = min[Bd[R, n][b]]) or
(stat_name = "max" and stat_val = max[Bd[R, n][b]]) or
(stat_name = "count" and stat_val = count[Bd[R, n][b]]) or
(stat_name = relname_string[Stat] and stat_val = A[Stat, Bd[R, n][b]] from Stat)