The Histogram Library (histogram)
Definitions of relations for creating histograms in Rel.
Module: histograms
View source Contains functions for creating single-column histograms. The histogram is created on the last argument of a given relation and statistics are computed on each bucket of the histogram. By default, min, max, and count statistics are included by all the functions of the module. The supported types of histograms are equi-width and equi-depth. In an equi-width histogram, domain values are bucketized on intervals of equal ranges. In an equi-depth histogram, the buckets have equal size (in terms of the number of tuples contained in the bucket). The equi-depth histograms are constructed on a best-effort basis and do not guarantee an optimal bucketization.
The module offers functions of 2 types with different interfaces:
Hardcoded number of statistic functions
Here, the number of requested statistics is part of the function name, e.g.
histogram_equiwidth_2[R, 10, F1, F2]
where
R
is a relation10
is the number of bucketsF1
andF2
are two aggregate statistic functions to be applied to the values of the tuples contained in the bucket, such asaverage
orsum
.
The output is a relation with arity , where is the number of requested statistics:
(bucket_id, min_value, max_value, count, F1_stat, ..., Fi_stat)
where
bucket_id
is an integer identifier for the bucket starting from 1min_value
is the minimum value among tuples contained in the bucketmax_value
is the maximum value among tuples contained in the bucketcount
is the number of tuples contained in the bucket- The remaining arguments are the values of the requested statistics, in the order that they were specified.
The maximum supported number of statistic functions that can be requested via this interface is 3.
Example 1
def T = {1; 2; 3; 4; 6; 11}
with histograms use histogram_equiwidth_2
def output = histogram_equiwidth_2[T, 2, average, sum]
(1, 1, 4, 4, 2.5, 10)
(2, 6, 11, 2, 8.5, 17)
Example 2
def T = {1; 2; 3; 4; 6; 11}
with histograms use histogram_equidepth_1
def output = histogram_equidepth_1[T, 2, average]
(1, 1, 3, 3, 2.0)
(2, 4, 11, 3, 7.0)
Arbitrary number of statistic functions given as a module
With this interface, any number of statistics can be requested,
by wrapping them inside a module. For example:
histogram_equiwidth[R, 10, STATS]
where
R
is a relation10
is the number of bucketsSTATS
is a module that contains definitions of statistic functions.
For example:
@inline
module STATS
def myAvg[F] = average[F]
def mySum[F] = sum[F]
end
Because there is no order between the statistic functions of the module,
this relation is ternary, with the middle argument specifying the name
of the statistic function as a string:
(bucket_id, stat_name, stat_value)
where
bucket_id
is an integer identifier for the bucket starting from 1stat_name
is a string that specifies the statistic functionstat_value
is the value of the statistic function applied on the bucket
Note that min, max, and count are included by default via this interface too,
identified by the strings "min"
, "max"
, and "count"
respectively.
Example 1
def T = {1; 2; 3; 4; 6; 11}
with histograms use histogram_equiwidth
@inline
module STATS
def myAvg[F] = average[F]
def mySum[F] = sum[F]
end
def output = histogram_equiwidth[T, 2, STATS]
(1, "myAvg", 2.5)
(2, "myAvg", 8.5)
(1, "max", 4)
(1, "min", 1)
(1, "count", 4)
(1, "mySum", 10)
(2, "max", 11)
(2, "min", 6)
(2, "count", 2)
(2, "mySum", 17)
Example 2
def T = {1; 2; 3; 4; 6; 11}
with histograms use histogram_equidepth
@inline
module STATS
def myAvg[F] = average[F]
end
def output = histogram_equidepth[T, 2, STATS]
(1, "max", 3)
(1, "min", 1)
(1, "count", 3)
(2, "max", 11)
(2, "min", 4)
(2, "count", 3)
(1, "myAvg", 2.0)
(2, "myAvg", 7.0)
histogram_equiwidth_0
View source
Computes an equi-width histogram on (the last argument of) the relation R
with n
buckets.
min, max, count are included by default.
histogram_equiwidth_1
View source
Computes an equi-width histogram on (the last argument of) the relation R
with n
buckets
and computes 1 statistic function F1
on each bucket.
min, max, count are included by default.
histogram_equiwidth_2
View source
Computes an equi-width histogram on (the last argument of) the relation R
with n
buckets
and computes 2 statistic functions F1
and F2
on each bucket.
min, max, count are included by default.
histogram_equiwidth_3
View source
Computes an equi-width histogram on (the last argument of) the relation R
with n
buckets
and computes 3 statistic functions F1
, F2
, and F3
on each bucket.
min, max, count are included by default.
histogram_equiwidth
View source
Computes an equi-width histogram on the last argument of R
with n
buckets
and computes all the statistic functions contained in the module A
on each bucket.
min, max, count are included by default.
histogram_equidepth_0
View source
Computes an equi-depth histogram on the last argument of R
with n
buckets.
min, max, count are included by default.
histogram_equidepth_1
View source
Computes an equi-depth histogram on the last argument of R
with n
buckets
and computes 1 statistic function F1
on each bucket.
min, max, count are included by default.
histogram_equidepth_2
View source
Computes an equi-depth histogram on the last argument of R
with n
buckets
and computes 2 statistic functions F1
and F2
on each bucket.
min, max, count are included by default.
histogram_equidepth_3
View source
Computes an equi-depth histogram on the last argument of R
with n
buckets
and computes 3 statistic functions F1
, F2
, and F3
on each bucket.
min, max, count are included by default.
histogram_equidepth
View source
Computes an equi-depth histogram on the last argument of R
with n
buckets
and computes all the statistic functions contained in the module A
on each bucket.
min, max, count are included by default.