Code Overview¶
Data¶
Module to deal with all matters relating to loading example data sets, which we tune ML models to.

class
bayesmark.data.
ProblemType
[source]¶ The different problem types we consider. Currently, just regression (reg) and classification (clf).

bayesmark.data.
get_problem_type
(dataset_name)[source]¶ Determine if this data set is a regression or classification problem.
 Parameters
dataset_name (str) – Which data set to use. Must be a key in the DATA_LOADERS dict or the name of a custom csv file.
 Returns
problem_type – Enum indicating whether this is a regression or classification data set.
 Return type

bayesmark.data.
load_data
(dataset_name, data_root=None)[source]¶ Load a data set and return it preprocessed into numpy arrays.
 Parameters
 Returns
data (numpy.ndarray of shape (n, d)) – The feature matrix of the data set. It will be a float array.
target (numpy.ndarray of shape (n,)) – The target vector for the problem, which is int for classification and float for regression.
problem_type (bayesmark.data.ProblemType) – Enum indicating whether this is a regression or classification data set.
Expected Max Estimation¶
Compute expected maximum or minimum from iid samples.

bayesmark.expected_max.
expected_max
(x, m)[source]¶ Compute unbiased estimator of expected max(x[1:m]) on a data set.
 Parameters
x (numpy.ndarray of shape (n,)) – Data set we would like expected max(x[1:m]) on.
m (int or numpy.ndarray with dtype int) – This function is for estimating the expected maximum over m iid draws. Require m >= 1. This can be broadcasted. If m > n, the weights will be nan, because there is no way to get an unbiased estimate in that case.
 Returns
E_max_x – Unbiased estimate of mean max of m draws from distribution on x.
 Return type

bayesmark.expected_max.
expected_min
(x, m)[source]¶ Compute unbiased estimator of expected min(x[1:m]) on a data set.
 Parameters
x (numpy.ndarray of shape (n,)) – Data set we would like expected min(x[1:m]) on. Require len(x) >= 1.
m (int or numpy.ndarray with dtype int) – This function is for estimating the expected minimum over m iid draws. Require m >= 1. This can be broadcasted. If m > n, the weights will be nan, because there is no way to get an unbiased estimate in that case.
 Returns
E_min_x – Unbiased estimate of mean min of m draws from distribution on x.
 Return type

bayesmark.expected_max.
get_expected_max_weights
(n, m)[source]¶ Get the L-estimator weights for computing unbiased estimator of expected max(x[1:m]) on a data set.
 Parameters
n (int) – Number of data points in data set len(x). Must be >= 1.
m (int or numpy.ndarray with dtype int) – This function is for estimating the expected maximum over m iid draws. Require m >= 1. This can be broadcasted. If m > n, the weights will be nan, because there is no way to get an unbiased estimate in that case.
 Returns
pdf – The weights for the L-estimator. Will be positive and sum to one.
 Return type
numpy.ndarray, shape (n,)
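As a rough illustration of the idea behind these weights, the sketch below builds an L-estimator in plain Python. It assumes the standard combinatorial form: the i-th smallest of n points is the maximum of a uniformly chosen size-m subset with probability C(i-1, m-1) / C(n, m). The helper names are hypothetical, and this is not necessarily how bayesmark computes its weights.

```python
from math import comb


def expected_max_weights_sketch(n, m):
    """Weights on the sorted sample (ascending) for the expected max of m draws.

    Assumption: raises for m > n, whereas the documented function returns nan.
    """
    if not 1 <= m <= n:
        raise ValueError("this sketch requires 1 <= m <= n")
    total = comb(n, m)
    # comb(i - 1, m - 1) counts the size-m subsets whose largest element
    # is the i-th smallest point (1-based).
    return [comb(i - 1, m - 1) / total for i in range(1, n + 1)]


def expected_max_sketch(x, m):
    """Weighted sum of order statistics: an L-estimator."""
    weights = expected_max_weights_sketch(len(x), m)
    return sum(w * v for w, v in zip(weights, sorted(x)))


weights = expected_max_weights_sketch(4, 2)
estimate = expected_max_sketch([4.0, 1.0, 3.0, 2.0], 2)
```

With m = 1 the weights are uniform and the estimate reduces to the sample mean; with m = n all weight falls on the largest point.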
Experiment Aggregation¶
Aggregate the results of many studies to prepare analysis.

bayesmark.experiment_aggregate.
concat_experiments
(all_experiments, ravel=False)[source]¶ Aggregate the Datasets from a series of experiments into combined Dataset.
 Parameters
all_experiments (typing.Iterable) – Iterable (possibly from a generator) with the Datasets from each experiment. Each item in all_experiments is a pair containing (meta_data, data). See load_experiments for details on these variables.
ravel (bool) – If true, ravel all studies to store batch suggestions as if they were serial.
 Returns
all_perf (xarray.Dataset) – DataArray containing all of the perf_da from the experiments. The metadata from the experiments are included as extra dimensions. all_perf has dimensions (ITER, SUGGEST, TEST_CASE, METHOD, TRIAL). To convert the uuid to a trial, there must be an equal number of repetitions in the experiments for each TEST_CASE, METHOD combination. Likewise, all of the experiments need an equal number of ITER and SUGGEST. If ravel is true, then the SUGGEST dimension is singleton.
all_time (xarray.Dataset) – Dataset containing all of the time_ds from the experiments. The new dimensions are (ITER, TEST_CASE, METHOD, TRIAL). It has the same variables as time_ds.
all_suggest (xarray.Dataset) – DataArray containing all of the suggest_ds from the experiments. It has dimensions (ITER, SUGGEST, TEST_CASE, METHOD, TRIAL).
all_sigs (dict(str, list(list(float)))) – Aggregate of all experiment signatures.

bayesmark.experiment_aggregate.
load_experiments
(uuid_list, db_root, dbid)[source]¶ Generator to load the results of the experiments.
 Parameters
 Yields
meta_data ((str, str, str)) – The meta_data contains a tuple of str with test_case, optimizer, uuid.
data ((xarray.Dataset, xarray.Dataset, xarray.Dataset, list(float))) – The data contains a tuple of (perf_ds, time_ds, suggest_ds, sig). The perf_ds is an xarray.Dataset containing the evaluation results with dimensions (ITER, SUGGEST); each variable is an objective. The time_ds is an xarray.Dataset containing the timing results of the form accepted by summarize_time. The coordinates must be compatible with perf_ds. The suggest_ds is an xarray.Dataset containing the inputs to the function evaluations. Each variable is a function input. Finally, sig contains the test_case signature and must be list(float).

bayesmark.experiment_aggregate.
summarize_time
(all_time)[source]¶ Transform a single timing dataset from an experiment into a form better for aggregation.
 Parameters
all_time (xarray.Dataset) – Dataset with variables (SUGGEST_PHASE, EVAL_PHASE, OBS_PHASE) which have dimensions (ITER,), (ITER, SUGGEST), and (ITER,), respectively. The variable EVAL_PHASE has the function evaluation time for each parallel suggestion.
 Returns
time_summary – Dataset with variables (SUGGEST_PHASE, OBS_PHASE, EVAL_PHASE_MAX, EVAL_PHASE_SUM) which all have dimensions (ITER,). The maximum EVAL_PHASE_MAX is relevant for wall clock time, while EVAL_PHASE_SUM is relevant for CPU time.
 Return type
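To make the max/sum distinction concrete, here is a minimal sketch that uses nested lists in place of an xarray.Dataset. The function name and the dict layout are illustrative assumptions; only the variable names come from the description above.

```python
def summarize_eval_time_sketch(eval_time):
    """Collapse per-suggestion evaluation times to one value per iteration.

    eval_time[i][j] is the evaluation time of suggestion j at iteration i.
    """
    return {
        # Parallel suggestions finish when the slowest one does (wall clock).
        "EVAL_PHASE_MAX": [max(row) for row in eval_time],
        # Total compute spent across all suggestions (CPU time).
        "EVAL_PHASE_SUM": [sum(row) for row in eval_time],
    }


summary = summarize_eval_time_sketch([[1.0, 3.0], [2.0, 2.5]])
```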
Experiment Analysis¶
Perform analysis to compare different optimizers across problems.

bayesmark.experiment_analysis.
compute_aggregates
(perf_da, baseline_ds, visible_perf_da=None)[source]¶ Aggregate function evaluations in the experiments to get performance summaries of each method.
 Parameters
perf_da (xarray.DataArray) – Aggregate experimental results with each function evaluation in the experiments according to true loss (e.g., generalization). perf_da has dimensions (ITER, SUGGEST, TEST_CASE, METHOD, TRIAL) and is assumed to have no nan values.
baseline_ds (xarray.Dataset) – Dataset with baseline performance. It has variables (PERF_MED, PERF_MEAN, PERF_CLIP, PERF_BEST) with dimensions (ITER, TEST_CASE), (ITER, TEST_CASE), (TEST_CASE,), and (TEST_CASE,), respectively. PERF_MED is a baseline of performance based on random search when using medians to summarize performance. Likewise, PERF_MEAN is for means. PERF_CLIP is an upper bound to clip poor performance when using the mean. PERF_BEST is an estimate of the global minimum.
visible_perf_da (xarray.DataArray) – Aggregate experimental results with each function evaluation in the experiments according to visible loss (e.g., validation). visible_perf_da has dimensions (ITER, SUGGEST, TEST_CASE, METHOD, TRIAL) and is assumed to have no nan values. If None, we set visible_perf_da = perf_da.
 Returns
agg_result (xarray.Dataset) – Dataset with summary of performance for each method and test case combination. Contains variables (PERF_MED, LB_MED, UB_MED, NORMED_MED, PERF_MEAN, LB_MEAN, UB_MEAN, NORMED_MEAN), each with dimensions (ITER, METHOD, TEST_CASE). PERF_MED is a median summary of performance with LB_MED and UB_MED as error bars. NORMED_MED is a rescaled PERF_MED so we expect the optimal performance is 0, and random search gives 1 at all ITER. Likewise, PERF_MEAN, LB_MEAN, UB_MEAN, NORMED_MEAN are for mean performance.
summary (xarray.Dataset) – Dataset with overall summary of performance of each method. Contains variables (PERF_MED, LB_MED, UB_MED, PERF_MEAN, LB_MEAN, UB_MEAN), each with dimensions (ITER, METHOD).

bayesmark.experiment_analysis.
get_perf_array
(evals, evals_visible)[source]¶ Get the actual loss (e.g., generalization) over iterations.
 Parameters
evals (numpy.ndarray of shape (n_iter, n_batch, n_trials)) – The actual loss (e.g., generalization) for a given experiment.
evals_visible (numpy.ndarray of shape (n_iter, n_batch, n_trials)) – The observable loss (e.g., validation) for a given experiment.
 Returns
perf_array – The best performance so far at iteration i from evals, where the best has been selected according to evals_visible.
 Return type
numpy.ndarray of shape (n_iter, n_trials)
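The selection rule described here can be sketched for a single trial with nested lists. This is an illustrative reading of the description, not the library's implementation: the point with the lowest visible loss seen so far is tracked, and its actual loss is what gets reported.

```python
def best_so_far_sketch(evals, evals_visible):
    """Actual loss of the point that looks best on the visible loss so far.

    evals[i][j] and evals_visible[i][j] are the actual and visible losses of
    suggestion j at iteration i, for one trial.
    """
    best_visible = float("inf")
    best_actual = None
    perf = []
    for actual_row, visible_row in zip(evals, evals_visible):
        for actual, visible in zip(actual_row, visible_row):
            if visible < best_visible:
                best_visible, best_actual = visible, actual
        perf.append(best_actual)
    return perf


perf = best_so_far_sketch(
    evals=[[0.5, 0.9], [0.4, 0.7], [0.8, 0.2]],
    evals_visible=[[0.6, 0.3], [0.35, 0.5], [0.1, 0.4]],
)
```

Here the reported value is not the minimum of evals, because the optimizer can only pick points using the visible loss.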
Experiment Baseline¶
Build performance baselines from aggregate results to prepare analysis.

bayesmark.experiment_baseline.
compute_baseline
(perf_da)[source]¶ Compute a performance baseline of base and best performance from the aggregate experimental results.
 Parameters
perf_da (xarray.DataArray) – Aggregate experimental results with each function evaluation in the experiments. perf_da has dimensions (ITER, SUGGEST, TEST_CASE, METHOD, TRIAL) and is assumed to have no nan values.
 Returns
baseline_ds – Dataset with baseline performance. It has variables (PERF_MED, PERF_MEAN, PERF_CLIP, PERF_BEST) with dimensions (ITER, TEST_CASE), (ITER, TEST_CASE), (TEST_CASE,), and (TEST_CASE,), respectively. PERF_MED is a baseline of performance based on random search when using medians to summarize performance. Likewise, PERF_MEAN is for means. PERF_CLIP is an upper bound to clip poor performance when using the mean. PERF_BEST is an estimate of the global minimum.
 Return type
Experiment Launcher¶
Launch studies in separate processes, or do a dry run to build a jobs file with lists of commands to run.

bayesmark.experiment_launcher.
arg_safe_str
(val)[source]¶ Cast value as str, raise error if not safe as argument to argparse.

bayesmark.experiment_launcher.
dry_run
(args, opt_file_lookup, run_uuid, fp, random=<mtrand.RandomState object>)[source]¶ Write to buffer description of commands for running all experiments.
This function is almost pure by writing to a buffer, but it could be switched to a generator.
 Parameters
args (dict(CmdArgs, [int, str])) – Arguments of options to pass to the experiments being launched. The keys correspond to the same arguments passed to this program.
opt_file_lookup (dict(str, str)) – Mapping from method name to filename containing wrapper class for the method.
run_uuid (uuid.UUID) – UUID for this launcher run. Needed to generate different experiment UUIDs on each call. This function is deterministic provided the same run_uuid.
fp (writable buffer) – File handle to write out the sequence of commands (broken into jobs on each line) that execute all the experiments (possibly each job in parallel).
random (RandomState) – Random stream to use for reproducibility.

bayesmark.experiment_launcher.
gen_commands
(args, opt_file_lookup, run_uuid)[source]¶ Generator providing commands to launch processes for experiments.
 Parameters
args (dict(CmdArgs, [int, str])) – Arguments of options to pass to the experiments being launched. The keys correspond to the same arguments passed to this program.
opt_file_lookup (dict(str, str)) – Mapping from method name to filename containing wrapper class for the method.
run_uuid (uuid.UUID) – UUID for this launcher run. Needed to generate different experiment UUIDs on each call. This function is deterministic provided the same run_uuid.
 Yields
iteration_key ((str, str, str, str)) – Tuple containing (trial, classifier, data, optimizer) to index the experiment.
full_cmd (tuple(str)) – Strings containing command and arguments to run a process with experiment. Join with whitespace or use util.shell_join() to get string with executable command. The command omits --opt-root, which means it will default to . if the command is executed. As such, the command assumes it is executed with --opt-root as the working directory.
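Joining a command tuple into a shell-safe string can be illustrated with the standard library alone. The sketch below uses shlex.join as a stand-in for util.shell_join(), whose exact behavior is not described here, and the command tuple is a hypothetical placeholder rather than real launcher output.

```python
import shlex

# Hypothetical command tuple; real tuples are yielded by gen_commands.
full_cmd = ("echo", "hello world")

# Quotes each argument as needed so the result is safe to paste into a shell.
cmd_str = shlex.join(full_cmd)
```

Joining with plain whitespace would split "hello world" into two arguments, which is why a quoting-aware join is the safer choice when arguments may contain spaces.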

bayesmark.experiment_launcher.
real_run
(args, opt_file_lookup, run_uuid, timeout=None)[source]¶ Run sequence of independent experiments to fully run the benchmark.
This uses subprocess to launch a separate process (in serial) for each experiment.
 Parameters
args (dict(CmdArgs, [int, str])) – Arguments of options to pass to the experiments being launched. The keys correspond to the same arguments passed to this program.
opt_file_lookup (dict(str, str)) – Mapping from method name to filename containing wrapper class for the method.
run_uuid (uuid.UUID) – UUID for this launcher run. Needed to generate different experiment UUIDs on each call. This function is deterministic provided the same run_uuid.
timeout (int) – Max seconds per experiment.
Experiment¶
Perform a study.

bayesmark.experiment.
build_eval_ds
(function_evals, objective_names)[source]¶ Convert numpy.ndarray with function evaluations to xarray.Dataset.
This function is a data cleanup routine after running an experiment, before serializing the data to end the study.
 Parameters
function_evals (numpy.ndarray of shape (n_calls, n_suggestions, n_obj)) – Value of objective for each evaluation.
objective_names (list(str) of shape (n_obj,)) – The names of each objective.
 Returns
eval_ds – xarray.Dataset containing one variable for each objective with the objective function evaluations. It has dimensions (ITER, SUGGEST).
 Return type

bayesmark.experiment.
build_suggest_ds
(suggest_log)[source]¶ Convert numpy.ndarray with function evaluation inputs to xarray.Dataset.
This function is a data cleanup routine after running an experiment, before serializing the data to end the study.

bayesmark.experiment.
build_timing_ds
(suggest_time, eval_time, observe_time)[source]¶ Convert numpy.ndarray with timing evaluations to xarray.Dataset.
This function is a data cleanup routine after running an experiment, before serializing the data to end the study.
 Parameters
suggest_time (numpy.ndarray of shape (n_calls,)) – The time to make each (batch) suggestion.
eval_time (numpy.ndarray of shape (n_calls, n_suggestions)) – The time for each evaluation of the objective function.
observe_time (numpy.ndarray of shape (n_calls,)) – The time for each (batch) evaluation of the objective function, and the time to make an observe call.
 Returns
time_ds – Dataset with variables (SUGGEST_PHASE, EVAL_PHASE, OBS_PHASE) which have dimensions (ITER,), (ITER, SUGGEST), and (ITER,), respectively. The variable EVAL_PHASE has the function evaluation time for each parallel suggestion.
 Return type

bayesmark.experiment.
get_objective_signature
(model_name, dataset, scorer, data_root=None)[source]¶ Get signature of an objective function specified by an sklearn model and dataset.
This routine specializes signatures.get_func_signature() for the sklearn study case.
 Parameters
model_name (str) – Which sklearn model we are attempting to tune, must be an element of constants.MODEL_NAMES.
dataset (str) – Which data set the model is being tuned to, which must be either a) an element of constants.DATA_LOADER_NAMES, or b) the name of a csv file in the data_root folder for a custom data set.
scorer (str) – Which metric to use when evaluating the model. This must be an element of sklearn_funcs.SCORERS_CLF for classification models, or sklearn_funcs.SCORERS_REG for regression models.
data_root (str) – Absolute path to folder containing custom data sets. This may be None if no custom data sets are used.
 Returns
signature – The signature of this test function.
 Return type

bayesmark.experiment.
load_optimizer_kwargs
(optimizer_name, opt_root)[source]¶ Load the kwarg options for this optimizer being tested.
This is part of the general experiment setup before a study.
 Parameters
 Returns
kwargs – The kwargs setting to pass into the optimizer wrapper constructor.
 Return type

bayesmark.experiment.
main
()[source]¶ This is where experiments happen. Usually called by the experiment launcher.

bayesmark.experiment.
run_sklearn_study
(opt_class, opt_kwargs, model_name, dataset, scorer, n_calls, n_suggestions, data_root=None, callback=None)[source]¶ Run a study for a single optimizer on a single sklearn model/data set combination.
This routine is meant for benchmarking when tuning sklearn models, as opposed to the more general run_study().
 Parameters
opt_class (abstract_optimizer.AbstractOptimizer) – Type of wrapper optimizer. Must be a subclass of abstract_optimizer.AbstractOptimizer.
opt_kwargs (kwargs) – kwargs to use when instantiating the wrapper class.
model_name (str) – Which sklearn model we are attempting to tune, must be an element of constants.MODEL_NAMES.
dataset (str) – Which data set the model is being tuned to, which must be either a) an element of constants.DATA_LOADER_NAMES, or b) the name of a csv file in the data_root folder for a custom data set.
scorer (str) – Which metric to use when evaluating the model. This must be an element of sklearn_funcs.SCORERS_CLF for classification models, or sklearn_funcs.SCORERS_REG for regression models.
n_calls (int) – How many iterations of minimization to run.
n_suggestions (int) – How many parallel evaluations we run each iteration. Must be >= 1.
data_root (str) – Absolute path to folder containing custom data sets. This may be None if no custom data sets are used.
callback (callable) – Optional callback taking the current best function evaluation, and the number of iterations finished. Takes array of shape (n_obj,).
 Returns
function_evals (numpy.ndarray of shape (n_calls, n_suggestions, n_obj)) – Value of objective for each evaluation.
timing_evals ((numpy.ndarray, numpy.ndarray, numpy.ndarray)) – Tuple of 3 timing results: (suggest_time, eval_time, observe_time) with shapes (n_calls,), (n_calls, n_suggestions), and (n_calls,). These are the time to make each suggestion, the time for each evaluation of the objective function, and the time to make an observe call.
suggest_log (list(list(dict(str, object)))) – Log of the suggestions corresponding to the function_evals.

bayesmark.experiment.
run_study
(optimizer, test_problem, n_calls, n_suggestions, n_obj=1, callback=None)[source]¶ Run a study for a single optimizer on a single test problem.
This function can be used for benchmarking on general stateless objectives (not just sklearn).
 Parameters
optimizer (abstract_optimizer.AbstractOptimizer) – Instance of one of the wrapper optimizers.
test_problem (sklearn_funcs.TestFunction) – Instance of test function to attempt to minimize.
n_calls (int) – How many iterations of minimization to run.
n_suggestions (int) – How many parallel evaluations we run each iteration. Must be >= 1.
n_obj (int) – Number of different objectives measured, only objective 0 is seen by optimizer. Must be >= 1.
callback (callable) – Optional callback taking the current best function evaluation, and the number of iterations finished. Takes array of shape (n_obj,).
 Returns
function_evals (numpy.ndarray of shape (n_calls, n_suggestions, n_obj)) – Value of objective for each evaluation.
timing_evals ((numpy.ndarray, numpy.ndarray, numpy.ndarray)) – Tuple of 3 timing results: (suggest_time, eval_time, observe_time) with shapes (n_calls,), (n_calls, n_suggestions), and (n_calls,). These are the time to make each suggestion, the time for each evaluation of the objective function, and the time to make an observe call.
suggest_log (list(list(dict(str, object)))) – Log of the suggestions corresponding to the function_evals.
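The shape of such a study loop can be sketched in plain Python. Everything below is an illustrative assumption: the stand-in optimizer class, its suggest/observe method names, and the single-objective simplification (so there is no n_obj axis). It is not the bayesmark implementation.

```python
import random
import time


class RandomOptimizerSketch:
    """Hypothetical stand-in optimizer with suggest/observe methods."""

    def __init__(self, low, high, seed=0):
        self.low, self.high = low, high
        self.rng = random.Random(seed)

    def suggest(self, n_suggestions):
        return [{"x": self.rng.uniform(self.low, self.high)}
                for _ in range(n_suggestions)]

    def observe(self, X, y):
        pass  # random search ignores past observations


def run_study_sketch(optimizer, objective, n_calls, n_suggestions):
    """Suggest, evaluate, observe; time each phase separately."""
    function_evals, suggest_log = [], []
    suggest_time, eval_time, observe_time = [], [], []
    for _ in range(n_calls):
        tic = time.perf_counter()
        batch = optimizer.suggest(n_suggestions)
        suggest_time.append(time.perf_counter() - tic)

        values, durations = [], []
        for params in batch:
            tic = time.perf_counter()
            values.append(objective(params))
            durations.append(time.perf_counter() - tic)

        tic = time.perf_counter()
        optimizer.observe(batch, values)
        observe_time.append(time.perf_counter() - tic)

        function_evals.append(values)
        eval_time.append(durations)
        suggest_log.append(batch)
    return function_evals, (suggest_time, eval_time, observe_time), suggest_log


evals, timing, log = run_study_sketch(
    RandomOptimizerSketch(-1.0, 1.0), lambda p: p["x"] ** 2,
    n_calls=3, n_suggestions=2,
)
```

The returned lists mirror the documented shapes: (n_calls, n_suggestions) for the evaluations and evaluation times, and (n_calls,) for the suggest and observe times.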
Function Signatures¶
Routines to compute and compare the “signatures” of objective functions. These are useful to make sure two different studies were actually optimizing the same objective function (even if they say the same test case in the metadata).

bayesmark.signatures.
analyze_signature_pair
(signatures, signatures_ref)[source]¶ Analyze a pair of signatures (often from two sets of experiments) and return the error between them.
 Parameters
signatures (dict(str, list(float))) – Signatures from set of experiments. The signatures must all be the same length, so it should be 2D array like.
signatures_ref (dict(str, list(float))) – The signatures from a reference set of experiments. The keys in signatures must be a subset of the keys in signatures_ref.
 Returns
sig_errs (pandas.DataFrame) – Rows are test cases, columns are test points.
signatures_median (dict(str, list(float))) – Median signature across all repetitions per test case.

bayesmark.signatures.
analyze_signatures
(signatures)[source]¶ Analyze function signatures from the experiment.
 Parameters
signatures (dict(str, list(list(float)))) – The signatures should all be the same length, so it should be 2D array like.
 Returns
sig_errs (pandas.DataFrame) – Rows are test cases, columns are test points.
signatures_median (dict(str, list(float))) – Median signature across all repetitions per test case.

bayesmark.signatures.
get_func_signature
(f, api_config)[source]¶ Get the function signature for an objective function in an experiment.
 Parameters
f (typing.Callable) – The objective function we want to compute the signature of. This function must take inputs in the form of dict(str, object) with one dictionary key per variable, and provide float as the output.
api_config (dict(str, dict)) – Configuration of the optimization variables. See API description.
 Returns
signature_x (list(dict(str, object)) of shape (n_suggest,)) – The input locations probed on signature call.
signature_y (list(float) of shape (n_suggest,)) – The objective function values at the input points. This is the real signature.
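The idea of a signature can be sketched as follows: probe the objective at a fixed, reproducible set of inputs and compare the outputs across studies. The single variable "x", the (low, high) range arguments, and both helper names are hypothetical simplifications of the api_config mechanism, which is not described in this section.

```python
import math
import random


def func_signature_sketch(f, low, high, n_suggest=5, seed=0):
    """Evaluate f at seeded probe points so repeated calls are comparable."""
    rng = random.Random(seed)
    signature_x = [{"x": rng.uniform(low, high)} for _ in range(n_suggest)]
    signature_y = [float(f(point)) for point in signature_x]
    return signature_x, signature_y


def same_objective_sketch(sig_a, sig_b, rel_tol=1e-9):
    """Two signatures match if all probe values agree within tolerance."""
    return len(sig_a) == len(sig_b) and all(
        math.isclose(a, b, rel_tol=rel_tol) for a, b in zip(sig_a, sig_b)
    )


def objective(point):
    return (point["x"] - 1.0) ** 2


def shifted_objective(point):
    return (point["x"] - 1.0) ** 2 + 0.1


_, sig_1 = func_signature_sketch(objective, -5.0, 5.0)
_, sig_2 = func_signature_sketch(objective, -5.0, 5.0)
_, sig_3 = func_signature_sketch(shifted_objective, -5.0, 5.0)
```

Because the probe points are seeded, two studies that report the same test case but optimize different functions end up with different signatures.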
Numpy Util¶
Utilities that could be included in numpy but aren’t.

bayesmark.np_util.
clip_chk
(x, lb, ub, allow_nan=False)[source]¶ Clip all elements of x to be between lb and ub like numpy.clip(), but also check numpy.isclose().
Shapes of all input variables must be broadcast compatible.
 Parameters
x (numpy.ndarray) – Array containing elements to clip.
lb (numpy.ndarray) – Lower limit in clip.
ub (numpy.ndarray) – Upper limit in clip.
allow_nan (bool) – If true, we allow nan to be present in x without raising an error.
 Returns
x – An array with the elements of x, but where values < lb are replaced with lb, and those > ub with ub.
 Return type

bayesmark.np_util.
cummin
(x_val, x_key)[source]¶ Get the cumulative minimum of x_val when ranked according to x_key.
 Parameters
x_val (numpy.ndarray of shape (n, d)) – The array to get the cumulative minimum of along axis 0.
x_key (numpy.ndarray of shape (n, d)) – The array for ranking elements as to what is the minimum.
 Returns
c_min – The cumulative minimum array.
 Return type
numpy.ndarray of shape (n, d)

bayesmark.np_util.
isclose_lte
(x, y)[source]¶ Check that less than or equal to (lte, x <= y) is approximately true between all elements of x and y.
This is similar to numpy.allclose() for equality. Shapes of all input variables must be broadcast compatible.
 Parameters
x (numpy.ndarray) – Lower limit in <= check.
y (numpy.ndarray) – Upper limit in <= check.
 Returns
lte – True if x <= y is approximately true elementwise.
 Return type

bayesmark.np_util.
linear_rescale
(X, lb0, ub0, lb1, ub1, enforce_bounds=True)[source]¶ Linearly transform all elements of X, bounded between lb0 and ub0, to be between lb1 and ub1.
Shapes of all input variables must be broadcast compatible.
 Parameters
X (numpy.ndarray) – Array containing elements to rescale.
lb0 (numpy.ndarray) – Current lower bound of X.
ub0 (numpy.ndarray) – Current upper bound of X.
lb1 (numpy.ndarray) – Desired lower bound of X.
ub1 (numpy.ndarray) – Desired upper bound of X.
enforce_bounds (bool) – If True, perform input bounds check (and clipping if slight violation) on the input X and again on the output. This argument is not meant to be vectorized like the other input variables.
 Returns
X – Elements of input X after linear rescaling.
 Return type
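A scalar sketch of the rescaling, including a tolerance-based bounds check, might look like the following. The helper names and the tolerance value are assumptions for illustration, and the library version operates on broadcast-compatible arrays rather than scalars.

```python
import math


def clip_checked_sketch(x, lb, ub, abs_tol=1e-9):
    """Clip x into [lb, ub], tolerating only violations within float error."""
    if x < lb and not math.isclose(x, lb, abs_tol=abs_tol):
        raise ValueError("x is below the lower bound")
    if x > ub and not math.isclose(x, ub, abs_tol=abs_tol):
        raise ValueError("x is above the upper bound")
    return min(max(x, lb), ub)


def linear_rescale_sketch(x, lb0, ub0, lb1, ub1, enforce_bounds=True):
    """Map x from [lb0, ub0] onto [lb1, ub1] with an affine transform."""
    if enforce_bounds:
        x = clip_checked_sketch(x, lb0, ub0)
    y = lb1 + (x - lb0) * (ub1 - lb1) / (ub0 - lb0)
    if enforce_bounds:
        y = clip_checked_sketch(y, lb1, ub1)
    return y


midpoint = linear_rescale_sketch(5.0, 0.0, 10.0, -1.0, 1.0)
```

Checking the bounds on both input and output catches values that drift slightly outside the range through floating point error, while still failing loudly on real violations.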

bayesmark.np_util.
random_seed
(random=<mtrand.RandomState object>)[source]¶ Draw a random seed compatible with numpy.random.RandomState.
 Parameters
random (numpy.random.RandomState) – Random stream to use to draw the random seed.
 Returns
seed – Seed for a new random stream in [0, 2**32 - 1).
 Return type

bayesmark.np_util.
shuffle_2d
(X, random=<mtrand.RandomState object>)[source]¶ Generalization of numpy.random.shuffle() of 2D array.
Performs in-place shuffling of X, so it has no return value.
 Parameters
X (numpy.ndarray of shape (n, m)) – Array-like 2D data to shuffle in place. Shuffles order of rows and order of elements within a row.
random (numpy.random.RandomState) – Random stream to use to draw the random seed.

bayesmark.np_util.
snap_to
(x, fixed_val=None)[source]¶ Snap input x to the fixed_val unless fixed_val is None, where x is returned.
 Parameters
x (numpy.ndarray) – Array containing elements to snap.
fixed_val (numpy.ndarray or None) – Values to be returned if x is close, otherwise an error is raised. If fixed_val is None, x is returned.
 Returns
fixed_val – Snapped to value of x.
 Return type

bayesmark.np_util.
strat_split
(X, n_splits, inplace=False, random=<mtrand.RandomState object>)[source]¶ Make a stratified random split of items.
 Parameters
X (numpy.ndarray of shape (n, m)) – Data we would like to split randomly into groups. We should get the same number +/- 1 of elements from each row in each group.
n_splits (int) – How many groups we want to split into.
inplace (bool) – If true, this function will cause in-place modifications to X.
random (numpy.random.RandomState) – Random stream to use for reproducibility.
 Returns
Y – Stratified split of X where each row of Y contains the same number +/- 1 of elements from each row of X. Must be a list of arrays since each row may have a different length.
 Return type
list(numpy.ndarray)
Path Util¶
Utilities handy for manipulating paths that have extra checks not included in os.path.

bayesmark.path_util.
absopen
(path, mode)[source]¶ Safe version of the built-in open() that only opens absolute paths.

bayesmark.path_util.
abspath
(path, verify=True)[source]¶ Combo of os.path.abspath() and os.path.expanduser() that will also check existence of directory.

bayesmark.path_util.
join_safe_r
(*args)[source]¶ Safe version of os.path.join() that checks resulting path is absolute and the file exists for reading.

bayesmark.path_util.
join_safe_w
(*args)[source]¶ Safe version of os.path.join() that checks resulting path is absolute.
Because this routine is for writing, if the file already exists, a warning is raised.
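The kind of check these helpers add on top of os.path can be sketched with the standard library. The function below is a hypothetical analogue of the write-side join; the exception type and warning message are assumptions, since the section does not state what the library raises.

```python
import os
import tempfile
import warnings


def join_for_write_sketch(*args):
    """Join path parts, require an absolute result, warn on overwrite."""
    path = os.path.join(*args)
    if not os.path.isabs(path):
        raise ValueError("expected an absolute path, got: " + path)
    if os.path.exists(path):
        warnings.warn("file already exists and may be overwritten: " + path)
    return path


# The file name is a placeholder for illustration only.
target = join_for_write_sketch(tempfile.gettempdir(), "sketch_output.json")
```

Rejecting relative paths up front avoids results that silently depend on the current working directory.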
Quantile Estimation¶
Compute quantiles and confidence intervals.

bayesmark.quantiles.
max_quantile_CI
(X, q, m, alpha=0.05)[source]¶ Calculate CI on q quantile of distribution on max of m iid samples using a data set X.
This uses nonparametric estimation from order statistics and will have alpha level of at most alpha due to the discrete nature of order statistics.
 Parameters
X (numpy.ndarray of shape (n,)) – Data for quantile estimation. Can be vectorized. Must be sortable data type (which is almost everything).
q (float) – Quantile to compute, must be in (0, 1). Can be vectorized.
m (int) – Compute statistics for distribution on max over m samples. Must be >= 1. Can be vectorized.
alpha (float) – False positive rate we allow for CI, must be in (0, 1). Can be vectorized.
 Returns
estimate (dtype of X, scalar) – Best estimate on q quantile on max over m iid samples.
LB (dtype of X, scalar) – Lower end on CI
UB (dtype of X, scalar) – Upper end on CI

bayesmark.quantiles.
min_quantile_CI
(X, q, m, alpha=0.05)[source]¶ Calculate confidence interval on q quantile of distribution on min of m iid samples using a data set X.
This uses nonparametric estimation from order statistics and will have alpha level of at most alpha due to the discrete nature of order statistics.
 Parameters
X (numpy.ndarray of shape (n,)) – Data for quantile estimation. Can be vectorized. Must be sortable data type (which is almost everything).
q (float) – Quantile to compute, must be in (0, 1). Can be vectorized.
m (int) – Compute statistics for distribution on min over m samples. Must be >= 1. Can be vectorized.
alpha (float) – False positive rate we allow for CI, must be in (0, 1). Can be vectorized.
 Returns
estimate (dtype of X, scalar) – Best estimate on q quantile on min over m iid samples.
LB (dtype of X, scalar) – Lower end on CI
UB (dtype of X, scalar) – Upper end on CI

bayesmark.quantiles.
order_stats
(X)[source]¶ Compute order statistics on sample X.
Follows convention that order statistic 1 is minimum and statistic n is maximum. Therefore, array elements 0 and n+1 are -inf and +inf.
 Parameters
X (numpy.ndarray of shape (n,)) – Data for order statistics. Can be vectorized. Must be sortable data type (which is almost everything).
 Returns
o_stats – Order statistics on X.
 Return type
numpy.ndarray of shape (n+2,)
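The padding convention is easy to see in a small list-based sketch (the function name is hypothetical):

```python
from math import inf


def order_stats_sketch(X):
    """Sorted sample padded so that index k holds the k-th order statistic."""
    return [-inf] + sorted(X) + [inf]


o_stats = order_stats_sketch([3.0, 1.0, 2.0])
```

With this layout, o_stats[1] is the minimum and o_stats[n] is the maximum, so 1-based order statistic indices can be used directly, and indices 0 and n+1 give unbounded interval endpoints.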

bayesmark.quantiles.
quantile
(X, q)[source]¶ Compute the q-th quantile of X.
Similar to numpy.percentile() except that it matches the mathematical definition of a quantile and q is scaled in (0,1) rather than (0,100).
 Parameters
X (numpy.ndarray of shape (n,)) – Data for quantile estimation. Can be vectorized. Must be sortable data type (which is almost everything).
q (float) – Quantile to compute, must be in (0, 1). Can be vectorized.
 Returns
estimate – Empirical q quantile from sample X.
 Return type
dtype of X, scalar
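One common reading of "the mathematical definition of a quantile" is the smallest sample value x whose empirical CDF satisfies F(x) >= q, which is the order statistic with 1-based index ceil(n * q). The sketch below assumes that definition; the function name is hypothetical and the library may handle edge cases differently.

```python
from math import ceil


def quantile_sketch(X, q):
    """Smallest sample value whose empirical CDF is at least q."""
    if not 0 < q < 1:
        raise ValueError("q must be in (0, 1)")
    ordered = sorted(X)
    k = max(1, ceil(len(ordered) * q))  # 1-based order statistic index
    return ordered[k - 1]


median_odd = quantile_sketch([5, 1, 3, 2, 4], 0.5)
median_even = quantile_sketch([1, 2, 3, 4], 0.5)
```

For the even-sized sample this returns an actual data point (2), whereas numpy.percentile with its default linear interpolation returns 2.5.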

bayesmark.quantiles.
quantile_CI
(X, q, alpha=0.05)[source]¶ Calculate CI on q quantile from sample X using nonparametric estimation from order statistics.
This will have alpha level of at most alpha due to the discrete nature of order statistics.
 Parameters
X (numpy.ndarray of shape (n,)) – Data for quantile estimation. Can be vectorized. Must be sortable data type (which is almost everything).
q (float) – Quantile to compute, must be in (0, 1). Can be vectorized.
alpha (float) – False positive rate we allow for CI, must be in (0, 1). Can be vectorized.
 Returns
LB (dtype of X, scalar) – Lower end on CI
UB (dtype of X, scalar) – Upper end on CI
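A textbook way to build such an interval uses the fact that the number of samples at or below the true q quantile follows a Binomial(n, q) distribution for continuous data. The sketch below picks equal-tailed order statistic indices under that assumption. It is one standard construction, offered as an illustration, and may differ from the library's exact choice of endpoints.

```python
from math import comb, inf


def binom_cdf_sketch(k, n, p):
    """P(B <= k) for B ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j)
               for j in range(min(k, n) + 1))


def quantile_ci_sketch(X, q, alpha=0.05):
    """Equal-tailed order-statistic CI on the q quantile."""
    n = len(X)
    o_stats = [-inf] + sorted(X) + [inf]  # index k is the k-th order statistic
    # Largest lower index that still leaves at most alpha/2 in the lower tail.
    lower = max(k for k in range(n + 1)
                if binom_cdf_sketch(k - 1, n, q) <= alpha / 2)
    # Smallest upper index that leaves at most alpha/2 in the upper tail.
    upper = min(k for k in range(1, n + 2)
                if binom_cdf_sketch(k - 1, n, q) >= 1 - alpha / 2)
    return o_stats[lower], o_stats[upper]


lb, ub = quantile_ci_sketch(list(range(1, 11)), 0.5)
```

Because only n + 2 candidate endpoints exist, the achieved error rate is usually strictly below alpha, and with very small samples the interval can be unbounded.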

bayesmark.quantiles.
quantile_and_CI
(X, q, alpha=0.05)[source]¶ Calculate the q quantile and its CI from sample X using nonparametric estimation from order statistics.
This will have alpha level of at most alpha due to the discrete nature of order statistics.
 Parameters
X (numpy.ndarray of shape (n,)) – Data for quantile estimation. Can be vectorized. Must be sortable data type (which is almost everything).
q (float) – Quantile to compute, must be in (0, 1). Can be vectorized.
alpha (float) – False positive rate we allow for CI, must be in (0, 1). Can be vectorized.
 Returns
estimate (dtype of X, scalar) – Empirical q quantile from sample X.
LB (dtype of X, scalar) – Lower end on CI
UB (dtype of X, scalar) – Upper end on CI
Random Search¶
A baseline random search in our standardized optimizer interface. Useful for baselines.

bayesmark.random_search.
suggest_dict
(X, y, meta, n_suggestions=1, random=<mtrand.RandomState object>)[source]¶ Stateless function to create suggestions for next query point in random search optimization.
This implements the API for general structures of different data types.
 Parameters
X (list(dict)) – Places where the objective function has already been evaluated. Not actually used in random search.
y (numpy.ndarray, shape (n,)) – Corresponding values where objective has been evaluated. Not actually used in random search.
meta (dict(str, dict)) – Configuration of the optimization variables. See API description.
n_suggestions (int) – Desired number of parallel suggestions in the output.
random (numpy.random.RandomState) – Optionally pass in random stream for reproducibility.
 Returns
next_guess – List of n_suggestions suggestions to evaluate the objective function. Each suggestion is a dictionary where each key corresponds to a parameter being optimized.
 Return type
Serialization¶
A serialization abstraction layer (SAL) to save and load experimental results. All IO of experimental results should go through this module. This makes changing the backend (between different databases) transparent to the benchmark code.

class
bayesmark.serialize.
XRSerializer
[source]¶ Serialization layer when saving and loading xarray datasets (currently) as json.

get_uuids
(db, key)[source]¶ List the UUIDs for the versions of a variable (non-derived key) available in the database.

init_db
(keys, db=None, exist_ok=True)[source]¶ Initialize a “database” for storing data at the specified location.
 Parameters
db_root (str) – Absolute path to the database.
keys (list(str)) – The variable names (or keys) we will store in the database for non-derived data.
db (str) – The name of the database. If None, a non-conflicting name will be generated.
exist_ok (bool) – If true, do not raise an error if this database already exists.
 Returns
db – The name of the database.
 Return type

init_db_manual
(keys, db)[source]¶ Instructions for how one would manually initialize the “database” on another system.

load
(db, key, uuid_)[source]¶ Load a dataset under a key name in the database. This is the inverse of save().
 Parameters
 Returns
data (xarray.Dataset) – An xarray.Dataset variable for the non-derived data from an experiment.
meta (json-serializable) – Associated metadata with the experiment. This can be anything json serializable.

load_derived
(db, key)[source]¶ Load a dataset under a key name in the database as derived data. This is the inverse of save_derived().
 Parameters
 Returns
data (xarray.Dataset) – An xarray.Dataset variable for the derived data from experiments.
meta (json-serializable) – Associated metadata with the experiments. This can be anything json serializable.

logging_path
(db, uuid_)[source]¶ Get an absolute path for logging from an experiment given its UUID.

save
(meta, db_root, db, key, uuid_)[source]¶ Save a dataset under a key name in the database.
 Parameters
data (xarray.Dataset) – An xarray.Dataset variable we would like to store as non-derived data from an experiment.
meta (json-serializable) – Associated metadata with the experiment. This can be anything json serializable.
db_root (str) – Absolute path to the database.
db (str) – The name of the database.
key (str) – The variable name in the database for the data.
uuid_ (uuid.UUID) – The UUID to represent the version of this variable we are storing.

save_derived
(meta, db_root, db, key)[source]¶ Save a dataset under a key name in the database as derived data.
 Parameters
data (xarray.Dataset) – An xarray.Dataset variable we would like to store as derived data from experiments.
meta (json-serializable) – Associated metadata with the experiments. This can be anything json serializable.
db_root (str) – Absolute path to the database.
db (str) – The name of the database.
key (str) – The variable name in the database for the data.

Sklearn Tuning¶
Routines to build a standardized interface to make sklearn hyperparameter tuning problems look like an objective function.
This file mostly contains a dictionary collection of all sklearn test funcs.
The format of each element in MODELS is:
model_name: (model_class, fixed_param_dict, search_param_api_dict)
model_name is an arbitrary name to refer to a certain strategy.
At usage time, the model instance is created using:
model_class(**kwarg_dict)
The kwarg dict is fixed_param_dict + search_param_dict. The search_param_dict comes from an optimizer which is configured using the search_param_api_dict. See the API description for information on setting up the search_param_api_dict.
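
The tuple format can be illustrated with a toy registry. Everything here is hypothetical: `ToyRegressor` stands in for an sklearn estimator class, `build_model` is an illustrative helper, and the contents of the search-space dictionary are placeholders, since the real format is defined in the API description.

```python
class ToyRegressor:
    """Stand-in for an sklearn estimator class (hypothetical)."""

    def __init__(self, max_depth, min_split, random_state):
        self.max_depth = max_depth
        self.min_split = min_split
        self.random_state = random_state


# model_name: (model_class, fixed_param_dict, search_param_api_dict)
MODELS = {
    "toy": (
        ToyRegressor,
        {"random_state": 0},
        {
            "max_depth": {"type": "int", "range": (1, 15)},  # placeholder format
            "min_split": {"type": "real", "range": (0.01, 0.99)},
        },
    ),
}


def build_model(model_name, search_param_dict):
    """Merge fixed and searched parameters, then instantiate the model class."""
    model_class, fixed_param_dict, search_param_api_dict = MODELS[model_name]
    unknown = set(search_param_dict) - set(search_param_api_dict)
    if unknown:
        raise ValueError("parameters outside the search space: %s" % sorted(unknown))
    kwarg_dict = {**fixed_param_dict, **search_param_dict}
    return model_class(**kwarg_dict)


# The search_param_dict would normally come from an optimizer's suggestion.
model = build_model("toy", {"max_depth": 4, "min_split": 0.2})
```

Keeping the fixed parameters separate from the search space means the optimizer only ever sees the variables it is allowed to tune.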

class
bayesmark.sklearn_funcs.
SklearnModel
(model, dataset, metric, shuffle_seed=0, data_root=None)[source]¶ Test class for sklearn classifier/regressor CV score objective functions.

class
bayesmark.sklearn_funcs.
SklearnSurrogate
(model, dataset, scorer, path)[source]¶ Test class for sklearn classifier/regressor CV score objective function surrogates.
Space¶
Do the conversion of search spaces into a normalized cartesian space.

class
bayesmark.space.
Boolean
(warp=None, values=None, range_=None)[source]¶ Space for transforming Boolean variables to continuous normalized space.

class
bayesmark.space.
Categorical
(warp=None, values=None, range_=None)[source]¶ Space for transforming categorical variables to continuous normalized space.

unwarp
(X_w)[source]¶ Inverse of warp function.
 Parameters
X_w (numpy.ndarray of shape (…, m)) – Warped version of input space. The warped space has a one-hot encoding and therefore m is the number of possible values in the space. X_w will have a float type. Non-zero/one values are allowed in X_w. The maximal element in the vector is taken as the encoded value.
 Returns
X – Unwarped version of X_w. X will have same type code as the Categorical class, which is unicode ('U').
 Return type
numpy.ndarray of shape (…)

warp
(X)[source]¶ Warp inputs to a continuous space.
 Parameters
X (numpy.ndarray of shape (…)) – Input variables to warp. This is vectorized to work in any dimension, but it must have the same type code as the class, which is unicode ('U') for the Categorical space.
 Returns
X_w – Warped version of input space. By convention there is an extra dimension on warped array. The warped space has a one-hot encoding and therefore m is the number of possible values in the space. X_w will have a float type.
 Return type
numpy.ndarray of shape (…, m)


class
bayesmark.space.
Integer
(warp='linear', values=None, range_=None)[source]¶ Space for transforming integer variables to continuous normalized space.

class
bayesmark.space.
JointSpace
(meta)[source]¶ Combination of multiple Space objects to transform multiple variables at the same time (jointly).
get_bounds
()[source]¶ Get bounds of the warped joint space.
 Returns
bounds – Bounds in the warped space. First column is the lower bound and the second column is the upper bound. bounds.tolist() gives the bounds in the standard form expected by scipy optimizers: [(lower_1, upper_1), ..., (lower_n, upper_n)].
 Return type
numpy.ndarray of shape (m, 2)

grid
(max_interp=8)[source]¶ Return grid spanning the original (unwarped) space.
 Parameters
max_interp (int) – The number of points to use in grid space when a range and not values are used to define the space. Must be >= 0.
 Returns
axes – Grids spanning the original spaces of each variable. For each variable, this is simply self.values if a grid has already been specified, otherwise it is just a grid across the range.
 Return type

unwarp
(X_w, fixed_vals={})[source]¶ Inverse of warp().
 Parameters
X_w (numpy.ndarray of shape (n, m)) – Warped version of input space. Must be 2D float numpy.ndarray. n is the number of separate points in the warped joint space. m is the size of the joint warped space, which can be inferred in advance by calling get_bounds().
fixed_vals (dict) – Subset of variables we want to keep fixed in X. Unwarp checks that the unwarped version of X_w matches fixed_vals up to numerical error. Otherwise, an error is raised.
 Returns
X – List of n unwarped points in the joint space. Each list element is a dictionary where each key corresponds to a variable in the joint space.
 Return type

warp
(X)[source]¶ Warp inputs to a continuous space.
 Parameters
X (list(dict(str, object)) of shape (n,)) – List of n points in the joint space to warp. Each list element is a dictionary where each key corresponds to a variable in the joint space. Keys can be missing in the records, and the corresponding warped variables will be nan.
 Returns
X_w – Warped version of input space. Result is 2D float np array. n is the number of input points, length of X. m is the size of the joint warped space, which can be inferred by calling get_bounds().
 Return type
numpy.ndarray of shape (n, m)


class
bayesmark.space.
Real
(warp='linear', values=None, range_=None)[source]¶ Space for transforming real variables to normalized space (after warping).

class
bayesmark.space.
Space
(dtype, default_round, warp='linear', values=None, range_=None)[source]¶ Base class for all types of variables.

get_bounds
()[source]¶ Get bounds of the warped space.
 Returns
bounds – Bounds in the warped space. First column is the lower bound and the second column is the upper bound. Calling bounds.tolist() gives the bounds in the standard form expected by scipy optimizers: [(lower_1, upper_1), ..., (lower_n, upper_n)].
 Return type
numpy.ndarray of shape (D, 2)

grid
(max_interp=8)[source]¶ Return grid spanning the original (unwarped) space.
 Parameters
max_interp (int) – The number of points to use in grid space when a range and not values are used to define the space. Must be >= 0.
 Returns
values – Grid spanning the original space. This is simply self.values if a grid has already been specified, otherwise it is just a grid across the range.
 Return type

unwarp
(X_w)[source]¶ Inverse of warp function.
 Parameters
X_w (numpy.ndarray of shape (…, m)) – Warped version of input space. This is vectorized to work in any dimension. But, by convention, there is an extra dimension on the warped array. Currently, the last dimension m=1 for all warpers. X_w must be of a float type.
 Returns
X – Unwarped version of X_w. X will have the same type code as the class, which is in self.type_code.
 Return type
numpy.ndarray of shape (…)

validate
(X, pre=False)[source]¶ Routine to validate inputs to warp.
This routine does not perform any checking on the dimensionality of X and is fully vectorized.

validate_warped
(X, pre=False)[source]¶ Routine to validate inputs to unwarp. This routine is vectorized, but X must have at least 1 dimension.

warp
(X)[source]¶ Warp inputs to a continuous space.
 Parameters
X (numpy.ndarray of shape (…)) – Input variables to warp. This is vectorized to work in any dimension, but it must have the same type code as the class, which is in self.type_code.
 Returns
X_w – Warped version of input space. By convention there is an extra dimension on warped array. Currently, m=1 for all warpers. X_w will have a float type.
 Return type
numpy.ndarray of shape (…, m)


bayesmark.space.
biexp
(x)[source]¶ Inverse of bilog() function.
 Parameters
x (scalar) – Input variable in linear space. Can be any numeric type and is vectorizable.
 Returns
y – The biexp of x.
 Return type

bayesmark.space.
bilog
(x)[source]¶ Bilog warping function. Extension of log to work with negative numbers.
Bilog(x) ~= log(x) for large x or -log(abs(x)) if x is negative. However, the bias term ensures good behavior near 0 and bilog(0) = 0.
 Parameters
x (scalar) – Input variable in linear space. Can be any numeric type and is vectorizable.
 Returns
y – The bilog of x.
 Return type
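
A scalar sketch of the two warping functions follows. It assumes the common definition bilog(x) = sign(x) * log(1 + abs(x)), which is consistent with the properties stated above (log-like growth for large x, odd symmetry, and bilog(0) = 0) but is not confirmed by this page. The library versions are described as vectorizable, while this sketch handles one float at a time.

```python
import math


def bilog(x):
    """Signed log warp: sign(x) * log(1 + |x|) (assumed definition)."""
    return math.copysign(math.log1p(abs(x)), x)


def biexp(y):
    """Inverse of bilog: sign(y) * (exp(|y|) - 1)."""
    return math.copysign(math.expm1(abs(y)), y)


# Round trip over negative, zero, and positive inputs.
round_trip = [biexp(bilog(x)) for x in (-1000.0, -0.5, 0.0, 0.5, 1000.0)]
```

The "+ 1" inside the logarithm is the bias term: it keeps the function finite and smooth at zero, where a plain logarithm diverges.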

bayesmark.space.
decode
(Y, labels, assume_sorted=False)[source]¶ Perform inverse of one-hot encoder encode().
 Parameters
Y (numpy.ndarray of shape (…, n)) – One-hot encoding of categorical data X. Extra dimension is appended at end for the one-hot vector. Maximum element is taken if there is more than one non-zero entry in one-hot vector.
labels (numpy.ndarray of shape (n,)) – Complete list of all possible labels. List is flattened if it is not already 1-dimensional.
assume_sorted (bool) – If true, assume labels is already sorted and unique. This saves the computational cost of calling numpy.unique().
 Returns
X – Categorical values corresponding to one-hot encoded Y.
 Return type
numpy.ndarray of shape (…)

bayesmark.space.
encode
(X, labels, assume_sorted=False, dtype=<class 'bool'>, assume_valid=False)[source]¶ Perform one-hot encoding of categorical data in numpy.ndarray variable X of any dimension.
 Parameters
X (numpy.ndarray of shape (…)) – Categorical values of any standard type. Vectorized to work for any dimensional X.
labels (numpy.ndarray of shape (n,)) – Complete list of all possible labels. List is flattened if it is not already 1-dimensional.
assume_sorted (bool) – If true, assume labels is already sorted and unique. This saves the computational cost of calling numpy.unique().
dtype (type) – Desired data type of feature array. One-hot is most logically bool, but feature matrices are usually float.
assume_valid (bool) – If true, assume all elements of X are in the list labels. This saves the computational cost of verifying X are in labels. If true and a non-label X occurs, this routine will silently give a bogus result.
 Returns
Y – One-hot encoding of X. Extra dimension is appended at end for the one-hot vector. It has data type dtype.
 Return type
numpy.ndarray of shape (…, n)
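
The encode/decode round trip can be sketched for the 1D case with plain lists. The names `one_hot_encode` and `one_hot_decode` are hypothetical and this is not the library code, which works on numpy arrays of any dimension. Labels are sorted and de-duplicated first, and decoding takes the position of the maximum element, so soft (non-zero/one) vectors decode too.

```python
def one_hot_encode(values, labels):
    """Encode a 1D list of categorical values as rows of floats."""
    labels = sorted(set(labels))
    position = {label: i for i, label in enumerate(labels)}
    return [
        [1.0 if position[v] == j else 0.0 for j in range(len(labels))]
        for v in values
    ]


def one_hot_decode(rows, labels):
    """Invert one_hot_encode by picking the label of the largest entry."""
    labels = sorted(set(labels))
    return [labels[max(range(len(row)), key=row.__getitem__)] for row in rows]


labels = ["c", "a", "b"]
x = ["b", "a", "c", "a"]
y = one_hot_encode(x, labels)
x_back = one_hot_decode(y, labels)
soft = one_hot_decode([[0.2, 0.7, 0.1]], labels)
```

In this sketch a value outside `labels` raises a `KeyError` during encoding, which corresponds to the validated path; the `assume_valid` shortcut described above skips that kind of check.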
Stats¶
General statistic tools useful in the benchmark.

bayesmark.stats.
robust_standardize
(X, q_level=0.5)[source]¶ Perform robust standardization of data matrix X over axis 0.
Similar to sklearn.preprocessing.robust_scale() except also does a Gaussian adjustment rescaling so that if Gaussian data is passed in the transformed data will, in large n, be distributed as N(0,1). See sklearn feature request #10139 on github.
 Parameters
X (numpy.ndarray of shape (n, …)) – Array containing elements to standardize. Require n >= 2.
q_level (scalar) – Must be in [0, 1]. Interquartile range to use for scale estimation.
 Returns
X – Standardized elements of input X.
 Return type
numpy.ndarray of shape (n, …)
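
The Gaussian adjustment can be sketched for a 1D list using only the standard library. This is one plausible reading of the description, not the library's code: the names `_quantile` and `robust_standardize_1d` are hypothetical, the data is centered on the median, and the scale is the width of the central q_level fraction of the data divided by the width of the same central interval of a standard normal. The sketch requires q_level strictly inside (0, 1) and a non-zero spread.

```python
from statistics import NormalDist, median


def _quantile(sorted_x, q):
    """Linearly interpolated quantile of an already sorted list."""
    pos = q * (len(sorted_x) - 1)
    low = int(pos)
    high = min(low + 1, len(sorted_x) - 1)
    frac = pos - low
    return sorted_x[low] * (1.0 - frac) + sorted_x[high] * frac


def robust_standardize_1d(x, q_level=0.5):
    """Center on the median and scale by a Gaussian-adjusted quantile range."""
    assert len(x) >= 2 and 0.0 < q_level < 1.0
    xs = sorted(x)
    q_low, q_high = 0.5 - q_level / 2.0, 0.5 + q_level / 2.0
    spread = _quantile(xs, q_high) - _quantile(xs, q_low)
    # Width of the same central interval for a standard normal.
    gauss_spread = NormalDist().inv_cdf(q_high) - NormalDist().inv_cdf(q_low)
    scale = spread / gauss_spread
    center = median(xs)
    return [(v - center) / scale for v in x]


z = robust_standardize_1d([2.0, 4.0, 4.5, 5.0, 7.0, 100.0])
```

With q_level=0.5 the spread is the ordinary interquartile range, and dividing by the normal interquartile width (about 1.349) is what makes Gaussian inputs come out with roughly unit scale. The outlier at 100.0 barely affects the center or scale.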

bayesmark.stats.
t_EB
(x, alpha=0.05, axis=-1)[source]¶ Get t-statistic based error bars on mean of x.
 Parameters
x (numpy.ndarray of shape (n_samples,)) – Data points to estimate mean. Must not be empty or contain NaN.
alpha (float) – The alpha level (1-confidence) probability (in (0, 1)) to construct confidence interval from t-statistic.
axis (int) – The axis on x where we compute the t-statistics. The function is vectorized over all other dimensions.
 Returns
EB – Size of error bar on mean (>= 0). The confidence interval is [mean(x) - EB, mean(x) + EB]. EB is inf when len(x) <= 1. Will be NaN if there are any infinite values in x.
 Return type
Util (General)¶
General utilities that should arguably be included in Python.

bayesmark.util.
preimage_func
(f, x)[source]¶ Compute the preimage of a function at a set of input points.
 Parameters
f (typing.Callable) – The function we would like to preimage. The output type must be hashable.
x (typing.Iterable) – Input points we would like to evaluate f. x must be of a type acceptable by f.
 Returns
D – This dictionary maps the output of f to the list of x values that produce it.
 Return type
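
The behavior described above amounts to grouping inputs by their output value. A minimal sketch, with the hypothetical name `preimage_sketch` (not the library's code):

```python
def preimage_sketch(f, xs):
    """Map each output value of f to the list of inputs that produce it."""
    groups = {}
    for x in xs:
        # setdefault creates the list the first time an output is seen.
        groups.setdefault(f(x), []).append(x)
    return groups


by_remainder = preimage_sketch(lambda v: v % 3, range(7))
```

The outputs of f become dictionary keys, which is why they must be hashable.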

bayesmark.util.
range_str
(stop)[source]¶ Version of range(stop) that instead returns strings that are zero padded so the entire iteration is of the same length.
 Parameters
stop (int) – Stop value equivalent to range(stop).
 Yields
x (str) – String representation of integer zero padded so all items from this generator have the same len(x).
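
A generator with this behavior can be sketched as below. The name `zero_padded_range` is hypothetical and the library may compute the width differently. The width is taken from the largest value that will be produced, `stop - 1`.

```python
def zero_padded_range(stop):
    """Yield str(i) for i in range(stop), left-padded with zeros to equal length."""
    width = len(str(max(stop - 1, 0)))
    for i in range(stop):
        yield str(i).zfill(width)


names = list(zero_padded_range(11))
```

Equal-length strings sort lexicographically in the same order as the integers they represent, which is useful for file names and dictionary keys.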

bayesmark.util.
shell_join
(argv, delim=' ')[source]¶ Join strings together in a way that is an inverse of shlex shell parsing into argv.
Basically, if the resulting string is passed as a command line argument then sys.argv will equal argv.
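
The inverse relationship with shell parsing can be demonstrated with the standard library. The helper name `shell_join_sketch` is hypothetical and the library may quote differently. Each argument is quoted with `shlex.quote`, so that `shlex.split` recovers the original list.

```python
import shlex


def shell_join_sketch(argv, delim=" "):
    """Quote each argument so that shell-style parsing recovers argv."""
    return delim.join(shlex.quote(arg) for arg in argv)


argv = ["python", "run.py", "--name", "hello world", "it's", "a&b", ""]
command = shell_join_sketch(argv)
recovered = shlex.split(command)
```

Arguments containing spaces, quotes, or shell metacharacters survive the round trip, which a plain `" ".join(argv)` would not guarantee.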

bayesmark.util.
str_join_safe
(delim, str_vec, append=False)[source]¶ Version of str.join that is guaranteed to be invertible.
 Parameters
 Returns
joined_str – Joined version of str_vec, which is always recoverable with joined_str.split(delim).
 Return type
Examples
Append is required because,
ss = str_join_safe('_', ('foo', 'bar'))
str_join_safe('_', (ss, 'baz', 'qux'))
would fail because we are appending 'baz' and 'qux' to the already joined string ss = 'foo_bar'.
In this case, we use
ss = str_join_safe('_', ('foo', 'bar'))
str_join_safe('_', (ss, 'baz', 'qux'), append=True)
Xarray Util¶
General utilities for xarray that should be included in xarray.

bayesmark.xr_util.
coord_compat
(da_seq, dims)[source]¶ Check if a sequence of xarray.DataArray have compatible coordinates.
 Parameters
da_seq (list(xarray.DataArray)) – Sequence of xarray.DataArray we would like to check for compatibility. xarray.Dataset work too.
dims (list) – Subset of all dimensions in the xarray.DataArray we are concerned with for compatibility.
 Returns
compat – True if all the xarray.DataArray have compatible coordinates.
 Return type

bayesmark.xr_util.
da_concat
(da_dict, dims)[source]¶ Concatenate a dictionary of xarray.DataArray similar to pandas.concat().
 Parameters
da_dict (dict(tuple(str), xarray.DataArray)) – Dictionary of xarray.DataArray to combine. The keys are tuples of index values. The xarray.DataArray must have compatible coordinates.
dims (list(str)) – The names of the new dimensions we create for the dictionary keys. This must be of the same length as the key tuples in da_dict.
 Returns
da – Combined data array. The new dimensions will be input_da.dims + dims.
 Return type

bayesmark.xr_util.
da_to_string
(da)[source]¶ Generate a human readable version of a 1D xarray.DataArray.
 Parameters
da (xarray.DataArray) – The xarray.DataArray to display. Must only have one dimension.
 Returns
str_val – String with human readable version of da.
 Return type

bayesmark.xr_util.
ds_concat
(ds_dict, dims)[source]¶ Concatenate a dictionary of xarray.Dataset similar to pandas.concat(), and a generalization of da_concat().
 Parameters
ds_dict (dict(tuple(str), xarray.Dataset)) – Dictionary of xarray.Dataset to combine. The keys are tuples of index values. The xarray.Dataset must have compatible coordinates, and all have the same variables.
dims (list(str)) – The names of the new dimensions we create for the dictionary keys. This must be of the same length as the key tuples in ds_dict.
 Returns
ds – Combined dataset. For each variable var, the new dimensions will be input_ds[var].dims + dims.
 Return type

bayesmark.xr_util.
ds_like
(ref, vars_, dims, fill=nan)[source]¶ Produce a blank xarray.Dataset copying some coordinates from another xarray.Dataset.
 Parameters
ref (xarray.Dataset) – The reference dataset we want to copy coordinates from.
vars_ (typing.Iterable) – List of variable names we want in the new dataset.
dims (list) – List of dimensions we want to copy over from ref. These are the dimensions of the output.
fill (scalar) – Scalar value to fill the blank dataset. The dtype will be determined from the fill value.
 Returns
ds – A new dataset with variables vars_ and dimensions dims where the coordinates have been copied from ref. All values are filled with fill.
 Return type

bayesmark.xr_util.
ds_like_mixed
(ref, vars_, dims, fill=nan)[source]¶ The same as ds_like but allow different dimensions for each variable.
 Parameters
ref (xarray.Dataset) – The reference dataset we want to copy coordinates from.
vars_ (typing.Iterable) – List of (variable names, dimension) pairs we want in the new dataset. The dimensions for each variable must be a subset of dims.
dims (list) – List of all dimensions we want to copy over from ref.
fill (scalar) – Scalar value to fill the blank dataset. The dtype will be determined from the fill value.
 Returns
ds – A new dataset with variables vars_ and dimensions dims where the coordinates have been copied from ref. All values are filled with fill.
 Return type

bayesmark.xr_util.
is_simple_coords
(coords, min_side=0, dims=None)[source]¶ Check if all xr coordinates are “simple”. That is, equal to np.arange(n).
 Parameters
coords (dict-like of coordinates) – The coordinates we would like to check, e.g. from DataArray.coords.
min_side (int) – The minimum side requirement. We can set min_side=1 and have empty coordinates result in a return value of False.
dims (None or list of dimension names) – Dimensions we want to check for simplicity. If None, check all dimensions.
 Returns
simple – True when all coordinates are simple.
 Return type

bayesmark.xr_util.
only_dataarray
(ds)[source]¶ Convert an xarray.Dataset to an xarray.DataArray. If the xarray.Dataset has more than one variable, an error is raised.
 Parameters
ds (xarray.Dataset) – xarray.Dataset we would like to convert to an xarray.DataArray. This must contain only one variable.
 Returns
da – The xarray.DataArray extracted from ds.
 Return type