pycbc.inference package¶
Subpackages¶
- pycbc.inference.io package
- Submodules
- pycbc.inference.io.base_hdf module
- pycbc.inference.io.base_mcmc module
- pycbc.inference.io.base_multitemper module
- pycbc.inference.io.base_nested_sampler module
- pycbc.inference.io.base_sampler module
- pycbc.inference.io.dynesty module
- pycbc.inference.io.emcee module
- pycbc.inference.io.emcee_pt module
- pycbc.inference.io.epsie module
- pycbc.inference.io.multinest module
- pycbc.inference.io.posterior module
- pycbc.inference.io.ptemcee module
- pycbc.inference.io.txt module
- pycbc.inference.io.ultranest module
- Module contents
- pycbc.inference.jump package
- pycbc.inference.models package
- Submodules
- pycbc.inference.models.analytic module
- pycbc.inference.models.base module
- pycbc.inference.models.base_data module
- pycbc.inference.models.brute_marg module
- pycbc.inference.models.data_utils module
- pycbc.inference.models.gated_gaussian_noise module
- pycbc.inference.models.gaussian_noise module
- pycbc.inference.models.hierarchical module
- pycbc.inference.models.marginalized_gaussian_noise module
- pycbc.inference.models.relbin module
- pycbc.inference.models.relbin_cpu module
- pycbc.inference.models.single_template module
- pycbc.inference.models.tools module
- Module contents
- pycbc.inference.sampler package
- Submodules
- pycbc.inference.sampler.base module
- pycbc.inference.sampler.base_cube module
- pycbc.inference.sampler.base_mcmc module
- pycbc.inference.sampler.base_multitemper module
- pycbc.inference.sampler.dynesty module
- pycbc.inference.sampler.emcee module
- pycbc.inference.sampler.emcee_pt module
- pycbc.inference.sampler.epsie module
- pycbc.inference.sampler.multinest module
- pycbc.inference.sampler.ptemcee module
- pycbc.inference.sampler.ultranest module
- Module contents
Submodules¶
pycbc.inference.burn_in module¶
This module provides classes and functions for determining when Markov chains have burned in.
- class pycbc.inference.burn_in.BaseBurnInTests(sampler, burn_in_test, **kwargs)[source]¶
Bases: object
Base class for burn in tests.
- available_tests = ('halfchain', 'min_iterations', 'max_posterior', 'posterior_step', 'nacl')¶
- abstract burn_in_index(filename)[source]¶
The burn in index (retrieved from the iteration).
This is an abstract method because how it is evaluated depends on whether the sampler is an ensemble MCMC or not.
- abstract evaluate(filename)[source]¶
Performs all tests and evaluates the results to determine if and when all tests pass.
- abstract max_posterior(filename)[source]¶
Carries out the max posterior test and stores the results.
- min_iterations(filename)[source]¶
Just checks that the sampler has been run for the minimum number of iterations.
- abstract posterior_step(filename)[source]¶
Carries out the posterior step test and stores the results.
- write(fp, path=None)[source]¶
Writes burn-in info to an open HDF file.
- Parameters
fp (pycbc.inference.io.base.BaseInferenceFile) – Open HDF file to write the data to. The HDF file should be an instance of a pycbc BaseInferenceFile.
path (str, optional) – Path in the HDF file to write the data to. Default (None) is to write to the path given by the file’s sampler_group attribute.
- class pycbc.inference.burn_in.EnsembleMCMCBurnInTests(sampler, burn_in_test, **kwargs)[source]¶
Bases: pycbc.inference.burn_in.BaseBurnInTests
Provides methods for estimating burn-in of an ensemble MCMC.
- available_tests = ('halfchain', 'min_iterations', 'max_posterior', 'posterior_step', 'nacl', 'ks_test')¶
- class pycbc.inference.burn_in.EnsembleMultiTemperedMCMCBurnInTests(sampler, burn_in_test, **kwargs)[source]¶
Bases: pycbc.inference.burn_in.EnsembleMCMCBurnInTests
Adds support for multiple temperatures to EnsembleMCMCBurnInTests.
- class pycbc.inference.burn_in.MCMCBurnInTests(sampler, burn_in_test, **kwargs)[source]¶
Bases: pycbc.inference.burn_in.BaseBurnInTests
Burn-in tests for collections of independent MCMC chains.
This differs from EnsembleMCMCBurnInTests in that chains are treated as being independent of each other. The is_burned_in attribute will be True if any chain passes the burn-in tests (whereas in EnsembleMCMCBurnInTests, all chains must pass the burn-in tests). In other words, independent samples can be collected even if not all of the chains are burned in.
- write(fp, path=None)[source]¶
Writes burn-in info to an open HDF file.
- Parameters
fp (pycbc.inference.io.base.BaseInferenceFile) – Open HDF file to write the data to. The HDF file should be an instance of a pycbc BaseInferenceFile.
path (str, optional) – Path in the HDF file to write the data to. Default (None) is to write to the path given by the file’s sampler_group attribute.
- class pycbc.inference.burn_in.MultiTemperedMCMCBurnInTests(sampler, burn_in_test, **kwargs)[source]¶
Bases: pycbc.inference.burn_in.MCMCBurnInTests
Adds support for multiple temperatures to MCMCBurnInTests.
- pycbc.inference.burn_in.evaluate_tests(burn_in_test, test_is_burned_in, test_burn_in_iter)[source]¶
Evaluates burn in data from multiple tests.
The iteration to use for burn-in depends on the logic in the burn-in test string. For example, if the test was ‘max_posterior | nacl’ and max_posterior burned in at iteration 5000 while nacl burned in at iteration 6000, we’d want to use 5000 as the burn-in iteration. However, if the test was ‘max_posterior & nacl’, we’d want to use 6000. This function handles all cases as follows: first, take the collection of burn-in iterations from all the burn-in tests that were applied. Next, cycle over those iterations in increasing order, checking which tests have burned in by each point, and evaluate the burn-in string with those truth values. The first iteration at which the test string passes is used as the burn-in iteration (see the sketch below).
- Parameters
- Returns
is_burned_in (bool) – Whether or not the data passes all burn in tests.
burn_in_iteration – The iteration at which all the tests pass. If the tests did not all pass (is_burned_in is false), then returns NOT_BURNED_IN_ITER.
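As an illustration, here is a minimal sketch of that cycling logic. It is not the actual PyCBC implementation: the helper name, the assumed layout of the two dictionaries (both keyed by test name), and the sentinel value used for NOT_BURNED_IN_ITER are all illustrative.

# Minimal sketch, not the actual PyCBC implementation.
NOT_BURNED_IN_ITER = -1  # assumed sentinel value

def evaluate_tests_sketch(burn_in_test, test_is_burned_in, test_burn_in_iter):
    # Collect the burn-in iterations of all tests that individually
    # passed, in increasing order.
    passed_iters = sorted(itr for test, itr in test_burn_in_iter.items()
                          if test_is_burned_in[test])
    for itr in passed_iters:
        # Truth value of each test as of this iteration; short-circuit
        # keeps failed tests False without looking up an iteration.
        state = {test: passed and test_burn_in_iter[test] <= itr
                 for test, passed in test_is_burned_in.items()}
        # Evaluate the test string (e.g. 'max_posterior & nacl') with
        # those truth values; '&' and '|' act as boolean operators here.
        if eval(burn_in_test, {'__builtins__': {}}, state):
            return True, itr
    return False, NOT_BURNED_IN_ITER

With the example above, ‘max_posterior | nacl’ yields (True, 5000), while ‘max_posterior & nacl’ yields (True, 6000).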
- pycbc.inference.burn_in.ks_test(samples1, samples2, threshold=0.9)[source]¶
Applies a KS test to determine if two sets of samples are the same.
The KS test is applied parameter-by-parameter. If the two-tailed p-value returned by the test is greater than threshold, the samples are considered to be the same (see the sketch below).
- Parameters
- Returns
Dictionary mapping parameter names to booleans indicating whether the given parameter passes the KS test.
- Return type
dict
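A minimal sketch of the per-parameter test described above, using scipy.stats.ks_2samp. The dictionary layout of the sample sets is an assumption based on the description; the actual implementation may differ in detail.

from scipy.stats import ks_2samp

def ks_test_sketch(samples1, samples2, threshold=0.9):
    # samples1/samples2: dicts mapping parameter name -> 1D array of samples.
    is_the_same = {}
    for param in samples1:
        _, p_value = ks_2samp(samples1[param], samples2[param])
        # The samples are considered the same when the two-tailed
        # p-value exceeds the threshold.
        is_the_same[param] = p_value > threshold
    return is_the_same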
- pycbc.inference.burn_in.max_posterior(lnps_per_walker, dim)[source]¶
Burn in based on samples being within dim/2 of the maximum posterior (see the sketch below).
- Parameters
lnps_per_walker (2D array) – Array of values that are proportional to the log posterior values. Must have shape nwalkers x niterations.
dim (int) – The dimension of the parameter space.
- Returns
burn_in_idx (array of int) – The burn-in indices of each walker. If a walker is not burned in, its index will be equal to the length of the chain.
is_burned_in (array of bool) – Whether or not a walker is burned in.
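A rough sketch of the test as described, assuming the stated shapes; the actual implementation may differ:

import numpy

def max_posterior_sketch(lnps_per_walker, dim):
    # lnps_per_walker has shape (nwalkers, niterations).
    nwalkers, niterations = lnps_per_walker.shape
    # A walker is burned in once its log posterior comes within dim/2
    # of the maximum value seen over all walkers and iterations.
    criterion = lnps_per_walker.max() - dim / 2.0
    burn_in_idx = numpy.full(nwalkers, niterations, dtype=int)
    for ii, lnps in enumerate(lnps_per_walker):
        passed = numpy.where(lnps >= criterion)[0]
        if passed.size > 0:
            burn_in_idx[ii] = passed[0]
    # Walkers that never pass keep index == niterations (chain length).
    return burn_in_idx, burn_in_idx < niterations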
- pycbc.inference.burn_in.nacl(nsamples, acls, nacls=5)[source]¶
Burn in based on ACL.
This applies the following test to determine burn-in (a sketch follows the parameter list below):
- The first half of the chain is ignored.
- An ACL is calculated from the second half.
- If nacls times the ACL is < the length of the chain / 2, the chain is considered to be burned in at the half-way point.
- Parameters
nsamples (int) – The number of samples in the chain(s).
acls (dict) – Dictionary of parameter -> ACL(s). The ACLs for each parameter may be an integer or an array of integers (for multiple chains).
nacls (int, optional) – The number of ACLs the chain(s) must have gone past the halfway point in order to be considered burned in. Default is 5.
- Returns
Dictionary of parameter -> boolean(s) indicating if the chain(s) pass the test. If an array of values was provided for the acls, the values will be arrays of booleans.
- Return type
dict
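The test itself is one comparison per parameter; a minimal sketch under the documented inputs (the helper name is illustrative):

import numpy

def nacl_sketch(nsamples, acls, nacls=5):
    # A chain is burned in at the halfway point if nacls ACLs
    # (measured over the second half) fit within half the chain.
    # Works for scalar ACLs or arrays of ACLs (multiple chains).
    return {param: nacls * numpy.asarray(acl) < nsamples / 2.0
            for param, acl in acls.items()}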
pycbc.inference.entropy module¶
The module contains functions for calculating the Kullback-Leibler divergence.
- pycbc.inference.entropy.check_hist_params(samples, hist_min, hist_max, hist_bins)[source]¶
Checks that the bound values given for the histogram are consistent, returning the range if they are or raising an error if they are not. Also checks that, if hist_bins is a str, it corresponds to a bin-width method available in numpy.histogram.
- Parameters
samples (numpy.array) – Set of samples to get the min/max if only one of the bounds is given.
hist_min (numpy.float64) – Minimum value for the histogram.
hist_max (numpy.float64) – Maximum value for the histogram.
hist_bins (int or str) – If int, number of equal-width bins to use in numpy.histogram. If str, it should be one of the methods to calculate the optimal bin width available in numpy.histogram: [‘auto’, ‘fd’, ‘doane’, ‘scott’, ‘stone’, ‘rice’, ‘sturges’, ‘sqrt’]. Default is ‘fd’ (Freedman Diaconis Estimator). This option will be ignored if kde=True.
- Returns
hist_range (tuple or None) – The bounds (hist_min, hist_max) or None.
hist_bins (int or str) – Number of bins or method for optimal width bin calculation.
- pycbc.inference.entropy.compute_pdf(samples, method, bins, hist_min, hist_max)[source]¶
Computes the probability density function for a set of samples.
- Parameters
samples (numpy.array) – Set of samples to calculate the pdf.
method (str) – Method to calculate the pdf. Options are ‘kde’ for the Kernel Density Estimator, and ‘hist’ to use numpy.histogram.
bins (str or int, optional) – This option will be ignored if method is kde. If int, number of equal-width bins to use when calculating probability density function from a set of samples of the distribution. If str, it should be one of the methods to calculate the optimal bin width available in numpy.histogram: [‘auto’, ‘fd’, ‘doane’, ‘scott’, ‘stone’, ‘rice’, ‘sturges’, ‘sqrt’]. Default is ‘fd’ (Freedman Diaconis Estimator).
hist_min (numpy.float64, optional) – Minimum of the distributions’ values to use. This will be ignored if kde=True.
hist_max (numpy.float64, optional) – Maximum of the distributions’ values to use. This will be ignored if kde=True.
- Returns
pdf – Discrete probability distribution calculated from samples.
- Return type
numpy.array
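Schematically, the two methods reduce to the sketch below. The grid resolution for the KDE branch and the bounds handling are assumptions, not the actual implementation.

import numpy
from scipy.stats import gaussian_kde

def compute_pdf_sketch(samples, method='hist', bins='fd',
                       hist_min=None, hist_max=None):
    if method == 'kde':
        # Evaluate a Gaussian KDE on a grid spanning the samples; the
        # grid size (100 points) is an arbitrary illustrative choice.
        grid = numpy.linspace(samples.min(), samples.max(), 100)
        pdf = gaussian_kde(samples)(grid)
    else:
        hist_range = None
        if hist_min is not None and hist_max is not None:
            hist_range = (hist_min, hist_max)
        # density=True normalizes the histogram to a probability density.
        pdf, _ = numpy.histogram(samples, bins=bins, range=hist_range,
                                 density=True)
    return pdf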
- pycbc.inference.entropy.entropy(pdf1, base=2.718281828459045)[source]¶
Computes the information entropy for a single parameter from one probability density function.
- Parameters
pdf1 (numpy.array) – Probability density function.
base ({numpy.e, numpy.float64}, optional) – The logarithmic base to use (choose base 2 for information measured in bits, default is nats).
- Returns
The information entropy value.
- Return type
numpy.float64
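The computation is the standard Shannon entropy of a discrete pdf; a minimal sketch (the normalization and empty-bin handling here are assumptions):

import numpy

def entropy_sketch(pdf1, base=numpy.e):
    # Normalize, drop empty bins (0 * log 0 -> 0), and convert from
    # nats to the requested base.
    pdf1 = numpy.asarray(pdf1, dtype=float)
    pdf1 = pdf1 / pdf1.sum()
    pdf1 = pdf1[pdf1 > 0]
    return -numpy.sum(pdf1 * numpy.log(pdf1)) / numpy.log(base)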
- pycbc.inference.entropy.js(samples1, samples2, kde=False, bins=None, hist_min=None, hist_max=None, base=2.718281828459045)[source]¶
Computes the Jensen-Shannon divergence for a single parameter from two distributions.
- Parameters
samples1 (numpy.array) – Samples.
samples2 (numpy.array) – Samples.
kde (bool) – Set to True to estimate the probability density function using kernel density estimation (KDE).
bins (int or str, optional) – If int, number of equal-width bins to use when calculating probability density function from a set of samples of the distribution. If str, it should be one of the methods to calculate the optimal bin width available in numpy.histogram: [‘auto’, ‘fd’, ‘doane’, ‘scott’, ‘stone’, ‘rice’, ‘sturges’, ‘sqrt’]. Default is ‘fd’ (Freedman Diaconis Estimator). This option will be ignored if kde=True.
hist_min (numpy.float64) – Minimum of the distributions’ values to use. This will be ignored if kde=True.
hist_max (numpy.float64) – Maximum of the distributions’ values to use. This will be ignored if kde=True.
base (numpy.float64) – The logarithmic base to use (choose base 2 for information measured in bits, default is nats).
- Returns
The Jensen-Shannon divergence value.
- Return type
numpy.float64
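Conceptually, the Jensen-Shannon divergence is a symmetrized KL divergence against the average distribution. The sketch below operates on two already-binned pdfs over the same bins for brevity (the library function accepts raw samples):

import numpy
from scipy.stats import entropy as scipy_entropy

def js_sketch(pdf1, pdf2, base=numpy.e):
    # pdf1 and pdf2 are discrete pdfs defined on the same bins.
    m = 0.5 * (numpy.asarray(pdf1) + numpy.asarray(pdf2))
    # scipy.stats.entropy(p, q) gives the KL divergence D(p || q).
    return 0.5 * (scipy_entropy(pdf1, m, base=base)
                  + scipy_entropy(pdf2, m, base=base))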
- pycbc.inference.entropy.kl(samples1, samples2, pdf1=False, pdf2=False, kde=False, bins=None, hist_min=None, hist_max=None, base=2.718281828459045)[source]¶
Computes the Kullback-Leibler divergence for a single parameter from two distributions.
- Parameters
samples1 (numpy.array) – Samples or probability density function (for the latter must also set pdf1=True).
samples2 (numpy.array) – Samples or probability density function (for the latter must also set pdf2=True).
pdf1 (bool) – Set to True if samples1 is a probability density function already.
pdf2 (bool) – Set to True if samples2 is a probability density function already.
kde (bool) – Set to True if at least one of pdf1 or pdf2 is False to estimate the probability density function using kernel density estimation (KDE).
bins (int or str, optional) – If int, number of equal-width bins to use when calculating probability density function from a set of samples of the distribution. If str, it should be one of the methods to calculate the optimal bin width available in numpy.histogram: [‘auto’, ‘fd’, ‘doane’, ‘scott’, ‘stone’, ‘rice’, ‘sturges’, ‘sqrt’]. Default is ‘fd’ (Freedman Diaconis Estimator). This option will be ignored if kde=True.
hist_min (numpy.float64) – Minimum of the distributions’ values to use. This will be ignored if kde=True.
hist_max (numpy.float64) – Maximum of the distributions’ values to use. This will be ignored if kde=True.
base (numpy.float64) – The logarithmic base to use (choose base 2 for information measured in bits, default is nats).
- Returns
The Kullback-Leibler divergence value.
- Return type
numpy.float64
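A hypothetical end-to-end usage example based on the signatures documented above; it assumes the histogram bounds default to the sample range when hist_min/hist_max are omitted.

import numpy
from pycbc.inference.entropy import js, kl

# Two sets of samples drawn from slightly offset Gaussians.
rng = numpy.random.default_rng(42)
samples1 = rng.normal(0.0, 1.0, size=10000)
samples2 = rng.normal(0.1, 1.0, size=10000)

# Histogram-based divergence estimates, in bits (base 2).
print(kl(samples1, samples2, bins='fd', base=2))
print(js(samples1, samples2, bins='fd', base=2))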
pycbc.inference.evidence module¶
This module provides functions for estimating the marginal likelihood or evidence of a model.
- pycbc.inference.evidence.arithmetic_mean_estimator(log_likelihood)[source]¶
Returns the log evidence via the prior arithmetic mean estimator (AME).
The logarithm form of AME is used. This is the most basic evidence estimator, and often requires O(billions) of samples from the prior.
- Parameters
log_likelihood (1d array of floats) – The log likelihood of the data sampled from the prior distribution.
- Returns
Estimation of the log of the evidence.
- Return type
float
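Mathematically the AME is the log of the sample mean of the likelihoods under the prior, which can be evaluated stably in log space; a minimal sketch:

import numpy
from scipy.special import logsumexp

def arithmetic_mean_sketch(log_likelihood):
    # log Z = log( (1/N) * sum_i L_i ), computed stably in log space.
    log_likelihood = numpy.asarray(log_likelihood)
    return logsumexp(log_likelihood) - numpy.log(len(log_likelihood))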
- pycbc.inference.evidence.harmonic_mean_estimator(log_likelihood)[source]¶
Returns the log evidence via the posterior harmonic mean estimator (HME).
The logarithm form of HME is used. This method is not recommended for general use: it is very slow to converge, formally has infinite variance, and is very error prone.
- Parameters
log_likelihood (1d array of floats) – The log likelihood of the data sampled from the posterior distribution.
- Returns
Estimation of the log of the evidence.
- Return type
float
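The HME approximates 1/Z by the mean of the reciprocal likelihoods over posterior samples; in log space this is a minimal sketch:

import numpy
from scipy.special import logsumexp

def harmonic_mean_sketch(log_likelihood):
    # 1/Z ~ (1/N) * sum_i exp(-log L_i) over posterior samples, so
    # log Z ~ log N - logsumexp(-log L).
    log_likelihood = numpy.asarray(log_likelihood)
    return numpy.log(len(log_likelihood)) - logsumexp(-log_likelihood)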
- pycbc.inference.evidence.stepping_stone_algorithm(log_likelihood, betas)[source]¶
Returns the log evidence of the model via the stepping stone algorithm, along with an estimated standard deviation for the log evidence.
- Parameters
log_likelihood (3d array of shape (betas, walker, iteration)) – The log likelihood for each temperature separated by temperature, walker, and iteration.
betas (1d array) – The inverse temperatures used in the MCMC.
- Returns
log_evidence (float) – Estimation of the log of the evidence.
mcmc_std (float) – The standard deviation of the log evidence estimate from Monte-Carlo spread.
- pycbc.inference.evidence.thermodynamic_integration(log_likelihood, betas, method='simpsons')[source]¶
Returns the log evidence of the model via thermodynamic integration. Also returns an estimated standard deviation for the log evidence.
Current options are integration through the trapezoid rule, a first-order corrected trapezoid rule, and Simpson’s rule.
- Parameters
log_likelihood (3d array of shape (betas, walker, iteration)) – The log likelihood for each temperature separated by temperature, walker, and iteration.
betas (1d array) – The inverse temperatures used in the MCMC.
method ({"trapezoid", "trapezoid_corrected", "simpsons"}, optional) – The numerical integration method to use for the thermodynamic integration: the trapezoid rule, the first-order correction to the trapezoid rule, or Simpson’s rule. Default is "simpsons".
- Returns
log_evidence (float) – Estimation of the log of the evidence.
mcmc_std (float) – The standard deviation of the log evidence estimate from Monte-Carlo spread.
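Thermodynamic integration estimates log Z as the integral over inverse temperature of the mean log likelihood. The sketch below shows only the trapezoid variant and omits the corrected trapezoid, Simpson’s rule, and the error estimate; it also assumes the beta ladder spans (0, 1].

import numpy

def thermodynamic_integration_sketch(log_likelihood, betas):
    # log_likelihood has shape (nbetas, nwalkers, niterations).
    betas = numpy.asarray(betas)
    order = numpy.argsort(betas)
    # <log L>_beta: average over walkers and iterations per temperature.
    mean_lnl = numpy.asarray(log_likelihood)[order].mean(axis=(1, 2))
    # log Z = integral over beta in [0, 1] of <log L>_beta; if the
    # ladder does not reach beta = 0, the integral is truncated.
    return numpy.trapz(mean_lnl, betas[order])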
pycbc.inference.gelman_rubin module¶
This module provides functions for evaluating the Gelman-Rubin convergence diagnostic statistic.
- pycbc.inference.gelman_rubin.gelman_rubin(chains, auto_burn_in=True)[source]¶
Calculates the univariate Gelman-Rubin convergence statistic, which compares the evolution of multiple chains in a Markov-chain Monte Carlo process to determine whether they have converged. The between-chain and within-chain variances are computed for each sampling parameter, and a weighted combination of the two is used to determine convergence. As the chains converge, the potential scale reduction factor should go to 1 (see the sketch below).
- Parameters
chains (iterable) – An iterable of numpy.array instances that contain the samples for each chain. Each chain has shape (nparameters, niterations).
auto_burn_in (bool) – If True, then only use the later half of the samples provided.
- Returns
psrf – A numpy.array of shape (nparameters) that has the point estimates of the potential scale reduction factor.
- Return type
numpy.array
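For reference, the standard univariate PSRF computation looks like the sketch below; the library function additionally handles the auto_burn_in option, and the helper name here is illustrative.

import numpy

def psrf_sketch(chains):
    # chains: iterable of arrays, each of shape (nparameters, niterations).
    chains = numpy.array(list(chains))            # (nchains, nparams, niter)
    m, _, n = chains.shape
    chain_means = chains.mean(axis=2)             # per-chain parameter means
    w = chains.var(axis=2, ddof=1).mean(axis=0)   # within-chain variance
    b = n * chain_means.var(axis=0, ddof=1)       # between-chain variance
    var_est = (n - 1.0) / n * w + b / n           # pooled variance estimate
    return numpy.sqrt(var_est / w)                # PSRF per parameter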
- pycbc.inference.gelman_rubin.walk(chains, start, end, step)[source]¶
Calculates the Gelman-Rubin convergence statistic along chains of data. This function advances along the chains and calculates the statistic for each step.
- Parameters
chains (iterable) – An iterable of numpy.array instances that contain the samples for each chain. Each chain has shape (nparameters, niterations).
start (float) – Start index of blocks to calculate all statistics.
end (float) – Last index of blocks to calculate statistics.
step (float) – Step size to take for next block.
- Returns
starts (numpy.array) – 1-D array of start indexes of calculations.
ends (numpy.array) – 1-D array of end indexes of calculations.
stats (numpy.array) – Array with convergence statistic. It has shape (nparameters, ncalculations).
pycbc.inference.geweke module¶
Functions for computing the Geweke convergence statistic.
- pycbc.inference.geweke.geweke(x, seg_length, seg_stride, end_idx, ref_start, ref_end=None, seg_start=0)[source]¶
Calculates the Geweke convergence statistic for a chain of data. This function advances along the chain and calculates the statistic for each step (the per-segment statistic is sketched below).
- Parameters
x (numpy.array) – A one-dimensional array of data.
seg_length (int) – Number of samples to use for each Geweke calculation.
seg_stride (int) – Number of samples to advance before next Geweke calculation.
end_idx (int) – Index of last start.
ref_start (int) – Index of beginning of end reference segment.
ref_end (int) – Index of the end of the end reference segment. Default is None, which goes to the end of the data array.
seg_start (int) – Index at which to start computing the statistic. Default is 0, which starts at the beginning of the data array.
- Returns
starts (numpy.array) – The start indices of the segments in the chain.
ends (numpy.array) – The end indices of the segments in the chain.
stats (numpy.array) – The Geweke convergence diagnostic statistic for each segment.
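Up to normalization conventions, the statistic computed for each segment is the z-score below; the driver loop that strides seg_stride along the chain is omitted, and the helper name is illustrative.

import numpy

def geweke_z(x, seg_start, seg_length, ref_start, ref_end=None):
    # Difference of the segment means, normalized by the combined spread
    # of the early segment and the end reference segment.
    a = x[seg_start:seg_start + seg_length]
    b = x[ref_start:ref_end]
    return (a.mean() - b.mean()) / numpy.sqrt(a.var(ddof=1) + b.var(ddof=1))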
pycbc.inference.option_utils module¶
This module contains standard options used for inference-related programs.
- class pycbc.inference.option_utils.ParseLabelArg(type=<class 'str'>, nargs=None, **kwargs)[source]¶
Bases: argparse.Action
Argparse action that will parse arguments that can accept labels.
This assumes that the values set on the command line for its assigned argument are strings formatted like PARAM[:LABEL]. When the arguments are parsed, the LABEL bit is stripped off and added to a dictionary mapping PARAM -> LABEL. This dictionary is stored to the parsed namespace called {dest}_labels, where {dest} is the argument’s dest setting (by default, this is the same as the option string). Likewise, the argument’s dest in the parsed namespace is updated so that it is just PARAM.
If no LABEL is provided, then PARAM will be used for LABEL.
This action can work on arguments that have nargs != 0 and type set to str.
- class pycbc.inference.option_utils.ParseParametersArg(type=<class 'str'>, nargs=None, **kwargs)[source]¶
Bases: pycbc.inference.option_utils.ParseLabelArg
Argparse action that will parse parameters and labels from an option.
Does the same as ParseLabelArg, with the additional functionality that if LABEL is a known parameter in pycbc.waveform.parameters, then the label attribute there will be used in the labels dictionary. Otherwise, LABEL will be used.
Examples
Create a parser and add two arguments that use this action (note that the first argument accepts multiple inputs while the second only accepts a single input):
>>> import argparse
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--parameters', type=str, nargs="+", action=ParseParametersArg)
>>> parser.add_argument('--z-arg', type=str, action=ParseParametersArg)
Parse a command line that uses these options:
>>> import shlex
>>> cli = "--parameters 'mass1+mass2:mtotal' ra ni --z-arg foo:bar"
>>> opts = parser.parse_args(shlex.split(cli))
>>> opts.parameters
['mass1+mass2', 'ra', 'ni']
>>> opts.parameters_labels
{'mass1+mass2': '$M~(\mathrm{M}_\odot)$', 'ni': 'ni', 'ra': '$\alpha$'}
>>> opts.z_arg
'foo'
>>> opts.z_arg_labels
{'foo': 'bar'}
In the above, the label given for the first argument to --parameters was mtotal. Since this is a recognized parameter in pycbc.waveform.parameters, the label dictionary contains the latex string associated with the mtotal parameter. A label was not provided for the second argument, and so ra was used. Since ra is also a recognized parameter, its associated latex string was used in the labels dictionary. Since ni and bar (the label for z-arg) are not recognized parameters, they were just used as-is in the labels dictionaries.
- pycbc.inference.option_utils.add_density_option_group(parser)[source]¶
Adds the options needed to configure contours and density colour map.
- Parameters
parser (object) – ArgumentParser instance.
- pycbc.inference.option_utils.add_injsamples_map_opt(parser)[source]¶
Adds an option to the parser to specify a mapping between injection parameters and sample parameters.
- pycbc.inference.option_utils.add_plot_posterior_option_group(parser)[source]¶
Adds the options needed to configure plots of posterior results.
- Parameters
parser (object) – ArgumentParser instance.
- pycbc.inference.option_utils.add_scatter_option_group(parser)[source]¶
Adds the options needed to configure scatter plots.
- Parameters
parser (object) – ArgumentParser instance.
- pycbc.inference.option_utils.expected_parameters_from_cli(opts)[source]¶
Parses the --expected-parameters arguments from the plot_posterior option group.
- Parameters
opts (ArgumentParser) – The parsed arguments from the command line.
- Returns
Dictionary of parameter name -> expected value. Only parameters that were specified in the --expected-parameters option will be included; if no parameters were provided, will return an empty dictionary.
- Return type
dict
- pycbc.inference.option_utils.plot_ranges_from_cli(opts)[source]¶
Parses the mins and maxs arguments from the plot_posterior option group.
- Parameters
opts (ArgumentParser) – The parsed arguments from the command line.
- Returns
mins (dict) – Dictionary of parameter name -> specified mins. Only parameters that were specified in the --mins option will be included; if no parameters were provided, will return an empty dictionary.
maxs (dict) – Dictionary of parameter name -> specified maxs. Only parameters that were specified in the --maxs option will be included; if no parameters were provided, will return an empty dictionary.