CosmoBolognaLib
Free Software C++/Python libraries for cosmological calculations
The class CombinedModelling. More...
#include "Headers/CombinedModelling.h"
Public Member Functions | |
Constructors/destructors | |
CombinedModelling ()=default | |
default constructor | |
CombinedModelling (std::vector< std::shared_ptr< modelling::Modelling >> modelling, std::vector< std::string > repeated_par={}, const std::vector< std::vector< std::vector< int >>> common_repeated_par={}) | |
Constructor used to combine statistically independent probes. More... | |
CombinedModelling (std::vector< std::vector< std::shared_ptr< modelling::Modelling >>> modelling, const std::vector< std::shared_ptr< data::CovarianceMatrix >> covariance, const std::vector< cbl::statistics::LikelihoodType > likelihood_types, const std::vector< std::string > repeated_par={}, const std::vector< std::vector< std::vector< int >>> common_repeated_par={}, const std::vector< std::shared_ptr< cosmology::SuperSampleCovariance >> SSC={}) | |
Constructor used to set the modelling of statistically dependent probes. For each vector of Modelling objects in the modelling argument, a cbl::data::CovarianceMatrix object must be defined. The probes in each vector within modelling are described by the same likelihood function. The final log-likelihood is given by the sum of the logarithms of each likelihood describing a set of dependent probes. More... | |
virtual | ~CombinedModelling ()=default |
default destructor | |
Member functions used to manage likelihood/posterior | |
distributions | |
void | maximize_combined_posterior (const std::vector< double > start, const unsigned int max_iter=10000, const double tol=1.e-6, const double epsilon=1.e-3) |
function that maximizes the combined posterior, finds the best-fit parameters and stores them in the model More... | |
void | sample_combined_posterior (const int chain_size, const int nwalkers, const double aa=2, const bool parallel=true) |
sample the posterior, initializing the chains by drawing from the prior distributions More... | |
void | sample_combined_posterior (const int chain_size, const int nwalkers, const double radius, const std::vector< double > start, const unsigned int max_iter=10000, const double tol=1.e-6, const double epsilon=1.e-3, const double aa=2, const bool parallel=true) |
sample the posterior, initializing the chains in a ball around the posterior best-fit parameters values More... | |
void | sample_combined_posterior (const int chain_size, const int nwalkers, const std::string input_dir, const std::string input_file, const int seed, const double aa=2, const bool parallel=true) |
sample the posterior, initializing the chains reading the input values from an input file More... | |
Member functions used for Input/Output | |
void | write_combined_results (const std::string output_dir, const std::string root_file, const int start=0, const int thin=1, const int nbins=50, const bool fits=false, const bool compute_mode=false, const int ns=-1) |
write the results of the MCMC sampling to file More... | |
void | write_model_from_combined_chain (const std::string output_dir, const std::string output_file, const int start, const int thin, const std::vector< double > xx={}, const std::vector< double > yy={}) |
write the model computing 16th, 50th and 84th percentiles from the MCMC More... | |
![]() | |
void | m_set_posterior (const int seed) |
set the internal variable m_posterior More... | |
Modelling ()=default | |
default constructor | |
virtual | ~Modelling ()=default |
default destructor | |
std::shared_ptr< data::Data > | data () |
return the dataset More... | |
std::shared_ptr< data::Data > | data_fit () |
return the dataset restricted to the range used for the fit More... | |
std::shared_ptr< statistics::Likelihood > | likelihood () |
return the likelihood More... | |
std::shared_ptr< statistics::Posterior > | posterior () |
return the posterior More... | |
std::shared_ptr< statistics::ModelParameters > | likelihood_parameters () |
return the likelihood parameters More... | |
std::shared_ptr< statistics::ModelParameters > | posterior_parameters () |
return the posterior parameters More... | |
virtual void | set_parameter_from_string (const std::string parameter, const double value) |
set the value of a parameter providing its name string More... | |
virtual double | get_parameter_from_string (const std::string parameter) const |
get the value of a parameter providing its name string More... | |
std::shared_ptr< statistics::PriorDistribution > | get_prior (const int i) |
get the internal variable m_parameter_priors More... | |
std::shared_ptr< statistics::Model > | get_response_function () |
return the response function used to compute the super-sample covariance More... | |
void | reset_fit_range () |
reset the fit range More... | |
void | set_fit_range (const double xmin, const double xmax) |
set the fit range More... | |
void | set_fit_range (const double xmin, const double xmax, const double ymin, const double ymax) |
set the fit range More... | |
void | set_data (const std::shared_ptr< data::Data > dataset) |
set the dataset More... | |
void | set_likelihood (const statistics::LikelihoodType likelihood_type, const std::vector< size_t > x_index={0, 2}, const int w_index=-1, const double prec=1.e-10, const int Nres=-1) |
set the likelihood function More... | |
void | set_likelihood (const cbl::statistics::Likelihood_function log_likelihood_function) |
set the likelihood function, given a user-defined log-likelihood function More... | |
void | maximize_likelihood (const std::vector< double > start, const std::vector< std::vector< double >> parameter_limits, const unsigned int max_iter=10000, const double tol=1.e-6, const double epsilon=1.e-3) |
function that maximizes the likelihood, finds the best-fit parameters and stores them in the model More... | |
void | maximize_posterior (const std::vector< double > start, const unsigned int max_iter=10000, const double tol=1.e-6, const double epsilon=1.e-3, const int seed=666) |
function that maximizes the posterior, finds the best-fit parameters and stores them in the model More... | |
void | sample_posterior (const int chain_size, const int nwalkers, const int seed=666, const double aa=2, const bool parallel=true) |
sample the posterior, initializing the chains by drawing from the prior distributions More... | |
void | sample_posterior (const int chain_size, const int nwalkers, const double radius, const std::vector< double > start, const unsigned int max_iter=10000, const double tol=1.e-6, const double epsilon=1.e-3, const int seed=666, const double aa=2, const bool parallel=true) |
sample the posterior, initializing the chains in a ball around the posterior best-fit parameters values More... | |
void | sample_posterior (const int chain_size, const int nwalkers, std::vector< double > &value, const double radius, const int seed=666, const double aa=2, const bool parallel=true) |
sample the posterior, initializing the chains in a ball of the given radius around the input parameter values More... | |
void | sample_posterior (const int chain_size, const std::vector< std::vector< double >> chain_value, const int seed=666, const double aa=2, const bool parallel=true) |
sample the posterior, initializing the chains with input values More... | |
void | sample_posterior (const int chain_size, const int nwalkers, const std::string input_dir, const std::string input_file, const int seed=666, const double aa=2, const bool parallel=true) |
sample the posterior, initializing the chains reading the input values from an input file More... | |
void | importance_sampling (const std::string input_dir, const std::string input_file, const int seed=666, const std::vector< size_t > column={}, const int header_lines_to_skip=1, const bool is_FITS_format=false, const bool apply_to_likelihood=false) |
perform importance sampling More... | |
void | write_chain (const std::string output_dir, const std::string output_file, const int start=0, const int thin=1, const bool is_FITS_format=false, const int prec=5, const int ww=14) |
write the chains obtained after the MCMC sampling More... | |
void | read_chain (const std::string input_dir, const std::string input_file, const int nwalkers, const std::vector< size_t > columns={}, const int skip_header=1, const bool fits=false) |
read the chains More... | |
void | show_results (const int start=0, const int thin=1, const int nbins=50, const bool show_mode=false, const int ns=-1) |
show the results of the MCMC sampling on screen More... | |
void | write_results (const std::string output_dir, const std::string root_file, const int start=0, const int thin=1, const int nbins=50, const bool fits=false, const bool compute_mode=false, const int ns=-1) |
write the results of the MCMC sampling to file More... | |
virtual void | write_model (const std::string output_dir, const std::string output_file, const std::vector< double > xx, const std::vector< double > parameters) |
write the model at xx for given parameters More... | |
virtual void | write_model (const std::string output_dir, const std::string output_file, const std::vector< double > xx, const std::vector< double > yy, const std::vector< double > parameters) |
write the model at xx, yy for given parameters More... | |
virtual void | write_model_at_bestfit (const std::string output_dir, const std::string output_file, const std::vector< double > xx) |
write the model at xx with best-fit parameters obtained from posterior maximization More... | |
virtual void | write_model_at_bestfit (const std::string output_dir, const std::string output_file, const std::vector< double > xx, const std::vector< double > yy) |
write the model at xx, yy with best-fit parameters obtained from likelihood maximization More... | |
virtual void | write_model_from_chains (const std::string output_dir, const std::string output_file, const std::vector< double > xx, const int start=0, const int thin=1) |
write the model at xx computing 16th, 50th and 84th percentiles from the chains More... | |
virtual void | write_model_from_chains (const std::string output_dir, const std::string output_file, const std::vector< double > xx, const std::vector< double > yy, const int start=0, const int thin=1) |
write the model at xx, yy computing 16th, 50th and 84th percentiles from the chains More... | |
double | reduced_chi2 (const std::vector< double > parameter={}) |
the reduced \(\chi^2\) More... | |
Protected Attributes | |
std::shared_ptr< statistics::CombinedPosterior > | m_combined_posterior |
combined posterior | |
![]() | |
std::shared_ptr< data::Data > | m_data = NULL |
input data to be modelled | |
bool | m_fit_range = false |
check if fit range has been set | |
std::shared_ptr< data::Data > | m_data_fit |
input data restricted to the range used for the fit | |
std::shared_ptr< statistics::Model > | m_model = NULL |
input model | |
std::shared_ptr< statistics::Model > | m_response_func = NULL |
response function for the computation of the super-sample covariance | |
std::shared_ptr< statistics::Likelihood > | m_likelihood = NULL |
likelihood | |
std::vector< std::shared_ptr< statistics::PriorDistribution > > | m_parameter_priors |
prior | |
std::shared_ptr< statistics::Posterior > | m_posterior = NULL |
posterior | |
Additional Inherited Members | |
![]() | |
void | m_set_prior (std::vector< statistics::PriorDistribution > prior_distribution) |
set the internal variable m_parameter_priors More... | |
void | m_isSet_response () |
check if the response function used to compute the super-sample covariance is set | |
The class CombinedModelling.
This file defines the interface of the base class CombinedModelling, used for combining any kind of modelling.
Definition at line 65 of file CombinedModelling.h.
cbl::modelling::CombinedModelling::CombinedModelling | ( | std::vector< std::shared_ptr< modelling::Modelling >> | modelling, |
std::vector< std::string > | repeated_par = {} , |
||
const std::vector< std::vector< std::vector< int >>> | common_repeated_par = {} |
||
) |
Constructor used to combine statistically independent probes.
modelling | vector of pointers to the single Modelling objects |
repeated_par | parameters shared by different probes, for which the user wants different posteriors for each probe. For example, if the probes A and B depend on the same parameter \(p\), whose identification string is "par", then "par" must be given in input if the user desires to have different posteriors of \(p\) for the two probes A and B. Every probe depending on \(p\) will provide a different posterior on \(p\). This is useful in the case of astrophysical parameters, e.g. the parameters describing galaxy cluster profiles. |
common_repeated_par | for each argument in repeated_par, a vector of vectors of integers can be defined here. Such vectors define the sets of probes for which the same priors and posteriors are provided for the same parameters. For example, let us consider the probes {A1, A2, A3, A4}, provided in the modelling parameter. All the probes (A1, A2, A3, A4) depend on the parameter \(p\), whose identification string is "par", and we set repeated_par = {"par"}. If we want each of the pairs of probes {A1, A2} and {A3, A4} to provide a different posterior on \(p\), we must set common_repeated_par = { { {0,1}, {2,3} } }. This is useful when different probes in the same bin provide constraints on the same parameters. If common_repeated_par is not provided, every probe depending on \(p\) will provide a different posterior on \(p\). If two parameters are provided in repeated_par, e.g. repeated_par = {"par1", "par2"}, and only "par2" must be shared by more than one probe, then leave blank the vector of vectors corresponding to "par1" in common_repeated_par. |
Definition at line 46 of file CombinedModelling.cpp.
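The grouping expressed by common_repeated_par can be illustrated with a small standalone sketch. The helper `repeated_labels` and its label naming scheme are hypothetical and are not part of CosmoBolognaLib; the sketch only shows which probes end up sharing one copy of a repeated parameter, under the indexing convention described above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not a CosmoBolognaLib function): for one repeated
// parameter, return a label identifying the copy of the parameter that
// each probe constrains. Probes listed in the same group share a label;
// probes not listed in any group get their own independent copy.
std::vector<std::string> repeated_labels(const std::string& name,
                                         int n_probes,
                                         const std::vector<std::vector<int>>& groups)
{
  std::vector<std::string> labels(n_probes);

  // Probes in the same group share one label, named after the group index.
  for (std::size_t g = 0; g < groups.size(); ++g)
    for (int probe : groups[g]) {
      assert(probe >= 0 && probe < n_probes);
      labels[probe] = name + "_group" + std::to_string(g);
    }

  // Ungrouped probes keep a separate copy of the parameter.
  for (int i = 0; i < n_probes; ++i)
    if (labels[i].empty())
      labels[i] = name + "_probe" + std::to_string(i);

  return labels;
}
```

With four probes and groups { {0,1}, {2,3} }, probes A1 and A2 receive one label and A3 and A4 another; with an empty group list every probe receives its own label, matching the default behaviour described for common_repeated_par.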
cbl::modelling::CombinedModelling::CombinedModelling | ( | std::vector< std::vector< std::shared_ptr< modelling::Modelling >>> | modelling, |
const std::vector< std::shared_ptr< data::CovarianceMatrix >> | covariance, | ||
const std::vector< cbl::statistics::LikelihoodType > | likelihood_types, | ||
const std::vector< std::string > | repeated_par = {} , |
||
const std::vector< std::vector< std::vector< int >>> | common_repeated_par = {} , |
||
const std::vector< std::shared_ptr< cosmology::SuperSampleCovariance >> | SSC = {} |
||
) |
Constructor used to set the modelling of statistically dependent probes. For each vector of Modelling objects in the modelling argument, a cbl::data::CovarianceMatrix object must be defined. The probes in each vector within modelling are described by the same likelihood function. The final log-likelihood is given by the sum of the logarithms of each likelihood describing a set of dependent probes.
If sets of probes are described by user-defined likelihoods, then the cbl::data::CovarianceMatrix objects must not be provided for such sets.
modelling | vector of vectors of pointers to Modelling objects. In each vector, a set of probes is contained, with a covariance matrix defined by the corresponding cbl::data::CovarianceMatrix object given as input in the second argument of this constructor |
covariance | objects defining the covariance matrices, one for each set of probes |
likelihood_types | likelihood types for each set of probes |
repeated_par | parameters shared by different probes, for which the user wants different posteriors for each probe. For example, if the probes A and B depend on the same parameter \(p\), whose identification string is "par", then "par" must be given in input if the user desires to have different posteriors of \(p\) for the two probes A and B. This is useful in the case of astrophysical parameters, e.g. the parameters describing galaxy cluster profiles. |
common_repeated_par | for each argument in repeated_par, a vector of vectors of integers can be defined here. Such vectors define the sets of probes for which the same priors and posteriors are provided for the same parameters. For example, let us consider the two sets of probes {A1, A2, A3} and {B1, B2, B3}, provided in the modelling parameter. All the probes (A1, A2, A3, B1, B2, B3) depend on the parameter \(p\), whose identification string is "par", and we set repeated_par = {"par"}. If we want each pair of probes {A1, B1}, {A2, B2}, {A3, B3}, to provide a different posterior on \(p\), we must set common_repeated_par = { { {0,3}, {1,4}, {2,5} } }. This is useful when different probes in the same bin provide constraints on the same parameters. If common_repeated_par is not provided, every probe depending on \(p\) will provide a different posterior on \(p\). If two parameters are provided in repeated_par, e.g. repeated_par = {"par1", "par2"}, and only "par2" must be shared by more than one probe, then leave blank the vector of vectors corresponding to "par1" in common_repeated_par. |
SSC | vector of pointers to cbl::cosmology::SuperSampleCovariance objects, for the computation of the \(S_{ij}\) matrices. If, for example, the sets of probes A, B, C, are considered, and only for A and C the super-sample covariance must be computed, then set the second element of SSC equal to NULL. |
Definition at line 61 of file CombinedModelling.cpp.
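The statement that the final log-likelihood is the sum over sets of dependent probes can be sketched as follows. This is a minimal illustration assuming a Gaussian likelihood for every set; the `ProbeSet` type and both functions are illustrative names, not CosmoBolognaLib classes, and the additive normalisation constant is dropped.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One set of statistically dependent probes: the concatenated residuals
// (data minus model) and the inverse of their joint covariance matrix.
// Illustrative type only.
struct ProbeSet {
  std::vector<double> residual;
  std::vector<std::vector<double>> inv_cov;
};

// Gaussian log-likelihood of one set, up to an additive constant:
// -0.5 * r^T C^{-1} r.
double log_likelihood(const ProbeSet& set)
{
  const std::size_t n = set.residual.size();
  assert(set.inv_cov.size() == n);

  double chi2 = 0.;
  for (std::size_t i = 0; i < n; ++i) {
    assert(set.inv_cov[i].size() == n);
    for (std::size_t j = 0; j < n; ++j)
      chi2 += set.residual[i] * set.inv_cov[i][j] * set.residual[j];
  }
  return -0.5 * chi2;
}

// Different sets are mutually independent, so their log-likelihoods add.
double combined_log_likelihood(const std::vector<ProbeSet>& sets)
{
  double total = 0.;
  for (const auto& set : sets)
    total += log_likelihood(set);
  return total;
}
```

Correlations between probes enter only through the off-diagonal blocks of each set's covariance matrix; probes placed in different sets are treated as uncorrelated.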
void cbl::modelling::CombinedModelling::maximize_combined_posterior | ( | const std::vector< double > | start, |
const unsigned int | max_iter = 10000 , |
||
const double | tol = 1.e-6 , |
||
const double | epsilon = 1.e-3 |
||
) |
function that maximizes the combined posterior, finds the best-fit parameters and stores them in the model
this function exploits the Nelder-Mead method https://en.wikipedia.org/wiki/Nelder%E2%80%93Mead_method
the algorithm defines a simplex (i.e. a k-dimensional polytope which is the convex hull of its k+1 vertices) in the parameter space. At each step, it identifies the simplex vertex at which the function to be minimised (i.e. the negative posterior in this case) has the greatest value, and moves it, via reflections and scaling, to a new position in which the function has a lower value. The iteration stops when the simplex size becomes smaller than the tolerance. For instance, in 2D, the starting vertices of the simplex (a triangle in 2D) are the following: (start[0], start[1]) ; (start[0]+epsilon, start[1]) ; (start[0], start[1]+epsilon)
start | vector containing initial values for the posterior maximization |
max_iter | the maximum number of iterations |
tol | the tolerance to find convergence |
epsilon | the relative fraction of the initial simplex size |
Definition at line 79 of file CombinedModelling.cpp.
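The starting simplex described in the 2D example generalises to k dimensions as sketched below. The function `initial_simplex` is an illustrative helper, not the library's implementation, and it assumes epsilon is added to each coordinate exactly as in the 2D example above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build the k+1 starting vertices of a simplex in a k-dimensional
// parameter space: the start point itself, plus one vertex per axis
// obtained by shifting that single coordinate by epsilon.
// Illustrative helper only.
std::vector<std::vector<double>> initial_simplex(const std::vector<double>& start,
                                                 double epsilon)
{
  assert(!start.empty());

  // Begin with k+1 copies of the start point.
  std::vector<std::vector<double>> vertices(start.size() + 1, start);

  // Vertex i+1 differs from the start point only along axis i.
  for (std::size_t i = 0; i < start.size(); ++i)
    vertices[i + 1][i] += epsilon;

  return vertices;
}
```

For start = (1, 2) and epsilon = 0.5 this yields the triangle (1, 2), (1.5, 2), (1, 2.5).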
void cbl::modelling::CombinedModelling::sample_combined_posterior | ( | const int | chain_size, |
const int | nwalkers, | ||
const double | aa = 2 , |
||
const bool | parallel = true |
||
) |
sample the posterior, initializing the chains by drawing from the prior distributions
the starting values of the chain are extracted from the (possibly different) distributions of the priors
chain_size | the chain length |
nwalkers | the number of parallel chains |
aa | the parameter of the \(g(z)\) distribution |
parallel | false \(\rightarrow\) non-parallel sampler; true \(\rightarrow\) parallel sampler |
Definition at line 88 of file CombinedModelling.cpp.
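The role of aa can be illustrated under the assumption that \(g(z)\) is the stretch-move proposal density of the affine-invariant ensemble sampler of Goodman & Weare (2010), \(g(z) \propto 1/\sqrt{z}\) for \(z \in [1/a, a]\). The source does not state this explicitly, so the sketch below is a hedged illustration and the function name `draw_stretch` is hypothetical.

```cpp
#include <cassert>
#include <random>

// Draw a stretch factor z from g(z) proportional to 1/sqrt(z) on
// [1/aa, aa], using inverse-transform sampling:
//   z = ((aa - 1) * u + 1)^2 / aa,   with u uniform in [0, 1).
// Assumes the Goodman & Weare stretch move; not a CosmoBolognaLib function.
double draw_stretch(double aa, std::mt19937& gen)
{
  assert(aa > 1.);
  std::uniform_real_distribution<double> uniform(0., 1.);
  const double u = uniform(gen);
  const double root = (aa - 1.) * u + 1.;
  return root * root / aa;
}
```

Under this assumption, larger values of aa allow longer proposed jumps along the line joining two walkers, at the cost of a lower acceptance rate; aa = 2 is the commonly used default.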
void cbl::modelling::CombinedModelling::sample_combined_posterior | ( | const int | chain_size, |
const int | nwalkers, | ||
const double | radius, | ||
const std::vector< double > | start, | ||
const unsigned int | max_iter = 10000 , |
||
const double | tol = 1.e-6 , |
||
const double | epsilon = 1.e-3 , |
||
const double | aa = 2 , |
||
const bool | parallel = true |
||
) |
sample the posterior, initializing the chains in a ball around the posterior best-fit parameters values
the starting values of the chain are extracted from uniform distributions in the range [parameter-radius, parameter+radius] (for each likelihood parameter)
this function first maximizes the posterior, starting the computation at the values of the input vector 'start', then it initializes the chain
chain_size | the chain length |
nwalkers | the number of parallel chains |
radius | radius of the ball in parameter space |
start | std::vector containing initial values for the posterior maximization |
max_iter | the maximum number of iterations |
tol | the tolerance in finding convergence |
epsilon | the simplex side |
aa | the parameter of the \(g(z)\) distribution |
parallel | false \(\rightarrow\) non-parallel sampler; true \(\rightarrow\) parallel sampler |
Definition at line 98 of file CombinedModelling.cpp.
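The ball initialization described above can be sketched as follows: each walker starts at the best-fit point plus an independent uniform offset in [-radius, +radius] along every parameter. The function `walkers_in_ball` and its seed argument are illustrative assumptions, not the library's API.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Draw the starting positions of nwalkers chains uniformly in
// [centre[i] - radius, centre[i] + radius] for each parameter i.
// Illustrative helper only.
std::vector<std::vector<double>> walkers_in_ball(const std::vector<double>& centre,
                                                 double radius,
                                                 int nwalkers,
                                                 unsigned int seed)
{
  assert(radius > 0. && nwalkers > 0);

  std::mt19937 gen(seed);
  std::uniform_real_distribution<double> offset(-radius, radius);

  // Every walker starts from a copy of the centre, then is displaced.
  std::vector<std::vector<double>> walkers(nwalkers, centre);
  for (auto& walker : walkers)
    for (auto& value : walker)
      value += offset(gen);

  return walkers;
}
```

Strictly speaking the starting region is a hypercube of half-side radius rather than a sphere, which follows from drawing each coordinate independently.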
void cbl::modelling::CombinedModelling::sample_combined_posterior | ( | const int | chain_size, |
const int | nwalkers, | ||
const std::string | input_dir, | ||
const std::string | input_file, | ||
const int | seed, | ||
const double | aa = 2 , |
||
const bool | parallel = true |
||
) |
sample the posterior, initializing the chains reading the input values from an input file
chain_size | the chain length |
nwalkers | the number of parallel chains |
input_dir | input directory |
input_file | input file |
seed | the seed |
aa | the parameter of the \(g(z)\) distribution |
parallel | false \(\rightarrow\) non-parallel sampler; true \(\rightarrow\) parallel sampler |
Definition at line 108 of file CombinedModelling.cpp.
void cbl::modelling::CombinedModelling::write_combined_results | ( | const std::string | output_dir, |
const std::string | root_file, | ||
const int | start = 0 , |
||
const int | thin = 1 , |
||
const int | nbins = 50 , |
||
const bool | fits = false , |
||
const bool | compute_mode = false , |
||
const int | ns = -1 |
||
) |
write the results of the MCMC sampling to file
this function stores to file the posterior mean, the posterior standard deviation, the posterior median, the 16th and 84th posterior percentiles, and, optionally, the posterior mode.
If the covariance matrix has been estimated from a set of mock catalogues, and the input parameter ns ( \(n_s\), the number of samples used to estimate the covariance matrix) is provided (>0), then the parameter errors ( \(\sigma_p\)) will be corrected to take into account the uncertainties in the covariance estimate (Percival et al. 2014):
\[ \sigma_p = \sqrt{\frac{1+B(n_b-n_p)}{1+A+B(n_p+1)}} \]
where
\[ A = \frac{2}{(n_s-n_b-1)(n_s-n_b-4)} \,, \]
\[ B = \frac{(n_s-n_b-2)}{(n_s-n_b-1)(n_s-n_b-4)} \,, \]
where \(n_b\) is the number of data measurements (e.g. the bins of the dataset) and \(n_p\) is the number of model parameters.
This correction can be applied only if the likelihood is Gaussian. Moreover, the inverse covariance matrix estimator has to be corrected to take into account the inverse Wishart distribution (Hartlap, Simon and Schneider 2006).
output_dir | the output directory |
root_file | the root of the output files: file_root_parameters.dat contains the output of the MCMC sampling for each parameter; file_root_covariance.dat contains the covariance of the parameters; file_root_chain contains the chains (the extension can be .dat or .fits) |
start | the minimum chain position to be written |
thin | the step used for dilution on screen |
nbins | the number of bins |
fits | false \(\rightarrow\) ascii file; true \(\rightarrow\) fits file |
compute_mode | true \(\rightarrow\) compute the posterior mode; false \(\rightarrow\) do not compute the posterior mode |
ns | number of samples used to estimate the covariance matrix |
Definition at line 118 of file CombinedModelling.cpp.
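The correction factors quoted above can be evaluated with the short sketch below. Both function names are hypothetical; the first returns the square-root expression exactly as written in the formula for \(\sigma_p\), assumed here to multiply the uncorrected parameter error, and the second returns the standard Hartlap factor \((n_s-n_b-2)/(n_s-1)\) applied to the inverse covariance estimate.

```cpp
#include <cassert>
#include <cmath>

// Square-root factor from the formula above, with
//   A = 2 / ((n_s-n_b-1)(n_s-n_b-4)),
//   B = (n_s-n_b-2) / ((n_s-n_b-1)(n_s-n_b-4)).
// n_s: number of mocks, n_b: number of data bins, n_p: number of parameters.
// Illustrative helper, not a CosmoBolognaLib function.
double percival_error_factor(int n_s, int n_b, int n_p)
{
  assert(n_s - n_b - 4 > 0);
  const double denom = double(n_s - n_b - 1) * double(n_s - n_b - 4);
  const double A = 2. / denom;
  const double B = double(n_s - n_b - 2) / denom;
  return std::sqrt((1. + B * (n_b - n_p)) / (1. + A + B * (n_p + 1)));
}

// Hartlap factor that debiases the inverse of a covariance matrix
// estimated from n_s mocks with n_b data bins.
double hartlap_factor(int n_s, int n_b)
{
  assert(n_s - n_b - 2 > 0);
  return double(n_s - n_b - 2) / double(n_s - 1);
}
```

For example, with 100 mocks, 20 bins and 5 parameters the first factor is about 1.05 and the Hartlap factor is 78/99, roughly 0.79; both approach 1 as the number of mocks grows.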
void cbl::modelling::CombinedModelling::write_model_from_combined_chain | ( | const std::string | output_dir, |
const std::string | output_file, | ||
const int | start, | ||
const int | thin, | ||
const std::vector< double > | xx = {} , |
||
const std::vector< double > | yy = {} |
||
) |
write the model computing 16th, 50th and 84th percentiles from the MCMC
output_dir | the output directory |
output_file | the output file |
start | the minimum chain position to be written |
thin | the step used for dilution on screen |
xx | x points where the model is computed. If not provided, the x points set for the MCMC are considered. |
yy | y points where the model is computed. If not provided, the y points set for the MCMC are considered. |
Definition at line 127 of file CombinedModelling.cpp.
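How start, thin and the 16th, 50th and 84th percentiles interact can be sketched on a single chain of model values. The helpers `thin_chain` and `percentile` are illustrative, and the linear-interpolation percentile convention is an assumption; the library may use a different estimator.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Keep every thin-th chain step from position start onwards.
std::vector<double> thin_chain(const std::vector<double>& chain, int start, int thin)
{
  assert(start >= 0 && thin >= 1);
  std::vector<double> kept;
  for (std::size_t i = static_cast<std::size_t>(start); i < chain.size();
       i += static_cast<std::size_t>(thin))
    kept.push_back(chain[i]);
  return kept;
}

// Percentile p (in [0, 100]) by linear interpolation between order
// statistics. This convention is an assumption made for illustration.
double percentile(std::vector<double> values, double p)
{
  assert(!values.empty() && p >= 0. && p <= 100.);
  std::sort(values.begin(), values.end());
  const double rank = p / 100. * (values.size() - 1);
  const std::size_t lo = static_cast<std::size_t>(std::floor(rank));
  const std::size_t hi = std::min(lo + 1, values.size() - 1);
  return values[lo] + (rank - lo) * (values[hi] - values[lo]);
}
```

Applying `percentile` with p = 16, 50 and 84 to the thinned model values at each x (and y) point gives the median model and its 68% band; a larger start discards more of the burn-in phase, and a larger thin reduces the correlation between retained steps.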