fit_nd.py

fit_relation(DataDict, SigmaLimit=0.001, save_path=None, select_deg=None, degree_max=None, SymmetricDegreePerDimension=True, NumMonteCarlo=0, NumBootstrap=0, k_fold=None, cores=1, verbose=2)

Fit an n-dimensional relationship using a nonparametric model with beta densities.

Parameters:
  • DataDict (dict) – The dictionary containing the data. See the output of mrexo.mle_utils_nd.InputData().

  • SigmaLimit (float, default=1e-3) – The lower limit on the sigma values for all dimensions. Sigma values below this limit are replaced with None, because the normal distribution blows up when the sigma values are too small (~1e-4). For those values, the distribution is no longer a convolution of normal and beta distributions, but is just a beta distribution.

  • save_path (str, optional) – The folder name (including path) to save results in. For example, save_path = '~/mrexo_working/trial_result' will create the ‘trial_result’ folder in ‘mrexo_working’ to contain the results.

  • select_deg ({'cv', 'aic', 'bic'} or int, optional) – The number of degrees (or method of determining the number of degrees) for the beta densities. If “cv”, will use cross validation to find the optimal number of degrees. If “aic”, will use AIC minimization. If “bic”, will use BIC minimization. If an integer, will use that number and skip the optimization process for the number of degrees. NOTE: Use AIC or BIC optimization only for large (>200) sample sizes.

  • degree_max (int, optional) – The maximum degree checked during degree selection. By default, uses n/np.log10(n), where n is the number of data points.

  • SymmetricDegreePerDimension (bool, default=True) – If True, the degree optimization assumes the same number of degrees in each dimension (i.e. symmetric), so the degree candidates are [d1, d1], [d2, d2], etc. In this case it runs through NumCandidates iterations, typically 20. If False, the degree optimization can take NumCandidates ^ NumDimensions iterations. Therefore, with 20 degree candidates in 2 dimensions, there will be 400 iterations to go through!

  • NumMonteCarlo (int, default=0) – The number of Monte Carlo simulations to run.

  • NumBootstrap (int, default=0) – The number of bootstraps to perform. If nonzero, must be greater than 1.

  • k_fold (int, optional) – The number of folds, if using k-fold validation. Only used if select_deg='cv'. By default, uses 10 folds for n > 60, and 5 folds otherwise.

  • cores (int, default=1) – The number of cores to use for parallel processing. This is used in the bootstrap and the cross validation. To use all the cores in the CPU, set cores=cpu_count() (requires ‘from multiprocessing import cpu_count’).

  • verbose ({0,1,2}, default=2) – Integer specifying verbosity for logging: 0 (no log file and no print statements), 1 (will write log file only), or 2 (will write log file and print statements).
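
As a rough illustration of the defaults and limits described in the parameter list, the following Python sketch reproduces the stated rules. The helper names (default_degree_max, default_k_fold, num_degree_iterations, apply_sigma_limit) are hypothetical and are not part of mrexo. Truncating n/np.log10(n) to an integer is an assumption, since the rounding is not stated here.

```python
import math


def default_degree_max(n):
    """Default degree_max rule from the text: n / log10(n)."""
    if n <= 1:
        raise ValueError("n must be greater than 1")
    # Assumption: truncate to an integer; the library's rounding is not stated.
    return int(n / math.log10(n))


def default_k_fold(n):
    """Default number of folds: 10 for n > 60, and 5 otherwise."""
    return 10 if n > 60 else 5


def num_degree_iterations(num_candidates, num_dimensions, symmetric=True):
    """Number of degree combinations checked during degree selection."""
    if symmetric:
        # Candidates are [d1, d1], [d2, d2], ...: one iteration per candidate.
        return num_candidates
    # Every combination of candidates across dimensions.
    return num_candidates ** num_dimensions


def apply_sigma_limit(sigmas, sigma_limit=1e-3):
    """Replace sigma values below the limit with None; keep the rest."""
    return [None if (s is not None and s < sigma_limit) else s for s in sigmas]
```

For example, num_degree_iterations(20, 2, symmetric=False) gives 400, matching the count quoted for 20 candidates in 2 dimensions.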

Returns:

FullFitResult – Output dictionary from initial fitting without bootstrap using Maximum Likelihood Estimation. See the output of mrexo.mle_utils_nd.MLE_fit().

Return type:

dict
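
A call might be assembled as in the sketch below. It only builds the keyword arguments, using parameter names from the signature above. The import path mrexo.fit_nd is an assumption inferred from the file name, DataDict is assumed to come from mrexo.mle_utils_nd.InputData(), and the helper build_fit_kwargs is hypothetical.

```python
from multiprocessing import cpu_count


def build_fit_kwargs(save_path, select_deg="cv", use_all_cores=False):
    """Hypothetical helper: collect keyword arguments for fit_relation."""
    if not (select_deg in ("cv", "aic", "bic") or isinstance(select_deg, int)):
        raise ValueError("select_deg must be 'cv', 'aic', 'bic', or an int")
    return {
        "save_path": save_path,
        "select_deg": select_deg,
        "SymmetricDegreePerDimension": True,
        "NumMonteCarlo": 0,
        "NumBootstrap": 0,
        # cpu_count() uses every core; otherwise keep the documented default of 1.
        "cores": cpu_count() if use_all_cores else 1,
        "verbose": 2,
    }


kwargs = build_fit_kwargs("~/mrexo_working/trial_result")

# Assumed usage (not executed here; the import path is an assumption):
# from mrexo.fit_nd import fit_relation
# FullFitResult = fit_relation(DataDict, **kwargs)
```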