BYOST.build

`BYOST.build`

Functions for normalizing spectra, running PCA and GPR, and assembling building blocks.

BYOST.build.make_buildingblock(df_spectra, df_conditions, normalize_method='intergrated_flux', normalize_wave_range=None, standardize_std=False, n_components=10, length_scales=[10.0, 0.1], remove_outliars=True, n_restarts_optimizer=20)

Input:

** Input dataset and the conditions to model on **

df_spectra: pandas dataframe of the spectra on a common wavelength grid, with the wavelengths as column names
df_conditions: pandas dataframe of the conditions corresponding to df_spectra, e.g., epochs and sBVs

** Arguments used to prepare the data **

normalize_method: default = 'intergrated_flux'; one of None, 'mean_flux', or 'intergrated_flux'

None: take the input data as is
'mean_flux': normalize by dividing by the mean flux in the selected range
'intergrated_flux': normalize by dividing by the integrated flux in the selected range

normalize_wave_range: default = None; or a 2-element list ([lambda_left, lambda_right])
standardize_std: default = False; if True, standardize the input data by the standard deviation of each column

** Argument for the PCA step **

n_components: default = 10, the number of components to keep for further analysis

** Arguments for the GPR step **

length_scales: default = [10.0, 0.1], the length scales of the RBF kernel for condition_1 and condition_2

The GPR depends on these initial values, so try several to find the optimal length scales for your data set. The length scale acts like a smoothness setting for the GP predictions: a larger scale returns smoother predictions, while a smaller scale retains more detail.

remove_outliars: default = True; ignore the local PC outliers that lie beyond 5 sigma * global_std
n_restarts_optimizer: default = 20, the number of restarts of the optimizer

Output:

df_buildingblocks: pandas dataframe containing the resulting PCA and GPR
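The outlier rule described for remove_outliars can be read in more than one way. The sketch below shows one possible reading with NumPy: a PC value is flagged when it deviates from the column mean by more than 5 times the standard deviation of the whole column. The function name outlier_mask, the n_sigma argument, and the sample values are hypothetical, and the BYOST implementation may differ.

```python
import numpy as np

def outlier_mask(pc_values, n_sigma=5.0):
    """Flag values farther than n_sigma * global std from the mean.

    Illustrative only; this is an assumed reading of the docstring,
    not the BYOST code.
    """
    pc_values = np.asarray(pc_values, dtype=float)
    global_std = pc_values.std()  # std over the whole PC column
    return np.abs(pc_values - pc_values.mean()) > n_sigma * global_std

# 99 well-behaved projections plus one extreme value at the end.
pc_column = np.append(np.linspace(-1.0, 1.0, 99), 50.0)
mask = outlier_mask(pc_column)
clean = pc_column[~mask]  # values that would be kept for the GPR fit
```

Here only the final value is flagged, so clean holds the 99 remaining projections.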

BYOST.build.normalize_flux(df_spectra, normalize_method='mean_flux', normalize_wave_range=None)

Input:

df_spectra: pandas dataframe of the spectra on a common wavelength grid, with the wavelengths as column names
normalize_method: default = 'mean_flux'; either 'mean_flux' or 'intergrated_flux'

'mean_flux': normalize by dividing by the mean flux in the selected range
'intergrated_flux': normalize by dividing by the integrated flux in the selected range

normalize_wave_range: default = None; or a 2-element list ([lambda_left, lambda_right])

Output:

df_spectra: same format as the input df_spectra, but with each spectrum normalized
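The two normalization methods can be illustrated on a single spectrum with plain NumPy. This is a minimal sketch, not the BYOST implementation: the function normalize_spectrum and the sample values are hypothetical, the integral is assumed to be a trapezoidal sum over wavelength, and a range of None is assumed to mean the full wavelength grid.

```python
import numpy as np

def normalize_spectrum(wave, flux, method="mean_flux", wave_range=None):
    """Divide a spectrum by its mean or integrated flux in a window."""
    wave = np.asarray(wave, dtype=float)
    flux = np.asarray(flux, dtype=float)
    if wave_range is None:
        # Assumption: no range means the whole wavelength grid is used.
        in_range = np.ones(wave.shape, dtype=bool)
    else:
        left, right = wave_range
        in_range = (wave >= left) & (wave <= right)
    w, f = wave[in_range], flux[in_range]
    if method == "mean_flux":
        scale = f.mean()
    elif method == "intergrated_flux":  # spelled as in the BYOST signature
        # Trapezoidal integral of the flux over wavelength.
        scale = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(w))
    else:
        raise ValueError(f"unknown method: {method!r}")
    return flux / scale

wave = np.array([4000.0, 5000.0, 6000.0, 7000.0, 8000.0])
flux = np.array([2.0, 4.0, 6.0, 4.0, 2.0])
window = [5000.0, 7000.0]

by_mean = normalize_spectrum(wave, flux, "mean_flux", window)
by_integral = normalize_spectrum(wave, flux, "intergrated_flux", window)
```

After mean-flux normalization the mean inside the window is 1. After integrated-flux normalization the area under the spectrum inside the window is 1, so the result depends on the wavelength units.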

BYOST.build.DO_PCA(PCA_input, n_components=10, standardize_std=False)

Input:

PCA_input: pandas dataframe or 2-D array of the normalized flux
n_components: default = 10, the number of components to keep for further analysis
standardize_std: default = False; if True, standardize the input data by the standard deviation of each column

Output:

pca: fitted PCA, see https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
PCA_output: pandas dataframe of the PC projections, with the same number of rows as PCA_input but the number of columns reduced to n_components
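The shape change described for PCA_output can be reproduced with a small NumPy sketch. It uses a singular value decomposition of the centered data as a stand-in for the scikit-learn PCA that the function returns, so it is not the BYOST code path. The function pca_project and the random input are hypothetical.

```python
import numpy as np

def pca_project(data, n_components, standardize_std=False):
    """Project rows of `data` onto the leading principal components."""
    X = np.asarray(data, dtype=float)
    X = X - X.mean(axis=0)  # center each column
    if standardize_std:
        X = X / X.std(axis=0)  # scale each column to unit std
    # Rows of Vt are the principal axes, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    components = Vt[:n_components]
    projections = X @ components.T
    return components, projections

rng = np.random.default_rng(0)
spectra = rng.normal(size=(30, 200))  # 30 spectra on a 200-point grid

components, projections = pca_project(spectra, n_components=10)
```

The projections keep one row per input spectrum and have n_components columns, here (30, 10).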

BYOST.build.GPR_2D_input(x1, x2, y, yerr=None, length_scales=[10.0, 0.1], n_restarts_optimizer=20, return_score=True)

Input:

x1: input variable 1, N-element 1-D array
x2: input variable 2, N-element 1-D array
y: dependent variable, N-element 1-D array
yerr: default = None; if not None, the errors of the dependent variable, N-element 1-D array
length_scales: default = [10.0, 0.1], the length scales of the RBF kernel for x1 and x2
n_restarts_optimizer: default = 20, the number of restarts of the optimizer
return_score: default = True; return the GPR R^2 score on the predictions of y given x1 and x2

Output:

gp: fitted GP, see https://scikit-learn.org/stable/modules/generated/sklearn.gaussian_process.GaussianProcessRegressor.html
gp_score (if return_score=True): scalar, the GPR R^2 score on the predictions of y given x1 and x2; it is typically between 0 and 1, and values closer to 1 are generally better
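For reference, the R^2 score reported here follows the usual coefficient-of-determination definition, which is also what the score method of scikit-learn regressors returns. The sketch below computes it by hand with NumPy. The function name r2 and the sample values are hypothetical.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])

perfect = r2(y, y)                              # predictions match exactly
mean_only = r2(y, np.full_like(y, y.mean()))    # always predict the mean
```

A perfect fit scores 1, and a model that always predicts the mean of y scores 0. Predictions worse than the mean give a negative score.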

BYOST.build.DO_GPR(PCA_projections, condition_1, condition_2, length_scales=[10.0, 0.1], remove_outliars=True, n_restarts_optimizer=20)

Input:

PCA_projections: pandas dataframe of the PCA projections
condition_1: input variable 1, N-element 1-D array
condition_2: input variable 2, N-element 1-D array
length_scales: default = [10.0, 0.1], the length scales of the RBF kernel for condition_1 and condition_2

The GPR depends on these initial values, so try several to find the optimal length scales for your data set. The length scale acts like a smoothness setting for the GP predictions: a larger scale returns smoother predictions, while a smaller scale retains more detail.

remove_outliars: default = True; ignore the local PC outliers that lie beyond 5 sigma * global_std
n_restarts_optimizer: default = 20, the number of restarts of the optimizer

Output:

GPR_output: a pandas dataframe of the fitted GPs (and the GP scores, if returned) for each PC column, given the conditions as input
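To make the role of the two length scales concrete, the sketch below evaluates an anisotropic RBF kernel by hand with NumPy. It assumes the kernel has the standard form used by scikit-learn's RBF kernel with one length scale per input. The function rbf_kernel_2d and the sample epochs and sBV values are made up for illustration and are not part of BYOST.

```python
import numpy as np

def rbf_kernel_2d(point_a, point_b, length_scales):
    """Anisotropic RBF kernel: exp(-0.5 * sum(((a - b) / l) ** 2))."""
    a = np.asarray(point_a, dtype=float)
    b = np.asarray(point_b, dtype=float)
    scaled = (a - b) / np.asarray(length_scales, dtype=float)
    return float(np.exp(-0.5 * np.sum(scaled ** 2)))

# Defaults from the signature: (condition_1, condition_2).
length_scales = [10.0, 0.1]

# Hypothetical conditions given as (epoch in days, sBV).
reference = (0.0, 1.0)
shifted_epoch = (5.0, 1.0)   # 5 days apart, same sBV
shifted_sbv = (0.0, 1.05)    # same epoch, 0.05 apart in sBV

k_epoch = rbf_kernel_2d(reference, shifted_epoch, length_scales)
k_sbv = rbf_kernel_2d(reference, shifted_sbv, length_scales)

# Doubling the first length scale raises the correlation across the same
# 5-day gap, which is why larger scales give smoother predictions.
k_epoch_smoother = rbf_kernel_2d(reference, shifted_epoch, [20.0, 0.1])
```

Both shifts equal half of the corresponding length scale, so k_epoch and k_sbv agree. Each length scale should therefore be chosen in the units of its own condition.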
