BYOST.build
Functions for normalizing spectra, running PCA and GPR, and assembling building blocks.
- BYOST.build.make_buildingblock(df_spectra, df_conditions, normalize_method='intergrated_flux', normalize_wave_range=None, standardize_std=False, n_components=10, length_scales=[10.0, 0.1], remove_outliars=True, n_restarts_optimizer=20)
- Input:
** Input dataset and the conditions to model on **
df_spectra: pandas DataFrame of the spectra on the common wavelength grid, with wavelengths as column names
df_conditions: pandas DataFrame of the conditions corresponding to df_spectra, e.g., epochs and sBVs
** Arguments used to prepare the data **
normalize_method: default = 'intergrated_flux'; one of None, 'mean_flux', or 'intergrated_flux'
- None: take the input data as is
- 'mean_flux': normalize by dividing by the mean flux in the selected range
- 'intergrated_flux': normalize by dividing by the integrated flux in the selected range
normalize_wave_range: default = None; or a 2-element list ([lambda_left, lambda_right])
standardize_std: default = False; if True, standardize the input data by the standard deviation of each column
** Argument for the PCA step **
n_components: default = 10; the number of components to keep for further analysis
** Arguments for the GPR step **
length_scales: default = [10.0, 0.1]; the initial length scales of the RBF kernel for condition_1 and condition_2
The GPR depends on these initial values, so try several to find the optimal length scales for your data set. (The length scale loosely controls the smoothness of the GP predictions: a larger scale returns smoother predictions, while a smaller scale retains more detail.)
remove_outliars: default = True; ignore the local PC outliers that are beyond 5sigma*global_std
n_restarts_optimizer: default = 20; the number of restarts of the optimizer
- Output:
df_buildingblocks: pandas DataFrame containing the resulting PCA and GPR
- BYOST.build.normalize_flux(df_spectra, normalize_method='mean_flux', normalize_wave_range=None)
- Input:
df_spectra: pandas DataFrame of the spectra on the common wavelength grid, with wavelengths as column names
normalize_method: 'mean_flux' or 'intergrated_flux'
- 'mean_flux': normalize by dividing by the mean flux in the selected range
- 'intergrated_flux': normalize by dividing by the integrated flux in the selected range
normalize_wave_range: None or a 2-element list ([lambda_left, lambda_right])
- Output:
df_spectra: same format as the input df_spectra, but with each spectrum normalized
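The two normalization options described above can be sketched as follows. This is an illustrative re-implementation, not the BYOST source: the function name normalize_flux_sketch is hypothetical, and it assumes that normalize_wave_range=None means the full wavelength grid and that the integrated flux is computed with the trapezoidal rule.

```python
import numpy as np
import pandas as pd


def normalize_flux_sketch(df_spectra, normalize_method="mean_flux",
                          normalize_wave_range=None):
    """Hypothetical sketch of per-spectrum flux normalization."""
    # Column names are the wavelengths of the common grid.
    wave = df_spectra.columns.to_numpy(dtype=float)
    if normalize_wave_range is None:
        # Assumption: no range given means "use the whole grid".
        mask = np.ones(wave.size, dtype=bool)
    else:
        lambda_left, lambda_right = normalize_wave_range
        mask = (wave >= lambda_left) & (wave <= lambda_right)

    flux = df_spectra.to_numpy(dtype=float)
    w, f = wave[mask], flux[:, mask]
    if normalize_method == "mean_flux":
        scale = f.mean(axis=1)
    elif normalize_method == "intergrated_flux":  # spelling follows the documented option
        # Trapezoidal integration of each spectrum over the selected range.
        scale = np.sum(0.5 * (f[:, 1:] + f[:, :-1]) * np.diff(w), axis=1)
    else:
        raise ValueError(f"unknown normalize_method: {normalize_method!r}")

    # Divide every spectrum (row) by its own scale factor.
    return pd.DataFrame(flux / scale[:, None],
                        index=df_spectra.index, columns=df_spectra.columns)
```

With either option, each row is divided by a single number, so the shape of every spectrum is preserved and only its overall level changes.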
- BYOST.build.DO_PCA(PCA_input, n_components=10, standardize_std=False)
- Input:
PCA_input: pandas DataFrame or 2-D array of the normalized flux
n_components: default = 10; the number of components to keep for further analysis
standardize_std: default = False; if True, standardize the input data by the standard deviation of each column
- Output:
pca: fitted PCA object, see https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
PCA_output: pandas DataFrame of the PC projections, with the same number of rows as PCA_input and the number of columns reduced to n_components
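A minimal sketch of this step using scikit-learn is shown below. The function name do_pca_sketch is hypothetical, and using StandardScaler for the optional standardization is an assumption; the actual BYOST implementation may prepare the data differently.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def do_pca_sketch(PCA_input, n_components=10, standardize_std=False):
    """Hypothetical sketch: fit a PCA and return the projections."""
    X = np.asarray(PCA_input, dtype=float)
    # Center each column; optionally also scale it to unit standard deviation.
    # (Assumption: this is what "standardize_std" refers to.)
    X = StandardScaler(with_mean=True, with_std=standardize_std).fit_transform(X)

    pca = PCA(n_components=n_components).fit(X)
    # Keep the row index of the input if it is a DataFrame.
    index = getattr(PCA_input, "index", None)
    PCA_output = pd.DataFrame(pca.transform(X), index=index)
    return pca, PCA_output
```

The returned projections have one row per input spectrum and n_components columns, and pca.explained_variance_ratio_ can be inspected to judge how many components are worth keeping.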
- BYOST.build.GPR_2D_input(x1, x2, y, yerr=None, length_scales=[10.0, 0.1], n_restarts_optimizer=20, return_score=True)
- Input:
x1: input variable 1, N-element 1-D array
x2: input variable 2, N-element 1-D array
y: dependent variable, N-element 1-D array
yerr: if not None, the errors of the dependent variable, N-element 1-D array
length_scales: default = [10.0, 0.1]; the initial length scales of the RBF kernel for x1 and x2
n_restarts_optimizer: default = 20; the number of restarts of the optimizer
return_score: default = True; return the GPR R^2 score of the predictions of y given x1 and x2
- Output:
- gp: fitted GP, see https://scikit-learn.org/stable/modules/generated/sklearn.gaussian_process.GaussianProcessRegressor.html
- gp_score (if return_score=True): scalar, the GPR R^2 score of the predictions of y given x1 and x2. It is typically between 0 and 1 (it can be negative for a very poor fit); closer to 1 is generally better.
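A two-input GPR with one RBF length scale per input can be sketched with scikit-learn as follows. The function name gpr_2d_sketch is hypothetical, and the kernel form (a constant times an anisotropic RBF) and the use of yerr**2 as the alpha noise term are assumptions made for illustration, not necessarily what BYOST does internally.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel


def gpr_2d_sketch(x1, x2, y, yerr=None, length_scales=(10.0, 0.1),
                  n_restarts_optimizer=20, return_score=True):
    """Hypothetical sketch of a GPR over two input variables."""
    X = np.column_stack([x1, x2])        # shape (N, 2)
    y = np.asarray(y, dtype=float)

    # One initial length scale per input makes the RBF kernel anisotropic.
    kernel = ConstantKernel(1.0) * RBF(length_scale=list(length_scales))

    # Assumption: measurement errors enter as per-point variances on the
    # diagonal of the kernel matrix.
    alpha = 1e-10 if yerr is None else np.asarray(yerr, dtype=float) ** 2

    gp = GaussianProcessRegressor(kernel=kernel, alpha=alpha,
                                  n_restarts_optimizer=n_restarts_optimizer,
                                  random_state=0)
    gp.fit(X, y)
    if return_score:
        # R^2 of the GP predictions on the training inputs.
        return gp, gp.score(X, y)
    return gp
```

The optimizer starts from the supplied length scales and then restarts from random values, so the fitted values in gp.kernel_ can differ substantially from the initial guess.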
- BYOST.build.DO_GPR(PCA_projections, condition_1, condition_2, length_scales=[10.0, 0.1], remove_outliars=True, n_restarts_optimizer=20)
- Input:
PCA_projections: pandas DataFrame of the PCA projections
condition_1: input variable 1, N-element 1-D array
condition_2: input variable 2, N-element 1-D array
length_scales: default = [10.0, 0.1]; the initial length scales of the RBF kernel for condition_1 and condition_2
The GPR depends on these initial values, so try several to find the optimal length scales for your data set. (The length scale loosely controls the smoothness of the GP predictions: a larger scale returns smoother predictions, while a smaller scale retains more detail.)
remove_outliars: default = True; ignore the local PC outliers that are beyond 5sigma*global_std
n_restarts_optimizer: default = 20; the number of restarts of the optimizer
- Output:
- GPR_output: a pandas DataFrame of the fitted GPs (and GP scores, if requested) for each PC column, given the conditions as input
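The outlier cut controlled by remove_outliars can be read in more than one way. The sketch below shows one possible interpretation, in which the values of a single PC column that deviate from the column mean by more than 5 times the column's overall standard deviation are masked out before fitting. The helper name outlier_mask_sketch and this exact criterion are assumptions, not taken from the BYOST source.

```python
import numpy as np


def outlier_mask_sketch(pc_values, n_sigma=5.0):
    """Hypothetical sketch: True for points kept, False for outliers."""
    pc_values = np.asarray(pc_values, dtype=float)
    center = pc_values.mean()
    global_std = pc_values.std()   # spread of the whole PC column
    # Keep points within n_sigma * global_std of the column mean.
    return np.abs(pc_values - center) <= n_sigma * global_std
```

Under this reading, the mask would be applied to the PC column and to both condition arrays before each per-PC fit, so that the three inputs stay aligned.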