models.model_spec#
Module: models.model_spec#
Inheritance diagram for ISLP.models.model_spec
Model specification#
This module defines the basic object used to represent a regression formula: ModelSpec.
Classes#
Contrast#
- class ISLP.models.model_spec.Contrast(method='drop', drop_level=None)#
Bases: TransformerMixin, BaseEstimator
Methods
fit(X[, y]): Construct contrast of categorical variable for use in building a design matrix.
fit_transform(X[, y]): Fit to data, then transform it.
get_metadata_routing(): Get metadata routing of this object.
get_params([deep]): Get parameters for this estimator.
set_output(*[, transform]): Set output container.
set_params(**params): Set the parameters of this estimator.
transform(X)
- __init__(method='drop', drop_level=None)#
Contrast encoding for categorical variables.
- Parameters:
- method: [‘drop’, ‘sum’, None, callable]
If ‘drop’, then a column of the one-hot encoding will be dropped. If ‘sum’, then the coefficients are constrained to sum to zero. If None, the full one-hot encoding is returned. Finally, if callable, then it should take the number of levels of the category as a single argument and return an appropriate contrast of the full one-hot encoding.
- drop_level: str (optional)
If not None, this level of the category will be dropped if method==’drop’.
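To make the method options concrete, here is a minimal pure-Python sketch of how a contrast matrix turns a one-hot encoding into design-matrix columns. It is not ISLP's implementation: the helpers `one_hot`, `contrast_matrix`, `encode` and the callable `deviation_contrast` are hypothetical names introduced only for illustration, and only the ‘drop’, None and callable options are covered.

```python
def one_hot(values, levels):
    """Full one-hot encoding: one row per value, one column per level."""
    return [[1.0 if v == lev else 0.0 for lev in levels] for v in values]


def contrast_matrix(n_levels, method="drop", drop_index=0):
    """Return an n_levels-row contrast matrix as a list of rows (illustrative)."""
    identity = [[1.0 if i == j else 0.0 for j in range(n_levels)]
                for i in range(n_levels)]
    if method is None:
        return identity  # keep the full one-hot encoding
    if method == "drop":
        # remove the column that belongs to the dropped (reference) level
        return [[v for j, v in enumerate(row) if j != drop_index]
                for row in identity]
    if callable(method):
        # a callable receives the number of levels and returns the contrast
        return method(n_levels)
    raise ValueError(f"unsupported method: {method!r}")


def deviation_contrast(n_levels):
    """Hypothetical callable: the last level is coded -1 in every column."""
    rows = [[1.0 if i == j else 0.0 for j in range(n_levels - 1)]
            for i in range(n_levels - 1)]
    rows.append([-1.0] * (n_levels - 1))
    return rows


def encode(values, levels, method="drop", drop_index=0):
    """Multiply the one-hot encoding by the contrast matrix."""
    H = one_hot(values, levels)
    C = contrast_matrix(len(levels), method, drop_index)
    return [[sum(h * c for h, c in zip(row, col)) for col in zip(*C)]
            for row in H]


levels = ["a", "b", "c"]
values = ["a", "c", "b"]
dropped = encode(values, levels, "drop")            # level "a" is the reference
full = encode(values, levels, None)                 # three indicator columns
custom = encode(values, levels, deviation_contrast)
```

With the first level dropped, a row for level "a" encodes as all zeros, so "a" acts as the reference level against which the other levels are compared.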
- fit(X, y=None)#
Construct contrast of categorical variable for use in building a design matrix.
- Parameters:
- X: array-like
X on which model matrix will be evaluated. If a pd.DataFrame or pd.Series, variables that are of categorical dtype will be treated as categorical.
- Returns:
- F: array-like
Columns of design matrix implied by the categorical variable.
- fit_transform(X, y=None, **fit_params)#
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- X: array-like of shape (n_samples, n_features)
Input samples.
- y: array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params: dict
Additional fit parameters.
- Returns:
- X_new: ndarray array of shape (n_samples, n_features_new)
Transformed array.
- get_metadata_routing()#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routing: MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)#
Get parameters for this estimator.
- Parameters:
- deep: bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params: dict
Parameter names mapped to their values.
- set_output(*, transform=None)#
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform: {“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
- self: estimator instance
Estimator instance.
- set_params(**params)#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params: dict
Estimator parameters.
- Returns:
- self: estimator instance
Estimator instance.
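The `<component>__<parameter>` naming can be illustrated with a small pure-Python sketch. The `Node` class below is a hypothetical stand-in for an estimator, not scikit-learn's or ISLP's code; it only shows how a double-underscore key might be split and delegated to a nested component.

```python
class Node:
    """Toy stand-in for an estimator that stores its parameters as attributes."""

    def __init__(self, **params):
        self.__dict__.update(params)

    def set_params(self, **params):
        for key, value in params.items():
            # "encoder__method" -> component "encoder", parameter "method"
            name, sep, rest = key.partition("__")
            if sep:
                # nested key: hand the remainder to the named component
                getattr(self, name).set_params(**{rest: value})
            else:
                if not hasattr(self, name):
                    raise ValueError(f"invalid parameter {name!r}")
                setattr(self, name, value)
        return self  # returning self allows chained calls


spec = Node(encoder=Node(method="drop", drop_level=None), intercept=True)
result = spec.set_params(encoder__method="sum", intercept=False)
```

After the call, `spec.encoder.method` is "sum" and `spec.intercept` is False, while `drop_level` on the nested component is untouched.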
- transform(X)#
Feature#
- class ISLP.models.model_spec.Feature(variables: tuple, name: str, encoder: Any, use_transform: bool = True, pure_columns: bool = False, override_encoder_colnames: bool = False)#
Bases: NamedTuple
An element in a model matrix that will build columns from an array-like X.
Methods
count(value, /): Return number of occurrences of value.
index(value[, start, stop]): Return first index of value.
- __init__(*args, **kwargs)#
- count(value, /)#
Return number of occurrences of value.
- index(value, start=0, stop=sys.maxsize, /)#
Return first index of value.
Raises ValueError if the value is not present.
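Because Feature is a NamedTuple, count and index are the ordinary tuple methods applied to its fields. The sketch below uses a hypothetical stand-in, `FeatureSketch`, that mirrors the documented field names and defaults; the column names "lstat" and "age" are made up for illustration.

```python
from typing import Any, NamedTuple


class FeatureSketch(NamedTuple):
    # Hypothetical stand-in with the same fields as the documented signature
    variables: tuple
    name: str
    encoder: Any
    use_transform: bool = True
    pure_columns: bool = False
    override_encoder_colnames: bool = False


f = FeatureSketch(variables=("lstat", "age"), name="lstat:age", encoder=None)

position = f.index("lstat:age")   # position of the first field equal to the value
n_false = f.count(False)          # number of fields equal to False

try:
    f.index("missing")
    missing_raises = False
except ValueError:
    # index raises ValueError when no field equals the value
    missing_raises = True

renamed = f._replace(name="interaction")  # tuples are immutable; this returns a copy
```

Fields can be read by name (`f.name`) or by position (`f[1]`), and an instance can be used as a dictionary key as long as every field is hashable.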
ModelSpec#
- class ISLP.models.model_spec.ModelSpec(terms=[], intercept=True, categorical_features=None, categorical_encoders={})#
Bases: TransformerMixin, BaseEstimator
- Parameters:
- terms: sequence (optional)
Sequence of sets whose elements are columns of X when fit. For pd.DataFrame these can be column names.
- intercept: bool (optional)
Include a column for intercept?
- categorical_features: array-like of {bool, int} of shape (n_features) or shape (n_categorical_features,), default=None
Indicates the categorical features. Will be ignored if X is a pd.DataFrame or pd.Series.
None: no feature will be considered categorical for np.ndarray.
boolean array-like: boolean mask indicating categorical features.
integer array-like: integer indices indicating categorical features.
- categorical_encoders: dict
Dictionary whose keys are elements of terms that represent categorical variables. Its values are transforms to be applied to the associated columns in the model matrix by running the fit_transform method when fit is called and overwriting these values in the dictionary.
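The three accepted forms of categorical_features (None, a boolean mask, or integer indices) describe the same information. As a rough sketch of how they relate, the hypothetical helper `categorical_indices` below normalizes each form to a list of column indices; it is an illustration only and is not taken from the library.

```python
def categorical_indices(categorical_features, n_features):
    """Normalize None, a boolean mask, or integer indices to a list of indices."""
    if categorical_features is None:
        return []  # no column is treated as categorical
    values = list(categorical_features)
    if not values:
        return []
    if all(isinstance(v, bool) for v in values):
        # boolean mask: one flag per feature
        if len(values) != n_features:
            raise ValueError("boolean mask must have one entry per feature")
        return [i for i, flag in enumerate(values) if flag]
    # integer indices: keep them, sorted for a stable order
    return sorted(int(v) for v in values)


from_none = categorical_indices(None, 4)
from_mask = categorical_indices([False, True, False, True], 4)
from_index = categorical_indices([3, 1], 4)
```

The mask and the index list above select the same two columns, which is why either form can be passed for an np.ndarray input.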
- Attributes:
- names
Methods
build_sequence(X[, anova_type]): Build implied sequence of submodels based on successively including more terms.
build_submodel(X, terms): Build design on X after fitting.
fit(X[, y]): Fit the model specification to X, fitting the encoders needed to build the design matrix.
fit_transform(X[, y]): Fit to data, then transform it.
get_metadata_routing(): Get metadata routing of this object.
get_params([deep]): Get parameters for this estimator.
set_output(*[, transform]): Set output container.
set_params(**params): Set the parameters of this estimator.
transform(X[, y]): Build design on X after fitting.
- __init__(terms=[], intercept=True, categorical_features=None, categorical_encoders={})#
- build_sequence(X, anova_type='sequential')#
Build implied sequence of submodels based on successively including more terms.
- Parameters:
- X: array-like
X on which columns are evaluated.
- anova_type: str
One of “sequential” or “drop”.
- Returns:
- models: generator
Generator for sequence of models for ANOVA.
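The two anova_type values suggest two ways of enumerating submodels. The sketch below works on term names only and rests on two assumptions that the one-line descriptions do not confirm: “sequential” is read as nested models that add one term at a time (starting from the model with no terms), and “drop” as models that each leave one term out. `term_sequences` is a hypothetical helper, not the library's method.

```python
def term_sequences(terms, anova_type="sequential"):
    """Yield lists of terms describing a sequence of submodels (illustrative)."""
    terms = list(terms)
    if anova_type == "sequential":
        # assumed reading: [], [t0], [t0, t1], ...
        for k in range(len(terms) + 1):
            yield terms[:k]
    elif anova_type == "drop":
        # assumed reading: every model that omits exactly one term
        for i in range(len(terms)):
            yield terms[:i] + terms[i + 1:]
    else:
        # raised lazily, on the first iteration of the generator
        raise ValueError(f"unknown anova_type: {anova_type!r}")


sequential = list(term_sequences(["lstat", "age"]))
dropped = list(term_sequences(["lstat", "age"], anova_type="drop"))
```

Because the function is a generator, nothing is computed until the caller iterates, which matches the documented return type.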
- build_submodel(X, terms)#
Build design on X after fitting.
- Parameters:
- X: array-like
X on which columns are evaluated.
- terms: [Feature]
Sequence of features
- Returns:
- D: array-like
Design matrix created with terms
- fit(X, y=None)#
Fit the model specification to X, fitting the encoders needed to build the design matrix.
- Parameters:
- X: array-like
X on which model matrix will be evaluated. If a pd.DataFrame or pd.Series, variables that are of categorical dtype will be treated as categorical.
- fit_transform(X, y=None, **fit_params)#
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- X: array-like of shape (n_samples, n_features)
Input samples.
- y: array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params: dict
Additional fit parameters.
- Returns:
- X_new: ndarray array of shape (n_samples, n_features_new)
Transformed array.
- get_metadata_routing()#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routing: MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)#
Get parameters for this estimator.
- Parameters:
- deep: bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params: dict
Parameter names mapped to their values.
- property names#
- set_output(*, transform=None)#
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform: {“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
- self: estimator instance
Estimator instance.
- set_params(**params)#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params: dict
Estimator parameters.
- Returns:
- self: estimator instance
Estimator instance.
- transform(X, y=None)#
Build design on X after fitting.
- Parameters:
- X: array-like
- y: None
Ignored. This parameter exists only for compatibility with sklearn.pipeline.Pipeline.
Functions#
- ISLP.models.model_spec.bs(col, intercept=False, name=None, **spline_args)#
Create a B-spline Feature for a given column.
Additional spline_args are passed to ISLP.transforms.BSpline along with intercept.
- Parameters:
- col: column identifier or Column
- intercept: bool
Include an intercept column.
- name: str (optional)
Defaults to one derived from col.
- Returns:
- var: Feature
- ISLP.models.model_spec.build_columns(column_info, X, var, encoders={}, col_cache={}, fit=False)#
Build columns for a Feature from X.
- Parameters:
- column_info: dict
Dictionary with values specifying sets of columns to be concatenated into a design matrix.
- X: array-like
X on which columns are evaluated.
- var: Feature
Feature whose columns will be built, typically a key in column_info.
- encoders: dict
Dict that stores encoder of each Feature.
- col_cache: dict
Dict where columns will be stored – if var.name in col_cache then just returns those columns.
- fit: bool (optional)
If True, then try to fit encoder. Will raise an error if encoder has already been fit.
- ISLP.models.model_spec.build_model(column_info, X, terms, intercept=True, encoders={})#
Construct design matrix on a sequence of terms and X after fitting.
- Parameters:
- column_info: dict
Dictionary with values specifying sets of columns to be concatenated into a design matrix.
- X: array-like
X on which columns are evaluated.
- terms: [Feature]
Sequence of features
- encoders: dict
Dict that stores encoder of each Feature.
- Returns:
- df: np.ndarray or pd.DataFrame
Design matrix.
- ISLP.models.model_spec.contrast(col, method='drop', drop_level=None)#
Create encoding of categorical feature.
- Parameters:
- col: column identifier
- method: ‘drop’, ‘sum’, None or callable
- drop_level: level identifier
- Returns:
- var: Feature
- ISLP.models.model_spec.derived_feature(variables, encoder=None, name=None, use_transform=True)#
Create a Feature, optionally applying an encoder to the stacked columns.
- Parameters:
- variables: [column identifier, Column, Feature]
Variables to apply transform to. Could be column identifiers or variables: all columns will be stacked before encoding.
- name: str (optional)
Defaults to str(encoder).
- encoder: transform-like (optional)
Transform obeying sklearn fit/transform convention.
- Returns:
- var: Feature
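The phrase "all columns will be stacked before encoding" can be pictured with a small pure-Python sketch: the input columns are placed side by side, and an optional fit/transform object is then applied to the stacked rows. The names `stack_columns`, `derived_columns` and the toy encoder `RowSum` are hypothetical and exist only for this illustration; the use_transform flag is not modeled.

```python
def stack_columns(*columns):
    """Place equal-length columns side by side, returning a list of rows."""
    return [list(row) for row in zip(*columns)]


class RowSum:
    """Toy encoder following the fit/transform convention (hypothetical)."""

    def fit(self, X):
        return self  # nothing to learn for a row sum

    def transform(self, X):
        return [[sum(row)] for row in X]


def derived_columns(columns, encoder=None):
    """Stack the columns, then apply the encoder if one is given."""
    stacked = stack_columns(*columns)
    if encoder is None:
        return stacked
    return encoder.fit(stacked).transform(stacked)


plain = derived_columns([[1, 2], [10, 20]])
summed = derived_columns([[1, 2], [10, 20]], encoder=RowSum())
```

Without an encoder the stacked columns are returned unchanged; with one, the encoder sees all stacked columns at once, which is what lets a single transform combine several variables.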
- ISLP.models.model_spec.fit_encoder(encoders, var, X)#
Fit an encoder if not already registered in encoders.
- Parameters:
- encoders: dict
Dictionary of encoders for each feature.
- var: Feature
Feature whose encoder will be fit.
- X: array-like
X on which encoder will be fit.
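The "fit only if not already registered" behavior is a register-once pattern. The sketch below shows that pattern in isolation, under the assumption that the registry is a plain dict keyed by something that identifies the feature; `fit_encoder_sketch` and the toy transform `MeanCenter` are hypothetical names, not the library's code.

```python
class MeanCenter:
    """Toy transform following the fit/transform convention (hypothetical)."""

    def fit(self, X):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


def fit_encoder_sketch(encoders, key, encoder, X):
    """Fit and register the encoder unless the key is already registered."""
    if key not in encoders:
        encoders[key] = encoder.fit(X)
    return encoders[key]


encoders = {}
first = fit_encoder_sketch(encoders, "x", MeanCenter(), [1, 2, 3])
# A second call with the same key reuses the registered, already-fitted encoder.
second = fit_encoder_sketch(encoders, "x", MeanCenter(), [10, 20, 30])
centered = first.transform([4])
```

Reusing the registered encoder means new data is transformed with the statistics learned at fit time, here a mean of 2.0, rather than being refit.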
- ISLP.models.model_spec.ns(col, intercept=False, name=None, **spline_args)#
Create a natural spline Feature for a given column.
Additional spline_args are passed to NaturalSpline along with intercept.
- Parameters:
- col: column identifier or Column
- intercept: bool
Include an intercept column.
- name: str (optional)
Defaults to one derived from col.
- Returns:
- var: Feature
- ISLP.models.model_spec.ordinal(col, name=None, *args, **kwargs)#
Create ordinal encoding of categorical feature.
- Parameters:
- col: column identifier
- Returns:
- var: Feature
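As a reminder of what an ordinal encoding produces, the sketch below maps each level of a categorical variable to an integer code in a single column. It illustrates the general technique only: `ordinal_encode` is a hypothetical helper, and the rule used when no level order is given (sorted order) is an assumption of this sketch.

```python
def ordinal_encode(values, categories=None):
    """Map each value to the integer position of its level."""
    if categories is None:
        categories = sorted(set(values))  # assumed default ordering
    codes = {level: i for i, level in enumerate(categories)}
    return [codes[v] for v in values], list(categories)


ordered_codes, ordered_levels = ordinal_encode(
    ["low", "high", "mid", "low"], categories=["low", "mid", "high"]
)
default_codes, default_levels = ordinal_encode(["b", "a", "b"])
```

Unlike a contrast encoding, which creates several indicator-style columns, this yields one numeric column, so a fitted coefficient treats the levels as equally spaced.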
- ISLP.models.model_spec.pca(variables, name, scale=False, **pca_args)#
Create PCA encoding of features from a sequence of variables.
Additional pca_args are passed to ISLP.transforms.PCA.
- Parameters:
- variables: [column identifier, Column or Feature]
Sequence whose columns will be encoded by PCA.
- Returns:
- var: Feature
- ISLP.models.model_spec.poly(col, degree=1, intercept=False, raw=False, name=None)#
Create a polynomial Feature for a given column.
Additional args and kwargs are passed to Poly.
- Parameters:
- col: column identifier or Column
Column to transform.
- degree: int, default=1
Degree of polynomial.
- intercept: bool, default=False
Include a column for intercept?
- raw: bool, default=False
If False, perform a QR decomposition on the resulting matrix of powers of centered and / or scaled features.
- name: str (optional)
Defaults to one derived from col.
- Returns:
- var: Feature
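The difference between raw=True and raw=False can be sketched without any library. The code below builds raw powers directly and, for the orthogonal case, orthonormalizes the powers of the centered feature with Gram-Schmidt, which yields the Q factor of a QR decomposition up to column signs. The function names are hypothetical, and the library's exact centering, scaling and sign conventions may differ from this sketch.

```python
import math


def raw_powers(x, degree):
    """Columns x, x**2, ..., x**degree, returned as rows."""
    return [[xi ** d for d in range(1, degree + 1)] for xi in x]


def orthogonal_poly(x, degree):
    """Orthonormal polynomial columns of the centered feature (illustrative).

    Assumes x has more than `degree` distinct values.
    """
    n = len(x)
    mean = sum(x) / n
    centered = [xi - mean for xi in x]
    # powers 0..degree of the centered feature, stored column by column
    cols = [[c ** d for c in centered] for d in range(degree + 1)]
    basis = []
    for col in cols:
        v = list(col)
        for q in basis:
            # remove the component along each earlier basis column
            proj = sum(a * b for a, b in zip(q, v))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        basis.append([a / norm for a in v])
    # drop the constant column and return one row per observation
    return [list(row) for row in zip(*basis[1:])]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
R = raw_powers(x, 2)
P = orthogonal_poly(x, 2)
```

The raw columns x and x**2 are strongly correlated, whereas the orthogonal columns are uncorrelated with each other and with the intercept, which tends to make the fitted coefficients numerically more stable.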