models.model_spec


Module: `models.model_spec`

Inheritance diagram for ISLP.models.model_spec: Contrast and ModelSpec both inherit from sklearn.base.TransformerMixin and sklearn.base.BaseEstimator.

Model specification

This module defines ModelSpec, the basic object used to represent a regression formula.

Classes

`Contrast`

class ISLP.models.model_spec.Contrast(method='drop', drop_level=None)

Bases: TransformerMixin, BaseEstimator

Methods

`fit`(X[, y])	Construct contrast of categorical variable for use in building a design matrix.
`fit_transform`(X[, y])	Fit to data, then transform it.
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`set_output`(*[, transform])	Set output container.
`set_params`(**params)	Set the parameters of this estimator.
`transform`(X)	Encode X using the contrast constructed in fit.

__init__(method='drop', drop_level=None)

Contrast encoding for categorical variables.

Parameters:

method ({‘drop’, ‘sum’, None, callable}): If ‘drop’, one column of the one-hot encoding is dropped. If ‘sum’, the coefficients are constrained to sum to zero. If None, the full one-hot encoding is returned. If callable, it should take the number of levels of the category as its single argument and return an appropriate contrast of the full one-hot encoding.
drop_level (str, optional): If not None, this level of the category is dropped when method == ‘drop’.
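The four `method` options can be compared with a small NumPy sketch. It is not ISLP's implementation. `contrast_matrix` and `encode` are hypothetical helpers, levels are assumed to be sorted, ‘drop’ is assumed to drop the first level when `drop_level` is None, and ‘sum’ uses conventional sum-to-zero (deviation) coding, in which the last level is coded as -1 in every column.

```python
import numpy as np

def contrast_matrix(levels, method="drop", drop_level=None):
    """Return (C, names): row i of C encodes levels[i]. Hypothetical helper."""
    n = len(levels)
    if method is None:
        # Full one-hot encoding: the identity contrast.
        return np.eye(n), list(levels)
    if callable(method):
        # User-supplied contrast: a function of the number of levels only.
        C = np.asarray(method(n), dtype=float)
        return C, [f"contrast{j}" for j in range(C.shape[1])]
    if method == "drop":
        # Assumption: drop the first level unless drop_level is given.
        drop = levels[0] if drop_level is None else drop_level
        keep = [i for i, lev in enumerate(levels) if lev != drop]
        return np.eye(n)[:, keep], [levels[i] for i in keep]
    if method == "sum":
        # Sum-to-zero (deviation) coding: the last level is -1 in every column.
        C = np.vstack([np.eye(n - 1), -np.ones((1, n - 1))])
        return C, list(levels[:-1])
    raise ValueError(f"unknown method: {method!r}")

def encode(x, method="drop", drop_level=None):
    """Encode a sequence of labels: one-hot rows times the contrast matrix."""
    levels = sorted(set(x))
    C, names = contrast_matrix(levels, method, drop_level)
    one_hot = np.eye(len(levels))[[levels.index(v) for v in x]]
    return one_hot @ C, names

D, names = encode(["a", "b", "c", "a"], "sum")
print(names)  # ['a', 'b']
# rows of D for a, b, c, a: [1, 0], [0, 1], [-1, -1], [1, 0]
```

Every choice of `method` amounts to a different matrix applied to the full one-hot encoding. This is why a callable needs only the number of levels.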

fit(X, y=None)

Construct contrast of categorical variable for use in building a design matrix.

Parameters:

X (array-like): X on which the model matrix will be evaluated. If X is a pd.DataFrame or pd.Series, variables of categorical dtype are treated as categorical.

Returns:

F (array-like): Columns of the design matrix implied by the categorical variable.
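Because the contrast is constructed in `fit`, the output columns are fixed by the levels seen at fit time. The hypothetical `DropFirstContrast` below is not ISLP's class, and its drop-first coding and rejection of unseen levels are assumptions. It follows the same fit/transform split, so a later batch that contains only some of the levels still maps to the same columns:

```python
import numpy as np

class DropFirstContrast:
    """Hypothetical sketch of a fit/transform contrast with drop-first coding."""

    def fit(self, X, y=None):
        # Learn the sorted levels; the first one is the reference level.
        self.levels_ = sorted(set(X))
        self.columns_ = self.levels_[1:]
        return self

    def transform(self, X):
        unseen = set(X) - set(self.levels_)
        if unseen:
            raise ValueError(f"levels not seen during fit: {sorted(unseen)}")
        # One indicator column per non-reference level learned in fit.
        return np.array([[float(v == lev) for lev in self.columns_] for v in X])

enc = DropFirstContrast().fit(["low", "mid", "high", "mid"])
print(enc.columns_)                         # ['low', 'mid']
print(enc.transform(["mid", "mid"]).shape)  # (2, 2)
```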

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:

X (array-like of shape (n_samples, n_features)): Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None): Target values (None for unsupervised transformations).
**fit_params (dict): Additional fit parameters.

Returns:

X_new (ndarray of shape (n_samples, n_features_new)): Transformed array.

get_metadata_routing()

Get metadata routing of this object.

See the scikit-learn User Guide for how the routing mechanism works.

Returns:

routing (MetadataRequest): A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True): If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict): Parameter names mapped to their values.

set_output(*, transform=None)

Set output container.

See “Introducing the set_output API” for an example of how to use the API.

Parameters:

transform ({“default”, “pandas”, “polars”}, default=None): Configure the output of transform and fit_transform.

“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged

Added in scikit-learn version 1.4: the “polars” option was added.

Returns:

self (estimator instance): Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict): Estimator parameters.

Returns:

self (estimator instance): Estimator instance.

transform(X)

`Feature`

class ISLP.models.model_spec.Feature(variables: tuple, name: str, encoder: Any, use_transform: bool = True, pure_columns: bool = False, override_encoder_colnames: bool = False)

Bases: NamedTuple

An element in a model matrix that will build columns from an array-like X.

Methods

`count`(value, /)	Return number of occurrences of value.
`index`(value[, start, stop])	Return first index of value.

__init__(*args, **kwargs)

count(value, /): Return the number of occurrences of value.

encoder (Any): Alias for field number 2

index(value, start=0, stop=sys.maxsize, /)

Return first index of value.

Raises ValueError if the value is not present.

name (str): Alias for field number 1

override_encoder_colnames (bool): Alias for field number 5

pure_columns (bool): Alias for field number 4

use_transform (bool): Alias for field number 3

variables (tuple): Alias for field number 0
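Because Feature is a NamedTuple, each field above can be reached both by name and by position, which is what “Alias for field number k” means. The stand-in below copies only the field names, order, and defaults from the signature. `FeatureSketch` is hypothetical and has none of ISLP's behavior.

```python
from typing import Any, NamedTuple

class FeatureSketch(NamedTuple):
    """Hypothetical stand-in with the same fields and defaults as Feature."""
    variables: tuple
    name: str
    encoder: Any
    use_transform: bool = True
    pure_columns: bool = False
    override_encoder_colnames: bool = False

f = FeatureSketch(("age",), "age", None)
print(f.name is f[1])               # True: 'name' aliases field number 1
print(f._fields.index("encoder"))   # 2
# NamedTuples are immutable; _replace returns a modified copy.
g = f._replace(use_transform=False)
print(f.use_transform, g.use_transform)  # True False
```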

`ModelSpec`

class ISLP.models.model_spec.ModelSpec(terms=[], intercept=True, categorical_features=None, categorical_encoders={})

Bases: TransformerMixin, BaseEstimator

Parameters:

terms (sequence, optional)

Sequence of sets whose elements are columns of X when fit. For a pd.DataFrame, these can be column names.

intercept (bool, optional)

Whether to include an intercept column.

categorical_features (array-like of {bool, int}, default=None)

Shape (n_features,) for a boolean mask or (n_categorical_features,) for integer indices. Indicates the categorical features; ignored if X is a pd.DataFrame or pd.Series.

None : no feature will be considered categorical for np.ndarray.
boolean array-like : boolean mask indicating categorical features.
integer array-like : integer indices indicating categorical features.

categorical_encoders (dict)

Dictionary whose keys are the elements of terms that represent categorical variables. Its values are transforms applied to the associated columns of the model matrix: when fit is called, their fit_transform methods are run and these dictionary values are overwritten.
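For an np.ndarray, the two accepted forms of `categorical_features` carry the same information. The hypothetical helper below is not part of ISLP, and rejecting negative indices is an assumption. It normalizes None, a boolean mask, or integer indices to a sorted array of column indices:

```python
import numpy as np

def categorical_indices(categorical_features, n_features):
    """Return the indices of categorical columns (hypothetical helper)."""
    if categorical_features is None:
        # None: no column is treated as categorical.
        return np.array([], dtype=int)
    cf = np.asarray(categorical_features)
    if cf.dtype == bool:
        # Boolean mask: exactly one entry per feature.
        if cf.shape != (n_features,):
            raise ValueError("boolean mask must have shape (n_features,)")
        return np.flatnonzero(cf)
    # Integer indices: one entry per categorical feature, returned sorted.
    idx = np.unique(cf.astype(int))
    if idx.size and (idx.min() < 0 or idx.max() >= n_features):
        raise ValueError("categorical index out of range")
    return idx

print(categorical_indices([False, True, False, True], 4))  # [1 3]
print(categorical_indices([3, 1], 4))                      # [1 3]
```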

Attributes:

names

Methods

`build_sequence`(X[, anova_type])	Build implied sequence of submodels based on successively including more terms.
`build_submodel`(X, terms)	Build design on X after fitting.
`fit`(X[, y])	Fit the model specification on X, fitting the encoders needed to build the design matrix.
`fit_transform`(X[, y])	Fit to data, then transform it.
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`set_output`(*[, transform])	Set output container.
`set_params`(**params)	Set the parameters of this estimator.
`transform`(X[, y])	Build design on X after fitting.

__init__(terms=[], intercept=True, categorical_features=None, categorical_encoders={})

build_sequence(X, anova_type='sequential')

Build implied sequence of submodels based on successively including more terms.

Parameters:

X (array-like): X on which columns are evaluated.
anova_type (str): One of “sequential” or “drop”.

Returns:

models (generator): Generator for the sequence of models for ANOVA.
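One reading of the two `anova_type` values is that “sequential” adds terms one at a time in the order given, and “drop” leaves out one term at a time from the full model. This is stated as an assumption, not as a description of ISLP's code. The hypothetical `submodel_terms` enumerates term names only and builds no design matrices.

```python
from typing import Iterator, List, Sequence

def submodel_terms(terms: Sequence[str], anova_type: str = "sequential",
                   intercept: bool = True) -> Iterator[List[str]]:
    """Yield the term list of each submodel (hypothetical sketch)."""
    base = ["intercept"] if intercept else []
    if anova_type == "sequential":
        # Intercept-only model first, then add terms one at a time, in order.
        for k in range(len(terms) + 1):
            yield base + list(terms[:k])
    elif anova_type == "drop":
        # One model per term: the full model with that single term left out.
        for j in range(len(terms)):
            yield base + [t for i, t in enumerate(terms) if i != j]
    else:
        raise ValueError("anova_type must be 'sequential' or 'drop'")

for model in submodel_terms(["lstat", "age"]):
    print(model)
# ['intercept']
# ['intercept', 'lstat']
# ['intercept', 'lstat', 'age']
```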

build_submodel(X, terms)

Build design on X after fitting.

Parameters:

X (array-like): X on which columns are evaluated.
terms ([Feature]): Sequence of features.

Returns:

D (array-like): Design matrix created with terms.

fit(X, y=None)

Fit the model specification on X, fitting the encoders needed to build the design matrix.

Parameters:

X (array-like): X on which the model matrix will be evaluated. If X is a pd.DataFrame or pd.Series, variables of categorical dtype are treated as categorical.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:

X (array-like of shape (n_samples, n_features)): Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None): Target values (None for unsupervised transformations).
**fit_params (dict): Additional fit parameters.

Returns:

X_new (ndarray of shape (n_samples, n_features_new)): Transformed array.

get_metadata_routing()

Get metadata routing of this object.

See the scikit-learn User Guide for how the routing mechanism works.

Returns:

routing (MetadataRequest): A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True): If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict): Parameter names mapped to their values.

property names

set_output(*, transform=None)

Set output container.

See “Introducing the set_output API” for an example of how to use the API.

Parameters:

transform ({“default”, “pandas”, “polars”}, default=None): Configure the output of transform and fit_transform.

“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged

Added in scikit-learn version 1.4: the “polars” option was added.

Returns:

self (estimator instance): Estimator instance.

set_params(**params)

Set the parameters of this estimator.

Parameters:

**params (dict): Estimator parameters.

Returns:

self (estimator instance): Estimator instance.

transform(X, y=None)

Build design on X after fitting.

Parameters:

X (array-like)
y (None): Ignored. This parameter exists only for compatibility with sklearn.pipeline.Pipeline.

Functions

ISLP.models.model_spec.bs(col, intercept=False, name=None, **spline_args)

Create a B-spline Feature for a given column.

Additional keyword arguments (spline_args) are passed to ISLP.transforms.BSpline along with intercept.

Parameters:

col (column identifier or Column)
intercept (bool): Include an intercept column.
name (str, optional): Defaults to a name derived from col.

Returns:

var (Feature)
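To show what columns a B-spline Feature contains, the sketch below evaluates a clamped B-spline basis with the Cox-de Boor recursion in NumPy. `bspline_basis` is hypothetical. ISLP.transforms.BSpline may place knots, handle the boundary, and drop columns differently. Dropping the first column when intercept=False is an assumption modeled on R's bs().

```python
import numpy as np

def bspline_basis(x, interior_knots, degree=3, lower=None, upper=None,
                  intercept=False):
    """Clamped B-spline basis via the Cox-de Boor recursion (hypothetical).

    Points are assumed to satisfy lower <= x < upper; a point exactly at
    `upper` lies outside every half-open interval and gets an all-zero row.
    """
    x = np.asarray(x, dtype=float)
    lower = x.min() if lower is None else lower
    upper = x.max() if upper is None else upper
    # Boundary knots are repeated degree + 1 times (a "clamped" knot vector).
    t = np.concatenate([np.full(degree + 1, lower),
                        np.sort(np.asarray(interior_knots, dtype=float)),
                        np.full(degree + 1, upper)])
    # Degree 0: indicator of each half-open knot interval [t_i, t_{i+1}).
    B = np.column_stack([(t[i] <= x) & (x < t[i + 1])
                         for i in range(len(t) - 1)]).astype(float)
    # Raise the degree one step at a time; zero-width spans contribute 0.
    for k in range(1, degree + 1):
        cols = []
        for i in range(len(t) - k - 1):
            d1 = t[i + k] - t[i]
            d2 = t[i + k + 1] - t[i + 1]
            left = (x - t[i]) / d1 * B[:, i] if d1 > 0 else np.zeros_like(x)
            right = (t[i + k + 1] - x) / d2 * B[:, i + 1] if d2 > 0 else np.zeros_like(x)
            cols.append(left + right)
        B = np.column_stack(cols)
    # Assumption (modeled on R's bs()): drop the first column if intercept=False.
    return B if intercept else B[:, 1:]

x = np.linspace(0.0, 0.99, 50)
print(bspline_basis(x, [0.25, 0.5, 0.75], lower=0.0, upper=1.0).shape)  # (50, 6)
```

With a cubic basis and three interior knots, the full basis has 7 columns that sum to 1 at every point of [lower, upper). Dropping one column leaves 6, which avoids exact collinearity with a separate model intercept.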

ISLP.models.model_spec.build_columns(column_info, X, var, encoders={}, col_cache={}, fit=False)

Build columns for a Feature from X.

Parameters:

column_info (dict): Dictionary whose values specify sets of columns to be concatenated into a design matrix.
X (array-like): X on which columns are evaluated.
var (Feature): Feature whose columns will be built, typically a key in column_info.
encoders (dict): Dict that stores the encoder of each Feature.
col_cache (dict): Dict in which built columns are stored; if var.name is already in col_cache, the cached columns are returned.
fit (bool, optional): If True, try to fit the encoder. Raises an error if the encoder has already been fit.
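The `col_cache` behavior is memoization keyed on the feature's name. In the hypothetical `build_columns_sketch` below, a dict-of-arrays X and a `compute` callback stand in for ISLP's Feature machinery. The example shows that repeated requests for the same feature compute its columns only once:

```python
import numpy as np

def build_columns_sketch(X, name, compute, col_cache):
    """Return columns for `name`, computing them at most once (hypothetical)."""
    if name in col_cache:
        # Cache hit: reuse the columns built earlier.
        return col_cache[name]
    cols = np.column_stack([compute(X)])
    col_cache[name] = cols
    return cols

calls = []
def squared_age(X):
    calls.append("age^2")  # record each real computation
    return X["age"] ** 2

X = {"age": np.array([20.0, 30.0, 40.0])}
cache = {}
a = build_columns_sketch(X, "age^2", squared_age, cache)
b = build_columns_sketch(X, "age^2", squared_age, cache)
print(len(calls), a is b)  # 1 True
```

`col_cache` is documented as a dictionary that the function stores columns in, and Python evaluates a default such as `col_cache={}` only once. Passing a fresh dictionary explicitly therefore avoids carrying cached columns from one X to the next.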

ISLP.models.model_spec.build_model(column_info, X, terms, intercept=True, encoders={})

Construct the design matrix from a sequence of terms evaluated on X, after fitting.

Parameters:

column_info (dict): Dictionary whose values specify sets of columns to be concatenated into a design matrix.
X (array-like): X on which columns are evaluated.
terms ([Feature]): Sequence of features.
encoders (dict): Dict that stores the encoder of each Feature.

Returns:

df (np.ndarray or pd.DataFrame): Design matrix.
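In outline, building the design matrix means placing an optional intercept column of ones next to the columns of each term. The hypothetical `build_model_sketch` omits encoders and pd.DataFrame output. Treating a tuple term as the elementwise product of its columns (an interaction) is an assumption made for illustration:

```python
import numpy as np

def build_model_sketch(X, terms, intercept=True):
    """Stack an optional intercept and one column per term (hypothetical)."""
    n = len(next(iter(X.values())))
    blocks, names = [], []
    if intercept:
        blocks.append(np.ones((n, 1)))
        names.append("intercept")
    for term in terms:
        if isinstance(term, tuple):
            # Assumption: a tuple term is the product of its columns.
            col = np.prod([np.asarray(X[v], dtype=float) for v in term], axis=0)
            names.append(":".join(term))
        else:
            col = np.asarray(X[term], dtype=float)
            names.append(term)
        blocks.append(col.reshape(n, 1))
    return np.hstack(blocks), names

X = {"x1": np.array([1.0, 2.0, 3.0]), "x2": np.array([2.0, 0.0, 1.0])}
D, names = build_model_sketch(X, ["x1", ("x1", "x2")])
print(names)  # ['intercept', 'x1', 'x1:x2']
```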

ISLP.models.model_spec.contrast(col, method='drop', drop_level=None)

Create an encoding of a categorical feature.

Parameters:

col (column identifier)
method (‘drop’, ‘sum’, None or callable)
drop_level (level identifier)

Returns:

var (Feature)

ISLP.models.model_spec.derived_feature(variables, encoder=None, name=None, use_transform=True)

Create a Feature, optionally applying an encoder to the stacked columns.

Parameters:

variables ([column identifier, Column, Feature]): Variables to which the transform is applied. These may be column identifiers or variables; all columns are stacked before encoding.
name (str, optional): Defaults to str(encoder).
encoder (transform-like, optional): Transform following the sklearn fit/transform convention.

Returns:

var (Feature)
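The stacking step of `derived_feature` can be sketched with a toy encoder. `RowMean` and `derive` are hypothetical, and neither is part of ISLP. The example stacks the variables' columns into one array, passes that array through fit and transform, and lets the name default to str(encoder):

```python
import numpy as np

class RowMean:
    """Hypothetical encoder obeying the fit/transform convention."""
    def fit(self, X, y=None):
        return self  # nothing to learn
    def transform(self, X):
        return np.asarray(X, dtype=float).mean(axis=1, keepdims=True)
    def __str__(self):
        return "RowMean()"

def derive(X, variables, encoder, name=None):
    """Stack the listed columns of X, then fit and apply encoder (hypothetical)."""
    stacked = np.column_stack([X[v] for v in variables])
    name = str(encoder) if name is None else name  # default name: str(encoder)
    return name, encoder.fit(stacked).transform(stacked)

X = {"math": np.array([80.0, 90.0]), "reading": np.array([70.0, 100.0])}
name, cols = derive(X, ["math", "reading"], RowMean())
print(name, cols.ravel().tolist())  # RowMean() [75.0, 95.0]
```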

ISLP.models.model_spec.fit_encoder(encoders, var, X)

Fit an encoder if not already registered in encoders.

Parameters:

encoders (dict): Dictionary of encoders for each feature.
var (Feature): Feature whose encoder will be fit.
X (array-like): X on which the encoder will be fit.
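`fit_encoder` follows a fit-once registry pattern. In the hypothetical sketch below, the registry is keyed on a feature name and the toy `Center` encoder stands in for a real transform. Both are assumptions, and ISLP may key `encoders` differently. The second call leaves the first fitted encoder in place:

```python
import numpy as np

class Center:
    """Toy encoder: subtracts column means learned in fit (hypothetical)."""
    def fit(self, X):
        self.mean_ = np.asarray(X, dtype=float).mean(axis=0)
        return self
    def transform(self, X):
        return np.asarray(X, dtype=float) - self.mean_

def fit_encoder_sketch(encoders, name, encoder, X):
    """Fit `encoder` on X unless `name` is already registered."""
    if name not in encoders:
        encoders[name] = encoder.fit(X)
    return encoders[name]

encoders = {}
first = fit_encoder_sketch(encoders, "x", Center(), [[1.0], [3.0]])
again = fit_encoder_sketch(encoders, "x", Center(), [[100.0], [300.0]])
print(again is first, first.mean_.tolist())  # True [2.0]
```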

ISLP.models.model_spec.ns(col, intercept=False, name=None, **spline_args)

Create a natural spline Feature for a given column.

Additional spline_args are passed to NaturalSpline along with intercept.

Parameters:

col (column identifier or Column)
intercept (bool): Include an intercept column.
name (str, optional): Defaults to a name derived from col.

Returns:

var (Feature)

ISLP.models.model_spec.ordinal(col, name=None, *args, **kwargs)

Create an ordinal encoding of a categorical feature.

Parameters:

col (column identifier)

Returns:

var (Feature)
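An ordinal encoding replaces each category with a single integer code instead of several indicator columns. The hypothetical `ordinal_codes` assumes that codes follow the given level order, or sorted order when no order is supplied. How ISLP orders levels is not stated here.

```python
import numpy as np

def ordinal_codes(x, levels=None):
    """Map each label to its position in `levels` (hypothetical sketch)."""
    levels = sorted(set(x)) if levels is None else list(levels)
    position = {lev: i for i, lev in enumerate(levels)}
    # A single numeric column, unlike a one-hot or contrast encoding.
    return np.array([position[v] for v in x]).reshape(-1, 1), levels

codes, levels = ordinal_codes(["low", "high", "mid", "low"],
                              levels=["low", "mid", "high"])
print(codes.ravel().tolist())  # [0, 2, 1, 0]
```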

ISLP.models.model_spec.pca(variables, name, scale=False, **pca_args)

Create PCA encoding of features from a sequence of variables.

Additional keyword arguments (pca_args) are passed to ISLP.transforms.PCA.

Parameters:

variables ([column identifier, Column or Feature]): Sequence whose columns will be encoded by PCA.

Returns:

var (Feature)
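The hypothetical `pca_scores` below gives a rough picture of a PCA encoding. It centers the stacked columns and optionally rescales each one to unit variance, which is assumed to be what scale=True means. It then takes principal component scores from NumPy's SVD. ISLP.transforms.PCA may differ in sign conventions, variance scaling, and parameter names, and `n_components` here is illustrative.

```python
import numpy as np

def pca_scores(X, n_components=None, scale=False):
    """Principal component scores of the stacked columns X (hypothetical)."""
    X = np.asarray(X, dtype=float)
    Z = X - X.mean(axis=0)               # PCA always centers
    if scale:
        Z = Z / Z.std(axis=0, ddof=1)    # optional: unit variance per column
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    k = s.size if n_components is None else n_components
    return U[:, :k] * s[:k]              # equals Z @ Vt[:k].T

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) * [1.0, 10.0, 100.0]
S = pca_scores(X, n_components=2, scale=True)
print(S.shape)  # (100, 2)
```

Without scaling, the column multiplied by 100 would dominate the first component. Standardizing first puts the three columns on an equal footing.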

ISLP.models.model_spec.poly(col, degree=1, intercept=False, raw=False, name=None)

Create a polynomial Feature for a given column.

The degree, intercept and raw arguments are passed to Poly.

Parameters:

col (column identifier or Column): Column to transform.
degree (int, default=1): Degree of the polynomial.
intercept (bool, default=False): Whether to include an intercept column.
raw (bool, default=False): If False, perform a QR decomposition on the resulting matrix of powers of the centered and/or scaled feature.
name (str, optional): Defaults to a name derived from col.

Returns:

var (Feature)
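With raw=False, the columns are orthogonal polynomials built from a QR decomposition of powers of the centered feature. The NumPy sketch below uses a hypothetical `poly_sketch` that only centers and normalizes signs arbitrarily. Its output may therefore differ from Poly's by column signs or scaling:

```python
import numpy as np

def poly_sketch(x, degree=1, intercept=False, raw=False):
    """Polynomial columns for one feature (hypothetical sketch)."""
    x = np.asarray(x, dtype=float)
    if raw:
        # Plain powers x, x^2, ..., x^degree.
        P = np.column_stack([x ** d for d in range(1, degree + 1)])
    else:
        # Powers of the centered feature, including the constant column.
        xc = x - x.mean()
        V = np.column_stack([xc ** d for d in range(degree + 1)])
        Q, upper = np.linalg.qr(V)
        # Drop the constant column; flip signs so R's diagonal is positive.
        P = Q[:, 1:] * np.sign(np.diag(upper)[1:])
    if intercept:
        P = np.column_stack([np.ones_like(x), P])
    return P

P = poly_sketch(np.arange(10.0), degree=3)
print(P.shape)  # (10, 3)
```

The columns are orthonormal and orthogonal to the intercept. As a result, raising the degree does not change the least-squares coefficients of the lower-degree columns, and the severe collinearity of raw powers is avoided.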
