models.model_spec#

Module: models.model_spec#

Inheritance diagram for ISLP.models.model_spec: Contrast and ModelSpec each derive from sklearn.base.TransformerMixin and sklearn.base.BaseEstimator; Feature is shown without a base class.

Model specification#

This module defines ModelSpec, the basic object used to represent a regression formula.

Classes#

Contrast#

class ISLP.models.model_spec.Contrast(method='drop', drop_level=None)#

Bases: TransformerMixin, BaseEstimator

Methods

fit(X[, y])

Construct the contrast of a categorical variable for use in building a design matrix.

fit_transform(X[, y])

Fit to data, then transform it.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

set_output(*[, transform])

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform

__init__(method='drop', drop_level=None)#

Contrast encoding for categorical variables.

Parameters:
method: ‘drop’, ‘sum’, None or callable

If ‘drop’, then one column of the one-hot encoding is dropped. If ‘sum’, then the coefficients are constrained to sum to zero. If None, the full one-hot encoding is returned. Finally, if callable, it should take the number of levels of the category as its single argument and return an appropriate contrast of the full one-hot encoding.

drop_level: str (optional)

If not None, this level of the category will be dropped if method==’drop’.
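As an illustration of what the method options produce, the following sketch builds contrast matrices in plain Python for a category with a given number of levels. It is not ISLP's implementation: the helper name contrast_matrix, the choice of which row carries the -1 entries under ‘sum’ (the usual sum-to-zero coding is assumed), and the convention that the result multiplies the full one-hot encoding from the right are all assumptions made for illustration.

```python
def contrast_matrix(n_levels, method="drop", drop_idx=0):
    """Return an n_levels x k matrix C (hypothetical helper, not ISLP's code).

    Encoded columns are taken to be one_hot @ C.
    """
    identity = [[1.0 if i == j else 0.0 for j in range(n_levels)]
                for i in range(n_levels)]
    if method is None:
        # Full one-hot encoding: nothing is removed.
        return identity
    if method == "drop":
        # Remove the column of the dropped (baseline) level.
        return [[row[j] for j in range(n_levels) if j != drop_idx]
                for row in identity]
    if method == "sum":
        # Sum coding (assumed convention): the last level is coded as -1 in
        # every column, so its implied coefficient is minus the sum of the rest.
        rows = [row[:n_levels - 1] for row in identity[:-1]]
        rows.append([-1.0] * (n_levels - 1))
        return rows
    if callable(method):
        # A callable receives the number of levels and returns the contrast.
        return method(n_levels)
    raise ValueError(f"unknown method: {method!r}")


drop = contrast_matrix(3, "drop")  # baseline is level 0
sum_ = contrast_matrix(3, "sum")
```

With three levels, both ‘drop’ and ‘sum’ yield two encoded columns, while None keeps all three.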

fit(X, y=None)#

Construct the contrast of a categorical variable for use in building a design matrix.

Parameters:
X: array-like

X on which model matrix will be evaluated. If a pd.DataFrame or pd.Series, variables that are of categorical dtype will be treated as categorical.

Returns:
F: array-like

Columns of design matrix implied by the categorical variable.

fit_transform(X, y=None, **fit_params)#

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X: array-like of shape (n_samples, n_features)

Input samples.

y: array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_params: dict

Additional fit parameters.

Returns:
X_new: ndarray array of shape (n_samples, n_features_new)

Transformed array.

get_metadata_routing()#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routing: MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)#

Get parameters for this estimator.

Parameters:
deep: bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params: dict

Parameter names mapped to their values.

set_output(*, transform=None)#

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:
transform: {“default”, “pandas”, “polars”}, default=None

Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:
self: estimator instance

Estimator instance.

set_params(**params)#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params: dict

Estimator parameters.

Returns:
self: estimator instance

Estimator instance.

transform(X)#

Feature#

class ISLP.models.model_spec.Feature(variables: tuple, name: str, encoder: Any, use_transform: bool = True, pure_columns: bool = False, override_encoder_colnames: bool = False)#

Bases: NamedTuple

An element in a model matrix that will build columns from an array-like X.

Methods

count(value, /)

Return number of occurrences of value.

index(value[, start, stop])

Return first index of value.

__init__(*args, **kwargs)#
count(value, /)#

Return number of occurrences of value.

encoder: Any#

Alias for field number 2

index(value, start=0, stop=sys.maxsize, /)#

Return first index of value.

Raises ValueError if the value is not present.

name: str#

Alias for field number 1

override_encoder_colnames: bool#

Alias for field number 5

pure_columns: bool#

Alias for field number 4

use_transform: bool#

Alias for field number 3

variables: tuple#

Alias for field number 0
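Because Feature is a NamedTuple, each field can be read by name or by position, which is what the "Alias for field number" entries above describe. The sketch below uses a stand-in class, FeatureSketch, that copies the field order and defaults from the signature above; it is not imported from ISLP, and the example values are made up.

```python
from typing import Any, NamedTuple


class FeatureSketch(NamedTuple):
    """Stand-in with the same fields as the Feature signature (illustrative only)."""
    variables: tuple
    name: str
    encoder: Any
    use_transform: bool = True
    pure_columns: bool = False
    override_encoder_colnames: bool = False


feature = FeatureSketch(variables=("age",), name="age", encoder=None)

# Fields are accessible by name or by their position in the tuple.
same_name = feature.name == feature[1]
field_order = FeatureSketch._fields
# count and index are the ordinary tuple methods listed under Methods.
position_of_name = feature.index("age")
```

Here index("age") finds the name field rather than variables, because variables holds the tuple ("age",) and not the string itself.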

ModelSpec#

class ISLP.models.model_spec.ModelSpec(terms=[], intercept=True, categorical_features=None, categorical_encoders={})#

Bases: TransformerMixin, BaseEstimator

Parameters:
terms: sequence (optional)

Sequence of sets whose elements are columns of X when fit. For pd.DataFrame these can be column names.

intercept: bool (optional)

Include a column for intercept?

categorical_features: array-like of {bool, int} of shape (n_features) or shape (n_categorical_features,), default=None

Indicates the categorical features. Will be ignored if X is a pd.DataFrame or pd.Series.

  • None : no feature will be considered categorical for np.ndarray.

  • boolean array-like : boolean mask indicating categorical features.

  • integer array-like : integer indices indicating categorical features.

categorical_encoders: dict

Dictionary whose keys are elements of terms that represent categorical variables. Its values are transforms to be applied to the associated columns in the model matrix; when fit is called, their fit_transform method is run and these values in the dictionary are overwritten.
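To make the roles of terms and intercept concrete, here is a minimal sketch of assembling a design matrix from named columns in plain Python. The function build_design and the toy data are hypothetical, categorical encoding is left out, and treating a multi-column term as the elementwise product of its columns is an assumption of this sketch rather than something stated above.

```python
def build_design(data, terms, intercept=True):
    """Assemble rows of a design matrix from a dict of equal-length columns.

    Hypothetical helper: each term is a column name or a tuple of column
    names; a tuple is treated as the product of its columns (an assumption).
    """
    n_rows = len(next(iter(data.values())))
    names, columns = [], []
    if intercept:
        names.append("intercept")
        columns.append([1.0] * n_rows)
    for term in terms:
        members = (term,) if isinstance(term, str) else tuple(term)
        col = [1.0] * n_rows
        for member in members:
            col = [a * b for a, b in zip(col, data[member])]
        names.append(":".join(members))
        columns.append(col)
    # Transpose the list of columns into a list of rows.
    rows = [list(row) for row in zip(*columns)]
    return names, rows


data = {"x1": [1.0, 2.0, 3.0], "x2": [0.5, 0.5, 2.0]}
names, rows = build_design(data, ["x1", ("x1", "x2")])
```

Passing intercept=False simply omits the leading column of ones.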

Attributes:
names

Methods

build_sequence(X[, anova_type])

Build implied sequence of submodels based on successively including more terms.

build_submodel(X, terms)

Build design on X after fitting.

fit(X[, y])

Fit the model specification to X, determining the columns and encoders needed to build the design matrix.

fit_transform(X[, y])

Fit to data, then transform it.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

set_output(*[, transform])

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X[, y])

Build design on X after fitting.

__init__(terms=[], intercept=True, categorical_features=None, categorical_encoders={})#
build_sequence(X, anova_type='sequential')#

Build implied sequence of submodels based on successively including more terms.

Parameters:
X: array-like

X on which columns are evaluated.

anova_type: str

One of “sequential” or “drop”.

Returns:
models: generator

Generator for sequence of models for ANOVA.
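The two anova_type options can be pictured as two ways of enumerating subsets of the terms. The sketch below works on term names only and assumes that “sequential” adds one term at a time in order, while “drop” leaves out one term at a time; the second reading is an assumption, since the text above does not spell it out. The generator term_sequences is a hypothetical name, and the actual method yields models built on X rather than lists of names.

```python
def term_sequences(terms, anova_type="sequential"):
    """Yield lists of terms, one list per submodel (illustrative only)."""
    terms = list(terms)
    if anova_type == "sequential":
        # Each submodel includes one more term than the previous one.
        for i in range(1, len(terms) + 1):
            yield terms[:i]
    elif anova_type == "drop":
        # Each submodel omits exactly one term (assumed meaning of "drop").
        for i in range(len(terms)):
            yield terms[:i] + terms[i + 1:]
    else:
        raise ValueError("anova_type must be 'sequential' or 'drop'")


sequential = list(term_sequences(["a", "b", "c"]))
dropped = list(term_sequences(["a", "b", "c"], anova_type="drop"))
```

Because the function is a generator, submodels are produced lazily, one at a time, as the caller iterates.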

build_submodel(X, terms)#

Build design on X after fitting.

Parameters:
X: array-like

X on which columns are evaluated.

terms: [Feature]

Sequence of features

Returns:
D: array-like

Design matrix created with terms

fit(X, y=None)#

Fit the model specification to X, determining the columns and encoders needed to build the design matrix.

Parameters:
X: array-like

X on which model matrix will be evaluated. If a pd.DataFrame or pd.Series, variables that are of categorical dtype will be treated as categorical.

fit_transform(X, y=None, **fit_params)#

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X: array-like of shape (n_samples, n_features)

Input samples.

y: array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_params: dict

Additional fit parameters.

Returns:
X_new: ndarray array of shape (n_samples, n_features_new)

Transformed array.

get_metadata_routing()#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routing: MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)#

Get parameters for this estimator.

Parameters:
deep: bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params: dict

Parameter names mapped to their values.

property names#
set_output(*, transform=None)#

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:
transform: {“default”, “pandas”, “polars”}, default=None

Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:
self: estimator instance

Estimator instance.

set_params(**params)#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params: dict

Estimator parameters.

Returns:
self: estimator instance

Estimator instance.

transform(X, y=None)#

Build design on X after fitting.

Parameters:
X: array-like
y: None

Ignored. This parameter exists only for compatibility with sklearn.pipeline.Pipeline.

Functions#

ISLP.models.model_spec.bs(col, intercept=False, name=None, **spline_args)#

Create a B-spline Feature for a given column.

Additional spline_args are passed to ISLP.transforms.BSpline along with intercept.

Parameters:
col: column identifier or Column
intercept: bool

Include an intercept column.

name: str (optional)

Defaults to one derived from col.

Returns:
var: Feature
ISLP.models.model_spec.build_columns(column_info, X, var, encoders={}, col_cache={}, fit=False)#

Build columns for a Feature from X.

Parameters:
column_info: dict

Dictionary with values specifying sets of columns to be concatenated into a design matrix.

X: array-like

X on which columns are evaluated.

var: Feature

Feature whose columns will be built, typically a key in column_info.

encoders: dict

Dict that stores the encoder of each Feature.

col_cache: dict

Dict where columns will be stored; if var.name is in col_cache, those columns are returned directly.

fit: bool (optional)

If True, then try to fit the encoder. Will raise an error if the encoder has already been fit.
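The role of col_cache can be sketched as plain memoization keyed by the feature name. In the sketch below, cached_columns and compute_square are hypothetical names, and the cache is created by the caller and passed in explicitly; how ISLP populates the cache internally is not shown here.

```python
def cached_columns(name, compute, col_cache):
    """Return columns for `name`, computing them only on the first request."""
    if name in col_cache:
        # Cache hit: reuse the stored columns without recomputing.
        return col_cache[name]
    columns = compute()
    col_cache[name] = columns
    return columns


calls = []


def compute_square():
    calls.append("computed")  # record each real computation
    return [v ** 2 for v in (1, 2, 3)]


cache = {}
first = cached_columns("x^2", compute_square, cache)
second = cached_columns("x^2", compute_square, cache)
```

The second request returns the stored list itself, so the computation runs only once for a given name.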

ISLP.models.model_spec.build_model(column_info, X, terms, intercept=True, encoders={})#

Construct design matrix on a sequence of terms and X after fitting.

Parameters:
column_info: dict

Dictionary with values specifying sets of columns to be concatenated into a design matrix.

X: array-like

X on which columns are evaluated.

terms: [Feature]

Sequence of features

encoders: dict

Dict that stores the encoder of each Feature.

Returns:
df: np.ndarray or pd.DataFrame

Design matrix.

ISLP.models.model_spec.contrast(col, method='drop', drop_level=None)#

Create encoding of categorical feature.

Parameters:
col: column identifier
method: ‘drop’, ‘sum’, None or callable
drop_level: level identifier
Returns:
var: Feature
ISLP.models.model_spec.derived_feature(variables, encoder=None, name=None, use_transform=True)#

Create a Feature, optionally applying an encoder to the stacked columns.

Parameters:
variables: [column identifier, Column, Feature]

Variables to apply transform to. Could be column identifiers or variables: all columns will be stacked before encoding.

name: str (optional)

Defaults to str(encoder).

encoder: transform-like (optional)

Transform obeying sklearn fit/transform convention.

Returns:
var: Feature
ISLP.models.model_spec.fit_encoder(encoders, var, X)#

Fit an encoder if not already registered in encoders.

Parameters:
encoders: dict

Dictionary of encoders for each feature.

var: Feature

Feature whose encoder will be fit.

X: array-like

X on which encoder will be fit.

ISLP.models.model_spec.ns(col, intercept=False, name=None, **spline_args)#

Create a natural spline Feature for a given column.

Additional spline_args are passed to NaturalSpline along with intercept.

Parameters:
col: column identifier or Column
intercept: bool

Include an intercept column.

name: str (optional)

Defaults to one derived from col.

Returns:
var: Feature
ISLP.models.model_spec.ordinal(col, name=None, *args, **kwargs)#

Create ordinal encoding of categorical feature.

Parameters:
col: column identifier
Returns:
var: Feature
ISLP.models.model_spec.pca(variables, name, scale=False, **pca_args)#

Create PCA encoding of features from a sequence of variables.

Additional pca_args are passed to ISLP.transforms.PCA.

Parameters:
variables: [column identifier, Column or Feature]

Sequence whose columns will be encoded by PCA.

Returns:
var: Feature
ISLP.models.model_spec.poly(col, degree=1, intercept=False, raw=False, name=None)#

Create a polynomial Feature for a given column.

The degree, intercept and raw arguments are passed to Poly.

Parameters:
col: column identifier or Column

Column to transform.

degree: int, default=1

Degree of polynomial.

intercept: bool, default=False

Include a column for intercept?

raw: bool, default=False

If False, perform a QR decomposition on the resulting matrix of powers of centered and / or scaled features.

name: str (optional)

Defaults to one derived from col.

Returns:
var: Feature
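
With raw=False the columns are orthogonalized rather than left as plain powers. The sketch below illustrates the idea in plain Python using Gram-Schmidt on the constant column and the centered powers, which gives the same columns as the Q factor of a QR decomposition of that matrix up to sign. The function name orthogonal_poly_columns is hypothetical, scaling is omitted, and this is not claimed to reproduce the output of Poly exactly.

```python
import math


def orthogonal_poly_columns(x, degree):
    """Orthonormalize centered powers x, x**2, ..., x**degree (Gram-Schmidt).

    Illustrative only; requires more distinct values in x than `degree`.
    """
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    # Start from the constant column so later columns are orthogonal to it.
    basis = [[1.0 / math.sqrt(n)] * n]
    for d in range(1, degree + 1):
        col = [c ** d for c in centered]
        for q in basis:
            # Subtract the projection of col onto each earlier basis column.
            proj = sum(a * b for a, b in zip(col, q))
            col = [a - proj * b for a, b in zip(col, q)]
        norm = math.sqrt(sum(a * a for a in col))
        basis.append([a / norm for a in col])
    # Drop the constant column, mirroring intercept=False in the signature.
    return basis[1:]


columns = orthogonal_poly_columns([1.0, 2.0, 3.0, 4.0, 5.0], degree=2)
```

Each returned column has unit length, sums to zero, and is orthogonal to the others, which is the property that makes such a basis numerically better behaved than raw powers.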