Model selection strategies

This module defines search strategies to be used in generic stepwise model selection.



class ISLP.models.strategy.MinMaxCandidates(model_spec, min_terms=0, max_terms=0, lower_terms=None, upper_terms=None, validator=None)

Bases: object



Produce candidates for fitting.

check_finished(results, path, best, ...)

Check if we should continue or not.

__init__(model_spec, min_terms=0, max_terms=0, lower_terms=None, upper_terms=None, validator=None)
model_spec: ModelSpec

ModelSpec describing the terms in the model.

min_terms: int (default: 0)

Minimum number of terms to select

max_terms: int (default: 0)

Maximum number of terms to select

lower_terms: [Feature]

Subset of terms to keep: smallest model.

upper_terms: [Feature]

Largest possible model.

validator: callable

Callable taking a single argument: state, returning whether this is a valid state.


Produce candidates for fitting.

state: ignored
candidates: iterator

A generator of (indices, label), where indices are columns of X and label is a name for the given model. The iterator cycles through all combinations of the nfeature columns whose size is between min_terms and max_terms. Where appropriate, combinations are restricted to those that include a set of fixed terms. Models are labeled with a tuple of the feature names. Column names default to strings of the integers in range(nterms).
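The enumeration described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not ISLP's implementation: the helper name enumerate_candidates and its arguments are hypothetical, and terms are represented by plain strings.

```python
from itertools import combinations


def enumerate_candidates(terms, min_terms=0, max_terms=None, fixed=()):
    """Yield (indices, label) for every subset of `terms` whose size lies
    between min_terms and max_terms and which contains all `fixed` terms.

    Hypothetical helper for illustration; not part of ISLP.
    """
    if max_terms is None:
        max_terms = len(terms)
    fixed = set(fixed)
    for size in range(min_terms, max_terms + 1):
        for indices in combinations(range(len(terms)), size):
            # label each model with a tuple of its term names
            label = tuple(terms[i] for i in indices)
            if fixed.issubset(label):
                yield indices, label


# All models of size 1 or 2 over three terms that keep the term "a".
candidates = list(
    enumerate_candidates(["a", "b", "c"], min_terms=1, max_terms=2, fixed=["a"])
)
```

Because every subset in the size range is produced up front, a search built on such a generator needs only one batch of fits.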

check_finished(results, path, best, batch_results)

Check if we should continue or not. For exhaustive search we stop because all models are fit in a single batch.


class ISLP.models.strategy.Stepwise(model_spec, direction='forward', min_terms=1, max_terms=1, lower_terms=None, upper_terms=None, validator=None)

Bases: MinMaxCandidates

model_spec: ModelSpec

ModelSpec describing the terms in the model.

direction: str

One of [‘forward’, ‘backward’, ‘both’]

min_terms: int (default: 1)

Minimum number of terms to select

max_terms: int (default: 1)

Maximum number of terms to select

lower_terms: [Feature]

Subset of terms to keep: smallest model.

upper_terms: [Feature]

Largest possible model.

constraints: {array-like} (optional), shape [n_terms, n_terms]

Boolean matrix describing a DAG, with [i,j] nonzero implying that j is a child of i (i.e. there is an edge i->j). All search candidates are checked for validity: every parent of each term in a candidate must also be included in the candidate's set of terms.
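The parent check described for constraints can be sketched as follows. This is a minimal illustration and not the code behind validator_from_constraints: the helper make_validator is hypothetical, the matrix is a plain list of lists, and a candidate is assumed to be a tuple of term indices.

```python
def make_validator(constraints):
    """Return a function that checks whether every term in a candidate
    has all of its parents in the candidate too.

    constraints[i][j] nonzero means there is an edge i -> j
    (j is a child of i). Hypothetical helper for illustration only.
    """
    n = len(constraints)

    def is_valid(candidate):
        chosen = set(candidate)
        for j in chosen:
            for i in range(n):
                # i is a parent of j but is missing from the candidate
                if constraints[i][j] and i not in chosen:
                    return False
        return True

    return is_valid


# Terms 0 and 1 are main effects; term 2 is their interaction,
# so both 0 -> 2 and 1 -> 2 are edges of the DAG.
constraints = [
    [0, 0, 1],
    [0, 0, 1],
    [0, 0, 0],
]
is_valid = make_validator(constraints)
```

With this matrix, a candidate containing the interaction term is accepted only when both main effects are present.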



Produce candidates for fitting.

check_finished(results, path, best, ...)

Check if we should continue or not.

first_peak(model_spec[, direction, ...])

fixed_steps(model_spec, n_steps[, ...])

Strategy that stops first time a given model size is reached.

__init__(model_spec, direction='forward', min_terms=1, max_terms=1, lower_terms=None, upper_terms=None, validator=None)
model_spec: ModelSpec

ModelSpec describing the terms in the model.

min_terms: int (default: 1)

Minimum number of terms to select

max_terms: int (default: 1)

Maximum number of terms to select

lower_terms: [Feature]

Subset of terms to keep: smallest model.

upper_terms: [Feature]

Largest possible model.

validator: callable

Callable taking a single argument: state, returning whether this is a valid state.


Produce candidates for fitting. For stepwise search this depends on the direction.

If ‘forward’, each column not in the current state is added in turn (keeping the number of columns at or below self.max_terms).

If ‘backward’, each column in the current state is dropped in turn (keeping the number of columns at or above self.min_terms).

All candidates include self.lower_terms if any.

state: ignored
candidates: iterator
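The forward and backward rules above can be sketched in a few lines. This is an illustration only, not ISLP's implementation: the helper stepwise_candidates is hypothetical, states are tuples of term names, and treating ‘both’ as the union of the forward and backward candidates is an assumption.

```python
def stepwise_candidates(state, all_terms, direction="forward",
                        min_terms=0, max_terms=None, lower_terms=()):
    """Return the candidate states one step away from `state`.

    Hypothetical helper for illustration; not part of ISLP.
    States are tuples of term names, ordered as in `all_terms`.
    """
    if max_terms is None:
        max_terms = len(all_terms)
    # every candidate includes the terms that must be kept
    current = set(state) | set(lower_terms)
    candidates = []
    if direction in ("forward", "both") and len(current) < max_terms:
        # add each term that is not yet in the model
        for term in all_terms:
            if term not in current:
                candidates.append(current | {term})
    if direction in ("backward", "both") and len(current) > min_terms:
        # drop each term in the model, except those that must be kept
        for term in all_terms:
            if term in current and term not in lower_terms:
                candidates.append(current - {term})
    return [tuple(t for t in all_terms if t in c) for c in candidates]


terms = ["a", "b", "c"]
forward = stepwise_candidates(("a",), terms, direction="forward")
backward = stepwise_candidates(("a", "b"), terms, direction="backward",
                               lower_terms=("a",))
```

Here the forward step from ("a",) proposes two larger models, while the backward step from ("a", "b") can only drop "b" because "a" is a required term.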

static first_peak(model_spec, direction='forward', min_terms=1, max_terms=1, random_state=0, lower_terms=[], upper_terms=[], initial_terms=[], validator=None, parsimonious=False)
model_spec: ModelSpec

ModelSpec describing the terms in the model.

direction: str

One of [‘forward’, ‘backward’, ‘both’]

min_terms: int (default: 1)

Minimum number of terms to select

max_terms: int (default: 1)

Maximum number of terms to select

lower_terms: [Feature]

Subset of terms to keep: smallest model.

upper_terms: [Feature]

Largest possible model.

initial_terms: column identifiers, default=[]

Subset of terms used to initialize the search when direction is ‘both’. If None, defaults to the behavior of ‘forward’. Column identifiers correspond to column names if the data are a pd.DataFrame, or to integer indices if the data are an np.ndarray.

validator: callable

Callable taking a single argument: state, returning whether this is a valid state.

parsimonious: bool

If True, use the 1sd rule: among the shortest models within one standard deviation of the best score pick the one with the best average score.

initial_state: tuple

(column_names, feature_idx)

state_generator: callable

Object that proposes candidates based on the current state. Takes a single argument, state.

build_submodel: callable

Candidate generator that enumerates all valid subsets of columns.

check_finished: callable

Check whether to stop. Takes two arguments: best_result, a dict with keys of scores, and state.

static fixed_steps(model_spec, n_steps, direction='forward', lower_terms=[], upper_terms=[], initial_terms=[], validator=None)

Strategy that stops first time a given model size is reached.

model_spec: ModelSpec

ModelSpec describing the terms in the model.

n_steps: int

How many steps to take in the search?

direction: str

One of [‘forward’, ‘backward’, ‘both’]

min_terms: int (default: 0)

Minimum number of terms to select

max_terms: int (default: None)

Maximum number of terms to select. If None defaults to number of terms in model_spec.

lower_terms: [Feature]

Subset of terms to keep: smallest model.

upper_terms: [Feature]

Largest possible model.

initial_terms: column identifiers, default=[]

Subset of terms to be used to initialize.



class ISLP.models.strategy.Strategy(initial_state: Any, candidate_states: Callable, build_submodel: Callable, check_finished: Callable, postprocess: Callable)

Bases: NamedTuple

initial_state: object

Initial state of feature selector.

candidate_states: callable

Callable taking single argument state and returning candidates for next batch of scores to be calculated.

build_submodel: callable

Callable taking two arguments (X, state) that returns model matrix represented by state.

check_finished: callable

Callable taking four arguments (results, path, best, batch_results) which determines if the state generator should step. Often will just check if there is a better score than that at current best state but can use entire set of results if desired.

postprocess: callable

Callable to postprocess the results after selection procedure terminates.
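Since Strategy is a NamedTuple, its five fields can be read by name or by position. The sketch below uses a stand-in class, SketchStrategy, with the same field names so that it runs without ISLP installed. The lambdas are placeholders chosen for illustration and do not reflect how ISLP builds its strategies.

```python
from typing import Any, Callable, NamedTuple


class SketchStrategy(NamedTuple):
    """Stand-in with the same five fields as Strategy (illustration only)."""
    initial_state: Any
    candidate_states: Callable
    build_submodel: Callable
    check_finished: Callable
    postprocess: Callable


strategy = SketchStrategy(
    initial_state=(),
    # propose a single candidate that appends the next column index
    candidate_states=lambda state: [state + (len(state),)],
    # select the columns named by `state` from a list-of-rows matrix
    build_submodel=lambda X, state: [[row[j] for j in state] for row in X],
    # placeholder: always report that the search is finished
    check_finished=lambda *args: True,
    postprocess=lambda results: results,
)

X = [[1, 2, 3], [4, 5, 6]]
submatrix = strategy.build_submodel(X, (0, 2))
```

Positional access follows the field order, so strategy[2] is the same object as strategy.build_submodel.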


count(value, /)

Return number of occurrences of value.

index(value[, start, stop])

Return first index of value.

__init__(*args, **kwargs)
build_submodel: Callable

Alias for field number 2

candidate_states: Callable

Alias for field number 1

check_finished: Callable

Alias for field number 3

count(value, /)

Return number of occurrences of value.

index(value, start=0, stop=sys.maxsize, /)

Return first index of value.

Raises ValueError if the value is not present.

initial_state: Any

Alias for field number 0

postprocess: Callable

Alias for field number 4


ISLP.models.strategy.first_peak(results, path, best, batch_results)

Check if we should continue or not.

For first_peak search we stop if we cannot improve over our current best score.
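The stopping rule can be sketched as below. The data shapes are assumptions made for illustration: each result is taken to be a (state, score) pair with higher scores better, and the helper first_peak_sketch returns a (best, finished) pair. The actual arguments and return value of first_peak may differ.

```python
def first_peak_sketch(best, batch_results):
    """Return (new_best, finished) after scoring one batch of candidates.

    Assumes each result is a (state, score) pair and that higher scores
    are better. Illustration only; not ISLP's first_peak.
    """
    if not batch_results:
        # nothing left to try
        return best, True
    batch_best = max(batch_results, key=lambda r: r[1])
    if best is not None and batch_best[1] <= best[1]:
        # no candidate improves on the current best: stop here
        return best, True
    return batch_best, False


# First batch: nothing to compare against, so the best candidate is kept.
best, finished = first_peak_sketch(None, [(("a",), 0.5), (("b",), 0.7)])
# Second batch: the only candidate scores lower, so the search stops.
best2, finished2 = first_peak_sketch(best, [(("a", "b"), 0.6)])
```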

ISLP.models.strategy.fixed_steps(n_steps, results, path, best, batch_results)

Check if we should continue or not.

For fixed_steps search we stop once n_steps steps have been taken.
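For comparison, a search that runs for a fixed number of steps (Stepwise.fixed_steps is described as stopping the first time a given model size is reached) can be sketched as a greedy forward loop. Everything here is hypothetical and for illustration: the helper forward_fixed_steps, the score callable, and the choice to always move to the best candidate even when its score is lower.

```python
def forward_fixed_steps(terms, score, n_steps):
    """Greedy forward search that takes n_steps steps, or fewer if the
    terms run out. Returns the list of states visited.

    Hypothetical helper for illustration; not part of ISLP.
    """
    state = ()
    path = [state]
    for _ in range(n_steps):
        remaining = [t for t in terms if t not in state]
        if not remaining:
            break
        # always move to the best-scoring candidate, even if it is
        # worse than the current state
        state = max((state + (t,) for t in remaining), key=score)
        path.append(state)
    return path


# Toy score: the sum of fixed per-term weights.
weights = {"a": 3, "b": 1, "c": 2}
path = forward_fixed_steps(
    ["a", "b", "c"], lambda s: sum(weights[t] for t in s), n_steps=2
)
```

The path includes the initial empty state, so two steps produce three entries.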

ISLP.models.strategy.min_max(model_spec, min_terms=1, max_terms=1, lower_terms=None, upper_terms=None, validator=None, parsimonious=False)
model_spec: ModelSpec

ModelSpec describing the terms in the model.

min_terms: int (default: 1)

Minimum number of terms to select

max_terms: int (default: 1)

Maximum number of terms to select

lower_terms: [Feature]

Subset of terms to keep: smallest model.

upper_terms: [Feature]

Largest possible model.

validator: callable

Callable taking a single argument: state, returning whether this is a valid state.

parsimonious: bool

If True, use the 1sd rule: among the shortest models within one standard deviation of the best score pick the one with the best average score.

initial_state: tuple

(column_names, feature_idx)

state_generator: callable

Object that proposes candidates based on the current state. Takes a single argument, state.

build_submodel: callable

Candidate generator that enumerates all valid subsets of columns.

check_finished: callable

Check whether to stop. Takes two arguments: best_result, a dict with keys of scores, and state.
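The 1sd rule mentioned under parsimonious can be sketched as follows. The input format is an assumption made for illustration: results maps each state to a (mean score, standard deviation) pair with higher scores better, and the standard deviation used for the threshold is that of the best-scoring model. The helper one_sd_choice is hypothetical and not part of ISLP.

```python
def one_sd_choice(results):
    """Pick a model by the 1sd rule.

    `results` maps a state (tuple of terms) to (mean_score, sd_score),
    with higher scores better. Illustration only; not ISLP's code.
    """
    best_state = max(results, key=lambda s: results[s][0])
    best_mean, best_sd = results[best_state]
    # models scoring within one standard deviation of the best
    threshold = best_mean - best_sd
    eligible = [s for s in results if results[s][0] >= threshold]
    # among the shortest eligible models, take the best average score
    shortest = min(len(s) for s in eligible)
    return max((s for s in eligible if len(s) == shortest),
               key=lambda s: results[s][0])


results = {
    ("a",): (0.70, 0.05),
    ("b",): (0.74, 0.05),
    ("a", "b"): (0.78, 0.05),
    ("a", "b", "c"): (0.80, 0.05),
}
chosen = one_sd_choice(results)
```

In this toy example the three-term model has the best score, but the two-term model is within one standard deviation of it and is chosen instead.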

ISLP.models.strategy.validator_from_constraints(model_spec, constraints)