models.strategy
Module: models.strategy
[Inheritance diagram for ISLP.models.strategy]
Model selection strategies
This module defines search strategies to be used in generic stepwise model selection.
Classes
MinMaxCandidates
- class ISLP.models.strategy.MinMaxCandidates(model_spec, min_terms=0, max_terms=0, lower_terms=None, upper_terms=None, validator=None)
Bases:
object
Methods
- candidate_states(state): Produce candidates for fitting.
- check_finished(results, path, best, ...): Check if we should continue or not.
- __init__(model_spec, min_terms=0, max_terms=0, lower_terms=None, upper_terms=None, validator=None)
- Parameters:
- model_spec: ModelSpec
ModelSpec describing the terms in the model.
- min_terms: int (default: 0)
Minimum number of terms to select
- max_terms: int (default: 0)
Maximum number of terms to select
- lower_terms: [Feature]
Subset of terms to keep: smallest model.
- upper_terms: [Feature]
Largest possible model.
- validator: callable
Callable taking a single argument: state, returning whether this is a valid state.
- candidate_states(state)
Produce candidates for fitting.
- Parameters:
- state: ignored
- Returns:
- candidates: iterator
A generator of (indices, label), where indices are columns of X and label is a name for the given model. The iterator cycles through all combinations of the nfeature columns whose size lies between min_terms and max_terms. If appropriate, combinations are restricted to those that include a set of fixed terms. Models are labeled with a tuple of the feature names. Column names default to strings of the integers in range(nterms).
- check_finished(results, path, best, batch_results)
Check if we should continue or not. For exhaustive search we stop because all models are fit in a single batch.
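The exhaustive enumeration described above can be sketched in plain Python. This is a minimal illustration, not the ISLP implementation: the helper name min_max_candidates and its handling of lower_terms are assumptions made for the example, and terms are represented by plain strings rather than ModelSpec terms.

```python
from itertools import combinations

def min_max_candidates(terms, min_terms=0, max_terms=None, lower_terms=()):
    """Yield every subset of `terms` with size in [min_terms, max_terms].

    Hypothetical helper (not part of ISLP): every yielded tuple contains
    all of `lower_terms` plus a combination of the remaining terms.
    """
    if max_terms is None:
        max_terms = len(terms)
    fixed = tuple(t for t in terms if t in lower_terms)
    free = [t for t in terms if t not in lower_terms]
    # A candidate can never be smaller than the set of fixed terms.
    for size in range(max(min_terms, len(fixed)), max_terms + 1):
        for extra in combinations(free, size - len(fixed)):
            chosen = set(fixed) | set(extra)
            # Keep the original term order inside each candidate.
            yield tuple(t for t in terms if t in chosen)

candidates = list(
    min_max_candidates(["a", "b", "c"], min_terms=1, max_terms=2, lower_terms=("a",))
)
# candidates == [("a",), ("a", "b"), ("a", "c")]
```

Because every candidate is produced up front, a search built on such a generator fits all models in one batch, which matches the note that exhaustive search stops after a single batch.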
Stepwise
- class ISLP.models.strategy.Stepwise(model_spec, direction='forward', min_terms=1, max_terms=1, lower_terms=None, upper_terms=None, validator=None)
Bases:
MinMaxCandidates
- Parameters:
- model_spec: ModelSpec
ModelSpec describing the terms in the model.
- direction: str
One of [‘forward’, ‘backward’, ‘both’]
- min_terms: int (default: 1)
Minimum number of terms to select
- max_terms: int (default: 1)
Maximum number of terms to select
- lower_terms: [Feature]
Subset of terms to keep: smallest model.
- upper_terms: [Feature]
Largest possible model.
- constraints: {array-like} (optional), shape [n_terms, n_terms]
Boolean matrix describing a DAG, with [i,j] nonzero implying that j is a child of i (i.e. there is an edge i->j). All search candidates are checked for validity: the parents of each term in a candidate must be included in the set of terms.
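The parent/child check described for constraints can be sketched as follows. This is an illustration under stated assumptions, not the library's code: make_validator is a hypothetical name, a state is assumed to be a collection of integer term indices, and the matrix is a plain nested list.

```python
def make_validator(constraints):
    """Build a validity check from a boolean parent/child matrix.

    Hypothetical stand-in for illustration: constraints[i][j] truthy means
    term j is a child of term i, so j may only appear together with i.
    """
    def validator(state):
        chosen = set(state)
        for child in chosen:
            for parent, row in enumerate(constraints):
                if row[child] and parent not in chosen:
                    return False  # a required parent is missing
        return True
    return validator

# Terms 0 and 1 are main effects; term 2 is their interaction.
constraints = [
    [0, 0, 1],
    [0, 0, 1],
    [0, 0, 0],
]
is_valid = make_validator(constraints)

is_valid((0, 1, 2))  # True: both parents of term 2 are present
is_valid((0, 2))     # False: parent 1 of term 2 is missing
```

A callable of this shape takes a single state argument and returns a boolean, which is the contract the validator parameter describes.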
Methods
- candidate_states(state): Produce candidates for fitting.
- check_finished(results, path, best, ...): Check if we should continue or not.
- first_peak(model_spec[, direction, ...])
- fixed_steps(model_spec, n_steps[, ...]): Strategy that stops the first time a given model size is reached.
- __init__(model_spec, direction='forward', min_terms=1, max_terms=1, lower_terms=None, upper_terms=None, validator=None)
- Parameters:
- model_spec: ModelSpec
ModelSpec describing the terms in the model.
- min_terms: int (default: 1)
Minimum number of terms to select
- max_terms: int (default: 1)
Maximum number of terms to select
- lower_terms: [Feature]
Subset of terms to keep: smallest model.
- upper_terms: [Feature]
Largest possible model.
- validator: callable
Callable taking a single argument: state, returning whether this is a valid state.
- candidate_states(state)
Produce candidates for fitting. For stepwise search this depends on the direction.
If ‘forward’, each candidate adds one column that is not in the current state (keeping the number of columns at or below self.max_terms).
If ‘backward’, each candidate drops one column that is in the current state (keeping the number of columns at or above self.min_terms).
All candidates include self.lower_terms if any.
- Parameters:
- state: ignored
- Returns:
- candidates: iterator
A generator of (indices, label), where indices are columns of X and label is a name for the given model. The iterator cycles through all combinations of the nfeature columns whose size lies between min_terms and max_terms. If appropriate, combinations are restricted to those that include a set of fixed terms. Models are labeled with a tuple of the feature names. Column names default to strings of the integers in range(nterms).
- check_finished(results, path, best, batch_results)
Check if we should continue or not. For exhaustive search we stop because all models are fit in a single batch.
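The direction-dependent candidate generation can be sketched in plain Python. This is a simplified illustration rather than the ISLP code: stepwise_candidates is a hypothetical helper, states are tuples of string labels, and lower_terms is enforced only by refusing to drop those terms.

```python
def stepwise_candidates(current, all_terms, direction="forward",
                        min_terms=0, max_terms=None, lower_terms=()):
    """Return the states that differ from `current` by exactly one term.

    Hypothetical helper for illustration only.
    """
    if max_terms is None:
        max_terms = len(all_terms)
    current = tuple(current)
    out = []
    if direction in ("forward", "both") and len(current) < max_terms:
        for t in all_terms:
            if t not in current:
                # Add one missing term, keeping the order of all_terms.
                out.append(tuple(s for s in all_terms if s in current or s == t))
    if direction in ("backward", "both") and len(current) > min_terms:
        for t in current:
            if t not in lower_terms:
                # Drop one term, never a term listed in lower_terms.
                out.append(tuple(s for s in current if s != t))
    return out

forward = stepwise_candidates(("a",), ["a", "b", "c"])
# forward == [("a", "b"), ("a", "c")]
backward = stepwise_candidates(("a", "b", "c"), ["a", "b", "c"],
                               direction="backward", lower_terms=("a",))
# backward == [("a", "c"), ("a", "b")]
```

With direction set to ‘both’, the sketch simply concatenates the forward and backward neighbours of the current state.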
- static first_peak(model_spec, direction='forward', min_terms=1, max_terms=1, random_state=0, lower_terms=[], upper_terms=[], initial_terms=[], validator=None, parsimonious=False)
- Parameters:
- model_spec: ModelSpec
ModelSpec describing the terms in the model.
- direction: str
One of [‘forward’, ‘backward’, ‘both’]
- min_terms: int (default: 1)
Minimum number of terms to select
- max_terms: int (default: 1)
Maximum number of terms to select
- lower_terms: [Feature]
Subset of terms to keep: smallest model.
- upper_terms: [Feature]
Largest possible model.
- initial_terms: column identifiers, default=[]
Subset of terms used to initialize the search when direction is ‘both’. If None, defaults to the behavior of ‘forward’. Column identifiers are column names if X is a pd.DataFrame, or integer indices if X is an np.ndarray.
- validator: callable
Callable taking a single argument: state, returning whether this is a valid state.
- parsimonious: bool
If True, use the 1sd rule: among the shortest models within one standard deviation of the best score pick the one with the best average score.
- Returns:
- initial_state: tuple
(column_names, feature_idx)
- state_generator: callable
Object that proposes candidates based on current state. Takes a single argument state
- build_submodel: callable
Candidate generator that enumerates all valid subsets of columns.
- check_finished: callable
Check whether to stop. Takes two arguments: best_result, a dict with keys of scores, and state.
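The 1sd rule mentioned for parsimonious can be illustrated with a small, self-contained sketch. Several details are assumptions made for the example and are not confirmed by this page: higher scores are taken to be better, the standard deviation used is that of the best-scoring model, and one_sd_choice is a hypothetical helper operating on (state, mean_score, sd_score) triples.

```python
def one_sd_choice(results):
    """Pick a state using a one-standard-deviation rule.

    Hypothetical helper: `results` is a list of (state, mean_score, sd_score)
    and a higher mean score is assumed to be better.
    """
    _, best_mean, best_sd = max(results, key=lambda r: r[1])
    # Models whose mean score is within one sd of the best mean score.
    close = [r for r in results if r[1] >= best_mean - best_sd]
    # Among those, keep only the shortest models ...
    shortest = min(len(r[0]) for r in close)
    shortlist = [r for r in close if len(r[0]) == shortest]
    # ... and return the one with the best average score.
    return max(shortlist, key=lambda r: r[1])[0]

results = [
    (("a",), 0.70, 0.05),
    (("b",), 0.78, 0.05),
    (("a", "b"), 0.80, 0.04),
    (("a", "b", "c"), 0.81, 0.04),
]
chosen = one_sd_choice(results)
# chosen == ("b",): the one-term model within 0.04 of the best mean 0.81
```

The effect is to trade a small loss in average score for a smaller model, which is the usual motivation for this kind of rule.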
- static fixed_steps(model_spec, n_steps, direction='forward', lower_terms=[], upper_terms=[], initial_terms=[], validator=None)
Strategy that stops the first time a given model size is reached.
- Parameters:
- model_spec: ModelSpec
ModelSpec describing the terms in the model.
- n_steps: int
How many steps to take in the search?
- direction: str
One of [‘forward’, ‘backward’, ‘both’]
- min_terms: int (default: 0)
Minimum number of terms to select
- max_terms: int (default: None)
Maximum number of terms to select. If None defaults to number of terms in model_spec.
- lower_terms: [Feature]
Subset of terms to keep: smallest model.
- upper_terms: [Feature]
Largest possible model.
- initial_terms: column identifiers, default=[]
Subset of terms to be used to initialize.
- Returns:
- strategy: NamedTuple
Strategy
- class ISLP.models.strategy.Strategy(initial_state: Any, candidate_states: Callable, build_submodel: Callable, check_finished: Callable, postprocess: Callable)
Bases:
NamedTuple
- initial_state: object
Initial state of feature selector.
- candidate_states: callable
Callable taking single argument state and returning candidates for next batch of scores to be calculated.
- build_submodel: callable
Callable taking two arguments (X, state) that returns model matrix represented by state.
- check_finished: callable
Callable taking three arguments (results, best_state, batch_results) which determines if the state generator should step. Often will just check if there is a better score than that at current best state but can use entire set of results if desired.
- postprocess: callable
Callable to postprocess the results after selection procedure terminates.
Methods
- count(value, /): Return number of occurrences of value.
- index(value[, start, stop]): Return first index of value.
- __init__(*args, **kwargs)
- count(value, /)
Return number of occurrences of value.
- index(value, start=0, stop=sys.maxsize, /)
Return first index of value.
Raises ValueError if the value is not present.
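To show how a five-field NamedTuple of this shape behaves, here is a minimal local stand-in. StrategySketch is a hypothetical class defined only for this example (it is not ISLP's Strategy), and the callables are placeholders whose signatures are assumptions.

```python
from typing import Any, Callable, NamedTuple

class StrategySketch(NamedTuple):
    """Local stand-in mirroring the five documented fields (illustration only)."""
    initial_state: Any
    candidate_states: Callable
    build_submodel: Callable
    check_finished: Callable
    postprocess: Callable

strategy = StrategySketch(
    initial_state=(),
    # Placeholder: propose a single next state that adds column 0.
    candidate_states=lambda state: [state + (0,)],
    # Placeholder: select the columns named by `state` from a list of rows.
    build_submodel=lambda X, state: [[row[j] for j in state] for row in X],
    # Placeholder: always report that the search is finished.
    check_finished=lambda *args: True,
    # Placeholder: return the results unchanged.
    postprocess=lambda results: results,
)

X = [[1, 2], [3, 4]]
submodel = strategy.build_submodel(X, (1,))    # [[2], [4]]
position = strategy.index(strategy.initial_state)  # 0
occurrences = strategy.count(())               # 1
```

Fields can be read by name or by position, and the inherited count and index methods treat the tuple like any other sequence.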
Functions
- ISLP.models.strategy.first_peak(results, path, best, batch_results)
Check if we should continue or not.
For first_peak search we stop if we cannot improve over our current best score.
- ISLP.models.strategy.fixed_steps(n_steps, results, path, best, batch_results)
Check if we should continue or not.
For fixed_steps search we stop the first time the given model size (n_steps) is reached.
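The difference between the two stopping rules can be sketched with a toy greedy forward search. Everything here is hypothetical and simplified: greedy_forward, first_peak_stop and fixed_steps_stop are illustration-only names, their signatures do not match the functions documented above, and the score function is made up.

```python
def greedy_forward(terms, score, stop):
    """Greedy forward search; `stop(path, candidate_score)` decides when to halt."""
    state = ()
    path = [(state, score(state))]
    while len(state) < len(terms):
        candidates = [state + (t,) for t in terms if t not in state]
        best = max(candidates, key=score)
        if stop(path, score(best)):
            break
        state = best
        path.append((state, score(state)))
    return path

def first_peak_stop(path, candidate_score):
    # Stop as soon as the best candidate fails to improve on the current score.
    return candidate_score <= path[-1][1]

def fixed_steps_stop(n_steps):
    # Stop once n_steps terms have been added, regardless of the score.
    return lambda path, candidate_score: len(path) - 1 >= n_steps

gains = {"a": 3.0, "b": 1.5, "c": 0.25}
score = lambda state: sum(gains[t] for t in state) - len(state)  # toy score

peak_path = greedy_forward(["a", "b", "c"], score, first_peak_stop)
# peak_path[-1] == (("a", "b"), 2.5): adding "c" would lower the score
one_step = greedy_forward(["a", "b", "c"], score, fixed_steps_stop(1))
# one_step[-1] == (("a",), 2.0)
```

The first rule is data-driven and may stop early or late, while the second always produces a path of a predetermined length.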
- ISLP.models.strategy.min_max(model_spec, min_terms=1, max_terms=1, lower_terms=None, upper_terms=None, validator=None, parsimonious=False)
- Parameters:
- model_spec: ModelSpec
ModelSpec describing the terms in the model.
- min_terms: int (default: 1)
Minimum number of terms to select
- max_terms: int (default: 1)
Maximum number of terms to select
- lower_terms: [Feature]
Subset of terms to keep: smallest model.
- upper_terms: [Feature]
Largest possible model.
- validator: callable
Callable taking a single argument: state, returning whether this is a valid state.
- parsimonious: bool
If True, use the 1sd rule: among the shortest models within one standard deviation of the best score pick the one with the best average score.
- Returns:
- initial_state: tuple
(column_names, feature_idx)
- state_generator: callable
Object that proposes candidates based on current state. Takes a single argument state
- build_submodel: callable
Candidate generator that enumerates all valid subsets of columns.
- check_finished: callable
Check whether to stop. Takes two arguments: best_result, a dict with keys of scores, and state.
- ISLP.models.strategy.validator_from_constraints(model_spec, constraints)