Spline features
The modelling tools included in ISLP allow for the construction of spline functions of features.
import numpy as np
from ISLP import load_data
from ISLP.models import ModelSpec, ns, bs
Carseats = load_data('Carseats')
Carseats.columns
Index(['Sales', 'CompPrice', 'Income', 'Advertising', 'Population', 'Price',
'ShelveLoc', 'Age', 'Education', 'Urban', 'US'],
dtype='object')
Let’s make a term representing a cubic spline for Population. We’ll use knots based on the deciles.
knots = np.percentile(Carseats['Population'], np.linspace(10, 90, 9))
knots
array([ 58.9, 110.4, 160. , 218.6, 272. , 317.8, 366. , 412.2, 467. ])
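To make "a cubic spline with these knots" concrete, here is a minimal sketch of the classical truncated power basis: the columns x, x**2, x**3 plus one term (x - k)_+**3 per knot k. This is a swapped-in textbook construction, not the B-spline basis that bs computes. The helper truncated_power_basis is hypothetical, and synthetic uniform data stand in for Population. Together with a constant column it spans the same space of cubic splines as a full B-spline basis on the same knots, but it is much worse conditioned numerically, which is one reason B-splines are preferred in practice.

```python
import numpy as np


def truncated_power_basis(x, knots, degree=3):
    """Hypothetical helper: columns x, x**2, ..., x**degree, then (x - k)_+**degree per knot.

    No constant column is included, so the result has len(knots) + degree columns.
    """
    x = np.asarray(x, dtype=float)[:, None]          # shape (n, 1)
    knots = np.asarray(knots, dtype=float)[None, :]  # shape (1, K)
    powers = x ** np.arange(1, degree + 1)           # global polynomial part, (n, degree)
    hinges = np.maximum(x - knots, 0.0) ** degree    # one truncated power per knot, (n, K)
    return np.hstack([powers, hinges])


# Synthetic stand-in for Carseats['Population'] (assumption: not the real data)
rng = np.random.default_rng(0)
x = rng.uniform(10, 509, size=400)
knots = np.percentile(x, np.linspace(10, 90, 9))  # the nine interior deciles
X = truncated_power_basis(x, knots)
print(X.shape)  # (400, 12): 3 polynomial columns + 9 knot columns
```

Each knot column is zero to the left of its knot, so every knot only changes the fit to its right. That locality is the property a cubic spline relies on.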
bs_pop = bs('Population', internal_knots=knots, degree=3)
The object bs_pop does not refer to any data yet; it must be included in a ModelSpec object and fit using the fit method.
design = ModelSpec([bs_pop], intercept=False)
py_features = np.asarray(design.fit_transform(Carseats))
Compare to R
We can compare our spline features to those computed by the corresponding function in R.
%load_ext rpy2.ipython
We’ll recompute these features using bs in R. The default knot selection of the ISLP and R versions is slightly different, so we just fix the set of internal knots.
%%R -i Carseats,knots -o R_features
library(splines)
R_features = bs(Carseats$Population, knots=knots, degree=3)
np.linalg.norm(py_features - R_features)
0.0
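For reference, here is a sketch of how R's bs places interior knots when only df is supplied, based on the R splines source. The number of interior knots is df - (degree + 1) + (1 - intercept), placed at equally spaced quantiles of x, and R's default quantile type 7 agrees with NumPy's default linear interpolation. The function name r_style_default_knots is hypothetical, and ISLP's own default rule is not reproduced here. With degree 3, no intercept and df=12, the rule yields exactly the nine deciles used above.

```python
import numpy as np


def r_style_default_knots(x, df, degree=3, intercept=False):
    """Sketch of R's splines::bs default interior-knot rule (hypothetical helper name)."""
    x = np.asarray(x, dtype=float)
    # R: nIknots <- df - ord + (1 - intercept), with ord = degree + 1; negative counts become 0
    n_interior = max(df - (degree + 1) + (0 if intercept else 1), 0)
    # Equally spaced probabilities, dropping 0 and 1 (those positions are the boundary knots)
    probs = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    # NumPy's default 'linear' method corresponds to R's default quantile type 7
    return np.quantile(x, probs)


rng = np.random.default_rng(1)
x = rng.uniform(10, 509, size=400)  # synthetic stand-in for Population
print(np.allclose(r_style_default_knots(x, df=12),
                  np.percentile(x, np.linspace(10, 90, 9))))  # True
```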
Underlying model
As with poly, the computation of the B-splines is done by a special sklearn transformer.
bs_pop
Feature(variables=('Population',), name='bs(Population, internal_knots=[ 58.9 110.4 160. 218.6 272. 317.8 366. 412.2 467. ], degree=3)', encoder=BSpline(internal_knots=array([ 58.9, 110.4, 160. , 218.6, 272. , 317.8, 366. , 412.2, 467. ]),
lower_bound=10.0, upper_bound=509.0), use_transform=True, pure_columns=False, override_encoder_colnames=True)
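The transformer's internals are not shown here, but the basis it has to produce can be sketched with the Cox-de Boor recursion. This is a minimal NumPy version under two assumptions. First, the boundary knots sit at the minimum and maximum of x and are repeated degree + 1 times; this is presumably what lower_bound and upper_bound in the repr above record. Second, intercept=False drops the first basis function, as R's bs does. The function bspline_basis is hypothetical and makes no attempt to match ISLP's BSpline class.

```python
import numpy as np


def bspline_basis(x, internal_knots, degree=3, intercept=False):
    """Hypothetical helper: B-spline basis via the Cox-de Boor recursion."""
    x = np.asarray(x, dtype=float)
    lower, upper = x.min(), x.max()
    # Clamped knot vector: boundary knots repeated degree + 1 times
    t = np.concatenate([[lower] * (degree + 1),
                        np.sort(np.asarray(internal_knots, dtype=float)),
                        [upper] * (degree + 1)])
    # Degree 0: indicator of the half-open knot interval [t_i, t_{i+1})
    B = np.zeros((x.size, len(t) - 1))
    for i in range(len(t) - 1):
        B[:, i] = (t[i] <= x) & (x < t[i + 1])
    # Close the last non-empty interval on the right so that x == upper is covered
    last = np.nonzero(t[:-1] < t[1:])[0][-1]
    B[x == upper, last] = 1.0
    # Raise the degree one step at a time; terms with a zero denominator are dropped (0/0 := 0)
    for d in range(1, degree + 1):
        B_next = np.zeros((x.size, len(t) - 1 - d))
        for i in range(len(t) - 1 - d):
            left_den = t[i + d] - t[i]
            right_den = t[i + d + 1] - t[i + 1]
            if left_den > 0:
                B_next[:, i] += (x - t[i]) / left_den * B[:, i]
            if right_den > 0:
                B_next[:, i] += (t[i + d + 1] - x) / right_den * B[:, i + 1]
        B = B_next
    # Without an intercept, drop the first basis function (as R's bs does)
    return B if intercept else B[:, 1:]


rng = np.random.default_rng(0)
x = rng.uniform(10, 509, size=400)  # synthetic stand-in for Population
knots = np.percentile(x, np.linspace(10, 90, 9))
full = bspline_basis(x, knots, intercept=True)
print(full.shape)                          # (400, 13): 9 interior knots + degree 3 + 1
print(np.allclose(full.sum(axis=1), 1.0))  # True: B-splines form a partition of unity
print(bspline_basis(x, knots).shape)       # (400, 12) without the intercept column
```

Each basis function is nonzero on at most degree + 1 adjacent knot intervals. This local support is why B-splines are numerically better behaved than the truncated power basis.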
Natural splines
Natural cubic splines are also implemented.
ns_pop = ns('Population', internal_knots=knots)
design = ModelSpec([ns_pop], intercept=False)
py_features = np.asarray(design.fit_transform(Carseats))
%%R -o R_features
library(splines)
R_features = ns(Carseats$Population, knots=knots)
np.linalg.norm(py_features - R_features)
5.839849019410244e-16
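One way to see what "natural" adds is the classical truncated-power construction of a natural cubic spline basis, as presented in The Elements of Statistical Learning. Take K knots xi_1 < ... < xi_K, with the boundary knots included. The basis is then 1, x, and N_{k+2}(x) = d_k(x) - d_{K-1}(x), where d_k(x) = [(x - xi_k)_+^3 - (x - xi_K)_+^3] / (xi_K - xi_k). This is a different basis from the one ns computes, and natural_cubic_basis is a hypothetical helper. Both bases, however, describe cubic splines constrained to be linear beyond the boundary knots. K knots give K columns including the constant. The two boundary knots plus the nine internal knots used here give 11 columns, or 10 once the constant is dropped, which matches the column count of the ns features.

```python
import numpy as np


def natural_cubic_basis(x, knots):
    """Hypothetical helper: truncated-power natural cubic spline basis, constant included.

    `knots` holds all K knots, boundary knots included; the result has K columns.
    """
    x = np.asarray(x, dtype=float)
    xi = np.sort(np.asarray(knots, dtype=float))
    K = xi.size

    def d(k):
        # d_k(x) = [(x - xi_k)_+^3 - (x - xi_K)_+^3] / (xi_K - xi_k), with 0-based k
        return (np.maximum(x - xi[k], 0.0) ** 3
                - np.maximum(x - xi[-1], 0.0) ** 3) / (xi[-1] - xi[k])

    cols = [np.ones_like(x), x]
    # N_{k+2} = d_k - d_{K-1} in 1-based notation, i.e. d(k) - d(K - 2) here
    cols += [d(k) - d(K - 2) for k in range(K - 2)]
    return np.column_stack(cols)


knots = np.array([0.0, 2.0, 3.0, 5.0, 7.0, 10.0])  # illustrative knots, not from Carseats
x_right = np.linspace(10.0, 15.0, 11)              # points beyond the upper boundary knot
N = natural_cubic_basis(x_right, knots)
print(N.shape)  # (11, 6)
# Second differences on an evenly spaced grid vanish, so every column is linear out here
print(np.allclose(np.diff(N, n=2, axis=0), 0.0, atol=1e-8))  # True
```

The cubic terms cancel past the last knot because every d_k carries the same leading 3x**2 term, and the differences d_k - d_{K-1} remove it.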
Intercept
Looking at py_features, we see that it has 10 columns and does not contain an intercept, i.e. the constant term. An intercept can be included with intercept=True, which adds one more column. This means that the column space then includes an intercept, though there is no specific column labeled as intercept.
# Natural spline basis for Population with the intercept folded into the basis
ns_int = ns('Population', internal_knots=knots, intercept=True)
design = ModelSpec([ns_int], intercept=False)
py_int_features = np.asarray(design.fit_transform(Carseats))
py_int_features.shape, py_features.shape
((400, 11), (400, 10))
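A column space can contain an intercept without any single column being constant. This is easy to check numerically: regress a column of ones on the basis and look at the residual. The sketch below uses piecewise-linear hat functions (degree-1 B-splines) as a toy stand-in for the spline bases above, because they sum to one just as a full cubic B-spline basis does. Dropping one of them, as R's bs does when intercept=FALSE, removes the constant from the span. The helpers hat_basis and spans_intercept are hypothetical.

```python
import numpy as np


def hat_basis(x, knots):
    """Hypothetical helper: piecewise-linear 'hat' functions (degree-1 B-splines).

    Column j is 1 at knots[j], 0 at the other knots and linear in between,
    so the columns sum to 1 everywhere on [knots[0], knots[-1]].
    """
    knots = np.asarray(knots, dtype=float)
    identity = np.eye(knots.size)
    return np.column_stack([np.interp(x, knots, identity[j]) for j in range(knots.size)])


def spans_intercept(X, tol=1e-8):
    """Hypothetical helper: True if a constant column lies (numerically) in the column space of X."""
    ones = np.ones(X.shape[0])
    coef, *_ = np.linalg.lstsq(X, ones, rcond=None)
    return np.linalg.norm(X @ coef - ones) < tol * np.sqrt(X.shape[0])


knots = np.array([0.0, 1.0, 2.5, 4.0, 6.0])
x = np.linspace(0.0, 6.0, 200)
full = hat_basis(x, knots)
print(spans_intercept(full))         # True: the columns sum to one
print(spans_intercept(full[:, 1:]))  # False: at x = 0 every remaining column is zero
```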