NCI 60 Data

NCI microarray data. The data contain expression levels for 6830 genes from 64 cancer cell lines. The cancer type of each cell line is also recorded.

The format is a dict containing two elements: ‘data’ and ‘labels’.

  • data: a 64 by 6830 matrix of the expression values.

  • labels: the cancer type of each of the 64 cell lines.

Source

The data come from Ross et al. (Nat Genet., 2000). More information can be obtained at http://genome-www.stanford.edu/nci60.

from ISLP import load_data
NCI60 = load_data('NCI60')
NCI60.keys()
dict_keys(['data', 'labels'])
NCI60['labels'].value_counts()
label      
NSCLC          9
RENAL          9
MELANOMA       8
BREAST         7
COLON          7
LEUKEMIA       6
OVARIAN        6
CNS            5
PROSTATE       2
K562A-repro    1
K562B-repro    1
MCF7A-repro    1
MCF7D-repro    1
UNKNOWN        1
dtype: int64
NCI60['data'].shape
(64, 6830)
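
As a quick consistency check, the label counts printed above should add up to the 64 rows of the data matrix. The sketch below re-enters those counts by hand, so it runs without ISLP installed; the names `label_counts`, `n_cell_lines`, and `singletons` are illustrative and not part of the package.

```python
from collections import Counter

# Counts copied by hand from the value_counts() output above
# (an illustrative stand-in, not loaded from ISLP).
label_counts = Counter({
    "NSCLC": 9, "RENAL": 9, "MELANOMA": 8, "BREAST": 7, "COLON": 7,
    "LEUKEMIA": 6, "OVARIAN": 6, "CNS": 5, "PROSTATE": 2,
    "K562A-repro": 1, "K562B-repro": 1, "MCF7A-repro": 1,
    "MCF7D-repro": 1, "UNKNOWN": 1,
})

# The total number of labels should match the number of rows in 'data'.
n_cell_lines = sum(label_counts.values())

# Labels that occur only once; these may need special handling
# in any analysis that groups cell lines by cancer type.
singletons = sorted(label for label, n in label_counts.items() if n == 1)

print(n_cell_lines)
print(singletons)
```

The total is 64, matching the first dimension of `NCI60['data'].shape`, and five of the fourteen labels occur only once.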