tutorial¶
ugtm provides an implementation of GTM (Generative Topographic Mapping), kGTM (kernel Generative Topographic Mapping), GTM classification models (kNN, Bayes) and GTM regression models. ugtm also implements cross-validation options which can be used to compare GTM classification models to SVM classification models, and GTM regression models to SVM regression models. Typical usage:
#!/usr/bin/env python
import ugtm
import numpy as np
#generate sample data and labels: replace this with your own data
data=np.random.randn(100,50)
labels=np.random.choice([1,2],size=100)
#build GTM map
gtm=ugtm.runGTM(data=data,verbose=True)
#plot GTM map (html)
gtm.plot_html(output="out")
For installation instructions, cf. https://github.com/hagax8/ugtm
1. Import package¶
Import ugtm package, allowing to construct GTM and kernel GTM (kGTM) maps, GTM classification models, GTM regression models:
import ugtm
2. Construct and plot GTM maps (or kGTM maps)¶
A gtm object can be created by running the function runGTM on a dataset. Parameters for runGTM are: k = sqrt(number of nodes), m = sqrt(number of rbf centres), s = RBF width factor, regul = regularization coefficient. The number of iteration for the expectation-maximization algorithm is set to 200 by default. This is an example with random data:
import ugtm
#import numpy to generate random data
import numpy as np
#generate random data (independent variables x),
#discrete labels (dependent variable y),
#and continuous labels (dependent variable y),
#to experiment with categorical or continuous outcomes
train = np.random.randn(20,10)
test = np.random.randn(20,10)
labels=np.random.choice(["class1","class2"],size=20)
activity=np.random.randn(20,1)
#create a gtm object and write model
gtm = ugtm.runGTM(train)
gtm.write("testout1")
#run verbose
gtm = ugtm.runGTM(train, verbose=True)
#to run a kernel GTM model instead, run following:
gtm = ugtm.runkGTM(train, doKernel=True, kernel="linear")
#access coordinates (means or modes), and responsibilities of gtm object
gtm_coordinates = gtm.matMeans
gtm_modes = gtm.matModes
gtm_responsibilities = gtm.matR
3. Plot html maps¶
Call the plot_html() function on the gtm object:
#run model on train
gtm = ugtm.runGTM(train)
# ex. plot gtm object with landscape, html: labels are continuous
gtm.plot_html(output="testout10",labels=activity,discrete=False,pointsize=20)
# ex. plot gtm object with landscape, html: labels are discrete
gtm.plot_html(output="testout11",labels=labels,discrete=True,pointsize=20)
# ex. plot gtm object with landscape, html: labels are continuous
# no interpolation between nodes
gtm.plot_html(output="testout12",labels=activity,discrete=False,pointsize=20, \
do_interpolate=False,ids=labels)
# ex. plot gtm object with landscape, html: labels are discrete,
# no interpolation between nodes
gtm.plot_html(output="testout13",labels=labels,discrete=True,pointsize=20, \
do_interpolate=False)
4. Plot pdf maps¶
Call the plot() function on the gtm object:
#run model on train
gtm = ugtm.runGTM(train)
# ex. plot gtm object, pdf: no labels
gtm.plot(output="testout6",pointsize=20)
# ex. plot gtm object with landscape, pdf: labels are discrete
gtm.plot(output="testout7",labels=labels,discrete=True,pointsize=20)
# ex. plot gtm object with landscape, pdf: labels are continuous
gtm.plot(output="testout8",labels=activity,discrete=False,pointsize=20)
5. Plot multipanel views (only if labels or activities are provided)¶
Call the plot_multipanel() function on the gtm object. This plots a general model view, showing means, modes, landscape with or without points. The plot_multipanel function only works if you have defined labels:
#run model on train
gtm = ugtm.runGTM(train)
# ex. with discrete labels and inter-node interpolation
gtm.plot_multipanel(output="testout2",labels=labels,discrete=True,pointsize=20)
# ex. with continuous labels and inter-node interpolation
gtm.plot_multipanel(output="testout3",labels=activity,discrete=False,pointsize=20)
# ex. with discrete labels and no inter-node interpolation
gtm.plot_multipanel(output="testout4",labels=labels,discrete=True,pointsize=20, \
do_interpolate=False)
# ex. with continuous labels and no inter-node interpolation
gtm.plot_multipanel(output="testout5",labels=activity,discrete=False,pointsize=20, \
do_interpolate=False)
6. Project new data onto existing GTM map¶
New data can be projected on the GTM map by using the transform() function, which takes as input the gtm model, a training and test set. The train set is then only used to perform data preprocessing on the test set based on the train (for example: apply the same PCA transformation to the train and test sets before running the algorithm):
#run model on train
gtm = ugtm.runGTM(train,doPCA=True)
#test new data (test)
transformed=ugtm.transform(optimizedModel=gtm,train=train,test=test,doPCA=True)
#plot transformed test (html)
transformed.plot_html(output="testout14",pointsize=20)
#plot transformed test (pdf)
transformed.plot(output="testout15",pointsize=20)
#plot transformed data on existing classification model,
#using training set labels
gtm.plot_html_projection(output="testout16",projections=transformed,\
labels=labels, \
discrete=True,pointsize=20)
7. Output predictions for a test set: GTM regression (GTR) and classification (GTC)¶
The GTR() function implements the GTM regression model (cf. references) and GTC() function a GTM classification model (cf. references):
#continuous labels (prediction by GTM regression model)
predicted=ugtm.GTR(train=train,test=test,labels=activity)
#discrete labels (prediction by GTM classification model)
predicted=ugtm.GTC(train=train,test=test,labels=labels)
8. Advanced GTM predictions with per-class probabilities¶
Per-class probabilities for a test set can be given by the advancedGTC() function (you can set the m, k, regul, s parameters just as with runGTM):
#get whole output model and label predictions for test set
predicted_model=ugtm.advancedGTC(train=train,test=test,labels=labels)
#write whole predicted model with per-class probabilities
ugtm.printClassPredictions(predicted_model,"testout17")
9. Crossvalidation experiments¶
Different crossvalidation experiments were implemented to compare GTC and GTR models to classical machine learning methods:
#crossvalidation experiment: GTM classification model implemented in ugtm,
#here: set hyperparameters s=1 and regul=1 (set to -1 to optimize)
ugtm.crossvalidateGTC(data=train,labels=labels,s=1,regul=1,n_repetitions=10,n_folds=5)
#crossvalidation experiment: GTM regression model
ugtm.crossvalidateGTR(data=train,labels=activity,s=1,regul=1)
#you can also run the following functions to compare
#with other classification/regression algorithms:
#crossvalidation experiment, k-nearest neighbours classification
#on 2D PCA map with 7 neighbors (set to -1 to optimize number of neighbours)
ugtm.crossvalidatePCAC(data=train,labels=labels,n_neighbors=7)
#crossvalidation experiment, SVC rbf classification model (sklearn implementation):
ugtm.crossvalidateSVCrbf(data=train,labels=labels,C=1,gamma=1)
#crossvalidation experiment, linear SVC classification model (sklearn implementation):
ugtm.crossvalidateSVC(data=train,labels=labels,C=1)
#crossvalidation experiment, linear SVC regression model (sklearn implementation):
ugtm.crossvalidateSVR(data=train,labels=activity,C=1,epsilon=1)
#crossvalidation experiment, k-nearest neighbours regression on 2D PCA map with 7 neighbors:
ugtm.crossvalidatePCAR(data=train,labels=activity,n_neighbors=7)