Tutorial
ugtm provides an implementation of GTM (Generative Topographic Mapping), kGTM (kernel Generative Topographic Mapping), GTM classification models (kNN, Bayes) and GTM regression models. ugtm also implements cross-validation options which can be used to compare GTM classification models to SVM classification models, and GTM regression models to SVM regression models. Typical usage:
#!/usr/bin/env python
import ugtm
import numpy as np
#generate sample data and labels: replace this with your own data
data = np.random.randn(100, 50)
labels = np.random.choice([1, 2], size=100)
#build GTM map
gtm = ugtm.runGTM(data=data, verbose=True)
#access coordinates (means or modes)
coordinates = gtm.matMeans
modes = gtm.matModes
1. Import package
Import the ugtm package, which provides GTM and kernel GTM (kGTM) maps, GTM classification models, and GTM regression models:
import ugtm
2. Construct GTM maps (or kGTM maps)
A gtm object is created by running the function runGTM() on a dataset. The main parameters of runGTM() are k (square root of the number of nodes), m (square root of the number of RBF centres), s (RBF width factor) and regul (regularization coefficient). The expectation-maximization algorithm runs for 200 iterations by default:
import ugtm
import numpy as np
train = np.random.randn(20, 10)
test = np.random.randn(20, 10)
labels = np.random.choice(["class1", "class2"], size=20)
activity = np.random.randn(20, 1)
#create a gtm object and write model
gtm = ugtm.runGTM(train)
gtm.write("testout1")
#run verbose
gtm = ugtm.runGTM(train, verbose=True)
#to run a kernel GTM model instead:
gtm = ugtm.runkGTM(train, doKernel=True, kernel="linear")
#access coordinates (means or modes) and responsibilities
gtm_coordinates = gtm.matMeans
gtm_modes = gtm.matModes
gtm_responsibilities = gtm.matR
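To make the relationship between these three arrays concrete, the NumPy-only sketch below computes mean and mode positions from a responsibility matrix in the usual GTM way. It does not call ugtm: the 2×2 grid `nodes` and the toy matrix `R` are illustrative assumptions, and the grid layout ugtm uses internally may differ.

```python
import numpy as np

# Hypothetical latent grid: k*k nodes on [-1, 1]^2 (k = 2 here).
k = 2
axis = np.linspace(-1.0, 1.0, k)
nodes = np.array([(x, y) for x in axis for y in axis])  # shape (k*k, 2)

# Toy responsibilities: one row per data point, one column per node.
# Each row sums to 1.
R = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],
    [0.00, 0.00, 0.20, 0.80],
])

# Mean position: responsibility-weighted average of the node coordinates.
means = R @ nodes

# Mode position: coordinates of the node with the highest responsibility
# (ties resolve to the first such node).
modes = nodes[np.argmax(R, axis=1)]
```

A point whose responsibility is spread evenly over the grid (second row) gets a mean at the centre of the map, while its mode still snaps to a single node. This is why the two coordinate sets can look quite different on the same data.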
3. Visualization
ugtm outputs are plain NumPy arrays (matMeans, matModes, matR), so any plotting library works. Quick example with matplotlib:
import matplotlib.pyplot as plt
gtm = ugtm.runGTM(data=train)
coords = gtm.matMeans
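#labels are strings here, so map them to integer codes before coloring
label_codes = np.unique(labels, return_inverse=True)[1]
plt.scatter(coords[:, 0], coords[:, 1], c=label_codes, cmap="Spectral_r")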
plt.colorbar()
plt.show()
For richer interactive examples using Altair, see Visualization examples.
4. Incremental GTM for large datasets
For datasets too large to hold the full N×K responsibility matrix in RAM
(N data points, K nodes), use runIGTM() or eIGTM.
The n_blocks parameter controls how many chunks the data is split into
(0 selects the value automatically as ceil(N / 5000)):
#run incremental GTM (same interface as runGTM)
igtm = ugtm.runIGTM(data=train, n_blocks=5, verbose=True)
#access coordinates and responsibilities
igtm_coordinates = igtm.matMeans
igtm_modes = igtm.matModes
#sklearn-compatible transformer
from ugtm import eIGTM
transformed = eIGTM(n_blocks=5).fit_transform(train)
#block-wise projection for large test sets (generator)
model = eIGTM().fit(train)
for block in model.transform_blocks(test, block_size=1000):
    pass  # process block here
See eIGTM: incremental GTM for full details.
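The ceil(N / 5000) rule for the automatic block count can be sketched in plain NumPy. The helper names `auto_n_blocks` and `iter_blocks` are hypothetical and not part of ugtm, and splitting with `np.array_split` is an assumption about how rows might be chunked, not a description of ugtm's internals.

```python
import math
import numpy as np

def auto_n_blocks(n_samples, block_size=5000):
    """Number of blocks under the ceil(N / block_size) rule (at least 1)."""
    return max(1, math.ceil(n_samples / block_size))

def iter_blocks(data, n_blocks):
    """Yield consecutive row blocks of roughly equal size."""
    for block in np.array_split(data, n_blocks, axis=0):
        yield block

data = np.random.randn(12000, 10)
n_blocks = auto_n_blocks(data.shape[0])  # ceil(12000 / 5000) = 3
sizes = [block.shape[0] for block in iter_blocks(data, n_blocks)]
```

With 12000 rows the rule gives three blocks of 4000 rows each, so only a 4000×K slice of responsibilities would need to be held in memory at a time.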
5. Project new data onto an existing GTM map
New data can be projected onto an existing GTM map using the transform() function. The training set is passed again so that the same preprocessing (e.g. PCA) is applied to the test set; use the same doPCA setting as when the model was built:
#run model on train
gtm = ugtm.runGTM(train, doPCA=True)
#project test data
transformed = ugtm.transform(optimizedModel=gtm, train=train, test=test, doPCA=True)
#access projected coordinates
test_coordinates = transformed.matMeans
test_modes = transformed.matModes
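The idea of consistent preprocessing can be illustrated without ugtm: the centering and principal axes are estimated on the training set only and then reused for the test set. This is a minimal NumPy sketch of that principle, not ugtm's own code; `n_components = 3` is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.standard_normal((20, 10))
test = rng.standard_normal((5, 10))

# Fit the preprocessing on the training set only.
train_mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - train_mean, full_matrices=False)
n_components = 3                 # hypothetical choice for illustration
components = vt[:n_components]   # principal axes, shape (3, 10)

# Apply the same centering and rotation to both sets.
train_pca = (train - train_mean) @ components.T
test_pca = (test - train_mean) @ components.T
```

Fitting a separate PCA on the test set would place it in a different coordinate system, and its projection onto the map would no longer be comparable with the training data.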
6. Output predictions for a test set: GTM regression (GTR) and classification (GTC)
The GTR() function implements a GTM regression model, and the GTC() function implements a GTM classification model:
#continuous labels (prediction by GTM regression model)
predicted = ugtm.GTR(train=train, test=test, labels=activity)
#discrete labels (prediction by GTM classification model)
predicted = ugtm.GTC(train=train, test=test, labels=labels)
7. Advanced GTM predictions with per-class probabilities
Per-class probabilities for a test set can be obtained with the advancedGTC() function:
#get whole output model and label predictions for test set
predicted_model = ugtm.advancedGTC(train=train, test=test, labels=labels)
#write whole predicted model with per-class probabilities
ugtm.printClassPredictions(predicted_model, "testout17")
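Per-class probabilities are typically turned into a label by taking the most probable class. The sketch below shows this on a hand-written probability matrix; it makes no assumption about the structure of the object returned by advancedGTC(), and `proba` and `class_names` are hypothetical stand-ins.

```python
import numpy as np

class_names = np.array(["class1", "class2"])

# Hypothetical per-class probabilities: one row per test instance,
# one column per class, each row summing to 1.
proba = np.array([
    [0.9, 0.1],
    [0.4, 0.6],
    [0.5, 0.5],
])

# Predicted label: the class with the highest probability
# (ties resolve to the first class).
predicted_labels = class_names[np.argmax(proba, axis=1)]

# Probability of the chosen class, usable as a rough confidence score.
confidence = proba.max(axis=1)
```

Keeping the probabilities rather than only the labels makes it possible to flag uncertain predictions, such as the tied third row here.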
8. Crossvalidation experiments
Several crossvalidation experiments are implemented to compare GTC and GTR models with classical machine learning methods (k-nearest neighbours on a 2D PCA map, and SVM models from sklearn):
#crossvalidation experiment: GTM classification model
#here: set hyperparameters s=1 and regul=1 (set to -1 to optimize)
ugtm.crossvalidateGTC(data=train, labels=labels, s=1, regul=1, n_repetitions=10, n_folds=5)
#crossvalidation experiment: GTM regression model
ugtm.crossvalidateGTR(data=train, labels=activity, s=1, regul=1)
#crossvalidation experiment, k-nearest neighbours classification
#on 2D PCA map with 7 neighbors (set to -1 to optimize)
ugtm.crossvalidatePCAC(data=train, labels=labels, n_neighbors=7)
#crossvalidation experiment, SVC rbf classification model (sklearn):
ugtm.crossvalidateSVCrbf(data=train, labels=labels, C=1, gamma=1)
#crossvalidation experiment, linear SVC classification model (sklearn):
ugtm.crossvalidateSVC(data=train, labels=labels, C=1)
#crossvalidation experiment, linear SVR regression model (sklearn):
ugtm.crossvalidateSVR(data=train, labels=activity, C=1, epsilon=1)
#crossvalidation experiment, k-nearest neighbours regression on 2D PCA map:
ugtm.crossvalidatePCAR(data=train, labels=activity, n_neighbors=7)
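For readers unfamiliar with the n_repetitions and n_folds arguments, the sketch below shows what repeated k-fold splitting means in general. It is a plain NumPy illustration, not ugtm's implementation: the helper `repeated_kfold_indices` is hypothetical, and ugtm may split the data differently (for example with stratification).

```python
import numpy as np

def repeated_kfold_indices(n_samples, n_folds=5, n_repetitions=10, seed=0):
    """Yield (repetition, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = np.random.default_rng(seed)
    for rep in range(n_repetitions):
        # Reshuffle the samples at the start of every repetition.
        permutation = rng.permutation(n_samples)
        folds = np.array_split(permutation, n_folds)
        for fold, test_idx in enumerate(folds):
            # Train on every fold except the one held out for testing.
            train_idx = np.concatenate(
                [f for i, f in enumerate(folds) if i != fold]
            )
            yield rep, fold, train_idx, test_idx

splits = list(repeated_kfold_indices(n_samples=20, n_folds=5, n_repetitions=2))
```

Each repetition yields n_folds train/test splits, so n_repetitions=10 with n_folds=5 corresponds to 50 model fits per hyperparameter setting.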