Tutorial
========

ugtm provides an implementation of GTM (Generative Topographic Mapping), kGTM (kernel Generative Topographic Mapping), GTM classification models (kNN, Bayes) and GTM regression models. ugtm also implements cross-validation options which can be used to compare GTM classification models to SVM classification models, and GTM regression models to SVM regression models. Typical usage::

    #!/usr/bin/env python

    import ugtm
    import numpy as np

    #generate sample data and labels: replace this with your own data
    data = np.random.randn(100, 50)
    labels = np.random.choice([1, 2], size=100)

    #build GTM map
    gtm = ugtm.runGTM(data=data, verbose=True)

    #access coordinates (means or modes)
    coordinates = gtm.matMeans
    modes = gtm.matModes


1. Import package
-----------------

Import the ugtm package, which provides GTM and kernel GTM (kGTM) maps, GTM classification models, and GTM regression models::

    import ugtm


2. Construct GTM maps (or kGTM maps)
-------------------------------------

A GTM object is created by running :func:`~ugtm.ugtm_gtm.runGTM` on a dataset. Its main parameters are ``k`` (square root of the number of nodes), ``m`` (square root of the number of RBF centres), ``s`` (RBF width factor) and ``regul`` (regularization coefficient). The expectation-maximization algorithm runs for 200 iterations by default::

    import ugtm
    import numpy as np

    train = np.random.randn(20, 10)
    test = np.random.randn(20, 10)
    labels = np.random.choice(["class1", "class2"], size=20)
    activity = np.random.randn(20, 1)

    #create a gtm object and write model
    gtm = ugtm.runGTM(train)
    gtm.write("testout1")

    #run verbose
    gtm = ugtm.runGTM(train, verbose=True)

    #to run a kernel GTM model instead:
    gtm = ugtm.runkGTM(train, doKernel=True, kernel="linear")

    #access coordinates (means or modes) and responsibilities
    gtm_coordinates = gtm.matMeans
    gtm_modes = gtm.matModes
    gtm_responsibilities = gtm.matR
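
The link between responsibilities and the two kinds of coordinates can be sketched with NumPy alone. This is a minimal illustration rather than ugtm's internal code: the node grid on [-1, 1] x [-1, 1], the random responsibilities and the names ``nodes`` and ``resp`` are assumptions made for the example. Means are responsibility-weighted averages of node positions, while modes are the position of the single most responsible node::

    import numpy as np

    rng = np.random.default_rng(0)

    k = 4  # assumed grid side, giving k * k = 16 nodes
    axis = np.linspace(-1.0, 1.0, k)
    # hypothetical node positions on a regular 2D grid, shape (k * k, 2)
    nodes = np.array([(x, y) for x in axis for y in axis])

    # stand-in responsibilities for 5 data points: each row sums to 1
    resp = rng.random((5, k * k))
    resp /= resp.sum(axis=1, keepdims=True)

    # mean position: responsibility-weighted average of node positions
    means = resp @ nodes
    # mode position: the node with the highest responsibility
    modes = nodes[np.argmax(resp, axis=1)]

Means vary continuously inside the grid, whereas modes always land exactly on a node, which is why mode plots tend to look more quantized.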


3. Visualization
----------------

ugtm outputs are plain NumPy arrays (``matMeans``, ``matModes``, ``matR``), so any plotting library works. Quick example with matplotlib::

    import matplotlib.pyplot as plt

    gtm = ugtm.runGTM(data=train)
    coords = gtm.matMeans

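    #matplotlib needs numeric values for c: encode the string labels as integers
    class_names, label_codes = np.unique(labels, return_inverse=True)
    plt.scatter(coords[:, 0], coords[:, 1], c=label_codes, cmap="Spectral_r")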
    plt.colorbar()
    plt.show()

For richer interactive examples using Altair, see :doc:`visualization_examples`.


4. Incremental GTM for large datasets
--------------------------------------

For datasets too large to hold the full N×K responsibility matrix (N samples, K nodes) in RAM,
use :func:`~ugtm.ugtm_igtm.runIGTM` or :class:`~ugtm.ugtm_sklearn.eIGTM`.
The ``n_blocks`` parameter controls how many chunks the data is split into;
``0`` selects the number automatically as ``ceil(N / 5000)``::

    #run incremental GTM (same interface as runGTM)
    igtm = ugtm.runIGTM(data=train, n_blocks=5, verbose=True)

    #access coordinates (means or modes)
    igtm_coordinates = igtm.matMeans
    igtm_modes = igtm.matModes

    #sklearn-compatible transformer
    from ugtm import eIGTM
    transformed = eIGTM(n_blocks=5).fit_transform(train)

    #block-wise projection for large test sets (generator)
    model = eIGTM().fit(train)
    for block in model.transform_blocks(test, block_size=1000):
        pass  # process block here

See :doc:`eIGTM_transformer` for full details.
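
The automatic block count can be illustrated with a small NumPy sketch. The helper ``split_into_blocks`` and its ``max_block_size`` argument are hypothetical names introduced here; only the ``ceil(N / 5000)`` rule comes from the text above, and ugtm may partition rows differently internally::

    import math
    import numpy as np

    def split_into_blocks(data, n_blocks=0, max_block_size=5000):
        """Split rows into n_blocks chunks; 0 means ceil(N / max_block_size)."""
        n_samples = data.shape[0]
        if n_blocks == 0:
            n_blocks = max(1, math.ceil(n_samples / max_block_size))
        # array_split tolerates row counts that do not divide evenly
        return np.array_split(data, n_blocks, axis=0)

    data = np.zeros((12000, 3))
    blocks = split_into_blocks(data)  # ceil(12000 / 5000) = 3 blocks
    block_sizes = [block.shape[0] for block in blocks]

With blocks of this kind, only one block's rows need a responsibility matrix in memory at any time, instead of all N rows at once.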


5. Project new data onto an existing GTM map
--------------------------------------------

New data can be projected onto an existing GTM map using the :func:`~ugtm.ugtm_gtm.transform` function. The training set is passed again so that the same preprocessing (e.g. PCA) is applied to the test set::

    #run model on train
    gtm = ugtm.runGTM(train, doPCA=True)

    #project test data
    transformed = ugtm.transform(optimizedModel=gtm, train=train, test=test, doPCA=True)

    #access projected coordinates
    test_coordinates = transformed.matMeans
    test_modes = transformed.matModes
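
Why the training set is needed at projection time can be seen in a NumPy-only sketch of consistent preprocessing. It is an illustration under stated assumptions, not ugtm's own code: the PCA is done by hand with an SVD, and keeping 3 components is an arbitrary choice. The centring and the principal axes are estimated on the training set only, then reused unchanged for the test set::

    import numpy as np

    rng = np.random.default_rng(0)
    train = rng.standard_normal((20, 10))
    test = rng.standard_normal((5, 10))

    # fit: centre on the training mean and take the training principal axes
    train_mean = train.mean(axis=0)
    _, _, vt = np.linalg.svd(train - train_mean, full_matrices=False)
    components = vt[:3]  # keep 3 components (arbitrary for this sketch)

    # apply the same centring and rotation to both sets
    train_pca = (train - train_mean) @ components.T
    test_pca = (test - train_mean) @ components.T

Fitting a separate PCA on the test set would place it in a different coordinate system, so its projection onto the trained map would not be comparable.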


6. Output predictions for a test set: GTM regression (GTR) and classification (GTC)
-------------------------------------------------------------------------------------

The :func:`~ugtm.ugtm_predictions.GTR` function implements a GTM regression model, and the :func:`~ugtm.ugtm_predictions.GTC` function implements a GTM classification model::

    #continuous labels (prediction by GTM regression model)
    predicted = ugtm.GTR(train=train, test=test, labels=activity)

    #discrete labels (prediction by GTM classification model)
    predicted = ugtm.GTC(train=train, test=test, labels=labels)
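
One common way to turn a GTM map into a regression model is to give each node a responsibility-weighted average activity and then average node activities for each test point. The sketch below shows that idea with NumPy and random stand-in responsibilities; it is an assumption about the general approach, and ugtm's ``GTR`` may compute its predictions differently::

    import numpy as np

    rng = np.random.default_rng(0)
    n_train, n_test, n_nodes = 20, 5, 16

    def random_responsibilities(n_rows):
        """Stand-in responsibilities: non-negative rows that sum to 1."""
        r = rng.random((n_rows, n_nodes))
        return r / r.sum(axis=1, keepdims=True)

    resp_train = random_responsibilities(n_train)
    resp_test = random_responsibilities(n_test)
    activity = rng.standard_normal(n_train)

    # per-node activity: responsibility-weighted mean of training activities
    node_activity = (resp_train.T @ activity) / resp_train.sum(axis=0)

    # test prediction: responsibility-weighted mean of node activities
    predicted = resp_test @ node_activity

Because both steps are weighted averages, predictions from this scheme always stay within the range of the training activities.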


7. Advanced GTM predictions with per-class probabilities
---------------------------------------------------------

Per-class probabilities for a test set can be obtained with the :func:`~ugtm.ugtm_predictions.advancedGTC` function::

    #get whole output model and label predictions for test set
    predicted_model = ugtm.advancedGTC(train=train, test=test, labels=labels)

    #write whole predicted model with per-class probabilities
    ugtm.printClassPredictions(predicted_model, "testout17")
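
How per-class probabilities can arise from a map is sketched below with NumPy. It is a simplified Bayes-style scheme using random stand-in responsibilities and no class-prior correction; all variable names are hypothetical, and the exact computation in ``advancedGTC`` may differ::

    import numpy as np

    rng = np.random.default_rng(0)
    n_train, n_test, n_nodes = 20, 5, 16
    labels = np.array(["class1", "class2"] * 10)

    def random_responsibilities(n_rows):
        """Stand-in responsibilities: non-negative rows that sum to 1."""
        r = rng.random((n_rows, n_nodes))
        return r / r.sum(axis=1, keepdims=True)

    resp_train = random_responsibilities(n_train)
    resp_test = random_responsibilities(n_test)

    # one-hot encode the training labels, shape (n_train, n_classes)
    classes, codes = np.unique(labels, return_inverse=True)
    one_hot = np.eye(len(classes))[codes]

    # responsibility mass of each class at each node, normalised per node
    node_class = resp_train.T @ one_hot
    node_class /= node_class.sum(axis=1, keepdims=True)

    # P(class | test point) = sum over nodes of R_test * P(class | node)
    class_probs = resp_test @ node_class
    predicted = classes[np.argmax(class_probs, axis=1)]

Each row of ``class_probs`` sums to 1, so it can be read directly as a probability distribution over classes for one test point.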


8. Cross-validation experiments
-------------------------------

Several cross-validation experiments are implemented to compare GTC and GTR models with classical machine learning methods::

    #crossvalidation experiment: GTM classification model
    #here: set hyperparameters s=1 and regul=1 (set to -1 to optimize)
    ugtm.crossvalidateGTC(data=train, labels=labels, s=1, regul=1, n_repetitions=10, n_folds=5)

    #crossvalidation experiment: GTM regression model
    ugtm.crossvalidateGTR(data=train, labels=activity, s=1, regul=1)

    #crossvalidation experiment, k-nearest neighbours classification
    #on 2D PCA map with 7 neighbors (set to -1 to optimize)
    ugtm.crossvalidatePCAC(data=train, labels=labels, n_neighbors=7)

    #crossvalidation experiment, SVC rbf classification model (sklearn):
    ugtm.crossvalidateSVCrbf(data=train, labels=labels, C=1, gamma=1)

    #crossvalidation experiment, linear SVC classification model (sklearn):
    ugtm.crossvalidateSVC(data=train, labels=labels, C=1)

    #crossvalidation experiment, linear SVR regression model (sklearn):
    ugtm.crossvalidateSVR(data=train, labels=activity, C=1, epsilon=1)

    #crossvalidation experiment, k-nearest neighbours regression on 2D PCA map:
    ugtm.crossvalidatePCAR(data=train, labels=activity, n_neighbors=7)
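
The roles of ``n_repetitions`` and ``n_folds`` can be pictured with a plain NumPy sketch of repeated k-fold splitting. The generator ``repeated_kfold_indices`` is a hypothetical helper written for this illustration; whether ugtm stratifies its folds or how it aggregates scores is not covered here::

    import numpy as np

    def repeated_kfold_indices(n_samples, n_folds=5, n_repetitions=10, seed=0):
        """Yield (repetition, fold, train_idx, test_idx) for shuffled k-fold splits."""
        rng = np.random.default_rng(seed)
        for rep in range(n_repetitions):
            # a fresh shuffle per repetition gives different fold memberships
            order = rng.permutation(n_samples)
            folds = np.array_split(order, n_folds)
            for fold, test_idx in enumerate(folds):
                train_idx = np.concatenate(folds[:fold] + folds[fold + 1:])
                yield rep, fold, train_idx, test_idx

    splits = list(repeated_kfold_indices(n_samples=20))

With 10 repetitions of 5 folds, a model is fitted 50 times, and every sample is used as test data exactly once per repetition.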