====================
eGTC: GTM classifier
====================

Run eGTC
--------

:class:`~ugtm.ugtm_sklearn.eGTC` is a sklearn-compatible GTM classifier.
Similarly to PCA or t-SNE, GTM reduces the dimensionality from n_dimensions to 2 dimensions.
GTC uses a GTM class map to predict labels for new data
(cf. :func:`~ugtm.ugtm_landscape.classMap`).
Two algorithms are available: the Bayesian classifier GTC
(:class:`~ugtm.ugtm_sklearn.uGTC`) or the nearest node classifier
(:class:`~ugtm.ugtm_sklearn.uGTCnn`).
The following example uses the iris dataset::

    from ugtm import eGTC
    from sklearn import datasets
    from sklearn import preprocessing
    from sklearn import decomposition
    from sklearn import metrics
    from sklearn import model_selection

    iris = datasets.load_iris()
    X = iris.data
    y = iris.target

    X_train, X_test, y_train, y_test = model_selection.train_test_split(
        X, y, test_size=0.33, random_state=42)

    # optional preprocessing: fit the scaler on the training set only
    scaler = preprocessing.StandardScaler().fit(X_train)
    X_train = scaler.transform(X_train)
    X_test = scaler.transform(X_test)

    # Predict labels for X_test
    gtc = eGTC()
    gtc = gtc.fit(X_train, y_train)
    y_pred = gtc.predict(X_test)

    # Print score (Matthews correlation coefficient)
    print(metrics.matthews_corrcoef(y_test, y_pred))

Visualize class map
-------------------

The GTC algorithm is based on a classification map, discretized into a grid of nodes,
which are colored by predicted label. Each node is associated with class probabilities:

.. altair-plot::

    from ugtm import eGTM, eGTC
    import numpy as np
    import altair as alt
    import pandas as pd
    from sklearn import datasets
    from sklearn import preprocessing
    from sklearn import decomposition
    from sklearn import metrics
    from sklearn import model_selection

    iris = datasets.load_iris()
    X = iris.data
    y = iris.target

    X_train, X_test, y_train, y_test = model_selection.train_test_split(
        X, y, test_size=0.33, random_state=42)

    # optional preprocessing
    std = preprocessing.StandardScaler()
    X_train = std.fit(X_train).transform(X_train)

    # Construct class map: one row per node, with its 2D coordinates
    gtc = eGTC()
    gtc = gtc.fit(X_train, y_train)
    dfclassmap = pd.DataFrame(gtc.optimizedModel.matX, columns=["x1", "x2"])
    dfclassmap["predicted_node_label"] = iris.target_names[gtc.node_label]
    dfclassmap["probability_of_predominant_class"] = np.max(gtc.node_probabilities, axis=1)

    # Classification map
    alt.Chart(dfclassmap).mark_square().encode(
        x='x1',
        y='x2',
        color='predicted_node_label:N',
        size=alt.value(50),
        opacity='probability_of_predominant_class:Q',
        tooltip=['x1', 'x2', 'predicted_node_label:N',
                 'probability_of_predominant_class:Q']
    ).properties(title="Class map", width=200, height=200)

Visualize predicted vs real labels
----------------------------------

Visualize predicted vs real labels using the iris dataset and altair:

.. altair-plot::

    from ugtm import eGTM, eGTC
    import numpy as np
    import altair as alt
    import pandas as pd
    from sklearn import datasets
    from sklearn import preprocessing
    from sklearn import decomposition
    from sklearn import model_selection
    from sklearn.metrics import confusion_matrix

    iris = datasets.load_iris()
    X = iris.data
    y = iris.target

    X_train, X_test, y_train, y_test = model_selection.train_test_split(
        X, y, test_size=0.33, random_state=42)

    # optional preprocessing
    scaler = preprocessing.StandardScaler().fit(X_train)
    X_train = scaler.transform(X_train)
    X_test = scaler.transform(X_test)

    # Predict labels for X_test
    gtc = eGTC()
    gtc = gtc.fit(X_train, y_train)
    y_pred = gtc.predict(X_test)

    # Get GTM transform for X_test
    transformed = eGTM().fit(X_train).transform(X_test)

    df = pd.DataFrame(transformed, columns=["x1", "x2"])
    df["predicted_label"] = iris.target_names[y_pred]
    df["true_label"] = iris.target_names[y_test]
    df["probability_of_predominant_class"] = np.max(gtc.posteriors, axis=1)

    # Projection of X_test colored by predicted label
    chart1 = alt.Chart().mark_circle().encode(
        x='x1', y='x2',
        size=alt.value(100),
        color=alt.Color("predicted_label:N",
                        legend=alt.Legend(title="label")),
        opacity="probability_of_predominant_class:Q",
        tooltip=["x1", "x2", "predicted_label:N",
                 "true_label:N", "probability_of_predominant_class:Q"]
    ).properties(title="Predicted labels", width=200, height=200).interactive()

    # Projection of X_test colored by true label
    chart2 = alt.Chart().mark_circle().encode(
        x='x1', y='x2',
        color=alt.Color("true_label:N",
                        legend=alt.Legend(title="label")),
        size=alt.value(100),
        tooltip=["x1", "x2", "predicted_label:N",
                 "true_label:N", "probability_of_predominant_class:Q"]
    ).properties(title="True labels", width=200, height=200).interactive()

    alt.hconcat(chart1, chart2, data=df)

Parameter optimization
----------------------

GridSearchCV can be used with eGTC for parameter optimization::

    from ugtm import eGTC
    import numpy as np
    from sklearn.model_selection import GridSearchCV

    # Dummy train and test
    X_train = np.random.randn(100, 50)
    X_test = np.random.randn(50, 50)
    y_train = np.random.choice([1, 2, 3], size=100)

    # Parameters to tune
    tuned_params = {'regul': [0.0001, 0.001, 0.01],
                    's': [0.1, 0.2, 0.3],
                    'k': [16],
                    'm': [4]}

    # GTM classifier (GTC), bayesian
    gs = GridSearchCV(eGTC(), tuned_params, cv=3, scoring='accuracy')
    gs.fit(X_train, y_train)
    print(gs.best_params_)
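
As a complement to the class map described under "Visualize class map", the following is a minimal NumPy sketch of how a class map could turn node-level class probabilities into label predictions, in both the Bayesian and the nearest node flavors. It is an illustration under stated assumptions, not ugtm's implementation: the function names, the toy responsibility matrices, and the exact weighting formula are all hypothetical, and each row of a responsibility matrix is assumed to sum to 1.

```python
import numpy as np


def node_class_probabilities(resp_train, y_train, n_classes):
    """Estimate P(class | node) from training responsibilities (assumed formula).

    resp_train: (n_samples, n_nodes) array, each row sums to 1.
    y_train: (n_samples,) integer labels in [0, n_classes).
    """
    onehot = np.eye(n_classes)[y_train]        # (n_samples, n_classes)
    counts = resp_train.T @ onehot             # (n_nodes, n_classes)
    totals = counts.sum(axis=1, keepdims=True)
    # Nodes that receive no responsibility keep a uniform distribution.
    uniform = np.full_like(counts, 1.0 / n_classes)
    return np.divide(counts, totals, out=uniform, where=totals > 0)


def predict_bayes(resp_new, node_probs):
    """Weight node class probabilities by each sample's responsibilities."""
    posteriors = resp_new @ node_probs         # (n_new, n_classes)
    return posteriors.argmax(axis=1), posteriors


def predict_nearest_node(resp_new, node_probs):
    """Take the predominant class of the node with the highest responsibility."""
    node_labels = node_probs.argmax(axis=1)
    return node_labels[resp_new.argmax(axis=1)]


# Hypothetical toy data: 4 training samples, 3 nodes, 2 classes.
resp_train = np.array([[0.9, 0.1, 0.0],
                       [0.8, 0.2, 0.0],
                       [0.0, 0.3, 0.7],
                       [0.1, 0.1, 0.8]])
y_train = np.array([0, 0, 1, 1])
resp_new = np.array([[0.7, 0.3, 0.0],
                     [0.0, 0.2, 0.8]])

node_probs = node_class_probabilities(resp_train, y_train, n_classes=2)
labels_bayes, posteriors = predict_bayes(resp_new, node_probs)
labels_nn = predict_nearest_node(resp_new, node_probs)

print(node_probs)
print(labels_bayes, labels_nn)
```

In this sketch the Bayesian variant uses every node a sample is spread over, while the nearest node variant only looks at the single most responsible node, so the two can disagree for samples that sit between regions of the map.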