===================
eGTR: GTM regressor
===================

Run eGTR
--------

:class:`~ugtm.ugtm_sklearn.eGTR` is a sklearn-compatible GTM regressor.
Like PCA or t-SNE, GTM reduces the dimensionality of the data from
n_dimensions to 2 dimensions. GTR then uses a GTM activity landscape to
predict continuous values for new data (cf. :func:`~ugtm.ugtm_landscape.landscape`).
The following example uses the California housing dataset::

    from ugtm import eGTR
    from sklearn import datasets
    from sklearn import preprocessing
    from sklearn import model_selection

    housing = datasets.fetch_california_housing()
    X = housing.data
    y = housing.target

    X_train, X_test, y_train, y_test = model_selection.train_test_split(
        X, y, test_size=0.33, random_state=42)

    # optional preprocessing: fit the scaler on the training set only
    scaler = preprocessing.StandardScaler().fit(X_train)
    X_train = scaler.transform(X_train)
    X_test = scaler.transform(X_test)

    # Predict house prices for X_test
    gtr = eGTR()
    gtr = gtr.fit(X_train, y_train)
    y_pred = gtr.predict(X_test)

Visualize activity landscape
----------------------------

The GTR algorithm is based on an activity landscape. This landscape is
discretized into a grid of nodes, each of which can be colored by its
predicted value. This visualization uses the python package `altair `_:

.. altair-plot::

    from ugtm import eGTR
    import altair as alt
    import pandas as pd
    from sklearn import datasets
    from sklearn import preprocessing
    from sklearn import model_selection

    housing = datasets.fetch_california_housing()
    X = housing.data[:100]
    y = housing.target[:100]

    X_train, X_test, y_train, y_test = model_selection.train_test_split(
        X, y, test_size=0.33, random_state=42)

    # optional preprocessing
    std = preprocessing.StandardScaler()
    X_train = std.fit(X_train).transform(X_train)

    # Construct activity landscape
    gtr = eGTR()
    gtr = gtr.fit(X_train, y_train)

    # One row per GTM node: 2D coordinates and the value assigned to that node
    dfclassmap = pd.DataFrame(gtr.optimizedModel.matX, columns=["x1", "x2"])
    dfclassmap["predicted_node_label"] = gtr.node_label

    # Activity landscape
    alt.Chart(dfclassmap).mark_square().encode(
        x='x1',
        y='x2',
        color=alt.Color('predicted_node_label:Q',
                        scale=alt.Scale(scheme='greenblue'),
                        legend=alt.Legend(title="California house prices")),
        size=alt.value(50),
        tooltip=['x1', 'x2', 'predicted_node_label:Q']
    ).properties(title="Activity landscape", width=200, height=200)

Visualize predicted vs real labels
----------------------------------

The test set can also be projected onto a 2D GTM map and colored by
predicted and true house prices, side by side. This visualization uses the
python package `altair `_:

.. altair-plot::

    from ugtm import eGTM, eGTR
    import altair as alt
    import pandas as pd
    from sklearn import datasets
    from sklearn import preprocessing
    from sklearn import model_selection

    housing = datasets.fetch_california_housing()
    X = housing.data[:100]
    y = housing.target[:100]

    X_train, X_test, y_train, y_test = model_selection.train_test_split(
        X, y, test_size=0.33, random_state=42)

    # optional preprocessing
    scaler = preprocessing.StandardScaler().fit(X_train)
    X_train = scaler.transform(X_train)
    X_test = scaler.transform(X_test)

    # Predict house prices for X_test
    gtr = eGTR()
    gtr = gtr.fit(X_train, y_train)
    y_pred = gtr.predict(X_test)

    # Project X_test onto a 2D GTM map trained on X_train
    transformed = eGTM().fit(X_train).transform(X_test)

    df = pd.DataFrame(transformed, columns=["x1", "x2"])
    df["predicted_label"] = y_pred
    df["true_label"] = y_test

    chart1 = alt.Chart(df).mark_point().encode(
        x='x1',
        y='x2',
        color=alt.Color("predicted_label:Q",
                        scale=alt.Scale(scheme='greenblue'),
                        legend=alt.Legend(title="California house prices")),
        tooltip=["x1", "x2", "predicted_label:Q", "true_label:Q"]
    ).properties(title="Predicted labels", width=200, height=200).interactive()

    chart2 = alt.Chart(df).mark_point().encode(
        x='x1',
        y='x2',
        color=alt.Color("true_label:Q",
                        scale=alt.Scale(scheme='greenblue'),
                        legend=alt.Legend(title="California house prices")),
        tooltip=["x1", "x2", "predicted_label:Q", "true_label:Q"]
    ).properties(title="True labels", width=200, height=200).interactive()

    alt.hconcat(chart1, chart2)

Parameter optimization
----------------------

GridSearchCV from sklearn can be used with eGTR for parameter optimization::

    from ugtm import eGTR
    import numpy as np
    from sklearn.model_selection import GridSearchCV

    # Dummy train and test data
    X_train = np.random.randn(100, 50)
    X_test = np.random.randn(50, 50)
    y_train = np.random.randn(100)  # continuous targets for regression

    # Parameters to tune
    tuned_params = {'regul': [0.0001, 0.001, 0.01],
                    's': [0.1, 0.2, 0.3],
                    'k': [16],
                    'm': [4]}

    # GTM regressor (GTR), scored by R² on each cross-validation fold
    gs = GridSearchCV(eGTR(), tuned_params, cv=3, scoring='r2')
    gs.fit(X_train, y_train)
    print(gs.best_params_)
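The GridSearchCV grid is exhaustive: every combination of the listed values is one candidate, and each candidate is fitted once per cross-validation fold. The sketch that follows counts them with the standard library only (it does not call ugtm or sklearn): 3 values of regul times 3 values of s gives 9 candidates, so cv=3 means 27 fits.

```python
from itertools import product

# Same grid as in the GridSearchCV example
tuned_params = {'regul': [0.0001, 0.001, 0.01],
                's': [0.1, 0.2, 0.3],
                'k': [16],
                'm': [4]}

# One candidate per combination of values, as in an exhaustive grid search
candidates = [dict(zip(tuned_params, values))
              for values in product(*tuned_params.values())]

n_folds = 3                             # cv=3
n_cv_fits = len(candidates) * n_folds   # each candidate is fitted once per fold

print(len(candidates))  # 9
print(n_cv_fits)        # 27
```

With GridSearchCV's default refit=True, one further model is trained on the whole training set using best_params_, so the runtime grows linearly with both the number of candidates and the number of folds.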
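With scoring='r2', candidates are ranked by the coefficient of determination, R² = 1 - SS_res / SS_tot, computed on each held-out fold. A minimal NumPy sketch of that formula, on hypothetical values rather than results from the housing data:

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
    return 1.0 - ss_res / ss_tot

# Hypothetical targets and predictions (not eGTR output)
y_true = [3.0, 1.0, 2.0, 4.0]
y_pred = [2.5, 1.5, 2.0, 4.0]
print(r2(y_true, y_pred))  # 0.9
```

A model that always predicts the mean of y_true scores 0, and R² becomes negative when predictions are worse than that baseline. For a single continuous target this is the same quantity that sklearn.metrics.r2_score reports.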
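In the grid search, k and m set the size of the GTM rather than how strongly it is fitted. Assuming, as in the ugtm parameter documentation, that k is the square root of the number of nodes and m the square root of the number of radial basis function (RBF) centers, k=16 gives a 16 x 16 = 256-node latent grid and m=4 gives 16 RBF centers. The sketch below builds both grids on [-1, 1]² and the Gaussian basis matrix Φ used in the standard GTM formulation. The helper names and the width rule (s times the spacing between centers) are hypothetical illustrations, not ugtm's code.

```python
import numpy as np

def square_grid(n):
    """n x n regularly spaced points in [-1, 1]^2, one row per point."""
    ticks = np.linspace(-1.0, 1.0, n)
    xx, yy = np.meshgrid(ticks, ticks)
    return np.column_stack([xx.ravel(), yy.ravel()])

def rbf_basis(nodes, centers, width):
    """Gaussian RBF activations of every node, plus a constant bias column."""
    sq_dist = ((nodes[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-sq_dist / (2.0 * width ** 2))
    return np.hstack([phi, np.ones((nodes.shape[0], 1))])

k, m, s = 16, 4, 0.3               # values taken from the grid search example
latent_nodes = square_grid(k)      # 2D node coordinates, like the x1/x2 plotted above
rbf_centers = square_grid(m)
spacing = 2.0 / (m - 1)            # distance between neighboring RBF centers
# Hypothetical width rule: s times the center spacing
phi = rbf_basis(latent_nodes, rbf_centers, width=s * spacing)

print(latent_nodes.shape)  # (256, 2)
print(rbf_centers.shape)   # (16, 2)
print(phi.shape)           # (256, 17)
```

In standard GTM, node positions in data space are Y = ΦW, where W is learned by expectation-maximization and a regularization term (the role regul is assumed to play here) penalizes large weights. Larger k gives a finer map; larger m gives a more flexible manifold.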
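The activity landscape idea behind GTR can be sketched with NumPy alone: each node receives the responsibility-weighted mean of the training targets, and a new sample is predicted as the responsibility-weighted mean of those node values. This sketch assumes an isotropic Gaussian with inverse variance beta around each node and equal node priors, as in the standard GTM model; the node positions, beta and data are random stand-ins for a trained model, and ugtm's own implementation may differ in details.

```python
import numpy as np

rng = np.random.default_rng(0)

def responsibilities(X, node_means, beta):
    """Posterior probability of each node for each sample (rows sum to 1).

    Assumes an isotropic Gaussian with inverse variance `beta` around each
    node and equal prior weight for all nodes.
    """
    sq_dist = ((X[:, None, :] - node_means[None, :, :]) ** 2).sum(axis=2)
    log_r = -0.5 * beta * sq_dist
    log_r -= log_r.max(axis=1, keepdims=True)  # stabilize before exponentiating
    r = np.exp(log_r)
    return r / r.sum(axis=1, keepdims=True)

# Hypothetical stand-ins for a trained GTM: 25 node positions in a 3D data space
node_means = rng.normal(size=(25, 3))
beta = 2.0

X_train = rng.normal(size=(40, 3))
y_train = rng.uniform(0.5, 5.0, size=40)
X_test = rng.normal(size=(10, 3))

# Activity landscape: responsibility-weighted mean of the training targets per node
R_train = responsibilities(X_train, node_means, beta)
node_values = (R_train * y_train[:, None]).sum(axis=0) / R_train.sum(axis=0)

# Prediction: each test sample averages the node values with its own responsibilities
R_test = responsibilities(X_test, node_means, beta)
y_pred = R_test @ node_values

print(node_values.shape, y_pred.shape)  # (25,) (10,)
```

Here node_values plays roughly the role of gtr.node_label in the activity landscape plot. Because both steps are weighted averages, every prediction stays within the range of the training targets.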