We will use PCA on the test set of the UCI Optical Recognition of Handwritten Digits dataset. There are 64 features plus a label. Each feature is a pixel intensity ranging from 0 to 16, taken from an 8x8 grid of a preprocessed image.
import numpy as np
import matplotlib.pyplot as plt

test = np.loadtxt("data/optdigits_test.txt", delimiter = ",")
X = test[:, 0:64]
y = test[:, 64]

# Plot the first 20 digits
fig = plt.figure(figsize=(8, 6))
fig.tight_layout()
for i in range(0, 20):
    ax = fig.add_subplot(5, 5, i + 1)
    ax.imshow(X[i].reshape((8, 8)), cmap = "Greys", vmin = 0, vmax = 16)
plt.show()
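Before extracting principal components, we standardize each feature so that no single pixel dominates the covariance. Writing $\mu_j$ and $\sigma_j$ for the mean and standard deviation of feature $j$ (the $10^{-10}$ in the code below only guards against division by zero for constant pixels), the standardized data and sample covariance computed in the next two code blocks are

$$\tilde{x}_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j + 10^{-10}}, \qquad E = \frac{1}{n - 1} \tilde{X}^\top \tilde{X}.$$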
# Standardize the features
mu = np.mean(X, axis = 0)
sigma = np.std(X, axis = 0)
sX = 1/(sigma + 10**(-10)) * (X - mu)

To get the eigenvalues, we need numerical linear algebra. NumPy's np.linalg.eig calls LAPACK, which uses an optimized variant of the QR algorithm; in general this costs $O(n^3)$ for dense matrices. For very large matrices, iterative methods such as power iteration or the Lanczos algorithm are typically used instead.
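To illustrate the iterative idea (this is not what np.linalg.eig does internally), here is a minimal power-iteration sketch; the helper power_iteration and the random symmetric test matrix A_test below are purely illustrative.

# Minimal power-iteration sketch (illustrative only): approximate the
# dominant eigenpair of a symmetric matrix by repeated multiplication.
def power_iteration(A, n_iter = 1000):
    v = np.random.randn(A.shape[0])
    for _ in range(n_iter):
        v = A @ v
        v = v / np.linalg.norm(v)
    # The Rayleigh quotient recovers the corresponding eigenvalue
    return v @ A @ v, v

B = np.random.randn(64, 64)
A_test = B @ B.T  # symmetric positive semi-definite test matrix
lam, v = power_iteration(A_test)
print("Power iteration:", lam, "| np.linalg.eigvalsh:", np.linalg.eigvalsh(A_test)[-1])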
# The covariance matrix of the features (not of the observations)
E = 1/(sX.shape[0] - 1) * sX.T @ sX
spectrum, U = np.linalg.eig(E)
print("Eigenvalues:", spectrum)
print("Eigenvectors:", U)
Eigenvalues: [7.34477606 5.83549054 5.15396118 3.96623597 2.96634519 2.57204442
 2.40600941 2.06867355 1.82993314 1.78951739 1.69784615 1.57287888
 1.38870781 1.35933609 1.32152536 1.16829176 1.08368677 0.99977861
 0.97438293 0.90891242 0.82271926 0.77631014 0.71155675 0.64552365
 0.59527399 0.5765018  0.52673155 0.5106363  0.48686381 0.45560107
 0.44285155 0.42230086 0.3991063  0.39110111 0.36094517 0.34860306
 0.3195963  0.29406627 0.27692285 0.05037444 0.06328961 0.258273
 0.24783029 0.2423566  0.07635394 0.08246812 0.09018543 0.09840876
 0.10250434 0.11188655 0.11932898 0.12426371 0.13321081 0.14311427
 0.217582   0.15818474 0.16875236 0.20799593 0.17612894 0.2000909
 0.18983516 0.         0.         0.        ]
Eigenvectors: [[ 0.          0.          0.         ...  1.          0.          0.        ]
 [ 0.18223392 -0.04702701  0.02358821 ...  0.          0.          0.        ]
 [ 0.285868   -0.0595648  -0.05679875 ...  0.          0.          0.        ]
 ...
 [ 0.103198    0.24261778 -0.02227952 ...  0.          0.          0.        ]
 [ 0.1198106   0.16508926  0.10036559 ...  0.          0.          0.        ]
 [ 0.07149362  0.07132924  0.09244589 ...  0.          0.          0.        ]]

We can compute the proportion of variability explained by each eigenvalue.
spectrum_corr = spectrum / np.sum(spectrum)
print("Explained variability:", spectrum_corr)
Explained variability: [0.12033916 0.09561054 0.08444415 0.06498408 0.04860155 0.0421412
 0.03942083 0.03389381 0.02998221 0.02932003 0.02781805 0.02577055
 0.02275303 0.0222718  0.02165229 0.01914167 0.01775547 0.01638069
 0.0159646  0.01489191 0.0134797  0.01271931 0.01165837 0.01057647
 0.00975316 0.00944559 0.00863014 0.00836643 0.00797693 0.00746471
 0.00725582 0.00691911 0.00653909 0.00640793 0.00591384 0.00571162
 0.00523637 0.00481808 0.00453719 0.00082535 0.00103696 0.00423163
 0.00406053 0.00397085 0.00125101 0.00135118 0.00147763 0.00161236
 0.00167946 0.00183318 0.00195512 0.00203598 0.00218257 0.00234483
 0.00356493 0.00259175 0.00276489 0.00340787 0.00288575 0.00327835
 0.00311032 0.         0.         0.        ]

We take the 2 largest eigenvalues and project the standardized $X$ onto the corresponding eigenvectors.
K = 2
I = np.argsort(-spectrum)[0:K]
Z = sX @ U[:, I]

We can look at the corresponding eigenvalues.
spectrum[I]
array([7.34477606, 5.83549054])

PCA can be used for visualization. Since we kept only 2 eigenvectors, we can plot the projected data in two dimensions.
plt.style.use("default")
fig = plt.figure(figsize=(8, 6))
plt.tight_layout()
plt.scatter(Z[:, 0], Z[:, 1], s = 5, c = y)
plt.xlabel('Z1')
plt.ylabel('Z2')
plt.colorbar()
plt.show()
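As an aside, a common way to choose the number of components is the cumulative explained variability. The sketch below reuses spectrum_corr from above; the 90% threshold is just an illustrative choice.

# Sketch: cumulative explained variability and the number of
# components needed to reach 90% of the total variance.
order = np.argsort(-spectrum_corr)
cumvar = np.cumsum(spectrum_corr[order])
K_90 = np.argmax(cumvar >= 0.90) + 1
print("Components needed for 90% of the variance:", K_90)

plt.plot(np.arange(1, len(cumvar) + 1), cumvar)
plt.xlabel("Number of components")
plt.ylabel("Cumulative explained variability")
plt.show()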
# Train/test split
n_samples = X.shape[0]
n_TRAIN = int(.75 * n_samples)
I = np.arange(0, n_samples)
TRAIN = np.random.choice(I, n_TRAIN, replace = False)
TEST = np.setdiff1d(I, TRAIN)
X_train = X[TRAIN, :]
y_train = y[TRAIN]
X_test = X[TEST, :]
y_test = y[TEST]

We will use scikit-learn's implementation of logistic regression. We fit using all of the features.
# Using all of the features
from sklearn.linear_model import LogisticRegression

lr = LogisticRegression(solver = "lbfgs", max_iter = 1000, multi_class = "multinomial")
lr.fit(X_train, y_train)
print("Train accuracy:", 1/X_train.shape[0] * np.sum(lr.predict(X_train) == y_train))
print("Test accuracy:", 1/X_test.shape[0] * np.sum(lr.predict(X_test) == y_test))
Train accuracy: 1.0
Test accuracy: 0.9622222222222222

We use PCA with $K = 10$.
# Using PCA
K = 10
sX_train = 1/(np.std(X_train, axis = 0) + 10**(-10)) * (X_train - np.mean(X_train, axis = 0))
E_train = 1/(sX_train.shape[0] - 1) * sX_train.T @ sX_train
spectrum_train, U_train = np.linalg.eig(E_train)
I = np.argsort(-spectrum_train)[0:K]
Z_train = sX_train @ U_train[:, I]

# Project the test set onto the eigenvectors learned from the training set
sX_test = 1/(np.std(X_test, axis = 0) + 10**(-10)) * (X_test - np.mean(X_test, axis = 0))
Z_test = sX_test @ U_train[:, I]

lr = LogisticRegression(solver = "lbfgs", max_iter = 1000, multi_class = "multinomial")
lr.fit(Z_train, y_train)
print("Train accuracy for K =", K, ":", 1/Z_train.shape[0] * np.sum(lr.predict(Z_train) == y_train))
print("Test accuracy for K =", K, ":", 1/Z_test.shape[0] * np.sum(lr.predict(Z_test) == y_test))
Train accuracy for K = 10 : 0.9057164068299925
Test accuracy for K = 10 : 0.8777777777777778

We reduced the dimension significantly, but at the cost of some prediction accuracy.
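To make this tradeoff explicit, one can repeat the fit for several values of $K$, reusing sX_train, sX_test, spectrum_train, and U_train from the block above. This is only a sketch; the grid of $K$ values is arbitrary.

# Sketch: test accuracy as a function of the number of components K
for K in [2, 5, 10, 20, 40, 64]:
    I_K = np.argsort(-spectrum_train)[0:K]
    Z_train_K = sX_train @ U_train[:, I_K]
    Z_test_K = sX_test @ U_train[:, I_K]
    lr_K = LogisticRegression(solver = "lbfgs", max_iter = 1000, multi_class = "multinomial")
    lr_K.fit(Z_train_K, y_train)
    print("Test accuracy for K =", K, ":", np.mean(lr_K.predict(Z_test_K) == y_test))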
References.
https://github.com/ethen8181/machine-learning
http://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. The MIT Press.
Richard O. Duda, Peter E. Hart, and David G. Stork. 2000. Pattern Classification (2nd Edition). Wiley-Interscience, New York, NY, USA.
Kevin P. Murphy. 2012. Machine Learning: A Probabilistic Perspective. The MIT Press.
Christopher M. Bishop. 2006. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag, Berlin, Heidelberg.
Simon Haykin. 2007. Neural Networks (3rd Edition). Prentice-Hall, Inc., Upper Saddle River, NJ, USA.