This notebook shows a possible implementation of CCA (Canonical Correlation Analysis) on the mfeat-fac and mfeat-fou data sets. The files are available here.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams["figure.dpi"] = 150
plt.style.use("ggplot")
# read the two views: each line of the files holds one pattern as whitespace-separated values
with open("mfeat-fac", "r") as file:
    view1 = file.read().split("\n")
view1 = [el.strip().split() for el in view1[:-1]]  # drop the trailing empty line
with open("mfeat-fou", "r") as file:
    view2 = file.read().split("\n")
view2 = [el.strip().split() for el in view2[:-1]]
# cast the data to numpy arrays
view1 = np.array(view1, dtype=float)
view2 = np.array(view2, dtype=float)
# standardize function
scale = lambda M : (M - M.mean(axis=0).reshape(1,-1))/M.std(ddof=1,axis=0).reshape(1,-1)
# apply standardization to the two data sets
view1_scaled = scale(view1)
view2_scaled = scale(view2)
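Just to be safe, we can verify the standardization (a quick optional check; it assumes no constant columns, otherwise the division by the standard deviation above would already have produced NaNs):
# each standardized column should have mean 0 and unit (ddof=1) standard deviation
print(np.allclose(view1_scaled.mean(axis=0), 0.0))
print(np.allclose(view1_scaled.std(ddof=1, axis=0), 1.0))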
# number of examples for each matrix
n = view1.shape[0]
# calculation of the covariance matrices
sigma_xx = 1/n*(view1_scaled.T @ view1_scaled)
sigma_zz = 1/n*(view2_scaled.T @ view2_scaled)
sigma_xz = 1/n*(view1_scaled.T @ view2_scaled)
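Since the views are standardized column-wise, these matrices coincide (up to the 1/n versus 1/(n-1) factor) with the empirical correlation matrices. A small optional cross-check, sketched here under that assumption:
# sigma_xx should equal the correlation matrix of view1 scaled by (n-1)/n,
# because the columns of view1_scaled have unit variance with ddof=1
print(np.allclose(sigma_xx, (n - 1) / n * np.corrcoef(view1_scaled, rowvar=False)))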
# Now we show the covariance matrices Sigma_XX and Sigma_ZZ
fig, ax = plt.subplots(nrows=1, ncols=2)
ax[0].imshow(sigma_xx, cmap='magma', interpolation=None)
ax[0].set_title("$\\Sigma_{XX}$")
ax[0].axis("off");
ax[1].imshow(sigma_zz, cmap='magma', interpolation=None)
ax[1].set_title("$\\Sigma_{ZZ}$")
ax[1].axis("off");
# Now we plot the inter-set covariance matrix
plt.imshow(sigma_xz.T, cmap="magma", interpolation=None)
plt.axis("off");
plt.title("$\\Sigma_{XZ}$")
plt.colorbar(fraction=0.02, pad=0.02)
Recall that if \(\Sigma_{XX} = P D P^{-1}\) (diagonalization), then \(\Sigma_{XX}^{-1/2} = P D^{-1/2} P^{-1}\). The exponentiation of the eigenvalue matrix \(D\) is applied elementwise to its diagonal entries only; all the other elements remain 0.
Note also that \(\Sigma_{XX}\) is symmetric, so its eigenvectors are mutually orthogonal, and np.linalg.eig returns them already normalized. The eigenvector matrix \(P\) is therefore orthogonal, its transpose equals its inverse, and we can write \(\Sigma_{XX}^{-1/2} = P D^{-1/2} P^{T}\).
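As a quick illustration of this identity, here is a minimal sketch on a small symmetric positive definite matrix (unrelated to the mfeat data):
# the inverse square root built from the eigendecomposition satisfies
# A^{-1/2} A A^{-1/2} = I, up to numerical error
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
A_eigval, A_eigvect = np.linalg.eig(A)
A_inv_sqrt = A_eigvect @ np.diag(A_eigval**(-0.5)) @ A_eigvect.T
print(np.allclose(A_inv_sqrt @ A @ A_inv_sqrt, np.eye(2)))  # True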
# np.linalg.eig returns, as first element, the vector of eigenvalues. As second element, the matrix of eigenvectors
sigma_xx_eigval , sigma_xx_eigvect = np.linalg.eig(sigma_xx)
# clip negative eigenvalues arising due to numerical imprecisions
sigma_xx_eigval[sigma_xx_eigval < 0] = 1e-18
# reorder eigenvalues. May not be sorted, in principle
order = np.argsort(sigma_xx_eigval)
sigma_xx_eigval , sigma_xx_eigvect = sigma_xx_eigval[order], sigma_xx_eigvect[:,order]
sigma_xx_12 = sigma_xx_eigvect @ np.diag(sigma_xx_eigval**(-0.5)) @ sigma_xx_eigvect.T
# np.linalg.eig returns, as first element, the vector of eigenvalues. As second element, the matrix of eigenvectors
sigma_zz_eigval, sigma_zz_eigvect = np.linalg.eig(sigma_zz)
# clip negative eigenvalues arising due to numerical imprecisions
sigma_zz_eigval[sigma_zz_eigval < 0] = 1e-18
# reorder eigenvalues. May not be sorted, in principle
order = np.argsort(sigma_zz_eigval)
sigma_zz_eigval , sigma_zz_eigvect = sigma_zz_eigval[order], sigma_zz_eigvect[:,order]
sigma_zz_12 = sigma_zz_eigvect @ np.diag(sigma_zz_eigval**(-0.5)) @ sigma_zz_eigvect.T
W = sigma_xx_12 @ sigma_xz @ sigma_zz_12
# mind that the right singular vector matrix V is returned as transpose by the SVD
U, S, Vh = np.linalg.svd(W)
V = Vh.T
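Two optional sanity checks on the SVD output (just to confirm the properties relied upon below):
# U has orthonormal columns, and the singular values in S come sorted in
# non-increasing order, so the first column pair corresponds to the strongest correlation
print(np.allclose(U.T @ U, np.eye(U.shape[1])))  # True
print(np.all(np.diff(S) <= 0))                   # True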
Now the columns of \(U\) and \(V\) are the \(u^*_m\) and \(v^*_m\) we have in the notes. The matrices of the canonical loadings \(u_m\) and \(v_m\) can be obtained as follows:
u = sigma_xx_12 @ U
v = sigma_zz_12 @ V
m = np.min(list(u.shape)+list(v.shape))
# remove trailing columns for u or v
u = u[:,:m]
v = v[:,:m]
# calculate the correlation coefficients. np.corrcoef works too, but I like this way better
# the canonical variates X u_m and Z v_m are zero-mean and unit-variance by construction (up to numerical error), so no further normalization is needed
corrGraph = np.mean((view1_scaled @ u) * (view2_scaled @ v), axis=0)
# plot the correlation line
plt.plot(np.arange(len(corrGraph)), corrGraph, c="k", marker="o", markersize=3, lw=0.5)
plt.xlabel("index")
plt.ylabel("correlation")
plt.title("Correlation graph between $Xu_m$ and $Zv_m$")
corrGraph[0]
## 0.9713478237777331
Note how this value matches R's correlation coefficient value.
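As an extra cross-check (a sketch, not part of the original pipeline), the same quantity can be computed directly with np.corrcoef; the result should agree closely with corrGraph[0]:
# correlation between the first pair of canonical variates
print(np.corrcoef((view1_scaled @ u)[:, 0], (view2_scaled @ v)[:, 0])[0, 1])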
plt.scatter(-(view1_scaled @ u)[:, 1],
            -(view2_scaled @ v)[:, 0],
            c="k",
            alpha=0.2)
plt.xlabel("First View, second component")
plt.ylabel("Second View, first component")
plt.title("Canonical Variates")
Note that we could also have used np.linalg.svd instead of np.linalg.eig to compute the inverse square roots of the covariance matrices. These two functions give the same results when the underlying matrix is symmetric positive semi-definite (as all covariance matrices are by definition).
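A minimal illustration of that equivalence on a small symmetric positive semi-definite matrix (illustrative only, separate from the pipeline above):
# for a symmetric PSD matrix the singular values coincide with the eigenvalues
B = np.array([[2.0, 1.0],
              [1.0, 2.0]])
B_eigvals = np.linalg.eig(B)[0]
B_singvals = np.linalg.svd(B, compute_uv=False)
print(np.allclose(np.sort(B_eigvals), np.sort(B_singvals)))  # True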