--- title: 'R Code to Accompany "Principal Component Analysis and Optimization: A Tutorial"' author: "Robert Reris and J. Paul Brooks" date: "November 25, 2014" output: html_document --- Load required libraries. knitr is necessary for producing this html document. rgl is necessary for 3-dimensional plots. Set the random number generator seed, so we get the same results from clustering each time. ```{r setup, results='asis',warning=FALSE} library(knitr) library(rgl) set.seed(123456) knit_hooks$set(webgl = hook_webgl) cat('', sep = '\n') ``` Load the Motor Trend data and take a look at the first few rows. ```{r} data(mtcars) head(mtcars) ``` Center and scale the data by subtracting out column means and dividing by the standard deviations of the columns. ```{r} mydata <- scale(mtcars, center=T, scale=T) ``` Apply principal component analysis (PCA). The data are already centered and scaled, so there is no need for prcomp() to do it again. The option retx=T indicates that we will get the scores. ```{r} mypca <- prcomp(mydata, center=F, scale=F,retx=T) ``` View the principal component standard deviations, variance explained, and cumulative variance explained for the principal components. The information for Table 1 comes from this view. ```{r} summary(mypca) ``` View the principal component loadings. The information for Table 2 comes from this view. ```{r} mypca$rotation ``` View the principal component scores on the first principal component. The information for Table 3 comes from this view. ```{r} mypca$x[,1] ``` Cluster the points projected on the first three principal components. ```{r} myclust <- kmeans(mypca$x[,1:3], centers=4,nstart=100) ``` Plot the points projected into the best-fitting three-dimensional subspace, and color the points according to cluster membership. This (interactive!) plot is the basis for Figure 1. (Use the arrow keys to scroll up and down.) ```{r,webgl=TRUE} plot3d(mypca$x[,1:3], xlab="PC 1", ylab="PC 2", zlab="PC 3", cex=1.5, size=1, type="s", col=myclust$cluster) text3d(myclust$centers, texts=c("1","2","3","4")) ``` Apply PCA to a subset of the data with only three variables. ```{r} my3Ddata <- mydata[,c("cyl", "qsec", "carb")] my3Dpca <- prcomp(my3Ddata,center=F,scale=F,retx=T) ``` Find the projected points in terms of the original coordinates by multiplying the first two columns of the scores matrix by the transpose of the first two columns of the rotation matrix. This plot is the basis for Figure 4. ```{r} my3Dreconstructions <- my3Dpca$x[,1:2] %*% t(my3Dpca$rotation[,1:2]) ``` Plot the original points, their (reconstructed) projections in the best-fitting two-dimensional subspace, and line segments between them. ```{r, webgl=TRUE} plot3d(my3Ddata, xlab="No. Cylinders", ylab="Quart. Mile", zlab="No. Carburetors", xlim=c(-2,2), ylim = c(-3,3), zlim = c(-2,4), cex=1.5, size=1, type="s") planes3d(my3Dpca$rotation[,3],alpha=.5) plot3d(my3Dreconstructions, col="red", cex=1.5, size=1, add=TRUE,type="s") mylist <- list(my3Ddata,my3Dreconstructions) segments3d(do.call(rbind,mylist)[order(sequence(sapply(mylist,nrow))),]) ```