FAiR is a package that enhances the ability to conduct Factor Analysis in R and provides some functionality that is not found in any other R package or other statistical program. FAiR implements a new way to estimate the factor analysis model called semi-exploratory factor analysis (SEFA) in addition to exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). The essence of SEFA is that the user specifies how many coefficients in the “loading” matrix are exactly zero, and the locations of these exact zeros are estimated along with the values of the nonzero parameters. FAiR uses the genetic algorithm in the rgenoud package (RGENOUD) for restricted optimization, which makes it possible to extract or transform factors subject to inequality constraints on functions of multiple parameters.
Most open source software builds on other open source software, and FAiR is no exception. The following table briefly explains the relationship between FAiR and other R packages:
| Package | Relationship | Usefulness |
|---|---|---|
| rgenoud | Dependency | Restricted optimization |
| gWidgetsRGtk2 | Dependency | Graphical User Interface |
| rrcov | Dependency | Minimum Covariance Determinant estimator |
| Matrix | Dependency | Storing big symmetric matrices with packed storage |
| methods and stats4 | Dependency | S4 classes |
| corpcor | Suggests | Shrinkage covariance estimator |
| mvnmle | Suggests | Estimate covariance among data with missingness |
| polycor | Suggests | Estimate covariance among continuous and ordinal variables |
| GPArotation | Suggests | Copied a lot of code and used to make starting values |
| nFactors | Suggests | Enhanced scree plots |
| Rgraphviz | Suggests | Make directed acyclic graphs |
| mvnormtest | Suggests | Test for multivariate normality |
| energy | Suggests | Test for multivariate normality |
| jit | Suggests | More speed |
| stats | Modified code | Reused a lot of code from factanal() |
| sem | Modified code | Reused a lot of code from several functions |
| psych | Modified code | Reused code from fa.graph() |
A big thanks to everyone involved in the aforementioned R packages. My hope is that the links between FAiR and other packages will become more numerous and stronger in future versions.
There are already three main tools for factor analysis in R, namely factanal() in the stats package to extract EFA factors assuming the data are multivariate normal, the GPArotation package to transform factors in EFA, and the sem package for CFA assuming the data are multivariate normal. The primary ways in which FAiR adds to this collection are that it has discrepancy functions that do not assume the data are multivariate normal, it permits inequality restrictions on functions of multiple parameters, and it implements SEFA.
In comparison to software outside of R, FAiR differs in three primary ways. First, FAiR is primarily licensed under the Affero General Public License. Outside of R and TETRAD, there are no open-source tools for factor analysis and many of the widely used packages for factor analysis are commercial.
Second, FAiR tries to (re)implement the philosophy of factor analysis outlined in a book by Allen Yates, which has the goal of restoring (exploratory) factor analysis’ original scientific purpose of making inferences about how outcomes relate to latent factors in the population, rather than being merely a descriptive tool for describing how outcomes relate to each other within a sample. A few software packages implement Yates’ geomin criterion for factor transformation, but the geomin criterion was only one component of Yates’ perspective on factor analysis. FAiR tries to incorporate all of Yates’ insights, but the implementations in FAiR differ from Yates’ algorithms.
Third, FAiR includes functionality that is not available anywhere else and would not be feasible without a genetic optimization algorithm. In particular, the ability to estimate SEFA models is unique to FAiR, and FAiR is also the only factor analysis software that can impose inequality restrictions on functions of multiple parameters, which is a very useful and promising feature that can be used to implement many of Yates’ ideas.
On the other hand, FAiR has only been in development for a few months whereas other programs for factor analysis often have been in development for many years. Thus, FAiR currently lacks some features that are available in other software and probably has bugs of varying magnitudes. Hopefully, both of these shortcomings will be remedied in time.
Not only can FAiR solve Thurstone’s box problem, it is the first to do so via the analytic criterion Thurstone originally proposed to numerically characterize simple structure. Although the box problem can be solved by several criteria, their success usually depends on either weighting schemes or good starting values. RGENOUD makes it possible to find the global optimum of Thurstone’s criterion (Φ), subject to the constraint that factor collapse is avoided, without weighting schemes and from any reasonable starting values. There is a bit of hackery below because we do not have Thurstone’s original covariance matrix and cannot reextract his three factors with Factanal, but here is the demonstration using Rotate:
    library(FAiR)
    ## Get initial solution to Thurstone's box problem
    # Taken from data(Thurstone, package = "GPArotation")
    box26 <- rbind(
      c(0.629, -0.494, 0.579),
      c(0.751, 0.602, 0.125),
      c(0.765, -0.230, -0.572),
      c(0.866, 0.131, 0.459),
      c(0.873, -0.473, -0.042),
      c(0.906, 0.250, -0.323),
      c(0.824, -0.149, 0.528),
      c(0.859, 0.358, 0.306),
      c(0.812, -0.518, 0.203),
      c(0.951, -0.441, -0.254),
      c(0.876, 0.406, -0.185),
      c(0.885, 0.095, -0.431),
      c(-0.102, -0.936, 0.322),
      c(0.102, 0.936, -0.322),
      c(-0.081, -0.163, 0.969),
      c(0.081, 0.163, -0.969),
      c(0.006, 0.810, 0.582),
      c(-0.006, -0.810, -0.582),
      c(0.852, 0.223, 0.420),
      c(0.861, -0.483, -0.094),
      c(0.912, 0.248, -0.304),
      c(0.847, 0.218, 0.405),
      c(0.845, -0.456, -0.106),
      c(0.902, 0.246, -0.272),
      c(0.987, -0.026, 0.043),
      c(0.965, 0.057, -0.028))
    noise <- diag(runif(nrow(box26), max = .02))
    Sigma <- tcrossprod(box26) + noise
    man <- make_manifest(covmat = Sigma)
    res <- make_restrictions(man, factors = 3, model = "EFA")
    efa <- Factanal(man, res, impatient = TRUE)
    ## Hack
    efa@loadings[,,1:4] <- box26
    efa@loadings[,,5] <- box26^2
    ## Now Rotate() using Thurstone's criterion with a restriction to prevent factor collapse
    efa_rotated <- Rotate(efa, criteria = list("phi"),
                          methodArgs = list(nfc_threshold = 0.3, c = 1.0))
    coef(efa_rotated) # close to true loadings
    efa_rotated@correlations[,,"PF"] # close to true primary factors
    ## Raise toast to Thurstone
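For intuition about what Thurstone's criterion rewards, here is a minimal base R sketch. It assumes the textbook form of the criterion (the sum over outcomes of the product over factors of the squared coefficients), which is not necessarily how FAiR computes Φ internally. The function `thurstone_phi` and the two small matrices are hypothetical illustrations, not part of FAiR.

```r
# Hypothetical helper (not a FAiR function): textbook form of Thurstone's
# criterion, i.e. the sum over rows of the product over columns of the
# squared coefficients. This form is an assumption for illustration.
thurstone_phi <- function(V) sum(apply(V^2, 1, prod))

# Made-up pattern with at least one exact zero in every row
simple <- rbind(c(0.8, 0.0, 0.0),
                c(0.0, 0.7, 0.0),
                c(0.0, 0.0, 0.9),
                c(0.6, 0.5, 0.0))

# Made-up pattern with no zeros at all
dense <- matrix(0.5, nrow = 4, ncol = 3)

phi_simple <- thurstone_phi(simple)  # every row product vanishes
phi_dense  <- thurstone_phi(dense)   # every row contributes 0.25^3

phi_simple
phi_dense
```

Under this form, any row with an exact zero contributes nothing, so the criterion is smallest when every outcome has a zero coefficient on at least one factor. The restriction passed to Rotate() above exists because the criterion should not be minimized by letting the factors collapse.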
Not quite impossible. See the example in the Factanal help file, which produces
    > show(sefa)

    Call:
    Factanal(manifest = man, restrictions = res)

    Number of observations: 112
    Discrepancy: 7.055898

    Semi-exploratory factor analysis with 2 factors
    All free factor intercorrelations are on the [-1,1] interval
    All coefficients on the [ -1.5 , 1.5 ] interval
    Zeros per factor
          A B
    zeros 2 2
    Mapping rule: default
    Discrepancy function: MLE
    6 degrees of freedom

    > summary(sefa)

    Call:
    Factanal(manifest = man, restrictions = res)

    Point estimates (blanks, if any, are exact zeros):
                F1     F2  Uniqueness
    general  0.405  0.467       0.449
    picture         0.642       0.588
    blocks  -0.009  0.889       0.218
    maze            0.478       0.772
    reading  0.938              0.120
    vocab    0.844              0.288
    F1       1.000  0.446
    F2       0.446  1.000
Note that nothing dictated that the zeros would fall at [2,1], [4,1], [5,2], and [6,2]. The only requirement was that there would be two zeros in each column.
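The printed estimates can be checked by hand in base R. The sketch below retypes the numbers from the summary output (blanks entered as 0), locates the exact zeros, and adds the common variance to the uniqueness for each outcome. It assumes the estimates are on a standardized (correlation) scale, in which case each sum should be close to 1. The object names are my own.

```r
# Estimates copied from the summary(sefa) output (blanks entered as 0)
Lambda <- rbind(general = c( 0.405, 0.467),
                picture = c( 0.000, 0.642),
                blocks  = c(-0.009, 0.889),
                maze    = c( 0.000, 0.478),
                reading = c( 0.938, 0.000),
                vocab   = c( 0.844, 0.000))
colnames(Lambda) <- c("F1", "F2")

# Factor intercorrelation matrix and uniquenesses from the same output
Phi <- matrix(c(1, 0.446, 0.446, 1), nrow = 2)
uniqueness <- c(0.449, 0.588, 0.218, 0.772, 0.120, 0.288)

# Locations of the exact zeros and their count per factor
zeros <- which(Lambda == 0, arr.ind = TRUE)
zeros_per_factor <- colSums(Lambda == 0)

# Common variance plus uniqueness for each outcome
# (assumes a standardized scale, so values near 1 are expected)
implied_var <- diag(Lambda %*% Phi %*% t(Lambda)) + uniqueness

zeros
zeros_per_factor
round(implied_var, 3)
```

The `zeros` matrix lists the same four positions as the note above, and `zeros_per_factor` matches the "Zeros per factor" line of the show() output.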
There are more examples in the paper linked below.
SEFAiR So Far, a paper explaining SEFA (needs an update soon)
Restrictions, Factor Analysis, and Genetic Algorithms, a short paper explaining how FAiR works.
FAiR Vignette, has screenshots of the GUI menus
Manual (WARNING: not very clear yet, particularly if you have not read the previous three PDFs)
If you use Bazaar, you can branch FAiR by executing bzr branch lp:fair .
To ask questions about the code, statistics, or ideas in FAiR, click here.
To file a bug against the code, click here. In particular, shortcomings in the documentation should be registered here.
To request a feature (called a “blueprint” by Launchpad), click here.
Interesting discussion between Ben Goodrich and Cosma Shalizi
Installing FAiR is slightly more complicated than a typical R package due to FAiR’s rudimentary point-and-click menu system.
First, one needs to install R version 2.6.0 or later (go here for binaries or here for detailed instructions). R can then be started by clicking the R icon in the program menu (Windows) or Finder (Mac) or, on non-Windows systems, by opening a shell (e.g. bash) and executing the command R .
Second, one needs to install the GTK libraries. This step is derived from John Verzani’s website.
On Windows, the easiest way to do so is to start R and execute
source("http://www.math.csi.cuny.edu/pmg/installpmg.R")
and follow the prompts and default options. Be sure to restart R when the installation is finished.
On Mac, the easiest way to do so is to install this universal binary.
On Linux, the GTK libraries are probably already installed, even if you use a KDE-centric distro. If not, install them using your favorite package manager. On Debian-based distros, this can be accomplished by executing apt-get install r-cran-rgtk2 in a shell.
Finally, FAiR itself needs to be installed, along with its dependencies. The easiest way to do so on all platforms is to start R and execute
    install.packages(c("gWidgetsRGtk2", "rgenoud", "rrcov", "Matrix"), dependencies = TRUE)
    install.packages(c("corpcor", "mvnmle", "polycor", "nFactors", "mvnormtest",
                       "energy", "GPArotation"), dependencies = TRUE) # optional packages
    install.packages("FAiR", dependencies = TRUE)
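To see what is still missing before or after running these commands, a quick check along the following lines may help. It is only a sketch in base R: the package names are the ones listed above, the version threshold is the one stated in the first step, and the object names are my own.

```r
# Packages named in the install.packages() calls above
required <- c("gWidgetsRGtk2", "rgenoud", "rrcov", "Matrix")
optional <- c("corpcor", "mvnmle", "polycor", "nFactors",
              "mvnormtest", "energy", "GPArotation")

# Compare against what is installed in the current library paths
installed <- rownames(installed.packages())
missing_required <- setdiff(required, installed)
missing_optional <- setdiff(optional, installed)

# FAiR asks for R 2.6.0 or later
r_ok <- getRversion() >= "2.6.0"

if (!r_ok) message("R is older than 2.6.0; please upgrade first.")
if (length(missing_required) > 0)
  message("Missing required packages: ",
          paste(missing_required, collapse = ", "))
if (length(missing_optional) > 0)
  message("Missing optional packages: ",
          paste(missing_optional, collapse = ", "))
```

Using `installed.packages()` rather than loading each package avoids triggering the GTK libraries just to find out whether a package is present.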
After following these steps, you can load FAiR, run its examples, and open its help file from within R by executing the following:
    library(FAiR)
    example(Factanal)
    help("FAiR-package")
Good covariance matrices for playing around with are ability.cov, Harman23.cor, and Harman74.cor (all of which are in library(datasets), which is typically loaded at startup).
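For a baseline to compare FAiR results against, the same data can be run through factanal() from the stats package, which needs nothing beyond base R. This is a minimal sketch; the choice of two factors and the promax transformation are my own assumptions for illustration, not a recommendation.

```r
# ability.cov ships with the datasets package: a list with cov, center, n.obs
str(ability.cov)

# Correlation version of the covariance matrix
ability_cor <- cov2cor(ability.cov$cov)

# Two-factor maximum likelihood EFA with an oblique promax transformation
# (two factors and promax are illustrative choices)
fit <- factanal(covmat = ability.cov, factors = 2, rotation = "promax")

# Show the coefficients, hiding those smaller than 0.1 in absolute value
print(loadings(fit), cutoff = 0.1)
```

The six outcomes in ability.cov are the same ones that appear in the SEFA output from the Factanal help file, so the two sets of coefficients can be compared side by side.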
On Mac, FAiR will not work properly (i.e. it will crash) unless R is run via the X server. There are two ways to be safe. One is to start the R GUI by clicking the R icon in /Applications, then click the X icon in the top center of the R GUI, then execute library(FAiR). If you do not have the R GUI, then execute open -a X11.app in a terminal, then execute R, then library(FAiR). If you do not have X11 installed, it is quite difficult to use FAiR.
On Windows, I suggest that you disable buffering of output by pressing Ctrl-W or by unchecking Misc -> Buffered output. Doing so will allow you to watch the progress of the genetic algorithm without having to adjust the mouse to flush the buffer.
On Linux, if you use FAiR via ssh, be sure to use the -X option, e.g. ssh -X myname@myserver . Otherwise, the GUI menus will cause a crash.
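One way to guard against this is to check for an X display before loading FAiR. The sketch below only inspects the DISPLAY environment variable, which X11 clients conventionally use to find the X server. It is an assumption that this is a sufficient test on your system, it says nothing about how FAiR itself detects the display, and it is not meaningful on Windows.

```r
# DISPLAY is the environment variable X11 clients use to locate the X server
display <- Sys.getenv("DISPLAY")
has_display <- nzchar(display)

if (has_display) {
  message("X display found: ", display)
} else {
  message("No X display set; over ssh, reconnect with ssh -X before loading FAiR.")
}
```

If `has_display` is FALSE in an ssh session, it is safer to reconnect with the -X option than to call library(FAiR) and risk the crash described above.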