
What is Factor Analysis in Research? A Complete Guide
Meet the Expert
Shruti Sharma
Academic Writing Coach & Research Communication Specialist
- Guided 300+ PhD scholars through EFA, CFA, SEM, and scale development
- Expertise in SPSS, AMOS, and SmartPLS for factor analysis and structural modelling
- Specialises in writing measurement and methodology chapters for management and social science theses
Factor analysis is a multivariate statistical method that identifies a smaller set of underlying latent factors from a larger set of observed variables. It is a standard tool for scale development, construct validation, and data reduction in quantitative research, and it is used extensively in management, psychology, education, and marketing studies.
If you are developing or validating a questionnaire for your thesis, or building a structural equation model, factor analysis is almost certainly part of your analytical toolkit. Understanding the difference between exploratory and confirmatory factor analysis — and when to use each — is crucial for a credible quantitative study.
What Factor Analysis Does
Imagine you have 20 questionnaire items measuring employee well-being. Factor analysis might reveal that these 20 items actually cluster into 4 underlying factors: physical health, psychological safety, work-life balance, and social support. These 4 factors (also called latent variables or constructs) explain the patterns of correlation among the 20 observed items.
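To make the clustering idea concrete, here is a minimal simulation sketch in Python with NumPy. It uses 6 hypothetical items and 2 hypothetical factors rather than the 20 items and 4 factors in the example, and the loadings, sample size, and seed are all illustrative assumptions, not values from a real study.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 1000                        # hypothetical number of respondents

# Two uncorrelated latent factors (hypothetical constructs)
factors = rng.standard_normal((n, 2))

# Hypothetical loadings: items 0-2 load on factor 0, items 3-5 on factor 1
loadings = np.array([
    [0.8, 0.0],
    [0.7, 0.0],
    [0.8, 0.0],
    [0.0, 0.7],
    [0.0, 0.8],
    [0.0, 0.7],
])

# Each observed item = loading * factor + unique noise (unit total variance)
unique_sd = np.sqrt(1 - (loadings ** 2).sum(axis=1))
items = factors @ loadings.T + rng.standard_normal((n, 6)) * unique_sd

corr = np.corrcoef(items, rowvar=False)
within = np.mean([corr[0, 1], corr[0, 2], corr[1, 2],
                  corr[3, 4], corr[3, 5], corr[4, 5]])
between = corr[:3, 3:].mean()

print(np.round(corr, 2))
print(f"mean within-cluster r = {within:.2f}")
print(f"mean between-cluster r = {between:.2f}")
```

Items that share a factor correlate substantially with each other, while items from different factors correlate near zero. That block pattern in the correlation matrix is what factor analysis looks for.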
EFA vs CFA — Key Differences
| Exploratory Factor Analysis (EFA) | Confirmatory Factor Analysis (CFA) |
|---|---|
| Used when factor structure is unknown; lets data determine groupings | Tests a pre-specified model against the data; theory-driven |
| No prior model; suitable for early-stage scale development | Requires a theoretical model specifying which items load on which factor |
| Run via Analyze → Dimension Reduction → Factor in SPSS | Requires SEM software; output includes model fit indices (CFI, RMSEA) |
Prerequisites: KMO and Bartlett's Test
Before running EFA, check whether your data is suitable for factor analysis:
| Test | What It Checks | Acceptable Result |
|---|---|---|
| KMO Measure of Sampling Adequacy | Whether inter-item correlations are suitable for factoring | KMO ≥ 0.70 (0.60 minimum) |
| Bartlett's Test of Sphericity | Whether correlation matrix is non-identity (i.e., items are correlated) | p < 0.05 (significant) |
| Communalities | Proportion of each item's variance explained by extracted factors | ≥ 0.40 for each item |
| Sample size | Adequate N for stable factor solution | Minimum 5–10 participants per item; N ≥ 200 preferred |
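For readers who want to see what the first two checks compute, the sketch below implements the overall KMO measure and Bartlett's chi-square statistic directly from a correlation matrix using NumPy. The function names `kmo` and `bartlett_sphericity` and the 3-item example matrix are illustrative assumptions; SPSS reports the same quantities in its "KMO and Bartlett's Test" table.

```python
import numpy as np

def kmo(corr):
    """Overall KMO measure of sampling adequacy from a correlation matrix."""
    corr = np.asarray(corr, dtype=float)
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(scale, scale)        # partial correlations
    off_diag = ~np.eye(corr.shape[0], dtype=bool)  # ignore the diagonal
    r2 = np.sum(corr[off_diag] ** 2)               # squared correlations
    q2 = np.sum(partial[off_diag] ** 2)            # squared partial correlations
    return r2 / (r2 + q2)

def bartlett_sphericity(corr, n):
    """Bartlett's chi-square statistic and its degrees of freedom."""
    corr = np.asarray(corr, dtype=float)
    p = corr.shape[0]
    _, logdet = np.linalg.slogdet(corr)            # log-determinant of R
    chi2 = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return chi2, df

# Hypothetical example: three items that all correlate 0.5, N = 200
R = np.array([
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
])
chi2, df = bartlett_sphericity(R, n=200)
print(f"KMO = {kmo(R):.3f}")
print(f"Bartlett chi-square = {chi2:.2f}, df = {df}")
```

The p-value for Bartlett's test comes from the upper tail of a chi-square distribution with `df` degrees of freedom; if SciPy is installed, `scipy.stats.chi2.sf(chi2, df)` returns it.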
How to Run EFA in SPSS (Step-by-Step)
- Go to Analyze → Dimension Reduction → Factor
- Move all items into the Variables box
- Click Descriptives → tick KMO and Bartlett's test of sphericity and Coefficients
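- Click Extraction → Method: Principal axis factoring (preferred when the goal is to identify latent constructs; Principal components is a data-reduction alternative); Extract: Based on Eigenvalue, eigenvalues greater than 1; tick Scree plot
- Click Rotation → Method: Direct Oblimin (if factors may correlate) or Varimax (if factors are assumed to be orthogonal)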
- Click Options → tick Suppress small coefficients (absolute value below 0.30 or 0.40)
- Click OK
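The extraction step can be sketched outside SPSS as well. The Python function below is a minimal, unrotated version of iterated principal axis factoring written with NumPy; it is an illustration of the technique, not a reproduction of SPSS's exact routine, and the function name, convergence settings, and example loadings are assumptions.

```python
import numpy as np

def principal_axis_factoring(corr, n_factors, max_iter=100, tol=1e-6):
    """Iterated principal axis factoring on a correlation matrix (unrotated)."""
    corr = np.asarray(corr, dtype=float)
    # Initial communalities: squared multiple correlations
    h2 = 1 - 1 / np.diag(np.linalg.inv(corr))
    for _ in range(max_iter):
        reduced = corr.copy()
        np.fill_diagonal(reduced, h2)          # communalities on the diagonal
        eigvals, eigvecs = np.linalg.eigh(reduced)
        top = np.argsort(eigvals)[::-1][:n_factors]
        vals = np.clip(eigvals[top], 0, None)  # guard against tiny negatives
        loadings = eigvecs[:, top] * np.sqrt(vals)
        new_h2 = (loadings ** 2).sum(axis=1)
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings, h2

# Hypothetical population correlation matrix built from known loadings
true_loadings = np.array([
    [0.8, 0.0], [0.8, 0.0], [0.8, 0.0],
    [0.0, 0.6], [0.0, 0.6], [0.0, 0.6],
])
corr = true_loadings @ true_loadings.T
np.fill_diagonal(corr, 1.0)

loadings, communalities = principal_axis_factoring(corr, n_factors=2)
print(np.round(np.abs(loadings), 2))   # signs of a factor are arbitrary
print(np.round(communalities, 2))
```

Because the example matrix is generated from a known two-factor structure, the recovered communalities can be compared with the squared true loadings. With real data, a rotation step would normally follow.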
Interpreting EFA Output
| SPSS Output Table | What to Look For |
|---|---|
| KMO and Bartlett's Test | KMO ≥ 0.70; Bartlett's p < 0.05 |
| Total Variance Explained | Cumulative % variance explained by retained factors (aim for ≥ 50–60%) |
| Scree Plot | Elbow point indicates number of factors to retain |
| Pattern Matrix (oblique rotation) or Rotated Factor Matrix (orthogonal rotation) | Factor loadings for each item on each factor (≥ 0.40 or 0.50 preferred) |
| Communalities | Extraction values ≥ 0.40 for each item |
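As a rough illustration of how the "Total Variance Explained" figures arise, the sketch below takes the eigenvalues of a correlation matrix, converts them to percentages of variance, and counts how many exceed 1. The function name and the example matrix are hypothetical; the eigenvalues are those of the unreduced correlation matrix, which corresponds to the initial eigenvalues in the SPSS table.

```python
import numpy as np

def variance_explained(corr):
    """Eigenvalues, % of variance, and cumulative % for a correlation matrix."""
    eigvals = np.sort(np.linalg.eigvalsh(np.asarray(corr, dtype=float)))[::-1]
    pct = 100 * eigvals / eigvals.sum()
    return eigvals, pct, np.cumsum(pct)

# Hypothetical correlation matrix with two clusters of three items each
loadings = np.array([
    [0.8, 0.0], [0.8, 0.0], [0.8, 0.0],
    [0.0, 0.6], [0.0, 0.6], [0.0, 0.6],
])
corr = loadings @ loadings.T
np.fill_diagonal(corr, 1.0)

eigvals, pct, cum_pct = variance_explained(corr)
n_kaiser = int(np.sum(eigvals > 1))   # eigenvalue-greater-than-1 rule

for i, (ev, p, c) in enumerate(zip(eigvals, pct, cum_pct), start=1):
    print(f"Factor {i}: eigenvalue = {ev:.2f}, % variance = {p:.1f}, cumulative = {c:.1f}")
print(f"Factors with eigenvalue > 1: {n_kaiser}")
```

The cumulative percentage at the last retained factor is the figure to compare against the 50–60% guideline in the table above.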
Factor Loading Interpretation
| Factor Loading | Interpretation |
|---|---|
| ≥ 0.70 | Strong loading — item is an excellent indicator of the factor |
| 0.50–0.69 | Moderate loading — item is a good indicator |
| 0.40–0.49 | Acceptable loading — borderline; review conceptually |
| 0.32–0.39 | Weak loading — consider dropping the item |
| < 0.32 | Drop the item from the scale |
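The table can be applied mechanically to a column of loadings. The helper below is a small sketch that encodes the cut-offs exactly as listed above; the function name and the example values are hypothetical, and the labels are a screening aid, not a substitute for reviewing item content.

```python
def interpret_loading(loading):
    """Label a factor loading using the cut-offs in the table above."""
    size = abs(loading)  # the sign shows direction, not strength
    if size >= 0.70:
        return "strong"
    if size >= 0.50:
        return "moderate"
    if size >= 0.40:
        return "acceptable"
    if size >= 0.32:
        return "weak"
    return "drop"

# Hypothetical loadings for five items on one factor
example = {"item1": 0.82, "item2": -0.55, "item3": 0.41, "item4": 0.35, "item5": 0.12}
for item, value in example.items():
    print(f"{item}: {value:+.2f} -> {interpret_loading(value)}")
```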
Confirmatory Factor Analysis (CFA) and Model Fit
CFA is run in AMOS, R (lavaan), or SmartPLS. It tests whether your pre-specified factor structure fits the observed data. Key model fit indices:
| Fit Index | Acceptable Value |
|---|---|
| CFI (Comparative Fit Index) | ≥ 0.90 (≥ 0.95 preferred) |
| TLI (Tucker-Lewis Index) | ≥ 0.90 |
| RMSEA (Root Mean Square Error of Approximation) | ≤ 0.08 (≤ 0.06 preferred) |
| SRMR (Standardised Root Mean Square Residual) | ≤ 0.08 |
| Chi-square / df ratio (CMIN/DF) | ≤ 3.0 (up to 5.0 under more lenient guidelines) |
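Several of these indices are simple functions of the model and baseline (null model) chi-square values. The sketch below computes CFI, TLI, RMSEA, and CMIN/DF from those inputs using the standard textbook formulas. All input numbers are hypothetical, the RMSEA here uses N − 1 in the denominator (some programs use N), and SRMR is omitted because it requires the residual correlation matrix.

```python
import math

def fit_indices(chi2_model, df_model, chi2_base, df_base, n):
    """CFI, TLI, RMSEA and chi-square/df from model and baseline chi-squares."""
    d_model = max(chi2_model - df_model, 0.0)   # model misfit beyond its df
    d_base = max(chi2_base - df_base, 0.0)      # baseline misfit beyond its df
    rmsea = math.sqrt(d_model / (df_model * (n - 1)))
    denom = max(d_model, d_base)
    cfi = 1.0 if denom == 0 else 1.0 - d_model / denom
    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    tli = (ratio_base - ratio_model) / (ratio_base - 1.0)
    return {"CFI": cfi, "TLI": tli, "RMSEA": rmsea, "CMIN/DF": ratio_model}

def meets_cutoffs(fit):
    """Compare each index with the acceptable values in the table above."""
    return {
        "CFI": fit["CFI"] >= 0.90,
        "TLI": fit["TLI"] >= 0.90,
        "RMSEA": fit["RMSEA"] <= 0.08,
        "CMIN/DF": fit["CMIN/DF"] <= 3.0,
    }

# Hypothetical output for an 11-item, two-factor model with N = 300
fit = fit_indices(chi2_model=85.0, df_model=43, chi2_base=1200.0, df_base=55, n=300)
for name, value in fit.items():
    print(f"{name} = {value:.3f}")
print(meets_cutoffs(fit))
```

In practice AMOS or lavaan reports these indices directly; the sketch is only meant to show where the numbers come from.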
EFA Before CFA: The Recommended Sequence
In doctoral research, the recommended sequence for scale development is: (1) Conduct EFA on a portion of your sample to explore the factor structure; (2) Conduct CFA on the remaining portion (or a new sample) to validate the structure. If your scale is well-established in the literature, you may proceed directly to CFA without EFA. Always justify your choice in your methodology chapter.
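A random split of the sample is one straightforward way to obtain the two portions. The sketch below assumes respondents are identified by row index; the function name, the 50/50 split, and the seed are illustrative choices, not a fixed rule.

```python
import numpy as np

def split_sample(n_rows, efa_fraction=0.5, seed=42):
    """Randomly assign row indices to an EFA subsample and a CFA subsample."""
    rng = np.random.default_rng(seed)   # fixed seed makes the split reproducible
    shuffled = rng.permutation(n_rows)
    cut = int(round(n_rows * efa_fraction))
    return np.sort(shuffled[:cut]), np.sort(shuffled[cut:])

# Hypothetical dataset of 400 respondents
efa_rows, cfa_rows = split_sample(400)
print(len(efa_rows), len(cfa_rows))
```

Reporting the seed and the split proportion in the methodology chapter makes the procedure replicable. Each half should still meet the sample size guidance for the analysis run on it.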
Rotation Methods in EFA: Oblimin vs Varimax
- Varimax (orthogonal rotation): Assumes factors are uncorrelated. Tends to produce a simpler factor structure that is easier to interpret. Use when factors are theoretically independent.
- Oblimin / Promax (oblique rotation): Allows factors to correlate. More realistic for social science constructs (e.g., motivation and satisfaction are likely correlated). Generally recommended when factors are expected to be related.
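To show what an orthogonal rotation does to a loading matrix, here is a compact NumPy sketch of the widely used SVD-based varimax iteration. It is an illustration under assumptions: the function name, iteration limits, and example loadings are hypothetical, and statistical packages may add steps such as Kaiser normalisation that are not included here.

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation; returns rotated loadings and rotation matrix."""
    loadings = np.asarray(loadings, dtype=float)
    p, k = loadings.shape
    rotation = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        d_old = d
        rotated = loadings @ rotation
        col_ss = np.sum(rotated ** 2, axis=0)   # sum of squares per factor
        target = rotated ** 3 - (gamma / p) * rotated @ np.diag(col_ss)
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                       # nearest orthogonal matrix
        d = s.sum()
        if d_old != 0 and d / d_old < 1 + tol:
            break
    return loadings @ rotation, rotation

# Hypothetical unrotated solution: a clean structure mixed by a 30-degree turn
simple = np.array([
    [0.8, 0.0], [0.7, 0.0], [0.8, 0.0],
    [0.0, 0.7], [0.0, 0.8], [0.0, 0.7],
])
theta = np.deg2rad(30)
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
unrotated = simple @ mix

rotated, rotation = varimax(unrotated)
print(np.round(unrotated, 2))
print(np.round(rotated, 2))
```

An orthogonal rotation changes how variance is distributed across factors but leaves each item's communality unchanged. Oblique methods such as Oblimin relax the orthogonality constraint and additionally report the correlations between factors.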
Common Mistakes in Factor Analysis
- Skipping KMO and Bartlett's test: Always confirm data suitability before running factor analysis
- Using principal components analysis (PCA) when you should use EFA: PCA is a data reduction technique, not a latent variable model — use principal axis factoring for construct validation
- Deleting items without theoretical justification: Statistical output alone is insufficient — always consider item content and theory
- Running EFA and CFA on the same sample: Testing a structure on the same data that suggested it inflates model fit; use different subsamples or separate studies
- Ignoring cross-loadings: Items that load strongly on two or more factors are ambiguous and should be revised or removed
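Cross-loadings are easy to overlook in a large pattern matrix, so a simple screening pass can help. The sketch below flags items whose largest loading is below 0.40 or whose second-largest loading is 0.30 or above. Both thresholds are adjustable assumptions taken from the guideline values in this article, and the function name and example matrix are hypothetical.

```python
def flag_problem_items(loadings, item_names, primary_min=0.40, cross_max=0.30):
    """Flag items with no salient loading or with a notable cross-loading."""
    flagged = []
    for name, row in zip(item_names, loadings):
        sizes = sorted((abs(value) for value in row), reverse=True)
        primary = sizes[0]
        secondary = sizes[1] if len(sizes) > 1 else 0.0
        if primary < primary_min:
            flagged.append((name, "no salient loading"))
        elif secondary >= cross_max:
            flagged.append((name, "cross-loading"))
    return flagged

# Hypothetical pattern matrix: 4 items, 2 factors
pattern = [
    [0.78, 0.05],
    [0.52, 0.46],   # loads on both factors
    [0.10, 0.71],
    [0.25, 0.22],   # loads on neither factor
]
names = ["item1", "item2", "item3", "item4"]
for item, reason in flag_problem_items(pattern, names):
    print(f"{item}: {reason}")
```

Flagged items are candidates for review, not automatic deletions; the decision should still rest on item content and theory.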
Frequently Asked Questions
What is factor analysis in research?
Factor analysis is a multivariate statistical technique used to identify a smaller number of underlying latent factors (unobserved variables) that explain the patterns of correlation among a larger set of observed variables (items). In research, it is primarily used for: (1) data reduction, that is, reducing many variables to a few meaningful factors; and (2) scale development and validation, that is, grouping questionnaire items that measure the same underlying construct. Factor analysis is widely used in psychology, management, education, marketing, and health research.
What is the difference between EFA and CFA?
Exploratory Factor Analysis (EFA) is used when you do not have a prior theory about which items belong to which factors; it lets the data reveal the factor structure. Confirmatory Factor Analysis (CFA) is used when you have a pre-specified model (based on theory or a prior EFA) and want to test whether the observed data fit that model. EFA is typically used in scale development; CFA is used in scale validation and in the measurement model step of SEM (Structural Equation Modelling).
What is an acceptable factor loading?
In an orthogonal solution, a factor loading is the correlation between an observed variable and an underlying factor; in an oblique solution, the pattern matrix loadings are regression-type weights that control for the other factors. General guidelines: ≥ 0.70 = strong loading; 0.50–0.69 = moderate loading (0.50 is a commonly recommended minimum for inclusion); 0.40–0.49 = acceptable but borderline; 0.32–0.39 = weak but may be reported; < 0.32 = item should generally be dropped. In practice, most researchers require factor loadings ≥ 0.40 or ≥ 0.50 and cross-loadings (item loading on more than one factor) below 0.30.
What are the KMO measure and Bartlett's test?
The KMO (Kaiser-Meyer-Olkin) measure of sampling adequacy indicates whether the correlation matrix is suitable for factor analysis. KMO values: ≥ 0.90 = Marvelous; 0.80–0.89 = Meritorious; 0.70–0.79 = Middling (acceptable); 0.60–0.69 = Mediocre; 0.50–0.59 = Miserable; < 0.50 = Unacceptable. Bartlett's Test of Sphericity tests whether the correlation matrix is an identity matrix (all variables uncorrelated). It should be statistically significant (p < 0.05) to proceed with factor analysis.
How many factors should be extracted?
Several criteria guide the number of factors to extract: (1) Kaiser's criterion: retain factors with eigenvalues greater than 1; (2) Scree plot: look for the 'elbow' where the slope flattens; (3) Parallel analysis: compare the observed eigenvalues with those obtained from random data of the same size, and retain factors whose eigenvalues exceed the random ones; (4) Theoretical interpretability: factors must make conceptual sense. Kaiser's criterion is the most commonly used but tends to over-extract; parallel analysis is considered more accurate.
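Parallel analysis is not part of the standard SPSS Factor dialog, so it is often run separately. The sketch below is a minimal NumPy version based on eigenvalues of the correlation matrix (the principal-components variant of Horn's procedure). The function name, number of simulations, 95th percentile threshold, and simulated dataset are all assumptions for illustration.

```python
import numpy as np

def parallel_analysis(data, n_sims=200, percentile=95, seed=0):
    """Count leading factors whose eigenvalues exceed those of random data."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    random_eigs = np.empty((n_sims, p))
    for s in range(n_sims):
        noise = rng.standard_normal((n, p))   # uncorrelated data, same shape
        eigs = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))
        random_eigs[s] = np.sort(eigs)[::-1]
    threshold = np.percentile(random_eigs, percentile, axis=0)
    n_retain = 0
    for obs, thr in zip(observed, threshold):
        if obs <= thr:
            break                             # stop at the first factor that fails
        n_retain += 1
    return n_retain, observed, threshold

# Hypothetical data: 500 respondents, 6 items driven by 2 latent factors
rng = np.random.default_rng(1)
factors = rng.standard_normal((500, 2))
loadings = np.array([[0.8, 0.0]] * 3 + [[0.0, 0.8]] * 3)
items = factors @ loadings.T + rng.standard_normal((500, 6)) * 0.6

n_retain, observed, threshold = parallel_analysis(items)
print("observed eigenvalues:", np.round(observed, 2))
print("random-data threshold:", np.round(threshold, 2))
print("factors to retain:", n_retain)
```

The result should be read alongside the scree plot and the theoretical interpretability of the factors rather than used as the sole criterion.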