The Square-Root Scree Plot: A Simple Improvement to a Classic Display

Marco Del Giudice

Previous Versions

Version #6 - Published - Nov 19th
Version #5 - Nov 19th
Version #4 - Changed submission editor - Nov 17th
Version #3 - Accepted - Nov 17th
Version #2 - Nov 17th
Version #1 - Oct 22nd

Submission status
Published

Submission Editor
Emil O. W. Kirkegaard

Author
Marco Del Giudice

Title
The Square-Root Scree Plot: A Simple Improvement to a Classic Display

Abstract

Scree plots are ubiquitous in applications of exploratory factor analysis (EFA) and principal component analysis (PCA); they are used to visualize the relative importance of different factors/components and display the results of selection procedures (e.g., parallel analysis). Because the eigenvalues shown in the scree plot indicate the amounts of variance accounted for by the corresponding factors/components, they tend to give a distorted picture of the relative importance of the factors/components with respect to the original units of the variables. Specifically, variances inflate the apparent importance of large effects and deflate that of small effects; as a result, traditional scree plots exaggerate the differences between larger and smaller factors/components, and flatten the visual representation of the smaller ones. In this brief note, I propose a simple solution in the form of square-root scree plots, i.e., scree plots based on the square root of the eigenvalues. Square-root scree plots provide a balanced display of the relative importance of the factors/components, and a more legible representation of the smaller ones. They are a useful addition to the toolkit of EFA and PCA, and may be preferable as a default option in most common applications.
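To make the abstract's argument concrete, the short Python sketch below takes a set of hypothetical eigenvalues (invented for illustration, not taken from the paper), converts them to square roots, and draws both versions as crude text bar charts. A ratio of 9 between two eigenvalues becomes a ratio of 3 on the square-root scale, and the smallest factors, which nearly vanish on the variance scale, remain visible after the transformation. The helper names `sqrt_scree` and `text_scree` are hypothetical; a real analysis would draw the plot with a graphics library.

```python
import math

# Hypothetical eigenvalues for illustration only (not from the paper)
eigenvalues = [9.0, 4.0, 1.0, 0.25, 0.16]

def sqrt_scree(values):
    """Return the square roots of the eigenvalues (the square-root scree)."""
    return [math.sqrt(v) for v in values]

def text_scree(values, width=40):
    """Draw a crude horizontal-bar scree plot, scaled to the largest value."""
    top = max(values)
    lines = []
    for i, v in enumerate(values, start=1):
        bar = "#" * round(width * v / top)
        lines.append(f"{i:>2} {v:6.2f} {bar}")
    return "\n".join(lines)

roots = sqrt_scree(eigenvalues)
print("Eigenvalues:")
print(text_scree(eigenvalues))
print("Square-root eigenvalues:")
print(text_scree(roots))

# Ratio of the first to the third factor on each scale
print(eigenvalues[0] / eigenvalues[2], roots[0] / roots[2])  # 9.0 3.0
```

On the variance scale the fourth and fifth factors get a single mark each, while on the square-root scale they still occupy several, which is the "flattening" the abstract describes.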

Keywords
principal component analysis, scree plot, factor analysis, eigenvalues

Preprint link
https://psyarxiv.com/axubd

Reviewers ( 0 / 0 / 2 )
Emil O. W. Kirkegaard: Accept
John Herrnstein: Accept

Sat 22 Oct 2022 23:26

Emil O. W. Kirkegaard

Wed 02 Nov 2022 18:28

Reviewer | Admin | Editor

On the scree plot, the eigenvalues are usually shown in raw units. However, considering that they represent percent-of-variance metrics, wouldn't it make more sense to present them as fractions, and thus also the sqrt version as fractions? This would put the values on the usual correlation/standardized-beta metric. E.g., in your plot, factor/component 1 accounts for about 45% of the variance, or .45, so sqrt(.45) = .67 instead of 6.7 on your plot. I think this would improve intuitive understanding. Regarding your footnote, this would mean that the traditional Kaiser-Guttman criterion would have to be converted to 0.1 on the sqrt/correlation scale.

How do scree plots work for correlated factors/components? Their variances do not add up to 1 because of the covariance, and their correlation-metric values will not be directly comparable either. I have never seen this considered; it is just a question.

Otherwise, I don't see anything to correct in this brief article.

Marco Del Giudice

Sat 05 Nov 2022 05:15

Author

Replying to Reviewer 1

Thanks a lot for the suggestions! See below:

On the scree plot, the eigenvalues are usually shown in raw units. However, considering that they represent percent-of-variance metrics, wouldn't it make more sense to present them as fractions, and thus also the sqrt version as fractions? This would put the values on the usual correlation/standardized-beta metric. E.g., in your plot, factor/component 1 accounts for about 45% of the variance, or .45, so sqrt(.45) = .67 instead of 6.7 on your plot.

I've been thinking about this but I worry that the resulting coefficients would become misleading. Eigenvalues divided by their sum do represent proportions of the total variance, but as a rule, that variance is not distributed homogeneously across the original variables. So turning them into "correlations" would raise the question, correlations with what? Not with any particular variable or set of variables. The square-root eigenvalues are more abstract, do not invite over-interpretations, and are meaningful in relation to one another. Please let me know if this makes sense!

I think this would improve intuitive understanding. Regarding your footnote, this would mean that the traditional Kaiser-Guttman criterion would have to be converted to 0.1 on the sqrt/correlation scale.

That depends on the number of variables and their variances. E.g., with 20 standardized variables, an eigenvalue of 1 accounts for 5% of the variance, and its square root would be .22. With 50 variables, the same eigenvalue accounts for 2% of the variance (square root .14). This is possibly another reason to prefer the square-root eigenvalues (1 stays 1 no matter how many variables are in the set).
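The arithmetic in this reply can be checked with a few lines of Python. This is a sketch assuming a PCA of standardized variables, so that the eigenvalues sum to the number of variables p; the function `normalized_root` is a hypothetical helper. It also shows that normalizing only rescales the square-root scree by a constant, since sqrt(lambda / p) = sqrt(lambda) / sqrt(p), so the relative heights of the points do not change.

```python
import math

def normalized_root(eigenvalue, n_vars):
    """Square root of an eigenvalue expressed as a share of the total variance.

    Assumes a PCA of n_vars standardized variables, so that the eigenvalues
    sum to n_vars (the total variance).
    """
    return math.sqrt(eigenvalue / n_vars)

# Kaiser-Guttman threshold (eigenvalue = 1) on the normalized square-root scale
for p in (20, 50):
    print(f"p={p}: share={1 / p:.2f}, root={normalized_root(1.0, p):.2f}")
# p=20: share=0.05, root=0.22
# p=50: share=0.02, root=0.14

# On the plain square-root scale the threshold stays at 1 for any p
print(math.sqrt(1.0))  # 1.0

# Normalizing only rescales the square-root scree by 1 / sqrt(p),
# so the heights relative to the first point are the same on both scales
lams = [6.0, 2.5, 1.0]  # hypothetical eigenvalues
p = 20
normalized = [normalized_root(lam, p) for lam in lams]
plain = [math.sqrt(lam) for lam in lams]
relative_normalized = [v / normalized[0] for v in normalized]
relative_plain = [v / plain[0] for v in plain]
```

The trade-off discussed in the thread is therefore about labeling rather than shape: the normalized version reads like a correlation, while the plain version keeps the Kaiser-Guttman line fixed at 1.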

How do scree plots work for correlated factors/components? Their variances do not add up to 1 because of the covariance, and their correlation-metric values will not be directly comparable either. I have never seen this considered; it is just a question.

The scree plot corresponds to unrotated factors/components that are (still) orthogonal; not sure I understand this question.
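A minimal illustration of this point, assuming the PCA case: for two standardized variables with correlation r, the correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + r and 1 - r. However strongly the variables correlate, the two unrotated, orthogonal components partition the total variance of 2 exactly; rotation (orthogonal or oblique) is applied afterwards and does not change these eigenvalues. The helper `eig_sym_2x2` below is hypothetical and uses the closed-form solution for a symmetric 2 x 2 matrix rather than a general eigen-solver.

```python
import math

def eig_sym_2x2(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], largest first."""
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return mean + radius, mean - radius

# Correlation matrix [[1, r], [r, 1]] for increasingly correlated variables
for r in (0.0, 0.3, 0.8):
    l1, l2 = eig_sym_2x2(1.0, r, 1.0)
    print(f"r={r}: eigenvalues {l1:.2f} and {l2:.2f}, sum {l1 + l2:.2f}")
# r=0.0: eigenvalues 1.00 and 1.00, sum 2.00
# r=0.3: eigenvalues 1.30 and 0.70, sum 2.00
# r=0.8: eigenvalues 1.80 and 0.20, sum 2.00
```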

Otherwise, I don't see anything to correct in this brief article.

Emil O. W. Kirkegaard

Sat 05 Nov 2022 06:02

Reviewer | Admin | Editor

Replying to Marco Del Giudice

I've been thinking about this but I worry that the resulting coefficients would become misleading. Eigenvalues divided by their sum do represent proportions of the total variance, but as a rule, that variance is not distributed homogeneously across the original variables. So turning them into "correlations" would raise the question, correlations with what? Not with any particular variable or set of variables. The square-root eigenvalues are more abstract, do not invite over-interpretations, and are meaningful in relation to one another. Please let me know if this makes sense!

I suggest you add some note about this idea of thinking about the values as being on the correlation/standardized beta scale when divided by 100. Otherwise, I'm happy with the article as is.

Re. oblique. I meant, what if one wants to use the scree plot to determine the number of factors to retain in an oblique solution. Seems to me like this would require using the size of the factors (eigenvalue or R2 scale), and these would not be entirely comparable due to their intercorrelations.

John Herrnstein

Thu 17 Nov 2022 18:21

Reviewer

This paper offers a simple improvement on the scree plot, and I could see myself using it. There is not much to talk about with respect to the paper, so I move to accept.

Marco Del Giudice

Thu 17 Nov 2022 21:31

Author

Replying to Reviewer 1

In the revision, I expanded footnote 1 to discuss the possibility of using normalized eigenvalues, as suggested. I noted both the pros and cons of this option.

Forum Bot

Thu 17 Nov 2022 21:32

Bot

Author has updated the submission to version #2

Forum Bot

Thu 17 Nov 2022 21:56

Bot

The submission was accepted for publication.

Forum Bot

Sat 19 Nov 2022 20:59

Bot

Author has updated the submission to version #5