Evidence for a Paternal Age Effect on Leftism

Joseph Bronski

Previous Versions

Version #10 - Published - Nov 25th
Version #9 - Nov 25th
Version #8 - Nov 25th
Version #7 - Nov 25th
Version #6 - Accepted - Nov 11th
Version #5 - Aug 20th
Version #4 - Aug 20th
Version #3 - Aug 18th
Version #2 - Aug 11th
Version #1 - Aug 1st

Submission status
Published

Submission Editor
Emil O. W. Kirkegaard

Author
Joseph Bronski

Title
Evidence for a Paternal Age Effect on Leftism

Abstract

The US has seen a linear decrease in the proportion of conservatives in each generation for at least 90 years. Sarraf et al. [5] have suggested that this is related to increases in mutational load due to relaxed selection pressures on humans in industrialized environments. We provide additional evidence for this hypothesis: leftists have older fathers than non-leftists, and those with older fathers are more likely to be leftist. Since male gametes acquire about 2 mutations per year, while female gametes mutate much more slowly, traits that are changing due to mutational pressure are expected to be more common in offspring from older fathers. Additionally, we show that older fathers themselves are not more leftist than younger fathers, suggesting that the paternal age effect is not due to differences in breeding patterns between leftists and non-leftists.

Keywords
paternal age, leftism, woke, mutational load, sociobiology

Supplemental materials link
https://github.com/josephbronski/PaternalAgeStudy

Reviewers ( 0 / 0 / 2 )
Sebastian Jensen: Accept
Meng Hu: Accept

Tue 01 Aug 2023 02:43

Sebastian Jensen

Mon 07 Aug 2023 21:29

Reviewer

I approve of most of the article; however, I have a few comments:

1. Forcing leftism into a binary distribution is unnecessary and removes some dimensionality from the data. This is particularly relevant to the regression table in Figure 6.

2. Cross-sectional differences in personality do not imply cohort differences in personality. I currently have several studies on hand which assess cohort differences in personality independent of age:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9096450/ -> increased openness, extraversion, and emotional stability between birth years of 1880 and 1945-1976

https://sci-hub.ru/10.1016/S0191-8869(00)00066-0 -> increased extraversion between 60s and 90s in American college students

https://www.researchgate.net/publication/315854266_Cohort_differences_in_personality -> systematic literature review; the most replicated finding is increased extraversion

https://psycnet.apa.org/record/2011-08639-001 -> small increases in extraversion, conscientiousness, emotional stability, and agreeableness within genders in Dutch psych students between 1982 and 2007

https://pubmed.ncbi.nlm.nih.gov/22588670/ -> increase in extraversion, decreased social-desirability responding and lying

I doubt the increase in extraversion is caused by dysgenics (or eugenics); rather, I think it is more probable that environmental changes (increased moving, more unstable social circles, public schooling, service economy) have forced people to become more extraverted than they would be in a "natural" setting.

3. Could you report a p-value for the differences tested in Figure 5?

4. Could you report effect sizes, p-values, sample sizes, and sample sources in the abstract?

Meng Hu

Tue 08 Aug 2023 09:34

Reviewer | Admin

The hypothesis test here is interesting, but there are questions left unanswered.

There are 2380 participants recruited from the Prolific platform. The sample size, along with Prolific fees, makes this study very costly. Yet according to the paper, it seems only a few variables were collected. This is surprising considering the cost. Are there really no more variables than the ones mentioned in the paper?

I ask because there are obvious confounding factors not accounted for. Woodley of Menie et al. (2020) showed that paternal age could be negatively associated with church attendance. Perhaps more important is the fact that religiosity and political attitudes are correlated. If you do have religiosity, use it as a control; otherwise mention this issue in the discussion. Also, birth order controls for within-family variance. Adding this variable would ensure that the father's age truly reflects between-family variance. Another variable likely correlated with father's age is father's education. Again, if you do not have these variables, make sure you mention them in the discussion.

Woodley of Menie, M. A., Kanazawa, S., Pallesen, J., & Sarraf, M. A. (2020). Paternal age is negatively associated with religious behavior in a post-60s but not a pre-60s US birth cohort: testing a prediction from the social epistasis amplification model. Journal of religion and health, 59, 2733-2752.

The method section must be rewritten. We need to know how each variable is measured (the number and labels of each variable's categories) and how the questions are asked (e.g., LGBT, BLM).

Another problem: you said there are two rounds of data collection. So it looks like it was intended to be 2 separate analyses but your result section doesn't make it clear which data you are using. You should separate the Results section into subcategories: Sample 1 and Sample 2.

Yet another issue. What do you mean by "attempting to balance the number of liberals and conservatives"? Did you employ a particular procedure for your data collection or participant screening? If so, make it clear in the text.

Furthermore, I have doubts about the methods. What is Leftist? Typically the variable for political views is a 7-point Likert scale. If you dichotomized this variable into 0-1 to get the Leftist variable, as it seems from reading your text, this practice is known to cause biased estimates (MacCallum et al., 2002). But you also seem to have dichotomized the father's age in Figures 2-5. I suggest you work with the original variables, unless you have a good theoretical reason for splitting the father's age at exactly 35. This is also why we need to know exactly how the variables are measured. In your method section, you wrote: "We asked the same political questions, as well as a) how many children their wife gave birth to before they were 35 years old and b) how many children their wife gave birth to after they were 35 years old." I am wondering if this was exactly how the question was phrased. If this is the case, this is a potential issue.

MacCallum, R. C., Zhang, S., Preacher, K. J., & Rucker, D. D. (2002). On the practice of dichotomization of quantitative variables. Psychological methods, 7(1), 19.
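
The dichotomization concern can be illustrated with a small simulation. The sketch below is not taken from the paper or from MacCallum et al.; it assumes a normally distributed predictor, a made-up true correlation of 0.3, and a split at the median. Under those assumptions it shows one well-documented consequence of dichotomizing: the observed correlation shrinks by a factor of roughly sqrt(2/pi), about 0.80.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_attenuation(rho=0.3, n=20000, seed=1):
    """Return (r_continuous, r_dichotomized) for simulated data.

    rho, n and seed are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # y correlates with x at approximately rho by construction
    y = [rho * xi + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0) for xi in x]
    # split the predictor at its population median (0 for a standard normal)
    x_bin = [1.0 if xi > 0.0 else 0.0 for xi in x]
    return pearson_r(x, y), pearson_r(x_bin, y)


r_cont, r_bin = simulate_attenuation()
print(f"continuous predictor:   r = {r_cont:.3f}")
print(f"dichotomized predictor: r = {r_bin:.3f}")
print(f"ratio = {r_bin / r_cont:.2f}, theory for a median split = {math.sqrt(2 / math.pi):.2f}")
```

A split away from the median (such as a fixed age cutoff in a skewed distribution) generally loses more information than a median split, so the 0.80 factor should be read as a best case under these assumptions.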

Overall I suggest you remove Figures 2-5 and stick with a regression analysis (one for each sample), adding the squared term of age to check for nonlinearity, because a regression analysis would make Figures 2-5 redundant. Before this step though, make sure you report the kurtosis/skewness of your variables. The right tail of paternal age doesn't look right. If these look like outliers, try to run the regression with and without them (typically outliers are values with z-scores beyond 3, or sometimes 2.5) and see if they affect the regression results. Do not forget to report both the standardized and unstandardized regression coefficients. Especially if some variables are dichotomous or categorical, you would rather discuss the unstandardized coefficient.
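
The outlier check suggested here could look something like the sketch below. It only covers the flagging step (the regression itself would then be run once on all values and once on the kept values). The ages are invented for illustration and are not the study's data; the function name `split_by_zscore` is hypothetical.

```python
import statistics


def split_by_zscore(values, threshold=3.0):
    """Split values into (kept, flagged) by absolute z-score.

    threshold=3.0 follows the rule of thumb quoted above; 2.5 is the stricter variant.
    """
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    kept, flagged = [], []
    for v in values:
        z = (v - mean) / sd
        (flagged if abs(z) > threshold else kept).append(v)
    return kept, flagged


# Hypothetical paternal ages in years, invented for illustration only.
paternal_ages = [24, 26, 27, 28, 29, 29, 30, 30, 31, 31,
                 32, 32, 33, 34, 35, 36, 38, 40, 41, 72]
kept, flagged = split_by_zscore(paternal_ages, threshold=3.0)
print("flagged as outliers:", flagged)
print("kept:", len(kept), "of", len(paternal_ages))
```

Note that extreme values inflate the standard deviation they are judged against, so in small samples a z-score rule can miss outliers; that is one reason the stricter 2.5 cutoff is sometimes used.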

When applying multiple regression though, you wouldn't need to report Figure 6 anymore (probably a typo, it should have been Table 1).

For figure 1, we obtained a general factor from the questions about feminism, Black Live Matter, and LGBT. The loadings were .74, .78, and .79 respectively. Roughly the top half most leftist scorers were categorized as “left.”

This is very confusing. Could you explain in more detail the procedure and the goal of this analysis? Furthermore, I am concerned about how these variables are measured. If the response is dichotomous (yes/no), a factor analysis is inappropriate because it assumes continuous variables. A factor is by definition continuous.

Typo: "and 1 in 300 are born retarded due de novo mutation" -> should be "due to de novo"

As a final remark, I would appreciate it if the references in the text were not numbered. Numbering forces the reader to go back and forth to check the authors' papers. For anyone willing to check and read the references, this is extremely tedious. Furthermore, it's easier to get the wrong reference by using numbers rather than the authors' names.

Joseph Bronski

Fri 11 Aug 2023 03:13

Author | Admin

Replying to Reviewer 1

I approve of most of the article; however, I have a few comments:

1. Forcing leftism into a binary distribution is unnecessary and removes some dimensionality from the data. This is particularly relevant to the regression table in Figure 6.

2. Cross-sectional differences in personality do not imply cohort differences in personality. I currently have several studies on hand which assess cohort differences in personality independent of age:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9096450/ -> increased openness, extraversion, and emotional stability between birth years of 1880 and 1945-1976

https://sci-hub.ru/10.1016/S0191-8869(00)00066-0 -> increased extraversion between 60s and 90s in American college students

https://www.researchgate.net/publication/315854266_Cohort_differences_in_personality -> systematic literature review; the most replicated finding is increased extraversion

https://psycnet.apa.org/record/2011-08639-001 -> small increases in extraversion, conscientiousness, emotional stability, and agreeableness within genders in Dutch psych students between 1982 and 2007

https://pubmed.ncbi.nlm.nih.gov/22588670/ -> increase in extraversion, decreased social-desirability responding and lying

I doubt the increase in extraversion is caused by dysgenics (or eugenics); rather, I think it is more probable that environmental changes (increased moving, more unstable social circles, public schooling, service economy) have forced people to become more extraverted than they would be in a "natural" setting.

3. Could you report a p-value for the differences tested in Figure 5?

4. Could you report effect sizes, p-values, sample sizes, and sample sources in the abstract?

1. Sadly by design leftism is binary in this study. I would like to replicate it in the future with a continuous metric.

2. Thank you, I changed the reference to the openness study. On extraversion, while it's not directly relevant, I actually recall from memory seeing some study claiming that extraverts are reproducing more than non-extraverts, i.e. it was under selection. So I would disagree that the change is environmental. But I haven't been able to find this study. Maybe someone knows what I'm talking about?

3. The 95% CIs show p >> 0.05 so I didn't actually compute a specific p value. However p for the larger gap should be .94 and for the smaller should be about .98 considering Zs of .066 and .022 respectively.

4. Done
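
For reference, the conversion from a Z statistic to a two-tailed p-value used in item 3 can be sketched as follows. This assumes a standard normal test statistic and a two-tailed test, which the reply implies but does not state; the function name is illustrative.

```python
import math


def two_tailed_p(z):
    """Two-tailed p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


for z in (0.066, 0.022):
    print(f"Z = {z:.3f} -> p = {two_tailed_p(z):.3f}")
```

With the Z values quoted above this gives p of about .95 and .98, in line with the approximate figures in the reply.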

Joseph Bronski

Fri 11 Aug 2023 19:53

Author | Admin

Replying to Tue 08 Aug 2023 09:34

The hypothesis test here is interesting, but there are questions left unanswered.

1. There are 2380 participants recruited from the Prolific platform. The sample size, along with Prolific fees, makes this study very costly. Yet according to the paper, it seems only a few variables were collected. This is surprising considering the cost. Are there really no more variables than the ones mentioned in the paper?

2. I ask because there are obvious confounding factors not accounted for. Woodley of Menie et al. (2020) showed that paternal age could be negatively associated with church attendance. Perhaps more important is the fact that religiosity and political attitudes are correlated. If you do have religiosity, use it as a control; otherwise mention this issue in the discussion. Also, birth order controls for within-family variance. Adding this variable would ensure that the father's age truly reflects between-family variance. Another variable likely correlated with father's age is father's education. Again, if you do not have these variables, make sure you mention them in the discussion.

Woodley of Menie, M. A., Kanazawa, S., Pallesen, J., & Sarraf, M. A. (2020). Paternal age is negatively associated with religious behavior in a post-60s but not a pre-60s US birth cohort: testing a prediction from the social epistasis amplification model. Journal of religion and health, 59, 2733-2752.

3. The method section must be rewritten. We need to know how each variable is measured (the number and labels of each variable's categories) and how the questions are asked (e.g., LGBT, BLM).

4. Another problem: you said there are two rounds of data collection. So it looks like it was intended to be 2 separate analyses but your result section doesn't make it clear which data you are using. You should separate the Results section into subcategories: Sample 1 and Sample 2.

5. Yet another issue. What do you mean by "attempting to balance the number of liberals and conservatives"? Did you employ a particular procedure for your data collection or participant screening? If so, make it clear in the text.

6. Furthermore, I have doubts about the methods. What is Leftist? Typically the variable for political views is a 7-point Likert scale. If you dichotomized this variable into 0-1 to get the Leftist variable, as it seems from reading your text, this practice is known to cause biased estimates (MacCallum et al., 2002). But you also seem to have dichotomized the father's age in Figures 2-5. I suggest you work with the original variables, unless you have a good theoretical reason for splitting the father's age at exactly 35. This is also why we need to know exactly how the variables are measured. In your method section, you wrote: "We asked the same political questions, as well as a) how many children their wife gave birth to before they were 35 years old and b) how many children their wife gave birth to after they were 35 years old." I am wondering if this was exactly how the question was phrased. If this is the case, this is a potential issue.

MacCallum, R. C., Zhang, S., Preacher, K. J., & Rucker, D. D. (2002). On the practice of dichotomization of quantitative variables. Psychological methods, 7(1), 19.

7. Overall I suggest you remove Figures 2-5 and stick with a regression analysis (one for each sample), adding the squared term of age to check for nonlinearity, because a regression analysis would make Figures 2-5 redundant. Before this step though, make sure you report the kurtosis/skewness of your variables. The right tail of paternal age doesn't look right. If these look like outliers, try to run the regression with and without them (typically outliers are values with z-scores beyond 3, or sometimes 2.5) and see if they affect the regression results. Do not forget to report both the standardized and unstandardized regression coefficients. Especially if some variables are dichotomous or categorical, you would rather discuss the unstandardized coefficient.

When applying multiple regression though, you wouldn't need to report Figure 6 anymore (probably a typo, it should have been Table 1).

For figure 1, we obtained a general factor from the questions about feminism, Black Live Matter, and LGBT. The loadings were .74, .78, and .79 respectively. Roughly the top half most leftist scorers were categorized as “left.”

8. This is very confusing. Could you explain in more detail the procedure and the goal of this analysis? Furthermore, I am concerned about how these variables are measured. If the response is dichotomous (yes/no), a factor analysis is inappropriate because it assumes continuous variables. A factor is by definition continuous.

9. Typo: "and 1 in 300 are born retarded due de novo mutation" -> should be "due to de novo"

As a final remark, I would appreciate it if the references in the text were not numbered. Numbering forces the reader to go back and forth to check the authors' papers. For anyone willing to check and read the references, this is extremely tedious. Furthermore, it's easier to get the wrong reference by using numbers rather than the authors' names.

1. On Prolific, studies cost more the longer they take. So in this study, I made sure the survey was as short as possible. This is why the leftism variable was binary, and why I didn't collect more data than needed.

2. I did not collect data on possible confounders other than the 2nd round on older vs. younger father leftism. As explained in 1., this is because of budget constraints. It would have multiplied the cost to collect data like Big 5, IQ, religiosity, and more, probably by a factor of 10 or more for the ones I just mentioned, because the survey could take 10 minutes or more. This was way outside of my budget. I have added a limitations section noting these issues.

3. I added the questions

4. I added labeling to the figure charts, but I think cutting the results section in two would be a more confusing presentation.

5. I clarified this in the text. Basically, I collected some data from people who said they were liberals on their Prolific screening items and some from people who said they were conservatives, to make sure the sample was about 50/50.

6. If you see my other comments you will see that I am not a huge fan of how I measured leftism, but it seems to be as good as or better than party ID / "left wing" identification, so I think it's far from worthless. However, I want to replicate this using a better, continuous metric. As for paternal age, I show the total distribution in Figure 1 to establish that leftists have older fathers. I switch to under/over 35 after that because the theory predicts a small effect and this maximizes my analytical power. As for the phrasing of the under/over 35 questions in the second round, I made it as clear as possible, and I found the mean number of children per person in the sample was in line with national estimates. Basically, if the question is phrased badly, people will report too many kids because they will include their pre-35 kids with their post-35 kids in the "how many kids did you have after 35" answer. In this case I asked "How many children did your wife give birth to after you were OVER the age of 35? (Please input a whole number using digits, e.g. 2)".

7. Paternal age should look like that, but I agree there could be power issues at the right tail. Generating the histogram with fathers over 45 cut off (sd = 7) gives d = 0.129, p = 0.002; leftists still have older fathers. So I do not believe that the tails are biasing the analysis. If anything, they biased it in the direction contrary to my hypothesis and findings, since my result gets better when I cut them off. Likewise, the other analyses don't noticeably change when I cut off the tails. I regenerated the charts with the tail cut to make sure of this, and they look basically the same.

I can't do regression analysis on the second round of data, and for the first round I think d is a better measure than r. r can be derived from d using an equation, and it's about 0.07. This is hard to interpret without mapping it back onto d because leftism was binary. However, I do think that a logistic regression with paternal age (continuous) and participant age predicting leftism status could be informative, although I find it hard to interpret compared to Figures 1 through 5. I added it as Figure 7.

8. I did factor analysis just to see the loadings. Essentially, if someone answered yes on all 3 questions they were leftist, and otherwise they were considered non-leftist.

9. fixed
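
The d-to-r conversion mentioned in item 7 can be sketched with the standard formula r = d / sqrt(d^2 + a), where a = 4 for equal group sizes and a = (n1 + n2)^2 / (n1 * n2) otherwise. The example uses the trimmed-sample d = 0.129 quoted above as an illustrative input and assumes equal groups, since the thread does not report the group sizes; it is not necessarily the exact calculation behind the 0.07 figure.

```python
import math


def d_to_r(d, n1=None, n2=None):
    """Convert Cohen's d to a point-biserial r.

    With equal group sizes the correction term is 4; otherwise it is
    (n1 + n2)^2 / (n1 * n2). Equal groups are assumed when sizes are omitted.
    """
    if n1 is None or n2 is None:
        a = 4.0
    else:
        a = (n1 + n2) ** 2 / (n1 * n2)
    return d / math.sqrt(d ** 2 + a)


print(f"r = {d_to_r(0.129):.3f}")
```

Under these assumptions r comes out at roughly 0.06, the same order of magnitude as the value given in the reply.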

On the citation format, I prefer numbered but I can change it over to names if that's what the journal requires before publication.

Forum Bot

Fri 11 Aug 2023 19:55

Bot

Author has updated the submission to version #2

Meng Hu

Sat 12 Aug 2023 12:01

Reviewer | Admin

I appreciate that you have now described the variables. Otherwise, it would have been very difficult if not impossible to really detect where are the strengths and weaknesses.

After reading this new version carefully, here is what I think needs to be explained.

1. If possible, for replication purposes, you should provide your code in a supplementary file and, if that is possible, the data you have used (without IDs or personal information of the participants, of course). In this case, say in the main text that the materials are provided in the supplement. Finally, tell us which software you are using (it's apparently not R).

2. In the text below Figure 1 you didn't explain the purpose of the factor analysis, and you still did not explain whether it's a factor analysis or a principal component analysis. Since these three variables in the first round are dichotomous, you should have used a tetrachoric approach for the factor analysis, e.g., fa(r, cor="tet") if using R. Also, if, as your reply suggests, you wanted to measure a leftism factor, the loadings do not tell you the whole story. You need to report the scale reliability, which basically tells you how well you have measured the leftism factor. In this case, use Omega (not the Hierarchical, and not Alpha, just Omega). Flora (2020) explains the strength of Omega compared to Alpha, and also provides easy guidance on how to apply it in R (see the section on categorical Omega, since your variables are binary).

Flora, D. B. (2020). Your coefficient alpha is probably wrong, but which coefficient omega is right? A tutorial on using R to obtain better reliability estimates. Advances in Methods and Practices in Psychological Science, 3(4), 484-501.

However, what concerns me most is the fact that Figure 1 seems to imply your leftism factor is a binary variable (0-1). A classical factor analysis should produce continuous factor score distribution with mean zero, so I would like to know how you got a purely binary distribution of such factor.

3. About Figures 2-5 I highly recommend you display also the plots using the original metric of father age. As explained earlier, unless you have a strong theoretical reason to dichotomize the variable at exactly 35, don't do it. Dichotomizing a variable is known to cause bias, as explained earlier. It's not a recommended approach. I highly recommend you again to read MacCallum's article.

MacCallum, R. C., Zhang, S., Preacher, K. J., & Rucker, D. D. (2002). On the practice of dichotomization of quantitative variables. Psychological methods, 7(1), 19.

4. "Probability" in Figures 2-5 has been computed by taking the mean, so make sure you explain it in the beginning of Result section. Because otherwise it's not clear. Often, when I see researchers displaying their probability plot, it's actually the predicted probabilities from, say, a logistic regression.

5. Since you're keeping your graphs as well now, make sure you write in the text "Figure 2 shows..." under the respective graph so it's easier to follow. And make sure you correct this mistake: "Next, we show the leftism probabilities from Figure 3 alongside the leftism probabilities of the fathers. " because it's actually Figure 4.

6. Figure 4 apparently was produced by using both the first "baseline" data and its follow up. But since the follow up sample was much, much smaller, there is a strong attrition, which is another limitation. Make sure you discuss that in the appropriate section.

7. You need to present each of your regression tables appropriately by "calling" them, such as "in Table 1 we present results from...".

8. You do not seem to have used the party ID variable in your analyses, or perhaps this is, again, not made clear. Sure, you have described each variable, but you did not explain how you computed "leftism" for all of your analyses, which is crucial for understanding what is happening. One problem is that party ID and politics are redundant, and it's unclear how you obtained the 0-1 dichotomized response for the results reported in your figures and tables. A second issue is that party ID has 6 categories and only the first two relate to left/right, and the politics variable would not include "centrist" responses, which means that you must have fewer than N=2,380 and N=264 in your final samples. In this case, you would need to display the sample size for each analysis (e.g., the factor analysis, each figure and each table). Knowing how leftism is computed would also tell me which variables were used. While you collected party ID, politics, LGBT, BLM, and feminism in both rounds, I feel like you didn't use all the variables because, to my surprise, your leftism is a binary variable.

9. In your limitations section, you indeed recognized that religiosity and birth order effects are potential confounders (see e.g. the Woodley et al. paper I mentioned before), but you didn't explain why, so this is confusing for the reader. I suggest you explain and possibly add some references to back this up. You should also explain that a major limitation of the leftism variable is that political views are typically measured on a 7-point Likert scale, a widely used metric whose predictive power and consistency are well replicated in the psychological field, whereas we don't know the properties of your variable. Make sure the readers understand this point well, because it's currently explained loosely.

You also did not explain why it is wise to dichotomize paternal age, because as I mentioned earlier, it is known to cause serious bias in estimation, but since you also reported the paternal age effect in its original metric in a second table, I'm fine. Just don't forget to report the plots from Figures 2-5 using the original metric of paternal age as well.

10. Figures 6-7 are actually tables, not figures. Also, in "Figure" 7 you did not capitalize the words, but you did so for Figure 6 and the previous figures. Also, about those tables, I realize now that I missed an important element in my earlier response. Your table actually shows the Dep. Variable, and it's called "factor_1bin". You never explained what this is. Factor binary? How was it created? If it's a factor score, how can it be binary? Yet your reply indicates your leftism variable is binary. It's far from clear. But my guess is that you did not use the observed variable of either party ID or politics, because your sample totals exactly 2,380, i.e., you did not lose a single case. Did you really not use either of these variables?

The odds ratio in your Table 1 for parent age and respondent age should have been 1.27 and 0.98 respectively. With respect to Table 2, you should also report the odds ratio for both variables. The effect of paternal age (continuous) is not very strong, especially compared to age effect. The lower CI of paternal age is also very close to zero. Because of this, you want to mention in the discussion that the result is not robust because when using the original metric of the parent variable, the effect is not strong anymore, as opposed to its binary "transformation".

11. In your reply you said the extreme values in the right tail did not produce results contradicting your hypothesis, this is fine but you should report this robustness analysis in your main text.

12. My recommendation that you change the citation format was based on practicality, on what I believe is easier for any reader to follow, and on how easy it is to edit a chunk of the discussion if references must be added in the middle without completely messing up the order. However, the journal does not seem to require a particular format. Furthermore, if you like this format better, you obviously don't have to follow my suggestion.
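
As background to item 10, an odds ratio is obtained from a logistic regression coefficient by exponentiating it. The coefficients in the sketch below are hypothetical: they are back-computed from the odds ratios quoted in item 10 (1.27 and 0.98) because the regression table itself is not reproduced in this thread.

```python
import math


def odds_ratio(coef):
    """Odds ratio implied by a logistic regression coefficient (log-odds)."""
    return math.exp(coef)


# Hypothetical log-odds coefficients, chosen to match the odds ratios
# quoted in the review; they are not copied from the paper's table.
coefficients = {"paternal age": 0.239, "respondent age": -0.020}

for name, coef in coefficients.items():
    print(f"{name}: coef = {coef:+.3f} -> OR = {odds_ratio(coef):.2f}")
```

An odds ratio above 1 means the odds of the outcome rise with a one-unit increase in the predictor, and a value below 1 means they fall, so the size of the unit (one year versus a binary over/under-35 indicator) matters when comparing the two tables.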

Meng Hu

Sat 12 Aug 2023 14:09

Reviewer | Admin

I was planning to tell you how I would handle the analysis only after knowing how you computed the leftism variable, because it's hard to recommend the best approach when some important aspects are unknown. But now I realize it's probably better to let you know earlier what I believe to be the best approach. I would simply use the politics variable as the dependent variable in a multinomial regression. I assume you would have a decent number of cases who selected "centrist". In this case left-wing is your reference category, and you will get your point estimates of father's age and respondent's age for the centrist and right-wing categories. This tells you whether right-wing or centrist people have older fathers than left-wing people. A robust finding would show nontrivial effects for both categories when compared to the reference category (left-wing). I honestly don't see what added value LGBT, feminism, or BLM have here, unless I'm provided with an explanation. But if I have multiple, yet redundant, variables, I would rather use them as robustness analyses and see if the analysis replicates using different variables. In this case I would also employ a multinomial regression, but this time using party ID with the Democrat, Independent, and Republican categories, while discarding the other categories or collapsing them together with Independent (if you do this though, make sure you don't collapse libertarians with right-wing; libertarians are not right-wing, despite the widespread myth). This is far from ideal because political views are typically measured on a 7-point Likert scale, but it's probably better than using a factor score on (apparently?) dichotomous variables, because with centrist added as a possible response category, you keep a chunk of information that's left behind when using a binary variable (left-right).

Joseph Bronski

Fri 18 Aug 2023 19:26

Author | Admin

Replying to Sat 12 Aug 2023 12:01

I appreciate that you have now described the variables. Otherwise, it would have been very difficult if not impossible to really detect where are the strengths and weaknesses.

After reading this new version carefully, here is what I think needs to be explained.

1. If possible, for replication purposes, you should provide your code in a supplementary file and, if that is possible, the data you have used (without IDs or personal information of the participants, of course). In this case, say in the main text that the materials are provided in the supplement. Finally, tell us which software you are using (it's apparently not R).

2. In the text below Figure 1 you didn't explain the purpose of the factor analysis, and you still did not explain whether it's a factor analysis or a principal component analysis. Since these three variables in the first round are dichotomous, you should have used a tetrachoric approach for the factor analysis, e.g., fa(r, cor="tet") if using R. Also, if, as your reply suggests, you wanted to measure a leftism factor, the loadings do not tell you the whole story. You need to report the scale reliability, which basically tells you how well you have measured the leftism factor. In this case, use Omega (not the Hierarchical, and not Alpha, just Omega). Flora (2020) explains the strength of Omega compared to Alpha, and also provides easy guidance on how to apply it in R (see the section on categorical Omega, since your variables are binary).

Flora, D. B. (2020). Your coefficient alpha is probably wrong, but which coefficient omega is right? A tutorial on using R to obtain better reliability estimates. Advances in Methods and Practices in Psychological Science, 3(4), 484-501.

However, what concerns me most is the fact that Figure 1 seems to imply your leftism factor is a binary variable (0-1). A classical factor analysis should produce continuous factor score distribution with mean zero, so I would like to know how you got a purely binary distribution of such factor.

3. About Figures 2-5 I highly recommend you display also the plots using the original metric of father age. As explained earlier, unless you have a strong theoretical reason to dichotomize the variable at exactly 35, don't do it. Dichotomizing a variable is known to cause bias, as explained earlier. It's not a recommended approach. I highly recommend you again to read MacCallum's article.

MacCallum, R. C., Zhang, S., Preacher, K. J., & Rucker, D. D. (2002). On the practice of dichotomization of quantitative variables. Psychological methods, 7(1), 19.

4. "Probability" in Figures 2-5 has been computed by taking the mean, so make sure you explain this at the beginning of the Results section. Otherwise it is not clear. Often, when researchers display a probability plot, it actually shows the predicted probabilities from, say, a logistic regression.

5. Since you're keeping your graphs as well now, make sure you write "Figure 2 shows..." in the text under the respective graph so it's easier to follow. And make sure you correct this mistake: "Next, we show the leftism probabilities from Figure 3 alongside the leftism probabilities of the fathers." It is actually Figure 4.

6. Figure 4 was apparently produced using both the first "baseline" data and its follow-up. But since the follow-up sample was much, much smaller, there is strong attrition, which is another limitation. Make sure you discuss that in the appropriate section.

7. You need to present each of your regression tables appropriately by "calling" them, such as "in Table 1 we present results from...".

8. You do not seem to have used the party ID variable in your analyses. Or perhaps, again, it is not made clear. Sure, you have described each variable, but you did not explain how you computed "leftism" for all of your analyses, which is crucial for understanding what is happening. One problem is that party ID and politics are redundant, and it's unclear how you obtained the 0-1 dichotomized response for the results reported in your figures and tables. A second issue is that party ID has 6 categories and only the first two relate to left/right, and the politics variable would not include "centrist" responses, which means that you must have fewer than N=2,380 and N=264 in your final sample. In this case, you would need to display the sample sizes for each analysis (e.g., the factor analysis, each figure and each table). Knowing how leftism is computed would also tell me which variables were used. While you collected party ID, politics, LGBT, BLM, and feminism in both rounds, I feel like you didn't use all variables because, to my surprise, your leftism is a binary variable.

9. In your limitations section, you did recognize that religiosity and birth order effects are potential confounders (see, e.g., Woodley et al., which I mentioned before), but you didn't explain why, so this is confusing for the reader. I suggest you explain and possibly add some references to back this up. You should also explain that a major limitation of the leftism variable is this: political views measured on a 7-point Likert scale are a widely used metric whose predictive power and consistency are well replicated in the psychological field, but we don't know the properties of your variable. Make sure readers understand this point well, because it is currently explained loosely.

You also did not explain why it is wise to dichotomize paternal age, because as I mentioned earlier, it is known to cause serious bias in estimation, but since you also reported the paternal age effect in its original metric in a second table, I'm fine. Just don't forget to report the plots from Figures 2-5 using the original metric of paternal age as well.

10. Figures 6-7 are actually tables, not figures. Also, in "Figure" 7 you did not capitalize the words, but you did so for Figure 6 and the previous figures. Also, about those tables, I realize now that I missed an important element in my earlier response. Your table shows the Dep. Variable, and it's called "factor_1bin". You never explained what this is. Factor binary? How was it created? If it's a factor score, how can it be binary? Yet your reply indicates your leftism variable is binary. It's far from clear. But my guess is that you did not use the observed variable of either party ID or politics, because your sample totals exactly 2,380, i.e., you did not lose a single case. Did you really not use either of these variables?

The odds ratios in your Table 1 for parent age and respondent age should have been 1.27 and 0.98 respectively. With respect to Table 2, you should also report the odds ratios for both variables. The effect of paternal age (continuous) is not very strong, especially compared to the age effect. The lower CI bound for paternal age is also very close to zero. Because of this, you should mention in the discussion that the result is not robust: when using the original metric of the parent variable, the effect is no longer strong, as opposed to its binary "transformation".

11. In your reply you said the extreme values in the right tail did not produce results contradicting your hypothesis. This is fine, but you should report this robustness analysis in your main text.

12. My recommendation that you change the citation format was based on practicality, or what I believe to be easier for any reader to follow, as well as ease of editing a chunk of the discussion if references must be added in the middle without completely messing up the order. However, the journal does not seem to require a particular format. Furthermore, if you like this format better, you obviously don't have to follow my suggestion.
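
As a small illustration of the odds-ratio remark in point 10: an odds ratio is the exponential of a logistic-regression coefficient, and the same transformation applies to the bounds of its confidence interval. The sketch below is only that, a sketch. The coefficient values are hypothetical, back-solved from the odds ratios quoted above (1.27 and 0.98), the standard error is invented, and the variable names are assumptions; none of these numbers are taken from the paper's tables.

```python
import math


def odds_ratio(coef: float) -> float:
    """Convert a logistic-regression coefficient (log-odds) to an odds ratio."""
    return math.exp(coef)


def odds_ratio_ci(coef: float, se: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% CI for the odds ratio from a coefficient and its standard error."""
    return math.exp(coef - z * se), math.exp(coef + z * se)


# Hypothetical log-odds values chosen so that exp(coef) matches the odds
# ratios quoted in point 10; they are NOT read from the paper's tables.
coefs = {"father_age_35_plus": 0.239, "respondent_age": -0.020}

for name, b in coefs.items():
    print(f"{name}: OR = {odds_ratio(b):.2f}")

# Invented standard error, only to show how the interval is transformed.
low, high = odds_ratio_ci(coefs["father_age_35_plus"], se=0.10)
print(f"father_age_35_plus 95% CI for OR: [{low:.2f}, {high:.2f}]")
```

A lower CI bound close to zero on the log-odds scale corresponds to a lower bound close to 1 on the odds-ratio scale, which is why reporting both forms for Table 2 makes the small effect easier to see.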

1. Okay, I will do this. I need to clean the code up, though, and remove the Prolific IDs from the data.

2. I should not have used factor analysis. Basically, what I did with the 3 items is code someone as "left" if they answered yes to all 3 and non-left otherwise. I used factor loadings to show that these items were highly intercorrelated and would in theory form a strong continuous factor. I have made this clearer in the methods section. I also computed omega and reported it in methods. It was 0.54.

3. I don't like it either, but I did it for monetary reasons. I still think the results are legitimate. Ideally this would be redone with nothing binary, including paternal age and leftism, but that will cost a lot more money, potentially 10x as much. I have shown what I can without dichotomizing the variable. The second round of data collection was dichotomized by design; I cannot undichotomize it at this stage.

4. Okay I added something explaining this.

5. Done

6. Figure 4 was not produced with a followup. Since I do not have access to the specific fathers of the first sample, I just asked fathers in general. I added something explaining this in the methods section in more detail.

7. Done

8. I didn't use party or left-wingness at all in the analysis. I only collected that data to serve as a sanity check for the feminism/LGBT/BLM questions. When I saw that those correlated heavily with being Democrat or left-wing, I stopped caring about those measures because they are worse than my 3 items and are vague. I have added to the methods the correlations between my coding, party, and wingness. They were near 0.70, indicating that my leftist coding is mostly valid.

9. I added this stuff to the limitations section.

10. I changed the name in the table to be Leftism, it's the same thing I explained elsewhere.

11. Done
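
The coding described in reply 2 and the group means plotted as "Probability" (reply 4) can be sketched as follows. This is a minimal illustration, not the author's actual code: the respondent rows are made up, the field order is hypothetical, and treating "older father" as a father's age of 35 or above (rather than strictly above 35) is an assumption.

```python
from statistics import mean

CUTOFF = 35  # assumption: "older father" means father's age at birth >= 35


def code_leftist(blm: bool, lgbt: bool, feminism: bool) -> int:
    """Code a respondent as leftist (1) only if all three items were answered yes."""
    return int(blm and lgbt and feminism)


# Made-up respondents: (father_age, blm, lgbt, feminism). Not real data.
rows = [
    (28, True, True, True),
    (31, True, False, True),
    (33, False, False, False),
    (36, True, True, True),
    (41, True, True, True),
    (45, True, True, False),
]


def leftism_share(data, older: bool) -> float:
    """Mean of the 0/1 leftism code within one paternal-age group."""
    codes = [
        code_leftist(blm, lgbt, fem)
        for age, blm, lgbt, fem in data
        if (age >= CUTOFF) == older
    ]
    return mean(codes)


younger_share = leftism_share(rows, older=False)
older_share = leftism_share(rows, older=True)
print(f"younger fathers: {younger_share:.2f}, older fathers: {older_share:.2f}")
```

Because the code is 0/1, its group mean is simply the proportion coded leftist in that group, which is what distinguishes these plots from predicted probabilities out of a fitted model.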

Forum Bot

Fri 18 Aug 2023 19:29

Bot

Author has updated the submission to version #3

Meng Hu

Sat 19 Aug 2023 07:51

Reviewer | Admin

Thank you for updating the article. In this comment, I will explain why this paper is acceptable for publication despite its problems.

I understand that it is impractical to redo a survey, and again, I did not ask you to do this. What I said earlier is that the dichotomization employed in the first-round data should have been avoided and that Figures 2-5 should have used the original (untransformed) father age variable for the first-round data. Or, at the very least, note in the methods section that you are dichotomizing the continuous father age variable by splitting it at 35 years old.

I would also add some information on your factor analysis: whether or not you used the tetrachoric/polychoric correlation matrix method (which is the only appropriate approach here, since your variables BLM/LGBT/feminism are dichotomous).

A few corrections may be needed here:

a) You should remove "write-in" from the list of answers to "what is your party identification?".

b) I would have written "and the political variable derived from".

Furthermore, there was a 0.69 correlation between this classification scheme and the variable derived from coding “Left-wing” as 1, “Centrist” as 0, and “Right-wing” as -1

c) Who is "this author"? Perhaps a more appropriate wording would be "the present author does not find...". And when writing "some claim", I find it necessary to cite some authors or references.

Regarding religiosity, some claim more or less complicated environmental effects of ideas on behavior. This author does not find this framework generally supported or valid

In any case, I find the description "and therefore is not generally concerned with measuring religious participation as an important variable" very surprising because, as I cited earlier, Woodley et al. showed that religiosity might be correlated with paternal age, even though their finding is not robust (null correlation in one sample, but strong correlation in another). Thus I would not discard religiosity too quickly as a potential confounder. It may or may not be a problem.

Another nitpick, I find "environmental effects of ideas on behavior" a bit difficult to understand. I had to read this sentence perhaps 4-5 times because it's described a bit loosely.

The following remarks can be taken at your own discretion. They are not mandatory, although I highly suggest that the points below be addressed:

1. With regard to your results section, I once again highly recommend you display the odds ratios for Table 2. It doesn't make sense to me why you reported these values for Table 1 but not Table 2. Also, your report of Table 2's result is a bit odd because the sentence sounds like father age is an important predictor, while in reality the effect size is negligible and the lower CI bound is close to zero despite the very large sample size. A more accurate description would have been "father age has a positive (but very small) effect".

2. To better grasp the limitation of the analysis, a less informed reader would need to delve deeper in basic statistics (e.g., knowing why dichotomization is wrong, and what is the minimum value of omega that is deemed acceptable). Indeed I would have appreciated a reference from MacCallum (2002) or at least a warning that dichotomization is not valid, because the practice of transforming/dichotomizing a continuous variable is usually not justifiable, among other things. Of course I'm fine with "In the future, this study should be replicated ... with a continuous metric for leftism and paternal age for all results." but I feel like it doesn't display or describe the full extent of the issue.

3. I would also, as I suggested earlier, use the original politics and party ID variables for a robustness analysis. Having at least 3 categories, they provide more information when used in a multinomial regression. Showing a correlation of 0.7 (a bit difficult to interpret since one variable is dichotomous and the other is categorical) is simply not enough to ensure that these two observed variables, if used as dependent variables, would yield the same parameter estimates for father's age as your dichotomized variable obtained from LGBT/BLM/feminism. Regardless, I do not believe that your dichotomized, constructed leftism is a more valid indicator than party ID, simply because I have never seen any other study measuring leftism using LGBT/BLM/feminism.

4. The abstract should better reflect the above issue I mentioned. You can, e.g., say that the problems with the variables in use require replication. This would encourage readers to read your discussion and limitation sections and not merely stop at the abstract (which most people do, I suspect...).

Conclusion: I believe there is enough information now for the informed reader to know the limitations of the statistical procedure (unfortunately, a less informed reader needs to delve deeper). Thus, despite its problems, the article in its current form might be acceptable for publication.

Despite my positive conclusion, I still believe the present analysis will likely not replicate, because of its statistical procedure (unusual variables, dichotomization) and the result presented in Table 2.

Final note: if you find the time to clean the data, make sure it's available at OP as a supplementary file.

Note 2: I just realized now that perhaps you might want to explain why you didn't use either party ID or political variable in your analysis. Any reader would be surprised that these variables weren't used at all. Because they seem to be the logical choice.

Note 3: "The distributions both have long right tails as expected. When these are removed, the results don’t significantly change. In fact, they slightly improved." This is described far too loosely. Personally I would be more explicit and state which values are removed, e.g., is it those above 50, 55, or 60?
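
For readers wondering why dichotomization is discouraged (point 2 above), a small simulation can make the attenuation visible. This is a sketch under stated assumptions, not a reanalysis of the study: the data are simulated from a normal distribution, the effect size of 0.3 is arbitrary, and the split is at the median rather than at age 35. For a normally distributed predictor split at its median, the correlation with the outcome is expected to shrink by a factor of about 0.80 (the square root of 2/pi).

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 20_000

# Simulated continuous predictor and an outcome that depends on it linearly.
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.3 * xi + rng.gauss(0, 1) for xi in x]

# Median split of the predictor (the population median of x is 0 here).
x_bin = [1.0 if xi > 0 else 0.0 for xi in x]

r_continuous = pearson(x, y)
r_dichotomized = pearson(x_bin, y)
ratio = r_dichotomized / r_continuous

print(f"continuous r   = {r_continuous:.3f}")
print(f"dichotomized r = {r_dichotomized:.3f}")
print(f"ratio          = {ratio:.2f}  (theory for a median split: {math.sqrt(2 / math.pi):.2f})")
```

A split away from the median, such as a cutoff in the upper part of the paternal age distribution, generally loses more information than a median split, so the figure above should be read as a best case.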

Joseph Bronski

Sun 20 Aug 2023 18:38

Author | Admin

Replying to Sat 19 Aug 2023 07:51
