
No Fair Sex in Academia: Is Hiring to Editorial Boards Gender Biased?

Submission status
Reviewing

Submission Editor
Submission editor not assigned yet.

Authors
George Francis
Emil O. W. Kirkegaard

Title
No Fair Sex in Academia: Is Hiring to Editorial Boards Gender Biased?

Abstract

The editorial boards of academic journals overrepresent men, even above their proportion in university faculties. In this paper we test whether this gender disparity is caused by anti-female bias, supposing that anti-female discrimination means women must have a higher research output than men in order to overcome bias against them. We collect a dataset of the research output and gender of 4384 academics on the editorial boards of 120 journals within four social science subjects: Anthropology, Psychology, Political Science and Economics. Our findings are precisely the opposite of what would be expected from anti-female bias. Using a transformation of the H-index as our indicator of research output, we find male research output to be 0.35 standard deviations (p<0.001) above female research output. However, the gap falls to 0.13 standard deviations (p<0.001) when years publishing is controlled for. Our results are replicated with alternative dependent variables and using robust regression.

 

We followed up our research with a survey of 231 academics, asking them questions on their attitudes towards discrimination in hiring to editorial boards. Although two-thirds of academics supported no bias, the remainder were far more likely to be biased against men than against women. For every 1 academic who supported discrimination in favour of men, 11 supported discrimination in favour of women. The survey results were consistent with the hypothesis that academics and journal editors are biased in favour of women.

 

Keywords
gender, sex, discrimination, academia


Reviewers ( 0 / 0 / 2 )
Reviewer 1: Accept
Reviewer 2: Accept
Public Note
Supplementary files and code will be added at a later date.

Mon 12 Jul 2021 19:41

Bot

Authors have updated the submission to version #2

Bot

Authors have updated the submission to version #3

Reviewer

Hello. What follows is my review of this paper.

Brief Summary

This study is chiefly about sex differences in research output among academics who sit on elite editorial boards in four of the core fields of social science, namely anthropology, political science, psychology, and economics.

The authors collect a novel dataset of individual academics positioned on editorial boards from 30 top-ranked journals in each of these fields during the period between March and June of 2020. The dataset contains some basic demographic information (gender and total years spent publishing) as well as several standard measures of research performance (citations, H-Index, etc.).

The authors use this dataset to investigate whether, among academics sitting on editorial boards, there remain sex differences in research performance that favor men (as such differences typically do among academics overall). They argue that the existence (and/or magnitude) of sex differences in performance among academics on editorial boards can be taken as a crude test of the existence of anti-meritocratic bias against one sex or another, although they occasionally note that this test requires several critical assumptions ("other things being equal").

The authors find that, among the population they study, men achieve much higher research output than women. There is some speculative discussion about which controls are appropriate (e.g., years spent publishing) and whether age confounds may affect the interpretation of their results, but broadly the authors conclude that bias against women in editorial board selection likely does not exist, or even that existing bias acts to disfavor men.

Lastly, the authors use Prolific to conduct an online survey of 425 individuals with PhDs, whose sample is finally restricted to the 231 individuals who say they work in academia or are currently publishing scientific papers. The survey asks respondents several normative (and also a few empirical) questions about whether diversity by age or gender is important to have on academic editorial boards, whether there are age or sex differences in academic ability, and about their own evaluation of the current preferences of those who hire editorial board members. Most importantly, they find that survey respondents are far more willing to reveal a preference for discriminating against men than vice versa. They conclude that this is consistent with their empirical conclusions in the previous part of their paper.

Specific Points of Concern

Organization.

The paper is not very well organized. As just one example among many, why is Nielsen (2015) first discussed only at the end of the paper, and not in the introduction during the discussion of existing literature on bias in academic selection? There also seems to be more emphasis on the literature on sex differences in cognitive ability and academic/vocational interests than on the literature concerning bias in selection and hiring of academics. As this paper is primarily about the latter, the literature review should mostly concern previous research in this domain (i.e., the various methods that other researchers have used to test for such biases, what they have found, how the authors' "simple test" compares to these other methods and its advantages and disadvantages in comparison, etc.). It doesn't seem to me that the authors provide a full overview of this literature. It also seems that the authors' discussion of the literature on sex differences in intelligence is somewhat slanted. My understanding is that some studies do find that women have an advantage on verbal skills such as reading comprehension and verbal fluency, notwithstanding Lynn's (2021) study of the Wechsler Intelligence Scale, but this is never mentioned. See, e.g., Lynn and Mikk (2009).

The discussion in the introduction of the political leanings of different fields of social science does not seem highly relevant to the main results of the paper. It is odd that this material occupies Table 1, the first table presented in the paper. The authors admit they are basically unable to conduct formal tests concerning the issue at hand: "To properly test for any gender bias arising from political opinion between subjects we would need to include more subjects" (p. 12).

The discussion of how the authors' selection of fields might "bias" their results seems confused, and this point is related to the need for a clearer discussion of the caveats regarding what conclusions the authors' test allows them to draw. The question of which scientific domains the authors have chosen to study relates more to what specific question they are trying to answer ("Does academia in general have an anti-female bias?" or "Do certain particular academic fields have an anti-female bias?") than to any potential bias in their findings or methodology.

In general, the introduction should focus more explicitly on what exactly the authors are contributing to the existing literature with their empirical investigation, and why the authors believe the empirical tests of their paper will allow them to draw the conclusions they do. The problems and caveats with their central test (which are vaguely gestured at) should be more directly discussed – more on this below.

Concerns about the dataset.

The summary statistics in Table 2 indicate that the authors' dataset contains entries that are obviously erroneous. Who exactly has an H-Index of 356 in any of anthropology, political science, psychology, or economics? How is it possible that anyone has an H-Index Since 2016 of 2455? For each individual researcher, mustn't his H-Index Since 2016 always be no greater than his H-Index? These data seem impossible. The authors may argue that they use robust regression to alleviate concerns about human error in data collection, and that the errors are not their fault but Google Scholar's, but the presence of these figures makes the reader skeptical that the dataset provides an accurate representation of academics' research output.
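Because citations received since 2016 can never exceed total citations, a recent-window H-index can equal but never exceed the all-time H-index, so impossible rows can be flagged mechanically. The following is a minimal sketch of such a check; the record type, the field names `h_index` and `h_since_2016`, and the example records are all hypothetical and not taken from the authors' dataset:

```python
from dataclasses import dataclass

@dataclass
class Researcher:
    name: str
    h_index: int        # all-time H-index (hypothetical field name)
    h_since_2016: int   # H-index from citations since 2016 (hypothetical field name)

def split_by_consistency(records):
    """Split records into consistent ones and impossible ones.

    A recent-window H-index can equal, but never exceed, the all-time
    H-index, so any record with h_since_2016 > h_index must be an error.
    """
    consistent, flagged = [], []
    for r in records:
        (flagged if r.h_since_2016 > r.h_index else consistent).append(r)
    return consistent, flagged

records = [
    Researcher("A", 40, 25),
    Researcher("B", 12, 12),     # equal values are possible
    Researcher("C", 30, 2455),   # impossible: recent window exceeds all-time value
]
ok, bad = split_by_consistency(records)
print([r.name for r in bad])  # ['C']
```

Flagged rows could then be corrected by hand from the source profile rather than silently dropped.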

I also wonder how it can be that the H-Index and the H-Index Since 2016 have a correlation of 0.37, but the H-Index and the Transformed H-Index Since 2016 have a correlation of 0.85? In general, I am surprised by the uniquely low correlations between all of the other variables and the H-Index Since 2016. Perhaps this is an artifact of the log transform and the nature of the H-Index measure, but my intuition goes the other way. What do the authors think?
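One possible explanation, offered only as a hypothesis: a handful of impossible values (such as 2455) can dominate a raw-scale Pearson correlation, while a log transform compresses them. Rank (Spearman) correlations are unchanged by any monotone transform, so comparing them is a quick diagnostic. The sketch below uses purely synthetic data, a perfectly related pair of variables with one erroneous value injected, not the authors' data:

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(v):
    """Ranks 1..n (no tie handling; the synthetic data below has no ties)."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of ranks."""
    return pearson(ranks(x), ranks(y))

# Synthetic data: recent H-index equals all-time H-index, except one error.
h = list(range(1, 101))
h_recent = list(h)
h_recent[0] = 2455  # a single impossible entry

raw_r = pearson(h, h_recent)
log_r = pearson(h, [math.log(v) for v in h_recent])
print(f"raw-scale r = {raw_r:.2f}, log-scale r = {log_r:.2f}")
print(spearman(h, h_recent) == spearman(h, [math.log(v) for v in h_recent]))
```

In this toy case one bad value turns a perfect relationship into a slightly negative raw-scale correlation, while the log-scale correlation stays clearly positive.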

The authors should provide a list of the 30 academic journals from each field used in the paper. It could go in the appendix. The reader cannot tell, for example, whether the Journal of Political Economy was in fact excluded. It should not have been, as it is an elite "Top 5" journal in economics.

Methodology.

A more natural way to check the results for robustness to errors and outliers (and to some degree to genuine sex differences in variance of ability that especially affect the right tail of output) would be to simply winsorize the dependent variables (at, say, the 98th percentile for all measures) and re-run the analysis on the winsorized dataset.
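The winsorizing step suggested here can be sketched as follows. This version caps only the upper tail, uses linear interpolation between order statistics to define the percentile (one of several conventions), and runs on hypothetical data:

```python
import math

def percentile(values, q):
    """Percentile by linear interpolation between order statistics, q in [0, 1]."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def winsorize_upper(values, q=0.98):
    """Cap every value above the q-th percentile at that percentile."""
    cap = percentile(values, q)
    return [min(v, cap) for v in values], cap

# Hypothetical H-index values with one impossible entry in the right tail.
h_since_2016 = list(range(1, 101)) + [2455]
w, cap = winsorize_upper(h_since_2016)
print(cap, max(w))
```

The analysis would then simply be re-run on the winsorized variables and the coefficients compared with the originals.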

The use of robust regression is interesting, but, especially as it does not appear to affect the main conclusions, it could be relegated to the appendix. A more important issue is whether the authors are using heteroskedasticity-robust standard errors (White 1980) in their main tables (Tables 6 and 7). This is important for accurate inference and to ensure that the authors' statistical tests are correctly sized. Furthermore, the authors evidently are clustering their standard errors at the level of the individual researcher (which will tend, although not always, to produce smaller standard errors), but this is not the most conservative approach to inference and ought to be explicitly justified if the authors intend it, as unobserved factors affecting research performance may well be correlated within disciplines or within editorial boards.
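To make the distinction concrete, here is a minimal sketch of the sandwich estimator for the simplest case: a regression of a standardized output score on an intercept and a female dummy. With one observation per cluster, it reduces to White's HC0 estimator; with journals as clusters, it allows errors to be correlated within editorial boards. The data and variable names are hypothetical, and, unlike most statistical packages, no small-sample correction is applied:

```python
import math
from collections import defaultdict

def ols_on_dummy(y, d):
    """OLS of y on an intercept and a 0/1 dummy d.

    Returns the coefficients [intercept, slope], the residuals and (X'X)^-1.
    """
    n, n_f = len(y), sum(d)
    det = n * n_f - n_f * n_f  # determinant of X'X = [[n, n_f], [n_f, n_f]]
    inv = [[n_f / det, -n_f / det], [-n_f / det, n / det]]
    xty = [sum(y), sum(yi for yi, di in zip(y, d) if di == 1)]
    b = [inv[0][0] * xty[0] + inv[0][1] * xty[1],
         inv[1][0] * xty[0] + inv[1][1] * xty[1]]
    resid = [yi - b[0] - b[1] * di for yi, di in zip(y, d)]
    return b, resid, inv

def sandwich_se_of_slope(d, resid, inv, clusters):
    """Sandwich SE of the dummy coefficient: (X'X)^-1 M (X'X)^-1.

    M sums the outer products of cluster-level scores X_g' e_g. With one
    observation per cluster this reduces to White's HC0 estimator.
    No small-sample correction is applied (most packages apply one).
    """
    scores = defaultdict(lambda: [0.0, 0.0])
    for di, ei, g in zip(d, resid, clusters):
        scores[g][0] += ei       # intercept column of X times residual
        scores[g][1] += di * ei  # dummy column of X times residual
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for s in scores.values():
        for i in range(2):
            for j in range(2):
                meat[i][j] += s[i] * s[j]
    row = inv[1]  # (X'X)^-1 is symmetric, so row 1 equals column 1
    var = sum(row[i] * meat[i][j] * row[j] for i in range(2) for j in range(2))
    return math.sqrt(var)

# Hypothetical standardized output scores, sex dummy (female = 1) and journals.
y = [1.2, 0.8, 1.5, 0.3, -0.2, 0.4, 0.9, -0.5, 0.1, 0.6]
female = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
journal = ["J1", "J1", "J2", "J2", "J3", "J1", "J2", "J3", "J3", "J1"]

b, resid, inv = ols_on_dummy(y, female)
se_hc0 = sandwich_se_of_slope(female, resid, inv, range(len(y)))
se_cluster = sandwich_se_of_slope(female, resid, inv, journal)
print(f"female gap = {b[1]:.3f}, HC0 SE = {se_hc0:.3f}, journal-clustered SE = {se_cluster:.3f}")
```

With only a few clusters, as in this toy example, cluster-robust standard errors are themselves unreliable, which is one more reason the clustering level deserves explicit justification.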

The discussion section about whether years spent publishing is an appropriate control (p. 17) is not very clear. It is hard to follow and largely speculative. The authors speculate about confoundedness with age, but as they have no variable for age in their dataset, they have no way to test any relevant model.

I may have misunderstood how the authors normalize their dependent variables, but if they normalize at the journal-level (and journals are nested within fields), why are there main effects for different fields in columns (9–12) in Tables 6 and 7? (This might be a misunderstanding on my part of what exactly the authors are doing.)
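The following sketch illustrates the reviewer's reading of the normalization, assuming that each output measure is z-scored within journal and that journals are nested in fields; the journals, fields and values are hypothetical:

```python
from collections import defaultdict
from statistics import mean, pstdev

def standardize_within(values, groups):
    """Z-score each value against the mean and population SD of its own group."""
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    params = {g: (mean(vs), pstdev(vs)) for g, vs in by_group.items()}
    return [(v - params[g][0]) / params[g][1] for v, g in zip(values, groups)]

# Hypothetical data: journals nested within fields.
h = [10, 20, 30, 5, 15, 40, 60, 80, 12, 18]
journal = ["E1", "E1", "E1", "E2", "E2", "P1", "P1", "P1", "P2", "P2"]
field_of = {"E1": "Economics", "E2": "Economics", "P1": "Psychology", "P2": "Psychology"}

z = standardize_within(h, journal)
field_means = {}
for f in set(field_of.values()):
    field_means[f] = mean(zi for zi, j in zip(z, journal) if field_of[j] == f)
print(field_means)  # both close to 0
```

Under this assumption, every field mean is zero by construction, so field main effects carry no information on their own. Field dummies can still receive nonzero coefficients once sex enters the model if the female share differs across fields, which may be part of the answer to the reviewer's question.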
 
Caveats regarding the authors' "simple test."

The test that the authors propose for whether there is discrimination against women in editorial board selection is not as simple as they at times suggest. For one, it requires that editorial board selection is determined solely on the basis of whether a single latent "merit" variable surpasses a fixed threshold. For another, it requires that there are no mean or variance differences in the sex-specific distributions of this latent "merit" variable. If this second requirement fails, then even under the single threshold model, ultimate means of output of men and women who surpass the threshold will not necessarily be equal. One gets the sense that the authors are aware of all of this, but it is not discussed with as much emphasis or detail as it should be. For example, the authors admit there are "two competing explanations" for their findings, i.e., "that men are in general higher performing academics than women and that journals are biased in favor of women" (p. 20). This needs more detailed and explicit discussion, and should be discussed earlier in the paper at the point when the basic test is introduced.

Of course, the size of the effect of mean or variance differences in the latent trait(s) on the ultimate differences in mean performance above a single fixed threshold is an empirical question, and the effect may be small in magnitude. But the fact that the authors' test relies on a single threshold model of selection combined with equal means and variances of latent ability should be explicitly stated. (As such, it is evidently not a highly precise test of the existence of bias.) This issue seems especially relevant also given that the authors appear sympathetic to the "higher male variability" hypothesis in general. Given the authors' beliefs about this hypothesis, what effect sizes do they expect for differences above a fixed threshold if selection is meritocratic? How do their empirical findings line up with these expectations? None of these questions is answered explicitly. A second, broader problem – related to reliance on a threshold model of selection – is that optimal editorial board selection may be multi-dimensional, with research performance being just one dimension of qualification. Indeed, it is not entirely clear that filling editorial boards with the "best" researchers "from the top down" would be ideal, as these "service jobs" in academia typically take time away from one's own research.
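The single-threshold logic can be made explicit with the mean of a truncated normal distribution, $E[X \mid X > t] = \mu + \sigma\,\phi(a)/(1 - \Phi(a))$ with $a = (t - \mu)/\sigma$. The sketch below assumes normally distributed latent "merit" and one common threshold. All parameter values are illustrative only, not estimates for either sex:

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def mean_above_threshold(mu, sigma, t):
    """E[X | X > t] for X ~ Normal(mu, sigma), via the inverse Mills ratio."""
    a = (t - mu) / sigma
    return mu + sigma * STD_NORMAL.pdf(a) / (1 - STD_NORMAL.cdf(a))

t = 2.0  # hypothetical single selection threshold on the latent scale
scenarios = {
    "equal distributions": ((0.0, 1.0), (0.0, 1.0)),
    "higher SD in group A": ((0.0, 1.1), (0.0, 1.0)),
    "higher mean in group A": ((0.1, 1.0), (0.0, 1.0)),
}
gaps = {}
for label, ((mu_a, sd_a), (mu_b, sd_b)) in scenarios.items():
    gaps[label] = mean_above_threshold(mu_a, sd_a, t) - mean_above_threshold(mu_b, sd_b, t)
    print(f"{label}: gap among selected = {gaps[label]:.3f}")
```

Every group here is selected by the same rule, so none of the gaps reflects bias. A positive gap among those selected is therefore the expected outcome under meritocratic selection whenever one group's latent mean or variance is higher. The size of that expected gap depends entirely on the assumed parameters, and the sketch ignores the multi-dimensional selection the review raises.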

Survey results.
 
I don't have any substantial criticisms to make of the authors' online survey. Perhaps the most relevant results would be better suited to a visual representation rather than the representation in Table 9, so that readers could see the entire distribution of responses. 

Conclusion

I think the issues discussed above are weighty enough that the paper needs substantial revision and should not be published in its current state. However, the novel collection of data and the main results presented are of great scientific interest, so with more work I think it could ultimately make for a nice paper.

Reviewer

Review of No Fair Sex in Academia: Is Hiring to Editorial Boards Gender Biased?

General

This is an important study, using a large and suitable sample. Since the findings are clear, the title should not be expressed as a question but as a statement.

Because it is an important study about a controversial issue, it would have much greater impact if it were published in a mainstream journal. Given the large sample, that should also be possible, if tedious.

It would simplify review if lines were numbered.

The abstract is, in my opinion, too long and detailed. Inference stats are inappropriate (if you want, one could simply say that this or that “was statistically significant”). The part about the survey comes as a separate thing, but should be integrated with the rest in one single para.

There is a lot of talk about errors in the data, but if that is really a concern you can find out, and if it is, then there are more straightforward ways to fix that than to perform series of different regressions: Clean the data (quality checking) by sorting, plotting, and what you can to reveal outliers, and correct them or winsorize.

I perceive the text as lacking a definite particle in many places. I don’t know how correct this is, for example I would like “suggesting editorial board’s apparent gender disparities could be meritocratic” to be “suggesting that editorial board’s apparent gender disparities could be meritocratic”.

In general, the author of this text is not a friend of commas. Many more commas are needed, the alternative being to chop many sentences up into shorter ones. Please have other people evaluate these and other idiosyncrasies of the language, regardless of whether the authors are native English speakers.

The writing is a bit clunky and wordy, and also not as clear as possible. I think the text can be shortened by some 10-15% and become much clearer.

Skew and kurtosis should also be reported among the descriptive statistics; this is particularly important for assessing the transformations.
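As a sketch of what could be reported, the moment-based (population) skewness g1 and excess kurtosis g2 can be computed as follows. Statistical packages often report bias-adjusted variants instead, so the formula used should be stated alongside the values:

```python
from statistics import fmean

def skew_kurtosis(values):
    """Moment-based skewness g1 and excess kurtosis g2 (no bias adjustment)."""
    n = len(values)
    m = fmean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

# Hypothetical right-skewed output measure, as raw H-index values tend to be.
print(skew_kurtosis([1, 1, 1, 1, 10]))
```

Reporting these statistics before and after each transformation would show how far the transformation moves the distribution towards symmetry.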

The phrasing and language are in many places crude and can be perceived as a bit blunt and harsh to the sensitivities of contemporary academics, and of women in particular. While one may decry this oversensitivity, the purpose is, after all, communication. So if it leads to substantially more effective and wider communication and higher impact, I think it is a good tactic to revise the language, which can be done without sacrificing anything very important, like clarity or space.

Your use of categories (male vs. female) indicates that it is biological sex you are considering, so you should use the term “sex” rather than “gender”. The point of the matter is that there are two different concepts for being male and female: sex for biological (gonadal) sex and gender for “social sex”. Gender is supposed to reflect “social sex”, that is, roughly “behaviours, preferences, and experiences that are typical of one sex, but which may vary along a continuum”. If we are to follow the APA on this matter, they state (https://apastyle.apa.org/style-grammar-guidelines/bias-free-language/gender): “Gender refers to the attitudes, feelings, and behaviors that a given culture associates with a person's biological sex (APA, 2012). Gender is a social construct and a social identity. Use the term ‘gender’ when referring to people as social groups. For example, when reporting the genders of participants in the Method section, write something like this: ‘Approximately 60% of participants identified as cisgender women, 35% as cisgender men, 3% as transgender women, 1% as transgender men, and 1% as nonbinary.’ Sex refers to biological sex assignment; use the term ‘sex’ when the biological distinction of sex assignment (e.g., sex assigned at birth) is predominant.”

There are, however, some established terms that contain “gender”, such as “gender quota” when one refers to the legislated form of formal discrimination against men. Be careful to select the most appropriate term.

Use “disciplines” instead of “subjects”, which is confusing.

All stat terms (p, F, N, R, r, etc.) should be in italics.

Detailed comments are found in a commented PDF /files/attachments/Emund_the_Old/2021/08/17/2_no_fair_sex_in_academia_is_hiring_to_editorial_boards_comm2.pdf

I note that a new version was uploaded today, which I have naturally not looked at. Please consider all the comments I have provided for possible revisions of the latest version.

 

Bot

Authors have updated the submission to version #4

Author

Thank you very much for the helpful comments on the first submission. Just a heads-up to say that the paper has been rewritten after consideration of your comments and is awaiting review.

 

Kind regards

George Francis

Reviewer

Review of No Fair Sex in Academia: Is Hiring to Editorial Boards Gender Biased?

General

As stated in the previous review, I think that this is an important study. It is very ambitious in gathering a lot of data on editorial board members specifically, and also conducting a survey. It’s good that the ms. has been shortened.

I note that most of my suggestions and comments from my previous review have not been implemented. I must conclude that the authors have either not read them (some were in the PDF file) or have ignored them. I have now looked more closely to understand why, but cannot see any reason. It would be useful if the authors provided a review response, where reviewers can see clearly which suggestions have been implemented or not, and the reasons why.

I begin with the comments that still apply from the previous review:

Since the findings are clear, the title should not be expressed as a question but as a statement. Not revised.

It would simplify review if lines were numbered. Not revised.

The abstract is, in my opinion, too long and detailed. Inference stats are inappropriate (if you want, one could simply say that this or that “was statistically significant”). Not revised.

I am totally confused about the regression reported in Table 6, see detailed comments in PDF. Not revised.

In my previous review, I wrote that the authors are not friends of commas. The comma situation is much improved, but now there’s a lack of definite particles instead. I have inserted a number of “that” in the PDF. As an example from the previous review: “suggesting editorial board’s apparent gender disparities could be meritocratic” should be “suggesting that editorial board’s apparent gender disparities could be meritocratic”. Not revised.

I still think it needs proofreading, preferably by a native English speaker.

Here are some new comments:

Under-representation and over-representation are frequently used words. This is a tricky concept, because it is entirely defined in relation to a reference, representation, which is itself vague and unclear. Relative to what? To the readership of a certain journal? To the general population? To academics in general (note differences between caring science and computer science)? Better to say "men are in the vast majority" or something like that. If a term like this is needed, I think you have to define it clearly the first time it is used, e.g. “Representation is here defined as the proportion of women in a group of people being the same as in the population to which that group belongs. In the present study, under-representation therefore means that the proportion of women in the group being investigated – typically an editorial board – is smaller than the proportion of women in the discipline that the editorial board serves”.

The central research question comes in the 2nd para: “These disparities pose a key question: to what extent do sex biases or sex differences explain different outcomes?” One would expect what follows to focus on the question to be investigated, i.e. bias, but the whole next para is about cognitive differences. I think that is inconsistent with the focus of the paper, in particular as there are other possible group differences, such as personality, interests, and preferences, which are not mentioned at all. I recommend a short para that briefly reviews these different alternative causes, referring only to review and overview articles, and then a focus on discrimination and the possible ways in which it might influence outcomes.

Use thousands separators consistently, e.g. 5,625.

I have some concerns with the survey items, which are not ideally phrased to get at what you want. This type of ambiguity is unfortunately common. How do you think respondents interpret "important"? Does "important" mean avoid/decrease or prefer/increase? So what would a person who thinks that the sex composition should be all female, or all male, rate? That it is important or not important? Or those who think that meritocracy should be the only factor? They may still think that "age diversity" is "important", only that it should fall out according to merit, or that there are highly merited people of all ages and it is only because of age discrimination that this does not manifest. Such a person may answer "not important" so as not to suggest that "under-represented" age ranges be favoured, but still think such diversity is desirable!

When we come to Q3 and higher, I become even more confused. What does “Should journal editors have an age preference in hiring to editorial boards? (Pick 5 for no age preference)” mean? Apparently, 5 is no preference and 10 is “a preference”? But which preference? For higher, lower, equal, or perhaps varied ages? So what would ratings from 0 to 4.99 then stand for? It is simply not defined, which might be why the mean is just above 5. This construction of items causes confusion and ambiguity both for respondents and for interpreters of the results. This should be dealt with in the results section and discussed as a weakness in the discussion. So I am really unsure how to interpret the mean of 3.9 for “Q8. Do you think journal editors have a sex preference in hiring to editorial boards? (Pick 5 for no sex preference)”. Journal editors have a *negative* preference?! What is that even supposed to mean? It would have been clear if the item were “editors have a preference for hiring females to editorial boards? (Pick 5 for no sex preference, and smaller values for a male preference)”.
These issues seem to be reflected in the vastly different distributions across items in Figure 2.

Figure 2 has “gender” in its panel captions.

All stat terms (p, F, N, R, r, etc.) should be in italics.

Detailed comments are found in a commented PDF.

 

 

Reviewer
Author

Thank you to the reviewers for their helpful comments regarding our manuscript. We have tried to incorporate all suggestions where appropriate. We use this response to briefly outline our key changes in response to reviewer concerns.

 

Response to Reviewer 2:

 

Organisation

With regard to organisation, we have edited the piece to avoid repetition and to make our ideas flow more coherently throughout. We have clarified that our first method only tests for gender bias within the social sciences.

We’ve improved the dataset by removing observations with inconsistent measurements (i.e., metrics that are higher since 2016 than their all-time value) and by removing outliers. Being wary of any potential bias arising from removing observations, we rerun our analysis without these cleaning steps in the appendix.

A full list of the academic journals used in the study has been added to the appendix.

 

Methodology

Again, with regard to outliers, we rerun our results with and without outlier values, and our findings remain consistent.

An issue that reviewer 1 mentioned in the first review was the use of robust standard errors and clustered standard errors. For our main results in Table 6, we have recalculated p values using robust and clustered standard errors. The p values of all regression coefficients fell within the same statistical significance thresholds as before. As such, we report only the conventional standard errors, but our use of these alternative standard errors is mentioned in the results, and they are left in the supplementary files for interested readers.

 

Caveats regarding the authors' "simple test."

 

Moreover, we have tried to clarify that higher male publication metrics can indicate a range of things, including, but not limited to, gender bias. In the introduction, we clarify the inferences that can be drawn from our test with the following paragraph:

It must be noted that a sex difference in the academic output of editorial board members can only be an indicator, not proof, of sex bias. As mentioned, men seem to have a higher variance and average intelligence. This would cause men on editorial boards to have a higher academic output even if there were no bias. Thus, if women have a higher academic output despite their lower variance in IQ, we can be confident that there is anti-female bias. We can also say that the larger the sex difference in favour of men, the lower the likelihood of anti-female bias and the higher the likelihood of anti-male bias. So if men have a higher academic output than women, we can be confident that there is no extreme anti-female bias.

We reiterate the importance of careful interpretation of higher male output in the discussion:

Overall we can be confident that male research output is higher than women’s on editorial boards. This is unlikely under the hypothesis of anti-female bias which predicts that women have a higher research output. The regression results update our prior beliefs away from anti-female discrimination and towards the possibilities of anti-male discrimination and men being better at academic research. To further explore the hypothesis of anti-male bias, we surveyed academics on their attitudes to gender bias.

 

Response to Reviewer 1:

Since the findings are clear, the title should not be expressed as a question but as a statement. Not revised.

Whilst I’m personally quite confident that hiring to editorial boards is gender biased, I do not think the paper proves this; rather, it is strongly suggestive. This is because of two key limitations of the two methods employed.

The male advantage in publication metrics is a function of both greater male variability and possible anti-male bias. Thus, the male advantage in publication metrics is suggestive, not conclusive, evidence of anti-male bias (lines 129-136). In the discussion section (lines 703-726), we note that this evidence is unlikely to be consistent with the hypothesis of discrimination against women, not that it is proof of bias against men.

Our surveyed academics on balance express a desire to discriminate against male academics. However, these opinions may not be truly representative of the sorts of academics who have the power to hire others to editorial boards (Lines 737-752).

This also explains why we are more cautious than the reviewer has asked us to be. For example, in the first paragraph after Table 5, we state that our results are strongly suggestive rather than definitive, which is what the reviewer seems to be asking for.

 

It would simplify review if lines were numbered. Not revised.

Now rectified.

 

The abstract is, in my opinion, too long and detailed. Inference stats are inappropriate (if you want, one could simply say that this or that “was statistically significant”). Not revised.

 

Abstract is shortened but inference stats are kept.

 

Unfortunately (or perhaps fortunately?), I suspect most readers of academic papers only look at the abstract, and only a few glance at the rest of the paper. We prefer to keep the details of the methodology and the inference statistics in the abstract so that our findings and their strength are immediately clear, even to busy readers. Many abstracts in OpenPsych and other journals include inference statistics, and some are of a similar length.

 

I am totally confused about the regression reported in Table 6, see detailed comments in PDF. Not revised.

A row name, ‘model number’, is now employed to make clear that the numbers above the columns in the regressions represent the models’ numbers.

The dummy variable ‘Sex (female = 1)’ is now ‘Sex (female = 1 male = 0)’, since the reviewer was concerned that the dummy variable coding was ambiguous.

Tables have been changed to follow the APA style more closely, with captions in hopefully appropriate style.

Columns and rows swapped in table 5. Stated in the graph that positive values indicate male advantages. Error in degrees of freedom for Economics spotted and corrected.

To remove ambiguity we have edited the paragraph before Table 6 to make clear we are running regression models within each discipline and also models for a pooled sample. We also make clearer that in some of the pooled models we control for the discipline effects with dummy variables and interaction terms.

 

In my previous review, I wrote that the authors are not friends of commas. The comma situation is much improved, but now there’s a lack of definite particles instead. I have inserted a number of “that” in the PDF. Like an example from the previous review: “suggesting editorial board’s apparent gender disparities could be meritocratic” should be “suggesting that editorial board’s apparent gender disparities could be meritocratic”. Not revised.

Thank you for pointing out areas which needed commas. In the updated article we have tried to use the definite particle whenever that is appropriate.

 

Under-representation and over-representation are frequently used words. This is a tricky concept, because it is entirely defined in relation to a reference, representation, which is itself vague and unclear. Relative to what? To the readership of a certain journal? To the general population? To academics in general (note differences between caring science and computer science)? Better to say "men are in the vast majority" or something like that. If a term like this is needed, I think you have to define it clearly the first time it is used, e.g. “Representation is here defined as the proportion of women in a group of people being the same as in the population to which that group belongs. In the present study, under-representation therefore means that the proportion of women in the group being investigated – typically an editorial board – is smaller than the proportion of women in the discipline that the editorial board serves”.

 

In the introduction, the previous papers we discuss use different reference groups, and these are stated explicitly. To clarify what we mean by overrepresent and underrepresent in our results, we now define the terms in the first paragraph of the Results section:

“We use the terms overrepresent and underrepresent to denote whether the fraction of women on editorial boards in a discipline is greater or less than female representation in the relevant population of academics who could be placed on editorial boards (i.e., active authors and university faculty members).”

The central research question comes in the 2nd para: “These disparities pose a key question: to what extent do sex biases or sex differences explain different outcomes?” One would expect the following to focus on the question to be investigated, i.e. bias, but the whole next para is about cognitive differences. I think that is inconsistent with the focus of the paper, in particular as there are other possible group differences, such as personality, interests, and preferences, which are not mentioned at all. I recommend a short para that briefly reviews these different alternative causes, referring only to review and overview articles, and then focus on discrimination and possible ways that it might influence outcomes.

In the second paragraph, we mention that there are differences in careerism between men and women. This comes before any discussion of cognitive ability. We now devote more discussion to the topic of careerism.

 

Whilst we mention other personality and interest differences later in the paper in relation to tangential topics, it is unclear why differences other than careerism would affect whether female academics join editorial boards. I could imagine that women, being more empathising, might be more likely to get involved in the communal work of an editorial board, but I think that is slightly too speculative and a distraction from our strong points. As such, I consider the key alternative hypothesis to anti-female discrimination to be male superiority in academic ability. The focus on the male advantage in intelligence and greater male variance reflects the authors’ priors on female success in academia.

 

I have some concerns with the survey items, which are not ideally phrased in order to get at what you want. This type of ambiguity is unfortunately common. How do you think respondents interpret "important"? Is "important" equal to avoid/decrease or prefer/increase? So, what would a person who thinks that the sex composition should be all female or male rate? That it's important or not important? Or those who think that meritocracy should be the only factor? They may still think that "age diversity" is "important", only that it should fall out according to merits, or that there are highly merited people of all ages, and it's only because of age discrimination that this does not manifest. Such a person may answer "not important" in order not to suggest that "under-represented" age ranges be favoured, but still think such diversity is desirable! When we come to Q3 and higher, I become even more confused. What does “Should journal editors have an age preference in hiring to editorial boards? (Pick 5 for no age preference)” mean? Apparently, 5 is no preference and 10 is “a preference”? But which preference? For higher, lower, equal, or perhaps varied ages? So what would ratings from 0 to 4.99 then stand for? It’s simply not defined, which might be why the mean is just above 5. This construction of items causes confusion and ambiguity for both respondents and interpreters of the results. This should be dealt with in the Results section and discussed as a weakness in the discussion. So, I am really unsure how to interpret the mean of 3.9 for “Q8. Do you think journal editors have a sex preference in hiring to editorial boards? (Pick 5 for no sex preference)”. Journal editors have a *negative* preference?! What’s that even supposed to mean? It would have been clearer if the item was “editors have a preference for hiring females to editorial boards? (Pick 5 for no sex preference, and smaller values for a male preference)”. 
These issues seem to be reflected in the vastly different distributions across items in Figure 2.

 

I agree that we did not fully consider the possibility that concern for diversity may be compatible with a dislike of affirmative action. The paper now gives this as a possible explanation for the lack of correlation between concern for age diversity and support for affirmative action. Each question had a prompt saying to pick 5 for no preference, and each end of the scale was labelled. This is now clarified in the text, and images of the survey are now in the supplementary files.

Use thousands separators consistently, e.g. 5,625

Noted. I have tried to add commas to all numbers with four or more digits.

Responses to notes in the PDF:

  • For Google Scholar pages that seemed erroneous, the reviewer suggested using a different source. This could introduce new biases arising from the different source and the different time period: if these academics’ publication statistics were recorded now, they would have had more time to increase them relative to the other editorial board members studied. We found that using the potentially erroneous data did not change our results, so we do not think it likely that using another source for these entries would change our results or conclusions.

 

  • The reviewer suggests that certain points in our discussion may be tangential or more appropriate for an opinion piece. Namely, these are (1) our discussion of how our findings relate to the pervasive misandry hypothesis of Clark and Winegard, which holds that academics believe in widespread misogyny because they are inculcated with misandrist ideas; and (2) our discussion of the policy implications for academic journals, which might improve meritocracy and their own success by hiring men whom other boards have ignored for political reasons.

 

We think that using the discussion to briefly mention policy implications, and implications for other scientific hypotheses, is appropriate and within the scope of an academic paper. Thus, we would like to keep this section. Nonetheless, the language has now been made less casual and more formal so that it remains appropriate.

 

  • The reviewer suggests that any human error in data gathering would not bias our results, is self-evident, and is therefore not worth mentioning in the discussion. We agree that it is a small issue, but feel it is better to be on the front foot with regard to any potential criticism.

 

Kind regards

George Francis

Bot

Authors have updated the submission to version #5

Bot

Authors have updated the submission to version #6

Bot

Authors have updated the submission to version #7

Reviewer

Review of No Fair Sex in Academia: Is Hiring to Editorial Boards Gender Biased? v. 7

General

The authors have now responded to my comments, which is good.

I note that most of my suggestions and comments from my previous review have not been implemented. I must conclude that the authors have either not read them (some were in the PDF file) or have ignored them. I have now looked more closely to understand why, but cannot see any reason. It would be useful if the authors provided a review response, where reviewers can see clearly which suggestions have been implemented or not, and the reasons why.

I begin with the comments that still apply from the previous review:

Since the findings are clear, the title should not be expressed as a question but as a statement. Not revised.

The tables are ugly because they have both horizontal and vertical lines, and do not follow APA format. I think the APA format is much better than the present style.

I still think it needs proofreading, preferably by a native English speaker. The language is unidiomatic and, to me at least, quite annoying to read, as the many suggested changes below show.

Here are some new comments:

The central research question comes in the 2nd para: “These disparities pose a key question: to what extent do sex biases or sex differences explain different outcomes?” One would expect the following to focus on the question to be investigated, i.e. bias, but the whole next para is about cognitive differences. I think that is inconsistent with the focus of the paper, in particular as there are other possible group differences, such as personality, interests, and preferences, which are not mentioned at all. I recommend a short para that briefly reviews these different alternative causes, referring only to review and overview articles, and then focus on discrimination and possible ways that it might influence outcomes.

46 maybe-> may be

 

46 women have a greater interested in family-> women have a greater interest in family

108 “have been urging their editors to improve the sex ratio in their boards” Improve is value-laden, which should be avoided in scientific writing unless specifically motivated. It is also unclear what it means, because depending on the ideals of the person and the current ratio in the particular context that person is referring to, she may feel that “improving the sex ratio” means increasing the number of males, increasing the number of females, or making the proportions equal.

125-127 This sentence is a bit unclear

130 I suggest “As mentioned, the variance in intelligence is higher amongst males, and their average also seems to be somewhat higher.”

You should use the term ‘sex’ rather than ‘gender’, because your use of categories (male vs. female) indicates it is biological sex you are considering. In fact, you inconsistently use sex 20 times and gender 20 times in the ms. The point is that there are two different concepts for being male and female: sex for biological (gonadal) sex and gender for “social sex”. Gender is supposed to reflect “social sex”, that is, roughly “behaviours, preferences, and experiences that are typical of one sex, but which may vary along a continuum”. If we are to follow the APA on this matter, they state (https://apastyle.apa.org/style-grammar-guidelines/bias-free-language/gender): “Gender refers to the attitudes, feelings, and behaviors that a given culture associates with a person's biological sex (APA, 2012). Gender is a social construct and a social identity. Use the term ‘gender’ when referring to people as social groups. For example, when reporting the genders of participants in the Method section, write something like this: ‘Approximately 60% of participants identified as cisgender women, 35% as cisgender men, 3% as transgender women, 1% as transgender men, and 1% as nonbinary.’ Sex refers to biological sex assignment; use the term ‘sex’ when the biological distinction of sex assignment (e.g., sex assigned at birth) is predominant.”

However, the connection between this and the composition of editorial boards is complex, and would require more elaboration than is worth the space, given that it is rather peripheral: the question you examine is the relation between merits and editorship. If we ALSO weigh in that the average difference is heavily contested (although probably correct), it seems counterproductive to get another 50% of academics presumably reading this to just ignore the whole study. In my estimation, the IQ thing doesn’t add anything anyway, because (1) again, this study is not about that; it doesn’t measure IQ, it’s all merits and editorship, and (2) you cannot (and you don’t) estimate any effects of IQ, so no comparisons of different possible effects can be made anyway.

138 Which test?

 146 Doesn’t it also indicate that women are being favoured when Psychiatry journal boards hire?

157 The authors found THAT women had fewer publications

158 No, not assistant professors, but hired or promoted to a position as professor, the equivalent status to tenured (or full) professor in the United States.

159 Strumia found THAT

162 there might be A sex bias

165 You should not mix “gender bias” and “sex bias” unless (1) you intend a different meaning with these two terms AND (2) you clearly define that distinction (before used the first time).

Again, the only thing that you or previous studies KNOW about the study individuals is their SEX; they have typically NOT measured or asked about their “gender identity” or “social sex”, and it is hence quite incorrect to use “gender”. Please follow the APA guidelines:

“Gender refers to the attitudes, feelings, and behaviors that a given culture associates with a person's biological sex (APA, 2012). Gender is a social construct and a social identity. Use the term ‘gender’ when referring to people as social groups. For example, when reporting the genders of participants in the Method section, write something like this: ‘Approximately 60% of participants identified as cisgender women, 35% as cisgender men, 3% as transgender women, 1% as transgender men, and 1% as nonbinary.’ Sex refers to biological sex assignment; use the term ‘sex’ when the biological distinction of sex assignment (e.g., sex assigned at birth) is predominant.” (https://apastyle.apa.org/style-grammar-guidelines/bias-free-language/gender)

178 “unobserved ability, making a preference for one sex over another possibly meritocratic.”

Please develop this argument (if there is one)

181 “Firstly, women make up a higher proportion of these scholars”

Compared to what?

192 ” We thought it was also important to choose disciplines within a large range of political” -> within a wide range of political

195 “left-wingers” Perhaps an unhappy term to use in a scholarly paper?

196 Change to “be prone to bias towards groups with low status, including women”

198 Change to “Whilst a wide range of disciplines”

226 Change to “suggesting that editorial boards’ apparent sex disparities could be close to the meritocratic ideal.”

What is “sex disparities”? What is “the meritocratic ideal”? Define! Define!

236 “From this ranking, we then took the top 30 journals from each discipline”

Makes sense!

“our results reflect whether there is bias in the elite of each discipline studied.”

Does not make sense! What do you mean?

240 “We disagreed with the discipline label of some of the journals on Scimagojr....”

What’s the point of this? Probably better to just stick with the labeling that Scimagojr uses: Then it’s all on them! It probably makes no essential difference anyway!

245 Change to “Table 9”

281 Change to “were women, meaning that women were slightly less”

284 Change to “Table 9”

“Sometimes Google Scholar pages for individual academics contained errors. Some papers had incorrect dates and some were attributed to the wrong author. When a Google Scholar page included five or more articles that the author had not written, we excluded this author.”

299 “To deal with these extreme values we applied Tukey’s Fences...”

What is that?

327 “This suggests that any of the dependent variables...”

354 “underrepresent to denote whether the fraction of women on editorial boards in a discipline is greater or less than female representation in the relevant population of academics...”

358 Change to “For comparison, we found a range of datasets representing the sex proportion amongst academics in the disciplines studied here.”

364 Change to “have sex proportions specifically for Anthropology or Political Science, so we use...”

388 Change to “within the Social Sciences”

416 Change to “in Figure 1”

418 Change to “Table 5”

460 “Our results are the opposite of what would be expected if women were being discriminated against, strongly suggesting that women are not discriminated against in hiring to editorial boards.”

Strange sentence?

466 Change to “Women are under-represented on psychology editorial boards, and yet the women who do manage to get on the editorial boards dramatically underperform relative to the men on the board, by 0.44 standard deviations.” And “In other words, women are underrepresented on psychology editorial boards relative to their presence on faculty, but are still overrepresented relative to their merit”

Table 6. Vertical indices along the left side should be numbered 1-12; otherwise it’s difficult to identify the horizontal indices. The F values seem much too high.

527 I would suggest not using terms like “academia is a ‘leaky pipeline’”, because that is a charged and political concept, created to summon a particular impression about the causes of fewer women in certain domains.

623 Change to “This means that for every academic preferring men there were 11 who preferred women.”

626 Change to “This suggests that there is a large minority of academics that would act to discriminate against men to serve on editorial boards.” (They usually are not hired or paid, but serve free of charge.)

629 Change to “Only 3% of respondents believed that journal editors should be biased in favour of men.”

650 Numbers < 12 should be written with letters, unless they constitute a precise value with a specified unit, such as “2.1 centimetres”, “4 Ångström”. Change to “Nearly three academics preferred young academics for every one that supported older academics”

659 Change to “These differed significantly from 5 (p < 0.001), suggesting that academics thought that journals were biased in favour of men and older scholars.”

660 Change to “So whilst academics are biased in favour of women and young people, they simultaneously believe other academics have the opposite bias.”

662 Change to “We speculate in the discussion that academics have such a strong anti-male bias that it deludes them into thinking that academia at large has the opposite bias.” OR “We speculate in the discussion that academics have a strong anti-male bias, and that this contrasts with reality in such a way that it appears to them that academia at large has the opposite bias.”

667 “diversity in questions 2 and 1 respectively.” What does this mean? Why in reverse order?

670 Change to “With responses overwhelmingly closer to 10 than 0, it seems that academics place much value on diversity.” Was it specified which kind of diversity you referred to? E.g., political preferences, theoretical/scientific perspectives, family values, wealth, etc.?

676 “had a greater aptitude for research, despite the fact men tend to receive more citations”. Is the point of mentioning this here that this fact was part of the question, e.g. “Do you believe that, considering the fact that male academics tend to receive more citations, men have a greater aptitude for research?”

680 Change to “suggests academics believe that young scholars are just as good as older scholars.”

682 Change to “In Table 8 we present”

688 “This could be because some scholars that believe in age diversity think this requires more older scholars to be represented on journal boards”.

This is a very important point, and it needs further attention in light of my previous comments regarding the formulation of the items, i.e.:

  I have some concerns with the survey items, which are not ideally phrased in order to get at what you want. This type of ambiguity is unfortunately common. How do you think respondents interpret "important"? Is "important" equal to avoid/decrease or prefer/increase? So, what would a person who thinks that the sex composition should be all female or male rate? That it's important or not important? Or those who think that meritocracy should be the only factor? They may still think that "age diversity" is "important", only that it should fall out according to merits, or that there are highly merited people of all ages, and it's only because of age discrimination that this does not manifest. Such a person may answer "not important" in order not to suggest that "under-represented" age ranges be favoured, but still think such diversity is desirable! When we come to Q3 and higher, I become even more confused. What does “Should journal editors have an age preference in hiring to editorial boards? (Pick 5 for no age preference)” mean? Apparently, 5 is no preference and 10 is “a preference”? But which preference? For higher, lower, equal, or perhaps varied ages? So what would ratings from 0 to 4.99 then stand for? It’s simply not defined, which might be why the mean is just above 5. This construction of items causes confusion and ambiguity for both respondents and interpreters of the results. This should be dealt with in the Results section and discussed as a weakness in the discussion. So, I am really unsure how to interpret the mean of 3.9 for “Q8. Do you think journal editors have a sex preference in hiring to editorial boards? (Pick 5 for no sex preference)”. Journal editors have a *negative* preference?! What’s that even supposed to mean? It would have been clearer if the item was “editors have a preference for hiring females to editorial boards? (Pick 5 for no sex preference, and smaller values for a male preference)”. 
These issues seem to be reflected in the vastly different distributions across items in Figure 2.

693 “This could indicate that bias against men is so strong amongst academics that they refuse to believe in greater male academic ability.” This sounds like a discussion/interpretation point. Could you structure the arguments more clearly, in particular in results/discussion?

741 Change to “In the regression results, we found that controlling for years publishing reduces the male advantage in research output.”

742 Change to “We are uncertain about the reasons for this, but suggest that (1) older scholars have had more time to publish papers, (2) younger cohorts of scholars are less productive than older ones, and (3) journals have a pro-old age bias.”

750 “men being better at academic research”. This is an unfortunate formulation, because it implies essentialism and inherent causes. Even if that may be the case, you have not presented any evidence for it, and it is essentially irrelevant in relation to the empirical questions treated in this ms. I suggest “The regression results are inconsistent with anti-female discrimination but support the presence of anti-male discrimination and higher academic performance amongst men”.

753 Change to “they were eleven times more likely to support discrimination”

776 Change to “It is possible that whilst our”

782 Change to “Moreover, academics at elite institutions are overwhelmingly left-wing, which is associated with having pro-female preferences (Winegard et al., 2020), suggesting that editors...”

 801 Change to “Academics who do not support affirmative action for women or diversity might be shunned or even ‘cancelled’ by their overwhelmingly left-wing colleagues”

809 Change to “Furthermore, we found that those who were more strongly biased against men also more strongly believed academia to be biased against women.“

814 Change to “Given that anti-male bias is so common and accepted, this could explain why our results are consistent with anti-male bias despite anti-female bias being a more popular theory with academics.”

819 Change to “We cannot determine whether editorial boards have previously exhibited a bias against women, because our data are not longitudinal, but we can be reasonably confident that they do not now.”

822 Change to “In Gary Becker’s taste discrimination model of the labour market (1971), profit-seeking firms should employ discriminated groups because they accept lower wages. Likewise, journals looking for top talent could do well in recruiting men that other editorial boards have ignored.”

Author

Dear Reviewer 1,

Thank you very much for your comments on our manuscript. I’m sorry this process has taken so much of your time. I’d like to assure you that I’m taking all your comments seriously and have made changes in line with your suggestions. In your most recent response, I count over 60 suggestions, of which I think we have now fulfilled all but three. I will first briefly explain the three suggestions I’m not sure we could fulfil. After that, I will go over all the other suggestions you have made. I should note that two of the three suggestions we have not completely fulfilled appear to be new, as do most of the other suggestions. There is a character limit on replies, so our response is split over two messages.

 

Comments on our Regression Table:

Table 6. Vertical indices along the left side should be numbered 1-12; otherwise it’s difficult to identify the horizontal indices. The F values seem much too high.

With regards to the numbering, I do not quite understand what is being asked. The models are numbered 1-12 already.

I’ve looked at my code, re-run the regressions, and obtained the same F values. They range from 16 to 692. I expect some of the F values are so large because our sample sizes are very large and ‘years since first publication’ explains a large amount of the variation in academic output. The fact that F values appear especially large only when the years variable is included suggests that there are no data problems; rather, one variable explains a substantial proportion of the variation. That years publishing explains so much variation in output is quite intuitive.
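The explanation above follows from the standard formula for the overall F statistic of an ordinary least-squares regression with an intercept, F = (R²/k) / ((1 − R²)/(n − k − 1)), where k is the number of predictors and n the number of observations. The minimal Python sketch below assumes exactly that model. The function name `overall_f` and the R² and k values in the example are hypothetical illustrations, not figures from the paper. It shows that, holding R² fixed, F grows roughly in proportion to the residual degrees of freedom. Large samples combined with a strong predictor such as years publishing can therefore yield very large F values without any data problem.

```python
def overall_f(r_squared: float, n: int, k: int) -> float:
    """Overall F statistic for an OLS regression with an intercept.

    r_squared: coefficient of determination of the fitted model
    n: number of observations
    k: number of predictors, excluding the intercept
    """
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must lie in [0, 1)")
    df_model = k              # numerator degrees of freedom
    df_resid = n - k - 1      # denominator degrees of freedom
    if df_model < 1 or df_resid < 1:
        raise ValueError("need k >= 1 and n > k + 1")
    # Explained vs. unexplained variance, each scaled by its degrees of freedom
    return (r_squared / df_model) / ((1.0 - r_squared) / df_resid)


if __name__ == "__main__":
    # Hypothetical R^2 = 0.25 with two predictors, at two sample sizes
    for n in (100, 4384):
        print(n, round(overall_f(0.25, n, 2), 1))
```

With the same hypothetical R² of 0.25, moving from 100 observations to 4,384 (the total number of academics reported in the abstract) raises F from about 16.2 to about 730.2. Only the sample size changes between the two cases.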

 

Sex or Gender?

You should use the term ‘sex’ rather than ’gender’, because your use of categories (male vs. female) indicates it is biological sex you are considering. In fact, you inconsistently use sex 20 times and gender 20 times in the ms. 

The problem is tricky because our survey used the term gender rather than sex. We have only used the term gender when referring to the survey questions and have explained why in a footnote at the start of the survey section. We forgot to mention that we had made this change in our last communication with the reviewer, but we hope you find this approach reasonable. Admittedly, a few other uses of the term ‘gender’ were added by accident in past edits and have now been removed.

We explain our position in the footnote on page 15:

“In our survey of academics, we used the term ‘gender’ rather than ‘sex’, although the rest of the paper is focused on sex. These two concepts may have different interpretations and connotations, with sex implying biology and gender implying a ‘social construct’. Transgender people constitute 0.6% of all US adults (Jones, 2021), so we suppose that in practice individuals’ gender is the same as their sex. As such, we do not think that changing terminology should change the interpretation of our results.”

 

IQ:

If we ALSO weigh in that the average difference is heavily contested (although probably correct), it seems counterproductive to get another 50% of academics presumably reading this to just ignore the whole study. In my estimation, the IQ thing doesn’t add anything anyway, because (1) again, this study is not about that; it doesn’t measure IQ, it’s all merits and editorship, and (2) you cannot (and you don’t) estimate any effects of IQ, so no comparisons of different possible effects can be made anyway.

We include discussion of IQ because it is an essential explanation for why women have poorer publication records and appear less likely to be hired to editorial boards. Without a mention of IQ, readers may fairly infer that sex bias in hiring causes men to have higher publication metrics, which we have not proved. Moreover, without an appreciation of IQ differences, readers may believe that women perform worse than men due to sexism (and so deserve to get onto editorial boards despite a poorer publication record). Thus, although we cannot measure IQ or its effects, it is an omitted variable likely to have important relationships with all our main variables: sex, H index and even years publishing.

Reviewer 2 critiqued our paper, noting that the existence of mean and variance differences in intelligence meant that multiple interpretations of our results were possible, which we had not previously made sufficiently clear. To ensure that these interpretations are fairly explained in the paper, we have incorporated a discussion of intelligence differences.

I understand this will put off readers, but I would like the paper to be honest with readers and not to mislead them. This is why I’m attempting to publish with OpenPsych.

If removing the discussion of intelligence is necessary to get this paper published, I would do so. However, given that it is a new critique of the paper, I hope it is not a necessary condition for us.

All accepted suggested changes:

Since the findings are clear, the title should not be expressed as a question but as a statement. Not revised.

A new title has been given that is not a question.

The tables are ugly because they have both horizontal and vertical lines, and do not follow APA format. I think the APA format is much better than the present style.

This has been fixed to match the APA format.

I still think it needs proofreading, preferably by a native English speaker. The language is unidiomatic and, to me at least, quite annoying to read, as the many suggested changes below show.

A third party has helped to proofread the manuscript, and all of the reviewer’s suggested language changes have been followed.

The central research question comes in the 2nd para: “These disparities pose a key question: to what extent do sex biases or sex differences explain different outcomes?” One would expect the following to focus on the question to be investigated, i.e. bias, but the whole next para is about cognitive differences. I think that is inconsistent with the focus of the paper, in particular as there are other possible group differences, such as personality, interests, and preferences, which are not mentioned at all. I recommend a short para that briefly reviews these different alternative causes, referring only to review and overview articles, and then focus on discrimination and possible ways that it might influence outcomes.

Before discussing cognitive differences, we discuss the potential consequences of differences in careerism. We have added another paragraph reviewing the literature on whether personality differences can explain sex differences in academic achievement.

46 maybe-> may be

Fixed

46 women have a greater interested in family-> women have a greater interest in family

Fixed

108 “have been urging their editors to improve the sex ratio in their boards” Improve is value-laden, which should be avoided in scientific writing unless specifically motivated. It is also unclear what it means, because depending on the ideals of the person and the current ratio in the particular context that person is referring to, she may feel that “improving the sex ratio” means increasing the number of males, increasing the number of females, or making the proportions equal.

Changed to “urging their editors to increase the representation of women on their boards”, avoiding any normative claims and improving clarity.

125-127 This sentence is a bit unclear

Changed from “Attempts to employ affirmative for women on journal boards may be helpful to create meritocratic representations if they are being discriminated against. However, if women are not discriminated against, affirmative action policies may reduce meritocracy in academia, creating the very problems of discrimination, affirmative action was meant to counteract. As such, stronger evidence on whether sex bias is at play is essential for judging whether affirmative action policies can be justified or are counterproductive.”

To

“Attempts to employ affirmative action for women on journal boards may be meritocratic if there is sex discrimination. However, if there is no discrimination, affirmative action policies may be counterproductive. Moreover, if affirmative action and sex bias favour the same sex, then affirmative action may aggravate inequities. As such, stronger evidence on whether sex bias exists is essential for judging whether affirmative action will improve meritocracy.”

 

130 I suggest “As mentioned, the variance in intelligence is higher amongst males, and their average also seems to be somewhat higher.”

Suggestion followed.

 

You should use the term ‘sex’ rather than ‘gender’, because your use of categories (male vs. female) indicates it is biological sex you are considering. In fact, you inconsistently use sex 20 times and gender 20 times in the ms.

The problem is tricky because our survey used the term gender rather than sex. We have only used the term gender when referring to the survey questions and have explained why in a footnote at the start of the survey section. We forgot to explain that we had made this change in our last communication with the reviewer, but we hope you find this approach reasonable. Admittedly, a few other instances of ‘gender’ were accidentally introduced in earlier rounds of review.

 

138 Which test?

Fixed – changed from “The reasoning for our test comes from Gary Becker’s taste discrimination model of the labour market (Becker, 1971).” to “Our reasoning comes from Gary Becker’s taste discrimination model of the labour market (Becker, 1971).”

 

 146 Doesn’t it also indicate that women are being favoured when Psychiatry journal boards hire?

Changed from “This result suggests women are not being discriminated against when Psychiatry journal boards hire.” to “This result suggests women are not being discriminated against when Psychiatry journal boards hire, and may even imply that women are being favoured.”

 

157 The authors found THAT women had fewer publications

Fixed – “that” has been added

 

158 No, not assistant professors, but hired or promoted to a position as professor, the equivalent status to tenured (or full) professor in the United States.

Fixed – I misunderstood the reviewer’s original remarks about the sentence, so it was not fixed properly last time.

 

159 Strumia found THAT

Fixed

162 there might be A sex bias

Changed from “These results suggest that women are unlikely to be discriminated against in hiring by universities, despite there being more male than female academics.” to “These results suggest that women are unlikely to be discriminated against in hiring by universities or even a bias against women, despite there being more male than female academics.”

165 You should not mix “gender bias” and “sex bias” unless (1) you intend a different meaning with these two terms AND (2) you clearly define that distinction (before used the first time).

Fixed

178 “unobserved ability, making a preference for one sex over another possibly meritocratic.”

Please develop this argument (if there is one)

Changed from “A caveat to these resume studies is that sex may be confounded with unobserved ability, making a preference for one sex over another possibly meritocratic.” To

“A caveat to these resume studies is that sex differences in hiring may not be caused by prejudice, but by statistical discrimination.”

 

181 “Firstly, women make up a higher proportion of these scholars”

Compared to what?

 Changed to “Firstly, women likely make up a higher proportion of academics in humanities than in STEM disciplines…”

192 “We thought it was also important to choose disciplines within a large range of political” -> within a wide range of political

Fixed – changed ‘large’ to ‘wide’

195 “left-wingers” Perhaps an unhappy term to use in a scholarly paper?

“Right wingers” and “left-wingers” changed to “right-wing individuals” and “left-wing individuals”

196 Change to “be prone to bias towards groups with low status, including women”

Fixed

198 Change to “Whilst a wide range of disciplines”

Large changed to wide.

226 Change to “suggesting that editorial boards’ apparent sex disparities could be close to the meritocratic ideal.”

What is “sex disparities”? What is “the meritocratic ideal”? Define! Define!

The sub-clause has been removed.

236 “From this ranking, we then took the top 30 journals from each discipline”

Makes sense!

“our results reflect whether there is bias in the elite of each discipline studied.”

Does not make sense! What do you mean?

The sub-clause is removed to avoid ambiguity.

Fixed – the word ‘that’ has been added

240 “We disagreed with the discipline label of some of the journals on Scimagojr....”

What’s the point of this? Probably better to just stick with the labeling that Scimagojr uses: Then it’s all on them! It probably makes no essential difference anyway!

The point of this was so we could be more confident in our analyses comparing disciplines and using disciplines as dummy variables. As mentioned in the text, the ‘Journal of Political Economy’ is an economics journal, not one used by political scientists. I hope you will consider this reasonable.

Author

245 Change to “Table 9”

284 Change to “Table 9”

Table is now capitalised, but no reference to a table is made on line 284.

281 Change to “were women, meaning that women were slightly less”

Fixed

 

299 “To deal with these extreme values we applied Tukey’s Fences...”

What is that?

An explanation of Tukey’s Fences is now given, along with the formula used to identify positive outliers.
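For readers unfamiliar with the method: Tukey’s upper fence flags a value as a positive outlier when it exceeds Q3 + k × IQR, where IQR = Q3 − Q1 and k is conventionally 1.5 (3 is sometimes used for “far out” values). The sketch below is only an illustration, not the authors’ code; the choice of k = 1.5, the quartile definition used by Python’s `statistics.quantiles`, the helper names and the example values are all assumptions.

```python
import statistics

def upper_tukey_fence(values, k=1.5):
    """Upper Tukey fence Q3 + k * IQR (hypothetical helper).

    Quartiles come from statistics.quantiles with its default
    'exclusive' method; other software may define quartiles
    differently and so give slightly different fences.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 + k * (q3 - q1)

def positive_outliers(values, k=1.5):
    """Return the values lying strictly above the upper fence."""
    fence = upper_tukey_fence(values, k)
    return [v for v in values if v > fence]

# Hypothetical h-index values, for illustration only.
h = [3, 5, 6, 7, 8, 9, 10, 12, 14, 95]
print(upper_tukey_fence(h), positive_outliers(h))
```

Only the upper fence is computed here because the thread refers specifically to positive outliers.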

 

327 “This suggests that any of the dependent variables...”

Fixed – ‘that’ is added.

354 “underrepresent to denote whether the fraction of women on editorial boards in a discipline is greater or less than female representation in the relevant population of academics...”

Changed as suggested

358 Change to “For comparison, we found a range of datasets representing the sex proportion amongst academics in the disciplines studied here.”

Change adopted

 

364 Change to “have sex proportions specifically for Anthropology or Political Science, so we use...”

Change adopted

 

388 Change to “within the Social Sciences”

Change adopted

 

416 Change to “in Figure 1”

Change adopted

 

418 Change to “Table 5”

Change adopted

 

460 “Our results are the opposite of what would be expected if women were being discriminated against, strongly suggesting that women are not discriminated against in hiring to editorial boards.”

Sentence removed

 

466 Change to “Women are under-represented in psychology editorial boards, and yet the women who do manage to get on the editorial boards dramatically underperform relative to the men that are on the board by 0.44 standard deviations.” And “In other words, women are underrepresented on psychology editorial boards relative to their proportion presence on faculty, but are still overrepresented relative to their merit”

Changes made.

 

 

527 I would suggest not using terms like “academia is a ‘leaky pipeline’”, because that is a charged and political concept, created to summon a particular impression about the causes of fewer women in certain domains.

Phrase removed.

 

623 Change to “This means that for every academic preferring men there were 11 who preferred women.”

Change adopted

 

626 Change to “This suggests that there is a large minority of academics that would act to discriminate against men to serve on editorial boards.” (They usually are not hired or paid, but serve free of charge)

Change adopted

 

629 Change to “Only 3% of respondents believed that journal editors should be biased in favour of men.”

Change adopted

650 Numbers < 12 should be written with letters, unless they constitute a precise value with a specified unit, such as “2.1 centimetres”, “4 Ångström”. Change to “Nearly three academics preferred young academics for every one that supported older academics”

Change adopted. However, numbers are not written in letters when denoting responses, because it would be confusing to switch from numbers to letters when there is a decimal point. This is consistent with your other requests, which include numbers not written in letters.

 

659 Change to “These differed significantly from 5 (p < 0.001), suggesting that academics thought that journals were biased in favour of men and older scholars.”
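The thread does not state which test produced “differed significantly from 5”. One common choice for comparing a mean response with a scale midpoint is a one-sample t-test, and the sketch below assumes that test. It computes the t statistic with the standard library and, as a further simplification, approximates the two-sided p-value with a normal distribution instead of the exact t distribution; that approximation is reasonable only for large samples such as the 231 survey respondents. The function name and the responses are hypothetical.

```python
import math
import statistics

def one_sample_t(responses, mu0=5.0):
    """t statistic for H0: mean == mu0, with a normal-approximation p-value.

    Hypothetical helper; the normal approximation to the t distribution
    is only adequate when the number of responses is large.
    """
    n = len(responses)
    mean = statistics.fmean(responses)
    se = statistics.stdev(responses) / math.sqrt(n)  # sample SD / sqrt(n)
    t = (mean - mu0) / se
    p = 2 * (1 - statistics.NormalDist().cdf(abs(t)))  # two-sided
    return t, p

# Hypothetical 0-10 responses (far too few for the approximation; usage only).
responses = [4, 3, 5, 2, 4, 5, 3, 4, 6, 4]
t, p = one_sample_t(responses)
```

A negative t indicates a mean below the midpoint of 5.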

“That” is now included.

 

660 Change to “So whilst academics are biased in favour of women and young people, they simultaneously believe other academics have the opposite bias.”

 

Change adopted

 

662 Change to “We speculate in the discussion that academics have such a strong anti-male bias that it deludes them into thinking that academia at large has the opposite bias.” OR “We speculate in the discussion that academics have a strong anti-male bias, and that this contrasts with reality in such a way that it appears to them that academia at large has the opposite bias.”

Change adopted – first option is used.

 

667 “diversity in questions 2 and 1 respectively.” What does this mean? Why in reverse order?

Section rephrased to improve clarity

 

670 Change to “With responses overwhelmingly closer to 10 than 0, it seems that academics place much value on diversity.” Was it specified which kind of diversity you referred to? E.g., in political preferences, in theoretical/scientific perspectives, in family values, in wealth, etc?

Paragraph changed to make clear I’m referring to age and sex diversity.

 

676 “had a greater aptitude for research, despite the fact men tend to receive more citations”. Is the point of mentioning this here that this fact was part of the question, e.g. “Do you believe that, considering the fact that male academics tend to receive more citations, men have a greater aptitude for research?”

Differences in citations were not mentioned in the question item. The subtext was that the surveyed academics were wrong because their sex bias prevents them from admitting the sex differences in academic aptitude. This was tested later in the paper: “We find belief in higher female aptitude (Question 6) correlates at 0.22 (p < 0.001) with a preference for hiring women (Question 4). This would support the idea that bias in favour of women is motivating bias regarding their ability and also discrimination in favour of women.”

Just a note, we did mention that “To ensure all respondents correctly interpreted the question as implying that a sex preference would be discriminatory and anti-meritocratic, we labelled the right end of responses “They should favor females above their academic accomplishments” and the left the same but for males.”

 

680 Change to “suggests academics believe that young scholars are just as good as older scholars.”

Change adopted – “that” is added

 

682 Change to “In Table 8 we present”

Change adopted

 

I have some concerns with the survey items, which are not ideally phrased in order to get at what you want. This type of ambiguity is unfortunately common. How do you think respondents interpret "important"? Is "important" equal to avoid/decrease or prefer/increase? So, what would a person who thinks that the sex composition should be all female or male rate? That it's important or not important? Or those who think that meritocracy should be the only factor? They may still think that "age diversity" is "important" only that it should fall out according to merits, or that there are highly merited people of all ages, and it's only because of age discrimination that this does not manifest. Such a person may answer "not important" in order not to suggest that "under-represented" age ranges be favoured - but still think such diversity is desirable! When we come to Q3 and higher I become even more confused. What does “Should journal editors have an age preference in hiring to editorial boards? (Pick 5 for no age preference)” mean? Apparently, 5 is no preference and 10 is “a preference”? But which preference? For higher, lower, equal, or perhaps varied ages? So what would then ratings from 0 to 4.99 stand for? It’s simply not defined, which might be why the mean is just above 5. But this construction of items causes confusion and ambiguity for both respondents and for interpreters of the results. This should be dealt with in the result section and discussed as a weakness in the discussion. So, I am really unsure about how to interpret the mean of 3.9 for “Q8. Do you think journal editors have a sex preference in hiring to editorial boards? (Pick 5 for no sex preference)”. Journal editors have a *negative* preference?! What’s that even supposed to mean? It would have been clear if the item was “editors have a preference for hiring females to editorial boards? (Pick 5 for no sex preference, and smaller values for a male preference)”.
These issues seem to be reflected in the vastly different distributions across items in Figure 2.

 

I had tried to deal with these problems by better explaining the questions and putting pictures of the questions in the supplementary materials. I have now added a paragraph to the discussion pointing out that ambiguities in the questions could cause problems, because on our 0-10 scale 0, 5 and 10 are defined but the meaning of the choices in between is not. With regard to the ambiguity of the terms ‘diversity’ and ‘importance’, the ambiguity of underlying intentions was part of the study design. This was because we could correlate responses to the questions on diversity’s importance with other questions, such as sex preference, to see whether a concern for diversity was related to male versus female preference.

The discussion paragraph on this issue is as follows:

“Another limitation of our survey, pointed out by a reviewer, is the possible ambiguity of our questions. In our questions we gave a 0-10 scale, with 0 and 10 labelled as extreme responses and 5 as intermediate. For example, in question 4 on whether editors should have a preference for women, 10 was labelled “They should favor females above their academic accomplishments”, 0 was given the same label but for men, and 5 was labelled as no preference. As such, the difference between 1-4 and 6-9 was not defined, although we meant higher numbers to represent more pro-female preferences. Some respondents may not have realised that these intermediate values represented different points on the dimension of pro-male to pro-female preferences. Nonetheless, we do not think any ambiguity in our questions has distorted our results. Respondents were given the opportunity to give feedback, but did not comment that the scale of our questions was confusing. Furthermore, a visual inspection of the results in Figure 2 shows smooth distributions, with modal answers not always being 0, 5 or 10, suggesting respondents correctly interpreted the other values on our 0-10 scale.”
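The coding convention described in this paragraph (0 = favour men, 5 = no preference, 10 = favour women, with intermediate values read as points along that dimension) can be made concrete. The sketch below is a hypothetical helper, not taken from the authors’ supplementary files; it assumes every response below 5 counts as pro-male and every response above 5 as pro-female, and it returns the female-to-male ratio of the kind reported in the survey results (11 to 1). The example responses are invented.

```python
from collections import Counter

def classify(response):
    """Map a 0-10 sex-preference response to a category (assumed coding)."""
    if not 0 <= response <= 10:
        raise ValueError("responses must lie on the 0-10 scale")
    if response < 5:
        return "pro-male"
    if response > 5:
        return "pro-female"
    return "no preference"

def preference_counts(responses):
    """Count each category and compute pro-female per pro-male respondent."""
    counts = Counter(classify(r) for r in responses)
    if counts["pro-male"]:
        ratio = counts["pro-female"] / counts["pro-male"]
    else:
        ratio = float("inf")  # no pro-male respondents at all
    return counts, ratio

# Hypothetical responses, for illustration only.
counts, ratio = preference_counts([5, 5, 6, 7, 10, 5, 3, 8, 5, 6])
```

Treating any departure from 5 as a preference is exactly the interpretive step the limitation paragraph flags, so the ratio depends on respondents having read the intermediate values as intended.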

 

 

693 “This could indicate that bias against men is so strong amongst academics that they refuse to believe in greater male academic ability.” This sounds like a discussion/interpretation point. Could you structure the arguments more clearly, in particular in results/discussion?

Sentence removed and interpretation is moved to the discussion.

 

741 Change to “In the regression results, we found that controlling for years publishing reduces the male advantage in research output.”

Changed – “that” is added

 

742 Change to “We are uncertain about the reasons for this, but suggest that (1) older scholars have had more time to publish papers, (2) younger cohorts of scholars are less productive than older ones, and (3) journals have a pro-old age bias.”

Changed, but re-written “(3) journals may have a pro-old age bias” to make clearer that I’m very uncertain about this hypothesis.

 

750 “men being better at academic research”. This is an unfortunate formulation, because it implies essentialism and inherent causes. Even if that may be the case, you have not presented any evidence for it, and it is essentially irrelevant in relation to the empirical questions treated in this ms. I suggest “The regression results are inconsistent with anti-female discrimination but support the presence of anti-male discrimination and higher academic performance amongst men”.

The phrase better at academic research is changed to “higher performance amongst male academics”

 

753 Change to “they were eleven times more likely to support discrimination”

Change accepted

 

776 Change to “It is possible that whilst our”

Change made

 

782 Change to “Moreover, academics at elite institutions are overwhelmingly left-wing, which is associated with having pro-female preferences (Winegard et al., 2020), suggesting that editors...”

Change made

 

801 Change to “Academics who do not support affirmative action for women or diversity might be shunned or even ‘cancelled’ by their overwhelmingly left-wing colleagues”

Change made

 

809 Change to “Furthermore, we found that those who were more strongly biased against men also more strongly believed academia to be biased against women.”

 

814 Change to “Given that anti-male bias is so common and accepted, this could explain why our results are consistent with anti-male bias despite anti-female bias being a more popular theory with academics.”

 

In light of another request to keep this issue in the discussion, this section has been changed more broadly.

 

819 Change to “We cannot determine whether editorial boards have previously exhibited a bias against women, because our data are not longitudinal, but we can be reasonably confident that they do not now.”

Change made

 

Bot

Authors have updated the submission to version #8

Reviewer

Review of No Fair Sex in Academia: Evidence of Discrimination in Hiring to Editorial Boards v. 8

The authors have now responded comprehensively to my comments. I found a few minor things that could improve the ms.

34 -> “Only 2% of the individuals considered to be ‘eminent’ in science, before 1950”

137 These sentences are confusing if the reader is not fully aware that you’re referring to the sex of the actual board members specifically – if it refers to academics in general, the conclusions would be the opposite. So to avoid confusion you might want to be overexplicit, like “Thus if women on boards have a higher academic output, despite their lower variance in IQ, we can be confident that there is anti-female bias for admission to the board. We can also say that the larger the sex difference in favour of men on boards, the lower the likelihood of anti-female bias and the higher the likelihood of anti-male bias for admission to the board. So if men on editorial boards have a higher academic output than women we can be confident that there is no anti-female bias for admitting board members.”

272 -> “scaled into standard deviation units as Z-scores, according to” or even better  

271 -> “were first log10 transformed and then Z-transformed into standard deviation units within each academic discipline”

423 Very confusing to look at a graph with a distribution mean of 0 and the caption reads “Distributions of Log10 Transformed h-Index”. Please add that the data are z-transformed.
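The two-step transformation in the suggested wording (log10 first, then Z-scores within each discipline) explains why the plotted distributions are centred on 0. A minimal sketch, assuming each record is a hypothetical (discipline, h_index) pair with a strictly positive h-index (how zero h-indices were handled is not stated in this thread) and assuming the sample standard deviation is used within each discipline:

```python
import math
import statistics
from collections import defaultdict

def log10_z_within(records):
    """Log10-transform h-indices, then Z-score them within each discipline.

    records: list of (discipline, h_index) pairs; every h_index must be > 0
    and every discipline needs at least two records (for the sample SD).
    Returns the Z-scores in the same order as the input records.
    """
    logged = [(d, math.log10(h)) for d, h in records]
    by_discipline = defaultdict(list)
    for d, x in logged:
        by_discipline[d].append(x)
    params = {d: (statistics.fmean(xs), statistics.stdev(xs))
              for d, xs in by_discipline.items()}
    return [(x - params[d][0]) / params[d][1] for d, x in logged]

# Hypothetical records, for illustration only.
records = [("Economics", 10), ("Economics", 100),
           ("Psychology", 1), ("Psychology", 10), ("Psychology", 100)]
z = log10_z_within(records)
```

After this step every discipline has mean 0 and standard deviation 1 on the log scale, which is why a caption that mentions only the log10 transformation can mislead.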

Table 6. It is still not clear to me what the function of the numbers 1-12 is; previously I thought it was to identify the same horizontal indices. The F values still seem much too high. I ran several MRAs with similar data, and got for example R2 = 0.048, F = 4.66, p = .0013 and R2 = 0.086, F = 5.016, p = .0017. So regardless of higher or lower R2, F is always much lower than your values. I’m not saying you’re wrong, but please make sure you’re not.
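One way to check the Table 6 F values is the standard identity for the overall F test of an OLS regression with k predictors plus an intercept: F = (R²/k) / ((1 − R²)/(n − k − 1)). Because the residual degrees of freedom n − k − 1 grow with the sample, the same R² yields a much larger F with several thousand board members than with a few hundred observations. The sketch is illustrative only; k = 4 and the two sample sizes are hypothetical, not read from Table 6 or from the reviewer’s own analyses.

```python
def f_from_r2(r2, n, k):
    """Overall F statistic for OLS with k predictors plus an intercept."""
    if not 0 <= r2 < 1:
        raise ValueError("r2 must be in [0, 1)")
    df_resid = n - k - 1
    if k < 1 or df_resid <= 0:
        raise ValueError("need k >= 1 and n > k + 1")
    return (r2 / k) / ((1 - r2) / df_resid)

# Same R^2 at two hypothetical sample sizes (k = 4 is also hypothetical).
small = f_from_r2(0.048, n=375, k=4)
large = f_from_r2(0.048, n=4384, k=4)
```

So a high F alongside a low R² is not by itself a sign of error; recomputing F from each reported R², n and k is a quick consistency check.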

600 One cannot see the x-axis in the PDF – obscured by the “Note”

600 I assume you mean “For questions regarding age and sex preference, lower scores indicate”

But that is the opposite of “we labelled the right end of responses “They should favor females above their academic accomplishments” and the left the same but for males”, no?

735 “uncertain about the reasons for this, but suggest that (1) older scholars have had more time..” Mustn’t this explanation also include sex somehow?

 

Bot

Authors have updated the submission to version #9