OpenPsych forums
[ODP] An update on the narrowing of the black-white gap in the Wordsum - Printable Version

+- OpenPsych forums (https://www.openpsych.net/forum)
+-- Forum: Forums (https://www.openpsych.net/forum/forumdisplay.php?fid=1)
+--- Forum: Submissions (https://www.openpsych.net/forum/forumdisplay.php?fid=2)
+---- Forum: Withdrawn submissions (https://www.openpsych.net/forum/forumdisplay.php?fid=10)
+---- Thread: [ODP] An update on the narrowing of the black-white gap in the Wordsum (/showthread.php?tid=168)



MH #50 - Emil - 2014-Dec-16

Quote: Your main claim was that anything below .99 is not normal. You did not specify any condition. Now you specify a condition: that N should not be small. Either way, this means your first statement was plain wrong.

Not plain wrong, but restricted to large datasets.

Quote:The sample sizes are large. Histogram shows normal distribution, but W values are all below 0.99; age has 0.970 and cohort has 0.984.

I could also have said there is another thing that is not right with your W>0.99 claim: in my multilevel analysis, the W values for the level-2 residuals were around 0.94-0.95, yet the histograms show a nearly normal distribution. The same goes for the Q-Q plots. There was just a small-to-modest deviation from normality.

There is nothing wrong with the .99 guide. Your judgments based on Q-Q plots do not show much. Subjective judgments are bad in science (and everywhere else). If there is an objective way, use that instead.

Yes, there is deviation from normality. That is what W shows. What are we disagreeing about? First you say .99 doesn't work even for large datasets. Then you agree that the two variables with W of .94-.95 show small to modest deviation from normality. Which is it?
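As a quick sanity check of the .99 guide (my own sketch, not from the paper): for data that really are normally distributed and N is large, W stays very close to 1. Note that R's shapiro.test() caps the sample size at 5000.

Code:
set.seed(1) # reproducibility
# W for 100 genuinely normal samples of the maximum allowed size
w <- replicate(100, shapiro.test(rnorm(5000))$statistic)
summary(w) # in my runs, the values cluster well above 0.999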

Quote: But first... Why are you so nitpicking...? I said already that my own method actually works. If you don't like it, then it's just like saying "Meng, you use Times New Roman for your style but I prefer Arial, so unless you switch back to Arial, I must disapprove your paper". If that was merely advice, I will thank you for it, because you are a fast learner, unlike me, and I may need help with R (sometimes). However, I hope it's not a requirement for publication. Because all of your objections (aside from the "swilk" thing) are about "change the placement of the graphs", "make the syntax look more stylish", "don't type the numbers of your parameters", etc. All of them are unreasonable requirements, in my opinion. It's, after all, just a question of taste. Nothing to do with the content of the article (method, theory, etc.).

Writing bad code makes it hard for others to follow your method if they wish to examine the code at a later point. And when you hard-code stuff, you risk introducing errors at every point. (Case in point below.)

I have not said that it was a requirement for publication.

As with John, I don't understand multi-level regression, so I'm not competent to judge that part of the method.

But you should fix the figures before publication. It makes no sense for them to be in the back. It's like reading working papers in economics. It is debatable whether the tables should be in the text as well. I am ok with them being in the back.

Quote: I know that. But I didn't want to use it at first. I wanted to "keep" only cases with age<70, but not to remove cases with missing values on wordsum. They may contain some useful information (e.g., people having wordsum may have different values on certain variables than people who don't). I prefer to use a function like "do if". In SAS, Stata and SPSS you can easily do it.

Well, you can of course have both datasets. I don't see the problem here.

Quote: It works for you because you remove all the missing data before. I prefer to run the regression using a statement like "do if not missing x1 x2 x3 etc".

I'm not aware of any R command like that. Don't try to force Stata (or other) commands and syntax onto R. Use R's own commands and syntax to solve the problems.
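For what it's worth, the closest R idiom I know of is to filter on complete cases for just the model variables, which leaves the full dataset untouched (a sketch with hypothetical variable names x1, x2, x3):

Code:
# analogous to Stata's "if !missing(x1, x2, x3)": the full data frame d
# is kept; only the model is restricted to rows without missing values
vars <- c("wordsum", "x1", "x2", "x3") # hypothetical variable names
ok <- complete.cases(d[, vars])        # logical index of usable rows
fit <- lm(wordsum ~ x1 + x2 + x3, data = d[ok, ])
# lm() also drops incomplete rows by itself via its default na.action:
fit2 <- lm(wordsum ~ x1 + x2 + x3, data = d, na.action = na.omit)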

Quote: However, you use your own data. Not mine.

For convenience. I already have my own data loaded and my code is otherwise good.

However, for completeness:

Code:
source("merger.R") #my functions
entiredata = read.csv("GSSsubset.csv", stringsAsFactors=FALSE) #load file, not with factors

d = subset(entiredata, age<70)

d$weight<- d$wtssall*d$oversamp # interaction variable

d$bw<- rep(NA) # generate variable with only missing values
d$bw[d$race==1] = 1 # assign a value of "1" under the condition specified in bracket
d$bw[d$race==2] = 0 # assign a value of "0" under the condition specified in bracket
d$bw0 = rep(NA)
d$bw0[d$year>=2000 & d$race==1 & d$hispanic==1] = 1
d$bw0[d$year>=2000 & d$race==2 & d$hispanic==1] = 0
d$bw00 = rep(NA)
d$bw00[d$year<2000 & d$race==1] = 1
d$bw00[d$year<2000 & d$race==2] = 0
d$bw1 = d$bw0 # combine the two vars, by incorporating the first var
d$bw1[is.na(d$bw0)] = d$bw00[is.na(d$bw0)] # then the second, without NAs

d$agec = d$age-mean(d$age) # mean-centering of age (the hard-coded mean of 40.62 was wrong)
d$agec2 = d$agec^2 # squared term of age
d$agec3 = d$agec^3 # cubic term of age
d$bw1agec = d$bw1*d$agec
d$bw1agec2 = d$bw1*d$agec2
d$bw1agec3 = d$bw1*d$agec3

d$wordsum[d$wordsum==-1] = NA
d$wordsum[d$wordsum==99] = NA

require(VGAM) #multilevel stuff
library(lattice) #xy plot

#subset it
d1= subset(d, select=c(wordsum, bw1, age, agec, agec2, agec3, bw1agec, bw1agec2, bw1agec3, weight))
d1 = d1[complete.cases(d1),] #keep only complete cases
#22156 rows

R_rocks<- vglm(wordsum ~ bw1 + agec + agec2 + agec3 + bw1agec + bw1agec2 + bw1agec3, tobit(Upper=10), data=d1, weights=d1$weight)
summary(R_rocks)

d1$wordsumpredictedage = 5.210807+1.355356*d1$bw1+-.0051079*d1$agec+-.0016632*d1$agec2+.000017*d1$agec3+.0169752*d1$bw1agec+.0002868*d1$bw1agec2+.0000104*d1$bw1agec3
coefs.mh.manually.typed = c(1.355356,.0051079,.0016632,.000017,.0169752,.0002868,.0000104)
coefs.extracted = coefficients(R_rocks)[-c(1,2)] #dont get intercepts
hist(coefs.extracted-coefs.mh.manually.typed) #what are the differences?
#pretty small

d1$fitted = predict(R_rocks)[,1] #fitted values; we only want the first column
plot(d1$fitted,d1$wordsumpredictedage) #plot and compare
cor(d1$fitted,d1$wordsumpredictedage) #correlation
#.9986497 the rest perhaps being due to imprecision in typing the coefficients off above

#fancy plot
xyplot(d1$fitted ~ d1$age, data=d1, groups=bw1, pch=19, type=c("p"), col = c('red', 'blue'), grid=TRUE, ylab="Wordsum predicted by tobit", xlab="age", key=list(text=list(c("Black", "White")), points=list(pch=c(19,19), col=c("red", "blue")), columns=2))


I found some errors in the code. E.g. you had calculated the wrong mean when re-centering. Again, you had inputted values manually instead of using a function. Bad.

The coefficients you typed in are not identical to those in the summary output, perhaps because I fixed the re-centering issue. They are, however, very similar.

Plot attached. It seems normal.

The largest problem was loading the data file correctly. In your code, you load a CSV file; however, you linked to a SAV file (SPSS). I tried loading the SPSS file, which works, but it treats some columns as factors (bad). I then had to open SPSS, export to CSV, and load the CSV with stringsAsFactors=FALSE to get rid of the factors.
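For the record, the SAV file can also be read directly in R without converting to CSV; a sketch using the foreign package (the factors come from SPSS value labels, so use.value.labels=FALSE avoids them; the filename is my assumption):

Code:
library(foreign) # ships with R
d <- read.spss("GSSsubset.sav", to.data.frame = TRUE,
               use.value.labels = FALSE) # keep numeric codes, not factors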


RE: [ODP] An update on the narrowing of the black-white gap in the Wordsum - Meng Hu - 2014-Dec-17

(2014-Dec-16, 05:40:59)Emil Wrote: There is nothing wrong with the .99 guide.


Everything is wrong with it if I can show that I can get a normal distribution with a W value below 0.99. And I did.

Quote:Your judgments based on Q-Q plots do not show much. Subjective judgments are bad in science (and everywhere else). If there is an objective way, use that instead.

Your comment assumes that the SW test is objective. But it can't be, and you know that. There is no universal value which tells you what is normal and what is not.

Your comment is weird. I don't need GDP to tell me that Africans are poorer than Europeans, for the exact same reason that I don't need the W value of the SW test to tell me whether my distribution is normal or not. I can separate the rich from the poor by looking at them, exactly the same way I can tell what kind of distribution I have from a quick look at the histogram. And anyone who tells me he needs GDP or any other number to know whether Africans are poor is beyond help. There is nothing universal in science. You ask the impossible, or you mischaracterize science. If there is no certainty in subjective judgment ("I recognize a poor person when I see one"), there is no more certainty in the numbers, with their arbitrary cut-offs and the differing suggestions made by scientists based on different models and simulations. It's just a matter of precision. If I have trouble sorting the poor from the rich, that means there is a high probability their wealth is not very different. The same is true if I hesitate between normal and non-normal distributions: this implies there is certainly some non-normality, maybe slight or moderate, but there is definitely a little something that is non-normal.

You even knew that the "universal" p-value cut-off of 0.05 is arbitrary, yet you are ready to believe that your value of W=0.99 is universal? Probably not. Then your guide is equally subjective. But if you don't believe W=0.99 is universal, why are you telling me that subjective judgments on Q-Q plots and histograms are bad? They cannot be worse than using your guide, which would certainly prove most dangerous for me. Most researchers, if not all, would kill me if they learned that I used a rule, applied to a statistic, that was recommended by someone who has not published his paper in a peer-reviewed journal. For a cut-off value of 0.99 this important in statistics, it's clear that nobody will accept your judgment if it can't pass peer review. So I have every reason to think that your suggestion to cite your blog post and follow your cut-off is extremely dangerous. And not a reasonable request.

Quote:Writing bad code makes it hard for others to follow your method if they wish to examine the code at a later point. And when you hard-code stuff, you risk introducing errors at every point. (Case in point below.)

I don't know how many times I have to repeat it. Your comment assumes you don't trust me, that is, you think I must necessarily have made a mistake somewhere. Why not rerun the syntax, if you want to see? Like I've said, I examined the numbers many times before. Yes. Too many times.

And it's not bad code. Perhaps in a subjective way, but not in an objective way. Code is objectively "bad" or "wrong" only if it produces erroneous results. All of my results are correct.

Quote:As with John, I don't understand multi-level regression, so I'm not competent to judge that part of the method.

It is still possible to comment on this stuff. If, for example, there are claims that are not backed by references, you can mention them, or if you don't understand something, ask for clarification. I have cited everything I think I could. The only change I will make (unless someone points out something else) concerns the paragraph on the intercept-slope correlation. Considering the emails I have received from Snijders and Hox, that correlation can be interpreted as a standard correlation, bounded between -1 and +1, but only when the random effects are not constrained (e.g., to be zero). That is the case in my study. So I will rectify this later.

Quote:But you should fix the figures before publication. It makes no sense for them to be in the back. It's like reading working papers in economics. It is debatable whether the tables should be in the text as well. I am ok with them being in the back.

In what way does it make no sense? According to you and Dalliard, the reasons you gave me are clearly a matter of taste, and thus entirely subjective, not objective. You two said, basically, that it is a bother and a waste of time to scroll down to the end, which only takes 1-2 seconds. As you said before, if someone wants to read and check the syntax, he will look at the supplementary files, so there was no problem in having one PDF for the article and one for the supplementary files. But I can say the same thing about the graphs. Where I have put the graphs does not make them more difficult to read. What can make them difficult to read is a missing title, a misleading title, no legend, or insufficient information in the legend. But no one here has complained about that.

Quote:I found some errors in the code. E.g. you had calculated the wrong mean when re-centering. Again, you had inputted values manually instead of using a function. Bad.

What do you mean by re-centering? I centered, but I did not re-center. I also don't understand "inputted values". If you mean imputed values, I didn't do anything like that. It seems you are saying the value 40.62 was the problem. Once again, I am not wrong. In my article, I said I took the mean age for people having a wordsum score, and I applied sampling weights in all of my analyses. See below:

Code:
. summarize wordsum age logincome educ degree year cohort [aweight = weight] if !missing(wordsum)

    Variable |     Obs      Weight        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------------------
     wordsum |   23817  24541.0623    6.013715   2.102101          0         10
         age |   23817  24541.0623    40.62188   13.87005         18         69
   logincome |   21797   22231.922    10.12833   .9611794   5.501258   11.95208
        educ |   23775  24496.7364    13.09723   2.897454          0         20
      degree |   23779  24501.2668    1.396848   1.142989          0          4
-------------+-----------------------------------------------------------------
        year |   23817  24541.0623    1992.703   11.16641       1974       2012
      cohort |   23817  24541.0623    1952.081    17.3142       1905       1994



Quote:d1$fitted = predict(R_rocks)[,1] #fitted values, you only first the first column

This is not what I did. What is the purpose of [,1] here?

Anyway, you're just showing me that I was right about R. No, R doesn't rock; it sucks. When you request fitted() without [,1], you actually get the weird plot I had before. Like I've said, R is annoying. It sucks when you have missing data. Even when you correct for missing data, you still have problems, because you first need to know that fitted() must be used in conjunction with [,1]. You knew that, but I didn't. You'll never, never, never have this sort of problem in other software, where the procedure is straightforward. But R is silly.
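For context, a sketch of what is going on here (assuming VGAM's default tobit parameterisation): vglm() with the tobit family fits two linear predictors, the mean mu and the log of the standard deviation, so predict() returns a two-column matrix rather than a vector; [,1] picks out the mu column.

Code:
library(VGAM)
fit <- vglm(wordsum ~ bw1 + agec, tobit(Upper = 10), data = d1)
p <- predict(fit) # matrix with one column per linear predictor
colnames(p)       # "mu" plus a log-scale column for the standard deviation
head(p[, 1])      # the fitted means: this is what [,1] selects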

Quote:The largest problem was loading the datafile correctly

You didn't pay any attention to my syntax. Remember:

entiredata<-read.csv("GSSsubset.csv")

It's a CSV file, not a SAV file. This means I had already converted my SAV file into a CSV file. It has no problem with factor variables because everything in it is numerical.
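One caveat worth adding here (my note, not from the thread): read.csv() only ever creates factors from character columns, and whether it does so by default depends on the R version in use (before R 4.0.0, stringsAsFactors defaulted to TRUE). Numeric columns are read as numeric either way, so being explicit costs nothing:

Code:
str(read.csv(text = "x,y\n1,2\n3,4")) # numeric columns stay numeric
d <- read.csv("GSSsubset.csv", stringsAsFactors = FALSE) # explicit is safest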


RE: [ODP] An update on the narrowing of the black-white gap in the Wordsum - Meng Hu - 2014-Dec-19

I have attempted some simulations. See below. Something is definitely odd with the Shapiro-Wilk test. I do not recommend this test.

Code:
a<-rnorm(2000,0,2) # three normal samples with different means and SDs
b<-rnorm(500,-2,4)
c<-rnorm(1500,4,4)
x<-c(a,b,c) # mixture of the three (n = 4000)
hist(x, freq=FALSE)
shapiro.test(x)


From the histogram above, almost everyone can easily agree that x is normal or roughly normal. But not according to the SW "W" value, if I use your cut-off. I recommend repeating the copy-paste in R. You always see the same thing: W is always below 0.99, and the histogram always looks the same; slight skew and slight kurtosis.

Code:
x<-rnbinom(5000,7,.3)
hist(x)
shapiro.test(x)


Repeat the copy-pasting again. You see the distribution is skewed. It's modestly non-normal, and most people will agree with my reading of the histogram. But the W value is always around 0.96.

Code:
x <- rlnorm(5000,3,.22)
hist(x)
shapiro.test(x)


I left the best for the end. Try it several times, and if possible, many times. Now, I attach two graphs, sw1 and sw2. (EDIT: there is no title in the graphs. Anyway, sw1 is the graph on the left, attachment 577, and sw2 is on the right, attachment 578.) Most of the time, the above syntax generates something very close to sw1. But sometimes you get something like sw2. You can tell by "eyeballing" that the two distributions are not alike. You will surely agree that most people can easily accept sw1 as fairly normal, but you will surely also agree that most people won't accept sw2 as normal, because of its high kurtosis.

But wait... look at the W value in the two pictures. It is always the same, always around 0.97. This tells us that SW cannot detect an important source of non-normality, e.g., kurtosis. So the histogram now has three features superior to the SW test:

1. Most (if not all) people use the SW test with the p value, not the W value.
2. The histogram tells you which distribution you have, and can guide you toward the most appropriate method.
3. The SW test cannot distinguish departures from normality that the histogram definitely can.

As for the cut-off value of 0.99: because it's a "guide" and a "recommendation", I think it definitely needs to pass peer review. Otherwise, it will be difficult for researchers to agree on that "rule" if eminent experts can show it to be wrong.

But I haven't found anything yet about a cut-off for the W value. If it comes to that, I will perhaps consider removing it. In any case, the SW test adds nothing beyond the histogram and P-P/Q-Q plots.
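For readers who want the graphical checks mentioned here, a minimal sketch of the usual pair, reusing the log-normal example above:

Code:
x <- rlnorm(5000, 3, .22)
par(mfrow = c(1, 2)) # two panels side by side
hist(x, freq = FALSE) # overall shape of the distribution
qqnorm(x); qqline(x)  # tail departures from normality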

Anyway,

I said before :

Quote:There is nothing universal in science

I should have been explicit : "in social science".

I have also said I will publish a blog article concerning the use of Yhat. But I changed my mind. I've decided to publish it right now.
http://humanvarieties.org/2014/12/18/how-to-calculate-and-use-predicted-y-values-in-multiple-regression/

EDIT2: I got a response from Kenneth C. Land (a nice guy, I can tell). He agrees with my conclusion that the age effect with respect to the black-white difference was wrongly taken as a cohort effect. So the tobit/OLS regressions were clearly wrong, and multilevel regression is the correct procedure.


RE: [ODP] An update on the narrowing of the black-white gap in the Wordsum - Chuck - 2014-Dec-22

Quote:EDIT2: I got a response from Kenneth C. Land (a nice guy, I can tell). He agrees with my conclusion that the age effect with respect to the black-white difference was wrongly taken as a cohort effect. So the tobit/OLS regressions were clearly wrong, and multilevel regression is the correct procedure.

I suspected as much. Conceptually, I now have no problem with the analysis.

Regarding the paper -- you're not going to like this -- I think you should rewrite some of it so as to emphasize the novelty of the analysis. Basically, what you did was tack a multilevel regression analysis onto your original analysis. But you don't make the point of the paper clear.

What you should do is clearly state:

H&H's method was incorrect. Multilevel regression should be used. I will demonstrate the methodological effect. First, I replicate H&H's findings using OLS, etc. Second, I analyze the data using the more appropriate multilevel regression. With this, H&H's findings don't replicate: there is no Wordsum gap narrowing. This illustrates the importance of using the most appropriate method.

Edit: If you need help rewriting the paper, let me know.

(Also, since you have all the code, maybe you could write a quick follow-up paper looking at the White-Hispanic gap using generation, cohort, age, and period. I don't think we have hitherto been properly disentangling the effects.)


RE: [ODP] An update on the narrowing of the black-white gap in the Wordsum - Meng Hu - 2014-Dec-22

I would like you to tell me what you want me to add, modify, change, or remove (if there is anything). If I read it correctly, you only asked that I modify the abstract. Is there anything else?


RE: [ODP] An update on the narrowing of the black-white gap in the Wordsum - Chuck - 2014-Dec-22

(2014-Dec-22, 23:39:48)Meng Hu Wrote: I would like you to tell me what you want me to add, modify, change, or remove (if there is anything). If I read it correctly, you only asked that I modify the abstract. Is there anything else?


I would like you to modify the discussion, conclusion, and abstract. I would like you to be clearer about what you are doing and about the novelty and importance of your research. I would like you to make clear that the other methods were sub-optimal. This point does not come across in the HV post or the paper. When I read both, the impression I get is: "Well, different methods, different results and maybe the B/W cohort narrowing wasn't as large as thought." And I think: "This Meng Hu guy is really conscientious (if not excessively so), checking so many different models. Kind of a waste of time, though." What I want to get is: "Hey, the proper way of conducting the analysis is Multi-level regression. I will demonstrate. When we do it, we see no narrowing. Readers, make sure to check your method. " And I think: "This Meng Hu guy really knows his stats."


RE: [ODP] An update on the narrowing of the black-white gap in the Wordsum - Chuck - 2014-Dec-23

(2014-Dec-22, 23:53:51)Chuck Wrote: I would like you to modify the discussion, conclusion, and abstract.


start here:

A. "The aim of this article is to provide an update to Huang & Hauser's (2001) study. I use the General Social Survey (GSS) to analyze the trend in the Black-White difference in Wordsum scores by survey years and by cohorts. Tobit regression models show that the Black-White difference diminishes over time by cohorts but just slightly by survey years. That is, the gap closing is mainly a cohort effect. The Black-White narrowing may have ceased in the last cohorts and periods. A multilevel regression is performed to examine if the result still holds, by modeling age as fixed effects and cohort as random effects. There was no racial gap narrowing. Explanations for these conflicting results are provided."

Try ~

B. "The aim of this article is to demonstrate the importance of using multilevel regression when analyzing cohort data. To show this, I analyze the Black-White difference in Wordsum scores by survey years and by cohorts, using the General Social Survey (GSS). Replicating Huang & Hauser's (2001) findings, I find a substantial narrowing of the difference when using Tobit regression models. However, when using the more appropriate multilevel regression models, which disentangle cohort from age effects, I find no such narrowing. An explanation for the difference in results is provided."

(The discussion and conclusion should clearly convey the same points as in B.)


RE: [ODP] An update on the narrowing of the black-white gap in the Wordsum - Emil - 2014-Dec-23

The reason I have not followed up is that MH was getting overly emotional and hostile. I thus decided to let him cool off for a bit before replying.


RE: [ODP] An update on the narrowing of the black-white gap in the Wordsum - Chuck - 2014-Dec-23

(2014-Dec-23, 05:31:46)Emil Wrote: The reason I have not followed up...


This isn't how his replies read to me; but my theory of mind interpretations are subject to change with my mood.

Let me clarify a point. You said:

"As with John, I don't understand multi-level regression, so I'm not competent to judge that part of the method."

I understand the method conceptually, and I can generally recognize when it's called for. (In contrast, H&H's reviewers didn't.) I don't, though, grasp the finer nuances of the technique, so I won't be able to catch technical errors. This would be like not understanding certain technical aspects of OLS, say regarding normality assumptions, despite being able to make sense of OLS, grasping the basics, and being able to identify clear nonsense. I do trust MH's judgement regarding these technical points; we usually disagree when it comes to more theoretical issues, like the meaning of a Jensen effect.


RE: [ODP] An update on the narrowing of the black-white gap in the Wordsum - Meng Hu - 2014-Dec-24

Chuck, I will make the modifications. Note that there is a discussion section, but no conclusion section. I don't think I need to add a conclusion section, on top of what will be written in the discussion section.

Also, concerning H&H, I think most people didn't understand it either at the time. The use of multilevel regression has been recommended only recently, by Yang & Land in 2004 and 2006. However, there is also an article by Miyazaki & Raudenbush (2000), "Tests for Linkage of Multiple Cohorts in an Accelerated Longitudinal Design", where they attempt to separate age and cohort effects, although that study uses a longitudinal survey (the NLSY79, if I read it correctly). It seems to me that today Hauser is fully aware of this problem, because I saw his name on a paper on the hierarchical APC model written by Yang and/or Land.

EDIT: Later, I will attach the emails (in a doc file) I received from Land, West and Snijders concerning my questions on multilevel regression. Land had only two comments. The first is that I should use the specific calendar years that define the cohorts. What he has in mind is probably what he did in all his papers on the GSS data, i.e., 1905-1910, 1911-1915, 1916-1920, 1921-1925, etc. I didn't do it like this because I was worried about the sample sizes (extremely small in the first and last cohorts, if you do it as he asked). He also said he would prefer to use age, age^2, age^3, race, race*age, race*age^2, race*age^3, rather than a series of dummy variables, because it's simpler to interpret. I responded that I had done this, but the results were the same. He replied that he expected this, and that there was nothing surprising in these patterns.

(2014-Dec-23, 05:31:46)Emil Wrote: The reason I have not followed up is that MH was getting overly emotional and hostile. I thus decided to let him cool off for a bit before replying.


Maybe a little bit, but at the same time, you requested a lot of very unreasonable things.

I said: "my R syntax may not be "elegant" to you, but it produces correct results".

You said: "no, it's bad, I don't like it, and it's error-prone, so you should modify it".

I responded: "your answer sounds like you suspect me of having made errors in typing the numbers; if so, check the numbers yourself rather than keep saying that the syntax is wrong and bad".

It's not an exaggeration to say this whole discussion is pointless and a waste of time. And I don't really want to waste time on something like this. Normally, this kind of advice should be given elsewhere, e.g. in "Other Discussions".

Another weird request concerns the figures. When I make a request about the shape and presentation of an article, it's not based on whether I like it or not. The way you usually cite references (i.e., [1], [2], [3], etc., instead of Kirkegaard et al. (2014), etc.) in all of your papers is very annoying to me, but I never said anything about it, because, as I said, it's subjective, so it should not be a requirement. On the other hand, if I think the tables and figures would harm the credibility of the journal (e.g., using a screenshot from a blog post), I will probably say so, and that is a reasonable request. But concerning where I should put my figures, I decided this based on a certain logic: I never saw anyone else do what you requested. Either figures and tables are included within the text, or they go at the very end of the paper. The only reason you have given is that the figures are more important. I wonder in what way. Figure 8 and table 9 are equally important. Figure 8 only plots the parameter estimates given in table 9, where the change in the B-W gap is already clearly visible (see column "Coeff. β1j").

You also said things that are very contradictory, such as that we cannot tell from a histogram whether the distribution is normal or not. If it weren't you, I would say you're joking and trying to play with my mind. In your blog post here, for example, you always accompany each Shapiro-Wilk test with a histogram. Why, if you don't need histograms? The reason is that you can't understand the numbers without graphical representations. I have said it many times, but you keep saying this. I also have the feeling you know you're wrong but won't admit it; otherwise, you wouldn't use graphs with the SW test. The only reason we can guess the extent of the non-normality with the SW test is that we have had graphical representations of what a W value of <0.90, 0.95 or >0.99 might look like. In fact, SW cannot be understood without the histogram, but you don't need SW to understand a histogram. A single number like this is too abstract, and it's not possible to understand the shape of the distribution of your data from it.

Another contradiction: if eyeballing is silly, I wonder why you asked me to put the figures in a "more visible way" for the readers, on the basis that the figures are more important than the tables. That doesn't make sense to me, given what you said about eyeballing.

As I said before, if someone insists on these requests, I will probably have to comply, but that doesn't mean I find them reasonable.