Data sharing policy
Admin
Updating this thread to a more general discussion of the data sharing policy

---

Data sharing before or after publication?

Before:
- Reviewers and anyone else can inspect the dataset and check the analyses before publication.
- There is no delay in data sharing. The data are immediately available for anyone to use upon submission.

After:
- There can be no conflicts of priority.

Conflict of priority
The longer the delay from submission to publication (around 10 days right now), the easier it is for a malicious person to copy the dataset, write a paper, send it to another journal and try to claim priority.

Priority conflicts can easily be settled by the author, since he has the original files, which have an earlier creation date. He can also point to the submission thread, which necessarily predates the copyist's submission.
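The timestamp check described above can be sketched in Python (the file and the submission date here are invented for illustration; note also that file timestamps are weak evidence on their own, since they can be altered, and most filesystems expose modification rather than creation time):

```python
import os
import tempfile
from datetime import datetime, timezone

# Stand-in for the author's original data file (hypothetical)
fd, datafile = tempfile.mkstemp(suffix=".csv")
os.close(fd)

# Modification time of the file, as stored by the filesystem
mtime = datetime.fromtimestamp(os.path.getmtime(datafile), tz=timezone.utc)

# Hypothetical date of the copyist's competing submission
submission = datetime(2030, 1, 1, tzinfo=timezone.utc)

# If the file predates the disputed submission, that supports the author
print(mtime < submission)

os.remove(datafile)
```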

I don't think conflicts of priority will be a problem and hence favor the maximally open and quickest model.
I favour the "let the author decide" (whether to attach datasets before or after publication) option. The delay from submission to publication is based on low N and could increase. Besides, due to probability theory, some papers will inevitably sit in the forum for a much longer time than others, increasing risk of priority conflicts.
An alternative is that the reviewers can request that the datafile be sent via email, thus not forcing the author to publish the dataset before publication.
Admin
> I favour the "let the author decide" (whether to attach datasets before or after publication) option. The delay from submission to publication is based on low N and could increase. Besides, due to probability theory, some papers will inevitably sit in the forum for a much longer time than others, increasing risk of priority conflicts.


Any priority conflicts can be easily decided based on the date of the submission on the forum. Or alternatively by the creation date of the data file on the author's computer.

> An alternative is that the reviewers can request that the datafile be sent via email, thus not forcing the author to publish the dataset before publication.


Yes, I considered that as well. But this only enables the reviewers to see the file, not anyone else who might legitimately want to use it or inspect it for themselves.
The author emails the dataset to reviewers before publication and posts it in the forum after publication. Readers do not need to see the file before the paper is published, only reviewers do.
Admin
> The author emails the dataset to reviewers before publication and posts it in the forum after publication. Readers do not need to see the file before the paper is published, only reviewers do.


This is one possible method, yes.

I disagree that only appointed reviewers need to see the file beforehand. As I said, non-appointed reviewers may have valuable input too (e.g. error correction).

Currently, the policy is that datasets must be in the submission thread opening post. This is my preferred policy. Reviewers have however generally ignored this requirement and only demanded datasets just before publication, i.e. in line with your preferred policy.
Then even non-appointed reviewers can request that the file be sent via email. At least an author wary of priority conflicts will have proof (the email).
Admin
Recently we discussed some aspects of the data sharing policy in a submission. http://openpsych.net/forum/showthread.php?tid=46

This journal has a mandatory data sharing policy. Papers that do not share data files cannot be published. Very briefly, the reasons for this are: 1) it protects against data loss (e.g. computer breakdown), 2) sharing data is necessary for proper peer review, 3) sharing data enables many forms of secondary science: meta-analysis, pooling of data, etc. See "Publish (your data) or (let the data) perish! Why not publish your data too?"

However, in some cases the data are free to obtain but cannot be shared due to copyright. What should the policy be here? In the case of Curtis' paper, I have shared the necessary data so that he does not have to risk it. This publisher is not based in the US. We will move the server to a safe location if necessary (i.e. anonymous hosting paid for with cryptocurrencies).

Thoughts?
In the last few months I have requested data from psychology researchers and have been ignored, denied, or asked to pay a large sum of money. While annoying, my thought is generally that if the data is theirs, they can do with it as they choose. However, in each of these cases the individuals teach at public institutions and the data was collected via a federal grant. It isn't just that the researchers are not abiding by the scientific ideal of openness; they are skirting the directives of the grant to share data.

The stand for openness is the primary reason I am excited and optimistic about this journal. However, if the stand is too stringent, it can stifle the success of the journal. Many important data files are open to the public, but can't be redistributed. If the use of these data files is not allowed by the journal because of policies of data sharing, the number and quality of articles that can be published here will be negatively affected.

It seems to me that if the data is public but non-distributable, supplying a link to the data should meet the data sharing criteria of the journal. Note this doesn’t include data that is “public” in the sense that you need to be from a member institution or fill out an application to access the data.
Admin
The primary thing, as I see it, is that other researchers get the specific datafiles used in the study, so that they can replicate the analyses. Compare it with mathematics. Imagine mathematicians publishing papers without the equations or steps, but merely talking about them, perhaps reporting that they contain an average of 5.5 terms per equation. It's preposterous.

One might also criticize our reliance on SPSS files, a proprietary format. However, simply pirating the software solves this problem. I also think many free programs can read SPSS files (not sure, does anyone know?).

I'm curious to hear what the reviewers think. They are, after all, those putting in the work for the journal.
Admin
There is apparently a FOSS alternative to SPSS, PSPP. I tried using it, but it was very complicated to install, so I gave up. http://www.gnu.org/software/pspp/
> Then even non-appointed reviewers can request that the file be sent via email. At least an author wary of priority conflicts will have proof (the email).


That's a good option. Also, Emil, I don't think you need to be too stringent with the data sharing. The author can also publish the covariance matrix (plus the variances, means, and SDs). With that, the reviewers can replicate the results. Even this is a real improvement. If you look at most studies in other journals, you don't see the covariance matrix reported. Apparently it's not obligatory. But it's a defect. Several times, in meta-analyses, the authors mention in their inclusion criteria that they select data for which they have the variance/covariance matrix.
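To illustrate (with invented numbers) how much the summary statistics determine: a covariance matrix plus SDs fixes the correlation matrix, since r_ij = cov_ij / (sd_i * sd_j):

```python
import numpy as np

# Hypothetical reported covariance matrix for two variables
cov = np.array([[4.0, 3.0],
                [3.0, 9.0]])

# The SDs are the square roots of the diagonal (variances)
sd = np.sqrt(np.diag(cov))

# Correlation matrix: divide each entry by the product of the two SDs
corr = cov / np.outer(sd, sd)
print(corr[0, 1])  # 3 / (2 * 3) = 0.5
```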

By the way, I remember two occasions when I asked authors (by email) for data. I knew they wouldn't give me the raw data, so I asked only for the correlation (or covariance, or both) matrix. I was told that the data did not belong to them and that even this was not possible. Once, I was told that they didn't have the data.
Admin
As mentioned in his paper, Wicherts and colleagues did a study which found a correlation between authors' reluctance to share datasets and statistical errors in the papers they had written. Extremely bad for science.
A statistics error is unlikely to go undetected by reviewers if the "input data" (e.g. a covariance matrix) is reported. The only small disadvantage is if the author himself misreported the numbers that appear in the input data.

If you do PCA, path analysis, or multivariate genetic analyses by way of DF regressions, for example, just having the covariance matrix (or the cross-twin covariance matrix for genetic analyses) is sufficient to replicate the numbers that appear in the submitted paper.
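As a rough illustration of that point, here is a numpy sketch (the covariance matrix is invented) showing that PCA requires only the reported covariance matrix, not the raw data:

```python
import numpy as np

# Hypothetical covariance matrix for three test scores, as an author
# might report it in a paper instead of the raw data
cov = np.array([[2.0, 1.2, 0.9],
                [1.2, 1.5, 0.8],
                [0.9, 0.8, 1.0]])

# PCA is just the eigendecomposition of the covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: for symmetric matrices
order = np.argsort(eigvals)[::-1]        # sort components by variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()      # proportion of variance per component
loadings = eigvecs * np.sqrt(eigvals)    # component loadings
```

A reviewer running this reproduces exactly the eigenvalues, variance proportions, and loadings the author would get from the raw data.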
Admin
I talked with Meng Hu about what to do when the raw data isn't available but summary data is, and that summary data is what the researcher used. This is the case for an upcoming submission s/he is working on.

My answer is that the data sharing policy requires that the data the researcher actually used be made available. If the researcher only used summary data, then there is no requirement to make the inaccessible raw data available.

Does that seem reasonable?

For instance, suppose a researcher found a study with a correlation matrix and used that for factor analysis. The data sharing policy would require that the correlation matrix be published/attached, but would not require the raw data to be published, since it might of course be unavailable to the researcher.
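A minimal sketch of that example, with an invented correlation matrix; the first principal component is used here as a simple stand-in for the factor analysis:

```python
import numpy as np

# Hypothetical correlation matrix for four subtests, taken from a
# published study (the raw data behind it is unavailable)
R = np.array([[1.00, 0.60, 0.50, 0.40],
              [0.60, 1.00, 0.45, 0.35],
              [0.50, 0.45, 1.00, 0.30],
              [0.40, 0.35, 0.30, 1.00]])

eigvals, eigvecs = np.linalg.eigh(R)  # eigenvalues in ascending order
g = eigvecs[:, -1]                    # eigenvector of the largest eigenvalue
g = g * np.sign(g.sum())              # fix the arbitrary sign of the eigenvector
loadings = g * np.sqrt(eigvals[-1])   # loadings on the first component
```

Attaching just this matrix to the submission lets any reviewer rerun the analysis and check the reported loadings.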
I don't think John is planning to do a factor analysis, and I don't see the option for doing it in the NCES database we use. We only make tabulations. Thus, the only thing we can do is add a supplementary paper in which we illustrate the procedure with screenshots, like the ones I sent you before. We can also include some screenshots of the numbers generated by the tabulations as an example, but I don't think we will insert all of them. There are too many tables.
Admin
This is fine with me. I think the other reviewers will agree.