Thursday, April 28, 2016

CERN Gives a Quantum Push to Open Science


In the biggest push towards Open Science so far, CERN has launched its Open Data Portal with a 300-terabyte release of experimental data from real collision events at the Large Hadron Collider (LHC), openly available to all.
Through its Open Data Portal, CERN has made available, in analysable form, collision data from the first LHC run in 2010. CERN has made available not just the data but the analytical tools too: the open source software needed to read and analyse the data accompanies it, together with the corresponding documentation in open access. CERN says that this data is meant for students and citizen scientists and expects that ‘these data will be of high value for the research community, and also be used for education purposes’.
Remember, we are dealing with data at the frontiers of complex scientific research. In an article CERN cautions that the complexity of this data from real collision events should not be underplayed. Those who access the data will have to put in time and effort to understand the data, the learning tools and the techniques to interpret the data.

Sharing of Real Experimental Data
Sharing real experimental data has always been a highly debated issue. When should the data be shared? How will credit for the data be attributed? How will access be regulated, or should it be regulated at all? These questions had dampened the enthusiasm of well-meaning scientists and institutions that wanted to share data. In response to these questions, which have long troubled researchers, CERN has given an illustration of how experimental data can be shared, archived and preserved for the long term.
The Open Data Portal assigns digital object identifiers (DOIs) to the data sets and code, making them citable objects in normal scientific communication, and offers the data openly for anyone to download, since they are published under a Creative Commons license.
Another area of concern among researchers in sharing raw data was how to cite the data. The LHC experiments have addressed this concern with a clear data access policy for those who access the data. This includes a publication policy for those who analyse the data and publish papers based on it. It requires a suitable acknowledgement and disclaimer to be included (in this case for LHCb experimental data):
“acknowledgement that the data was collected by LHCb, and disclaimer that no responsibility for the results is taken by the collaboration.
A suitable disclaimer is: This paper is based on data obtained by the LHCb experiment, but is analyzed independently, and has not been reviewed by the LHCb collaboration.”
This is the second such initiative this year. Earlier this year, scientists from the Laser Interferometer Gravitational-Wave Observatory (LIGO) released data from the first confirmed measurement of gravitational waves and even included source code documenting the analysis step by step.

Pioneering Open Science
The LHC experiments at CERN are at the very frontiers of research. It is heartening to see CERN adopting Open Science as a matter of conviction. Here is an excerpt from the article ‘On the Road to Open Science’ by Tim Smith, who leads the collaboration and information services at CERN.
Science is predicated on the concept that the hypotheses that we propose to explain the phenomena that we observe can be tested through repeatable experiment. We should share sufficient details of our observations and conclusions for independent scrutiny, reproduction and verification. In this data-intensive age we have somewhat fallen short of this ideal since we have continued to “share” through publication processes which had no place for data, certainly not large volumes of it, nor the code that was needed to interpret it. Hence Open Science is striving to rebalance the processes and reintroduce data and code as first-class research objects to be shared, scrutinized and reused.

Even for Schools
The CERN Open Data Portal also provides real experimental data (event datasets from the ALICE, ATLAS, CMS and LHCb collaborations) specifically prepared for educational purposes, even for high school students. These resources are accompanied by visualisation tools. Apparently over ten thousand high school students access, use and learn from this data every year. The data is meant for the international masterclasses in particle physics. For educational purposes, the complex primary data has to be processed and presented through simple, understandable applications. CERN has developed many such applications and made them available online. In addition, it has invited others to build similar applications.

A Great Opportunity for India
This is indeed a golden opportunity for researchers and students in India. Indian researchers have been major collaborators in the CERN experiments, but they have come from specialised institutes. At the same time, India has a large number of institutions teaching and doing research in physics and mathematics, with teachers and students of excellent calibre who are bogged down by outdated syllabi. They now have the ability to access and work on real data from one of the most complex experiments ever carried out in physics. Teachers in these institutions who want to break away from the traditional mould now have tools that can excite brilliant minds to understand how complex problems in physics are addressed. School teachers have an excellent opportunity to ignite the interest of students in physics.
Physics is seeing exciting times in leading the Open Science movement. Indian researchers in physics and in other science disciplines should take an active part in this movement.
 



Sunday, April 3, 2016

Outcome Switching

Make my Trials!

A hot topic now being discussed in scientific research is 'outcome switching'. In layman’s language, outcome switching means that the authors of a study did not report the outcomes they initially set out to measure, but reported additional outcomes instead, without disclosing the results for the original outcomes and with no explanation for the change. The topic is so new that there is no Wikipedia entry on it yet!

The most discussed case of outcome switching is that of a clinical trial named ‘Study 329’. The study was sponsored by GlaxoSmithKline (GSK) and examined the antidepressant paroxetine, sold under the trade name Paxil. The results, published in 2001, claimed that the drug was well tolerated and effective as an antidepressant for children. The way in which the results of this study were published demonstrates what outcome switching is about.
GSK's Study 329 set out to measure the efficacy of Paxil as an antidepressant on eight specific parameters. On all of these parameters the research showed no significant effect; the drug was no better than a placebo sugar pill. The researchers then came up with 19 additional measures. Just 4 of the 19 new parameters showed a positive result. In the published paper the researchers presented the results for these four only, without discussing the others, as if the study had set out to measure only these four parameters. So even though the pre-decided parameters showed negative outcomes, these were not discussed in the publication, and a few additional parameters that suited the study were reported instead.

The above is not a rare incident. The Economist reported on a study published in BMC Medicine in 2015 which found that 31% of clinical trials did not stick to their original parameters. The problem is beginning to receive academic attention. The University of Oxford has launched the COMPare project. The project aims to systematically check every trial published in the top five medical journals: the New England Journal of Medicine, the Journal of the American Medical Association, The Lancet, the Annals of Internal Medicine and the British Medical Journal (BMJ). The findings so far are revealing. The COMPare team has so far studied 67 trials (information on their site as of 3rd April). They found that only 9 trials were reported as per the original protocol. In the others, they found that 300 outcomes were not reported and 357 new outcomes were added.
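
The kind of check described above, comparing the outcomes a trial specified in advance with the outcomes its paper reports, can be sketched with simple set operations. This is only a minimal illustration and not COMPare's actual method: the function names, the `OutcomeAudit` class and the sample outcome names are all hypothetical, and real outcome descriptions need careful manual matching instead of the naive text normalisation assumed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutcomeAudit:
    """Result of comparing pre-specified outcomes with reported ones."""
    reported_as_planned: frozenset
    unreported: frozenset  # pre-specified but missing from the paper
    added: frozenset       # reported but never pre-specified

    @property
    def switched(self):
        # A trial counts as switched if anything was dropped or added.
        return bool(self.unreported or self.added)


def normalise(name):
    """Lower-case and collapse whitespace (a naive assumption; real
    outcome names rarely match this cleanly)."""
    return " ".join(name.lower().split())


def audit_outcomes(prespecified, reported):
    """Compare two lists of outcome names and classify each outcome."""
    planned = frozenset(normalise(n) for n in prespecified)
    published = frozenset(normalise(n) for n in reported)
    return OutcomeAudit(
        reported_as_planned=planned & published,
        unreported=planned - published,
        added=published - planned,
    )


# Hypothetical outcome names, for illustration only.
registry = ["Depression score at week 8", "Relapse rate", "Response rate"]
paper = ["response rate", "Sleep quality", "Global improvement"]

audit = audit_outcomes(registry, paper)
print("Unreported outcomes:", sorted(audit.unreported))
print("Added outcomes:", sorted(audit.added))
print("Outcome switching detected:", audit.switched)
```

Summing the sizes of `unreported` and `added` across many trials would give totals of the same kind as the counts of unreported and newly added outcomes quoted above.
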
GSK’s Study 329 was initiated in 1992 and completed in 1998.

Fortunately, there have been changes in the regulatory processes since then. All trials now have to be registered before they begin, and the specified outcomes have to be published on the website ClinicalTrials.gov or on similar national sites.

The website Retraction Watch carries an interview with Ben Goldacre, the project leader of COMPare. He explains that not all changes in the outcomes of clinical trials are made for nefarious reasons. What concerns him is that every time outcomes are switched, it creates a culture of permissiveness that lets other people do the same to tweak a trial’s conclusions.

There can be many motivations for outcome switching. Could it be mere survivorship bias? Survivorship bias is the logical error of concentrating on the people or things that "survived" some process and inadvertently overlooking those that did not because of their lack of visibility. This can lead to false conclusions in several different ways, and in clinical trials it can be very costly. But the evidence above suggests otherwise.

The price may have been paid by patients. GSK's Paxil has been prescribed to millions of children and young adults. By the early 2000s its sales were nearly US$ 2 billion a year!

You may read more about it in the Vox article or in the aptly titled article in The Economist, ‘For My Next Trick’.