In the biggest push towards Open Science so far, CERN has launched its Open Data Portal with a 300-terabyte release of experimental data from real collision events at the Large Hadron Collider (LHC), openly available to all.
Through its Open Data Portal, CERN has made available, in analysable form, experimental data from the collisions of the first LHC run in 2010. CERN has released not just the data but the analytical tools too: the open source software needed to read and analyse the data, together with open-access documentation, accompanies the data. CERN says that this data is meant for students and citizen scientists and expects that ‘these data will be of high value for the research community, and also be used for education purposes’.
Remember, we are dealing with data at the frontiers of complex scientific research. In an article, CERN cautions that the complexity of this data from real collision events should not be underestimated. Those who access the data will have to put in time and effort to understand the data, the learning tools and the techniques used to interpret it.
Sharing of Real Experimental Data
Sharing of real experimental data has always been a highly debated issue. When should the data be shared? How will credit for the data be attributed? How will access be regulated, or should it be regulated at all? These questions have dampened the enthusiasm of well-meaning scientists and institutions that wanted to share data. In answer to these questions, which have long troubled researchers, CERN has given an illustration of how experimental data can be shared, archived and preserved for the long term.
The Open Data Portal assigns digital object identifiers (DOIs) to the data sets and code, making them citable objects in normal scientific communication. Because they are published under a Creative Commons license, the data are open for anyone to download.
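To make the idea of a citable data set concrete, here is a minimal Python sketch of how a DOI can be turned into a resolvable link and a citation line. The DOI, author and title used below are hypothetical placeholders, not real portal records, and the citation layout is only an illustration, not a format prescribed by CERN. The one fact relied on is that any DOI can be resolved by appending it to https://doi.org/.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # the public DOI resolver


def doi_to_url(doi: str) -> str:
    """Return a resolvable URL for a DOI such as '10.1234/ABC'."""
    doi = doi.strip()
    # Accept common written forms: 'doi:10.…' or a full resolver URL.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Every DOI starts with '10.' and has a '/' between prefix and suffix.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    return DOI_RESOLVER + quote(doi, safe="/")


def format_citation(creator: str, year: int, title: str,
                    publisher: str, doi: str) -> str:
    """Build a simple citation line (illustrative layout only)."""
    return f"{creator} ({year}). {title}. {publisher}. {doi_to_url(doi)}"


if __name__ == "__main__":
    # Hypothetical placeholder record, for illustration only.
    print(format_citation(
        creator="Example Collaboration",
        year=2016,
        title="Example collision data set",
        publisher="CERN Open Data Portal",
        doi="doi:10.1234/EXAMPLE.DATASET.0001",
    ))
```

Because the DOI stays fixed even if the data set moves to a new web address, a paper that cites the link above continues to point readers to the right data.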
Another concern among researchers about sharing raw data was how the data should be cited. LHC has addressed this concern with a clear data access policy for those who use the data. This includes a publication policy for those who analyse the data and publish papers based on it. It requires a suitable acknowledgement and disclaimer to be included (in this case, for LHCb experimental data):
“acknowledgement that the data was collected by LHCb, and disclaimer that no responsibility for the results is taken by the collaboration.
A suitable disclaimer is: This paper is based on data obtained by the LHCb experiment, but is analyzed independently, and has not been reviewed by the LHCb collaboration.”
This is the second such initiative this year. Earlier this year, scientists from the Laser Interferometer Gravitational-Wave Observatory (LIGO) released data from the first confirmed measurement of gravitational waves and even included source code documenting the analysis step by step.
Pioneering Open Science
The LHC experiments at CERN are at the frontiers of research. It is heartening to see CERN adopting Open Science as a matter of conviction. Here is an excerpt from the article “On the Road to Open Science”, written by Tim Smith, who leads collaboration and information services at CERN.
Science is predicated on the concept that the hypotheses that we propose to explain the phenomena that we observe can be tested through repeatable experiment. We should share sufficient details of our observations and conclusions for independent scrutiny, reproduction and verification. In this data-intensive age we have somewhat fallen short of this ideal since we have continued to “share” through publication processes which had no place for data, certainly not large volumes of it, nor the code that was needed to interpret it. Hence Open Science is striving to rebalance the processes and reintroduce data and code as first-class research objects to be shared, scrutinized and reused.
Even for Schools
The CERN Open Data Portal also provides real experimental data (event datasets from the ALICE, ATLAS, CMS and LHCb collaborations) specifically prepared for educational purposes, even for high school students. These resources are accompanied by visualisation tools. Apparently over ten thousand high school students access, use and learn from this data every year. This data is meant for the international masterclasses in particle physics. For educational purposes, the complex primary data has to be processed and presented through simple, understandable applications. CERN has developed many such applications and made them available online. In addition, it has invited those who would like to build similar applications to do so.
A Great Opportunity for India
This is indeed a golden opportunity for researchers and students in India. Indian researchers have been major collaborators in the CERN experiments, but they have come from specialised institutes. At the same time, India has a large number of institutions teaching and doing research in physics and mathematics, with teachers and students of excellent calibre who are bogged down by outdated syllabi. They can now access and work on real data from one of the most complex experiments ever carried out in physics. Teachers in these institutions who want to break away from the traditional mould now have tools that can excite brilliant minds to understand how complex problems in physics are addressed. School teachers have an excellent opportunity to ignite students' interest in physics.
Physics is seeing exciting times as it leads the Open Science movement. Indian researchers in physics and in other scientific disciplines should take an active part in this movement.