In the biggest push towards Open Science so far, CERN launched its Open
Data Portal with a 300-terabyte dump of experimental data from real
collision events of the Large Hadron Collider (LHC), openly available to all.
Through the portal, CERN has made available, in analysable form, experimental
data from collisions recorded during the first LHC run in 2010.
CERN has released not just the data but the analytical tools too: open-source
software to read and analyse the data, together with the corresponding
open-access documentation, accompanies the data. CERN says
that this data is meant for students and citizen scientists and expects that ‘these
data will be of high value for the research community, and also be used for
education purposes’.
Remember, we are dealing with data at the frontiers of complex scientific
research. In an article, CERN cautions that the complexity of this data from
real collision events should not be underplayed. Those who access the data will
have to put in time and effort to understand the data, the learning tools and
the techniques needed to interpret it.
Sharing of Real Experimental Data
Sharing of real experimental data has always been a hotly debated issue. When
should the data be shared? How will credit for the data be attributed? How
will access be regulated, or should it be regulated at all? These questions have
dampened the enthusiasm of well-meaning scientists and institutions that wanted
to share data. In answer to these questions, which have long troubled
researchers, CERN has shown how experimental data can be shared, archived and
preserved for the long term.
The Open Data Portal assigns digital object identifiers (DOIs) to the data sets
and code, making them citable objects in normal scientific communication, and
offers the data for anyone to download, since they are published under a
Creative Commons license.
Another concern among researchers about sharing raw data has been how the data
should be cited. The LHC experiments have addressed this concern with a clear
data access policy, which includes a publication policy for anyone who analyses
the data and publishes papers based on it. It requires a suitable
acknowledgement and disclaimer to be included (in this case, for LHCb experimental data):
“acknowledgement
that the data was collected by LHCb, and disclaimer that no responsibility for
the results is taken by the collaboration.
A
suitable disclaimer is: This paper is based on data obtained by the LHCb
experiment, but is analyzed independently, and has not been reviewed by the
LHCb collaboration.”
This is the second such initiative this year. Earlier this year, scientists from the
Laser Interferometer Gravitational-Wave Observatory (LIGO) released data from the
first confirmed measurement of gravitational waves and even included source
code documenting the analysis step-by-step.
Pioneering Open Science
The LHC experiments at CERN are at the frontiers of research.
It is heartening to see CERN adopting Open Science as a matter of conviction. Here
is an excerpt from the article ‘On the Road to Open Science’, written by Tim
Smith, who leads collaboration and information services at CERN:
Science
is predicated on the concept that the hypotheses that we propose to explain the
phenomena that we observe can be tested through repeatable experiment. We
should share sufficient details of our observations and conclusions for
independent scrutiny, reproduction and verification. In this data-intensive age
we have somewhat fallen short of this ideal since we have continued to “share”
through publication processes which had no place for data, certainly not large
volumes of it, nor the code that was needed to interpret it. Hence Open Science
is striving to rebalance the processes and reintroduce data and code as
first-class research objects to be shared, scrutinized and reused.
Even for Schools
The CERN Open Data Portal also provides real experimental data (event data sets
from the ALICE, ATLAS, CMS and LHCb collaborations) specially prepared for
educational use, even by high-school students. These resources are accompanied
by visualisation tools. Apparently, over ten thousand high-school students
access, use and learn from this data every year. This data is meant for the
international masterclasses in particle physics. For educational purposes, the
complex primary data has to be processed into a form that can be explored
through simple applications. CERN has developed many such applications and made
them available online. In addition, it has invited others to build similar
applications.
A Great Opportunity for India
This is indeed a golden opportunity for researchers and students in India.
Indian researchers have been major collaborators in the CERN experiments, but
they have come from specialised institutes. India has, at the same time, a large
number of institutions teaching and doing research in physics and mathematics,
with teachers and students of excellent calibre, but bogged down by outdated
syllabi. They now have the ability to access and work on real data from one of
the most complex experiments ever carried out in physics. Teachers in these
institutions who want to break away from the traditional mould now have tools
that can excite brilliant minds to understand how complex problems in physics
are addressed. School teachers have an excellent opportunity to ignite the
interest of students in physics.
Physics is seeing exciting times as it leads the Open Science movement. Indian
researchers in physics and in other science disciplines should take an active
part in this movement.