Our lega-seas of data

Fridtjof Nansen made his scientific observations and memoirs publicly available in printed form.

Fridtjof Nansen sailed the Arctic and Nordic seas and measured the oceans for many years. He did not hide his observational data away in a filing cabinet for his use alone. Rather, he published it and made it available to everyone. For example, in 1901 he published six volumes containing all his data and memoirs from his famous Fram Expedition to the North Pole. Nansen knew that his data determined his science. He knew that his data made his analyses reproducible, and therefore transparent. He knew that his data was his legacy.

A lot has changed since the time of Fridtjof Nansen. Today, research cruises span the oceans, collecting observations of myriad parameters at higher resolution and in more detail than ever before. The result is vast amounts of digital data from different locations, collected by different projects. So how can we manage all this data and still uphold Nansen's ideals of openness?

Data management has developed enormously over the past decade or so. It's true that we can store huge amounts of data, but we cannot just dump all our observations on a server somewhere and hope that others can use them. The challenge is to plan exactly how to store the data. How do we format it? How do we make it easily accessible? Good data management must answer these questions and develop solutions. Data management is a big job!

Data available in the SOCAT database and when they were first recorded (source: Are Olsen)

This is where people like Benjamin Pfeil come in. He is a data manager at the University of Bergen and works on several large-scale projects such as CarboChange. He also manages the huge oceanographic database SOCAT and contributes to CARINA. These databases are collaborations between several projects and institutes in a successful attempt to centralize and standardize observations of surface CO2, salinity, oxygen, and more. There's a lot of data in these databases, and it keeps on increasing. Benjamin shows me an animation of the observations that are freely available in SOCAT and when they were taken. You can see for yourself that the amount of data has increased dramatically over the past couple of decades.

There's so much data that one might be forgiven for asking whether it is all necessary. Will we ever get around to using and analyzing it all? Benjamin uses family photography as an analogy. Today, we take hundreds, maybe thousands, of photos each year of family and friends. We store them on our computers, and many of them are never seen again. However, these photos are still important to us. If we lose them, the memories may be lost. It's the same with oceanographic observations. If they are not documented, they can be lost. We cannot travel back 50 years and make the same observations again. Maybe not all the data will be used again. Maybe it will gather digital dust like an unread book in a library. But just as in a library, what matters is not that the book is read, but that it is there. Benjamin tells me, 'we cannot dictate what people will read, use or need. We can just give them opportunities.'

Old written and printed observations lost in the research twilight zone of a dusty institute loft (photo: Mathew Reeve)

Will these data management practices continue? Will developments be able to keep up with the ever-increasing amounts of data that we produce? Benjamin is optimistic and says that the trend will continue. Funding agencies now stipulate that data management must be a major part of research projects. Benjamin also says that good data management has become 'trendy', because researchers have an inherent desire to secure their legacy. New developments at some data centers mean that other researchers can cite your data just like a normal peer-reviewed article. These new citation practices bring improved co-authorship and networking opportunities, which can be particularly rewarding for young and early-career scientists. Benjamin recommends that young scientists take the leap and make their data available. They don't need to rely on their mentor or supervisor; they should just make it happen.

Fridtjof Nansen made his data open and available with the facilities at his disposal. These facilities were limited to a printing press. Many oceanographic observations were therefore printed, filed and sadly forgotten. Despite being openly accessible, many of these past observations are now lost in dusty lofts at research institutes around the world. Benjamin and his colleagues now have the technology at their fingertips to make oceanographic data truly open and accessible anywhere in the world and at any time in the future. Our scientific legacy is becoming more secure.

Mathew Stiller-Reeve

I am a postdoc researcher at NORCE Climate and the Bjerknes Centre for Climate Research. I research the monsoon in Bangladesh, and I am the founder of ClimateSnack, a community that hopes to give all young and early-career climate scientists an opportunity to practice and improve their scientific communication skills.

SciSnack Disclaimer: We write in SciSnack to improve our skills in the art of scientific communication. We therefore welcome comments concerning the clarity, focus, language, structure and flow of our articles. We only accept constructive feedback. All comments are manually approved and anything slightly nasty will not be accepted.