Software in science: a plea to free your code

All scientific statements must be testable, and any such test should be reproducible. However, more and more published science is not reproducible. Nowadays we almost invariably use computers and software to perform calculations that include numerical simulations that bring to life the concepts and models we have of the world. Besides such computational experiments, the whole process from an initial hypothesis or idea to a published paper is filled with numerical calculations, ranging from drawing diagrams for a conceptual model to the analysis of observational data or the numerical output of a model simulation.

In highly computationalised fields like the science, or art, of climate modelling, the complexity of numerical models is often blamed for irreproducibility, though the problem is found in most scientific disciplines, including those that do not use complex numerical models. The models’ complexity suggests to some of us that publishing model code is not useful, but this should not discourage us from publishing it anyway. No matter how difficult it is to reproduce a scientific result, we must make sure we have done everything we can to give our fellow scientists as much information as possible, so that they can approach reproducibility or at least gain the insight needed to set up a similar experiment (Ince et al., 2012).

Therefore we should publish the methods of our research in full, including our model code. We should make sure that the model can be run with software that is available to anyone. The solution is to publish your code as free software, and to make sure any required software is also freely available. Free refers here to the freedom to run and share the code, and to make modifications and share those as well. To some this may seem like a lot of hassle. It is not, or at least it should not be. If you are using a model of which you are not the main author and you have made significant changes, ask the maintainer of the model’s repository for write access. For the final publication of your paper you can upload the corresponding version of your model code to the data repository Zenodo. Of course, you can also use Zenodo to share your model output. I often include analysis scripts in an electronic supplement to the paper. Others like to put everything on GitLab or a similar website. Your institute probably has a code sharing service as well. The most important thing is that your code is easily accessible.

Sharing your code with the general public (and thus the scientific community) seems like an obvious thing to do, but the fact that code is often not published suggests there are obstacles. I believe most of them only seem like obstacles, and the others are easy to overcome. One apparent issue is that the institution you work for usually owns the copyright. At almost any scientific institute, when you ask whether it is all right to license your software under a free software licence, you will get a positive answer, though sometimes this takes a while. All three scientific institutes I have worked for responded that you may license your code under a free software licence. Who actually holds the copyright? That is not so important when you can set the software free.

Another potential issue is that it takes work to clean up your code and to upload it in the right format to a repository that is well suited for this purpose. The uploading may take a bit of time the first time you do it, but after that sharing your code with the general public will be quick and will feel natural. Cleaning up your code also takes time, but in my experience it is a good investment anyway: you may find bugs, and you will still be able to understand your own code at a later time. The code does not need to look perfect; generally, fellow scientists will already be happy with somewhat tidy code. You (or someone else) may find time after a publication to improve the code further; science and code development are both characterised by a continuous process of improvement.

But just publishing the code is not enough: the scientific community must not be restricted in testing the code to reproduce the results, or in building further on the science already reproduced with it. Therefore the software should be free, meaning that the user is given the rights to run, study and share the code, and to make improvements and publish those. By default any work, including software, falls under copyright law, meaning that nobody who downloads or otherwise obtains the code has these rights unless you grant them explicitly.

How do you make your software free? You need to license the software. Any free software licence would be acceptable as it sets code free, so the community can use, test and improve it. The GNU General Public License (GPL) is especially useful in that it protects the code from being incorporated in proprietary software. Attach a COPYING or LICENSE.txt file that includes the legal text of the free software licence, for instance the GPL. Additionally, put a copyright and licence notice in the header of each non-trivial source file. Then put it online.
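The header-notice step above can be sketched as a small Python script. This is only a minimal sketch under stated assumptions, not a prescribed method: the function names (`make_notice`, `add_notice`, `add_notices`) and the example copyright holder are hypothetical, and a short SPDX licence identifier line stands in for the longer notice that the GPL itself recommends. The script does not create the COPYING file; the legal text should be copied verbatim from the licence's official source.

```python
from pathlib import Path
from typing import List

# Marker used to detect files that already carry a notice.
SPDX_TAG = "SPDX-License-Identifier:"


def make_notice(holder: str, year: int, spdx_id: str, comment: str = "#") -> str:
    """Build a short copyright and licence notice as comment lines.

    `comment` is the line-comment prefix of the target language,
    e.g. "#" for Python or "!" for Fortran.
    """
    lines = [
        f"Copyright (C) {year} {holder}",
        f"{SPDX_TAG} {spdx_id}",
    ]
    return "".join(f"{comment} {line}\n" for line in lines)


def add_notice(path: Path, notice: str) -> bool:
    """Prepend `notice` to one file; return False if it already has one."""
    text = path.read_text(encoding="utf-8")
    if SPDX_TAG in text:
        return False
    lines = text.splitlines(keepends=True)
    # Keep a shebang ("#!...") as the very first line of the file.
    insert_at = 1 if lines and lines[0].startswith("#!") else 0
    if insert_at and not lines[0].endswith("\n"):
        lines[0] += "\n"
    # A blank line separates the notice from the code that follows.
    lines.insert(insert_at, notice + "\n")
    path.write_text("".join(lines), encoding="utf-8")
    return True


def add_notices(root: Path, notice: str, pattern: str = "*.py") -> List[Path]:
    """Add the notice to every file under `root` that matches `pattern`."""
    changed = []
    for path in sorted(root.rglob(pattern)):
        if path.is_file() and add_notice(path, notice):
            changed.append(path)
    return changed
```

For example, `add_notices(Path("src"), make_notice("Jane Doe", 2020, "GPL-3.0-or-later"))` would tag the Python files under a hypothetical `src` directory, while passing `comment="!"` to `make_notice` and `pattern="*.f90"` to `add_notices` would do the same for Fortran sources. Running it a second time changes nothing, because files that already contain the identifier line are skipped.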

There is potentially an additional advantage when using the GNU GPL: any closely associated code will be freed as well. This happens because of the share-alike or copyleft nature of the GPL. However, large models usually have many contributors and include code from different sources, some of which may be GPL-incompatible. Choosing a good licence for your code may help straighten out part of the legal and ethical issues, but often it is difficult to sort this out completely. This is okay; there is no need for any party to force another’s hand. The most important thing is that scientists can reproduce your results and have the freedom to do derivative work, because this is an essential part of doing science. The only way to achieve this is to make your code accessible under a free licence.

In other words, free your code!

© 2018, 2020 Marco van Hulten, licensed under CC-BY-ND-4.0

Marco has a doctoral degree in ocean biogeochemistry and modelling, and currently holds a research appointment in the Geophysical Institute at the University of Bergen. His focus is the ocean, especially the trace element cycles and sediment–seawater interface. Marco has a strong interest in the foundations and philosophy of science and the computational techniques that are used to bring to life the concepts and models we have of the world.
