Licensing
Open licensing of scientific material
This memo gives recommendations for selecting an open source license for scientific material, including software, data and documents. Let us continue to make the newest computational techniques easily available to spark new applications and further development!
Recommended open licenses for academic publishing
Software and source code
The modified BSD licenses (in particular FreeBSD) and the MIT license set minimal restrictions on the end user, and have therefore been recommended for academic purposes, for instance at the ICML/MLOSS workshop 2010 (V. Stodden and G. Bradski), and in the Quick Guide to Software Licensing for the Scientist-Programmer in PLoS CB. Also note that none of the standard licenses below excludes commercial use. Based on my experience in developing scientific open source tools, I suggest the following preference order for selecting an open license:
- FreeBSD (or other modified BSD licenses). Allows essentially free reuse of the code, and relicensing by others, provided that the original open licensing statement is distributed with the code. Preferred over MIT since it makes an explicit statement concerning binary versions of the code. The 3-clause modified BSD license additionally contains a notice prohibiting the use of the name of the copyright holder in promotion. See also other BSD licenses that are slightly more restrictive than FreeBSD.
- MIT license. Essentially similar to FreeBSD but slightly less explicit regarding binary versions of the code.
- GPL (>=2) is more restrictive than the FreeBSD or MIT licenses. One reason to license software under GPL arises when your code contains parts of GPL-licensed code: the complete source code utilizing portions of GPL-licensed code needs to be released under GPL since this is a viral license. For compatibility, GPL (>=2) is often preferred over GPLv2 to allow the end user to select between GPLv2 and any later version. GPL should not be the default choice for academic publishing since its requirements of viral distribution are somewhat incompatible with the general scientific standard of unrestricted reuse.
- LGPL requires that modified versions of your code are also published under LGPL, but not the whole software that utilizes the code. It is therefore less restrictive than GPL, but more restrictive than BSD or MIT that only require preservation of the license note in the code. See also reasons not to use LGPL.
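The preference order and the GPL rule above can be sketched as a small helper. This is a deliberately simplified illustration, not a full compatibility check: the function name `suggest_license`, the plain license labels, and the decision to reject any license outside this list are assumptions made for the example.

```python
# Preference order suggested in this memo, most preferred first.
PREFERENCE = ("FreeBSD", "MIT", "GPL", "LGPL")

# Licenses that, per the list above, require the complete source code
# that uses them to be released under the same license.
WHOLE_WORK_COPYLEFT = {"GPL"}


def suggest_license(included_licenses):
    """Suggest a license for a project, given the licenses of included code.

    Hypothetical helper: it only encodes the rules stated in this memo and
    refuses to guess for any license it does not know.
    """
    included = set(included_licenses)
    unknown = included - set(PREFERENCE)
    if unknown:
        # Anything else (e.g. Apache 2.0) needs a manual compatibility check.
        raise ValueError(f"Check compatibility manually for: {sorted(unknown)}")
    if included & WHOLE_WORK_COPYLEFT:
        # Including GPL-licensed code forces the whole project to GPL.
        return "GPL"
    # Otherwise fall back to the most preferred license.
    return PREFERENCE[0]


if __name__ == "__main__":
    print(suggest_license([]))
    print(suggest_license(["MIT", "GPL"]))
    print(suggest_license(["LGPL"]))
```

Included LGPL code does not change the suggestion here because, as noted above, LGPL applies to modified versions of the LGPL-licensed code itself rather than to the whole software that uses it.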
There are many other open licenses but they may set restrictions that are not appropriate for academic purposes, or may have compatibility problems with other licenses. For instance, Apache 2.0 is nearly identical to FreeBSD and MIT, but not always GPLv2-compatible, which may prevent reuse. Many people tend to think that GPLv3 sets too extensive restrictions, including requirements on hardware. "Public domain" or a "missing license" do not imply open source and can prevent the reuse of your work since the concepts are legally ill-defined. A "non-commercial clause" is sometimes added to a license ("not for commercial purposes"); note however that none of the standard open licenses sets such restrictions. Commercial use of academic research results is not restricted in general, so why should source code be different? For further info, check comparison of free software licenses.
Data
- CC-BY 4.0 (Recommended for data)
- CC0 (Recommended for metadata; gives away all rights)
- ODC-BY Open Data Commons Attribution License (another open data license)
- Public Domain Dedication and License (PDDL) (another open data license)
See also the Open Data Manual
Documents
Publishing your work
The following repositories can be used to distribute all research documents (data, code, publications, videos etc.). GitHub is mostly for code; Figshare and Dryad are suited for data and miscellaneous material:
- GitHub (use primarily for code)
- Figshare
- Data Dryad (includes a fee)
You can also list your work in an appropriate repository for open source content:
- Software: Machine Learning Open Source Software (MLOSS)
- Data: Public repositories for open data
- Documents: arXiv; bioRxiv; etc.
Open licensing step-by-step
- Pick a standard license (see suggestions above).
- Check license compatibility if your project includes external source code.
- Ensure that you own sufficient copyrights for the licensing.
- Mention the license name in your documentation and source files, and include the full license as a text file or provide a link.
- Add your personal information (i.e. name, email, affiliation) to the license.
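The last two steps can be partly automated. The following is a minimal sketch, assuming Python source files and a `LICENSE` text file at the project root. The copyright holder, year, `src` directory and the helper names `license_header` and `add_header` are hypothetical placeholders; `BSD-2-Clause` is the SPDX identifier for the FreeBSD license.

```python
from pathlib import Path

# Hypothetical values for illustration; replace with your own details.
HOLDER = "Jane Researcher <jane@example.org>, Example University"
YEAR = "2024"
LICENSE_ID = "BSD-2-Clause"  # SPDX identifier of the FreeBSD license


def license_header(holder, year, license_id, comment="#"):
    """Build a short header naming the copyright holder and the license."""
    lines = [
        f"Copyright (c) {year} {holder}",
        f"SPDX-License-Identifier: {license_id}",
        "See the LICENSE file for the full license text.",
    ]
    return "\n".join(f"{comment} {line}" for line in lines) + "\n"


def add_header(path, header):
    """Prepend the header to a file; return False if one is already there."""
    text = path.read_text(encoding="utf-8")
    if "SPDX-License-Identifier:" in text:
        return False
    if text.startswith("#!"):
        # Keep a shebang line as the very first line of the file.
        first, _, rest = text.partition("\n")
        new_text = first + "\n" + header + rest
    else:
        new_text = header + text
    path.write_text(new_text, encoding="utf-8")
    return True


if __name__ == "__main__":
    header = license_header(HOLDER, YEAR, LICENSE_ID)
    # "src" is an assumed project layout; adjust to your repository.
    for source_file in sorted(Path("src").rglob("*.py")):
        changed = add_header(source_file, header)
        print(f"{source_file}: {'header added' if changed else 'already licensed'}")
```

Running the script twice is harmless, since files that already carry a license identifier are skipped. The full license text still has to be placed in the `LICENSE` file by hand.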
Why license my work?
Minimally restrictive licenses can help to promote the core scientific standards of publicity, transparency, reproducibility, and unrestricted use of research results. Motivations for open licensing include:
- Guarantee your own rights to your work
- Encourage the reuse of your work in a legally sustainable manner with minimal effort; missing licensing statements can prevent reuse
- Enforce core scientific standards of transparency and reproducibility (see papers by V. Stodden)
- Valued by funding organizations, other scientists, fellow geeks, and laymen.
- It is simple
- Publish your computer code: it is good enough (Nature News)
Links and References
- A Quick Guide to Software Licensing for the Scientist-Programmer; Morin/Urban/Sliz, PLoS CB 2012
- Pohdintoja avoimen datan lisensoinnista Suomessa (Reflections on open data licensing in Finland)
- Hietanen, Herkko: The Pursuit of Efficient Copyright Licensing — How Some Rights Reserved Attempts to Solve the Problems of All Rights Reserved. Doctoral dissertation, 2008.
- Oksanen, Ville: Five Essays on Copyright in the Digital Era. Doctoral dissertation, 2008.
- Stodden, Victoria: Research on open standards for computational science
- opensource.com