As a data scientist at a startup, I can program in several languages, but R is often the natural choice.
Recently I wanted my company to build a product based on R. It simply seemed like a perfect fit. But this turned out to be a slippery slope into the open-source code licensing field, which I wasn’t really aware of before.
Bottom line: legal advice was not to use R!
Was it a single lawyer? No. The company was willing to “play along” with me, and we had a consultation with 4 different software lawyers, one after the other.
What is the issue? R is licensed as GPL 2, and most R packages are also GPL (whether 2 or 3).
GPL is not a permissive license. It is categorized as “strongly protective”.
In layman’s terms, if you build your work on a GPL program it may force you to license your product with a GPL license, too. In other words – it restrains you from keeping your code proprietary.
Now you say – “This must be wrong”, and “You just don’t understand the license and its meaning”, right? You may also mention that Microsoft and other big companies are using R, and provide R services.
Well, maybe. I do believe there are ways to make your code proprietary, legally. But, when your software lawyers advise to “make an effort to avoid using this program” you do not brush them off 🙁
Now, for some details.
As a private company, our code needs to be proprietary. Our core is not services, but the software itself. We need to avoid handing our source code to a customer. The program itself will be installed on a customer’s server. Most of our customers have sensitive data, and a SaaS model (or any connection to the internet) is out of the question.
Can we use R?
The R Core Team addressed the question “Can I use R for commercial purposes?”. But, as lawyers told us, the way it is addressed does not solve much. Any GPL program can be used for commercial purposes. You can offer your services installing the software, or sell a visualization you’ve prepared with ggplot2. But it does not answer the question – can I write a program in R, and have it licensed with a non-GPL license (or simply – a commercial license)?
The key question we were asked was whether our work is a “derivative work” of R. Now, R is an interpreted programming language. You can write your code in Notepad and it will run perfectly. Logic says that if you do not modify the original software (R) and you do not copy any of its source code, you did not make a derivative work.
As a matter of fact, when you read the FAQ of the GPL license it almost seems that indeed there is no problem. Here is a paragraph from the Free Software Foundation (https://www.gnu.org/licenses/gpl-faq.html#IfInterpreterIsGPL):
If a programming language interpreter is released under the GPL, does that mean programs written to be interpreted by it must be under GPL-compatible licenses?
When the interpreter just interprets a language, the answer is no. The interpreted program, to the interpreter, is just data; a free software license like the GPL, based on copyright law, cannot limit what data you use the interpreter on. You can run it on any data (interpreted program), any way you like, and there are no requirements about licensing that data to anyone.
Problem solved? Not quite. The next paragraph shuffles the cards:
However, when the interpreter is extended to provide “bindings” to other facilities (often, but not necessarily, libraries), the interpreted program is effectively linked to the facilities it uses through these bindings. So if these facilities are released under the GPL, the interpreted program that uses them must be released in a GPL-compatible way. The JNI or Java Native Interface is an example of such a binding mechanism; libraries that are accessed in this way are linked dynamically with the Java programs that call them. These libraries are also linked with the interpreter. If the interpreter is linked statically with these libraries, or if it is designed to link dynamically with these specific libraries, then it too needs to be released in a GPL-compatible way.
Another similar and very common case is to provide libraries with the interpreter which are themselves interpreted. For instance, Perl comes with many Perl modules, and a Java implementation comes with many Java classes. These libraries and the programs that call them are always dynamically linked together.
A consequence is that if you choose to use GPLed Perl modules or Java classes in your program, you must release the program in a GPL-compatible way, regardless of the license used in the Perl or Java interpreter that the combined Perl or Java program will run on.
This is commonly interpreted as “You can use R, as long as you don’t call any library”.
Now, can you think of using R without, say, the Tidyverse package? Tidyverse is a GPL library. And if you want to create a Shiny web app, you still use the Shiny library (also GPL). Even if you purchase a Shiny Server Pro commercial license, this does not resolve the Shiny library itself being licensed under the GPL.
Furthermore, we often use quite a lot of R libraries – and almost all are GPL. Same goes for a shiny app, in which you are likely to use many GPL packages to make your product look and behave as you want it to.
Is it legal to use R after all?
I think it is. The term “library” may be the cause of the confusion. As Perl is mentioned specifically in the GPL FAQ quoted above, Perl addressed the issue of a GPL-licensed interpreter running proprietary scripts head on (https://dev.perl.org/licenses/): “my interpretation of the GNU General Public License is that no Perl script falls under the terms of the GPL unless you explicitly put said script under the terms of the GPL yourself. Furthermore, any object code linked with perl does not automatically fall under the terms of the GPL, provided such object code only adds definitions of subroutines and variables, and does not otherwise impair the resulting interpreter from executing any standard Perl script.”
There may also be a hidden explanation by which most libraries are fine to use. As said above, it is possible the confusion is caused by the use of the term “library” in different ways.
Linking/binding is a technical term for what occurs when compiling software together. This is not what happens with most R packages, as may be understood when reading the following question and answer: Does an Rcpp-dependent package require a GPL license?
The question explains why (due to GPL) one should NOT use the Rcpp R library. Can we infer from it that it IS ok to use most other libraries?
“This is not a legal advice”
As we’ve seen, what is and is not legal to do with R, being GPL, is far from clear. Everything that is written on the topic is also marked as “not a legal advice”. While this may not be surprising, one has a hard time convincing a lawyer to be permissive when the software owners are not clear about it. For example, the FAQ “Can I use R for commercial purposes?” mentioned above begins with “R is released under the GNU General Public License (GPL), version 2. If you have any questions regarding the legality of using R in any particular situation you should bring it up with your legal counsel”. And ends with “None of the discussion in this section constitutes legal advice. The R Core Team does not provide legal advice under any circumstances.”
In between, the information is not very decisive either. So at the end of the day, the actual legal situation remains unclear.
Another thing one of the software lawyers told us is that investors do not like GPL. In other words, even if it turns out that it is legal to use R with its libraries, a venture capital investor may be reluctant. If true, this may cause delays and may also require additional work convincing the potential investor that what you are doing is indeed flawless. Hence, lawyers told us, it is best if you can find an alternative that is not GPL at all.
What makes Python better?
Most of the “R vs. Python” articles are pure junk, IMHO. They express nonsense commonly written in the spirit of “Python is a general-purpose language with a readable syntax. R, however, is built by statisticians and encompasses their specific language.” Far away from the reality as I see it.
But Python has a permissive license. You can distribute it, you can modify it, and you do not have to worry your code will become open-source, too. This truly is a great advantage.
Is there anything in between a permissive license and a GPL?
Yes, there is. For example, there is the Lesser GPL (LGPL). As described in Wikipedia: “The license allows developers and companies to use and integrate a software component released under the LGPL into their own (even proprietary) software without being required by the terms of a strong copyleft license to release the source code of their own components. However, any developer who modifies an LGPL-covered component is required to make their modified version available under the same LGPL license.” Isn’t this exactly what the R choice of a license was aiming at?
Others use an exception. GPL-licensed JavaScript code, for example, can carry the following exception: “As a special exception to GPL, any HTML file which merely makes function calls to this code, and for that purpose includes it by reference shall be deemed a separate work for copyright law purposes. In addition, the copyright holders of this code give you permission to combine this code with free software libraries that are released under the GNU LGPL. You may copy and distribute such a system following the terms of the GNU GPL for this code and the LGPL for the libraries. If you modify this code, you may extend this exception to your version of the code, but you are not obligated to do so. If you do not wish to do so, delete this exception statement from your version.”
R is not LGPL. R has no written exceptions.
The fact that R and most of its libraries use a GPL license is a problem. At the very least it is not clear if it is really legal to use R to write proprietary code.
Even if it is legal, Python still has the advantage of a permissive license, which means “no questions asked” by potential customers and investors.
It would be good if the R core team, as well as people releasing packages, were clearer about the proper use of the software, as they see it. They could take Perl as an example.
It would be even better if the license changed, at least by adding an exception, or by moving to the LGPL or (best) a permissive license.
But then… Linux is GPL, your code -R, Python or otherwise- eventually calls functionality within the kernel which is GPL, therefore no software developed by anybody can ever be made proprietary as long as it runs on Linux… Let alone Windows or MacOS. Should we all use BSD now to develop proprietary software? I don’t think so.
This seems to me a classical FUD.
Totally agree, Francisco!
FUD – fear, uncertainty and doubt
The only way to solve it is with information!
What the GPL license states is that once a piece of software is licensed as GPL it cannot be changed into proprietary software anymore.
In the case of R it is saying that you cannot redistribute R or any other R library licensed as GPL in a proprietary way, for example. You will not “sell R” as software.
But if you use R to build software called “SuperSoftware” you can sell SuperSoftware in the way you want. Think a bit: even RStudio has a “proprietary” version (https://www.rstudio.com/products/rstudio-server-pro/)
Looks like the author of the post has found 4 lawyers that don’t understand how the open source industry works, and he believes these 4 lawyers more than anything else.
RStudio was written in Java, C++, and JavaScript, not R.
I thought the core question here was: can a closed source application call a GPL library and legally remain closed source? I thought it was common knowledge that the answer was no. I thought that one of the key attributes of the LGPL was that it did not have this restriction. I believe LGPL originally stood for library GPL, but was derogatorily renamed to lesser GPL because the GNU folks thought it was too permissive and didn’t want to promote it anymore. See this page from their own website: https://www.gnu.org/licenses/why-not-lgpl.html.
IIUC, this is not an issue at our firm. We use R on the backend for R&D to develop statistical models that then get coded for use in another language. We don’t deliver R libraries to our clients’ installations.
Yes, Linux is GPL but GPL also includes a System Library Exception, see:
https://www.gnu.org/licenses/gpl-faq.html#SystemLibraryException
Interesting discussion here:
https://softwareengineering.stackexchange.com/questions/158789/can-i-link-to-a-gpl-library-from-a-closed-source-application
Interesting article by the way.
If you read carefully, the System Library Exception is a completely different business. It allows you to use GPL-incompatible libraries in your GPL-ed application without having to distribute their source code along with your code. Basically, it allows you to develop GPL code using non-free libs.
My understanding is that as per GPL yes you will need to provide the code to the customer.
But on the other hand If I’m a customer who ordered some custom application for my business I will have zero incentive to share that code with any competitors, and if it is really specially customized to my needs I might ask the vendor not to sell it to anyone else that could use it to beat me in the market.
Then, it is also probable that since I’m hiring you, I actually don’t know to read or write R (that is not my business) so having the code on my server won’t mean much: I will not want to resell it, I will not be able to modify it without your help, and of course you can charge a fee for any other (future) modifications.
Finally, if the customer has its own legal team, they will probably advise getting the rights (or some rights) to the code they are ordering for their custom app. After all, if the app is critical for my business I need to be sure that my business will continue even if yours doesn’t, i.e. being able to hire someone else to maintain the app once your company closes.
I work for a government agency and I have had the agency lawyers review the licencing of R and their conclusion was the same as your lawyers. Thankfully, I am happy with this because I publish my code on GitHub and satisfy the terms of the licence.
As a package author, if I licence my package under the GPL then that is because I *don’t want* you to profit from it in your closed-source product. The licence is working as intended.
All the “internet lawyers” responding that the author is wrong should probably just keep quiet.
When you need to say where you work at in order to convince someone about your arguments… and when you say some should keep quiet… oh, boy!
I struggle a lot to understand those licences myself and would not be any wiser than the blog author about the situation but I have to say I agree with Michael.
a) If a software author chooses GPL, then they intend to restrict the usage of code for closed-source commercial projects.
That is a valid and reasonable intent. There is nothing wrong with it.
b) If four lawyers independently of each other tell you to do one thing, you probably should do it. Saying “this is just another case of fear, uncertainty and doubt, don’t be so scared” (paraphrasing Francisco U. G. and you, Charles) is not helpful at all.
If four lawyers come to the same or a similar conclusion, what makes you think the JUDGE will come to a different one, if a case actually goes to court?
The differences between lawyers and judges are – beyond doubt – generally smaller than between some guys who post on a forum and judges. Meaning: it’s more likely that a judge would come to the same conclusion as a lawyer than to yours.
When you have people on the internet telling you to ignore your (four!) lawyers, I think saying “keep quiet” is not terribly out of line.
I don’t mean this to disrespect you.
Best,
Thomas
I argue that the whole foundation of this article is flawed. This is a great example of what will eventually break the open source movement. The author (and his company) are trying to leverage a community of contributors for personal/company gain, without contributing back.
R, and other open source computing solutions, are community developed — the individual who works “off the clock” from 1 to 3 in the morning, the academic who receives grant funding to develop the technology, or the employee whose company supports/sponsors open-source. There is limited ROI in these cases other than the benefit of the community development. In comes a startup who wants to use all of this infrastructure without contributing back. They essentially want to take advantage of the community. What do they get? A platform to run on, a collection of advanced algorithms which would otherwise take years of man power to create, a mechanism to distribute, etc.
At a minimum, it is counter to the ideals of the open source movement … at a maximum, it is unethical.
I understand that people/companies have to make money … but I also understand that nothing is free. You have to pay to play. If you don’t want to pay for it by contributing back, then go and find a closed solution and pay for that:
– Don’t want to play by R’s rules? Go buy a license for SAS and build out your third party app — at the ridiculous cost that SAS will charge you.
– Don’t want to publish your code for dense computation? Get out your wallet and buy MATLAB and build a third-party app.
– Don’t want to be dependent on a closed solution at all? Hire 100 programmers just to build your platform from the ground up, so you can get your one little library working.
Or look at other financial models. Large companies are understanding the value of open source and the benefits of using it. It is more cost-effective. How do they make money? Through consultation and added features. This works for Kitware; this works for Oracle, MySQL, and now it works for Microsoft. Yes, I understand that some of these are non-GPL licenses, but that is not the point. The point is that they have learned that money making comes from the services, not the software, when you are working in an open source world.
Can you work the system to your advantage and play the game and limit yourself to BSD and other very open licenses? Absolutely … at the cost of hurting a community and acting in a way that makes it difficult to sleep at night.
Thanks for this. There is no free lunch, if you want free inputs and sell the outputs while not sharing back your improvements to the community, you are just taking advantage of others’ work.
Ignore the pseudo-moral/ethical arguments.
GPL is about distribution. If you distribute your commercial code alone, i.e. not together with R, the recipient may do what s/he wants. There are proprietary packages. I believe your lawyers are too prudent.
The department of lawyers at my company has come to the same conclusion and R has been banned company-wide due to GPL. RStudio is banned as well throughout the company.
“In layman terms, if you build your work on a GPL program it may force you to license your product with a GPL license, too. In other words – it restrains you from keeping your code proprietary.”
Good.
The GPL is about **licensing** of code, not about distribution. That’s why the L in GPL stands for License! If you create a derivative of GPLed code, the GPL does not force you to license that derived code to others. Note that distributing the code to others is effectively licensing it. All the GPL says is that if you do license that derivative code to others, then that licensing must be under the terms of the GPL (or equivalent). You are NOT compelled to further license (or distribute) your derived code to anyone. If you do further license (or distribute) your GPL-derived code to others, then it must be done under a GPL license.
So, if company A engages company B to write R code which calls GPLed libraries etc, and thus the R code is a GPL derivative, but under the contract the IP to that R code vests back in company A, then company A is not obliged to further license that code to anyone else, not even to company B. (The GPL is enforced through copyright, thus transferring copyright on GPLed code is not the same as sub-licensing or distributing it.)
But if, say, company B writes R code that relies on GPLed R code and libraries etc, and then wants to license and distribute that GPL-derived R code to, say, company C, then yes, it must be licensed to C under the terms of the GPL. That is the scenario that the OP is talking about, and I believe the legal advice he was given is sound.
As others have pointed out, that’s the whole intent of the GPL. If you wish to create products that you can distribute under non-GPL licenses, don’t use R!
Even if you don’t load any library, you are still using the built-in R base library, which is GPLed. ;p
True that. Actually it seems that the only acceptable license for an R program or package using the R base package (i.e., the only acceptable license for any R program or package) is the GPL. At least if one believes the FSF interpretation.
https://www.gnu.org/licenses/gpl-faq.html#IfInterpreterIsGPL
“Another similar and very common case is to provide libraries with the interpreter which are themselves interpreted. For instance, Perl comes with many Perl modules, and a Java implementation comes with many Java classes. These libraries and the programs that call them are always dynamically linked together.
A consequence is that if you choose to use GPLed Perl modules or Java classes in your program, you must release the program in a GPL-compatible way, regardless of the license used in the Perl or Java interpreter that the combined Perl or Java program will run on.”
Thanks to all the contributors here. For a novice like me, it’s a great course in GPL and open source. Long live open source!!
This is a very interesting discussion. I moved to Julia for different reasons, but I’ve seen this discussion popping up in our community, because we use and promote MIT licenses, but every time someone wants to work on porting a GPL library, there’s discussion about what you can or cannot do. Some people don’t even read GPL code. I don’t know all the details and have never been really affected by this, but it really needs clarification. Thanks for the post.
It’s worth being very clear that this grey area is only around redistribution of code. I’d guess that 95% of businesses want to use R for analysis or for internal applications, and so this isn’t as apocalyptic as it first sounds.
Exactly (as I also noted above)! The GPL does **not** force you to distribute any modified code. Thus internal use of R within an organisation does not mean that organisation has to make all the R code they write available to their competitors under the GPL. But if they do distribute that code to others, then it must be under the GPL.
In academic use, it means that if you publish your R analysis code as a supplement to a scientific publication, then that code ought to be GPL licensed. That’s actually much better than the code having no explicit license, or just inheriting the scientific paper’s license (which is often CC-BY-SA for open-access journals).
Can you re-structure your business venture as a service?
The discussion emphasizes the need for clarifications.
Both on the legal boundaries and on ethical ones
***
Legal:
Some contributors to the discussion believe that any R package must be GPL, regardless of whether it calls any GPL package directly.
If that is true, does it mean that “dplyr” (MIT) has a wrong license? I wouldn’t know. What would you say about the answer in https://www.quora.com/Why-is-an-MIT-license-sometimes-used-to-license-an-R-package-when-R-is-licensed-under-GPL-Why-shouldnt-all-R-packages-be-GPL-software?ch=10&share=2fea8689&srid=uokyD by which it is fine since “R packages are not derived work of the R compiler”
Also, Microsoft is using R, providing access to R through proprietary software and services, and put the use of R in its usage agreements: https://mran.microsoft.com/assets/text/mkl-eula.txt
They also write R packages which are only relevant and accessible to their customers. I assume they are not breaking any law.
So, my feeling was and stayed that the question is not * if * it is possible, but * how * is it possible, and under what restrictions.
***
intent:
The ethical boundaries are important, too.
We need to hear what the intended interpretation of the proper use of R is.
Many here seem to think that you should only use R if you are willing to share your code.
That’s fine, but critical to understand if someone is considering a Shiny app.
A statement that you should only create a Shiny app if you are willing to share your code (say through cran or github) is a very strong statement.
There is nothing wrong with this attitude, but not without being clear about it. Many would not pay $10K a year for a shiny server pro license if this still means their source code should be shared.
Interesting that at rstudio::conf2019 Shiny for production was a hot topic, but the license issue (as far as I know) was not raised at all.
***
Use by startups:
Finally, it is not enough to be legal and ethical.
If (and yes, this is an if) there is an intent for R to be a tool used by startups, the question of investors being wary of the GPL is a serious one.
Without a strong clarification, it is likely to be a blocker for startups. Meaning more people choosing other options, Python probably being the most common.
And if I shift to Python for work, my free time programming (which I am happy to share) will be in the same language.
I read this post and the following discussion a couple of days ago, and remembered when I came across a proprietary, commercial ‘extension’ to a GNU GPL package.
Have a look, and note the sentence “Except for the deep learning routines, it requires an RPUDPLUS License for use.”: http://www.r-tutor.com/content/download
I rarely see such a thing, but I know there are more than that single one.
I can see why lawyers would advise against using GPL-licensed software.
But before ditching a language I learned through a painful process, I would ask the people who are responsible for their opinion. That would mean asking the R core foundation. Maybe they even commented on the matter before – a quick web search didn’t get me there, however.
Hi James,
The GPL is not about distribution; the whole ethical and moral purpose of Free Software is the users’ freedoms. This is so that users can use their computers as they see fit and are not controlled or harmed through the software of companies and developers. All this is clearly described on the Free Software Foundation website.
The copyleft license (like the GPL) protects the users’ freedoms to copy, study, run and distribute the software. The copyleft license prevents programmers/companies from changing the software into proprietary software, thereby restricting users’ freedoms.
Naturally, companies, start-ups, investors and IP lawyers want to maximize profit, hence the business sector dislikes the GPL because it could hinder their profits at some point. This is actually why the Open Source Initiative was created to counter the Free Software movement. The OSI promotes the open source model as a better software development methodology and not for protecting users’ freedoms.
For you to ask everyone to ignore the points made by free software advocates as “pseudo moral/ethic arguments” is a classic approach by the corporate sector, which only wants to take from the community for its own profit and not give back.
For example, Apple controls which apps will run on mobile phones that the users own (and paid for). Here, users have no choice but to follow what Apple wants. Is this moral or ethical?
We all make moral and ethical decisions that not only affect ourselves but many other people and that should be discussed and debated.
A minor clarification, although tidyverse has a GPL license, some of its individual components have a more permissive MIT license (e.g. dplyr, magrittr, tidyr, tibble). Some components which have a GPL license include stringr (but you could use stringi…), readr, lubridate, purrr and ggplot2 (I’ll commit sacrilege, and say that there are non-GPL alternatives to ggplot2…).
The issue of using packages like shiny for the user interface could perhaps be side-stepped by completely separating the data-processing ‘engine’-room of the package from the user-interface component (e.g. storing the two in completely separate packages, so that the engine doesn’t depend on the shiny GUI, and could nominally be used independently of the shiny GUI through a command-line interface).
Altogether I think it is possible to create a fairly substantial R program WITHOUT using (in any way, linking, or calling, or whatever) GPL-licensed packages, with the exception of the R base libraries themselves! (GPL-2/3).
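One way to act on this in practice is to check the declared license of every dependency before building on it. Below is a minimal Python sketch of such a check. The package list and its license labels are hypothetical inputs that simply restate the claims made in this comment and in the post (they are not verified against current package metadata), and the helper names `is_strong_copyleft` and `split_by_license` are made up for illustration. In a real project the strings would come from the `License` field of each package's DESCRIPTION file.

```python
# Hypothetical dependency list. The license labels restate the claims made in
# the discussion above; they are not checked against current package metadata.
DECLARED_LICENSES = {
    "dplyr": "MIT",
    "magrittr": "MIT",
    "tidyr": "MIT",
    "tibble": "MIT",
    "stringr": "GPL",
    "readr": "GPL",
    "ggplot2": "GPL",
    "shiny": "GPL",
}


def is_strong_copyleft(license_field: str) -> bool:
    """Return True if any token of the license field names the GPL or AGPL.

    LGPL tokens are deliberately not matched. Dual-licensed fields such as
    "GPL-2 | MIT" are still flagged, which is the conservative choice.
    """
    cleaned = license_field.upper().replace("|", " ").replace("(", " ")
    return any(tok.startswith(("GPL", "AGPL")) for tok in cleaned.split())


def split_by_license(declared: dict[str, str]) -> tuple[list[str], list[str]]:
    """Split package names into (flagged for review, not flagged), both sorted."""
    flagged = sorted(p for p, lic in declared.items() if is_strong_copyleft(lic))
    other = sorted(p for p in declared if p not in flagged)
    return flagged, other


if __name__ == "__main__":
    flagged, other = split_by_license(DECLARED_LICENSES)
    print("Review before distributing:", ", ".join(flagged))
    print("Not flagged:", ", ".join(other))
```

The check only sorts names by declared license. It says nothing about whether a particular use creates a derivative work, which is exactly the open question in this thread, and it does not cover the R base libraries themselves.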
The issue is muddied by some well-respected and learned R package contributors who seem to have a more liberal attitude to what GPL ‘means’ than a superficial reading of the GPL seems to imply (or is likely to have intended). (hadleywickham on https://twitter.com/levithatcher/status/771716687712231424 and https://twitter.com/hadleywickham/status/501491196943167490, perhaps because calling package functions with commands like ‘dplyr::inner_join’ doesn’t count as linking in Hadley’s opinion, because of lack of shared namespace etc.?)
In any case, I agree with the main point of this article: the uncertainty is troubling and off-putting for anyone who is contemplating commercializing R packages.
R is a great environment, and I think better for the casual data-scientist whose main job involves gathering/using data (e.g. in medical) rather than programming. But if a project ‘gets serious’ to the point of distribution and consuming more and more hours of time, the licensing terms are at best confusing, at worst overly restrictive to the point of stifling, for those who don’t have the time to invest hundreds more hours in learning how to do the same in Python.
Maybe it’s best for the sake of the R free software community that proprietary companies are excluded. So much of R is copyleft because the developers believe in free (as in freedom) software. If reducing the popularity of R is the cost of maintaining the free software community around it, then it’s definitely worth it.
That said, sorry you have to go.