Anyone can already generate proprietary code that is identical to your GitHub project's code just using copy and paste.
For people that care to respect copyright, there's a copilot setting to block exact copies of code in the training set (which only happens a tiny percentage of the time, unless you're actually trying to make it happen).
For people that don't care to respect copyright, git clone is a way more efficient way to violate your license.
> “If you look at the GitHub Terms of Service, no matter what license you use, you give GitHub the right to host your code and to use your code to improve their products and features,” [Kate] Downing, [an IP lawyer specializing in FOSS compliance] says. “So with respect to code that’s already on GitHub, I think the answer to the question of copyright infringement is fairly straightforward.”
This has some interesting implications – for example, it means I can't mirror somebody else's (open source) code on GitHub without their explicit agreement.
> > “If you look at the GitHub Terms of Service, no matter what license you use, you give GitHub the right to host your code and to use your code to improve their products and features,” [Kate] Downing, [an IP lawyer specializing in FOSS compliance] says. “So with respect to code that’s already on GitHub, I think the answer to the question of copyright infringement is fairly straightforward.”
So any code uploaded by someone other than the copyright holder renders someone liable to be sued for copyright infringement, AFAICS. The only question is whom it makes liable -- the uploader, GitHub (=Microsoft!), or both?
I can see arguments either way: The uploader is clearly infringing by giving away a right that isn't theirs to give. But so is GitHub / Microsoft, for using a "right" they haven't been properly given. So I'm provisionally leaning towards "both".
> I can't mirror somebody else's (open source) code on GitHub without their explicit agreement.
Who is doing the "mirroring" -- you, in uploading the code, or GitHub / Microsoft in actually hosting it, keeping it available for download from their "mirror"[1] site?
___
[1]: Is that even the correct terminology nowadays, when AIUI for lots of projects GitHub is their primary code repository?
So GitHub should immediately take down (and remove from their Copilot learning model!) all *GPL code uploaded by anyone but the ("primary"?) copyright holder.
There's one thing I'm missing from all these discussions and posts: is the generated code even copyrightable? IANAL, but code snippets often fall under the "scènes à faire" doctrine (everybody would do it in a similar way), in which case it's not. https://en.m.wikipedia.org/wiki/Sc%C3%A8nes_%C3%A0_faire
GitHub seems to think it is copyrightable. Personally, I doubt it is, simply because a human didn't create it and the process that created it was automatic, with no creativity involved.
Well, if the entire thing was generated, then no (according to the first link I posted above), since it was not produced by a human. However, no useful program is going to be entirely written by an AI, so any real program would have quite a lot of user input (I regularly will take what copilot suggests and then tweak it to what I specifically want). And then, yeah, it's copyrightable.
Also, there's no way for anyone to know what portion of code that I commit was hand written vs. generated, so you kind of have to treat it all as written by the committer anyway.
Though this does bring up interesting questions about what happens with things like automated PRs that fix bugs / update dependencies... are those then non-copyrightable? ¯\_(ツ)_/¯
Here's the kicker: your modified code snippet may still not be copyrightable if it's generic enough that everyone would do it in a similar manner.
Just as much as a hero riding off into the sunset is not copyrightable in a movie script. However, a hero riding off into the sunset with bananas in the pistol holsters would be.
This is what I would want to hear more about when discussing if Copilot violates copyright.
There's a setting on GitHub that blocks any suggestions that exactly match code in the training set. I doubt you'd ever get in trouble for code that was similar in structure but different variables etc from existing licensed code (especially since most small snippets of code are not terribly unique to begin with).
I mean, it's nice that they have a setting for the bare minimum a lazy undergrad would do to avoid getting caught for plagiarism: replace some of the words in the copied paragraph with replacements from a thesaurus. It's not something I'd personally expect to hold up under real scrutiny, though.
AFAIK that's not enough, for instance see the long-standing industry practice that people working on the Important Stuff are not allowed to ever look at the source code of the Direct Competitor; or clean-room reverse engineering, etc.
I guess time will tell how much acquiring companies (my worry) care about Copilot. Given the difficulty hiring good devs, and the productivity level of body-shop devs, I see it getting a whole lot of use very soon, acknowledged or not.
There's a big difference between reverse engineering (i.e. intentionally writing software that behaves identically to another piece of software) and writing your own code to solve your own problem that may superficially contain small portions of logic similar to some other project's. Copyrighted code has to be sufficiently creative and unique to qualify; otherwise, after the first person wrote code to parse JSON from a web request, no one else would be able to do the same thing.
Kind of interesting.. I would like to point out this seems to be specific for the US.
But also.. In that case, when I commission an artist to paint my portrait, surely I can't claim to be the artist.. But I'm no lawyer.
I'm not sure there is a contractual agreement in GitHub's Copilot that says: "Any code you write here is commissioned work". But honestly, I didn't read the T&Cs.
So I think you MAY have debunked my analogy, but not the main reason for the analogy.
Copy and paste doesn't really write code, just copies it from one place to another. Copilot on the other hand does generate new potentially novel code.
I'm sure that's what people said when they went from punch cards to assembly, and from assembly to C, and from C to Java.... and yet, here we are. Tools that let us write higher level code faster, just allow us to create more complicated software in a reasonable amount of time.
That's still 100% true of the examples I mentioned. There's always a higher level to consider. When we moved to C, we could stop worrying about what registers we were using. When we moved to Python/Java, we could stop worrying about managing memory. When we moved to web frameworks, we stopped writing the guts of our servers. And if anything, programmers have become even better paid, despite so many more people in the industry.
I agree with you. However, programmers have not become better paid because society values programmers. They have become better paid because software is a relatively new artefact in human society that has taken human life by storm. That made software companies immensely profitable, which meant more companies wanted to create software and attract the people who could help them do it.
As software takes a back seat (or at least a "normal" seat) in society, would we see a normalization of income? Could this be hastened by the development and introduction of tools such as copilot?
Potentially, unless there are new / better things that humans can claim they can provide compared to AI tools. This is the point where I think you and I agree, and I think it's your primary argument in any case (unless I'm mistaken).
AI can code low level stuff. This one function. This small piece of logic. What it can't do is conceive of how to take a bunch of different functions and put them together to produce an actual product. It can't tell you if you should use Postgres or Mongo. Programmers will always be needed; we'll just move up the stack, and we'll produce more value per hour of our work, justifying our high salaries.
Compare the visible output of someone writing in assembly vs someone writing on top of a modern web framework. Is assembly harder? Yeah. But the web framework is going to give you a usable product in a fraction of the time with way more features. And that's worth more money to the company you work for.
It's always going to be a knowledge worker's job. It's always going to reward experience and creativity and attention to detail. A lot of programming is looking at the world, seeing a gap in what exists, and figuring out what best fits that gap. An AI can't do that. Programming is making 1000 tiny decisions that can't possibly be specified completely by a product manager and need a human to weigh the tradeoffs.
> AI can code low level stuff. This one function. This small piece of logic. What it can't do is conceive of how to take a bunch of different functions and put them together to produce an actual product.
That's what everybody in the chess world said: "AI can decide low level stuff. This one move. This small attack on a rook. What it can't do is conceive of how to take a bunch of different tactics and put them together to produce a game of chess."
...Until Deep Blue beat Garry Kasparov.
> It can't tell you if you should use Postgres or Mongo.
Yeah, and then came: "It may be able to play chess, but it can't tell you how to play Go."
The hard part about writing code isn't "how to write a for loop" and similar trivial things. Copilot makes this process faster, but the hard part is still organizing your code so that it doesn't become a steaming pile of cowdung a few iterations down the line. That Copilot does not do for you.
So, unless you are a code monkey punching code into autogenerated scaffolding all day, your job is safe.
I think it's like any social network. It grows in value with how many people use it. Yes, you can host your own public git repo or even your own GitLab, but then there's a barrier to entry for contributing to your code, and it's a lot harder for others to discover it.
The network grows in value with the size of the protocol, not just with a platform. Social networks can and should operate like email, not like siloed platforms. All the value ends up being captured and controlled by one single entity.
(I will not go off on a tangent about web3, but the one thing that web3 skeptics always fail to acknowledge is how the current web is broken in that regard. We were promised open protocols, and we ended up with a handful of companies building their own walled gardens.)
The only way that GitHub would get any modicum of credibility would be if they joined the effort from Codeberg/ForgeFed and integrated with ActivityPub. As it is now, GitHub will be nothing but a mirror for my repositories, which I will be hosting on GitLab and/or my own Gitea.
Familiarity with the UX and conventions on that platform. Almost everyone knows how to make a PR against a GitHub repo. But some random other code hosting site? It would be a lot less familiar, and people would have to spend time making an account and figuring out how to contribute.
Even low barriers to entry can cause a big drop in user engagement.
FWIW, there are some (admittedly fairly naive) checks to prevent PII and other sensitive info from being suggested to users. Copilot looks for things like ssh keys, social security numbers, email addresses, etc, and removes them from the suggestions that get sent down to the client.
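To give a feel for what a naive check like that could look like, here's a minimal Python sketch. It is only a guess at the general idea, not Copilot's actual filter: the regex patterns and the names `contains_sensitive` and `filter_suggestions` are made up for illustration.

```python
import re

# Hypothetical patterns for illustration only; a real filter would need
# many more patterns and far more careful matching.
SENSITIVE_PATTERNS = [
    # Header line of a PEM/OpenSSH private key
    re.compile(r"-----BEGIN (?:RSA |OPENSSH |EC |DSA )?PRIVATE KEY-----"),
    # Something shaped like a US social security number (123-45-6789)
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # Something shaped like an email address
    re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
]


def contains_sensitive(suggestion: str) -> bool:
    """Return True if any of the naive patterns matches the suggestion."""
    return any(p.search(suggestion) for p in SENSITIVE_PATTERNS)


def filter_suggestions(suggestions):
    """Drop suggestions that look like they contain sensitive data."""
    return [s for s in suggestions if not contains_sensitive(s)]
```

Pattern matching like this will produce both false positives and false negatives, which is why "fairly naive" is the right description.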
There's also a setting at https://github.com/settings/copilot (link only works if you've signed up for copilot) that will check any suggestion on the server against hashes of the training set, and block anything that exactly duplicates code in the training set (with a minimum length, so very common code doesn't get completely blocked). Users must choose the value for this setting when they sign up for copilot.
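Roughly, a hash-based check with a minimum length could work like the sketch below. This is an assumption about the general technique, not GitHub's implementation: the whitespace tokenization, the 8-token threshold, and the names `build_index` and `is_blocked` are all invented for the example.

```python
import hashlib

MIN_TOKENS = 8  # assumed minimum match length; the real threshold is not public here


def _window_hashes(code: str, n: int = MIN_TOKENS):
    """Yield a hash for every run of n consecutive whitespace-separated tokens."""
    tokens = code.split()  # splitting on whitespace ignores indentation differences
    for i in range(len(tokens) - n + 1):
        window = " ".join(tokens[i:i + n])
        yield hashlib.sha256(window.encode("utf-8")).hexdigest()


def build_index(training_files):
    """Precompute the set of window hashes for the whole training set."""
    index = set()
    for source in training_files:
        index.update(_window_hashes(source))
    return index


def is_blocked(suggestion: str, index) -> bool:
    """Block a suggestion if any n-token window also appears in the training set."""
    return any(h in index for h in _window_hashes(suggestion))
```

Suggestions shorter than `MIN_TOKENS` produce no windows and are never blocked, which is how very common short snippets stay available.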
It's still free with no payment for existing (beta/technical preview) customers. There was a github bug with some auth token nonsense that was causing problems, but all technical preview users should still be free for 60 days.
It's not copying open source code. If you learn an algorithm to balance a binary tree from reading GPL code, and then go use that algorithm in your own closed-source project, with your own variables and types and context, are you breaking GPL? You're not copying the code. Just because you learned about it from reading GPL code doesn't mean that whenever you write tree balancing code from now until the end of time, all that code has to be GPL'd.
Copilot learns the "shape" of code. Common patterns and algorithms, etc. You can't copyright an algorithm.
If you decompile runtime bytecode and assign your own variable names, does the copyright of the original source code no longer apply?
If you trace a picture and use it in your work of art, does the copyright of the original picture no longer apply?
If you copy a tune but set it to new instruments, does the copyright of the original tune no longer apply?
Sampling is a legal minefield in music, so why would it become less of a minefield in code just because you've automated it? So far, the best attempt at an answer I've seen about the legal issue with Copilot was that it's "not technically violating copyright". That is honestly not very reassuring, and it is extremely morally inconsistent for a company built by a guy[0] who is philosophically invested enough in intellectual property as the pillar of human society to write An Open Letter to Hobbyists and to use his Foundation to convince entire governments to adhere to IP laws instead of allowing the mass production of vaccines and medicine.
[0]: Yeah, I know that he no longer serves an active role in the company but this was very much a founding ethos and this is at least a fair bit hypocritical.
If you teach someone about music theory by listening to Stairway to Heaven, and then they write their own song that starts with an A minor chord... are they violating copyright of Stairway to Heaven?
Copilot isn't sampling. Sampling is literally copying snippets of someone else's music and putting it into your music. Copilot doesn't do that. There's no giant database of text that it just slurps suggestions out of.
Copying of code needs to be very direct. Even Google copied roughly 11,500 lines of code from Oracle character for character and still won the case, which went all the way to the Supreme Court. When Oracle made changes (even during the court proceedings), Google kept copying the code and every change Oracle made. So I doubt you're at real legal risk with what you were proposing.
Is your argument in good faith? It seems like you know enough about the matter to distinguish an API definition from an implementation. That's what that ruling was about, and you seem to know it, yet you make the comparison as if it were valid.
I very often will let it suggest its thing and then tweak it to work how I want. It's like super auto-complete for me. If I can't remember how a specific pattern goes for some library, I'll let it write it for me, and then double check it to make sure it's doing what I want. That's still faster than me going to check the API and writing it all out by hand.
Most projects are 90% BS glue code and 10% actually interesting code. I don't mind only having help with the 90%.
I used Copilot yesterday because I wanted a random 10-character string and was like, "Ahh, I don't have the brain power right now to think of this." Then I remembered I had Copilot, so I enabled it and wrote a comment, and it generated ~10 lines that solved my problem. I tweaked it a little bit and rolled with it.
It helps solve the boring simple shit so I can focus on the interesting bit.
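For reference, the kind of snippet in question might look something like this in Python. The language is an assumption, since the comment doesn't say which one was used, and `random_string` is just an illustrative name.

```python
import secrets
import string


def random_string(length: int = 10) -> str:
    """Return a random string of ASCII letters and digits."""
    alphabet = string.ascii_letters + string.digits
    # secrets.choice draws from a cryptographically strong source, which
    # matters if the string is used as a token; random.choice would be
    # fine for non-security uses.
    return "".join(secrets.choice(alphabet) for _ in range(length))
```

Calling `random_string()` gives a 10-character string; pass a different `length` for anything else.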
> Most projects are 90% BS glue code and 10% actually interesting code. I don't mind only having help with the 90%.
Yeah, that makes sense, I agree with that. If your use case is skewed more towards "BS glue code", as you say, you'll find more use out of Copilot. Then $10/month can be fair, cheap even.