Software Piracy

This is the first of a three-part article on Software Piracy.

Who owns a piece of software source code, and whether that code is copyrighted or considered a trade secret, are legal questions that depend on many factors.  But underlying every contested software ownership issue is the need to determine whether one piece of software is similar to another and, if so, to what degree.  Whether one company misappropriated another company’s software is ultimately a legal decision and is not the subject of this article.  This article is only about comparing software from a programmer’s point of view.

The functionality or high-level operation of the software can be protected as a trade secret or, in many cases, by a patent.  The actual lines of uncompiled source code, however, can be protected by copyright law.  Copyright is automatic and arises when the software is written, but to enforce it in a U.S. court the code generally must first be registered with the U.S. Copyright Office.

What is copied software?

There are generally two types of copying that occur – literal and non-literal.  Literal copying is much easier to detect and can often be recognized even by people who don’t understand software.  But literal copying of software does not necessarily mean that the works are identical.  One section could be considered literally copied if, by making a few minor changes (variable names, spacing, comments, etc.), the two code samples can be made to match.  Non-literal copying could include the creation of derivative works.
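The idea that two samples "can be made to match" after minor changes can be sketched in a few lines of Python. This is only an illustration, assuming C-like source text: the function name normalize, the short keyword list, and the ID0, ID1 placeholders are all hypothetical, and the comment stripping is naive (it does not understand string literals).

```python
import re

# A deliberately incomplete list of C keywords that must not be renamed.
C_KEYWORDS = {
    "int", "char", "float", "double", "void", "long", "short", "unsigned",
    "signed", "if", "else", "for", "while", "do", "return", "break",
    "continue", "switch", "case", "default", "struct", "static", "const",
}


def normalize(source):
    """Strip comments and spacing, and rename identifiers by order of first use."""
    # Remove /* ... */ block comments, then // line comments.
    # Caveat: comment markers inside string literals are stripped too.
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)
    source = re.sub(r"//[^\n]*", " ", source)

    # Split into identifiers, numeric literals, and single punctuation marks.
    tokens = re.findall(r"[A-Za-z_]\w*|\d[\w.]*|\S", source)

    names = {}
    result = []
    for token in tokens:
        is_identifier = token[0].isalpha() or token[0] == "_"
        if is_identifier and token not in C_KEYWORDS:
            # The same original name always maps to the same placeholder.
            token = names.setdefault(token, f"ID{len(names)}")
        result.append(token)
    return " ".join(result)


sample_a = "int total = 0; // running sum\nfor (i = 0; i < n; i++) { total += buf[i]; }"
sample_b = "int  sum=0;   /* accumulate */\nfor(k=0;k<len;k++){sum+=data[k];}"
print(normalize(sample_a) == normalize(sample_b))
```

Two samples that normalize to the same string differ only in comments, spacing, and names. A match of this kind is a reason to look closer, not a conclusion in itself.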

The largest software piracy case I have worked on as an expert witness for the plaintiff involved literal copying of code.  As a programmer myself, I found it an unusual feeling to see incontrovertible evidence of copied code: not just ideas, but literal lines, including misspelled words in comments.  In one instance the duplicated code even contained a comment that referenced a function name that was defined only in the original software.

The software in this particular case was embedded code used to control a machine, and the plaintiff’s first clue that their software may have been stolen was when a customer reported experiencing the same unique bug in both the operation of the plaintiff’s machine and in the operation of a competitor’s product.  The error was unusual enough that the plaintiff began an investigation.

When software is copied line for line, proving that copying occurred is easier, but detection may still be difficult depending on the number of lines of code provided by the two parties (see Part II of this article for tips on identifying copied software).  If the copied software has been modified or obfuscated, detection becomes even more difficult.  And when the software has undergone several revisions, finding the similar code becomes tricky.

Why software from two companies may be similar

I believe that there are three reasons why two applications could be similar, and in each case one would expect to find a different degree of correlation between the software:

1 – A similarity of necessity. If the underlying hardware and the business application are basically identical, as is usually the case in a software misappropriation case, it is possible that the software is similar.  The degree of similarity may be low, but it could be stronger if the same programmer wrote both applications (after a job change, for instance).  However, for even the same programmer to type exactly the same text, including spaces and punctuation, a few months after writing the original version is, in my opinion, impossible.  The results may be close, but they would not be identical.

To conclude that two programs are similar because of necessity rather than copying, you must determine whether there is only one way to solve the problem.  For instance, a programmer who is given a list of names and told to put them in order will use some type of sort, because that is the only way to solve the problem.  Even if both programmers implemented a Bubble Sort (a common sorting technique with many published examples), that alone would not be evidence of copying.
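As an illustration of this point, here are two Bubble Sorts of the kind two programmers might plausibly write independently. Both functions are hypothetical examples, not code from any case. They implement the same published technique and produce the same output, yet the text differs in loop structure, naming, and the way elements are swapped.

```python
def bubble_sort_a(names):
    """Bubble Sort with two counted loops and a tuple swap."""
    items = list(names)
    n = len(items)
    for i in range(n - 1):
        # After pass i, the last i + 1 items are already in place.
        for j in range(n - 1 - i):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
    return items


def bubble_sort_b(names):
    """Bubble Sort that repeats full passes until no swap occurs."""
    result = list(names)
    swapped = True
    while swapped:
        swapped = False
        for k in range(1, len(result)):
            if result[k - 1] > result[k]:
                tmp = result[k]
                result[k] = result[k - 1]
                result[k - 1] = tmp
                swapped = True
    return result


people = ["Mills", "Adams", "Young", "Baker", "Clark"]
print(bubble_sort_a(people) == bubble_sort_b(people))
```

Shared functionality, and even a shared algorithm, is what a similarity of necessity looks like. Matching text down to names, spacing, and comments is a different matter.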

2 – Similarity because of a Common Source. If both versions of the software are based on the same ‘parent’ code, you could expect them to look alike.  This code could come from a manufacturer, or it could be some of the widely available Open Source software.  The correlation might even be stronger than with a similarity of necessity, but the accused company could easily prove that they did not copy the original code by providing the common source and their subsequent revisions.
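One way to account for a common source during a comparison is to set aside every shared line that also appears in the parent code and look only at what remains. The sketch below assumes all three code bases are available as plain text. The function names and the eight-character minimum line length are hypothetical choices made for illustration.

```python
def significant_lines(source):
    """Return the set of non-trivial lines, with runs of whitespace collapsed."""
    lines = set()
    for raw in source.splitlines():
        line = " ".join(raw.split())
        # Skip blanks and lines too short to be distinctive, such as "}" or "else".
        if len(line) >= 8:
            lines.add(line)
    return lines


def unexplained_overlap(plaintiff, defendant, common):
    """Lines shared by both parties that the common parent code does not explain."""
    shared = significant_lines(plaintiff) & significant_lines(defendant)
    return shared - significant_lines(common)


common_src = "void delay_ms(int ms) {\n  while (ms--) tick();\n}"
plaintiff_src = common_src + "\nint read_sensor(void) { return adc_read(3) >> 2; }"
defendant_src = (
    common_src
    + "\nint read_sensor(void) { return adc_read(3) >> 2; }"
    + "\nint other(void) { return 0; }"
)
for line in sorted(unexplained_overlap(plaintiff_src, defendant_src, common_src)):
    print(line)
```

Lines that survive this filter are the ones the common source cannot account for, so they are the ones worth examining by hand.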

3 – Similarity due to copying. This is the case we are trying to prove.  When outright copying occurs, the developers who took the software will often alter the copied code over time, either in an attempt to keep the copying from being discovered or simply through natural software evolution.  But wherever they did not adequately obfuscate the copied software, the correlation between the two samples will be very high.

As the person trying to prove that code was copied, you may get lucky and discover that all the thief did was change or delete comments, reformat the text and change variable names.  Your job will be much more difficult if the original software has been translated into another language (such as assembly) or if the code was compiled and then run through a decompiler.

Code that is identical in functionality and differs only in variable names and comments may well be evidence that attempts were made to hide the fact that code was copied.  As Bob Zeidman writes in his book The Software IP Detective’s Handbook (page 252):

“…it is necessary to find all signs of copying, especially considering that modifications may have been made after the unauthorized copying.  The modifications may be explained by the normal development process, including debugging and feature enhancements and additions.  The modifications may also be the result of deliberate attempts to hide the copying.”

In any software misappropriation case, I always start by looking for identical code.  If you find even the smallest section of identical code in both products, that is where you begin your investigation.
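A first pass for identical code can be sketched with Python's standard difflib module. This is a minimal illustration rather than a forensic tool: the function name identical_runs and the three-line threshold are assumptions, and SequenceMatcher aligns the two files in order, so a block that was moved to a different position may be missed.

```python
from difflib import SequenceMatcher


def identical_runs(original, suspect, min_lines=3):
    """Find runs of consecutive identical lines, ignoring indentation."""
    a = [line.strip() for line in original.splitlines()]
    b = [line.strip() for line in suspect.splitlines()]
    # autojunk=False keeps frequently repeated lines from being discarded.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    runs = []
    for match in matcher.get_matching_blocks():
        if match.size >= min_lines:
            # Report 1-based starting line numbers in each file.
            runs.append((match.a + 1, match.b + 1, a[match.a:match.a + match.size]))
    return runs


original_src = (
    "int a;\nint b;\nx = read();\n"
    "y = x * 2; /* dubble it */\nsend(y);\nreturn 0;"
)
suspect_src = (
    "setup();\n    x = read();\n"
    "    y = x * 2; /* dubble it */\n    send(y);\ncleanup();"
)
for start_a, start_b, lines in identical_runs(original_src, suspect_src):
    print(f"original line {start_a}, suspect line {start_b}: {len(lines)} identical lines")
```

Each reported run, especially one that carries a misspelled comment, is a natural place to begin a closer manual comparison.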

For hints on how to identify copied software, please see Part II of this article – “Tips for Identifying Copied Software.”

David Hancock’s Internet Base