I don’t know (but wanna learn) programming, but, for example, can’t you inspect the code of an app if it’s installed?
(yeah this is kind of a stupid question.)
EDIT: Thanks for the clarification, guys!
No.
First, “open source” doesn’t just mean “you can read the source”; it means that you have rights to modify it and make new versions too.
Second, compiled programs (e.g. most programs you run on a phone or a desktop PC) do not have source available for you to read.
Ah, that makes sense, thanks for clarifying.
It’s not a perfect analogy, but a good way to think about it if you’re not a programmer is to say “why do we need recipes when we can just buy a product in the store and read the ingredients list”.
Just because you know the ingredients, that doesn’t mean you know how to put them together in the right order, in the right quantities, and using the correct processes to recreate the finished product.
Going into a little more detail:
There are plenty of ways to do open source, and the differences mostly come down to the license something is published under. Some licenses are permissive and ask for little more than attribution, while copyleft licenses require modified versions to be shared under the same terms. Licenses that prohibit redistribution or restrict commercial use are generally considered “source available” rather than open source. One of the more popular copyleft licenses is the GNU General Public License (or GPL for short), which you can read up on over here.
Technically there’s nothing stopping you from ignoring the terms of the license agreement and just doing whatever. Think “agreeing to the terms without actually reading them”. While the licenses are usually proper grounds for legal action, whether actual legal action is within the realm of possibility depends on the project and the resources behind it.
When it comes to “everything is open source”, you’re technically correct in the sense that you can reverse engineer everything and the amount of work you’re willing to put in is the only limiting factor. Compiled code and techniques like code obfuscation and encryption will pose barriers, but they will not protect from someone determined to get in. In the same way a door lock will not protect you from someone who brings a blowtorch.
Some code is technically not open source, but is delivered in human-readable form. This is the case for things like websites and scripts in languages like Python. Other software is compiled (pre-converted to specific instructions for your processor), and is delivered in binary, which is not particularly human-readable. But with the right tools even binary applications can be “decompiled” and converted into something slightly more closely resembling the original source code.
A great one liner from the YouTube channel Low-level Learning is “everything is open source if you can read assembly”.
So, in summary: it depends how you look at it. Generally speaking, open source means that the source code is available for the public to see and that you’re free to submit any suggestions or improvements to the code, no matter who you are. In practice, the source code is sometimes visible (out of technical necessity or for troubleshooting purposes) even though the product is not open source, in which case the end user license agreement will likely contain a clause prohibiting you from doing anything with it.
Open source means you can read the source, Free software means you can modify and redistribute.
Well, no.
The term “open source software” was specifically invented to refer to the same set of software licenses as “free software”; but with a different political angle.
Really. You can look it up.
Kinda. What you’re referring to is “decompilation”, which is the process of taking the output of a compiler and trying to reverse-engineer the code that produced it. But decompiled code is really hard to read and modify, because it isn’t what humans wrote, it’s what the compiler translated it into, and that can involve some unexpected changes that are mostly irreversible. And, since it’s closed source, if you somehow manage to make a change, you can’t re-release it: you don’t have the license to do so.
With open source, you see the same code as the maintainers, so it has the high-level programming concepts and good variable names, and you have permissions to fork and release your own version.
The other thing is that on modern platforms, decompiled code can and will be ludicrously complex and probably rely on several levels’ worth of abstraction layers, external libraries, APIs, and sandboxes provided by the OS or whatever other platform it’s meant to run on. Outside of microcontrollers and some embedded applications, the days are long gone where you just have relatively simple machine code running directly on the bare metal of the computer’s processor and unprotected memory.
This is a very good addition!
Assembly is a difficult beast
Your computer’s CPU doesn’t understand human language or code, so programs are compiled from human-readable programming languages (like C++, Rust, etc.) into binary machine code. Machine code is basically just a bunch of CPU instructions and data that are formatted specifically for your CPU’s architecture (depending on if it’s x86, ARM, etc.).
Most of the time, when you install a program/app/game, you’re only getting the compiled binary in your CPU’s machine code, so you couldn’t view the original “source code” without going through a complex process called “decompilation”. And even then, you wouldn’t have the legal rights to share or modify that code for others to use.
For something to be considered truly “open source”, it not only makes the original source code available to the user, it also publishes that code under a license like the GPL which gives the user certain rights to use, copy, and/or modify the code.
You said “most of the time” - when is that not the case?
Some programs are distributed as “scripts” (in a scripting language like BASIC, BASH, Python, JavaScript, Lua, etc) which are stored on your computer in human-readable form and only converted into CPU machine code when you run the program, through an “interpreter” program.
Of course, everything boils down to binary machine code in the end, because those CPU instructions are the only language that your CPU actually works with.
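To make the interpreter idea a bit more concrete, here is a minimal sketch of one in Python. The three-command language (SET, ADD, PRINT) and the `run` function are made up for illustration, and real interpreters are far more involved. The point is only that the program being run stays as readable text the whole time.

```python
def run(script):
    """Interpret a made-up toy language one line at a time."""
    variables = {}
    output = []
    for line in script.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        command, args = parts[0], parts[1:]
        if command == "SET":
            variables[args[0]] = int(args[1])
        elif command == "ADD":
            variables[args[0]] += int(args[1])
        elif command == "PRINT":
            output.append(variables[args[0]])
        else:
            raise ValueError(f"unknown command: {command}")
    return output


# The "program" is plain text that anyone can open and read.
SCRIPT = """SET total 1
ADD total 1
PRINT total"""

result = run(SCRIPT)
print(result)
```

Nothing here is ever turned into a separate binary file: whoever has the script has the readable code, which is why such programs are easy to inspect even when their license does not make them open source.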
There are some programs (especially on Linux) where they don’t distribute compiled binaries and you just download and compile the source code yourself to be able to use the software. This can be because of legal reasons, technological reasons, or even just because a developer wants to be very transparent in what’s being run on your machine.
This is especially common with software alphas (either for new software, or for testing updates to existing software) where they just don’t bother compiling it for every type of system when it’s really just for use by a handful of developers while they’re actively working on or testing the code.
No, for several reasons. First off, open source is open source (code), as opposed to peeking at the compiled machine code/bytecode/whatever, or actively reverse engineering it. At that point it’s no longer the source. Depending on tech and compilers, code could be very hard to read once compiled. In many cases, any descriptive names and comments are history. Some people try to make it actively harder to figure out (obfuscation, DRM) on top of that.
After that, much of the fundamental difference comes down to licensing: “open source” is usually distinguished from “source available”, where you can’t really “legally” do anything with the code other than look at it.
Expanding a bit on what others have said, for anybody who is further interested (simplified; this whole discussion could be pages and pages of explanation)
The code we write (source code), and the code that makes the machine do its thing (executable code) are usually very different, and there are other programs (some are compilers, others are interpreters, I’m sure there are others still) to help translate. Hopefully my examples and walkthrough below help illustrate what others have meant by their answers and give some context on how we got to where we are, historically
At the bare metal/electricity flowing through a processor you’re generally dealing with just sequences of 0s and 1s - usually called machine code. This sequence of “on” and “off” is the only thing hardware understands. It is really hard for humans to work with, but it’s all we had at first. A program saved as machine code is typically called a binary (for formatting’s sake, I added spaces between the bytes/groups of 8 bits/binary digits)
00111110 00000001 00000110 00000001 10000000 00100110 00000000 00101110 00000000 01110111
A while later, we started to develop a way of writing things in small key words with numerical values, and wrote a program that (simplified) would replace the key words with specific sequences of 0s and 1s. This is assembly code and the program that does the replacements is called an assembler. Assemblers are pretty straightforward and assembly is a close 1:1 translation to machine code; meaning you can convert between the two
LD A, 0x01
LD B, 0x01
ADD A,B
LD H, 0x00
LD L, 0x00
LD (HL), A
These forms of code are usually your executable codes. All the instructions to get the hardware to do its thing are there, but it takes expertise to pull out the higher level meanings
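The keyword replacement an assembler does can be sketched in a few lines of Python. This is a toy under stated assumptions: the `OPCODES` table is simply read off the binary listing earlier in this comment (the values look like Z80-style encodings, but the comment never names a processor), and the `assemble` function is hypothetical, handling only these six instructions.

```python
# Hypothetical opcode table, read off the binary listing in the comment above.
# "n" stands for an immediate value stored in the byte after the opcode.
OPCODES = {
    "LD A,n": 0x3E,
    "LD B,n": 0x06,
    "ADD A,B": 0x80,
    "LD H,n": 0x26,
    "LD L,n": 0x2E,
    "LD (HL),A": 0x77,
}


def assemble(lines):
    """Translate each assembly line into one or two bytes of machine code."""
    out = bytearray()
    for line in lines:
        mnemonic, _, operands = line.partition(" ")
        ops = [op.strip() for op in operands.split(",")]
        if ops[-1].startswith("0x"):
            # Immediate operand: emit the opcode, then the value itself.
            out.append(OPCODES[f"{mnemonic} {ops[0]},n"])
            out.append(int(ops[-1], 16))
        else:
            out.append(OPCODES[f"{mnemonic} {','.join(ops)}"])
    return bytes(out)


PROGRAM = [
    "LD A, 0x01",
    "LD B, 0x01",
    "ADD A,B",
    "LD H, 0x00",
    "LD L, 0x00",
    "LD (HL), A",
]

machine_code = assemble(PROGRAM)
bits = " ".join(f"{byte:08b}" for byte in machine_code)
print(bits)
```

Because the mapping is a plain lookup table, running it in reverse (a disassembler) is just as mechanical, which is why machine code can always be turned back into assembly.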
This kind of writing still gets tedious and there are a lot of common things that you’d do in assembly that you might want shortcuts for. Some features for organization got added to assembly, like being able to comment code, add labels, etc but the next big coding step forward was to create higher level languages that looked more like how we write math concepts. These languages typically get compiled, by a program called a compiler, into machine code, before the code can run. Compilers working with high level languages can detect a lot of things and do a lot of tricks to give you efficient machine code; it’s not so 1:1
This kind of representation is what is generally “source code” and has a lot of semantic things that help with understandability
int main() { int result = 1+1; }
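The “not so 1:1” point can be seen from Python as well. This is a small sketch that assumes CPython (the reference implementation), which compiles source to bytecode before running it; other implementations may behave differently. The variable name `seconds_per_day` is just an example.

```python
import dis

SOURCE = "seconds_per_day = 60 * 60 * 24"
code = compile(SOURCE, "<example>", "exec")

# The multiplication was done at compile time: the finished value 86400 is
# stored as a constant, so the compiled form no longer says how it was derived.
folded = 86400 in code.co_consts
print("folded at compile time:", folded)

# Show the bytecode listing for inspection (exact output varies by version).
dis.dis(code)
```

Someone who only has the compiled form sees the number 86400 and has to guess that it once meant “60 * 60 * 24”.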
There are some, very popular, high level languages now that don’t get compiled into machine code. Instead an interpreter reads the high level language and interprets it line by line. These languages don’t have a compilation step and usually don’t result in a machine code file to run. You just run the interpreter pointing to the source directly. Proprietary code that’s interpreted this way usually goes through a process called obfuscation/minification. This takes something that looks like:
def postCommentAction(commentText, apiConnection) {
    apiConnection.connect()
    apiConnection.postComment(commentText, userInfo)
    apiConnection.close()
}
And turns it into:
def func_a(a,b){b.a();b.b(a,Z);b.c();}
It condenses things immensely, which helps performance/load times, and also makes it much less clear about what the code is doing. Just like assembly, all the necessary info is there to make it work, but the semantics are missing
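For anyone curious how the renaming part can work, here is a rough sketch using Python’s standard `ast` module (it needs Python 3.9 or newer for `ast.unparse`). The `Renamer` class and the sample function are made up for illustration, and real minifiers are far more careful: this toy would also rename built-ins and globals if the sample used any.

```python
import ast

SOURCE = '''
def post_comment_action(comment_text, api_connection):
    # Open the connection, send the comment, then clean up.
    api_connection.connect()
    api_connection.post_comment(comment_text)
    api_connection.close()
'''


class Renamer(ast.NodeTransformer):
    """Replace function, argument and variable names with v0, v1, ..."""

    def __init__(self):
        self.mapping = {}

    def _short(self, name):
        if name not in self.mapping:
            self.mapping[name] = f"v{len(self.mapping)}"
        return self.mapping[name]

    def visit_FunctionDef(self, node):
        node.name = self._short(node.name)
        self.generic_visit(node)  # continue into arguments and body
        return node

    def visit_arg(self, node):
        node.arg = self._short(node.arg)
        return node

    def visit_Name(self, node):
        node.id = self._short(node.id)
        return node


tree = Renamer().visit(ast.parse(SOURCE))
minified = ast.unparse(tree)
print(minified)
```

The comment disappears on its own because the parser never stores comments, and the method names (`connect`, `post_comment`, `close`) are left alone because they belong to whatever object is passed in.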
So, to conclude - yes, you can inspect the raw instructions for any program and see what it’s doing, but you’re very likely going to be met with machine code (that you can turn into assembly) or minified scripts instead of the kind of source code that was used to create that program
Take a look at the definition of “Free Software”:
A program is free software if the program’s users have the four essential freedoms:
- The freedom to run the program as you wish, for any purpose (freedom 0).
- The freedom to study how the program works, and change it so it does your computing as you wish (freedom 1). Access to the source code is a precondition for this.
- The freedom to redistribute copies so you can help others (freedom 2).
- The freedom to distribute copies of your modified versions to others (freedom 3). By doing this you can give the whole community a chance to benefit from your changes. Access to the source code is a precondition for this.
The term “open source” has a more corporate-friendly connotation, but the freedoms it entails are the same.
No. You’re probably thinking of what people refer to as “source available”, which means the source is there and technically no one can stop you from looking at it, but you’re not free to do whatever you want with it like you are with open source stuff. It’s sort of a bad terminology but it is less loaded than “free”, which can mean “free as in beer” or “free as in speech”.
Some things aren’t even source available either. Just because you can reverse engineer something doesn’t mean the source is available. Even JavaScript, which is not compiled, is not always considered source available. It is often minified and obfuscated in the browser. All variable names get replaced with junk.
In summary, open source refers specifically to your rights to use software, not just your ability to see the source.
Imagine getting a can of purple paint and trying to figure out exactly how much red and blue dye was used to make that exact purple. Now imagine doing that every few lines of code in a code base of say 10k lines. That’s basically how decompilation goes. It’s extremely hard and even if you’re able to figure it all out, it’s still impossible to ever know what was actually originally written.
What you’re describing does have some truth to it, though. There was a time when you could get a program from a magazine, type it all in to your Commodore 64, and then it would run a Pac-Man clone. These, like Python today, are not compiled. So to have the program means to have the code too.
Try to inspect the code of any Microsoft program you have installed. You cannot, because you only have the executable, not the source code itself.
Although it’s possible to inspect the code of an installed application, that’s not the kind of code you’re probably thinking of. In many cases, the code that you can inspect is code that’s very hard to understand. You can learn about the difference between “machine code” and “high-level programming languages” to understand why you probably won’t be able to do anything of value with the code that you can inspect.
Even if you can inspect it, you can’t change it. (You could, but it would be very very difficult.)
Even if you can change it, that doesn’t make it Open Source, for reasons that others have already described.
One place where you can often inspect the source code is in a web browser. That’s still not Open Source, but it’s closer to source code and largely available to inspect.
Best wishes with your interest in learning to program!
Source code is called source because it’s the original verbose code that gets transformed through a compiler. The compiler output can be machine code (special numbers the computer can interpret as instructions), minified code (so the file size is smaller), or even a totally different language through transcompilation. In all cases the source code is what was maintained by a human and is the original source of truth, while compiled code is transformed by a computer: it’s either condensed, which makes it hard to read and loses informational context, or it’s transformed automatically, which may not be as clean or idiomatic. Source code can have multiple compile targets, so if you were to modify compiled code to change its behavior it would be very hard to collaborate or distribute those changes, since they would only apply to one target. The source code is important because it’s easier to understand, it’s more organized, and it’s a common source of truth that multiple developers can collaborate on.
Most compiled apps (exes) can technically be read, but only if you use a disassembler to convert the machine code into raw assembly.
However, this assembly has no:
- Comments
- Documentation
- Names
- Organization
- etc…
Only if you are extremely skilled at reading assembly can you read a compiled program.
Executable code inspection is not reading source.
If you read JavaScript code, it’s readable as text. But even then, it may have been transformed from a readable source with descriptive names and structure into an obfuscated mess that works the same and may be more performant, but is not human-readable. It’s not the source, so it can’t reasonably be called open source.
If it’s not transformed it is though.
Different languages have different transformations. Most programs you install are compiled to transformed data. The text source is readable. The transformed result is not. Tooling may help inspecting or seeing parts, or trying to recreate the source, but it’s not the source.
This is why we say “free and open source”, free software (not to be confused with “freeware”), or FOSS (Free and Open Source Software), sometimes with the added explanation “free as in freedom” in more formal settings. This means that the software comes with several essential rights and aligns with the ideology of the free and open source movement.
In most daily conversation, open-source is really short for the “free and open source software” described above.
Also, “source” code, i.e. “where the software comes from”, is different from machine code, which makes up the actual executable you run on your computer. Source code includes comments and sometimes inefficiencies to make it human-readable; the compiler strips these components and translates the rest into a language that only the machine can understand. This is done for legitimate performance and compatibility reasons.
Although there are decompilation processes, where people try their best to revert the translation, the “source” code is still not available to you, since many things simply get removed by the compiler and hence are not recoverable from the machine code alone.
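A quick way to see that stripped information really is gone is to compile two differently written sources and compare the results. This sketch assumes CPython and uses its bytecode as a stand-in for machine code; the variable names `price` and `total` are only examples. Note that CPython bytecode is gentler than a native compiler and still keeps variable names, so it is mainly comments, spacing and redundant parentheses that vanish here.

```python
COMMENTED = """# Add 20% VAT to the price.
total = (price) * 1.2   # parentheses and spacing are only for humans
"""

BARE = """
total=price*1.2
"""

a = compile(COMMENTED, "<example>", "exec")
b = compile(BARE, "<example>", "exec")

# Both sources compile to the same instructions and constants, so nothing in
# the compiled form can tell you which version the author actually wrote.
same_instructions = a.co_code == b.co_code
same_constants = a.co_consts == b.co_consts
print(same_instructions, same_constants)
```

Since several different sources map to one compiled result, a decompiler can at best produce some source that behaves the same, not the original one.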
Web pages are one of the easiest things to inspect the code of. If you press Ctrl+U or F12 you can see the web page’s code directly. Some pages’ code is purposefully scrambled to keep you from doing this though.