What is GNU?
In 1983 Richard Stallman announced the GNU project (GNU is pronounced "g'noo" and stands for "GNU's Not Unix"), an effort to build a free, Unix-like operating system that contains no Unix code. As the project grew, Stallman and others founded the Free Software Foundation in 1985. Stallman has been very outspoken about the early development of GNU and Linux. The OS commonly known as Linux is based on the Linux kernel, but many of its other core components, such as the compiler, the C library, and the shell, come from the GNU project. For that reason many software enthusiasts believe that calling the whole system just Linux is unfair, and that it should be called GNU/Linux or GNU Linux to give credit to the GNU project's contributions.
What is C?
C is one of the top 10 most popular programming languages in the world. According to data from GitHub, at the time of writing it ranked as the #8 most popular language across all GitHub repositories.
Typically programming languages fall into two categories:
- Interpreted languages: the source code is executed by another program, the interpreter, each time it runs. Python is an example; its reference interpreter, CPython, is itself written in C and compiled in the same fashion as any other C program.
- Compiled languages: the C programming language is a compiled language, which means the source is preprocessed, compiled, assembled, and finally linked into an executable before the program runs (this is the process that is run when gcc main.c is run from your terminal).
C programs start out in human-readable form, and in order to run a .c file you must have access to a C compiler. If you are using a Unix-like machine (such as Linux or a Mac), a C compiler is available for free. Thanks to Richard Stallman and the GNU project, the best known one is named gcc. Please note that it can often also be invoked as cc, and that on a Mac the gcc command is usually an alias for Apple's Clang compiler rather than GNU gcc.
What is gcc?
The abbreviation gcc stands for GNU Compiler Collection (originally GNU C Compiler), and it was originally written to be the compiler for the GNU OS. It contains compilers for the following languages:
C, C++, Objective-C, Fortran, Ada, Go, and more. (Java was supported through GCJ until it was removed in GCC 7.)
The gcc compiler was originally written in C (current versions are implemented mostly in C++) and accepts options and file names as arguments so that engineers can have fine-grained control over the compilation process.
The synopsis in the gcc manual lists many different options that can be passed to the gcc command, and it ends with [-o outfile], [@file], and infile. The infile argument is the source file to use as input, -o outfile names the output file, and @file tells gcc to read additional command-line options from a file. Many advanced software projects use the output of one step as the input to another to solve a task.
What happens when you type $ gcc main.c ?
This command runs four steps, always in the same order: preprocessing, compilation proper, assembly, and linking.
Let’s take a deeper look at those four steps:
Let’s say you start off with a program you wrote called main.c that looks like this if you type
$ vi main.c
(the terminal will open the file with the Vi text editor)
#include <stdio.h>

int main(void)
{
    printf("Hello World\n");
    return (0);
}
Above is what you would call source code. Once it has been compiled and executed, this program instructs the computer to print out
Hello World
then the program quits and a new bash input line is created for you to type in additional commands.
None of this happens automatically, though: gcc first has to carry out the four steps listed above. This time I am going to go into detail about what they are and what they do.
Step 1: Preprocessing
Preprocessing is where gcc looks for lines in the file that begin with a hash (#), called preprocessor directives, and acts on them. When we run
$ gcc main.c
the main.c program is first processed by the preprocessor. There are three really important things that the preprocessor does:
- First: it removes all the comments from our program main.c. Looking below, what would be removed is the text between /* and */ (the line starting with # is a directive, not a comment). The proper C styling guide does an excellent job describing what proper comments look like.
#include <stdio.h>

/**
 * main - Entry point
 *
 * Return: Always 0 (Success)
 */
You can preprocess a “c file” by running the following command
$ gcc -E main.c
- Second: it includes the contents of the header files named in #include directives.
- Third: it replaces macros (if any are used in the program) with the code they stand for.
Lines beginning with the hash character are directives. For example, #include takes the contents of a file, such as a header file, and expands it into the source code. This work is done by the C preprocessor, known as cpp. It is important to know that cpp is conceptually separate from the compiler proper: when it really comes down to it, cpp is a tool for substituting text. The file leaves the preprocessor with its macros expanded and enters the compilation stage. There are also preprocessor directives for conditional compilation (#if, #ifdef, #ifndef), which cause a block of code to be compiled only if a certain macro is defined or a condition evaluates to true.
The option (aka flag) -E is passed to the gcc command to stop after the preprocessing step and print the preprocessed source to standard output.
Step 2: Compilation
Next, the output of the preprocessor is passed to the compiler, which generates assembly code. The assembly code generated is specific to the instruction set of the target processor, the most common of which are x86 and ARM. Because gcc can generate code for many different targets, it can turn the same C source code into machine code for a wide range of computer architectures. Assembly is the lowest-level form of the program that is still human readable; the next stage produces machine language, which is virtually impossible to read and interpret.
To generate assembly code from a C source file, run
$ gcc -S main.c
The -S option stops gcc after the compilation step. The above command creates a file called main.s, which contains the assembly code for the main.c file.
Let’s take a look at a real-life example of the output of the aforementioned command! The main.s file contains the assembly code for the basic construction of an int main function.
If you want to learn more about how Assembly works here is a great link.
Step 3: Assembly
Since the processor cannot execute assembly text directly, the job of the assembler is to convert assembly code into binary machine code (base 2), since that is what the hardware can actually read and execute.
The assembler accepts the output of the compiler and turns it into machine code: binary instructions for the CPU (central processing unit) to execute. The result is an object file, which is not yet an executable on its own. If you stop gcc after this step with gcc -c main.c, the output of the assembler is put in a file called main.o. This output becomes the input to the final step of the compilation process, the linker.
Before we get into the linker though let’s take a look at what our program looks like after we run the assembler on it.
Step 4: Linking
The last step before we are done with compilation is called linking. The linker accepts main.o as input, along with any pre-compiled libraries the program depends on, such as the C standard library, which gcc links by default. (The #include preprocessor directive only brings in declarations; the compiled code for those functions lives in the library.) Next, it resolves the references between them and merges everything into what is called an executable binary. With a static library, the linker only pulls in the parts that your program actually uses. For example, printf would be linked in rather than every function, used or unused, from the standard library. With a shared library, which is commonly the default for the C library, the executable instead records a reference that is resolved when the program is loaded. Since we did not use the -o flag in gcc, the executable file will be saved to a.out (which is the default) in the current working directory.
Let’s see how this looks after running the following command
$ gcc main.c
$ ls
a.out main.c
Now we have our executable file named a.out. It’s a default file that is created if we don’t specify what file we should put our executable code in. If we wanted to specify a different output file we would do something like this
$ gcc main.c -o output
$ ls
main.c output
Finally we can go ahead and execute the code using the command
$ ./output
If we did not specify a new file using the -o flag we would execute the a.out file like so
$ ./a.out
Finally you will see the desired output of the “Hello World” program
Hello World
There you have it. You now know how a .c file is compiled from start to finish. If you have any suggestions or comments please feel free to let me know down below. If you like my content please follow me!