Directives, Preprocessor, Compiler, Linker, and Loader in C++

How does our code run in C++?

There are six stages in the life cycle of code execution in C++. The journey begins with the source code written by the user. This source code then goes through the preprocessor, which expands macros and copies the contents of all included header files into the source code. After preprocessing, the code is compiled and converted into object files. These object files are then passed to the linker, which links them together and creates an executable file. Next, the executable is loaded into memory by the loader, and finally, the program is executed.

Preprocessor

Before we look at the preprocessor, we have to know that the code we see and the code the compiler sees are different. The preprocessor expands the directives before the code is sent to the compiler.

What is a directive?

A directive (or preprocessing directive) in C++ is a special instruction given to the preprocessor, the part of the compilation process that runs before the actual code is compiled. Directives start with the # symbol. They tell the preprocessor to do things such as include files, define constants, or undefine macros.
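
As a rough illustration of the directives described above, here is a small sketch that uses #include, #define, and #undef, plus a conditional #ifdef that is not mentioned above but shows the same mechanism. The names BUFFER_SIZE, SQUARE, buffer_size, square, and buffer_size_defined are made up for this example.

```cpp
#include <cstddef>   // #include: the header's contents are pasted in here

#define BUFFER_SIZE 64          // #define: an object-like macro (a named constant)
#define SQUARE(x) ((x) * (x))   // #define: a function-like macro

// The preprocessor replaces BUFFER_SIZE with its text before compilation,
// so the compiler only ever sees: return 64;
std::size_t buffer_size() { return BUFFER_SIZE; }

// SQUARE(n) is expanded to ((n) * (n)) before the compiler runs.
int square(int n) { return SQUARE(n); }

#undef BUFFER_SIZE              // #undef: the macro no longer exists below this line

// Conditional directives decide which lines survive preprocessing.
// BUFFER_SIZE was just undefined, so only the #else branch is kept.
#ifdef BUFFER_SIZE
bool buffer_size_defined() { return true; }
#else
bool buffer_size_defined() { return false; }
#endif
```

Calling buffer_size() still returns 64 even though the macro was later undefined, because the replacement had already happened at the point where the macro was used.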

Working of a preprocessor:

Here is an example header file, "my_header.h". In this part we will not get into what exactly the code does. We will just focus on how the preprocessor works.

typedef struct {
    int x;
    int y;
} point;

typedef enum {
    OR_LEFT,
    OR_RIGHT,
    OR_CENTER
} orientation;

point get_layout_left(orientation);

Here is an example of our main file. The line #include "my_header.h" tells the preprocessor to copy the contents of that file into our main file.

#include "my_header.h"

int main()
{
    const point p = get_layout_left(OR_LEFT);
    return p.x;
}

After running the preprocessor, this is the code that our compiler sees:

typedef struct {
    int x;
    int y;
} point;

typedef enum {
    OR_LEFT,
    OR_RIGHT,
    OR_CENTER
} orientation;

point get_layout_left(orientation);

int main()
{
    const point p = get_layout_left(OR_LEFT);
    return p.x;
}

Compiler:

In simple words, the compiler translates human-readable source code into machine code. Let us look in detail at how this is done and which steps are involved.

These steps include:

  • Lexer
  • Parser
  • Semantic analysis
  • Optimization
  • Codegen
  • Assembler

Lexer: The lexer (lexical analyzer) is the first phase of a compiler. It reads the source code as a stream of characters and groups them into meaningful sequences called lexemes, which are then classified into tokens such as identifiers, keywords, operators, and punctuation. The lexer removes whitespace and comments, detects invalid characters, and produces a stream of tokens that is passed to the parser for further syntactic analysis.
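
To make the idea concrete, here is a minimal sketch of a lexer for a toy language. It is not how a real C++ compiler tokenizes code; the token categories, the two-word keyword list, and the names Token, TokenKind, and tokenize are all assumptions made for this example.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical token categories for a toy language; real C++ has many more.
enum class TokenKind { Keyword, Identifier, Number, Operator, Punctuation };

struct Token {
    TokenKind kind;
    std::string lexeme;  // the exact characters that formed the token
};

static bool is_digit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
static bool is_ident_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}
static bool is_one_of(char c, const std::string& set) {
    return set.find(c) != std::string::npos;
}

// Groups characters into tokens, dropping whitespace and // comments.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        const char c = src[i];
        const std::size_t start = i;
        if (std::isspace(static_cast<unsigned char>(c))) {
            ++i;                                            // whitespace is discarded
        } else if (c == '/' && i + 1 < src.size() && src[i + 1] == '/') {
            while (i < src.size() && src[i] != '\n') ++i;   // comment is discarded
        } else if (is_digit(c)) {
            while (i < src.size() && is_digit(src[i])) ++i;
            tokens.push_back({TokenKind::Number, src.substr(start, i - start)});
        } else if (is_ident_char(c)) {
            while (i < src.size() && is_ident_char(src[i])) ++i;
            const std::string word = src.substr(start, i - start);
            const bool keyword = (word == "int" || word == "return");  // toy keyword list
            tokens.push_back({keyword ? TokenKind::Keyword : TokenKind::Identifier, word});
        } else if (is_one_of(c, "+-*/=")) {
            tokens.push_back({TokenKind::Operator, std::string(1, c)});
            ++i;
        } else if (is_one_of(c, "(){};,")) {
            tokens.push_back({TokenKind::Punctuation, std::string(1, c)});
            ++i;
        } else {
            throw std::runtime_error(std::string("invalid character: ") + c);
        }
    }
    return tokens;
}
```

For the input "int x = 42; // note", this sketch produces five tokens: a keyword, an identifier, an operator, a number, and a punctuation token. The comment and all spaces are dropped.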

Parser: The parser is the phase of the compiler that takes the stream of tokens produced by the lexer and checks whether they follow the grammatical rules of the programming language. It organizes these tokens into a structured, tree-like representation called a syntax tree (or Abstract Syntax Tree, AST), which reflects the hierarchical structure of the program. By doing this, the parser detects syntax errors and prepares the program’s structure for semantic analysis and later compilation stages.
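
The following sketch shows one common way to build such a tree, a hand-written recursive descent parser, for a toy grammar of numbers, + and *. For brevity it reads characters directly instead of consuming a token stream from a separate lexer, and the names Node, Parser, and evaluate are assumptions made for this example.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// AST node: a number leaf (op == 'n') or a binary operation ('+' or '*').
struct Node {
    char op;
    int value;                       // used only by number leaves
    std::unique_ptr<Node> lhs, rhs;  // used only by binary operations
};

class Parser {
public:
    explicit Parser(std::string text) : text_(std::move(text)), pos_(0) {}

    // Parses the whole input; anything left over is a syntax error.
    std::unique_ptr<Node> parse() {
        std::unique_ptr<Node> tree = parse_expr();
        if (peek() != '\0') throw std::runtime_error("syntax error: unexpected input");
        return tree;
    }

private:
    // expr := term ('+' term)*
    std::unique_ptr<Node> parse_expr() {
        std::unique_ptr<Node> left = parse_term();
        while (peek() == '+') {
            ++pos_;
            std::unique_ptr<Node> right = parse_term();
            left = binary('+', std::move(left), std::move(right));
        }
        return left;
    }

    // term := number ('*' number)*
    std::unique_ptr<Node> parse_term() {
        std::unique_ptr<Node> left = parse_number();
        while (peek() == '*') {
            ++pos_;
            std::unique_ptr<Node> right = parse_number();
            left = binary('*', std::move(left), std::move(right));
        }
        return left;
    }

    // number := digit+
    std::unique_ptr<Node> parse_number() {
        if (!is_digit(peek())) throw std::runtime_error("syntax error: expected a number");
        int value = 0;
        while (pos_ < text_.size() && is_digit(text_[pos_])) {
            value = value * 10 + (text_[pos_++] - '0');
        }
        return std::unique_ptr<Node>(new Node{'n', value, nullptr, nullptr});
    }

    static std::unique_ptr<Node> binary(char op, std::unique_ptr<Node> l,
                                        std::unique_ptr<Node> r) {
        return std::unique_ptr<Node>(new Node{op, 0, std::move(l), std::move(r)});
    }

    static bool is_digit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }

    // Skips whitespace and returns the next character, or '\0' at end of input.
    char peek() {
        while (pos_ < text_.size() && std::isspace(static_cast<unsigned char>(text_[pos_]))) ++pos_;
        return pos_ < text_.size() ? text_[pos_] : '\0';
    }

    std::string text_;
    std::size_t pos_;
};

// Walks the tree bottom-up; the tree shape encodes operator precedence.
int evaluate(const Node& n) {
    if (n.op == 'n') return n.value;
    const int l = evaluate(*n.lhs);
    const int r = evaluate(*n.rhs);
    return n.op == '+' ? l + r : l * r;
}
```

Parsing "2 + 3 * 4" with Parser("2 + 3 * 4").parse() gives a tree whose root is '+' and whose right child is '*', so the multiplication is grouped first. An input such as "2 + * 3" does not match the grammar and is rejected with an exception, which stands in for a syntax error message.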

Semantic analysis: This is the compiler phase that checks whether the program is logically and meaningfully correct after it has been syntactically parsed. It examines the syntax tree to ensure that identifiers are properly declared, types are compatible, scopes are respected, and operations are valid. This phase builds and uses symbol tables, annotates the tree with type information, and reports errors such as undeclared variables or type mismatches before the program proceeds to optimization and code generation.
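
A heavily simplified sketch of these checks is shown below. A real compiler walks the syntax tree and handles nested scopes; this example assumes a flat list of statements and a single scope, and the names Type, Stmt, and check are hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

enum class Type { Int, Bool };

// A deliberately tiny "program": each statement either declares a variable
// or assigns a value of some type to one.
struct Stmt {
    enum Kind { Declare, Assign } kind;
    std::string name;
    Type type;  // the declared type, or the type of the assigned value
};

// Returns the list of semantic errors found in the program.
std::vector<std::string> check(const std::vector<Stmt>& program) {
    std::map<std::string, Type> symbols;  // symbol table: name -> declared type
    std::vector<std::string> errors;
    for (const Stmt& s : program) {
        const auto it = symbols.find(s.name);
        if (s.kind == Stmt::Declare) {
            if (it != symbols.end()) {
                errors.push_back("redeclaration of '" + s.name + "'");
            } else {
                symbols[s.name] = s.type;  // record the new symbol
            }
        } else if (it == symbols.end()) {
            errors.push_back("use of undeclared variable '" + s.name + "'");
        } else if (it->second != s.type) {
            errors.push_back("type mismatch in assignment to '" + s.name + "'");
        }
    }
    return errors;
}
```

Every statement in this toy program would pass the parser, since each one is well-formed on its own. The errors only appear once the symbol table is consulted, which is the job of this phase.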

Optimization is the compiler phase that improves the efficiency of a program without changing its observable behavior. It analyzes the intermediate representation of the code to reduce execution time, memory usage, and redundant computations by applying techniques such as constant folding, dead code elimination, and loop optimizations. The optimized code is then passed to the code generation phase to produce faster and more efficient machine instructions.
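
The two functions below illustrate constant folding and dead code elimination by hand. The second function is only a sketch of what an optimizer might effectively produce from the first; the actual result depends on the compiler and its settings, and the function names are made up for this example.

```cpp
// The compiler can evaluate a constant expression itself; this check
// happens entirely at compile time.
static_assert(60 * 60 * 24 == 86400, "constant expression folded by the compiler");

// What the programmer writes.
int add_day(int seconds) {
    const int debug = 0;
    if (debug) {          // the condition is always false, so this branch is dead code
        seconds = -1;
    }
    return seconds + 60 * 60 * 24;  // all operands are constants: can be folded
}

// Roughly what remains after constant folding and dead code elimination.
int add_day_optimized(int seconds) {
    return seconds + 86400;
}
```

Both functions return the same value for every input, which is the requirement stated above: the optimized code must not change the observable behavior of the program.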

Code generation is the compiler phase that converts the optimized intermediate representation of a program into target-specific machine code or assembly instructions. During this phase, the compiler selects appropriate CPU instructions, allocates registers, and determines memory layouts while respecting the target architecture and calling conventions. The generated code is then passed to the assembler and linker to produce the final executable.

The assembler is a tool used after code generation that translates assembly language produced by the compiler into machine-level instructions. It converts mnemonic instructions into binary opcodes, assigns addresses, and creates object files containing machine code along with symbol and relocation information. These object files are then passed to the linker to build the final executable program.

If you want to understand the working of the compiler at a deeper level, you can click here.

Linker:

The linker is the final phase in the compilation process. It takes one or more object files (produced by the assembler), links them together, and combines them into a single executable program.

It’s responsible for resolving symbols, addresses, and references between different object files and libraries.

The main functions of the linker include:

  • Symbol Resolution
  • Address Binding
  • Relocation
  • Combining Object Files
  • Linking Libraries
  • Producing the Executable

Symbol resolution is a key task performed by the linker to match symbol references with their actual definitions across multiple object files and libraries. During this process, the linker ensures that every function and variable used in a program is properly defined and assigns each symbol a concrete memory address. If a symbol is referenced but never defined, or if multiple conflicting definitions exist, the linker reports an error.
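
The sketch below models this check in a very simplified way. It is a toy, not the algorithm of any real linker: object files are reduced to two lists of names, and ObjectFile and resolve_symbols are hypothetical names chosen for this example.

```cpp
#include <set>
#include <string>
#include <vector>

// A toy view of an object file: only its symbol lists, no code or data.
struct ObjectFile {
    std::string name;
    std::vector<std::string> defined;     // symbols this file provides
    std::vector<std::string> referenced;  // symbols this file uses
};

// Returns the errors a linker would report for this set of object files.
std::vector<std::string> resolve_symbols(const std::vector<ObjectFile>& objects) {
    std::set<std::string> definitions;
    std::vector<std::string> errors;

    // Pass 1: collect every definition and flag duplicates.
    for (const ObjectFile& obj : objects) {
        for (const std::string& sym : obj.defined) {
            if (!definitions.insert(sym).second) {
                errors.push_back("multiple definition of '" + sym + "'");
            }
        }
    }

    // Pass 2: every reference must match some definition.
    for (const ObjectFile& obj : objects) {
        for (const std::string& sym : obj.referenced) {
            if (definitions.count(sym) == 0) {
                errors.push_back("undefined reference to '" + sym + "' in " + obj.name);
            }
        }
    }
    return errors;
}
```

In the earlier example, main() calls get_layout_left, which my_header.h declares but does not define. In this model, an object file for main that references get_layout_left produces an undefined reference error unless some other object file lists it under defined.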

Address binding is the process performed by the linker in which symbolic addresses in the program are replaced with actual memory addresses. After all object files and libraries are combined, the linker determines the final memory layout of the program and assigns concrete addresses to code, data, and variables. This ensures that all references in the program point to the correct locations when the executable is loaded and run.

Relocation is the linker process that updates address-dependent instructions and data references once the final memory layout of a program is known. When object files are compiled, many addresses are left as placeholders; during relocation, the linker adjusts these references to point to the correct locations in memory. This step ensures that function calls, variable accesses, and jumps work correctly in the final executable.
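
Address binding and relocation can be pictured with the toy model below. It assumes a program image made of 32-bit words in which some slots are placeholders, and a list of relocation entries saying which slot should receive which address. Real object file formats are far more involved, and Relocation and apply_relocations are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One relocation entry: "slot" in the image must hold the final address
// of something located "target_offset" bytes into the program.
struct Relocation {
    std::size_t slot;
    std::uint32_t target_offset;
};

// Patches every placeholder once the base address of the program is known.
void apply_relocations(std::vector<std::uint32_t>& image,
                       const std::vector<Relocation>& relocations,
                       std::uint32_t base_address) {
    for (const Relocation& r : relocations) {
        // at() throws std::out_of_range if an entry points outside the image.
        image.at(r.slot) = base_address + r.target_offset;
    }
}
```

With an image of {0x10, 0, 0x20, 0}, relocation entries for slots 1 and 3 with offsets 8 and 12, and a base address of 0x1000, the placeholders become 0x1008 and 0x100C while the other words are left unchanged.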

Combining object files is a task performed by the linker in which multiple object files produced during separate compilation are merged into a single program. During this process, the linker brings together code and data sections, resolves references between files, and organizes them into a unified layout. This allows programs split across multiple source files to function as one complete executable.

Linking libraries is the process by which the linker connects a program with external library code that it depends on. This can be done statically, where the library’s code is copied into the final executable, or dynamically, where references to shared libraries are resolved at runtime. Linking libraries allows programs to reuse common functionality without including all the source code directly.

Producing the executable is the final step of the linking process in which the linker combines all object files, resolved symbols, and library code into a single executable file. The linker organizes code and data sections, applies relocation, and ensures all references are correctly addressed so that the program can be loaded into memory and run by the operating system. This executable is the final output that the user can execute.

If you want to gain deeper knowledge about linkers, you can click here.

Loader:

The loader is the operating system component responsible for loading the executable into memory and preparing it for execution. It maps the program's code, data, and stack into memory, resolves addresses for dynamically linked libraries, initializes global variables, sets up the runtime environment (including command-line arguments and environment variables), and then transfers control to the program's entry point, where startup code typically runs before calling main().
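
From the program's point of view, the loader's work shows up as values that are already in place when main() starts. The sketch below only reads them; it does not show the loader itself. The names launch_count, collect_args, and env_or are made up for this example.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// A global with an initializer: its value is already in memory
// by the time main() begins to run.
int launch_count = 1;

// Copies the command-line arguments handed to main() into a vector.
std::vector<std::string> collect_args(int argc, char* argv[]) {
    return std::vector<std::string>(argv, argv + argc);
}

// Reads an environment variable, or returns the fallback if it is not set.
std::string env_or(const char* name, const std::string& fallback) {
    const char* value = std::getenv(name);
    return value ? std::string(value) : fallback;
}
```

A program would typically call collect_args(argc, argv) from int main(int argc, char* argv[]), and something like env_or("HOME", "unknown") to look at the environment it was started with.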

You can learn more about the loader here.