Creating my own programming language. Part 1


I started this project 7 months ago, and in this post I’d like to talk about my journey and what I’ve done so far. First of all, I’d like to let you know that it’s written in the thread safe, fearless concurrency, BLAZINGLY FAST 🔥🔥🔥 Rust programming language, and the source code is available on GitHub. Before I tried to make my own shit, I read an amazing book called Interpreter book. It gave me a basic understanding of how programming languages work. Who could’ve thought that it’s not a magic black box made by magicians, but a bunch of statements which contain other statements and expressions inside of them (but if you look at C++ internals, that shit’s definitely not made by humans). Expressions are units of code which produce a value, whereas statements don’t produce any value. For example, return 5; is a statement and doesn’t produce any value, whereas 5 is an expression and produces the integer literal value 5.
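To make the expression/statement split concrete, here is a minimal sketch of how it could look as Rust types. The names Expr, Stmt and eval are made up for illustration and are not taken from the actual compiler.

```rust
// Hypothetical AST types for illustration; not the compiler's real definitions.
#[derive(Debug, PartialEq)]
enum Expr {
    // `5` is an expression: it produces the integer value 5.
    IntLiteral(i64),
    // `2 + 3` is an expression built out of two smaller expressions.
    Add(Box<Expr>, Box<Expr>),
}

#[derive(Debug, PartialEq)]
enum Stmt {
    // `return 5;` is a statement: it contains an expression,
    // but the statement itself produces no value.
    Return(Expr),
}

// Evaluating an expression always yields a value.
fn eval(expr: &Expr) -> i64 {
    match expr {
        Expr::IntLiteral(n) => *n,
        Expr::Add(lhs, rhs) => eval(lhs) + eval(rhs),
    }
}
```

Note that there is an eval for Expr but nothing similar for Stmt: a statement can only be executed, it has no value to hand back.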

My compiler consists of 3 phases:

  • Lexical analysis
  • Syntax analysis
  • Code generation

But compilers used in production have many more phases, which can include, for example, generating an IR (intermediate representation) or optimizing the code.

Lexical analysis

The lexer was the first thing I wrote, and it is the easiest part of the whole compiler. All its code fits in under 200 loc. Its purpose is to transform the raw dawg text that the programmer writes into an array of tokens which will be processed later by the parser. For example, when the programmer writes a while loop, the keyword while is transformed into a token called WHILE, and so on. The lexer also skips lines which start with //. Full list of tokens can be found here. There’s really not much more to it.
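A rough sketch of that idea is below. This is not the real lexer from the repo: the Token variants and the lex function are assumptions made up for this example, and the token set is much smaller than the real one.

```rust
// Hypothetical token set; the real compiler's token list is larger.
#[derive(Debug, PartialEq, Clone)]
enum Token {
    While,
    Return,
    Ident(String),
    Int(i64),
    Symbol(char),
}

// Turns raw source text into a flat list of tokens.
fn lex(source: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    for line in source.lines() {
        // Skip lines that start with `//`.
        if line.trim_start().starts_with("//") {
            continue;
        }
        let mut chars = line.chars().peekable();
        while let Some(&c) = chars.peek() {
            if c.is_whitespace() {
                chars.next();
            } else if c.is_ascii_digit() {
                // Collect a run of digits into an integer literal.
                let mut digits = String::new();
                while let Some(&d) = chars.peek() {
                    if !d.is_ascii_digit() {
                        break;
                    }
                    digits.push(d);
                    chars.next();
                }
                // Sketch only: a literal that overflows i64 panics here.
                tokens.push(Token::Int(digits.parse().unwrap()));
            } else if c.is_alphabetic() || c == '_' {
                // Collect a word, then decide: keyword or identifier.
                let mut word = String::new();
                while let Some(&w) = chars.peek() {
                    if !(w.is_alphanumeric() || w == '_') {
                        break;
                    }
                    word.push(w);
                    chars.next();
                }
                tokens.push(match word.as_str() {
                    "while" => Token::While,
                    "return" => Token::Return,
                    other => Token::Ident(other.to_string()),
                });
            } else {
                // Everything else becomes a single-character symbol token.
                tokens.push(Token::Symbol(c));
                chars.next();
            }
        }
    }
    tokens
}
```

Feeding it return 5; would give the tokens Return, Int(5) and Symbol(';'), and a line like // comment produces nothing at all.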

Syntax analysis

Unlike lexical analysis, syntax analysis is the hardest part of the compiler xd. That’s the place where all the big boi stuff is happening. Firstly, it checks the syntax of the language, e.g. whether you use statements correctly and whether you have curlies and semis in the right places. It also transforms the list of tokens produced by the lexer into AST (abstract syntax tree) nodes.

The parser is also the place where symbol tables and type tables are made (the next few sentences can be complete horseradish, but that’s how I implemented it ¯\_(ツ)_/¯). A symbol table is a structure which holds the variables and functions that exist in the current scope. When you declare a new variable or function, the parser adds the symbol to the corresponding symbol table. The table is also used to look up a variable when you need to get its value, or to return an error if the variable doesn’t exist. A type table does exactly the same, but instead of storing symbols it stores user defined types. After fucking with these tables, trying to manage them separately, in the end I just put ‘em in a structure called Scope. There’s one global scope which is returned from the parser, and every function and every scope inside these functions contains its own scope.

// Global scope
int foo[42];

int main() {
    // Function scope
    {
        // Scope inside function scope
    }
}
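Here is a minimal sketch of what such a Scope could look like. Everything except the name Scope is an assumption made for this example: the field names, the Type enum and the ScopeStack helper are hypothetical. It also keeps nested scopes on a stack, which is a simplification compared to storing a scope inside every function and block.

```rust
use std::collections::HashMap;

// Hypothetical sketch: names and fields are assumptions, not the project's real code.
#[derive(Debug, Clone, PartialEq)]
enum Type {
    U8,
    I16,
    Bool,
}

#[derive(Default)]
struct Scope {
    symbols: HashMap<String, Type>, // symbol table: variables and functions
    types: HashMap<String, Type>,   // type table: user defined types (unused in this sketch)
}

// Innermost scope is last; the global scope is always at index 0.
struct ScopeStack {
    scopes: Vec<Scope>,
}

impl ScopeStack {
    fn new() -> Self {
        ScopeStack { scopes: vec![Scope::default()] }
    }

    // Called when the parser sees `{`.
    fn enter(&mut self) {
        self.scopes.push(Scope::default());
    }

    // Called when the parser sees `}`; the global scope is never popped.
    fn leave(&mut self) {
        if self.scopes.len() > 1 {
            self.scopes.pop();
        }
    }

    // Adds a symbol to the innermost scope, rejecting duplicates there.
    fn declare(&mut self, name: &str, ty: Type) -> Result<(), String> {
        let scope = self.scopes.last_mut().expect("global scope always exists");
        if scope.symbols.contains_key(name) {
            return Err(format!("symbol `{}` is already declared in this scope", name));
        }
        scope.symbols.insert(name.to_string(), ty);
        Ok(())
    }

    // Searches from the innermost scope outwards.
    fn lookup(&self, name: &str) -> Result<&Type, String> {
        self.scopes
            .iter()
            .rev()
            .find_map(|scope| scope.symbols.get(name))
            .ok_or_else(|| format!("undeclared symbol `{}`", name))
    }
}
```

With this, a variable declared inside an inner block stops being visible after leave(), while something declared in the global scope (like foo above) can be found from anywhere.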

Code generation

The last part of the compiler is code generation. It takes the AST nodes generated by the parser and produces the final output. My compiler spits out assembly language code, which can later be assembled using the GNU assembler and linked into an executable. The only remotely hard part of it was learning assembly language itself, because I had never used it before :D. It took me a few hours to understand how to use the idiv instruction, all the explaining websites were using fucking hieroglyphs to explain how that shit works. Though I could generate 0s and 1s directly, I thought it wouldn’t really teach me anything, so I chose to generate assembly instead.
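For anyone else stuck on idiv, here is a small sketch of a code generator that handles integer literals and division. It is not the compiler's actual code: Expr and emit are hypothetical names, and the output uses 64-bit registers in AT&T syntax (the one the GNU assembler uses by default) just to keep the example short. The trick with idiv is that it takes only the divisor as an operand: the dividend has to sit in %rdx:%rax, so %rax must be sign-extended into %rdx (cqto) first. Afterwards the quotient is in %rax and the remainder in %rdx.

```rust
// Hypothetical AST for illustration; not the compiler's real node types.
enum Expr {
    Int(i32),
    Div(Box<Expr>, Box<Expr>),
}

// Appends AT&T-syntax x86-64 assembly to `out`.
// After the emitted code runs, the value of `expr` is in %rax.
fn emit(expr: &Expr, out: &mut String) {
    match expr {
        Expr::Int(n) => out.push_str(&format!("    movq ${}, %rax\n", n)),
        Expr::Div(lhs, rhs) => {
            emit(rhs, out); // divisor ends up in %rax
            out.push_str("    pushq %rax\n"); // save the divisor on the stack
            emit(lhs, out); // dividend ends up in %rax
            out.push_str("    popq %rcx\n"); // divisor -> %rcx
            out.push_str("    cqto\n"); // sign-extend %rax into %rdx:%rax
            out.push_str("    idivq %rcx\n"); // quotient -> %rax, remainder -> %rdx
        }
    }
}
```

Calling emit on the tree for 7 / 2 appends six instructions, ending in cqto and idivq %rcx. A real generator would allocate registers more carefully instead of pushing and popping for every operation, but the stack keeps the sketch simple.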

Features already added to the compiler

So far I’ve implemented a few primitive data types: u8, i8, u16, i16, bool and void. There’s also support for mafs operators like +, -, *, / and for comparison operators. It also supports casting values and basic functions. For now there’s only 1 control flow statement, and it’s return. I don’t think it will be hard to add if expressions or loops, so right now I’m trying to add support for extern functions. Since I’m implementing the x86-64 SysV ABI, I will be able to call C functions from my code and vice versa, it’s so cool!
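As a taste of what the ABI asks for: in the x86-64 SysV calling convention the first six integer or pointer arguments go in %rdi, %rsi, %rdx, %rcx, %r8 and %r9, and an integer return value comes back in %rax. Below is a minimal sketch of emitting a call under that rule. emit_call is a hypothetical helper, not something from the compiler, and it deliberately ignores stack-passed arguments, 16-byte stack alignment and variadic functions.

```rust
// Integer/pointer argument registers of the x86-64 SysV ABI, in order.
const INT_ARG_REGS: [&str; 6] = ["%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"];

// Hypothetical helper: emits a call with constant integer arguments.
// Returns None when there are more arguments than registers,
// because stack-passed arguments are out of scope for this sketch.
fn emit_call(name: &str, args: &[i32]) -> Option<String> {
    if args.len() > INT_ARG_REGS.len() {
        return None;
    }
    let mut out = String::new();
    // Load each argument into its register, in declaration order.
    for (arg, reg) in args.iter().zip(INT_ARG_REGS.iter()) {
        out.push_str(&format!("    movq ${}, {}\n", arg, reg));
    }
    // After the call returns, an integer result is in %rax.
    out.push_str(&format!("    call {}\n", name));
    Some(out)
}
```

Because a C compiler targeting the same ABI follows the same register order, a function compiled from C can be called this way, and a function generated by this compiler can be called from C.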