Programming Experiment #1: Creating a programming language over a weekend
In this series of blogs, I am going to spend a short amount of time researching and making something, usually around a few days. I am then going to write a blog, like this one, documenting what I learned, how far I got and any other interesting bits I find. The blogs are going to be a personal learning reflection but will hopefully be useful for a high level summary on a topic.
The first blog in this series is going to be about how I wrote my own programming language (Cobra) over a weekend, the language is currently interpreted and is purely written for fun. A great resource on the theory behind interpreter design can be found here which I recommend reading if you are planning to undertake a similar project.
I decided to name the language Cobra as I am implementing it in Python. Currently it is an interpreted language however I have plans to make it compile and then potentially even making it Self Hosted (written in itself!). Until I implement the compiler, it is an interpreted language running inside of an interpreted language which makes it extremely slow.
Picking the language paradigm
My language is a stack based or stack oriented language, I chose this paradigm because while I have never used a stack based language, I had a clear idea on how I could implement it in a relatively simple way. I also hoped to learn as much about stack oriented languages as they involve such a different way of thinking, whilst I would never use one for a production application, it was a fun experience.
Stack oriented programming is a paradigm where operations are primarily performed using a stack data structure. In this model, values are pushed onto a stack, and operators act on the top elements of the stack, replacing them with the result. It eliminates the need for variables and explicit argument passing, relying instead on stack manipulation. This paradigm is used in languages like Forth and PostScript, where the flow of data and control is managed by pushing and popping values from the stack. While stack oriented code can be more concise, the implicit data handling and side effects from functions leaving the stack in a bad state can be difficult to work with.
Many other more common paradigms exist such as OOP, imperative and functional. Modern languages often use a combination of these to make the developer experience as easy as possible.
Creating some operators
Whilst you may think that creating the syntax would be the next step, I focused on getting some basic operations working on the main stack. This let me build a program as a list of operators and then interpret it without having to parse a file. This approach is one I would repeat in any future languages I make as it let me get a solid foundation before having to worry about complex lexical analysis and the syntax of my language.
In my language, I have a big python enum with all the operators, I then create a 'constructor' which returns a tuple of the operation and any inputs.
class Operation(Enum):
PUSH = auto()
POP = auto()
ADD = auto()
PRINT = auto()The operation enum
# Operation constructors
def op_push(x): return (Operation.PUSH, x)
def op_pop(): return (Operation.POP,)
def op_add(): return (Operation.ADD,)
def op_print(): return (Operation.PRINT,)Their respective constructors
Once I had this, I could create a very basic program like so:
program = [
op_push(5), #Pushes 5 to the stack
op_push(10), #Pushes 10 to the stack
op_add(), #Adds the top 2 elements and replaces with output
op_print() #Prints the top of the stack
]Writing an interpreter
Now I have a list of operations, I can begin to write a very basic interpreter that will turn this list into an output. For these very basic operations, the interpreter looks very simple. I promise it gets more exciting once I add conditions, loops and procedures!
stack = []
for op in program:
match op[0]:
case Operation.PUSH:
stack.append(op[1])
case Operation.ADD:
a = stack.pop()
b = stack.pop()
stack.append(a + b)
case Operation.PRINT:
print(stack.pop())We have now made a very basic interpreter that iterates through the program above and outputs 15 correctly!
If you want to try it out yourself, either run this code locally or use a tool I created called PyZap.dev. PyZap.dev allows you to run python code in your browser, completely locally. Try altering the program variable and see how adding different operations makes the program behave differently.
from enum import Enum, auto
class Operation(Enum):
PUSH = auto()
POP = auto()
ADD = auto()
PRINT = auto()
# Operation constructors
def op_push(x): return (Operation.PUSH, x)
def op_pop(): return (Operation.POP,)
def op_add(): return (Operation.ADD,)
def op_print(): return (Operation.PRINT,)
program = [
op_push(5), #Pushes 5 to the stack
op_push(10), #Pushes 10 to the stack
op_add(), #Adds the top 2 elements and replaces with output
op_print() #Prints the top of the stack
]
stack = []
for op in program:
match op[0]:
case Operation.PUSH:
stack.append(op[1])
case Operation.ADD:
a = stack.pop()
b = stack.pop()
stack.append(a + b)
case Operation.PRINT:
print(stack.pop())Adding conditions and procedures
Now we have a basic interpreter that runs from top to bottom, whilst we can run very basic operations, it still cannot be used as a proper programming language as it is lacking flow control such as if statements and procedures.
Creating a call stack
The call stack is a stack that contains the programs instructions, as the program is interpreted, it will jump around this stack based. In my interpreter, the stack is populated with the operation tuples that I showed above.
Interpreting an IF statement
When the interpreter hits an IF statement, it either jumps into the if block OR it jumps over the top of it. This calculation is done at runtime based on the state of the stack.
I decided that I wanted the IF statements to take a comparison operator as an input and then compare it against the top two elements. Following the program example used above, it would look something like this.
program = [
op_push(1),
op_push(1),
op_if('=='),
op_print(),
op_end()
]The expected output for this program would be printing '1'
The implementation for the above behaviour can be found below. The find_block_end function finds the index of the end of that block. It is context aware and can handle nested IF statements.
case Operation.IF:
condition = op[1]
block_end = find_block_end(call_stack, i)
b = stack[-1]
a = stack[-2]
result = False
if condition == '==':
result = a == b
elif condition == '>':
result = a > b
elif condition == '>=':
result = a >= b
elif condition == '<':
result = a < b
elif condition == '<=':
result = a <= b
elif condition == '!=':
result = a != b
if not result:
skip_until = block_end + 1I initially used the eval() inbuilt to evaluate equality however if-elif are more secure and execute quicker in my testing.
Interpreting the PROC statement
Now conditions are implemented, only one thing is missing before I can start properly running logic operations, a way to loop. To save time, I knew I was going to need procedures at some point so I just decided to implement them now and use them as the primary looping mechanism (via recursion).
I decided to make the singular stack a global variable that can be accessed from everywhere, no matter the scope. This leads to an elegant way of passing data into and out of functions. The function can be passed a stack in any state and can return it in any state.
In order to know where a procedure is, the interpreter does a first pass that scans the operations and saves the positions of the PROC statements in the call stack. This scanning stage looks like this:
for index, op in enumerate(call_stack):
if op[0] == Operation.PROC:
procedures[op[1]] = index + 1 #Save the starting op of the PROC
This scanning step also allows for procedures to be called from anywhere as the interpreter is already aware of them.
Now the positions of procedures in the call stack has been saved, I need a way to execute them. This is done by the CALL statement, the implementation can be found below.
case Operation.CALL:
if op[1] not in procedures:
raise RuntimeError(f"Undefined proc: {op[1]}")
# Save return position
return_stack.append(i + 1)
# Jump to proc start
i = procedures[op[1]]
# Find proc block end to limit execution
block_end = find_block_end(call_stack, i)
call_end_stack.append(block_end)
i -= 1 # Because loop will increment i next
A stack is required to know where the function needs to return to. This also allows for recursive calling as each call will added and then popped as the function ends.
Now all of the logic that my programming language will use has been implemented, I am ready to write a lexical analysis module. This will turn english syntax into an array of operations, they are more colloquially known a a lexer.
Creating the syntax
The final stage before the language is usable is creating the syntax. I wanted syntax that was easy enough to understand whilst being easy enough to parse, I am very happy with the final result and it looks tidy.
Operations
OPERATION_NAME OPERAND_1 OPERAND_2 OPERAND_3 OPERAND_N...;
Eg,
PUSH 10;
PUSH 10;
PRINT;IF Statements
PUSH 1;
PUSH 1;
IF == { //Compares the top two elements
PUSH '1 = 1';
PRINT;
}You may notice string literals, handled by the lexer and then Pythons dynamic typing handles actual operations.
PROC Statements
The procedure below is used in my higher lower guessing game, see the full program below! The below can be used to see the general syntax.
PROC game_loop {
PUSH 'Please enter a number between 1 and 10: ';
CALL float_input;
IF == {
PUSH 'Correct';
PRINT;
RETURN;
}
IF > {
PUSH 'The number is higher!';
PRINT;
POP;
CALL gameloop;
}
IF < {
PUSH 'The number is lower!';
PRINT;
POP;
CALL game_loop;
}
}The PROC statements {} syntax is handled identically to the IF statements
Lexer Implementation
The lexer does a lot of string manipulation to remove comments and redundant whitespace before it starts to parse it. It finds individual operations by splitting on the semicolons. It then loops through all those operations and parses them into the operation command and any operands it may have. This is then parsed by a match statement that builds the program up.
The lexer supports //Single line comments and /*Multi Line Comments*/.
The code to actually add an operation looks like this:
case 'PUSH':
program.append(op_push(operands[0]))
case 'ADD':
program.append(op_add())
case 'PRINT':
program.append(op_print())
case 'IF':
program.append(op_if(operands[0]))
case _:
raise RuntimeError(f'{instr} is not a valid operation')And just like that, we have a 'programming language'. It is very barebones and only has a very small amount of operations, you may have seen other operations in the code snippets on top of just the PUSH, ADD and PRINT. These are operations that I implemented after actually trying to use the language. My first program was a countdown, this can be found below. It required a few extra operations.
PROC countdown {
PUSH -1;
ADD;
PUSH 0;
IF != {
POP; //Pop the 0 from the stack
COPY;
PRINT;
CALL countdown;
}
}
PUSH 11; //What number to countdown from +1
CALL countdown;
As you can see, this basic countdown program requires the POP and COPY operations
Extra Operations
As just discussed, I have implemented many more operations than I demonstrated in this blog. This was after using the language to make small test programs, you can find the full list of operations below
Operations
PUSH - Adds a float or str to the stack
POP - Removes the top element from the stack
ADD - Adds the top two floats
PRINT - Prints the top of the stack
IF - If statement
END - End of a block
DUMP - Dumps the entire stack
DEF - This has been deprecated.
GOTO - This has been deprecated.
COPY - Duplicates the top of the stack
RETURN - Return to the calling code
PROC - Define a procedure
CALL - Call a procedure
CONCAT - Concatenate top two strings/floats
MUL - Multiply top two numbers
SWAP - Swap top two elements
DIV - Divide top two elements
MOD - MODULO Operator on top two elements
CLEAN - Clear entire stack
IMPORT - Import another program
EXIT - Exits with the code of the float on the top of the stack
EVAL - Runs native python code
Higher Lower Guessing Game
To demo the languages capability, I created a demo higher lower game. The user guesses a number, and the program says higher/lower until the user guesses correctly. This uses the std library to expose the randint and exit_success functions.
IMPORT 'std/*';
PUSH 1;
PUSH 10;
CALL randint;
PROC gameloop {
PUSH 'Please enter a number between 1 and 10: ';
CALL float_input;
IF == {
PUSH 'Correct';
PRINT;
RETURN;
}
IF > {
PUSH 'The number is higher!';
PRINT;
POP;
CALL gameloop;
}
IF < {
PUSH 'The number is lower!';
PRINT;
POP;
CALL gameloop;
}
}
CALL gameloop;
POP;
PUSH 'The number was: ';
SWAP;
CONCAT;
PRINT;
CALL exit_success;Final Thoughts
I realise this blog has become extremely long, I did want to discuss how I implemented a standard library, as well as the potential future of this language. I think I will leave that to a futureBuilding Cobra in less than a day was an eye-opening experience that taught me a lot about the inner workings of interpreters and the design trade-offs that come with creating a programming language. Even though it’s slow and minimal, seeing it handle recursion, conditions, and procedures felt incredibly rewarding. This project started as a fun experiment, but it’s given me a much deeper appreciation for how real languages are structured and evolve over time.
Next, I want to explore taking Cobra further: potentially compiling it to bytecode or even transpiling it into Python, building a small standard library, and maybe even attempting to make it self-hosted (a version of Cobra written entirely in Cobra). Whether or not I get there, I plan to document that journey in future blogs, diving into topics like compilation, standard library design, and what it really takes to make a language stand on its own.