How to Create a Basic Arithmetic Programming Language — Part 2 | Sea #1b

David MacDonald
4 min read · Jan 23, 2022

Welcome to the latest post in my Sea Programming Language series. This is the second part of the Arithmetic Programming Language section, so check out part one if you haven’t already. Our goal is to begin implementing the Sea language’s arithmetic operations. Along the way, you can learn how to create your own arithmetic language using either an interpreter or a transpiler.

I will admit this is more of a journal than a tutorial. My intent is to record my changes to the Sea language for my own benefit, and to allow you to analyze what I’ve done to do it yourself.

Finishing the Lexer

Last time, I finished creating the skeleton necessary to create our lexer, which will convert our code into a list of tokens. Now we can begin work on implementing it.

Specialized Errors

With our LexerError class, we can now create specific errors to display to the developer. I’ve gone ahead and added three errors:

The first is for when a single character (symbol) is unrecognized entirely. The second is when the second character of a potential operator is unrecognized; for example, “/=”. Lastly, we need an error for when a number contains more than one decimal point. Now that we have these, we can work on our tokens’ construct methods.
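The three errors described above can be sketched as subclasses of the LexerError class from part one. The class names here are my own guesses for illustration; the actual names in the Sea repository may differ.

```python
# Hypothetical sketch of the three lexer errors, assuming a simple
# LexerError base class like the one created in part one.
class LexerError(Exception):
    def __init__(self, message):
        super().__init__(message)
        self.message = message

class UnrecognizedSymbolError(LexerError):
    """Raised when a single character is not recognized at all."""
    def __init__(self, symbol):
        super().__init__(f"Unrecognized symbol: {symbol!r}")

class UnrecognizedOperatorError(LexerError):
    """Raised when a second character makes an operator invalid, e.g. '/='."""
    def __init__(self, operator):
        super().__init__(f"Unrecognized operator: {operator!r}")

class MultipleDecimalPointsError(LexerError):
    """Raised when a number literal contains more than one decimal point."""
    def __init__(self, number):
        super().__init__(f"Number has multiple decimal points: {number!r}")
```

Keeping all three under one base class means the interface can catch LexerError alone and still report any of them with a position.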

I’d like to point out that, if you are following along, there was a mistake in our Lexer’s skip method. The correct version is:

def skip(self):
    self.symbol = self.in_stream.read_symbol()
It used to be read rather than read_symbol. Now, let’s continue.

Token Construction

Our methodology here is to let the tokens create themselves, so the lexer doesn’t need to worry about it. The lexer just needs to match the first symbol to a type of token, and then the token uses the lexer to construct itself.

I realized I neglected to add a repr for the Punctuator class, so I quickly created one. Each punctuator is only one character, so we can simply take that character and match it to our list. Since the lexer already went through and matched the symbol with a punctuator, we don’t need to check again.
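A punctuator sketch might look like the following. The construct classmethod and the MiniLexer stand-in are my assumptions about how the pieces fit together, not the actual Sea source.

```python
class MiniLexer:
    """Minimal stand-in for the Sea lexer's symbol stream (hypothetical)."""
    def __init__(self, text):
        self.text = text
        self.pos = 0

    @property
    def symbol(self):
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def skip(self):
        self.pos += 1

class Punctuator:
    SYMBOLS = "()"  # an arithmetic language only needs parentheses

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f"Punctuator({self.value!r})"

    @classmethod
    def construct(cls, lexer):
        # The lexer already matched this symbol against SYMBOLS,
        # so we take the single character without checking again.
        symbol = lexer.symbol
        lexer.skip()
        return cls(symbol)
```

Because construct receives the lexer itself, the token can advance the stream on its own, which is exactly the "tokens create themselves" design described above.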

For operators, it’s a little more complex because we want the ** operator. We could make it the parser’s job to interpret two multiplication tokens as exponentiation, but I prefer to make it the lexer’s job, since the lexer can differentiate between ** and * * (with a space) later down the line. Using the take_token_string method in the lexer, the job of creating the operator is pretty easy. If the resulting string doesn’t match any of our operators, then we raise our error.
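Here is a hedged sketch of that operator logic. The greedy behavior of take_token_string, the operator list, and the error class are all assumptions on my part; the real Sea lexer may implement them differently.

```python
OPERATORS = ("+", "-", "*", "/", "%", "**")
# "=" is included so a stray "/=" is caught as an unrecognized
# operator rather than silently split into "/" and "=".
OPERATOR_SYMBOLS = "+-*/%="

class UnrecognizedOperatorError(Exception):
    pass

class OperatorLexer:
    """Hypothetical stand-in exposing take_token_string over a string."""
    def __init__(self, text):
        self.text = text
        self.pos = 0

    @property
    def symbol(self):
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def take_token_string(self):
        # Greedily consume operator symbols, so "**" becomes one
        # string instead of two separate "*" tokens.
        string = ""
        while self.symbol and self.symbol in OPERATOR_SYMBOLS:
            string += self.symbol
            self.pos += 1
        return string

class Operator:
    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f"Operator({self.value!r})"

    @classmethod
    def construct(cls, lexer):
        string = lexer.take_token_string()
        if string not in OPERATORS:
            raise UnrecognizedOperatorError(string)
        return cls(string)
```

The greedy loop is what lets the lexer see ** as a single token, while * * (separated by a space) naturally lexes as two multiplication tokens.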

Token Construction in the Lexer

We begin by ignoring spaces. Then, we create a new position for our potential token or error. We can then go through our token types and compare the current symbol against all possible starting symbols. This works because there isn’t any overlap in the symbols that can start tokens of different types. If there is no match, we raise our error. Otherwise, we use the matching token class to construct our token and give it a position. The token is then added to the list.
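The loop described above can be condensed into a single function like the one below. This is a simplified stand-in, not the Sea source: I elide position tracking, use plain tuples instead of token classes, and use ValueError in place of the custom errors.

```python
WHITESPACE = " \t"
DIGITS = "0123456789"
PUNCTUATOR_SYMBOLS = "()"
OPERATOR_SYMBOLS = "+-*/%"

def tokenize(text):
    """Sketch of the lexer loop: skip spaces, match the first symbol
    to a token type, then let that type consume the full token."""
    tokens = []
    i = 0
    while i < len(text):
        symbol = text[i]
        if symbol in WHITESPACE:
            i += 1
        elif symbol in PUNCTUATOR_SYMBOLS:
            # Punctuators are always a single character.
            tokens.append(("punctuator", symbol))
            i += 1
        elif symbol in OPERATOR_SYMBOLS:
            # Greedily take operator symbols so "**" is one token.
            start = i
            while i < len(text) and text[i] in OPERATOR_SYMBOLS:
                i += 1
            tokens.append(("operator", text[start:i]))
        elif symbol in DIGITS or symbol == ".":
            start = i
            dots = 0
            while i < len(text) and (text[i] in DIGITS or text[i] == "."):
                if text[i] == ".":
                    dots += 1
                i += 1
            if dots > 1:
                raise ValueError(f"Multiple decimal points: {text[start:i]!r}")
            tokens.append(("number", text[start:i]))
        else:
            raise ValueError(f"Unrecognized symbol: {symbol!r}")
    return tokens
```

Note that the first-symbol dispatch only works because, as stated above, no symbol can start tokens of two different types.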

Add the Lexer to the Interface

Now, we only have one thing left to do: add the lexer to the general interface.

Previously, our interface only printed back anything we entered. Now, it creates a lexer, links it to the LexerError class so errors can have positions, and attempts to build the token list. If it succeeds, it only prints “Done” for now. With debug on, it will also display our tokens. Let’s take a look at the result:
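The interface behavior described above can be sketched roughly as follows. The lex_line stand-in, the function names, and the debug handling are all my assumptions for illustration; the real interface wires in the actual lexer and its positioned errors.

```python
def lex_line(line):
    """Trivial stand-in lexer: accepts digits, spaces, and + - * / ( )."""
    allowed = set("0123456789+-*/() ")
    for symbol in line:
        if symbol not in allowed:
            raise ValueError(f"Unrecognized symbol: {symbol!r}")
    return [symbol for symbol in line if symbol != " "]

def interface(line, debug=False):
    """Lex one line of input and report the result, like the REPL does."""
    try:
        tokens = lex_line(line)
    except ValueError as error:
        return f"Error: {error}"
    # On success, just report "Done"; show the tokens only in debug mode.
    return f"Done. {tokens}" if debug else "Done."
```

Catching the lexer's error at this single point is what lets every failure, whatever its kind, be reported to the user with a consistent message.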

As far as I can tell, our lexer is working! With our token list, the next step is to create a parser that can iterate through that list and construct an abstract syntax tree. This is where our order of operations will be encoded.

A Moment of Refactoring

As I begin to write the Parser, I realize I need to improve the folder structure of this project, so it won’t get out of hand. If you’re following along, check out the GitHub link to see the changes. Otherwise, nothing was changed that will affect functionality.

I’m going to leave things there. Seeing how short this article is, I do not find it right to monetize it. I may or may not continue this series, as working on the language becomes much harder when I have to do it in a manner that lets me record it.
