How to Create a Basic Arithmetic Programming Language — Part 1 | Sea #1a
This is the second post in my Sea Programming Language Series; see Sea #0. In this article, I will begin programming Sea based on the grammar we created last time. I will start by implementing the arithmetic operations for Sea in an interpreter and transpiler. This will also lay the foundation for Sea in general.
I will be writing the code for Sea in Python. Once Sea is functional, I will rewrite the code in Sea itself. You can create your own Python project, or follow from my starting point.
Unfortunately, I haven’t had the time to properly continue this project due to school, and I expect the development of Sea to slow down considerably. I actually typed the above introduction over a month ago at this point. Now, to begin, I will be working from the grammar we developed previously. Eventually, I’ll create a text file in the actual project that describes the grammar. For now, I’ll just use the article I wrote.
A good method to begin creating a language is to implement the basic features first. We will be creating a skeleton for our project to build on, and we will create some of the arithmetic features our language has. So, let’s begin.
Basic Terminal Interface
To start off, we’re going to make Sea work both from the terminal, similar to the python command, and from files. I began by creating a simple terminal interface that prints out what the user types, and ends if the user enters exit, ^C (Keyboard Interrupt), or ^D (End of File):
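The original code for this step isn’t reproduced here, so the following is only a minimal sketch of such an echo loop, not the project’s actual file. The function name run_terminal and the sea> prompt are hypothetical, and read and write are parameters purely so the loop can be exercised without a real terminal. In Python, input() raises KeyboardInterrupt on ^C and EOFError on ^D (on Unix-like terminals).

```python
def run_terminal(read=input, write=print):
    """Echo each line back until the user enters exit, ^C, or ^D.

    Hypothetical sketch: read/write default to input/print but can be
    swapped out for testing.
    """
    while True:
        try:
            line = read("sea> ")
        except (KeyboardInterrupt, EOFError):
            # ^C raises KeyboardInterrupt, ^D raises EOFError inside input()
            write()  # move the shell prompt onto a fresh line
            break
        if line.strip() == "exit":
            break
        write(line)  # echo the user's input straight back
```

In main.py, this would be called from main(), which in turn runs under the if __name__ == "__main__" guard discussed below.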
This will result in an interface that behaves like the following:
For this to work, we need to continue developing our main.py:
I’m using the standard Python idiom of guarding the entry point with if __name__ == "__main__", so my code lives in a main function. A lot of this is simply planning for later. The first two lines of the main function parse some command-line arguments we will use later. mode tells us whether to use the transpiler, compiler, file interpreter, or terminal interpreter, and debug tells us whether to print debug information. Any remaining arguments are optional information describing the files we will eventually read.
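A minimal sketch of that argument handling, assuming the two required arguments arrive positionally as in the run command below. parse_arguments is a hypothetical helper name, and the real main.py may do this inline. One detail worth noting: sys.argv holds strings, so bool("False") would be True; comparing against the string "True" avoids that trap.

```python
import sys


def parse_arguments(argv):
    """Split argv (without the program name) into mode, debug, and extras.

    Hypothetical helper; the article's main function does this itself.
    """
    if len(argv) < 2:
        raise SystemExit("usage: main.py MODE DEBUG [FILE INFO...]")
    mode = argv[0]               # e.g. "None" selects the terminal for now
    debug = argv[1] == "True"    # arguments are strings, so compare text
    extra = argv[2:]             # optional information about input files
    return mode, debug, extra


def main():
    mode, debug, extra = parse_arguments(sys.argv[1:])
    # Only the terminal interface exists so far, so mode is not used yet.
    print(f"mode={mode!r} debug={debug} extra={extra}")


# In main.py, main() is invoked from the usual entry-point guard:
#     if __name__ == "__main__":
#         main()
```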
For now, only our simple terminal is connected. To run it, use python modules/main.py None False. While we aren’t using the debug argument yet, the program expects at least two arguments. In the future, we’ll create a bash script to make running our language easier. Please note that, in each of these article sections, I will link the commit I am referring to; the full project structure at any given time can be found there. Also note that, in these GitHub code segments, I will suffix each file name with the commit the file is from.
Basic General Interface
As previously mentioned, we want both a file interface and a terminal interface, so we can create a general interface that both of them use. To begin, we should lay the foundation of our error handling. There are many ways to handle lexer, parser, and visitor errors; I chose to use Python exceptions. Since a serious error prevents our process from completing, exceptions are a natural fit in this context.
The only notable detail in this class is the get_message method, which returns the error’s message. Child classes can override it to create custom messages.
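As a rough sketch (not the project’s exact class), a SeaError base exception with an overridable get_message might look like this. ExampleError is a hypothetical subclass that exists only to show the override.

```python
class SeaError(Exception):
    """Base class for errors raised by the lexer, parser, and visitor."""

    def __init__(self, message=""):
        super().__init__(message)
        self.message = message

    def get_message(self):
        # Child classes override this to build a custom message.
        return self.message


class ExampleError(SeaError):
    """Hypothetical subclass, for illustration only."""

    def __init__(self, symbol):
        super().__init__()
        self.symbol = symbol

    def get_message(self):
        return f"unexpected symbol {self.symbol!r}"
```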
This is our general interface so far. You should recognize debug and mode from before. I’ve also added a streams parameter. The terminal interface and the file interface each handle input and output in their own way; by passing this function a generic streams object, which we will define shortly, both kinds of interface can share it. Also notice that, when debugging is enabled, the debug information is printed whether or not an error occurs.
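A minimal sketch of such a general interface, assuming the streams object exposes in_stream, out_stream, error_stream, and debug_stream attributes with read()/write() methods; apart from in_stream, these names are assumptions. The try body is only a placeholder that echoes the input, and the finally clause is what guarantees the debug output is printed whether or not an error was raised.

```python
class SeaError(Exception):
    """Trimmed stand-in for the project's base error."""

    def get_message(self):
        return str(self)


def interface(streams, mode, debug):
    """Shared entry point for the terminal and file interfaces (sketch)."""
    debug_info = []
    try:
        # Read characters until the input stream returns "" (exhausted).
        text = "".join(iter(streams.in_stream.read, ""))
        debug_info.append(f"mode={mode} input={text!r}")
        # Placeholder work: echo the input back to the user.
        streams.out_stream.write(text)
    except SeaError as error:
        streams.error_stream.write(error)
    finally:
        # Runs on success and on error alike.
        if debug:
            for line in debug_info:
                streams.debug_stream.write(line)
```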
Streams
We will need three main types of streams: input, output, and error. Debug information can be printed directly to an output stream. What follows is many lines of very simple code. I certainly could make it shorter, but I think the abstraction makes more sense this way.
This may seem overwhelming at first, but I promise it’s not too complicated. In streams/general.py, I created three abstract stream classes. InStream is our abstract input stream, with a single method that reads the stream one character at a time. OutStream is our abstract output stream, which simply writes data to the stream. ErrorStream is our abstract error stream, a modified output stream that can write an error. I also created a NullStream, a stream that can be used for input, output, or errors and simply does nothing.
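Assuming the methods are simply called read() and write() (the article doesn’t name them here), the abstract classes could be sketched with Python’s abc module like this. Marking the methods abstract means an incomplete subclass cannot be instantiated.

```python
from abc import ABC, abstractmethod


class InStream(ABC):
    @abstractmethod
    def read(self):
        """Return the next character, or "" once the stream is exhausted."""


class OutStream(ABC):
    @abstractmethod
    def write(self, data):
        """Write data to the stream."""


class ErrorStream(OutStream):
    """An output stream specialised for writing errors."""

    @abstractmethod
    def write(self, error):
        """Write an error (an exception object) to the stream."""


class NullStream(InStream, ErrorStream):
    """Usable as any kind of stream: reads nothing, discards all writes."""

    def read(self):
        return ""

    def write(self, data):
        pass
```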
Next is holder.py, which creates a simple data class (data classes are a very useful Python feature worth learning about) that holds four streams. This holder object is what we will pass into interface as streams in interfaces/general.py.
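A sketch of the holder, assuming it has one field per stream. in_stream is the name the terminal code uses; StreamHolder and the other three field names are hypothetical. The NullStream here is a trimmed stand-in so the example runs on its own, and default_factory gives each holder its own fresh null streams rather than one shared instance.

```python
from dataclasses import dataclass, field


class NullStream:
    """Stand-in stream that reads nothing and discards writes."""

    def read(self):
        return ""

    def write(self, data):
        pass


@dataclass
class StreamHolder:
    # Every stream defaults to a do-nothing NullStream.
    in_stream: object = field(default_factory=NullStream)
    out_stream: object = field(default_factory=NullStream)
    error_stream: object = field(default_factory=NullStream)
    debug_stream: object = field(default_factory=NullStream)
```

A caller only has to pass the streams it cares about, for example StreamHolder(out_stream=some_output_stream).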
Lastly, we actually implement these streams for the terminal interface. Input is read character by character from a buffer that is filled in interfaces/terminal.py. Output and errors only need to be printed to the terminal.
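The terminal implementations might look roughly like this. The class names are hypothetical, and sending errors to standard error is my own choice here rather than something the article specifies.

```python
import sys


class TerminalInStream:
    """Reads characters one at a time from a buffer the terminal fills."""

    def __init__(self):
        self.buffer = ""
        self.index = 0

    def write(self, text):
        # The terminal interface appends each line the user types.
        self.buffer += text

    def read(self):
        if self.index >= len(self.buffer):
            return ""  # nothing left to read
        char = self.buffer[self.index]
        self.index += 1
        return char


class TerminalOutStream:
    def write(self, data):
        print(data)


class TerminalErrorStream:
    def write(self, error):
        print(f"{type(error).__name__}: {error}", file=sys.stderr)
```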
So, while all of that code may look like a lot at first, it’s actually very basic code whose main purpose is to establish this abstraction of streams.
Using the General Interface
To use our general interface, we need to link it to our terminal interface.
To begin, I renamed the begin_interfacing function to interface. Be sure to update your main.py to reflect this. Now, we begin interfacing with the terminal by creating our streams object (lines 8 to 13). Using our previously constructed classes, this task is simple. On line 17, our buffer now also writes to our input object. The local buffer variable exists mainly for future plans, and it also helps simplify our code. I also added a few input checks: if the user enters nothing, the loop simply prompts them again, and the user can toggle debug mode on or off while the program is running in terminal mode. Once the buffer has been written into the streams.in_stream object, we are ready to pass the streams object to general.interface.
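Putting these steps together, here is a hedged sketch of the terminal loop. The classes are simplified stand-ins for the ones above, the "debug" command used to toggle debug mode is a hypothetical choice (the article doesn’t say which input toggles it), and general_interface is passed in as a parameter so the loop can be tested with a fake.

```python
from dataclasses import dataclass


@dataclass
class Streams:
    # Only in_stream is named in the article; the other fields are hypothetical.
    in_stream: object
    out_stream: object
    error_stream: object
    debug_stream: object


class BufferInStream:
    """Stand-in input stream: hands out its buffer one character at a time."""

    def __init__(self):
        self.buffer = ""
        self.index = 0

    def write(self, text):
        self.buffer += text

    def read(self):
        if self.index >= len(self.buffer):
            return ""  # end of the buffered input
        self.index += 1
        return self.buffer[self.index - 1]


class PrintOutStream:
    """Stand-in output stream that prints to the terminal."""

    def write(self, data):
        print(data)


def interface_terminal(general_interface, mode, debug, read=input):
    """Pass each line to general_interface until exit, ^C, or ^D."""
    while True:
        try:
            buffer = read("sea> ")
        except (KeyboardInterrupt, EOFError):
            print()  # leave the shell prompt on a fresh line
            return
        command = buffer.strip()
        if command == "":
            continue  # nothing entered: prompt again
        if command == "exit":
            return
        if command == "debug":  # hypothetical toggle command
            debug = not debug
            continue
        output = PrintOutStream()
        streams = Streams(BufferInStream(), output, output, output)
        streams.in_stream.write(buffer)  # fill the input stream's buffer
        general_interface(streams, mode, debug)
```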
In our general.py file, all I did was modify the try block so that it does what our program did previously: output the user’s input right back to them. If you are following along with the project structure, I also modified our Makefile so that we can simply run make instead of python modules/main.py None False. Testing our code as we write it is crucial for finding bugs before they grow out of control.
Positions
A useful feature of our program would be to tell the user where an error occurred. This matters most for files, but it is still very useful in the terminal. My version of this will be very simple for now, but you can make yours as complex as you want.
The SymbolPosition class keeps track of the line and column at which a symbol appears in a given stream. The Position class then keeps track of a span of symbols in a stream. Both of these classes have some useful helper methods we will use later. I also updated our ErrorStream class so that the printed message is now f"{type(data).__name__} at {data.position}: {data.get_message()}\n". Lastly, I modified the SeaError class to also have a position instance variable:
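A minimal sketch of these pieces, assuming 1-based lines and columns and a hypothetical advance helper; the project’s actual helper methods may differ. format_error reproduces the ErrorStream format string above so the parts can be checked together.

```python
from dataclasses import dataclass


@dataclass
class SymbolPosition:
    """Line and column of one symbol (1-based, by assumption)."""
    line: int = 1
    column: int = 1

    def advance(self, symbol):
        # Hypothetical helper: the position just after `symbol`.
        if symbol == "\n":
            return SymbolPosition(self.line + 1, 1)
        return SymbolPosition(self.line, self.column + 1)

    def __str__(self):
        return f"{self.line}:{self.column}"


@dataclass
class Position:
    """A span of symbols from start to end."""
    start: SymbolPosition
    end: SymbolPosition

    def __str__(self):
        return f"{self.start}-{self.end}"


class SeaError(Exception):
    def __init__(self, message="", position=None):
        super().__init__(message)
        self.message = message
        self.position = position

    def get_message(self):
        return self.message


def format_error(data):
    # Same format string as the updated ErrorStream.
    return f"{type(data).__name__} at {data.position}: {data.get_message()}\n"
```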
Now that we have a storage mechanism for positions, let’s see how they’re used in the lexer.
Lexer Skeleton
Recall that our lexer will go through the input stream one symbol at a time and construct a list of tokens. To achieve this, we will require the following:
I’ve implemented all of the lexical operations we will need, apart from actual token construction. Our lexer keeps track of its current position in the input stream so that our tokens can have accurate positions. We can then use our skip, advance, and take methods to go symbol by symbol and construct tokens. Once we implement the take_token method, our lexer will be functional. I’ve also created a foundational LexerError for us to build other errors on. Now, we need the tokens for our lexer to create.
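A rough sketch of such a lexer skeleton follows. The article doesn’t spell out exactly what skip, advance, and take do, so the semantics here are assumptions: advance moves past one symbol while updating the line and column, take consumes a symbol and returns it, and skip discards whitespace. take_token is deliberately left unimplemented, and LexerError subclasses Exception only to keep the example self-contained (in the project it would build on SeaError).

```python
class LexerError(Exception):
    """Foundation for lexer errors (a SeaError subclass in the project)."""


class Lexer:
    """Walks the input one symbol at a time, tracking line and column."""

    def __init__(self, text):
        self.text = text
        self.index = 0
        self.line, self.column = 1, 1
        self.tokens = []

    def current(self):
        # "" signals the end of the input.
        return self.text[self.index] if self.index < len(self.text) else ""

    def advance(self):
        # Move past the current symbol, updating the position.
        if self.current() == "\n":
            self.line, self.column = self.line + 1, 1
        else:
            self.column += 1
        self.index += 1

    def take(self):
        # Consume the current symbol and return it.
        symbol = self.current()
        self.advance()
        return symbol

    def skip(self):
        # Discard whitespace between tokens.
        while self.current() != "" and self.current().isspace():
            self.advance()

    def take_token(self):
        raise NotImplementedError("token construction comes in the next part")

    def lex(self):
        self.skip()
        while self.current() != "":
            self.tokens.append(self.take_token())
            self.skip()
        return self.tokens
```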
Tokens
Looking back at my previous article, we will have keyword, identifier, constant, string-literal, and punctuator tokens. Eventually, we will implement all of these. For now, however, we are only trying to implement our arithmetic functionality, so we only need constants and punctuators. Specifically, we will create the following classes: Token, Constant, NumericalConstant, Punctuator, and Operator.
Our tokens will have a position and some data. Each type of token will have a list of allowed symbols and a method to construct it. Our numerical constants are either floats or ints. Lastly, our punctuators are our punctuation symbols and our operators. I’m using enums to keep track of each set of symbols so we can compare against them later. Hopefully I’m not going through these too quickly; I recommend reading over the files a few times to see what’s going on. There’s a lot of code, but it’s mostly skeleton functions we will work with later on.
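The token hierarchy could be sketched as follows. The construct method name, the OperatorSymbol enum, and its four members are assumptions for illustration (the real set of symbols comes from the grammar in Sea #0), and positions are simply stored as given.

```python
from enum import Enum


class Token:
    def __init__(self, position, data):
        self.position = position
        self.data = data

    def __repr__(self):
        return f"{type(self).__name__}({self.data!r})"


class Constant(Token):
    pass


class NumericalConstant(Constant):
    """Holds either an int or a float."""

    symbols = set("0123456789.")  # symbols allowed in this token

    @classmethod
    def construct(cls, position, text):
        value = float(text) if "." in text else int(text)
        return cls(position, value)


class OperatorSymbol(Enum):  # hypothetical enum name and members
    PLUS = "+"
    MINUS = "-"
    STAR = "*"
    SLASH = "/"


class Punctuator(Token):
    pass


class Operator(Punctuator):
    symbols = {member.value for member in OperatorSymbol}

    @classmethod
    def construct(cls, position, text):
        # Storing the enum member makes later comparisons exact.
        return cls(position, OperatorSymbol(text))
```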
As I am writing this, I’ve realized this is a dauntingly large task, so I’ve decided to split this article into (at least) two parts. In the next article, I will continue the development of our lexer: we will be able to construct tokens and print the list of tokens in the debug information. We will also construct a parser to create an abstract syntax tree, as well as our interpreter. Near the end, we will be able to read from files and transpile code into C, and we will create a bash script to interact with our code. That will conclude the arithmetic section of Sea. Beyond that, we will continue building out the rest of the language. If you have questions, please ask them.