Tales From The Code Front Stories in words and pictures

10 POKE53280,0

How #Genesis64 parses C64 BASIC (Part 1)

First, let's get a glimpse of what C64 BASIC (BASIC v2) consists of. There are 76 keywords that the parser of a real C64 understands:

  • Operators for logic, arithmetic, and strings
  • Commands
  • Numerical functions
  • String manipulation
  • String expressions
  • Output methods
Operator  Commands                            Numeric functions  String manipulation  Output
AND       CLOSE      GOSUB   NEXT     RUN     ABS   PEEK         CHR$                 SPC(
OR        CLR        GO      ON       SAVE    ASC   POS          LEFT$                TAB(
NOT       CONT       GOTO    OPEN     STOP    ATN   RND          MID$
+         CMD        IF      POKE     STEP    COS   SGN          RIGHT$
-         DATA       INPUT   PRINT    SYS     EXP   SIN          STR$
*         DEF        INPUT#  PRINT#   THEN    FN    SQR
/         DIM        LET     READ     TO      FRE   TAN
          END        LIST    REM      VERIFY  INT   USR
=         FOR        LOAD    RESTORE  WAIT    LEN   VAL
<         GET, GET#  NEW     RETURN           LOG
>

There are also 3 system variables and a constant:

ST (STATUS) TI (TIME) TI$ (TIME$) π (PI)

(Bold means that #Genesis64 understands it)

During the next couple of posts, I'll try to shed some light on how G64 handles these.

10 POKE53280,0

When I first had the idea of writing a C64 parser in JavaScript, the goal was just this: parse C64 BASIC. I started off by reading a good number of articles describing how parsing is done, including Let’s Build A Simple Interpreter. Part 1. Then I tried out what I had learned by writing the first version of G64, and I didn't like how it worked.

I had the moronic idea to try something a bit ... different.

In G64 parsing is done in multiple steps and the result is a list of Tokens that are then used to "run" the BASIC program. First, let's have a look at the Token structure:

Doesn't look pretty, but it does the job.

So everything in G64 that deals with BASIC either is a Token or returns one. The most important part of that structure is TokenType ... which is ... the type of the token :).
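
To make the idea a bit more tangible, here is a minimal sketch of what such a Token could look like. This is not G64's actual structure: the names TokenType, createToken, and the fields value, values, and hint are assumptions made up for illustration.

```javascript
// Hypothetical sketch of a Token; names and fields are assumptions, not G64's code.
const TokenType = Object.freeze({
  CMD: "command", // a BASIC command such as POKE
  NUM: "number",  // a numeric literal
  EOP: "eop",     // EndOfPart: this part ran fine
  ERR: "error",   // something went wrong
});

// Every token carries its type, an optional value, and a list of child tokens.
function createToken(type, value = null, values = []) {
  return { type, value, values, hint: "" };
}

const num = createToken(TokenType.NUM, 53280);
console.log(num.type, num.value); // prints: number 53280
```

Keeping the type as a plain string makes a token easy to inspect in the debugger, which is the only property this sketch is meant to show.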

Let's do a very BASIC example program:

10 pO53280,0

(If you wonder why I didn't go with a simple 10 PRINT "hello world": PRINT is anything but nice and simple, but that is for another post.)

As I mentioned, I decided to go the lazy way and use Regular Expressions as much as possible, so here's the first one:

/(\d+)\s*(.*)/

If the line has a line number, this returns two capture groups; for the line above they are 10 and pO53280,0.
For this post, I'm going to skip a few steps (I'll explain them in detail when dealing with PRINT and strings). For now, the next step is de-abbreviation, which simply converts pO to POKE.
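
The two steps so far can be sketched roughly like this. The regular expression is the one from the post; the function names splitLine and deAbbreviate and the one-entry abbreviation table are assumptions for illustration, not G64's real code.

```javascript
// Pattern from the post: digits, optional spaces, then the rest of the line.
const LINE_PATTERN = /(\d+)\s*(.*)/;

// Hypothetical abbreviation table; only the entry used in this post.
const ABBREVIATIONS = { pO: "POKE" };

// Splits a program line into its line number and the remaining code.
function splitLine(line) {
  const match = LINE_PATTERN.exec(line);
  if (match === null) return null; // the pattern found no digits at all
  return { lineNumber: Number(match[1]), code: match[2] };
}

// Replaces a leading abbreviation with the full keyword.
function deAbbreviate(code) {
  for (const [short, full] of Object.entries(ABBREVIATIONS)) {
    if (code.startsWith(short)) return full + code.slice(short.length);
  }
  return code;
}

const parts = splitLine("10 pO53280,0");
console.log(parts.lineNumber, deAbbreviate(parts.code)); // prints: 10 POKE53280,0
```

Note that this sketch only expands an abbreviation at the start of the code; anything inside strings or later in the line is left alone.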

Now that we have POKE53280,0 we can feed that into the Tokenizer.

/^poke\s*(.*)$/

All commands have a simple Regex like this attached, and the first thing the Tokenizer does is loop over that list of commands and try to match each Regex (there's more, but let's try to keep it simple).

This gives us a single match: 53280,0. As POKE expects two numerical parameters separated by a comma, we split that piece of code at the comma (",").
The results are 53280 and 0. Both are fed into the Tokenizer and (as they are numbers) returned as Tokens of type "number", which are then added to the POKE Token's value list.
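
Put together, the tokenizing step described above might look something like the following sketch. The command list, the paramCount field, the token shape, and the case-insensitive "i" flag on the pattern are all assumptions; the post only shows the POKE pattern itself.

```javascript
// Hypothetical command table; the "i" flag is an assumption so POKE and poke both match.
const COMMANDS = [
  { name: "POKE", pattern: /^poke\s*(.*)$/i, paramCount: 2 },
];

function tokenize(code) {
  const text = code.trim();

  // Plain digits become a token of type "number".
  if (/^\d+$/.test(text)) {
    return { type: "number", value: Number(text), values: [] };
  }

  // Otherwise loop over the commands and try each pattern in turn.
  for (const command of COMMANDS) {
    const match = command.pattern.exec(text);
    if (match === null) continue;

    // Split the parameters at the comma and tokenize each piece.
    const params = match[1].split(",");
    if (params.length !== command.paramCount) {
      return { type: "error", hint: `${command.name} expects ${command.paramCount} parameters`, values: [] };
    }
    return { type: "command", name: command.name, values: params.map(tokenize) };
  }

  return { type: "error", hint: `cannot parse: ${text}`, values: [] };
}

const token = tokenize("POKE53280,0");
console.log(token.name, token.values.map((t) => t.value).join(" ")); // prints: POKE 53280 0
```

A plain split at the comma is enough for this example, but it would break as soon as a parameter contains a comma itself (a function call with two arguments, for instance), so a real tokenizer needs more than this.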


This is what the resulting Line (and the Tokens) look like in the debugger.

Easy, isn't it?

Run

Now that we have a Token, running it is quite easy. To keep things short, here's the code for running the POKE Token:


I need to rewrite this sometime; it shows that this is one of the first commands I added.

As you can see, we pass in the Token itself and return a Token, which is either an EOP (EndOfPart) or an ERR (as in error). If the returned value is EOP, we carry on with the next Token or line; if it is an ERR, the program is stopped and the error is printed.
There's also some sort of explanation added to the error Token and written to the console, which can help to find an error a bit quicker.
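
The EOP/ERR flow described above can be sketched as follows. This is not the original G64 code: the memory array, runPoke, runProgram, makeError, and the token shape are assumptions for illustration, and the range check is a simplification of what a real C64 does.

```javascript
// Hypothetical stand-in for the 64 KB of C64 memory.
const memory = new Uint8Array(65536);

const EOP = { type: "eop" }; // EndOfPart: carry on with the next token or line

function makeError(message, hint) {
  return { type: "error", message, hint };
}

// Runs a POKE token; returns EOP on success, an error token otherwise.
function runPoke(token) {
  const address = Math.trunc(token.values[0].value);
  const value = Math.trunc(token.values[1].value);
  if (!(address >= 0 && address <= 65535) || !(value >= 0 && value <= 255)) {
    return makeError("?ILLEGAL QUANTITY ERROR", `POKE ${address},${value} is out of range`);
  }
  memory[address] = value;
  return EOP;
}

// Runs tokens one by one and stops at the first error.
function runProgram(tokens) {
  for (const token of tokens) {
    const result = runPoke(token); // only POKE exists in this sketch
    if (result.type === "error") {
      console.log(result.message); // what the user sees
      console.debug(result.hint);  // extra explanation for debugging
      return result;
    }
  }
  return EOP;
}

const poke = { type: "command", name: "POKE", values: [{ type: "number", value: 53280 }, { type: "number", value: 0 }] };
console.log(runProgram([poke]).type); // prints: eop
```

Returning a token instead of throwing keeps the run loop trivial: it only has to look at the type of what came back.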

Next time I'll explain how operators work and how mathematical equations are solved.

You can try out your C64 BASIC skills in #Genesis64. To get a list of all working C64 BASIC commands and functions, type "help" and hit Enter.

Happy coding,
nGFX
