Interp - an Interpreted Programming Language Pt.4 - Hello, world

4 minute read

Welcome to Pt.4 where we implement a simple hello world case. :)

Parser

Parsing is where text gets converted into data structures, or Rust structs in this case. Lexing is a part of parsing where text gets converted into tokens. We make a directory for all things parsing.

src/parser/mod.rs:

pub mod line;
pub mod expr;
pub mod loc;

lalrpop_util::lalrpop_mod!(grammar, "/parser/grammar.rs");

pub fn parse_line(line_str: &str, filename: &str, line_nr: usize) -> line::Line {
    grammar::LineParser::new().parse(filename, line_nr, line_str).unwrap()
}

What are line, expr, or loc? What is LineParser?. We’ll get to that shortly.

Line

The Line struct represents a line of code. Currently it can either be empty or contain a single expression.

src/parser/line.rs:

use crate::parser::expr::Expr;
use crate::parser::loc::Loc;

pub struct Line {
    pub type_: LineType,
    pub loc: Loc,
}

pub enum LineType {
    Empty,
    Expr(Expr),
}

Expr

Expressions are values. At the moment they can be strings, variables, or function calls.

src/parser/expr.rs:

use crate::parser::loc::Loc;

pub struct Expr {
    pub type_: ExprType,
    pub loc: Loc,
}

pub enum ExprType {
    String(String),
    Variable(String),
    Call(Box<Expr>, Vec<Expr>),
}

Loc

Locs are locations of source code.

src/parser/loc.rs:

pub struct Loc {
    pub filename: String,
    pub line_nr: usize,
    pub index: usize,
}

Lalrpop

For parsing and lexing we use the LALRPOP crate (found here). Add LALRPOP and snailquote (found here) to your dependencies.

Cargo.toml:

[dependencies]

...

lalrpop-util = { version = "0.19.8", features = ["lexer"] }
snailquote = "0.3.1"

...


[build-dependencies]
lalrpop = "0.19.8"

Let’s define a build script. NOTE: ‘build.rs’ is outside of the ‘src/’ directory.

build.rs:

fn main() {
    lalrpop::process_root().unwrap();
}

Grammar

Now we define a grammar.

src/parser/grammar.lalrpop:

use crate::parser::line::{Line, LineType};
use crate::parser::expr::{Expr, ExprType};
use crate::parser::loc::Loc;

// Our parser will take a filename and a line number to construct good Locs
grammar(filename: &str, line_nr: usize); 

// This is our lexer, these are it's tokens
// the r#"..."# strings are regexes 
match {
    r#"[A-Za-z_][A-Za-z_0-9]*"# => Identifier,
    r#""([^"\\]|\\(.|\n))*""# => String,
    "(",
    ")",
    ",",
} else {
    r#"."#
}

// @L holds the index of the character where it is. Used to construct Locs
// LineParser is defined here
pub Line: Line = {
    <l:@L> => Line { type_: LineType::Empty, loc: Loc { filename: filename.to_owned(), line_nr, index: l } },
    <l:@L> <a:expr> => Line { type_: LineType::Expr(a), loc: Loc { filename: filename.to_owned(), line_nr, index: l } },
}

expr: Expr = {
    string,
    variable,
    function_call,
}

// ex. "hello"
string: Expr = {
    <l:@L> <a:String> => Expr { type_: ExprType::String(a.to_owned()), loc: Loc { filename: filename.to_owned(), line_nr, index: l } }
}

// ex. Print
variable: Expr = {
    <l:@L> <a:Identifier> => Expr { type_: ExprType::Variable(a.to_owned()), loc: Loc { filename: filename.to_owned(), line_nr, index: l } }
}

// ex. Print("Hello, world!")
function_call: Expr = {
    <l:@L> <a:expr> "(" <b:expr_list> ")" => Expr { type_: ExprType::Call(Box::new(a), b), loc: Loc { filename: filename.to_owned(), line_nr, index: l } },
}

// ex. "arg1", "arg2"
expr_list: Vec<Expr> = {
    => vec![],
    <a:expr> "," <mut b:expr_list> => { b.insert(0, a); b },
    <a:expr> => vec![<>]
}

We have now defined a grammar. parse_line (see above) now takes a line of code and returns a Line struct.

VM

Next we implement a virtual machine. The job of the virtual machine is to look at the parsed Lines and Exprs and act upon them.

Value

Values are pieces of data the virtual machine keeps in memory. Value is defined like so:

src/vm/value.rs:

pub enum Value {
    Void,
    String(String),
    PrintBuiltin,
}

String is a string, Void means undefined, and PrintBuiltin is, well, the built-in Print function. Now for some logic, hold on to your hats.

VM State

src/vm/mod.rs:

mod value;

use crate::parser::line::{Line, LineType};
use crate::parser::expr::{Expr, ExprType};

pub struct VM {
    // Our VM doesn't hold any state yet
}

pub fn init_vm() -> VM {
    VM { }
}

exec_line

Evaluates and executes a Line struct

pub fn exec_line(vm: &mut VM, parsed_line: &Line) {
    match &parsed_line.type_ {
        LineType::Empty => { }
        LineType::Expr(expr) => {
            expr_to_value(vm, &expr);
        }
    }
}

expr_to_value

Converts Expr to a Value

fn expr_to_value(vm: &mut VM, expr: &Expr) -> value::Value {
    match &expr.type_ {
        ExprType::String(str) => value::Value::String(snailquote::unescape(&str).unwrap()),
        ExprType::Variable(var) => {
            match var.as_str() {
                "Print" => value::Value::PrintBuiltin,
                _ => todo!()
            }
        }
        ExprType::Call(fn_expr, arg_exprs) => {
            let fn_value = expr_to_value(vm, &fn_expr);
            match fn_value {
                value::Value::PrintBuiltin => {
                    println!("{}", arg_exprs.into_iter().map(
                        |a| value_repr(expr_to_value(vm, a)))
                        .collect::<Vec<String>>().join(" "));
                    value::Value::Void
                }
                _ => todo!()
            }
        }
    }
}

value_repr

This function returns the string representation of a Value.

fn value_repr(val: value::Value) -> String {
    match val {
        value::Value::String(str) => str,
        value::Value::Void => "<void>".to_owned(),
        value::Value::PrintBuiltin => "<Print>".to_owned(),
    }
}

Wrapping it up

We should update main. Input -> Parser -> Virtual Machine

src/main.rs:

mod opt;
mod parser;
mod vm;

use std::io::{self, BufRead};
use std::fs;
use structopt::StructOpt;

fn main() {
    let opt = opt::Opt::from_args();

    let mut vm = vm::init_vm();

    let reader: Box<dyn BufRead>;
    let filename: String;
    match opt.source_file {
        Some(_filename) => {
            reader = Box::new(io::BufReader::new(fs::File::open(&_filename).unwrap()));
            filename = _filename;
        }
        None => {
            reader = Box::new(io::BufReader::new(io::stdin()));
            filename = "<stdin>".to_owned();
        }
    }

    for (line_nr, line_str) in reader.lines().enumerate() {
        let line_str = line_str.unwrap();

        let parsed_line = parser::parse_line(&line_str, &filename, line_nr);
        vm::exec_line(&mut vm, &parsed_line);
    }
}

Testing

Let’s test our interpreter. Add Interp (or whatever you call your language) to path. The binary can be found inside your project directory under ‘target/debug/’

$ cargo build
$ echo 'Print("Hello, world!")' > hello.it
$ interp hello.it
Hello, world!

Source Code

Source code for this part can be found here.

Next Up

Click for Pt. 5 or click here for all parts.

Updated: