Interp - an Interpreted Programming Language Pt.4 - Hello, world
Welcome to Pt.4, where we implement a simple hello world program. :)
Parser
Parsing is where text gets converted into data structures, or Rust structs in this case. Lexing is the first step of parsing, where the text gets split into tokens. We make a directory for all things parsing.
src/parser/mod.rs:
pub mod line;
pub mod expr;
pub mod loc;
lalrpop_util::lalrpop_mod!(grammar, "/parser/grammar.rs");
pub fn parse_line(line_str: &str, filename: &str, line_nr: usize) -> line::Line {
    grammar::LineParser::new().parse(filename, line_nr, line_str).unwrap()
}
What are line, expr, and loc? What is LineParser? We’ll get to that shortly.
Line
The Line struct represents a line of code. Currently it can either be empty or contain a single expression.
src/parser/line.rs:
use crate::parser::expr::Expr;
use crate::parser::loc::Loc;
pub struct Line {
    pub type_: LineType,
    pub loc: Loc,
}
pub enum LineType {
    Empty,
    Expr(Expr),
}
Expr
Expressions are values. At the moment they can be strings, variables, or function calls.
src/parser/expr.rs:
use crate::parser::loc::Loc;
pub struct Expr {
    pub type_: ExprType,
    pub loc: Loc,
}
pub enum ExprType {
    String(String),
    Variable(String),
    Call(Box<Expr>, Vec<Expr>),
}
Loc
A Loc is a location in the source code: the file name, the line number, and the index of the character within that line.
src/parser/loc.rs:
pub struct Loc {
    pub filename: String,
    pub line_nr: usize,
    pub index: usize,
}
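To see how Line, Expr, and Loc fit together, here is a hand-built sketch of the tree one would expect for the line Print("Hello, world!"). The types are copied from above; the Debug, Clone, and PartialEq derives, the loc helper, and the file name "hello.it" are assumptions added for illustration only. Note that at this stage the string still includes its surrounding quotes.

```rust
// Types copied from the article; the derives are added for this sketch only.
#[derive(Debug, Clone, PartialEq)]
pub struct Loc {
    pub filename: String,
    pub line_nr: usize,
    pub index: usize,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Expr {
    pub type_: ExprType,
    pub loc: Loc,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ExprType {
    String(String),
    Variable(String),
    Call(Box<Expr>, Vec<Expr>),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Line {
    pub type_: LineType,
    pub loc: Loc,
}

#[derive(Debug, Clone, PartialEq)]
pub enum LineType {
    Empty,
    Expr(Expr),
}

// Hypothetical helper: a Loc on line 0 of "hello.it" at the given index.
fn loc(index: usize) -> Loc {
    Loc { filename: "hello.it".to_owned(), line_nr: 0, index }
}

// Hand-built tree for: Print("Hello, world!")
pub fn hello_world_line() -> Line {
    let callee = Expr { type_: ExprType::Variable("Print".to_owned()), loc: loc(0) };
    // The string keeps its quotes here; they are removed later, in the VM.
    let arg = Expr { type_: ExprType::String("\"Hello, world!\"".to_owned()), loc: loc(6) };
    let call = Expr { type_: ExprType::Call(Box::new(callee), vec![arg]), loc: loc(0) };
    Line { type_: LineType::Expr(call), loc: loc(0) }
}

fn main() {
    println!("{:#?}", hello_world_line());
}
```

The call node wraps the callee in a Box because Expr refers to itself; without the indirection the type would have infinite size.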
Lalrpop
For parsing and lexing we use the LALRPOP crate. Add lalrpop-util and snailquote to your dependencies, and lalrpop to your build dependencies.
Cargo.toml:
[dependencies]
...
lalrpop-util = { version = "0.19.8", features = ["lexer"] }
snailquote = "0.3.1"
...
[build-dependencies]
lalrpop = "0.19.8"
Let’s define a build script. NOTE: ‘build.rs’ lives in the project root, next to ‘Cargo.toml’, not inside the ‘src/’ directory.
build.rs:
fn main() {
    lalrpop::process_root().unwrap();
}
Grammar
Now we define a grammar.
src/parser/grammar.lalrpop:
use crate::parser::line::{Line, LineType};
use crate::parser::expr::{Expr, ExprType};
use crate::parser::loc::Loc;
// Our parser will take a filename and a line number to construct good Locs
grammar(filename: &str, line_nr: usize);
// This is our lexer, these are its tokens
// the r#"..."# strings are regexes
match {
    r#"[A-Za-z_][A-Za-z_0-9]*"# => Identifier,
    r#""([^"\\]|\\(.|\n))*""# => String,
    "(",
    ")",
    ",",
} else {
    r#"."#
}
// @L holds the index of the character where it is. Used to construct Locs
// LineParser is defined here
pub Line: Line = {
    <l:@L> => Line { type_: LineType::Empty, loc: Loc { filename: filename.to_owned(), line_nr, index: l } },
    <l:@L> <a:expr> => Line { type_: LineType::Expr(a), loc: Loc { filename: filename.to_owned(), line_nr, index: l } },
}
expr: Expr = {
    string,
    variable,
    function_call,
}
// ex. "hello"
string: Expr = {
    <l:@L> <a:String> => Expr { type_: ExprType::String(a.to_owned()), loc: Loc { filename: filename.to_owned(), line_nr, index: l } }
}
// ex. Print
variable: Expr = {
    <l:@L> <a:Identifier> => Expr { type_: ExprType::Variable(a.to_owned()), loc: Loc { filename: filename.to_owned(), line_nr, index: l } }
}
// ex. Print("Hello, world!")
function_call: Expr = {
    <l:@L> <a:expr> "(" <b:expr_list> ")" => Expr { type_: ExprType::Call(Box::new(a), b), loc: Loc { filename: filename.to_owned(), line_nr, index: l } },
}
// ex. "arg1", "arg2"
expr_list: Vec<Expr> = {
    => vec![],
    <a:expr> "," <mut b:expr_list> => { b.insert(0, a); b },
    <a:expr> => vec![<>]
}
We have now defined a grammar. parse_line (see above) now takes a line of code and returns a Line struct.
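To make the lexer part of the grammar more concrete, here is a hand-written sketch of roughly what the match block asks for: identifiers, quoted strings with backslash escapes, the three punctuation tokens, and a catch-all for any other character. It is not how LALRPOP's generated lexer works internally (that one is regex-based); Token, tokenize, and find_string_end are hypothetical names, and the sketch assumes whitespace between tokens is skipped.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Identifier(String),
    Str(String), // raw text, quotes and escapes still included
    LParen,
    RParen,
    Comma,
    Other(char), // stands in for the catch-all "." rule
}

// Given the position of an opening quote, returns the position of the
// matching closing quote, skipping over backslash escapes.
fn find_string_end(chars: &[(usize, char)], open: usize) -> Option<usize> {
    let mut j = open + 1;
    while let Some(&(_, c)) = chars.get(j) {
        match c {
            '"' => return Some(j),
            '\\' => j += 2, // a backslash consumes the next character too
            _ => j += 1,
        }
    }
    None
}

// Splits one line into (byte index, token) pairs.
fn tokenize(input: &str) -> Vec<(usize, Token)> {
    let chars: Vec<(usize, char)> = input.char_indices().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let (start, c) = chars[i];
        let string_end = if c == '"' { find_string_end(&chars, i) } else { None };
        if c.is_whitespace() {
            i += 1; // whitespace between tokens is skipped
        } else if c.is_ascii_alphabetic() || c == '_' {
            let mut j = i + 1;
            while j < chars.len() && (chars[j].1.is_ascii_alphanumeric() || chars[j].1 == '_') {
                j += 1;
            }
            // Identifier characters are ASCII, so char count equals byte count.
            let text = &input[start..start + (j - i)];
            tokens.push((start, Token::Identifier(text.to_owned())));
            i = j;
        } else if let Some(close) = string_end {
            let end = chars[close].0 + 1; // the closing quote is one byte long
            tokens.push((start, Token::Str(input[start..end].to_owned())));
            i = close + 1;
        } else {
            let token = match c {
                '(' => Token::LParen,
                ')' => Token::RParen,
                ',' => Token::Comma,
                other => Token::Other(other),
            };
            tokens.push((start, token));
            i += 1;
        }
    }
    tokens
}

fn main() {
    for (index, token) in tokenize("Print(\"Hello, world!\")") {
        println!("{:>2}: {:?}", index, token);
    }
}
```

The byte index stored next to each token plays the same role as @L in the grammar: it is what ends up in Loc's index field.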
VM
Next we implement a virtual machine. The job of the virtual machine is to look at the parsed Lines and Exprs and act upon them.
Value
Values are pieces of data the virtual machine keeps in memory. Value is defined like so:
src/vm/value.rs:
pub enum Value {
    Void,
    String(String),
    PrintBuiltin,
}
String is a string, Void means undefined, and PrintBuiltin is, well, the built-in Print function. Now for some logic, hold on to your hats.
VM State
src/vm/mod.rs:
mod value;
use crate::parser::line::{Line, LineType};
use crate::parser::expr::{Expr, ExprType};
pub struct VM {
    // Our VM doesn't hold any state yet
}
pub fn init_vm() -> VM {
    VM { }
}
exec_line
Evaluates and executes a Line struct. An empty line does nothing; an expression is evaluated and its value discarded.
pub fn exec_line(vm: &mut VM, parsed_line: &Line) {
    match &parsed_line.type_ {
        LineType::Empty => { }
        LineType::Expr(expr) => {
            expr_to_value(vm, expr);
        }
    }
}
expr_to_value
Converts an Expr to a Value, running any function calls it contains along the way.
fn expr_to_value(vm: &mut VM, expr: &Expr) -> value::Value {
    match &expr.type_ {
        // The parser keeps the quotes and escapes, so we unescape here
        ExprType::String(s) => value::Value::String(snailquote::unescape(s).unwrap()),
        ExprType::Variable(var) => {
            match var.as_str() {
                "Print" => value::Value::PrintBuiltin,
                _ => todo!()
            }
        }
        ExprType::Call(fn_expr, arg_exprs) => {
            let fn_value = expr_to_value(vm, fn_expr);
            match fn_value {
                value::Value::PrintBuiltin => {
                    // Evaluate every argument and print them separated by spaces
                    println!("{}", arg_exprs.iter().map(
                        |a| value_repr(expr_to_value(vm, a)))
                        .collect::<Vec<String>>().join(" "));
                    value::Value::Void
                }
                _ => todo!()
            }
        }
    }
}
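The String token matched by the grammar still carries its surrounding quotes and any backslash escapes, which is why the string case goes through snailquote::unescape. As a rough picture of what that step does, here is a std-only sketch. unescape_simple is a hypothetical helper that only understands double-quoted strings and four escapes; it is not snailquote's implementation and handles far fewer cases.

```rust
// Hypothetical helper: strips the surrounding double quotes and resolves
// the escapes \" \\ \n and \t. Returns None for anything else.
fn unescape_simple(raw: &str) -> Option<String> {
    // Both the opening and the closing quote must be present.
    let inner = raw.strip_prefix('"')?.strip_suffix('"')?;
    let mut out = String::with_capacity(inner.len());
    let mut chars = inner.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        // A backslash must be followed by a character we know how to resolve.
        match chars.next()? {
            'n' => out.push('\n'),
            't' => out.push('\t'),
            '"' => out.push('"'),
            '\\' => out.push('\\'),
            _ => return None,
        }
    }
    Some(out)
}

fn main() {
    // The raw token text as the parser would store it, quotes included.
    let raw = "\"Hello, world!\"";
    println!("{:?} -> {:?}", raw, unescape_simple(raw));
}
```

Returning an Option (or a Result, as snailquote does) leaves the decision about malformed strings to the caller; the article's code simply unwraps.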
value_repr
This function returns the string representation of a Value.
src/vm/mod.rs (continued):
fn value_repr(val: value::Value) -> String {
    match val {
        value::Value::String(s) => s,
        value::Value::Void => "<void>".to_owned(),
        value::Value::PrintBuiltin => "<Print>".to_owned(),
    }
}
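The evaluation functions above can be tried out in isolation with a simplified, self-contained sketch. It makes several assumptions that differ from the article's code: Expr drops the Loc field and holds already unescaped strings, printed lines are collected in a Vec instead of going through println!, and unknown variables or non-callable values produce an Err instead of hitting todo!(). The names eval and call_print are hypothetical.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    String(String), // already unescaped in this sketch
    Variable(String),
    Call(Box<Expr>, Vec<Expr>),
}

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Void,
    String(String),
    PrintBuiltin,
}

fn value_repr(val: &Value) -> String {
    match val {
        Value::String(s) => s.clone(),
        Value::Void => "<void>".to_owned(),
        Value::PrintBuiltin => "<Print>".to_owned(),
    }
}

// Evaluates `expr`; every Print call appends one line to `out`.
fn eval(expr: &Expr, out: &mut Vec<String>) -> Result<Value, String> {
    match expr {
        Expr::String(s) => Ok(Value::String(s.clone())),
        Expr::Variable(name) => match name.as_str() {
            "Print" => Ok(Value::PrintBuiltin),
            other => Err(format!("unknown variable: {}", other)),
        },
        Expr::Call(callee, args) => match eval(callee, out)? {
            Value::PrintBuiltin => {
                // Arguments are evaluated left to right, then joined by spaces.
                let mut parts = Vec::with_capacity(args.len());
                for arg in args {
                    parts.push(value_repr(&eval(arg, out)?));
                }
                out.push(parts.join(" "));
                Ok(Value::Void)
            }
            other => Err(format!("not callable: {}", value_repr(&other))),
        },
    }
}

// Hypothetical helper: builds the expression Print(args...).
fn call_print(args: Vec<Expr>) -> Expr {
    Expr::Call(Box::new(Expr::Variable("Print".to_owned())), args)
}

fn main() {
    let mut out = Vec::new();
    let program = call_print(vec![Expr::String("Hello, world!".to_owned())]);
    match eval(&program, &mut out) {
        Ok(_) => out.iter().for_each(|line| println!("{}", line)),
        Err(message) => eprintln!("error: {}", message),
    }
}
```

Because Print returns Void, nesting calls such as Print(Print("a")) first prints a and then prints <void> for the outer call.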
Wrapping it up
Finally, we update main so that each line of input flows through the parser and into the virtual machine: Input -> Parser -> Virtual Machine
src/main.rs:
mod opt;
mod parser;
mod vm;
use std::io::{self, BufRead};
use std::fs;
use structopt::StructOpt;
fn main() {
    let opt = opt::Opt::from_args();
    let mut vm = vm::init_vm();
    let reader: Box<dyn BufRead>;
    let filename: String;
    match opt.source_file {
        Some(_filename) => {
            reader = Box::new(io::BufReader::new(fs::File::open(&_filename).unwrap()));
            filename = _filename;
        }
        None => {
            reader = Box::new(io::BufReader::new(io::stdin()));
            filename = "<stdin>".to_owned();
        }
    }
    for (line_nr, line_str) in reader.lines().enumerate() {
        let line_str = line_str.unwrap();
        let parsed_line = parser::parse_line(&line_str, &filename, line_nr);
        vm::exec_line(&mut vm, &parsed_line);
    }
}
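One detail of the loop in main is worth seeing in isolation: enumerate() counts from 0, so the line_nr handed to parse_line is 0 for the first line of the file. The sketch below reads from an in-memory io::Cursor instead of a file or stdin so it can run on its own; numbered_lines is a hypothetical helper, and adding 1 when displaying the number is an assumption about how one might want to report locations, not something the code above does.

```rust
use std::io::{self, BufRead};

// Reads all lines and pairs each with the 0-based index that enumerate() yields.
fn numbered_lines<R: BufRead>(reader: R) -> io::Result<Vec<(usize, String)>> {
    reader
        .lines()
        .enumerate()
        .map(|(line_nr, line)| line.map(|text| (line_nr, text)))
        .collect()
}

fn main() -> io::Result<()> {
    // An in-memory reader stands in for the file or stdin.
    let source = io::Cursor::new("Print(\"a\")\n\nPrint(\"b\")\n");
    for (line_nr, text) in numbered_lines(source)? {
        // Adding 1 gives the line number a text editor would show.
        println!("{}: {:?}", line_nr + 1, text);
    }
    Ok(())
}
```

Any type implementing BufRead works here, which is the same property that lets main store either a file reader or a stdin reader behind Box<dyn BufRead>.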
Testing
Let’s test our interpreter. Build the project, then add Interp (or whatever you call your language) to your PATH. The binary can be found inside your project directory under ‘target/debug/’.
$ cargo build
$ echo 'Print("Hello, world!")' > hello.it
$ interp hello.it
Hello, world!
Source Code
Source code for this part can be found here.