Fowler's post advocates the use parser generators (in particular, ANTLR). I'd like to go a little further and make the following points:
- Writing a recursive decent parser is much easier than you think.
- You don't need a parser generator to overcome your parser fear.
- Get over the trauma induced by the compiler class you took.
Actually I only need to show a good argument for point #1. The other two points naturally follow.
I will make my case by using the well established technique of proof by example: A 64 lines Java class that evaluates arithmetic expressions supporting +, -, *, /, negation, and parenthesis.
public class CompactSolver {
private String input;
public CompactSolver(String input_) {
input = input_.trim();
}
private char peek(int offset) {
return offset >= input.length() ? '\0' :
input.charAt(offset);
}
private boolean consumeIf(char c) {
if (peek(0) != c)
return false;
consume(1);
return true;
}
private String consume(int numChars) {
if (numChars == 0)
throw new RuntimeException("syntax error");
String result = input.substring(0, numChars);
input = input.substring(numChars).trim();
return result;
}
public double eval() {
double lhs = mul();
if (consumeIf('+'))
return lhs + eval();
else if (consumeIf('-'))
return lhs - eval();
return lhs;
}
private double mul() {
double lhs = unary();
if (consumeIf('*'))
return lhs * mul();
else if (consumeIf('/'))
return lhs / mul();
return lhs;
}
private double unary() {
if (consumeIf('-'))
return -unary();
if (consumeIf('(')) {
double result = eval();
if (!consumeIf(')'))
throw new RuntimeException("Missing ')'");
return result;
}
return literal();
}
private double literal() {
for (int i = 0; true; ++i)
if (!Character.isDigit(peek(i)))
return Integer.parseInt(consume(i));
}
}
Update 1: Changed the precedence of the (unary) negation operator (Many thanks to Orlando).
Nice post.
I noticed that the expressions are evaluated in a different way to common interpretations. The binary operators are right associative instead of left and the unary negation operator has the lowest precedence instead of the highest.
e.g.
2-3*4+20 gives -30.0 instead of 10.0
-2-3*4+20 gives 30.0 instead of 6.0
Here's how I'd write the parser using Waxeye. I wrote evaluators for Ruby and Java and they ended up being about the same size as your parser. However, things really start to pay off once the language gets bigger.
calc <- ws sum
sum <- prod *([+-] ws prod)
prod <- unary *([*/] ws unary)
unary <= '-' ws unary | :'(' ws sum :')' ws | num
num <- +[0-9] ws
ws <: *[ \t\n\r]
Anonymous
November 11, 2008 at 5:59 PMHi Orlando,
My main point is that parsing (recursive decent or otherwise) is simple. I am not preaching against parser generators...
Anyway, thanks for the comment. I fixed the precedence of the negation operator. Left-associativity is also not very complicated: the penalty should be about 5-6 extra lines of code.
Unknown
November 12, 2008 at 12:45 PMCare to share the 5-6 extra lines of code?
Unknown
August 16, 2013 at 2:52 AMThank you for sharing thiss
Sofia
July 3, 2024 at 8:18 AM