Skip to content

Implement IndentationError #1512

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Oct 12, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 15 additions & 0 deletions compiler/src/error.rs
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
use rustpython_parser::error::{LexicalErrorType, ParseError, ParseErrorType};
use rustpython_parser::location::Location;
use rustpython_parser::token::Tok;

use std::error::Error;
use std::fmt;
Expand Down Expand Up @@ -41,6 +42,20 @@ pub enum CompileErrorType {
}

impl CompileError {
pub fn is_indentation_error(&self) -> bool {
if let CompileErrorType::Parse(parse) = &self.error {
match parse {
ParseErrorType::Lexical(LexicalErrorType::IndentationError) => true,
ParseErrorType::UnrecognizedToken(token, expected) => {
*token == Tok::Indent || expected.clone() == Some("Indent".to_string())
}
_ => false,
}
} else {
false
}
}

pub fn is_tab_error(&self) -> bool {
if let CompileErrorType::Parse(parse) = &self.error {
if let ParseErrorType::Lexical(lex) = parse {
Expand Down
33 changes: 26 additions & 7 deletions parser/src/error.rs
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ pub enum LexicalErrorType {
StringError,
UnicodeError,
NestingError,
IndentationError,
TabError,
DefaultArgumentError,
PositionalArgumentError,
Expand All @@ -36,6 +37,9 @@ impl fmt::Display for LexicalErrorType {
LexicalErrorType::FStringError(error) => write!(f, "Got error in f-string: {}", error),
LexicalErrorType::UnicodeError => write!(f, "Got unexpected unicode"),
LexicalErrorType::NestingError => write!(f, "Got unexpected nesting"),
LexicalErrorType::IndentationError => {
write!(f, "unindent does not match any outer indentation level")
}
LexicalErrorType::TabError => {
write!(f, "inconsistent use of tabs and spaces in indentation")
}
Expand Down Expand Up @@ -121,7 +125,7 @@ pub enum ParseErrorType {
/// Parser encountered an invalid token
InvalidToken,
/// Parser encountered an unexpected token
UnrecognizedToken(Tok, Vec<String>),
UnrecognizedToken(Tok, Option<String>),
/// Maps to `User` type from `lalrpop-util`
Lexical(LexicalErrorType),
}
Expand All @@ -143,10 +147,19 @@ impl From<LalrpopError<Location, Tok, LexicalError>> for ParseError {
error: ParseErrorType::Lexical(error.error),
location: error.location,
},
LalrpopError::UnrecognizedToken { token, expected } => ParseError {
error: ParseErrorType::UnrecognizedToken(token.1, expected),
location: token.0,
},
LalrpopError::UnrecognizedToken { token, expected } => {
// Hacky, but it's how CPython does it. See PyParser_AddToken,
// in particular "Only one possible expected token" comment.
let expected = if expected.len() == 1 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be an opportunity to use the first function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because the intent is to get None if expected has more than two elements.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, okay, I understand. Thank you!

Some(expected[0].clone())
} else {
None
};
ParseError {
error: ParseErrorType::UnrecognizedToken(token.1, expected),
location: token.0,
}
}
LalrpopError::UnrecognizedEOF { location, .. } => ParseError {
error: ParseErrorType::EOF,
location,
Expand All @@ -167,8 +180,14 @@ impl fmt::Display for ParseErrorType {
ParseErrorType::EOF => write!(f, "Got unexpected EOF"),
ParseErrorType::ExtraToken(ref tok) => write!(f, "Got extraneous token: {:?}", tok),
ParseErrorType::InvalidToken => write!(f, "Got invalid token"),
ParseErrorType::UnrecognizedToken(ref tok, _) => {
write!(f, "Got unexpected token {}", tok)
ParseErrorType::UnrecognizedToken(ref tok, ref expected) => {
if *tok == Tok::Indent {
write!(f, "unexpected indent")
} else if expected.clone() == Some("Indent".to_string()) {
write!(f, "expected an indented block")
} else {
write!(f, "Got unexpected token {}", tok)
}
}
ParseErrorType::Lexical(ref error) => write!(f, "{}", error),
}
Expand Down
5 changes: 1 addition & 4 deletions parser/src/lexer.rs
Original file line number Diff line number Diff line change
Expand Up @@ -791,11 +791,8 @@ where
break;
}
Ordering::Greater => {
// TODO: handle wrong indentations
return Err(LexicalError {
error: LexicalErrorType::OtherError(
"Non matching indentation levels!".to_string(),
),
error: LexicalErrorType::IndentationError,
location: self.get_pos(),
});
}
Expand Down
6 changes: 3 additions & 3 deletions parser/src/python.lalrpop
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,7 @@ FileLine: ast::Suite = {

Suite: ast::Suite = {
SimpleStatement,
"\n" indent <s:Statement+> dedent => s.into_iter().flatten().collect(),
"\n" Indent <s:Statement+> Dedent => s.into_iter().flatten().collect(),
};

Statement: ast::Suite = {
Expand Down Expand Up @@ -1124,8 +1124,8 @@ extern {
type Error = LexicalError;

enum lexer::Tok {
indent => lexer::Tok::Indent,
dedent => lexer::Tok::Dedent,
Indent => lexer::Tok::Indent,
Dedent => lexer::Tok::Dedent,
StartProgram => lexer::Tok::StartProgram,
StartStatement => lexer::Tok::StartStatement,
StartExpression => lexer::Tok::StartExpression,
Expand Down
26 changes: 26 additions & 0 deletions tests/snippets/invalid_syntax.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,32 @@ def valid_func():
else:
raise AssertionError("Must throw syntax error")

src = """
if True:
pass
"""

with assert_raises(IndentationError):
compile(src, '', 'exec')

src = """
if True:
pass
pass
"""

with assert_raises(IndentationError):
compile(src, '', 'exec')

src = """
if True:
pass
pass
"""

with assert_raises(IndentationError):
compile(src, '', 'exec')

src = """
if True:
pass
Expand Down
4 changes: 3 additions & 1 deletion vm/src/vm.rs
Original file line number Diff line number Diff line change
Expand Up @@ -416,7 +416,9 @@ impl VirtualMachine {

#[cfg(feature = "rustpython-compiler")]
pub fn new_syntax_error(&self, error: &CompileError) -> PyObjectRef {
let syntax_error_type = if error.is_tab_error() {
let syntax_error_type = if error.is_indentation_error() {
Copy link
Contributor

@windelbouwman windelbouwman Oct 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using a sequence of if statements, we could use here match statements on the error variable to check all it's variants.

I would prefer to use here match indeed, to create the proper matching python error based upon the CompilerErrorType variant, and even the inner LexicalErrorType.

match error {
    CompilerErrorType::ParseError(parse) => {
        match parse {
            .... => self.ctx.exceptions.indentation_error.clone(),
            .... => self.ctx.exceptions.tab_error.clone(),

        }
    },
    ... => syntaxError.clone()
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main motivation of current design is to simplify compiler/VM public interface. In a sense {CompilerErrorType, ParseErrorType, LexicalErrorType} really are internal implementation details, the only thing VM needs is exception type, message, and location.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the motivation, but I still feel, the vm is so much connected to the parser, it is actually fine to leak the internals of the parser into the vm code.

In any case, let's go with the current design.

self.ctx.exceptions.indentation_error.clone()
} else if error.is_tab_error() {
self.ctx.exceptions.tab_error.clone()
} else {
self.ctx.exceptions.syntax_error.clone()
Expand Down