Parsing Lisp Code in Snap!

bluebaritone21 · February 13, 2024, 8:22pm

I'm trying to generate a tree of tokens based off of text, In this case, (* (+ (/ 1 3) (+ (/ 6 4) 4).
If it works, it should generate a tree of tokens, like

.
However, my best guess,

Doesn't come close!

I feel like recursion is the way to go, but I'm not quite sure about the implementation.

ego-lay_atman-bay · February 13, 2024, 8:43pm

Part of your problem, may be because you can't just simply split by (, you have to factor in that the pairs are not right after each other.

(* (+ (/ 1 3) (+ (/ 6 4) 4)))
^  ^  ^     ^ ^  ^-----^  ^^^
|  |  |-----| |-----------|||
|  |-----------------------||
|---------------------------|

If you simply split by the parentheses, you don't get close. Instead you actually need to go character by character, checking for special characters, like parentheses, and spaces. A simple HOF won't work for this.

bluebaritone21 · February 13, 2024, 8:48pm

I've got this, which splits tokens:
.

Now, I'm going to write a function that iterates over this list and splits it into the tree. I think that recursion is gonna be my best bet in it.

ego-lay_atman-bay · February 13, 2024, 8:55pm

Starting with all the tokens is actually a pretty good method.

jens · February 13, 2024, 9:41pm

shhhhh, don't tell anyone, but in v10 dev there's a little secret experiment that pretty much does exactly this: It lets you parse LISP code to blocks:

and vice-versa

encode v10

and if you select "lines" instead of "text" it'll pretty print it with indentations:

still extremely buggy and under construction, but we'll get there.

bluebaritone21 · February 13, 2024, 9:43pm

Wow! This is amazing!

bluebaritone21 · February 13, 2024, 10:33pm

I'm thinking that the logic should go like this:

bh · February 14, 2024, 12:00am

Whoa! How does that work? I mean, that's a value, right? You could SPLIT BY LINES it? Then would the inner lines be text strings starting with spaces? Starting with tabs?

By the way, that's not official correct pretty printing. The lines

        (/ 6 4) 4

and

) 2

should each be two lines

        (/ 6 4)
        4

and

   )
   2

In the second case, the more clear-cut, the rule is that nothing on a line can be at a higher level in the tree structure than the first character of that line. Exception: Some authorities allow a sequence of consecutive close parens (no spaces between them) on a line.

The first case is less totally forbidden, but I wouldn't do it, because the status of the 4 is easy to miss. Once you've put an input expression on a separate line, all its siblings should also be on separate lines.

There's something a little weird about SPLIT of JOIN of SPLIT. Maybe a better notation?

Yes, that's right. But next time give it a white (or whiteish gray) background. :~)

Oh, except, officially the apostrophe ' is an abbreviation for (quote _) where the _ can be an atom or a list. 'a ➞ (quote a); '(a b) ➞ (quote (a b)). This translation happens during tokenization, so you shouldn't have a '( token.

The kind of thing of which ' is an instance is reader macros.

bluebaritone21 · February 14, 2024, 12:14am

Okay. I'm following along with The Scheme Programming Language, Fourth edition as I make this program, so I've definitely misunderstood some things.

I'll make '( its own token, which will be converted to quote internally while I generate the AST, as shown in the plan. :)

callietastrophic · February 14, 2024, 2:39am

YAAAAAAAAAAAAAAAAAAAAAAAAAAAY

This is incredible stuff.

Whoa! How does that work? I mean, that's a value, right? You could SPLIT BY LINES it? Then would the inner lines be text strings starting with spaces? Starting with tabs?

Jens just proved what I've been trying to say for YEARS.

Text is block is text is block is...

Whitespace is a thing implemented for human readability is ALSO the thing that makes block languages so incredibly powerful, thanks to emoji substitution, text is block is.

bluebaritone21 · February 14, 2024, 2:53am

Done.

bluebaritone21 · February 14, 2024, 3:03am

I'm not sure what I did wrong with this:

bh · February 14, 2024, 6:34am

No, ' doesn't become a QUOTE token; it becomes an expression (quote __) (i.e., a list, like all non-atomic expressions). '( isn't a thing in itself; it's ' followed by a list (up to the matching close parenthesis), so the list goes in place of the __ above.

'(a b c) becomes (quote ⊆a b c)) where the ⊆ (meant to be an underlined open parenthesis but there's no such character alas) is the exact parenthesis that you typed after the quote.

In PARSE, you are trying to solve a tree-structured problem with a simple iteration. What you want is to make a recursive call to PARSE when you see an open paren.

Ah, I'm reading your mind, and I see that you don't think you can do that because the input to the recursive call will be the entire rest of the input list, whereas you want it only to use up the part up to the matching close paren.

There are various ways to solve this problem, but here's the way I like best:

Then you

and inside PARSE you say

(or PEEK if you're going to read the same token again). For example:

(except if you do this you'll probably change the name of the input to PARSE). So you always, even at top level, enter PARSE looking at an open paren unless the expression is atomic (a symbol or a number).

Cute aspect of this approach is that the recursive call looks like an infinite loop because the input doesn't change, but the input includes local state, and it's gotten closer to the end of its list.

Of course PARSE doesn't do a recursive call if the very first token it sees is an open paren; it just skips over that.

Oh yeah, ⊣ is a traditional end-of-text signal, and ⧮ is error bars around a white square. ("Error bars" are a thing in statistics, not indicating an error the way we mean it, but I'm making a pun by using it.)

But don't let the cute technical details make you forget the boldfaced sentence with which I started this response. If you have a tree structured problem, don't even consider an iterative solution!

qw23 · February 14, 2024, 8:02am

A LISP (or Scheme) parser could build a basis for an interpreter or transpiler for Snap! - is that something @bluebaritone21 or @jens have in mind?
That would be an interesting tool, e.g. for implementing SICP examples in Snap!

jens · February 14, 2024, 8:50am

the reason I'm experimenting with a LISP-ish text encoding for scripts is that it'll make the currently very complicated process of writing parts of Snap it itself and persisting it in files much more straightforward and maintainable. Currently I'm exporting custom block definitions to xml and then I'm back-tick quoting them inside a JS file to reuse them inside Snap. With a friendlier representation I hope to be able to write more of Snap directly inside JS.

Transpilation or even LISP/Scheme compatibility is totally not my intention, but it's certainly possible. However, note that a good number of serious folks are already doing that on many fronts, except that they're building their own tools that work on xml, and that works fine.

Pedagogically I'm absolutely not yet convinced about the whole thing. My hacker mind loves it that we can now more or less directly use a text representation and run it inside Snap, this totally works:

as does this:

(and here you can already see that it's not about Scheme or LISP). So we can use blocks to let folks run text, that's ... funny and somehow ... satisfying. I also love that this very process clearly demonstrates how all programming languages work: Text does not "run", it needs to be converted into blocks first, because only blocks are executable.

My teacher-impostor mind protests and points out that a) this "code" is reinforcing the wrong idea about text being somehow more "real", and that folks are going to love it way too much (for all the wrong reasons, including "transpiling" SICP examples, haha!). Also I hate that it reinforces English as the seemingly "underlying" language.

Anyway, this is currently nothing more than a first experiment. We'll see where it takes us.

bh · February 14, 2024, 9:04am

Hahaha, Jens, I love you. This is your equivalent to my saying all you need is lambda. :~)

SICP in Snap! is definitely on my to-do list! But I was expecting to just hand-translate the examples.

jens · February 14, 2024, 11:39am

Programming languages are nothing without their lambdas

Yes! Please!! I can't wait!!! :3

bluebaritone21 · February 14, 2024, 3:57pm

That's really smart!

This very nearly works. I just need to change the for out for an until.
(Edit: it doesn't work, but I have no idea why.

)

bh · February 14, 2024, 6:06pm

The reason it's not reporting is that there's no REPORT in PARSE! :~P

But you're cheating by using a test case in which there's only one level of recursion and also there's nothing after the subexpression. Try this:

(+ (* 1 (- 5 2) (/ 1 9)) 7)

bluebaritone21 · February 14, 2024, 7:05pm

Ah, man! I'll try to figure out what's wrong with my logic.