Regular expression parser


@bh how do I do infixes? E.g., a|b -> (| a b)

Look in CSLS vol 3 and search on the page for "Expressions and Precedence."

I can't quite understand it.

Can you ask a more specific question?

I think I would have more of a chance of understanding it if you explained it, especially if it's in Snap! terms.

Umm, maybe tomorrow... sorry...

lol it's morning for me and you are rapidly posting haha

Ok.

@18001767679 it's midafternoon here.

haha ik

Okay, let's see what I can do.

The heading of this thread is "regular expression parser" but that's not what we're doing at all, right? We're doing parsing of arithmetic expressions, even though your example is a boolean rather than an arithmetic operator.

So, you've put the program text through a tokenizer, so now you just have meaningful tokens to deal with and not spacing and so on. In particular, instead of a text string such as "3 + 4" or "3+4" you have a list (3, +, 4).
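To make the tokenizing step concrete, here is a minimal Python sketch. It is only an illustration, not the Snap! tokenizer. The name `tokenize` is hypothetical, and it assumes that a token is either a run of digits or any single non-space character:

```python
import re

def tokenize(text):
    """Turn '3 + 4' (or '3+4') into [3, '+', 4].
    Assumption: a token is a run of digits or any single non-space character,
    so whitespace between tokens simply disappears."""
    return [int(t) if t.isdigit() else t for t in re.findall(r"\d+|\S", text)]
```

Both `tokenize("3 + 4")` and `tokenize("3+4")` give the same list, which is the point: the parser never has to think about spacing.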

Now what you want to do is scan through that list left to right and get something you can run. And the problem you have is that when you've scanned (3, +, 4) you can't add 3 and 4 yet, because maybe the input expression is actually 3+4×5 and you have to do the multiplication before you can add the result to 3.
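One standard way to handle this is precedence climbing. The sketch below is in Python, not the Snap! or Logo code from CSLS, and the precedence table and function names are hypothetical. It builds a Lisp-style prefix tree, so 3+4×5 comes out as (+ 3 (× 4 5)), with the multiplication nested inside the addition:

```python
import operator

# Hypothetical precedence table: a higher number binds tighter.
PRECEDENCE = {"+": 1, "-": 1, "×": 2, "/": 2}
OPS = {"+": operator.add, "-": operator.sub, "×": operator.mul, "/": operator.truediv}

def parse_expr(tokens, min_prec=1):
    """Precedence climbing over a token list, consuming tokens from the front.
    Returns a prefix tree such as ['+', 3, ['×', 4, 5]]."""
    left = tokens.pop(0)  # assume every expression starts with an operand
    while tokens and PRECEDENCE.get(tokens[0], 0) >= min_prec:
        op = tokens.pop(0)
        # Parse the right operand, letting only tighter operators grab it.
        # The +1 makes equal-precedence operators group left to right.
        right = parse_expr(tokens, PRECEDENCE[op] + 1)
        left = [op, left, right]
    return left

def evaluate(tree):
    """Run a prefix tree produced by parse_expr."""
    if not isinstance(tree, list):
        return tree
    op, a, b = tree
    return OPS[op](evaluate(a), evaluate(b))
```

Because `parse_expr` pops tokens off the list it's given, pass it a copy if you still need the original.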

Is that where we're starting?

Let's get agreement on the starting point and then I can continue from there.

No, I am trying to parse regular expressions; | just happens to be an infix binary RE operator.

I'm trying to parse it the way the lisplistparse project parses Lisp lists.

Umm, DuckDuckGo doesn't know what that is, and neither do I...

I think your situation is basically the same as arithmetic; you have to handle infix operators with different precedences.
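For regular expressions the same idea works with three precedence levels: `|` binds loosest, then implicit concatenation, then the postfix operators `*`, `+`, and `?`. Here is a hedged recursive-descent sketch in Python, with one function per precedence level. It produces the prefix form from the original question, a|b -> (| a b). The tag name `cat` for concatenation is an assumption, and escapes, character classes, and other real-regex features are not handled:

```python
def parse_regex(pattern):
    """Parse a regex string into a prefix tree, e.g. 'a|b' -> ['|', 'a', 'b'].
    Precedence, loosest to tightest: |, concatenation, postfix * + ?."""
    pos = 0

    def peek():
        return pattern[pos] if pos < len(pattern) else None

    def advance():
        nonlocal pos
        ch = pattern[pos]
        pos += 1
        return ch

    def alternation():
        left = concatenation()
        while peek() == "|":
            advance()
            left = ["|", left, concatenation()]
        return left

    def concatenation():
        items = []
        while peek() is not None and peek() not in "|)":
            items.append(postfix())
        if len(items) == 1:
            return items[0]
        return ["cat"] + items  # hypothetical tag; an empty branch becomes ['cat']

    def postfix():
        atom = primary()
        while peek() is not None and peek() in "*+?":
            atom = [advance(), atom]
        return atom

    def primary():
        ch = advance()
        if ch == "(":
            inner = alternation()  # parens restart at the loosest level
            if peek() != ")":
                raise SyntaxError("missing )")
            advance()
            return inner
        return ch  # any other character is a literal

    tree = alternation()
    if pos != len(pattern):
        raise SyntaxError(f"unexpected {pattern[pos]!r} at position {pos}")
    return tree
```

With this, `ab*|c` parses as (| (cat a (* b)) c): the star grabs only the b, and the bar splits the whole thing in two.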

The way Lisp parses lists is trivial, just call the parser recursively when you see a left paren, and report your current value to the caller when you see a right paren. That's it, except for a couple of details, such as turning 'FOO into (QUOTE FOO). Anything else that comes along is just an item in the list you're building.
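As a rough illustration of that description, here is a Python sketch that reads from an already-tokenized list. It is an assumption-laden stand-in, not a Snap! project. Each function returns the value it built along with the index where its caller should continue, and the names `read_datum` and `read_list` are hypothetical:

```python
def read_datum(tokens, i):
    """Read one datum starting at tokens[i]; return (datum, next_index)."""
    if i >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[i]
    if tok == "(":
        return read_list(tokens, i + 1)  # left paren: recurse for a sublist
    if tok == "'":
        datum, i = read_datum(tokens, i + 1)
        return ["QUOTE", datum], i  # 'FOO becomes (QUOTE FOO)
    if tok == ")":
        raise SyntaxError("unexpected )")
    return tok, i + 1  # anything else is just an item

def read_list(tokens, i):
    """Read data until the matching ')'; return (list, index_after_paren)."""
    items = []
    while i < len(tokens):
        if tokens[i] == ")":
            return items, i + 1  # right paren: report our value to the caller
        datum, i = read_datum(tokens, i)
        items.append(datum)
    raise SyntaxError("missing )")
```

For example, `read_datum(["(", "F", "'", "FOO", ")"], 0)` returns `(["F", ["QUOTE", "FOO"]], 5)`.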

lisplistparse
And you do know what it is: you wrote it, after all!

Ugh, hard to believe I wrote that. It's because I was trying to meet the OP halfway.

Part of the hair is from the fact that it's trying to tokenize and parse all at once. That's doable, but it confuses the issue. Much better to separate out those parens (which is all there is to tokenization in Lisp except for exceptions) first:
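Here is a minimal Python sketch of that separation step. It is not the Snap! script itself. It pads each paren (and the quote character) with spaces and then splits on whitespace. The name `tokenize` is hypothetical, and strings, comments, and the other exceptions are deliberately ignored:

```python
def tokenize(text):
    """Split Lisp text into tokens: '(', ')', and "'" become separate tokens,
    and everything else is split on whitespace.
    Assumption: no strings, comments, or other reader special cases."""
    for ch in "()'":
        text = text.replace(ch, f" {ch} ")
    return text.split()
```

So `tokenize("(sq x)")` gives `["(", "sq", "x", ")"]`, and the parser only ever sees whole tokens.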

After this I started to write the actual parser, but it has a bug, and I have to get packed before my 6am flight tomorrow (8 hours from now). So I'll show you the current state of things. First, as I said earlier, I want to turn the input token list into a buffer object that includes a pointer saying where I'm up to. This is necessary because the recursive calls for sublists have to tell their caller how far they've read. So instead of calling ALL BUT FIRST OF all the time, I just increment the pointer by mutation, and the token list that the caller already has stays up to date:

lisplistparse script pic (3)
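The buffer idea can be sketched in Python as follows. This is a hypothetical reconstruction, not the Snap! script in the picture: `TokenBuffer`, its `peek` and `read` methods, and `parse` are made-up names. Because every recursive call shares the same buffer object, a sublist call that consumes tokens leaves `pos` pointing just past its closing paren, so the caller never has to be told how far it read:

```python
class TokenBuffer:
    """A token list plus a pointer to the next unread token."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0  # index of the next unread token

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def read(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1  # advance by mutation instead of ALL BUT FIRST OF
        return tok


def parse(buf):
    """Read one datum from buf, consuming exactly the tokens it uses."""
    tok = buf.read()
    if tok == "(":
        items = []
        while buf.peek() != ")":
            items.append(parse(buf))  # the sublist call moves buf.pos for us
        buf.read()  # consume the ')'
        return items
    if tok == ")":
        raise SyntaxError("unexpected )")
    if tok == "'":
        return ["QUOTE", parse(buf)]
    return tok
```

After `parse` returns, `buf.pos` is exactly where the next datum starts, so calling `parse(buf)` again reads the following top-level item.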

Once that's clear, you're ready to read PARSE:

So, find my bug for me. :~)

Just to be clear, the OP was me. I had wanted to have a SPLIT BY XML block, and eventually you posted this, and then I ended up failing.

Right, okay, trying to meet you halfway! :~P

Also, why were you trying to meet me halfway?

Because I'm a teacher, and because the way to get someone to understand something is to meet them where they are and move them one step at a time.

Ah.