I'm imagining something like this:
And what if I have something like this, with a <foo>
inside another <foo>
?
<foo>
x
<foo>
y
</foo>
z
</foo>
I'm imagining something like this:
And what if I have something like this, with a <foo>
inside another <foo>
?
<foo>
x
<foo>
y
</foo>
z
</foo>
Oh, right, I forgot about the stuff inside a tag.
As for a foo inside a foo, that's the joy of recursion: When you see the second foo, you make a recursive call to the same split procedure, and when it sees the matching (inner) /foo it returns the list it created, which will be an item in the outer foo's list.
What makes a recursive solution a little tricky is that the outer call needs to know how far the inner call got in the input list. So you attach an item number to the input list, and every time you read an item from the list you increment that item number. (Or I suppose you could just SET (input) TO (ALL BUT FIRST OF (input)).)
You already have a file for reading xml in your source code. I'm sure it wouldn't take that much time.
It's a naive XML2XPath parser.
XML from Sample XML File (books.xml) | Microsoft Docs
How would I do that?
I'm sitting on my hands to keep myself from writing it for you...
For starters, why don't you leave XML aside temporarily and write a program to parse
((A B) C (D E))
into
Don't use SPLIT; just a FOR loop that looks at each character and does something special if it's (
or )
.
But make sure you call the same procedure to parse the (A B)
etc.
Ok, and should I give you the program when I'm done?
Edit 2: Working on it now.
Edit 3: I'm getting stumped by the outer parentheses.
This is my code so far:
Could you share the project? It's not obvious to me just from looking at it why it's doing that. Thanks.
Ah. So I instrumented your code with a at the beginning, and what I found was that in the recursive calls, the input was (
the first time and ((
the second time. So that's why you're getting those weird lists of parentheses.
If you just work with the text string, when you return from a recursive call, the outer call won't know how much of the string the inner call parsed. That's why I suggested a data structure that combines the string with a pointer to where you're up to in it:
This can, of course, be done purely functionally, but parsing is one of the things that turn out way easier non-functionally, because of this business of reading each character only once. You can think of text with pointer
as a sort of poor-man's object, with next character
as its only method.
I'm refraining from showing listify helper
so as not to spoil it if you want to write it yourself, but if you'd rather just read mine it's here: https://snap.berkeley.edu/snap/snap.html#present:Username=bh&ProjectName=listify
Once you've absorbed this technique, you can go back to parsing XML. :~)
P.S. Why not just
next character
could replace the one item with its all but first letter
.)
P.P.S. I have a slight bug in LISTIFY, which should skip over leading spaces before doing anything else. But you get the idea.
I was still stumped, so I took a look at your code, and I still can't figure out how it parses it. For example, in what case is "item" an empty string?
If the input text has two spaces in a row.
I probably need something interactive to help me understand it. (Same with matrix multiplication, but that's for a different topic.)
Oh, try showing the local variables and visual-stepping it.
I kind of get it now.
Hmm, not sure what to make of "kind of." Want to ask a question? Want to drop the topic? Want to talk about XML?
I guess I might know enough to talk about XML now.
Okay, well, it's the same idea, except that <foo>
is like an open paren, and </foo>
is like a close paren!
Except one slight difference is that the parentheses don't show up at all in the result list, whereas you do want to preserve what kind of tag this is.
This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.