Parsing an XML String?

glenbull · June 22, 2020, 2:43pm

We are attempting to parse an XML string: "note pitch step" (with each item enclosed in angle brackets, as shown in the image below).

Letter 6 of XML String yields “>”

Letter 7 of XML String yields “<”

Letter 8 of XML String yields “p”

Substring of XML String before “Letter 6 of XML String” yields “<note”

Substring of XML String before “Letter 7 of XML String” yields “ ”

Substring of XML String before “Letter 8 of XML String” yields “<”

Why is “Substring of XML String before ‘Letter 7 of XML String’” empty?

(See image below.)

fridolinux · June 22, 2020, 5:06pm

I would parse it like this:

P.S.
I have no idea why "Substring of XML String before Letter 7 of XML String" reports nothing.

lowclouds · June 22, 2020, 8:23pm

Letter 7 of XML String is '<'.
"Substring of XML String before '<'" is empty because the string itself begins with '<': searching from the start, the first '<' found is the initial one, and there are no letters before it.
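For anyone who wants to check this outside Snap!, here is a minimal Python sketch of the same behavior. It assumes the string is "<note><pitch><step>" (a reconstruction from the description in the first post) and that the "substring before" block returns the text before the first occurrence of its second input. The helper names `letter` and `substring_before` are made up for illustration; they are not Snap! or Python built-ins.

```python
# Assumed reconstruction of the string from the first post.
xml_string = "<note><pitch><step>"

def letter(n, text):
    """Return the n-th character, counting from 1 as Snap! does."""
    return text[n - 1]

def substring_before(text, delimiter):
    """Return everything before the FIRST occurrence of delimiter (assumed semantics)."""
    head, _, _ = text.partition(delimiter)
    return head

print(repr(letter(6, xml_string)))                                # '>'
print(repr(letter(7, xml_string)))                                # '<'
print(repr(substring_before(xml_string, letter(6, xml_string))))  # '<note'
print(repr(substring_before(xml_string, letter(7, xml_string))))  # ''
```

The block receives only the character '<', not its position, so it has no way of knowing you meant the seventh letter rather than the first.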

fwiw, I like to break up xml like this:

glenbull · June 22, 2020, 10:52pm

Thanks for this assistance. We would like to import a music notation interchange format known as "Music XML" into Snap! We can do this by using Excel to convert the file to a comma-delimited format and then importing that into Snap!

However, it would be nice to do the parsing directly in Snap! without the intermediate step.

Thanks so much.

lowclouds · June 23, 2020, 1:06am

Neither of the splitting methods above will be effective at parsing an xml file, like Music XML, that has nested parts. The good thing is that Music XML looks pretty easy to parse.

bh · June 23, 2020, 4:43am

The good news is, there's a library to parse JSON...

Personally I think the "><" technique is a little fragile. Especially since in the typical use case, the actual information comes in between the ">" and the "<"!
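To make the fragility concrete, here is a small Python sketch of what splitting on "><" does once an element carries a payload. The sample string is invented for illustration and is not taken from a real Music XML file.

```python
# Hypothetical sample with a payload ("C") between ">" and "<".
xml = "<note><pitch><step>C</step></pitch></note>"

# Split wherever a closing bracket is immediately followed by an opening one.
pieces = xml.split("><")

for piece in pieces:
    print(repr(piece))
# '<note'
# 'pitch'
# 'step>C</step'
# '/pitch'
# '/note>'
```

The payload never sits next to a "><" boundary, so it stays glued to its tags in the third piece, and the outer pieces keep their stray brackets.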

Depending on just how recursive it is, my first try would be regular expressions: look for "<foo>[^<]*</foo>", replace each match with a list that includes the tag and the payload, and keep doing that until you run out of tags. That way you are gathering from the inside out.
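A rough Python sketch of that inside-out idea, under some loud assumptions: tags have no attributes, there are no self-closing tags, comments, or XML declarations, and the text never contains a NUL character (used here as a placeholder marker). The function name `parse_inside_out`, the placeholder convention, and the sample input are all invented for illustration; real Music XML uses attributes, so this would need extending before it could read an actual file.

```python
import re

# An innermost element: opening tag, payload containing no "<", matching closing tag.
LEAF = re.compile(r"<(\w[\w-]*)>([^<]*)</\1>")
# Placeholder left behind where an already-parsed element used to be.
TOKEN = re.compile(r"\x00(\d+)\x00")

def parse_inside_out(xml):
    nodes = []  # parsed elements, indexed by placeholder number

    def expand(payload):
        """Turn a payload into a list of text pieces and nested [tag, children] lists."""
        parts, pos = [], 0
        for m in TOKEN.finditer(payload):
            text = payload[pos:m.start()].strip()
            if text:
                parts.append(text)
            parts.append(nodes[int(m.group(1))])
            pos = m.end()
        text = payload[pos:].strip()
        if text:
            parts.append(text)
        return parts

    def collapse(m):
        """Record one innermost element and return its placeholder."""
        nodes.append([m.group(1), expand(m.group(2))])
        return f"\x00{len(nodes) - 1}\x00"

    # Keep collapsing innermost elements until no tags are left.
    while True:
        xml, count = LEAF.subn(collapse, xml)
        if count == 0:
            break
    return expand(xml)

sample = ("<note><pitch><step>C</step><octave>4</octave></pitch>"
          "<duration>4</duration></note>")
print(parse_inside_out(sample))
# [['note', [['pitch', [['step', ['C']], ['octave', ['4']]]], ['duration', ['4']]]]]
```

Each pass only touches elements whose payload is already free of tags, so nesting is resolved one level at a time, from the leaves up to the root.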

Or of course just write a proper parser. :~)