Hello, all!
I've been running into an issue with the following block:
I have the following list:
I use the following block to programmatically get the JSON string {“next”: “ask”, “details”: {“for”: “generate-Character”, “theme”: “uwu boi uwu”}} from the list.
What a humorously unfortunate predicament! At least it's somewhat noticeable that they're "fake" quotation marks if you look hard enough, unlike U+037E (Greek question mark), which is practically indistinguishable from the semicolon.
A little off topic, but interestingly, the Rust compiler has a specific error message for when you mistakenly use a Greek question mark instead of a semicolon.
Then that's your problem. AI will sometimes do unexpected things like this to make its output look more human. Just split the text by “, append " to each item (excluding the last), and join them all back together; then do the same for ” so the closing quotes get replaced too.
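The split/join trick is a block-based recipe, but the same idea is easy to sketch in Python. The following is a minimal sketch, not the poster's block. It assumes the only problem is the curly double quotes U+201C (“) and U+201D (”). Splitting on a character and joining the pieces with " is the same as replacing that character with ". The sketch also shows that a JSON parser rejects the original string and accepts the fixed one. The helper names `straighten_quotes` and `parse_or_none` are hypothetical.

```python
import json

# The string from the question: it uses curly quotes U+201C/U+201D
# where JSON requires the ASCII double quote U+0022.
raw = ('{\u201cnext\u201d: \u201cask\u201d, \u201cdetails\u201d: '
       '{\u201cfor\u201d: \u201cgenerate-Character\u201d, '
       '\u201ctheme\u201d: \u201cuwu boi uwu\u201d}}')


def straighten_quotes(text: str) -> str:
    """Hypothetical helper: replace curly double quotes with '"'.

    Same as the split/join recipe: split on the curly quote, then join
    the pieces with a straight quote. Done for both opening and closing.
    """
    for curly in ("\u201c", "\u201d"):
        text = '"'.join(text.split(curly))
    return text


def parse_or_none(text: str):
    """Hypothetical helper: parse JSON, returning None if it is invalid."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


print(parse_or_none(raw))  # None: curly quotes are not valid JSON delimiters
print(parse_or_none(straighten_quotes(raw)))
```

One caveat: this replaces every curly double quote, including any that are meant to appear inside a string value. That is usually fine for machine-generated JSON like this, but it is a blunt fix.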
There are many, many examples of different Unicode characters that look the same, partly because writing systems borrow glyphs from each other. The folks in charge of Unicode should develop a Web API where you send a string and get back a canonical version of it. In that version, all decorations (font, style, capitalization (okay, make that one optional in the API), size, baseline height, etc.) are removed, and each character is then replaced with the lowest code point that produces the same appearance. This would be super useful for implementing equality checks on user-entered text. Is there already such a thing?
P.S. It should also optionally handle short character sequences, such as turning '' (character 39 twice) into " (character 34).
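Part of this can be approximated locally today. This is only a sketch, not the service described above. Python's standard `unicodedata.normalize("NFKC", ...)` already folds many "decorations": fullwidth forms, ligatures such as ﬁ, and mathematical bold/italic letters. It also maps U+037E to a plain semicolon, because U+037E canonically decomposes to U+003B. `str.casefold()` handles the optional capitalization step.

NFKC does *not* touch curly quotes, so the sketch adds a small hand-picked lookalike table. It also has an optional sequence table for the '' → " case from the P.S. Both tables, and the names `canonicalize` and `looks_equal`, are hypothetical and for illustration only. They map to chosen targets rather than to "the lowest code point". For fuller coverage, Unicode Technical Standard #39 (Unicode Security Mechanisms) publishes confusables data and a "skeleton" transform aimed at roughly this problem.

```python
import unicodedata

# Hypothetical, hand-picked lookalike table (illustration only).
# NFKC leaves these characters unchanged, so they are mapped explicitly.
LOOKALIKES = {
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
}

# Hypothetical multi-character folds (the P.S. case): '' -> "
SEQUENCES = {"''": '"'}


def canonicalize(text: str, fold_case: bool = True,
                 fold_sequences: bool = False) -> str:
    # NFKC: compatibility forms (fullwidth, ligatures, math alphanumerics)
    # and canonical singletons such as U+037E -> ';' are normalized here.
    text = unicodedata.normalize("NFKC", text)
    # Map the remaining single-character lookalikes from the table.
    text = "".join(LOOKALIKES.get(ch, ch) for ch in text)
    # Optionally collapse multi-character lookalike sequences.
    if fold_sequences:
        for seq, replacement in SEQUENCES.items():
            text = text.replace(seq, replacement)
    # Optional, as suggested above: ignore capitalization.
    if fold_case:
        text = text.casefold()
    return text


def looks_equal(a: str, b: str, **options) -> bool:
    """Compare two strings after canonicalizing both the same way."""
    return canonicalize(a, **options) == canonicalize(b, **options)
```

For example, `looks_equal("x\u037e", "x;")` and `looks_equal("\uff28ello", "hello")` both hold. `looks_equal("''hi''", '"hi"')` only holds when `fold_sequences=True` is passed.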