Random Word Generator (Markov Chain)

How does it generate the words?

Where did "-j" come from? (it's in the listings)

The training data.

Yes but where in the Adventures of Sherlock Holmes is that "-j"?

Maybe from a phrase like "poorly-judged"?


"rooobuwrnsjoavnuza" at 124k

I'm still getting double-"i"s

"ventfiaippcyoz" at 137k

"hewsph" at 142k

I wonder if I can get something with "xaia" repeated, like "musoxaiaxaiaxaiaxaia".

First, it builds a list of possible sentence starters by taking the first letter of every word in the data.
Then it picks one of those starters at random as the first letter.
The training data is similar to yours, except it's in a slightly uglier list format:
[Image: letters that can come after S, not particularly organized.]
It finds the list of letters that can come after the last one it picked, then picks a random letter from that list.
Repeat process. I skipped a few steps, but they're not important and don't affect it much.
Then, when it finds that the next letter is blank, it kills the program. A blank letter means that the sentence can end after that character (e.g. letter 5 of the string "food" would return nothing).
Because of this, you need to have your data split up into sentences, or at least lines, so that there are plenty of opportunities to stop generating.
This leaves the length of a result up to random chance; it's not uncommon to get something like "ac."
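
For anyone who prefers text code, the steps above can be sketched roughly like this in Python. This is only my reading of the description, not the actual project: the names `build_chain`, `generate`, `starters` and `followers` are made up for the example, and I'm assuming the data is already split into lines.

```python
import random
from collections import defaultdict


def build_chain(lines):
    """Build the starter list and the 'what can follow this letter' lists."""
    starters = []                   # first letter of every word in the data
    followers = defaultdict(list)   # letter -> every letter seen right after it
    for line in lines:
        for word in line.split():
            starters.append(word[0])
        for i, ch in enumerate(line):
            # "" plays the role of the blank letter: the line can end here.
            nxt = line[i + 1] if i + 1 < len(line) else ""
            followers[ch].append(nxt)
    return starters, followers


def generate(starters, followers, rng=random):
    """Pick a starter, then keep picking a random follower until a blank."""
    result = rng.choice(starters)
    while True:
        nxt = rng.choice(followers[result[-1]])
        if nxt == "":               # blank letter: stop generating
            return result
        result += nxt


starters, followers = build_chain(["food", "fad", "do"])
print(generate(starters, followers))
```

Duplicates are kept in the lists on purpose, so a letter that follows "s" often in the data is also picked often. The length of the result is still left to chance, just like in the description.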

That happened to me a lot (not specifically with "xaia", though).

@joecooldoo had it with "ap" (Edit: Why did I write "go"?)

Could you give some examples?

"ongabm" at 158k

"thenem" at 407.5k, still mostly nonsense now.

"spodehogbvrp" at ~585k, somewhat sensical, except "spode" isn't a word, and "bvrp" is a little hard to pronounce, unless it's pronounced like "burp", in which case I guess it's pretty easy to say. (I should stop rambling on.)

What's the average variable the average of?

Edit: It's the average number of occurrences of a two-letter combination. BTW, [script pic 45] can be shortened to [script pic 46].

Also, the formula for GET AVERAGE isn't right. It should be (total) + (item), not (average) + (item).
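
Here is a small Python sketch of what I mean by (total) + (item). It is not the project's GET AVERAGE block, just an illustration; `pair_counts` and `average` are hypothetical names, and I'm assuming the items being averaged are the occurrence counts of each two-letter combination.

```python
from collections import Counter


def pair_counts(text):
    """Count how often each two-letter combination occurs in text."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def average(values):
    """Average a list of numbers by keeping a running total."""
    if not values:
        return 0.0
    total = 0
    for item in values:
        total = total + item    # add to the total, not to the average
    return total / len(values)  # divide once, at the end


counts = pair_counts("banana")          # ba: 1, an: 2, na: 2
print(average(list(counts.values())))
```

Adding each item to the average instead of the total mixes an already-divided number into the sum, so the result drifts away from the real average.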

How does that help?

I was going to ask exactly that. (That is, if I hadn't been preoccupied with my Pyraminx topic and project as well as eating a burrito.)


has the AVERAGE variable.

It's the next day... It's still learning.

when replies > views lol

(ps. great project)



1700000+ length in the Markov chain, 937 items in the listings, and it is starting to generate actual words!