I made a Text Completion model in Snap! (BETA!)


BETA ANNOUNCEMENT!

We've finally entered the BETA VERSION of the model after 2 days! But for now, it's still a Work in Progress. Feedback is crucial for further improvement, so expect continuous updates.

... Wait! There's more?

I'm also currently working on a Chat Completion model! (Yes, this early.) And it will be HIGHLY BASED on this project.

Note that I'll be updating this less frequently, cuz I have school tomorrow...

Unlike all other AI bots, this is much easier to train, tweak, and use! And most importantly, it's low-end hardware friendly! (With major slowdowns, of course, if you have a VERY potato PC.)

Press the space key for options.

Logs (Current Build: 384)

Build 129:

  • First Release

Build 130-173:

  • Fixed a few minor issues.

Build 173-192:

  • Added the option to Enable/Disable penalty of the Stop token.
  • Added the option above to the global options.

Build 192-200:

  • Removed the Debug variable.
  • Working on a new out-of-the-box dataset!
  • Mode 7 in construction.

Build 200-238:

  • Completely changed Dataset.
  • Optimized the major code of the project. (Generating text)
  • Made custom blocks for the major code for a much more compact experience.
  • Fixed the Mode 3 code that still referenced the broadcast's "data" variable.

Build 238-347:

  • Completely changed the UI (IN CONSTRUCTION)
  • Changed how Penalty tokens work.
  • Made the Complete text reporter a separate command block.

Build 347-384:

  • Made minor changes to the UI.
  • Added a Terminal screen.
  • Added Terminal screen compatibility with other code.

Direct Instructions Manual

Modes

Select a mode that you want to use (Choose wisely!):

SELECT FROM ONE OF THESE MODES ONLY


  1. Select the First Token in Predictions: This option picks the initial token from the prediction list.

  2. Choose a Random Token from Predictions: The model randomly selects a token from the list of predictions in this mode.

  3. Select a Random Token with a Minimum Value: This option selects a token randomly, ensuring it has a value greater than or equal to the given number.


  4. Continuous Output (Never Stops): With this choice, the model generates content without stopping, ignoring any predefined stop token.

  5. Strict Dataset Filtering: This mode applies stricter filtering on the dataset, leading to more accurate responses. However, there might be instances where no tokens are generated.

  6. Context Preservation Mode: In this mode, the model aims to maintain context by retaining all items from the context dataset, provided each item contains tokens relevant to the generated result.

  7. (EXPERIMENTAL!) Syntactical Harmony Mode: This mode emphasizes coherent syntax in the model's responses. It identifies and selects tokens that often co-occur, while also ensuring grammatical accuracy. For instance, when dealing with parentheses, it checks for a preceding left bracket before generating a right one, ensuring syntactical correctness. This mode aims to produce responses that are both contextually relevant and free from grammatical errors. (OUT NOW FOR PUBLIC TESTING PURPOSES.)

You can enter multiple Modes at the same time by separating the numbers with a comma (,) and NO SPACES!
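Since the project itself is built from Snap! blocks, here's a rough Python sketch of what the first three selection modes describe. The function name, the `(token, count)` prediction format, and `min_value` are illustrative assumptions, not the project's actual internals:

```python
import random

def select_token(predictions, mode, min_value=1):
    """predictions: list of (token, count) pairs, best first.
    Mode numbers follow the manual above; everything else is hypothetical."""
    if mode == 1:
        # Mode 1: always take the first (top) token in the predictions
        return predictions[0][0]
    if mode == 2:
        # Mode 2: pick any token from the predictions at random
        return random.choice(predictions)[0]
    if mode == 3:
        # Mode 3: pick randomly, but only among tokens whose value
        # is greater than or equal to min_value
        eligible = [tok for tok, count in predictions if count >= min_value]
        return random.choice(eligible) if eligible else None
    raise ValueError("unknown mode")
```

Note that Mode 3 can come up empty (returning `None` here), which matches the "no tokens are generated" caveat of the stricter modes.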

Adding text to the Dataset:

Feeding the model text is as simple as eating a bite-sized brownie. All you have to do is open up the Options menu, pick one of the options in the Dataset submenu, follow the instructions, and wait for it to complete!

Adding the model to your project

This isn't available yet, as the model is a Work in Progress.

Recommended Options

Here are some of the options that worked well for me:

  • Case Sensitivity = TRUE
  • Penalty Tokens = 1
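The post doesn't spell out exactly what Penalty Tokens = 1 does internally, so as a rough illustration only: one common interpretation is downweighting (here, filtering out) tokens that were just generated, with the setting controlling how far back the penalty looks. Every name below is hypothetical:

```python
def apply_penalty(predictions, recent_tokens, penalty_window=1):
    """Hypothetical sketch: drop candidate tokens that appeared in the
    last `penalty_window` generated tokens. The real project may differ."""
    penalized = set(recent_tokens[-penalty_window:]) if penalty_window else set()
    filtered = [(tok, count) for tok, count in predictions if tok not in penalized]
    # If the penalty would remove every candidate, fall back to the originals
    return filtered or predictions
```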

You can try it here:
https://snap.berkeley.edu/snap/snap.html#present:Username=unifiedadvanc3r&ProjectName=Text%20completion%20model

You can also share your datasets in the forums, or share them with a friend if you want! If you wanna learn more about it, expand the text below:

Exporting and Importing your Dataset

Exporting a Dataset

It's quite simple! Just go to the project editor, go to Variables, and get the following blocks:

image

Then, all you have to do is connect them to the Dataset variable, report it, then Export it into a file!

image

(NOTE: You could also export it by right-clicking on the name while the Dataset list is shown and clicking Export, just like importing the list. This might save you time.)

Importing a Dataset

This is also quite a simple task! Just make sure that you show the Dataset variable, and right-click on the shown list (specifically on the top of the list or the name). Then click Import.

After that, select the Dataset list, then simply Open it to replace the existing Dataset!

(NOTE: Make sure to save your Dataset if it took you time to add text! You'll waste time when you realize you imported the wrong file and the list is all empty!)

image

sorry to be grammar police, but it should be "comma", coma is when you get knocked out
cool project tho

That made me laugh when I realized that... :rofl:

excuse me, what did the ai just say?
Screenshot 2023-09-22 8.55.32 PM

it has a space between it, not a curse word.

This is super cool, and I think it works quite well!! I'm in the middle of trying to understand the code, so... will get back to y'all when I do, but in the meantime, if you're looking to load text from a URL (like a Project Gutenberg novel) into the dataset, try this:
Text completion model script pic (2)

The script above won't import, for some reason, but you can import the block definitions.

Text completion model script pic (4)
Text completion model script pic (5)
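Outside Snap!, the same idea (fetch a plain-text file from a URL and turn each line into a dataset entry) might be sketched like this; the function names are illustrative, not the block definitions from the screenshots:

```python
from urllib.request import urlopen

def lines_to_dataset(text):
    """Split raw text into stripped, non-empty lines:
    one dataset entry per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def load_dataset_from_url(url):
    """Fetch a plain-text file (e.g. a Project Gutenberg novel)
    and convert it into a list of dataset lines."""
    with urlopen(url) as resp:
        return lines_to_dataset(resp.read().decode("utf-8", errors="replace"))
```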

Also, a bug report: when running the model to generate, the user has to create the global "debug" variable, otherwise an error occurs where the "debug" variable doesn't exist.

Nice project!

Okay, you wanted feedback...

Why isn't init_api a custom block rather than a BROADCAST AND WAIT?

The results are funny. Where did the database come from? For example,

prompt: When in the course
#tokens: 15
modes: 5,6

gives no result, but the very similar

prompt: When in the course of
#tokens: 15
modes: 5,6

(just adding "of" to the prompt) gives me
When in the course of a cup of tea?

When there's no result, the program should say so, instead of just looping.

The reason is that it's easier to debug this way. It would be really hard to debug this thing if it were in a custom block, and it'd make the screen messy and cluttered. I'll be adding a Custom Block version to the project later on.

This one is now fixed; I forgot to save the entire project after removing all traces of the Debug variable from the AI model.

I added the dataset myself from everything on my mind, plus around 250 example sentences from ChatGPT. That's why I invented a way of adding each line of text to the dataset without copying and pasting each line.

UPDATE: I'm currently working on a new mode, "Mode 7", where it can choose tokens that frequently appear with each other, but this time I'll make sure it fixes grammatical errors. For example, before generating a right parenthesis, it'll check whether the current sentence already has a left bracket.
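The bracket check described above can be sketched in a few lines of Python (a minimal illustration, assuming a simple unmatched-parenthesis count rather than whatever the Snap! blocks actually do):

```python
def can_emit_right_bracket(generated_so_far):
    """Allow ')' as the next token only if the text generated so far
    contains an unmatched '(' — a minimal sketch of the Mode 7 check."""
    return generated_so_far.count("(") > generated_so_far.count(")")
```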

Using integers to represent options makes it hard for users to remember. I vote for a UI that uses sprites to represent radio buttons (for incompatible options) and/or checkboxes (for compatible options).

I made the project bare bones, so it'll be straightforward to use. But I will be making a GUI for it in my free time. :slight_smile: