I made a machine code

I've made a machine code, and I want to hear your opinions, especially from people who understand computer science very well.


Codification & Parsing

Code Function
0 NULL
1 END/EOF
2 FALSE
3 TRUE
4 DO/THEN
5 AND
6 OR
7 XOR
8 NOT
9 IF
A ELSEIF/ELSE
B FOR/IN
C WHILE/BREAK
D ARGS
E ASSIGN
F MEMORY/RETURN
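
As a rough illustration of the table above, here is a small Python sketch that looks up each hexadecimal digit in the base table. The names `OPCODES` and `base_mnemonics` are made up for this example, and the sketch ignores context: after ARGS, ASSIGN or MEMORY a digit can mean something else or be part of an 8-bit int.

```python
# Base code table from the post: one hexadecimal digit (4 bits) per code.
OPCODES = [
    "NULL", "END/EOF", "FALSE", "TRUE", "DO/THEN", "AND", "OR", "XOR",
    "NOT", "IF", "ELSEIF/ELSE", "FOR/IN", "WHILE/BREAK", "ARGS", "ASSIGN",
    "MEMORY/RETURN",
]


def base_mnemonics(hex_digits: str) -> list[str]:
    """Look up every hex digit in the base table (context is ignored)."""
    return [OPCODES[int(digit, 16)] for digit in hex_digits]


# The digits C, 3, 4, 1 read as WHILE/BREAK, TRUE, DO/THEN, END/EOF.
print(base_mnemonics("C341"))
```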

With ARGS:

Code Function
4 new BYTE in file
9 IF*
A ELSEIF/ELSE*
B new FILE
C UNPACK

With ASSIGN:

Code Function
None SET
5 AND
6 OR
7 XOR
8 NOT
9 DELETE
A INSERT
B ROTATE
C MOVE

With MEMORY:

Code (BIN) Function
00xx get BITS
01xx get BYTES
10xx ADRESS/PARSE
11xx LEN
xx00 INPUT cell
xx01 TEMP cell (in 8 bits)
xx10 GLOBAL cell
xx11 ADRESS
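
A minimal Python sketch of how the 4-bit MEMORY options could be split into their two pairs, assuming the high two bits select what is retrieved and the low two bits select the context, as in the table above. The function name `decode_memory_options` is hypothetical.

```python
# High two bits: what to retrieve. Low two bits: context of the cell.
KINDS = ["BITS", "BYTES", "ADRESS/PARSE", "LEN"]
CONTEXTS = ["INPUT", "TEMP", "GLOBAL", "ADRESS"]


def decode_memory_options(nibble: int) -> tuple[str, str]:
    """Split a 4-bit options value into (kind, context)."""
    if not 0 <= nibble <= 0xF:
        raise ValueError("options must fit in 4 bits")
    return KINDS[nibble >> 2], CONTEXTS[nibble & 0b11]


print(decode_memory_options(0b0001))  # ('BITS', 'TEMP')
print(decode_memory_options(0b0110))  # ('BYTES', 'GLOBAL')
```

Under this reading, the digits F101 at the start of the hexadecimal example further down would be MEMORY, options 1 (BITS, TEMP) and the 8-bit int 01, which matches "MEMORY TEMP 1".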

Syntax & Logics

  • END goes at the end of the function.
  • EOF always goes at the end of the script. Bits after the end are ignored.
  • If a file fails to compile, NULL is returned and the file is not executed.

Statements support:

  • IF and ELSEIF check that the condition is not FALSE or NULL; otherwise execution continues with the next branch (ELSE takes no condition). The syntax is:
    IF <statement> THEN <expression> (ELSEIF ...) (ELSE <expression>) END
  • FOR ... IN iterates the function inside it over the items of the passed file, and it passes each item's data to the first expression(s) (value and index, respectively). The syntax is:
    FOR <expression(s)> IN <expression> DO <statement> END
  • WHILE repeats the statement until the condition is FALSE or NULL. The syntax is:
    WHILE <expression> DO <statement> END
  • BREAK should only be used inside loops. The syntax is:
    BREAK END
  • ASSIGN goes after a cell, file or file item in order to assign a value. Assignments can be compound as well:
    <expression> (<func>)ASSIGN <expression>
  • In the NOT case, no expression follows ASSIGN:
    <expression> NOT ASSIGN
  • The DELETE and INSERT cases can be used to (1) add/delete bit(s)/byte(s) at the end of the file, or (2) insert/delete bit(s)/byte(s) at a position. Runs of more than one bit/byte work as well.
  • The ROTATE case can be used to circularly shift a file or items of a file.
  • In the MOVE case, the address of the global cell goes immediately after ASSIGN:
    <expression> MOVE ASSIGN <address>
  • RETURN returns a value, which is useful when compiling a file and returning values. It should go just before the end of a function or of the file:
    RETURN <expression> END/EOF
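
To show what a circular shift like ROTATE does, here is a minimal Python sketch on a list of bits. The shift direction and the name `rotate_left` are assumptions for this example; the description above does not fix a direction.

```python
def rotate_left(bits: list[int], count: int) -> list[int]:
    """Circularly shift a list of bits to the left by `count` positions.

    Bits that fall off the front re-enter at the back, so nothing is lost.
    A negative count rotates to the right.
    """
    if not bits:
        return []
    count %= len(bits)
    return bits[count:] + bits[:count]


print(rotate_left([1, 0, 0, 1, 1], 2))  # [0, 1, 1, 1, 0]
```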

Expressions support:

  • NULL returns the null file data type, used when a file doesn't return anything or when it errors.
  • FALSE and TRUE return bools, each of which represents a single binary digit: 0 (false) or 1 (true).
  • AND, OR and XOR are logical operations, which are among the most basic operations performed by a computer, and they are commutative. They prioritize files: with two files, the operation is evaluated on every bit (file1 op file2 = file3), and with two bools it operates on them directly (bool1 op bool2 = bool3). AND checks that both arguments are valid (file and TRUE = file; file and FALSE or NULL = NULL), OR checks that either of them is valid (file or bool = file), and XOR is similar to OR but also checks that they are not both valid (file xor TRUE = NULL; file xor FALSE or NULL = file).
  • NOT negates the expression: if it is FALSE or NULL it will return TRUE, otherwise FALSE.
  • ARGS is used to (1) call a file that has been parsed, optionally with inputs (if the inputs are packed in one file, use UNPACK); (2) get the nth bit/byte(s) of a file in big-endian order (bit by default); or (3) prioritize/separate/create an operation. An END should be put at the end.
  • To create a file, the following syntax should be followed:
    ARGS FILE <8 bits int> (BYTE <8 bits int> BYTE ...) END
  • IF, ELSE and ELSEIF syntax in ARGS is:
    ARGS IF <expression> THEN <expression> (ELSEIF ...) (ELSE <expression>) END
  • UNPACK should be used only when calling a parsed file. If used in another context, it returns NULL. Its syntax is:
    ARGS UNPACK <statement> END
  • MEMORY is used to retrieve data from memory (ROM/RAM) using an option and address. The main syntax is:
    MEMORY <options> <address>
  • The options of MEMORY are divided into 2 pairs of 2 bits. The first pair indicates what should be retrieved: BITS and BYTES mean that the file is broken into chunks of 1 bit (bools) or 1 byte, respectively. ADRESS gets the global address of the cell; PARSE parses the file so it can be compiled (use ARGS to call it; if invalid, it returns NULL); and LEN gets the length of the file in bytes.
  • The second pair sets the context of the cell: INPUT gets an input passed when a file is called (inputs are read-only); TEMP gives quick and safe access to a cell and is followed by an 8-bit int (creating a new one allocates a new cell when it is not linked with another cell); GLOBAL is a global cell in memory, which is subject to change at any time (be sure to verify its availability before making changes); and ADRESS expects ARGS with one or two inputs: context (optional, GLOBAL by default) and address. If no inputs are passed, a new cell is returned; if the inputs are invalid, NULL is returned.
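
The rules for NOT and for mixing a file with a bool or NULL can be sketched in Python like this. It is only a model under some assumptions: NULL is represented by `None`, a file by a non-empty `bytes` value, and the names `is_valid`, `logical_not` and `file_with_flag` are invented for the example. Only the cases spelled out above are covered.

```python
from typing import Optional, Union

# A file, a bool, or NULL (modelled here as None; an assumption).
Value = Union[bytes, bool, None]


def is_valid(value: Value) -> bool:
    """A condition succeeds unless the value is FALSE or NULL."""
    return value is not None and value is not False


def logical_not(value: Value) -> bool:
    """NOT returns TRUE for FALSE or NULL, otherwise FALSE."""
    return not is_valid(value)


def file_with_flag(op: str, file: bytes, flag: Optional[bool]) -> Optional[bytes]:
    """Apply AND, OR or XOR to a file and a bool/NULL flag."""
    if op == "AND":   # file and TRUE = file; file and FALSE or NULL = NULL
        return file if flag is True else None
    if op == "OR":    # file or bool = file
        return file
    if op == "XOR":   # file xor TRUE = NULL; file xor FALSE or NULL = file
        return None if flag is True else file
    raise ValueError(f"unknown operation: {op}")


print(file_with_flag("AND", b"\x2a", True))   # b'*'
print(file_with_flag("XOR", b"\x2a", True))   # None
print(logical_not(None))                      # True
```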

Data types work:

  • NULL is an empty file, meaning that it has no bytes stored in it, although bytes can be stored in it later. Logical operators and conditionals treat it as FALSE.
  • Both bools and FILEs (including NULL) are considered numbers when using file positions, length and items.
  • FILEs are not like arrays in high-level programming languages. They can store information but cannot be broken up automatically other than into bits and bytes, although you could use FOR loops to perform that kind of action. Also, they are copied when assigned to another cell; use MOVE if you want to safely move a file to another cell.
  • When using logical operations on files that do not have the same size, the returned value takes the size of the smallest one if one of their options is BITS; if both are BYTES, the size of the greatest is used.
  • If a file is not referenced directly by any type of cell, then a cell is created to handle it. If a cell later references it, the file's allocation moves to that cell.
  • Trying to get an item of a bool, or an invalid/nonexistent item of a file, returns NULL.
  • When assigning an item of a file, the bits/bytes to its left are filled with zeroes if they do not exist yet.
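
The last two rules can be sketched in Python on a `bytearray`. This assumes that "left" means the positions before the assigned index and that NULL is modelled as `None`; `get_byte` and `set_byte` are made-up helper names.

```python
from typing import Optional


def get_byte(file: bytearray, index: int) -> Optional[int]:
    """Return the byte at `index`, or None (NULL) if it does not exist."""
    if 0 <= index < len(file):
        return file[index]
    return None


def set_byte(file: bytearray, index: int, value: int) -> None:
    """Assign a byte, zero-filling any missing positions before it."""
    if index < 0 or not 0 <= value <= 0xFF:
        raise ValueError("index must be >= 0 and value must fit in 8 bits")
    if index >= len(file):
        # bytes(n) is n zero bytes; grow the file up to and including index.
        file.extend(bytes(index + 1 - len(file)))
    file[index] = value


data = bytearray(b"\x01")
set_byte(data, 3, 0xFF)
print(data.hex())         # 010000ff
print(get_byte(data, 9))  # None
```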

Examples

  1. Adding numbers (8 bit reversed)
  • Readable code:
MEMORY TEMP 1 ASIGN ARGS FILE 0 END
FOR MEMORY TEMP 2 MEMORY TEMP 3 IN MEMORY BITS_INPUT 0 DO
	MEMORY TEMP 0 ARGS MEMORY TEMP 3 END ASSIGN MEMORY TEMP 2 XOR MEMORY INPUT 1 ARGS MEMORY TEMP 3 END XOR MEMORY TEMP 1
	MEMORY TEMP 1 ASSIGN MEMORY TEMP 2 AND MEMORY INPUT 1 ARGS MEMORY TEMP 3 END	
END
RETURN MEMORY TEMP 0
EOF
  • Hexadecimal:

F101EDB001BF102F103BF0004F100DF1031EF1027F001DF10317F101F101EF1025F001DF10311FF1001
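
For comparison, here is a Python sketch of the same idea: adding two numbers bit by bit, least significant bit first, using only XOR, AND and OR. In the listing above the carry seems to be updated from the two input bits only; a general ripple-carry adder also folds in the previous carry, which is what this sketch does. The helper names are invented for the example.

```python
def to_bits(number: int, width: int = 8) -> list[int]:
    """Split a number into `width` bits, least significant bit first."""
    return [(number >> i) & 1 for i in range(width)]


def from_bits(bits: list[int]) -> int:
    """Rebuild a number from bits stored least significant bit first."""
    return sum(bit << i for i, bit in enumerate(bits))


def add_lsb_first(a_bits: list[int], b_bits: list[int]) -> list[int]:
    """Ripple-carry addition; the final carry is dropped (wraps around)."""
    total = []
    carry = 0
    for a, b in zip(a_bits, b_bits):
        total.append(a ^ b ^ carry)
        # Carry out: both inputs set, or one input set plus a carry in.
        carry = (a & b) | (carry & (a ^ b))
    return total


print(from_bits(add_lsb_first(to_bits(100), to_bits(55))))   # 155
print(from_bits(add_lsb_first(to_bits(200), to_bits(100))))  # 44 (300 mod 256)
```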


I challenge you to make a program that can calculate the nth Fibonacci number. I would also like to know if this can be used in a real-life situation, if it meets the requirements for that, and how it could be recreated in Snap!.
Ask me questions if you have any! I have given general information, but not everything this includes. I am also going to hold contests with this (I need a name for it!).

There is an error in your code. ASIGN is not a defined keyword.
Edit: Oh
Edit #2: Never mind, for a moment I thought "Assign" was the misspelling and "Asign" was how it was actually spelled. I somehow talked myself into believing the wrong one. How gullible I am.

The control structures seem a bit too high-level for this kind of stuff. In the various instruction sets I've seen for machine code (not a lot though), they usually implement it using jumps.

What structures are you talking about?

The control structures. You know, IF, FOR, WHILE, BREAK. I don't think instruction sets for machine code have those -- they only have conditional jumps.
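
To make that concrete, here is a rough Python sketch of a WHILE loop lowered to jumps on a toy machine. The instruction names (JZ, JMP, ADD, DEC) and the `run` function are invented for this illustration and do not come from any real instruction set.

```python
def run(program: list[tuple], registers: dict[str, int]) -> dict[str, int]:
    """Execute a toy instruction list until the program counter runs off the end."""
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        if op == "JZ":        # conditional jump: taken if the register is zero
            reg, target = args
            pc = target if registers[reg] == 0 else pc + 1
        elif op == "JMP":     # unconditional jump
            pc = args[0]
        elif op == "ADD":     # dst = dst + src
            dst, src = args
            registers[dst] += registers[src]
            pc += 1
        elif op == "DEC":     # reg = reg - 1
            registers[args[0]] -= 1
            pc += 1
        else:
            raise ValueError(f"unknown instruction: {op}")
    return registers


# Equivalent of: WHILE n DO total = total + n; n = n - 1 END
PROGRAM = [
    ("JZ", "n", 4),          # 0: leave the loop when n is zero
    ("ADD", "total", "n"),   # 1: loop body
    ("DEC", "n"),            # 2
    ("JMP", 0),              # 3: back to the loop test
]

print(run(PROGRAM, {"n": 4, "total": 0}))  # {'n': 0, 'total': 10}
```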

Jumps, for me, are too ugly and hard to understand. You may be right that these structures are high-level, but using expressions and functions (parsed files in this language) is simpler.

Does this mean that those first thing(s) are equivalent to upvars? That they have to be variable names? If so, it's misleading to use the word "expression" to describe them. The C Reference Manual calls them "Lvalue(s)" ("L" for Left, because they go on the left side of the equal sign), which makes me feel not so bad about the name "upvar." But a language-agnostic name would be "symbol" or "variable name."

Well, I used to teach machine structures at Cal... But really it depends on your purpose. If you are trying to model the architecture of a plausible hardware design, there's a lot I could tell you about, for example, why certain once-common architecture features haven't been used in the past 30 years or so, e.g., computational instructions (such as your Boolean functions) with arguments in memory. And, as pumpkinhead said,

And, I don't think there's ever been an actual computer that didn't have arithmetic operations in hardware.

But if what you're after is a thought-experiment model of computation, like Turing machines or lambda calculus, then you have more freedom to do it however you want.

Very cool.

I think you should make the syntax a tad bit easier for people. I mean, it may be machine code but you can still make more syntax commands.