GPASSM2 - a general purpose assembler
Posted: Sat May 20, 2017 8:45 am
GPASSM2
A general purpose assembler
In my attempts at making my own CPUs, the hardest part has always been testing.
This is mostly because writing machine code is a huge pain in the ass.
This led me to do what many people have done before me: create an assembler for each CPU I made.
After a couple of attempts I realized that it would get really annoying to create a new assembler each time I altered the architecture, so I created the original GPASSM. In the end it only worked for one architecture, but I learned a lot.
GPASSM2 (really creative name, right?) is the second version of my general purpose assembler. It introduces the .arch file format, which describes the architecture it should assemble machine code for. It does not support macros, which I might change in the 3rd version.
Features
- platform independent
- numerical conversion from any base [from 2 to 36]
- automatic handling of file formats [little- and big-endianness, byte alignment, word alignment]
- comments [both in .asm and .arch files]
The .arch file describes the target platform, and the .asm file contains the assembly code.
To assemble a file with GPASSM2, run:
```
lua.exe          GPassm2.lua  example.asm    example.arch        debug
lua interpreter  assembler    assembly code  platform describer  debug flag [optional]
```
You can use the Lua interpreter provided, or if you don't trust me (you should be careful about who you trust online), you can review the Lua code and use your own Lua interpreter (mine* is Lua 5.2, but 5.1 should work fine [not tested]).
If you are using Windows, all you need to do is specify the names of your files in "start.bat" and it will run when you double-click it.
If the debug word is present, the assembler generates "debug.txt". It shows each line of the assembly code and, right under it, the code that was generated and the position in memory of that data. The data is in binary and is separated into words.
Content of indep.zip
- example.arch, a file containing an example of the .arch file type, as well as a lot of comments describing it
- example.asm, a file containing an example of assembly code with comments
- example.bin, the example assembled
- GPassm2.lua, the general purpose assembler
- Lua.exe, a Lua interpreter
- lua5.2.3.dll, library used by lua.exe
- start.bat, simple file for starting the assembler
Each instruction in the .arch file is defined as follows:
the first word is always the mnemonic
the second word should be a number and is the length in bits of the instruction, including its parameters
everything after that defines how it behaves
you can add two things: literals and parameters
literals are small pieces of code that are always present
parameters are values that the user inputs
parameters are defined by a name followed by the number of bits in brackets
literals are defined by the number followed by the number of bits in brackets
numbers default to base 10, but any number can be preceded by a base 10 number and the character '|' to specify the base it is interpreted in
the base is limited to between 2 and 36
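As a rough illustration of the number format described above, here is a minimal Python sketch (GPASSM2 itself is written in Lua, and this is not its actual code). The function name `parse_number` is made up for this example; the only rules it relies on are the ones stated above: base 10 by default, an optional `base|` prefix written in base 10, and bases from 2 to 36.

```python
def parse_number(token: str) -> int:
    """Parse 'digits' (base 10) or 'base|digits' into an integer.

    Hypothetical helper for illustration; not taken from GPassm2.lua.
    """
    if "|" in token:
        base_text, digits = token.split("|", 1)
        base = int(base_text)  # the base itself is always written in base 10
    else:
        base, digits = 10, token
    if not 2 <= base <= 36:
        raise ValueError(f"base must be between 2 and 36, got {base}")
    return int(digits, base)


print(parse_number("42"))      # 42
print(parse_number("2|0101"))  # 5
print(parse_number("16|ff"))   # 255
print(parse_number("36|z"))    # 35
```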
Example
```
add 16 2|0101[4] regA[2] regB[2] 0[6] outputReg[2]
```
the name of the instruction is "add" and it is 16 bits long
the first 4 bits are always "0101", as defined by "2|0101[4]"
these are followed by two parameters named regA and regB, which are 2 bits each
after these parameters come 6 bits of padding, "000000"
and lastly there is a parameter named outputReg, which is 2 bits long
so if the user inputs
```
add 0 1 2 ;meaning add register 0 and register 1 and store the result in register 2
```
the resulting code becomes
```
0101 00 01 000000 10 ;spaces added for ease of understanding
```
A good habit is to use comments to describe each instruction, so that the .arch file can also serve as documentation.
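To make the encoding step concrete, here is a small Python sketch that turns the "add" definition and the arguments 0, 1, 2 into the bit string shown above. It is only an illustration under a few assumptions of mine, not the real Lua implementation: the names `encode` and `parse_number` are hypothetical, a field is treated as a literal when it starts with a digit, and parameters are filled in the order they appear in the definition.

```python
import re

FIELD = re.compile(r"^(.+)\[(\d+)\]$")  # matches "value[bits]" or "name[bits]"


def parse_number(token):
    """Parse 'digits' (base 10) or 'base|digits' into an integer."""
    if "|" in token:
        base_text, digits = token.split("|", 1)
        return int(digits, int(base_text))
    return int(token)


def encode(definition, arguments):
    """Encode one instruction as a bit string from one .arch definition line."""
    _mnemonic, length, *fields = definition.split()
    remaining = list(arguments)
    bits = ""
    for field in fields:
        match = FIELD.match(field)
        if match is None:
            raise ValueError(f"malformed field: {field}")
        head, width = match.group(1), int(match.group(2))
        if head[0].isdigit():
            value = parse_number(head)  # literal: always the same bits (assumed rule)
        else:
            value = remaining.pop(0)    # parameter: next user-supplied value (assumed order)
        if not 0 <= value < 2 ** width:
            raise ValueError(f"{value} does not fit in {width} bits")
        bits += format(value, f"0{width}b")
    if len(bits) != int(length):
        raise ValueError("fields do not add up to the declared length")
    return bits


ADD = "add 16 2|0101[4] regA[2] regB[2] 0[6] outputReg[2]"
print(encode(ADD, [0, 1, 2]))  # 0101000100000010
```

The length check at the end is my own addition; it catches definitions whose fields do not sum to the declared instruction length.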
Other functions
words prefaced with the '@' symbol are interpreted as instructions for the assembler.
```
@BYTEORDER little       #endianness: big|little
                        #default is big
                        #if you are using logicCircuit, BYTEORDER must be little
@BYTEALIGNMENT right    #padding of the bitstring to align it to the nearest byte
                        #   none   no padding is added to align the bitstring to the nearest byte
                        #   left   padding is added to the left of the bitstring (most significant bit)
                        #   right  padding is added to the right of the bitstring (least significant bit)
                        #default is none
                        #if you are using logicCircuit, BYTEALIGNMENT must be right
@WORDSIZE 8             #defines the size of each word
                        #a word is the smallest number of bits that can be individually addressed
                        #so if the cpu uses byte-addressable memory, each word equals a byte
                        #if it uses 16 bit words in its memory, each word is 16 bits
                        #default is 8
                        #if you are using logicCircuit, WORDSIZE should be the width of the data bus from the rom/ram
@WORDALIGNMENT left     #alignment of instructions into words
                        #   none   no padding is added to instructions to align them to the words
                        #   left   padding is added to the right of the bitstring to align it to the left (least significant bits are added)
                        #   right  padding is added to the left of the bitstring to align it to the right (most significant bits are added)
                        #default is right
                        #jumpLabels jump to the nearest word, not the nearest byte
                        #it is recommended to use WORDALIGNMENT left or right, as none can cause instructions to partially share words,
                        #making jumping impossible, or hard
@CHARACTERSIZE 8        #number of bits used in encoding strings
                        #useful for aligning strings to words
                        #default is 8
                        #setting this to lower than 8 will result in data loss
@PREDEFINEDVARIABLE screen 16|00
#                   name   value
                        #defines a variable which will be accessible in the .asm file
                        #useful for defining addresses of hardware components
```
To see the full description of the .arch file, see example.arch.
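The effect of @WORDSIZE and @WORDALIGNMENT can be sketched in a few lines of Python. This is my reading of the comments above, not the assembler's actual code: the function names are invented for the example, and I assume the padding consists of zero bits.

```python
def align_to_words(bits, word_size=8, alignment="right"):
    """Pad a bit string so that its length is a multiple of word_size.

    Defaults mirror the documented ones (WORDSIZE 8, WORDALIGNMENT right).
    Zero padding is an assumption of this sketch.
    """
    if alignment == "none":
        return bits
    padding = "0" * (-len(bits) % word_size)
    if alignment == "left":
        return bits + padding  # instruction stays on the left, padding goes on the right
    if alignment == "right":
        return padding + bits  # instruction stays on the right, padding goes on the left
    raise ValueError(f"unknown alignment: {alignment}")


def split_words(bits, word_size=8):
    """Cut an aligned bit string into words of word_size bits."""
    return [bits[i:i + word_size] for i in range(0, len(bits), word_size)]


print(align_to_words("10101", 8, "left"))   # 10101000
print(align_to_words("10101", 8, "right"))  # 00010101
print(split_words(align_to_words("0101000100000010", 8, "left")))
# ['01010001', '00000010']
```

With alignment "none" a 5-bit instruction would be followed directly by the next one, which is why instructions can end up sharing a word.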
Limitations
Each parameter for an instruction is limited to 53 bits of accuracy, except when strings are used, but this can drop as low as 32 bits of accuracy (but not lower). This is due to the programming language used (Lua 5.2 stores numbers as double-precision floats, which represent integers exactly only up to 53 bits) and a quirk in the numerical translator.
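The 53-bit figure is a property of double-precision floats in general, so it can be demonstrated outside Lua. The short Python sketch below (Python floats use the same IEEE 754 double format) shows where integers stop surviving a round trip through a double; the helper name `survives_double` is made up for this example, and the 32-bit case is not covered here because it depends on the assembler's own number translator.

```python
def survives_double(n):
    """Return True if the integer n is unchanged after conversion to a double."""
    return int(float(n)) == n


LIMIT = 2 ** 53

print(survives_double(LIMIT - 1))  # True: every integer below 2**53 is exact
print(survives_double(LIMIT + 1))  # False: rounded to a neighbouring value
print(int(float(LIMIT + 1)))       # 9007199254740992, which is 2**53
```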
The Future
I have started planning version 3 of GPASSM, but that is a ways off.
GPASSM3 should include support for macros, control over the order in which the parameters are typed, different ways of encoding data (think floats, integers, etc.) and probably a few other tweaks.
In the meantime I will be available to fix bugs if they are reported.
Hopefully some of you will find it useful.
*Lua is not mine, but it is open source. The version provided was compiled by me, and I am pretty sure I am not breaking any rules by providing it.