GPASSM2 - a general purpose assembler

Posted: Sat May 20, 2017 8:45 am
by sir Kamba
GPASSM2
A general purpose assembler

As I have made attempts at building my own CPUs, the hardest part has always been testing.
This is mostly due to the fact that writing machine code is a huge pain in the ass.
This led me to do what many people have done before me, namely to create an assembler for each CPU I made.
After a couple of attempts I realized that it would get really annoying to create a new assembler each time I altered the architecture, so I created the original GPASSM. In the end it only worked for one architecture, but a lot of learning was done.

GPASSM2 (really creative name, right?) is the second version of my general purpose assembler. It introduces the .arch file format to describe the architecture it should assemble machine code for. It does not support macros, which I might change in the 3rd version.

Features
  • platform independent
  • numerical conversion from ANY number base [from 2 to 36]
  • automatic handling of file formats [little- and big-endianness, byte alignment, word alignment]
  • comments [both in .asm and .arch files]
Usage
The .arch file describes the target platform, and the .asm code is the assembly code.
To assemble a file with GPASSM2 you do

Code:

lua.exe          GPassm2.lua   example.asm     example.arch        debug
lua interpreter  assembler     assembly code   platform describer  debug flag [optional]
You can use the Lua interpreter provided or, if you don't trust me (you should be careful about who you trust online), you can review the Lua code and use your own Lua interpreter (mine* is Lua 5.2, but 5.1 should work fine [not tested]).
If you are using Windows, all you need to do is specify the names of your files in "start.bat" and it will run when you click on it.

If the debug word is present, the assembler generates "debug.txt". It shows each line of the assembly code and, right under it, the code that was generated and the position of that data in memory. The data is in binary and is separated into words.

Content of indep.zip
  • example.arch, a file containing an example of the .arch filetype, as well as a lot of comments describing it
  • example.asm, a file containing an example of assembly code with comments
  • example.bin, the example assembled
  • GPassm2.lua, the general purpose assembler
  • Lua.exe, a Lua interpreter
  • lua5.2.3.dll, library used by lua.exe
  • start.bat, simple file for starting the assembler
Defining an instruction
the first word is always the mnemonic
the second word should be a number and is the length of the instruction in bits, including its parameters
everything after that defines how it behaves
you can add two things: literals and parameters
literals are small pieces of code that are always present
parameters are values that the user inputs
parameters are defined by a name followed by the number of bits in brackets
literals are defined by a number followed by the number of bits in brackets

numbers default to base 10, but any number can be preceded by a base 10 number and
the character '|' to specify the base it is interpreted in.
the base is limited to between 2 and 36
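The number rule above can be sketched in a few lines of Python. This is not taken from GPassm2.lua; the function name `parse_number` is made up for illustration, and it leans on Python's built-in `int(text, base)`, which happens to accept the same 2 to 36 range. Whether GPASSM2 accepts both upper- and lowercase digits is an assumption here.

```python
def parse_number(token: str) -> int:
    """Parse 'digits' (base 10) or 'base|digits', where base is written in base 10.

    Hypothetical helper for illustration; not the GPASSM2 implementation.
    """
    if "|" in token:
        base_text, digits = token.split("|", 1)
        base = int(base_text, 10)  # the prefix itself is always base 10
        if not 2 <= base <= 36:
            raise ValueError(f"base must be between 2 and 36, got {base}")
    else:
        base, digits = 10, token  # no prefix: default to base 10
    return int(digits, base)


print(parse_number("42"))      # plain base 10
print(parse_number("2|0101"))  # binary literal from the "add" example
print(parse_number("16|ff"))   # hexadecimal
print(parse_number("36|z"))    # largest supported base
```

Splitting on the first '|' only keeps the parsing unambiguous, since '|' is never a valid digit in any base up to 36.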

Example

Code:

add 16  2|0101[4]   regA[2] regB[2] 0[6]    outputReg[2]
the name of the instruction is "add" and it is 16 bits long
the first 4 bits are always "0101" as defined by "2|0101[4]".

following are two parameters named regA and regB which are 2 bits each
after these parameters come 6 bits of "000000" for padding
and lastly it has a parameter named outputReg which is 2 bits long

so if the user inputs

Code:

 add 0 1 2       ;meaning add registers 0 and 1 and store the result in register 2
the resulting code becomes

Code:

 0101 00 01 000000 10    ;spaces added for ease of understanding
A good habit would be to use comments to describe each instruction, so that the .arch file can also be used as documentation.
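To make the walkthrough above concrete, here is a minimal Python sketch that reads an instruction definition and encodes one line of assembly against it. It is only an illustration of the described format, not the logic in GPassm2.lua. The names `parse_definition` and `encode` are hypothetical, and the sketch assumes that literals start with a digit, parameter names do not, and ';' starts a comment in the .asm line.

```python
import re

FIELD = re.compile(r"^(.+)\[(\d+)\]$")  # "<literal or name>[<bits>]"


def parse_number(token):
    # "base|digits" or plain base 10 digits (hypothetical helper)
    base, sep, digits = token.rpartition("|")
    return int(digits, int(base)) if sep else int(token)


def parse_definition(line):
    """Split a definition into its mnemonic and (text, width, is_literal) fields."""
    mnemonic, length, *fields = line.split()
    parsed = []
    for field in fields:
        match = FIELD.match(field)
        if match is None:
            raise ValueError(f"malformed field: {field}")
        head, width = match.group(1), int(match.group(2))
        parsed.append((head, width, head[0].isdigit()))  # assumption: literals start with a digit
    if sum(width for _, width, _ in parsed) != int(length):
        raise ValueError("field widths do not add up to the declared length")
    return mnemonic, parsed


def encode(definition, source_line):
    """Return the bit groups for one assembly line, separated by spaces."""
    mnemonic, fields = parse_definition(definition)
    words = source_line.split(";", 1)[0].split()  # drop the ';' comment
    if words[0] != mnemonic:
        raise ValueError(f"expected {mnemonic}, got {words[0]}")
    args = iter(words[1:])
    groups = []
    for head, width, is_literal in fields:
        value = parse_number(head) if is_literal else parse_number(next(args))
        if value >= 1 << width:
            raise ValueError(f"{value} does not fit in {width} bits")
        groups.append(format(value, f"0{width}b"))
    return " ".join(groups)


ADD = "add 16  2|0101[4]   regA[2] regB[2] 0[6]    outputReg[2]"
print(encode(ADD, "add 0 1 2 ;add register 0 and 1, store in register 2"))
# prints: 0101 00 01 000000 10
```

Checking that the field widths sum to the declared length catches a common mistake in a definition before any code is assembled with it.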

Other functions
words prefaced with the '@' symbol are interpreted as instructions for the assembler.

Code:

@BYTEORDER      little  #endianness  big|little
                        #default is big
                        #if you are using logicCircuit, BYTEORDER must be little

@BYTEALIGNMENT  right   #padding of bitstring to align to nearest byte
                        #   none    no padding is added to align the bitstring to nearest byte
                        #   left    padding is added to the left of the bitstring (most significant bit)
                        #   right   padding is added to the right of the bitstring (least significant bit)
                        #default is none
                        #if you are using logicCircuit, BYTEALIGNMENT must be right

@WORDSIZE       8       #defines the size of each word
                        #a word is the smallest number of bits that can be individually addressed
                        #so if the cpu uses byte-addressable memory, each word equals a byte
                        #if it uses 16 bit words in its memory, each word is 16 bits
                        #default is 8
                        #if you are using logicCircuit, WORDSIZE should be the width of the data bus from the rom/ram

@WORDALIGNMENT  left    #alignment of instructions into words
                        #   none    no padding is added to instructions to align them to the words
                        #   left    padding is added to the right of the bitstring to align it to the left (most significant bits are added)
                        #   right   padding is added to the left of the bitstring to align it to the right (least significant bits are added)
                        #default is right
                        #jumpLabels jump to nearest word, not nearest byte
                        #it is recommended to use WORDALIGNMENT left or right as none can cause instructions to partially share words
                        #making jumping impossible, or hard

@CHARACTERSIZE  8       #number of bits used in encoding strings
                        #useful for aligning strings to words
                        #default is 8
                        #setting this to lower than 8 will result in data loss

@PREDEFINEDVARIABLE     screen      16|00
                        #name       value
                        #defines a variable which will be accessible in the .asm file
                        #useful for defining addresses of hardware components
To see the full description of the .arch file, see example.arch.
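As a rough illustration of what the WORDALIGNMENT options describe, the Python sketch below pads an instruction's bitstring up to a whole number of words. It follows the comments above for which side the padding goes on; that the padding bits are zeros is an assumption, and `align_to_words` is a made-up name, not a function in GPassm2.lua.

```python
def align_to_words(bits: str, word_size: int = 8, mode: str = "right") -> str:
    """Pad a bitstring to a multiple of word_size (defaults mirror the .arch defaults)."""
    if mode not in ("none", "left", "right"):
        raise ValueError(f"unknown alignment mode: {mode}")
    remainder = len(bits) % word_size
    if mode == "none" or remainder == 0:
        return bits  # nothing to add
    padding = "0" * (word_size - remainder)  # assumption: padding is zero bits
    if mode == "left":
        return bits + padding  # padding on the right, instruction sits left
    return padding + bits      # padding on the left, instruction sits right


instruction = "010100010000"  # a 12 bit instruction, 8 bit words
print(align_to_words(instruction, 8, "none"))   # 010100010000
print(align_to_words(instruction, 8, "left"))   # 0101000100000000
print(align_to_words(instruction, 8, "right"))  # 0000010100010000
```

With "none", the next instruction would start in the middle of the second word, which is the situation the comments warn about for jump labels.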

Limitations
Each parameter of an instruction is limited to 53 bits of precision, except when strings are used. In some cases this can drop as low as 32 bits of precision (but not lower). This is due to the programming language that is used and a quirk in the numerical translator.
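A 53 bit limit is what you would expect if numbers are stored as double-precision floats, which is the default number type in Lua 5.2. Assuming that is the cause (the post only says it is due to the language), the effect can be reproduced in Python with `float`, which is also a double. The helper name below is made up for illustration, and the 32 bit lower bound from the numerical translator is not modeled.

```python
def survives_double_round_trip(value: int) -> bool:
    """True if the integer is unchanged after being stored as a double."""
    return int(float(value)) == value


limit = 2 ** 53  # a double has a 53 bit significand

print(survives_double_round_trip(limit - 1))  # True: all 53 bit values are exact
print(survives_double_round_trip(limit + 1))  # False: rounded to a neighbour
print(float(limit) == float(limit + 1))       # True: the two values collide
```

Larger values can still survive the round trip when their low bits are zero, so the limit is on significant bits rather than on magnitude.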

The Future
I have started planning version 3 of GPASSM, but that is a ways off.
GPASSM3 should include support for macros, control over the order in which the parameters are typed, different ways of encoding data (think floats, integers, etc.), and probably a few other tweaks.
In the meantime I will be available to fix bugs if they are reported.


Hopefully some of you will find it useful. :D

*Lua is not mine, but it is open source. The version provided was compiled by me, and I am pretty sure I am not breaking any rules by providing it.

Re: GPASSM2 - a general purpose assembler

Posted: Sun Apr 01, 2018 6:19 am
by ShawnWil
This is very interesting, Kamba. Did you do any work on the version 3 of the GPASSM yet? I'm really curious.

Re: GPASSM2 - a general purpose assembler

Posted: Sun Apr 01, 2018 8:35 pm
by sir Kamba
I have started work on GPASSM3, but it is still far from done. The goal of GPASSM2 was to have a simple assembler which is easy to configure and works on all architectures. This goal was accomplished, and as such working on GPASSM3 has not been a priority.
Anyway, here is a taste of the syntax:

Code:

<encodeType> is optional
	types: possibility to add more is required
	<bin> raw binary number (uint)
	<raw> raw binary number (uint)
	<int> signed integer
	<uint> unsigned integer
	<float> floating point number

instr instructionName(<encodeType>parameter1[bits],<encodeType>parameter2[bits]....){
	<encodeType>2[3], #the value 2 encoded to the first 3 bits
	parameter2[0:2], #first 3 bits of parameter2
	<encodeType>0[1], #a zero bit
	parameter2[3:5], #last 3 bits of parameter2

}

macro name(<encodeType>parameter1[bits],.....){
	codeLine1				#macro consists of lines of code
	codeLine2				#and jumpLabels
							#nothing else
	add R1,parameter1[0:3]
	testJump:				#only available inside macro
		decrSZ R1
		jump testJump
}

.arch File contains:
	- assembleParameters: wordsize, BYTEALIGNMENT, etc...
	- constantDefinitions: @DEFINE 
	- instructions
	- macros
.asm File contains:
	- code/jumpLabels
	- constantDefinitions: @DEFINE
	- macros

#include is implemented

files:
	readers.lua
		contains a table of reader functions
		to read the various elements
	encoders.lua
		contains a table of encoding functions
		for functions to encode values to bitstrings
	bitstring.lua
		library for creating and manipulating bitstrings
It is subject to change.
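The `parameter2[0:2]` and `parameter2[3:5]` slices in the draft syntax can be pictured with a short Python sketch. Since the syntax is still subject to change, everything here is a guess at the intended meaning: the range is treated as inclusive, bit 0 is taken to be the first (leftmost, most significant) bit, and `bit_slice` is a made-up name.

```python
def bit_slice(value: int, width: int, first: int, last: int) -> str:
    """Return bits first..last (inclusive) of value written as a width-bit string.

    Assumption: bit 0 is the leftmost (most significant) bit.
    """
    if not 0 <= value < 1 << width:
        raise ValueError(f"{value} does not fit in {width} bits")
    if not 0 <= first <= last < width:
        raise ValueError(f"slice [{first}:{last}] is outside a {width} bit value")
    bits = format(value, f"0{width}b")
    return bits[first:last + 1]


parameter2 = 0b101100  # a 6 bit parameter value

print(bit_slice(parameter2, 6, 0, 2))  # first 3 bits: 101
print(bit_slice(parameter2, 6, 3, 5))  # last 3 bits:  100
```

Splitting a parameter this way lets an instruction place the two halves in non-adjacent positions, as the `instr` example does with a zero bit in between.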