A Forth-like CPU

CPU architecture

The CPU uses a RAM with a 16-bit address bus and a 16-bit data bus; both can be widened, e.g. to 32 bits. Forth programs consist mostly of calls, so the highest bit of every opcode is reserved as a call flag: if it is set to 1, the remaining bits are the address of a word, which is then called. With this scheme only the lower half of the address space can be called directly, but a word at a higher address can still be reached by loading its address as a literal and calling that.
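The call-flag encoding can be sketched in a few lines of Python. This is only an illustration for the 16-bit case: the helper names (encode_call, is_call, call_target) are made up here, and the sketch assumes the call flag is bit 15 and the remaining 15 bits hold the word address unchanged.

```python
CALL_BIT = 0x8000      # highest bit of a 16-bit cell (assumed position of the call flag)
ADDRESS_MASK = 0x7FFF  # the remaining 15 bits carry the address of the called word


def encode_call(address):
    """Build a call opcode for a word in the lower half of the address space."""
    if not 0 <= address <= ADDRESS_MASK:
        # Higher addresses need the "load literal, then call" route instead.
        raise ValueError("address %#x is not directly callable" % address)
    return CALL_BIT | address


def is_call(opcode):
    """True if the highest bit marks this cell as a call."""
    return bool(opcode & CALL_BIT)


def call_target(opcode):
    """Address of the word called by a call opcode."""
    return opcode & ADDRESS_MASK
```

For example, encode_call(0x100) gives 0x8100, while encode_call(0x8000) is rejected because that address lies in the upper half of the address space.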

There are 8 internal registers:

With the rising edge of the clock, the values of the registers are read. Register changes are available with the next rising edge of the clock.

When the address register is changed, the q register is updated with the next rising edge of the clock (the clock is divided by 2 to avoid memory read latency problems). Setting the wren register to 1 stores the data register at the location given by the address register with the next rising edge of the clock. The wren register is then reset to 0 automatically.
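The read and write timing described above can be modeled roughly as follows. This is a behavioral sketch, not the VHDL: the class name MemoryPort is made up, the clock division is not modeled, and it is an assumption that a read in the same cycle as a write returns the newly written value.

```python
class MemoryPort:
    """Rough model of the RAM interface registers: address, data, wren and q."""

    def __init__(self, size=1 << 16):
        self.ram = [0] * size
        self.address = 0
        self.data = 0
        self.wren = 0
        self.q = 0

    def rising_edge(self):
        """Apply the effects that become visible with the next rising clock edge."""
        if self.wren:
            # Store the data register at the addressed location ...
            self.ram[self.address] = self.data & 0xFFFF
            # ... and clear wren again without any action by the program.
            self.wren = 0
        # q follows the address register one edge later (write-first is assumed).
        self.q = self.ram[self.address]


port = MemoryPort()
port.address, port.data, port.wren = 5, 0x1234, 1
port.rising_edge()   # the write happens here and wren drops back to 0
port.address = 5
port.rising_edge()   # q now holds the stored value
```

The point of the model is the one-edge delay: changing address has no visible effect on q until rising_edge() has been called.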

For 16 bit systems, this is the description of the bits of an opcode:

The bits of an opcode:

At reset the pc is set to 256, sp to 63 and rp to 255. At every rising clock edge the next cell at the pc register is read and the pc register is incremented. Two steps then follow at subsequent rising clock edges, each followed by one delay cycle. At step 1, microcode 1 is executed, unless this is escaped by the call bit, and at step 2, microcode 2 is executed. Before each step, wren is reset to 0 and data is pre-assigned with q.
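The reset values and the fetch phase can be written down as a small Python model. It is a sketch under assumptions: the names CoreState and fetch are invented for illustration, pc is assumed to wrap at 16 bits, and the two microcode steps are deliberately left out because their exact behavior is defined by the opcode bits.

```python
from dataclasses import dataclass

CALL_BIT = 0x8000  # assumed position of the call flag in a 16-bit cell


@dataclass
class CoreState:
    """Registers touched by reset; the defaults are the reset values from the text."""
    pc: int = 256
    sp: int = 63
    rp: int = 255


def fetch(state, ram):
    """Read the cell at pc, increment pc, and report whether the cell is a call."""
    opcode = ram[state.pc]
    state.pc = (state.pc + 1) & 0xFFFF  # 16-bit wrap-around is an assumption
    return opcode, bool(opcode & CALL_BIT)


ram = [0] * (1 << 16)
ram[256] = 0x8200              # hypothetical first cell: a call to address 0x200
state = CoreState()
opcode, escaped_by_call = fetch(state, ram)
```

After this fetch, state.pc is 257 and escaped_by_call is true, which is the case in which microcode 1 is not executed at step 1.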

The following table shows which bits are tested at which step, which implicit actions are executed, and some examples of how to set the bits to implement some Forth words.


step:             1            1     1          2            2     2
bit description:  microcode 1  wren  push-bits  microcode 2  wren  push-bits
implicit action:  data=q, wren=1                data=q, wren=1
                  pop          0     no-push    a2address    1     pushr
                  popr         0     no-push    a2address    1     push
                  pop          0     no-push    q2a          0     no-push
                  a2data       1     push       a2address    0     nop
                  pop          0     no-push    a2address    1     no-push
                  a2address    0     no-push    a2address    1     push
call a            pc2data      1     pushr      a2pc         0     no-push
                  pc2address   0     no-push    a2address    1     push
literal a         pc2address   0     no-push    q2a          0     no-push
                  pop          0     no-push    and          1     push
                  pop          0     no-push    or           1     push
                  pop          0     no-push    xor          1     push
                  pop          0     no-push    plus         1     push
branch a          pop          0     no-push    branch       0     no-push
jump literal      pc2address   0     no-push    q2pc         0     no-push
jump a            a2pc         0     no-push    a2address    0     no-push
                  popr         0     no-push    q2pc         0     no-push

VHDL implementation of the CPU core

This is a VHDL implementation of the CPU: forth_core.vhd. With Quartus 7.1, compiled for a Cyclone I FPGA, it needs 423 LEs (out of 5,980 available LEs) in a small demo project, which maps one address of the RAM space to some output pins and another address to some input pins.

With Xilinx ISE 9.1.03i the same demo project needs 491 LUTs on a Spartan 3E (out of 9,312 available LUTs). A full Quartus 7.1 sample project for the TREX board and a Xilinx project for the Spartan 3E starter kit are available in ForthCPU-0.2.zip. The Forth assembler and compiler are included in the sw directory.

Forth Assembler

With a small Forth program you can define the mnemonics as constants, plus a word "vm-code" that assembles everything up to "end-code". All words are interpreted as constants, with the following exceptions: a word preceded by a colon (:) is defined as a new constant whose value is the current address. The special word ".org" changes the address at which code is generated. Everything after a backslash up to the end of the line is ignored as a comment.
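The assembler rules above are simple enough to sketch as a two-pass assembler in Python. This is not the Forth assembler from the sw directory. The opcode values in the mnemonic table are placeholders, numbers are assumed to be hexadecimal, "call" is assumed to set the highest bit on the address that follows it, and the second pass is just one possible way to resolve forward references such as a call to a label defined later.

```python
CALL_BIT = 0x8000  # assumed encoding of "call": highest bit plus target address


def tokenize(source):
    """Split the source into words, dropping backslash comments."""
    words = []
    for line in source.splitlines():
        for word in line.split():
            if word == "\\":
                break  # ignore everything up to the end of the line
            words.append(word)
    return words


def assemble(source, mnemonics):
    """Return a dict mapping address to 16-bit cell value."""
    words = tokenize(source)
    labels = {}

    def value(word, final):
        if word in labels:
            return labels[word]
        if word in mnemonics:
            return mnemonics[word]
        try:
            return int(word, 16)  # plain numbers are read as hex (assumption)
        except ValueError:
            if final:
                raise ValueError("unknown word: " + word)
            return 0  # forward reference, resolved in the second pass

    image = {}
    for final in (False, True):  # pass 1 collects labels, pass 2 emits cells
        address, i = 0, 0
        while i < len(words):
            word = words[i]
            if word == ".org":
                address = int(words[i + 1], 16)
                i += 2
            elif word == ":":
                labels[words[i + 1]] = address  # new constant: current address
                i += 2
            else:
                if word == "call":
                    cell = CALL_BIT | value(words[i + 1], final)
                    i += 2
                else:
                    cell = value(word, final)
                    i += 1
                if final:
                    image[address] = cell & 0xFFFF
                address += 1
    return image


# Placeholder opcode values, for illustration only.
demo_mnemonics = {"cmd-jump-literal": 0x0001, "cmd-return": 0x0002}
demo_source = r"""
.org 100
: start call sub                \ forward reference to sub
        cmd-jump-literal start
: sub   cmd-return
"""
demo_image = assemble(demo_source, demo_mnemonics)
```

Read as hex, ".org 100" places the code at address 256, which matches the reset value of pc.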

An example program, which blinks an LED, displays the button states on other LEDs and stops if button 0 is pressed:


.org    100

\ start of main program
: start cmd-literal 1 call led  \ LED on
        cmd-literal 0 call led  \ LED off

        \ test keyboard
        cmd-literal-a keyboard-port cmd-a@
        cmd-literal-a 1 cmd-and cmd-literal-a end cmd-branch-a
        cmd-jump-literal start

        \ stop CPU, if key at bit 0 was pressed
: end   cmd-jump-literal end

\ switch LED on/off
: led   \ xor with keyboard buffer
        cmd-literal-a keyboard-port cmd-a@ cmd->a cmd-xor

        \ store at led-port
        cmd-literal-a led-port cmd-a!

        \ wait half a second
        call 05s

\ push TOS to data stack
: vm-dup
        cmd->a cmd-a> cmd-a>

\ delay half a second (500=0x1f4 milliseconds)
: 05s   cmd-literal 1f4
: l1    call 1ms
        call minus
        call vm-dup cmd-literal-a l1 cmd-branch-a
        cmd->a cmd-return

\ delay about one millisecond
: 1ms   cmd-literal 2f0
: l2    call decr
        call vm-dup cmd-literal-a l2 cmd-branch-a
        cmd->a cmd-return

\ subtract 1 from TOS
: decr  cmd-literal-a ffff cmd-plus
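The decr word subtracts 1 by adding ffff. A short Python check shows why this works on 16-bit cells, assuming cmd-plus is a plain addition that wraps around at 16 bits (the function names are illustrative only):

```python
MASK = 0xFFFF  # width of one cell on the 16-bit system


def plus16(a, b):
    """16-bit addition with wrap-around (assumed behavior of cmd-plus)."""
    return (a + b) & MASK


def decr(x):
    """Subtract 1 from x by adding 0xFFFF, as the decr word does."""
    return plus16(x, 0xFFFF)


# Adding 0xFFFF equals subtracting 1 modulo 2**16 for every cell value.
assert all(decr(x) == (x - 1) & MASK for x in range(MASK + 1))
```

This also shows the loop counts used above: 1f4 is 500 and 2f0 is 752 in decimal.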


Forth cross compiler

With a Forth cross compiler, the above program could be written like this:

: 1ms 2f0 0 do loop ;

: 05s 1f4 0 do 1ms loop ;

: led
	keyboard-port @ xor
	led-port ! ;

: stop
	begin again ;

: main
	1 led
	0 led
	keyboard-port @
	1 and if stop then ;

Implementation: TODO

14. July 2007, Frank Buß