Studying VICE 3.1 (svn r34431) Commodore 8-bit emulator

Introduction

A long time ago, I thought the Commodore 64 was a pretty simple 8-bit machine to understand, especially compared to modern PCs that come bundled with huge operating systems, libraries and software. To some extent that conception was true, but when you dig deep into the internals of the Commodore 64, you will come across complexity. Emulating the C64 is far from easy. VICE, the VersatIle Commodore Emulator, aims to be as precise as possible in its emulation. Nowadays VICE even ships with x64sc, which is more accurate than the standard x64 emulator. As a tradeoff it is more resource intensive, so people with low-end machines will continue to use x64.

In this article I will try to gain a very basic understanding of how VICE executes a simple machine language program written in assembly. The program is as follows:

LDA #$01
STA $0400
RTS

It is equivalent to the BASIC program:

POKE1024,1

It will display the character "A" in the top left corner of the screen. Nothing more, nothing less. So this is probably about as simple as a program gets while still doing something that can be visually observed.

Let's go through the program now.

LDA #$01

6510core.c contains a big loop where 6502/6510 CPU opcodes are fetched and executed. Our LDA instruction uses the immediate addressing mode, so its opcode is a9 in hexadecimal:

    case 0xa9:          /* LDA #$nn */
                LDA(p1, 0, 2);
                break;
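
To make the shape of such a loop concrete, here is a minimal sketch of a fetch-and-dispatch loop that handles only the three opcodes of our program. It is not VICE code: the names mem, run_until_rts() and run_demo() are made up for illustration, memory is a flat array, the load address $C000 is an arbitrary choice, and cycle counting, flags and the stack are left out entirely.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical flat 64 KB memory; VICE routes accesses through handlers. */
static uint8_t mem[0x10000];

/* Execute from pc until the first RTS (or an unknown opcode).
   Returns the number of instructions executed, including the last one. */
unsigned int run_until_rts(uint16_t pc)
{
    uint8_t a = 0;              /* accumulator */
    unsigned int count = 0;

    for (;;) {
        uint8_t opcode = mem[pc];

        count++;
        switch (opcode) {
        case 0xa9:              /* LDA #$nn */
            a = mem[(uint16_t)(pc + 1)];
            pc += 2;
            break;
        case 0x8d: {            /* STA $nnnn, operand is little-endian */
            uint16_t addr = (uint16_t)(mem[(uint16_t)(pc + 1)]
                                       | (mem[(uint16_t)(pc + 2)] << 8));
            mem[addr] = a;
            pc += 3;
            break;
        }
        case 0x60:              /* RTS: this sketch simply stops here */
        default:
            return count;
        }
    }
}

/* Load the article's three-instruction program at $C000 and run it. */
unsigned int run_demo(void)
{
    static const uint8_t prog[] = { 0xa9, 0x01, 0x8d, 0x00, 0x04, 0x60 };

    memcpy(&mem[0xc000], prog, sizeof prog);
    return run_until_rts(0xc000);
}
```

After run_demo() returns, mem[0x0400] holds 1, which is the whole visible effect of the program. Everything VICE adds on top of this skeleton (clock accounting, flag handling, memory handlers) is what the rest of this article looks at.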

To optimize speed, LDA is defined as a function-like C preprocessor macro. So the code in macros like these is inserted directly into the place where they are called, in effect creating inlined functions. Using a modern C compiler, you could indeed probably replace all these macros with proper inline functions, but the VICE developers are aware of the fact that VICE needs to be built on old systems too, so they prefer to keep the old-fashioned macros. It is also good practice to follow the wisdom: Don't fix it if it ain't broken. The C preprocessor macros are pretty ugly, but they do the job.
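
As a small illustration of the macro-versus-inline-function tradeoff, the sketch below writes the same operation both ways. The names LOAD_ACC, load_acc, with_macro and with_inline are hypothetical and not from VICE. One practical difference shows up immediately: VICE keeps its registers as local variables of the main loop function, and a macro can name the caller's locals directly, whereas a function has to receive them as pointers.

```c
#include <stdint.h>

/* Macro version: expands in place, so it can name the caller's locals.
   The do { ... } while (0) wrapper makes the body a single statement. */
#define LOAD_ACC(value, pc_inc)              \
    do {                                     \
        uint8_t tmp = (uint8_t)(value);      \
        acc = tmp;                           \
        pc += (pc_inc);                      \
    } while (0)

/* Inline-function version: the registers must be passed in explicitly. */
static inline void load_acc(uint8_t *acc, unsigned int *pc,
                            uint8_t value, unsigned int pc_inc)
{
    *acc = value;
    *pc += pc_inc;
}

/* Both helpers pack the resulting pc and accumulator into one number. */
unsigned int with_macro(void)
{
    uint8_t acc = 0;
    unsigned int pc = 0x1000;

    LOAD_ACC(0x01, 2);
    return (pc << 8) | acc;
}

unsigned int with_inline(void)
{
    uint8_t acc = 0;
    unsigned int pc = 0x1000;

    load_acc(&acc, &pc, 0x01, 2);
    return (pc << 8) | acc;
}
```

Both functions compute the same result. Whether a modern compiler would generate equally fast code for the pointer-passing version inside VICE's real main loop is something I have not measured.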

Here is the LDA() macro:

#define LDA(value, clk_inc, pc_inc) \
    do {                            \
        uint8_t tmp = (uint8_t)(value);   \
        reg_a_write(tmp);           \
        CLK_ADD(CLK, (clk_inc));    \
        LOCAL_SET_NZ(tmp);          \
        INC_PC(pc_inc);             \
    } while (0)

reg_a_write() is also a macro:

#define reg_a_write(c) reg_a = (c)

reg_a stands for 6502/6510 Register A (the Accumulator), and it is defined in mainc64cpu.c along with other CPU registers:

void maincpu_mainloop(void)
{
    /* Notice that using a struct for these would make it a lot slower (at
       least, on gcc 2.7.2.x).  */
    uint8_t reg_a = 0;
    uint8_t reg_x = 0;
    uint8_t reg_y = 0;
    uint8_t reg_p = 0;
    uint8_t reg_sp = 0;
    uint8_t flag_n = 0;
    uint8_t flag_z = 0;
#ifndef NEED_REG_PC
    /* FIXME: this should really be uint16_t, but it breaks things (eg trap17.prg) */
    unsigned int reg_pc;
#endif

The comment refers to GCC 2.7.2.x, which is ancient. I am pretty sure that nowadays a C struct would work just fine, but since the code base is old and it works, why bother changing it? So the VICE developers keep it as it is, with the 6510 registers as separate variables.

So we see that the first thing the LDA immediate mode instruction does is assign our immediate operand value 1 to Register A.

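CLK_ADD() is a function-like C preprocessor macro to increment the CPU clock. In this case, the amount is zero. That does not mean the instruction is free: on a real 6510, LDA immediate takes two cycles, so presumably those cycles are accounted for elsewhere, for example when the opcode and operand bytes are fetched, and LDA() itself adds nothing on top.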

#ifndef CLK_ADD
#define CLK_ADD(clock, amount) clock += (amount)
#endif

CLK is a macro that resolves to maincpu_clk, which stands for "main CPU clock", obviously:

#define CLK maincpu_clk

c64cpusc.c defines and initializes maincpu_clk:

/* Global clock counter.  */
CLOCK maincpu_clk = 0L;

types.h contains a typedef to alias uint32_t as CLOCK. So maincpu_clk is an unsigned 32-bit integer. Unsigned arithmetic in C does not overflow in the undefined-behaviour sense: if the clock is incremented past 4294967295, it wraps around to zero again.

typedef uint32_t CLOCK;
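
The wraparound can be checked with a few lines of C. This is only a sketch: clk_add() repeats the arithmetic of CLK_ADD as a function, and clk_elapsed() is a hypothetical helper of my own, not something I found in VICE. It shows why unsigned clocks are convenient: the difference between two readings is still correct when the counter has wrapped in between.

```c
#include <stdint.h>

typedef uint32_t CLOCK;

/* Same arithmetic as CLK_ADD, written as a function so it can be tested. */
CLOCK clk_add(CLOCK clock, CLOCK amount)
{
    return clock + amount;      /* wraps modulo 2^32 */
}

/* Hypothetical helper: cycles elapsed between two clock readings.
   Unsigned subtraction gives the right answer even across a wrap. */
CLOCK clk_elapsed(CLOCK start, CLOCK now)
{
    return now - start;
}
```

For example, clk_add(0xffffffff, 1) yields 0, and clk_elapsed(0xfffffff0, 0x10) yields 0x20. At the C64's clock rate of roughly 1 MHz, 2^32 cycles correspond to a little over an hour of emulated time, so a wrap is not a purely theoretical event.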

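LOCAL_SET_NZ() is a function-like macro too. It stores the value given to LDA, in this case 1, in both flag_z and flag_n. At first I was not quite sure what the point of that is, but it looks like lazy flag evaluation: instead of computing the Z and N status bits right away, the emulator remembers the result and can later derive Z from whether flag_z is zero and N from bit 7 of flag_n. The macro itself is very simple: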

#define LOCAL_SET_NZ(val) (flag_z = flag_n = (val))
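
One plausible reading of LOCAL_SET_NZ() is that the real flag bits are derived from the stored value only when something needs them. The sketch below assumes exactly that. The helpers zero_flag(), negative_flag() and nz_after_load() are hypothetical names of mine; only the 6502 rules themselves (Z is set when the result is zero, N mirrors bit 7 of the result) are certain.

```c
#include <stdint.h>

static uint8_t flag_n;
static uint8_t flag_z;

/* Copied from the article: remember the result, decide the flags later. */
#define LOCAL_SET_NZ(val) (flag_z = flag_n = (val))

/* Assumed derivation: Z is set when the remembered value is zero. */
static int zero_flag(void)
{
    return flag_z == 0;
}

/* Assumed derivation: N is bit 7 of the remembered value. */
static int negative_flag(void)
{
    return (flag_n & 0x80) != 0;
}

/* Returns N in bit 1 and Z in bit 0 after "loading" the given value. */
int nz_after_load(uint8_t value)
{
    LOCAL_SET_NZ(value);
    return (negative_flag() << 1) | zero_flag();
}
```

With our operand 1, both flags come out clear. Loading 0 would set Z, and loading $80 would set N. The attraction of this scheme is that most instructions only pay for one or two assignments, and the comparison work happens only when a branch or a status register read actually asks for a flag.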

As its last task, LDA will increase the Program Counter register by 2. This is 2 because the LDA command itself is 1 byte long and the immediate operand is 1 byte as well, totalling 2.

#define INC_PC(value)   (reg_pc += (value))

You will remember reg_pc was defined earlier along with reg_a and the others. So that was the first line of our assembly program handled by VICE. Let's next examine a more complicated instruction, STA in absolute mode.

STA $0400

The big loop executing 6502/6510 opcodes sees STA absolute mode as 8d in hexadecimal:

        case 0x8d:          /* STA $nnnn */
                STA(p2, 0, 1, 3, STORE_ABS);
                break;

STA() is defined as a function-like macro just like other opcode handling:

#define STA(addr, clk_inc1, clk_inc2, pc_inc, store_func) \
    do {                                                  \
        unsigned int tmp;                                 \
                                                          \
        CLK_ADD(CLK, (clk_inc1));                         \
        tmp = (addr);                                     \
        INC_PC(pc_inc);                                   \
        store_func(tmp, reg_a_read, clk_inc2);            \
    } while (0)

We have already seen how CLK_ADD and INC_PC work, so our interest is in the store_func() call. STA() called it with STORE_ABS, so let's study STORE_ABS() next:

#define STORE_ABS(addr, value, inc) \
    do {                            \
        CLK_ADD(CLK, (inc));        \
        STORE((addr), (value));     \
    } while (0)

We skip CLK_ADD() because we have already seen how it works. So what does STORE() do? mainc64cpu.c defines it like this:

#ifndef STORE
#define STORE(addr, value) \
    memmap_mem_store(addr, value)
#endif

We see that this is again a function-like macro, and it calls the C function memmap_mem_store():

static void memmap_mem_store(unsigned int addr, unsigned int value)
{
    memmap_mem_update(addr, 1); 
    (*_mem_write_tab_ptr[(addr) >> 8])((uint16_t)(addr), (uint8_t)(value));
}

It first calls memmap_mem_update() with parameters addr for address and 1 for boolean TRUE meaning "we are doing a write with this function call". The function is defined as follows:

inline static void memmap_mem_update(unsigned int addr, int write)
{
    unsigned int type = MEMMAP_RAM_R;

    if (write) {
        if ((addr >= 0xd000) && (addr <= 0xdfff)) {
            type = MEMMAP_I_O_W;
        } else {
            type = MEMMAP_RAM_W;
        }
    } else {
        switch (addr >> 12) {
            case 0xa:
            case 0xb:
            case 0xe:
            case 0xf:
                if (pport.data_read & (1 << ((addr >> 14) & 1))) {
                    type = MEMMAP_ROM_R;
                } else {
                    type = MEMMAP_RAM_R;
                }
                break;
            case 0xd:
                type = MEMMAP_I_O_R;
                break;
            default:
                type = MEMMAP_RAM_R;
                break;
        }
        if (memmap_state & MEMMAP_STATE_OPCODE) {
            /* HACK: transform R to X */
            type >>= 2;
            memmap_state &= ~(MEMMAP_STATE_OPCODE);
        } else if (memmap_state & MEMMAP_STATE_INSTR) {
            /* ignore operand reads */
            type = 0;
        }
    }
    monitor_memmap_store(addr, type);
}

Since write is 1 ("TRUE"), we take the first branch of the if statement. Since addr is 0x0400, which is outside the range 0xd000-0xdfff, type will be MEMMAP_RAM_W, indicating that this is not an I/O write to the VIC-II, SID, etc.
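
Our STA never reaches the read branch, but the expression pport.data_read & (1 << ((addr >> 14) & 1)) in it is worth a quick worked example. It only runs for addresses in $A000-$BFFF and $E000-$FFFF, and it picks bit 0 of the processor port for the former range and bit 1 for the latter. On the 6510 those two port bits are known as LORAM and HIRAM. The function names below are hypothetical; the arithmetic is the same as in memmap_mem_update().

```c
#include <stdint.h>

/* Which processor-port bit the read branch tests for a ROM-capable
   address: the same expression as in memmap_mem_update(). */
unsigned int rom_select_bit(unsigned int addr)
{
    return (addr >> 14) & 1;
}

/* Hypothetical helper mirroring the if statement: nonzero if the read
   would be classified as ROM for the given processor-port value. */
int reads_rom(unsigned int addr, uint8_t port)
{
    return (port & (1u << rom_select_bit(addr))) != 0;
}
```

For $A000 the shift gives 2 and the mask gives bit 0; for $E000 the shift gives 3 and the mask gives bit 1. So a port value with only bit 0 set classifies a read from $A000 as ROM and a read from $E000 as RAM.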

The last call to monitor_memmap_store() is probably there just to notify VICE's internal machine language monitor of this memory access. Using the ML monitor, you can set tracepoints for memory accesses, and VICE has to tell the monitor when they have triggered. This is only my assumption; I did not study the function.

So what does the mystical code

(*_mem_write_tab_ptr[(addr) >> 8])((uint16_t)(addr), (uint8_t)(value));

try to do? Remember that not all memory locations are "normal" RAM. The Commodore 64 has memory-mapped hardware device registers in the same address space as RAM. So memory accesses go through function pointers that do the appropriate thing for each address. The table is indexed by addr >> 8, the high byte of the address, so there is one handler per 256-byte page. For instance, if you want to change the screen's border color, you would issue the command:

POKE53280,2

In VICE, the memory address 53280 is mapped to VIC-II video chip emulation and the above call will handle that case, along with others, by calling the appropriate handling function.
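
The idea of a per-page table of store handlers can be sketched in a few lines. This is a toy model, not VICE's implementation: all names are hypothetical, the "VIC-II" is reduced to a single border color variable, and the C64's bank switching (which decides whether the I/O area is visible at all) is ignored. The facts it relies on are that the VIC-II registers occupy $D000-$D3FF, repeating every 64 bytes, and that the border color register is $D020, which is 53280 in decimal.

```c
#include <stdint.h>

typedef void (*store_func_t)(uint16_t addr, uint8_t value);

static uint8_t ram[0x10000];          /* plain RAM */
static uint8_t border_color;          /* stand-in for VIC-II register $D020 */
static store_func_t write_tab[0x100]; /* one handler per 256-byte page */

static void ram_store_sketch(uint16_t addr, uint8_t value)
{
    ram[addr] = value;
}

static void vic_store_sketch(uint16_t addr, uint8_t value)
{
    if ((addr & 0x3f) == 0x20) {      /* border color register */
        border_color = value & 0x0f;  /* colors are 4 bits wide */
    }
    /* all other registers are ignored in this sketch */
}

/* Default every page to RAM, then hook the VIC-II pages. */
void init_write_tab(void)
{
    unsigned int page;

    for (page = 0; page < 0x100; page++) {
        write_tab[page] = ram_store_sketch;
    }
    for (page = 0xd0; page <= 0xd3; page++) {
        write_tab[page] = vic_store_sketch;
    }
}

/* The dispatch itself: the high byte of the address selects the handler. */
void store_sketch(uint16_t addr, uint8_t value)
{
    write_tab[addr >> 8](addr, value);
}
```

After init_write_tab(), store_sketch(0x0400, 1) lands in ram[], while store_sketch(53280, 2) changes border_color and leaves ram[] untouched. The caller does not need to know which kind of address it is writing to.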

File c64memsc.c contains the following definitions:

/* Pointers to the currently used memory read and write tables.  */
read_func_ptr_t *_mem_read_tab_ptr;
store_func_ptr_t *_mem_write_tab_ptr;
static uint8_t **_mem_read_base_tab_ptr;
static uint32_t *mem_read_limit_tab_ptr;

/* Memory read and write tables.  */
static store_func_ptr_t mem_write_tab[NUM_CONFIGS][0x101];
static read_func_ptr_t mem_read_tab[NUM_CONFIGS][0x101];
static uint8_t *mem_read_base_tab[NUM_CONFIGS][0x101];
static uint32_t mem_read_limit_tab[NUM_CONFIGS][0x101];

c64memsc.c contains the function mem_initialize_memory(), which initializes the memory tables. Here is one part of that function:

/* Default is RAM.  */
    for (i = 0; i < NUM_CONFIGS; i++) {
        mem_set_write_hook(i, 0, zero_store);
        mem_read_tab[i][0] = zero_read;
        mem_read_base_tab[i][0] = mem_ram;
        for (j = 1; j <= 0xfe; j++) {
            if (board == 1 && j >= 0x08) {
                mem_read_tab[i][j] = void_read;
                mem_read_base_tab[i][j] = NULL;
                mem_set_write_hook(0, j, void_store);
                continue;
            }
            mem_read_tab[i][j] = ram_read;
            mem_read_base_tab[i][j] = mem_ram;
            mem_write_tab[i][j] = ram_store;
        }

As you can see, mem_write_tab[i][j] is assigned a function pointer to function ram_store() for ordinary RAM writes. It is defined simply like this:

void ram_store(uint16_t addr, uint8_t value)
{
    mem_ram[addr] = value;
}

The mem_ram is defined as a simple array:

/* The C64 memory.  */
uint8_t mem_ram[C64_RAM_SIZE];

So I guess our STA $0400 will be a regular RAM write via ram_store(). The VIC-II video chip will then "see" the ordinary RAM location containing our value 1 (the screen code for the capital letter "A"), and the VIC-II will then do its magic, drawing the letter "A" in the top left corner of the screen. I do not have the courage now to study the VIC-II logic to verify this; it would probably be quite complicated.
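
Why $0400 is the top left corner follows from the default text screen layout: 25 rows of 40 characters, stored row by row starting at address 1024. A tiny helper makes the arithmetic explicit. The function name screen_addr() is my own, and the sketch assumes the default screen base; the screen memory can be moved elsewhere by programming the VIC-II.

```c
#include <stdint.h>

#define SCREEN_BASE 0x0400u   /* default screen memory start, 1024 decimal */
#define SCREEN_COLS 40u
#define SCREEN_ROWS 25u

/* Address of the screen cell at the given row (0..24) and column (0..39). */
uint16_t screen_addr(unsigned int row, unsigned int col)
{
    return (uint16_t)(SCREEN_BASE + row * SCREEN_COLS + col);
}
```

screen_addr(0, 0) is $0400, our STA target, and the bottom right cell screen_addr(24, 39) is $07E7, i.e. 1024 + 999.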

So, let's move to the RTS command:

RTS

In the big loop, RTS is opcode 60 in hexadecimal:

        case 0x60:          /* RTS */
                RTS();
                break;

It is defined as a function-like C preprocessor macro just like LDA and STA:

#define RTS()                        \
    do {                             \
        uint16_t tmp;                    \
                                     \
        CLK_ADD(CLK, CLK_RTS);       \
        tmp = PULL();                \
        tmp = tmp | (PULL() << 8);   \
        JUMP(tmp);                   \
        FETCH_PARAM(reg_pc);         \
        CLK_ADD(CLK, CLK_INT_CYCLE); \
        INC_PC(1);                   \
    } while (0)

CLK_ADD() we have seen, so let's focus on PULL(). It is a function-like macro:

/* Stack operations. */

#ifndef PUSH
#define PUSH(val) ((PAGE_ONE)[(reg_sp--)] = ((uint8_t)(val)))
#endif
#ifndef PULL
#define PULL()    ((PAGE_ONE)[(++reg_sp)])
#endif

"PULL" is 6502/6510/C64 terminology and it means the same as "POP": popping the stack, i.e. removing and returning the topmost element. So we get two PULL() calls to assign tmp the return address that was stored on the stack. We need two PULL() macro calls, since the return address is a 16-bit unsigned integer that consists of two consecutive 8-bit unsigned integers, the low byte being pulled first.

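The two PULL() calls, and the INC_PC(1) at the end of RTS(), make more sense next to the matching pushes. On a 6502, JSR pushes the address of its own last byte (high byte first), which is one less than the address of the next instruction, so RTS has to add 1 after pulling. The sketch below replays that round trip with a private page_one array standing in for PAGE_ONE. The function jsr_then_rts() is hypothetical and only models the stack traffic.

```c
#include <stdint.h>

static uint8_t page_one[0x100];   /* stand-in for the stack page $0100-$01FF */
static uint8_t reg_sp = 0xff;     /* stack pointer, grows downwards */

#define PUSH(val) (page_one[reg_sp--] = (uint8_t)(val))
#define PULL()    (page_one[++reg_sp])

/* Simulate a 3-byte JSR located at jsr_addr followed by the RTS of the
   called routine. Returns the address where execution continues. */
uint16_t jsr_then_rts(uint16_t jsr_addr)
{
    uint16_t pushed = (uint16_t)(jsr_addr + 2);  /* last byte of the JSR */
    uint16_t tmp;

    PUSH(pushed >> 8);            /* high byte goes on the stack first */
    PUSH(pushed & 0xff);          /* then the low byte */

    tmp = PULL();                 /* low byte comes back first */
    tmp = (uint16_t)(tmp | (PULL() << 8));
    return (uint16_t)(tmp + 1);   /* the job of INC_PC(1) in RTS() */
}
```

For a JSR at $C000 the value on the stack is $C002, and execution resumes at $C003, the instruction right after the JSR. The stack pointer is back at its starting value afterwards.
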
Next we have JUMP(). Again we have a function-like macro that jumps to the return address just like the name suggests:

/* Implement the hack to make opcode fetches faster.  */
#define JUMP(addr)                                                                         \
    do {                                                                                   \
        reg_pc = (unsigned int)(addr);                                                     \
        if (reg_pc >= (unsigned int)bank_limit || reg_pc < (unsigned int)bank_start) {     \
            mem_mmu_translate((unsigned int)(addr), &bank_base, &bank_start, &bank_limit); \
        }                                                                                  \
    } while (0)

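It first assigns addr to the Program Counter register. I am not quite sure about the following if-clause, but judging by the comment and the variable names it is less a sanity check than a cache refresh: bank_base, bank_start and bank_limit appear to describe a region of memory from which opcodes can be fetched directly, and when the jump target falls outside that region, mem_mmu_translate() is called to update them.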

The purpose of FETCH_PARAM(addr) is vague to me. The macro name suggests fetching a value:

#define FETCH_PARAM(addr) ((((int)(addr)) < bank_limit) ? bank_base[(addr)] : LOAD(addr))

#ifndef LOAD
#define LOAD(addr) \
    memmap_mem_read(addr)
#endif

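but in RTS() the fetched value is simply thrown away, so I do not fully understand why it is there! One plausible explanation is that a real 6502/6510 performs a dummy read during RTS, and a cycle-exact emulator such as x64sc would want to reproduce that memory access even though the value is unused.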
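
Whatever the reason for the call in RTS(), the structure of FETCH_PARAM() itself is a classic fast path with a fallback: below bank_limit the byte is read straight from a host array, otherwise the general (slower) load routine is used. The sketch below shows that pattern in isolation. The limit of 0xd000, the flat ram array and the slow_reads counter are assumptions of mine for the demonstration, not values taken from VICE.

```c
#include <stdint.h>

static uint8_t ram[0x10000];
static uint8_t *bank_base = ram;   /* direct pointer for the fast path */
static int bank_limit = 0xd000;    /* assumed: addresses below this are plain memory */
static unsigned int slow_reads;    /* counts how often the fallback ran */

/* Stand-in for LOAD(): in a real emulator this would go through the
   read handler table and could have side effects. */
uint8_t load_slow(unsigned int addr)
{
    slow_reads++;
    return ram[addr & 0xffff];
}

/* Same shape as FETCH_PARAM(): array access when possible, else fallback. */
uint8_t fetch_param_sketch(unsigned int addr)
{
    return ((int)addr < bank_limit) ? bank_base[addr] : load_slow(addr);
}
```

A fetch from $1000 never touches load_slow(), while a fetch from $D020 does. Since the CPU spends most of its time fetching opcodes and operands from ordinary memory, skipping the function pointer call on that path is presumably where the speedup mentioned in the JUMP() comment comes from.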

Conclusion

Even the simplest programs require lots of code to execute on an emulator. We only saw the bare bones of a very simple program execution and even then we did not quite understand all of it. We also skipped the whole VIC-II video chip's business of displaying the letter "A" - we covered only the data write to RAM.

I did what I could, but may have misunderstood something. If you have corrections or clarifications, you are welcome to send me email.

Best regards,
Kalevi Kolttonen <kalevi@kolttonen.fi>
Helsinki, Finland
February 10th 2018

UPDATE February 10th, 2018: I just noticed that mainc64cpu.c contains alternative macros for PUSH() and PULL():

/* Route stack operations through read/write handlers */

#ifndef PUSH
#define PUSH(val) (*_mem_write_tab_ptr[0x01])((uint16_t)(0x100 + (reg_sp--)), (uint8_t)(val))
#endif

#ifndef PULL
#define PULL()    mem_read_check_ba(0x100 + (++reg_sp))
#endif

And buried inside #ifdef FEATURE_CPUMEMHISTORY are these:

/* Route stack operations through memmap */

#define PUSH(val) memmap_mem_store((0x100 + (reg_sp--)), (uint8_t)(val))
#define PULL()    memmap_mem_read(0x100 + (++reg_sp))

It seems to me that the macros inside #ifdef FEATURE_CPUMEMHISTORY are used for supporting CPU Memory History. I did not know VICE had support for such a thing. But I do not know why you would use the (*_mem_write_tab_ptr[0x01]) function pointer to access the stack. I guess it would be slower than direct array access. I really have no clue!

Anyway, the stack is located on page 1, so 0x01 in (*_mem_write_tab_ptr[0x01]) comes from that fact. So the function pointer array is indexed by Commodore 64 memory pages.
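
The page arithmetic is easy to verify. In this small sketch, page_of() and stack_addr() are hypothetical helper names; the expressions are the ones used in the macros above (addr >> 8 for the table index, 0x100 + reg_sp for the stack address).

```c
#include <stdint.h>

/* Index into the per-page handler tables: the high byte of the address. */
unsigned int page_of(uint16_t addr)
{
    return addr >> 8;
}

/* Address of the stack slot for a given 8-bit stack pointer value. */
uint16_t stack_addr(uint8_t sp)
{
    return (uint16_t)(0x100 + sp);
}
```

Every possible stack pointer value from $00 to $FF gives an address in $0100-$01FF, so page_of(stack_addr(sp)) is always 0x01. For comparison, our screen address $0400 is on page 0x04 and the border color register $D020 is on page 0xd0.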