Simulating the NIOS II w/mmu using Verilator
I've fallen in love with "verilator".
It's a verilog simulator which generates C++ code. It's not fair to call it a verilog to
C translator because it does much more than that.
I like it because it forces problems out of my designs, generates good lint output and runs very, very fast. Many of the designs I simulate take many hours with cver or iverilog, sometimes a day. With verilator I get cycle-time performance which is within an order of magnitude of my behavioral models (well, ok, that might be an exaggeration but it seems that way). My sims often shorten from overnight to 10 minutes.
So, I wanted to study the TLB accesses on the NIOS II running full linux so I could
improve my behavioral model of the NIOS II. To that end I grabbed a pre-baked
NIOS II and stripped off everything but the cpu. I then wrote my own "testbench" models for the memory (sdram and flash) as well as the jtag uart and timers. Just a simple hack to answer when the cpu calls. Nothing as complex as a full Avalon bus with arbitration, because I don't need that to sim the basic cpu.
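To give a flavor of what "answer when the cpu calls" can look like, here is a minimal C++ sketch of that kind of testbench memory. It is not the actual testbench; the names (SimpleMem, bus_access), the 32-bit word width and the read-only flag for a flash-like device are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one memory device seen by the cpu.
// It answers in the same call: no arbitration, no wait states.
struct SimpleMem {
    uint32_t base;                // first byte address of the window
    bool read_only;               // true for a flash-like device
    std::vector<uint32_t> words;  // backing store, one entry per 32-bit word

    SimpleMem(uint32_t base_addr, uint32_t size_bytes, bool ro)
        : base(base_addr), read_only(ro), words(size_bytes / 4, 0) {}

    // True if the byte address falls inside this device's window.
    bool hit(uint32_t addr) const {
        return addr >= base && (addr - base) / 4 < words.size();
    }
    uint32_t read(uint32_t addr) const {
        assert(hit(addr));
        return words[(addr - base) / 4];
    }
    // Writes to a read-only device are silently dropped (an assumption).
    void write(uint32_t addr, uint32_t data) {
        assert(hit(addr));
        if (!read_only) words[(addr - base) / 4] = data;
    }
};

// Route one cpu access to whichever device claims the address.
// Returns false when nothing answers, which a testbench might flag.
inline bool bus_access(std::vector<SimpleMem> &devs, uint32_t addr,
                       bool is_write, uint32_t wdata, uint32_t &rdata) {
    for (auto &d : devs) {
        if (!d.hit(addr)) continue;
        if (is_write) d.write(addr, wdata);
        else rdata = d.read(addr);
        return true;
    }
    return false;
}
```

In a real harness something like bus_access would be called once per clock from the loop that toggles the clock and evaluates the verilated model, driven by whatever address and strobe signals the cpu exposes.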
I also had to write behavioral models for "altsyncram" and the "altmul_sum" multiplier. That was fun. The sync ram model is very, very dense inside and pretty much inscrutable. But fortunately the NIOS uses the sync rams as basic dual port memories in a fairly constrained way. It does make interesting use of the byte enables and clock enables - my guess is this helps them make timing. When the pipe stalls, the clock enables are de-asserted to stall the sync rams' "internal pipe" as well. It's fun to watch in green waves.
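The byte-enable and clock-enable behavior is easier to see in a stripped-down model. The sketch below is a C++ approximation of a dual port memory with one write port and one registered read port. It is not the real altsyncram and not my model of it: the class name, the port names, the 32-bit width and the "read returns old data on a same-address write" choice are all assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified dual port synchronous ram: port A writes, port B reads.
// Only the subset described in the text; everything else is assumed.
class DualPortRam {
public:
    explicit DualPortRam(uint32_t depth) : mem_(depth, 0), q_b_(0) {}

    // Call once per rising clock edge.
    void clock(bool clken, bool wren_a, uint32_t addr_a, uint32_t data_a,
               uint8_t byteena_a, uint32_t addr_b) {
        if (!clken) return;       // stalled: memory and q_b both hold
        assert(addr_a < mem_.size() && addr_b < mem_.size());
        q_b_ = mem_[addr_b];      // registered read, sees the old contents
        if (wren_a) {
            // Bit i of byteena_a enables byte lane i of the write.
            uint32_t mask = 0;
            for (int i = 0; i < 4; ++i)
                if (byteena_a & (1u << i)) mask |= 0xFFu << (8 * i);
            mem_[addr_a] = (mem_[addr_a] & ~mask) | (data_a & mask);
        }
    }
    uint32_t q_b() const { return q_b_; }

private:
    std::vector<uint32_t> mem_;
    uint32_t q_b_;                // output register of the read port
};
```

With clken low the call returns before touching anything, so the read data from the previous cycle stays on q_b for as long as the stall lasts.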
So, I have gotten the NIOS II to simulate using verilator. I'm still having a problem with the JFFS2 reading of flash, but I'm very close to running something in userland. The kernel comes up to the point where it mounts the JFFS2 file system, which is good. It just has not used the TLBs yet.
I like the NIOS II architecture, especially with the MMU, if you can call it that. It's a simple six-stage pipeline with tightly coupled I and D caches. The I cache is small but seems to work pretty well, as does the D cache. The caches are 16-way set-associative.
The "mmu" is really just a small set of TLBs which are used for various segments of the address space. They function much like the i-cache as near as I can tell, i.e. an n-way set-associative memory. I'm still working out the details.
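For reference, this is roughly what an n-way set-associative lookup looks like in a behavioral model. It is a generic C++ sketch, not the NIOS II TLB: the class name, the 4 KB page size, the set/way counts and the caller-chosen way on insert are all assumptions, and the real entry format (permissions, process ids and so on) is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Generic n-way set-associative TLB sketch. Sizes and the 4 KB page
// are illustrative assumptions, not figures taken from the NIOS II.
class Tlb {
public:
    Tlb(uint32_t sets, uint32_t ways)
        : sets_(sets), ways_(ways), entries_(sets * ways) {}

    // Install a translation in a caller-chosen way of the set
    // that the virtual page number (vpn) maps to.
    void insert(uint32_t vpn, uint32_t pfn, uint32_t way) {
        assert(way < ways_);
        Entry &e = entries_[(vpn % sets_) * ways_ + way];
        e.valid = true;
        e.tag = vpn / sets_;
        e.pfn = pfn;
    }

    // Translate a virtual address; returns false on a TLB miss.
    bool lookup(uint32_t vaddr, uint32_t &paddr) const {
        uint32_t vpn = vaddr >> kPageBits;
        uint32_t set = vpn % sets_;   // low vpn bits pick the set
        uint32_t tag = vpn / sets_;   // remaining bits are compared
        for (uint32_t w = 0; w < ways_; ++w) {
            const Entry &e = entries_[set * ways_ + w];
            if (e.valid && e.tag == tag) {
                uint32_t offset = vaddr & ((1u << kPageBits) - 1);
                paddr = (e.pfn << kPageBits) | offset;
                return true;
            }
        }
        return false;
    }

private:
    static constexpr uint32_t kPageBits = 12;  // assumed 4 KB pages
    struct Entry { bool valid = false; uint32_t tag = 0; uint32_t pfn = 0; };
    uint32_t sets_, ways_;
    std::vector<Entry> entries_;
};
```

Two pages whose vpns share the same low bits land in the same set and can coexist in different ways; writing a new entry into an occupied way evicts whatever was there.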