July 16, 2010

Simulating the NIOS II w/mmu using Verilator

I've fallen in love with "verilator".

It's a verilog simulator which generates C++ code. It's not fair to call it a verilog to
C translator because it does much more than that.

I like it because it forces problems out of my designs, generates good lint output and runs very very fast. Many of the designs I simulate take many hours, sometimes a day, using cver or iverilog. With verilator I get cycle time performance which is within an order of magnitude of my behavioral models (well, ok, that might be an exaggeration but it seems that way). My sims often shorten from overnight to 10 minutes.

So, I wanted to study the TLB accesses on the NIOS II running full linux so I could
improve my behavioral model of the NIOS II. To that end I grabbed a pre-baked
NIOS II and stripped off everything but the cpu. I then wrote my own "testbench" models for the memory - sdram and flash as well as the jtag uart and timers. Just a simple hack to answer when the cpu calls. Nothing as complex as a full Avalon bus with arbitration, because I don't need that to sim the basic cpu.

I also had to write behavioral models for "altsyncram" and the "altmul_sum" multiplier. That was fun. The vendor's sync ram model is very very dense inside and pretty much inscrutable. But fortunately the NIOS uses the sync rams as basic dual port memories in a fairly constrained way. It does make interesting use of the byte enables and clock enables - my guess is this helps them make timing. When the pipe stalls the clock enables are de-asserted to stall the sync rams' "internal pipe" as well. It's fun to watch in green waves.

So, I have gotten the NIOS II to simulate using verilator. I'm still having a problem with the JFFS2 reading of flash, but I'm very close to running something in userland. The kernel comes up to the point where it mounts the JFFS2 file system, which is good. It just has not used the TLB's yet.

I like the NIOS II architecture, especially with the MMU, if you can call it that. It's a simple six stage pipeline with tightly coupled I and D caches. The I cache is small but seems to work pretty well, as does the D cache. The caches are 16 way set associative.

The "mmu" is really just a small set of TLB's which are used for various segments of the address space. They function much like the i-cache near as I can tell, i.e. an n-way set associative memory. I'm still working out the details.


July 5, 2010

pxe booting, huge pages and running raw x86_64 code

Lately, for one of my customers, I've been using "pxeboot".

You know you're in trouble when you order a motherboard with 2 cpu slots and 16 ram slots...

It seems most high end server motherboards support "pxe", which is essentially dhcp + tftp lan booting. Basically booting over the network.

The fun thing is it loads the code at 0x7c00, just like the bios loads the boot sector. This means you can write simple real mode code, just like in a boot sector.

I've been doing this and then jumping to protected mode and then writing C, but naturally the customer wants to use "long mode" since his machine has 320 GB of ram and he wants to use it all. So, I crafted up some code to switch from real mode to long mode. This requires turning on paging so I decided to use 1 GB pages to make the paging table smaller (since I am lazy and writing real mode code is painful).

Anyway, I have pxe loadable code which will flip into "long mode" and run a 64 bit mini-os, all loadable from the network.

Not that hard to do. The interesting part is that QEMU was very helpful up through protected mode, but bochs has really shone as a good pc emulator. I modified "bfe", the bochs front end, so I could single step through the boot code when loaded by the bios. This is helpful. But I used qemu since it's faster.

But qemu does not support 1 GB pages. I was surprised to find bochs does, if you turn on x86_64 support.

It's a little frightening to build a page table for a 320 GB machine, but hey, it only took 320 PDP entries and one PML4 entry.
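
To make the arithmetic concrete, here is a rough Python sketch (not the actual boot code) of how such a table could be laid out. It assumes an identity map, which the post does not state. The flag values are the architectural present, writable and page-size bits; the function name and the pdpt_base address are made up for illustration.

```python
GIB = 1 << 30
ENTRIES_PER_TABLE = 512      # every x86-64 paging table holds 512 8-byte entries
TABLE_BYTES = 4096

PTE_PRESENT = 1 << 0
PTE_WRITABLE = 1 << 1
PTE_PAGE_SIZE = 1 << 7       # in a PDP entry: this entry maps a 1 GB page directly


def build_identity_map(ram_bytes, pdpt_base=0x2000):
    """Return (pml4, pdpts) as lists of 64-bit entry values.

    pdpt_base is a hypothetical physical address where the PDP tables
    are assumed to sit back to back.
    """
    n_pages = -(-ram_bytes // GIB)                  # ceiling division
    n_pdpts = -(-n_pages // ENTRIES_PER_TABLE)
    flags = PTE_PRESENT | PTE_WRITABLE

    pml4 = [0] * ENTRIES_PER_TABLE
    pdpts = []
    for t in range(n_pdpts):
        # One PML4 entry per PDP table; each PML4 entry covers 512 GB.
        pml4[t] = (pdpt_base + t * TABLE_BYTES) | flags
        pdpt = [0] * ENTRIES_PER_TABLE
        for i in range(ENTRIES_PER_TABLE):
            page = t * ENTRIES_PER_TABLE + i
            if page >= n_pages:
                break
            # Identity map: virtual 1 GB page N -> physical 1 GB page N.
            pdpt[i] = (page * GIB) | flags | PTE_PAGE_SIZE
        pdpts.append(pdpt)
    return pml4, pdpts


pml4, pdpts = build_identity_map(320 * GIB)
used_pml4 = sum(1 for e in pml4 if e)
used_pdp = sum(1 for e in pdpts[0] if e)
```

For 320 GB everything fits in a single PDP table, so used_pml4 comes out as 1 and used_pdp as 320.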

The point of the story is that "PXE" booting seems like a useful thing for some applications, and being able to write raw code to use a PC as an embedded system is sometimes helpful.

machine level instructions are interesting

This will not be interesting unless you find machine level instructions interesting.
Lately *all* I do is machine level instructions.

NOTICE how everyone mentioned in the presentation (last pages) uses their TWITTER addresses. Am I lost in the haze? Does everyone use twitter except me?

http://www.scribd.com/doc/28264000/Descent-into-Darkness-Understanding-your-systems-binary-interface-is-the-only-way-out

Anyway, I like this presentation. It's funny. It has nice pictures. It has great diagrams. And it describes a cool idea. And it talks about x86_64 machine level instructions.

March 26, 2009

Re-creating old CPU designs

Over the years I've done a number of experiments using Verilog, a hardware modeling language. In several of these experiments I have attempted to recreate old CPU designs like the MIT CADR lisp machine and the DEC PDP-8/I. My latest experiment is to recreate the PDP-11, in modern verilog, using modern simulation techniques.

Note that this has been done before. I know of at least 2-3 old microcoded versions and more recently there are 3 other groups which have done this, but in all the cases the code is either not in verilog or is proprietary and closed. Not very helpful.

I have not (yet) delved into SystemC, but I have done some fun work with co-simulation. Most recently I wired my RTL simulation of the pdp-11 in almost-verilog to a "known good" pdp-11 instruction set simulator. The idea is that both the RTL simulation and the instruction set simulator run the same code and at the end of each instruction cycle the results are compared. The "results" are the internal register values, the processor status word and the list of bus operations which occurred (address, type, data).

In a perfect world the two simulations will run in lock step and any deviation is a bug. And this is mostly true. The comparison turns out to be extremely helpful and very valuable.
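
The comparison step can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual co-simulation harness; the record layout and the names InsnResult and first_mismatch are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InsnResult:
    """State captured at the end of one instruction (hypothetical layout)."""
    regs: tuple      # internal register values, R0..R7
    psw: int         # processor status word
    bus_ops: tuple   # ((address, type, data), ...) in the order they occurred


def first_mismatch(rtl_trace, iss_trace):
    """Return (instruction index, field name) of the first divergence, or None.

    Only the common prefix of the two traces is compared.
    """
    for i, (rtl, iss) in enumerate(zip(rtl_trace, iss_trace)):
        for name in ("regs", "psw", "bus_ops"):
            if getattr(rtl, name) != getattr(iss, name):
                return i, name
    return None
```

Stopping at the first mismatch matters: once the two simulations diverge, everything after that point is usually noise.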

Again, however, this is not new. I learned this technique from others who are smarter than I am.

While attempting to recreate the pdp-11 I ran into a number of interesting problems. The instruction set is fairly simple but it is not RISC. The effective address computations are complex and in many cases doubled. Let me supply an example. Here is a list of the 8 addressing modes. A complex instruction can have a source operand (with one of these addressing modes) and a destination operand (with one of these addressing modes). So, in the worst case you need to compute the effective address and do one or more fetches for both the source and the destination.

mode  symbol  ea1   ea2       ea3          data            side-effect
0     R       x     x         x            R               x
1     (R)     R     x         x            M[R]            x
2     (R)+    R     x         x            M[R]            R<-R+2
3     @(R)+   R     M[R]      x            M[M[R]]         R<-R+2
4     -(R)    R-2   x         x            M[R-2]          R<-R-2
5     @-(R)   R-2   M[R-2]    x            M[M[R-2]]       R<-R-2
6     X(R)    PC    M[PC]+R   x            M[M[PC]+R]      x
7     @X(R)   PC    M[PC]+R   M[M[PC]+R]   M[M[M[PC]+R]]   x

Seems complex, yes? Each M[] is a memory read. The basic register indirect is simple. But modes 6 & 7 add the side effect of reading additional operand data from the next instruction location. This increments the pc as well as fetching an offset which gets added to the result of a previous EA calculation.
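
The table can be written down as a small behavioral model. This is a Python sketch under simplifying assumptions (word operands only, memory as a dict keyed by address), not the Verilog from this project; fetch_operand and its arguments are names invented for the example.

```python
PC = 7  # R7 is the program counter


def fetch_operand(mode, reg, regs, mem):
    """Return (data, reads) for one operand, following the table above.

    regs is a list of eight 16-bit register values and is updated in place;
    mem maps word addresses to values. Word operands only.
    """
    reads = 0

    def rd(addr):
        nonlocal reads
        reads += 1                      # each M[] in the table is one read
        return mem[addr & 0xFFFF]

    if mode == 0:                       # R: operand is the register itself
        return regs[reg], reads

    if mode == 1:                       # (R)
        ea = regs[reg]
    elif mode in (2, 3):                # (R)+ and @(R)+: post-increment
        ea = regs[reg]
        regs[reg] = (regs[reg] + 2) & 0xFFFF
    elif mode in (4, 5):                # -(R) and @-(R): pre-decrement
        regs[reg] = (regs[reg] - 2) & 0xFFFF
        ea = regs[reg]
    else:                               # X(R) and @X(R): offset word follows
        offset = rd(regs[PC])
        regs[PC] = (regs[PC] + 2) & 0xFFFF
        ea = (offset + regs[reg]) & 0xFFFF

    if mode in (3, 5, 7):               # deferred modes add one indirection
        ea = rd(ea)

    data = rd(ea)
    return data, reads
```

Counting the reads per mode gives 0, 1, 1, 2, 1, 2, 2, 3, which is where the state count in a hardware implementation comes from.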

So, how to implement this? My first thought was a complex state machine. After a while I got frustrated and thought it might be easier just to make a machine which recodes the old pdp-11 instruction into new "risc-like" instructions on the fly. Sort of a just-in-time binary recompilation. I think this is how modern day X86 machines work. The fun idea would be to have several "machines" running ahead and converting the pdp-11 CISC instructions into simple RISC instructions, filling several FIFO's. The RISC engine could then use modern ideas like a multi-stage pipeline, speculative execution and branch prediction. While very cool, I quickly decided that was more complexity than I wanted at this stage.

I do think, however, that it might make sense initially to do a simple "recoding engine" and a simple "risc pipeline". I want to do it and compare the gate count to a state machine version.

So, I set out to do a simple state machine version. I tried to compress the states as much as possible but currently feel there has to be a decode state, four states for each operand, an execute state and a write-back state. The four states for each operand can be reduced to as little as one, depending on how the instruction decodes. I tried to eliminate the single EA state for each operand but instructions like:

   mov   @(R5)+,@(R5)+

cause problems. Why? Because the value of R5 is incremented twice, once after each EA calculation. If I did the EA and post-increment in one state I needed to special case the increment (to be 2x) if both registers were equal. And it got to be a big mess. I capitulated, added a state, and reduced the complexity.
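
Here is that case as a tiny Python model, just to show the ordering. It is a sketch with made-up function names and example addresses, and it models the two EA calculations as separate sequential steps (the "added a state" version), which is why no special case is needed.

```python
def autoinc_deferred_ea(regs, reg, mem):
    """@(R)+ : read the pointer at M[R], then post-increment R by 2."""
    ea = mem[regs[reg]]
    regs[reg] = (regs[reg] + 2) & 0xFFFF
    return ea


def mov_autoinc_deferred(regs, src_reg, dst_reg, mem):
    """mov @(Rs)+,@(Rd)+ with one EA step per operand."""
    src_ea = autoinc_deferred_ea(regs, src_reg, mem)   # source EA step
    dst_ea = autoinc_deferred_ea(regs, dst_reg, mem)   # sees the updated register
    mem[dst_ea] = mem[src_ea]
    return src_ea, dst_ea


regs = [0] * 8
regs[5] = 0o1000
mem = {0o1000: 0o2000, 0o1002: 0o3000, 0o2000: 0o123, 0o3000: 0}
mov_autoinc_deferred(regs, 5, 5, mem)   # mov @(R5)+,@(R5)+
```

After the call R5 has advanced by 4 and the destination pointer was taken from the second word, because the second EA step simply starts from the register value the first step left behind.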

I should note here that all pdp-11's, except one, are microcoded. And I can see why.

At some point I do want to try an experiment by adding a pre-fetch unit, keeping at least 3 words available and doing the EA calculations in parallel. The EA calculations will stack up (i.e. stall) queuing up for memory reads, but it has the potential for being more efficient, especially if there is a cache which does burst reads and the line size is at least 8 bytes.

I know this might all sound crazy, but I've learned a lot in the process and almost everything I have learned has been useful in my day job.

January 14, 2009

a used 2g iphone is actually cool

I bought a used iphone 2G model for work. I didn't intend to use it as an actual phone. But, as time wore on, I started playing with it, and (mostly) prying it out of the hands of my 10 and 12 year olds and I've grown to like it.

Oddly, I've yet to activate it. I suppose I will soon, but the idea of spending $75/month is a little painful right now. I guess I'm paying $50/month for my current phone, so maybe it's not that much of an increase.

The most fun part for me has been the mp3 player. I have a lot of music on our home file server, all legally paid for and ripped from CD's. Except for the mp3's I downloaded directly from Amazon (THANK YOU Amazon for selling MP3's!). Anyway, all of the music is legit. And now I can load it onto the phone and listen to it as I walk to and from work. Heh - I've become one of those "young people" who wear headphones. But I swear it's great to listen to music on the way to work.

I may have to upgrade to the 3G version, however, if only for the EDGE and GPS. I think the GPS might come in pretty handy.

I'm a little sad because my sleek little MP3 playing Sony Ericsson slide phone is still pretty cool. I didn't intend to cheat. It just happened.

One thing I will mention - using iTunes can be a real pain. It's a very pretty interface but for the first time user it can be pretty confusing. I spent 15 minutes trying to figure out how to get music into it from a file on my hard disk. It should not be that hard.

I should also mention that the apps you can buy (or download for free) from the Apple "app store" are pretty cool. There's a lot of things to browse through. A few too many, to be honest. It would be nice to find some place that reviews the apps and recommends the best ones. Naturally my son (he's 10) immediately downloaded the Star Wars "light saber" app. Even I find it fun after a few cocktails.

Even without activation the 802.11 connectivity makes it a very useful device. I can browse, get weather, stock info, YouTube, send email, etc...

All in all, I like it. And I didn't think I would.

January 8, 2009

pcc (portable c compiler) lives again!

Two interesting things happened this week

- the "R" programming language was talked about in the mainstream press.

- I discovered that the openbsd folks are working on using a non-gcc C
compiler (pcc). Turns out in the non-linux unix world there is not so
much love for gcc.

This makes some sense. gcc is huge and hard to work with. When it
compiles it uses every available resource and eats the machine. It's
not that easy to port or maintain. And it keeps getting bigger.

"pcc" on the other hand is 5-10 times faster, generates reasonable code
and is easy to port and work on. Someone is actually maintaining it.
This is the same pcc some will remember from Bell Labs in the '70s, back
when all the world was a pdp-11 (just before all the world became a
vax).

And, as ecosystems go, it's good to have more than one option. Linux
is completely dependent on gcc. Netbsd, on the other hand, is not.

(I've been working on getting netbsd to run on my vax 11/730 again,
so I've fallen back in love with netbsd. Well, I think we're more like
friends with benefits, but please don't tell linux I've been cheating.)

Anyway, I thought that was interesting. Apparently the pcc maintainer is
planning to add PIC support to pcc this winter which is one of the
missing features needed.

Ubuntu upgrades. wow!

I have a couple of machines running Ubuntu. More and more lately.

One machine at home was running Ubuntu 7.04 and mythtv. I was loath to change it
because it was working and I hate having to type "ssh" when I'm watching tv.

But, I finally did it over the holidays. First I upgraded to 7.10, which was a pain.
7.04 is no longer supported and I had to hack the apt config file to point to the
archives. But this broke the upgrade. So I ended up starting with an apt config pointing
at the archives and then switching it in the middle of the upgrade to point to the normal
repository. A bit hair raising but it worked.

Once at 7.10 I could cleanly upgrade to 8.04 and then to 8.10. I did it all via ssh
and it worked with very few problems. My hat is off to the Ubuntu guys.

Getting X to work consistently with my Nvidia display card was not so easy. With some
releases there is support from Ubuntu. But not 8.10. For that I had to go back to using
the linux install script from NVidia, which does all sorts of fun things under the covers.

But now I have mythtv back, and I'm running 8.10 and all is well with the world.

That went so well I upgraded a machine at work and it too did the right thing. Very nice.

Sadly I'm about to wave goodbye to RedHat. I wish I didn't have to, but they don't seem
to be providing the same level of coolness that Ubuntu is. The "apt-get" system is
just too nice. I never want to see another rpm again.

Virtually everything I wanted was available from apt-get. Amazing. And easy!

October 25, 2007

My experiment with Mythtv

I love my tivo. I've added a big disk to it. But I want to see more of my "personal media" on the tv (pictures, movies, etc) so I decided to make a Mythtv box.

I made a nice new pc in a "stereo cabinet like" box. I used a motherboard with built in HDMI output (very nice) and a 64 bit AMD cpu. I got all this from Mwave.

  • Motherboard: BIOSTAR TF7050-M2 nVIDIA GeFORCE 7050PV CHIPSET MICRO ATX
  • Box: ANTEC NSK2400 (BLACK / SILVER) MICRO ATX DESKTOP CASE

And of course I stuffed a dvd cdrom drive and a huge SATA disk drive.

Using a motherboard with HDMI out was key, because I just bought a big lcd hdtv.

When it arrived I installed Ubuntu using this page:

MythTV_Feisty_Backend_Frontend

I bought a Tivo remote control because everyone in my house knows how to use one. I got it from WeaKnees. I love the stuff they sell.

I then found I had an "IR problem". I solved it with a USBUIRT device, which I also love. Get the one with the 56k detector. It "just worked".

Interesting bits:

- I found a tivo control file which described the tivo remote. And then I had to hack the mapping file quite a bit to get it to be the way I wanted.

- Support for the nvidia chipset was not in the kernel. I used an install package from Nvidia which was scary. But it did provide a nice X windows based config tool which turned out to be handy.

(X windows looks *really* nice on my hdtv. It makes me think my next monitor for my office will be a 40" lcd hdtv. why not? huge screen!)

- DVD's did not "just play" by default. Instead I got odd /dev/hdc i/o errors. I asked a friend (thanks Bill!) and installed a few missing DVD decode packages. apt-get to the rescue. It then worked fine. I thought for a 1/2 day I had a bad cd-rom drive.

Result:

After many apt-gets and much twiddling I now have a nice Mythtv box which responds to a tivo remote. I can view my video and pictures and watch Jim Lehrer on the big screen (my record list is only PBS and F1 on Speed :-)

I may still get a HD Tivo, but the MythTv is a very nice adjunct and allows me to do things in a nice linuxy way with my home media. I plan to try firewire next and if that works ok I may skip the tivo hd...

March 18, 2007

Modifying read-only file systems in an embedded system

It's often a good idea to make the root file system in an embedded system read-only. If you do this and only make changes to files in a ram disk (mounted under /tmp, for example) the device will always come back to a known state when powered up. This is a nice feature and often a requirement.

But sometimes you need to make changes which persist. Check out the "mini fan out" file system. It allows you to layer changes on top of a read-only file system.

I have not used this package directly, but I plan to shortly.

Firefox bookmarks

Just a quick note on bookmarks. I use several laptops, a common machine at home and a workstation at work. This can sometimes get confusing when I save web bookmarks in various different places.

I recently discovered "foxmarks". I only use Firefox, and foxmarks is an extension which syncs up my bookmarks everyplace I install it. It's very handy.

I also discovered "gmarks", which places the bookmarks in a window on the left and makes them easy to use. I'm still on the fence with gmarks.

Sometimes I wish there was a way to make cross platform "notebooks" with firefox, which would combine a directory tree under SVN control, bookmarks, pdf and an email folder all into one. And make it easy to archive.