January 3, 2011

mindcrack ("minecraft" really)

I didn't really see this coming, but I've become a "mindcraft" server admin.

If you don't have a 12-14 year old boy, you might not know what this is.

Near as I can tell it goes directly to the pleasure center in their
brain and chemically bonds to the receptors.

It's a rather interesting java based "3D world" which you can move
around in and create things. But that description does not do it
justice. It's quite an interesting wonderland where the object seems
to be to construct things - buildings, bridges, towns, etc...

There is a java based client which (I'm guessing) uses OpenGL, since
it has nice real-time 3d graphics. The graphics are a bit cheesy but
pretty fun to look at. The system creates a 3d virtual world with
lots of interesting rules. And it's multiplayer.

There is an associated server. It has been hacked to include java
"plugins". So naturally I was strong-armed into creating one for my
son. And now, of course, he wants to make it "public". Which
presents some network security issues and opens the notion of using a
cloud VM.

The noteworthy thing here is that there is no "one server". Anyone
can create a server and house a world.

The plugins are java code, and seem to be able to manipulate the world
rules in interesting ways. One will let you "fly". Another creates a
"magic carpet" which is transparent and allows you to walk anywhere,
even through things. Lots of other interesting ones.

Anyway, I was curious if anyone else had run into this. The
interesting thing to me is that it seems to have spread like wildfire
in the "young boys" world and there are a ton of interesting "plugins"
which are not written by the original author, and in fact, there is a
"modded" server which enables the plugins, also not written by the
author.

People seem to disassemble the original java and then glue in their
changes and re-release. It's an interesting phenomenon, IMHO.

I'm also impressed with the fact that it's all written in java. It
runs at a good clip.

A quick way to boot linux on a windows pc

This is handy if you need linux quickly on windows

This is probably old hat to most people, or uninteresting if you have vmware,
but in a pinch, this is a quick way to boot up linux on a windows box,
which can be very useful.

http://www.pendrivelinux.com/use-qemu-to-boot-linux-from-windows/

With some small edits to the .bat file you can add local disk images.

Handy if you need to do some quick unix work, like building a kernel or testing
a small application.

December 25, 2010

WIFI Enabled Thermostats

On a whim I went out and bought two of the new WIFI enabled thermostats from 3M.

They are really from radiothermostat.com, but 3M seems to be subsidizing them with
an OEM arrangement.

Oddly, 3M's site has no mention of this product. But you can buy it at Home Depot.

Or you can try

http://www.radiothermostat.com/wifi/

Worst case google "filtrete wifi"

The bottom line is that for $99 you get a reasonable programmable thermostat which also does WIFI. It supports WEP and WPA2 encryption. It has a touch screen and a modern look. And it's simple to install.

I've wanted one of these for years but they were always $300-400 which was too much. When I saw one for $99 I had to jump.

I was skeptical it could work, or work well, but I was able to quickly install it and get it working on my WIFI network. I actually followed the directions and they worked. To get it on the network it starts out as something like an AP - it shows up by name as a peer-to-peer net on your laptop (I didn't realize you could do that running peer-to-peer). You select the thermostat's temporary AP and then connect to its web server and tell it how to get on your private wifi. It then talks (via ping) to radiothermostat.com's server and you can manage the thermostat remotely via the web or an iPhone app.

The iPhone app will connect directly (via WIFI) or let you manage the thermostats remotely
via their server. Handy in the summer...

It looks like you can also program it using a web protocol. They claim they will publish an API shortly but lots of hackers seem to have figured out the RESTful protocol. It's not too complex and I'm sure 10 lines of perl will do everything I want to do.
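
To give a feel for how small such a script could be, here is a rough sketch (in Python rather than perl). Everything specific in it is an assumption: the "/tstat" path and the "temp", "tmode" and "t_heat" field names come from community write-ups, not a published spec, and the address in the usage note is made up. Check them against your own unit before trusting any of it.

```python
import json
import urllib.request

# ASSUMPTION: the thermostat serves JSON at http://<address>/tstat.
# The path and the field names used below are not from a published spec.
TSTAT_PATH = "/tstat"


def status_url(address):
    """Build the URL used to read the thermostat state."""
    return "http://" + address + TSTAT_PATH


def heat_setpoint_request(address, degrees_f):
    """Build (but do not send) a POST that selects heat mode and a target."""
    # ASSUMPTION: tmode 1 means "heat" and t_heat is the target temperature.
    body = json.dumps({"tmode": 1, "t_heat": degrees_f}).encode("ascii")
    return urllib.request.Request(
        status_url(address),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_status(raw):
    """Pull the current temperature and mode out of a JSON reply."""
    reply = json.loads(raw)
    return reply.get("temp"), reply.get("tmode")


def read_status(address, timeout=5):
    """Fetch and parse the live state (needs a reachable thermostat)."""
    with urllib.request.urlopen(status_url(address), timeout=timeout) as resp:
        return parse_status(resp.read().decode("ascii"))
```

Something like read_status("192.168.1.50") would read the unit directly over the local wifi, and urllib.request.urlopen(heat_setpoint_request("192.168.1.50", 68)) would push a new setpoint. Building the request separately from sending it makes it easy to inspect before anything touches the furnace.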

I wonder if it runs linux :-)

-brad

December 7, 2010

PCI and PCIe performance

After a short string of job interviews a few months ago I sat down and tried to quantify some of my thinking about PCI, PCIe and how they interact with software.

It's a work in progress and needs more content, but it's a start. I plan to add more graphics and empirical data over time.

Software and Performance Ramifications of PCI and PCIe Design Paradigms

Feel free to email comments.

July 16, 2010

Simulating the NIOS II w/mmu using Verilator

I've fallen in love with "verilator".

It's a verilog simulator which generates C++ code. It's not fair to call it a verilog to
C translator because it does much more than that.

I like it because it forces problems out of my designs, generates good lint output and runs very very fast. Many of the designs I simulate take many hours, sometimes a day, using cver or iverilog. With verilator I get cycle time performance which is within an order of magnitude of my behavioral models (well, ok, that might be an exaggeration but it seems that way). My sims often shorten from overnight to 10 minutes.

So, I wanted to study the TLB accesses on the NIOS II running full linux so I could
improve my behavioral model of the NIOS II. To that end I grabbed a pre-baked
NIOS II and stripped off everything but the cpu. I then wrote my own "testbench" models for the memory - sdram and flash as well as the jtag uart and timers. Just a simple hack to answer when the cpu calls. Nothing as complex as a full Avalon bus with arbitration, because I don't need that to sim the basic cpu.

I also had to write behavioral models for "altsyncram" and the "altmul_sum" multiplier. That was fun. The sync ram model is very very dense inside and pretty much inscrutable. But fortunately the NIOS uses the sync rams as basic dual port memories in a fairly constrained way. It does make interesting use of the byte enables and clock enables - my guess is this helps them make timing. When the pipe stalls the clock enables are de-asserted to stall the sync rams "internal pipe" as well. It's fun to watch in green waves.

So, I have gotten the NIOS II to simulate using verilator. I'm still having a problem with the JFFS2 reading of flash, but I'm very close to running something in userland. The kernel comes up to the point where it mounts the JFFS2 file system, which is good. It just has not used the TLB's yet.

I like the NIOS II architecture, especially with the MMU, if you can call it that. It's a simple six stage pipeline with tightly coupled I and D caches. The I cache is small but seems to work pretty well, as does the D cache. The caches are 16 way set associative.

The "mmu" is really just a small set of TLB's which are used for various segments of the address space. They function much like the i-cache near as I can tell, i.e. an n-way set associative memory. I'm still working out the details.
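
For anyone who has not looked at one, this is roughly what I mean by an n-way set associative lookup. It is a generic toy model, not the NIOS II TLB: the set count, way count, page size and the FIFO replacement policy are all made up for illustration.

```python
class SetAssociativeTLB:
    """Toy n-way set-associative TLB: maps virtual page -> physical frame."""

    def __init__(self, num_sets=8, ways=4, page_bits=12):
        # All three sizes are illustrative, not NIOS II parameters.
        self.num_sets = num_sets
        self.ways = ways
        self.page_bits = page_bits
        # Each set holds up to `ways` (tag, frame) pairs, oldest first.
        self.sets = [[] for _ in range(num_sets)]

    def _split(self, vaddr):
        """Split a virtual address into (set index, tag)."""
        vpn = vaddr >> self.page_bits
        return vpn % self.num_sets, vpn // self.num_sets

    def lookup(self, vaddr):
        """Return the physical address, or None on a TLB miss."""
        index, tag = self._split(vaddr)
        for entry_tag, frame in self.sets[index]:
            if entry_tag == tag:
                offset = vaddr & ((1 << self.page_bits) - 1)
                return (frame << self.page_bits) | offset
        return None

    def insert(self, vaddr, frame):
        """Install a translation, evicting the oldest way if the set is full."""
        index, tag = self._split(vaddr)
        ways = [e for e in self.sets[index] if e[0] != tag]
        if len(ways) >= self.ways:
            ways.pop(0)  # FIFO replacement, chosen only for simplicity
        ways.append((tag, frame))
        self.sets[index] = ways
```

The low bits of the virtual page number pick the set, and the remaining bits are the tag compared against every way in that set. In hardware the compares happen in parallel; the loop here is just the software equivalent.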


July 5, 2010

pxe booting, huge pages and running raw x86_64 code

Lately, for one of my customers, I've been using "pxeboot".

You know you're in trouble when you order a motherboard with 2 cpu slots and 16 ram slots...

It seems most high end server motherboards support "pxe", which is essentially dhcp + tftp lan booting. Basically booting over the network.

The fun thing is it loads the code at 0x7c00, just like the bios loads the boot sector. This means you can write simple real mode code, just like in a boot sector.

I've been doing this and then jumping to protected mode and then writing C, but naturally the customer wants to use "long mode" since his machine has 320GB of ram and he wants to use it all. So, I crafted up some code to switch from real mode to long mode. This requires turning on paging so I decided to use 1GB pages to make the paging table smaller (since I am lazy and writing real mode code is painful).

Anyway, I have pxe loadable code which will flip into "long mode" and run a 64 bit mini-os, all loadable from the network.

Not that hard to do, but the interesting part is that QEMU was very helpful up through protected mode, while bochs has really shone as a good pc emulator. I modified "bfe", the bochs front end, so I could single step through the boot code when loaded by the bios. This is helpful. But I used qemu since it's faster.

But qemu does not support 1GB pages. I was surprised to find bochs does, if you turn on x86_64 support.

It's a little frightening to build a page table for a 320GB machine, but hey, it only took 320 PDP entries and one PML4 entry.
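
The arithmetic is easy to check on a workstation before committing it to real mode assembly. This is only a sketch of the table contents, assuming a plain identity map with 1GB pages. The flag bits are the standard x86-64 ones (present, writable, and the page-size bit in the PDP entry); the 0x2000 table address is a made-up example.

```python
GIB = 1 << 30

# Page-table entry flag bits from the x86-64 paging format.
PRESENT = 1 << 0
WRITABLE = 1 << 1
PAGE_SIZE = 1 << 7   # in a PDP entry: this entry maps a 1 GiB page

ENTRIES_PER_TABLE = 512


def build_identity_tables(ram_gib, pdpt_phys):
    """Return (pml4, pdpt) entry lists identity-mapping ram_gib GiB.

    pdpt_phys is the 4 KiB-aligned physical address where the PDP table
    will live; the value used below is hypothetical.
    """
    if ram_gib > ENTRIES_PER_TABLE:
        raise ValueError("more than 512 GiB needs a second PDP table")
    if pdpt_phys % 4096:
        raise ValueError("PDP table must be 4 KiB aligned")

    # One PDP entry per 1 GiB page: physical base | flags.
    pdpt = [(i * GIB) | PRESENT | WRITABLE | PAGE_SIZE for i in range(ram_gib)]
    pdpt += [0] * (ENTRIES_PER_TABLE - ram_gib)

    # A single PML4 entry points at that PDP table and covers 512 GiB.
    pml4 = [pdpt_phys | PRESENT | WRITABLE] + [0] * (ENTRIES_PER_TABLE - 1)
    return pml4, pdpt


pml4, pdpt = build_identity_tables(320, pdpt_phys=0x2000)
used = sum(1 for e in pdpt if e & PRESENT)
print(used, "PDP entries,", sum(1 for e in pml4 if e), "PML4 entry")
```

Since one PDP table holds 512 entries, anything up to 512GB fits under a single PML4 entry, which is why the whole thing stays at two 4KB tables.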

The point of the story is that "PXE" booting seems like a useful thing for some applications, and being able to write raw code to use a PC as an embedded system is sometimes helpful.

machine level instructions are interesting

This will not be interesting unless you find machine level instructions interesting.
Lately *all* I do is machine level instructions.

NOTICE how everyone mentioned in the presentation (last pages) uses their TWITTER addresses. Am I lost in the haze? Does everyone use twitter except me?

http://www.scribd.com/doc/28264000/Descent-into-Darkness-Understanding-your-systems-binary-interface-is-the-only-way-out

Anyway, I like this presentation. It's funny. It has nice pictures. It has great diagrams. And it describes a cool idea. And it talks about x86_64 machine level instructions.

March 26, 2009

Re-creating old CPU designs

Over the years I've done a number of experiments using Verilog, a hardware modeling language. In several of these experiments I have attempted to recreate old CPU designs like the MIT CADR lisp machine and the DEC PDP-8/I. My latest experiment is to recreate the PDP-11, in modern verilog, using modern simulation techniques.

Note that this has been done before. I know of at least 2-3 old microcoded versions and more recently there are 3 other groups which have done this, but in all the cases the code is either not in verilog or is proprietary and closed. Not very helpful.

I have not (yet) delved into SystemC, but I have done some fun work with co-simulation. Most recently I wired my RTL simulation of the pdp-11 in almost-verilog to a "known good" pdp-11 instruction set simulator. The idea is that both the RTL simulation and the instruction set simulator run the same code and at the end of each instruction cycle the results are compared. The "results" are the internal register values, the processor status word and the list of bus operations which occurred (address, type, data).

In a perfect world the two simulations will run in lock step and any deviation is a bug. And this is mostly true. The comparison turns out to be extremely helpful and very valuable.
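
The comparison itself is not much code. Here is a minimal sketch of the idea, not my actual harness: the StepResult record, the field names and the trace values are all invented for illustration. Each simulator produces one record per instruction and the checker reports the first place they disagree.

```python
from dataclasses import dataclass, field


@dataclass
class StepResult:
    """What gets compared after each instruction (hypothetical layout)."""
    regs: tuple          # register values
    psw: int             # processor status word
    bus_ops: list = field(default_factory=list)  # (address, type, data)


def first_divergence(rtl_steps, iss_steps):
    """Walk two traces in lock step; return (index, field) of the first mismatch.

    Returns None when the traces agree for their whole length.
    """
    for i, (rtl, iss) in enumerate(zip(rtl_steps, iss_steps)):
        for name in ("regs", "psw", "bus_ops"):
            if getattr(rtl, name) != getattr(iss, name):
                return i, name
    if len(rtl_steps) != len(iss_steps):
        return min(len(rtl_steps), len(iss_steps)), "length"
    return None


# Made-up two-instruction traces; the second RTL step has wrong flags.
golden = [
    StepResult((0, 2), 0o000, [(0o1000, "read", 0o012700)]),
    StepResult((5, 4), 0o000, [(0o1002, "read", 5)]),
]
suspect = [
    StepResult((0, 2), 0o000, [(0o1000, "read", 0o012700)]),
    StepResult((5, 4), 0o004, [(0o1002, "read", 5)]),
]
print(first_divergence(suspect, golden))
```

Stopping at the first mismatch matters: once the two machines diverge, everything after that is noise, so the instruction index is the thing you want.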

Again, however, this is not new. I learned this technique from others who are smarter than I am.

While attempting to recreate the pdp-11 I ran into a number of interesting problems. The instruction set is fairly simple but it is not RISC. The effective address computations are complex and in many cases doubled. Let me supply an example. Here is a list of the 8 addressing modes. A complex instruction can have a source operand (with one of these addressing modes) and a destination operand (with one of these addressing modes). So, in the worst case you need to compute the effective address and do one or more fetches for the source and destination.

mode  symbol  ea1   ea2       ea3         data           side-effect
0     R       x     x         x           R              x
1     (R)     R     x         x           M[R]           x
2     (R)+    R     x         x           M[R]           R<-R+2
3     @(R)+   R     M[R]      x           M[M[R]]        R<-R+2
4     -(R)    R-2   x         x           M[R-2]         R<-R-2
5     @-(R)   R-2   M[R-2]    x           M[M[R-2]]      R<-R-2
6     X(R)    PC    M[PC]+R   x           M[M[PC]+R]     x
7     @X(R)   PC    M[PC]+R   M[M[PC]+R]  M[M[M[PC]+R]]  x

Seems complex, yes? Each M[] is a memory read. The basic register indirect is simple. But modes 6 & 7 add the side effect of reading additional operand data from the next instruction location. This increments the pc as well as fetching an offset which gets added to the result of a previous EA calculation.
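
The same eight modes fit in a few lines of software, which is a handy reference when checking the hardware. This is a simplified sketch that assumes word operands only (no byte instructions), a list of 8 registers with the PC at index 7 already pointing past the instruction word, and a dict for memory. None of those names come from my RTL.

```python
def resolve_operand(mode, reg, regs, mem):
    """Return (ea, data) for one word operand; mutates regs for side effects.

    regs is a list of 8 ints (regs[7] is the PC, assumed to point at the
    word after the instruction); mem maps word addresses to values.
    ea is None for mode 0, where the operand is the register itself.
    """
    PC = 7
    if mode == 0:                       # R
        return None, regs[reg]
    if mode == 1:                       # (R)
        ea = regs[reg]
    elif mode in (2, 3):                # (R)+ and @(R)+
        ea = regs[reg]
        regs[reg] = (regs[reg] + 2) & 0xFFFF
    elif mode in (4, 5):                # -(R) and @-(R)
        regs[reg] = (regs[reg] - 2) & 0xFFFF
        ea = regs[reg]
    else:                               # X(R) and @X(R)
        offset = mem[regs[PC]]          # extra fetch from the instruction stream
        regs[PC] = (regs[PC] + 2) & 0xFFFF
        ea = (offset + regs[reg]) & 0xFFFF
    if mode in (3, 5, 7):               # deferred: one more level of indirection
        ea = mem[ea]
    return ea, mem[ea]


# Made-up machine state for a quick look at mode 7, @X(R5).
regs = [0, 0, 0, 0, 0, 0o100, 0, 0o1002]
mem = {0o1002: 0o20, 0o120: 0o400, 0o400: 0o500}
print(resolve_operand(7, 5, regs, mem), oct(regs[7]))
```

Counting the mem[] lookups on each path gives the number of bus reads the state machine has to schedule, which is where the "four states for each operand" worst case comes from.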

So, how to implement this? My first thought was a complex state machine. After a while I got frustrated and thought it might be easier just to make a machine which recodes the old pdp-11 instruction into new "risc-like" instructions on the fly. Sort of a just-in-time binary recompilation. I think this is how modern day X86 machines work. The fun idea would be to have several "machines" running ahead and converting the pdp-11 CISC instructions into simple RISC instructions, filling several FIFO's. The RISC engine could then use modern ideas like a multi-stage pipeline, speculative execution and branch prediction. While very cool, I quickly decided that was more complexity than I wanted at this stage.

I do think, however that it might make sense initially to do a simple "recoding engine" and a simple "risc pipeline". I want to do it and compare the gate count to a state machine version.

So, I set out to do a simple state machine version. I tried to compress the states as much as possible but currently feel there has to be a decode state, four states for each operand, an execute state and a write-back state. The four states for each operand can be reduced to as little as one, depending on how the instruction decodes. I tried to eliminate the single EA state for each operand but instructions like:

   mov   @(R5)+,@(R5)+

cause problems. Why? Because the value of R5 is incremented twice, once after each EA calculation. If I did the EA and post-increment in one state I needed to special case the increment (to be 2x) if both registers were equal. And it got to be a big mess. I capitulated, added a state, and reduced the complexity.
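
A few lines of software make the ordering problem concrete. This is a sketch with an invented memory image, word operands only, and the condition codes ignored. The point is that the destination EA has to be computed from the R5 that the source operand already bumped.

```python
def autoinc_deferred(regs, reg, mem):
    """One @(R)+ operand: EA = M[R], then R <- R+2. Returns the EA."""
    ea = mem[regs[reg]]
    regs[reg] = (regs[reg] + 2) & 0xFFFF
    return ea


def mov_autoinc_deferred_both(regs, reg, mem):
    """mov @(R)+,@(R)+ with the same register on both sides."""
    src_ea = autoinc_deferred(regs, reg, mem)   # state: source EA
    dst_ea = autoinc_deferred(regs, reg, mem)   # state: destination EA
    mem[dst_ea] = mem[src_ea]                   # execute + write-back
    return src_ea, dst_ea


# Made-up memory image: R5 points at a table of two pointers.
regs = [0] * 8
regs[5] = 0o1000
mem = {0o1000: 0o2000, 0o1002: 0o3000, 0o2000: 0o1234, 0o3000: 0}

print(mov_autoinc_deferred_both(regs, 5, mem))
print(oct(regs[5]), oct(mem[0o3000]))
```

Done as two sequential steps, R5 ends up at 0o1004 and the word at 0o2000 lands in 0o3000 with no special cases. Folding both increments into one state is what forces the "are the two registers equal" check.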

I should note here that all pdp-11's, except one, are microcoded. And I can see why.

At some point I do want to try an experiment by adding a pre-fetch unit, keeping at least 3 words available and doing the EA calculations in parallel. The EA calculation will stack up (i.e. stall) queuing up for memory reads, but it has the potential for being more efficient, especially if there is a cache which does burst reads and the line size is at least 8 bytes.

I know this might all sound crazy, but I've learned a lot in the process and almost everything I have learned has been useful in my day job.

January 14, 2009

a used 2g iphone is actually cool

I bought a used iphone 2G model for work. I didn't intend to use it as an actual phone. But, as time wore on, I started playing with it, and (mostly) prying it out of the hands of my 10 and 12 year olds and I've grown to like it.

Oddly, I've yet to activate it. I suppose I will soon, but the idea of spending $75/month is a little painful right now. I guess I'm paying $50/month for my current phone, so maybe it's not that much of an increase.

The most fun part for me has been the mp3 player. I have a lot of music on our home file server, all legally paid for and ripped from CD's. Except for the mp3's I downloaded directly from Amazon (THANK YOU Amazon for selling MP3's!). Anyway, all of the music is legit. And now I can load it onto the phone and listen to it as I walk to and from work. Heh - I've become one of those "young people" who wear headphones. But I swear it's great to listen to music on the way to work.

I may have to upgrade to the 3G version, however, if only for the EDGE and GPS. I think the GPS might come in pretty handy.

I'm a little sad because my sleek little MP3 playing Sony Ericsson slide phone is still pretty cool. I didn't intend to cheat. It just happened.

One thing I will mention - using iTunes can be a real pain. It's a very pretty interface but for the first time user it can be pretty confusing. I spent 15 minutes trying to figure out how to get music into it from a file on my hard disk. It should not be that hard.

I should also mention that the apps you can buy (or download for free) from the Apple "app store" are pretty cool. There's a lot of things to browse through. A few too many, to be honest. It would be nice to find some place that reviews the apps and recommends the best ones. Naturally my son (he's 10) immediately downloaded the Star Wars "light saber" app. Even I find it fun after a few cocktails.

Even without activation the 802.11 connectivity makes it a very useful device. I can browse, get weather, stock info, you tube, send email, etc...

All in all, I like it. And I didn't think I would.

January 8, 2009

pcc (portable c compiler) lives again!

Two interesting things happened this week

- the "R" programming language was talked about in the mainstream press.

- I discovered that the openbsd folks are working on using a non-gcc C
compiler (pcc). Turns out in the non-linux unix world there is not so
much love for gcc.

This makes some sense. gcc is huge and hard to work with. When it
compiles it uses every available resource and eats the machine. It's
not that easy to port or maintain. And it keeps getting bigger.

"pcc" on the other hand is 5-10 times faster, generates reasonable code
and is easy to port and work on. Someone is actually maintaining it.
This is the same pcc some will remember from Bell Labs in the '70s, back
when all the world was a pdp-11 (just before all the world became a
vax).

And, as eco systems go, it's good to have more than one option. Linux
is completely dependent on gcc. Netbsd on the other hand, is not.

(I've been working on getting netbsd to run on my vax 11/730 again,
so I've fallen back in love with netbsd. Well, I think we're more like
friends with benefits, but please don't tell linux I've been cheating.)

Anyway, I thought that was interesting. Apparently the pcc maintainer is
planning to add PIC support to pcc this winter which is one of the
missing features needed.