Acorn
Acorn was a computer manufacturer in England. In the 1980s it built the famous BBC computer, Second Processors, the Acorn Cambridge Workstation and some other systems. Its most important legacy is the Acorn RISC Machine, which is well known today as the ARM processor.
Somehow the Series 32000 is part of the ARM genesis. In 1983 Sophie Wilson of Acorn showed that a 4 MHz 6502 could outperform an 8 MHz NS32016. She identified the importance of memory bandwidth. This result led Acorn to start the development of the ARM processor. If she had waited a few more years until the arrival of the NS32532, today's smartphones would use a Series 32000 processor - ok Udo, wake up!
32016 Second Processor
The BBC Micro computer was a very successful system in the UK of the 1980s. In Germany it was known under the label of its manufacturer Acorn. It was based on the 6502 microprocessor running at 2 MHz. A lot of websites about the BBC Micro demonstrate that the system is still loved by its users.
What made the BBC Micro Model B very interesting was its expansion concept. One could add another processor to the system, and the original one became an I/O controller. The additional processors were called second processors. They were placed in a separate box, contained their own power supply and were connected to the BBC Micro via a 40-pin ribbon cable. Boxes were built with different processors like the 6502, the Z80 and the NS32016. For me the NS32016 version is of interest, and the figures below show such a system.
I got the photos from the website wouter.bbcmicro.net. Thanks to the provider for his support! If you want to know more about the BBC Micro you should visit this website.
Fig. 1. The outside of the 32016 Second Processor box.
Fig. 2. The power connector of the 32016 Second Processor box.
Fig. 3. The inside of the 32016 Second Processor box.
The processor chip set runs at only 6 MHz. I guess that this decision was made because 10 MHz parts were very expensive. The CPU and FPU of this board are still marked in NS16000 style. The empty space below the CPU (IC3) is reserved for the NS32082 MMU (IC2). Inserting this device could enable the use of Unix. The NS32202 ICU was left out. Maybe there were not enough interrupt sources to justify the additional cost of the ICU.
The main memory in the right half of Figure 3 is 1 Mbyte built from 32 TI 256-kbit chips. Two EPROMs contain the BIOS (or something like that) called PANDORA. Having its own ROM on board is a clear difference from the coprocessor boards for the PC like the one from Systems/Opus.
I could not identify any PALs. The glue logic is made of TTL chips. Therefore it is a simple system design with one exception.
The exception sits in the lower left corner of the PCB. It is the chip from Ferranti, which is a ULA = Uncommitted Logic Array. Another name for this kind of chip is gate array. The customer of a gate-array vendor defines the function and therefore the connections between prefabricated transistors or gates. The designers at Acorn implemented the complete interface to the host in this chip. The 40-pin ribbon cable is connected to it. The interface has its own name: it is called the Tube. You can find information about the Tube in the English version of Wikipedia. Every second processor had its own Tube. The host had only a simple bus interface at the other end of the ribbon cable, occupying 32 bytes of the 6502 address space.
More information about the application of the Tube can be found in the Tube User Guide of Acorn Computers. The schematic diagram of the system can be downloaded here. Please note that IC6 is wrongly named TBP24510. The correct name is TBP24S10, and the device is a 256 words by 4 bit PROM from Texas Instruments. But it doesn't matter since IC6 is "NF" = "Not Fitted" (see the empty place in the upper left corner in Figure 3).
A nice detail in Figure 3 is the position of jumper LK14. The jumper is labeled "1MB", but it is inserted in the lower position as if 1 Mbyte of DRAM were not installed. Compare this to jumper LK12, which is labeled "1MB" too.
After looking around on the internet I believe that Acorn Computers built a lot of these systems. Many of them still exist today. Some were upgraded to 10 MHz, some got an NS32082 MMU. I will try to get one. In January 2016 one was offered on eBay. It went for over 400 €! Nice price for a 30-year-old system.
Fig. 4. The inside of one of John's 32016 Second Processor boxes. This one has the NS32082 MMU installed. The photo is available in 4 times higher resolution here.
Fig. 5. A prototype of the 32016 Second Processor still exists today. The brother of the owner worked for Acorn and took it out of the bin.
The 32016 Second Processor in Figure 5 has 640 kbytes of memory. The upper row of DRAM chips uses 256-kbit devices from Hitachi and the lower row uses 64-kbit devices from Fujitsu. The device with the red label is the NS32016 CPU. IC2 is the NS16081 FPU. The last device with a golden cap is the NS32201 TCU. The chip set runs at 6 MHz. The socket of IC3 is empty. This is the place for the NS32082 MMU.
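The 640 kbytes can be cross-checked with a little arithmetic. The following is only a sketch: the number of chips per row is not stated in the text, so `CHIPS_PER_ROW = 16` is an assumption chosen for illustration, and the names are hypothetical.

```python
def chip_bytes(kbit: int) -> int:
    """Capacity of one DRAM chip in bytes (1 kbit = 1024 bits)."""
    return kbit * 1024 // 8

# Assumption, not stated in the text: each row holds 16 DRAM chips.
CHIPS_PER_ROW = 16

upper_row = CHIPS_PER_ROW * chip_bytes(256)  # 256-kbit devices (Hitachi)
lower_row = CHIPS_PER_ROW * chip_bytes(64)   # 64-kbit devices (Fujitsu)

total_kbytes = (upper_row + lower_row) // 1024
print(f"upper row: {upper_row // 1024} kbytes")
print(f"lower row: {lower_row // 1024} kbytes")
print(f"total:     {total_kbytes} kbytes")
```

Under this assumption the two rows contribute 512 kbytes and 128 kbytes, which matches the 640 kbytes of the prototype.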
Acorn Cambridge Workstation
In August 1985 Acorn announced a new computer: the Acorn Cambridge Workstation (ACW). The system design was similar to a BBC computer with a 32016 Second Processor. But the mechanical design was very different. A 12-inch colour or monochrome tube was the center of the system. Below the tube a 640 KB floppy disc drive and a 20 MB Winchester were placed. Below the drives the 6502 I/O processor board was placed.
Acorn made a nice advertisement for their new product which can be seen at en.wikipedia.org.
The main memory for the NS32016 was expanded from 1 MB in the Second Processor to 4 MB. This was an expensive extension. The price of the basic configuration of the ACW with 1 MB and software was £5845. Each additional MB cost £1000. One MB was made of 32 256-kbit DRAM devices. In 1986 I bought such chips for my computer and I paid 7.70 DM plus tax for one device ...
In August 2016 Ed visited The Centre for Computing History in Cambridge, England. There he saw a (not completely) functional ACW. He took many photos and some of them are presented here. It must have been a nice and interesting trip - something to remember if I am ever in England. Thanks to Ed for his contributions to this website!
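The chip count and the chip price can be turned into a quick calculation. This is a minimal sketch using only the numbers quoted in the text (32 devices of 256 kbit per MB, 7.70 DM per device before tax); no exchange rate between DM and pounds is assumed, so the result is not directly comparable to the £1000 per MB.

```python
CHIPS_PER_MB = 32          # from the text
CHIP_KBIT = 256            # from the text
PRICE_PER_CHIP_DM = 7.70   # 1986 price from the text, before tax

# 32 chips x 256 kbit, converted from bits to bytes
bytes_per_bank = CHIPS_PER_MB * CHIP_KBIT * 1024 // 8

# Chip cost only, without PCB, sockets, tax or assembly
chip_cost_per_mb_dm = CHIPS_PER_MB * PRICE_PER_CHIP_DM
chip_cost_4mb_dm = 4 * chip_cost_per_mb_dm

print(f"one bank holds {bytes_per_bank} bytes")
print(f"chips for 1 MB: {chip_cost_per_mb_dm:.2f} DM")
print(f"chips for 4 MB: {chip_cost_4mb_dm:.2f} DM")
```

At the 1986 price the chips for one MB come to 246.40 DM before tax.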
Fig. 6. The Acorn Cambridge Workstation - a compact and powerful computer.
Fig. 7. A look inside the Acorn Cambridge Workstation.
Fig. 8. The Acorn Cambridge Workstation uses an interesting mechanical design. The boards on the sides can swing open to get an easy access.
Fig. 9. The 6502 I/O processor board is placed below the monitor tube.
The drawback of the compact design of the machine must be heat transfer. The tube alone generates a lot of heat. In Figure 9 I can see one little blue fan. Its location is less than ideal. Is it sufficient? I believe that the outside of the case must be warm (if not hot) during operation.
Fig. 10. The workhorse of the Acorn Cambridge Workstation - the NS32016 board with 4 MB of main memory.
The speed of the NS32016 was limited to 8 MHz instead of the possible 10 MHz. There could be different reasons for this decision:
- cost: 8 MHz parts are less expensive than 10 MHz parts,
- power: 8 MHz parts use less power = less heat,
- design: driving 128 DRAMs with no wait states is easier at 8 MHz compared to doing the same at 10 MHz operation.
Maybe for Acorn all of the above points favored the 8 MHz option. Of course it means less performance for the customer.
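The design argument can be illustrated with the clock periods. This is only a sketch: how many clock periods the board allows for one DRAM access is not stated in the text, so the code compares the length of a single clock period and derives the DRAM count from the 32 chips per MB mentioned earlier.

```python
def clock_period_ns(frequency_mhz: float) -> float:
    """Return the length of one clock period in nanoseconds."""
    return 1000.0 / frequency_mhz

MEMORY_MB = 4        # main memory of the ACW
CHIPS_PER_MB = 32    # 256-kbit devices per MB

dram_count = MEMORY_MB * CHIPS_PER_MB
period_8 = clock_period_ns(8)
period_10 = clock_period_ns(10)

# Extra time per clock period at 8 MHz, relative to 10 MHz
extra_time_percent = 100.0 * (period_8 - period_10) / period_10

print(f"{dram_count} DRAM devices")
print(f"8 MHz:  {period_8:.0f} ns per clock")
print(f"10 MHz: {period_10:.0f} ns per clock")
print(f"8 MHz leaves {extra_time_percent:.0f} % more time per clock")
```

Every clock period is 25 ns longer at 8 MHz, which relaxes the timing for address buffers and DRAM access.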
Fig. 11. A detailed view of the NS32016 CPU cluster of the Acorn Cambridge Workstation.
The NS32016 CPU (below the label TESTED), the NS32081 FPU and the NS32201 TCU use heat sinks. This is no mistake even for 8 MHz parts which may get hot in this environment. The DM81LS95AN device is an octal buffer with tri-state outputs.
Fig. 12. A detailed view of the Winchester controller. Interestingly, it is split across two boards.
On the left board in Figure 12 an Intel P8085AH microprocessor can be found. Together with the Adaptec chip set it forms an intelligent controller for the Winchester hard disk drive. An intelligent controller takes some load off the 6502 I/O processor. The main job of the 6502 is the alphanumeric and graphical interface to the user of the system.
Fig. 13. Good news at the end: the old ACW is still running in 2016!
32016 Second Processor based on an FPGA and M32632
Shortly after I learned about the 32016 Second Processor, the idea came up to implement the system in an FPGA together with the M32632. I quickly learned that such systems already exist for other processors. In October 2015 I came into contact with John Kortink, who turned out to be the ideal partner for this project. He has all the old BBC hardware and software and some 32016 Second Processors. In addition he has built 6502 and Z80 Second Processors with the same FPGA hardware that I use, the DE0-Nano from Terasic. His homepage is www.zeridajh.org.
Because of the distance of 900 km between us we decided to share the work. I built the FPGA content and sent him the FPGA image, and he tested it. This procedure started in November 2015. To my great disappointment nothing worked except for the welcome message.
In December 2015 John found out that the BASIC interpreter bas32 uses a forbidden instruction: the opcode MEI (multiply extended integer) with an odd register as destination. Nevertheless the NS32016 produces something useful (it switches the register usage), but the NS32532 generates nonsense. It is therefore possible to make the M32632 compatible with both the NS32016 and the NS32532.
At the beginning of January 2016 I found a bug. John and I hoped that the job was done. But again programs running under PanOS were crashing. At last, on 30 January 2016, after two weeks of debugging, I found another bug in the M32632. I corrected it and this time John's e-mail said "!!!!! EVERYTHING WORKS !!!!!". This was a great moment.
Two techniques are very helpful in finding bugs. One is to compare the content of the main memory between the original machine and the new machine. The other is to build special hardware into the FPGA to track a certain behavior of the CPU, for example to see what the CPU writes into selected areas of memory and what the program counter address is at that moment.
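The first technique, comparing main memory between the two machines, could look like the following sketch. It assumes that both machines can deliver a dump of the same address range as raw bytes; how the dumps were actually obtained is not described here, and the function name and the sample data are hypothetical.

```python
from typing import Iterator, Tuple

def diff_dumps(reference: bytes, candidate: bytes,
               base_address: int = 0) -> Iterator[Tuple[int, int, int]]:
    """Yield (address, reference_byte, candidate_byte) for every mismatch."""
    if len(reference) != len(candidate):
        raise ValueError("dumps must cover the same address range")
    for offset, (ref, cand) in enumerate(zip(reference, candidate)):
        if ref != cand:
            yield base_address + offset, ref, cand

# Hypothetical sample data standing in for two real memory dumps
original = bytes([0x00, 0x11, 0x22, 0x33])
fpga = bytes([0x00, 0x11, 0x2A, 0x33])

mismatches = list(diff_dumps(original, fpga, base_address=0x1000))
for address, ref, cand in mismatches:
    print(f"{address:06X}: original={ref:02X} fpga={cand:02X}")
```

The first differing address is a good starting point: it shows where the two machines began to diverge, and the hardware tracker in the FPGA can then be pointed at that memory area.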
Fig. 14. John's BBC Master and the DE0-Nano containing the M32632 connected to it.
Figure 14 shows the hardware for the project. The BBC Master, based on an enhanced 6502 microprocessor, is huge: 46.7 cm wide, 34.5 cm deep and 7.5 cm high. The DE0-Nano measures only 7.6 cm x 4.9 cm. John offers the configuration files for the DE0-Nano at Soft 32016 Second Processor. He also sells the interface board. The M32632 is running at 50 MHz and the SDRAM is running at 100 MHz.
The last test of the FPGA hardware was the Linpack benchmark - see also Linpack at M32632/Performance. After modifying the old Fortran source code of the 1990s (Fortran 95) for the much older Acorn Fortran compiler of the 1980s (Fortran 77) the program runs perfectly. Aside from a small change from dfloat() to dble(), most of the effort went into the time measurement. The 32632 Second Processor (as I like to call it) with active cache achieves around 1.3 Mflops. Running without cache it performs at 450 kflops. What do you think is the number for the original machine running at 6 MHz? 50 times less - just 26 kflops.
The assembler output of the Acorn Fortran compiler is not as fast as that of the GNU Fortran compiler. On a NetBSD system the same hardware achieves 2.163 Mflops. For example, Acorn uses the opcode CXP (Call External Procedure) for a subroutine call instead of the faster opcode BSR (Branch to Subroutine).
But every running Fortran compiler for Series 32000 is remarkable. And don't forget that this system has also a C compiler, a Pascal compiler and even a Lisp compiler! This machine and its software must be preserved!
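The Linpack figures quoted in this section can be put side by side. The sketch below uses only the numbers from the text; the dictionary keys are hypothetical labels, and 1.3 Mflops is the rounded value given for the cached case.

```python
# Linpack results from the text, in kflops
results_kflops = {
    "original 6 MHz": 26.0,
    "M32632 no cache": 450.0,
    "M32632 with cache": 1300.0,
    "M32632 NetBSD GNU Fortran": 2163.0,
}

def ratio(faster: str, slower: str) -> float:
    """How many times faster one configuration is than another."""
    return results_kflops[faster] / results_kflops[slower]

speedup_vs_original = ratio("M32632 with cache", "original 6 MHz")
cache_gain = ratio("M32632 with cache", "M32632 no cache")
compiler_gain = ratio("M32632 NetBSD GNU Fortran", "M32632 with cache")

print(f"FPGA with cache vs. original: {speedup_vs_original:.0f}x")
print(f"gain from the cache:          {cache_gain:.2f}x")
print(f"GNU vs. Acorn Fortran:        {compiler_gain:.2f}x")
```

The cache contributes a factor of roughly 2.9, and the GNU Fortran compiler on NetBSD is about 1.7 times faster than the Acorn compiler on the same hardware. The second comparison also includes the different operating system, so it is not a pure compiler comparison.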
Downgrading the M32632 to NS32016
In January and February 2016 a small team of enthusiasts in England tried to realize a 32016 Second Processor in a very small FPGA. The hardware, which is called Matchbox-LX9 (see Figure 16), already existed and was used for some other Second Processors. It is based on the Xilinx FPGA XC6SLX9 which is a member of the Spartan-6 family. Its capacity is 16 DSP blocks, 32 RAM blocks and 5,720 LUTs. Main memory of the Matchbox-LX9 is 2 MB SRAM. A schematic and PCB layout of the board can be found here.
The 6-input LUT of the Spartan-6 is more powerful than the 4-input LUT of the Cyclone-IV. But it was obvious that there are not enough LUTs to implement an M32632 with all features. The first feature to be left out was the MMU because the original hardware didn't have the NS32082 MMU. The second feature to be eliminated was the cache. This slows down the CPU. But with a fast 32-bit-wide asynchronous SRAM as main memory the M32632 is still faster than the original.
Unfortunately this was not sufficient. The only chance to succeed was to kill the FPU. This is sad because some programs will not run. In the end the downgraded M32632, the Tube, a ROM and the SRAM interface need all of the DSP blocks, 22 RAM blocks and 5,138 LUTs. A detailed description of the modifications in the source code of M32632 can be found here. The modified source code is available at GitHub. The CPU runs at a clock frequency of 32 MHz.
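How tight the fit is can be seen from the utilization of the XC6SLX9. This is a minimal sketch using only the resource numbers quoted in the text; the dictionary names are hypothetical.

```python
# Capacity of the XC6SLX9 as given in the text
available = {"DSP blocks": 16, "RAM blocks": 32, "LUTs": 5720}

# Resources needed by the downgraded M32632, Tube, ROM and SRAM interface
used = {"DSP blocks": 16, "RAM blocks": 22, "LUTs": 5138}

utilization = {name: 100.0 * used[name] / available[name]
               for name in available}
free_luts = available["LUTs"] - used["LUTs"]

for name, percent in utilization.items():
    print(f"{name:10s} {used[name]:5d} of {available[name]:5d} = {percent:5.1f} %")
print(f"free LUTs: {free_luts}")
```

Roughly 90 % of the LUTs are occupied and only 582 remain free, which shows why neither the FPU nor the cache could stay in the design.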
With all changes the M32632 is very similar to the NS32032 CPU which is the 32-bit bus version of the NS32016. But during software testing it turned out that the modified M32632 is still better than the NS32016 without the NS32081 FPU. The opcodes MOVF and MOVL (move floating point float/long) are still functional in the M32632. This allows PanOS to run.
There is a website available to read more about the activity of the team: www.stardot.org.uk/forums
This was the first system that uses only the I/O interface of the M32632 for all accesses. As a result a bug was found in the I/O controller. But now I believe that the M32632 is free of simple bugs ...
Fig. 15. Another BBC Master together with the Matchbox-LX9 FPGA board.
Fig. 16. A very close look at the Matchbox-LX9 FPGA board. A second ISSI SRAM is on the underside.
Fig. 17. Proud moment: the screenshot of the successful operation of the downgraded M32632.
Performance Comparison
CLOCKSP is a common benchmark in the BBC Micro world. The results are in comparison to a 2 MHz 6502 processor. There are two sets of numbers. One is for bas32 without floating point hardware support and the other is for bas32f which uses the FPU. This benchmark is used here to compare different implementations of the Series 32000 architecture.
Fig. 18. The bas32/bas32f results for the 32016 Second Processor: 6 MHz and no cache. Thanks to John for the numbers.
Fig. 19. The bas32/bas32f results for the DE0-Nano: 35 MHz and active cache.
Fig. 20. Different bas32/bas32f results from John for 35 MHz and new results for 50 MHz.
Fig. 21. Results from John for Linpack for 35 MHz and 50 MHz.
Fig. 22. The bas32 result for the Matchbox-LX9: 32 MHz and no cache.