
Tuesday, November 29, 2011

Intel Xeon Processor Review

At the beginning of 2002, simultaneously with the release of the Pentium 4 Northwood, Intel launched a Xeon based on the Prestonia core (similar to the Pentium 4's) - a new version of its CPU for dual-processor workstations and entry-level servers. The previous Xeon versions (with the Foster core), which also supported dual-processor configurations, were almost impossible to find. Now that a Xeon clocked at 2.2 GHz has arrived at our lab, everything has fallen into place...

Pentium 4 and Xeon: what's the difference?

As you know, the Xeon Foster is often called "Willamette with SMP support", and the Xeon Prestonia is known as "Northwood with SMP support". Within each pair, "Foster - Willamette" and "Prestonia - Northwood", the CPUs are based on the same core, use the same fabrication process and have the same L2 cache size. The processors have different sockets, but the signal assignment diagram of the Socket 603 processor shows that most of the pins are either power supply or ground, i.e. they are not significant.
The Xeon based on the P4 core brings almost nothing new of importance except the Hyper-Threading technology. It has simply acquired some functions of the Pentium III Xeon (Slot 2). Let's examine them.
Processor Information ROM (P.I. ROM). It contains such data as electrical characteristics of the core and L2 cache, processor stepping, e-signature etc.
Scratch EEPROM - Intel offers this chip to OEM companies, who can record whatever data they wish in it. The system can also use it to store data about the computer, the processor, default settings etc. This ROM is a universal solution which is available in any Xeon based system.
Machine Check Architecture (MCA) is a processor subsystem which detects and logs faults in the operation of the system logic. It monitors faults in 5 basic subsystems: the external and internal buses, the cache, the Translation Look-aside Buffer and the Instruction Fetch Unit. The MCA can be used in different ways: for example, information on failures can be read by the server OS.
Hyper-Threading is a technology developed by Intel to increase performance in multitasking systems. It makes one physical processor appear as two logical CPUs by executing two threads in parallel, with each thread using different processor units at the same time (e.g., the ALU and the FPU). This technology first appeared in the Prestonia based Xeon and will probably be supported in Intel's future server processors.
What the developers of the Xeon were aiming for becomes clear if we look at the three hardware units above (P.I. ROM, Scratch EEPROM and MCA). Desktop users may consider them unnecessary, but today we are talking about a sphere where reliability and compatibility are the most important things, and where too much is better than too little. For example, you can install a processor of a new revision onto an old mainboard. But what if it won't work? What if the BIOS can't determine the power supply settings correctly? Taking into account the price of such a system, it's better to let the system use the information from the P.I. ROM. In fact, all these functional units are designed to prevent failures of such an expensive system. This is the key difference between the Pentium III-S and Athlon MP on the one hand, and the Xeon on the other. The first two are, in fact, modifications of desktop processors for SMP systems; the Xeon contains units typical of powerful server CPUs. It is interesting, though, that it is Intel's P-III-S which is positioned as a server processor, while the current versions of the Xeon are meant for workstations.
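To make the Hyper-Threading description above more concrete, here is a minimal Python sketch of how many processors an operating system would enumerate. The function name logical_cpus is a hypothetical helper, not an Intel or OS API, and the model assumes single-core packages, with Hyper-Threading absent on the Foster and enabled on the Prestonia, as described above.

```python
def logical_cpus(physical_cpus: int, hyper_threading: bool) -> int:
    """Number of processors the OS would enumerate (hypothetical helper)."""
    if physical_cpus < 1:
        raise ValueError("need at least one physical processor")
    # Assumption: single-core packages; Hyper-Threading exposes
    # two logical CPUs per physical processor.
    logical_per_physical = 2 if hyper_threading else 1
    return physical_cpus * logical_per_physical


if __name__ == "__main__":
    # Dual-processor boards such as the one tested here hold two packages.
    for name, ht in (("Foster", False), ("Prestonia", True)):
        print(f"dual Xeon {name}: {logical_cpus(2, ht)} logical CPUs")
```

The logical CPUs share the execution units of one physical core, so this count says nothing about the actual speed-up; that depends on whether the two threads really use different units.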

Intel i860 chipset

i860 chipset on the Supermicro P4DC6+. From left to right: Intel 82806AA PCI 64 Hub (Intel P64H), Intel 82860 Memory Controller Hub (MCH), Intel 82801BA I/O Controller Hub (ICH2)
Although the i860 chipset is meant for high-performance workstations, it seems to be more suitable for servers. It is, in fact, an i840 for SMP processors with the Pentium 4 core. The chipset uses the same Accelerated Hub Architecture, works only with RDRAM and has a dual-channel memory controller (RIMM modules are installed in pairs). There is also 64-bit PCI support via a separate Intel 82806AA PCI 64 Hub (Intel P64H).
The Intel i860 is meant for Intel Xeon processors with a 100 MHz bus (the bus is quad-pumped, which is equivalent to 400 MHz). The chipset consists of two main components:
  • Intel 82860 Memory Controller Hub (MCH);
  • Intel 82801BA I/O Controller Hub (ICH2).
Apart from these two, it's possible to use additional chips to extend the chipset's capabilities. Here are the key parameters of the i860:
  • operation with 1 or 2 Intel Xeon processors;
  • up to 2 GBytes RDRAM supported;
  • support of AGP 4X, Ultra DMA/100/66/33, LPC (Low Pin Count) interface and 4 USB ports (v1.1);
  • integrated Intel 10/100 Mbps network controller.
Other standard functions such as ACPI, Suspend-to-RAM/Disk, Wake-on-LAN are also supported.
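The "100 MHz, equivalent to 400 MHz" bus figure quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the 64-bit (8-byte) data bus width is an assumption that the text does not state.

```python
def effective_fsb_mhz(base_clock_mhz: float, transfers_per_clock: int = 4) -> float:
    """Effective transfer rate of a quad-pumped bus, in millions of transfers/s."""
    return base_clock_mhz * transfers_per_clock


def fsb_bandwidth_gb_s(base_clock_mhz: float,
                       transfers_per_clock: int = 4,
                       bus_width_bytes: int = 8) -> float:
    """Peak bus bandwidth in GB/s (decimal), assuming an 8-byte wide data bus."""
    mega_transfers = effective_fsb_mhz(base_clock_mhz, transfers_per_clock)
    return mega_transfers * bus_width_bytes / 1000


if __name__ == "__main__":
    print("effective rate:", effective_fsb_mhz(100), "MHz")
    print("peak bandwidth:", fsb_bandwidth_gb_s(100), "GB/s")
```

Under these assumptions the peak bus bandwidth matches that of the dual-channel PC800 RDRAM the chipset works with, which is one way to see why the platform is considered balanced.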
The branched structure of the chipset supports 6 (!) PCI buses. All of them are marked on the diagram of the chipset (or rather of the Supermicro P4DC6+ mainboard based on this chipset). After we installed additional cards and started the ICDiag utility, all 6 PCI buses could be seen.
The system components (SCSI controller, PCI 32-bit and 64-bit slots) are attached to different buses, which means that data flows are well separated. Such an architecture is very important for a system where data are actively exchanged between components... As such activity is vital for servers, we call the i860 a server chipset.
It is interesting that the AwardBIOS V6.0 Medallion had never been used before either in "heavy" boards or in Supermicro products. However, the CMOS Setup menu is made in the style of Phoenix Technologies, which is typical of server solutions from Intel and Tyan.
After the problems with the i840+SDRAM, Intel wasn't going to couple the i860 with SDRAM. But it wasn't necessary anymore: RDRAM prices had fallen to an acceptable level, and using PC133 SDRAM might have killed the performance of the Xeon processors. The rated 2 GBytes of memory can be raised to 4 GBytes by using the Intel 82803AA MRH-R chip (Memory Repeater Hub - RDRAM) installed on the mainboard between the Memory Controller chip and the two memory channels.

Supermicro P4DC6+ mainboard

Supermicro has 3 dual-processor models on this chipset for the Socket 603: P4DCE, P4DC6 and P4DC6+. All of them are based on a common design in the Extended ATX format, support up to 2 GBytes of RDRAM (4 RIMM slots), have the same set of slots (AGP 4X/Pro 1.5 V, 2 PCI 64-bit/66 MHz slots and 4 PCI 32-bit/33 MHz slots) and an integrated network adapter - Intel 82559. The P4DCE doesn't have a SCSI controller, the P4DC6 comes with an integrated Ultra 160 SCSI chip, the Adaptec AIC-7899W, and the P4DC6+, in addition to the SCSI controller, has a SO-DIMM connector for an Adaptec 2005S card - an inexpensive Zero Channel RAID controller (ZCR) which uses the integrated SCSI chip.
One more difference is that the P4DCE and P4DC6 have VRM units designed as separate daughter cards. At the same time, the Supermicro P4DC6+ which we used for testing the Intel Xeon has both VRMs onboard.
The layout of the P4DC6+ is very good. There are several jumpers to disable the integrated network and SCSI controllers or to set the frequency of the 64-bit PCI segment (33/66 MHz). Graphics is not integrated (which proves once again that the board is designed for workstations). The use of additional power supply connectors (either one 8-pin connector, or two: a 4-pin and an 8-pin) is a distinguishing feature of the Supermicro line based on the i860. Besides, as mentioned above, it's unusual to see an AWARD BIOS on a Supermicro board.
The P4DC6+ ships with two processor coolers (with vertical fins and a fan attached on one side); they are quite noisy (4700 rpm, 16.2 CFM) but efficient. There are also cables (FDD, IDE, 50-pin and 68-pin SCSI), a bracket for the rear panel of the case, a CD-ROM and diskettes with drivers, and a comprehensible user manual.
Iwill and Tyan have also released boards on the i860 chipset, but they are not widely available yet.

Test system configuration and testing technique

We happened to get the new and the old Intel Xeons almost simultaneously, and thus we got a line of processors clocked at 1.7 GHz (Foster), 1.8 and 2.2 GHz (Prestonia). As a result, we were able to test the top Xeon and compare the performance of the Foster and Prestonia running at close frequencies. We tested dual-processor systems, as SMP support is a distinguishing feature of the Xeon. However, we also included a uni-processor configuration with the Pentium 4 2.2 GHz, as it was interesting to compare its performance with that of a Xeon with a similar frequency and core.
A dual-processor system with the Athlon MP 1900+ based on the Tyan Thunder K7 board was used as a competitor. Besides, we included a system with the Pentium III-S 1.26 GHz (Tualatin, 512 KBytes L2 cache) on a ServerWorks ServerSet III HE-SL based board. You might be surprised, as this chipset had awful driver support (in particular, the AGP didn't work, so it was impossible to use the boards for graphics stations). Tyan later even removed the AGP port in the new revision of its ServerSet III HE-SL based board - Thunder HEsl-T.
However, Microsoft was able to make its Windows XP (or, rather, Windows 2000 starting with Service Pack 2) "understand" this chipset. And after that the AGP port on the ServerSet III HE-SL started to work flawlessly, and the platform became a normal solution for graphics stations. So, the board was included in the tests - after all, it's not forbidden to use the server Pentium III-S in workstations - it's just necessary to have an appropriate motherboard. And we do have them - Supermicro left an AGP port in the P3TDE6 board and we took it for the tests. The OS we used was Windows XP Professional.

Test system configurations

CPU                | Intel Xeon        | Intel Xeon        | Intel Xeon        | Intel Pentium 4 | Intel Pentium III-S | AMD Athlon MP
Frequency          | 1.7 GHz           | 1.8 GHz           | 2.2 GHz           | 2.2 GHz         | 1.26 GHz            | 1900+ (1600 MHz)
FSB frequency, MHz | 400               | 400               | 400               | 400             | 133                 | 266
L1 cache, KB       | 16                | 16                | 16                | 16              | 32                  | 128
L2 cache, KB       | 256               | 512               | 512               | 512             | 512                 | 256
Chipset            | Intel i860        | Intel i860        | Intel i860        | Intel i850      | ServerSet III HE-SL | AMD-760MP
Mainboard          | Supermicro P4DC6+ | Supermicro P4DC6+ | Supermicro P4DC6+ | Intel D850MD    | Supermicro P3TDE6   | Tyan Thunder K7
Memory, 512 MBytes | PC800 RDRAM       | PC800 RDRAM       | PC800 RDRAM       | PC800 RDRAM     | Reg'd PC133 SDRAM   | Reg'd DDR SDRAM
Video card (all systems): NVIDIA GeForce3 (ASUS V8200, 64 MBytes DDR SDRAM, Detonator 21.85)
Hard disc (all systems): Seagate Cheetah X15 36LP, 36.4 GBytes, Ultra 160 SCSI
OS (all systems): Windows XP Professional
So, today we have gathered the top processors for different platforms. All the systems were equipped equally (see the table): 512 MBytes of memory of the corresponding type, a 36 GBytes SCSI Seagate Cheetah X15 36LP HDD and an AGP video card based on the GeForce3 (the GF3 copes rather well with the professional OpenGL used in workstation tasks).
We used our standard performance estimation method for high-end systems, with some additions, and applications of the following classes:
  • operation with 2D graphics (script for Adobe Photoshop 6.0.1);
  • 3D modeling - 3D Studio MAX 4.26, Lightwave 7b and A|W Maya 4.0.1;
  • 3D visualization with the professional OpenGL (SPEC ViewPerf 6.1.2);
  • DivX and MP3 encoding (DivX 4.12 and GOGO-no-coda 2.39c), archiving (WinAce 2.11 with a 4096 KBytes library).
Besides, we included two CAD tests with the design engineering applications SolidWorks 2001 and Solid Edge V10. We ran the standard SPECapc tests for SolidWorks 2001 and for Solid Edge V10.

Test results

As you remember, the Xeon "Foster" was never widely available on the market. It shipped only to major assemblers such as Compaq and Dell. And now, when we have its test results in front of us, it is clear why Intel didn't hurry to promote the Xeon Foster. Its rather low performance could have spoiled the reputation of the new family, which is why its promotion was put off until the Xeon Prestonia - a high-performance 0.13-micron core with a doubled cache (whose size matters a lot for the Pentium 4 core).
The results of the dual-processor systems are shown above those of the uni-processor ones and are of the same color. Therefore, the second column for the Pentium 4 2.2 GHz is lacking.

Video and audio encoding, archiving

DivX is a very memory-intensive test, and the memory becomes a bottleneck here (all the P4 based processors have the same results although they work at different frequencies - the maximum gap is 500 MHz). As a video card can't influence conversion from one format to another, only the dual-channel PC800 RDRAM or the hard drive can be the limiting factor. But the results of the Athlon MP and Pentium III-S rule out the HDD. Besides, as you can see, we used both uni- and dual-processor systems although the DivX codec doesn't support SMP. It is interesting that while SMP helps dual systems based on classical cores - followers of the Pentium Pro architecture (Pentium III-S and Athlon MP) - it puts obstacles in the way of the Xeon (P4 core)!
Here the leading group consists of the Pentium 4 2.2 GHz (uni-processor), the Xeon 2.2 GHz and the Athlon MP. The junior Xeon only keeps up with the Pentium III-S, which means that the old horse won't spoil the furrow, unlike the young one.
The WinAce results are quite interesting (the data for the uni- and dual-processor systems are identical): it is the first time that the Athlon MP loses to everyone. We have just one suggestion: if WinAce uses SSE, it could have failed to detect SSE support in the Athlon MP; this sometimes happens if a program is written badly. But it's just a suggestion.

3D modeling programs

The Xeon 2.2 GHz system leads almost everywhere (both uni and dual) except in 3D Studio MAX 4.2, where it is on a par with the Athlon MP 1900+. In the other tests the latter shares second place with the Xeon 1.8 GHz, sometimes outscoring it. It is interesting that in LightWave 7b the Athlon XP/MP lose to the Pentium 4, though in LightWave 6.5 it was the other way round. However, NewTek (the developer of the package) says that the 7th version was changed a lot to take account of the peculiarities of Intel's architecture. Besides, the Pentium 4 2.2 GHz goes shoulder to shoulder with the uni-processor Xeon 2.2 GHz, which proves that their cores are very similar. The Pentium III-S competes successfully with the Xeon "Foster" 1.7 GHz except in LightWave (when a scene is rendered without ray-tracing).

Raster graphics

Adobe Photoshop prefers the Pentium III and Athlon MP to Intel's new core, although there are modified filters supporting SSE2 for the Pentium 4 based processors. But the new Intel processors were able to win thanks to their frequency - the dual Xeon 2.2 GHz outscores the dual Athlon MP, though by a small margin. On average (we used the complete set of Photoshop instructions and filters), SMP doesn't bring a big gain. Most of the filters of this editor are plugins which are developed not only by Adobe, and such a mix of code from different developers doesn't let us hope for considerable SMP optimization. Some modules are able to use the second processor, which is why there is a gain, but there are not many of them, which is why the gain is not big.

SPEC ViewPerf

Here we show the results of the uni-processor systems only, as the dual ones had the same or even lower scores.
AWadvs-04. All the systems are on one level except the Pentium III-S in combination with the ServerWorks ServerSet III HE-SL, which falls behind by 22%. PC133 memory is not enough for applications which make intensive use of OpenGL and texturing. The PC2100 DDR SDRAM (Athlon MP + AMD-760MP) is able to bear the load though it has a smaller bandwidth than the dual-channel PC800 RDRAM in the i850/i860 systems.
DX-06. We don't display the results of this test though it was carried out. The peculiarity of the IBM Data Explorer core (which DX-06 is based on) that we noticed when examining the Pentium 4 Northwood appeared once again: all CPUs on the P4 core with 512 KBytes of L2 cache turned out to be slower than those with 256 KBytes of L2 cache. This situation didn't repeat anywhere apart from DX-06, so there seems to be some error in the program that affects the results of the Northwood/Prestonia based processors, and we decided not to use the test to estimate their performance.
DRV-07. This subtest depends on core performance, memory efficiency and the speed of the 3D accelerator. First place is taken by the Northwood/Prestonia based systems, and the difference between them is not great (it seems that the video card is a bottleneck). The Xeon "Foster" 1.7 GHz falls behind (because of its small L2 cache). The Athlon MP 1900+ lags behind as the throughput of the PC2100 DDR SDRAM (2.1 GBps) is much lower than that of the dual-channel PC800 RDRAM (3.2 GBps). The Pentium III-S has lost because of the lowest core speed and the slowest memory.
MedMCAD-01. Almost all systems are limited by the graphics accelerator's performance, and the Pentium III-S with its PC133 falls behind by a great margin.
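The memory throughput figures quoted in the DRV-07 comments (2.1 GBps for PC2100 DDR SDRAM and 3.2 GBps for dual-channel PC800 RDRAM) follow from simple arithmetic. Here is a small Python sketch; the helper name and the dictionary are illustrative only, and the clock rates and bus widths are the nominal values for these memory types rather than anything measured in our tests.

```python
def peak_bandwidth_gb_s(clock_mhz: float, transfers_per_clock: int,
                        width_bytes: int, channels: int = 1) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal units)."""
    return clock_mhz * transfers_per_clock * width_bytes * channels / 1000


# Nominal parameters (assumptions):
# (clock in MHz, transfers per clock, channel width in bytes, channels)
MEMORY_TYPES = {
    "PC2100 DDR SDRAM": (133.33, 2, 8, 1),
    "dual-channel PC800 RDRAM": (400.0, 2, 2, 2),
}

if __name__ == "__main__":
    for name, params in MEMORY_TYPES.items():
        print(f"{name}: {peak_bandwidth_gb_s(*params):.1f} GB/s")
```

These are theoretical peaks; the real difference between the platforms also depends on memory latency and on the chipset, which this sketch does not model.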

CAD applications (design engineering)

Although we used two packages from different developers, it makes no sense to comment on the performance results separately for SolidWorks 2001 and for Solid Edge V10. As we used tests from SPEC (Standard Performance Evaluation Corporation) in both cases, the categories for performance estimation are the same. We chose three of them: Composite Score, Graphics Score and CPU Score.
Composite Score. In both packages the first places are taken by the Pentium 4 2.2 GHz (in the uni-processor class), Xeon 2.2 GHz and Athlon MP, while the Xeon 1.7 GHz and Pentium III-S are at the tail end. The Xeon 1.8 GHz keeps ahead of the Xeon 1.7 GHz in SolidWorks, while in Solid Edge there is almost no difference between the two, although the 1.8 GHz model has twice as much cache. By the way, SMP optimization is almost lacking: the speed of the dual systems is just a little higher.
Graphics Score. The difference between the systems on different CPUs is the most considerable here! The same video card was used, and yet the speed of the graphics system differs by as much as a factor of two. However, the leaders remain the same (they even widened the gap): the Pentium 4 2.2 GHz, Xeon 2.2 GHz and Athlon MP.
CPU Score. Solid Edge prefers the classical architecture - the Pentium III-S outscored even the faster Athlon MP in performance of the computational system (but keep in mind that the P-III-S has twice as much L2 cache). Only the Xeon 2.2 GHz was able to outperform it. SolidWorks 2001 has different preferences: Pentium 4 2.2 GHz, Xeon 2.2 GHz and Athlon MP.

Conclusion

The new processor Intel has released for dual-processor systems is rather successful. Intel now has one more highly efficient processor which, like the Pentium 4, supports SSE/SSE2, has a large L2 cache and doesn't dissipate too much heat, and which also supports SMP. There is also the very powerful Intel i860 chipset, which allows assembling modern and balanced systems, and there are high-quality and powerful boards on it like the Supermicro P4DC6+. When the Intel Xeon starts shipping, it will be much simpler to choose a platform for high-performance workstations.
But you should be aware that it makes no sense to build uni-processor systems on the Xeon. The performance of uni-processor computers on the Xeon and the Pentium 4 is the same, while the Xeon+i860 platform is much more expensive. At the same time, a dual-processor system based on the Xeon is almost twice as fast as a Pentium 4 "Northwood" system of the same frequency.
The server Pentium III-S 1.26 GHz looks pale next to the Xeon, though in some applications this platform is on a par with the Xeon 1.8 GHz. The Athlon MP has a performance level comparable with the Xeon's, and many motherboard makers have already announced support for dual Athlons and are developing AMD-760MPX based boards, but currently the market can offer only Tyan's boards.
The Xeon with the Pentium 4 core looks rather promising. This year the company will release new versions working at the FSB frequency of 533 MHz, and chipsets supporting DDR memory. Besides, in the first quarter of 2002 we will get a new version of the Intel Xeon equipped with L3 cache and supporting 4-processor configurations. 
