State of the Opteron

A time for reflection
As the new year starts to get up to speed, it's a time for reflection on where
we are, where we've been, and--most importantly for the technology
industry--where we're going. 2004 brought a multitude of changes to the
technology landscape, and nowhere was that more prevalent than in the CPU
industry.

This time a year ago, AMD was the underdog, with lots of expectation riding on
the success of its "Hammer" line. With the workstation and server markets
clearly in mind, AMD intended to make good on its promise to shareholders to
increase average selling price (ASP), and nowhere are margins better than in the
workstation and server market.

With that in mind, Geek.com thought it might be a good idea to see exactly where
we are today in terms of Opteron's capabilities, a sort of "State of the
Opteron" review. Along the way we tried to compare AMD's finest with
contemporary offerings from Intel's P4 and Pentium-M lines, as well as AMD's own
Athlon 64 line. To give things a sense of "where we've come from" perspective,
we're adding an Athlon MP system to the mix as well.

The Contestants
In the AMD corner, we have systems from Appro, Monarch Computer Systems, and
our own personal collection. In the Intel corner we have systems from
longtime Intel standby Dell. Let's meet the contestants!

Starting at the bottom rung of the Opteron ladder is the dual-Opteron 244
workstation, running with 2 GB of DDR400 ECC memory on a Tyan K8W motherboard.
Storage is provided by two Maxtor 250 GB 7200RPM SATA units using the onboard
SATA controller with RAID1 mirroring. Video is provided by a Quadro4 980 XGL
professional OpenGL video card. This system is used regularly for multimedia
production involving 3D rendering, video compression, and DVD authoring. Gigabit
Ethernet is standard.

Next up is the Appro Scorpion system. Sporting dual-Opteron 248 CPUs, 2 GB of
DDR333 ECC memory, and a Tyan K8W motherboard, it put up some formidable scores
and is back for an encore performance. Storage was provided by a single Maxtor
20 GB ATA-100 unit, although the Scorpion chassis can be ordered with gargantuan
SCSI RAID systems if needed. Video came from the popular Quadro4 980 XGL.
Gigabit Ethernet is standard.

New to the list of reviewed systems is the Monarch Computer Systems Empro
Workstation line. Built on the ubiquitous Tyan K8W platform, the system housed
two Opteron 250 CPUs and 2 GB of DDR400 ECC memory. Adding the only SCSI
subsystem to this comparison, the Empro came with four 73 GB Seagate Ultra320
SCSI drives set up in a RAID 0+1 config. The system came bundled with a Quadro
FX 4000 professional OpenGL card, but was replaced with the Quadro4 980 XGL for
testing. Gigabit Ethernet is standard.

Taking a step back from the ultra high-end, high-budget market, we have the
Socket 939 Athlon 64 3200+ "Winchester" CPU. The processor nestles snugly in an
MSI K8 Neo2 Platinum motherboard and plays nicely with a pair of matched Corsair
1 GB DDR400 (a.k.a. PC3200) XMS LL modules. The disk subsystem consists of a
pair of Maxtor 250 GB 7200RPM SATA units set up as JBOD. Typically this system is
overclocked, and has the GeForce 6800 Ultra inside for gaming. However, for
the purposes of reviewing, the system was put back at its stock clock and
outfitted with the now-familiar Quadro4 980 XGL. Gigabit Ethernet is standard.

Last on the AMD list comes the Athlon MP setup, although it's like no Athlon MP
AMD ever sanctioned. Built from two Athlon XP-M "Mobile Barton" CPUs and
utilizing UpgradeWare's XP-TMC multiplier controller, the machine has both
watercooled CPUs running at 2266MHz on an MSI K7D Master-L motherboard. The
platform shows its age, however: it is stuck with 2 GB of DDR266 ECC memory and
lacks SATA entirely. Undeterred, we added SATA in the form of a
PCI-based SIIG SATA adapter and two Maxtor 250 GB 7200RPM SATA units in a RAID1
configuration. Video was provided by the Quadro4 980 XGL. Gigabit Ethernet is
not available on the motherboard, with Fast Ethernet being standard.

Leaving the AMD camp, we come upon some of Dell's finest offerings. Starting at
the top we have a Dell Precision 370 workstation. Featuring a 3.2GHz "Prescott"
core Pentium 4 in LGA775 form factor on an i925 chipset motherboard, it
represents the mid-range of available Intel uniprocessor setups. The reviewed
system included 2 GB of the latest DDR-2 533 memory technology. Storage is
provided by a built-in SATA controller hooked up to two Maxtor 200 GB 7200RPM
SATA drives configured as JBOD. Since the machine features PCI Express, the use
of the Quadro4 980 XGL was not an option for video, so a Quadro NVS 280 PCI-E
video card was used instead. Gigabit Ethernet is standard.

For reasons that shall be explained near the end of the review, we decided to
throw the odd Pentium-M into the mix as well. Coming in the guise of a Dell
Inspiron 8600 laptop, the Pentium-M features a 2 MB L2 cache and runs at 1.6GHz.
Rounding out the unit is a solid 1 GB of DDR333 in two SODIMMs. Storage is
handled by the unit's onboard ATA-100 IDE controller and a single Seagate 60 GB
ATA-100 5400RPM drive. Onboard LAN is limited to Fast Ethernet.

The Tests

You won't find any PC productivity benchmarks in this review. Nor are you going
to find any gaming benchmarks. Why? Quite simply, most of the above workstations
will run you considerably north of US$3,000 to acquire. These systems aren't
designed for use by executive assistants or twitch gamers; they're designed to
do one thing very, very well: crunch numbers.

You also won't see any server-based benchmarks here. Although some of the chips
in this review are used frequently in servers from various vendors, the point of
this review was to look at workstation CPUs as exclusively as possible,
narrowing down the variables wherever possible. Tests like TPC benchmarks are
very I/O-dependent and rely as heavily on disk performance as they do on raw CPU
speed.

As you will see, the focus of this review was to find out just how much, if any,
improvement can be found in the Opteron of today to the exclusion of just about
everything else. Our benchmarks have been selected accordingly.

First up is one of the most popular applications for high-end 3D rendering on
the Windows platform: 3D Studio Max. 2004 has seen the release of version 7 of
this platform, although its late introduction meant our benchmarks must instead
be run with the prior version, 6.0. Service Pack One for Max 6 has been applied,
and all benchmarks have been run with their default options directly from the
sample CD. Five runs of each benchmark were executed, the highest and lowest
dropped, and the remaining three averaged. Note that only rendering benchmarks
were conducted; this isn't an OpenGL video card review, it's a CPU review.
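
The run-averaging rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the script used for the review; the function name `trimmed_mean` and the sample times are hypothetical placeholders, not measured results.

```python
def trimmed_mean(times):
    """Drop the single highest and single lowest run, then average the rest."""
    if len(times) < 3:
        raise ValueError("need at least three runs to drop two and average the rest")
    kept = sorted(times)[1:-1]  # discard the fastest and the slowest run
    return sum(kept) / len(kept)


# Hypothetical render times in seconds (placeholders, not review data).
runs = [41.0, 39.5, 40.2, 44.8, 40.0]
print(trimmed_mean(runs))
```

Dropping the extremes keeps a single stalled run (a disk hiccup, a background task) from skewing the reported figure.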

Following on the 3D graphics bandwagon is the free POV-Ray raytracing renderer,
found at POVRay.org. Although not as easy to use as 3D Studio Max, POV-Ray's
price can't be beat, and its renderer is very advanced. In skilled hands it can
produce content as good as or better than packages costing thousands of dollars.
And, just like 3D Studio Max, it's a CPU hog, making it a perfect addition to
our test suite. Benchmarks were run with the included benchmark.pov test run
with the default configuration.

When people aren't slinging 3D renderings around inside their workstations,
they're very likely crunching some other types of numbers. To that end, we used
the free distributed.net client. The client participates in a number of
computational "challenges" maintained by distributed.net. The two current
projects are brute-forcing a 72-bit RC5 encryption key (RC5-72) and computing 25-mark
Optimal Golomb Rulers (OGR-25). Both projects strain a processor to the utmost,
and are sometimes used as burn-in tools for new systems. Benchmarks were run
using the "benchmark all projects - selected cores" option.
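
For readers unfamiliar with the OGR project: a Golomb ruler is a set of integer marks in which no two pairs of marks are the same distance apart, and an optimal ruler is the shortest one for a given number of marks. The sketch below only checks the defining property for a small ruler; it is not the distributed.net search code, and the function name `is_golomb_ruler` is our own.

```python
from itertools import combinations


def is_golomb_ruler(marks):
    """Return True if every pair of marks is separated by a distinct distance."""
    diffs = [b - a for a, b in combinations(sorted(marks), 2)]
    return len(diffs) == len(set(diffs))


print(is_golomb_ruler([0, 1, 4, 6]))  # a known 4-mark Golomb ruler
print(is_golomb_ruler([0, 1, 2, 4]))  # distances 1 and 2 each occur twice
```

Checking one ruler is cheap. The cost comes from searching the enormous space of candidate rulers, which is why a 25-mark search makes a good CPU stress test.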

In a similar vein, mathematicians take advantage of the uniqueness of the number
"pi" to similarly strain modern CPUs. Pi is an irrational number, meaning it
cannot be expressed as a fraction and has an infinite number of non-repeating
decimal places. Computing pi to absurd lengths has been a staple of computers
for a while, and we used the freely-available SuperPi benchmark to time how long
it took to compute pi to anywhere between 256,000 decimal places all the way up
to 32 million decimal places. SuperPi also stresses the memory subsystem,
working out the entire gamut of CPU/memory interaction.
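
SuperPi is commonly described as using the Gauss-Legendre algorithm, which roughly doubles the number of correct digits with every iteration. The sketch below is a small-scale illustration of that algorithm using Python's `decimal` module. It assumes that description is accurate and makes no attempt to mirror SuperPi's actual implementation; the function name is our own.

```python
from decimal import Decimal, localcontext


def gauss_legendre_pi(digits):
    """Return pi as a string truncated to `digits` decimal places."""
    with localcontext() as ctx:
        ctx.prec = digits + 10  # extra guard digits to absorb rounding error
        a = Decimal(1)
        b = Decimal(1) / Decimal(2).sqrt()
        t = Decimal(1) / Decimal(4)
        p = Decimal(1)
        # Correct digits roughly double per pass, so about log2(digits) passes suffice.
        for _ in range(max(1, digits.bit_length())):
            a_next = (a + b) / 2
            b = (a * b).sqrt()
            t -= p * (a - a_next) ** 2
            a = a_next
            p *= 2
        pi = (a + b) ** 2 / (4 * t)
        return str(pi)[: digits + 2]  # "3." plus the requested decimals


print(gauss_legendre_pi(30))
```

Each pass works on numbers as long as the full result, so at millions of digits the working set spills well out of cache. That would explain why this kind of test leans on the memory subsystem as well as the CPU.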

There is a long list of other benchmarks we would have liked to run, such as
SiSoft Sandra's CPU suite, Sciencemark's offerings, or perhaps a competing 3D
package like Maya. Time limitations on donated systems kept the list somewhat
short.

3D Studio Max test

SINGLEPIPE2

The SINGLEPIPE2.MAX benchmark makes use of a great deal of raytraced reflections
by putting a mirrored teapot in the middle of a room covered with a varied
checkerboard of mirrored and opaque squares. The scene has very simple geometry
and stresses the CPU more than the memory subsystem.

The biggest surprise on the graph should be immediately apparent. The
overclocked Athlon MP system actually beats the dual-Opteron 250 setup, albeit
by a statistically insignificant one-second margin. The results of this test
were run multiple times to make sure there wasn't a flaw somewhere, but the
numbers stayed consistent. The staying power of this platform is simply amazing
given its age, although this test is largely CPU-bound and does not stress the
memory subsystem very much. This will be a factor later.

The single-CPU P4 shows itself to be a remarkable performer as well, coming in
only 64% slower than the dual-2.4GHz Opteron setup, despite having only a single
CPU. The 800MHz clock rate advantage coupled with DDR-2 memory provides a great
deal of computing power.

Similarly, the Athlon 64 3200+ gives a good accounting of itself despite having
half the CPUs of the leader. It took 75% longer than the dual-Opteron 250 combo
to complete the benchmark, despite being 400MHz slower than a single 2.4GHz
Opteron.

Predictably, the Pentium-M comes at the tail end, but it's not as crushing
a defeat as one might expect. Running at a measly 1.6GHz and saddled with DDR333
RAM, the P-M puts in surprisingly good scores, taking a little more than twice
as long to complete the benchmark as the dual-Opteron 250 setup. When you
consider the differences in clock rate, the P-M is actually an amazingly good
performer.

UNDERWATER

The UNDERWATER.MAX benchmark makes use of Max's multipass
rendering feature, with heavy emphasis on raytraced reflections
and refractions.

This graph looks suspiciously similar to the one for SINGLEPIPE2, but
the OC'd Athlon MP setup doesn't pull an upset this time.
Instead, the pair of Opteron 250s takes the crown decisively.
Note the scaling of the Opteron line, as performance is almost
linear with clock speed increases. This will become even more
apparent later when the uniprocessor benchmarks come into play.


MAX5RAYS

Developed originally on 3D Studio Max R5, this benchmark remains
useful even on later versions. It makes use of Max's volumetric
lighting function to calculate light rays shown through text.

The benchmark runs were almost heartbreakingly fast, so fast
that some of these systems almost took longer to load the
benchmark than they did to render it. The graphs are full of
upsets, with the single Athlon 64 3200+ overpowering the
dual-Opteron 244 setup, and the OC'd dual-Athlon MP setup
beating the dual-Opteron 250 setup by a good margin. So odd were
the numbers that everything was re-run multiple times, only to
get the same results. Still, this benchmark is a very simple one
involving little more than light calculations.

CBALLS

CBALLS.MAX is similar to the first SINGLEPIPE2.MAX benchmark in
that it makes use of raytraced reflections. However, the much
simpler scene makes calculations much faster, as is shown on the
graph:

Finally, a graph that seems to show something "expected." The
Opteron 250 setup resumes its place at the head of the table,
with the 248 and 244 setups falling into their predicted niches.
The OC'd Athlon MP setup finds a spot almost identical to the
2.2GHz Opteron 248 setup--not surprising, considering the Athlon
MP is running at 2.26GHz. The Athlon 64 3200+ takes almost
exactly twice as long as the dual Opteron 250--again, not a
terrible surprise since it's making do with half the number of
processors. The P4, however, makes short work of the Athlon 64,
scoring just 85% slower than the leader. The Pentium-M again
brings up the rear, but runs amazingly close to the 3.2GHz P4
and 2GHz Athlon 64, despite having a grossly inferior 1.6GHz
clock rate.

VOL_LIGHT

VOL_LIGHT.MAX makes use of volumetric lighting quite similarly
to the preceding MAX5RAYS.MAX benchmark. However, in this scene
the volumetric light casts a shadow on a plane.

Unlike the oddities found in the previous volumetric lighting
benchmark, this one is more like what we might've expected. In
fact, the graph looks very similar to CBALLS.MAX, albeit with
different values.

STADIUM

STADIUM.MAX is the most extensive benchmark found on the Max CD
in that it stresses much more than just the CPU. The benchmark
is so large it is actually distributed as a ZIP file on the CD
to conserve space. The geometry of the scene is of an indoor
basketball stadium, with every seat modeled and everything
lavishly textured. Just loading it consumed nearly a gigabyte of
RAM on all test systems.

Here is the first major defeat of the OC'd Athlon MP system, and
it's a whopper. Taking 276% longer to complete the benchmark,
the Athlon MP's ancient single-channel DDR266 memory setup is
the Achilles' heel of the entire system. It's beaten about the
head by every system in the test--including the single 1.6GHz
Pentium-M! What's more, the 2.0GHz Athlon 64 3200+ system ties
neck and neck with the dual-Opteron 248, which has two CPUs
running at 2.2GHz each. The uniprocessor P4 is right up there
with the dual-Opteron 250. What's going on here?

Behind the scenes, Max has a lot of things to do before it can
actually render anything, such as geometry setup and some
texture calculations. In the case of this benchmark, the bulk of
the "render" time is actually spent setting up the scene in
memory, and most of these tasks are not multithreaded. Watching
the CPU graphs of the dual-processor systems, one CPU would be
monopolized while the other was largely idle.

This goes a long way towards explaining the results of this
benchmark. The Athlon MP setup is mortally wounded by a slow
system bus and the effective loss of one CPU. The Athlon 64
3200+ may have a slower clock speed, but it also has faster
non-ECC memory with lower latency, allowing it to tie the
dual-Opteron 248 setup. The single P4 with its fantastic DDR-2
manages to put in a good showing against the Opteron, which is
saddled with slower ECC memory and the higher latencies
associated with multiprocessor systems.
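
The effect described here is usually summarized by Amdahl's law: if only part of a job can be spread across CPUs, the serial part caps the overall speedup. The sketch below is a generic illustration. The parallel fractions in it are hypothetical values chosen for demonstration, not measurements of STADIUM.MAX.

```python
def amdahl_speedup(parallel_fraction, cpus):
    """Overall speedup when only `parallel_fraction` of the work scales with CPUs."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cpus)


# Hypothetical fractions (assumptions for illustration only).
for fraction in (0.9, 0.5, 0.2):
    print(fraction, round(amdahl_speedup(fraction, 2), 2))
```

When most of the wall-clock time is single-threaded scene setup, a second CPU adds very little, and memory latency and per-CPU speed decide the outcome.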

POVRAY

POV-Ray is the first of our uniprocessor benchmarks, by which we
mean that all of the remaining tests either make no use of or do
not recognize more than one CPU during benchmarking.
The BENCHMARK.POV file was run with all settings at their
defaults.

With dual CPUs effectively nullified, this benchmark becomes a
referendum on efficient use of clock cycles. The single CPU
systems, heretofore at the tail end of the Max benches, stage a
decent comeback. The Athlon 64 puts in a very good showing,
besting the Opteron 244 soundly and coming very close to the
Athlon MP, which is running 266MHz faster. The P4 also bests the
dual-Opteron 244. The Pentium-M continues to come in last place,
but given its very low clock rate one would've expected it to be
even slower. Keep in mind that although it took 71% longer to
complete the benchmark, its clock is a third slower than the 2.4GHz Opteron
250's, and it is hamstrung with single-channel DDR333 memory.
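
To put the Pentium-M's result on a per-clock footing, the run-time ratio can be divided out against the clock ratio. The sketch below uses the 71% figure quoted above and assumes the comparison is against the 2.4GHz Opteron 250; the function name is our own, and the result is a rough indicator rather than a rigorous IPC measurement.

```python
def relative_work_per_clock(time_ratio, clock_a_ghz, clock_b_ghz):
    """Work done per clock by system A relative to system B.

    time_ratio is A's run time divided by B's run time on the same job.
    """
    throughput_ratio = 1.0 / time_ratio       # A's throughput relative to B
    clock_ratio = clock_a_ghz / clock_b_ghz   # A's clock relative to B
    return throughput_ratio / clock_ratio


# Pentium-M: 71% longer run time, 1.6GHz versus an assumed 2.4GHz Opteron 250.
print(f"{relative_work_per_clock(1.71, 1.6, 2.4):.2f}")
```

A value near 1.0 means the two chips do about the same amount of work per clock cycle in this test.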

DISTRIBUTED.NET

The DISTRIBUTED.NET client version 2.9009.494 was used with
"benchmark all projects - selected cores."



The Pentium 4 architecture does not appear to like this
particular benchmark. Despite its high clock rate, it scores
nearly dead even with the Opteron 244 system running a little
more than half the P4's speed. Even more embarrassing, the
Pentium-M is running at half the speed, yet manages an OGR-25
score only 25% slower. Keep in mind that the Pentium-M is
running with single-channel DDR333 while the P4 has dual-channel
DDR-2 running at 533MHz!

All of the AMD systems seem to like this benchmark, though. The
Athlon 64 3200+ manages to nearly tie the Opteron 248 despite
its slower clock rate, again showing the performance boost when
going from ECC DDR to low-latency, non-ECC DDR. This also
demonstrates how, when given single-threaded tasks, uniprocessor
systems can outperform multiprocessor systems
of similar clock rate. Gamers take note.

SUPERPI

As stated earlier, SuperPi calculates the value of pi to a
specified length. We timed how long it took each system to
compute values of pi from 512,000 digits up to 32 million
digits. This pushes the processor and memory setups very hard.

Finally, we have a benchmark that can make even the strongest of
these systems work feverishly for more than a few minutes, and
the pack is separated almost exactly where we'd expect it to be.
The Opteron systems take their predictable places, again showing
just how linearly the Opteron is scaling with speed increases.
The on-die memory controller contributes significantly to the
stellar scores, whereas the lack of one puts the OC'd Athlon MP
setup behind even the Pentium-M. The P4 and Athlon 64 systems
stay very close throughout the entire range, with the
low-latency memory of the AMD system and the efficient DDR-2
memory of the P4 allowing both to be very competitive.

Conclusion

Lots of numbers, lots of graphs ... what can we deduce from it all?

To start with, the Opteron 250 setup is immensely powerful, definitely
capable of taking pretty much whatever you can throw at it and ask for
more. In fact, the more intense the task, the better it seems to perform
when compared with its contemporaries. Its performance when compared
with other Opterons is extremely linear and predictable: a 10% bump in
clock speed yields almost exactly 10% more performance. Dual CPU scaling
is also quite good.
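
The "linear scaling" claim can be checked with simple arithmetic: under perfect scaling, run time is inversely proportional to clock speed. The sketch below uses the stock clocks of the three Opteron models (1.8, 2.2, and 2.4GHz). The helper names and the sample times in the usage lines are hypothetical, not figures from our graphs.

```python
OPTERON_CLOCK_GHZ = {"244": 1.8, "248": 2.2, "250": 2.4}


def predicted_time(known_time, known_model, target_model):
    """Run time expected on target_model if performance scaled linearly with clock."""
    return known_time * OPTERON_CLOCK_GHZ[known_model] / OPTERON_CLOCK_GHZ[target_model]


def scaling_efficiency(time_slow, time_fast, model_slow, model_fast):
    """Measured speedup divided by the clock ratio; 1.0 means perfectly linear."""
    measured_speedup = time_slow / time_fast
    ideal_speedup = OPTERON_CLOCK_GHZ[model_fast] / OPTERON_CLOCK_GHZ[model_slow]
    return measured_speedup / ideal_speedup


# Hypothetical example: a 120-second run on a 244, projected onto a 250.
print(predicted_time(120.0, "244", "250"))
print(scaling_efficiency(120.0, 92.0, "244", "250"))
```

An efficiency close to 1.0 across all three models is what "extremely linear and predictable" means in practice.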

It should also be noted that the systems performed flawlessly, with no
crashes or odd behavior. Usage was buttery smooth, even when both CPUs
were running at full utilization. The Opteron 250 dominated nearly every
benchmark, coming in second only once. For power users looking for the
best, look no further than a workstation sporting two of them.

Intel's offerings have seen a rather precipitous drop in value lately.
Clock speed increases have pretty much stopped, what with "Prescott" P4s
putting out in excess of 100W of heat after consuming vast quantities of
current. The dark horse here is the Pentium-M, however. Yes, it came
last in almost every benchmark, but the margin of loss was not always
what one might expect given the huge disparities between the P-M and,
say, a dual-Opteron 250. Intel has already released "Dothan" core P-M
parts up to 2.1GHz, and enthusiasts have overclocked them to 2.4GHz with
ease.

The performance of the Pentium-M at these speeds was extremely good,
although still hamstrung by slow DDR333 memory and a poor floating-point
math unit. A dual-Xeon might've put up a very spirited fight in this
comparison, especially in the 3D Studio Max tests--had we been able to
procure one. Alas, perhaps we'll leave that for a later review.

AMD's toughest competition comes not from Intel, though, but from itself
in the form of the excellent Athlon MP. The author's home-brewed OC'd
Athlon MP system represents a pinnacle AMD chose not to pursue, as the
fastest Athlon MP ever made only ran at 2.13GHz.

How high could AMD have gone? The author has reliably tested this system
with both CPUs running near 2.4GHz, as have others. However, this is
more the exception than the rule, and usually requires extreme cooling
measures not suitable for production PCs. AMD may have killed the Athlon
in its prime, but the platform really didn't have much further to go.

Price-wise, the runaway killer of all systems was the OC'd Athlon MP.
The MSI K7D Master-L motherboard can be had for around $180. Athlon XP
2500+ "Mobile Barton" CPUs can be had for as little as $80, and only a
deft touch with a sharp blade is needed to "convert" them into Athlon MPs.
Upgradeware's XP-TMC multiplier controllers are $30 each, and ECC DDR266
remains economical.
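
Tallying the prices quoted above gives a rough floor for the core of the build. The sketch below assumes one multiplier controller per CPU and leaves out memory, cooling, and everything else, since no prices are given for those; the dictionary layout and function name are our own.

```python
# (unit price in USD, quantity), using the prices quoted in the text.
# Two controllers is an assumption: one per CPU.
CORE_PARTS = {
    "MSI K7D Master-L motherboard": (180, 1),
    'Athlon XP 2500+ "Mobile Barton" CPU': (80, 2),
    "Upgradeware XP-TMC multiplier controller": (30, 2),
}


def build_cost(parts):
    """Sum price times quantity over every listed part."""
    return sum(price * qty for price, qty in parts.values())


print(build_cost(CORE_PARTS))  # motherboard + two CPUs + two controllers
```

That works out to $400 before memory and cooling.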

In short, for the cost of a single Opteron 250 CPU and mainboard, you
could build an entire dual Athlon rig that will perform amazingly well.
But there are significant caveats, not the least of which is the
system's "unsupported" configuration. The system may be competitive now,
but it's going to be eclipsed very soon.

For AMD, "Hammer" represents the future. The Athlon platform is
essentially dead, with the 760MPX chipset going almost four years with
no updates--and none are coming. AMD has further stated that Athlon
production will cease in 2005, with all production capacity centered
around Athlon 64s and Opterons. Very soon we'll be seeing dual-cores in
Opterons and even Athlon 64s, with Intel following suit.

The "Hammer" line isn't so much a revolutionary jump over the Athlon
XP/MP--it's more akin to picking up where the Athlon XP/MP left off. We
can lament the passing of the venerable Athlon, but the torch has been
put in more capable hands.
