Response to Jeff Savit Blog

As part of the z10 announcement, IBM made some marketing claims about the large number of distributed Intel servers that could be consolidated with z/VM on a z10. The example cited used Sun rack-optimized servers with Intel Architecture CPUs. Sun blogger Jeff Savit objected strenuously to the claims, mainly because of the low utilization assumed for the Sun machines used in the comparison. You can read it here:

http://blogs.sun.com/jsavit/entry/no_there_isn_t_a

I responded, he responded. Then I was out of pocket for a while and did not respond soon enough, and his blog cut off replies on that thread. I am putting my latest response here. Thanks to the Mainframe blog for providing the venue to do so. My latest responses to Jeff are in blue italics.

Posted by Joe Temple on June 24, 2008 at 11:28 AM EDT #

This format is very difficult for parry and riposte, but let's try. I would like to use different colors, but I can't (AFAIK) put in HTML markup to permit that. So: Joe's stuff verbatim within brackets, and each of his sections starts with a quote of a sentence of mine (which I identify, within quotes) for context. Each stanza is identified by name and employer (this is Jeff speaking):

Joe(IBM): [[[Jeff, your post is rather long and rather than build a point by point discussion too long for a single comment I will put up several comments. Starting with the moral of the story: There are several: • quoting Jeff: "Use open, standard benchmarks, such as those from SPEC and TPC."

Better to use your own. Your own workloads have not been hyper-tuned and specifically designed for, so they have a better chance of representing reality. But be careful not to measure wall clock time on “hello world”, or laptops will beat servers every time.]]]

Jeff(Sun): In a perfect world, every customer would have the opportunity to test their applications on a wide variety of hardware platforms to see how they perform. But they don't, and they rely on open standard benchmarks to give them some information about how the platforms would perform. Or, they do have applications they could benchmark, but they're non-portable, or run solely on a single CPU (making all non-uniprocessor results worthless), or otherwise have poor scalability or any of a hundred other problems. Imagine comparing IBM processors based on the speed of somebody writing to tape with a blocksize of 80 bytes! Even if they get a useful result, the next customer doesn't benefit at all and has to start from scratch. It's not trivial to make good benchmarks that aren't flawed in some way. That's why the benchmark organizations exist - to provide benchmarks that characterize performance and give a level playing field for all vendors. IBM, Sun, and others are active in them - our employers must think they have value. Obviously there is "benchmarketing" and misuse of benchmarks. THAT is what I'm railing against. Hence, my following bullet that says "read and understand". But frankly, benchmarks like SPECweb/SPECwebSSL/SPECjvm, the SPEC fileserver benchmarks, and benchmarks like TPC.org's TPC-E provide representative characterization of system performance (with sad exceptions like TPC-C, which is broken and obsolete, but IBM still uses for POWER). The characterization of TPC-C as "old and broken" may have something to do with Sun's inability to keep up on that benchmark. One of the characteristics of TPC-C that none of the other benchmarks has is that it has at least some "non-local" accesses in the transactions. Sun's problem with this is that such accesses defeat the strong NUMA characteristic of their large machines. One of the results of this is that all machines scale worse on TPC-C than on the benchmarks Jeff cites. Since Sun is very dependent on scaling a large number of engines to get large machine capacity close to IBM's machines, they are highly susceptible to this. The effect is exacerbated by NUMA (non-uniform memory access). That is, a flat SMP structure will mitigate this. The mainframe community's problem with TPC-C is that the non-local traffic is all balanced and a low percentage of the load. As a result TPC-C still runs best on a machine with a hard affinity switch set and does not drive enough cache coherence traffic to defeat NUMA structures. When workload runs this way it does not gain any advantage from z's schedulers, shared cache, or flat design. Think of TPC-C as a fence. There is workload on Sun's side and there is workload on the mainframe side of TPC-C. All the industry standard benchmarks sit on Sun's side and scale more linearly than TPC-C. For workloads that are large enough to need scale and that run on the Sun side of the TPC-C fence, IBM sells System p and System x. When you consolidate disparate loads the industry standard benchmarks do not represent the load, and with enough "mixing" the composite workload will eventually move to the mainframe side of the TPC-C fence. See Neil Gunther's Guerrilla Capacity Planning for a discussion of contention and coherence traffic and their effect on scale. In particular, read chapter 8 to get an idea of how the benchmarks lead to overestimation of scaling capability. A lot of people have worked very hard to make them as good as they are.
IBM uses these benchmarks all the time - with the notable exception of System z. System z is designed to run workloads with non-uniform memory access patterns, randomly variable loads, and much more serialization and cache migration than occurs in the standard benchmarks, where strong affinity hurts rather than enhances throughput. It is the only machine designed that way (large shared L3 and only 4-way NUMA on 64 processors). Also, the standard benchmarks are generally used for "benchmarketing". As a result the hard work involved is not purely driven by the noble effort of technical folks that Jeff portrays, but rather by practical business needs, including the need to show throughput and scale in the best possible light. That's the point, isn't it? It works in a monopoly priced marketplace where it doesn't have to compete on price/performance, as it does with its x86 and POWER products. Where else are you going to run CICS, IMS, and JES2? There are alternatives to System z on all workloads; it is a matter of migration costs versus the benefits of moving. Many applications have moved off CICS and IMS to UNIX and Windows over the years. Sun has whole marketing programs to encourage migration. In fact a large fraction of UNIX/Windows loads do work that was once done on mainframes. As a result the mainframe must compete. Similar costs are incurred moving work from any UNIX (Solaris, HP-UX, AIX, Linux) to z/OS, or moving from UNIX to Windows. The other part of the barrier is the difference in machine structure. This barrier is workload dependent. Usually, when considering two platforms for a given piece of work, one of the machine structures will be a better fit. When moving work in the direction favored by the machine structure difference, the case can be made to pay for the migration. This is what all vendors do. Greg Pfister (In Search of Clusters) suggests that there are three basic categories of work: Parallel Hell, Parallel Nirvana, and Parallel Purgatory. I would suggest that there are three types of machines optimized for these environments (blades in Nirvana, large UNIX machines in Purgatory, and mainframes in Hell). To the extent that workload is in parallel hell, the barrier to movement off the mainframe will be quite high. Similarly, attempts to run purgatory or nirvana loads on the mainframe will run into price and scaling issues. IBM asserts that consolidation of disparate workloads using virtualization will drive the composite workload toward parallel hell, where the mainframe has advantages due to its design features, mature hypervisors, and machine structure.
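Gunther's contention and coherence terms make the scaling argument above concrete. The sketch below computes relative capacity with his Universal Scalability Law; the parameter values are illustrative assumptions, not measurements of any particular machine or benchmark:

# Gunther's Universal Scalability Law (USL):
#   C(N) = N / (1 + sigma*(N - 1) + kappa*N*(N - 1))
# sigma models contention (serialization); kappa models coherence
# (cache-to-cache) delay. The values below are illustrative assumptions,
# not measurements of any real machine or benchmark.

def usl_capacity(n, sigma, kappa):
    """Relative capacity of an n-way system under the USL."""
    return n / (1.0 + sigma * (n - 1) + kappa * n * (n - 1))

for n in (1, 8, 16, 32, 64):
    benchmark_like = usl_capacity(n, sigma=0.002, kappa=0.00001)  # tuned, affinity-friendly load
    mixed_virtual = usl_capacity(n, sigma=0.05, kappa=0.001)      # consolidated, mixed load
    print(f"{n:3d}-way  benchmark-like: {benchmark_like:5.1f}x  mixed: {mixed_virtual:5.1f}x")

With near-zero coefficients the curve is almost linear, which is how the standard benchmarks behave; raise the coefficients even modestly, as mixed and virtualized work tends to do, and the 64-way capacity falls well short of 64 times a single engine.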

To the second observation about wall clock time on trivial applications: yes, obviously.

Joe(IBM): [[[quoting Jeff: •"Read and understand what they measure, instead of just accepting them uncritically."
Yes, particularly understand that the industry standard benchmarks run with low enough variability and low thread interaction that it makes sense to turn on a hard affinity scheduler. Your workload probably does not work this way.]]] 

Jeff(Sun): I'm not sure what's intended by that. Are you claiming that benchmarks should be run against systems without fully loading them to see what they can achieve at max loads? Hmm. Anyway, see below my comments about low variability and low thread count - which applies nicely to IBM's LSPR. I guess I am claiming that the industry benchmarks basically represent parallel nirvana and parallel purgatory. I am asserting that mixing workload under a single OS or virtualizing servers within an SMP drives platforms toward parallel hell. The near linear scaling of the industry standard loads on machines optimized for them will not be achieved on mixed and virtualized workloads. In part this is because sharing the hardware across multiple applications will lead to more cache reloads and migrations than occur in the benchmarks. I see Jeff's reference to LSPR as a red herring for two reasons. First, while LSPR has not been applied across the industry, the values it contains have been used to do capacity planning rather than marketing. The loads for which this planning is done are usually a combination of virtualized images, each either running mixed, workload-managed work under z/OS or running zLinux under z/VM. This could not be done successfully if the scalability were as idealized as in the industry standard benchmarks. Second, I do not suggest that LSPR is the answer, but rather that the current benchmarks do not sufficiently represent the workloads in question (mixed/virtualized) for Jeff to make the claim, as he did elsewhere in the blog entry, that z does not scale. Basically, to draw his conclusion he compares the LSPR scaling ratios to industry benchmark results on UNIX SMPs. This is not a good comparison.

Joe(IBM): [[[quoting Jeff: •"Get the price-tag associated with the system used to run the benchmark." Better to understand your total costs including admin, power, cooling, floorspace, outages, licensing, etc.]]]

Jeff(Sun): That's what I meant. Great.  Because the hardware price difference that Sun usually talks about is only a small percentage of total cost.  The share of total cost represented by hardware price shrinks every year.

Joe(IBM): [[[quoting Jeff: •"Relate benchmarks to reality. Nobody buys computers to run Dhrystone." Only performance engineers run benchmarks for a living.]]]

Jeff(Sun): Sounds like a dog's life, eh? OTOH, they don't have users...

Joe(IBM): [[[quoting Jeff: •"Don't permit games like "assume the other guy's system is barely loaded while ours is maxed out". That distorts price/performance dishonestly." Understand what your utilization story is by measuring it. Don’t permit games in which hypertuned benchmarks with little or no load variability and low thread interaction represent your virtualized or consolidated workload. Understand the differences in utilization saturation design points in your IT infrastructure and what drives them.]]]

Jeff(Sun): Your comment has nothing to do with what I'm describing. What I'm talking about is the dishonest attempt to make expensive products look competitive by proposing that they be run at 90% utilization, while the opposition is stipulated to be at 10%, and claim magic technology (like WLM, which z/Linux can't use) to permit higher utilization and claim better cost per unit of work on your own kit. That's nothing more than a trick to make mainframes look only 1/9th as expensive as they are. Imagine comparing EPA mileage between two cars by spilling 90% of the gas out of the competitor's tank before starting. As far as "no load variability and low thread interaction", I suggest you take a good look at IBM's LSPR. See http://www-03.ibm.com/servers/eserver/zseries/lspr/lsprwork.html which describes long running batch jobs (NO thread interaction at all) on systems run 100% busy (NO load variability). The IMS, CICS (mostly a single address space, remember), and WAS workloads in LSPR should not be assumed to be different in this regard either. This doesn't make LSPR evil: it is not - it's very useful for comparisons within the same platform family. But consider SPECjAppserver, which has interactions between web container, JSP/servlet, EJB container, database, JMS messaging layer, and transaction management - many in different thread and process contexts. I suggest you reconsider your characterization about thread interaction. Complaints about thread interaction and variability of load are misplaced and misleading. The comparison of zLinux/VM at high utilization with a highly distributed solution at low utilization is valid, and well founded on both data and system theory. You could make similar comparisons of consolidated virtualized UNIX v. distributed UNIX, or VMware v. distributed Intel. Any cross comparison of virtualized v. distributed servers will be leveraged mainly by utilization rather than by raw performance as measured by benchmarks. Thus the comparison Jeff complains about as dishonest does in fact represent what happens when consolidating existing servers using virtualization. My second point is that, in making comparisons between consolidated mixed-workload solutions, industry benchmarks are not representative of the relative capacity or the saturation design point of the systems in question. There is no current benchmark to use for these comparisons. This includes LSPR, Sun's M-values, and rPerfs, as well as the industry benchmarks. None of them works. Each vendor asserts leverage for consolidation based on their own empirical results, or perceived strengths in terms of machine design. I am saying that the scaling of these types of workloads is less linear than the industry benchmark results, and that some of the things z leverages to do LSPR well will apply in this environment as well.

Joe(IBM): [[[quoting Jeff: •"Don't compare the brand-new machine to the competitor's 2 year old machine" Understand what the vintage of your machine population is. When you embark on a consolidation or virtualization project compare alternative consolidated solutions, but understand that the relative capacity of mixed workload solutions is not represented by any of the existing industry standard benchmarks.]]]

Jeff(Sun): We're talking at mixed purposes. What I mean is that one vendor's 2008 product tends to look a lot better than the competition's 2002 box, making invidious comparisons easy. Moore's Law has marched on. The truth is that when you do a consolidation you usually deal with a range of servers, some of which are 4 or 5 years old. A two-year-old vintage is probably fairly representative. In any case Moore's Law does not improve utilization of distributed boxes unless you consolidate work in the process of upgrading. Unless a consolidation is done, the utilization will drop when you replace old servers with new servers. For the consolidation to occur within a single application, the application has to span multiple old servers in capacity. Server farms are full of applications which do not use a single modern engine efficiently, let alone a full multicore server. Jeff's main argument is with the utilization comparison. The utilization of distributed servers, including HP's, Sun's, and IBM's, is very often quite low. It is possible to consolidate a lot of low-utilized servers on a larger machine. The mainframe has a long-term lead in the ability to do this, which includes hardware design characteristics (Cache/Memory Nest), specific scheduling capability in hypervisors (PR/SM and VM), and hardware features (SIE). How many two-year-old, low-utilized servers running disparate work can an M9000 consolidate?
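To put some arithmetic behind the utilization argument, here is a back-of-the-envelope sketch. Every figure in it is an assumption chosen for illustration; none comes from the z10 announcement, LSPR, or any industry benchmark:

# Back-of-the-envelope consolidation sizing. All numbers are assumptions
# for illustration; none comes from the z10 announcement or any benchmark.

def consolidation_ratio(dist_capacity, dist_util, host_capacity, host_util):
    """How many distributed servers' useful work fits on one consolidated host."""
    useful_work_per_server = dist_capacity * dist_util  # work each box actually does
    host_work_budget = host_capacity * host_util        # work the host is planned to absorb
    return host_work_budget / useful_work_per_server

# Example: each distributed box has 4 units of relative capacity but runs
# at 10% busy; the consolidated host has 100 units planned to 90% busy.
print(consolidation_ratio(dist_capacity=4, dist_util=0.10,
                          host_capacity=100, host_util=0.90))  # -> 225.0

The resulting ratio is driven almost entirely by the gap between the two utilizations rather than by the raw capacity numbers, which is why the assumed utilizations are the crux of this disagreement.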

Joe(IBM): [[[quoting Jeff: • "Insist that your vendors provide open benchmarks and not just make stuff up."
Get underneath benchmarketing and really understand what vendor data is telling you. Relate benchmark results to design characteristics. Characterize your workloads. (Greg Pfister's In Search of Clusters and Neil Gunther's Guerrilla Capacity Planning suggest taxonomies for doing so.) Understand how fundamental design attributes are featured or masked by benchmark loads. Understand that ultimately standard benchmarks are “made up” loads that scale well. Learn to derate claims appropriately, by knowing your own situation. (Neil Gunther's Guerrilla Capacity Planning suggests a method for doing so.)]]]

Jeff(Sun): This is not the "making stuff up" that I was referring to. I was referring to misuse of benchmarks in the z10 announcement, which IBM was required to redact from the announcement web page and the blogs that linked to it. I'm not arguing against synthetic benchmarks that honestly try to mimic reality, I'm arguing against attempts to game the system that I discussed in my "Ten Percent Solution" blog entry. I have explained the comparison made for the z10 announcement above. Jeff objects to the utilization comparison, which I maintain is legitimate. In fact, when servers are running at low utilization most of them are doing nothing most of the time. That is the central argument for virtualization, which is generally accepted in the industry. I am also pointing out that industry standard benchmarks are not created in a purely noble attempt to uncover the truth about capacity. In fact they are generally defined in a way that supports the distributed-processing, scale-out, client-server camp of solution design, which is why they scale so well. Think about it. In the industry standard committees each vendor has a vote. System z represents 1/4 of IBM's vote. Do you think there will ever be an industry standard benchmark which represents loads that do well on its machine structure? The benchmarks and their machines have evolved together. They can represent loads from single application codes that are cluster or NUMA conscious. What happens to all of those optimizations when workloads are stacked and the data doesn't remain in cache or must migrate from cache to cache? The point is that the relevance and validity of either side of this argument is highly workload dependent. The local situation will govern most cases. Neither an industry benchmark result nor a single consolidation scenario is more valid than the other.

Joe(IBM): [[[quoting Jeff: •"Be suspicious!" Be aware of your own biases. Most marketing hype is preaching to the choir. Do not trust “near linear scaling” claims. Measure your situation. Don’t accept the assertion that the lowest hardware price leads to the lowest cost solution. Pay attention to your costs, and don’t mask business priorities with flat service levels. Be aware of your chargeback policies and their effects. Work to adjust when those effects distort true value and costs.]]]

Jeff(Sun): With this I cannot disagree. That's exactly what I have been discussing in my blog entries: unsubstantiated claims of "near linear scaling" to permit 1,500 servers to be consolidated onto a single z (well, the trick here is to stipulate that 1,250 of the 1,500 do no work!) By definition, servers running at low utilization are doing nothing most of the time. Or to ignore service levels (see my "Don't keep your users hostage" entry). Actually, virtualization of servers on shared hardware can improve service levels by improving the latency of interconnects. I'll also add "beware of the 'sunk cost fallacy'": you shouldn't throw more money into using a too-expensive product that has excess capacity because you've already sunk costs there. Actually, adding workload to an existing large server can be the most efficient thing to do in terms of power, cooling, floorspace, people, deployment, and time to market, even if the price of the processor hardware is higher. These efficiencies and the need for them are locally driven. In general there may or may not be a "sunk cost fallacy". In fact you should also be aware of the "hardware price bargain fallacy". Finally, Sun itself recognized System z and z/VM as "the premier virtualization platform" when Sun and IBM jointly announced support of OpenSolaris on IBM hardware.

by Joe Temple July 28, 2008 in Systems Technology

Comments

This dialogue was very interesting. As a user in need of a means to compare zLinux to other options, I need to be able to do so without making an investment in z. I've been fairly successful in using the public benchmarks that Jeff refers to in planning a virtual infrastructure around VMware or in implementing large complex applications on Sun. We'd like to explore z, but not knowing how to translate our workloads into z capacity is a turn-off.

Posted by: larry mc | Oct 29, 2008 5:33:58 PM

Larry MC, I can help you compare z to other servers. IBM has the capability to compare z to other machines in a variety of ways. You can do three things. First, ask IBM for a discussion about "Fit for Purpose" with an IBM System Architect or cSM (SA). These folks come from a variety of backgrounds but work on cross-platform architectural decisions to find "Fit for Purpose". Second, you can get hold of my CMG paper on relative capacity, which will be presented at CMG2008 and published in 1Q09. Third, you can contact me through email at IBM. Put "Temple III" in the employee directory field on the IBM website to get the email address.

Posted by: Joe Temple | Dec 7, 2008 7:08:08 AM




