
30 April 2007 - 10:48

Some Thoughts About Performance

There have been some pretty spirited interchanges here about performance. There is a pervasive feeling in the industry that software performance (efficiency) is a non-issue. I suspect that belief is held by 95 percent of modern computer users and developers. At least, that's the way it feels.

For reasons mentioned in the recent comments, Symas cannot act as if software efficiency and performance are unimportant. There are two reasons for that. One is practical and the other is a fundamental point of philosophy.

Heavy stuff ahead ;-) ... read on

Functional justification for efficient software: It is best to have a particular workload running on a single system. It may be a single-memory multi-processor, as is becoming popular with multi-core processor chips, but that is still a single system. Such a single system is likely to be the best price-performance solution. It will also be the easiest to administer and manage. And it will not require the added complexity of load-balancing front ends, etc. In terms of total cost per transaction delivered, the single-system (server) solution is hard to beat.

All too often, an application workload grows more quickly than forecast. Sometimes the application proves more popular or useful than expected. Other times, the efficiency of the software (or hardware) is misjudged. In either case, costly upgrades must be made. Beyond a certain price point, upgrades carry such a premium for each new unit of server capacity that it may not be appropriate to continue to grow the server. At that point, the workload has to be split across multiple servers (scaled out).

It is reasonable to want to have at least one hot-standby server for any critical workload. It is even rational, in unusually critical situations, to have a third server and/or even a second pair in a remote disaster-recovery location. This investment in recoverability and fail-over is independent of server capacity and has to do with points of failure and the enterprise's tolerance (or lack thereof) for outages of software-mediated services (applications). This raises total cost per transaction but provides certain guarantees of business continuity and service level.

When the workload expands to be too large for the single server, costs rise faster than the transaction rate does. The introduction of additional functioning copies of control programs, load-balancers and replication mechanisms adds to the overall cost. This is in addition to the cost of delivering the application workload and the administration overhead that demand brings. Replication is a performance tool that can be very valuable for spreading read access to a directory (or to any read-only copy of a database), but it comes at a cost on a per-transaction basis. This cost is often justified by the performance difference local users see compared to accessing the workload server over distant networks. But it is a cost that cannot and should not be ignored.
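To make the jump concrete, here is a minimal sketch of the arithmetic. Everything in it is an assumption for illustration: the function name, the flat per-server cost, and the single fixed overhead standing in for load-balancers and replication are hypothetical, not figures from any real deployment.

```python
import math

def cost_per_transaction(load_tps, server_capacity_tps, server_cost, scale_out_overhead):
    """Hypothetical model: cost per unit of transaction rate.

    One server handles the load until its capacity is exceeded. Past
    that point we pay for N servers plus a fixed overhead representing
    load-balancers, replication and extra administration.
    """
    servers = max(1, math.ceil(load_tps / server_capacity_tps))
    total_cost = servers * server_cost
    if servers > 1:
        # Assumed one-time overhead that only exists once we scale out.
        total_cost += scale_out_overhead
    return total_cost / load_tps

# Illustrative numbers only: a server rated at 1000 transactions/second.
at_capacity = cost_per_transaction(1000, 1000, 10_000, 5_000)
just_over = cost_per_transaction(1100, 1000, 10_000, 5_000)
print(at_capacity, just_over)
```

Under these made-up numbers, a 10 percent increase in load more than doubles the cost per transaction, which is the kind of step the paragraph above describes. Real pricing is lumpier than this, but the shape of the curve is the point.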

So, whenever an application workload must be split because it is growing past the capacity of the servers available, that growth forces expenses not justified by other requirements. That is a failure of the systems platform providers to offer server systems with the capacity required to support the workloads. Scaling out (multiple servers sharing the load) is an accommodation with financial and labor costs that are undesirable. Scaling out introduces many more potential points of failure, too.

Application workload performance on any processor is limited by that processor's absolute capacity to execute the mix of machine instructions in the sequence required to perform the workload's programs. Software efficiency can dramatically increase the amount of application work a computer can perform (as we have recently shown in our benchmarks of various Directory Services software packages). To the extent that software offers such substantially better performance potential, it can help enterprises significantly reduce the cost and, possibly, the complexity of successful application workloads.
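A rough sketch of that relationship, under loudly stated assumptions: the workload is purely CPU-bound, every request costs the same amount of CPU time, and the cores scale perfectly. The function names and the numbers are hypothetical and are not taken from our benchmarks.

```python
def max_requests_per_second(cores, cpu_microseconds_per_request):
    """Upper bound on throughput for an assumed purely CPU-bound workload."""
    return cores * 1_000_000 // cpu_microseconds_per_request

def servers_needed(target_rps, cores, cpu_microseconds_per_request):
    """Whole servers required to carry target_rps under the same assumptions."""
    capacity = max_requests_per_second(cores, cpu_microseconds_per_request)
    return -(-target_rps // capacity)  # integer ceiling division

# Illustrative only: the same 8-core box running slower and faster software.
slow = servers_needed(20_000, cores=8, cpu_microseconds_per_request=1000)
fast = servers_needed(20_000, cores=8, cpu_microseconds_per_request=250)
print(slow, fast)
```

In this toy case, cutting the CPU cost per request to a quarter brings the workload back onto a single server. Real servers also wait on disk, network and locks, so treat this as a ceiling rather than a forecast.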

Practically, attention to software efficiency gives Symas a competitive edge with customers whose directories have the potential to become mission-critical and heavily used databases. We simply cannot compete if we offer a directory that is significantly slower to respond to requests or that can't handle the highest transaction rates possible.

Philosophic justification for efficient software: The second reason that software efficiency and performance matter is that with the growing number of good programmers aspiring to produce professional code, there is no excuse for knowingly developing code, over the long haul, that performs badly. Prototyping code in whatever is comfortable and easily debugged is fine. Getting something working is terrific. Putting that to work in customer environments helps you and the customer refine your understanding of the application and its true requirements. But, as it becomes more mature, more robust, it should also become efficient, for two reasons.

The first reason that a person or project should produce efficient code, in the long run, is that the failure to do so merely challenges someone more focused on such matters to do it for their own reasons and self-promotion. Why put in all the effort to get something amazing working, only to stand aside and let someone else come in and do the reengineering that would be easier for you to do? If you have any financial or ego stake in the software, you leave yourself open to being remembered as the hacker who slapped together the prototype while others get the kudos for the improved version.

The second reason someone should produce efficient code is that not to do so is, essentially, a disservice to the customer. It is, at some fundamental level, a statement that this software wasn't good enough to do well or that the customer didn't deserve the best quality software (measured on this often costly metric).

Performance of software that we hack for our own use is only as relevant as we decide to make it. If it's getting in your way, you'll probably fix it.

For software that runs on end-user machines where local performance is more than adequate, performance does not appear to be an issue. But as additional software is introduced, we are used to seeing older, slower machines get sluggish. This is not a feature, it is a bug. In general, there is no real excuse for the bloat and the poor performance. It is disrespectful of the users, evidence of an opinion that they have money dripping out of their pockets with nothing better to spend it on than more computer capacity. In other words, an insult.

We select tools and acquire skills and ply our trade. As we work on projects, we have to ask if we're using appropriate tools. As good as I may think PHP or Python is, I don't think I'd kid myself into writing a BIOS or kernel in either ... at least not for a production platform serving application workloads. No matter what, when you're working on server system software, paying attention to performance, quantitatively, is a responsibility and a part of the profession.



Let the code speak for itself and when it says it's slow, don't make excuses for the tool-chain. It's the engineering, friend. And the quality of the engineering.

--- Marty

UPDATE: All of this presupposes that the software works correctly and meets the functional expectations it purports to support (standards, compatibility, platform support, etc). At some level, functional completeness is a more fundamental requirement. Moreover, undertaking performance optimization does not relieve the developers of the responsibility to keep the software current from a functional perspective. This is not unique to Open Source Software. It is as much a problem with commercial software. There is another post of about equal length due on all that.

This is directly aimed at Directory Servers which do not support the LDAPv3 Standard (RFC 4510 et al) because they do not implement subtree rename or schema checking, or do not support the mandatory multiple value capability for commonNames. Those are functional deficiencies that we know about in other people's directory server packages.



one comment:

Performance matters. I got my start in programming working on embedded systems where we squeezed code for every clock cycle and every byte of space it could give us. The result was a system that had double the performance of the original software. It allowed our customers to stay with their existing hardware for years, and saved them money.
Since then, I’ve heard people saying things like “CPU is commodity. Disk is commodity” to justify sloppy programming. Their idea was that efficiency didn’t matter, because hardware would always become faster to make up for it. Not surprisingly, the result was software that maxed out any available hardware when offered minuscule traffic loads.
Efficiency matters because efficient programs are more likely to be correct. My experience has been that efficient programs are more likely to be smaller and easier to debug, too.

