When Performance Measurement Goes Awry

One of the major contributions of 1950s business management was the idea of management by objectives (MBO). This concept was first introduced by Dr. Peter Drucker in his 1954 book, The Practice of Management. MBO incubated throughout the 1960s and took hold in the 1970s. As proposed by Drucker, objectives were to be established in order to accomplish five ideals: 1) to organize and explain the whole range of business phenomena in a small number of general statements; 2) to test these statements in actual experience; 3) to predict behaviour; 4) to appraise the soundness of decisions while they are still being made; and, 5) to enable practicing businesses to analyze their own experience and, as a result, improve performance.

Setting hard goals and objectives is part of everyday existence in the investment industry. For the overseers of defined benefit pension plans, their fiduciary responsibility is to act in the best interest of the plan participant, providing a structure to achieve a return to meet the obligations promised. For defined contribution plans, the long-term goal is the same – the only difference is that the participant has a more pronounced impact on the outcome. For individuals, the main goal is to increase overall net worth through returns from a well-defined investment program.

The investment management industry is one of numbers.

If a manager outperforms his/her benchmark by a dollar, we know exactly how that dollar was made up. We know what percent of it came from the market itself (the beta), what percent came from the skill or luck of the money manager (the alpha), and how much came from trading. Given the size of the Canadian pension market (approximately $2 trillion), you might think that, with all the very sophisticated performance measurement techniques available, we would have an excellent handle on how to effectively measure the ongoing success or failure of the investment fund itself (the decisions made by the plan sponsor and consultant) and the capital market components (the decisions of the money managers).

In my experience, this has not been the case. I believe this inability to effectively measure success or failure has cost the pension industry hundreds of millions of dollars over the past decade, through both the firing and the hiring of managers at the wrong time and for the wrong reasons.

This article – and the one that will follow next week – will address the evolution of performance benchmarking over the past 50 years and the various caveats associated with the use of each measurement tool.

Defining the benchmark

Benchmarking returns has become the primary tool for measuring success or failure in our industry. The term is widely used, however, and can mean very different things to different players within the investment field.

In 1998, the Association for Investment Management and Research (AIMR – now the CFA Institute) took a stab at defining what a benchmark was for the investment management community: “Benchmarks are important tools to aid in the planning, implementation and review of investment policy. They clarify communication between the investment fiduciary and the investment manager and provide a point of departure for assessing return and risk.”

The CFA Institute believes that “a benchmark is essentially the starting point for evaluating success.”

It is defined specifically as “an independent rate of return (or hurdle rate) forming an objective test of the effective implementation of an investment strategy.”

In determining what makes a good benchmark, the CFA Institute believes that:

“…a benchmark should be a focal point in the relationship between the manager and the fiduciary body overseeing the prudent management of the assets. The thoughtful choice of a benchmark will make the relationship between these parties more effective and enhance the value of the investment strategy. The most effective benchmarks are:

  • Representative of the asset class or mandate;
  • Investable (e.g., a viable investment alternative);
  • Constructed in a disciplined and objective manner;
  • Formulated from publicly available information;
  • Acceptable by the manager as the neutral position;
  • Consistent with underlying investor status.”

Finally, “choosing a bad or inappropriate benchmark can undermine the effectiveness of an investment strategy and lead to dissatisfaction between client and manager.”

The overall objective of performance measurement is to assess ongoing performance by providing measurable standards for the plan sponsor to ensure achievement of its long-term goals.

Bottom line: if the benchmark is inappropriate, then any analytics against the benchmark will result in useless or misleading information.

The evolution of benchmarks

In 1896, Charles H. Dow put 12 stocks together to form the Dow Jones Industrial Average. Standard & Poor’s followed, beginning its indexes in 1923. These two index services provided an opportunity to track market activity and, later, the ability to compare money manager results for the stock component of an investment fund.

Other index providers have emerged over the past few decades. For them, indexes became a way of creating a brand name, promoting their other products and services, and generating revenue through licensing agreements with plan sponsors, money management organizations and other service providers.

Over the decades, there have also been four basic methods of tracking the performance of an investment fund and its money managers against some sort of benchmark. Each is described in turn below.


The first measurement yardstick in the 1950s and 1960s was against inflation – the Consumer Price Index (CPI). During these two decades, inflation averaged 2.4% (i.e., 2.3% on average for the 1950s and 2.6% for the 1960s). The thought was that, if the fund return sufficiently exceeded the inflation rate, assets would outpace liabilities and create a surplus to either reduce the plan sponsor’s cost of funding the plan, or give the plan sponsor the opportunity to increase benefits – or both.

Actuaries, typically, assumed that the overall return from the components of the capital market would exceed inflation by some margin.  As a result, money managers were given a value-added target above the rate of inflation of, generally, 3% to 4% (i.e., if inflation averaged 3% over a 10-year period, then the manager was expected to earn a 6% to 7% return on assets).
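
As a rough illustration of the inflation-plus arithmetic, here is a minimal Python sketch. The function names, the use of geometric (annualized) averages and the sample figures are assumptions made for illustration; they are not taken from any actuarial standard.

```python
def annualized(rates):
    """Geometric average of a list of annual rates expressed as decimals."""
    growth = 1.0
    for r in rates:
        growth *= 1.0 + r
    return growth ** (1.0 / len(rates)) - 1.0


def inflation_plus_check(fund_returns, inflation_rates, margin=0.03):
    """Compare the annualized fund return with annualized inflation plus a margin.

    Returns (met, target). The margin is simply added to annualized
    inflation, mirroring the "inflation plus 3% to 4%" wording (an assumption).
    """
    target = annualized(inflation_rates) + margin
    return annualized(fund_returns) >= target, target


# Hypothetical 10-year record: inflation of 3% a year, fund return of 6.5% a year.
met, target = inflation_plus_check([0.065] * 10, [0.03] * 10, margin=0.03)
print(f"target {target:.2%}, met: {met}")
```

With a 3% margin the hypothetical fund clears the hurdle; raise the margin to 4% and the same record falls short, which shows how sensitive the verdict is to the margin chosen.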

Managers were evaluated over rolling 10-year periods – the good old days.

Given that the vast majority of the funds being managed by financial institutions were balanced funds (including both equity and debt securities), this was not an unreasonable benchmark for a sufficiently long time horizon. However, in shorter periods, capital market returns diverged widely from inflation and then, as now, no investment was available that would simply match inflation.

Peer group placement

In the late 1950s and early 1960s, a U.S.-based consulting firm, A.G. Becker Fund Evaluation Group, introduced the concept of peer group sampling. Becker believed that, by placing stock managers, bond managers and balanced fund managers within like groups, they would be able to distinguish the good managers from the bad managers. The firm segmented managers’ performance into quartiles and reduced the primary measurement period to four years – supposedly representing a typical market cycle.  Managers could then be evaluated in both up and down markets.
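
To make the quartile idea concrete, here is a minimal Python sketch of ranking one manager within a peer sample. The ranking rule, the function name and the sample returns are assumptions for illustration; they are not Becker's actual methodology.

```python
def peer_quartile(manager_return, peer_returns):
    """Return the quartile (1 = top, 4 = bottom) of a manager within a peer sample.

    peer_returns is assumed to hold the four-year annualized return of every
    manager in the sample, including the manager being ranked.
    """
    # Count the peers that did strictly better than this manager.
    better = sum(1 for r in peer_returns if r > manager_return)
    fraction_ahead = better / len(peer_returns)
    # Map the fraction of peers ahead onto quartiles 1 through 4.
    return min(int(fraction_ahead * 4) + 1, 4)


# Hypothetical four-year annualized returns for a sample of eight balanced funds.
peers = [0.12, 0.10, 0.09, 0.08, 0.07, 0.06, 0.05, 0.03]
for r in peers:
    print(f"{r:.0%} -> quartile {peer_quartile(r, peers)}")
```

Note that the ranking says nothing about why a manager landed in a given quartile, which is the weakness the later index-based methods tried to address.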

Being the only measurement game in town, the concept thrived, and money managers began to be hired and fired based on relative performance within a peer group sample. As there was now an apparently objective way to distinguish one manager from another based on performance, manager search activity began to increase and actuarial and performance measurement firms entered the field.

By the 1970s, peer group sampling had become the primary measurement tool for pension plan sponsors. Even today, peer group sampling, with much finer distinctions among comparison groups, remains a major influence for sponsors.


A third measurement source began to emerge around the mid-1980s: a comparison against cap-weighted market indexes. At the same time, passive/index managers also became an alternative to active management.  If the median active manager could not outperform the designated index selected as the bogey (even before fees) and, if superior managers who consistently outperformed the index could not be identified with any confidence, then why have active management at all – given the much higher fees?

Index providers began to flourish, and creating more and more ways to slice the market (and manager styles) became a new revenue source for them. In the mid-1980s to mid-1990s, equity indexes began to be segmented by cap size and value/growth styles, and the “nine-box matrix” was formed by Morningstar. In fixed income, indexes became segmented by term, issues and credit quality.

Since passive/index services provided an alternative to active managers, a value-added target was added to the predetermined index selected (e.g., a value-added target over S&P 500 returns of 200 basis points over four-year moving-average time frames). This was designed to offset both the higher fees of active management and the risk of under-performance associated with active management. Such a benchmark for a core, U.S. equity manager arguably meets all six of the CFA Institute’s criteria.
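
The value-added test described above can be sketched in Python as follows. This assumes annual return data and reads "four-year moving-average time frames" as rolling four-year annualized periods; the function names and sample returns are hypothetical.

```python
def annualized(returns):
    """Geometric average of a list of annual returns expressed as decimals."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth ** (1.0 / len(returns)) - 1.0


def rolling_value_added(manager, index, window=4):
    """Annualized manager return minus annualized index return per rolling window."""
    if len(manager) != len(index):
        raise ValueError("manager and index series must have the same length")
    results = []
    for start in range(len(manager) - window + 1):
        m = annualized(manager[start:start + window])
        i = annualized(index[start:start + window])
        results.append(m - i)
    return results


def windows_meeting_target(value_added, target=0.02):
    """Flag each window in which value added reached the target (200 bp = 0.02)."""
    return [va >= target for va in value_added]


# Hypothetical annual returns for an active manager and its index.
manager = [0.14, 0.06, -0.02, 0.16, 0.09]
index = [0.10, 0.03, -0.05, 0.12, 0.08]
for va in rolling_value_added(manager, index):
    print(f"four-year value added: {va * 10000:.0f} bp")
```

Whether the comparison is made before or after fees changes the answer, so the fee treatment should be fixed before any such test is run.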

Performance standards lent further support to indexes as the appropriate benchmark for performance comparisons (for the reasons stated under Defining the benchmark above). The Bank Administration Institute was the first organization to set out such standards, in 1968. It was quickly followed by the Investment Counsel Association of America in 1971; the U.K. Society of Analysts in 1972; and AIMR in 1998.

As a result, in the mid-1980s through to the 1990s, index-related benchmarking became as important to plan sponsors as peer group comparisons, and inflation all but disappeared as a way of measuring overall fund performance. It was replaced by a policy benchmark: a weighted blend of asset class indexes.
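
A policy benchmark of this kind is simply a weighted sum of index returns. The sketch below shows the calculation for one period; the asset mix, index returns and function name are hypothetical and chosen only for illustration.

```python
def policy_benchmark_return(weights, index_returns):
    """Weighted blend of asset class index returns for one period.

    weights and index_returns are dicts keyed by asset class name.
    Weights are assumed to sum to 1.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("policy weights must sum to 1")
    return sum(w * index_returns[asset] for asset, w in weights.items())


# Hypothetical policy mix and one-year index returns.
weights = {"equities": 0.60, "bonds": 0.40}
index_returns = {"equities": 0.10, "bonds": 0.04}
benchmark = policy_benchmark_return(weights, index_returns)
print(f"policy benchmark return: {benchmark:.2%}")
```

The fund's actual return can then be compared with this figure to judge the plan as a whole, separately from how each manager fared against its own index.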

Three things happened. First, the emphasis shifted from measuring the fund as a whole to measuring, in detail, the component parts managed by the money managers; balanced funds became segmented and specialty funds emerged. Second, the time frame for measuring results contracted from 10 years to four years. Third, attribution of value-added results could now be tracked. Under peer group sampling, since the characteristics of median funds were not known, performance attribution against the median fund could not be undertaken.


Benchmarking has become significantly more sophisticated over the past decade – and, significantly, more complicated. With the tools available to analyze performance results in great detail, plan sponsors and investment consultants are in a better position to determine manager skill. This doesn’t mean they can do so (even ex-post, much less ex-ante), just that they are in a better position to try.

For example, the index providers can use benchmarking, attribution, and factor analyses to separate manager style impacts from security selection impacts within that style. This is a critical distinction – it was the plan sponsor’s choice to pick a manager with a given style and it is the sponsor who should be accountable for the results of that choice (unless the manager professes to be an all-weather manager, like the old balanced funds).
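
One simple way to picture this separation is to split a manager's value added over the broad market into a style piece and a selection piece. The sketch below uses a single-period arithmetic split; the function name, the three-way comparison and the sample returns are assumptions for illustration, not any index provider's published method.

```python
def split_value_added(manager_return, style_index_return, broad_index_return):
    """Split value added over the broad index into style and selection effects.

    style effect     = style index minus broad index (the sponsor's choice of style)
    selection effect = manager minus style index (the manager's picks within the style)
    """
    style_effect = style_index_return - broad_index_return
    selection_effect = manager_return - style_index_return
    return {
        "style": style_effect,
        "selection": selection_effect,
        "total": style_effect + selection_effect,
    }


# Hypothetical year: value stocks lagged the broad market, but the manager beat the value index.
result = split_value_added(manager_return=0.09,
                           style_index_return=0.07,
                           broad_index_return=0.08)
for name, value in result.items():
    print(f"{name}: {value * 10000:+.0f} bp")
```

In this hypothetical case the style choice cost the fund, while the manager added value within the style, which is exactly the distinction that a broad-index comparison alone would hide.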

Some services go further and create custom indexes, with both custom constituents and non-market-cap weights, seeking to reflect the manager’s exact universe for analysis and the specified investment approach for selecting from this universe.

Analysis of the time series of manager contributions to value added can also identify issues such as whether the manager has simply benefited from higher systematic risk taking than in the benchmark, or tends to perform better in one type of market environment than another, or whether the results are heavily affected by a small number of exceptional observations.
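
The three questions above can each be approximated with a few lines of Python. This is a minimal sketch using a plain least-squares fit, an up/down market split and a trimmed average; the function names and sample data are hypothetical, and real attribution services use considerably richer models.

```python
def alpha_beta(manager, benchmark):
    """Least-squares fit of manager returns on benchmark returns: (alpha, beta)."""
    n = len(benchmark)
    mean_b = sum(benchmark) / n
    mean_m = sum(manager) / n
    cov = sum((b - mean_b) * (m - mean_m) for b, m in zip(benchmark, manager))
    var = sum((b - mean_b) ** 2 for b in benchmark)
    beta = cov / var
    return mean_m - beta * mean_b, beta


def up_down_value_added(manager, benchmark):
    """Average value added when the benchmark rose versus when it did not."""
    up = [m - b for m, b in zip(manager, benchmark) if b > 0]
    down = [m - b for m, b in zip(manager, benchmark) if b <= 0]

    def average(values):
        return sum(values) / len(values) if values else None

    return average(up), average(down)


def mean_value_added_without_extremes(manager, benchmark, k=1):
    """Average value added after dropping the k largest-magnitude observations."""
    excess = sorted((m - b for m, b in zip(manager, benchmark)), key=abs)
    if k >= len(excess):
        raise ValueError("k must be smaller than the number of observations")
    kept = excess[:-k] if k > 0 else excess
    return sum(kept) / len(kept)


# Hypothetical period returns: the manager behaves like a levered benchmark.
benchmark = [0.02, -0.01, 0.03, -0.02, 0.01]
manager = [1.2 * b + 0.001 for b in benchmark]
alpha, beta = alpha_beta(manager, benchmark)
up, down = up_down_value_added(manager, benchmark)
print(f"alpha {alpha:.4f}, beta {beta:.2f}, up {up:.4f}, down {down:.4f}")
```

A beta well above 1 with little alpha, together with value added that appears only in rising markets, would suggest the results owe more to systematic risk than to skill.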

Next week we’ll post part 2 of this article, which looks at the state of benchmarks today.