All governments face the same problem: how can they know whether the actions they take to benefit citizens are successful or are, instead, wasting valuable resources and slowing social and economic progress? Obtaining that knowledge is hard and often considered a quixotic ambition, particularly in the data-poor environments of many middle- and low-income countries. Taking time to learn how well government programs work has also been criticized as a technocratic sideshow to the main stage of politics. The tide is turning, however. Throughout the world, policymakers and citizens alike are recognizing that the very legitimacy of public sector institutions is jeopardized by their inability to demonstrate the positive differences they make and, when necessary, to change course to improve performance. Politicians are increasingly demanding “value for money,” citizens have the ability to quickly and widely broadcast complaints against the State, and standards of openness and accountability are trending upward. Evaluating programs and using the results are increasingly seen as intrinsic to good government, much like transparency itself.

While the evaluation of public policies and programs relies on innovations and experience developed over more than half a century, in recent years researchers and practitioners have greatly expanded the application of new methods to program evaluation in low- and middle-income countries, seeing it as a fundamental tool for social progress. Building on experience in industrialized countries, academic researchers, government officials, individuals at bilateral and multilateral agencies, and non-governmental organizations have promulgated innovative evaluation approaches suited to the varied contexts of middle- and low-income countries. Contemporary leaders in South Africa, Mexico, Colombia, Brazil, Indonesia, Rwanda, Kenya and many other countries have committed to evaluation as an instrument of accountability to voters and a means of fulfilling their executive responsibilities. By interrogating the effectiveness of efforts to prevent disease, improve learning outcomes, increase family incomes, and reduce gender bias, supporters of program evaluation are contributing both to improvements in specific interventions and to the larger cause of enlightened social and economic policy.

Politics First, Effectiveness Second

Core social choices are worked out in political processes, whether democratic or otherwise. Questions such as whether to assign priority to defending borders versus improving schools or building roads are answered through political negotiations that reflect collective values and power relationships. Despite efforts to override these processes in arriving at a set of social choices – for example, by asserting a set of affirmative universal rights or by advocating “value-neutral” tools like cost-benefit analysis – government priorities are rightly established through the wonderful and messy human process referred to as “politics.” Evidence, knowledge and technical expertise have a role to play in this process, but they are neither determinative nor sufficient. Rather, evidence is itself contested in this forum, even as it informs and shapes debates.

Once these choices are made, the tasks facing governments are how to design, fund and execute often massive public programs that are aligned with those priorities, and then to measure progress against expectations. Governments have to sort out how to identify and reach target populations, how to set benefit levels, how to deliver high-quality services at affordable cost, and many other tricky issues for which there is no recipe or playbook. In the education sector, for example, one political administration may wish to expand the role of private providers while another may seek to universalize and improve public education. While the agendas differ, both imply a need to figure out how to use public funds and policies to achieve the goals. It is at these stages that technical, empirical tools have more direct benefit, influencing managerial choices, regulatory decisions, and policy design. While all of these technical tasks are difficult, perhaps the most difficult to undertake in a systematic and sustained manner is the measurement of progress. Yet without it, the public sector perpetually lacks the information required to improve program design; has difficulty sustaining support from constituents when opposition emerges; and finds implementation bottlenecks challenging to overcome.

The problem of measuring what matters, faced by governments of all countries, is particularly important to solve in middle- and low-income countries. With vastly more needs than domestic (plus donor) funding can meet, with weak and unreliable official statistics, and with severely limited technical capacity within government agencies, policymakers in developing countries typically operate in the dark. Yet the stakes are extraordinarily high. An inability to know what’s working is very costly, resulting in scarce funding and political capital being wasted on ineffective, if well-intentioned, schemes.

Evaluation Holds Much Promise

In many developing countries, so little attention has typically been given to empirical information and technical considerations that the design or modification of health, education and anti-poverty programs is influenced by the latest ideas from consultants sent by donor agencies; by improvised adaptation of efforts in neighboring countries; or by guesswork. The opportunities for false assumptions and self-interest to affect program design and implementation are manifold.

Public officials are not the only ones who operate in the dark or on the basis of the limited signs of success or failure that they can observe directly. Citizens are similarly constrained. Other than public budget information – which is increasingly available to the public thanks to the “open budgets” movement – citizens and the groups that organize on their behalf have few sources of information about how well or poorly government programs are being implemented. They have almost no information about the effect of government programs on outcomes such as improvements in health within disadvantaged communities, reductions in sexual violence, improvements in the ability of school-age children to read and write, increases in the income of women in poverty, or improvements in the productivity of small-scale farmers receiving seed, fertilizer and training. Without such information, they lack crucial facts that could inform their votes or citizen action.

This is where many types of program evaluation demonstrate their value. Program evaluation includes dispassionate assessment of whether a program was implemented as designed. Rigorous factual analysis can detect how many seemingly well-designed programs lose their way in basic implementation (White 2009). This might include, for example, situations in which beneficiaries are poorly identified, staff are poorly trained, or supplies are stuck at the port of entry. A central task of examining the effectiveness of government programs is simply to answer the question: Was the program implemented as designed? If not, why not?

In Kenya, for example, a World Bank-financed project sought to improve agricultural extension practices, and yet the evaluation found little change in what extension agents were doing during the project’s lifetime; only 7 percent of participating farmers had the amount of contact with extension agents that the project design had anticipated. In Bangladesh, most of the women and children who were supposed to receive supplementary feeding in a large nutrition program did not. This type of execution failure is prevalent and can be detected with basic program evaluation methods that track actions to see whether implementation occurred as planned.

In addition to identifying execution failures (and successes), program evaluation can provide valuable information about the cost of interventions, targeting strategies and system outputs (such as the number of trainees or the number of women with access to savings accounts). It can shed light on institutional strengths and weaknesses that influence the ultimate sustainability of any effort. It can reveal the meaning and interpretation of change as experienced by beneficiaries themselves.

Evaluations that assess execution, operations, costs, strategies, institutional development and meaning all answer important questions. Another set of fundamental questions relates to impact on outcomes. These questions are:

  • Did the program, when implemented as designed, improve outcomes?
  • Were the gains large enough to be worth the cost?
  • And are the gains larger than would have been produced with alternative ways of using the same resources?

These questions, important as they are, are rarely answered. Each hinges on an ability to measure the net impact of a particular program on a defined set of outcomes at the individual and/or community level. Furthermore, the usefulness of answering these questions for a particular program is limited unless the answers are situated within a larger body of evidence from which to assess the reliability of findings and compare the program with alternatives.
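
To make the measurement challenge concrete, consider one standard way of formalizing “net impact” – not spelled out in this section, but common in the impact-evaluation literature, so the notation here is illustrative rather than drawn from the text – using the potential-outcomes framing:

\[
\tau \;=\; \mathbb{E}\big[\,Y_i(1) - Y_i(0)\,\big],
\]

where \(Y_i(1)\) is the outcome for individual or community \(i\) with the program and \(Y_i(0)\) is the outcome without it. Because \(Y_i(0)\) is never observed for program participants, any impact evaluation must estimate it from a credible comparison group; constructing that counterfactual is precisely what makes measuring net impact so much more demanding than tracking whether implementation occurred as planned.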