Establishing Evaluation Offices Part 3

By: Cindy Clapp-Wincek, Senior Evaluation Advisor

How to balance evaluation with other types of evidence

We ended the first blog in this series talking about the ups and downs of evaluation as a Federal agency priority. Sometimes evaluation is traded off against, or put at odds with, implementation. I remember when a manager involved in budget decisions refused to allocate any money for “studies” – he thought there was too much data gathering and not enough doing! As evaluators (or others who choose to read this blog) know, investments in evidence and evaluation can give managers the information they need to strengthen activities and policies. Fortunately, that case with the manager was decades ago.

More recently, I have seen cycles in which evaluation was traded off against an increased emphasis on performance monitoring. Sometimes this meant that performance monitoring provided raw data, and all of the analysis was left up to busy decision-makers/implementers. The newer priorities of Evidence, big data, and Learning are now part of this dance too. But these shouldn’t be seesaw ups and downs between different types of information. They should be thoughtful choices about what kinds of data, what kinds of analysis, and what degrees of avoiding bias are the best mix to increase the effectiveness, efficiency, and outcomes of programs.

In the previous paragraph, I capitalized the “E” in Evidence and the “L” in Learning. I did so because each has a specific meaning in the monitoring and evaluation world (see the note on jargon in the second blog in this series). Now when we talk about Evidence, we are referring to the Evidence Act of 2018. There is a specific set of Executive branch expectations and criteria for how agencies will implement the Evidence Act, specified in OMB memos and circulars (1). A good shorthand is the five evaluation standards developed through a well-informed consensus process (specified in the March 2020 OMB Phase 4 guidance):

  • Relevance and utility
  • Rigor
  • Independence and objectivity 
  • Transparency
  • Ethics

These standards provide a reasonable backdrop for all information gathering and analysis – all five should be reviewed when determining information needs. Indeed, in the guidance, evaluation is included as one form of “Evidence” and performance monitoring as another.

In another piece of complementary OMB guidance, the “attributes” of data are characterized as:

Data are most valuable when they are meaningful for analyzing progress and identifying ways to improve performance. Data need to be sufficiently accurate and timely to inform a decision, behavior, or outcome by those who have authority to take action. For information to be actionable, it must be prepared in a format appropriate for the user.

OMB, 2016, Section 240.9

In short, they are saying that data, including performance monitoring data, should be meaningful, accurate, and timely (2).  

When information lives up to these types of standards, managers can trust that they can safely use the information to make decisions that will enhance the effectiveness and impact of the programs they work on.  

Although all types of information gathering should indeed avoid bias, in my experience evaluation and evaluators have had the strongest set of practices and training to be able to actually achieve these standards. I have not found that to be true for all types of performance monitoring and learning.

That quality generally comes at a cost in time and money. Getting evaluation findings generally takes longer than looking at a dashboard of performance monitoring data or the collected wisdom of some Learning events. The question then becomes: how good does the data have to be? How rigorous does your information gathering need to be? For some people, only an impact evaluation (a randomized controlled trial or quasi-experimental design) has the safeguards for controlling bias that achieve “rigor.” But if you are trying to make timely adaptations in programs, the answers from a full-scale evaluation may come too late. A good practice is to determine what constitutes “good enough” data with all key stakeholders and decision-makers at the outset of an evaluative activity, aligned with the intended use of the evaluation, to make sure the pursuit of rigor does not outweigh the utility of the effort.

Some of the ways that I thought about balancing these types of evidence and information included thinking about:

  1. Audiences and timing. Performance monitoring in the foreign assistance world had taken on a real accountability-to-Congress focus, and the Department of State housed “common indicators” so that progress could be compared across programs. Evaluation should be saved for those instances where the monitoring and learning evidence suggest that a program’s design is not holding, or when a follow-on program is planned and the logic model is not clear. Bigger comparative evaluations can look at what is known about the effectiveness, efficiency, and outcomes of a certain type of programming. These should absolutely be read by on-the-ground staff, but these evaluations tend to be most directed to policymakers within the agency. They are most effective when there is some form of budgetary or priority decision coming up that this type of comparative evaluation informs.
  2. Champions and interest. Build on whatever approaches to gathering better information have momentum at the time. USAID, for example, has a strong learning community that provides guidance, energy, and momentum. As indicated earlier, evaluation has champions as well, who have sustained at least a minimum level of activity even when high-level attention is not focused there (I’m thinking of the years from about 2005 to 2011, when USAID had no evaluation office but a small function in the Management Bureau). An issue to manage is that the strong proponents of each type of evidence may not necessarily see the full value of the others. This is where the competition comes in. Managers and Evaluation Officers need to actively integrate the champions of each to get the most out of all the approaches.
  3. External expectations. For most of my career, the Government Performance and Results Act was a strong driver for performance monitoring. The Evidence Act is now broadening out to multiple types of information, with particular emphasis on evaluation. The standards, criteria, and actions it identifies can and should have an effect on the choices the head of an evaluation office makes. But be sure to tailor them to the needs of your agency, integrating and balancing external expectations with internal priorities.

In summary:

  • Be thoughtful and deliberate about strengthening all types of systematic information gathering and use
  • Be responsive to the Evidence Act and use it as a tool for greater attention and organizational investment in evidence-based decision making
  • Use OMB guidance for experience and good practice – particularly the standards
  • Distill the needs of the managers within your agency both at headquarters and on-the-ground and align evidence generating activities accordingly
  • Support and build momentum with your agency’s evaluation/information champions at the senior and staff levels

We hope you have enjoyed our series on considerations for Establishing Evaluation Offices. Is your organization establishing or reprioritizing an evaluation office, implementing new evaluation policies, or going through other related organizational change? Headlight would be more than happy to support you and apply our expertise and experience to help make this change easier for you. Stay tuned for our next mini-series on Learning Reviews.

End Notes and Resources Referenced:

(1) Resources referenced here include but are not limited to the OMB memos and circulars noted above. Even further guidance is identified in those documents.

(2) The source referenced here is OMB Circular A-11, Section 240.9.


  • Jim Rugh

    I compliment Cindy for doing such a good job of articulating considerations that should be kept in mind by anybody involved in setting up or strengthening an evaluation office or function in an agency. Certainly her many years of experience helping federal agencies do this have provided valuable lessons learned and wise recommendations.

    However, as I read her blog, what kept coming to my mind was: “Fine, but what are the push-backs from those who are resistant to such evaluation functions?” More specifically, with the current news echoing in the background, I wondered about all those who seem to prefer their imagined worldview and, basically, have a “data be damned” attitude – i.e., an antipathy dressed up as rejecting “fake news.”

    So an important reflective question that needs to be asked by those of us who strongly believe in evidence-based decision-making is: How can we convince those who are faith-based (strongly believing that their programs are good) that they should welcome (or at least accept) a more objective process of verifying that their good efforts are, indeed, producing the desired results?