International Reports

Accountability Is Only the Beginning

A Plea for the Strategic Use of Monitoring and Evaluation

Discussions about monitoring and evaluation in development cooperation still tend to revolve around justifying the use of funds – often taxpayers’ money – and proving their effectiveness. Of course, this is right and important, but monitoring and evaluation harbour the potential to do more. The goal must be a change in attitude, moving away from being “guardians of the indicators” to becoming “friends and helpers”.

“No policy area is scrutinised so closely for its effectiveness as development policy”. This statement by German State Secretary Martin Jäger reflects the pressure for accountability exerted on development cooperation and overseas aid. This is completely understandable – after all, taxpayers need to know how much of their money is being spent overseas, for what purpose, and to what effect. Moreover, development cooperation can be a controversial issue. Since the early 2000s, experts such as the US economist William Easterly have been criticising the development assistance efforts of international institutions, primarily from an economic point of view. Easterly argues that setting the wrong incentives can lead recipient countries down counterproductive paths, and that such failures have outnumbered success stories in recent decades. And the pressure is rising. Other actors are now focusing on many recipient countries’ growing dependence on development assistance. For example, populist movements insist that foreign aid is mostly a waste of money and should be cut. This is also happening in Germany, where discussions about budget cuts – similar to those that took place under former US President Donald Trump – are spearheaded in the Bundestag by the right-wing populist AfD parliamentary group in particular.

The inevitable financial and economic difficulties triggered by the COVID-19 pandemic will further ignite the debate, gradually shifting it from the political to the public arena. As it happens, the public is already sitting up and taking notice. Development cooperation is said to be expensive and inefficient, and even to cause more harm than good to its target groups. Special emphasis is placed on the financial aspect, with the use of taxpayers’ money cited as a reason for questioning the whole principle underpinning development cooperation. A survey commissioned by the German Institute for Development Evaluation (DEval) reveals that one in four Germans believes development cooperation to be ineffective, with only one in ten viewing it as effective. One complaint by the critics is that around half of all development aid fails to reach its intended recipients due to corruption.

At the same time, there is growing pressure from international actors, too. Countries opposed to the principles of the liberal world order attempt to entice Germany’s traditional development cooperation partners with lucrative, unconditional offers, thus creating “donor competition” in recipient countries. The fact that these tempting offers are not always guided by sustainability and the interests of the recipients, but by the donor’s own financial and strategic advantages and interests, does not seem to stand in their way. There are growing fears about whether – and for how long – the values-based development aid propagated by the “West” will be able to keep up in terms of its appeal.

In short, experts, politicians, and institutions engaged in development cooperation are facing mounting pressure to justify their activities. This article will now focus on the situation in Germany, where expectations placed on development cooperation stakeholders can seem somewhat utopian. Development assistance should combine speed with maximum impact – along with low and fully traceable expenditure. This is normal in technical cooperation (TC) – measuring TC and disaster relief against this yardstick should be possible. In the governance sector, however, the situation is more complex, and not merely because it involves long-term commitments. Nevertheless, development cooperation cannot and must not evade the requirement to strive for effectiveness, while also documenting and communicating it to the outside world. How can this be credibly achieved? And what instruments can be used to not only deal with the – partly justified – pressure for accountability but also to put this pressure to good use?


Can Monitoring and Evaluation Act as an Internal Compass?

Going beyond ethical principles and values, data should be able to provide the best possible evidence that good use is being made of financial and human resources and that the rationale behind projects is sound. There are two key instruments for keeping the development cooperation ship on course and away from danger: monitoring and evaluation (M&E). Until now, these instruments have mainly been used for the purpose of accountability, but they have much more potential. They can proactively and purposefully help to steer programmes and act as their “internal compass”. Both are essential components of development cooperation but often fail to live up to their capabilities, particularly in smaller organisations, in the “tug-of-war between learning and accountability”. Having said that, the internal and external pressures on development cooperation as a whole have led to the professionalisation of M&E. The achievement of objectives, impact, and efficiency must all be monitored under the German Federal Budget Code. What is more, most non-governmental organisations (NGOs), political foundations, and other organisations now have their own M&E units and structures. The accountability debate mainly focuses on financial and administrative procedures. However, modern management systems can now record financial flows with great speed and accuracy and thus reduce the risk of money straying from its intended purpose. It is more difficult to deal with objections about the effectiveness of development cooperation. This is partly because, particularly in the area of governance, evaluating effectiveness is perceived as being both more difficult and more opaque than financial monitoring, making it of little use as an orientation aid. As a result, the findings rarely find their way into strategies or policy debates and are correspondingly more difficult to retrieve.

There are manifold reasons for this. On the one hand, those who implement development cooperation often tend to reduce M&E mechanisms to accountability, while ignoring the potential and opportunities that they afford (such as for self-reflection and initiating learning and strategies). On the other hand, the use of excessively technical terminology and methods can be a deterrent and is usually unwieldy and inadequate for governance issues. This means that findings are often communicated in unappealing, even incomprehensible, ways that are not tailored towards their audience.

But the greatest challenge lies in understanding what needs to be proven, the concept to be documented – the “impact”. What is meant by “impact” and how is it defined?


Distinguishing Between Output, Outcome, and Impact

The renewed discussion about the concept of “impact” reflects a fundamental change in German development cooperation over the last twenty years. This is illustrated, for example, by the establishment of Germany’s largest and best-known international development agency. In 2011, the Deutsche Gesellschaft für Internationale Zusammenarbeit (GIZ) was created through the merger of the Deutsche Gesellschaft für Technische Zusammenarbeit (GTZ), Capacity Building International (InWEnt) and the German Development Service (DED). Even before the name was changed, a trend had begun that is not unique to GIZ: the tendency to embed technical cooperation, infrastructure development, and even emergency relief in overarching social and political levels and structures, bolstered by “good governance” measures, support for administrative reforms, and policy advice. In the area of good governance alone, GIZ has reported a 65 per cent increase in projects since 2008, and the financial volume has more than tripled in the same period.

TC thus became IC (international cooperation), which once again raised the question of impact and how to measure it, but from a different angle. This also reflects changes in the international debate: agreements and international commitments, which Germany has also entered into, are now more focused on impact. In the case of technical activities, it may still be possible, at least ostensibly, to establish causality between an action and the changes that ensue. For example, a new well enables the irrigation of arable land, which yields better harvests and leads to improved food security in a region. However, this becomes more difficult as the links between cause and effect become more complex. In the case of a project designed to bring about long-term changes in patterns of behaviour, it is far more difficult to prove or even identify causal links. Have consultancy services in local authorities and workshops with civil servants resulted in improved budgeting in the municipality? Does this in fact reduce corruption, initially at local level and later at national level?

The task of evaluators and M&E experts became more difficult when technical cooperation was accompanied by measures in the political sphere and public administration. Whereas interventions such as the above-cited wells or seed bags had a tangible, direct impact on the recipients that could also be proven through observations or traditional quantification, complex structural changes are much more difficult to measure, and the description of their impact is correspondingly broad. And it is even more difficult to define and measure impact when it comes to civic education, policy dialogue programmes, and similar initiatives.

It is, therefore, hardly surprising that the debate surrounding effectiveness in development cooperation has become more differentiated, but also more diffuse. Depending on their mandate, internationally active organisations measure their work against different benchmarks. For example, the World Bank – as a development bank – sets itself narrower criteria and defines impact as “the indicator of interest with and without the intervention: Y1 - Y0”. According to this approach, impact is only achieved if the difference triggered by a certain intervention can be measured beyond doubt, although this cannot be attained without the costly inclusion of control groups.
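The comparison "with and without the intervention" can be illustrated with a minimal sketch. It assumes that comparable treatment and control groups exist and uses a simple difference in group means as the estimate of Y1 - Y0. The function name `estimate_impact` and all figures are hypothetical and invented for illustration; this is not the World Bank's own procedure.

```python
from statistics import mean


def estimate_impact(with_intervention, without_intervention):
    """Estimate impact as mean(Y1) - mean(Y0).

    Assumption: the two groups are comparable (for example, randomly
    assigned), so the control group stands in for what would have
    happened without the intervention.
    """
    if not with_intervention or not without_intervention:
        raise ValueError("both groups need at least one observation")
    return mean(with_intervention) - mean(without_intervention)


# Hypothetical harvest yields in tonnes per hectare (invented figures).
y1 = [3.0, 3.5, 4.0]  # villages that received the intervention
y0 = [2.0, 2.5, 3.0]  # comparable control villages

print(estimate_impact(y1, y0))
```

Without the control group there is no Y0 to subtract, which is why this narrow definition makes data collection costly.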

By contrast, the OECD Development Assistance Committee (OECD DAC) defines impact as “positive and negative, primary and secondary long-term effects produced by a development intervention, directly or indirectly, intended or unintended”. This second definition is broader and also forms the basis for the approach taken by most German development cooperation organisations. It is understandable and appropriate for a multilateral organisation such as the OECD (Organisation for Economic Co-operation and Development). The assumption is that impact can take place at different levels. The English language makes a basic distinction between output, outcome, and impact. Output is the direct result of an intervention, such as a workshop. It is typically easy to measure, whereas the other two categories are more difficult to pin down. Outcomes should include behavioural changes among specific target groups, i.e., effects at the target group level, or changes in status (e.g., malnutrition in a target population has been reduced by a factor of X), while impact goes beyond this and encompasses a macro level, e.g., society or some of its parts. These levels are described accordingly as “sphere of control (output)”, “sphere of influence (outcome)”, and “sphere of interest (impact)”. Against the background of these differentiation options, English has become the lingua franca of German development cooperation because of its ability to differentiate more precisely between the various levels of impact and hence the changes that have been achieved.

Considering the difficulty of defining and measuring what has been achieved, it is not surprising that justifying and proving a project’s “meaningfulness” in terms of its “effectiveness” poses quite a challenge. This is made even more difficult given that terminology is not understood equally or uniformly by all parties involved in development cooperation. This is something that hampers communication with target groups in Germany. So even the terminology presents an obstacle – but this is just one among many.


Problematic Perceptions

Growing pressure for accountability over recent years and the increasingly complex terminology have led M&E units in virtually all German development cooperation organisations to take on more staff and professionalise their operations. The demand for postgraduate courses in this field has also increased in recent years and many consultancy firms focus exclusively on evaluations for development cooperation organisations worldwide.

The professionalisation of M&E units enables data to be aggregated with greater speed and accuracy and provides the prerequisites for a wide range of enquiries – such as from ministries or parliamentary groups in the Bundestag – to be answered more fully and, if necessary, more quickly. However, the pressure for accountability experienced by M&E units may also make them join others in viewing themselves as oversight panels. This means the urgency of monitoring impact in their day-to-day work may supersede the benefits of M&E measures, which could be of use in other areas: such as in supporting strategic decision-making, identifying niches where they have unique strengths compared to competing institutions, and much more. They also play a key role in the institutional learning process. Even though projects may be set up differently depending on the region or topic, there might still be similarities in the project processes. A body with an insight into all projects across the world could be of great value for the institutional learning process, but it is often inadequately exploited as a resource.


Theory versus Practice

Monitoring and evaluation are carried out at all levels in development cooperation: at the micro level (the activity), the meso level (the project), and the macro level (the programme or sector programme). For example, a workshop may already have a direct and demonstrable impact on its participants. We can assume that these participants go on to influence those around them, which, in turn, leads to behavioural change at a broader, higher level – but this is more difficult to prove. It is true that these are different aspects with differing requirements of M&E, yet they still derive from the lowest level and are interrelated. If data is gathered incorrectly at the micro level, it is difficult to gain an accurate picture at a higher level, and it becomes even more difficult the higher one goes.

Sophisticated monitoring and scrupulous data collection at the micro level provide a fundamental framework for all future data collection. A well-thought-out system with milestones and opportunities for redirection and intervention is essential but requires human resources in the field backed up by financial resources. What is more, everyone involved needs a clear understanding of the issue at hand. However, these resources are often inadequate, so it is hardly surprising that data collected in the field fails to meet the standards expected of M&E units in most cases. Surveys asking participants about their levels of satisfaction or impressions tend to contribute little to impact evaluation. It can also be challenging to monitor complex structural changes, the effectiveness of “track 2 formats” (informal discussion channels, often in diplomatically or politically sensitive contexts), and networks, since it takes time to survey participants and observe project managers – both requirements that can be difficult to meet. In addition, the (usually written) impact indicators might not be in line with reality or with particular characteristics of the project. Obviously, this makes it even more difficult to carry out evaluations at programme level. Evaluators often have to deal with anecdotal evidence and rely on the gut feelings of project managers and their superiors, making it hard to quantify their findings. Added to this is the long-held belief that the main role of M&E units is control. This makes it difficult to be transparent about missing or undesirable results, even if they are nobody’s fault, which further hampers the acceptance and potential impact of M&E measures.


Evaluation in Practice – What Are the Consequences for Projects?

In practice, how often are negative results dealt with transparently? Do evaluations also reflect when projects are unsuccessful? And what about the validity of the evaluations, how close do they come to reality on the ground? These are legitimate questions, as sometimes evaluators must accept that they only have a limited understanding of the general framework (and thus of the project). They collect their own data over a limited period and only analyse previously collected data. That is why evaluations are “merely” assessments and deliberations based on (albeit well-founded) assumptions. Depending on available data, these may be closely aligned with the project realities but can never fully reflect them. However, evaluations never claim to do this, and thus rarely tend to make radical recommendations. Both internally and externally, they should be viewed as just one of many elements involved in management and strategic processes. Nevertheless, the fear that an overly negative evaluation could determine the project’s future and their own career makes project managers defensive (and evaluators worry that they may not get any more assignments).

Still, it is not uncommon for evaluations to be critical, though they usually take a constructive tone and focus on what has been achieved. Take Afghanistan, where, despite significant efforts in the area of peace and state-building by multiple stakeholders, any successes are often stymied by the volatile security situation and a resurgence of groups such as the Taliban or local cells of the so-called Islamic State. Even there, evaluations such as those carried out by the German government have concluded that “Afghanistan’s economic and social development […] has already made remarkable progress since the overthrow of the Taliban” and that “Germany […] has contributed to this development over the past 18 years – especially through its development cooperation – and, together with the Afghan government and the international community, has laid vital foundations for the country’s social and economic progress”. This is just as true as the statement that in principle “the political will, the political assertiveness, the political values and the design of the economic system in the partner country” impede or facilitate the work.


Handling Failures and Mistakes

It is regularly observed that development projects’ success or failure mainly depends on their basic conditions. Other than tactical issues, it is these conditions (rather than evaluations) that most often decide whether projects can continue or whether development work in the particular country can even carry on at all. For example, the chaotic situation in Yemen following the overthrow of the then President Ali Abdullah Saleh forced most international development cooperation organisations, including GIZ, to withdraw their staff and work remotely with local groups. Obviously, the cooperation with Yemen has had to be adapted considering the difficult conditions since then. However, this does not mean the projects are inevitably less effective, provided the adaptation is appropriate and well thought-out. Evaluations can be helpful in this respect. In the end, however, the organisation must decide whether and how to proceed with a project. The same applies to how it handles failures and mistakes. There is no blanket response to the question of whether the criticisms and recommendations of evaluators are adequately addressed, for example in follow-up projects. At the end of the day, it is the organisation that makes the decision in this respect, taking all the above-mentioned points and other relevant issues into account.

The same applies to the handling of the data obtained and the findings based on it. Data is dry, so it is vital to present it in the right way for each target group. But this is usually limited to an evaluation report, which often provides the sole basis for all types of communication – whether with management, the funding organisation, or when handling requests under the Freedom of Information Act. If the communication is to be “heard”, it needs to address the extremely varied needs and interests of different groups. In this respect, the evaluation units in particular have an obligation towards the various stakeholders. They must be able to interpret and prepare findings in such a way that the benefit of evaluations is clear for recipients to see – as well as ultimately the benefit of the project work itself. After all, they provide the link between the organisation and the – mostly external – evaluators and can best assess which routes are worth pursuing and which are not. This is also the only way to initiate internal learning processes or be involved in higher-level strategic consulting.


A Change of Image – From Monitoring to Consulting

Several steps are required to address the above-mentioned challenges. Firstly, there needs to be a change of perception. M&E units have to master the balancing act of performing their monitoring function without being “guardians of the indicators”. They (also) need to act as consultants and offer solutions. Rather than simply looking back and making judgements, they must look ahead in a constructive way. An approachable manner is part of being a “friend and helper”. Conversely, “being evaluated” tends to erect barriers because it always entails being judged. A positive error culture and the will to learn and change are the only way to ensure that evaluations are viewed as helpful instruments that can be used in a profitable way. The question of accountability should not represent an obstacle to the learning process: “Evaluation serves two main purposes: accountability and learning. Development agencies have tended to prioritize the first, and given responsibility for that to centralised units. But evaluation for learning is the area where observers find the greatest need today and tomorrow.” In particular, the results of monitoring and evaluation processes should be used to build on projects and, where necessary, avoid deficiencies that have already arisen in similar situations. The importance of learning from M&E and acting on its findings was noted in the latest OECD Peer Review.

Discrepancies between theory and practice are difficult to overcome, as professionalism in the field cannot always be expected or put into practice in equal measure. It is not always possible to ensure that monitoring corresponds to requirements, particularly due to a lack of personnel in many cases. Even if data can be collected and evaluated (which is not always fully successful in view of diverse and challenging tasks faced by project managers), analysing and processing the data usually presents another difficulty. It is not possible to assess medium and longer-term effects at the political or economic level on an ad-hoc basis. One option would be to focus on sampling, especially for smaller organisations that have to manage without a monitoring officer, and to work with qualitative data collection methods in lieu of quantification. While these may be easier to manipulate, they are more meaningful than data that is not consistently collected in a correct manner. The focus should be more on the positive effect of learning from mistakes and less on the fear of admitting to failure. A constructive error culture would help to ensure that lessons learnt from mistakes could also benefit other projects worldwide, provided that the findings are recognised, addressed, and communicated. Apart from this, circumstances outside one’s control (such as conflicts or natural disasters) may also torpedo projects and render them ineffective. Here, too, dealing with mistakes, if communicated, could be a help and stimulus, but without any compulsion to change course in subsequent projects.

Communication is therefore a vital prerequisite for learning processes and the key to increasing the relevance of M&E measures at the micro, meso, and macro levels, while also being essential for documenting and sharing the findings: Who receives what information, and what do they expect? How well-versed are they in methodology and what time resources are available? It is important to prepare data and communicate it such that it is appropriate for the target group. After all, different data or ways of preparing it are needed for a parliamentary question or ODA statistics than for communicating impacts to the press and public.

Just like management consultants, M&E units should ask themselves these and similar questions before communicating the results of evaluations. Especially on strategic and political issues, they must be able to extrapolate, prepare, and assess potential effects at policy level in order to anticipate transfers to other actors. However, in the political and governance sphere they should also exercise caution and ask themselves which aspects they should perhaps not communicate. Faced with undemocratic or authoritarian societies and structures, too much transparency could prove to be counterproductive or even dangerous in some cases. This is why the applicable maxim should be that normative requirements are openly communicated, while operational information (e.g., protecting sources) is, in some cases, treated as confidential.


Conclusion and Outlook

In any case, the “right” communication will become increasingly important due to the recent trend for government spending – particularly on development assistance – to be subject to ever closer scrutiny by politicians and sections of the public. Above all, the consequences of the coronavirus pandemic and its impact on the German economy will further intensify the debate on development spending. In turn, this will increase the pressure on development cooperation actors to document and communicate – or even market – the effectiveness of their activities. Not to mention the challenges posed by systemic rivals such as China, the Gulf states, and Turkey, which exploit every conceivable means to gain influence, and not only on the economic front. Accordingly, attractive concepts and documented impacts are selling points for German development cooperation organisations that should not be overlooked.

The Federal Ministry for Economic Cooperation and Development (BMZ) is also aware of this and has added issues such as effectiveness and data availability to its agenda as part of the BMZ 2030 reform strategy. This reflects how the BMZ clearly understands the growing need for accountability. Thus, the concept of impact – and hence impact documentation and communication – will become even more important and less of a niche aspect when designing internal processes. It is essential to anticipate this development and proactively take steps to move towards one another. Only in this way can M&E experts and policymakers turn strategically effective monitoring and evaluation into an internal compass for development cooperation – a compass that steers the ship towards success and acceptance, even in turbulent waters.

– translated from German –



Dr. Angelika Klein is Head of the Evaluation Unit at the Konrad-Adenauer-Stiftung.


Lukas Kupfernagel is Desk Officer in the Evaluation Unit at the Konrad-Adenauer-Stiftung.

