Big Data 4 Development: An added tool to the Monitoring and Evaluation Toolbox


June 8, 2020

“M&E has a role to play in helping people and projects to adjust modalities alongside the COVID realities. To achieve this, both tech-enabled and human data collection and evaluation efforts need to be culturally and linguistically sensitive.”

— Jeff Chelsky and Lauren Kelly,

World Bank Blog

The challenges associated with the COVID-19 pandemic have led development partners and practitioners to ask difficult questions about how programs can be implemented using consistent data from a changing environment. Implementing partners have likewise reported that the policies put in place in communities have made it difficult to reach beneficiaries, with the time available for contact reduced considerably. While evaluations are tied to specific, pre-defined points in a program (baseline, mid-line and end-line), monitoring is a continuous expectation of both implementing and development partners.

What is the issue under investigation?

Current methods of data collection and analysis have fallen short because of access restrictions and other policy directives. Data production, on the other hand, has increased over the same period. The World Bank has recently encouraged the use of digital data in monitoring and evaluation processes, based on lessons learnt from third-party monitoring in insecure areas. Specifically, “M&E has a role to play in helping people and projects to adjust modalities alongside the COVID realities. To achieve this, both tech-enabled and human data collection and evaluation efforts need to be culturally and linguistically sensitive.”¹ This policy brief extends the argument for including big data as an additional tool for the monitoring and evaluation of development programs in the Horn of Africa.

Monitoring and evaluation can be defined as a rigorous independent assessment of either completed or ongoing activities. The process determines the extent to which activities or programs achieve their stated objectives and contribute to decision-making.² When conducting evaluations, it is important to prioritise human needs as influenced by political decisions and as reflected in the environment where the program is implemented.³ The result is an intertwining of representative politics and science. Similar to the questions that arise about what actually counts as “change” or “valid data,” multicultural validity in monitoring and evaluation provides a vehicle for organising concerns about pluralism and diversity while reflecting on the boundaries of the intervention under evaluation.

Why is it of concern?

These ideals work well in environments with little to no interruption, where participation is always assumed based on the placement of beneficiaries.⁴ Development work in the Horn of Africa, however, faces more threats than before. Intra-state and inter-state conflicts have grown,⁵ and terrorism and other forms of insecurity, including climate change, poverty and hunger, contribute to uncertainties that may not allow programs to stick to the “normal” monitoring and evaluation approaches they have always used. Data from late 2019 and 2020 also show that the COVID-19 pandemic has created barriers to development work. These issues are further compounded by election periods in Africa, with three countries in the Horn of Africa anticipating elections in 2020. Unless big data is included in the toolbox, data for monitoring and evaluation will face challenges of access, reach and utilisation.

Huge amounts of data are disseminated online at remarkable speed. This data is usually heterogeneous, taking the form of unstructured text, audio and video. In the current networked and interconnected world, the hidden value of development impact, for example, cannot be revealed by means of traditional data collection and fieldwork.⁶ The launch of the UN Global Pulse initiative in 2009 by the Secretary-General of the United Nations (UN), Ban Ki-moon, was an indication that data can be used for good. Its goals included harnessing big data technology for human development.⁷ Data from mobile phones and social media can be utilised in fighting hunger, disaster and poverty, and even in countering violent extremism and terrorism. This can be done by including big data analytics in the monitoring and evaluation of development programs in the Horn of Africa.

Data collected for monitoring can be grouped into three categories: performance, environment and operational data. Performance data is currently the most monitored, with most evaluations focusing on the relationship between inputs and outputs for development purposes. Environment data can be useful for detecting variances that may affect the success of interventions. It is also critical to evaluate operational data, which includes the feedback process between implementing partners and beneficiaries, as this data contains various assumptions that affect the theories of change behind interventions. While these data remain a critical source of insight if collected and analysed continuously, interventions often concentrate on the performance category because few tools exist to merge the heterogeneous data sets. As a result, the response timelines of governments and of development and implementing partners are strained, and operations are based on “leaps of faith” that overlook current data cycles. Adopting big data in monitoring and evaluation provides options and anticipates avenues for response in such situations.⁸

Ongoing monitoring and evaluation processes focus mostly on inputs and outputs while anticipating outcomes over a period of time. While this process seems ideal, attributing change in the community to a given initiative is difficult, because the change also reflects influences from variables outside the scope of the program. On the other hand, communities use and share data regularly, and behaviour change is influenced by the current state of affairs. Analysing this data can help determine the impact of development programs in communities at scale. Based on the conversations that individuals in communities have, it would be possible to understand the efficiency of programs, potential influencing factors and program status, to predict responses from the community and to revise program strategies. Big data can therefore fill the gap between knowledge extraction and presentation on the one hand and decision-making on the other. The value of the data sought can guide its acquisition, processing and application. Validation can be determined by several factors, including fidelity, correlation and freshness. Correlation for any monitoring and evaluation data can be determined through its relationship to the specific development program and risk assessment. Real-time outcomes from data sharing and networked outputs by community members can provide an interpretation of the reality in the community.

While there are benefits to applying big data to M&E, some biases may arise in the process. However, the biases associated with asking program-specific questions to determine outcomes can be avoided through the use of big data, as it allows for data fidelity in relation to the realities on the ground. Specifically, big data analytics can be associated with adaptive monitoring: it dynamically anticipates changes in the environment, performance variables and operations, and collects and analyses raw data to shape resource utilisation and the impact of development programs. Dynamic programs can leverage current and predicted changes and help ensure proper use of finite development funding. The idea that nuanced issues need time for research to be applied, reports to be produced and a grant to be developed falls away, as continuous streaming of data from communities provides a feedback loop for development work. This is an opportunity to use data for good.

Many program activities waste resources on logistics and tasks that do not contribute to the overall goal of the project. Big data allows for a comparison of multiple variables, beyond the availability of resources and the presence of a recipient community, to determine the best fit for program activities and planning. The wastage of development funding is thereby reduced. The information and stories collected from program activities offline and online are often a missed opportunity, as much of the information and its meaning is lost when incidents are treated as isolated. In reality, however, these incidents are not isolated: activities build upon each other in a networked environment, requiring analysis and expansion of meaning. Through the use of big data, such “success stories” can be brought together in a dashboard and used to show impact beyond program activities.

In the current Web 2.0 environment, monitoring and evaluation practice is aware of the threats posed to institutions. Leveraging distinct and heterogeneous data sources can help draw a clearer picture of the system to be protected, by correlating diverse information flows coming from multiple origins. This process makes it possible to extract additional insights into potentially threatening activities that are being carried out. This is the basis of big data analytics as an additional M&E strategy. Exploiting the hidden value of data that is already available but not fully utilised can assist in reaching targeted populations, especially in development programs.⁹ The mix of variable data marks the shift from a mostly human-controlled, distributed monitoring model (data collection through fieldwork) to fully automated monitoring processes that try to relieve, as much as possible, the burden of analysing data to infer high-level information.

The granularity of monitoring and evaluation data for development programs is critical. It allows program needs to be adjusted according to context and fast-changing environments. Though activity and monthly reports complement mid-line and end-line external evaluations, accurately tuning the number of variables to be monitored and the frequency of data collection is fundamental to studying and planning, at design time, the computational load on the monitoring and evaluation infrastructure for development programs. Though big data analytics has strategic potential benefits for developing countries, there are challenges in the implementation phase, such as data access and privacy, human resource capacity, infrastructure, management and financial capabilities. Furthermore, some developing countries face issues with the digital divide and with analytical processes such as methodology, interpretation accuracy, analytical methods and anomaly detection. While the application of mixed methodologies to the collection of offline qualitative and quantitative data sets remains crucial, solutions that combine multiple analysis techniques, including big data, can improve the capability of detecting potential threats and triggering protective actions for development programs.

Where has it been applied before?

There is evidence that big data has been used in monitoring complex human engagements, such as issues related to violent extremism and terrorism. One example is a paper by O’Halloran et al. (2016) that uses multimodal analysis to interpret text and image relations in violent extremist discourse.¹⁰ The paper confirms that continued reliance on manual analysis and discursive interpretation of a limited number of multimodal texts, mostly by humanities-based experts, is not enough. It also identifies an important need to move to automated recognition of multimodal meaning in large data sets as a way of coping with the volume of data being continuously generated.

As the paper’s evidence on big data in monitoring shows, online narratives and messaging by violent extremist individuals and groups have grown exponentially over the last decade, as violent extremists and their supporters capitalise on the strategic and organisational gifts of online spaces.¹¹ These spaces are viewed as networked opportunities for reaching and influencing transnational audiences. Yet much analysis of the intersection between terrorist influence and the virtual world of social influence (as witnessed through program evaluations) remains grounded in content-focused and/or platform-focused analysis of online violent extremist messaging and interactions.¹²

Research in the humanities has focused on the potential negative aspects of the “Big Brother” scenario of digital data, which include surveillance, ethical issues of privacy and confidentiality, and openness and transparency versus security.¹³ However, including big data analytics in monitoring and evaluation can address the problems of society and help ensure crisis avoidance through the reduction of threats to stability. The process would result in healthy and sustainable development. Development programs can often bring about large-scale changes in community behaviour, and in such cases it is easy to identify them and include them among the success stories in the evaluation process. In instances where change occurs at the individual level and is not visible at a grand scale, individual behavioural change can be captured through online interactions and relationships with peers. Just as terrorism research searches online for the weak links left by lone wolves, as Cohen et al. (2014) describe, big data can provide insights into programmatic changes, especially in instances where they cannot be openly evidenced at a grand scale.¹⁴

Which issues need consideration when applying big data?

Apart from the issues of data management, there are other critical issues that need consideration when opting to include big data in M&E for development programs. A number of commentaries have suggested that large studies are more reliable than smaller studies, and there is growing interest in the analysis of “big data” that integrates information from thousands of persons and/or different data sources.¹⁵ The same can be said of including a larger sample size in monitoring and evaluation. However, large samples bring challenges of their own, including sampling error, measurement error, multiple comparisons errors, aggregation error, and errors associated with the systematic exclusion of information.¹⁶ It is therefore necessary to exercise greater caution to be sure that a big sample size does not lead to big inferential errors.
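The caution about big samples and big inferential errors can be illustrated with a small simulation. The sketch below is a hypothetical example, not data from any program: it assumes a community of 100,000 people, 40% of whom are online, and assumes that people who are online score higher on some outcome. It then compares a very large sample drawn only from the online group with a small random sample of 500.

```python
import random
import statistics


def simulate(seed=0, population_size=100_000, online_share=0.4, small_n=500):
    """Compare a large but systematically biased sample with a small random one.

    All numbers are illustrative assumptions, not real program data.
    Returns (true_mean, big_biased_mean, small_random_mean, big_biased_n).
    """
    rng = random.Random(seed)
    population = []
    for _ in range(population_size):
        online = rng.random() < online_share
        # Assumption: the online group has a higher average outcome score.
        score = rng.gauss(60 if online else 45, 10)
        population.append((online, score))

    true_mean = statistics.fmean(score for _, score in population)

    # "Big data" sample: everyone who is online, and nobody who is offline.
    big_biased = [score for online, score in population if online]

    # Small sample drawn at random from the whole population.
    small_random = [score for _, score in rng.sample(population, small_n)]

    return (
        true_mean,
        statistics.fmean(big_biased),
        statistics.fmean(small_random),
        len(big_biased),
    )


true_mean, big_mean, small_mean, big_n = simulate()
print(f"true mean:              {true_mean:.1f}")
print(f"online-only (n={big_n}): {big_mean:.1f}")
print(f"random (n=500):         {small_mean:.1f}")
```

Under these assumptions the online-only sample is far larger yet lands further from the true mean, because adding more records from the same excluded-population source does not remove the systematic exclusion.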

To resolve these issues, data collection and curation should enable easy combination of data across systems. Matching variables using big data can allow information to be shared across the large pool and increase representation. Additionally, big data, whether collected from social media pools or from program activities, should remain representative and account for internet penetration (as an example) and other factors that may either encourage or bar communities from sharing. Information from the community on development programs can carry ascertainment bias, depending on how frequently information is shared or not shared. In response, fixed-effect models that follow a random time schedule can be applied to the big data pools to account for the frequency of information ingestion when estimating the impact of specific development programs on communities.
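One simple way to keep an online data pool representative is to reweight it by known population shares, a technique usually called post-stratification. It is offered here only as an illustration of the representativeness concern, not as the fixed-effect approach mentioned above. The strata, shares and scores in the sketch are hypothetical.

```python
def poststratified_mean(sample, population_share):
    """Reweight stratum averages by each stratum's share of the population.

    sample: list of (stratum, value) pairs, e.g. ("urban", 0.8).
    population_share: dict mapping stratum -> share of the real population
                      (shares are assumed to sum to 1).
    """
    totals, counts = {}, {}
    for stratum, value in sample:
        totals[stratum] = totals.get(stratum, 0.0) + value
        counts[stratum] = counts.get(stratum, 0) + 1

    missing = set(population_share) - set(counts)
    if missing:
        # No data at all for a stratum cannot be fixed by reweighting.
        raise ValueError(f"no observations for strata: {sorted(missing)}")

    return sum(
        share * totals[stratum] / counts[stratum]
        for stratum, share in population_share.items()
    )


# Hypothetical feedback scores: urban users are over-represented online.
sample = [("urban", 0.8)] * 8 + [("rural", 0.3)] * 2
shares = {"urban": 0.3, "rural": 0.7}  # assumed census shares

raw_mean = sum(value for _, value in sample) / len(sample)
print(f"raw mean:      {raw_mean:.2f}")
print(f"weighted mean: {poststratified_mean(sample, shares):.2f}")
```

The function deliberately raises an error when a stratum has no observations at all, since communities that are entirely absent from the data cannot be recovered by weighting and need offline collection instead.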

Development work in communities is dynamic. It features activities that are influenced by, and in turn influence, the behavioural, social and environmental responses of beneficiaries in the community. Expanding big data analytics to include these characteristics allows for comparison of variables and their variance, which reveals the relationships among them and gives a picture of the impact of a development program based on how the variables interact. Failing to include all aspects of influence on the community may result in the measurement bias that affects most mid-line and end-line evaluations. This bias is a discrepancy between the true outcome and the observed outcome, and it can be controlled through the systematic development of measures that big data analytics may apply.

While real-time data for decision-making may not be the forte of all development work in the Horn of Africa, studies suggest that the effect of interventions systematically diminishes over time.¹⁷ Additionally, as the population becomes more heterogeneous and complex, the effects continuously decrease in a phenomenon referred to as voltage drop.¹⁸ As a result, the ideal of providing mid-line and end-line evaluations suffers from many undocumented changes or limited sustainability reporting, as most of the information on a development program is lost with time. While completely scrapping mid-line and end-line evaluation processes is not recommended, the application of real-time big data analytics can help deal with this information loss.

How do we apply big data to development work in the Horn of Africa?

When developing a deployment for big data analytics, it is crucial to keep in mind the data source, the privacy protection processes (which may include an aggregate anonymiser), the analytics, and the production of targeted information.¹⁹ It is therefore important for development partners to stress the inclusion of data management and the possibility of big data relations in the programs they fund. The application of big data in monitoring and evaluation can assist in developing logical frameworks that are dynamic and that allow real-time data on community engagement and program activities to be applied while influencing inputs for specific programs.
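To make the idea of an aggregate anonymiser concrete, the sketch below shows one possible approach: only group-level counts leave the function, and groups with too few records are suppressed so that individuals cannot be singled out. The field names, the records and the threshold of three are hypothetical, and small-group suppression is one illustrative choice rather than the method of the cited model.

```python
from collections import Counter


def aggregate_anonymise(records, group_keys, min_count=3):
    """Return counts per group, dropping groups smaller than min_count.

    Identifying fields (such as names or phone numbers) are never copied
    into the output; only the values of group_keys are used.
    """
    counts = Counter(tuple(record[key] for key in group_keys) for record in records)
    return {group: n for group, n in counts.items() if n >= min_count}


# Hypothetical feedback records as they might arrive from a survey tool.
records = [
    {"name": f"respondent-{i}", "district": "A", "topic": "water"}
    for i in range(4)
] + [
    {"name": "respondent-x", "district": "B", "topic": "water"},
]

summary = aggregate_anonymise(records, group_keys=("district", "topic"))
print(summary)  # the single record from district B is suppressed
```

A real deployment would also need to consider how repeated queries or combinations of attributes could re-identify people, which this sketch does not address.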

In conclusion, many decisions on development programs come from retrospective observations in activity reports rather than prospective observations based on the rapidly changing environment. As a result, changes in the community are often treated as assumptions for programs, with funds set aside for rapid response and limited valuable insight into their use. Unlike clinical trials, most development programs do not have a single outcome, and yet evaluation processes are set up with a single focus based on the variable interactions. A key question ahead of every evaluation is whether key decision makers within the organization are likely to listen and be able to act on the evaluation’s findings. In times of COVID-19, institutional priorities and decision-makers’ knowledge and accountability needs are shifting.²⁰

Even as we adapt and redirect our focus, another overarching question remains about feasibility. It is important to consider the available capacity and resources to collect and analyse the data needed to respond to our evaluation questions of interest. The application of big data analytics to M&E of development programs should start with mapping the various strategic program objectives to technology. It should encompass the critical success factors, critical information, and solutions from big data to be applied in gathering real-time information. As a critical component, the inclusion of both retrospective and prospective observations is important, as documented in the World Bank article by Estelle Raimondo, Jos Vaessen and Mariana Branco.

1. Jeff Chelsky, and Lauren Kelly, “Bowling in the dark: Monitoring and evaluation during COVID-19 (Coronavirus),” World Bank Group Blog, April 01, 2020, https://ieg.worldbankgroup.org/blog/mande-covid19, accessed July 02, 2020.

2. UNDP, “Handbook on planning monitoring and evaluation for development results,” United Nations Development Programme, 2009, available at: http://web.undp.org/evaluation/handbook/documents/english/pme-handbook.pdf, accessed July 02, 2020.

3. Jennifer Greene, “Stakeholders,” in S. Mathison (Ed.), Encyclopedia of evaluation, (Thousand Oaks, CA: Sage, 2015; pp. 397-398).

4. Pierre-Marc Daigneault and Steve Jacob, “Toward Accurate Measurement of Participation: Rethinking the Conceptualization and Operationalization of Participatory Evaluation,” American Journal of Evaluation, Vol 30:3, 2009: Pp 330-348.

5. Redie Bereketeab, Ed., “The Horn of Africa: Intra-State and Inter-State Conflicts and Security,” (Pluto Press, London, UK, 2013); Alex de Waal, “The Real Politics of the Horn of Africa: Money, War and the Business of Power,” (Polity Press, Cambridge, UK, 2015).

6. Yuanjun Guo, Zhile Yang, Shengzhong Feng, and Jinxing Hu, “Complex Power System Status Monitoring and Evaluation Using Big Data Platform and Machine Learning Algorithms: A Review and a Case Study,” Complexity, Vol 2018, Article ID 8496187, 21 pages, https://doi.org/10.1155/2018/8496187

7. Anwaar Ali, Junaid Qadir, Raihan ur Rasool, Arjuna Sathiaseelan, Andrej Zwitter and Jon Crowcroft, “Big data for development: applications and techniques,” Big Data Analytics, Vol 1:2, 2016, DOI 10.1186/s41044-016-0002-4

8. Junaid Qadir, Anwaar Ali, Raihan ur Rasool, Andrej Zwitter, Arjuna Sathiaseelan & Jon Crowcroft, “Crisis analytics: big data-driven crisis response,” Journal of International Humanitarian Action, Vol 1: 12, 2016, https://doi.org/10.1186/s41018-016-0013-9.

9. Leonardo Aniello, Andrea Bondavalli, Andrea Ceccarelli, Claudio Ciccotelli, Marcello Cinque, Flavio Frattini, Antonella Guzzo, Antonio Pecchia, A. Pugliese, Leonardo Querzoni, S. Russo, “Big Data in Critical Infrastructures Security Monitoring: Challenges and Opportunities,” Journal of Big Data, 2014, available at https://arxiv.org/pdf/1405.0325, accessed July 02, 2020.

10. Kay L. O’Halloran, Sabine Tan, Peter Wignell, John A. Bateman, Duc-Son Pham, Michele Grossman & Andrew Vande Moere, “Interpreting text and image relations in violent extremist discourse: A mixed methods approach for big data analytics,” Terrorism and Political Violence, Vol 31:3, 2016: Pp. 454-474. DOI: 10.1080/09546553.2016.1233871

11. Jytte Klausen, “Tweeting the Jihad: Social Media Networks of Western Foreign Fighters in Syria and Iraq,” Studies in Conflict and Terrorism, Vol 38:1, 2015: Pp 1-22.

12. Marc Sageman, Leaderless Jihad: Terror Networks in the 21st Century, Philadelphia: University of Pennsylvania Press, 2014; Ines von Behr, Anais Reding, Charlie Edwards, and Luke Gribbon, “Radicalisation in the Digital Era: The Use of the Internet in 15 Cases of Terrorism and Extremism,” RAND Europe, 2013, available at http://www.rand.org/content/dam/rand/pubs/research_reports/RR400/RR453/RAND_RR453.pdf, accessed July 02, 2020; Gabriel Weimann, “Terror on the Internet: The New Arena, the New Challenges,” Washington, DC: United States Institute of Peace Press, 2006.

13. Max Craglia, Kees de Bie, Davina Jackson, Martino Pesaresi, Gabor Remetey-Fulopp, Changlin Wang, Alessandro Annoni et al., “Digital Earth 2020: Towards the Vision for the Next Decade,” International Journal of Digital Earth, Vol 5:1, 2012: Pp 4–21.

14. Katie Cohen, Fredrik Johansson, Lisa Kaati & Jonas Clausen Mork, “Detecting Linguistic Markers for Radical Violence in Social Media,” Terrorism and Political Violence, 26:1, 2014: Pp. 246-256, DOI: 10.1080/09546553.2014.849948.

15. Robert M. Kaplan, David A. Chambers, and Russell E. Glasgow, “Big Data and Large Sample Size: A Cautionary Note on the Potential for Bias,” Clinical and Translational Science, Vol 7:4, 2014: Pp 342–346.

16. Jeffrey A. Frelinger, “Big Data, Big Opportunities, and Big Challenges,” Journal of Investigative Dermatology Symposium Proceedings, Vol 17, 2015: Pp 33–35. doi:10.1038/jidsymp.2015.38

17. Steven A. Schroeder, “Shattuck Lecture. We can do better: improving the health of the American people,” New England Journal of Medicine, Vol 357:12, 2007: Pp. 1221–1228. doi: 10.1056/NEJMsa073350.

18. Amy A. Kilbourne, Mary S. Neumann, Harold A. Pincus, Mark S. Bauer, Ronald R. Stall, “Implementing evidence-based interventions in health care: application of the replicating effective programs framework,” Implementation Science, Vol 2:42, 2007. https://doi.org/10.1186/1748-5908-2-42

19. Annas Vijaya, Linda Salma Angreani, and Mokhamad Amin Hariyadi, “Big Data Analytics: Towards a Model to Understand Development Equity for Villages in Indonesia,” MATEC Web of Conferences 164, 01004, 2018.

20. Estelle Raimondo, Jos Vaessen & Mariana Branco, “Adapting evaluation designs in times of COVID-19 (coronavirus): four questions to guide decisions,” World Bank Blog, April 22, 2020, https://ieg.worldbankgroup.org/blog/adapting-evaluation-designs-times-covid-19-coronavirus-four-questions-guide-decisions, accessed July 02, 2020.