Introducing the PeaceKeeping Operations Corpus (PKOC)

Elio Amicarelli
Jessica Di Salvatore
Journal article

How to translate UN words into data that can be analysed


The PeaceKeeping Operations Corpus (PKOC) provides a machine-readable collection of 27 years of UN Secretary-General’s reports on peacekeeping operations. This creates exciting new opportunities for the peace and conflict research community to undertake quantitative content analysis on peacekeeping operations through the use of “text as data”. 

The PKOC contains the digitized records of 1,455 United Nations Secretary-General (UNSG) reports covering 68 UN peace operations from 1994 to 2020. It can be updated automatically and easily combined with existing datasets on peacekeeping and conflict. It is available in three formats suited to different analytical needs: (a) plain PKOC contains digital versions of the ‘raw’ UNSG reports; (b) reduced PKOC contains pre-processed UNSG reports with stop words and punctuation removed, and words converted to lowercase and stemmed; and (c) tagged PKOC contains the ‘raw’ UNSG reports with each word annotated with its grammatical class.
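To make the "reduced" format more concrete, the following is a minimal Python sketch of that kind of pre-processing. It is an illustration only: the stop-word list is a tiny hypothetical sample, and `crude_stem` is a simple suffix-stripping stand-in, not the stemmer used to build the corpus, which this summary does not specify.

```python
import re

# Tiny illustrative stop-word list (an assumption; the corpus's actual
# list is not given in this summary).
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "on", "for",
              "with", "by", "is", "was", "were"}

# Suffixes removed by the crude stand-in stemmer, tried in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word: str) -> str:
    """Strip one common suffix, keeping at least three characters.

    This is NOT a real stemming algorithm such as Porter's; it only
    illustrates the idea of collapsing inflected forms.
    """
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def reduce_text(text: str) -> list[str]:
    """Lowercase, drop punctuation and digits, remove stop words, stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


example = "The Mission continued to protect civilians in the camps."
print(reduce_text(example))
# ['mission', 'continu', 'protect', 'civilian', 'camp']
```

In practice a dedicated text-processing library would normally replace both the stop-word list and the stemmer; the sketch only shows the order of the steps.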

PKOC provides rich metadata for every UNSG report, including mission acronym, period of deployment, and host country, to allow the quick retrieval of key information. This enables easy comparisons between different UN peacekeeping missions within a specific deployment period, as well as analysis of changes within missions over time.

Textual references to the “Protection of Civilians” (PoC) mandate illustrate how the dataset can be interrogated and interpreted. The focus on PoC varies greatly across UN missions, both in the mission mandate and in how often UNSG reports mention this activity (see Figure 1). UNSG reports generally aim to provide insight into the strategic development of UN peacekeeping missions rather than an extensive account of activities. Given this strategic focus, a report that does not mention PoC may indicate that PoC is not a strategic priority for the mission, which would be problematic if the mission had a mandate to protect civilians. Conversely, frequent references to PoC may suggest a significant focus on this task, although they do not imply success or failure in performing it. For example, UNSG reports on the UN Mission in South Sudan contain the highest percentage of sentences relating to PoC, likely a result of both a strong PoC focus in the mission mandate and the severe difficulties that the peace operation has faced in protecting civilians against a challenging conflict backdrop.
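The percentage of sentences relating to PoC can be approximated with a simple keyword count. The Python sketch below is one possible approach under stated assumptions: the regular expression, the naive sentence splitter, and the sample text are all hypothetical, and the article's actual method for identifying PoC sentences may differ.

```python
import re

# Hypothetical pattern for PoC-related wording (an assumption, not the
# article's own dictionary).
POC_PATTERN = re.compile(
    r"protection of civilians|protect(?:ing|ed|s)? civilians",
    re.IGNORECASE,
)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def poc_share(text):
    """Return the percentage of sentences that match POC_PATTERN."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    hits = sum(1 for s in sentences if POC_PATTERN.search(s))
    return 100.0 * hits / len(sentences)


# Invented sample text, not taken from any UNSG report.
report = (
    "The Mission continued to implement its mandate. "
    "The protection of civilians remained a priority. "
    "Patrols were increased to protect civilians near the sites. "
    "Elections were postponed."
)
print(poc_share(report))  # 50.0
```

A keyword count of this kind measures attention to PoC in the text, not how well the mission performed the task.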

The PKOC dataset can also be used to create new variables to analyse changes in UNSG reporting. The structure and headings of each report can be utilised to measure a document’s complexity, i.e. its coverage of the array of peacekeeping-related policy domains. This measure, in turn, can indicate how, for example, a mission’s multidimensionality changes over time.
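As a rough illustration of a heading-based measure, the Python sketch below scores a report by the share of policy domains that its section headings touch. The domain names, keywords, and sample headings are hypothetical placeholders chosen for this example; the article's own operationalisation of complexity may be quite different.

```python
# Hypothetical mapping from policy domains to heading keywords
# (an assumption made for illustration only).
DOMAIN_KEYWORDS = {
    "security": ("security", "military"),
    "human_rights": ("human rights",),
    "elections": ("election", "electoral"),
    "humanitarian": ("humanitarian",),
    "rule_of_law": ("rule of law", "justice", "police"),
}


def domains_covered(headings):
    """Return the set of domains whose keywords appear in any heading."""
    covered = set()
    for heading in headings:
        text = heading.lower()
        for domain, keywords in DOMAIN_KEYWORDS.items():
            if any(keyword in text for keyword in keywords):
                covered.add(domain)
    return covered


def complexity(headings):
    """Share of the listed domains covered by the headings (0.0 to 1.0)."""
    return len(domains_covered(headings)) / len(DOMAIN_KEYWORDS)


# Invented headings, not copied from any UNSG report.
headings = [
    "Security situation",
    "Human rights and rule of law",
    "Humanitarian situation",
    "Observations",
]
print(complexity(headings))  # 0.8
```

Computing this score for each report of a mission, ordered by date, would give a simple time series of how broad the mission's reported activities are.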

Read the article “Introducing the PeaceKeeping Operations Corpus (PKOC)” in full.

Policy summary prepared by Evgenija Kroeker.