Understanding Civil War Violence through Military Intelligence: Mining Civilian Targeting Records from the Vietnam War

Abstract

Military intelligence is underutilized in the study of civil war violence. Declassified records are hard to acquire and difficult to explore with the standard econometrics toolbox. I investigate a contemporary government database of civilians targeted during the Vietnam War. The data are detailed, with up to 45 attributes recorded for 73,712 individual civilian suspects. I employ an unsupervised machine learning approach of cleaning, variable selection, dimensionality reduction, and clustering. I find support for a simplifying typology of civilian targeting that distinguishes different kinds of suspects and different kinds of targeting methods. The typology is robust, successfully clustering both government actors and rebel departments into groups that mirror their known functions. The exercise highlights methods for dealing with high dimensional found conflict data. It also illustrates how aggregating measures of political violence masks a complex underlying empirical data generating process as well as a complex institutional reporting process.


Rex W. Douglass ∗
June 2015 †

Civil wars blur the line between civilians and combatants. This is the fundamental problem for governments that must separate rebels from innocents and for civilians wanting to remain neutral and safe. Military and police forces expend enormous resources attempting to identify and eliminate specific individuals fighting for violent groups. What do those programs look like from the inside? How do they pick their targets? How effective are they, and how are their costs distributed across both civilian and rebel supporters?

∗ I thank David Madden, Josh Martin, Walter Fick, and Roxanna Ramzipoor for research assistance, and the United States National Archives staff, particularly Richard Boylan and Lynn Goodsell, for generously sharing their time and expertise. I am grateful for comments from Erik Gartzke, Joanne Gowa, Kristen Harkness, Stathis Kalyvas, Chris Kennedy, Matthew Kocher, Alex Lanoszka, John Lindsay, David Meyer, Kris Ramsay, Jacob Shapiro, Tom Scherer, members of the Empirical Studies of Conflict Group, the Yale Program on Order, Conflict, and Violence, the UCSD Cross Domain Deterrence Group, and two anonymous reviewers. This research was supported, in part, by the U.S. Department of Defense's Minerva Research Initiative through an Air Force Office of Scientific Research grant.

† Forthcoming as a chapter in C.A. Anderton and J. Brauer, eds., Economic Aspects of Genocides, Mass Atrocities, and Their Prevention. New York: Oxford University Press.

Targeting programs are necessarily secretive, which makes government records hard to come by. When available, they tend to consist of unwieldy, unstructured, and often undigitized intelligence dossiers, which makes quantitative analysis an expensive proposition.
Because of these problems, nearly all existing studies of targeting programs are either qualitative in nature (Moyar & Summers, 1997; Comber, 2008; Natapoff, 2009) or depend on data developed from nongovernment sources like interviews, surveys, and news reports (Ball et al., 2007; Silva et al., 2009). When declassified government records are available, they tend to be at the event level, like attacks (Biddle et al., 2012; Berman et al., 2011) or air operations (Lyall, 2014), without details on the victims. The rare exceptions are peacetime police records like Stop and Frisk data from New York, which provide both demographic details about the suspect and details of the altercation (Gelman et al., 2007).

I analyze an electronic database of civilian targeting efforts created during the Vietnam War. The data are extensive, covering 73,712 individual rebel suspects from the perspective of the government's police and military operations. The database contains detailed information on both the victims and the operations targeting them. At issue are two central questions.

The first is descriptive: How exactly does a civilian targeting program work in practice? All civilian targeting programs are secretive, but the Phoenix Program is uniquely surrounded by historical controversy. During and immediately after the war, critics and proponents debated whether it was a broad intelligence and policing effort or simply a punitive assassination program (Colby & McCargar, 1989). Since the war, the discussion has turned to whether the Phoenix Program achieved its aims of neutralizing high ranking targets (Thayer, 1985). More recently, it has been asked why the program caught some suspects while letting others escape, and what implications that might have for civilians deciding whether to join rebellion (Kalyvas & Kocher, 2007).

The second is theoretical: How should we conceptualize civilian targeting?
Is there a simpler typology we can use to classify violence against civilians in terms of the kinds of victims or the kinds of methods employed? Much of the work on civilian targeting either explicitly or implicitly disaggregates along the severity of targeting, treating killings as distinct from arrests because they are theoretically different or easier to document. Others divide targeting along how the victims are selected, particularly whether they were individually singled out or targeted as part of a larger group (Kalyvas, 2006). Is there a principled and data driven way to categorize and describe civilian [...]

Footnote: Two prime examples include the East German Stasi files, which came into public stewardship, and the Guatemalan National Police Archive, which is now in the charge of government and international humanitarian agencies as part of a truth and reconciliation effort (Aguirre et al., 2013). The records are comprehensive but unstructured and will require a major effort to analyze once all of the raw documents are digitized (Price et al., 2009).

Footnote: For some of the issues and methodology of working with retrospective sources, see Price & Ball (2014) and Seybolt et al. (2013).

Footnote: With few exceptions, the database and most declassified intelligence products from the Vietnam War have remained unused. This is partially because working with found data is technically challenging, requiring extensive cleaning and documentation, and partially because the tools for such data are only now gaining popularity in the social sciences.

The Empirical Background: War in South Vietnam and the Phoenix Program

In the later years of the Vietnam War, the South Vietnamese government went on the offensive against the nonmilitary members and supporters of the rebel opposition. While politically effective, the 1968 Tet Offensive was a military setback for irregular forces that shifted the main military threat from the organized rural insurgency to the incoming conventional forces from North Vietnam. The government took advantage of this shift by pushing out into rural areas, projecting its power with a wide range of police, militia, military, and special forces units. With the expansion of government control and a flood of funds, logistical support, and U.S. advisers, the government set out to map, monitor, and police its civilian population at an industrial scale. Goals were set out to dismantle the political opposition with widespread arrests, killings, and induced defections.

The Phoenix Program (Phung Hoang) was created in 1968 to coordinate this initiative, which was, in reality, fractured across dozens of separate intelligence and police initiatives across South Vietnam. Its substance as an institution included national guidelines that influenced targeting goals and reporting, as well as physical offices containing advisors who coordinated and distributed information on suspects. While the program was not responsible for directly acting against suspects, it was instrumental in bringing together and documenting all of the scattered existing efforts (Moyar & Summers, 1997). It was in operation until it collapsed during the Eastertide invasion by North Vietnam at the end of 1972.

The process by which the program collected information was scattered and disjoint. Intelligence and Operations Coordinating Centers (IOCCs) maintained Lists of Communist Offenders and sometimes detailed maps of hamlets, with names of occupants and photographs.
Military units like the 1st Infantry Brigade, 5th Infantry Division (Mechanized) sometimes formed special teams from their military intelligence detachments to coordinate with and make up for weak IOCCs in their area, maintaining their own card file in addition to the Local List of Communist Offenders. Additionally, village, district, and province chiefs often maintained their own parallel intelligence nets and records.

In January of 1969, the Office of the Secretary, through the Advanced Research Projects Agency (the predecessor to DARPA), began Project VICEX to develop a country- [...]

Biographical data on suspects and neutralizations were entered into the national database at district and province IOCCs. Enumerators were trained with coding guidelines for converting dossiers and neutralization reports into a standardized format. The data were then passed up to the Phung Hoang Directorate and from there to the National Police Command Data Management Center, where they were recorded using punch cards before being entered onto magnetic tape with an IBM 360 mainframe.

Footnote: The program began out of a regional coordination effort called Intelligence Coordination and Exploitation for the attack on the VCI (ICEX) in July 1967. By 1968, Phung Hoang Committees were established in 44 provinces and 228 districts.

Footnote: Example members of a typical district IOCC team included Village Chiefs, Deputy Village Chiefs for Security, Village Military Affairs Commissioners, Village National Police Chiefs, Popular Forces Platoon Leaders, Hamlet Chiefs, and more.

Footnote: Military Assistance Advisory Group. Vietnam Lessons Learned No. 80: US Combat Forces in Support of Pacification, 29 June 1970. Saigon, Vietnam: Headquarters, U.S. Army Section, Military Assistance Advisory Group, 1970-06-29. http://cgsc.contentdm.oclc.org/u?/p4013coll11,1524.

The United States preserved an electronic copy of the final targeting database called the National Police Infrastructure Analysis Subsystem II (NPIASS-II). It contains records from July 1970 to December 1972. This is the later period of the war, following the Tet Offensive, the GVN and U.S. counterattack, and the period of U.S. withdrawal. It contains data on 73,712 individuals (rows) that I refer to as suspects and who serve as the unit of observation. It contains 45 attributes (columns) that provide information on each suspect about their biography, job, details of operations targeted against them, and their final disposition. The attributes are of mixed types including numerical, nominal, dates, and nested lookup codes such as locations (e.g. region->province->district->village) and rebel jobs (e.g. National Liberation Front->Liberation Woman's Association->Personnel). The structure of the database is outlined in Table 1.

                   Record Type                      Status                 Outcome
Suspects (73,712)  Neutralization Record (48,074)   Neutralized (49,756)   Killed (15,438)
                                                                           Captured (22,215)
                   Biographical Record (25,638)                            Defector (12,103)
                                                    At Large (23,943)      At Large (23,943)

Table 1: Structure of the National Police Infrastructure Analysis Subsystem II (NPIASS-II) database. The unit of observation (rows) is the individual suspect. The potential attributes (columns) are available in blocks depending on the kind of record and outcome.

Footnote: "Org and Mission," April 1969, Phung Hoang Directorate, Records of the Office of Civil Operations for Rural Development Support (CORDS), General Records, 1967-1971; Record Group 472.3.10. National Archives at College Park, College Park, MD. ARC Identifier: 4495500.

Footnote: Reference Copy of Technical Documentation for Accessioned Electronic Records, National Police Infrastructure Analysis Subsystem (NPIASS) I & II Master Files, Record Group 472 Records of the U.S. Forces in Southeast Asia. Electronic Records Division, U.S. National Archives and Records Administration, College Park, Md.
Footnote: United States Military Assistance Command Vietnam/Civil Operations Rural Development Support (MACORDS) National Police Infrastructure Analysis Subsystem II (NPIASS II), 1971-1973. File Number 3-349-79-992-D. Created by the Military Assistance Command/Civil Operations and Rural Development Support-Research and Analysis (MACORDS-RA). U.S. Military Assistance Command/Civil Operations Rural Development Support.

Footnote: The total number of attributes is higher if multipart attributes are disaggregated or if low to no variance attributes are included.

[...] Second, I apply a machine learning approach to the structure of the database overall, not just the tabular values of individual variables. Patterns of missing values, the meaning of different variables, and heterogeneity between different kinds of observations are all targets of inquiry.

One revelation from the documentation of the predecessor system is that the database combines two different kinds of records: those entered while a suspect was still at large (a biographical record) and those entered after a suspect had already been killed, captured, or defected (a neutralization record). The two kinds of records resulted from different worksheets, with separate text examples and coding guidelines. Neutralization records appear to have recorded the Phoenix Program as it actually happened on the ground. Two thirds of suspects, 48,074 (65%), were entered as neutralization records. Biographical records were a kind of growing wish list where analysts bothered to digitize information from the much larger pool of suspects with dossiers, on blacklists, or reported in the Political Order of Battle. A third of suspects, 25,638 (35%), were entered this way.

The second important division in the database is along the fate of each suspect, as still at large, killed, captured, or defector (rallier). All biographical records start off as at large suspects.
In rare cases (6.6%), some were updated to show the suspect had later been neutralized, though there are sufficient irregularities to suspect that some of these are actually coding errors.

The Phoenix Program was not, at least predominantly, an assassination program. The dominant form of neutralization was arrests/captures (45%). A smaller share (24%) of neutralizations were suspects who turned themselves in, defecting. Killings made up only about a third of neutralizations documented by the database (31%).
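The percentages quoted above can be checked against the counts in Table 1. The following is a minimal sketch of that arithmetic, not the author's code. The variable and function names are illustrative, and the last step assumes that every suspect still at large has a biographical record, which is how the text describes the two record types.

```python
# Counts taken from Table 1 of the text.
OUTCOMES = {"Killed": 15_438, "Captured": 22_215, "Defector": 12_103}
RECORD_TYPES = {"Neutralization": 48_074, "Biographical": 25_638}
AT_LARGE = 23_943


def shares(counts):
    """Return each category's share of the total as a whole-number percentage."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


neutralization_shares = shares(OUTCOMES)
record_type_shares = shares(RECORD_TYPES)

# Assumption: all at-large suspects are biographical records, so the
# biographical records that are no longer at large were updated later.
updated = RECORD_TYPES["Biographical"] - AT_LARGE
updated_share = round(100 * updated / RECORD_TYPES["Biographical"], 1)

print(neutralization_shares)
print(record_type_shares)
print(updated_share)
```

Under these assumptions the sketch reproduces the shares reported in the text for outcomes, record types, and updated biographical records.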

To illustrate the kinds of information available (and the kinds of facts that are omitted), I provide the following comparison of a suspect's record in the database with their detailed interrogation report. The record was deanonymized by manually comparing data fields against reported details in declassified interrogation reports from the Combined Military Interrogation Center (CMIC). Below is a brief narrative of a suspect's life, [...]

Footnote: "VCI Neutralization and Identification Information System (VCINIIS) Reporting and Coordination Procedures." Folder: "1603-03A Operational Aids, 1969," ARC Identifier: 5958372, Administrative and Operational Records, compiled 05/1967 - 1970, documenting the period 1966 - 1970, HMS Entry Number(s): A1 724, Record Group 472: Records of the U.S. Forces in Southeast Asia, 1950 - 1976. National Archives at College Park, College Park, MD.

Footnote: "VCI of the Soc Trang Province Party Committee," March 1971, Folder 11, Box 09, Douglas Pike Collection: Unit 05 - National Liberation Front, The Vietnam Center and Archive, Texas Tech University.

The full list of 45 attributes is provided below in Table 2. Attributes are arranged into groups, determined using a combination of descriptions from the codebooks and an unsupervised clustering method described fully below. I have further aggregated the groups into four broad concepts: attributes that are always available, typically only available for biographical records, typically only available for neutralization records, and only available for neutralization records of suspects who defected or were captured.

Always available: Job; Echelon; Sex; Black List; Party Membership; Area of Operation; Priority A/B ∗; Record Updates; Age.
Biographical records: Birth Place; Bio Process. Date; Bio Info Date; Dossier Location; Photo; Prints; Arrest Order; Address; Confirmation.
Neutralization records: Killed/Capt./Defect.; Action Force; ID Source; Neut. Process. Date; Neut. Action Date; Neut. Location; Specific Target; Operation Level; IOCC Involvement.
Neutralization records (captured or defected): Detention Facility; Arrest Level; Arrest Serial; Arrest Year; Sentence Process. Date; Sentence Date; Sentence Code; Sentence Location; Release Process. Date; Release Action Date; Release Code; Release Location; Arrest Forwarding; Forw. Process. Date; Forw. Action Date; Forw. Location.
Unplaced: [...] ‡

∗ Imputed from official position Greenbook.
‡ Mutually exclusive and so merged with Killed/Capt./Defector

Table 2: Attributes for each suspect in the NPIASS-II database.
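The grouping in Table 2 rests on when an attribute is filled in and when it is empty. As a minimal sketch of that idea, the code below builds a binary missingness matrix and the share of missing cells per attribute by record type. The attribute names come from Table 2, but the four records, their values, and the function names are invented for illustration, and this is a simple tabulation rather than the clustering model used in the paper.

```python
# Toy records: attribute names follow Table 2, values are invented.
# None marks a missing cell.
ATTRIBUTES = ["Job", "Birth Place", "Action Force", "Sentence Code"]
RECORDS = [
    {"type": "biographical", "Job": "cadre", "Birth Place": "A",
     "Action Force": None, "Sentence Code": None},
    {"type": "biographical", "Job": "courier", "Birth Place": None,
     "Action Force": None, "Sentence Code": None},
    {"type": "neutralization", "Job": "cadre", "Birth Place": None,
     "Action Force": "police", "Sentence Code": "S1"},
    {"type": "neutralization", "Job": "guide", "Birth Place": None,
     "Action Force": "militia", "Sentence Code": None},
]


def missingness_matrix(records, attributes):
    """Binary matrix: 1 where the attribute is missing, 0 where present."""
    return [[int(r[a] is None) for a in attributes] for r in records]


def missing_share_by_type(records, attributes):
    """Share of missing cells per attribute, split by record type."""
    out = {}
    for rtype in sorted({r["type"] for r in records}):
        rows = [r for r in records if r["type"] == rtype]
        out[rtype] = {
            a: sum(r[a] is None for r in rows) / len(rows) for a in attributes
        }
    return out


X_na = missingness_matrix(RECORDS, ATTRIBUTES)
by_type = missing_share_by_type(RECORDS, ATTRIBUTES)

for row in X_na:
    print(row)
print(by_type)
```

In this toy data "Job" is never missing, "Action Force" is missing only for biographical records, and "Sentence Code" is missing for all biographical and some neutralization records, which mirrors the kind of block pattern the four concepts describe.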

The codebooks provide descriptions of each attribute but they omit important details such as how missing values are handled. Every observation has at least some missing attributes and nearly 50% of cells are empty. Some of the patterns are self-explanatory, e.g. there will be no information about sentencing if the suspect is still at large. In other cases, missingness is more subtle, e.g. information on the suspect's age is sometimes missing for suspects who were killed in the field without questioning. In all cases there appears to be a combination of missing at random and undocumented structure. This ambiguity and apparent latent structure suggest applying a machine learning approach to learning how attributes are related to one another.

I frame this as a block clustering problem where the task is to simultaneously find r groups of attributes and k groups of observations that are similar in terms of missingness. Let an I × Q binary matrix, X_NA, represent the missingness for each individual i and attribute q, shown on the left in Figure 1. The task is to decompose this matrix into a version sorted by row and column into homogeneous blocks, X̂_NA, shown on the right side of Figure 1, and a smaller r by k binary matrix of row and column clusters.

I employ an unsupervised bi-clustering algorithm, the Bernoulli Latent Block Model (Govaert & Nadif, 2003). The model is fit with a wide range of possible row and column [...] et al., 2000).

Footnote: Implemented in the R package blockcluster (Bhatia et al., 2014).

Figure 1: Block clustering of missing values in NPIASS-II into 11 groups of attributes and 9 groups of observations. True values (white) indicate a missing value.

The structure of missingness in the NPIASS-II database is best explained by 11 clusters of attributes and 9 clusters of observations (ICL = − [...]).

Each suspect in the database has potentially over forty known facts about them. If you were forced to describe an observation to someone else, which fact would you start with?
What is the single most important fact about a suspect? The second most important fact? The third? And so on. Typically these decisions are made on an ad hoc basis given the researcher's theoretical interests. Here, the focus is in part on learning the structure of the database, and so we need a principled definition of what makes an attribute important, and a method for ranking attributes on that dimension.

I frame this as an unsupervised learning problem, where the task is to learn rules and relationships between attributes that could be used to distinguish a real observation from a synthetic randomly shuffled version. The only way to tell a real observation from a randomly generated one is to learn patterns of regularity and structure between attributes. In this conception, an attribute is important if it conveys a great deal of information about what other values a suspect's attributes will take. The most important fact is the one that provides the most information for inferring other facts. The least important fact is the one that provides completely unique but orthogonal or potentially random information.

The classifier I use for this task is an unsupervised random forest. Random forests are an ensemble method that combines the predictions of many individual base learners. The individual learners in this case are fully grown binary decision trees, each fit to a different random subset of attributes and random subset of observations (Breiman, 2001). In the supervised case, cut points for covariates are selected to separate observations into increasingly homogeneous groups on some outcome variable. In the unsupervised case, the random forest learns to distinguish a genuine observation from a synthetic scrambled version (Shi & Horvath, 2006). This method works for both cat- [...] et al., 2010). The interaction of two variables is captured by the depth of their second-order maximal subtree (the distance from the root node of one variable's maximal subtree within the maximal subtree of the other), as trees tend to split on one variable and then soon split on a related variable. I define a symmetric distance between two variables as the sum of their second-order maximal subtree depths. This distance is small when both variables tend to split close to the root, soon after one another, and large if either splits late in the tree or far from the other. This approach provides two remarkable pieces of summary information shown in Figure 2.

Footnote: Put another way, suspects are situated in some high dimensional space where there is more underlying structure than we could ever hope to completely document. What structure should we prioritize as the most dominant or interesting in the data?

Footnote: Note that this is a reversal of the typical variable selection process, where the goal is to better explain some outcome by removing redundant information to produce a smaller number of uncorrelated explanatory variables. In this multivariate setting, there is no single outcome and the redundancies are the details of interest.

Footnote: Implemented in the R package randomForestSRC (Ishwaran & Kogalur, 2014).

Footnote: I employ 1000 trees, trying seven variables at each split, a minimum of one unique case at each split, and fully grown trees with no stopping criteria. Splits with missing values are first determined using non-missing in-bag observations, and then observations with that attribute missing are randomly assigned to a child node.

Footnote: Summing the second-order maximal subtree depths is a stronger test of interaction and is an innovation so far as the author is aware.
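The symmetric distance defined above is simple to state in code. The following is a minimal sketch under stated assumptions: the nested dictionary DEPTH stands in for averaged second-order maximal subtree depths that a fitted forest would supply, the numbers in it are invented, and the function names are illustrative rather than taken from any package.

```python
# Hypothetical second-order maximal subtree depths, invented for illustration.
# DEPTH[a][b] is the depth of variable b's maximal subtree inside
# variable a's maximal subtree, averaged over trees.
DEPTH = {
    "Status": {"Sex": 1.0, "Echelon": 2.0},
    "Sex": {"Status": 1.5, "Echelon": 4.0},
    "Echelon": {"Status": 2.5, "Sex": 5.0},
}


def symmetric_distance(depth, a, b):
    """Sum of the two directed second-order depths for the pair (a, b)."""
    return depth[a][b] + depth[b][a]


def distance_matrix(depth):
    """Distances for every unordered pair, keyed by a sorted name tuple."""
    names = sorted(depth)
    return {
        (a, b): symmetric_distance(depth, a, b)
        for a in names
        for b in names
        if a < b
    }


def closest_pair(depth):
    """The pair of variables with the smallest symmetric distance."""
    dist = distance_matrix(depth)
    return min(dist, key=dist.get)


print(distance_matrix(DEPTH))
print(closest_pair(DEPTH))
```

Because the two directed depths are added, a pair is only close when each variable tends to split soon after the other, which is the property the text relies on when clustering attributes.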
[Figure 2 panels: Attribute Interaction Strength (Hierarchical Clustering), axis Merge Height; Overall Attribute Importance (Rank). Attributes as ordered in the figure, with importance rank in parentheses:]

Dossier Fingerprints (43), Dossier Photograph (34), Dossier Arrest Order (26), Dossier Address (25), Dossier Confirmation (24), Record Type (21), Bio Process Date (29), Birth Location (30), Dossier Location (27), Bio Information Date (28), Release Code (44), Arrest Forward Code (42), Forward Location (41), Arrest Level (31), Specific Target (23), Sex (2), A or B Priority (16), IOC Involvement (22), Kill/Capt./Rally/At Large (1), Detention Facility (8), Operation Level (19), Neutralization Process. Date (20), Neutralization Action Date (18), Neutralization Location (17), Row Cluster (4), Action Force (7), Job (13), Black List (9), ID Source (6), Area of Operation (11), Party Membership (3), Echelon (5), VCI Serial Number (10), Record Update Count (14), Age (15), Arrest Year (33), Sentence Process. Date (37), Sentence Action Date (35), Sentence Location (36), Sentence Code (12), Arrest Serial Number (32), Release Processing Date (39), Release Action Date (38), Release Location (40).

Figure 2: Clustering of attributes by strength of interaction, with merges selected to minimize Ward's distance (dendrogram). Rank order importance of each attribute in terms of average maximal subtree depth in an unsupervised learning task (unsupervised random forest). Smaller rank means an attribute was selected sooner in the random forest construction and is thus more informative overall.

The first piece of summary information is a ranking of variables in terms of how much information they convey about the entire dataset. The answer to the question "Which fact should we start with?" is definitively the fate of the suspect: still at large, killed, captured, or defector. No other single attribute implies as much about the remaining details as that one. The next most important is the suspect's gender, followed by whether they were a party member, the kind of record as estimated by the block clustering above, the suspect's echelon, the source of information used to ID the suspect, and so on.

The second is a pairwise distance between attributes in terms of how much information their interaction conveys about the entire dataset. Hierarchically clustering attributes on that distance reveals what appear to be two mostly unconnected data generating processes: one related to the creation of biographical records and dossiers, and another related to the neutralization of suspects. The method has, without prompting, correctly recovered the undocumented difference between neutralization record attributes and biographical record attributes. Justifying earlier concerns, the dossier attribute "confirmed" is flagged as being closely related to other administrative details of dossiers and not the core demographic attributes of suspects or the empirical process of targeting.

The clustering also pinpoints the place to start the analysis: a core group of 21 highly related and informative attributes relating to the neutralization and demographics of suspects.
They are flanked by tangential groups of attributes relating to the sentencing of a suspect, the release of a suspect, and the details of dossiers for biographical records. There may be interesting structure within these other groups of attributes, but they are mostly orthogonal to the core outcomes of interest and so can be safely set aside for future work.

Having selected a core group of attributes, the next question is whether they can be further summarized by a simpler typology. The next two sections unpack these attributes and tackle dimensionality reduction with respect to two themes: the kinds of victims and the kinds of government operations. For that analysis, I weed the list further to just 11 nominal demographic and neutralization attributes. I set aside dates and locations. I exclude 5 attributes about the administrative aspects of the dataset. And I single out two attributes with a large number of categories for detailed analysis: the job of the suspect and the government actor responsible for the neutralization.
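The attribute ranking in this section rests on a classification task that separates real observations from scrambled ones. The following is a minimal sketch of how such a training set could be built, assuming the scrambled version is made by shuffling each attribute independently, in the spirit of the approach cited from Shi & Horvath (2006). The four toy records and the function names are invented, and no forest is fitted here.

```python
import random


def scramble(rows, seed=0):
    """Shuffle each column independently.

    Every attribute keeps its marginal distribution, but the
    dependence between attributes within a row is broken.
    """
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*rows)]
    for col in columns:
        rng.shuffle(col)
    return [tuple(vals) for vals in zip(*columns)]


def labeled_training_set(rows, seed=0):
    """Real rows get label 1, scrambled rows get label 0."""
    synthetic = scramble(rows, seed)
    return [(r, 1) for r in rows] + [(s, 0) for s in synthetic]


# Invented toy records: (status, sex, party membership).
REAL = [
    ("killed", "male", "full member"),
    ("captured", "female", "non-member"),
    ("defector", "female", "non-member"),
    ("at large", "male", "full member"),
]

training = labeled_training_set(REAL)
for row, label in training:
    print(label, row)
```

A classifier trained on such data can only beat chance by learning which attribute values co-occur in real records, so attributes that are split on early are the ones that carry the most information about the others.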

Who were the Phoenix Program's victims? The program was charged with dismantling nonmilitary rebel organizations in South Vietnam. A full breakdown of suspect counts by organization, demographic attributes, and final status is shown in Table 3.

Footnote: This is all the more amazing because the variable is incorrectly imputed with values for the majority of rows in the dataset. The method has correctly identified the subset of rows for which the variable takes on meaningful values and has grouped it with related variables accordingly.

Footnote: With captured documents and defector reports, GVN and U.S. intelligence analysts mapped those organizations in great detail (Conley, 1967, 165).

Attribute     Category              Count    At Large  Killed  Captured  Defector
                                               (%)      (%)      (%)       (%)
Total                               73,699     32       21       30        16
Organization  PRP                   54,977     31       19       34        15
              NLF                    7,147     33       13       28        26
              Communist Orgs        11,315     39       33       13        15
Sex           Male                  56,257     36       25       22        16
              Female                16,910     20        6       57        17
Party         Full Member           22,443     48       29       13        10
Membership    Unknown               29,470     37       19       31        12
              Probationary Member    7,042     17       26       37        20
              Non-Member            14,743      6       10       51        33
Echelon       Hamlet                 9,858     25       24       24        26
              Village               44,513     31       20       33        15
              District              12,917     42       23       22        13
              City/Prov/Reg/COSVN    6,400     33       16       34        17
List          Most Wanted List      24,285     65       21        9         4
              Target List           15,687     40       18       30        12
              Most Active List       8,136     23       15       51        11
              Unknown               25,588      0       24       43        33
A or B        A                     43,645     42       25       18        15
              B                     29,790     18       15       47        19
Age           [0,25]                15,500     19       14       47        21
              (25,35]               14,858     37       22       25        16
              (35,55]               27,225     35       14       31        20
              (55,100]               3,742     22        5       56        17

Table 3: Cross tabulation of demographic properties against outcome.

Broadly, the political opposition to the Republic of Vietnam (GVN) was organized into three groups. Political authority, command, and resources flowed from North Vietnam into South Vietnam through the communist political apparatus, the People's Revolutionary Party (PRP). Indigenous popular support and participation was organized into the subordinate National Liberation Front (NLF), also called the Viet Cong. Together they constructed alternative administrative institutions referred to as Communist Authority Organizations, such as the People's Revolutionary Government (PRG), as well as a number of political organizations designed to involve civilians outside of the communist party.

Together, and with overlapping and changing roles and capabilities, these three organizations embodied foreign authority, popular participation, and political institutions. Four-fifths of neutralizations were against PRP positions, with fewer directed toward more indigenous NLF and Communist Organization positions. This is consistent with both the priorities of the program and the timing in the war; the post-Tet phase was more externally driven by North Vietnam.

Members or supporters performing active roles in these organizations were collectively known as the Viet Cong Infrastructure (VCI). VCI were grouped into Class A VCI, who were full or probationary PRP members or held leadership and command roles, and Class B VCI, who were trained but voluntary members. Where appropriate, roles were replicated at multiple levels of governance called echelons, including the hamlet, village, district, province, city, capital, region, and national level.

The broad pattern is one of a program that targeted large numbers of low level suspects, a portfolio of targets that was bottom heavy. Half of neutralizations were B level voluntary/support positions, and a little more than half were previously unknown to security forces.
About a fifth of neutralizations were full party members, and about a fifth of neutralizations were at the district or higher echelon. It is unclear, however, whether this is disproportionate to the number of actual rebels holding those positions. If the program selected targets uniformly from the rebel population, these rates are probably proportional to the share of those positions among all rebel members during this period of the war.

There is a strong relationship between the demographics of suspects and methods of targeting. In brief, more important suspects were fewer in number but more directly targeted, either by having a file while at large, or by being killed in an operation. Low level suspects were less likely to have a file at large, and were much more likely to be swept up as arrests or to walk in off the street as a defection.

The cross-tabulation in Table 3 shows this in terms of a single comparison between demographics and the suspect's final status. Targeting was gendered, with female suspects much less likely to be killed or to be targeted as at large. Known party members were more likely to be targeted at large or killed, while suspects known not to be party members were much more likely to simply be arrested or to defect. The lower the suspect's echelon, the more likely they were simply arrested or defected and the less likely they were targeted while at large.

The same is true for the level of prior suspicion against the suspect. More prior suspicion is associated with more severe outcomes. Previously unknown suspects tended to be arrested in the field or to be defectors, not killed. Moving up the ladder of suspicion to the most active list, the target list, and the most wanted list increases the chances that the suspect is killed by an operation or targeted at large. The same pattern holds true for the A or B priority of the suspect's position.

Visualizing the underlying pattern in just the bivariate case is already somewhat overwhelming.
Extending the analysis to the multivariate case and developing a simplifying taxonomy is the task of dimensionality reduction that I turn to next.

Military personnel serving in organizational roles, e.g. on a Military Affairs Committee, could qualify as VCI.

By definition, a previously unknown suspect (not on a black list) did not have a biographical record (was not targeted at large).

Note that the data speak to the probability of being under suspicion given already being targeted. Estimating changes in the risk of being targeted as a function of suspicion would require additional information about the population of rebels overall.

.1 A Taxonomy of Suspects

Is there a simpler typology for understanding differences between victims? I frame this as a dimensionality reduction problem where nominal values for each of the categorical variables are mapped to common latent dimensions. The method I use for this estimation is Multiple Correspondence Analysis (MCA). MCA is a multivariate technique analogous to Principal Components Analysis but for unordered categorical data (Lê et al., 2008).

Let an $I \times Q$ matrix represent the values for each individual $i$ and attribute $q$, with $K_q$ possible values for each attribute and $K = \sum_q K_q$ total possible values. This matrix is then converted to an $I \times K$ disjunctive table (dummy variables for each level of each variable). Rarely used categories disproportionately influence the construction of these dimensions, so I suppress both rare and missing values.

I use a variant of the algorithm called Specific Multiple Correspondence Analysis that correctly calculates partial distance between points given that levels were dropped (Roux & Rouanet, 2009). It decomposes the disjunctive table into principal axes representing latent dimensions, points for individuals in that reduced space, and points for each attribute value in the same space.
The result is a geometric interpretation of the originally categorical data where suspects and attributes are all now projected into a smaller number of continuous dimensions.

The core variation of the dataset is well summarized by a few latent dimensions, with the first two principal axes accounting for 74% of total variation (inertia). They are summarized in Table 4. The first dimension (56%) reflects a clear demographic concept of the suspect’s importance to the targeting program. At one extreme are unimportant, previously unknown, low level volunteers, often caught in large raids. At the other extreme are high level, full party members who are on the most wanted list but usually remain at large.

The second dimension (18%) reflects the method of neutralization used. At one extreme are killings, sometimes targeting specific individuals, in operations directed by the local intelligence office. At the other extreme are defections that required no previous effort by an intelligence office, where the identity was confirmed by the suspect’s own confession. In between lie arrests, which share aspects of both kinds of targeting.

This is another motivation for carefully studying missingness in the database. The main source of variation in the database is technical, the difference between different kinds of records. I manually suppress missing values and purely administrative variables so that the estimated components reflect only the substantive empirical variation between attributes.

Implemented in the R package soc.ca.

In total 18 dimensions account for 100% of variation.

[Figure 3 plot. Horizontal axis: Dimension 1, Priority of Suspect (56%). Vertical axis: Dimension 2, Severity of Outcome (18%). Legend: point shape by attribute (Age, Echelon, ID Source, IOCC, List, Op Level, Party, Priority, Sex, Status, Target), and point size.]

Figure 3: Attribute values for all suspects projected into two dimensions with multiple correspondence analysis.

For studies that use counts of rebel or civilian deaths as a dependent variable, this shows that those raw counts could be driven by changes in at least three different underlying dynamics: (1) the intensity of the war, growing or shrinking the size of the target list or the number of operations looking for suspects not on a list; (2) changes in effectiveness, neutralizing more or fewer known targets already on the list; or (3) changes in tactics, using more killings than arrests or more defections preempting killings, etc. Shifts along any of these dimensions could produce changes in total body counts or the portfolio of observed violence (e.g. ratios of civilian to military deaths). There is currently little in the way of theoretical expectations for how interventions should interact with each of these dimensions, much less how those interactions should aggregate into changes in total observed levels of violence.

.2 The Jobs Held by Suspects

As an external check of validity, I compare the estimated position of each suspect along the dimensions of targeting to a description of the job they held. If the dimensions are correct, and useful, then jobs with similar functions ought to be more similar to each other in terms of targeting. I find that suspects with similar jobs, as described by third party sources, do in fact have similar demographic attributes and similar targeting behavior by the government. If the underlying data were faked or entered with error, those who recorded them were at least doing so in a consistent and creative way.

Each suspect is tagged with one of 485 specific jobs, coded according to a standardized official schema called the Greenbook. Each job is nested within increasingly large departments called elements, subsections, sections, and the three main branches. I focus on the section level of aggregation. The location of each section along the dimensions of targeting is estimated by including the section attribute as a noncontributing covariate in the Multiple Correspondence Analysis introduced earlier. A map of sections along the two dimensions of targeting is shown in Figure 4.

[Figure 4 plot, titled "Attribute Map: Rebel Organizational Sections". Horizontal axis: Dimension 1, Priority of Suspect (56%). Vertical axis: Dimension 2, Severity of Outcome.]

Figure 4: Map of rebel sections projected into the two dimensions of suspect importance and severity of outcome.

Most of the variation across sections is on the first dimension of priority (horizontal axis). At one extreme on the far left are low level logistics related sections like the Commo-Liaison and Rear Service sections, where suspects were rarely targeted at large and mostly swept up as arrests. At the other extreme, on the right, are high level leadership positions like the NLF Central Committee and the Provisional Revolutionary Government, where almost every suspect was most wanted but still at large. Along the second dimension of severity of outcome (vertical axis), some sections were more likely to be specifically targeted or killed, e.g. Guerilla Units, Military Affairs Section, or Area Administrative Officials. At the other extreme some groups were much more likely to defect, e.g. the Medical Section, the Frontline Supply Council, or the Western Highlands Autonomous People’s Movement.

Next I cluster the sections according to their distance along the targeting dimensions. The 30 sections with 100 or more suspects are described in Table 5. They are arranged hierarchically using Ward’s method by their proximity in biographical dimensions estimated with MCA above (Ward, 1963). Each section is provided with a brief description based on its functions as outlined by U.S. intelligence (Combined Intelligence Center, Vietnam, 1969).

Section                          N  Description              Org.
Liberation Farmers’ Ass’n    2,230  Mass Org. Local          NLF
Cadre Affairs                2,007  Intel, Proselytizing     PRP
Guerilla Unit                4,593  Armed local forces       Org
Military Affairs             1,671  Coordinate Guerrillas    PRP
People’s Council               447  Administration           Org
Area Adm. Officials          1,406  Administration           Org
Peoples Rev. Comm.           2,202  Administration           PRP
Organization Section           188  Administration           PRP
Specialized                    126                           NLF
Nflsv Secretariat              216  NLF Leadership           NLF
Nflsv Central Committee        594  NLF Leadership           NLF
Political Officers             337  PRP Leadership           PRP
Liberation Workers’ Ass’n      137  Mass Org, Urban          NLF
Land Distribution              663                           PRP
Security                     6,990  Intel., Police, Justice  PRP
Party Office                   163  Administration           PRP
Culture-Indoctrination       2,210  Propaganda               PRP
Finance-Economy              7,845  Logistics, Taxes, Food   PRP
Liberation Youth Ass’n       1,050  Mass Org, Youth          NLF
Action Arrow Team            2,665  Mobile Security          Org
Political Struggle             443
Liberation Women’s Ass’n     2,654  Mass Org. Women.         NLF
Military Proselyting         5,848  Turn GVN soldiers        PRP
Medical                      3,387  Public/Civil health      PRP
Civilian Proselyting         1,538  Party Recruiting         PRP
Frontline Supply Council       681  Logistics.               PRP
Production                   1,375  Rear Production          PRP
Special Action               1,425  Sappers                  PRP
Commo-Liaison                9,425  Logistics, Routes        PRP
Rear Service                 3,037  Logistics, Military      PRP

Table 5: Organizational sections with over 100 suspects. Hierarchical clustering with Ward’s distance shown in dendrogram on the left.

The clustering has recovered groups of sections with similar functions, arranged into roughly four themes. The fighting themed cluster contains four large sections including the Guerrilla Unit, Military Affairs, Cadre Affairs, and the Liberation Farmers

I calculate Euclidean distance on the first three dimensions, which account for over 80% of the variation.
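The clustering step described above (Ward's method on Euclidean distance over the first three MCA dimensions) can be sketched in Python with SciPy. This is only an illustration under stated assumptions: the coordinates below are invented, the choice of three clusters is arbitrary, and the paper's own analysis was carried out in R rather than with this code.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical coordinates: one row per section, columns are the first
# three MCA dimensions. The numbers are invented for illustration only.
section_names = ["Guerilla Unit", "Military Affairs", "Commo-Liaison",
                 "Rear Service", "Nflsv Secretariat", "Political Officers"]
coords = np.array([
    [0.10, 0.90, 0.05],
    [0.20, 0.80, 0.00],
    [-0.90, -0.10, 0.10],
    [-0.80, -0.20, 0.05],
    [1.20, 0.10, -0.30],
    [1.10, 0.00, -0.20],
])

# Ward's method merges, at each step, the pair of clusters whose union
# gives the smallest increase in within-cluster variance. SciPy's "ward"
# linkage works on Euclidean distances between the rows of `coords`.
tree = linkage(coords, method="ward")

# Cut the dendrogram into a fixed number of groups (3 is an assumption).
labels = fcluster(tree, t=3, criterion="maxclust")
clusters = dict(zip(section_names, labels))
for name, label in clusters.items():
    print(f"{name}: cluster {label}")
```

With real coordinates, the number of groups would be chosen by inspecting the dendrogram rather than fixed in advance.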

Killings and arrests required the government to launch an operation, and defections required a receiving government actor or office. Which government actors conducted those operations and what methods did they use? When a suspect was neutralized, the details of the circumstance or operation leading to their neutralization were recorded. The details of each neutralization are cross-tabulated against outcomes in Table 6.

                                         Outcome
                              All (N)  Killed %  Captured %  Defector %
All                            49,774        31          45          24
Source    Agent/Informer       16,296        37          56           7
          Captured Document     4,428        44          45          11
          Confession           11,633         7          42          52
          Order of Battle       9,942        50          41           9
          Other Source          7,249        22          31          47
Level     DTA/Other/Region      3,835        30          43          27
          Sector/Province       6,880        27          58          15
          Subsector/District   33,296        37          50          13
IOCC      No Involvement        9,637        29          46          25
          Result of Info       24,127        38          52           9
          Directed by           8,785        38          59           3
Specific  Non-specific         32,467        31          49          20
          Specific             12,630        43          49           8

Table 6: Properties of operations across neutralization outcomes.

The tabulations show a program with a large base of incidental arrests and killings in the course of regular operations, topped with a sizable number of direct planned strikes against specific targets. A full third of killings and captures were of suspects specifically targeted by an operation, often an ambush along a route or a raid. About a third were from operations directed by an IOCC. As noted before, more than half of killed or captured suspects were already listed on a blacklist.

The identity of a suspect had to be confirmed at the time of neutralization. For previously unidentified suspects, the source of ID at the time of neutralization may have been the source that led them to be a suspect in the first place. For suspects already on a blacklist or listed in the Political Order of Battle, there was some unobserved process by which evidence was collected, leading to the initial suspicion. The majority were either identified by another civilian (an agent or informer) or were said to have confessed. Others were confirmed by material evidence like documents captured on their person or identifying them by name. Some were confirmed against descriptions in the Political Order of Battle (OB). About 12% were identified as “other”, explained in a written comment on the back of the worksheet and not recorded here.
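A table of this shape, counts per attribute value alongside the percentage of each outcome within that value, can be produced with a row-normalized cross-tabulation. The sketch below uses pandas on a handful of invented records; the column names `id_source` and `outcome` are hypothetical stand-ins and not the database's actual field names.

```python
import pandas as pd

# Hypothetical records: one row per neutralization. Column names and
# values are invented stand-ins for the database's real fields.
records = pd.DataFrame({
    "id_source": ["Confession", "Confession", "Agent/Informer",
                  "Agent/Informer", "Agent/Informer", "Order of Battle",
                  "Order of Battle", "Confession"],
    "outcome": ["Defector", "Captured", "Captured", "Killed",
                "Captured", "Killed", "Captured", "Defector"],
})

# Number of records per attribute value (the N column of such a table).
counts = records["id_source"].value_counts()

# Row percentages: within each ID source, the share of each outcome.
row_pct = (
    pd.crosstab(records["id_source"], records["outcome"], normalize="index")
    .mul(100)
    .round(0)
)
print(counts)
print(row_pct)
```

Normalizing by row answers "given this kind of operation, what happened to the suspect"; normalizing by column instead would answer the reverse question, so the choice should follow the comparison being made in the text.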

Neutralizations can also be well summarized by a simpler typology. I fit the same specific multiple correspondence model as before to just the subset of observations resulting in arrest or killing. The first principal axis accounts for 59% of the variation and, as before, reflects the level of priority of the suspect. At one extreme are low level incidental captures and at the other killings against priority targets. To a lesser degree it also captures the level of premeditation, as high priority targets were more likely to be the specific target of an operation and the target of an operation planned and directed by an IOCC.

Dimension 1 (59%)                  Ctr.  Coord.
(+)  Most Wanted List              13.3    1.37
     Full Party Member             10.9    1.11
     A Priority                     7.8    0.66
     Killed                         7.8    0.73
     Order of Battle for ID         6.9    0.9
     Specific Target                5.5    0.71
(-)  B Priority                     7.9   -0.67
     Female                         5.8   -0.76
     Captured                       5.4   -0.51
     Age [0,25]                     3.3   -0.61
     Confession for ID              3.0   -0.76

Dimension 2 (15%)                  Ctr.  Coord.
(+)  No IOCC Involvement           19.0    1.34
     Other Source for ID           14.1    1.59
     Sector/Province Level Op.      8.0    0.96
     Other Level Op.                7.7    1.57
     Province Echelon               3.7    1.19
     Unknown (No List)              3.2    0.36
(-)  Result of IOCC Information     9.6   -0.55
     Captured Document for ID       5.3   -0.95
     Subsector/District Level Op.   4.7   -0.33
     Most Active List               3.0   -0.62
     Target List                    2.9   -0.51

Table 7: The first two dimensions representing demographic and operations related attributes estimated with multiple correspondence analysis for only observations resulting in killing or capture. Contribution and coordinates of specific values shown for attributes with above average contribution to each dimension.

The second principal axis accounts for 15% of the variation and reflects the domain of the operation. On one end are operations that found VCI and reported them retroactively to the intelligence infrastructure for documentation. These operations typically had no IOCC involvement, were carried out by more conventional forces, and took place at the province or sector level.
At the other extreme are operations carried out by Phoenix-related forces against targets known about beforehand. These operations were at the subsector or district level, against village echelon level targets, who were on the most active or target black lists, and benefited from information provided by the IOCC. The map of each attribute value along these two dimensions is shown in Figure 5.

[Figure 5 plot, titled "Attribute Map: Killed or Captured". Horizontal axis: Dimension 1, Priority of Suspect/Premeditation. Vertical axis: Dimension 2, Domain (15%). Legend: point shape by attribute (Age, Echelon, ID Source, IOCC, List, Op Level, Party, Priority, Sex, Status, Target), and point size.]

Figure 5: Attribute values for just killings and captures projected into two dimensions with multiple correspondence analysis.
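For readers who want to see the mechanics behind maps like this one, the following is a minimal sketch of plain MCA computed as a correspondence analysis of the disjunctive table via a singular value decomposition. It is not the Specific MCA variant from the soc.ca package used in the paper, it does no suppression of rare or missing levels, and the records and column names are invented. The raw inertia shares it produces need not be comparable to the percentages reported in the text.

```python
import numpy as np
import pandas as pd

# Hypothetical categorical records; names and values are illustrative.
records = pd.DataFrame({
    "party": ["Full", "Full", "Non", "Non", "Non", "Full"],
    "status": ["Killed", "At Large", "Captured", "Defector",
               "Captured", "Killed"],
    "sex": ["Male", "Male", "Female", "Female", "Male", "Male"],
})

# I x K disjunctive (indicator) table: one dummy column per level.
Z = pd.get_dummies(records).astype(float)

# Correspondence matrix with row masses r and column masses c.
P = Z.to_numpy() / Z.to_numpy().sum()
r = P.sum(axis=1)
c = P.sum(axis=0)

# Standardized residuals from the independence model r c^T.
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

# The SVD yields the principal axes; squared singular values are inertias.
U, sing, Vt = np.linalg.svd(S, full_matrices=False)
inertia = sing ** 2
share = inertia / inertia.sum()

# Principal coordinates of individuals (rows) and attribute values
# (columns), which live in the same reduced space.
row_coords = (U * sing) / np.sqrt(r)[:, None]
col_coords = pd.DataFrame((Vt.T * sing) / np.sqrt(c)[:, None], index=Z.columns)

print(share.round(3))
print(col_coords.iloc[:, :2].round(2))
```

A quick sanity check on any implementation of this kind is that the total inertia of an indicator table equals K/Q - 1, where K is the number of levels and Q the number of variables.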

There were 16 different organizations reported as the government actor in the neutralization of suspects. As an additional external check of validity, Figure 6 shows the position of the actors along the two dimensions of priority and domain. If the underlying data are accurate and well summarized by these dimensions, then actors with similar functions ought to be similar to each other in terms of victims and tactics.

[Figure 6 plot, titled "Attribute Map: Government Actors". Horizontal axis: Dimension 1, Priority of Suspect/Premeditation (59%). Vertical axis: Dimension 2, Domain (15%). Labeled points: Armed Propaganda Team (APT), ARVN Main Forces, Chieu Hoi Cadre, Civilian Irregular Defense Group (CIDG), FWMAF Other than U.S., Military Security Service (MSS), National Police (NP), National Police Field Force (NPFF), Peoples Self Defense Force (PSDF), Popular Forces (PF), Provincial Reconnaissance Unit (PRU), Regional Forces (RF), Rural Development Cadre (RD), Special Police (SP), U.S. Forces.]

Figure 6: Government actors responsible for killings and captures projected into the two dimensions of importance and domain.

A clustering of actors along the dimensions of operations is shown in Table 8. On paper, the official Phoenix Forces were the National Police, the Provincial Reconnaissance Units, Rural Development Cadre, Civilian Irregular Defense Group, and the Armed Propaganda Team (APT). In practice, the tent poles of the Phoenix Program were the Popular Forces, Regional Forces, and Special Police, who made up two-thirds of all killings and captures.

Actor                                           N  Description
Special Police (SP)                         5,375  Urban Police
National Police (NP)                        3,604  Urban/Suburban Police
Military Security Service (MSS)               561  Counter-intelligence
Provincial Reconnaissance Unit (PRU)        3,190  Mobile Special Forces
National Police Field Force (NPFF)          1,065  Mobile Rural Policing
Army of the Republic of Viet Nam (ARVN)     1,919  Regular GVN Military
Armed Propaganda Team (APT)                   310  Mobile Cultural Team
Regional Forces (RF)                       14,356  District/Province Paramilitary
Popular Forces (PF)                         5,195  Hamlet/Village Paramilitary
Civilian Irregular Defense Group (CIDG)       146  Irregular Militia
Peoples Self Defense Force (PSDF)             111  Irregular Militia
Free World Military Assistance Force          308  Allied, South Korea
U.S. Forces                                   899  Regular U.S. Military
Other                                         539

Table 8: Government actors responsible for neutralizations. Grouped according to similarity on operation properties estimated with multiple correspondence analysis. Dendrogram shows hierarchical clustering using Ward’s method.

The clustering reveals four small clusters. The first cluster includes urban and suburban police organizations that were much more likely to arrest than kill.
This is likely because they were both in areas of greater government control where contestation was less violent overall and because they were more likely to document their arrests than less institutionalized forces.

The second cluster includes mobile forces who operated in areas of weaker government control but still had close ties to the Phoenix Program in terms of intelligence sharing and reporting. Provincial Reconnaissance Units (PRU), for example, were specially designed units for the Phoenix Program who had the highest rate of neutralizations per fighter (Thayer, 1985, 210).

The third cluster contains paramilitary forces: Regional Forces were responsible for routes and intersections, while Popular Forces were responsible for village and hamlet defense. Their neutralizations were the most violent and numerous. This is because of where they operated (in more contested areas), their high numbers (having more manpower in more places than police forces), and their tactics (equipped and trained for defense and attack rather than regular policing).

A final cluster represents conventional forces: the U.S. and Free World Military Assistance forces (primarily South Korean). These forces reported few suspects to the national database, likely because of parallel reporting mechanisms and also because they were engaged in larger, more conventional fighting.

These clusters provide a way to think about targeting as a function of different kinds of government forces: regular police, expeditionary forces, paramilitary forces, and conventional forces. Each employs different tactics, in a different environment, with a different portfolio of victims, and different incentives and capabilities to report back statistics. This is particularly relevant to the study of aggregate levels of violence.
Over the course of a conflict, the size and level of activity of these four groups will change. Forces are raised, units move around the country, and police and paramilitary forces are extended to newly secured communities. Each of these will impact the amount of violence committed and the number of casualties reported in a given area over a given period of time.
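Placing a grouping variable such as the government actor on dimensions that were estimated without it can be sketched as below. This assumes the actor is handled as a supplementary (noncontributing) variable, as the text describes for rebel sections, and uses the standard correspondence analysis transition formula: a category sits at the mean position of its individuals, rescaled by the singular value of each dimension. All inputs are invented, and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: principal coordinates of six suspects on two MCA
# dimensions, the singular values of those dimensions, and the actor
# recorded for each neutralization. All values are invented.
row_coords = np.array([
    [0.8, 0.2], [1.0, 0.4], [-0.6, -0.1],
    [-0.4, -0.3], [-0.9, 0.5], [0.1, -0.7],
])
singular_values = np.array([0.75, 0.40])
actor = pd.Series(["PRU", "PRU", "NP", "NP", "RF", "RF"], name="actor")

# Mean position of the individuals recorded under each actor.
centroids = (
    pd.DataFrame(row_coords, columns=["dim1", "dim2"]).groupby(actor).mean()
)

# Transition formula (an assumption about the convention in use): dividing
# the centroids by the singular values puts supplementary categories on the
# same scale as the principal coordinates of the active categories.
actor_coords = centroids / singular_values
print(actor_coords.round(2))
```

Because the supplementary variable plays no part in estimating the axes, agreement between actor positions and their known functions is a check on the dimensions rather than a consequence of how they were built.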

The Phoenix Program provides an unusually clear view of a large wartime government targeting effort. In the aggregate, it provides an example of a typically mixed targeting program. Most processed neutralizations were of low priority targets, while occasionally the program had the intelligence, or good luck, to launch operations against high-profile targets. This pattern is a result of the fundamental feature of civil war, an inability to easily separate rebels from neutral civilians.

That said, based on the portion of targeting recorded in the national database, the government did track and target a large number of suspects with verifiable links. In the same way that policing is primarily about deterring illegal behavior through small risks of punishment, the Phoenix Program offered a credible risk to rebels who might have otherwise operated openly or civilians who were on the fence about joining.

There is a strong connection between a suspect’s demographics, their position in rebel organizations, the kind of government actor that would target them, the methods that would be used, and their ultimate fate. A combination of dimensionality reduction and clustering suggests a few simplifying ways to describe that connection. Suspects vary in priority to the government in terms of who it wishes it could target and who it targets in practice. The outcomes for suspects vary in severity ranging from voluntary defection to death in the field. Operations vary in the priority of the suspect, arresting large numbers of unimportant suspects in sweeps and launching premeditated operations to kill more important suspects. Operations also vary across domains. Some operations are carried out far away from intelligence and police infrastructure and are underreported in official statistics.
Other operations are carried out in clear view, often using previous intelligence and regularly reporting back for inclusion in official statistics.

The data also reveal clear organizational differences between different government actors and different rebel sections. Rebel sections have functions that fall into groups related to fighting, high level leadership, low level administration, and logistical support. Government actors fall into groups of regular police, expeditionary forces, paramilitary forces, and conventional forces. Each kind of organizational subdivision has a distinct signature in terms of types of civilians involved and the types of targeting methods employed.

One motivation for moving toward bigger (wider) data in conflict studies is that they reveal these underlying dimensions, organizational types, and processes which typically get relegated to an error term. These results should be a cautionary tale for analysis based on raw event counts, often drawn from newspapers or similarly shallow reporting. Targeting in the Vietnam War was very high dimensional. An analyst could reach dramatically different conclusions about outcomes by truncating the sample to just killings, by omitting information about the suspect’s position or the government actor committing the violence, or by missing important details about how the institution created records and aggregated them into a final dataset.

With data this detailed, the analysis provided here is just the tip of the iceberg. I have shown ways to identify and explore the main sources of variation in a database, but there are many other places in the data to look for interesting structure. There are three that come immediately to mind. The first is exploring the spatial and temporal structure of data. The Vietnam War varied from province to province and often from village to village, which should have clear implications for how civilians were treated. The second is how neutralizations were related to one another.
I have treated neutralizations as isolated events, but in reality they were often part of larger operations. There is enough detail on locations and timing to aggregate individual observations into a larger event level analysis. Finally, the structure of rebel organizations is an entire field of study on its own, and the detailed data on rebel jobs, locations, and demographics can provide a remarkable map of Viet Cong and North Vietnamese organization across South Vietnam.

References

Aguirre, Carlos, Doyle, Kate, Hernández-Salazar, Daniel, Guatemala, Policía Nacional, Archivo Histórico, University of Oregon, & Libraries. 2013. From silence to memory: revelations of the AHPN. Eugene, OR: University of Oregon Libraries.

Ball, Patrick, Tabeau, Ewa, & Verwimp, Philip. 2007 (June). The Bosnian Book of Dead: Assessment of the Database (Full Report). HiCN Research Design Notes 5. Households in Conflict Network.

Berman, Eli, Shapiro, Jacob N., & Felter, Joseph H. 2011. Can Hearts and Minds Be Bought? The Economics of Counterinsurgency in Iraq. Journal of Political Economy, (4), 766–819.

Bhatia, Parmeet, Iovleff, Serge, & Govaert, Gérard. 2014. blockcluster: An R Package for Model Based Co-Clustering.

Biddle, Stephen, Friedman, Jeffrey A., & Shapiro, Jacob N. 2012. Testing the Surge: Why Did Violence Decline in Iraq in 2007? International Security, (1), 7–40.

Biernacki, C., Celeux, G., & Govaert, G. 2000. Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence, (7), 719–725.

Breiman, Leo. 2001. Random Forests. Machine Learning, (1), 5–32.

Colby, William Egan, & McCargar, James. 1989. Lost Victory: A Firsthand Account of America’s Sixteen-Year Involvement in Vietnam. Contemporary books Chicago.

Comber, Leon. 2008. Malaya’s secret police 1945-60: the role of the Special Branch in the Malayan Emergency. Institute of Southeast Asian Studies.

Combined Intelligence Center, Vietnam. 1969 (Feb.). VCI Functional Element Description.

Conley, Michael Charles. 1967. Communist insurgent infrastructure in South Vietnam. Washington: Center for Research in Social Systems, American University.

Gelman, Andrew, Fagan, Jeffrey, & Kiss, Alex. 2007. An Analysis of the New York City Police Department’s “Stop-and-Frisk” Policy in the Context of Claims of Racial Bias. Journal of the American Statistical Association, (Sept.), 813–823.

Govaert, Gérard, & Nadif, Mohamed. 2003. Clustering with block mixture models. Pattern Recognition, (2), 463–473.

Ishwaran, H., & Kogalur, U. B. 2014. Random Forests for Survival, Regression and Classification (RF-SRC), R package version 1.6. URL http://CRAN.R-project.org/package=randomForestSRC.

Ishwaran, Hemant, Kogalur, Udaya B., Gorodeski, Eiran Z., Minn, Andy J., & Lauer, Michael S. 2010. High-Dimensional Variable Selection for Survival Data. Journal of the American Statistical Association, (489), 205–217.

Kalyvas, Stathis N., & Kocher, Matthew Adam. 2007. How ’Free’ is Free Riding in Civil Wars?: Violence, Insurgency, and the Collective Action Problem. World Politics, (02), 177–216.

Lê, Sébastien, Josse, Julie, & Husson, François. 2008. FactoMineR: An R Package for Multivariate Analysis. Journal of Statistical Software, (i01).

Lyall, Jason. 2014 (Aug.). Bombing to Lose? Airpower and the Dynamics of Violence in Counterinsurgency Wars. SSRN Scholarly Paper ID 2422170. Social Science Research Network, Rochester, NY.

Moyar, Mark, & Summers, Harry. 1997. Phoenix and the Birds of Prey: The CIA’s Secret Campaign to Destroy the Viet Cong. Annapolis, Md: Naval Institute Press.

Natapoff, Alexandra. 2009. Snitching: criminal informants and the erosion of American justice. NYU Press.

Price, Megan, & Ball, Patrick. 2014. Big Data, Selection Bias, and the Statistical Patterns of Mortality in Conflict. SAIS Review of International Affairs, (1), 9–20.

Price, Megan, Guberek, Tamy, Guzmán, Daniel, Zador, Paul, & Shapiro, Gary. 2009. A statistical analysis of the Guatemalan National Police archive: searching for documentation of human rights abuses. JSM Proceedings, Section on Survey Research Methods. Alexandria, VA: American Statistical Association.

Roux, Brigitte Le, & Rouanet, Henry. 2009. Multiple Correspondence Analysis. Thousand Oaks, Calif: SAGE Publications, Inc.

Seybolt, Taylor B., Aronson, Jay D., & Fischhoff, Baruch (eds). 2013. Counting Civilian Casualties: An Introduction to Recording and Estimating Nonmilitary Deaths in Conflict. 1 edition edn. Oxford: Oxford University Press.

Shi, Tao, & Horvath, Steve. 2006. Unsupervised Learning With Random Forest Predictors. Journal of Computational and Graphical Statistics, (1), 118–138.

Silva, Romesh, Marwaha, Jasmine, & Klingner, J. 2009. Violent Deaths and Enforced Disappearances During the Counterinsurgency in Punjab, India: A Preliminary Quantitative Analysis.

Thayer, Thomas C. 1985. War Without Fronts: The American Experience in Vietnam. Westview Press.

Ward, Joe H. 1963. Hierarchical Grouping to Optimize an Objective Function. Journal of the American Statistical Association, 58