+ Lab Experiments for Measurement in Program Evaluation Michael J. Gilligan, New York University

+ The Task: Government/NGO/CBO programs wish to change participants' attitudes and beliefs in particular ways. Typically these programs coach participants in the "right" set of attitudes and beliefs. Examples: pro-social behaviors (contributions to public goods, trust, tolerance, non-violence and so on); attitudes and behaviors toward marginalized groups (women, minorities, particular ethnic groups). These programs would like to be able to measure whether their efforts have been successful.

+ The Problem: Randomized control trials are essential for making causal statements about the effects of a program. But randomized control trials are not a solution to the measurement problem; indeed, they are a hindrance to it. In an RCT, programmers operate only with treated populations, so only treated populations receive coaching on the "right" responses. RCTs, the very thing that ensures unbiasedness with respect to subject pools (balance), introduce bias in measurement.
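
The measurement problem above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the author's model: latent attitudes are drawn from a standard normal distribution, coaching shifts treated respondents' survey answers by an additive amount, and the names `simulate_survey_rct`, `true_effect` and `coaching_bias` are hypothetical.

```python
import random
from statistics import mean


def simulate_survey_rct(n=10_000, true_effect=0.0, coaching_bias=0.5, seed=0):
    """Return the treated-minus-control difference in mean survey answers.

    Assumption (hypothetical): a survey answer equals the latent attitude,
    plus the true program effect, plus a coaching shift that only treated
    respondents receive.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        attitude = rng.gauss(0, 1)  # latent attitude, balanced by randomization
        if rng.random() < 0.5:      # coin-flip assignment to treatment
            treated.append(attitude + true_effect + coaching_bias)
        else:
            control.append(attitude)
    return mean(treated) - mean(control)


# With no true effect, the survey difference still picks up the coaching shift.
estimate = simulate_survey_rct(true_effect=0.0, coaching_bias=0.5)
print(f"estimated survey 'effect' with zero true effect: {estimate:.2f}")
```

Randomization balances the latent attitudes across arms, but it cannot remove the coaching shift because that shift is delivered together with the treatment itself.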

+ Social Capital & Pro-social Attitudes

+ Definition: [S]ocial networks and the norms of reciprocity and trustworthiness that arise from them. [S]ocial capital is closely related to ... "civic virtue." The difference is that civic virtue is more powerful when embedded in a dense network of reciprocal social relations. A society of many virtuous but isolated individuals is not necessarily rich in social capital (Putnam 2000).

+ We are interested in measuring: altruism; trust; trustworthiness; willingness to contribute to public goods; and the social networks that (purportedly) support these behaviors.

+ Implications for Development: Trust: crucial for cost-effective self-enforcement of contracts. Compliance with social norms: non-violence, compromise, fairness. Contributions to public goods: essential for economic efficiency. Respect for legitimate sources of authority.

+ A Few Findings (among many): Putnam (1993) shows that local governments in Italy are more efficient where there is greater civic engagement. Knack and Keefer (1997) demonstrate that increases in country-level trust lead to large increases in the country's economic growth. La Porta et al. (1997) establish a strong positive link between trust and judicial efficiency and a strong negative link between trust and corruption.

+ Implications: The World Bank and other international actors have many programs to foster social capital and pro-sociality: community-based DDR; community-driven development programs; a focus on local capacity in development efforts; local ownership of development programs to foster sustainability.

+ Measuring Social Capital and Social Norms: These are very difficult concepts to measure. In many cases they are not observed directly. Indicators differ greatly across different cultures. People are often unwilling to reveal behavior that is not pro-social.

+ Traditional survey measures: "Generally speaking, would you say that most people can be trusted or that you can't be too careful in dealing with people?" (World Values Survey). "Would you be willing to contribute a day of free time to ____?" "How difficult do you think it would be for your community to reach agreement on ____?" "In the last three months have you contributed time or money to a community-based organization?" "Did you vote in the last election?"

+ Bias concerns with surveys: Programmers coach respondents in the "right" answers to these types of questions. They do not operate in control communities at all, so respondents there may not even know the "right" answers.

+ Observational Measures: number of people who voted in the last election; number of people who show up to clean up a public park; contributions to a community fund.

+ The measures have great external (real-world) validity, but are we measuring social attitudes or leadership strength? Or intimidation? Or corruption? Example: voter turnout in the Soviet Union was routinely above 98 percent. Good outcomes may be caused by the exact opposite of good institutions and pro-social attitudes.

+ Structured Observational Measures: Structured Community Activities (Casey, Glennerster and Miguel): funds collected in a matching-grant scheme; decision making over allocating salt or batteries; allocation of tarpaulin. Tuungane Project, Congo (Humphreys, Sanchez de la Sierra and van der Windt 2013): participation in matching funds for a public good; allocation of a $100 windfall; participation in a community meeting.

+ Structured Observational Measures: These are structured and therefore more comparable to each other. They have great external validity, but we still cannot disentangle individual factors (attitudes) from community-wide factors (leadership, institutions).

+ Lab-in-the-Field Activities: Observing behavior in a controlled laboratory setting. All social pressures, political and institutional effects, etc., are removed by the design of the experiment. We observe only people's responses to the incentives that we (the experimenters) offer them. We are able to disentangle attitudes from community-wide factors.

+ Loss in External Validity: Community-wide factors (leadership, institutional efficiency) are excluded from the lab, so we cannot obtain measures of them. Thus lab activities are best combined with the other measurement methods.

+ Behavioral games: Three important games are the altruism game, the trust game and the public goods game. Our main interest is in the altruism, trust and public goods games, but we also need to conduct the other games to control for risk attitudes, patience and altruism.

+ Game Instruction

+ Altruism Activity: Subjects were given a sum of money. Nepal: 40 NPR in 5 NPR notes. Sudan: 3 pounds in half-pound coins. Cambodia: 16,000 KHR in 4,000 KHR notes. Subjects decide how much they want to contribute to a local needy family. The identity of the family is not revealed.

+ Trust/Trustworthiness Activity: Subjects are randomly assigned to one of two roles: sender or receiver (we use neutral names in the field). Both types are given an initial endowment of money. Senders decide how much of their endowment to send to the receiver. We triple that amount and give it to the receiver. The receiver decides how much of this total to return to the sender. All players and types are anonymous. Nash: send zero, return zero. Social optimum: send full endowment, return whatever is necessary to support trusting behavior.
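
The payoff logic of the trust activity can be sketched as follows. This is an illustration under assumptions rather than the field protocol: the function name `trust_game_payoffs`, the 40-unit endowment in the usage lines, and the rule that the receiver can return at most the tripled transfer (not their own endowment) are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class TrustGameResult:
    sender_payoff: float
    receiver_payoff: float


def trust_game_payoffs(endowment, sent, returned, multiplier=3):
    """Compute payoffs for one anonymous sender-receiver pair.

    Both players start with `endowment`. The sender sends `sent`, the
    experimenter multiplies it (tripled by default), and the receiver
    returns `returned` out of the multiplied amount (an assumption here).
    """
    if not 0 <= sent <= endowment:
        raise ValueError("sent must be between 0 and the endowment")
    transferred = multiplier * sent
    if not 0 <= returned <= transferred:
        raise ValueError("returned must be between 0 and the multiplied transfer")
    sender_payoff = endowment - sent + returned
    receiver_payoff = endowment + transferred - returned
    return TrustGameResult(sender_payoff, receiver_payoff)


# Nash prediction: send zero, return zero; both keep their endowment.
nash = trust_game_payoffs(endowment=40, sent=0, returned=0)

# Full trust with an even split of the tripled amount (hypothetical numbers).
trusting = trust_game_payoffs(endowment=40, sent=40, returned=60)
print(nash, trusting)
```

In this sketch the amount sent is read as trust and the share returned as trustworthiness; total earnings rise with the amount sent because every unit sent is tripled.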

+ Public Goods Game: All subjects play simultaneously. Each player is given two cards, one with an X and one blank. For each X card turned in during the first round, all players receive an amount of money, say 4 NPR. Turning in an X card in the second round earns the player who turned it in a larger amount, say 20 NPR.
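
A minimal sketch of the public goods payoffs, using the illustrative 4 NPR and 20 NPR amounts from the slide. It assumes that each player turns in exactly one card per round, so a player who gives up the X card in the first round has only the blank card left for the second; the function name and the group size of ten in the usage lines are hypothetical.

```python
def public_goods_payoffs(contributes, public_return=4, private_return=20):
    """Return each player's earnings in the two-round card game.

    `contributes[i]` is True if player i turned in the X card in the first
    round (the public contribution) and False if the player held it for the
    second round (the private payoff).
    """
    n_contributors = sum(contributes)
    shared = public_return * n_contributors  # paid to every player
    return [shared + (0 if c else private_return) for c in contributes]


everyone = public_goods_payoffs([True] * 10)            # each earns 40
no_one = public_goods_payoffs([False] * 10)             # each earns 20
one_free_rider = public_goods_payoffs([True] * 9 + [False])
print(everyone[0], no_one[0], one_free_rider[-1])
```

With these numbers, holding the X card is individually better (20 versus 4 for oneself), while universal contribution leaves everyone better off once the group has more than five players, which is the tension the game is designed to measure.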

+ Attitudes Toward Marginalized Groups

+ Examples: Many programs are interested in improving the status of marginalized groups, especially women. Governments/NGOs/CBOs are often interested in easing (often violent) ethnic rivalries, especially in post-conflict settings.

+ Same Problem: In an RCT, programmers operate only with treated populations, so only treated populations receive coaching on the "right" responses. RCTs, the very thing that ensures unbiasedness with respect to subject pools (balance), introduce bias in measurement.

+ A Variety of Options: Standard games (altruism, trust, public goods, etc.) can be used to measure attitudes toward out-groups: Bracic (2013), attitudes toward Roma in the former Yugoslavia. Observing behavior in deliberation, cooperation and teamwork among mixed groups: Karpowitz and Mendelberg (2014), deliberation in mixed groups of men and women.

+ Observing group behavior: Bales Interaction Process Analysis. Participants are given a task that requires a group decision or cooperation. Record interactions according to a specific set of criteria to code whatever the researcher is interested in measuring (respect, hostility, etc.). The trick: not cuing participants that this is a study of in-group/out-group interaction; incentivizing participants to act according to their beliefs about the out-group.

+ Example: Attitudes toward Gender and Ethnicity in the Liberian National Police (LNP). The government of Liberia adopted an explicit 30% quota for women in the LNP. We did NOT conduct an RCT, but we were interested in testing some of the assumptions of the gender program.

+ Program proponents claimed that more women would produce a variety of benefits: more consensual decision making; greater sensitivity to gendered crimes. Yet decades of social psychology findings suggest that women would not participate fully in group deliberations.

+ The program had been underway for several years, so officers knew the attitudes toward female officers that they were supposed to have. Thus a survey would not have been a convincing measurement strategy. We had groups of six officers complete team tasks and randomized the number of female officers in each group. We observed team members to see whether men reacted differently in groups with more women, and whether groups with more women deliberated more consensually and were more likely to see crime as gendered.
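
Randomizing the number of women per team could be done along the following lines. This is a hypothetical scheme for illustration, not the procedure used in the LNP study: it draws each team's number of women uniformly from the feasible counts, and the function name, the officer ID format and the pool sizes in the usage lines are all assumptions.

```python
import random


def assign_groups(female_ids, male_ids, group_size, n_groups, seed=0):
    """Form teams with a randomly drawn number of female officers each."""
    rng = random.Random(seed)
    females, males = list(female_ids), list(male_ids)
    rng.shuffle(females)
    rng.shuffle(males)
    groups = []
    for _ in range(n_groups):
        # Feasible range given who is still unassigned.
        max_women = min(group_size, len(females))
        min_women = max(0, group_size - len(males))
        if min_women > max_women:
            raise ValueError("not enough officers left to fill a group")
        n_women = rng.randint(min_women, max_women)  # inclusive bounds
        members = [females.pop() for _ in range(n_women)]
        members += [males.pop() for _ in range(group_size - n_women)]
        groups.append({"n_women": n_women, "members": members})
    return groups


# Hypothetical pools of 30 female and 30 male officers, five teams.
female_pool = [f"F{i}" for i in range(30)]
male_pool = [f"M{i}" for i in range(30)]
for g in assign_groups(female_pool, male_pool, group_size=6, n_groups=5):
    print(g["n_women"], g["members"])
```

Because the gender composition is assigned by chance, differences in how men behave across teams can be attributed to composition rather than to who chose to work with whom.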

+ Findings: Female officers were not, in general, more likely to see a gendered crime, but more competent women were. Groups with more women members were not more likely to see a gendered crime. Groups with more women were not more consensual. Backlash effect: men in majority-female groups were significantly more aggressive.

+ Conclusion: Programming by its very nature coaches beneficiaries in giving the types of survey responses the program would like to hear. Randomization exacerbates this problem. Behavioral measures are appealing, but measures with high external validity can make it hard to disentangle mechanisms at the individual and community level. Fine-tuning individual incentives correctly gets at attitudes even when subjects are cued to the right answer: a monetary reward will induce people to act on actually held beliefs rather than the socially correct ones. Lab-in-the-field activities address both of these issues and provide an important tool for measuring the social effects of programs, at some loss of external validity.
