
Pontificia Universidad Católica de Chile
Instituto de Economía
Magíster en Economía

TESIS DE GRADO MAGÍSTER EN ECONOMÍA (Master's Thesis in Economics)

Díaz Titelman, Viviana

July 2021


Pontificia Universidad Católica de Chile
Instituto de Economía
Magíster en Economía

“KNOW THYSELF”: INFORMATION DESIGN WITH ELUSIVE RECEIVERS

Viviana Díaz Titelman

Committee:

Nicolas Figueroa

Juan Pablo Montero

Santiago, July 2021


“KNOW THYSELF”: INFORMATION DESIGN WITH ELUSIVE RECEIVERS

Viviana Díaz Titelman∗

Abstract

In some contexts, agents might be uncertain about how their chosen actions pay off. On the other side of the market, firms can use this uncertainty to induce their preferred actions. We develop a dynamic principal-agent model in which the agent strategically chooses whether or not (and to what extent) to inform herself about an unknown parameter of her utility function. Information on this parameter is valued tomorrow but is costly today. We find that, for certain prior beliefs about this parameter, information acquisition is optimal. The principal has an objective function that depends on the agent's decisions and knows the agent's optimal strategy. So, he intervenes in her chosen actions by designing distributions of posterior beliefs. In particular, we explore three instruments for the principal that determine (i) the precision of the information the agent acquires and (ii) the chances that the agent chooses to get informed.

∗Thesis written as a Master's student at the Department of Economics, Pontificia Universidad Católica de Chile. I want to sincerely thank my thesis advisors Nicolas Figueroa and Juan Pablo Montero for their guidance, incredible patience and very helpful comments and discussions. I am also very grateful to all of my friends, especially Javiera Castillo, María Constanza Muñoz and Diego Cussen, for their support and their willingness to help and contribute throughout this process. Last but not least, I thank my always unconditionally loving family.

Any errors or omissions are my own responsibility. Comments to: [email protected]


Contents

I Introduction

II Literature review

III Basic model for the demand

IV Student's optimal decisions

    IV.1 Solution in t=2
    IV.2 Solution in t=1
    IV.3 Basic ideas about the role of p

V The principal's problem

    V.1 University's instrument: precision p
        V.1.1 Benchmark: Kamenica and Gentzkow solution
        V.1.2 Basic solution to our problem
        V.1.3 Heterogeneity in students
    V.2 University's instrument: coursework x
        V.2.1 Basic solution to our problem
    V.3 University's instrument: menu (p, x)
        V.3.1 Full information solution
        V.3.2 Under asymmetric information

VI Conclusion


I Introduction

Some services or products, while satisfying the same need, can have very different attributes. For example, different wines can be described by their balance, intensity, clarity, etc., and still all be wine. Digital pianos can have diverse specifications, like their maximum polyphony or their number of speakers and pedals, among many others, that broaden the limits of what can be played on them. Agents face this great variety of versions when they choose what to buy. Say you want to learn how to play the piano, or you just joined a wine club with your friends. What quality should you buy? At the time, you do not know how much you will be able to get out of the piano or the wine: you still have to learn how to use or appreciate them, and you also have to invest time to learn. Then, with some belief about your performance in the wine club or in piano classes, you buy a product. So here is the problem: there is uncertainty about how the purchased attributes pay off in your utility function.

This thesis extrapolates this general idea to a more specific setting: university students deciding how to approach their education. Students must choose among a great variety of courses and decide how much attention to exert on each one of them. Alternative courses can differ in aspects such as difficulty, the dimensions along which contents are covered, or workload, i.e., the attributes of a course. We propose that courses with better attributes can be harder to succeed in, so quality has a price. A student's attention has finite capacity and so has to be distributed among all the interests the student pursues. This means there are opportunity costs to concentrating only on schoolwork, such as recreation, hobbies, paid jobs, etc. So, because attention can be allocated, a student can choose how much of the course they have enrolled in they will take in and how much they will overlook. The principle we stand on is that the learning experience is better, or more enjoyable in itself, (1) the higher the quality of (and the more demanding) a course is and (2) the more attention a student gives to said course. Therefore, in this framework a student compares the enjoyment they could eventually get from studying against the costs that a study plan involves.

But we add a third determinant to the learning experience: (3) the talent or affinity a student has for what they study. This talent refers to the student's inherent ability to perform successfully; for example, we can establish that a math student has a certain level of general talent in math. On one hand, talented or interested students perceive more net enjoyment in studying. On the other hand, talent is not known, it is discovered: innate ability and interest in what they study are revealed as they study. We model these ideas by suggesting that, while talent is an important factor in a myopic utility function, only past decisions on courses and attention can give the student new information about her talent, through Bayesian updating. More specifically, the more a student works, the greater the chances she gets new information. This way, a student's coursework and attention decisions throughout their education not only affect the learning experience statically, through immediate enjoyment and costs, but can also affect it dynamically, by determining the chances that the belief they hold about their ability to perform successfully and enjoy what they study is updated. We combine all of these elements and model a student's optimal decisions on course attributes and attention exerted in a dynamic setting with experimentation, where beliefs about talent are updated period after period.

But the supply side also chooses what it offers, and does so according to certain objectives. For example, a piano company can be interested in selling professional pianos because it is more profitable or because it is their brand, or a winery can offer a great range of wines for all kinds of consumers. In the same line, universities play a role in the game as well: they can pursue objectives that have to do with how their students draw their academic paths. For example, schools can have attention or course quality goals, they can value that students participate in sports or artistic endeavors, they can prioritize students' health, etc. In other words, universities that maximize their own payoff can base that payoff on the students' decisions, i.e., attention and course quality. To address this possibility, we develop a series of principal-agent models where, under the premise that universities are interested in inducing attention from their enrolled students, they intervene in the students' Bayesian learning process in order to maximize attention. In particular, we model how universities can design posterior beliefs and information acquisition processes through different tools. First, the precision of the Bayesian updating process, i.e., the precision of the information students receive, which we model through grading policies. Second, the coursework that the university offers, which indirectly affects how frequently Bayesian updating takes place. Third, a combination of menus of these two instruments in an asymmetric information setting.

This thesis is organized as follows. Section II explores different areas of economic literature that can motivate and aid in the development of our setting and argument. Section III presents the basic elements and assumptions of the model we work on. Section IV finds the optimal academic path students choose in the proposed model. Section V takes optimal student decisions as given and explores how the university (the principal) can induce certain actions from the student (the agent) through different instruments of information design and manipulation of the Bayesian learning process. Finally, Section VI concludes.

II Literature review

The thesis has two main parts: one concerns the student's academic path problem, and the other develops different information design models in which the university can influence students' decisions to reach its objectives.

Regarding the first part, there are two topics that can be important antecedents for this thesis. First, experimentation. In our model, talent plays an important role in determining the student's decisions through her value function. Essentially, she faces the Bergemann and Välimäki (2006) trade-off between exploitation and experimentation in Bayesian learning games. This general approach has been used in several economic problems. Somewhat close to our framework, Hestermann and Le Yaouanq (2018) use experimentation to understand the theoretical utility costs of incorrect prior beliefs about ability. Second, and perhaps more indirectly related to our model, there is a growing literature dedicated to rational inattention. Rational inattention could be an important theoretical antecedent to our proposed setting because it could help rationalize the decision for or against paying attention to attributes. Two important examples in the literature are Gabaix (2017) and Maćkowiak et al. (2018). However, to the best of our knowledge, no other publication combines all of the elements we review here.

In the second part of this thesis, we approach the university-student relationship through a principal-agent problem, and we develop different models of information design under full commitment. In this context, one important benchmark we use in our analysis is Kamenica and Gentzkow (2011), a fundamental precedent in Bayesian persuasion that we build on. On the problem of optimally designing grading, many interesting publications align grading systems with certain educational objectives. Some of them are Feltovich, Harbaugh and To (2002) and Dubey and Geanakoplos (2010), which model student-signaling and university-signaling devices respectively, and Boleslavsky and Cotton (2015), who focus on competition between universities. Regarding asymmetric information in principal-agent models, there is naturally a vast literature. However, to the best of our knowledge, no asymmetric information paper combines all of the elements that this thesis does.


Finally, this investigation can also contribute to the contextual or empirical literature that addresses education-related problems like the ones we model here. One of them is dropout rates, which can easily be linked to our model through attention: no attention to schoolwork can be a proxy for dropping out. Our model could help develop mechanisms that control dropout rates. Close to what we work with, Denning et al. (2020) analyse the increase in college completion rates and link it with grade inflation. Babcock (2010) relates grade inflation to effort, proxied by study time. Among other things, Butcher, McEwan and Weerapana (2014) show how restricting grade inflation at Wellesley College affected the total number of applicants in treated majors. Another interesting topic concerns what happens to students after university. Bar, Kadiyali and Zussman (2009) use a natural experiment at Cornell University to understand the link between grade inflation and the value of the information that grades confer to students, graduate schools and employers.

III Basic model for the demand

Students enroll in a certain university and have to decide how they will shape their academic path. Of course, a great part of this decision is what to specialize in. Universities usually offer a great deal of majors and degrees, and students get to choose what to study. Our model focuses not on what students choose to study, but on how they decide to carry out their studies. We translate this last decision into two variables that students must determine in every period of their academic path: $x_t \in [0, 1]$, the difficulty or quality of the course they will take¹; and $a_t \in [0, 1]$, the level of attention they will exert on the course. Every student has a certain unchangeable talent $\sigma$ for what they chose to study, such that $\sigma \in \{0, 1\}$, where $P(\sigma = \sigma_H = 1) = Q$ and $P(\sigma = \sigma_L = 0) = 1 - Q$.

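This talent is unknown to the students. However, at any time $t$ they hold prior beliefs about their talent, denoted by $\sigma_t = P(\sigma = 1 \mid I_t)$, where $I_t$ is the information set available at time $t$. Because $\sigma$ is binary, this belief also equals the expected talent $E_t[\sigma]$.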

Any student’s static utility function can be described as:

$$u_t = E_{t,\sigma}\left[\sigma a_t x_t + (1 - a_t)A - c(x_t)\right] \qquad (1)$$

The first term in brackets on the RHS indicates the net enjoyment of studying and considers the product of the quality of the course, the attention exerted and the talent of the student. What we are suggesting with this term is that there is an intrinsic value in studying: studying can be enjoyable by itself. Notice that whenever any of the variables included in this term is 0 there is no enjoyment, and that, when the other factors are strictly positive,

$$\frac{\partial (E_t[\sigma] a_t x_t)}{\partial E_t[\sigma]} > 0, \qquad \frac{\partial (E_t[\sigma] a_t x_t)}{\partial a_t} > 0, \qquad \frac{\partial (E_t[\sigma] a_t x_t)}{\partial x_t} > 0.$$

The second term considers the utility perceived from other activities that the student can enjoy because she is not paying attention to the course and can focus on them. Here, $A$ is a positive parameter that describes the payoff of these alternative activities. This means that a student's attention is finite and has to be allocated among all the interests or projects she participates in, including schoolwork. All of the attention a student fails to devote to her coursework goes to these alternatives. The last term in brackets indicates the cost that a certain course demands because of its difficulty level. We suppose that higher quality courses are also more demanding, so $c(x_t)$ is a function such that $\frac{dc(x_t)}{dx_t} > 0$ and $c(0) = 0$. We also assume a convex cost function, $\frac{d^2 c(x_t)}{dx_t^2} > 0$, to portray that the marginal costs of difficulty are increasing. Finally, this utility function is an expectation over $\sigma$ because students who are making decisions on attention and schoolwork do not know their talent $\sigma$, but hold a belief about it. Every other component of (1) is known by students.

¹Naturally, the set of available $x_t$ depends on what the university offers. For now, we suppose students have access to courses that can be described over the whole set [0, 1].
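As a minimal sketch of (1), the Python snippet below evaluates the expected static utility for a given belief, assuming the quadratic cost $c(x_t) = \frac{c x_t^2}{2}$ that the $t = 2$ problem (4) uses; the function names and parameter values are our own illustrative choices. It also checks numerically that the enjoyment term vanishes when attention or quality is zero and that its partial derivatives are positive at an interior point.

```python
def static_utility(belief: float, a: float, x: float, A: float, c: float) -> float:
    """Expected static utility (1) with the assumed cost c(x) = c * x**2 / 2."""
    enjoyment = belief * a * x        # E[sigma] * a_t * x_t, zero if any factor is 0
    outside_option = (1.0 - a) * A    # payoff of attention spent elsewhere
    cost = c * x ** 2 / 2.0           # convex cost of course difficulty
    return enjoyment + outside_option - cost


def enjoyment(belief: float, a: float, x: float) -> float:
    """First term of (1) on its own."""
    return belief * a * x


def partial(f, args, i, h=1e-6):
    """Central finite-difference derivative of f with respect to args[i]."""
    up, down = list(args), list(args)
    up[i] += h
    down[i] -= h
    return (f(*up) - f(*down)) / (2 * h)


point = [0.6, 0.5, 0.4]  # illustrative (belief, attention, quality)
grads = [partial(enjoyment, point, i) for i in range(3)]
print([round(g, 3) for g in grads])  # all three partials are positive here
```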

There are two elements that add a dynamic dimension to the model. First, students have to choose attention and coursework in two consecutive periods, so when choosing their decision variables at $t = 1$, they take into consideration the expected payoff of the next period. Second, a student can learn about her talent through Bayesian updating. This means that a student's perceived utility in $t = 1$ can change the information set $I_2$ and result in the formation of posterior beliefs, which implies that $\sigma_2 = E_2[\sigma \mid u_1]$. As we have suggested, posteriors in $t = 2$ are formed following Bayes' rule and respond to signals that behave as follows:

$$P(\sigma_t = \sigma_{t+1}) = 1 - a_t x_t, \qquad P(\sigma_t \neq \sigma_{t+1}) = a_t x_t$$

In other words, with probability $1 - a_t x_t$ she does not learn anything new about her talent (her prior is not updated), and with probability $a_t x_t$ new information is acquired and new posteriors are formed. Notice that whenever either $x_t$ or $a_t$ is 0, there is no information acquisition. This intends to capture that students only learn about their talent when that talent is somehow tested: results with no effort in attention or no challenge in difficulty do not reveal anything about ability. Moreover, as $a_t$ and $x_t$ are decision variables, the student effectively chooses her chances of getting a signal; in other words, she chooses the probability with which she receives information.

However, this new information does not necessarily reveal the true type of the student: signals can be imprecise or subject to error. So, when there is a prior update, it can occur in two different ways:

$$P(\text{signal} = i \mid \sigma = i) = p, \qquad P(\text{signal} = j \mid \sigma = i) = 1 - p, \quad j \neq i$$

where, from the perspective of the student, $p \in [\frac{1}{2}, 1]$ is a fixed exogenous parameter. That is, there is a probability $p$ that the signal is truthful and a probability $1 - p$ that it is not. From now on, we will think of this parameter $p$ as the precision of the signal. What the model tries to convey here is that the learning experience or performance can also be determined by variables other than talent or academic path choices. In general, we think other dimensions of a student's life could affect their perceived utility from studying, so $\sigma_2 = E_2[\sigma \mid u_1]$ is not unequivocal about true talent. If it were, then at the end of period $t$, when utility is perceived, the student could simply solve (1) for $\sigma$ and would have perfect knowledge of her talent. Because signals can be imprecise, this is not the case.
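The two-stage signal structure can be illustrated with a short Monte Carlo sketch in Python: a signal arrives with probability $a_t x_t$ and, if it arrives, matches the true talent with probability $p$. The function `draw_signal` and the numbers used are hypothetical, chosen only to check that the simulated frequencies match $a_t x_t$ and $p$.

```python
import random


def draw_signal(sigma: int, a: float, x: float, p: float, rng: random.Random):
    """Return None if no signal arrives, otherwise a signal in {0, 1}.

    A signal arrives with probability a * x; if it arrives, it equals the
    true talent sigma with probability p and the opposite type otherwise.
    """
    if rng.random() >= a * x:
        return None            # no new information: the prior is kept
    if rng.random() < p:
        return sigma           # truthful signal
    return 1 - sigma           # erroneous signal


rng = random.Random(0)          # fixed seed for reproducibility
n = 200_000
a, x, p = 0.9, 0.5, 0.75        # illustrative choices and precision
draws = [draw_signal(1, a, x, p, rng) for _ in range(n)]
arrived = [s for s in draws if s is not None]
arrival_rate = len(arrived) / n               # should be close to a * x
truthful_rate = sum(arrived) / len(arrived)   # share of signals equal to sigma = 1
print(round(arrival_rate, 3), round(truthful_rate, 3))
```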

When we take into account the whole information model we have displayed so far, we can characterize all of the possible prior and posterior beliefs using Bayes' rule:

$$\sigma_1 = Q, \qquad \sigma_{2,H} = \frac{Qp}{Qp + (1-Q)(1-p)}, \qquad \sigma_{2,L} = \frac{Q(1-p)}{Q(1-p) + (1-Q)p} \qquad (2)$$

Notice that these beliefs depend only on parameters. In other words, the values of the beliefs are completely exogenous. Since $p \geq \frac{1}{2}$, it is clear that:

$$\sigma_{2,L} \leq \sigma_1 \leq \sigma_{2,H} \qquad (3)$$
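The beliefs in (2) are easy to compute directly. The Python sketch below (function name and parameter values are illustrative) returns $\sigma_{2,H}$, $\sigma_{2,L}$ and the probabilities of a high and a low signal, and checks the ordering in (3) together with the fact that the posteriors, weighted by the probability of each signal, average back to the prior $Q$.

```python
def posteriors(Q: float, p: float):
    """Bayes' rule (2) for a prior Q in (0, 1) and precision p in [1/2, 1)."""
    prob_high = Q * p + (1 - Q) * (1 - p)   # probability of a high signal
    prob_low = (1 - Q) * p + Q * (1 - p)    # probability of a low signal
    sigma_H = Q * p / prob_high             # belief after a high signal
    sigma_L = Q * (1 - p) / prob_low        # belief after a low signal
    return sigma_H, sigma_L, prob_high, prob_low


for Q, p in [(0.4, 0.8), (0.2, 0.6), (0.7, 0.95)]:
    s_H, s_L, g, j = posteriors(Q, p)
    # Ordering (3) and the averaging (martingale) property of Bayesian beliefs.
    print(Q, p, round(s_L, 4), round(s_H, 4), round(g * s_H + j * s_L, 12))
```

With $p = \frac{1}{2}$ the signal is uninformative and both posteriors collapse to $Q$.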


IV Student’s optimal decisions

IV.1 Solution in t=2

To find the optimal academic path of any given student, we begin by solving the problem at $t = 2$. We combine all of the elements listed before, with the quadratic cost $c(x_t) = \frac{c x_t^2}{2}$ (where $c > 0$), and state that any student solves:

$$\max_{a_2, x_2}\; V_2(a_2, x_2; \sigma) = E_2[\sigma] a_2 x_2 + (1 - a_2)A - \frac{c x_2^2}{2} \qquad (4)$$

To make calculations easier, and without compromising the solution, we rewrite the problem for $t = 2$ as follows:

$$\max_{a_2}\left[\max_{x_2}\; V_2(a_2, x_2; \sigma) = E_2[\sigma] a_2 x_2 + (1 - a_2)A - \frac{c x_2^2}{2}\right] \qquad (5)$$

Proposition 1. The solution at $t = 2$ is characterized by:

either $a_2^* = 1 \wedge x_2^* = \min\left\{\frac{\sigma_2}{c}, 1\right\}$

or $a_2^* = 0 \wedge x_2^* = 0$

where the first pair of values represents the working solution and the second pair the no work solution.

All proofs are in the appendix. Before analyzing whether and when each solution will arise, we can extract a lot of information from this result. First, there is complete specialization in $t = 2$ in terms of attention and schoolwork: students either abandon all interest in studying and take $a_2 = 0$ and $x_2 = 0$, or they focus all of their attention on schoolwork, $a_2 = 1$, and choose course quality in proportion to their belief. However, notice that $x_2^*$ is bounded above by 1 because of how we modelled the variable $x_t$, so the optimal quality can be restricted at the top. This leads to a second interesting conclusion: the working solution in $t = 2$ can be either restricted or unrestricted in $x_2^*$. That is, for a given set of parameters $A, c, Q, p$ we have three possible solutions in $t = 2$: a no work solution $a_2 = x_2 = 0$, an unrestricted working solution $a_2 = 1$, $x_2 = \frac{\sigma_2}{c}$, and a restricted working solution $a_2 = x_2 = 1$. Taking this into account, we can write an incomplete form of the $t = 2$ value function $V_2$ as follows:

$$V_2(\sigma_2, x_2^*) = \max\left\{A,\; \sigma_2 x_2^* - \frac{c (x_2^*)^2}{2}\right\} \qquad (6)$$


where $A$ is the value a student can obtain in the second period if she exerts attention $a_2 = 0$, and $\sigma_2 x_2^* - \frac{c (x_2^*)^2}{2}$ is the value she can obtain with attention $a_2 = 1$ and with $x_2^*$ yet to be determined, that is, the possible values of the working solution. This function comes from substituting all possible pairs $a_2^*, x_2^*$ listed before into the objective in (5). Naturally, this value function does not yet fully characterize what is attainable for the student in $t = 2$. Because $x_2^* = \min\{\frac{\sigma_2}{c}, 1\}$, the second term of the maximization in the value function in (6) is:

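$$V_2(\sigma_2, x_2^*) = \begin{cases} \dfrac{(\sigma_2)^2}{2c} & \text{if } x_2^* = \dfrac{\sigma_2}{c} \\[2ex] \sigma_2 - \dfrac{c}{2} & \text{if } x_2^* = 1 \end{cases} \qquad (7)$$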
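Proposition 1 and (7) can be checked numerically. The Python sketch below, with hypothetical helper names and illustrative parameters, compares the closed-form $t = 2$ solution with a brute-force search of (4) over a grid of $(a_2, x_2) \in [0, 1]^2$. Because (4) is linear in $a_2$ for a given $x_2$, the search picks a corner in attention, which is the specialization result.

```python
def solve_t2(sigma2: float, A: float, c: float):
    """Closed form of Proposition 1: returns (a2*, x2*, value)."""
    x_work = min(sigma2 / c, 1.0)                        # quality chosen if working
    work_value = sigma2 * x_work - c * x_work ** 2 / 2   # utility with a2 = 1, see (7)
    if work_value > A:
        return 1.0, x_work, work_value
    return 0.0, 0.0, A                                   # no work: a2 = x2 = 0


def grid_t2(sigma2: float, A: float, c: float, n: int = 201):
    """Brute-force maximum of objective (4) over an n-by-n grid in [0, 1]^2."""
    grid = [i / (n - 1) for i in range(n)]
    return max(
        (sigma2 * a * x + (1 - a) * A - c * x ** 2 / 2, a, x)
        for a in grid
        for x in grid
    )


for sigma2, A, c in [(0.9, 0.1, 0.5), (0.5, 0.1, 0.8), (0.2, 0.3, 0.8)]:
    print(solve_t2(sigma2, A, c), grid_t2(sigma2, A, c))
```

The three parametrizations illustrate, in order, a restricted working solution, an unrestricted working solution and the no work solution.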

So finally, we can understand $V_2$ completely through the analysis of three different cases. First, consider the case where $x_2^*$ is always unbounded, i.e., $\frac{\sigma_2}{c}$ is always smaller than 1. This means that the relevant part of the correspondence in (7) is the first one, which in turn means that (6) has only two arguments. In this context, we can calculate the expected value of $\sigma$ that generates indifference between exerting maximum attention $a_2 = 1$ and no attention at all $a_2 = 0$, that is, the belief in $t = 2$ that leaves the student indifferent between working and not working in $t = 2$. This occurs when $A = \frac{(\sigma_2)^2}{2c}$, and so $E_2^{ind} = \sqrt{2Ac}$. Second, consider the case where $x_2^*$ is always bounded. In this situation, the only relevant part of (7) is the second one, so indifference occurs when $A = \sigma_2 - \frac{c}{2}$. This leads to $E_2^{ind} = A + \frac{c}{2}$. Finally, consider a case in which there are certain values of $\sigma_2$ for which the restriction on $x_2^*$ in the working solution is binding, and other values for which it is not. In other words, $c$ is such that for certain values of $\sigma_2$, $\sigma_2 < c$, and for others $\sigma_2 > c$. This means that both functions in (7) are arguments of the $V_2(\sigma_2)$ function in (6), and in particular, that $\sigma_2 - \frac{c}{2}$ is relevant only when $\sigma_2 > c$, and $\frac{(\sigma_2)^2}{2c}$ only when $\sigma_2 < c$. The next proposition fully characterizes all of the cases that can occur in $t = 2$ for every permitted parametrization:
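For reference, the two indifference conditions solve in closed form, and the condition that separates the cases of Proposition 2 can be restated more simply. This is a short worked derivation of our own, assuming $A > 0$ and $c > 0$:

$$\frac{(E_2^{ind})^2}{2c} = A \;\Longrightarrow\; E_2^{ind} = \sqrt{2Ac}, \qquad E_2^{ind} - \frac{c}{2} = A \;\Longrightarrow\; E_2^{ind} = A + \frac{c}{2}$$

$$c < \sqrt{2Ac} \iff c^2 < 2Ac \iff c < 2A \iff A + \frac{c}{2} > c$$

So when $c < 2A$, the bounded threshold $A + \frac{c}{2}$ lies in the region $\sigma_2 > c$ where $x_2^* = 1$; when $c > 2A$, we have $\sqrt{2Ac} < c$, so the unbounded threshold lies in the region where $x_2^* = \frac{\sigma_2}{c} < 1$.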

Proposition 2. There are three possible value functions $V_2(\sigma_2)$:

(i) If $c < \sqrt{2Ac}$, then
$$V_2(\sigma_2) = \max\left\{A,\; \sigma_2 - \frac{c}{2}\right\}$$
where $A$ is the value of the no working solution and $\sigma_2 - \frac{c}{2}$ is the value of the always bounded working solution. The threshold that separates the work and no work solutions is
$$E_2^{ind} = A + \frac{c}{2}$$

(ii) If $c > \sqrt{2Ac}$

(a) and $c > 1$, then
$$V_2(\sigma_2) = \max\left\{A,\; \frac{\sigma_2^2}{2c}\right\}$$
where $A$ is the value of the no working solution and $\frac{\sigma_2^2}{2c}$ is the value of the never bounded working solution. The threshold that separates the work and no work solutions is
$$E_2^{ind} = \sqrt{2Ac}$$

(b) and $c < 1$, then
$$V_2(\sigma_2) = \begin{cases} A & \text{if } \sigma_2 < \sqrt{2Ac} \\[1ex] \dfrac{(\sigma_2)^2}{2c} & \text{if } \sigma_2 \in [\sqrt{2Ac}, c] \\[1ex] \sigma_2 - \dfrac{c}{2} & \text{if } \sigma_2 > c \end{cases}$$
where $A$ is the value of the no working solution and the value of the working solution can be either $\frac{\sigma_2^2}{2c}$ when unbounded or $\sigma_2 - \frac{c}{2}$ when bounded. The threshold that separates the work and no work solutions is
$$E_2^{ind} = \sqrt{2Ac}$$
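A minimal Python sketch of Proposition 2, under illustrative parameter values of our own: `v2_by_case` follows the three cases literally, while `v2_direct` evaluates (6) with $x_2^* = \min\{\frac{\sigma_2}{c}, 1\}$. Both helper names are hypothetical. The check confirms that the two agree on a grid of beliefs and that $V_2(E_2^{ind}) = A$ in every case. The boundary parametrizations that the proposition leaves out are assigned by assumption: $c = \sqrt{2Ac}$ goes to case (ii) and $c = 1$ to case (ii.b).

```python
import math


def v2_direct(sigma2: float, A: float, c: float) -> float:
    """V2 from (6) with x2* = min(sigma2 / c, 1)."""
    x = min(sigma2 / c, 1.0)
    return max(A, sigma2 * x - c * x ** 2 / 2)


def v2_by_case(sigma2: float, A: float, c: float) -> float:
    """V2 following the three cases of Proposition 2."""
    root = math.sqrt(2 * A * c)
    if c < root:                       # case (i), equivalent to c < 2A
        return max(A, sigma2 - c / 2)
    if c > 1:                          # case (ii.a): x2* never bounded
        return max(A, sigma2 ** 2 / (2 * c))
    if sigma2 < root:                  # case (ii.b), c <= 1
        return A
    if sigma2 <= c:
        return sigma2 ** 2 / (2 * c)
    return sigma2 - c / 2


def threshold(A: float, c: float) -> float:
    """Working threshold E2^ind of Proposition 2."""
    root = math.sqrt(2 * A * c)
    return A + c / 2 if c < root else root


cases = {"(i)": (0.3, 0.4), "(ii.a)": (0.1, 1.5), "(ii.b)": (0.1, 0.6)}
beliefs = [k / 100 for k in range(101)]
for name, (A, c) in cases.items():
    gap = max(abs(v2_direct(s, A, c) - v2_by_case(s, A, c)) for s in beliefs)
    print(name, "threshold", round(threshold(A, c), 4), "max gap", gap)
```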

Two main conclusions can be drawn from Proposition 2. The first one is that, even though the thresholds between the working and no working solutions can differ across parametrizations, their role in terms of how they affect the solutions for $a_2, x_2$ is the same. Notice that

$$\frac{\partial \sqrt{2Ac}}{\partial c} > 0, \qquad \frac{\partial \sqrt{2Ac}}{\partial A} > 0, \qquad \frac{\partial (A + \frac{c}{2})}{\partial c} > 0, \qquad \frac{\partial (A + \frac{c}{2})}{\partial A} > 0.$$

That is, the critical belief $E_2^{ind}$ is located closer to 1 the bigger $A$ and $c$ are, lessening the chances that the belief in $t = 2$ lies above that number and so that attention $a_2 = 1$ is optimal. But also

$$\frac{\partial}{\partial \sigma_2}\left(\frac{(\sigma_2)^2}{2c}\right) > 0 \;\; (\sigma_2 > 0), \qquad \frac{\partial (\sigma_2 - \frac{c}{2})}{\partial \sigma_2} > 0:$$

perceived utility from studying increases in beliefs about talent. The intuition here is that students holding lower beliefs have worse expectations of the utility studying may bring them than students with higher beliefs about their talent. If a student has a certain belief $\sigma_2$ about her talent, then, ceteris paribus, the chances she will allocate her attention to alternative activities through $A$ grow as the belief $\sigma_2$ tends to 0. Equivalently, the higher the belief, the greater the probability the student works at $t = 2$.

The second conclusion concerns the informational consequences of the different $V_2(\sigma_2)$ formulations. First, as we suggest in the proposition and in its proof in the appendix, which value function best describes the highest utility a student can attain in $t = 2$ depends only on the problem's parameters, specifically on $A$ and $c$. Certainly, different value functions in $t = 2$ will yield different specific outcomes for the decision variables. However, regardless of the specific solution for $a_2, x_2$, all possible value functions behave the same along $\sigma_2$. This is one of the most important results of the student's problem:

Lemma 1. For every given parametrization, V2(σ2) is convex and continuous.

And so, by combining the information from Proposition 2 and Lemma 1, we can draw these functions as follows:

Figure 1: Case (i) in Proposition 2

Figure 2: Case (ii.a) in Proposition 2

Figure 3: Case (ii.b) in Proposition 2

Convexity of the value function yields a very important conclusion for our model: there is always a dynamic value of information. Whenever a student holds $\sigma_1 \in (0, 1)$ and gets new information that updates her beliefs, her posterior beliefs $\sigma_2$ in $t = 2$ spread along the graphs of the functions presented above, closer to the extreme values $\sigma_2 \in \{0, 1\}$. Because the posteriors average back to the prior and $V_2$ is convex, the expected value of $V_2$ over the two posteriors is weakly higher than $V_2$ evaluated at the initial belief $\sigma_1$, so information is good for the student. In other words, our value function reveals a taste or preference for information. Formally:

Lemma 2. A mean preserving spread of prior $\sigma_1$ is:

(i) Always weakly beneficial for the student in $t = 2$.

(ii) Strictly beneficial for the student in $t = 2$ if the distribution of $\sigma_2$ has positive mass above the working threshold $E_2^{ind}$.
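Lemmas 1 and 2 can be illustrated numerically. The Python sketch below (helper names and parameter grids are our own, illustrative choices) checks midpoint convexity of $V_2$ and computes the expected gain from a signal, $P(\text{high}) V_2(\sigma_{2,H}) + P(\text{low}) V_2(\sigma_{2,L}) - V_2(\sigma_1)$, using the posteriors in (2). The gain is never negative, and it is strictly positive in an example where a high signal moves the belief above the working threshold. A numerical check of this kind illustrates, but does not prove, the lemmas.

```python
def v2(sigma2: float, A: float, c: float) -> float:
    """Value function (6) with x2* = min(sigma2 / c, 1)."""
    x = min(sigma2 / c, 1.0)
    return max(A, sigma2 * x - c * x ** 2 / 2)


def signal_gain(Q: float, p: float, A: float, c: float) -> float:
    """Expected t = 2 value with a signal minus the value without one."""
    prob_high = Q * p + (1 - Q) * (1 - p)
    prob_low = 1 - prob_high
    s_H = Q * p / prob_high
    s_L = Q * (1 - p) / prob_low
    return prob_high * v2(s_H, A, c) + prob_low * v2(s_L, A, c) - v2(Q, A, c)


params = [(0.3, 0.4), (0.1, 1.5), (0.1, 0.6), (0.2, 0.8)]
grid = [k / 50 for k in range(51)]
# Midpoint convexity of V2 on a grid of belief pairs (Lemma 1).
convex = all(
    v2((s + t) / 2, A, c) <= (v2(s, A, c) + v2(t, A, c)) / 2 + 1e-12
    for A, c in params for s in grid for t in grid
)
# Weak benefit of information (Lemma 2, part i).
min_gain = min(
    signal_gain(Q, p, A, c)
    for A, c in params
    for Q in [0.1, 0.3, 0.5, 0.7, 0.9]
    for p in [0.5, 0.6, 0.75, 0.9, 0.95]
)
print(convex, min_gain, signal_gain(0.4, 0.9, 0.2, 0.8))
```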

Finally, notice that which value function $V_2(\sigma_2)$ best represents the student's attainable utility is also completely determined by the problem's parameters, and that every specific form of the value function at $t = 2$ is parametric. Moreover, as we established in (2), the prior and the posteriors do not depend on the decision variables. This means that academic path choices do not affect the values the posteriors take, so they have no informational effect through that channel.

IV.2 Solution in t=1

In t = 1, the student solves:

$$\max_{a_1, x_1}\; V_1(a_1, x_1; \sigma) = E_1[\sigma] a_1 x_1 + (1 - a_1)A - \frac{c x_1^2}{2} + E_1[u_2(\sigma; a_1 x_1)] \qquad (8)$$

Notice that the last term on the RHS depends on $a_1 x_1$ and captures the expected value, at $t = 1$, of the utility in the next period. As we will discuss later, this last term represents the dynamic value of information in the model. To visualize this better, we rewrite (8) using the distribution of $\sigma$ and the proposed signaling device to expand the last expected value:

$$\max_{a_1}\Big[\max_{x_1}\; \sigma_1 a_1 x_1 + (1 - a_1)A - \frac{c x_1^2}{2} + (1 - a_1 x_1)V_2(\sigma_1) + a_1 x_1\big[(Qp + (1-Q)(1-p))V_2(\sigma_{2,H}) + ((1-Q)p + Q(1-p))V_2(\sigma_{2,L})\big]\Big] \qquad (9)$$

This formulation considers that: (1) with probability $1 - a_1 x_1$ the student does not update her prior, and therefore she solves for $t = 2$ with a belief about her talent characterized by $\sigma_1$. In other words, her value function at $t = 2$ is based on $\sigma_1$. (2) With probability $a_1 x_1$ the student does acquire new information. The received signal can be good (high talent) or bad (low talent), with total probabilities $Qp + (1-Q)(1-p)$ and $(1-Q)p + Q(1-p)$ respectively. That is, conditional on actually receiving a signal, which happens with probability $a_1 x_1$, she gets a positive signal about her talent with probability $Qp + (1-Q)(1-p)$, and therefore solves for $t = 2$ with a belief of $\sigma_{2,H}$. Alternatively, with probability $(1-Q)p + Q(1-p)$ she gets a negative signal about her talent and faces $t = 2$ with a low belief of $\sigma_{2,L}$. We stress that $V_2$ does not depend on $a_1 x_1$: there are no decision variables in any of the possible value functions for $t = 2$ we develop in Proposition 2, nor in the beliefs displayed in (2). This means that the informational consequences of the decisions $a_1, x_1$ operate only through the probability of getting a signal.
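To make (9) concrete, the Python sketch below evaluates the $t = 1$ objective for any $(a_1, x_1)$, splitting it into the static payoff, the no-signal branch and the signal branch as in the formulation above. Function names and parameter values are hypothetical. Maximizing over a grid also shows that the optimal attention is a corner, since (9) is linear in $a_1$ for a given $x_1$.

```python
def v2(sigma2: float, A: float, c: float) -> float:
    """Value function (6) with x2* = min(sigma2 / c, 1)."""
    x = min(sigma2 / c, 1.0)
    return max(A, sigma2 * x - c * x ** 2 / 2)


def t1_objective(a1: float, x1: float, Q: float, p: float, A: float, c: float) -> float:
    """Objective (9) for given t = 1 choices (a1, x1)."""
    prob_high = Q * p + (1 - Q) * (1 - p)
    prob_low = (1 - Q) * p + Q * (1 - p)
    s_H = Q * p / prob_high
    s_L = Q * (1 - p) / prob_low
    static = Q * a1 * x1 + (1 - a1) * A - c * x1 ** 2 / 2
    no_signal = (1 - a1 * x1) * v2(Q, A, c)     # prior kept with prob 1 - a1*x1
    signal = a1 * x1 * (prob_high * v2(s_H, A, c) + prob_low * v2(s_L, A, c))
    return static + no_signal + signal


Q, p, A, c = 0.3, 0.9, 0.1, 0.5   # illustrative parameters
grid = [i / 100 for i in range(101)]
best = max((t1_objective(a, x, Q, p, A, c), a, x) for a in grid for x in grid)
print(best)  # (value, a1, x1) on the grid
```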

We now move further with the solution to the student's problem in $t = 1$. From now on, to simplify notation, define:

$$G \equiv Qp + (1-Q)(1-p), \qquad J \equiv (1-Q)p + (1-p)Q$$

Proposition 3. The solution at $t = 1$ is described by:

either $a_1^* = 1 \wedge x_1^* = \min\left\{\frac{Q - V_2(Q) + GV_2(\sigma_{2,H}) + JV_2(\sigma_{2,L})}{c}, 1\right\}$

or $a_1^* = 0 \wedge x_1^* = 0$

where the first pair of values represents the working solution and the second pair represents the no work solution.
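A short derivation of the working solution, in the same notation (our own sketch of the algebra, not the appendix proof): setting $a_1 = 1$ in (9) and writing $M \equiv GV_2(\sigma_{2,H}) + JV_2(\sigma_{2,L})$ for brevity,

$$\frac{\partial}{\partial x_1}\left[Q x_1 - \frac{c x_1^2}{2} + (1 - x_1)V_2(Q) + x_1 M\right] = Q - c x_1 - V_2(Q) + M = 0 \;\Longrightarrow\; x_1 = \frac{Q - V_2(Q) + M}{c}$$

The objective is strictly concave in $x_1$ (its second derivative is $-c < 0$), so capping at 1 gives the $\min\{\cdot, 1\}$ above. By Lemma 2, $M \geq V_2(Q)$, so the numerator is at least $Q > 0$ and no truncation at 0 is needed. With $a_1 = 0$ the objective is $A - \frac{c x_1^2}{2} + V_2(Q)$, maximized at $x_1 = 0$. Substituting the unbounded $x_1^*$ back gives the comparison that decides whether the student works:

$$V_2(Q) + \frac{\left(Q - V_2(Q) + M\right)^2}{2c} \quad \text{versus} \quad A + V_2(Q)$$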

Notice that, similarly to $t = 2$, the solution also shows specialization. But this solution is not fully characterized: we cannot yet determine when the student works and when she does not. In the working solution for $t = 1$, the term for $x_1^*$ is a function of $V_2$. We established that, in every possible case, this value function has a threshold that determines whether the student works at $t = 2$. To consider all of the possible solutions for (9), we must expand on the possible payoffs the student foresees for the next period, that is, on what $V_2(\sigma_1)$, $V_2(\sigma_{2,L})$ and $V_2(\sigma_{2,H})$ can be. Because $x_1^*$ depends on $V_2$, the values of the initial belief and the posteriors play a critical role in the solution for $t = 1$.

To advance further in our analysis, we examine different cases around the two possible locations that the initial belief $\sigma_1$ can have with respect to the threshold $E_2^{ind}$, considering the ordering of beliefs defined in (3).

Lemma 3. Case 1.1: If, for a certain parametrization, $Q \leq E_2^{ind}$ and $\sigma_{2,H} \leq E_2^{ind}$, then the only particular solution is

$a_1^* = 0 \wedge x_1^* = 0$

Therefore, the student never works in $t = 1$.

In this case, every possible belief at $t = 2$ lies below the indifference threshold characterized before. This means that in every scenario she foresees, the student does not hold a belief about herself high enough to cross over to the attention region, so $V_2 = A$ at every attainable belief. There is therefore absolutely no gain from working in $t = 1$: measured as in (1), statically or myopically it is not optimal to work at $t = 1$, and dynamically, the informational value she could acquire by working for a signal is zero, so it cannot make her work at $t = 1$.

Lemma 4. Case 1.2: If, for a certain parametrization, $Q \leq E_2^{ind}$ and $\sigma_{2,H} \geq E_2^{ind}$, then the possible solutions are

either $a_1^* = 1 \wedge x_1^* = \min\left\{\frac{Q - V_2(Q) + GV_2(\sigma_{2,H}) + JV_2(\sigma_{2,L})}{c}, 1\right\}$

or $a_1^* = 0 \wedge x_1^* = 0$

If $x_1^*$ is unbounded, the student works in $t = 1$ iff
$$\frac{\left(Q - V_2(Q) + GV_2(\sigma_{2,H}) + JV_2(\sigma_{2,L})\right)^2}{2c} > A \qquad (10a)$$
where $V_2(Q) = A$ and $V_2(\sigma_{2,L}) = A$. If $x_1^*$ is bounded, the student works in $t = 1$ iff
$$Q - \frac{c}{2} + GV_2(\sigma_{2,H}) + JV_2(\sigma_{2,L}) > 2A \qquad (10b)$$
where $V_2(Q) = A$ and $V_2(\sigma_{2,L}) = A$.

In this case, there is an ambiguity in the solution: there are scenarios in which the student works at $t = 1$ and scenarios in which she does not. The student does not hold a sufficiently high prior about her talent in $t = 1$ ($Q \leq E_2^{ind}$). If we calculate the solution to a completely static problem using (1), we find it optimal for the student to do no work in $t = 1$, so she would choose $a_1 = x_1 = 0$. However, the dynamic value of working can eventually surpass the static costs of working and shift her decision to $a_1 = 1$, $x_1 = x_1^*$. That is, the student works in order to acquire new information about herself. In other words, working today can be “worth the hassle” because there is a chance the signal will point towards high talent, which pays off tomorrow. Whether this is the case depends on the parameters $p$, $Q$, $A$ and $c$.
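Lemma 4's conditions can be checked against a brute-force maximization of (9). In the Python sketch below (helper names and parameters are illustrative assumptions), `works_by_conditions` applies (10a) or (10b) depending on whether $x_1^*$ is interior, and `works_by_search` maximizes (9) over a grid. All three parametrizations satisfy Case 1.2: the first is an unconfident student who nonetheless works for the information, the second does not work, and the third triggers the bounded condition (10b).

```python
def v2(sigma2: float, A: float, c: float) -> float:
    """Value function (6) with x2* = min(sigma2 / c, 1)."""
    x = min(sigma2 / c, 1.0)
    return max(A, sigma2 * x - c * x ** 2 / 2)


def beliefs(Q: float, p: float):
    """Signal probabilities G, J and the posteriors from (2)."""
    G = Q * p + (1 - Q) * (1 - p)
    J = (1 - Q) * p + (1 - p) * Q
    return G, J, Q * p / G, Q * (1 - p) / J


def works_by_conditions(Q: float, p: float, A: float, c: float) -> bool:
    """Work decision at t = 1 from (10a) or (10b), for the Case 1.2 setting."""
    G, J, s_H, s_L = beliefs(Q, p)
    M = G * v2(s_H, A, c) + J * v2(s_L, A, c)
    N = Q - v2(Q, A, c) + M                  # numerator of x1* in Proposition 3
    if N / c < 1:                            # interior x1*: condition (10a)
        return N ** 2 / (2 * c) > A
    return Q - c / 2 + M > A + v2(Q, A, c)  # x1* = 1: (10b) when V2(Q) = A


def works_by_search(Q: float, p: float, A: float, c: float, n: int = 201) -> bool:
    """Work decision at t = 1 from a grid maximization of (9)."""
    G, J, s_H, s_L = beliefs(Q, p)
    M = G * v2(s_H, A, c) + J * v2(s_L, A, c)
    grid = [i / (n - 1) for i in range(n)]
    _, a_best, _ = max(
        (Q * a * x + (1 - a) * A - c * x ** 2 / 2
         + (1 - a * x) * v2(Q, A, c) + a * x * M, a, x)
        for a in grid for x in grid
    )
    return a_best == 1.0


examples = [(0.3, 0.9, 0.1, 0.5), (0.4, 0.9, 0.2, 0.8), (0.2, 0.9, 0.1, 0.3)]
for Q, p, A, c in examples:
    print((Q, p, A, c), works_by_conditions(Q, p, A, c), works_by_search(Q, p, A, c))
```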

Lemma 5. If, for a certain parametrization, $Q \ge E^{ind}_2$, then the student always works at t = 1.

$Q \ge E^{ind}_2$ means that the initial belief of the student is located above the threshold. As a consequence, the static problem in (1) always results in working at t = 1. Because information is always weakly beneficial for the student (lemma 2), there is no counterforce that could diminish the greater value of working at t = 1. Therefore, if the student holds a high prior belief, she always works.

Consequently, the solution for students with sufficiently positive beliefs about themselves at the beginning of t = 1 is easier to determine: they always work and exert their full attention. The only thing left to determine is how much they work, that is, the value of $x^*_1$.

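Corollary 1. If, for a certain parametrization, $Q \ge E^{ind}_2$, and

1. Case 2.1: $\sigma_{2,L} \le E^{ind}_2$

Then the particular solution is determined by

$$a^*_1 = 1 \;\wedge\; x^*_1 = \min\left\{\frac{Q - V_2(Q) + G\,V_2(\sigma_{2,H}) + J\,V_2(\sigma_{2,L})}{c},\; 1\right\}$$

where $V_2(\sigma_{2,L}) = A$.

2. Case 2.2: $\sigma_{2,L} \ge E^{ind}_2$

Then the particular solution is determined by

$$a^*_1 = 1 \;\wedge\; x^*_1 = \min\left\{\frac{Q - V_2(Q) + G\,V_2(\sigma_{2,H}) + J\,V_2(\sigma_{2,L})}{c},\; 1\right\}$$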

IV.3 Basic ideas about the role of p

So far, we have analyzed a student’s optimal trajectory for $a_t, x_t$, fixing all of the problem’s parameters A, c, p, Q. Because the beliefs $\sigma_t$ and the structure of the second-period value function $V_2(\sigma_2)$ depend entirely on the parameters, analyzing changes in the parameters can help us better understand a student’s behavior. In particular, varying one specific parameter, which in the next section will play a very important role for the university, can completely shift certain kinds of students’ decisions: the precision p.

In our analysis of period t = 1, we found two cases in terms of possible allocations of attention when considering only the student’s myopic problem: either the student’s static utility of working was high enough to make working optimal, or it was not. Which case applied depended on where Q was located relative to $E^{ind}_2$. In a context where the parameters A, c are exogenous and given, we will call all students whose initial belief Q lies below the threshold $E^{ind}_2$ unconfident students; naturally, confident students begin period t = 1 with a belief such that $Q > E^{ind}_2$.

In terms of what we described above and in the previous section, unconfident students are characterized by the fact that, statically, it is never optimal for them to choose working solutions in t = 1. As their initial belief is poor, $a_1, x_1 \neq 0$ only diminishes utility from a purely static perspective. However, in lemma (4) we determined parametric conditions under which unconfident students whose positive posterior $\sigma_{2,H}$ lies above the threshold may still work in t = 1. This condition rests on two facts: only by working (understood as $a_1 = 1, x_1 = x^*_1$) can students access information about themselves, and that information is valued. In other words, what guarantees the existence of cases where conditions (10a) and (10b) are satisfied is information. But not just any kind of information will do for unconfident students:

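Lemma 6. If $Q \le Q_0$, then $\mathrm{LHS}(10) < \mathrm{RHS}(10)$ for all p. If $Q \ge Q_0$, then there exists $\underline{p}(Q)$ such that $\mathrm{LHS}(10) > \mathrm{RHS}(10)$ whenever $p \ge \underline{p}(Q)$.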

There are two important dimensions to lemma (6). First, prior beliefs about talent Q have to be high enough that the chances of getting a positive signal justify working for a signal. The idea is that if students are very pessimistic about their chances of getting the signal that leads to $\sigma_{2,H}$ because they hold a poor prior, then the event in which they work in t = 1 and get good news in t = 2 seems highly improbable. Therefore, “playing for good news” is most likely more costly than beneficial. Naturally, the farther Q is located below the threshold $E^{ind}_2$, the more pessimistic the student is. Second, signals have to be precise enough. The positive belief $\sigma_{2,H}$ that eventually justifies working in t = 1 has to be above the threshold $E^{ind}_2$, and a certain amount of precision p is necessary to accomplish that. Moreover, optimally working in t = 1 means that the dynamic utility of information surpasses the static loss of working. Lemma (6) states that, for a given Q, this begins to happen at $\underline{p}(Q)$. Finally, consider lemma (2): unconfident students always weakly prefer more information to less. This means that the dynamic portion of their utility rises as the posteriors spread out along $V_2$, which happens as the precision p increases.

Lemma 7. $\underline{p}(Q)$ is decreasing in Q.

If $\underline{p}(Q)$ is the smallest p that gets unconfident students with $Q > Q_0$ working, then it is natural that lower beliefs need more precision to make $a_1 = 1, x_1 = x^*_1$ optimal. The concept is that students with lower Q find it costlier to work, and so more information has to be delivered for the dynamic value of working to overpower its static costs.
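Lemmas 6 and 7 can be illustrated by searching for the lowest precision that satisfies the working condition (10). The sketch below scans a grid over p ∈ [1/2, 1] under illustrative assumptions of ours: binary talent with P(σ = 1) = Q, a symmetric signal of precision p, and a hypothetical second-period value $V_2(\sigma) = \max\{A, \max_{x \in [0,1]}(\sigma x - cx^2/2)\}$. The function `p_lower` is a hypothetical name standing in for $\underline{p}(Q)$; it returns `None` when no precision induces work, which corresponds to $Q \le Q_0$.

```python
def V2(sigma, A, c):
    """Hypothetical second-period value: outside option A vs. best coursework x in [0, 1]."""
    x = min(sigma / c, 1.0)
    return max(A, sigma * x - c * x * x / 2)


def works_in_t1(Q, p, A, c):
    """True if (10a)/(10b) holds, i.e. working for a signal beats not working in t = 1."""
    G = Q * p + (1 - Q) * (1 - p)              # probability of the high signal
    J = 1 - G                                  # probability of the low signal
    N = Q - V2(Q, A, c) + G * V2(Q * p / G, A, c) + J * V2(Q * (1 - p) / J, A, c)
    if N <= 0:
        return False
    gain = N * N / (2 * c) if N < c else N - c / 2   # unbounded vs. bounded x*_1
    return gain > A


def p_lower(Q, A, c, steps=500):
    """Smallest grid precision in [1/2, 1] satisfying (10); None if there is none."""
    for i in range(steps + 1):
        p = 0.5 + 0.5 * i / steps
        if works_in_t1(Q, p, A, c):
            return p
    return None


if __name__ == "__main__":
    # Hypothetical parameters: A = 0.08, c = 1 (threshold sqrt(2Ac) = 0.4).
    for Q in (0.2, 0.3, 0.35):
        print(Q, p_lower(Q, A=0.08, c=1.0))
```

With these values no precision works for Q = 0.2, and the threshold precision found for Q = 0.35 is lower than the one for Q = 0.3, in line with Lemma 7. The grid step (0.001 here) limits how precisely $\underline{p}(Q)$ is located.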

On the other hand, confident students always choose $a_1 = 1, x_1 = x^*_1$ because it is statically optimal for them to work. This means that confident students always have a chance of changing their information set from t = 1 to t = 2: because the probability of getting a signal is $a_1x_1$ and they always choose the working solution, the probability that a confident student gets a signal is always positive. However, this signal can point towards good news or bad news. If the student gets $\sigma_{2,H}$, she keeps working in t = 2. But if she gets $\sigma_{2,L} < E^{ind}_2$, it is no longer optimal to keep working in the second period. This means that for any confident prior belief Q, there is some precision that makes the negative belief $\sigma_{2,L}$ smaller than the threshold $E^{ind}_2$, in which case getting that signal means the student stops working. This idea is summarized in the following lemma:

Lemma 8. For all $p > \frac{1}{2}$, there is a probability $x^*_1(p,Q)\,J(p,Q) \in [0, 1]$ that confident students will choose $a_2 = 0$.

Summarizing, moving the precision p can potentially change the optimal decision variables students choose, given the parameters A, c, Q. In the previous section we fully categorized the cases in t = 1 (we called them cases 1.1, 1.2, 2.1 and 2.2) and determined the unequivocal optimal solution for each of them. What manipulating the parameter p can do is unify cases 1.1 and 1.2, and cases 2.1 and 2.2. Lemma (6) states that, provided Q is sufficiently large, any case 1.1, where the student never works in t = 1, can be turned into a working case 1.2, where the student works for informational purposes. On the other hand, lemma (8) shows that any given case 2.2 can be turned into a case 2.1 if the precision p is large enough to locate the posterior $\sigma_{2,L}$ below the working threshold. Therefore, the precision p can turn unconfident students into working students in t = 1, and confident students into non-working students in t = 2.

V The principal’s problem

Universities can have objectives that are related to how their students outline their academic paths². For example, they can be interested in educating “elite” students and therefore offer high-quality courses and aspire to full attention in them, or they can be interested in students’ overall well-being and therefore select coursework that maximizes their utility. Naturally, every university can have various and even mixed goals and design its educational curriculum accordingly. In our framework, it is interesting to explore what happens when universities include students’ decisions, such as attention a, coursework x, or utility u, in their own objective function.

² We address a university’s goals after selection, that is, after the university has determined which of the students who applied can enroll in the institution.

As we saw in previous sections, a student’s optimal decision on attention and coursework depends on the model’s parameters: A, c, Q, p. We can say that three of these parameters (A, c, Q) “describe” a certain student: A refers to a student’s payoff from alternative activities, c is a coefficient linked to the cost of studying for a course of a specific quality, and Q is the probability that the student is talented, and therefore describes her prior belief. All of these parameters could be manipulated by the university through selection. But when we refer to post-selection objectives, we think of two possible instruments that affect students’ decisions: first, which courses x are offered, which obviously can restrict the set of options available at the moment of the decision; and second, the level of noise in the signals students get, the parameter p. In the next sections, we discuss what the university can accomplish by manipulating these two variables.

We will focus on models where universities’ objective functions are expressed in terms of $a_t$; that is, they care only about the attention students exert on their courses. In particular, the university’s value function is increasing in $a_t$. Additionally, we focus most of our calculations on cases where the working threshold in t = 2 is $E^{ind}_2 = \sqrt{2Ac}$.

Even though this arbitrary choice does change some of the specific results we obtain, the general conclusions and intuitions about the effects of different university policies on student outcomes remain unchanged for other thresholds in $V_2(\sigma_{2,H})$.
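To see where this threshold comes from, suppose, as an illustrative assumption not spelled out in this section, that a student with belief σ who works chooses coursework x to maximize $\sigma x - \frac{c}{2}x^2$, while a student who does not work receives the outside payoff A. Comparing the two payoffs gives the indifference point:

```latex
\begin{aligned}
\max_{x \ge 0}\; \sigma x - \tfrac{c}{2}x^2 &= \frac{\sigma^2}{2c}, \qquad \text{attained at } x = \frac{\sigma}{c},\\
\frac{\sigma^2}{2c} \ge A \;&\iff\; \sigma \ge \sqrt{2Ac} = E^{ind}_2 .
\end{aligned}
```

If coursework is capped at x ≤ 1 and the cap binds at the threshold (that is, $\sqrt{2Ac} > c$), the working payoff becomes $\sigma - c/2$ and the indifference point moves to $A + c/2$, which is one way other thresholds can arise.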

V.1 University’s instrument: precision p

Because posterior beliefs about talent are so decisive for students’ academic path decisions, designing these posteriors can be convenient for a university. We model this through grading: the idea is that students learn about their talent through the grades they achieve. However, grades are not necessarily a perfect signal of their talent. In particular, signals can be intentionally noisy if revealing talent imperfectly benefits the university. That is, with a grading policy p, the university tries to persuade students into $a_t = 1$ as much as it can. Notice that this formulation requires the university to know σ perfectly. Concretely, this would mean that a teacher receives and corrects a student’s test, gains perfect knowledge of the student’s talent from that test, and consequently grades it according to the optimal strategy.

But in order to successfully persuade students, there has to be some room for persuasion: the agent who gets the signal has to be influenced by it. As we concentrate on maximizing $a_t$, we begin the analysis by focusing on the kinds of students that can be induced into work through p. We have already discussed how confident and unconfident students’ decision variables respond to changes in p. With lemma (6), we established that relatively optimistic unconfident students can be induced into work if signals of talent are precise enough. We also determined, in lemma (8), that confident students can be induced out of work through p. Therefore, the only case in which persuading into work is possible in our model is that of unconfident students, for whom the decision to work in t = 1 is ambiguous.

In the next sections, we investigate the effects of grading-system designs characterized by different grading tools and in diverse environments. At first, we face the university’s problem exclusively with unconfident students. Progressively, we incorporate a variety of students into the problem.

V.1.1 Benchmark: Kamenica and Gentzkow solution

In the Kamenica and Gentzkow environment, universities would hold two instruments with which they can induce actions in students: good grades and bad grades. This means that universities can design a different grading rule for each of these signals. However, for persuasion to take place, the signaling devices must be consistent with the student’s prior about her talent. This is a constraint universities must take into account in their design: Bayesian persuasion can occur only if $E_2[\sigma] = E_1[\sigma] = Q$.³ As a consequence, universities have to lay out grading strategies in which they send positive and negative signals with certain probabilities. Formally, the university chooses:

$$\begin{aligned}
P(\text{good grade} \mid \sigma = 1) &= m, & P(\text{bad grade} \mid \sigma = 1) &= 1 - m,\\
P(\text{good grade} \mid \sigma = 0) &= n, & P(\text{bad grade} \mid \sigma = 0) &= 1 - n.
\end{aligned} \qquad (11)$$

As we have established, the only subcase in which the student’s working decision in t = 1 is ambiguous, and therefore susceptible to persuasion, is when $\sigma_{2,H} > \sqrt{2Ac}$ and $Q, \sigma_{2,L} < \sqrt{2Ac}$; it is therefore the only scenario in which there is any room for “confusing” the student into working. We will determine what m and n should be in this environment and specific case, considering that the university’s only objective is to maximize students’ attention $a_t$.

³ Notice this is a given in the informational model we have elaborated.

The first thing to establish is that, as in Kamenica and Gentzkow, there is complete commitment, so the university does not have to worry about how the student interprets its signal. As a consequence, it is in the university’s best interest to use good grades to induce high attention and bad grades to dissuade from high attention. As we have said, the university wants to maximize the chances that a given student exerts attention in classwork. Because the student’s solution is extreme in $a_t$, the university’s value function can be drawn as follows:

[Figure: the university’s value function $V_{uni}$ against the belief σ ∈ [0, 1]; it jumps from 0 to $V_{max}$ at σ = $\sqrt{2Ac}$.]

Universities get their maximum value when students apply themselves through $a_t$, and because $a_t$ can take only two values, the university has two possible payoffs. Notice that the discontinuity of the value function occurs at $\sqrt{2Ac}$, because this is the threshold between working and not working that we found students consider in their decision. That is, when her belief lies below $\sqrt{2Ac}$, the student is in the no-work zone, and when it lies above it, the student works.

The university wants to maximize the chances that, even when a student’s prior belief is below the threshold, the student works for the informational value of work. In this context, the university will grade low-talent students so as to take them from the no-work zone, where universities get no payoff, into the work zone, where the payoff is maximum. In the case we analyze, because the initial belief about talent is below the threshold, the student would never work if she did not receive a positive signal. In other words, the student only works if

$$P(\sigma = H \mid \text{grade}) > \sqrt{2Ac}$$

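Proposition 4. In the Kamenica and Gentzkow environment, the optimal signaling device for the university is

$$P(\text{good grade} \mid \sigma = 1) = 1, \qquad P(\text{good grade} \mid \sigma = 0) = \frac{Q\,\bigl(1 - \sqrt{2Ac}\bigr)}{\sqrt{2Ac}\,(1 - Q)} \qquad (12)$$

where the payoff of the university is

$$V_{univ} = \begin{cases} 0 & \text{if the student does not work,}\\[2pt] V_{MAX} = \dfrac{Q}{\sqrt{2Ac}} & \text{if the student works.} \end{cases}$$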

This signal is extremely asymmetric across levels of talent. This is because, whenever the university knows the student has high talent, it is a dominant strategy to signal her into working: sending the alternative message could confuse the student into the no-work state, which goes against the university’s interest. However, when the university sees low talent, grades are used to persuade students into working by suggesting they possess high talent. The probability n that maximizes the chances of confusion while respecting the student’s prior belief, and thus guaranteeing credibility, is

$$n = \frac{Q\,\bigl(1 - \sqrt{2Ac}\bigr)}{\sqrt{2Ac}\,(1 - Q)}.$$

Just like in Kamenica and Gentzkow’s paper, this solution leaves the student exactly indifferent between working and not working and is optimal for the university.
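The signaling device in (12) can be checked numerically: with m = 1 and n as in Proposition 4, a good grade should move the posterior exactly to the threshold $\sqrt{2Ac}$, and a good grade should be sent with probability $Q/\sqrt{2Ac}$. The sketch below, with hypothetical function names and parameter values, performs both checks. It also guards the domain of the construction: n ≤ 1 only when $Q \le \sqrt{2Ac}$.

```python
def kg_grading(Q, A, c):
    """Grading probabilities (m, n) of Proposition 4 for a prior Q below sqrt(2Ac)."""
    T = (2 * A * c) ** 0.5                     # working threshold sqrt(2Ac)
    if not 0 < Q < T:
        raise ValueError("Proposition 4 applies to priors strictly between 0 and sqrt(2Ac)")
    m = 1.0                                    # P(good grade | sigma = 1)
    n = Q * (1 - T) / (T * (1 - Q))            # P(good grade | sigma = 0), equation (12)
    return m, n


def posterior_after_good_grade(Q, m, n):
    """Bayes posterior P(sigma = 1 | good grade)."""
    return Q * m / (Q * m + (1 - Q) * n)


if __name__ == "__main__":
    Q, A, c = 0.3, 0.08, 1.0                   # hypothetical values, sqrt(2Ac) = 0.4
    m, n = kg_grading(Q, A, c)
    print(n)                                    # about 0.643
    print(posterior_after_good_grade(Q, m, n))  # about 0.4, exactly at the threshold
    print(Q * m + (1 - Q) * n)                  # about 0.75 = Q / sqrt(2Ac)
```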

As interesting as this solution is, it is important to emphasize how difficult it is to apply it to our problem. The main incompatibility is that in the Kamenica and Gentzkow solution students always receive a signal, whereas in our model a probability distribution governs whether the signal materializes. As we will see, this is an important additional force that can shift the optimal grading policy for the university.

V.1.2 Basic solution to our problem

We established that the Kamenica and Gentzkow solution is not easily suited to the framework we are working with, mainly because in their solution there is no uncertainty about whether the signal is received. In our model, there is a probability distribution over the chances that the signal will be received. Moreover, this distribution depends on the student’s academic path decisions; another way to say this is that students choose to receive a signal with probability $a_1x_1$. Another difference is that in our model universities have only one instrument to induce effort: the precision of the signal p. While the parameters Q, c and A are also important in the final decision a student makes, they are out of reach for the university once selection has been made; they are student-specific. On the other hand, the university can manipulate the precision of the signal when grading. Consequently, the university has to decide:

$$P(\text{signal } i \mid \sigma = i) = p, \qquad P(\text{signal } j \mid \sigma = i) = 1 - p$$

In this context, a perfect precision p = 1 would unequivocally reveal her talent to the student. Graphically, the posterior beliefs $\sigma_2$ are located at the extremes of the value functions $V_2$ we sketched in the previous section. A completely uninformative signal, $p = \frac{1}{2}$, does not alter the student’s prior belief; that is, the posteriors match the prior exactly. This can be checked by substituting both extreme values of p into (2). So, when the university designs the students’ posteriors through p, it must take into consideration that p alters (1) the probability that the student gets a signal and (2) the precision of the signal. This means that while universities can design a grading system that maximizes the number of times it confuses students into working, as the Kamenica & Gentzkow solution does, they must not overlook that the signal has to reach the student to actually work as a signaling device.
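Both limiting cases can be verified in a few lines. Equation (2) is not restated in this section, so the sketch assumes it is the standard Bayes update for a binary talent σ with P(σ = 1) = Q and a symmetric signal of precision p; the function name is hypothetical. It also checks Bayes plausibility: the posteriors, weighted by the probabilities of the good and bad signal, average back to the prior Q.

```python
def posteriors(Q, p):
    """Return (G, J, sigma_H, sigma_L) for prior Q and signal precision p."""
    G = Q * p + (1 - Q) * (1 - p)              # probability of a good signal
    J = Q * (1 - p) + (1 - Q) * p              # probability of a bad signal
    sigma_H = Q * p / G                        # posterior after a good signal
    sigma_L = Q * (1 - p) / J                  # posterior after a bad signal
    return G, J, sigma_H, sigma_L


if __name__ == "__main__":
    Q = 0.3
    for p in (1.0, 0.75, 0.5):
        G, J, s_H, s_L = posteriors(Q, p)
        # p = 1 gives posteriors (1, 0); p = 1/2 leaves both equal to Q.
        print(p, round(s_H, 4), round(s_L, 4), round(G * s_H + J * s_L, 4))
```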

The university’s value function is structured just like in the Kamenica and Gentzkow environment. Considering that the objective function is the same (that is, it depends only on $a_t$) and that we are still analyzing the ambiguous case where $\sigma_{2,H} > \sqrt{2Ac}$ and $Q, \sigma_{2,L} < \sqrt{2Ac}$, the university in t = 2 solves:

$$\max_p \; a_1 + \Pr(a_2 = 1)$$

But in the unconfident case we are observing, the student only works in t = 2 if she worked in t = 1, i.e., $a_1 = 1$. Therefore, we can rewrite this equation:

$$\max_p \; 1 + P(a_2 = 1)$$

However, we know that under this positioning of prior and posterior beliefs the student works at t = 1 when, depending on the particular case, (10a) or (10b) is satisfied. These conditions depend on all of the problem’s parameters, but they restrict the university’s signal design only through p. Therefore, considering lemma 6, we have to rewrite the problem as follows:

$$\max_p \; 1 + P(a_2 = 1) \quad \text{s.t. } p \ge \underline{p}$$

For our last formulation of the maximization problem, we explore further what $P(a_2 = 1)$ is. We have proved that the student will only work at t = 2 if the signal she receives is positive. This last event occurs with probability $Qp + (1-Q)(1-p)$. Additionally, the signal only counts if it is received by the student, which happens with probability $a_1x_1$. But the student also chooses $a_1x_1$ optimally, and so we finally get:

$$\max_p \; 1 + x^*_1(p)\left[Qp + (1-Q)(1-p)\right] \quad \text{s.t. } p \ge \underline{p} \qquad (13)$$

where $x^*_1(p)$ is the working solution for $x_1$ displayed in lemma 4. This is the problem the university must solve to determine the optimal p. From now on, call $x^*_1(p)$ the first part of the university’s problem and $Qp + (1-Q)(1-p)$ the second part. Note that whenever $Q < \frac{1}{2}$, the second part of the problem is a straight line with a negative slope, and the slope moves further away from 0 as Q gets smaller. On the other hand, the function $x^*_1(p)$ is not as easy to understand and has a more complex shape in p. However, we can infer certain aspects of the solution to this problem by characterizing these parts further:

Proposition 5. The two main functions of the maximization problem in (13) behave as follows:

i $\dfrac{\partial\left(Qp + (1-Q)(1-p)\right)}{\partial p} < 0$ when $Q < \frac{1}{2}$

ii $\dfrac{\partial x^*_1(p)}{\partial p} > 0$
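Part (i) is a one-line computation because the second part of (13) is linear in p. Writing $G(p) = Qp + (1-Q)(1-p)$ and $J(p) = 1 - G(p)$ for the probabilities of a good and a bad signal, the slopes are:

```latex
\begin{aligned}
\frac{\partial G}{\partial p} &= Q - (1-Q) = 2Q - 1 < 0 \iff Q < \tfrac{1}{2},\\
\frac{\partial J}{\partial p} &= -\frac{\partial G}{\partial p} = 1 - 2Q > 0 \iff Q < \tfrac{1}{2}.
\end{aligned}
```

Part (ii) is less direct, since $x^*_1(p)$ depends on p through $V_2$ evaluated at the posteriors.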

Notice that the new bound imposed on Q in proposition 5 has to be compatible with the existence of $\underline{p}$, considering that $\underline{p}$ comes from the condition that guarantees work in t = 1 (condition (10)). The first part of proposition 5 suggests that universities want to lower p as much as they can: specifically, $p^* = \underline{p}$. The intuition is that lowering p increases the chances of confusing students out of $a_2 = 0$ and into $a_2 = 1$. Increasing p forces the university to send a bad signal to low-talent students with higher probability, and so more students choose the no-work path. By choosing $p^* = \underline{p}$, the university minimizes the chances that these unconfident students get bad signals, and students who get good signals are completely indifferent between working and not working. This result is inherited directly from Kamenica and Gentzkow’s formulation: the signal sender extracts all of the receiver’s surplus to maximize its own payoff. However, there is an opposing force, displayed in part (ii) of proposition 5, that is not found in Kamenica and Gentzkow: decreasing the precision p lowers the frequency with which students actually get the signal. So one can have a very imprecise signal that is more likely to carry the student into the working zone, but that also lowers the chances that the student actually gets it. This may not be optimal, because signaling has to occur for confusion to take place. In other words, this force pushes p upwards towards $p^* = 1$, where students who end up working strictly prefer this option, and bad signals are sent much more often.

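The analytic solution takes the FOC of the problem in (13), whose left-hand side can be succinctly expressed as:

$$x^{*\prime}_1(p)\,G(p) + x^*_1(p)\,G'(p) \qquad (14)$$

where $G(p) = Qp + (1-Q)(1-p)$ is the second part of the problem.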

This last expression is very interesting. We know that $x^{*\prime}_1(p)$ and $G(p)$ are strictly positive, which means that the first term of the FOC is always positive. Regarding the second term, we know that $x^*_1(p)$ is positive, but $G'(p)$ is always negative (for $Q < \frac{1}{2}$). This means that, depending on how these two terms compare, (14) can be positive or negative; that is, the objective in (13) can be increasing or decreasing in p. This tells us a lot about the solution: if the function the university maximizes is increasing in p and convex, then the solution is always $p^* = 1$. If it is decreasing and convex, then it is always better to choose $p^* = \underline{p}$. If it is concave, we could find interior solutions. So the question we are left with is under what parametric conditions (14) is greater or smaller than 0 at the extreme points of p.

The solution to this problem is very complex and not easily found. Therefore, to obtain results, we analyzed the function in (13) and computed its solutions for different parametrizations using numerical methods in MATLAB. Note that while $x^*_1$ enters the university’s objective function, $x^*_1$ is itself determined by the value function $V_2(\sigma_2)$, as we show in lemma (4). To run the numerical estimations correctly, we must take into account that $x^*_1$ depends on $V_2(\sigma_2)$, and therefore instruct the program to consider all of the possible structures $V_2(\sigma_2)$ can have⁴. The results can be summarized as follows:

⁴ Remember that we only considered parametrizations for which $E^{ind}_2 = \sqrt{2Ac}$.

Proposition 6. Numerical solutions to the problem in (13) show that:

1. If the parametrization is such that $x^*_2$ cannot be bounded, there are two possible solutions:

i À la Kamenica & Gentzkow: the optimal decision is $p^* = \underline{p}$. This occurs when Q is very small.

ii Anti Kamenica & Gentzkow: the optimal decision is $p^* = 1$. This occurs for higher Q.

• In both of these cases, the $x^*_1$ induced by the university through p is unbounded.

2. If the parametrization is such that $x^*_2$ can be bounded, the solutions are:

i À la Kamenica & Gentzkow: the optimal decision is $p^* = \underline{p}$. This occurs when Q and $Q_0$ are relatively small and Q is close to the working/no-working threshold $Q_0$.

ii Anti Kamenica & Gentzkow: the optimal decision is $p^* = 1$. This occurs for relatively high values of Q that are not close to $Q_0$.

iii Interior solutions: the optimal $p^* \in [\underline{p}, 1]$. These cases occur when:

i. Q has an intermediate value;

ii. the induced $x^*_1$ is bounded.

These results summarize all of the cases that were observed. First, most of the time, the university’s optimal decision was to choose the maximum available precision, $p^* = 1$. This is an interesting result because it completely contradicts the solution we derived in the Kamenica and Gentzkow environment. It means that the most important force at play in this context is the event of actually signaling, and not the event of signaling negatively or positively. Optimality points to full-disclosure policies, and so universities should be more interested in actually getting information through to their students than in confusing them into their preferred actions.

There were some parametrizations under which the optimal policy is $p^* = \underline{p}$; that is, Kamenica and Gentzkow’s results still hold in some cases. These cases share one important characteristic: Q is extremely close to $Q_0$ and $Q_0$ is close to 0. This can be explained because the second part of the university’s objective function reaches its most negative slope when $Q \approx 0$, and so in those cases the second part of the function can overturn the strong increasing effect of $x^*_1$. Intuitively, students with extremely low prior beliefs may find it prohibitively costly to enroll in a high-difficulty course, and so if $x^*_1$ is too high, they prefer not to work. Additionally, because the prior belief Q is so low, they foresee that the probability of getting a positive signal of talent is nearly 0, and so it is harder to convince them to work for information through $x^*_1$. In this case, because increasing $x^*_1$ pushes these students out of the work zone, the university cannot demand extreme difficulty and therefore decides to maximize the number of students it confuses rather than the probability that signals are sent.

Finally, for most of the parametrizations, the optimal grading policy for the university was $p^* = 1$, but the constraint $x^*_1 \le 1$ was violated. In those restricted cases, the optimal decision is to take the $p^*$ at which $x^*_1(p) = 1$: this strategy maximizes the first part of the objective function, $x^*_1$, and because the second part of the function is decreasing in p, it minimizes the cutback effect this has on the value. In these cases, we found mostly interior solutions $\underline{p} < p^* < 1$ and sometimes corner solutions $p^* = \underline{p}$.
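The numerical exercise behind Proposition 6 can be approximated with a simple grid search. The sketch below is not the MATLAB code used in this thesis; it is a minimal Python illustration resting on assumptions of ours: binary talent with P(σ = 1) = Q, a symmetric signal of precision p, a hypothetical second-period value $V_2(\sigma) = \max\{A, \max_{x\in[0,1]}(\sigma x - cx^2/2)\}$, and illustrative parameters A = 0.08 and c = 1. It maximizes $1 + x^*_1(p)\,[Qp + (1-Q)(1-p)]$ over the grid points at which condition (10) holds, which plays the role of the constraint $p \ge \underline{p}$.

```python
def V2(sigma, A, c):
    """Hypothetical second-period value: outside option A vs. best coursework x in [0, 1]."""
    x = min(sigma / c, 1.0)
    return max(A, sigma * x - c * x * x / 2)


def first_period(Q, p, A, c):
    """Return (works, x1_star, G) for an unconfident student facing precision p."""
    G = Q * p + (1 - Q) * (1 - p)              # probability of a good signal
    J = 1 - G                                  # probability of a bad signal
    N = Q - V2(Q, A, c) + G * V2(Q * p / G, A, c) + J * V2(Q * (1 - p) / J, A, c)
    x_star = min(max(N / c, 0.0), 1.0)         # x*_1 from Lemma 4, capped at 1
    gain = N * N / (2 * c) if x_star < 1.0 else N - c / 2   # (10a) or (10b)
    return N > 0 and gain > A, x_star, G


def optimal_precision(Q, A, c, steps=500):
    """Grid search for problem (13); returns (p_star, value), or None if no p induces work."""
    best = None
    for i in range(steps + 1):
        p = 0.5 + 0.5 * i / steps
        works, x_star, G = first_period(Q, p, A, c)
        if not works:                          # plays the role of p >= p_lower(Q)
            continue
        value = 1 + x_star * G                 # objective of (13)
        if best is None or value > best[1]:
            best = (p, value)
    return best


if __name__ == "__main__":
    # Hypothetical parameters: A = 0.08, c = 1, so sqrt(2Ac) = 0.4.
    for Q in (0.2, 0.3, 0.35):
        print(Q, optimal_precision(Q, A=0.08, c=1.0))
```

With these hypothetical values, Q = 0.3 ends at the lower end of the feasible grid (the “à la Kamenica & Gentzkow” corner), Q = 0.35 ends at full disclosure $p^* = 1$, and Q = 0.2 has no feasible precision at all. This mirrors the qualitative pattern of Proposition 6 but is only an illustration; parametrizations in which the cap $x^*_1 \le 1$ binds can produce interior solutions.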

V.1.3 Heterogeneity in students

In previous sections, we considered a single student who held a prior Q about her talent; put another way, we solved for the university’s optimal grading policy for a mass of students who homogeneously held the prior belief Q. However, it is natural to think that universities enroll students with very different beliefs about their talents. What is the optimal grading policy p when the university faces heterogeneous students? In this section, we consider parametrizations such that, when students are of type $Q_0 < Q < \sqrt{2Ac}$, the optimal grading policy for the university is full disclosure, $p^* = 1$. That is, we will not observe cases where $x^*_1$ surpasses its maximum permitted value and so yields interior solutions for $p^*$, or where Q is so close to $Q_0$ that the optimal grading policy is $p^* = \underline{p}$⁵.

Suppose there is a variety of students such that $Q \sim F$ on $[Q_0, 1]$, where F is some continuous probability distribution function and $P(\sigma = 1 \mid Q) = Q$. Call students with $Q < \sqrt{2Ac}$ unconfident (U) students, and students with $Q > \sqrt{2Ac}$ confident (C) students. In this context, we propose the following:

Lemma 9. There are two straightforward optimal grading policies with heterogeneous Q⁶:

1. If every student holds a prior belief Q such that $Q < \sqrt{2Ac}$, then $p^* = 1$.

2. If every student holds a prior belief Q such that $Q > \sqrt{2Ac}$, then $p^* = \frac{1}{2}$.

This lemma is actually a corollary of lemmas (6) and (8). In the anti Kamenica & Gentzkow solutions we are observing, it is always optimal to grade with full disclosure.

⁵ This decision was made mainly because we want to focus on “anti Kamenica and Gentzkow” solutions for these kinds of students.

⁶ As we explain in the proof in the appendix, numeral 1 of this lemma rests on the particular anti Kamenica & Gentzkow cases we observe. However, the principle still stands for other types of solutions because of what we proved in lemma (6): any time students hold initial beliefs below the threshold, some precision $p > \frac{1}{2}$ is needed to induce attention. Considering other solutions to the heterogeneous-Q environment means that $p^* = 1$ may no longer be optimal, but still $p^* > \frac{1}{2}$.

Even though the particular full-disclosure solution responds mainly to student-specific parameters, the idea that only a certain precision, in particular a high enough precision, can get unconfident students working in t = 1 still holds. However, with any $p > \frac{1}{2}$ there is a portion of confident students who will get a discouraging negative signal in t = 1 and will therefore deviate to attention $a_2 = 0$. Therefore, we have two very different optimal grading policies for students with high and low beliefs. The following proposition is a corollary of lemma (8) that applies the same principle to the problem with a mass of confident students:

Corollary 2. For all $p > \frac{1}{2}$, there is a portion $\lambda(Q, p) \in [0, 1]$ of confident students such that $a_2 = 0$.

The university faces an impossible trade-off: whenever it raises $p^*$ away from $\frac{1}{2}$, it loses a portion of confident students from $a_2 = 1$ to $a_2 = 0$, because they get a negative signal and the resulting posterior is below the working threshold $\sqrt{2Ac}$. But whenever it moves $p^*$ away from 1, a mass of unconfident students cannot be persuaded into working, and so they shift from $a_2 = 1$ to $a_2 = 0$.

But who are the students susceptible to this shift? In the unconfident section of the distribution, there are two kinds of students: those who, for a certain p, are represented by case 1.1 and those represented by case 1.2. As we have stated before, case 1.1 students are impossible to persuade into work. The students who can respond to a grading policy are those with $Q > Q_0$. So, set $p^* = 1$ and begin lowering its value towards $p^* = \frac{1}{2}$. The students who are going to shift from $a_2 = 1$ to $a_2 = 0$ are those with a prior belief Q such that the new $p^{*\prime}$ is not precise enough to get them working, i.e., Q such that $p^{*\prime} < \underline{p}(Q)$. Because lower Q need more precision p to work, the mass of students that shifts from $a_2 = 1$ to $a_2 = 0$ as p moves downward starts from $Q_0$ and moves upwards in the distribution. Call $Q^u_{cr}$ the critical Q below which the precision $p^*$ is not enough to induce work. Because we are only observing unbounded cases of $x^*_1$, this critical value can be computed from condition (10a):

$$\left(Q^u_{cr} - A + \frac{(Q^u_{cr}\,p^*)^2}{2c\,G(Q^u_{cr}, p^*)} + J(Q^u_{cr}, p^*)\,A\right)^2 \frac{1}{2c} < A \qquad (15)$$

In the confident section, we have case 2.1 and case 2.2 students. In this context, the latter can be defined as the students who, for a certain p, still have all of their prior and posterior beliefs above the threshold $\sqrt{2Ac}$, and therefore work in every period regardless of the signal they receive. The group that is susceptible to change is characterized by case 2.1: the students who hold a prior belief Q low enough that, if they get a negative signal with precision p, $\sigma_{2,L}(Q, p) < \sqrt{2Ac}$. This means that, just like with unconfident students, the first confident students to stop working when $p^*$ is raised from $\frac{1}{2}$ are those with beliefs Q closest to $\sqrt{2Ac}$, that is, the most unconfident among the confident. To get the critical Q above which confident students work at t = 2, we solve $\sigma_{2,L}(Q, p) = \sqrt{2Ac}$ and get the following:

$$Q_c^{cr} = \frac{p^*\sqrt{2Ac}}{1 - p^* - \sqrt{2Ac}\,(1 - 2p^*)} \tag{16}$$
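Equation (16) is easy to check numerically. The short Python sketch below (hypothetical names, illustrative $A = 0.1$, $c = 1.5$) evaluates $Q_c^{cr}(p)$ on a grid of precisions and verifies that the negative-signal posterior, taken here as $\sigma_{2L}(Q,p) = Q(1-p)/(Q(1-p) + (1-Q)p)$, lands exactly on the threshold at $Q = Q_c^{cr}(p)$. The point $p = 1$ is excluded from that last check because $\sigma_{2L}(1, 1)$ is $0/0$.

```python
import math


def sigma_low(q: float, p: float) -> float:
    """Posterior after a negative signal: Q(1-p) / (Q(1-p) + (1-Q)p)."""
    return q * (1.0 - p) / (q * (1.0 - p) + (1.0 - q) * p)


def q_c_critical(p: float, A: float, c: float) -> float:
    """Closed form (16): p*sqrt(2Ac) / (1 - p - sqrt(2Ac)(1 - 2p))."""
    s = math.sqrt(2.0 * A * c)
    return p * s / (1.0 - p - s * (1.0 - 2.0 * p))


A, c = 0.1, 1.5  # illustrative values, with sqrt(2Ac) < 1
s = math.sqrt(2.0 * A * c)
grid = [0.5 + 0.5 * k / 10 for k in range(11)]  # p from 0.5 to 1.0
values = [q_c_critical(p, A, c) for p in grid]
print(list(zip(grid, values)))
```

The grid values start at $\sqrt{2Ac}$ for $p = 1/2$, end at 1 for $p = 1$, and rise in between.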

Notice that $Q_c^{cr}$ is increasing in $p$. Moreover, if $p = 1$ then $Q_c^{cr} = 1$, and if $p = 1/2$ then $Q_c^{cr} = \sqrt{2Ac}$.

When the university decides its optimal grading policy, it must consider all of the movements we just described. Just as in the two cases we analyzed before, the university is interested in maximizing the attention it gets from all of its students. The problem the university solves is:

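$$\max_{p}\; a_{c,t=1} + a_{c,t=2} + a_{u,t=1} + a_{u,t=2} \tag{17}$$

where

$$\begin{aligned}
a_{c,t=1} &= 1\cdot\big(1 - F(\sqrt{2Ac})\big)\\
a_{c,t=2} &= 1\cdot\left[\big(1 - F(\sqrt{2Ac})\big) - \int_{\sqrt{2Ac}}^{Q_c^{cr}(p)} x_1^*(p,Q)\,J(p,Q)\,dF(Q)\right]\\
a_{u,t=1} &= 1\cdot\left[F(\sqrt{2Ac}) - \int_{Q_0}^{Q_u^{cr}(p)} f(s)\,ds\right]\\
a_{u,t=2} &= 1\cdot\left[\int_{Q_u^{cr}(p)}^{\sqrt{2Ac}} x_1^*(p,Q)\,G(p,Q)\,dF(Q)\right]
\end{aligned}\tag{18}$$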

The solution to this problem is neither evident nor straightforward. Naturally, it depends on things such as the particular distribution function of beliefs, or on more specific attention objectives a university may have; e.g., a university could prioritize confident students' attention over unconfident students' attention, or vice versa. However, we already have some information on the variables in this maximization problem, and can therefore characterize the forces at play in more depth. First, every confident student works at $t = 1$ (with attention equal to 1), so $a_{c,t=1}$ does not depend on $p$. Second, the integral in $a_{c,t=2}$ has upper limit $Q_c^{cr}(p)$, which is increasing in $p$; therefore, as $p^*$ increases, $a_{c,t=2}$ decreases. The first function inside the integral is the optimal $x_1^*$ of case 2.1, which is increasing in $p$ and therefore also lowers $a_{c,t=2}$ as $p$ increases. $J$ is increasing in $p$ whenever $Q < 1/2$, so the effect of this term is more ambiguous, as it depends on $Q$. Regarding $a_{u,t=1}$, the upper limit of the integral is decreasing in $p$, and so the whole term $a_{u,t=1}$ is increasing in $p$. Finally, in $a_{u,t=2}$ the lower limit of the integral is decreasing in $p$, which enlarges the area of integration. The function $x_1^*$ there corresponds to the optimal solution for case 1.2 students, and since we only consider $p^* = 1$ solutions in this section, $x_1^*G(Q, p^*)$ is increasing in $p$.
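The sign of $\partial J/\partial p$ used above follows from writing $J$ as an affine function of $p$, with $J(Q,p) = (1-Q)p + Q(1-p)$ as defined for the menu problem in Section V.3:

```latex
J(Q,p) = (1-Q)\,p + Q\,(1-p) = Q + p\,(1-2Q),
\qquad
\frac{\partial J}{\partial p} = 1 - 2Q
\begin{cases}
> 0 & \text{if } Q < 1/2,\\
< 0 & \text{if } Q > 1/2.
\end{cases}
```

Since $G = 1 - J$, the probability of a high signal moves in the opposite direction.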

V.2 University’s instrument: coursework x

As we said before, in our model universities have at hand certain instruments that can shape the optimal decisions of their students. In this section, we imagine that, while universities still only care about the level of commitment $a_t$ that students have with their coursework, they can only maximize this attention by designing the spectrum of difficulty of the courses they offer in every period $t$, $[\underline{x}, 1]$.

The idea here is that precision $p$ is an exogenous parameter for both students and schools. This means that the university can no longer transform or manipulate the value of posterior beliefs. Picking $[\underline{x}, 1]$ has two effects on students' decisions. First, it statically determines, for every period, the value they could attain if they decide to work. Second, it can dynamically alter the probability that students get signals. That is, determining $[\underline{x}, 1]$ has consequences both for the static utility students perceive in any period and for the dynamic utility students perceive at $t = 1$. Therefore, even though belief values are not in the university's hands, the probability that new beliefs are formed is.

As in previous sections, we concentrate on one case of type 1.2: an unconfident student with belief $Q$ who, while she does not wish to work statically, can eventually be persuaded through information into working at $t = 1$.

V.2.1 Basic solution to our problem

Suppose $Q_0 < Q < \sqrt{2Ac}$, and fix $p \in [\underline{p}, 1]$. The university wants to design a coursework offer that will make this student exert $a_t = 1$. Because this student is unconfident, she does not work at $t = 1$ on static grounds alone, and if she does not work, she keeps her prior for the next period: $Q = \sigma_1 = \sigma_2$. Therefore, the university must choose $\underline{x}$ so as to induce work at $t = 1$, so that a signal can be received. The probability of receiving a signal is maximized when $x_1^* = 1$, so the school might be tempted to offer only the most difficult courses. However, with $\underline{x} = 1$ this student may not work at $t = 1$: $a_1^* = 1$ requires that her working utility at $t = 1$ be greater than her non-working utility at $t = 1$, that is, that the function in (9) be maximized at $a_1 = 1$. Therefore, the university solves the following problem:

maxx 1 + x(p,Q,A, c)G(p,Q)s.t. x ∈ argmaxy≥x

[−c2 x

21 + x1(Q−GA+GV2(σ2H)) + A

] (19)

Alternatively, we can say that the university wants to choose the maximum $x$ such that $a_1^* = 1$. In our analysis of the maximization problem in (9), we determined that the objective function is concave in $x_1$. Naturally, the optimal $x_1^*$ is a local maximum of the objective function in $x_1$. However, to get the student working at $t = 1$ it is not necessary that $x_1 = x_1^*$: $x_1^*$ is merely the student's favorite $x_1$. For an unconfident student, any $x_1$ under which $a_1^* = 1$ remains optimal can induce work at $t = 1$, provided that it satisfies:

$$Qx_1 - \frac{c}{2}x_1^2 + (1-x_1)A + x_1\big(GV_2(\sigma_{2H}) + JA\big) \ge 2A \tag{20}$$

This is precisely what the restriction in (19) captures, and it is just a reformulation of conditions (10). But now $x_1$ is no longer a decision variable for the student. The condition simply states that the utility (understood as in (9)) the student gets by exerting attention $a_t = 1$ and accepting the imposed $x_1$ must be at least the utility she gets by abandoning work entirely. Now, because the university wants to maximize the chances that a signal gets through to the student, it must optimally choose the largest $x$ that complies with (20). In other words, it can simply solve for $x_1$:

$$Qx_1 - \frac{c}{2}x_1^2 + (1-x_1)A + x_1\big(GV_2(\sigma_{2H}) + JA\big) = 2A \tag{21}$$

By solving (21), the university leaves the student exactly indifferent between working and not working. Notice that this equation is quadratic in $x_1$, so it generally yields two solutions. Because the university prefers higher to lower $x_1$, it is optimal to choose the higher root (checking that it lies within the bounds $[0, 1]$). From now on, we call this solution $\bar{x}$.
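A minimal Python sketch of this step follows, under assumptions that are not spelled out in this section: $G = Qp + (1-Q)(1-p)$, $J = 1 - G$, and $V_2(\sigma_{2H}) = \sigma_{2H}^2/(2c)$ with $\sigma_{2H} = Qp/G$; names and parameter values are hypothetical. Collecting terms, (21) becomes $\frac{c}{2}x_1^2 - b\,x_1 + A = 0$ with $b = Q - GA + (Qp)^2/(2cG)$, so its roots are $(b \pm \sqrt{b^2 - 2Ac})/c$. With $b > 0$, real roots exist exactly when $b \ge \sqrt{2Ac}$, i.e., when the student can be persuaded, and the vertex $b/c$ is her favorite $x_1^*$ from (19).

```python
import math
from typing import Optional


def coefficient_b(q: float, p: float, A: float, c: float) -> float:
    """Linear coefficient of (21): b = Q - G*A + (Q p)^2 / (2 c G)."""
    g = q * p + (1.0 - q) * (1.0 - p)
    return q - g * A + (q * p) ** 2 / (2.0 * c * g)


def x_bar(q: float, p: float, A: float, c: float) -> Optional[float]:
    """Largest imposed coursework that keeps the student indifferent, clipped to [0, 1].

    Returns None when (21) has no real root, i.e. working is never optimal.
    """
    b = coefficient_b(q, p, A, c)
    disc = b * b - 2.0 * A * c
    if b <= 0.0 or disc < 0.0:
        return None
    return min((b + math.sqrt(disc)) / c, 1.0)


def lhs_21(x: float, q: float, p: float, A: float, c: float) -> float:
    """Left-hand side of (20)/(21) for an imposed coursework level x."""
    g = q * p + (1.0 - q) * (1.0 - p)
    v2_high = (q * p / g) ** 2 / (2.0 * c)  # V_2 at the high-signal posterior
    return q * x - 0.5 * c * x * x + (1.0 - x) * A + x * (g * v2_high + (1.0 - g) * A)


# Illustrative case: the favorite is b/c = 0.39, while x_bar is about 0.527.
print(x_bar(0.5, 0.9, 0.1, 1.5), coefficient_b(0.5, 0.9, 0.1, 1.5) / 1.5)
```

The gap between $\bar{x}$ and $b/c$ is the extra effort the university can impose without losing the student's attention.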

In general, the idea is that universities can always ask students for “a little more effort” in coursework without compromising their decision to work; that is, they can costlessly take all of the surplus that students would have kept for themselves had they been allowed to choose their optimal coursework. Naturally, this has a breaking point beyond which the university “asks for too much” and students withdraw their attention from schoolwork. However, computing the highest enforceable $x_t$ such that students still work can be very useful for universities with attention objectives.

V.3 University’s instrument: menu (p, x)

Suppose the university can now influence students' decisions through both $x$ and $p$. That is, it can determine both the range of courses it offers and its grading policy in order to increase students' attention to coursework, $a_t$. To reduce the number of cases to be analysed, we focus on the particular parametric setting where $V_2(\sigma_2)$ is such that $x_2^*$ is never bounded.

We saw from our previous analysis with $p$ as an instrument that, when students are heterogeneous, designing one grading system for all of them can be costly for the university in terms of attention: because optimal grading policies are so different for students with low and high priors, for every policy $p$ the university loses attention from either confident or unconfident students (or both). In this context, the school can benefit from designing different attention-maximizing menus, or contracts, $(p_i, x_i)$ that specifically target high- or low-confidence students, and have them self-select into their designated menu.

Suppose that, for a given parametrization of $A$ and $c$, the mass of students is distributed over only two points of the prior belief line, $Q_0 < Q_u < \sqrt{2Ac} < Q_c$, where the proportion of $Q_u$ students is $\mu$. The offered contracts are $(p_u, x_u)$ and $(p_c, x_c)$. Because the university chooses the $x$ that is offered, $x$ is no longer a decision variable for the students, and therefore at $t = 1$ unconfident students solve:

$$u_u = \max_{a_1}\; Q_u a_1 x + (1-a_1)A - \frac{c x^2}{2} + a_1 x\big[Q_u p + (1-Q_u)(1-p)\big]V_2(\sigma_{2H,u}) + \Big(1 - a_1 x\big[Q_u p + (1-Q_u)(1-p)\big]\Big)A$$

and confident students solve:

$$u_c = \max_{a_1}\; Q_c a_1 x + (1-a_1)A - \frac{c x^2}{2} + a_1 x\Big(\big(Q_c p + (1-Q_c)(1-p)\big)V_2(\sigma_{2H,c}) + \big((1-Q_c)p + Q_c(1-p)\big)A\Big)$$

Just as in previous sections, we consider the case in which the working solution at $t = 1$ ($a_1 = 1$) is given for both kinds of students; that is, we focus on the problem the university solves to maximize attention at $t = 2$, provided that there was attention at $t = 1$. In the confident student's case this is not problematic, because we know that this kind of student always works at $t = 1$. With the unconfident student, however, solving for $t = 2$ under $a_{1,u} = 1$ requires adding constraints that guarantee $a_{1,u} = 1$ retroactively. So, the university solves the following problem at $t = 2$:

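$$\begin{aligned}
\max_{(p_u,x_u),(p_c,x_c)}\;& x_u\big[Q_u p_u + (1-Q_u)(1-p_u)\big] + \Big(1 - x_c\big[Q_c(1-p_c) + (1-Q_c)p_c\big]\Big)\\
\text{s.t.}\;& u_u(p_u,x_u) \ge u_u(p_c,x_c)\\
& u_c(p_c,x_c) \ge u_c(p_u,x_u)\\
& u_u(p_u,x_u) \ge 2A\\
& u_c(p_c,x_c) \ge A + \frac{Q_c^2}{2c}
\end{aligned}\tag{22}$$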

The first two restrictions are the incentive compatibility constraints for unconfident and confident students, respectively, and the last two are participation constraints (that is, they guarantee $a_1 = 1$ for every student). To solve this problem, we first consider the full information case, in which incentive compatibility does not have to be verified because the university knows students' types.

V.3.1 Full information solution

With full information, universities know perfectly which kind of student they are facing. Therefore, students cannot mimic each other to trick the university into offering them the contract that was not designed for them. For the unconfident student, the university solves

$$\begin{aligned}
\max_{(p_u,x_u)}\;& x_u\big[Q_u p_u + (1-Q_u)(1-p_u)\big]\\
\text{s.t.}\;& u_u(p_u,x_u) \ge 2A\\
& \sigma_{2H,u}(Q_u,p_u) \ge \sqrt{2Ac}
\end{aligned}$$

The second condition is guaranteed by the context of the problem we are studying, where $Q_0 < Q_u < \sqrt{2Ac} < Q_c$, so unconfident students are persuadable into work. As we discussed in previous sections, the first condition holds whenever $p_u \ge \underline{p}(Q_u)$. Therefore, we can rewrite this problem as follows:

$$\max_{(p_u,x_u)}\; x_u(p_u)\big[Q_u p_u + (1-Q_u)(1-p_u)\big] \quad \text{s.t.} \quad p_u \ge \underline{p}(Q_u)$$


In a sense, this problem combines situations we have seen before. First, from our analysis of $p$ as an instrument, we concluded that, provided $Q > Q_0$, different parametrizations lead to different optimal choices of $p^* \in [\underline{p}, 1]$. However, we added that in every parametrization it was always optimal to induce the highest $x_1^*$ that could still make students work; whether $p^*$ was higher or lower depended on how much $x_1^*$ could be induced while still guaranteeing work. We know, from our analysis of the university's optimal policy when $x_t$ was the exclusive instrument, that for a given $p > \underline{p}$ the university can still make students work at $t = 1$ by pushing $x_1$ further from $x_1^*$. That is what we denoted $\bar{x}$: the maximum $x_1$ that still makes working optimal for the student. Therefore, the university's problem can be understood as:

$$\max_{p_u}\; \bar{x}(p_u)\big[Q_u p_u + (1-Q_u)(1-p_u)\big] \quad \text{s.t.} \quad p_u \ge \underline{p}(Q_u)$$
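The unconfident student's first-best problem can be explored with a simple grid search. In the Python sketch below (hypothetical names; assumed forms $G = Qp + (1-Q)(1-p)$ and $V_2(\sigma_{2H}) = \sigma_{2H}^2/(2c)$; illustrative $A = 0.1$, $c = 1.5$), $\bar{x}$ is taken as the higher root of (21), and precisions at which (21) has no real root are skipped, which is how the constraint $p_u \ge \underline{p}(Q_u)$ enters.

```python
import math
from typing import Optional, Tuple


def prob_high(q: float, p: float) -> float:
    """G(Q, p) = Qp + (1 - Q)(1 - p)."""
    return q * p + (1.0 - q) * (1.0 - p)


def x_bar(q: float, p: float, A: float, c: float) -> Optional[float]:
    """Higher root of (21), clipped to [0, 1]; None if p is below the minimum persuasive precision."""
    g = prob_high(q, p)
    b = q - g * A + (q * p) ** 2 / (2.0 * c * g)
    disc = b * b - 2.0 * A * c
    if b <= 0.0 or disc < 0.0:
        return None
    return min((b + math.sqrt(disc)) / c, 1.0)


def first_best_unconfident(q_u: float, A: float, c: float, n: int = 1000) -> Tuple[float, float]:
    """Grid search over p_u in [1/2, 1] for the maximum of x_bar(p_u) * G(Q_u, p_u)."""
    best_p, best_value = float("nan"), -1.0
    for i in range(n + 1):
        p = 0.5 + 0.5 * i / n
        x = x_bar(q_u, p, A, c)
        if x is None:  # infeasible: the student would not work at t = 1
            continue
        value = x * prob_high(q_u, p)
        if value > best_value:
            best_p, best_value = p, value
    return best_p, best_value


print(first_best_unconfident(0.5, 0.1, 1.5))
```

With $Q_u = 0.5$, $G(Q_u, p) = 1/2$ for every $p$, so the objective is proportional to $\bar{x}(p_u)$, which rises with precision; the search therefore returns $p_u = 1$ with $\bar{x} = 0.6$ in this illustrative case.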

And so the optimal first-best contract for these students is $(p_u, \bar{x}(p_u))$.

For confident students, the problem is:

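$$\begin{aligned}
\max_{(p_c,x_c)}\;& 1 - x_c(p_c)\big[Q_c(1-p_c) + (1-Q_c)p_c\big]\\
\text{s.t.}\;& u_c(p_c,x_c) \ge A + \frac{Q_c^2}{2c}\\
& \sigma_{2L,c}(Q_c,p_c) \le \sqrt{2Ac}
\end{aligned}$$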

Just as with the unconfident student, the second restriction is guaranteed by how we have described the setting of the problem. The first restriction, in turn, always holds in this case: we know that students with prior beliefs above the threshold always work at $t = 1$, which means that their utility from $a_1 = 1$ is always greater than that from $a_1 = 0$, which is exactly what the first constraint measures. This means we have an unconstrained problem which, once again, we have already analysed in previous sections: the objective function is decreasing in the probability of a negative signal, because negative signals can shift students from working to not working at $t = 2$. However, if the bad signal does not pull this student's sufficiently high prior belief $Q_c$ below the threshold, then no matter what signal she gets (or even if she gets no signal) she still works at $t = 2$. Therefore, the optimal grading policy for these students is $p_c = 1/2$, with $x_c$ any $x \in [0, 1]$ such that working is optimal. In particular, and taking into account that the next step of the analysis drops full information, the university can offer confident students their favorite coursework, to minimize the chances that these students cross over to unconfident students' contracts. That is, $x_1^*(p = 1/2) = \min\{Q_c/c, 1\}$.
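The choice $p_c = 1/2$ works because such a grade is uninformative: by Bayes' rule, both posteriors equal the prior,

```latex
\sigma_{2H}\Big|_{p_c=1/2}
  = \frac{Q_c\cdot\tfrac12}{Q_c\cdot\tfrac12 + (1-Q_c)\cdot\tfrac12} = Q_c,
\qquad
\sigma_{2L}\Big|_{p_c=1/2}
  = \frac{Q_c\cdot\tfrac12}{Q_c\cdot\tfrac12 + (1-Q_c)\cdot\tfrac12} = Q_c,
```

so a confident student with $Q_c > \sqrt{2Ac}$ keeps working at $t = 2$ whatever grade she receives.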


V.3.2 Under asymmetric information

Without full information, universities cannot identify which student is which, so the first-best menus found in the previous subsection have to be analysed under incentive compatibility conditions. Accordingly, the problem the university must solve is once again

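$$\begin{aligned}
\max_{(p_u,x_u),(p_c,x_c)}\;& x_u\big[Q_u p_u + (1-Q_u)(1-p_u)\big] + \Big(1 - x_c\big[Q_c(1-p_c) + (1-Q_c)p_c\big]\Big)\\
\text{s.t.}\;& u_u(p_u,x_u) \ge u_u(p_c,x_c)\\
& u_c(p_c,x_c) \ge u_c(p_u,x_u)\\
& u_u(p_u,x_u) \ge 2A\\
& u_c(p_c,x_c) \ge A + \frac{Q_c^2}{2c}
\end{aligned}\tag{23}$$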

First, we know that unconfident students will never deviate and request the confident students' menu, because in the university's first best confident students receive no informative precision at all ($p_c = 1/2$). As unconfident students work at $t = 1$ only for informational purposes, uninformative menus that additionally include $x \neq 0$ are not attractive at all. That is, we can disregard the first incentive compatibility restriction of (23) for now.

Regarding confident students, the problem is not so simple. A priori, it is not obvious that these students will never deviate. The unconfident students' menu has high precision and some workload, $x_u \neq 0$. Even though confident students always work at $t = 1$, they still value information, and therefore the precision that unconfident students get in their contract can be tempting. Writing out the incentive compatibility restriction for confident students yields:

$$\frac{Q_c^2}{2c} \ge Q_c - \frac{c}{2} + \big[(1-Q_c)p_u + Q_c(1-p_u)\big]A + \frac{Q_c^2}{2c}\left[\frac{p_u^2}{Q_c p_u + (1-Q_c)(1-p_u)}\right] \tag{24}$$

Determining analytically how the forces in the previous equation resolve for given parametrizations is not easy, because the problem is multivariate and the variables interact with each other in many different ways. Consequently, to determine under which parametric conditions equation (24) is satisfied, we ran a MATLAB program that tested (24) for different combinations of parameters⁷. The results are displayed in the following proposition:

⁷ Remember that in this section we only analyse parametrizations for which $x_2^*$ is never bounded. These cases occur when $c > \sqrt{2Ac}$ and $c > 1$, which are very restrictive conditions. Naturally, this strongly limits the generality of our more specific conclusions. However, as we will argue further on, they still show how the variables at play interact, regardless of the shape of $V_2(\sigma_2)$.


Proposition 7. Under the specific parametrizations considered, incentive compatibility in (24) was never satisfied.

Naturally, these results may be a consequence of the constraints imposed by the specific cases for which we defined the menu problem. In particular, note that in the cases we observe here $c < 1$ and $c > \sqrt{2Ac}$, where $\sqrt{2Ac} \in [0, 1]$.

As a consequence, our program only considered low values for $c$ and $A$. Because incentive compatibility failed for every computed parametrization, it is difficult to conduct a finer analysis of the conditions under which incentive compatibility in (24) can and cannot hold. However, we can give an intuition based on a very frequently repeated result: in almost every case, the first best for the university was to set $p_u = 1$, and if it was not, then $p_u \approx 1$. That is, the unconfident menu offered full disclosure, whereas the confident menu had no informational value at all. This may have been too strong a pull to keep confident students away from the unconfident menu. Finally, these results imply that the university cannot implement the first best as described in the full information section, and so these full-information contracts have to be modified.

A first, completely costless alternative for the university is to select $p_c \in [1/2, \bar{p}]$, where $\bar{p}$ is such that $\sigma_{2L,c}(\bar{p}) = \sqrt{2Ac}$. We say this manipulation is costless because providing more precision in grades does not require any investment from the university: precision is “free”. Moreover, with $p_c \in [1/2, \bar{p}]$ we never produce a negative posterior that can push confident students out of work at $t = 2$: at most, $\bar{p}$ leaves them exactly indifferent between working and not working. Therefore, even though this menu profile is not the first best we found under full information, it is a costless alternative for the university and could eliminate the incentive compatibility problem. Since the menu targeted at confident students is now $(p_c, x_c)$, the confident student's incentive compatibility constraint $U_c(x_c, p_c, Q_c) \ge U_c(x_u, p_u, Q_c)$ can be written as:


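$$\begin{aligned}
&Q_c x_c(p_c) - \frac{c}{2}\big(x_c(p_c)\big)^2 + \big(1 - x_c(p_c)\big)\frac{Q_c^2}{2c}\\
&\quad + x_c(p_c)\left[\big(Q_cp_c + (1-Q_c)(1-p_c)\big)\frac{(Q_cp_c)^2}{\big(Q_cp_c + (1-Q_c)(1-p_c)\big)^2} + \big((1-Q_c)p_c + (1-p_c)Q_c\big)\frac{\big(Q_c(1-p_c)\big)^2}{\big((1-Q_c)p_c + (1-p_c)Q_c\big)^2}\right]\frac{1}{2c}\\
&\ge Q_c x_u(p_u) - \frac{c}{2}\big(x_u(p_u)\big)^2 + \big[(1-Q_c)p_u + (1-p_u)Q_c\big]A + \frac{Q_c^2}{2c}\left[\frac{p_u^2}{Q_cp_u + (1-Q_c)(1-p_u)}\right]
\end{aligned}\tag{25}$$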
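The bound $\bar{p}$ has a closed form. Solving $\sigma_{2L,c}(\bar{p}) = \sqrt{2Ac}$, with the negative-signal posterior taken as $\sigma_{2L}(Q,p) = Q(1-p)/(Q(1-p) + (1-Q)p)$, gives $\bar{p} = Q_c(1-s)/\big(Q_c(1-s) + s(1-Q_c)\big)$ with $s = \sqrt{2Ac}$, which is the inverse of the map $p \mapsto Q_c^{cr}(p)$ in (16). A short Python sketch (hypothetical names; illustrative $A = 0.1$, $c = 1.5$):

```python
import math


def sigma_low(q: float, p: float) -> float:
    """Posterior after a negative signal with prior q and precision p."""
    return q * (1.0 - p) / (q * (1.0 - p) + (1.0 - q) * p)


def p_bar(q_c: float, A: float, c: float) -> float:
    """Largest precision that keeps a confident student at or above sqrt(2Ac) after a bad grade.

    Solves sigma_low(q_c, p) = sqrt(2Ac) for p; meaningful for sqrt(2Ac) <= q_c <= 1.
    """
    s = math.sqrt(2.0 * A * c)
    return q_c * (1.0 - s) / (q_c * (1.0 - s) + s * (1.0 - q_c))


for q_c in (0.6, 0.8, 0.9):
    print(q_c, p_bar(q_c, 0.1, 1.5))
```

A student exactly at the threshold gets $\bar{p} = 1/2$, so the costless interval $[1/2, \bar{p}]$ widens as $Q_c$ rises, consistent with outcome (i) of Proposition 8 occurring for high $Q_c$.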

However, this mechanism has two possible downsides. First, it may be that even the maximum precision $\bar{p}$ is not enough to satisfy incentive compatibility for the confident student. In that case, the university will have to distort, at a cost, either the confident or the unconfident students' menu. Second, as we provide more precision $p$ in the confident student's menu, we make it more tempting for the unconfident student: as we move further away from $p_c = 1/2$, we progressively increase the chances that the unconfident student deviates and mimics the confident one. Even though the confident student's menu always induces work at $t = 1$ (i.e., $x_c \neq 0$), the chosen $x_c(p_c)$ can be lower than $x_u(p_u)$. Therefore, a violation of the unconfident students' incentive compatibility can decrease the probability that there is a signal at $t = 1$. The incentive compatibility constraint $U_u(x_u, p_u, Q_u) \ge U_u(x_c, p_c, Q_u)$ for unconfident students is:

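$$\begin{aligned}
&Q_u x_u(p_u) - \frac{c}{2}\big(x_u(p_u)\big)^2 + x_u(p_u)\big[Q_up_u + (1-Q_u)(1-p_u)\big]\left[\frac{(Q_up_u)^2}{\big(Q_up_u + (1-Q_u)(1-p_u)\big)^2}\right]\frac{1}{2c}\\
&\quad + x_u(p_u)\big[(1-Q_u)p_u + (1-p_u)Q_u\big]A\\
&\ge Q_u x_c(p_c) - \frac{c}{2}\big(x_c(p_c)\big)^2 + x_c(p_c)\left[\big(Q_up_c + (1-Q_u)(1-p_c)\big)\frac{(Q_up_c)^2}{\big(Q_up_c + (1-Q_u)(1-p_c)\big)^2}\frac{1}{2c} + \big[(1-Q_u)p_c + (1-p_c)Q_u\big]A\right]\\
&\quad + \big(1 - x_c(p_c)\big)A
\end{aligned}\tag{26}$$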

To analyse this second costless problem, we ran a MATLAB program, which yielded the following results:

Proposition 8. The strategy in which the university manipulates the confident student's first-best contract by choosing $p_c \in [1/2, \bar{p}]$ has three possible outcomes:

i There is a $p_c \in [0.5, \bar{p}]$ such that both $IC_c$ and $IC_u$ are satisfied.

• Occurs for high $Q_c$.

• This possibility is maximized when $Q_u$ and $Q_c$ are far apart from each other.

ii From a certain $p_c \in [0.5, \bar{p}]$ onwards, neither $IC_c$ nor $IC_u$ is satisfied.

• This occurs when $Q_u$ is high, i.e., close to the threshold.

iii There is no $p_c \in [0.5, \bar{p}]$ such that $IC_c$ is satisfied.

• These cases are maximized for lower $Q_c$.

Proposition 8 depicts the diversity of outcomes that the costless strategy we designed has in terms of incentive compatibility. Item (i) of Proposition 8 shows the successful cases for the university: it can design a pair of menus $(p_u, x_u)$, $(p_c, x_c)$, have both confident and unconfident students self-select into them, and achieve this at no cost. Item (ii) shows the first kind of failure: before the confident students' incentive compatibility constraint can be fulfilled, unconfident students want to deviate. Even though this is not costly in terms of attention at $t = 1$, it can be at $t = 2$, because $x_c$ is not the first-best $x_u$, and so the probability that unconfident students get signals decreases. Lastly, item (iii) is the second kind of failure: no costless $p_c$ was enough to have confident students self-select into their contract. This means that either the confident or the unconfident contract will have to be distorted, which reduces the amount of attention the university can induce.

The fulfillment of conditions (25) and (26) depends on the comparison of the utilities students get from the different contracts available. Therefore, by measuring how these utilities change with the parameters they depend on, we can better understand the results in Proposition 8. A first important result is that, under certain parametrizations, (25) can eventually be satisfied. In the data we generated with MATLAB, we see that when we fix $Q_u$, $Q_c$ and $c$, a key to incentive compatibility is a higher $A$. And when we fix $Q_u$, $Q_c$ and $A$, increasing $c$ also eventually generates incentive compatibility. We can understand this better by taking, ceteris paribus, the derivative of the confident student's utility in (9) with respect to $A$ and $c$ separately, using the envelope theorem. A first result is that

$$\frac{\partial U_{1,c}}{\partial c} = -\frac{x^2}{2}$$
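To illustrate the envelope-theorem step, the Python sketch below applies it to the static part of the period-1 utility only, $Qx - \frac{c}{2}x^2$ (a simplifying assumption: the continuation terms of (9) are ignored here). It compares a finite-difference derivative of the maximized value with $-x^{*2}/2$. Names and parameter values are hypothetical.

```python
from typing import Tuple


def static_value(q: float, c: float) -> Tuple[float, float]:
    """Maximize Q*x - (c/2)*x^2 over x in [0, 1]; return (max value, argmax)."""
    x = min(max(q / c, 0.0), 1.0)
    return q * x - 0.5 * c * x * x, x


q, c, step = 0.8, 1.5, 1e-6  # illustrative values; q / c is interior
value_up, _ = static_value(q, c + step)
value_down, _ = static_value(q, c - step)
numerical = (value_up - value_down) / (2.0 * step)  # total derivative of the max value in c
_, x_star = static_value(q, c)
envelope = -0.5 * x_star ** 2  # envelope theorem: only the direct effect of c
print(numerical, envelope)
```

The two numbers agree because the indirect effect of $c$ through $x^*$ vanishes at the optimum, which is the content of the envelope theorem.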


This means that the way in which $x_u$ and $x_c$ change with $c$ is the key to understanding the sudden satisfaction of incentive compatibility. MATLAB results show that, in the vicinity of the change in $c$ that triggered the satisfaction of incentive compatibility, both of these variables decrease as $c$ increases, so the difference that makes incentive compatibility possible is one of degree. Regarding $A$, the derivative of a confident student's utility function, again using the envelope theorem, is

$$\frac{\partial U_{1,c}}{\partial A} = x\big((1-Q)p + (1-p)Q\big)$$

Results from MATLAB show that, while $p_u = 1$ for every $A$, $p_c$ and $x_c$ decrease as $A$ increases. Note that $(1-Q)p + (1-p)Q$ is decreasing in $p$ when $Q_c > 1/2$, which is a comfortable assumption for confident students. This means that, as $A$ increases and $p_c$ falls, $(1-Q)p + (1-p)Q$ increases. While this effect should point towards less utility as $A$ changes, it may be that the decrease in $x_c$ compensates this loss of utility enough for incentive compatibility to be satisfied.

When we carry out the same kind of analysis for variations in $Q_u$ and $Q_c$, we can state the following. (1) Ceteris paribus, raising $Q_c$ always leads to more cases of incentive compatibility. This occurs because higher $p$ become attainable and can eventually make the confident student choose her own contract. The derivative with respect to $Q$ in (9) is much harder to read, even with the envelope theorem, but MATLAB simulations consistently show this to be the case. (2) Variations in $Q_u$, ceteris paribus, also show a pattern. Whenever $Q_u$ decreases, $x_u$ decreases while $x_c$ remains the same. Eventually, as $Q_u$ moves downward, incentive compatibility is satisfied. Because the derivative $\partial U/\partial Q$ is not easily readable, we cannot explain this phenomenon through changes in utility. However, the results point to a reduction in $x_u$ such that the utility this coursework provides is no longer greater than that of the contract designed for confident students, which, in the vicinity of the change in $Q_u$ that satisfies incentive compatibility, always has $x_c > x_u$. In other words, $x_u$ can be too low for a confident student.

The next step would be to find the optimal distortion of the menus $(p_c, x_c)$, $(p_u, x_u)$ such that the incentive compatibility constraints are fulfilled and total attention is maximized. There is a large literature that solves this kind of problem under asymmetric information. In general, an important condition that permits a smooth resolution of these problems is the single crossing property. This property guarantees that the agents' utility functions are such that their marginal rates of substitution are always increasing or always decreasing in their types. Laffont and Martimort (2001) describe this mathematically as

$$\frac{\partial}{\partial\theta}\left(\frac{U_q}{U_t}\right) > 0 \ (\text{or} < 0) \qquad \forall\, t, q, \theta \tag{27}$$

where $\theta$ is the agent's type, $q$ is the good the agent produces, $t$ is the transfer the principal gives the agent for the good, and $U_q$, $U_t$ are the marginal utilities of $q$ and $t$, respectively. However, this property is not evidently, or even presumably, satisfied in our model. First, transfers and goods are mixed variables in our model: the variables at play do not have unidimensional roles. Second, and deeply linked with the previous argument, the structure of our utility functions is such that stating that cross second derivatives always have the same sign is not possible a priori. Because the exchanged variables play different roles in different components of the utility function, marginal utilities are very difficult to interpret. Still, this problem could eventually be solved: single crossing properties allow for easier solutions to asymmetric information problems, but they are not a necessary condition for the existence of these solutions. For example, Araujo and Moreira (1999) show how, even without single crossing properties, it is possible to characterize the implementable contracts in adverse selection and moral hazard games. The authors explain that, to do so, local incentive compatibility constraints are not sufficient and global constraints must be taken into account, which changes the shape of the offered contracts entirely. Regrettably, solving this problem exceeds the scope of this investigation.
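A numerical probe of (27) can help screen a candidate utility. The Python sketch below is illustrative only: it approximates $U_q/U_t$ by central differences and reports whether $\partial(U_q/U_t)/\partial\theta$ keeps one sign on a grid. The example utility $U(q, t, \theta) = t - \theta q^2/2$ is a hypothetical textbook case, not this thesis's model; for it, $\partial(U_q/U_t)/\partial\theta = -q < 0$. A grid check can only reject single crossing, never prove it.

```python
from typing import Callable, Iterable, Tuple

Utility = Callable[[float, float, float], float]  # U(q, t, theta)


def mrs(u: Utility, q: float, t: float, theta: float, h: float = 1e-5) -> float:
    """Ratio U_q / U_t at (q, t, theta), by central differences."""
    u_q = (u(q + h, t, theta) - u(q - h, t, theta)) / (2.0 * h)
    u_t = (u(q, t + h, theta) - u(q, t - h, theta)) / (2.0 * h)
    return u_q / u_t


def single_crossing_sign(u: Utility, grid: Iterable[Tuple[float, float, float]], h: float = 1e-4) -> int:
    """+1 or -1 if d/dtheta (U_q / U_t) has that sign on the whole grid, else 0."""
    signs = set()
    for q, t, theta in grid:
        d = (mrs(u, q, t, theta + h) - mrs(u, q, t, theta - h)) / (2.0 * h)
        signs.add(1 if d > 0 else (-1 if d < 0 else 0))
    return signs.pop() if len(signs) == 1 else 0


def producer_utility(q: float, t: float, theta: float) -> float:
    """Hypothetical textbook example: transfer minus a type-dependent cost theta * q^2 / 2."""
    return t - theta * q * q / 2.0


grid = [(q, t, th) for q in (0.5, 1.0, 2.0) for t in (0.0, 1.0) for th in (0.5, 1.5, 3.0)]
print(single_crossing_sign(producer_utility, grid))
```

Applied to the utilities in (25) and (26), the same probe would be expected to return 0 on many grids, which is the difficulty described above.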

VI Conclusion

With the intention of better understanding what can happen when agents make consumption decisions under uncertainty about the payoff they will get from their choices, we develop a model that transfers this problem to a very specific context: university students who are simultaneously exploring what they could eventually value and making costly decisions today without knowing what they value.

In our model, students decide what variety of courses to take and how much they want to exploit them through attention. The uncertainty element is that students do not know their natural ability to take advantage of, and therefore enjoy, the courses they take. We add a Bayesian learning dimension by assuming that their beliefs on natural ability can be updated with a certain precision as they study. Specifically, the more they study, the higher the probability that they update their beliefs about themselves. Our students face a strenuous trade-off: investing in costly work today and getting information on their talent, or maximizing static utility and keeping their initial vague beliefs. In Section IV, we investigate the possible outcomes of this problem. Our results can be summarized in three main conclusions: first, students' optimal academic path decisions are sensitive to the initial belief they hold about themselves. Second, the accuracy of the signal they receive is substantial in their optimal decision. Third, even though information is always valuable to students, in some cases they can optimally choose not to receive any information at all, because acquiring it is too costly.

As sensitive as students can be to their initial beliefs statically, effectively delivering precise information can eventually shift their preferred action without changing this initial belief. This last idea opens the door to the university's participation in the students' decision-making process. We assume universities are interested in inducing the highest possible attention. Section V inquires into how universities can effectively intervene in their students' academic path choices by inducing them to accept information. In this section, we explore different tools through which schools can alter students' Bayesian learning process in order to maximize the attention they exert on schoolwork. We find that through grading policies that affect the precision of the signals students get, and through the range of courses the university decides to offer, schools can successfully “force” students into academic decisions that they would not have chosen without this manipulation. One important question left unanswered is the characterization of optimally designed menus in precision and coursework that could induce attention in students with both positive and negative prior beliefs about their talent. This is an interesting path for future research.

But let us go back to the initial question: what can happen when agents face a variety of options and have to choose without knowing how they will enjoy their choices? What we have shown here is that when beliefs can be updated, agents can eventually learn what they like: all of our students who decided to work in order to get information did so because working was optimal, regardless of the news they would receive, i.e., whether they turn out to be talented or not. Moreover, information acquisition can be induced: in a principal-agent environment, by offering contracts that increased the probability of signal occurrence or the precision of the signals, universities managed to induce students into learning what they like. Sometimes, we need a little push to “know thyself”.


References

Araujo, A., Moreira, H. (1999). “Adverse selection problems without Spence Mirrlees condition”. Mimeo, Institute of Pure and Applied Mathematics (IMPA)

Babcock, P. (2010). “Real costs of nominal grade inflation? New evidence from student course evaluations”. Economic Inquiry, 48(4): pp. 983-996

Bar, T., Kadiyali, V., Zussman, A. (2009). “Grade information and grade inflation: the Cornell experiment”. The Journal of Economic Perspectives, 23(3): pp. 93-108

Bergemann, D., Valimaki, J. (2006). “Bandit problems”. Cowles Foundation Discussion Paper No. 1551. Online at: https://cpb-us-w2.wpmucdn.com/campuspress.yale.edu/dist/3/352/files/2012/01/bandit.pdf

Boleslavsky, R., Cotton, C. (2015). “Grading standards and education quality”. American Economic Journal: Microeconomics, 7(2): pp. 248-279

Butcher, K., McEwan, P., Weerapana, A. (2014). “The effects of anti-grade-inflation policy at Wellesley College”. The Journal of Economic Perspectives, 28(3): pp. 189-204

Denning, J., Eide, E., Mumford, K., Patterson, R., Warnick, M. (2020). “Why Have College Completion Rates Increased? An Analysis of Rising Grades”. NBER Working Paper Series, No. 28710

Dubey, P., Geanakoplos, J. (2010). “Grading exams: 100, 99, 98... or A, B, C?”. Games and Economic Behavior, 69: pp. 72-94

Feltovich, N., Harbaugh, R., To, T. (2002). “Too cool for school? Signalling and countersignalling”. The RAND Journal of Economics, 33(4): pp. 630-649

Gabaix, X. (2017). “Behavioral inattention”. Handbook of Behavioral Economics. Online

at: https://scholar.harvard.edu/files/xgabaix/files/behavioral_inattention.

pdf

Hestermann, N., Le Yaouanq, Y. (2018). “It’s not my fault! Self-confidence and exper-

imentation”. CESifo Working Papers, No. 7501. Online at https://papers.ssrn.

com/sol3/papers.cfm?abstract_id=3338858

Kamenica, E., Gentzkow, M. (2011). “Bayesian persuasion”. American Economic Review,

101, pp. 2590-2615

Laffont, J., Martimort, D. (2001). “The Theory of Incentives : The Principal-Agent

40

Page 45: TESIS DE GRADO MAG´ISTER EN ECONOM´IA

Model” Princeton University Press, doi:10.2307/j.ctv7h0rwr

Mackowiak, B., Matejka, F., Wiederholt, M. (2018). “Rational inattention, a disciplined

behavioral model”. CEPR Discussion Paper No. DP13243. Available at: ssrn.com/

abstract=3266436

Mas-Colell, A., Whinston, M., Green, J. (1995). “Microeconomic Theory,” OUP Cata-

logue, Oxford University Press, number 9780195102680


Appendix

Proof. Proposition 1. Notice that (5) is linear in $a_2$ and concave in $x_2$. This means that the FOC and SOC for $x_2$ are satisfied, and therefore there can be interior solutions for $x_2$. Maximizing with respect to $x_2$, FOC = 0 yields:

$$x_2^* = \frac{E_2[\sigma]\, a_2}{c}$$

We solve (5) for $a_2$ with $x_2^*$:

$$\max_{a_2} \; \frac{(E_2[\sigma])^2 (a_2)^2}{2c} + (1 - a_2)A \qquad (28)$$

This problem is a convex quadratic function in $a_2$. Therefore, there are two maximum candidates in the domain: the extreme values $a_2 = 0$ and $a_2 = 1$. If we replace the possible solutions of $a_2$ in the expression for $x_2^*$ specified above, we get the following values:

if $a_2 = 0$ then $x_2 = 0$
if $a_2 = 1$ then $x_2 = \frac{E_2[\sigma]}{c}$

And because we suppose $x_2 \in [0, 1]$, then

if $a_2 = 0$ then $x_2 = 0$
if $a_2 = 1$ then $x_2 = \min\left\{\frac{E_2[\sigma]}{c}, 1\right\}$
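As a numerical sanity check of Proposition 1, the sketch below compares the two corner candidates with a brute-force grid search over $(a_2, x_2)$. It assumes that problem (5) has the form $E_2[\sigma] a_2 x_2 - \frac{c x_2^2}{2} + (1 - a_2)A$, which reproduces the first-order condition above but is not restated in this appendix; the function names and parameter values are illustrative only.

```python
# Minimal sketch: compare Proposition 1's corner solutions with a grid search.
# Assumption: problem (5) is E2[sigma]*a2*x2 - c*x2**2/2 + (1 - a2)*A,
# which reproduces the FOC x2* = E2[sigma]*a2/c stated in the proof.


def period2_objective(a2, x2, e_sigma, A, c):
    """Assumed period-2 payoff for attention a2 and effort x2."""
    return e_sigma * a2 * x2 - c * x2**2 / 2 + (1 - a2) * A


def period2_policy(e_sigma, A, c):
    """Compare a2 = 0 (x2 = 0) with a2 = 1 (x2 = min(E2[sigma]/c, 1))."""
    x_work = min(e_sigma / c, 1.0)
    v_work = period2_objective(1, x_work, e_sigma, A, c)
    v_rest = period2_objective(0, 0.0, e_sigma, A, c)
    if v_work > v_rest:
        return 1, x_work, v_work
    return 0, 0.0, v_rest


def brute_force(e_sigma, A, c, n=201):
    """Grid search over (a2, x2) in [0, 1] x [0, 1]; returns (value, a2, x2)."""
    grid = [i / (n - 1) for i in range(n)]
    return max(
        (period2_objective(a2, x2, e_sigma, A, c), a2, x2)
        for a2 in grid
        for x2 in grid
    )


if __name__ == "__main__":
    for e_sigma in (0.1, 0.5, 0.9):
        a2, x2, v = period2_policy(e_sigma, A=0.2, c=0.5)
        v_grid = brute_force(e_sigma, A=0.2, c=0.5)[0]
        print(e_sigma, a2, x2, round(v, 4), round(v_grid, 4))
```

Because the objective is convex in $a_2$ once $x_2$ is optimized, the grid maximum never exceeds the better corner, which is what the comparison illustrates.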

Proof. Proposition 2

i If $c < \sqrt{2Ac}$, then $x_2^*$ in the working solution becomes bounded at $\sigma_2 = c < \sqrt{2Ac}$, and therefore the working solution for $x_2$ is bounded by 1 for every $\sigma_2$ at which working can be optimal. This means that the student's optimal decisions are characterized by

either $a_2^* = 1 \wedge x_2^* = 1$ or $a_2^* = 0 \wedge x_2^* = 0$

If we evaluate the utility function (5) at these possible solutions, we get

$$V_2(\sigma_2) = \max\left\{A,\; \sigma_2 - \frac{c}{2}\right\}$$

The breaking point of function $V_2$ here is the unique point where the lines intersect:

$$A = \sigma_2 - \frac{c}{2} \quad\Longrightarrow\quad E^{ind}_2[\sigma] = A + \frac{c}{2}$$

Notice $A > 0$, $c > 0$ and $\sigma_2 \in [0, 1]$. This means that there are parametrizations that make both $A$ and $\sigma_2 - \frac{c}{2}$ possible maximums.

ii If $c > \sqrt{2Ac}$ and $c > 1$, then $x_2^*$ in the working solution would become bounded at $\sigma_2 = c > 1$. Because the support of $\sigma_2$ is $[0, 1]$, $x_2^*$ will not be bounded for any $\sigma_2$. This means that the student's optimal decisions are characterized by

either $a_2^* = 1 \wedge x_2^* = \frac{\sigma_2}{c}$ or $a_2^* = 0 \wedge x_2^* = 0$

We evaluate the value function (5) with each of these two candidates, and we get

$$V_2(\sigma_2) = \max\left\{A,\; \frac{(\sigma_2)^2}{2c}\right\}$$

In the situation we have described, this breaking point occurs when the value of working is exactly the same as the value of not working:

$$A = \frac{(\sigma_2)^2}{2c} \quad\Longrightarrow\quad E^{ind}_2[\sigma] = \sqrt{2Ac}$$

Because $A > 0$, $c > 0$ and $\sigma_2 \in [0, 1]$, there are parametrizations that make both $A$ and $\frac{(\sigma_2)^2}{2c}$ possible maximums.

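iii (a) Consider the two possible value functions for the working solution, $V_2(\sigma_2) = \frac{(\sigma_2)^2}{2c}$ and $V_2(\sigma_2) = \sigma_2 - \frac{c}{2}$. In the domain $\sigma_2 \in [0, 1]$, these functions cross at only one point:

$$\frac{(\sigma_2)^2}{2c} = \sigma_2 - \frac{c}{2} \quad\Longrightarrow\quad \sigma_2 = c$$

when $c \in [0, 1)$. Now consider $\varepsilon \in \mathbb{R}$. Notice that when $\sigma_2 = c$,

$$\forall \varepsilon \neq 0 \text{ such that } (\sigma_2 - \varepsilon) \in [0, 1], \quad \frac{(\sigma_2 - \varepsilon)^2}{2c} > (\sigma_2 - \varepsilon) - \frac{c}{2}$$

since the difference between the two sides equals $\frac{\varepsilon^2}{2c}$. This means that the function $V_2(\sigma_2) = \frac{(\sigma_2)^2}{2c}$ is strictly greater than $V_2(\sigma_2) = \sigma_2 - \frac{c}{2}$ for all $\sigma_2 \neq c$, and so when $\sigma_2 < c$, the value function of the student is best described by $V_2(\sigma_2) = \frac{(\sigma_2)^2}{2c}$. However, beyond $\sigma_2 = c$, $x_2^* = 1$ for all $\sigma_2$, and so the relevant function is $V_2(\sigma_2) = \sigma_2 - \frac{c}{2}$.

(b) If $c > \sqrt{2Ac}$ and $c < 1$, then $x_2^*$ in the working solution becomes bounded at $\sigma_2 = c < 1$, which is in the support of $\sigma_2$. Because $c > \sqrt{2Ac}$, for every $\sigma_2 \in [\sqrt{2Ac}, c]$, $x_2^*(\sigma_2)$ is an unbounded working solution.

(c) This means that the student's optimal decisions are characterized by

either $a_2^* = 1 \wedge x_2^* = 1$, or $a_2^* = 1 \wedge x_2^* = \frac{\sigma_2}{c}$, or $a_2^* = 0 \wedge x_2^* = 0$

In the situation we have described, the separation between the working and no-working solutions occurs when $x_2^*$ shifts from $0$ to $\frac{\sigma_2}{c}$, and so the threshold is obtained from:

$$A = \frac{(\sigma_2)^2}{2c} \quad\Longrightarrow\quad E^{ind}_2[\sigma] = \sqrt{2Ac}$$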
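The three cases of Proposition 2 can be collapsed into a single expression, because the working value with $x_2^* = \min\{\sigma_2/c, 1\}$ equals $\frac{(\sigma_2)^2}{2c}$ below $\sigma_2 = c$ and $\sigma_2 - \frac{c}{2}$ above it, and the two expressions differ by $\frac{(\sigma_2 - c)^2}{2c} \geq 0$. The sketch below computes $V_2$ and the indifference threshold $E^{ind}_2[\sigma]$, using the equivalence $c < \sqrt{2Ac} \iff c < 2A$ to select the case. The function names and parameter values are illustrative, not taken from the thesis.

```python
import math


def work_value(sigma, c):
    """Value of working in t = 2 with the (possibly bounded) effort x2* = min(sigma / c, 1)."""
    x2 = min(sigma / c, 1.0)
    return sigma * x2 - c * x2**2 / 2


def v2(sigma, A, c):
    """Period-2 value function: best of not working (A) and working."""
    return max(A, work_value(sigma, c))


def indifference_threshold(A, c):
    """E2^ind[sigma] from Proposition 2.

    Case i (c < sqrt(2Ac), i.e. c < 2A): effort is already bounded at the
    threshold, so A = sigma - c/2. Cases ii and iii: A = sigma**2 / (2c).
    At c = 2A both formulas give 2A.
    """
    if c < 2 * A:
        return A + c / 2
    return math.sqrt(2 * A * c)


if __name__ == "__main__":
    for A, c in [(0.3, 0.4), (0.1, 1.5), (0.1, 0.6)]:  # cases i, ii and iii
        t = indifference_threshold(A, c)
        # At the threshold, the value of working equals A.
        print(A, c, round(t, 4), round(work_value(t, c), 4))
```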

Proof. Lemma 1

1. The value function $V_2(\sigma_2) = \max\{A, \sigma_2 - \frac{c}{2}\}$

(a) Is continuous: Call $f(\sigma_2) = A$ and $g(\sigma_2) = \sigma_2 - \frac{c}{2}$. Notice both $f(\sigma_2)$ and $g(\sigma_2)$ are lines, which are continuous functions. If they intersect, this intersection occurs at only one point:

$$A = \sigma_2 - \frac{c}{2} \quad\Longrightarrow\quad \sigma_2 = A + \frac{c}{2}$$

And for possible parametrizations, $0 < A + \frac{c}{2} \leq 1$. Therefore, the intersection can occur within the domain of $\sigma_2$. We have established that the threshold between the work and no-work solutions is $\sigma_2 = A + \frac{c}{2}$, which is equivalent to saying that

When $\sigma_2 \in [0, A + \frac{c}{2}]$, $V_2(\sigma_2) = f(\sigma_2)$

When $\sigma_2 \in [A + \frac{c}{2}, 1]$, $V_2(\sigma_2) = g(\sigma_2)$

So, proving continuity:

• $\lim_{\sigma_2 \to (A + \frac{c}{2})^-} V_2(\sigma_2) = \lim_{\sigma_2 \to (A + \frac{c}{2})^-} f(\sigma_2) = A$

• $\lim_{\sigma_2 \to (A + \frac{c}{2})^+} V_2(\sigma_2) = \lim_{\sigma_2 \to (A + \frac{c}{2})^+} g(\sigma_2) = A + \frac{c}{2} - \frac{c}{2} = A$

• Because $\sigma_2 = A + \frac{c}{2}$ is the point of indifference, $V_2(A + \frac{c}{2})$ can be evaluated on either function, and always yields $A$.

Therefore, $V_2(\sigma_2)$ is continuous.

(b) Is convex: Notice both $f(\sigma_2)$ and $g(\sigma_2)$ are lines, which are convex functions. Therefore, the epigraph of each of these functions is a convex set. We know that the intersection of two convex sets is also a convex set (MWG, definition M.G.1). The epigraph of $V_2(\sigma_2)$ is exactly the intersection of the epigraphs of $f(\sigma_2)$ and $g(\sigma_2)$, because a point $(\sigma_2, y)$ satisfies $y \geq \max\{f(\sigma_2), g(\sigma_2)\}$ if and only if $y \geq f(\sigma_2)$ and $y \geq g(\sigma_2)$:

$$\text{epi}\, V_2 = \text{epi}\, f \cap \text{epi}\, g$$

Therefore, $V_2(\sigma_2)$ is convex.

2. For the value function $V_2(\sigma_2) = \max\{A, \frac{(\sigma_2)^2}{2c}\}$

(a) Is continuous: Call $f(\sigma_2) = A$ and $g(\sigma_2) = \frac{(\sigma_2)^2}{2c}$. Notice $f(\sigma_2)$ is a line, which is a continuous function, and $g(\sigma_2)$ is a quadratic function, which is also continuous. We established in the proof of Proposition 2 that these functions cross at one point in the domain:

$$A = \frac{(\sigma_2)^2}{2c} \quad\Longrightarrow\quad \sigma_2 = \sqrt{2Ac}$$

Because $A, c > 0$, the intersection can occur within the domain of $\sigma_2$. We have established that this is the threshold between the work and no-work solutions, which is equivalent to saying that

When $\sigma_2 \in [0, \sqrt{2Ac}]$, $V_2(\sigma_2) = f(\sigma_2)$

When $\sigma_2 \in [\sqrt{2Ac}, 1]$, $V_2(\sigma_2) = g(\sigma_2)$

So, proving continuity:

• $\lim_{\sigma_2 \to (\sqrt{2Ac})^-} V_2(\sigma_2) = \lim_{\sigma_2 \to (\sqrt{2Ac})^-} f(\sigma_2) = A$

• $\lim_{\sigma_2 \to (\sqrt{2Ac})^+} V_2(\sigma_2) = \lim_{\sigma_2 \to (\sqrt{2Ac})^+} g(\sigma_2) = \frac{(\sqrt{2Ac})^2}{2c} = A$

• Because $\sigma_2 = \sqrt{2Ac}$ is the point of indifference, $V_2(\sqrt{2Ac})$ can be evaluated on either function, and always yields $A$.

Therefore, $V_2(\sigma_2)$ is continuous.

(b) Is convex: Notice that $g(\sigma_2) = \frac{(\sigma_2)^2}{2c}$ is an upward-opening parabola with vertex at $\sigma_2 = 0$, and so $g(\sigma_2)$ is an increasing and convex function in the domain of $\sigma_2$. $f(\sigma_2) = A$ is a constant function, and is therefore convex. Therefore, the epigraph of each of these functions is a convex set. We know that the intersection of two convex sets is also a convex set (MWG, definition M.G.1). The epigraph of $V_2(\sigma_2)$ is exactly the intersection of the epigraphs of $f(\sigma_2)$ and $g(\sigma_2)$, because a point $(\sigma_2, y)$ satisfies $y \geq \max\{f(\sigma_2), g(\sigma_2)\}$ if and only if $y \geq f(\sigma_2)$ and $y \geq g(\sigma_2)$:

$$\text{epi}\, V_2 = \text{epi}\, f \cap \text{epi}\, g$$

Therefore, $V_2(\sigma_2)$ is convex.

3. For the value function

$$V_2(\sigma_2) = \begin{cases} A & \text{if } \sigma_2 < \sqrt{2Ac} \\ \frac{(\sigma_2)^2}{2c} & \text{if } \sigma_2 \in [\sqrt{2Ac}, c] \\ \sigma_2 - \frac{c}{2} & \text{if } \sigma_2 > c \end{cases}$$

(a) Is continuous: Call $f(\sigma_2) = A$, $g(\sigma_2) = \frac{(\sigma_2)^2}{2c}$ and $h(\sigma_2) = \sigma_2 - \frac{c}{2}$. We proved in Proposition 2 that the correspondence described above is the value function (i.e., that it is a maximum). We have already proven that the function $\max\{f(\sigma_2), g(\sigma_2)\}$ is continuous. Consider $g(\sigma_2)$ and $h(\sigma_2)$. We established in the proof of Proposition 2 that these functions cross at only one point in the domain:

$$\frac{(\sigma_2)^2}{2c} = \sigma_2 - \frac{c}{2} \quad\Longrightarrow\quad \sigma_2 = c$$

when $c \in [0, 1)$. Because $c > 0$, the intersection can occur within the domain of $\sigma_2$. We have established that this is the value of $\sigma_2$ from which the working solution becomes bounded, which is equivalent to saying

When $\sigma_2 \in [\sqrt{2Ac}, c]$, $V_2(\sigma_2) = g(\sigma_2)$

When $\sigma_2 \in (c, 1]$, $V_2(\sigma_2) = h(\sigma_2)$

So, proving continuity:

• $\lim_{\sigma_2 \to c^-} V_2(\sigma_2) = \lim_{\sigma_2 \to c^-} g(\sigma_2) = \frac{c}{2}$

• $\lim_{\sigma_2 \to c^+} V_2(\sigma_2) = \lim_{\sigma_2 \to c^+} h(\sigma_2) = \frac{c}{2}$

• $g(c) = \frac{c}{2}$

Therefore, $V_2(\sigma_2)$ is continuous.

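(b) Is convex: We now have to prove that $V_2(\sigma_2) = \max\{f(\sigma_2), V_{2,work}(\sigma_2)\}$, where $V_{2,work}(\sigma_2)$ is

$$V_{2,work}(\sigma_2) = \begin{cases} g(\sigma_2) & \text{if } \sigma_2 \in [\sqrt{2Ac}, c] \\ h(\sigma_2) & \text{if } \sigma_2 > c \end{cases}$$

is convex. Considering $g(\sigma_2)$ and $h(\sigma_2)$ are both individually convex, convexity of $V_{2,work}(\sigma_2)$ holds whenever the left derivative of $g$ at $c$ does not exceed the right derivative of $h$ at $c$, $\left.\frac{dg(\sigma_2)}{d\sigma_2}\right|_{\sigma_2 \to c^-} \leq \left.\frac{dh(\sigma_2)}{d\sigma_2}\right|_{\sigma_2 \to c^+}$:

• $\left.\frac{dg(\sigma_2)}{d\sigma_2}\right|_{\sigma_2 \to c^-} = \left.\frac{\sigma_2}{c}\right|_{\sigma_2 \to c^-} = 1$

• $\left.\frac{dh(\sigma_2)}{d\sigma_2}\right|_{\sigma_2 \to c^+} = 1$

Both derivatives are equal, so $V_{2,work}(\sigma_2)$ is convex. We have established that $\max\{f(\sigma_2), g(\sigma_2)\}$ is a convex function; more generally, the maximum of two convex functions is convex. Therefore, $V_2(\sigma_2) = \max\{f(\sigma_2), V_{2,work}(\sigma_2)\}$ is convex.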
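Lemma 1 can also be illustrated numerically. The sketch below is a check on a grid, not a proof: it evaluates $V_2$ for one illustrative parametrization of each of the three cases and verifies that discrete second differences are nonnegative (convexity) and that adjacent grid values never differ by more than the grid step (continuity, using the fact that the slope of $V_2$ is at most $x_2^* \leq 1$). The parameter values and helper names are assumptions.

```python
def v2(sigma, A, c):
    """Period-2 value function max{A, working value} with x2* = min(sigma / c, 1)."""
    x2 = min(sigma / c, 1.0)
    return max(A, sigma * x2 - c * x2**2 / 2)


def check_convex_and_continuous(A, c, n=1000, tol=1e-12):
    """Grid check of Lemma 1 on sigma in [0, 1] (illustration, not a proof)."""
    ys = [v2(i / n, A, c) for i in range(n + 1)]
    # Convexity: every discrete second difference is nonnegative.
    convex = all(ys[i - 1] - 2 * ys[i] + ys[i + 1] >= -tol for i in range(1, n))
    # Continuity: the slope of V2 is at most 1, so adjacent values differ by at most 1/n.
    continuous = all(abs(ys[i + 1] - ys[i]) <= 1.0 / n + tol for i in range(n))
    return convex, continuous


if __name__ == "__main__":
    for A, c in [(0.3, 0.4), (0.1, 1.5), (0.1, 0.6)]:  # parts 1, 2 and 3 of Lemma 1
        print(A, c, check_convex_and_continuous(A, c))
```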

Proof. Lemma 2. Suppose two probability distributions $F$ and $G$ have the same mean. Then, $G$ is riskier than $F$ if every risk lover prefers $G$ over $F$, that is, if for every nondecreasing and convex utility function:

$$E[u(G)] \geq E[u(F)]$$

But if $F$ and $G$ have the same mean and $G$ is riskier than $F$, then $F$ second-order stochastically dominates $G$ (definition 6.D.2, MWG (1995)). If $F$ second-order stochastically dominates $G$ and they have the same mean, then $G$ is a mean-preserving spread of $F$. Now, consider that $G$ is the distribution of talent beliefs in $t = 2$, characterized by $\sigma_{2,H}$ and $\sigma_{2,L}$; $F$ is the distribution such that $\sigma_2 = \sigma_1$; and $V_2$ is a convex and nondecreasing function evaluated under $F$ and $G$.

i Strictly better for the student: If $G$ places mass beyond $\sqrt{2Ac}$, then at least one posterior in $t = 2$ is located on the convex and nondecreasing portion of the value function $V_2$, and so $E[u(G)] > E[u(F)]$.

ii Weakly better for the student: When $G$ does not have mass above $\sqrt{2Ac}$, both distributions are located on the constant part of the value function, and so $E[u(G)] = E[u(F)]$, which satisfies $E[u(G)] \geq E[u(F)]$.

Therefore, students have a taste for mean-preserving spreads.
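To illustrate Lemma 2, the sketch below builds the two-point distribution of $t = 2$ beliefs generated by a symmetric binary signal of precision $p$ and checks that it is a mean-preserving spread of $Q$ that the student weakly prefers. The Bayes-rule posteriors $\sigma_{2,H} = Qp/G$ and $\sigma_{2,L} = Q(1-p)/J$, with $G = Qp + (1-Q)(1-p)$ and $J = (1-Q)p + Q(1-p)$, are our assumption about the signal structure (they match the weights that multiply $V_2(\sigma_{2,H})$ and $V_2(\sigma_{2,L})$ in (29)); the parameter values are illustrative.

```python
def v2(sigma, A, c):
    """Period-2 value function max{A, working value} with x2* = min(sigma / c, 1)."""
    x2 = min(sigma / c, 1.0)
    return max(A, sigma * x2 - c * x2**2 / 2)


def posteriors(Q, p):
    """Assumed symmetric binary signal of precision p; returns (G, J, sigma_H, sigma_L)."""
    G = Q * p + (1 - Q) * (1 - p)   # probability of a high signal
    J = (1 - Q) * p + Q * (1 - p)   # probability of a low signal
    return G, J, Q * p / G, Q * (1 - p) / J


def spread_gain(Q, p, A, c):
    """E[V2(posterior)] - V2(Q): nonnegative whenever V2 is convex (Lemma 2)."""
    G, J, s_high, s_low = posteriors(Q, p)
    return G * v2(s_high, A, c) + J * v2(s_low, A, c) - v2(Q, A, c)


if __name__ == "__main__":
    for k in range(11):
        p = 0.5 + k / 20
        print(p, round(spread_gain(0.4, p, A=0.1, c=0.6), 4))
```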

Proof. Proposition 3. First, problem (9) is concave in $x_1$. We set FOC = 0 and consider $\sigma_1 = Q$:

$$x_1^* = \frac{Q a_1 - a_1 V_2(Q) + a_1\left(Qp + (1-Q)(1-p)\right)V_2(\sigma_{2,H}) + a_1\left((1-Q)p + Q(1-p)\right)V_2(\sigma_{2,L})}{c}$$

Reorganizing:

$$x_1^* = \frac{a_1\left(Q - V_2(Q) + \left(Qp + (1-Q)(1-p)\right)V_2(\sigma_{2,H}) + \left((1-Q)p + Q(1-p)\right)V_2(\sigma_{2,L})\right)}{c} \qquad (29)$$

Now, (9) is linear in $a_1$. This means that the maximum will either be at one of the corners $a_1 = 1$ or $a_1 = 0$, or that the function has the same value for any $a_1 \in [0, 1]$. Using the envelope theorem, we can evaluate (9) at both extreme values, and we get the following solution for the problem in $t = 1$:

$$a_1^* = 0 \iff x_1^* = 0 \qquad (30)$$

$$a_1^* = 1 \iff x_1^* = \frac{Q - V_2(Q) + \left(Qp + (1-Q)(1-p)\right)V_2(\sigma_{2,H}) + \left((1-Q)p + Q(1-p)\right)V_2(\sigma_{2,L})}{c} \qquad (31)$$
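The closed form (31) can be checked against a direct maximization. The sketch assumes that problem (9), which is not restated here, takes the form $Q a_1 x_1 + (1-a_1)A - \frac{c x_1^2}{2} + (1 - a_1 x_1)V_2(Q) + a_1 x_1\left(G\,V_2(\sigma_{2,H}) + J\,V_2(\sigma_{2,L})\right)$, which reproduces the first-order condition (29) and reduces to (32) when $V_2 \equiv A$. It also assumes Bayes-rule posteriors for a binary signal of precision $p$ and caps effort at $x_1 \leq 1$. All names and numbers are illustrative.

```python
def v2(sigma, A, c):
    """Period-2 value function max{A, working value} with x2* = min(sigma / c, 1)."""
    x2 = min(sigma / c, 1.0)
    return max(A, sigma * x2 - c * x2**2 / 2)


def posteriors(Q, p):
    """Assumed symmetric binary signal of precision p; returns (G, J, sigma_H, sigma_L)."""
    G = Q * p + (1 - Q) * (1 - p)
    J = (1 - Q) * p + Q * (1 - p)
    return G, J, Q * p / G, Q * (1 - p) / J


def t1_objective(a1, x1, Q, p, A, c):
    """Assumed form of problem (9) with sigma_1 = Q."""
    G, J, s_high, s_low = posteriors(Q, p)
    continuation = (1 - a1 * x1) * v2(Q, A, c) + a1 * x1 * (
        G * v2(s_high, A, c) + J * v2(s_low, A, c)
    )
    return Q * a1 * x1 + (1 - a1) * A - c * x1**2 / 2 + continuation


def x1_star(Q, p, A, c):
    """Equation (31) with a1 = 1, capped at x1 = 1."""
    G, J, s_high, s_low = posteriors(Q, p)
    K = Q - v2(Q, A, c) + G * v2(s_high, A, c) + J * v2(s_low, A, c)
    return min(K / c, 1.0)


if __name__ == "__main__":
    Q, p, A, c = 0.4, 0.8, 0.1, 0.6
    grid = [i / 2000 for i in range(2001)]
    x_grid = max(grid, key=lambda x: t1_objective(1, x, Q, p, A, c))
    print(round(x1_star(Q, p, A, c), 4), x_grid)
```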


Proof. Lemma 3. For any parametrization, and therefore for any structure of $V_2(\sigma_2)$, (9) yields:

$$\max_{a_1} \max_{x_1} \; \sigma_1 a_1 x_1 + (1 - a_1)A - \frac{c x_1^2}{2} + (1 - a_1 x_1)A + a_1 x_1\left(\left(Qp + (1-Q)(1-p)\right)A + \left((1-Q)p + Q(1-p)\right)A\right) \qquad (32)$$

Because $\left(Qp + (1-Q)(1-p)\right) + \left((1-Q)p + Q(1-p)\right) = 1$, this can be rewritten as:

$$\max_{a_1} \max_{x_1} \; \sigma_1 a_1 x_1 + (1 - a_1)A - \frac{c x_1^2}{2} + A \qquad (33)$$

In this equation, the dynamic portion of the maximization problem in $t = 1$ has been eliminated completely, because there is complete certainty that the value of the next period is $A$. Substituting the $V_2$ values shown in (32) into the solutions of Proposition 3, we get the following possible solutions in $t = 1$:

either $a_1^* = 0 \wedge x_1^* = 0$

or $a_1^* = 1 \wedge x_1^* = \min\left\{\frac{Q}{c}, 1\right\}$

• If $x_1^* = \frac{Q}{c}$, then the threshold between the work and no-work solutions in $t = 2$ is necessarily $E^{ind}_2 = \sqrt{2Ac}$. This means that the case we are analyzing is such that $Q < \sqrt{2Ac}$. If we plug the solutions $a_1 = 1$, $x_1 = \frac{Q}{c}$ into (32), we get the following value function:

$$V_1 = \max\left\{2A,\; \frac{Q^2}{2c} + A\right\}$$

where the terms in brackets stand for the value of not working and working at $t = 1$, respectively. But in this case it is true that $Q < \sqrt{2Ac}$. Therefore, the value of not working is always greater than the value of working.

• If $x_1^* = 1$, then there are two possible thresholds for the problem in $t = 2$.

– If the threshold is $E^{ind}_2 = A + \frac{c}{2}$, then the case is such that $Q < A + \frac{c}{2}$. If we plug the solutions $a_1 = 1$, $x_1 = 1$ into (32), we get the following value function:

$$V_1 = \max\left\{2A,\; Q - \frac{c}{2} + A\right\}$$

But in this case it is true that $Q < A + \frac{c}{2}$. Therefore, the value of not working is always greater than the value of working.

– If the threshold is $E^{ind}_2 = \sqrt{2Ac}$, then the case we are analyzing is such that $Q < \sqrt{2Ac}$. If we plug the solutions $a_1 = 1$, $x_1 = 1$ into (32), we get the following value function:

$$V_1 = \max\left\{2A,\; Q - \frac{c}{2} + A\right\}$$

But in this case it is true that $Q < \sqrt{2Ac}$. In the proof of Proposition 2 we demonstrate that $\forall \sigma_2 \in [0, 1]$, $\frac{(\sigma_2)^2}{2c} \geq \sigma_2 - \frac{c}{2}$. Combining this with what we proved when $x_1^* = \frac{Q}{c}$, it is clear that in this case the value of not working is always greater than the value of working.
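A quick numerical reading of Lemma 3: when the continuation value is $A$ in every state, working at $t = 1$ is worth $\max_{x_1} Q x_1 - \frac{c x_1^2}{2} + A$ by (33), and this falls short of $2A$ whenever $Q$ is below the relevant threshold. The sketch below checks this on a grid for illustrative parametrizations of the three cases of Proposition 2; the helper names are ours, not the thesis's.

```python
import math


def work_today_value(Q, A, c, n=2000):
    """Grid approximation of the max over x1 in [0, 1] of (33) with a1 = 1."""
    return max(Q * (i / n) - c * (i / n) ** 2 / 2 + A for i in range(n + 1))


def threshold(A, c):
    """E2^ind[sigma] from Proposition 2 (A + c/2 if c < 2A, else sqrt(2Ac))."""
    return A + c / 2 if c < 2 * A else math.sqrt(2 * A * c)


if __name__ == "__main__":
    for A, c in [(0.3, 0.4), (0.1, 1.5), (0.1, 0.6)]:
        t = threshold(A, c)
        for Q in (0.25 * t, 0.5 * t, 0.95 * t):
            gap = 2 * A - work_today_value(Q, A, c)  # not working minus working
            print(A, c, round(Q, 3), round(gap, 4))
```

Because a grid maximum can only understate the true maximum of working, a positive gap on the grid is conservative in the direction the lemma needs only up to grid error; the exact comparison is the one in the proof.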

Proof. Lemma 4. In this case, (9) yields:

$$\max_{a_1} \max_{x_1} \; \sigma_1 a_1 x_1 + (1 - a_1)A - \frac{c x_1^2}{2} + (1 - a_1 x_1)A + a_1 x_1\left(\left(Qp + (1-Q)(1-p)\right)V_2(\sigma_{2,H}) + \left((1-Q)p + Q(1-p)\right)A\right) \qquad (34)$$

Substituting this case's $V_2$ values into the solutions of Proposition 3, we get the solutions for $a_1, x_1$ that are displayed in the lemma. To compute the $V_1$ value functions, we have to plug the solutions we found in the maximization problem into (34). Inputting the unbounded working solution and the no-work solution yields:

$$V_1 = \max\left\{2A,\; \frac{\left(Q - V_2(Q) + G\,V_2(\sigma_{2,H}) + J\,V_2(\sigma_{2,L})\right)^2}{2c} + A\right\}$$

and comparing these arguments leads to condition (10a). Inputting the bounded working solution and the no-work solution yields:

$$V_1 = \max\left\{2A,\; Q - \frac{c}{2} + G\,V_2(\sigma_{2,H}) + J A\right\}$$

and comparing these arguments leads to condition (10b).
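For reference, both working values above follow from writing the $a_1 = 1$ branch of (34), with $\sigma_1 = Q$, as a concave quadratic in $x_1$. The shorthand $U_1$ and $K$ is introduced only for this derivation and is not notation from the thesis:

$$\begin{aligned}
U_1(x_1) &\equiv Q x_1 - \frac{c x_1^2}{2} + (1 - x_1)A + x_1\left(G\,V_2(\sigma_{2,H}) + J A\right) = A + K x_1 - \frac{c x_1^2}{2}, \\
K &\equiv Q - A + G\,V_2(\sigma_{2,H}) + J A, \\
U_1\!\left(\tfrac{K}{c}\right) &= A + \frac{K^2}{2c}, \qquad U_1(1) = A + K - \frac{c}{2} = Q - \frac{c}{2} + G\,V_2(\sigma_{2,H}) + J A.
\end{aligned}$$

Since $V_2(Q) = V_2(\sigma_{2,L}) = A$ in this case, $K$ coincides with the bracketed term in the unbounded solution.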

Proof. Lemma 6. We want to prove that a threshold precision exists whenever $Q$ is sufficiently high. This threshold is the minimum $p$ such that conditions (10) are satisfied. Conditions (10) come from the comparison of the arguments of the following equation:

$$V_1 = \max\left\{2A,\; \left[\max_{x_1} \; \sigma_1 x_1 - \frac{c x_1^2}{2} + (1 - x_1)A + x_1\left(G(p)V_2(\sigma_{2,H}) + J(p)A\right)\right]_{a_1 = 1}\right\}$$

Call the second argument of the $V_1$ maximization $L(p)$. For the threshold to exist, it is sufficient that:

i. $\frac{\partial L(p)}{\partial p} > 0$

ii. $L\left(\frac{1}{2}\right) < 2A$

iii. $L(1) > 2A$

i Function $L(p)$ is the maximization of the student's problem under case 1.2. As we showed in the proof of Lemma 2, students have a taste for mean-preserving spreads of $\sigma_1$. As we show in the proof of Proposition 5, this means that students have a taste for higher $p$. Therefore, the maximized value of the student's problem under case 1.2 is increasing in $p$.

ii Consider that the case we analyze is such that $\sigma_{2,H} > E^{ind}_2$ and $Q, \sigma_{2,L} < E^{ind}_2$. When $p = \frac{1}{2}$, posterior beliefs do not move away from the prior. This means that $\sigma_{2,H} = Q$, and so we are in a case of type 1.1. Therefore, $x_1^*|_{a_1=1} = \min\left\{\frac{Q}{c}, 1\right\}$ and, as we proved for Lemma 3, this means that $L\left(\frac{1}{2}\right) < 2A$ for every possible $V_2(\sigma_2)$.

iii Because $L(p)$ is in terms of $V_2(\sigma_2)$ and we are considering maximum precision, we must separate this analysis among the different cases of $V_2(\sigma_{2,H})$.

(a) $V_2(\sigma_2) = \max\left\{A, \frac{(\sigma_2)^2}{2c}\right\}$. When $p = 1$, the unbounded $x_1^*|_{a_1=1}$ yields:

$$x_1^*(p = 1) = \left(Q - A + \frac{Q}{2c} + (1 - Q)A\right)\frac{1}{c}$$

and $\sigma_{2,H} = 1$. Then, $L(p)$ yields:

$$L(p = 1)^*|_{a_1=1} = \frac{1}{2c}\left(Q - A + \frac{Q}{2c} + (1 - Q)A\right)^2 + A$$

which is in terms of $Q$, $c$, $A$, all student-specific parameters. Since $Q - A + \frac{Q}{2c} + (1 - Q)A = Q\left(1 + \frac{1}{2c} - A\right)$, consider that:

$$\frac{\partial L(1)^*}{\partial Q} = \left(\frac{Q}{c}\right)\left(1 + \frac{1}{2c} - A\right)^2$$

which is positive for every $Q > 0$. Additionally, consider $L(p, Q)$, and notice that

$$\frac{\partial L(p,Q)}{\partial Q} = x_1^* + x_1^*(2p - 1)\left(V_2(\sigma_{2,H}) - V_2(\sigma_{2,L})\right) > 0 \quad \forall p, x_1^*$$

$L(p, Q)$ is an increasing function in $Q$, independently of $V_2(\sigma_2)$ and of whether $x_1^*$ is bounded or unbounded. Therefore, there are two different cases under a specific parametrization:

i. $L(1)^* < 2A$: but this only happens for very low values of $Q$, that is, $Q < Q_0$.

ii. $L(1)^* > 2A$: as a consequence of (a), this is true for $Q > Q_0$ such that $Q < \sqrt{2Ac}$.

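(b) $V_2(\sigma_2) = \max\left\{A, \sigma_2 - \frac{c}{2}\right\}$. When $p = 1$, the unbounded $x_1^*|_{a_1=1}$ yields:

$$x_1^*(p = 1) = \left(2Q - \frac{Qc}{2} - QA\right)\frac{1}{c}$$

and $\sigma_{2,H} = 1$. Then, $L(p)$ yields:

$$L(p = 1)^*|_{a_1=1} = \frac{1}{2c}\left(2Q - \frac{Qc}{2} - QA\right)^2 + A$$

which is in terms of $Q$, $c$, $A$, all student-specific parameters. Since $2Q - \frac{Qc}{2} - QA = Q\left(2 - \frac{c}{2} - A\right)$, consider that:

$$\frac{\partial L(1)^*}{\partial Q} = \left(\frac{Q}{c}\right)\left(2 - \frac{c}{2} - A\right)^2$$

which is positive for every $Q > 0$. Just like in the previous numeral, consider $L(p, Q)$, and notice that

$$\frac{\partial L(p,Q)}{\partial Q} = x_1^* + x_1^*(2p - 1)\left(V_2(\sigma_{2,H}) - V_2(\sigma_{2,L})\right) > 0 \quad \forall p, x_1^*$$

$L(p, Q)$ is an increasing function in $Q$, independently of $V_2(\sigma_2)$ and of whether $x_1^*$ is bounded or unbounded. Therefore, there are two different cases under a specific parametrization:

i. $L(1)^* < 2A$: but this only happens for very low values of $Q$, that is, $Q < Q_0$.

ii. $L(1)^* > 2A$: as a consequence of (a), this is true for $Q > Q_0$ such that $Q < A + \frac{c}{2}$.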


(c) $V_2(\sigma_2)$ is the correspondence. Consider $L(p, Q)$. To prove $L(1) > 2A$, we can prove that:

i. $L(1, 0) < 2A$. The maximization problem yields:

$$L(1, 0) = \left.\max_{x_1} \; -\frac{c x_1^2}{2} + (1 - x_1)A + x_1 A\right|_{a_1 = 1}$$

which is a concave problem. The FOC of this problem gives $x_1^* = 0$, and so $L(1, 0) = A < 2A$.

ii. $L(1, \sqrt{2Ac}) > 2A$. First, consider that $L\left(\frac{1}{2}, \sqrt{2Ac}\right) = 2A$. This is straightforward from the condition of indifference we have established for this particular case of $V_2(\sigma_2)$. We proved that $L(p)$ is an increasing function in $p$, and therefore $L(1, \sqrt{2Ac}) > L\left(\frac{1}{2}, \sqrt{2Ac}\right) = 2A$.

iii. $L(1, Q)$ is increasing in $Q$. The maximization problem for $L(1, Q)$ yields:

$$L(1, Q) = \left.\max_{x_1} \; Q x_1 - \frac{c x_1^2}{2} + (1 - x_1)A + x_1\left(Q\left(1 - \frac{c}{2}\right) + (1 - Q)A\right)\right|_{a_1 = 1}$$

Using the envelope theorem, we get:

$$\frac{\partial L(1, Q)}{\partial Q} = x_1\left(1 + 1 - \left(\frac{c}{2} + A\right)\right)$$

Consider that in this case $A + \frac{c}{2} < 1$ because, by the shape of $V_2(\sigma_2)$, for the highest possible belief, $\sigma_2 = 1$, it must be true that $\sigma_2 - \frac{c}{2} > A$. Hence $\frac{\partial L(1, Q)}{\partial Q} > 0$ whenever $x_1 > 0$.

iv. $L(p, Q)$ is increasing in $Q$. Notice that

$$\frac{\partial L(p,Q)}{\partial Q} = x_1^* + x_1^*(2p - 1)\left(V_2(\sigma_{2,H}) - V_2(\sigma_{2,L})\right) > 0 \quad \forall p, x_1^*$$

$L(p, Q)$ is an increasing function in $Q$, independently of $V_2(\sigma_2)$ and of whether $x_1^*$ is bounded or unbounded.

Therefore, there are two different cases under a specific parametrization:

i. $L(1)^* < 2A$: but this only happens for very low values of $Q$, that is, $Q < Q_0$.

ii. $L(1)^* > 2A$: as a consequence of (a), this is true for $Q > Q_0$ such that $Q < \sqrt{2Ac}$.
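The sufficient conditions i to iii suggest computing the threshold precision by bisection: if $L(p)$ is increasing in $p$, the smallest $p \in [\frac{1}{2}, 1]$ with $L(p) \geq 2A$ exists whenever $L(1) \geq 2A$. The sketch below assumes Bayes-rule posteriors for a symmetric binary signal of precision $p$, uses the form of $L$ in the equation above with $V_2(Q)$ and $V_2(\sigma_{2,L})$ in place of $A$ (they coincide with $A$ in the case analyzed), and uses illustrative parameters; `threshold_precision` is a hypothetical helper name.

```python
def v2(sigma, A, c):
    """Period-2 value function max{A, working value} with x2* = min(sigma / c, 1)."""
    x2 = min(sigma / c, 1.0)
    return max(A, sigma * x2 - c * x2**2 / 2)


def L(p, Q, A, c):
    """Value of working at t = 1 (a1 = 1) for signal precision p, with x1 in [0, 1]."""
    G = Q * p + (1 - Q) * (1 - p)
    J = (1 - Q) * p + Q * (1 - p)
    s_high, s_low = Q * p / G, Q * (1 - p) / J  # assumed Bayes-rule posteriors
    K = Q - v2(Q, A, c) + G * v2(s_high, A, c) + J * v2(s_low, A, c)
    x1 = min(K / c, 1.0)  # maximizer of the concave quadratic v2(Q) + K*x1 - c*x1**2/2
    return v2(Q, A, c) + K * x1 - c * x1**2 / 2


def threshold_precision(Q, A, c, iters=60):
    """Smallest p in [1/2, 1] with L(p) >= 2A, assuming L is increasing in p."""
    lo, hi = 0.5, 1.0
    if L(hi, Q, A, c) < 2 * A:
        return None  # Q too low: even full precision does not induce work
    if L(lo, Q, A, c) >= 2 * A:
        return lo    # the student already works without information
    for _ in range(iters):
        mid = (lo + hi) / 2
        if L(mid, Q, A, c) >= 2 * A:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    A, c = 0.1, 0.6
    for Q in (0.01, 0.3):
        print(Q, threshold_precision(Q, A, c))
```

With these illustrative values, a very low prior ($Q = 0.01$) has no threshold, which mirrors case i ($Q < Q_0$) above.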


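Proof. Lemma 7. Consider the function $L(p, Q)$ described in the proof of Lemma 6. $L(p, Q)$ is the function that describes the maximization problem of a student in $t = 1$, provided that $a_1 = 1$. Along the threshold $p(Q)$, where the student is indifferent between working and not working, $L(p(Q), Q) = 2A$, so in any existing optimal decision it is true that

$$\frac{dL(p(Q), Q)}{dQ} = 0$$

Expanding the total derivative with the chain rule, we get

$$\frac{\partial L(p(Q), Q)}{\partial p}\,\frac{\partial p(Q)}{\partial Q} + \frac{\partial L(p(Q), Q)}{\partial Q} = 0$$

We showed in the proof of Lemma 6 that $\frac{\partial L(p(Q),Q)}{\partial p} > 0$ and $\frac{\partial L(p(Q),Q)}{\partial Q} > 0$. Therefore, because optimal decisions exist, it is necessary that $\frac{\partial p(Q)}{\partial Q} < 0$.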
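Lemma 7 can be illustrated by tracing the threshold precision across priors. The sketch assumes Bayes-rule posteriors for a symmetric binary signal of precision $p$, effort capped at $x_1 \leq 1$, and illustrative parameters for which $Q < \sqrt{2Ac}$; the helper names are hypothetical. The bisected threshold should fall as $Q$ rises.

```python
def v2(sigma, A, c):
    """Period-2 value function max{A, working value} with x2* = min(sigma / c, 1)."""
    x2 = min(sigma / c, 1.0)
    return max(A, sigma * x2 - c * x2**2 / 2)


def L(p, Q, A, c):
    """Value of working at t = 1 (a1 = 1) for precision p, assuming Bayes-rule posteriors."""
    G = Q * p + (1 - Q) * (1 - p)
    J = (1 - Q) * p + Q * (1 - p)
    K = Q - v2(Q, A, c) + G * v2(Q * p / G, A, c) + J * v2(Q * (1 - p) / J, A, c)
    x1 = min(K / c, 1.0)
    return v2(Q, A, c) + K * x1 - c * x1**2 / 2


def threshold_precision(Q, A, c, iters=60):
    """Bisection for the smallest p in [1/2, 1] with L(p, Q) >= 2A (None if it does not exist)."""
    lo, hi = 0.5, 1.0
    if L(hi, Q, A, c) < 2 * A:
        return None
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (lo, mid) if L(mid, Q, A, c) >= 2 * A else (mid, hi)
    return hi


if __name__ == "__main__":
    A, c = 0.1, 0.6  # sqrt(2Ac) is about 0.346, so every Q below is unconfident
    thresholds = [threshold_precision(Q, A, c) for Q in (0.25, 0.28, 0.31, 0.34)]
    print([round(t, 4) for t in thresholds])  # should decrease with Q
```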

Proof. Lemma 8

i Whenever $p > \frac{1}{2}$, posterior beliefs are a mean-preserving spread of the prior $Q$.

ii Given that a signal was delivered, negative signals occur with probability $J(p, Q)$.

iii Confident students always choose $a_1 = 1$, and so the probability of getting a signal is $x_1^*$.

iv Whenever $\sigma_{2,L} < E^{ind}_2$ and the student gets a negative signal, then by Proposition 2 the optimal response is not to work in $t = 2$.

v So, for every confident student, there is a certain $p > \frac{1}{2}$ from which $\sigma_{2,L} < E^{ind}_2$, and so with probability $a_1^* x_1^* J(p, Q)$, $a_2 = 0$.
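The probability in Lemma 8 can be computed directly. The sketch below assumes Bayes-rule posteriors for a symmetric binary signal of precision $p$, $a_1^* = 1$ for a confident student, and $x_1^*$ given by (31) capped at 1; the function name and parameter values are illustrative. With $Q = \frac{1}{2}$ the low posterior is simply $1 - p$, so with $A = 0.1$ and $c = 0.6$ the student stops working after a negative signal once $1 - p < \sqrt{0.12} \approx 0.346$.

```python
import math


def v2(sigma, A, c):
    """Period-2 value function max{A, working value} with x2* = min(sigma / c, 1)."""
    x2 = min(sigma / c, 1.0)
    return max(A, sigma * x2 - c * x2**2 / 2)


def prob_stop_working(Q, p, A, c):
    """Probability a1* x1* J(p, Q) that a confident student ends with a2 = 0."""
    threshold = A + c / 2 if c < 2 * A else math.sqrt(2 * A * c)  # E2^ind, Proposition 2
    G = Q * p + (1 - Q) * (1 - p)
    J = (1 - Q) * p + Q * (1 - p)
    s_high, s_low = Q * p / G, Q * (1 - p) / J  # assumed Bayes-rule posteriors
    K = Q - v2(Q, A, c) + G * v2(s_high, A, c) + J * v2(s_low, A, c)
    x1 = min(K / c, 1.0)  # equation (31), capped at 1; a1* = 1 for confident students
    return x1 * J if s_low < threshold else 0.0


if __name__ == "__main__":
    for p in (0.6, 0.7, 0.8, 0.9):
        print(p, round(prob_stop_working(0.5, p, A=0.1, c=0.6), 4))
```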

Proof. Proposition 4. Define the following joint probability function: $f(H,G)$, $f(H,B)$, $f(L,G)$, $f(L,B)$, where $H$ stands for high talent ($\sigma = 1$), $L$ for low talent ($\sigma = 0$), $G$ for good grade and $B$ for bad grade. Universities designing posterior beliefs through grades have to define the probability function $f$. Consider that any student's prior belief is that she has high talent with probability $Q$ and low talent with probability $1 - Q$. Finally, suppose good grades are meant to induce high attention, $a_t = 1$, and bad grades induce no attention, $a_t = 0$.

• If the state is $H$, then, as Kamenica and Gentzkow (2011) show, the optimal grading strategy is:

$f(H,G) = Q$, therefore $P(G|H) = 1$

$f(H,B) = 0$, therefore $P(B|H) = 0$

• If the state is $L$, then the university wants to maximize the chances of $a_1 = 1$. We established that the student pays attention if $P(H|\text{grade}) \geq \sqrt{2Ac}$. With Bayesian updating, this means that the student works if:

$$\frac{f(H,G)}{f(H,G) + f(L,G)} \geq \sqrt{2Ac}$$

from which we get:

$$\frac{(1 - \sqrt{2Ac})\, f(H,G)}{\sqrt{2Ac}} \geq f(L,G)$$

But we have established that $f(H,G) = Q$. Now, the maximization problem for the university is:

$$\max_{f(L,G)} \; Q + f(L,G) \quad \text{s.t.} \quad \frac{(1 - \sqrt{2Ac})\,Q}{\sqrt{2Ac}} \geq f(L,G)$$

Naturally, it is convenient for the university to leave no slack in the restriction, to make $f(L,G)$ as high as possible. Therefore:

$$f(L,G) = \frac{(1 - \sqrt{2Ac})\,Q}{\sqrt{2Ac}}$$

And finally, since $f(L,G) + f(L,B) = 1 - Q$:

$f(L,G) = \frac{(1 - \sqrt{2Ac})\,Q}{\sqrt{2Ac}}$, therefore $P(G|L) = \frac{Q}{1 - Q}\cdot\frac{1 - \sqrt{2Ac}}{\sqrt{2Ac}}$

$f(L,B) = 1 - Q - \frac{(1 - \sqrt{2Ac})\,Q}{\sqrt{2Ac}}$, therefore $P(B|L) = 1 - \frac{Q}{1 - Q}\cdot\frac{1 - \sqrt{2Ac}}{\sqrt{2Ac}}$
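The grading scheme above can be computed and checked in a few lines. The sketch assumes, as the derivation does, an initially unconfident student with $0 < Q < \sqrt{2Ac}$, so that the constraint binds and $f(L,B) = 1 - Q/\sqrt{2Ac}$ is nonnegative; `optimal_grading` is a hypothetical helper and the parameter values are illustrative.

```python
import math


def optimal_grading(Q, A, c):
    """Joint distribution f(state, grade) from the proof of Proposition 4.

    Assumes 0 < Q < sqrt(2Ac), so the constraint P(H | G) >= sqrt(2Ac)
    binds and f(L, B) is nonnegative.
    """
    t = math.sqrt(2 * A * c)
    if not 0 < Q < t:
        raise ValueError("this sketch assumes 0 < Q < sqrt(2Ac)")
    f = {
        ("H", "G"): Q,              # high types always get a good grade
        ("H", "B"): 0.0,
        ("L", "G"): (1 - t) * Q / t,  # pooled as much as the constraint allows
    }
    f[("L", "B")] = (1 - Q) - f[("L", "G")]
    return f


if __name__ == "__main__":
    Q, A, c = 0.2, 0.1, 0.6
    f = optimal_grading(Q, A, c)
    posterior_after_G = f[("H", "G")] / (f[("H", "G")] + f[("L", "G")])
    print(f)
    print("P(H | G) =", round(posterior_after_G, 4), "threshold =", round(math.sqrt(2 * A * c), 4))
    print("P(G | L) =", round(f[("L", "G")] / (1 - Q), 4))
```

By construction the posterior after a good grade lands exactly on the indifference threshold $\sqrt{2Ac}$, which is the "no slack" condition in the proof.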

Proof. Proposition 5

i Expanding $(Qp + (1-Q)(1-p))$ yields:

$$2Qp + 1 - p - Q$$

If we take the derivative with respect to $p$, we get:

$$\frac{\partial(2Qp + 1 - p - Q)}{\partial p} = 2Q - 1 \qquad (35)$$

which is smaller than 0 when $Q < \frac{1}{2}$.

ii Consider two probability distribution functions $F$ and $G$, where $G$ is riskier than $F$ and both $G$ and $F$ are mean-preserving spreads of $H$. Then, for every nondecreasing convex utility function $u$,

$$E[u(G)] \geq E[u(F)] \geq E[u(H)]$$

But in our model, $G$ riskier than $F \iff p_G \geq p_F$. Therefore, the student prefers the distribution with higher precision. Given that $a_1 x_1$ determines the probability of getting $F$ or $G$, the student will prefer $a_{1,G}\,x_{1,G} \geq a_{1,F}\,x_{1,F}$. Now, we have established that $a_1 = 1$. Therefore, $x_{1,G} \geq x_{1,F}$, and so $\frac{\partial x_1^*(p)}{\partial p} > 0$.
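Both parts of Proposition 5 can be checked numerically. The sketch evaluates the high-signal probability $G(p) = Qp + (1-Q)(1-p)$, whose slope in $p$ is $2Q - 1$ by (35), and $x_1^*(p)$ from (31) with $a_1 = 1$, capped at 1, under the assumption of Bayes-rule posteriors for a binary signal of precision $p$. Parameter values are illustrative, and a grid check is not a proof.

```python
def v2(sigma, A, c):
    """Period-2 value function max{A, working value} with x2* = min(sigma / c, 1)."""
    x2 = min(sigma / c, 1.0)
    return max(A, sigma * x2 - c * x2**2 / 2)


def high_signal_prob(Q, p):
    """G(p) = Qp + (1 - Q)(1 - p) = 2Qp + 1 - p - Q."""
    return Q * p + (1 - Q) * (1 - p)


def x1_star(Q, p, A, c):
    """Equation (31) with a1 = 1, capped at x1 = 1 (assumed Bayes-rule posteriors)."""
    G = high_signal_prob(Q, p)
    J = 1 - G  # J = (1 - Q)p + Q(1 - p), since G + J = 1
    K = Q - v2(Q, A, c) + G * v2(Q * p / G, A, c) + J * v2(Q * (1 - p) / J, A, c)
    return min(K / c, 1.0)


if __name__ == "__main__":
    Q, A, c = 0.3, 0.1, 0.6
    slope = (high_signal_prob(Q, 0.8) - high_signal_prob(Q, 0.7)) / 0.1
    print("slope of G:", round(slope, 4), "vs 2Q - 1 =", round(2 * Q - 1, 4))
    print([round(x1_star(Q, 0.5 + k / 10, A, c), 4) for k in range(6)])
```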

Proof. Lemma 9

i This comes from the numerical solutions that are explained in the previous section and the fact that we are observing anti-Kamenica & Gentzkow cases.

ii By Lemma 8, for any $p > \frac{1}{2}$ there is a probability $x_1^* J(p, Q)$ such that $a_2 = 0$ because $\sigma_{2,L} < \sqrt{2Ac}$. When $p = \frac{1}{2}$, then $Q = \sigma_{2,L} = \sigma_{2,H} > \sqrt{2Ac}$. Therefore, it is always preferable to choose $p = \frac{1}{2}$.
