This article was downloaded by: [128.255.255.102] On: 31 January 2018, At: 07:42 Publisher: Institute for Operations Research and the Management Sciences (INFORMS) INFORMS is located in Maryland, USA Decision Analysis Publication details, including instructions for authors and subscription information: http://pubsonline.informs.org Online and Off the Field: Predicting School Choice in College Football Recruiting from Social Media Data Kristina Gavin Bigsby, Jeffrey W. Ohlmann, Kang Zhao To cite this article: Kristina Gavin Bigsby, Jeffrey W. Ohlmann, Kang Zhao (2017) Online and Off the Field: Predicting School Choice in College Football Recruiting from Social Media Data. Decision Analysis 14(4):261-273. https://doi.org/10.1287/deca.2017.0353 Full terms and conditions of use: http://pubsonline.informs.org/page/terms-and-conditions This article may be used only for the purposes of research, teaching, and/or private study. Commercial use or systematic downloading (by robots or other automatic processes) is prohibited without explicit Publisher approval, unless otherwise noted. For more information, contact [email protected]. The Publisher does not warrant or guarantee the article’s accuracy, completeness, merchantability, fitness for a particular purpose, or non-infringement. Descriptions of, or references to, products or publications, or inclusion of an advertisement in this article, neither constitutes nor implies a guarantee, endorsement, or support of claims made of that product, publication, or service. Copyright © 2017, INFORMS Please scroll down for article—it is on subsequent pages INFORMS is the largest professional society in the world for professionals in the fields of operations research, management science, and analytics. For more information on INFORMS, its publications, membership, or meetings visit http://www.informs.org



DECISION ANALYSIS
Vol. 14, No. 4, December 2017, pp. 261–273
http://pubsonline.informs.org/journal/deca/ ISSN 1545-8490 (print), ISSN 1545-8504 (online)

Online and Off the Field: Predicting School Choice in College Football Recruiting from Social Media Data

Kristina Gavin Bigsby (a), Jeffrey W. Ohlmann (b), Kang Zhao (b)

(a) Interdisciplinary Graduate Program in Informatics, University of Iowa, Iowa City, Iowa 52242; (b) Department of Management Sciences, University of Iowa, Iowa City, Iowa 52242
Contact: [email protected], http://orcid.org/0000-0003-2967-1337 (KGB); [email protected], http://orcid.org/0000-0002-1737-4717 (JWO); [email protected], http://orcid.org/0000-0002-8321-2804 (KZ)

Received: January 12, 2017
Revised: April 29, 2017
Accepted: May 14, 2017
Published Online in Articles in Advance: October 31, 2017

https://doi.org/10.1287/deca.2017.0353

Copyright: © 2017 INFORMS

Abstract. This study explores predictors of school choice decisions in American college football recruitment. We combine data about individual athletes’ recruiting activities with social media data to predict which school the athlete will choose among those that have offered him a scholarship. While previous works have approached school choice as a rational decision process, our results indicate that a bounded rationality model incorporating social factors and heuristics may be more appropriate. We explore how the actions taken by athletes during recruitment can be interpreted as early signals of athletes’ preferences and find that models incorporating social media features consistently outperform the baseline model with only off-line recruiting features. In addition to better understanding the school choice decision, this work can help coaches to effectively allocate recruiting resources and inform social media strategies during recruitment.

Keywords: social media • school choice • recruiting • bounded rationality • social networks

1. Introduction

Social media provides detailed data about individuals’ behaviors, preferences, and online social networks, presenting new opportunities to study decision making. In this study, we leverage the social media data of American college football recruits to analyze and predict their school choices. Athletic recruitment presents an interesting context for study; college football recruiting captures a high level of public interest, relevant data are available, and it is a high-stakes activity. Athletic departments of universities in the Southeastern Conference spent an average of 27 million dollars each on football during the 2012–2013 season (Smith 2013), and the recruiting budgets of top programs exceed 1 million dollars (Sherman 2012).

This study has two main objectives: (1) to better understand the school choice process of college football recruits and (2) to build predictive models that can assist coaches in identifying athletes who are most likely to commit to their school, which can inform their recruiting strategies. We pay particular attention to how athletes’ connections and behaviors on social media can provide timely information about their college preferences.

Previous work predicting school choice primarily relied on rational decision-making models, assuming that an athlete will select a school that maximizes the expected utility of attendance (Dumond et al. 2008). However, we argue that recruiting decisions occur under significant constraints in terms of time, information, and cognitive resources, and that a bounded rationality model incorporating social network information and heuristics may be more appropriate.

The decisions of individuals are often related to their social networks, whether the decision is related to business (Trusov et al. 2009), health (Zhao et al. 2016), or politics (Bond et al. 2012). Specific to high school athletes’ school choice decisions, both anecdotal and empirical evidence supports the critical role of parents (Croft 2008), high school coaches (Prunty 2014), and other players (Myerberg 2015). Social media offers an unprecedented opportunity to gather information on the social networks of individuals; for this study, we collect public data from athletes’ Twitter profiles. To account for a range of rational, social, and heuristic factors influencing school choice, we combine data about the athlete’s choice set (the schools that have offered

him a scholarship), the athlete’s recruiting activities, and the athlete’s social media data.

Our study is the first work on school choice to incorporate social media data and represents a novel addition to the athletic recruiting literature. Outside of the sports world, we believe that our findings may generalize to other recruiting domains, such as human resources (HR), military, or academic settings. This study makes a unique contribution to the decision analysis literature by considering both the factors influencing athletes’ school choices and how the social networks and online behaviors of athletes can be leveraged to improve predictions.

We outline background information on college football recruiting and related research in Section 2. We describe the data and features selected for this study in Sections 3 and 4, respectively. Section 5 contains results, including fitted model coefficients and evaluations of predictive performance. Finally, Section 6 contains discussion of how our predictive model can be utilized to inform coach decisions during recruitment, potential limitations of our work, conclusions, and recommendations for future research.

2. Background and Related Research

In general, college football recruiting is a two-stage, sequential decision-making process. First, schools identify and evaluate recruits and decide whether to extend a scholarship offer. Second, athletes select a school from among their scholarship offers and announce a commitment. We focus on this latter stage and seek to analyze and predict school choice decisions.

School choice may be viewed as a multiobjective decision-making process. That is, decisions are likely to be based on several potentially conflicting objectives. Indeed, surveys of college athletes identify economic benefits (Doyle and Gaeth 1990), geographic proximity (Barden et al. 2013, Lujan 2010), probability of achieving a professional career (Croft 2008), and educational quality (Popp et al. 2011) as significant factors. As these fundamental objectives may be abstract or difficult to quantify, athletes may consider means objectives (Keeney 1992). For instance, an athlete may base his commitment decision in part on his intention to pursue a professional football career. Because this outcome is uncertain, he may estimate the benefits of attending a given school by looking at the team or coach’s record of placing players in the National Football League. Unlike previous work on multiobjective decision analysis in personnel management (Dees et al. 2013), we focus on inferring objectives and weights via regression models, rather than eliciting preferences from decision makers.

We contend that a rational decision-making model may not capture the complexity of the school choice decision. The underlying assumptions of the rational model are (1) that athletes possess sufficient time to make rational choices, (2) that athletes possess sufficient information to make rational choices, and (3) that athletes possess the cognitive ability and desire to make rational choices. We demonstrate that these assumptions do not hold in a real-world recruiting context and investigate the role of heuristics and social factors.

Because of time constraints surrounding commitment decisions, the first assumption of rational decision making may be violated. Schools can award a maximum of 25 scholarships to incoming freshmen (NCAA 2015), and athletes may feel pressure to commit quickly in order to secure financial aid. Indeed, 15% of college athletes report being given less than one week to accept a scholarship offer (Sander 2008). The high costs of recruitment can also encourage quick commitments. The father of a quarterback estimated spending $40,000 on travel expenses for camps and unofficial visits during recruitment (Elliott 2015). Finally, while athletes can announce a verbal commitment at any time, National Signing Day acts as a de facto deadline. National Signing Day occurs on the first Wednesday in February of the athlete’s senior year and is the first date that athletes can sign financial aid contracts called Letters of Intent.

Challenging the second assumption of the rational model, athletes often make school choice decisions with limited information. Recruits evaluate prospective colleges by taking visits but are restricted to five official visits paid for by the recruiting school (NCAA 2015). Athletes can go on unlimited unofficial visits that they pay for themselves, but these may present an economic hardship, making it impossible for an athlete to visit every school that is recruiting him. Gathering information about college options via communication with coaches is also fraught with difficulties. There are strict

regulations on when and how coaches may communicate with recruits (NCAA 2015). Additionally, surveys of high school athletes indicate that recruits are ill informed of college options: 65% of athletes reporting an intention to play in college had spent little or no time researching colleges and only 18.4% had actually spoken with a college coach (Lujan 2010).

The process of evaluating and selecting a college is also complicated by the difficulty of measuring subjective characteristics of a school and athlete–school “fit.” Athletes can consult information about a school’s observable traits such as academic rating, majors offered, and team playing record, which the job market signaling model proposed by Spence (1973) refers to as “indices.” Because indices do not capture all of the information relevant to the school choice decision, Spence’s model predicts that schools will take an action, or “signal,” in order to communicate information to the athlete. We hypothesize that coaches will engage in signaling in order to convey their level of “interest,” which can be interpreted as the priority placed on a specific athlete relative to others. The interest level of a school impacts the athlete’s expected utility of attendance via intervening outcomes such as the availability of financial aid. Schools risk turning off recruits by signaling interest in other athletes. A top recruit described the worst recruiting pitch he received from a college team as “when it told him it offered three other QBs on the same day” (Davenport 2015, para. 3). We expect that athletes will consider signals of interest from the recruiting schools communicated via social media when making commitment decisions.

Applying rational decision-making models to school choice also assumes that athletes possess the cognitive ability and desire to make rational choices. Research on age-specific differences in psychology and decision making suggests that adolescents are likely to deviate from rational processes, instead relying on emotional and social factors (Steinberg and Cauffman 1996). The HR literature has paid a significant amount of attention to social networks in job seeking and job choice (Granovetter 1973, Chapman et al. 2005), and surveys of college athletes support the importance of social networks in the school choice process (Croft 2008). However, only one previous predictive work has considered athletes’ social networks. Mirabile and Witte (2015) identify family connections between athletes and schools, finding that having a family member who played or coached at a school increases the likelihood of commitment between 96% and 253%. Although we also expect that athletes with social ties to a given school will be more likely to select that school, this study examines social networks beyond family ties, tracking connections between athletes and coaches, current college players, and other recruits. It is also the first to include social media data in a predictive model of school choices.

Given these constraints, we also consider the role of heuristics, or mental shortcuts utilized when making judgments and decisions under conditions of bounded rationality. We are informed by the work of Hogarth and Karelaia (2006) on heuristics in predicting decision making. While the authors apply the elimination-by-aspects and take-the-best heuristics, we explore how the availability heuristic might impact school choice. The availability heuristic holds that decision makers will select the most memorable option (Tversky and Kahneman 1974). Availability may be influenced by several factors, including sequence, frequency, and vividness. We focus on the relationship between school choice and sequence, tracking the first and last recruiting events of each type (e.g., offers, visits). Extant research on school choice and job choice has primarily used rational models and overlooked heuristics (Highhouse et al. 2014), making our work a unique addition to this domain.

3. Data

We scraped data on 2,644 high school football athletes in the 2016 recruiting class from the recruiting database of 247Sports.com.¹ For each individual athlete, we collected timelines of recruiting events, such as scholarship offers, visits, commitments, and decommitments. We also obtained basic information about the recruiting schools, including location, academic ranking, and football team ranking.

Many 247Sports profiles contained embedded Twitter timelines; 1,629 Twitter IDs for recruits, 466 IDs for Division I coaches, and 2,225 IDs for current college athletes were retrieved from the site. We conducted a manual search of the remaining recruits in the class of 2016, locating 700 additional Twitter IDs. In full, 2,329 recruits in the data set (88%) possess public Twitter accounts. Social media data for these individuals were collected using the Twitter REST API (Twitter 2015). Profile information, friend and follower lists,

and tweets were gathered monthly between September 2015 and March 2016, the most active period of recruitment for the class of 2016. Social media data were collected over time in order to observe changes in the online social network before commitment.

We eliminate athletes without a commitment from our data. For the purposes of this study we consider both verbal commitments and Letter of Intent signings. Although an athlete can verbally commit at any time during recruitment, we consider only commitments occurring between October 1, 2015, and February 29, 2016, approximately 43% of commitments for the class of 2016. This time range is selected so that at least one month of retrospective social media data are available for each player. There were 25 commitments after March 1, 2016 (1% of all commitments), that we do not include in this study. Late commitments are fairly uncommon and are most likely to occur in instances where academic eligibility or oversigning (when a team signs more than 25 Letters of Intent and has to revoke scholarship offers) is an issue.

Finally, we only consider athletes with two or more scholarship offers; 93% of athletes who received only one offer committed to that school. These steps yield a data set with 573 athletes who selected a school from among 8 scholarship offers on average. For each athlete in the data, we create an observation for every offering school, resulting in 4,408 athlete–school pairs. In each athlete–school pair, we measure features relative to the “prediction school” and model the likelihood of the athletes selecting that school.
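The construction of athlete–school pairs can be illustrated with a minimal sketch. The function name `build_pairs`, the `Pair` record, and the toy athletes and schools are hypothetical and chosen only for illustration; the sketch covers the two-offer and commitment filters but not the commitment date window.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    athlete: str
    school: str
    committed: int  # 1 if the athlete chose this school, else 0


def build_pairs(offers, commitments):
    """Create one observation per (athlete, offering school).

    offers: dict mapping athlete -> list of schools that offered a scholarship
    commitments: dict mapping athlete -> school the athlete committed to
    Athletes without a commitment or with fewer than two offers are skipped.
    """
    pairs = []
    for athlete, schools in offers.items():
        chosen = commitments.get(athlete)
        if chosen is None or len(schools) < 2:
            continue
        for school in schools:
            pairs.append(Pair(athlete, school, int(school == chosen)))
    return pairs


# Hypothetical toy data (not taken from the study)
offers = {
    "athlete_a": ["Iowa", "Utah", "Ohio State"],
    "athlete_b": ["Iowa"],                # single offer: excluded
    "athlete_c": ["Utah", "Utah State"],  # no commitment: excluded
}
commitments = {"athlete_a": "Utah", "athlete_b": "Iowa"}

pairs = build_pairs(offers, commitments)
for p in pairs:
    print(p)
```

Each retained athlete contributes exactly one positive observation (the chosen school) and one negative observation for every other offering school.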

4. Feature Engineering

The performance of a predictive model depends on the features it considers. To determine the value added by considering athletes’ social media data, we compare a baseline group of features constructed from recruiting data with four groups of social media features corresponding to different aspects of athletes’ social media profiles: in-links, out-links, interactions with others (mentions, replies, retweets, quotes), and tweet content.

4.1. Off-Line Features

We construct a set of “off-line” features from the 247Sports recruiting data and school data. We draw upon the work of Dumond et al. (2008), who identify economic capital, athletic capital, and human capital objectives in school choice decisions. We expect that features that increase the benefits associated with attendance at a given school will also increase likelihood of commitment. We expand on previous research by considering comparisons to alternative options in the athlete’s choice set. For example, an athlete’s likelihood of selecting the prediction school may be influenced by that school’s geographic proximity as well as the number of other schools recruiting him that are closer. We include data about recruiting activities that demonstrate affinity between the college and athlete, including offers and visits. In light of constraints on time, information, and cognitive resources faced by athletes, we also construct features related to the availability heuristic.

Table 1 lists the 26 off-line features that we consider. We use time-consistent data, meaning that we exclude events that occurred after the commitment decision. For example, to predict which school an athlete will commit to in January, we count only official visits that occurred before January 1. We assume that the month of commitment is known, as our focus is predicting where an athlete will commit given what has been observed, rather than when.
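The time-consistency rule can be sketched as a date cutoff at the first day of the commitment month. The helper `time_consistent` and the visit records below are hypothetical; the feature names `prediction_official` and `other_official` follow Table 1.

```python
from datetime import date


def time_consistent(events, commit_year, commit_month):
    """Keep only (school, date) events dated before the first day of the commitment month."""
    cutoff = date(commit_year, commit_month, 1)
    return [(school, day) for school, day in events if day < cutoff]


# Hypothetical official visits for an athlete who commits in January 2016
official_visits = [
    ("Iowa", date(2015, 11, 14)),
    ("Utah", date(2015, 12, 5)),
    ("Utah", date(2016, 1, 16)),  # falls in the commitment month, so it is excluded
]

usable = time_consistent(official_visits, 2016, 1)
prediction_school = "Utah"

# Binary: any official visit to the prediction school before the cutoff
prediction_official = int(any(s == prediction_school for s, _ in usable))
# Numeric: official visits to other schools before the cutoff
other_official = sum(1 for s, _ in usable if s != prediction_school)

print(prediction_official, other_official)  # prints: 1 1
```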

4.2. Followers of Athletes

Social networks often influence individual decision making, and the next group of features focuses on in-links from other Twitter users (i.e., “followers”). We interpret following as a signal of interest from the school to the athlete and expect that the likelihood of commitment to a given school will increase as the number of followers from that school increases. For an athlete, we determine the number of new followers in the month before commitment by comparing the set of an athlete’s followers at the beginning of the commitment month to the set of an athlete’s followers at the beginning of the previous month. For example, if predicting where an athlete will commit in January, the sets of followers from January 1 and December 1 will be compared to determine the number of new followers in the prior month. We also track the type (recruits, current college athletes, and coaches) and school affiliation of followers. The “In-links” category in Table 2 lists the six resulting features based upon athletes’ social media followers.
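The snapshot comparison can be sketched as a set difference between two monthly follower lists. This is an illustration under stated assumptions rather than the study’s implementation: new followers are taken to be accounts present in the later snapshot but not the earlier one (unfollows are ignored), every non-prediction school is treated as “other,” and the `affiliation` lookup and user IDs are hypothetical. The feature names follow Table 2.

```python
FOLLOWER_KINDS = ("coach", "2016", "current")


def new_follower_features(prev_followers, curr_followers, affiliation, prediction_school):
    """Count new followers by type and by prediction/other school.

    prev_followers, curr_followers: sets of user IDs from two monthly snapshots
    affiliation: dict mapping user ID -> (kind, school) for tracked accounts
    """
    features = {
        f"{kind}_followers_{scope}": 0
        for kind in FOLLOWER_KINDS
        for scope in ("prediction", "other")
    }
    for user_id in curr_followers - prev_followers:
        if user_id not in affiliation:
            continue  # not a tracked recruit, current athlete, or coach
        kind, school = affiliation[user_id]
        scope = "prediction" if school == prediction_school else "other"
        features[f"{kind}_followers_{scope}"] += 1
    return features


# Hypothetical snapshots for an athlete who commits in January
december_1 = {"u1", "u2"}
january_1 = {"u1", "u2", "u3", "u4", "u5"}
affiliation = {
    "u2": ("coach", "Iowa"),    # already following on December 1, so not new
    "u3": ("coach", "Utah"),
    "u4": ("current", "Iowa"),
}

features = new_follower_features(december_1, january_1, affiliation, "Utah")
print(features)
```

The same comparison applies to the out-link features by substituting the sets of accounts the athlete follows.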

Table 1. Off-Line Features

| Category | Feature | Description |
|---|---|---|
| Objectives | prediction_distance | Numeric; distance between recruit hometown and school being predicted |
| Objectives | prediction_inState | Binary; recruit hometown in same state as prediction school |
| Objectives | prediction_usNews | Binary; prediction school is included in 2015 U.S. News and World Report academic rankings (U.S. News and World Report 2014) |
| Objectives | prediction_AP | Binary; prediction school ranked in top 25 of AP poll (SB Nation College News 2014) |
| Objectives | prediction_division | Categorical; athletic division of prediction school |
| Objectives | prediction_conference | Categorical; athletic conference of prediction school |
| Objectives | prediction_sanction | Binary; football program under active NCAA sanction or probation (NCAA 2016) |
| Objectives | prediction_coachChange | Binary; head coach change at prediction school during 2015–2016 season |
| Comparisons | other_offer | Numeric; number of offers from other schools |
| Comparisons | other_closer | Numeric; number of other schools closer to recruit hometown |
| Comparisons | other_higherAP | Numeric; number of other schools with higher AP ranking |
| Comparisons | other_higherUsNews | Numeric; number of other schools with higher 2015 U.S. News and World Report academic ranking (U.S. News and World Report 2014) |
| Affinity | prediction_unofficial | Binary; unofficial visit to prediction school |
| Affinity | prediction_coachVisit | Binary; coach visit from prediction school |
| Affinity | prediction_official | Binary; official visit to prediction school |
| Affinity | other_unofficial | Numeric; number of unofficial visits to other schools |
| Affinity | other_coachVisit | Numeric; number of coach visits from other schools |
| Affinity | other_official | Numeric; number of official visits to other schools |
| Availability | prediction_firstOffer | Binary; first offer from prediction school |
| Availability | prediction_lastOffer | Binary; last offer from prediction school |
| Availability | prediction_firstUnofficial | Binary; first unofficial visit to prediction school |
| Availability | prediction_lastUnofficial | Binary; last unofficial visit to prediction school |
| Availability | prediction_firstCoach | Binary; first coach visit from prediction school |
| Availability | prediction_lastCoach | Binary; last coach visit from prediction school |
| Availability | prediction_firstOfficial | Binary; first official visit to prediction school |
| Availability | prediction_lastOfficial | Binary; last official visit to prediction school |

Table 2. Social Media Features

| Category | Feature | Description |
|---|---|---|
| In-links | coach_followers_prediction | Numeric; increase in coaches from prediction school following user |
| In-links | 2016_followers_prediction | Numeric; increase in 2016 recruits committed to prediction school following user |
| In-links | current_followers_prediction | Numeric; increase in current athletes at prediction school following user |
| In-links | coach_followers_other | Numeric; increase in coaches from other schools following user |
| In-links | 2016_followers_other | Numeric; increase in 2016 recruits committed to other schools following user |
| In-links | current_followers_other | Numeric; increase in current athletes at other schools following user |
| Out-links | coach_friends_prediction | Numeric; increase in coaches from prediction school followed by user |
| Out-links | 2016_friends_prediction | Numeric; increase in 2016 recruits committed to prediction school followed by user |
| Out-links | current_friends_prediction | Numeric; increase in current athletes at prediction school followed by user |
| Out-links | coach_friends_other | Numeric; increase in coaches from other schools followed by user |
| Out-links | 2016_friends_other | Numeric; increase in 2016 recruits committed to other schools followed by user |
| Out-links | current_friends_other | Numeric; increase in current athletes at other schools followed by user |
| Social media interactions | interactions_prediction | Binary; athlete has posted a retweet, reply, quote, or mention of users associated with prediction school in the previous month |
| Social media interactions | interactions_other | Binary; athlete has posted a retweet, reply, quote, or mention of users associated with other schools in the previous month |
| Content | hashtags_prediction | Binary; athlete has posted a hashtag associated with prediction school in the previous month |
| Content | hashtags_other | Binary; athlete has posted a hashtag associated with other schools in the previous month |

4.3. Users Followed by Athletes

We also treat athletes’ online connections as early indications of their school preferences. This second set of features tracks users that an athlete follows on social media, i.e., an athlete’s “friends” or out-links. We expect that athletes intending to commit to a certain school will add friends from that school. The theory of realignment, which states that overlap in members’ respective social networks will increase with the intensity of a dyadic relationship (Jowett and Timson-Katchis 2005), supports our hypothesis. Furthermore, 62% of Division I athletes report building friendships with their future teammates during recruitment (Sander 2008). Similar to the previous set of features, we track the number and affiliation of new friends in the month before commitment. The “Out-links” category in Table 2 contains the six features related to Twitter friends.

4.4. Social Media Interactions

The next group of features examines the actions of athletes on social media. Twitter allows users to interact in several ways: replying to posts, copying posts, forwarding posts, and mentioning other users. We hypothesize that an athlete’s interactions with recruits, coaches, and current athletes reflect his school preferences and will therefore be related to his likelihood of commitment. As NCAA policies prohibited college coaches and athletic department staff from mentioning, quoting, retweeting, or replying to high school athletes during the period of data collection (Elliott and Kirshner 2016), we only track the social media interactions initiated by athletes. We account for social media interactions with two binary measures; one feature indicates whether the athlete has interacted with the prediction school in the previous month and the other feature indicates whether the athlete has interacted with other schools in the previous month. The “Social Media Interactions” category in Table 2 lists these two interaction features.

4.5. Tweet Content

Social media also offers a rich source of text data from users’ posts. We analyze the hashtags posted by athletes in the month before commitment and expect that posting hashtags relevant to the prediction school will be associated with increased likelihood of commitment. As free text data, the topic of a hashtag is not always evident. We use a two-step information retrieval process to determine the likely topic of each hashtag:

(1) For each of the 682 schools in our data we generate a set of positive query terms, P. These terms are substrings based on the school name, team name, nickname, abbreviation, coach name, and/or location of each school. For instance, the query terms for the University of Utah are P = {“utah,” “utes,” “utenati”}. We treat each athlete’s hashtags as a set of documents, D. We query on D using the Boolean OR operator with elements of P and include a hashtag in the subset S1 if it contains at least one positive term.

(2) We also construct a list of negative terms N for each school, or substrings that should be disallowed in relevant hashtags. For the University of Utah, N = {“utahst”}, thereby excluding references to Utah State University. We query S1 using the NOT operator with elements of N so that the resulting subset S2 contains only hashtags that contain positive terms and no negative terms.

Based on S2, we then create two features tracking whether the athlete has posted hashtags relevant to the prediction school or other schools that have offered scholarships. Table 2 lists these two binary features related to hashtag content.
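The two-step filter can be sketched in a few lines of Python. This is a minimal illustration rather than the authors’ implementation: the function names, the case-insensitive substring matching, and the sample hashtags are assumptions made here, and only the Utah query terms and the two feature names (as they appear in Table 3) come from the paper.

```python
def relevant_hashtags(hashtags, positive_terms, negative_terms):
    """Return the hashtags with at least one positive term and no negative term."""
    # Step 1 (Boolean OR over P): keep hashtags containing any positive substring.
    s1 = [h for h in hashtags if any(p in h.lower() for p in positive_terms)]
    # Step 2 (NOT over N): drop hashtags containing any negative substring.
    return [h for h in s1 if not any(n in h.lower() for n in negative_terms)]


def hashtag_features(hashtags, prediction_school, school_terms):
    """Build the two binary hashtag features for one athlete-school pair.

    school_terms maps each offering school to a (positive_terms, negative_terms)
    pair; this layout is a hypothetical choice for the sketch.
    """
    def mentions(school):
        positive, negative = school_terms[school]
        return len(relevant_hashtags(hashtags, positive, negative)) > 0

    others = [s for s in school_terms if s != prediction_school]
    return {
        "hashtag_prediction": int(mentions(prediction_school)),
        "hashtag_other": int(any(mentions(s) for s in others)),
    }


# Query terms for the University of Utah, as given in the text.
UTAH_P = {"utah", "utes", "utenati"}
UTAH_N = {"utahst"}

# Hypothetical hashtags, for illustration only (not from the study's data).
sample = ["#GoUtes", "#UtahState", "#UteNation", "#Committed"]
print(relevant_hashtags(sample, UTAH_P, UTAH_N))
```

In this sketch “#UtahState” passes the positive query on “utah” but is removed by the negative term “utahst”, which is the behavior the second step is meant to provide.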

5. Models and Evaluation

We use logistic regression for this study because of its interpretability and performance with nonnormally distributed response variables. We divide the data set into 3,072 training observations (409 commitments) and a hold-out set of 1,336 test observations (179 commitments). As our data contain multiple observations corresponding to each individual athlete (each observation corresponds to an athlete–school pair), we keep such observations together to avoid training and testing on the same athlete.
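Keeping each athlete’s observations together amounts to splitting by athlete rather than by row. The sketch below shows one way to do this with the standard library only; the field name `athlete_id`, the 30% test fraction, and the fixed seed are assumptions for illustration, not details reported in the paper.

```python
import random


def split_by_athlete(observations, test_fraction=0.3, seed=0):
    """Split athlete-school observations so that no athlete appears in both sets."""
    # Work at the athlete level: collect the distinct athlete identifiers.
    athletes = sorted({obs["athlete_id"] for obs in observations})
    rng = random.Random(seed)
    rng.shuffle(athletes)

    # Reserve a fraction of athletes (not of rows) for the hold-out set.
    n_test = max(1, round(len(athletes) * test_fraction))
    test_ids = set(athletes[:n_test])

    train = [o for o in observations if o["athlete_id"] not in test_ids]
    test = [o for o in observations if o["athlete_id"] in test_ids]
    return train, test


# Toy data: 10 hypothetical athletes with 3 offers each.
toy = [{"athlete_id": a, "school": s} for a in range(10) for s in ("X", "Y", "Z")]
train, test = split_by_athlete(toy)
print(len(train), len(test))
```

Because athletes differ in the number of offers they hold, the row counts of the two sets only approximate the requested fraction.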

To evaluate the contributions of each group of features, we implement six models. Model 0 (the baseline model) uses off-line recruiting features only. Because the features we derive may be highly correlated with each other, we perform feature selection using a least absolute shrinkage and selection operator (Lasso) regression with L1 penalty (C = 0.1) (Tibshirani 1996). We remove predictors whose weight reduces to zero and manually eliminate nonsignificant predictors. For consistent comparison to the other models, we refit
the baseline using logistic regression without regularized maximum likelihood or penalty. Model 1 adds to the baseline the features related to an athlete’s in-links, i.e., the Twitter users that followed the athlete in the month before commitment. Model 2 focuses on the “friends” in an athlete’s online social network, adding features measuring the number and affiliation of out-links to the baseline. Model 3 combines the features tracking social media interactions with the baseline model, and Model 4 adds the features derived from hashtag content to the baseline. Model 5 incorporates all features from Models 0–4. The issue of collinearity arises again when combining all social media features, and we apply Lasso regression to construct Model 5. We then refit Model 5 using logistic regression without regularized maximum likelihood or penalty.

We evaluate the predictive performance of each model on the hold-out set. We consult standard classifier metrics: precision (ratio of true positives to predicted positive observations), recall (ratio of true positives to actual positive observations), balanced F score (harmonic mean of precision and recall), and area under the receiver operating characteristic curve (AUC), which measures the probability of ranking a randomly chosen positive observation higher than a randomly chosen negative observation (Manning et al. 2009). Because only 17% of observations in our data correspond to commitments, we do not use overall accuracy (proportion of correctly predicted observations), which is not robust to class imbalance.

Logistic regression yields a predicted probability of commitment for each athlete–school pair, and we classify each observation for which the probability is greater than 50% as a commitment. Based on this classification scheme, our models may predict more or fewer than one commitment per athlete. To account for this, we also assess the school choice prediction as a ranking. We order each athlete’s college options according to the predicted probability and use normalized discounted cumulative gain (NDCG), which measures the quality of a ranking based on relevance and position in the results list (Manning et al. 2009). Simply, the predictive model receives the highest rating (1.0) if the commitment school is ranked first, and a discounted rating if the commitment school is ranked lower. College football recruiting occurs in a competitive context, and producing a ranking of athletes’ school choices by predicted probability of commitment may be more useful than a binary classification. Coaches may use this information on the position of their school relative to others to make decisions about recruiting resource allocation and potentially to take actions to improve their chances with a given athlete. Because the number of predictions for each athlete varies based on the number of scholarship offers received, but all athletes in the data have at least two schools in their choice set, we calculate NDCG using the top two predictions per athlete. We refer to this measure as NDCG@2.
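The metrics described above can be written out directly from their definitions. The following is a sketch under stated assumptions: the function names are ours, AUC is computed by brute-force pairwise comparison (ties counted as one half), and NDCG uses binary relevance with a log2 position discount, which is one common formulation; the paper does not spell out its exact discount.

```python
import math


def precision_recall_f(y_true, y_prob, threshold=0.5):
    """Precision, recall, and balanced F score at a probability threshold."""
    pred = [int(p > threshold) for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


def auc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ndcg_at_k(relevance_by_rank, k=2):
    """NDCG@k for relevance scores listed in ranked order (best rank first)."""
    def dcg(rels):
        # Position i (0-based) is discounted by log2(i + 2): 1.0, 0.63, 0.5, ...
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))

    ideal = dcg(sorted(relevance_by_rank, reverse=True))
    return dcg(relevance_by_rank) / ideal if ideal > 0 else 0.0


def athlete_ndcg(school_probs, committed_school, k=2):
    """Rank one athlete's schools by predicted probability and score the ranking."""
    ranked = sorted(school_probs, key=school_probs.get, reverse=True)
    relevance = [1 if school == committed_school else 0 for school in ranked]
    return ndcg_at_k(relevance, k)


# Hypothetical athlete with three offers; school "B" is the eventual commitment.
print(athlete_ndcg({"A": 0.6, "B": 0.3, "C": 0.1}, "B"))
```

With binary relevance and k = 2, an athlete contributes 1.0 when the commitment school is ranked first, a discounted score when it is ranked second, and 0 otherwise; averaging over athletes gives a per-model score comparable in spirit to NDCG@2.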

5.1. Factors Related to School Choice

We first build explanatory models using logistic regression to reveal which off-line factors are related to school choice. Logistic regression is commonly used in situations where the response variable Y is binary. In our case, the outcome we wish to predict is whether an athlete will commit to a specific school, and our models calculate the predicted probability that Y = 1 for each athlete–school pair. Model coefficients in logistic regression measure the rate of change in the log odds. Thus, by applying the exponential function to the coefficients, we can estimate the relationship between each variable and the odds of commitment. While statistical significance may vary across the six models discussed in the previous section, we see that the qualitative effect of each variable remains relatively the same (Table 3).
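As a worked illustration of this conversion, the short sketch below exponentiates three Model 0 coefficients taken from Table 3 and expresses each as a percentage change in the odds. The helper name `odds_change_percent` is ours, introduced only for this example.

```python
import math


def odds_change_percent(coefficient):
    """Convert a logistic regression coefficient to an odds ratio and a % change in odds."""
    odds_ratio = math.exp(coefficient)
    return odds_ratio, (odds_ratio - 1.0) * 100.0


# Model 0 coefficients as reported in Table 3.
model0 = {
    "prediction_inState": 0.7478,
    "other_offer": -0.1173,
    "prediction_official": 2.6427,
}

for name, beta in model0.items():
    ratio, pct = odds_change_percent(beta)
    print(f"{name}: odds ratio {ratio:.2f} ({pct:+.0f}% change in odds)")
```

A positive coefficient yields an odds ratio above 1 (higher odds of commitment), and a negative coefficient yields a ratio below 1.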

The features in the baseline model (Model 0) relate to cost/benefit factors influencing school choice, comparisons to alternatives, athlete–school affinity demonstrated by recruiting activities, and decision-making heuristics. Applying Lasso regression reduces the size of the baseline model from 26 to 11 features. Per the Model 0 column in Table 3, we find that, if a school is in the athlete’s home state, the odds that the athlete commits to that school are approximately 111% higher (e^{0.7478} = 2.11). Attendance at an in-state school is linked not only to decreased travel costs but also to an increased sense of satisfaction and fit (Barden et al. 2013). Considering alternative options in the athlete’s choice set, the odds of commitment to the prediction school decrease 11% for each offer from another school (e^{−0.1173} = 0.89), 6% for each offering school that is geographically closer (e^{−0.0607} = 0.94), and 28% for each school that has a higher academic ranking (e^{−0.3229} = 0.72). We also find that off-line recruiting activities are strong predictors of school choice.

Table 3. Logistic Regression Results

Feature                        Model 0     Model 1     Model 2     Model 3     Model 4     Model 5
constant                      −0.9904∗∗∗  −1.0086∗∗∗  −1.0408∗∗∗  −0.8405∗∗∗  −0.9475∗∗∗   0.9665∗∗∗
prediction_inState             0.7478∗∗∗   0.7680∗∗∗   0.7758∗∗∗   0.7126∗∗∗   0.7053∗∗∗   0.7466∗∗∗
other_closer                  −0.0607∗    −0.0455•    −0.0443     −0.0534∗    −0.0408     −0.0254
other_higherUsNews            −0.3229∗    −0.2805•    −0.2362     −0.3316∗    −0.2432     −0.1471
other_offer                   −0.1173∗∗∗  −0.1248∗∗∗  −0.1257∗∗∗  −0.1190∗∗∗  −0.1100∗∗∗  −0.1212∗∗∗
prediction_official            2.6427∗∗∗   2.3059∗∗∗   2.3423∗∗∗   2.5211∗∗∗   2.2794∗∗∗   2.0916∗∗∗
other_unofficial              −0.0407•    −0.0447•    −0.0442•     0.0400•    −0.0326     −0.0405
other_official                −0.3198∗∗∗  −0.3386∗∗∗  −0.3566∗∗∗  −0.3470∗∗∗  −0.3251∗∗∗  −0.3257∗∗∗
prediction_firstOffer          0.6084∗∗∗   0.5631∗∗∗   0.5607∗∗∗   0.6250∗∗∗   0.5975∗∗∗   0.5578∗∗
prediction_lastOffer          −0.7238∗∗∗  −0.7458∗∗∗  −0.7455∗∗∗  −0.7337∗∗∗  −0.8779∗∗∗  −0.8868∗∗∗
prediction_lastCoach           0.4997•     0.4602      0.3409      0.4305      0.3664      0.2985
prediction_lastOfficial        1.3497∗∗∗   1.4265∗∗∗   1.4181∗∗∗   1.3205∗∗∗   1.5529∗∗∗   1.5433∗∗
coach_followers_prediction                 0.3365•
2016_followers_prediction                  0.4088∗∗∗
current_followers_prediction               0.3782•                                         0.2411
coach_followers_other                     −0.0509                                         −0.0355
2016_followers_other                      −0.0131
current_followers_other                   −0.2210∗                                        −0.1872•
coach_friends_prediction                               0.3868∗∗∗                           0.3258•
2016_friends_prediction                                0.4855•                             0.3987∗∗∗
current_friends_prediction                             0.1682
coach_friends_other                                   −0.0070
2016_friends_other                                    −0.0270∗                            −0.0193
current_friends_other                                 −0.0652
interaction_prediction                                             0.6156∗∗∗
interaction_other                                                 −0.2173
hashtag_prediction                                                             1.3986∗∗∗   1.2342∗∗∗
hashtag_other                                                                 −1.0366∗∗∗  −1.0106∗∗∗

• p < 0.1; ∗ p < 0.05; ∗∗ p < 0.01; ∗∗∗ p < 0.001.

Completing an official visit to the prediction school increases the odds of commitment by over 1,300% (e^{2.6427} = 14.05). The odds of selecting the prediction school decrease 27% and 4% for each official (e^{−0.3198} = 0.73) and unofficial (e^{−0.0407} = 0.96) visit to another school, respectively.

The Model 0 results also support the use of the availability heuristic in athletes’ commitment decision making. The odds of an athlete committing to a school are 286% higher when the prediction school is their most recent official visit (e^{1.3497} = 3.86), and odds of commitment increase by 65% when the last coach to visit the athlete is from the prediction school (e^{0.4997} = 1.65). While our models cannot prove causality, one possible explanation for these effects is that recency impacts an athlete’s evaluation of schools. Alternatively, it is possible that the athlete’s last official visit or last coach’s visit to the athlete coincides with the school that the athlete prefers and thus he ends his recruitment activities and commits. The odds of an athlete committing to the prediction school are 84% higher when it is the first to offer him a scholarship (e^{0.6084} = 1.84) and 52% lower if it is the last (e^{−0.7238} = 0.48). Anecdotal evidence suggests that athletes may attribute more emotional weight to the first offer, consistent with a vividness effect.

Model 1 adds features corresponding to the signals of interest communicated by colleges to recruits, specifically the number of Twitter users associated with the prediction school and other schools following the athlete in the month before commitment. As hypothesized, an increase in social media followers is associated with greater likelihood of commitment. An athlete’s odds of choosing the prediction school increase by 40%, 51%, and 46%, respectively, for each coach (e^{0.3365} = 1.40), recruit (e^{0.4088} = 1.51), and current athlete (e^{0.3782} = 1.46) from the prediction school following the athlete in the prior month. We also find that each current college athlete attending another school following the recruit in the previous month decreases the odds of selecting the prediction school by 20% (e^{−0.2210} = 0.80).

In Model 2, we consider how an athlete’s online social network out-links can provide insight into his college preferences. We find that following accounts associated with the prediction school in the previous month increases the odds of commitment by 47% for each coach (e^{0.3868} = 1.47) and 62% for each committed recruit (e^{0.4855} = 1.62). Conversely, an athlete’s odds of committing to the prediction school decrease by 3% for each new friend committed to another school (e^{−0.0270} = 0.97).

In Model 3, we use Twitter “interactions” (mentions, replies, quotes, and retweets posted by the recruit) as predictors of school choice. Interacting with the prediction school is associated with an 85% increase in the odds of selecting that school (e^{0.6156} = 1.85). This result suggests that athletes may be more likely to invest effort into building relationships with their preferred schools.

Model 4 investigates the use of text data in predicting school choice, specifically the hashtags used by an athlete in the month before commitment. Making a reference to the prediction school via hashtag is associated with a 305% increase in the odds of commitment (e^{1.3986} = 4.05), and making a reference to a competing school is associated with a 65% decrease in the odds of commitment (e^{−1.0366} = 0.35). These findings indicate that the content posted by athletes on social media may be interpreted as communicating their school preferences.

Model 5 adds all of the social media features tested in Models 1–4 to the off-line features of the baseline model. We apply Lasso regression again to correct for potential collinearity, resulting in a final model with 19 features. The odds of commitment decrease by 17% for each player from another school who is following the athlete (e^{−0.1872} = 0.83). While not causal, these results suggest that recruiting schools’ online behaviors may be correlated with athletes’ commitment decisions. We see that athletes’ out-links are strong predictors. For each coach and recruit from the prediction school that the athlete follows, the odds of commitment increase by 39% (e^{0.3258} = 1.39) and 49% (e^{0.3987} = 1.49), respectively. Hashtag content can also be interpreted as communicating athletes’ school preferences. Posting a hashtag associated with the prediction school increases the odds of commitment by 244% (e^{1.2342} = 3.44), and posting a hashtag associated with another offering school decreases the odds of commitment by 64% (e^{−1.0106} = 0.36). Overall, these findings demonstrate the utility of considering actions and connections in both online and off-line environments when predicting school choice decisions.

5.2. Predictive Performance

To evaluate the contribution of social media data to school choice predictions, we apply the models constructed with training data to hold-out test data. We compare the performance of the baseline model containing only off-line recruiting data (Model 0) with the five models incorporating social media features.

Evaluating each model based upon standard metrics for classifier performance (AUC, precision, recall, and F score), we see that incorporating social media features consistently adds value over the baseline model. As Figure 1 shows, Models 1–5 achieve 0%–8% improvement in AUC, 0%–3% improvement in precision, 0%–30% improvement in recall, and 0%–19% improvement in F score over the baseline model. These metrics suggest that the features related to network centrality (in-links and out-links) and content show the largest gains in performance over the baseline. Features tracking social media mentions, replies, retweets, and quotes display the smallest gains over the baseline. Furthermore, Model 5, which we construct from all possible factors, eliminates both of the social media interaction features via Lasso regression; this aggregate model outperforms all other models tested and achieves an AUC of 0.720.

We obtain similar results when evaluating the school choice predictions as a ranking problem. Figure 2 displays the NDCG@2 scores for each model, averaged over all players in the hold-out set. As with the previous test, models incorporating information from the athlete’s social media consistently outperform the baseline, with Models 1–5 showing gains of 0%–13% over the baseline NDCG score. Model 5 achieves the highest average NDCG score (0.755), indicating that combining features related to the estimated cost/benefit of attendance, comparisons to alternative options, athlete–school affinity, decision-making heuristics, and early indications of athletes’ preferences on social media can yield accurate school choice predictions.

Figure 1. Comparing the Predictive Performance of Different Models (AUC, Precision, Recall, F Score)

[Bar charts, one panel per metric, vertical axis from 0.0 to 1.0; bar values by model]

Metric      Model 0  Model 1  Model 2  Model 3  Model 4  Model 5
AUC         0.668    0.673    0.687    0.667    0.678    0.720
Precision   0.663    0.657    0.685    0.644    0.657    0.685
Recall      0.366    0.377    0.404    0.366    0.388    0.457
F score     0.472    0.479    0.509    0.467    0.488    0.561

Figure 2. Comparing the Ranking Performance of Different Models (NDCG@2)

[Bar chart, vertical axis from 0.0 to 1.0; bar values by model]

Metric      Model 0  Model 1  Model 2  Model 3  Model 4  Model 5
NDCG@2      0.695    0.727    0.722    0.709    0.727    0.755

6. Discussion and Conclusions

6.1. Applications to Recruiting Resource Allocation

In addition to analyzing the decision making of college football recruits, we seek to provide practical insights for recruiters to use when crafting recruiting strategies. Therefore, we produce a sample report that a coaching staff might consider when making decisions on how to allocate recruiting resources. In Table 4, we list the predicted probability of commitment for 10 high school athletes who received offers from the University of Iowa but remained uncommitted as of January 2016. Although this example describes the late-stage recruiting prospects of only one team, it shows our model’s potential to inform the recruiting strategies of college football programs.

Table 4. Top 10 Iowa Prospects (January 2016)

Name                   Star  Position          Hometown          Iowa (%)  Top prediction
Alaric Jackson         3     Offensive tackle  Detroit, MI       90        Iowa
Matt Farniok           4     Offensive tackle  Sioux Falls, SD   59        Iowa
Obi Obialo             3     Wide receiver     Coppell, TX       29        Iowa
Tyler Johnson          3     Quarterback       Minneapolis, MN   22        Minnesota (48%)
Kene Nwangwu           3     Running back      Frisco, TX        14        Iowa State (39%)
Tyquan Statham         3     Athlete           Oakwood, GA       8         Cincinnati (63%)
K. J. Gray             3     Athlete           Jersey City, NJ   6         Rutgers (95%)
Izon Pulley            3     Defensive end     Olney, MD         5         Illinois (36%)
Terrance Landers, Jr.  3     Wide receiver     Dayton, OH        4         Purdue (50%)
Jerrion Nelson         3     Defensive end     Columbia, MO      3         Syracuse (70%)

Applying Model 5 (which combines both recruiting data and features from social media), we estimate that Alaric Jackson had a 90% chance of selecting Iowa. He had made an official visit and received signals of interest from Iowa (two coaches and five committed recruits followed him on Twitter). Additionally, he showed his preference for Iowa by following two coaches and six committed recruits and by posting six Iowa hashtags. According to the estimated probability of commitment, Jackson’s second most likely choice was Eastern Michigan. Although Eastern Michigan is located in his home state, we estimate that Jackson had only a 12% chance of selecting it, partly because he had not made an official visit and had no Twitter connections to the school. Other than Nebraska (9%) and Michigan State (6%), our model predicted that Jackson had less than a 5% likelihood of choosing any of the other schools that had offered a scholarship. A recruiter, informed by Jackson’s high predicted probability of selecting Iowa and the disparity between Iowa and his other options, could safely consider him a strong prospect.

According to our predictions, Iowa’s next most likely commit was Matt Farniok. He had made an official visit and was followed by two coaches and two committed recruits from Iowa. However, he did not follow any Iowa accounts and did not post Iowa hashtags. Unlike Jackson, Farniok’s predicted probability of committing to his other options was not insignificant. For instance, our model predicts that he had a 36% chance of committing to Michigan State and a 31% chance of committing to Nebraska. Farniok had made official visits and had followed coaches and committed recruits from both schools. Based on these predictions, a recruiter would likely consider Jackson a better bet than Farniok. As both recruits are offensive tackles, the coaching staff might logically decide to prioritize Jackson during the final weeks of recruitment.

Our model can also help coaches to determine which prospects are most likely to select a competitor. For example, we predict K. J. Gray to have a 6% chance of choosing Iowa and a 95% chance of committing to Rutgers. While Gray was followed by two Iowa recruits on Twitter (whom he followed back), he was also followed by three Rutgers recruits and two current athletes (whom he followed back). Furthermore, Rutgers is located in Gray’s home state of New Jersey. Given these results, the Iowa football program could realistically assume that expending additional time and effort recruiting Gray would be unlikely to pay off in a commitment.

The personnel needs of a college football program can change quickly in the final weeks of recruitment, especially in cases where a previously committed athlete decommits, compelling coaches to revisit their options. Our model may prove useful for coaches attempting to identify and recruit athletes without a strong attachment to any school. While Model 5 estimates Kene Nwangwu’s probability of selecting Iowa at 14%, his most likely commitment school, Iowa State, has a predicted probability of only 39%. If Iowa’s football team were in need of a running back late in their recruitment process, Nwangwu might present a good opportunity to sway an uncommitted athlete toward their school.
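The reasoning applied to Jackson, Gray, and Nwangwu can be expressed as a simple triage rule over the report. The sketch below is illustrative only: the labels, the function name `classify_prospect`, and the 50% cutoff for “no strong attachment” are hypothetical choices made here, not rules proposed in the paper. The three sample rows are taken from Table 4.

```python
def classify_prospect(top_school, top_prob, target="Iowa", open_threshold=0.5):
    """Label a prospect from the target school's point of view.

    open_threshold is a hypothetical cutoff, not a value from the study.
    """
    if top_school == target:
        return "leaning to target"
    if top_prob < open_threshold:
        return "no strong attachment"
    return "leaning to competitor"


# (name, Iowa probability, top predicted school, its probability), from Table 4.
prospects = [
    ("Kene Nwangwu", 0.14, "Iowa State", 0.39),
    ("Alaric Jackson", 0.90, "Iowa", 0.90),
    ("K. J. Gray", 0.06, "Rutgers", 0.95),
]

# List prospects from most to least likely to choose the target school.
for name, iowa_prob, top_school, top_prob in sorted(
    prospects, key=lambda row: row[1], reverse=True
):
    label = classify_prospect(top_school, top_prob)
    print(f"{name:<16} Iowa {iowa_prob:.0%}  top: {top_school} ({top_prob:.0%})  {label}")
```

A staff using such a rule would still need to weigh positional needs and roster changes, which the predicted probabilities alone do not capture.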

6.2. Limitations

We note some potential limitations of this work. First, we limit our analysis to athletes with public Twitter profiles, and it is possible that predicting school choice for athletes without a presence on social media may yield different results. However, we contend that excluding athletes without public social media profiles does not introduce bias because there is no evidence of significant differences between athletes with and without social media. Chi-squared tests fail to reject the null hypothesis of independence between possession of a public Twitter account and star rating (p = 0.9912) as well as the number of offers received (p = 0.3940). Second, we use only one year of recruiting data and six months of social media data. Therefore, these results may not be generalizable to other recruiting classes. Finally, as this is not an experimental study, we cannot infer causality from our results. Our tests indicate
that features derived from the social media profiles of American college football recruits may be useful for predicting to which school an athlete will commit. However, while our models are consistent with existing theories, we cannot state that social media features cause commitments.

6.3. Conclusion

This work represents a novel contribution to the literature examining the intersection of social networks and individual decision making. In this research, we analyze the school choice decisions of college football recruits. In addition to identifying factors influencing commitment decisions, we consider how the behaviors and actions of athletes on social media can be interpreted as early signs of school preference and inform the recruiting strategies of college coaches. This predictive approach presents a unique addition to the decision analysis literature.

While athletic commitments receive a great deal of attention in the mass media, this is the first study incorporating social media data into an empirical predictive model. Our results demonstrate that social media features consistently add value to predictive models. We compare features related to different aspects of athletes’ social networks and conclude that coaches looking for early signs of athletes’ school choice preferences should focus on connections and content posted on social media. While our results are not causal, our tests also suggest that schools’ behaviors on social media may be related to an athlete’s probability of commitment.

Over all tests performed, the combined model with both recruiting features and social media features (Model 5) is the highest performer, with an AUC of 0.720 (7% improvement over the baseline) and an F score of 0.561 (18% improvement). This result suggests that incorporating information about behaviors in both off-line and online environments can benefit school choice models. Among the four types of social media features we examine, social media interactions contribute the least to predictive performance.

We expand on prior research predicting school choice (Dumond et al. 2008, Mirabile and Witte 2015), but our study is the first of its kind utilizing social media data. Both previous studies use predictive accuracy to evaluate their models, with Dumond et al. (2008) achieving 71% accuracy in predicting the school choices of the top 100 recruits in the class of 2005 and Mirabile and Witte (2015) achieving 65% accuracy over 19,815 recruits in 10 recruiting classes. Although we use different metrics and cannot directly compare results, 84% of athletes in our test data committed to the first- or second-ranked school by predicted probability.

Furthermore, our work makes a unique contribution to the recruiting literature by considering decision-making heuristics. Previous work on both job choice and school choice has focused on applying rational decision models, but our results demonstrate that a bounded rationality approach may be more appropriate. We find that features tracking the sequence of recruiting events produce predictions consistent with the availability heuristic. The addition of these features based on event sequence to the baseline yields a 4% increase in AUC over the same model constructed without sequence features. We believe that this finding may be generalized to other recruiting contexts where optimal decision making is impacted by constraints on time, information, and cognitive resources.

There are several interesting directions for future research. While we focus specifically on predicting where an athlete will commit rather than when, forecasting the timing of commitments can be further explored. Additionally, decision processes almost certainly differ from recruit to recruit. Mirabile and Witte (2015) look at the impact of athletic ability on school choice, and Popp et al. (2011) compare the school choices of international and domestic student-athletes. Future research could analyze differences in school choice by timing (early commitments versus late commitments) or player position. Our study also provides preliminary evidence for the value of text data in predicting school choice. Deeper analysis of social media posts, such as topic models and sentiment analysis, presents a possible extension of this work. Overall, this research represents a promising step in analyzing and predicting school choice in college football and has implications for the use of social media predictors in other recruitment contexts.

Endnote

1 See “2016 football recruits search.” Accessed June 1, 2016, https://247sports.com/Season/2016-Football/Recruits.

ReferencesBarden JQ, Bluhm DJ, Mitchell TR, Lee TW (2013) Hometown prox-

imity, coaching change, and the success of college basketballrecruits. J. Sport Management 27(3):230–246.

Bond RM, Fariss CJ, Jones JJ, Kramer ADI, Marlow C, Settle JE,Fowler JH (2012) A 61-million-person experiment in social influ-ence and political mobilization. Nature 489:295–298.

Chapman DS, Uggerslev KL, Carroll SA, Piasentin KA, Jones DA(2005) Applicant attraction to organizations and job choice:A meta-analytic review of the correlates of recruiting outcomes.J. Appl. Psych. 90(5):928–944.

Croft C (2008) Factors influencing Big 12 Conference college bas-ketball male student-athletes’ selection of a university. Unpub-lished doctoral dissertation, University of Texas, El Paso.

DavenportR (2015)The recruitingguy:UnderArmourAll-Americanssurvey. Arkansas Democrat-Gazette (December 29), http://www.arkansasonline.com/news/2015/dec/29/under-armour-all-american-survey/.

Dees RA, Nestler ST, Kewley R (2013) WholeSoldier performanceappraisal to support mentoring and personnel decisions. Deci-sion Anal. 10(1):82–97.

Doyle CA, Gaeth GJ (1990) Assessing the institutional choice processof student-athletes. Res. Quart. Exercise Sport 61(1):85–92.

Dumond JM, Lynch AK, Platania J (2008) An economic model of thecollege football recruiting process. J. Sport Econom. 9(1):67–87.

Elliott B (2015) 8 pieces of advice from parents of star college footballrecruits. SB Nation (August 4), http://www.sbnation.com/college-football-recruiting/2015/8/4/9060917/ncaa-football-recruit-parents.

Elliott B, Kirshner A (2016) Players like it, but the NCAA’s new socialmedia rule could be chaos for coaches. SB Nation (August 1),http://www.sbnation.com/college-football-recruiting/2016/4/14/11429908/coaches-recruits-twitter-facebook-social-media-rules.

Granovetter M (1973) The strength of weak ties. Amer. J. Sociol. 78(6):1360–1380.

Highhouse S, Dalal RS, Salas E, eds. (2014) Judgment and DecisionMaking at Work (Routledge, New York).

Hogarth RM, Karelaia N (2006) Regions of rationality: Maps forbounded agents. Decision Anal. 3(3):124–144.

Jowett S, Timson-Katchis M (2005) Social networks in sport: Parentalinfluence on the coach-athlete relationship. Sport Psych. 19:267–287.

Keeney RL (1992) Value-Focused Thinking: A Path to Creative Decision-making (Harvard University Press, Cambridge, MA).

Lujan C (2010) How well informed are high school student-athletes about post-secondary options? Master’s thesis, University of Arizona, Tucson.

Manning CD, Raghavan P, Schutze H (2009) An Introduction to Information Retrieval (Cambridge University Press, Cambridge, UK).

Mirabile MP, Witte MD (2015) A discrete-choice model of college football recruit’s program selection decision. J. Sport Econom. 18(3):211–238.

Myerberg P (2015) The best quarterback recruits become the best recruiters. USA Today (July 10), http://www.usatoday.com/story/sports/ncaaf/2015/07/09/college-football-recruiting-quarterbacks-dwayne-haskins-coaches/29920671/.

National Collegiate Athletic Association (NCAA) (2015) 2015–2016 NCAA Division I Manual (NCAA, Indianapolis).

National Collegiate Athletic Association (NCAA) (2016) Legislative services database. Accessed June 1, 2016, https://web1.ncaa.org/LSDBi/exec/miSearch.

Popp N, Pierce D, Hums MA (2011) Comparison of the college selection process for international and domestic student-athletes at NCAA Division I universities. Sport Management Rev. 14(2):176–187.

Prunty B (2014) Colleges often entice top prospects by recruiting their mentors, too. New York Times (December 28), http://www.nytimes.com/2014/12/29/sports/ncaabasketball/colleges-often-entice-top-prospects-by-recruiting-their-mentors-too.html.

Sander L (2008) For college athletes, recruiting is a fair (but flawed) game. Chron. Higher Ed. 55(17):A1.

SB Nation College News (2014) The regular season’s last football rankings: Playoff, polls, and computers. SB Nation (December 7), http://www.sbnation.com/college-football/2014/12/7/7347525/college-football-rankings-top-25-playoff-ap-coaches.

Sherman M (2012) Balancing the recruiting budget. ESPN (June 12), http://www.espn.com/college-sports/recruiting/football/story/_/id/8041461/the-cost-recruiting.

Smith C (2013) College football’s most valuable teams 2013: Texas Longhorns can’t be stopped. Forbes (December 18), http://www.forbes.com/sites/chrissmith/2013/12/18/college-footballs-most-valuable-teams-2013-texas-longhorns-cant-be-stopped/.

Spence AM (1973) Job market signaling. Quart. J. Econom. 87(3):355–374.

Steinberg L, Cauffman E (1996) Maturity of judgment in adolescence: Psychosocial factors in adolescent decision making. Law Human Behav. 20(3):249–272.

Tibshirani R (1996) Regression shrinkage and selection via the Lasso. J. Royal Statist. Soc. Ser. B (Methodological) 58(1):267–288.

Trusov M, Bucklin RE, Pauwels K (2009) Effects of word-of-mouth versus traditional marketing: Findings from an Internet social networking site. J. Marketing 73(5):90–102.

Tversky A, Kahneman D (1974) Judgment under uncertainty: Heuristics and biases. Science 185(4157):1124–1131.

Twitter (2015) REST APIs. Accessed June 1, 2016, https://dev.twitter.com/rest/public.

U.S. News and World Report (2014) Best Colleges (U.S. News and World Report, Washington, DC).

Zhao K, Wang X, Cha S, Cohn AM, Papandonatos GD, Amato MS, Pearson JL, Graham AL (2016) A multirelational social network analysis of an online health community for smoking cessation. J. Med. Internet Res. 18(8):e233.

Kristina Gavin Bigsby is a Ph.D. candidate in information science at the University of Iowa. Her research focuses on the intersections of social network analysis and individual decision making, especially in organizational contexts.

Jeffrey W. Ohlmann is associate professor of management sciences and Huneke Research Fellow in the Tippie College of Business at the University of Iowa. He earned his Ph.D. in industrial and operations engineering from the University of Michigan. Professor Ohlmann’s research interests in the modeling and solution of decision-making problems span applications in sports analytics, transportation and logistics, and agriculture.

Kang Zhao is assistant professor at the Tippie College of Business at the University of Iowa. He obtained his Ph.D. from Penn State University. His current research focuses on data science and social computing, especially the analysis, modeling, mining, and simulation of social/business networks and social media. His research has been featured in public media of more than 20 countries. He served as the chair for the INFORMS Artificial Intelligence Section 2014–2016.