STUDYING CHURNING FACTORS IN INDIAN TELECOMMUNICATION SECTOR USING SOCIAL MEDIA ANALYTICS
Nitish Varshey, S. K. Gupta
Indian Institute of Technology Delhi, India

DaWaK24aug


Mining intrinsic knowledge in telecommunication sector using social media analytics

Outline
- Telecommunication Sector of India
- Goal
- Related Work
- Methodology
- Results and Analysis
- Conclusion and Future Work

Telecommunication Sector of India
Grown market
- Second largest in subscriber base [1]
- 899.86 million subscribers [2]
- 13 mobile carriers, most of them with a pan-India presence [1]

Situation depicted: the market has saturated and telecom service providers are stable.

Main business drivers (Hung et al. [3])
- Retention of the customer subscriber base
- Increase in average revenue per user

TRAI received 109 million MNP (Mobile Number Portability) requests within a span of 2 years [2].

The Indian telecommunication network is the second largest in the world in terms of total number of users (both fixed and mobile phone). On 30th September 2013, the country's telecom subscriber base stood at 899.86 million. This depicts a condition where the market has matured to the point of saturation and telecom service providers are stable.

Goal
- To identify factors pertaining to churn which may help decision makers improve operations in terms of their marketing strategy
- To establish a relation between churn in the telecommunication sector of India and the sentiments present in social media feeds

Field: Telecommunication sector
- Churn is particularly high in telecom (Accenture 2011 Global Consumer Survey [4])
- Data availability (TRAI Telecom Subscription Data Monthly Report)

The Accenture 2011 Global Consumer Survey covered 10+ service-based industries in 20+ countries.

Related Work: Data Mining Techniques
- Predict customers getting ready to switch, understand why, and connect with them to provide offers that mitigate the change [5,6,7,8]
- Data corpus: actual customer transactions and billing data
- Churn-pertaining attributes identified: geographic location, account length, call length, etc.
- Work done:
  - Patterns followed by churning customers:
    - If I am calling customer care more than Y times, then I will churn
    - If I am calling more than X minutes and someone else offers me a better plan, I will churn
    - If my in-net call duration is low, I will churn
  - Potential value of a customer

The data corpus is essentially in-house data with details of all calls and messages a customer has made. Fields available for analysis include voice plan, call lengths and usage pattern. Once potential churners are identified, service-based offers are proposed to them on the assumption that such offers prompt the user not to churn. These approaches implement techniques such as support vector machines (SVM), regression and decision trees.

Related Work: Survey Based Techniques
- Based on consumer survey data [9,10,11,12]
- Avoids use of proprietary customer data
- Major goal: finding customer loyalty and intention to recommend a service provider to other customers
- Socio-economic factors can also be examined

Socio-economic factors can also be mined using such an approach. However, the survey data may not fully represent the customers' actual future continued-patronage decision. Analysis done using such a methodology is mostly uneconomical and time-consuming.

Methodology: Social Media Analysis
- People share opinions on different aspects of life every day, including the telecommunication services they are using

- 27% of customers complain on social media (InformationWeek's survey of 392 business tech pros at companies using one or more internal social networking systems, October 2011)
- Opinions available on social media can be a valuable online source for mining the reasons for customers' dissatisfaction, which most likely lead to customer churn

I/P
- Data source: Twitter
- TOPSY Otter API used for fetching tweets

Selection
- Corpus creation: telecom-specific tweets, queried by service provider within a time range
- Data collected over a span of 9 months (1 Aug, 2012 to 30 Apr, 2013)
- Queried for 3 service providers: BSNL, Aircel, Tata (Indicom and Docomo)

Data Cleansing
- Non-relevant tweets removal
- Spell corrector
  - Google's spell corrector
  - Net lingo: lv, gr8
  - Emoticons

Non-relevant tweets: for example, BSNL also works in the broadband area; all such tweets need to be filtered out.

[Flowchart: spell corrector. Input: word. Decision nodes (Y/N): is it an emoticon?, net-lingo dictionary, language dictionary. Steps: removal of any character that repeats more than twice (e.g. coooool), Norvig's toy spell corrector. Output: processed word.]

Preprocessing
- Stemming of each token
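Before stemming, the word-level cleansing flow described above can be sketched roughly as follows. This is only an illustration: the three small dictionaries are hypothetical stand-ins for the real emoticon, net-lingo and language resources, the order of the checks is an assumption (the flowchart order is not fully recoverable from the slide), and `spell_correct` is a toy placeholder that tries single-character deletions only; it is not Norvig's corrector or Google's.

```python
import re

# Hypothetical, tiny stand-ins for the real resources named on the slide.
EMOTICONS = {":)", ":(", ":D", ":-)", ":-("}
NETLINGO = {"lv": "love", "gr8": "great", "u": "you", "ur": "your"}
DICTIONARY = {"cool", "love", "great", "network", "service", "you", "your"}


def squeeze_repeats(word):
    """Collapse any character repeated more than twice down to two."""
    return re.sub(r"(.)\1{2,}", r"\1\1", word)


def spell_correct(word, dictionary=DICTIONARY):
    """Toy corrector (assumption): try deleting one character at a time."""
    for i in range(len(word)):
        candidate = word[:i] + word[i + 1:]
        if candidate in dictionary:
            return candidate
    return word  # give up and keep the word unchanged


def clean_token(token):
    """Apply the checks in an assumed order and return the processed word."""
    if token in EMOTICONS:          # emoticons are kept as they are
        return token
    word = token.lower()
    if word in NETLINGO:            # expand net lingo such as "gr8"
        return NETLINGO[word]
    word = squeeze_repeats(word)    # "coooool" -> "cool"
    if word in DICTIONARY:          # already a known word
        return word
    return spell_correct(word)      # last resort: spell correction
```

For instance, `clean_token("coooool")` first squeezes the repeated letters and then finds the result in the dictionary, while `clean_token("gr8")` is resolved by the net-lingo lookup.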

Stemming is the process of converting a word to its stem form.

Transformation
- Relational DB tuples
- 86K tuples retrieved for 3 telecom players
- Manually annotated text

Classification
- E.g. tweet: A B C D E

There may be context-dependent bigrams whose opinion differs from that of the modifying opinion word, e.g. "dry diaper" (+ve) vs "dry textbook" (-ve). The existence of a word does not by itself mean positive or negative; it depends on subjectivity. That is why we do aspect-based text classification.

- N-gram based text categorization [5,6]
- Feature-based sentiment analysis [8]

Feature-based sentiment analysis
- Aspects act as object features
- Object features: Price, Service, Satisfaction, Miscellaneous
- Feature indicators:
  - Price: tariff / costing / rate
  - Service: network, internet, recharge, interrupt, messaging, billing
  - Satisfaction: (change|switch)*(to|from)*service_provider
  - Miscellaneous: waive off, bailout, discount, implement
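One way to read the feature indicators above is as keyword patterns applied to each cleaned tweet. The sketch below is an assumption-laden illustration, not the authors' classifier: the provider list, the function name `tag_aspects`, and in particular the regular-expression reading of the Satisfaction pattern (a change/switch word, later "to" or "from", later a provider name) are all hypothetical choices.

```python
import re

# Assumed provider names; the slides query BSNL, Aircel and Tata.
PROVIDERS = ["bsnl", "aircel", "tata"]
PROVIDER_ALT = "|".join(PROVIDERS)

# Indicators are matched as word prefixes because tokens are stemmed.
ASPECT_PATTERNS = {
    "price": re.compile(r"\b(tariff|costing|rate)"),
    "service": re.compile(
        r"\b(network|internet|recharge|interrupt|messaging|billing)"
    ),
    # One possible interpretation of (change|switch)*(to|from)*service_provider
    "satisfaction": re.compile(
        r"\b(change|switch)\w*\b.*?\b(to|from)\b.*?\b(%s)\b" % PROVIDER_ALT
    ),
    "miscellaneous": re.compile(r"\b(waive off|bailout|discount|implement)"),
}


def tag_aspects(tweet):
    """Return the sorted list of aspects whose indicators occur in the tweet."""
    text = tweet.lower()
    return sorted(
        name for name, pattern in ASPECT_PATTERNS.items() if pattern.search(text)
    )
```

A tweet can carry more than one aspect, which is why the function returns a list; that matches rules such as ('miscellaneous', 'service') in the results.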

N-gram based text categorization cannot classify text as positive / negative on its own; classification depends on subjectivity (dry diaper vs dry textbook).

Methodology pipeline
- I/P: data source Twitter
- Selection: telecom-specific tweets (BSNL, Aircel, Tata); data collected over a span of 9 months (1 Aug, 2012 to 30 Apr, 2013)
- Data cleansing: non-relevant tweets removal, spell corrector
- Preprocessing: stemming of each token, stop words removal
- Transformation: relational database; around 86K tweets retrieved
- Classification: e.g. tweet: A B C D E
- ARM for the most pertinent churning factor

Association Rule Mining
- Finds correlations among different attributes (features) in a dataset
- Correlation between aspects and sentiments
- Strength of correlation among different features is measured in terms of the interestingness of a rule
- Support-confidence is used for generating interesting rules
- Constraint-based association rules are mined
  - X -> Y, X: Sentiment, Y: Aspect
  - E.g. Service -> Negative
- Most frequent rules are mined: top-K association rules, with K=3
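The support-confidence computation behind such rules can be sketched as below. Each tweet is treated as a set of items (its aspects plus its sentiment label). The minimum support value, the restriction to the aspect ==> sentiment direction, the limit of two aspects per rule, and ranking by confidence are assumptions made for illustration; the slides only state that constraint-based rules are mined and the top K=3 are kept.

```python
from itertools import combinations

SENTIMENTS = {"positive", "negative"}


def mine_rules(transactions, min_support=0.1, k=3):
    """Return the top-k (aspects, sentiment, support, confidence) rules."""
    n = len(transactions)

    def support(items):
        # Fraction of transactions that contain every item in `items`.
        return sum(1 for t in transactions if items <= t) / n

    aspects = sorted({item for t in transactions for item in t} - SENTIMENTS)
    rules = []
    for size in (1, 2):  # antecedents of one or two aspects (assumption)
        for combo in combinations(aspects, size):
            lhs = frozenset(combo)
            lhs_support = support(lhs)
            if lhs_support == 0:
                continue
            for sentiment in sorted(SENTIMENTS):
                both = support(lhs | {sentiment})
                if both >= min_support:
                    confidence = both / lhs_support
                    rules.append(
                        (combo, (sentiment,), round(both, 3), round(confidence, 3))
                    )
    rules.sort(key=lambda rule: rule[3], reverse=True)  # rank by confidence
    return rules[:k]


# Hypothetical annotated tweets, already reduced to aspect and sentiment labels.
tweets = [
    {"service", "negative"},
    {"service", "negative"},
    {"service", "negative"},
    {"service", "positive"},
    {"miscellaneous", "positive"},
    {"miscellaneous", "positive"},
    {"miscellaneous", "positive"},
    {"miscellaneous", "negative"},
    {"price", "positive"},
    {"miscellaneous", "service", "positive"},
]
top_rules = mine_rules(tweets, min_support=0.2)
```

Here support is the share of all tweets containing both sides of the rule, and confidence is that share divided by the support of the left-hand side alone. Rules in the other direction (sentiment ==> aspect) would be computed the same way with the roles swapped.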

Results and Analysis
- The strongly positive / negative term list provided by Bing Liu [13] is domain independent
- It needs to be modified for the telecommunication sector
- Domain-dependent strongly positive / negative words:
  - highspeed, lightning, slashes
  - withdrawn, flop, shitty
- Domain-independent words that may not be strongly positive / negative for the telecommunication sector:
  - pretty, right, tough
  - cheap, interruptions, problem
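A minimal sketch of how such a lexicon adjustment might look is given below. The function names and the counting-based `polarity` scorer are hypothetical; the word sets passed in stand for a general-purpose list such as the one in [13]. Dropping the weak words entirely is an assumption made to keep the example short, and down-weighting them would be another option.

```python
# Domain-specific words taken from the slide above.
DOMAIN_POSITIVE = {"highspeed", "lightning", "slashes"}
DOMAIN_NEGATIVE = {"withdrawn", "flop", "shitty"}
# General words the slide lists as not strongly polar for telecom.
DOMAIN_WEAK = {"pretty", "right", "tough", "cheap", "interruptions", "problem"}


def adapt_lexicon(positive, negative):
    """Add telecom-specific words and drop the weak ones (assumption)."""
    adapted_positive = (set(positive) | DOMAIN_POSITIVE) - DOMAIN_WEAK
    adapted_negative = (set(negative) | DOMAIN_NEGATIVE) - DOMAIN_WEAK
    return adapted_positive, adapted_negative


def polarity(tweet, positive, negative):
    """Illustrative scorer: count lexicon hits and compare the totals."""
    tokens = tweet.lower().split()
    score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

With a general list in which "pretty" counts as positive, a tweet such as "i am pretty sure bsnl is a flop" would score as neutral or positive before adaptation and as negative afterwards.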

Multiple service provider issue: inclusion of multiple service providers in a tweet presents semantic issues (79.5%)
Examples:
- planning to move from BSNL to aircel
- idea and bsnl both are shit i am pretty sure about this need to shift to virgin
- bsnl services are much better than idea dont get an idea

A double negative is generally positive; in tweets, however, it represents negative sentiment (our experiment shows 86%)
Examples:
- bsnl internet is dead today no internet
- aircel i hate u no wifi since morning
- bsnl suck no line never work uncouth staff
- @bsnlportal you r not responding to ur mistake by deducting my talktime 4 bsnl-tune how long itll take
- from 27th dec my bsnl data card 9408914869 is not working however the balance is 3g 8gb and main balance is 400/- is gong to expire after 03/01/2013

Experiment 1 for Aircel Service Provider
How to read: each rule is listed as Rule, Support, Confidence (performance measure: support-confidence). Total customers and change in no. of customers are from reports published by TRAI.

Aug, 2012 (total customers: 65952244, change in no. of customers: 793717)
  ('miscellaneous',) ==> ('positive',), 0.311, 0.915
  ('service',) ==> ('positive',), 0.211, 0.633
Sep, 2012 (total customers: 66607361, change in no. of customers: 655117)
  ('miscellaneous',) ==> ('positive',), 0.284, 0.914
  ('miscellaneous', 'service') ==> ('positive',), 0.100, 0.856
  ('service',) ==> ('positive',), 0.207, 0.635
Oct, 2012 (total customers: 66786295, change in no. of customers: 178934)
  ('miscellaneous',) ==> ('positive',), 0.261, 0.917
  ('service',) ==> ('positive',), 0.206, 0.625
  ('service',) ==> ('negative',), 0.124, 0.375
Nov, 2012 (total customers: 65323317, change in no. of customers: -1462978)
  ('miscellaneous',) ==> ('positive',), 0.185, 0.921
  ('service',) ==> ('negative',), 0.129, 0.445
Dec, 2012 (total customers: 63347284, change in no. of customers: -1976033)
  ('miscellaneous',) ==> ('positive',), 0.302, 0.942
  ('service',) ==> ('positive',), 0.249, 0.609
Jan, 2013 (total customers: 61571291, change in no. of customers: -1775993)
  ('miscellaneous',) ==> ('positive',), 0.380, 0.948
  ('service',) ==> ('positive',), 0.306, 0.713
  ('negative',) ==> ('service',), 0.123, 0.476
Feb, 2013 (total customers: 60872785, change in no. of customers: -698506)
  ('miscellaneous',) ==> ('positive',), 0.352, 0.931
  ('negative',) ==> ('service',), 0.210, 0.580
  ('satisfaction',) ==> ('negative',), 0.155, 0.740
Mar, 2013 (total customers: 60071967, change in no. of customers: -800818)
  ('miscellaneous',) ==> ('positive',), 0.267, 0.949
  ('negative',) ==> ('service',), 0.109, 0.362
  ('satisfaction',) ==> ('positive',), 0.104, 0.730
Apr, 2013 (total customers: 60080216, change in no. of customers: 8249)
  ('miscellaneous',) ==> ('positive',), 0.448, 0.961
  ('positive',) ==> ('service',), 0.225, 0.342

Experiment 1 for BSNL Service Provider
How to read: each rule is listed as Rule, Support, Confidence (performance measure: support-confidence). Total customers and change in no. of customers are from reports published by TRAI.

Aug, 2012 (total customers: 99240191, change in no. of customers: 491362)
  ('miscellaneous', 'service') ==> ('positive',), 0.109, 0.830
  ('miscellaneous',) ==> ('positive',), 0.240, 0.827
  ('service',) ==> ('positive',), 0.259, 0.582
Sep, 2012 (total customers: 99633207, change in no. of customers: 393016)
  ('miscellaneous', 'service') ==> ('positive',), 0.114, 0.846
  ('miscellaneous',) ==> ('positive',), 0.210, 0.842
  ('service',) ==> ('positive',), 0.247, 0.595
Oct, 2012 (total customers: 99990355, change in no. of customers: 357148)
  ('miscellaneous',) ==> ('positive',), 0.227, 0.804
  ('service',) ==> ('negative',), 0.188, 0.502
  ('service',) ==> ('positive',), 0.187, 0.498
Nov, 2012 (total customers: 99909334, change in no. of customers: -81021)
  ('miscellaneous',) ==> ('positive',), 0.166, 0.733
  ('satisfaction',) ==> ('negative',), 0.160, 0.568
  ('service',) ==> ('positive',), 0.232, 0.555
Dec, 2012 (total customers: 99922347, change in no. of customers: 13013)
  ('miscellaneous',) ==> ('negative',), 0.306, 0.763
  ('service',) ==> ('positive',), 0.207, 0.589
  ('service',) ==> ('negative',), 0.145, 0.411
Jan, 2013 (total customers: 100240893, change in no. of customers: 318546)
  ('miscellaneous',) ==> ('positive',), 0.141, 0.698
  ('satisfaction',) ==> ('positive',), 0.125, 0.591
  ('service',) ==> ('positive',), 0.247, 0.541
Feb, 2013 (total customers: 100670567, change in no. of customers: 429674)
  ('miscellaneous',) ==> ('positive',), 0.111, 0.694
  ('service',) ==> ('negative',), 0.327, 0.627
  ('satisfaction',) ==> ('negative',), 0.103, 0.610
Mar, 2013 (total customers: 101206625, change in no. of customers: 536058)
  ('miscellaneous',) ==> ('positive',), 0.109, 0.623
  ('satisfaction',) ==> ('positive',), 0.105, 0.618
Apr, 2013 (total customers: 98970584, change in no. of customers: -2236041)
  ('miscellaneous', 'service') ==> ('positive',), 0.180, 0.928
  ('miscellaneous',) ==> ('positive',), 0.235, 0.904

Conclusion
- Whenever negative sentiment is present in the top rules, it is highly probable that the total number of customers for the service provider will decrease.
- The service provider needs to work on the aspects indicated in the rules containing negative sentiment so that customer churn can be controlled.

Future Work
- Tweaking the parameters used for association rule mining
- Circle-specific issues are categorically disregarded because the geographic location of feeds is not available. Possible ways to obtain it:
  - Based on GPS location: tagging location using smart phones
  - Inferring relations between tweets: people who live in Britain are more likely to post about the Royal Wedding in Britain
  - Direct inference: "Moving to Singapore"
- The work is based on the assumption that all tweets are genuine; feeds that are not genuine need to be identified and removed

References
[1] India needs umbrella body on telecom standards. Article published in The Economic Times, 16 August 2012.
[2] TRAI, telecom subscription data monthly report taken on 30th September, Press Release No. 78/2013.
[3] Shin-Yuan Hung, David C. Yen, Hsiu-Yu Wang. Applying data mining to telecom churn management. Expert Systems with Applications, 31 (2006), 515-524.
[4] From CRM to Social, Accenture 2011 Global Consumer Survey, Feb 2012, InformationWeek.
[5] Mozer, M. C., Wolniewicz, R., Grimes, D. B., Johnson, E., & Kaushansky, H. (2000). Predicting subscriber dissatisfaction and improving retention in the wireless telecommunications industry. IEEE Transactions on Neural Networks, 11(3), 690-696.
[6] Ng, K., & Liu, H. (2000). Customer retention via data mining. Artificial Intelligence Review, 14(6), 569-590.
[7] Wei, C. P., & Chiu, I. T. (2002). Turning telecommunications call details to churn prediction: A data mining approach. Expert Systems with Applications, 23(2), 103-112.
[8] Drew, J. H., Mani, D. R., Betz, A. L., & Datta, P. (2001). Targeting customers with statistical and data-mining techniques. Journal of Service Research, 3(3), 205-219.
[9] Weerahandi, S., & Moitra, S. (1995). Using survey data to predict adoption and switching for services. Journal of Marketing Research, 32(1), 85-96.
[10] Bolton, R. N., Kannan, P. K., & Bramlett, M. D. (2000). Implications of loyalty program membership and service experiences for customer retention and value. Journal of the Academy of Marketing Science, 28(1), 95-108.
[11] Gerpott, T., Rams, W., & Schindler, A. (2001). Customer retention, loyalty and satisfaction in the German mobile cellular telecommunications market. Telecommunications Policy, 25(4), 249-269.
[12] Kim, H. S., & Yoon, C. H. (2004). Determinants of subscriber churn and customer loyalty in the Korean mobile telephony market. Telecommunications Policy, 28(9/10), 751-765.
[13] Liu, Bing. "Sentiment analysis and subjectivity." Handbook of natural language processing 2 (2010): 627-666.
[14] Awadallah, Rawia, Maya Ramanath, and Gerhard Weikum. "Language-model-based pro/con classification of political text." Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval. ACM, 2010.
[15] Cavnar, William B., and John M. Trenkle. "N-gram-based text categorization." Ann Arbor MI 48113.2 (1994): 161-175.

Questions?