CSE 534 Final Project Internet Outage Analysis Name: Guanyu Zhu, Wei-Ting Lin, Zhaowei Sun Professor: Phillipa Gill


Motivation / Goal
Motivation: (1) Network outages can have societal and economic impact. (2) Knowing the causes of network outages is always desirable.

Goal: (1) Find out which types of outages occur most commonly. (2) Predict the type of an ongoing outage.

Data Set

Summary of the Outage mailing list dataset
First Email       Sep 29, 2006
Last Email        Mar 24, 2015
Num of Posts      6963
Num of Threads    2102
Num of Replies    4725
Num of Posters    1256

What: Outage mailing list
Why: Public (free), rich information

Preliminary Data Analysis
Content providers (Yahoo, Google, Facebook, etc.)
ISPs (AT&T, Verizon, Sprint, etc.)
Protocols (BGP, DNS, IPv6, etc.)
Security (DDoS, hijack, virus, etc.)

Data Preprocessing
Steps:
(1) Integrate threads
(2) Remove words unrelated to network outages
(3) Stemming and lemmatization
(4) Remove words with low TF-IDF values
(5) Generate term frequencies for the dataset
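The filtering and term-frequency steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the stop list, the three sample threads, and the threshold are hypothetical, stemming and lemmatization are omitted, and the TF-IDF variant (relative term frequency times log(N / document frequency)) is only one common definition.

```python
import math
import re
from collections import Counter

# Hypothetical stop list standing in for "words unrelated to network outages".
STOPWORDS = {"the", "is", "a", "to", "in", "of", "and", "for", "after",
             "anyone", "else", "seeing"}

def tokenize(text):
    """Lowercase the text, split on non-alphanumerics, drop stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]

def tf_idf(docs):
    """Return one {term: tf-idf} dict per document (one document = one integrated thread)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many threads each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        scores.append({t: (c / total) * math.log(n_docs / df[t])
                       for t, c in counts.items()})
    return scores

def keep_informative(scores, threshold):
    """Drop terms whose TF-IDF value is at or below the threshold."""
    return [{t: s for t, s in doc.items() if s > threshold} for doc in scores]

# Hypothetical threads, for illustration only.
threads = [
    "Anyone else seeing BGP route flaps in Chicago",
    "DNS resolution failing for customers in Chicago",
    "BGP session down after fiber cut",
]
scores = tf_idf(threads)
filtered = keep_informative(scores, threshold=0.1)
for doc in filtered:
    print(sorted(doc))
```

A term that appears in every thread gets an IDF of zero and is always dropped, which is the intended effect of step (4).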

Classification: Labeling
Labeling Standard

Why labeling
How to label (Fleiss' kappa)

Classification: Train the classifier
Multi-class classification -> multiple binary classifications (one vs. all)
Why use this method?
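For the labeling step, agreement among labelers can be quantified with Fleiss' kappa. The sketch below uses the standard formula; the count matrix (four threads, three labelers, three categories) is hypothetical, since the actual ratings are not given here.

```python
def fleiss_kappa(ratings):
    """ratings[i][j] = number of labelers who put item i into category j.

    Every item must be rated by the same number of labelers.
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    if any(sum(row) != n_raters for row in ratings):
        raise ValueError("every item needs the same number of ratings")
    n_categories = len(ratings[0])

    # Share of all assignments that went to each category.
    p_j = [sum(row[j] for row in ratings) / (n_items * n_raters)
           for j in range(n_categories)]
    # Observed agreement per item, then averaged over items.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in ratings]
    p_bar = sum(p_i) / n_items
    # Agreement expected by chance.
    p_e = sum(p * p for p in p_j)
    if p_e == 1:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical counts: rows are threads, columns are outage categories.
ratings = [
    [3, 0, 0],
    [0, 3, 0],
    [2, 1, 0],
    [0, 0, 3],
]
kappa = fleiss_kappa(ratings)
print(round(kappa, 3))
```

Values close to 1 indicate that the labelers apply the labeling standard consistently; values near 0 indicate agreement no better than chance.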

Test the classifier's effect
Halve the labeled data into separate training and test sets
Evaluate the classifier: classification accuracy, confusion matrix
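The one-vs-all training and the halved train/test evaluation can be sketched together. The slides do not name the underlying binary classifier, so a simple bag-of-words perceptron is assumed here; the labeled threads and the label names (routing, dns, physical) are hypothetical.

```python
from collections import Counter

BIAS = "<bias>"  # pseudo-token that carries the bias weight

def score(weights, tokens):
    """Linear score of a token list under one binary model."""
    return weights.get(BIAS, 0.0) + sum(weights.get(t, 0.0) for t in tokens)

def train_binary(docs, targets, epochs=10):
    """Perceptron for one class vs. the rest; targets are +1 or -1."""
    weights = {}
    for _ in range(epochs):
        for tokens, y in zip(docs, targets):
            if y * score(weights, tokens) <= 0:  # misclassified: update
                for t in tokens + [BIAS]:
                    weights[t] = weights.get(t, 0.0) + y
    return weights

def train_one_vs_all(docs, labels):
    """Train one binary model per class."""
    return {c: train_binary(docs, [1 if l == c else -1 for l in labels])
            for c in sorted(set(labels))}

def predict(models, tokens):
    """Pick the class whose binary model gives the highest score."""
    return max(models, key=lambda c: score(models[c], tokens))

def evaluate(models, docs, labels):
    """Return (accuracy, confusion) with confusion[(true, predicted)] = count."""
    confusion = Counter((true, predict(models, tokens))
                        for tokens, true in zip(docs, labels))
    correct = sum(n for (true, pred), n in confusion.items() if true == pred)
    return correct / len(labels), confusion

# Hypothetical labeled threads, for illustration only.
labeled = [
    ("bgp route leak prefix hijack", "routing"),
    ("dns resolver timeout nxdomain", "dns"),
    ("fiber cut backbone link down", "physical"),
    ("bgp session flap prefix withdrawn", "routing"),
    ("dns servers timeout lookups failing", "dns"),
    ("fiber cut link down near datacenter", "physical"),
]
docs = [text.split() for text, _ in labeled]
labels = [label for _, label in labeled]

half = len(docs) // 2  # first half trains, second half tests
models = train_one_vs_all(docs[:half], labels[:half])
accuracy, confusion = evaluate(models, docs[half:], labels[half:])
print("accuracy:", accuracy)
for (true, pred), n in sorted(confusion.items()):
    print(true, "->", pred, n)
```

Off-diagonal entries of the confusion matrix (true label differs from predicted label) show which outage types the classifier mixes up, which accuracy alone does not reveal.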

Classifier accuracy


Classify the unlabeled data
Given the sufficiently good accuracy of the classifier, classify the remaining unlabeled data.

Result
Outage Types Distribution of Each Year

Result
Outage type distribution for each year
Percentage of each outage type, 2006-2015
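Once every thread carries a predicted type, both result views reduce to counting. A minimal sketch, assuming the classifier output is available as (year, outage type) pairs; the pairs and the type names below are placeholders, not project results.

```python
from collections import Counter, defaultdict

def yearly_distribution(classified):
    """Return {year: {outage_type: percent of that year's threads}}."""
    per_year = defaultdict(Counter)
    for year, outage_type in classified:
        per_year[year][outage_type] += 1
    return {year: {t: 100.0 * n / sum(counts.values()) for t, n in counts.items()}
            for year, counts in sorted(per_year.items())}

def overall_distribution(classified):
    """Return {outage_type: percent of all threads}."""
    counts = Counter(outage_type for _, outage_type in classified)
    total = sum(counts.values())
    return {t: 100.0 * n / total for t, n in counts.items()}

# Placeholder classifier output, for illustration only.
classified = [
    (2006, "type_a"), (2006, "type_b"),
    (2007, "type_a"), (2007, "type_a"),
]
yearly = yearly_distribution(classified)
overall = overall_distribution(classified)
print(yearly)
print(overall)
```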


Extension: Real-time outage type prediction
How to do it
Integrate the data preprocessing and classification methods to predict the outage type of new mails in real time and show it on the website immediately.

What to show
If the mail text includes traceroute information, extract it and show it on the website.
Combine all mail text from 2015 and analyze the trend of the outage types.

Conclusion
Features of outage causes
  Mobile network issues are increasing
  Common outage types are easily observed by users
Real-time prediction of the ongoing outage type

Future Work
Analyze keywords associated with each outage type in advance
Integrate data based on subjects vs. threads