TEXT MINING Rueshyna 2013/04/25 Taipei.py

Text Mining

DESCRIPTION

https://www.youtube.com/watch?v=svGf5Vxyx60&feature=c4-feed-u

1. TEXT MINING Rueshyna 2013/04/25 Taipei.py

2. WHO AM I? (Tsai, Chia-Chi) Rueshyna (Rues) http://rueshyna.org Machine Learning, Text Mining
3. Text Analytics
4. Text Analytics, Python
5. Text Analytics, English, Python
6. Text Analytics, English, Chinese, Python
8. Chinese
9. Chinese, ML/DM Monday
10. Chinese, ML/DM Monday, DIY Chinese Segmentation
11. Library: NLTK, NUMPY, MATPLOT
12. Data: Train-sample.csv - Title, http://www.kaggle.com/c/predict-closed-questions-on-stack-overflow
13. HOW TO ANALYZE?
14. Term Frequency
15. Zipf's law: Frequency, Word
16. Zipf's law: Frequency, Word; W1, W2, W3, W4, W5, ...
17. High Freq
18. Top 20
    Word  Freq     Word   Freq
    to    43736    of     14161
    ?     41421    is     13232
    in    34713    with   13112
    a     29796    on     10954
    how   24858    i      10719
    .     19919    ,      9708
    the   19409    c      9521
    for   15283    from   9185
    -     15145           8742
    and   14328    using  8219
19. Mid Freq
20. 50~70th
    Word     Freq    Word     Freq
    google   1683    like     1580
    1        1668    find     1572
    all      1659    good     1545
    problem  1644    time     1544
    query    1639    set      1537
    form     1625    xml      1536
    help     1622    between  1535
    add      1613    method   1533
    vs       1593    rails    1526
    work     1592    project  1522
21. Low Freq
22. Freq = 1
    Word       Freq    Word            Freq
    ally       1       alpha1          1
    alm        1       alphabetize     1
    aloaha     1       alphalite       1
    alocation  1       alphanumerical  1
    alogcat    1       alphanumerics   1
    alogrithm  1       alsways         1
    aloha      1       altassian       1
    alongside  1       alterantive     1
    alot       1       alterate        1
    alowing    1       alterbox        1
24. Term Frequency + Part of Speech
25. Noun
26.
    High Freq    High Freq  Mid Freq     Mid Freq   Low Freq     Low Freq
    file         web        help         html       accordeon    accustom
    code         page       rails        view       accordian    acdis
    data         way        line         script     according    aceess
    application  function   number       name       accordtion   acessibility
    s            java       windows      asp        accound      acessor
    t            server     button       mysql      accoutn      achor
    error        database   issue        framework  accros       achses
    php          image      development  software   accses       acitity
    c            class      work         service    acction      acknowledge
    app          value      custom       control    accumlation  acknowledgements
30. Verb
31.
    High Freq  High Freq  Mid Freq  Mid Freq  Low Freq    Low Freq
    is         find       see       parse     abandon     accurate
    do         need       develop   "         abbreviate  acegi
    use        add        put       select    abc         achive
    does       i          open      insert    abcnull     ack
    get        has        take      inside    absurd      acknowledge
    are        work       generate  detect    abybody     acomplish
    be         set        download  delete    acceptance  action
    create     write      stop      choose    accesslog   actionscript
    make       learn      print     replace   accomodate  actives
    have       change     return    include   accumulate  activestate
32. Adj
33.
    High Freq  High Freq  Mid Freq  Mid Freq   Low Freq       Low Freq
    best       table      long      slow       abelian        accusable
    net        open       visual    high       above          acess
    android    many       top       binary     abreast        achieve
    new        dynamic    common    small      absent         acknowledged
    possible   specific   real      such       abysmal        aclocal
    good       website    double    private    acadamic       acoustic
    different  local      public    objective  accelerometer  actionbean
    variable   key        multiple  faster     accepted       activemerchant
    same       more       null      generic    accessary      acts_as_commentable
    other      single     second    global     accomplish     actsastaggable
34. Collocation
35. I could murder a curry.
36. I could murder a curry .
37. I could murder a curry . [I could]
38. I could murder a curry . [I could] [I murder]
39. I could murder a curry . [I could] [I murder] [I a]
40. I could murder a curry . [I could] [I murder] [I a] [I curry]
41. I could murder a curry . [I could] [I murder] [I a] [I curry] [I .]
42. I could murder a curry . [I could] [I murder] [I a] [I curry] [I .] [could murder]
44. murder curry: WHY?
45. murder curry: WHY? It can't tell you why. Just use it!!!
46. Python ...
47. Python ...: PHP, Web, Ruby, Django, code, Java, framework, Windows, function, module
48. Django ...
49. Django ...: admin, model, app, Python, Rails, get, query, PHP, error, custom
50. N-Gram
51. DEMO
52. Ref
    Christopher D. Manning and Hinrich Schütze (1999). Foundations of Statistical Natural Language Processing. The MIT Press.
    Kaggle
53. THANKS
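
The term-frequency and collocation slides can be approximated with a short sketch. The talk lists NLTK among its libraries, but this version uses only the Python standard library, so it is not the speaker's actual code. The sample titles, the whitespace tokenizer, and the function names (tokenize, term_frequency, pair_counts, neighbours) are all assumptions made up for illustration. Pairs are counted once per title and treated as unordered, which is one possible reading of the "I could murder a curry" pairing slides.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample titles, invented for this sketch (not taken from the Kaggle data).
titles = [
    "how to parse xml in python",
    "python django admin error",
    "django model query in python",
]


def tokenize(text):
    """Lowercase and split on whitespace; a real pipeline would use a proper tokenizer."""
    return text.lower().split()


def term_frequency(docs):
    """Count how often each token occurs across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


def pair_counts(docs):
    """Count unordered word pairs that appear together in the same document."""
    counts = Counter()
    for doc in docs:
        # Use the set of tokens so each pair is counted at most once per document.
        for a, b in combinations(sorted(set(tokenize(doc))), 2):
            counts[(a, b)] += 1
    return counts


def neighbours(word, pairs):
    """Return (other_word, count) tuples for words co-occurring with `word`, most frequent first."""
    related = Counter()
    for (a, b), n in pairs.items():
        if a == word:
            related[b] += n
        elif b == word:
            related[a] += n
    return related.most_common()


tf = term_frequency(titles)
pairs = pair_counts(titles)

if __name__ == "__main__":
    print(tf.most_common(5))                # highest-frequency terms first
    print(neighbours("python", pairs)[:5])  # words seen most often alongside "python"
```

Sorting `tf.most_common()` by rank gives the frequency-versus-word ordering from the Zipf's law slides, and `neighbours("python", pairs)` mirrors the "Python ..." slide that lists words appearing alongside a query term. Raw co-occurrence counts favor common words, so an association measure would probably be needed on real data.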