
  • 8/9/2019 chp%3A10.1007%2F978-3-642-22365-5_7

    1/6

    C. Lee et al. (Eds.): STA 2011 Workshops, CCIS 187, pp. 50-55, 2011.

    © Springer-Verlag Berlin Heidelberg 2011

    Opinion Mining in MapReduce Framework

    Kyung Soo Cho, Ji Yeon Lim, Jae Yeol Yoon, Young Hee Kim,

    Seung Kwan Kim, and Ung Mo Kim

    School of Information and Communication Engineering SungKyunKwan University,

    2nd Engineering Building 27039 CheonCheon-Dong, JangAn-Gu,

    Suwon 440-746, Republic of Korea

    [email protected], {01039374479,vntlffl}@naver.com,

    [email protected], [email protected], [email protected]

    Abstract. Many research fields now overlap and borrow from one another, and some problems in computer science cannot be solved by technique alone. Opinion mining sometimes needs solutions from other fields, too. For example, we use a method from psychology to extract information about users from text. Likewise, we previously proposed a new method of opinion mining that uses MapReduce together with WordMap, a dictionary-like structure that holds only the category and value of each word. A novel opinion mining method of this kind can mine opinions from the web more powerfully than before. Therefore, for stronger opinion mining, we propose a framework for opinion mining in MapReduce.

    Keywords: Framework, Opinion mining, WordMap, POS tagging, MapReduce.

    1 Introduction

    Opinion mining and Semantic Web techniques are fascinating areas of search engine research. Of these, opinion mining is a mining technique that extracts evaluations from the internet, analyzes them, and outputs the results. These results are useful in many areas, such as marketing and product reviews. However, current methods are inefficient and take too much time on huge data because they run on a single node. Cloud computing, which is attracting attention as the next computing environment, is appropriate for solving this problem. MapReduce, one of the cloud computing methods, is already used by Google together with the Google File System. Therefore, this paper proposes an opinion mining in MapReduce framework as a novel attempt at design under a cloud computing environment, and we expect the framework to show moderate performance. The framework can be used by developers who have particular objectives and performance expectations when they build opinion mining tools in MapReduce.

    The rest of this paper is organized as follows. Section 2 explains opinion mining techniques and representative existing research related to the framework. Section 3 presents the opinion mining in MapReduce framework. Section 4 concludes the paper and describes our future work.


    Opinion Mining in MapReduce Framework 51

    2 Related Work

    2.1 Opinion Mining Methods

    Research on opinion mining has grown steadily since the late 1990s. Also known as sentiment classification, opinion mining focuses not on the topic itself but on a user's attitude toward that topic. In recent years, opinion mining has been applied to product reviews and other commercial domains [1]. W.Y. Kim et al. propose a method for opinion mining of product reviews using association rules [2]. The opinion mining field also includes feature-based opinion mining, summarization, comparative sentences, relation mining, opinion searching, opinion spamming, and the definition and construction of linguistic resources [3][4].

    In sentiment classification, reading and analyzing a text produces a result like , which is similar to the data structure of MapReduce [5]. Sentiment classification is therefore likely to fit MapReduce well. In addition, analysis rules such as the POS tagging technique [6][7], or dictionary information, can also be used.

    Fig. 1. Example of sentiment classification

    Figure 1 shows sentiment classification. Sentiment classification is a simple concept: it classifies the sentiment of a portion of a document set as Positive or Negative. If a blog user writes "Hyundai is good," it computes a positive result for Hyundai.
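    A minimal Python sketch of classifying the Hyundai example is given here. The positive and negative word lists, the rule that the first capitalized word is the topic, and the extra Neutral label for a zero score are assumptions for illustration, not part of the method described in this paper. The returned (topic, polarity) pair also shows why such output resembles a MapReduce key/value pair.

```python
# Minimal sketch: lexicon-based sentiment classification of a short sentence.
# The word lists and the "first capitalized word is the topic" rule are
# illustrative assumptions, not the method used in the paper.
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "terrible"}


def classify(sentence: str) -> tuple[str, str]:
    """Return a (topic, polarity) pair for a simple sentence."""
    words = [w.strip(".,!?") for w in sentence.split()]
    # Assumption: the topic is the first capitalized word.
    topic = next((w for w in words if w[:1].isupper()), "")
    # +1 for each positive word, -1 for each negative word.
    score = sum((w.lower() in POSITIVE) - (w.lower() in NEGATIVE) for w in words)
    polarity = "Positive" if score > 0 else "Negative" if score < 0 else "Neutral"
    return topic, polarity


print(classify("Hyundai is good"))  # ('Hyundai', 'Positive')
```

    The (topic, polarity) tuple can be emitted directly as the key and value of a map step.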

    Topic-related words are treated as important in topic-based classification; in sentiment classification, however, they are insignificant. Recent research on sentiment classification is mainly performed at the document level, which can find detailed attributes, and sentence-level studies are also being done. B. Pang and L. Lee introduce approaches to discovering sentiment in opinion mining [8]. S.M. Kim and E. Hovy propose a way of recognizing opinions on a given topic, and the sentiment of each opinion, using sentence-level sentiment classification [9]; in their work, opinions are categorized with part-of-speech (POS) tagging. Some academics have focused on comments and emoticons. For instance, M. Potthast and S. Becker give a method for summarizing opinions in web comments [10], and J. Read examines the relationship between emoticons and sentiment classification using emoticon-trained classifiers [11]. An opinion holder is the person or group who expresses an opinion in the analyzed resource, and considering the opinion holder is central to opinion mining. Accordingly, S.M. Kim and E. Hovy also present a technique for mining opinions expressed by opinion holders on topics in online news media text [12]. Along with sentiment classification, methods for weighting sentiment information have also been studied. We previously proposed a leader weight and a method of using LIWC (Linguistic Inquiry and Word Count) [13].

    2.2 MapReduce

    Google proposed MapReduce for analyzing large data [5] and uses it with its Bigtable system [14]. MapReduce is simple yet powerful for huge data sets of terabytes or petabytes, and it can be customized efficiently for each system. For these reasons, MapReduce has attracted attention from many researchers.

    Fig. 2. MapReduce

    Figure 2 shows how MapReduce works. A master node controls all of the worker nodes, which are called Map or Reduce nodes. Map nodes produce intermediate data whose structure is , and Reduce nodes collect the intermediate data and transform it into . This is only the general pattern, and it varies from system to system.
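    The flow in Fig. 2 can be imitated in a single process to make the data movement concrete. In this sketch a thread pool stands in for the worker nodes, the function names (run_mapreduce, word_count_map, word_count_reduce) are hypothetical, and word counting is used only because it is the conventional MapReduce example; a real deployment would run on a distributed MapReduce runtime.

```python
# Minimal in-process sketch of the MapReduce flow in Fig. 2. A thread pool
# stands in for the Map and Reduce worker nodes; all names are hypothetical.
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def run_mapreduce(
    inputs: Iterable[str],
    mapper: Callable[[str], list[tuple[str, int]]],
    reducer: Callable[[str, list[int]], int],
    workers: int = 4,
) -> dict[str, int]:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map phase: each input split yields intermediate (key, value) pairs.
        mapped = list(pool.map(mapper, inputs))
        # Shuffle: intermediate values are grouped by key between the phases.
        groups: dict[str, list[int]] = defaultdict(list)
        for pairs in mapped:
            for key, value in pairs:
                groups[key].append(value)
        # Reduce phase: each key's value list is reduced to one result.
        keys = sorted(groups)
        results = pool.map(lambda k: reducer(k, groups[k]), keys)
        return dict(zip(keys, results))


def word_count_map(line: str) -> list[tuple[str, int]]:
    return [(word.lower(), 1) for word in line.split()]


def word_count_reduce(key: str, values: list[int]) -> int:
    return sum(values)


counts = run_mapreduce(["Hyundai is good", "Kia is good"], word_count_map, word_count_reduce)
print(counts)  # {'good': 2, 'hyundai': 1, 'is': 2, 'kia': 1}
```

    Only the mapper and reducer change from job to job, which is the customization point the text refers to.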

    Some researchers in the field of mining have paid attention to this function, expecting it to make mining methods stronger. Kelvin Cardona, Jimmy Secretan, Michael Georgiopoulos, and Georgios Anagnostopoulos propose a grid-based system for data mining using MapReduce [15]. In addition, M.A. Bayir, I.H. Toroslu, A. Cosar, and G. Fidan propose Smart Miner, a new framework for mining large-scale web usage data [16]; they propose novel methods combining data mining and MapReduce. T. Xia proposes large-scale SMS message mining based on MapReduce at the International Symposium on Computational Intelligence and Design [17]. In these works, the performance of mining methods improves because of MapReduce. This suggests that MapReduce is suitable for mining techniques and is also applicable to opinion mining. We previously proposed a method of using WordMap and score-based weight in opinion mining with MapReduce [18].


    3 Opinion Mining in MapReduce Framework

    3.1 WordMap

    Our paper "Using WordMap and Score-based Weight in Opinion Mining with MapReduce" [18] gives the structure of WordMap. WordMap is multidimensionally indexed dictionary data and can be used flexibly in any system. A developer can apply this concept to his or her system and change its elements, such as the meaning or value of words. WordMap can also use an additional weight policy; for example, LIWC or leader weight can be used with a connected WordMap. WordMap can be built in one of two ways: the first includes supplementary weight information in the WordMap itself, and the second links weight information externally. Including weight information is faster than the external approach; however, adding weight information to a completed WordMap requires a lot of time and data space.
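    The two ways of attaching weight information can be sketched as a small Python structure. The field names (category, value, weight), the multiplication of value by weight, and the default external weight of 1.0 are assumptions; the multidimensional indexing of the actual WordMap [18] is not modeled.

```python
# Minimal sketch of a dictionary-like WordMap. The entry fields and the two
# weight options are assumptions modeled on the description above, not the
# structure defined in [18].
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    category: str                   # e.g. a sentiment category
    value: float                    # polarity value of the word
    weight: Optional[float] = None  # set only when weights are embedded


class WordMap:
    def __init__(self, entries: dict[str, Entry],
                 external_weights: Optional[dict[str, float]] = None):
        self.entries = entries
        # Option 2: weight information linked from an external table.
        self.external_weights = external_weights or {}

    def weighted_value(self, word: str) -> float:
        entry = self.entries.get(word.lower())
        if entry is None:
            return 0.0  # unknown words contribute nothing
        # Option 1: an embedded weight is used directly (single lookup).
        if entry.weight is not None:
            return entry.value * entry.weight
        # Otherwise consult the external table (extra lookup), default 1.0.
        return entry.value * self.external_weights.get(word.lower(), 1.0)


embedded = WordMap({"good": Entry("positive", 1.0, weight=1.5)})
linked = WordMap({"good": Entry("positive", 1.0)}, external_weights={"good": 1.5})
print(embedded.weighted_value("good"), linked.weighted_value("good"))  # 1.5 1.5
```

    The embedded variant answers with one lookup, while the linked variant needs a second lookup but leaves existing entries untouched when a new weight table (for instance one derived from LIWC) is swapped in.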

    3.2 RuleBox

    RuleBox is the part of the opinion mining in MapReduce framework that classifies and analyzes sentiment. Additional components, such as a POS tagging technique, can be connected to it. It interprets a sentence or document using the WordMap part. A system developer can always choose and customize reasonable rules in RuleBox to suit his or her system; however, RuleBox must contain at least one rule based on a natural language processing method. POS tagging, mentioned above, is a representative natural language processing method.
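    A RuleBox with pluggable rules might look like the following sketch. The toy tagger (toy_pos_tag), the rule signature, and the adjective-and-negation rule are hypothetical; a real system would plug in a POS tagger such as the Stanford Tagger [6] and use a WordMap instead of the plain lexicon dictionary shown here. The constructor enforces the requirement that RuleBox holds at least one rule.

```python
# Minimal sketch of a RuleBox: an ordered, pluggable list of rules applied to
# a POS-tagged sentence. The toy tagger and the rule are illustrative only.
from typing import Callable

Tagged = list[tuple[str, str]]  # (word, POS tag) pairs
Rule = Callable[[Tagged, dict[str, float]], float]

TOY_TAGS = {"not": "RB", "is": "VBZ", "good": "JJ", "bad": "JJ"}


def toy_pos_tag(sentence: str) -> Tagged:
    # Stand-in for a real POS tagger: unknown words are tagged as nouns.
    return [(w, TOY_TAGS.get(w.lower(), "NN")) for w in sentence.split()]


def adjective_rule(tagged: Tagged, lexicon: dict[str, float]) -> float:
    # Score only adjectives (JJ) and flip the sign of the next one after "not".
    score, negate = 0.0, False
    for word, tag in tagged:
        if word.lower() == "not":
            negate = True
        elif tag == "JJ":
            value = lexicon.get(word.lower(), 0.0)
            score += -value if negate else value
            negate = False
    return score


class RuleBox:
    def __init__(self, rules: list[Rule]):
        if not rules:
            raise ValueError("RuleBox needs at least one rule")
        self.rules = rules

    def score(self, sentence: str, lexicon: dict[str, float]) -> float:
        tagged = toy_pos_tag(sentence)
        # Each rule contributes a partial score; the results are summed.
        return sum(rule(tagged, lexicon) for rule in self.rules)


box = RuleBox([adjective_rule])
print(box.score("Hyundai is not good", {"good": 1.0, "bad": -1.0}))  # -1.0
```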

    3.3 Framework

    Fig. 3. Opinion mining in MapReduce framework

    Figure 3 shows the opinion mining in MapReduce framework. The framework has three parts: MapReduce, WordMap, and RuleBox. The MapReduce part is the membrane of the framework, the RuleBox part is its brain, and the WordMap part is its resource.
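    To show how the three parts might fit together, the sketch below runs a toy job sequentially. WORD_MAP plays the role of the WordMap (resource), RULE_BOX the RuleBox (brain), and the map, shuffle, and reduce steps the MapReduce part (membrane). Every name, the choice of the first word as the topic, and the summing of scores per topic are assumptions for illustration.

```python
# Minimal end-to-end sketch of the three-part framework, assuming one short
# review per input record. All names are hypothetical.
from collections import defaultdict

WORD_MAP = {"good": 1.0, "great": 2.0, "bad": -1.0}  # resource: word -> value


def negation_rule(words: list[str]) -> float:
    # Brain: sum word values, flipping the next scored word after "not".
    score, negate = 0.0, False
    for w in words:
        if w == "not":
            negate = True
        elif w in WORD_MAP:
            score += -WORD_MAP[w] if negate else WORD_MAP[w]
            negate = False
    return score


RULE_BOX = [negation_rule]


def map_review(review: str) -> tuple[str, float]:
    # Map: emit (topic, sentiment score); topic = first word (assumption).
    words = review.lower().split()
    return words[0], sum(rule(words) for rule in RULE_BOX)


def reduce_topic(topic: str, scores: list[float]) -> tuple[str, str]:
    # Reduce: aggregate the scores of one topic into a label.
    total = sum(scores)
    return topic, "Positive" if total > 0 else "Negative" if total < 0 else "Neutral"


reviews = ["Hyundai is great", "Hyundai is not good", "Kia is bad"]
groups: dict[str, list[float]] = defaultdict(list)
for topic, score in map(map_review, reviews):  # map, then shuffle by topic
    groups[topic].append(score)
results = dict(reduce_topic(t, s) for t, s in groups.items())
print(results)  # {'hyundai': 'Positive', 'kia': 'Negative'}
```

    Because map_review and reduce_topic only touch WORD_MAP and RULE_BOX, either resource or rules can be replaced without changing the MapReduce steps, which mirrors the separation of parts described above.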

    WordMap and RuleBox influence the accuracy of opinion mining, while MapReduce improves the time performance of the framework [16][18]. The opinion mining in MapReduce framework can be used in search engines, where it can make search results richer. It can also serve companies as a strong marketing analysis tool for collecting their product reviews, and governments can use it for information gathering and analysis. For example, in the United States, Google and the CIA co-invested in a company called Recorded Future, which combines mining methods with huge-data processing techniques; this has been reported in several newspapers.

    4 Conclusion and Future Work

    We propose an opinion mining in MapReduce framework, a novel method that combines opinion mining techniques with MapReduce. The framework is useful to anyone who wants to develop opinion mining in MapReduce; however, it is unsuitable for small data, because constructing the WordMap part costs a lot of time and it is inefficient for several nodes to analyze small data. Nonetheless, it is powerful for large-scale data and has the advantage of flexibility.

    Today, many companies want to know the opinions about their products on the internet; therefore, opinion mining that analyzes huge resources is an interesting research topic. Our next task is to improve the performance and accuracy of the opinion mining technique in MapReduce.

    Acknowledgments. This work was supported by the Korea Science and Engineering Foundation (KOSEF) grant funded by the Korea government (MEST) (No. 2009-0075771).

    References

    1. Conrad, J.G., Schilder, F.: Opinion mining in legal blogs. In: Proceedings of the 11th International Conference on Artificial Intelligence and Law, pp. 231-236. ACM, New York (2007)

    2. Kim, W.Y., Ryu, J.S., Kim, K.I., Kim, U.M.: A Method for Opinion Mining of Product Reviews using Association Rules. In: Proceedings of the 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human (ICIS 2009), Seoul, Korea, November 24-26, pp. 270-274 (2009)

    3. Esuli, A., Sebastiani, F.: SENTIWORDNET: A Publicly Available Lexical Resource for Opinion Mining. In: Proceedings of the 5th Conference on Language Resources and Evaluation (LREC 2006), Citeseer (2006)

    4. Esuli, A., Sebastiani, F.: PageRanking WordNet synsets: An application to opinion mining. In: Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL 2007), Citeseer (2007)

    5. Dean, J., Ghemawat, S.: MapReduce: Simplified data processing on large clusters. Communications of the ACM 51(1), 107-113 (2008)

    6. Stanford Tagger Version 1.6 (2008), http://nlp.stanford.edu/software/tagger.shtml

    7. Stanford Parser Version 1.6 (2008), http://nlp.stanford.edu/software/lex-parser.shtml

    8. Pang, B., Lee, L.: Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval 2(1-2), 1-135 (2008)

    9. Kim, S.M., Hovy, E.: Determining the sentiment of opinions. In: Proceedings of the 20th International Conference on Computational Linguistics (2004)


    10. Potthast, M., Becker, S.: Opinion Summarization of Web Comments. Advances in Information Retrieval, 668-669 (2010)

    11. Read, J.: Using emoticons to reduce dependency in machine learning techniques for sentiment classification. In: Proceedings of the ACL Student Research Workshop, pp. 43-48. Association for Computational Linguistics (2005)

    12. Kim, S.M., Hovy, E.: Extracting opinions, opinion holders, and topics expressed in online news media text. In: Proceedings of ACL/COLING Workshop on Sentiment and Subjectivity in Text, Sydney, Australia (2006)

    13. Cho, K.S., Ryu, J.S., Jeong, J.H., Kim, Y.H., Kim, U.M.: Credibility Evaluation and Results with Leader Weight in Opinion Mining. In: The 2nd International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, Huangshan, China, October 10-12 (2010)

    14. Chang, F., Dean, J., Ghemawat, S., Hsieh, W.C., Wallach, D.A., Burrows, M., Chandra, T., Fikes, A., Gruber, R.E.: Bigtable: A distributed storage system for structured data. ACM Transactions on Computer Systems (TOCS) 26(2), 4 (2008)

    15. Cardona, K., Secretan, J., Georgiopoulos, M., Anagnostopoulos, G.: A grid based system for data mining using MapReduce. Technical Report TR-2007-02, AMALTHEA (2007)

    16. Bayir, M.A., Toroslu, I.H., Cosar, A., Fidan, G.: Smart miner: a new framework for mining large scale web usage data. In: Proceedings of the 18th International Conference on World Wide Web, pp. 161-170. ACM, New York (2009)

    17. Xia, T.: Large-scale sms messages mining based on map-reduce. In: International Symposium on Computational Intelligence and Design, ISCID 2008, pp. 7-12. IEEE, Los Alamitos (2008)

    18. Cho, K.S., Jung, N.R., Kim, U.M.: Using WordMap and Score-based Weight in Opinion mining with MapReduce. In: IEEE International Conference on Service-Oriented Computing and Applications (2010)