14
http://www.iaeme.com/IJMET/index.asp 180 [email protected] International Journal of Mechanical Engineering and Technology (IJMET) Volume 9, Issue 2, February 2018, pp. 180193, Article ID: IJMET_09_02_018 Available online at http://www.iaeme.com/IJMET/issues.asp?JType=IJMET&VType=9&IType=2 ISSN Print: 0976-6340 and ISSN Online: 0976-6359 © IAEME Publication Scopus Indexed REAL TIME SENTIMENT ANALYSIS OF E- COMMERCE WEBSITES USING MACHINE LEARNING ALGORITHMS Prof. K. Sudheer Assistant Professor, SITE, VIT University, Vellore, Tamilnadu, India Dr. B Valarmathi Associate Professor, SITE, VIT University, Vellore, Tamilnadu, India ABSTRACT With the advent of social media, sentiment analysis an evolving area finds application in all domains. Significant applications of sentiment analysis include retail, movie rating, politics, healthcare, and automobile industry. Sentiment analysis can be implemented in various methods like machine learning methods, syntactic patterns, lexicon based method. Finding the sentiment of e-commerce websites is very important for online customers to buy quality products and timely delivery of their goods and services. This paper compares the e-commerce websites in terms of quality and timely delivery. Sentiment is calculated based on the live tweets generated by twitter on these websites using streaming API. Key words: Sentiment Analysis, twitter, Over fitting, Accuracy, Feature selection, E- commerce, Real time. Cite this Article: Prof. K. Sudheer and Dr. B Valarmathi, Real Time Sentiment Analysis of E-Commerce Websites Using Machine Learning Algorithms, International Journal of Mechanical Engineering and Technology 9(2), 2018, pp. 180193. http://www.iaeme.com/IJMET/issues.asp?JType=IJMET&VType=9&IType=2 1. INTRODUCTION Sentiment analysis is generally appraisal, attitude or subjective. To do sentiment analysis we have to collect opinions from different users, because their opinion will not be the same. Opinion collection collects opinions of different people from social media like Facebook, twitter, forum, blogs. Each of it has unique features. Opinion in twitter will be simple because each tweet is of maximum of 140 characters similarly reviews are also simple, forums are complex because interactive messages will be exchanged between the users. Sentiment analysis is basically a Quadruple (E,S,O,T) where E is the entity on which the opinion is told is the sentiment, either positive or negative is the opinion holder is the time at which the

REAL TIME SENTIMENT ANALYSIS OF E- COMMERCE WEBSITES … · 2018. 3. 1. · Significant applications of sentiment analysis include retail, movie rating, politics, healthcare,

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: REAL TIME SENTIMENT ANALYSIS OF E- COMMERCE WEBSITES … · 2018. 3. 1. · Significant applications of sentiment analysis include retail, movie rating, politics, healthcare,

http://www.iaeme.com/IJMET/index.asp 180 [email protected]

International Journal of Mechanical Engineering and Technology (IJMET)

Volume 9, Issue 2, February 2018, pp. 180–193, Article ID: IJMET_09_02_018

Available online at http://www.iaeme.com/IJMET/issues.asp?JType=IJMET&VType=9&IType=2

ISSN Print: 0976-6340 and ISSN Online: 0976-6359

© IAEME Publication Scopus Indexed

REAL TIME SENTIMENT ANALYSIS OF E-

COMMERCE WEBSITES USING MACHINE

LEARNING ALGORITHMS

Prof. K. Sudheer

Assistant Professor, SITE,

VIT University, Vellore, Tamilnadu, India

Dr. B Valarmathi

Associate Professor, SITE,

VIT University, Vellore, Tamilnadu, India

ABSTRACT

With the advent of social media, sentiment analysis an evolving area finds

application in all domains. Significant applications of sentiment analysis include

retail, movie rating, politics, healthcare, and automobile industry. Sentiment analysis

can be implemented in various methods like machine learning methods, syntactic

patterns, lexicon based method. Finding the sentiment of e-commerce websites is very

important for online customers to buy quality products and timely delivery of their

goods and services. This paper compares the e-commerce websites in terms of quality

and timely delivery. Sentiment is calculated based on the live tweets generated by

twitter on these websites using streaming API.

Key words: Sentiment Analysis, twitter, Over fitting, Accuracy, Feature selection, E-

commerce, Real time.

Cite this Article: Prof. K. Sudheer and Dr. B Valarmathi, Real Time Sentiment

Analysis of E-Commerce Websites Using Machine Learning Algorithms,

International Journal of Mechanical Engineering and Technology 9(2), 2018, pp.

180–193.

http://www.iaeme.com/IJMET/issues.asp?JType=IJMET&VType=9&IType=2

1. INTRODUCTION

Sentiment analysis is generally appraisal, attitude or subjective. To do sentiment analysis we

have to collect opinions from different users, because their opinion will not be the same.

Opinion collection collects opinions of different people from social media like Facebook,

twitter, forum, blogs. Each of it has unique features. Opinion in twitter will be simple because

each tweet is of maximum of 140 characters similarly reviews are also simple, forums are

complex because interactive messages will be exchanged between the users. Sentiment

analysis is basically a Quadruple (E,S,O,T) where E is the entity on which the opinion is told

is the sentiment, either positive or negative is the opinion holder is the time at which the

Page 2: REAL TIME SENTIMENT ANALYSIS OF E- COMMERCE WEBSITES … · 2018. 3. 1. · Significant applications of sentiment analysis include retail, movie rating, politics, healthcare,

Real Time Sentiment Analysis of E-Commerce Websites Using Machine Learning Algorithms

http://www.iaeme.com/IJMET/index.asp 181 [email protected]

opinion is given. Ex: User A likes the picture quality of the mobile phone, but its battery life

is not good, here Opinion holder is User A, Sentiment is positive on picture quality but

negative on battery life, Entity is mobile phone. Entity can be any product, person, service or

event. Time component is also important .Ex Sentiment given at this moment is more

important than the sentiment given before two years. An Opinion is of two types regular and

comparative. Regular opinion is this book is great. Comparative opinion involves comparison

between two entities.Ex Mobile X is better than Mobile Y.

In general if users want to buy any product like mobile phone, TV or a dress in e-

commerce websites they will see the service ratings in that website. The website maintains

only positive ratings so rating will not be helpful to the user. This problem can be solved in

twitter by looking into the live tweets generated by the users on e-commerce websites. Twitter

is used by millions of people in the world and the no of tweets generated per day is millions

.Ex tweet on amazon is the service given by Amazon is awesome in buying apple mobile

phone and can compare the services given by the different e-commerce websites.so that users

can choose which is the best website for buying the products .our problem is real time

sentiment analysis of e-commerce websites using machine learning algorithms. Real time

sentiment analysis is a challenging task since the labelled data is difficult to get.

The problem can be solved using different methods

1.Sentiment lexicons.

2.Syntatcic analysis.

3.Machine learning model.

1.1. Machine Learning Model

Machine learning model [1] includes feature set extraction by removing the irrelevant features

then apply machine learning algorithms to train the data and learn ,then the model is used to

predict the sentiment of the text

1.2. Sentiment Lexicon

Sentiment lexicon[3] uses domain dictionary consists of sentiment words and its synonyms.

For any sentence stop words has to be removed and polarity is calculated by first comparing

each token presence in the dictionary. If found then its strength of polarity can be found in

sentiwordnet. Finally add all the polarity strengths of tokens and based on threshold sentence

can be classified as positive or negative.

Generating sentiment lexicons can be done in 3 methods.

1.Manual method.

2.Thesarus based.

3.Corpus based.

Manual method is costly in finding all the opinion words and its polarity J kamps

proposed thesaurus based approach to obtain the sentiment lexicon[3] by expanding the

adjectives or opinion words with its synonyms‟ and antonym‟s .For example excellent ,good,

superb, beautiful all will have positive semantic orientation. Similarly bad, worst, dull will

have negative semantic orientation .Corpus[5] based method used to extract the opinion words

based on the conjunctions if two sentences are connected by the word “AND” the opinion of

Page 3: REAL TIME SENTIMENT ANALYSIS OF E- COMMERCE WEBSITES … · 2018. 3. 1. · Significant applications of sentiment analysis include retail, movie rating, politics, healthcare,

Prof. K. Sudheer and Dr. B Valarmathi

http://www.iaeme.com/IJMET/index.asp 182 [email protected]

the both words will be positive, otherwise different opinion words if connected by the word

“BUT”. E.G. Though the picture quality of camera is good but size is too long.Here the

Picture quality has positive semantic orientation, size is having negative semantic orientation.

Figure 1 Machine learning Model

1.3. Syntactic Patterns

Syntactic patterns [2] are used to identify phrases like NN followed by adjective.

Table 1 Syntactic patterns

S.No First word Second word Third word

1 JJ NN or NNS Any thing

2 RB,RBR or RBS JJ NNS

3 JJ JJ Not NN nor NNS

4 NN or NNS JJ Not NN nor NNS

5 RB,RBR, or RBS VB,VBD,VBN or VBG Anything

The sentimental analysis approach consists of following steps.

1.Apply parts of speech tagging to the given data set2. Extract the patterns using the words

given in the above table.

3. Apply Associate rule mining. Frequent item set is words or phrases

4. Extract the Frequent Item sets

Eg.Digital camera.

5. Find nearby adjectives to know the opinion words.

6. Find the polarity of the extracted phrases.

This method also fails in case of sentiment shifters like „NOT‟, and ‟but‟.

2. SENTIMENT ANALYSIS APPLICATIONS.-RELATED WORK

This section compares all the existing work done on applications of sentiment analysis and

explores the domain of online shopping. Sentiment Analysis for social images is a latest

research much focus is on text sentiment analysis. Jyoti Islam,Yanqinq Zhanq[28] proposed a

model for sentiment analysis on social images using transfer learning approach with much

improved results of the current state of art.Naan Yang and Shufan[14] Zhang proposed a data

analysis system for video comments. Comments were extracted from internet and analyses the

information using machine learning technique and provides sentiment about the video.

Page 4: REAL TIME SENTIMENT ANALYSIS OF E- COMMERCE WEBSITES … · 2018. 3. 1. · Significant applications of sentiment analysis include retail, movie rating, politics, healthcare,

Real Time Sentiment Analysis of E-Commerce Websites Using Machine Learning Algorithms

http://www.iaeme.com/IJMET/index.asp 183 [email protected]

Chih-Hua ,Tai works on the detecting the intention and intensity of feelings on Social

networks. The main idea is building Feeling Distinguisher framework in view of

supervised,Latent Dirichlet Allocation (sLDA), Latent Dirichlet Allocation, and

SentiWordNet systems for recognizing a person‟s feelings and power of emotions through the

investigation of the online posts. The performance of FeD is about 1.08–1.18 folds that of

SVM and sLDA. Apoorv Agarwal, Vivek Sharma[15] performs opinion mining in news

headlines using sentiwordnet. The algorithm works as follows headlines are tokenised into

words and apply parts of speech tagger, lemmatise the word to know in which reference it is

used, then apply stemming. Finally the output of the resulting word is given to the

sentiwordnet which calculates the positive, negative scores and finds the total polarity of the

headline.

Huan Chen, Xin-Nan Li[17],worked on sentiment analysis for big data scenario. At

present text classification methods such as naïve bayes, SVM,Maximum entropy were used.In

this paper text mining analysis based on the big data perspective was proposed. A new text

processing service called Cloud based core text processing service has been proposed and

finally personalised news recommendation was proposed.

Meghana Ashok, Swathi Rajanna[18] proposed a personalised recommender system using

sentiment analysis of social media.Retreiving information from social media is a enormous

task.In this paper author proposes a method to retrieve users reviews and comments of

restaurants to personalise and rank suggestions based on user preferences.

Yohei seki studys[19] about the Use of Twitter for Analysis of Public Sentiment for

Improvement of Local Government Service .Public sentiment are analysed from their twitter

accounts and proved that system is effective in administering the local service. Tweets were

analysed on the festival day in Tsukuba city from the public in the year 2014 ,2015 then

government improved the traffic policies based on the sentiment from the public.

Security Attack Prediction Based on User Sentiment Analysis of Twitter Data by Ado

Hernandez, Victor Sanchez [26] used twitter to find the people who uses twitter to express

their opinion about security attacks, sentiment analysis is done on the views to predict the

security attack . Sentiment analysis of student feedback, A study towards optimal tools by

Mohammad aman Ullah[20] used facebook as a platform to collect the feedback from the

students and sentiment analysis is done using machine learning algorithms.

In Sentiment and Emotion Analysis for Social Multimedia: Methodologies and

Applications

by Quanzeng[21] performed sentiment analysis on the visual images of social multimedia.

An image is worth of thousand words. Sarah E. Shukri, Rawan I. Yaghi, Ibrahim Aljarah,

Hamad Alsawalqah [22] studied about sentiment analysis of automobile industry. Their

results show that BMW has more positive polarity than Mercedes.

Latest research on sentiment analysis is using Domain ontology. Domain ontology is used

to extract the features of any product and does Aspect level sentiment analysis. Ontology-

based Aspect Extraction for an Improved Sentiment Analysis in Summarization of Product

Reviews by Ali Marstawi, Nurfadhlina Mohd Sharef, The[23] and Aida Mustapha is based on

ontology based sentiment summarisation framework(OBPSS) which outperformed other

existing systems in aspect extraction. Ontology Driven Sentiment Analysis on Social Web for

Government Intelligence by Akshi Kumar and Arunima Joshi[24] used twitter to collect the

opinions of citizens on government rules and policies. Ontology was used to determine the

subjects and topics discussed in the tweets. An Ontology is created for ministers of

Page 5: REAL TIME SENTIMENT ANALYSIS OF E- COMMERCE WEBSITES … · 2018. 3. 1. · Significant applications of sentiment analysis include retail, movie rating, politics, healthcare,

Prof. K. Sudheer and Dr. B Valarmathi

http://www.iaeme.com/IJMET/index.asp 184 [email protected]

government of India. Sentiment Learning on Product Reviews via Sentiment Ontology

Tree[25] is labelling the product attributes and their associates sentiments in product reviews

by hierarchical learning process with a Sentiment ontology tree.

Another major part of research on sentiment analysis is using evolutionary algorithms. A

statistical and Evolutionary approach to sentiment analysis by Jonathan Carvalho, Adriana

Prado, Alexander Plastino[26] improved the existing statistical method in which polarity of

the tweets is identified by the association of product features with set of paradigm words. In

this paper Paradigm words are selected from the initial population using genetic algorithms.

Results show that there is significant improvement in the accuracy. Adaptive lexicon learning

using genetic algorithm for sentiment analysis of micro blogs[27] creates adaptive lexicon

from the combining corpus based and lexicon based method and generates new lexicon from

the text. The goal is find an optimal sentiment lexicon by using genetic algorithm. A Genetic

Algorithm feature selection based approach for Arabic sentiment classification uses genetic

algorithm to minimise the no of features in the feature set extraction. Each chromosome is a

bag of unigrams, the algorithm randomly generates initial population and fitness function is

classification accuracy achieved with the selected words in the chromosome then crossover

and mutation is performed to get the selected features for the given data set. Results show that

accuracy, precision and recall were improved with the resultant reduced feature set.

Another major implementation model in sentiment analysis is neural networks. Last and

the important research in sentiment analysis is identifying sarcastic and comparative

sentences. Sarcastic sentence tells about positivity of the sentence but its real intention is

negative

2.1. Review

A Minimal work has been done on research of Application of sentiment analysis on e-

commerce websites. This paper implements the application of sentiment analysis using nltk

twitter API‟s and machine learning algorithms

3. IMPLEMENTATION

3.1. Data Collection

Tweets can be downloaded from the twitter using two methods API search key and twitter

Streaming API. Twitter streaming API is used to download the live tweets .The basic process

of Twitter streaming is

first create an application in twitter

Then authenticate the twitter account

Use stream listener to listen to the live tweets.

Then use Filter to stream the tweets that user wants.

Store the tweets in CSV file or in database.

Table 1 Tweets collected for different websites:

E-commerce website Keyword No of tweets collected

Amazon #amazon 50000

E-bay #e-bay 25000

Alibaba #Alibaba 25000

Page 6: REAL TIME SENTIMENT ANALYSIS OF E- COMMERCE WEBSITES … · 2018. 3. 1. · Significant applications of sentiment analysis include retail, movie rating, politics, healthcare,

Real Time Sentiment Analysis of E-Commerce Websites Using Machine Learning Algorithms

http://www.iaeme.com/IJMET/index.asp 185 [email protected]

Tweets are collected daily from 01/06/17 to 05/06/17 between 4 p.m. to 5 p.m., maximum

of 1000 per day so a total of 50,000 tweets in case of amazon, 25,000 in case of E-bay and

Alibaba were collected.

Figure 2:Bar Diagram showing the no of tweets collected.

Twitter application settings are as follows.

Figure 3 Application settings

0

10000

20000

30000

40000

50000

60000

Twe

ets

co

llect

ed

amazon

ebay

Alibaba

Page 7: REAL TIME SENTIMENT ANALYSIS OF E- COMMERCE WEBSITES … · 2018. 3. 1. · Significant applications of sentiment analysis include retail, movie rating, politics, healthcare,

Prof. K. Sudheer and Dr. B Valarmathi

http://www.iaeme.com/IJMET/index.asp 186 [email protected]

Figure 4 Downloaded tweets

API search key is used to download the tweets that user searches in the twitter. Use screen

name to know the search content. In this paper tweets are downloaded using streaming API at

different times of day by using the search terms „Amazon‟,‟Ebay‟ and „Alibaba‟.

3.2. Pre-Processing

Tweets downloaded has to be pre-processed .It contains irrelevant information and irrelevant

symbols which can be removed manually or using regular expressions in python. In twitter

users normally post the urls which has to be removed. Finally stop words like is, was, also to

be removed.

Figure 5 Preprocessing.Steps

Table 3 Tweets before Pre-processing.

Sl.no DESCRIPTION PURPOSE

1. Remove any url in tweets URL removal

2. Remove RT @username in front of tweets Retweet tag removal

3. Remove #tag at the end of tweets Hashtag removal

4. Remove all prefix(@#) in the middle of tweets Removing special characters

Table 4:Tweets after Pre-processing.

Page 8: REAL TIME SENTIMENT ANALYSIS OF E- COMMERCE WEBSITES … · 2018. 3. 1. · Significant applications of sentiment analysis include retail, movie rating, politics, healthcare,

Real Time Sentiment Analysis of E-Commerce Websites Using Machine Learning Algorithms

http://www.iaeme.com/IJMET/index.asp 187 [email protected]

Sl.no Tweet Steps

1. b"RT @CNN: If your child made an in-app purchase on

Amazon's app store without your permission, you may get

your money back https://t.co/TdQR"

1 to 6

After

Clean up

If your child made an in-app purchase on Amazon's app

store without your permission, you may get your money

back

2 b'Online retailer Amazon refunding $70M for charges by

children without parents knowledge\n\nSTORY;

https://t.co/eUKJYeXMbv'

1 to 6

After

Clean up

Online retailer Amazon refunding $70M for charges by

children without parents knowledge\n\nSTORY;

Figure 6 Sentiment Analysis process

3.3. Natural Language Tool kit

The implementation of sentiment analysis of Collected data set is done on NLTK.NLTK

provides inbuilt libraries for stemming, tokenisation, and machine learning algorithms,

parsing and semantic reasoning.it provides access to more than 50 corpora and lexical

resources.NLTK is free and open source supported by all the operating systems.

3.4. Feature Selection

Feature selection mainly removes the irrelevant features from the original set .Feature

selection model is mainly useful to overcome the problem of over fitting and improving the

performance of sentiment analysis and it is cost effective. Different Feature selection methods

can be implemented.

Page 9: REAL TIME SENTIMENT ANALYSIS OF E- COMMERCE WEBSITES … · 2018. 3. 1. · Significant applications of sentiment analysis include retail, movie rating, politics, healthcare,

Prof. K. Sudheer and Dr. B Valarmathi

http://www.iaeme.com/IJMET/index.asp 188 [email protected]

Document Frequency

Document frequency measures the no of documents in which feature has appeared.it removes

the features whose document frequency is less than the predefined threshold value. Research

survey suggests that this method is simple scalable and effective for document classification.

Information Gain

Feature selection method used in this paper is Document Frequency(DF) finding the frequent

words in the dataset and algorithm learns based on frequent words present in the document.

Frequent words are based on the predefined threshold range. Other Feature selection methods

are Information gain, Gain ratio, CHI squared statistic. Cross –fold validation is used to

divide the data set into train set and test set. The algorithm learns in the training phase and

predicts the document either negative or positive in test set. Different machine learning

algorithms were used to fit a model and compared, like naïve Bayesian classification, Max-

entropy classification, decision rule based classification. Maximum accuracy is achieved in

Naïve Bayesian classification.

Naïve Bayesian algorithm

Naive Bayesian is based on the Bayes theorem

( ( ) ( | ))

( | )( )

P x P y xP x y

P y

.

If F1, F2 …Fn are Features of the Data set then the

algorithm predicts the probability P (

) for different clauses C1, C2, C3 .The Class

which has the maximum probability will be the correct prediction. Naive Bayesian is the most

powerful and accurate method in machine learning algorithms. It gives Maximum accuracy.

Maximum Entropy Method

ME sometimes outperform Naive Bayesian in standard text classification. Conditional

probability is estimated by the exponential form.

,,

1( | ) ( , )

( ) i xEM i x

i

P x y EXP F y xZ y

Where Z(y) is a normalisation function Fi,x is a feature or class function for feature fi

This is defined as ,

1, ( ) 0 & '( , )

0 elsei x

in y x xF y x

Where are feature weight parameters. The parameter values are set so that the MaxEnt

classifiers increase the entropy of the incited distribution while keeping up the limitations

upheld by the training data. The requirement is that the anticipated estimations of the

highlight/class capacities as for the model are equivalent to their anticipated values with

respect to the training data. Maximum entropy assumes that features are dependent whereas

Naive Bayesian assumes independent.

Decision Tree

Decision trees are made by pruning the feature vector space. For each node entropy is

calculated, based on the least value of information gain and accordingly split is made.

Decision trees is easy to predict ,New predictions are made by traversing from the root node

to the leaf node.

Information gain = Entropy (parent) – Weighted sum of Entropy (children)

Page 10: REAL TIME SENTIMENT ANALYSIS OF E- COMMERCE WEBSITES … · 2018. 3. 1. · Significant applications of sentiment analysis include retail, movie rating, politics, healthcare,

Real Time Sentiment Analysis of E-Commerce Websites Using Machine Learning Algorithms

http://www.iaeme.com/IJMET/index.asp 189 [email protected]

Sentiment of the document is identified using the accuracy. If the accuracy is greater than

0.5 it means that it is positive else negative.In this work, we have used overall accuracy (OA)

as performance evaluation metrics. The confusion matrix shown in Table 1 is used for

evaluating the performance of classifiers.

Table 5 Confusion Matrix.

Predicted Positives Predicted Negatives

Actual Positive Examples TP FN

Actual Negative Examples FP TN

The performance of sentiment classification is evaluated by the Overall Accuracy, which

is given by:

Overall Accuracy =

Recall: It is the fraction of relevant instances that have been retrieved over total relevant

instances in the image .Recall =

.Precision (also called positive predictive value) is the

fraction of relevant instances among the retrieved instances.

Precision =

F-measure: A measure that combines precision and recall is the harmonic mean of precision

and recall, the traditional F-measure or balanced F-score:

F-measure =

Table 6 Overall Document Polarity of different websites

E-commerce website Polarity

Amazon Positive

E-bay Positive

Ali baba Positive

3.5. Feature extraction using parts of speech tagging.

Normally adjectives and adverbs represent polarity words in any domain. So extracting

polarity words using parts of speech tagging, then then machine learns based on the adjectives

present in the document, then classifies the new document. Results for the above method are

mentioned in the table 8. Similarly we can use bigrams and trigrams also feature extraction

and results will be compared.

4. RESULTS

The results show that Amazon has got good accuracy compared with E-bay and Ali-baba.

Much of the time NaiveBayesan classification has outperformed the other classification

algorithms. Results are compared with Sentiment knowledge discovery in twitter streaming

data[29].We got better results in terms of accuracy.

E-commerce website No of positive tweets No of negative tweets

Amazon 46428 3570

E-bay 21428 3571

Ali baba 22,011 2989

Page 11: REAL TIME SENTIMENT ANALYSIS OF E- COMMERCE WEBSITES … · 2018. 3. 1. · Significant applications of sentiment analysis include retail, movie rating, politics, healthcare,

Prof. K. Sudheer and Dr. B Valarmathi

http://www.iaeme.com/IJMET/index.asp 190 [email protected]

Table 7 Accuracy Using Document frequency as Feature selection

E-commerce

Website

N –no of

features

Naïve Bayes Accuracy Decision tree Accuracy Max Entrophy

Accuracy

Amazon 200 0.8461 0.8461 0.923

Amazon 300 0.923 0.923 0.923

Amazon 1000 0.923 0.923 0.923

E-bay 200 0.8571 0.8334 0.75

E-bay 300 0.833 0.9166 0.8334

E-bay 1000 0.9166 0.9166 0.9166

Alibaba 200 0.8571 0.89 0.89

Alibaba 300 0.84 0.85 0.86

Alibaba 1000 0.91 0.91 0.91

Table 8 Accuracy using Parts of speech tag as feature selection method

E-commerce

Website

N –no

of

features

Naïve Bayes Accuracy Decision tree Accuracy Max entrophy

Accuracy

Amazon 200 0.7857 0.7857 0.7857

Amazon 300 0.7857 0.7857 0.7857

Amazon 1000 0.7857 0.7857 0..7857

E-bay 200 0.7142 0.7142 0.7142

E-bay 300 0.7142 0.7142 0.7142

E-bay 1000 0.7142 0.7142 0.7142

Ali baba 200 0.712 0.71 0.71

Ali baba 300 0.75 0.75 0.75

Ali baba 1000 0.75 0.75 0.75

Table 9 Precision using Document frequency as feature selection .

E-commerce

Website

N –no

of

features

Naïve Bayes Precision Decision tree Precision Max entrophy

Accuracy

Amazon 200 0.73 0.72 0.76

Amazon 300 0.73 0.72 0.76

Amazon 1000 0.73 0.72 0.76

E-bay 200 0.768 0.768 0.768

E-bay 300 0.769 0.768 0.768

E-bay 1000 0.769 0.769 0.769

Ali baba 200 0.76 0.76 0.76

Ali baba 300 0.73 0.75 0.76

Ali baba 1000 0.74 0.75 0.75

Page 12: REAL TIME SENTIMENT ANALYSIS OF E- COMMERCE WEBSITES … · 2018. 3. 1. · Significant applications of sentiment analysis include retail, movie rating, politics, healthcare,

Real Time Sentiment Analysis of E-Commerce Websites Using Machine Learning Algorithms

http://www.iaeme.com/IJMET/index.asp 191 [email protected]

Figure 7 Graphs: Accuracy for Different Ecommerce companies

5. CONCLUSION AND FUTURE WORK

In this paper the sentiment analysis for different e-commerce websites is compared using

machine learning algorithms. Amazon is the preferred entity with good accuracy. There is

significant improvement in results using twitter for data set collection instead of Data from

the websites which is mostly positive. Currently we are doing sentiment analysis on amazon

as an entity in future we can do aspect level sentiment analysis in different e-commerce

products and choose the product whose ratings is more in e-commerce website. Using Deep

learning we can still improve the accuracy of the machine learning algorithms.

REFERENCES

[1] Pang, B., Lee, L. and Vaithyanathan, “S.Thumbs up? sentiment classification using

machine learning techniques,” Proceedings of the ACL-02 conference on Empirical

methods in natural language processing-Volume 10 (pp. 79-86). Association for

Computational Linguistics, July 2002

[2] Hu, M. and Liu, B,” Mining opinion features in customer reviews,” In AAAI Vol. 4, No.

4, pp. 755-760,July 2004

[3] Kamp‟s, J., Marx, M.J, Mokken, R.J. and Rijke, M.,”Using word net to measure semantic

orientations of adjectives,”LREC 2004 May 26 (Vol. 4, pp. 1115-1118). 2004

[4] Kim, S.M. and Hovy, E,” Determining the sentiment of opinions,” In Proceedings of the

20th international conference on Computational Linguistics (p. 1367). Association for

Computational Linguistics, August 2004

Page 13: REAL TIME SENTIMENT ANALYSIS OF E- COMMERCE WEBSITES … · 2018. 3. 1. · Significant applications of sentiment analysis include retail, movie rating, politics, healthcare,

Prof. K. Sudheer and Dr. B Valarmathi

http://www.iaeme.com/IJMET/index.asp 192 [email protected]

[5] Hatzivassiloglou, V.and McKeown, K.R.,” Predicting the semantic orientation of

adjectives,” In Proceedings of the eighth conference on European chapter of the

Association for Computational Linguistics (pp. 174-181). Association for Computational

Linguistics, July 1997.

[6] Ganapathibhotla, M. and Liu, B., “Mining opinions in comparative

sentences,” Proceedings of the 22nd International Conference on Computational

Linguistics-Volume 1 (pp. 241-248). Association for Computational Linguistics, August

2008.

[7] Jindal, N. and Liu, B,” Mining comparative sentences and relations,” In AAAI (Vol. 22,

pp..1331-1336),July 2006.

[8] Davidov, D., Tsur, O. and Rappoport, A.” Semi-supervised recognition of sarcastic

sentences in twitter and amazon,” Proceedings of the fourteenth conference on

computational natural language learning (pp. 107-116).Association for Computational

Linguistics, July 2010

[9] Turney, P.D., “Thumbs up or thumbs down?: semantic orientation applied to unsupervised

classification of reviews,” Proceedings of the 40th annual meeting on association for

computational linguistics (pp. 417-424). Association for Computational Linguistics, July

2002

[10] Kwon, N. and Hovy, E, ”Integrating semantic frames from multiple sources,”

International Conference on Intelligent Text Processing and Computational

Linguistics (pp. 1-12). Springer Berlin Heidelberg,Feb 2006

[11] Pang, B. and Lee, L, “Seeing stars: Exploiting class relationships for sentiment

categorization with respect to rating scales,” Proceedings of the 43rd annual meeting on

association for computational linguistics (pp. 115-124), Association for Computational

Linguistics, June 2005.

[12] Anastasia, Sonia, and Indra Budi, "Twitter sentiment analysis of online transportation

service providers," Advanced Computer Science and Information Systems (ICACSIS),

2016 International Conference on. IEEE, 2016.

[13] Ramteke, Jyoti, et al. "Election result prediction using Twitter sentiment analysis,"

Inventive Computation Technologies (ICICT), International Conference on. Vol. 1. IEEE,

2016.

[14] Yang, Nan, Sanxing Cao, and Shufan Zhang,"Data analysis system for online short video

comments," Computer and Information Science (ICIS), 2016 IEEE/ACIS 15th

International Conference on. IEEE, 2016.

[15] Agarwal, Apoorv, et al,"Opinion mining of news headlines using SentiWordNet,"

Colossal Data Analysis and Networking (CDAN), Symposium on. IEEE, 2016.

[16] Tai, Chih-Hua, Zheng-Han Tan, and Yue-Shan Chang," Systematical approach for

detecting the intention and intensity of feelings on social network," IEEE journal of

biomedical and health informatics 20.4 (2016): 987-995.

[17] Chen, Huan, et al,"Cloud-Based Core Text Processing Services for Sentiment

Analysis," Big Data (BigData Congress), 2016 IEEE International Congress on. IEEE,

2016.

[18] Ashok, Meghana, et al, "A personalized recommender system using Machine Learning

based Sentiment Analysis over social data," Electrical, Electronics and Computer Science

(SCEECS), 2016 IEEE Students' Conference on. IEEE, 2016.

[19] Seki, Yohei, "Use of Twitter for Analysis of Public Sentiment for Improvement of Local

Government Service," Smart Computing (SMARTCOMP),2016 IEEE International

Conference on. IEEE, 2016.

Page 14: REAL TIME SENTIMENT ANALYSIS OF E- COMMERCE WEBSITES … · 2018. 3. 1. · Significant applications of sentiment analysis include retail, movie rating, politics, healthcare,

Real Time Sentiment Analysis of E-Commerce Websites Using Machine Learning Algorithms

http://www.iaeme.com/IJMET/index.asp 193 [email protected]

[20] Ullah, Mohammad Aman,"Sentiment analysis of students feedback: A study towards

optimal tools," Computational Intelligence (IWCI), International Workshop on. IEEE,

2016.

[21] You,Quanzeng,"Sentiment and Emotion Analysis for Social Multimedia: Methodologies

and Applications," Proceedings of the 2016 ACM on Multimedia Conference. ACM, 2016.

[22] Shukri, Sarah E., et al, "Twitter sentiment analysis: A case study in the automotive

industry," Applied Electrical Engineering and Computing Technologies (AEECT), 2015

IEEE Jordan Conference on. IEEE, 2015.

[23] Marstawi, Ali, et al,” Ontology-based Aspect Extraction for an Improved Sentiment

Analysis in Summarization of Product Reviews,” In Proceedings of the 8th International

Conference on Computer Modelling and Simulation. ACM, 2017..

[24] Kumar, Akshi, and Arunima Joshi,” Ontology Driven Sentiment Analysis on Social Web

for Government Intelligence,” In Proceedings of the Special Collection on eGovernment

Innovations in India,ACM, 2017.

[25] Wei, Wei, and Jon Atle Gulla, ”Sentiment learning on product reviews via sentiment

ontology tree,” In Proceedings of the 48th Annual Meeting of the Association for

Computational Linguistics, Association for Computational Linguistics, 2010.

[26] Carvalho, Jonathon, Adriana Prado, and Alexandra Plastino, "A statistical and

evolutionary approach to sentiment analysis," Proceedings of the 2014 IEEE/WIC/ACM

International Joint Conferences on Web Intelligence (WI) and Intelligent Agent

Technologies (IAT)-Volume 02. IEEE Computer Society, 2014.

[27] Keshavarz, Hamidreza, and Mohammad Saniee Abadeh, "ALGA: Adaptive lexicon

learning using genetic algorithm for sentiment analysis of micro blogs," Knowledge-Based

Systems 122 (2017): 1-16.

[28] Islam, Jyoti, and Yanqing Zhang, "Visual Sentiment Analysis for Social Images Using

Transfer Learning Approach," Big Data and Cloud Computing (BDCloud), Social

Computing and Networking (SocialCom), Sustainable Computing and Communications

(SustainCom) (BDCloud-SocialCom-SustainCom),International Conferences on. IEEE,

2016.

[29] Bifet, Albert, and Eibe Frank.,"Sentiment knowledge discovery in twitter streaming

data," International conference on discovery science, Springer, Berlin, Heidelberg, 2010.

[30] N. Sumathi and Dr. T. Sheela, An Efficient Sentiment Analysis by using Hybrid Naive

Bayes and SVM Approach in Banking Institutions. International Journal of Civil

Engineering and Technology, 8(12), 2017, pp. 373 - 3 91.

[31] Dr.Gagandeep K Nagra and Dr.R.Gopal, “The Effect of Digital Marketing

Communication on Consumer Buying” International Journal of Management (IJM),

Volume 5, Issue 3, 2014, pp. 53 - 57

[32] Mitisha Vaidya and Priyank Thakkar, “Aspect Based Sentiment Analysis of Movie

Reviews” International Journal of Advanced Research in Engieering & Technology

(IJARET), Volume 5, Issue 6, 2014, pp. 53 - 61

[33] A. Suganthy, G.S .Sumithra, J.Hindusha, A.Gayathri and S.Girija, ―Semantic Web

Services and its Challenges. International journal of Computer Engineering & Technology

(IJCET), Volume 1, Issue 2, 2010, pp. 26 - 37