Visualizing music artists’ main topics and overall sentiment by analyzing lyrics

Abstract

This paper discusses the process of visualizing music artists’ main topics and overall sentiment by analyzing their lyrics. While artists themselves translate their lyrics into sound, we address the problem of visualizing their songs by automatically generating a moodboard from the written lyrics. Visualized lyrics can be valuable for deaf people, and lyric analysis may be of commercial use to music software companies. The approach involves scraping the lyrics from the web, preprocessing the scraped texts, and analyzing their topics and overall sentiment. Based on the main topics, corresponding images are then scraped from the web and placed on the moodboard. Based on the sentiment score, a matching level of saturation is applied to these images. Finally, some results are presented and discussed, followed by a link to the final application.

1 Introduction

Every music artist is unique, just like every song is unique. Songs can express positive or negative emotions through their sounds and lyrics. While humans can comprehend the emotions of a song by simply listening to it, a Natural Language Processing system needs a different approach to perform this task, since it can only use the textual component of a song. Without sound, it is more difficult to extract the sentiment of a song, since all that is left are words. Another problem is that lyrics do not follow the same syntactic rules as informative texts. The language structure of lyrics is more similar to that of poetry. Lyrics can be ambiguous, since they often contain metaphors, idioms and polysemous words; the interpretation is left to the listener, and different people can interpret the same song differently. Enabling an automated system to interpret lyrics is therefore a demanding task. However, lyrics might provide an automated system with enough information about a song to detect its main topics and overall sentiment. This research therefore analyzes whether Natural Language Processing can extract the sentiment of songs based on their lyrics only. It also analyzes how to extract the main topics embedded in the lyrics of an artist’s songs. The challenge and main problem statement of this paper is to express the extracted sentiment and topics of an artist’s songs in a different way than sound: visually, by generating a moodboard. A moodboard is chosen as the visualization method because it carries the same ambiguity as a song, which makes it well suited to the domain.

The resulting moodboard of an artist’s main topics and overall sentiment can be useful for several purposes. It would offer deaf people an opportunity to experience music through their visual sense, which can bring them closer to a domain from which they are often distanced. These moodboards could also be useful for music software companies such as Spotify. Spotify is well aware that music and mood are intertwined, and responds to this by offering playlists based on moods. Embedding the proposed moodboard of an artist’s topics and sentiment in these playlists would add a visual dimension.

Kayleigh Beard, Anita Tran, Nathalie Post, Laila van Ments
Vrije Universiteit, Amsterdam, the Netherlands

2 Data Used

2.1 Lyrics

The data used consists of lyrics from the website ‘http://www.songteksten.nl’. However, no data is stored within the application itself. A user of the application can query an artist, and only after this query is the data retrieved by scraping all of the artist’s lyrics from ‘http://www.songteksten.nl’ using ‘Scrapy’, a Python framework for scraping websites.

2.2 Images

Based on the queried artist and the results of the topic analysis, images are scraped from the photo-sharing website Flickr, ‘http://www.flickr.com’.

3 Methodology

This section describes the four separate steps that take place within the application. After the lyrics are scraped, they are preprocessed, as described in Section 3.1. Subsequently, the main topics of the lyrics are extracted (Section 3.2) and sentiment analysis is conducted (Section 3.3). Finally, the results from the topic analysis and sentiment analysis are used to generate a moodboard (Section 3.4).

3.1 Preprocessing the text

The scraped lyrics are preprocessed in order to normalize the text. The first step of preprocessing consists of sentence segmentation and tokenization. After this, Part-Of-Speech tagging is applied, in which each word of the text is marked with a part of speech, based on its definition and context. Finally, redundant characters and punctuation marks (colons, apostrophes, hyphens, strokes, parentheses and square brackets) are removed from the text (Fokkens, 2015).
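The segmentation, tokenization and character-removal steps can be illustrated with a minimal sketch that uses only the Python standard library. It is not the application's code: the function names, the naive splitting rules and the lowercasing are assumptions made for illustration, and Part-Of-Speech tagging is left out because it requires a trained tagger.

```python
import re

# Redundant characters named in the text: colons, hyphens, strokes (slashes),
# parentheses and square brackets are replaced by a space; apostrophes are
# dropped so that contractions stay in one token (an assumption).
APOSTROPHES = re.compile(r"['’]")
REDUNDANT = re.compile(r"[:\-/()\[\]]")


def split_sentences(text):
    """Naive sentence segmentation on line breaks and end punctuation."""
    parts = re.split(r"[.!?]+|\n+", text)
    return [part.strip() for part in parts if part.strip()]


def tokenize(sentence):
    """Strip redundant characters, then split on whitespace and commas."""
    cleaned = APOSTROPHES.sub("", sentence)
    cleaned = REDUNDANT.sub(" ", cleaned)
    # Lowercasing is an assumption; it simplifies later lexicon lookups.
    return [tok.lower() for tok in re.split(r"[\s,;]+", cleaned) if tok]


def preprocess(lyrics):
    """Return a list of sentences, each a list of normalized tokens."""
    return [tokenize(sentence) for sentence in split_sentences(lyrics)]
```

In the application itself the tagging step runs before the characters are removed, so a faithful implementation would tag the tokens first and clean them afterwards.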

3.2 Topic analysis

The most meaningful words of a sentence are keywords. In order to extract the main topics from lyrics, the goal was to filter the keywords out of the lyrics. Keywords are most often nouns, and therefore only nouns were extracted from the text (Common Noun, Proper Noun, Proper Noun Singular Form and Proper Noun Plural Form) and stored in a list. Not all of the extracted nouns in this list are actual keywords, and some of them should not be part of the topic analysis. The NLTK ‘Stopwords’ corpus is used to filter out these words. However, the ‘Stopwords’ corpus does not filter out all redundant words, especially not domain-dependent words (such as ‘chorus’ and ‘verse’). While thoroughly testing the application, an additional stopword list was compiled in order to filter out those redundant words as well.

After a list of proper keywords is established, a frequency distribution of the most common words in this list is generated. From this frequency distribution the ten most common words are selected, which represent the ten main topics of the queried artist (Bird, Klein and Loper, 2009).
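The noun filtering and frequency count can be sketched as follows. This is an illustrative stand-in, not the application's code: it assumes the tokens have already been tagged with Penn Treebank style tags (`NN`, `NNS`, `NNP`, `NNPS`), and the two stopword sets are tiny hypothetical placeholders for the NLTK ‘Stopwords’ corpus and the additional domain-specific list.

```python
from collections import Counter

# Assumed noun tags (Penn Treebank style); the exact tag set is an assumption.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}

# Hypothetical placeholders for the general and domain-specific stopword lists.
GENERAL_STOPWORDS = {"i", "you", "the", "a"}
DOMAIN_STOPWORDS = {"chorus", "verse"}


def main_topics(tagged_tokens, n=10):
    """Return the n most frequent nouns that are not stopwords."""
    stopwords = GENERAL_STOPWORDS | DOMAIN_STOPWORDS
    nouns = [
        word.lower()
        for word, tag in tagged_tokens
        if tag in NOUN_TAGS and word.lower() not in stopwords
    ]
    # Counter.most_common returns (word, count) pairs, most frequent first.
    return [word for word, _ in Counter(nouns).most_common(n)]
```

For example, calling `main_topics(tagged, n=10)` on the tagged tokens of all of an artist's lyrics would yield at most ten topic words, fewer if the lyrics contain fewer distinct nouns.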

3.3 Sentiment Analysis

Several approaches can be used for sentiment analysis (Maks, 2015). In this application, a polarity lexicon is used to assess the overall sentiment by determining how many positive and how many negative words appear in the lyrics (Breen, 2011). The algorithm is based on the approach of F. Alba (Alba, 2012) and consists of a few steps.

First, every word in the lyrics is compared to the words in the opinion lexicon, which contains a set of ‘positive polarity words’ and a set of ‘negative polarity words’. When a word in the lyrics matches a word in the lexicon, it is annotated with a tag according to its polarity: positive or negative. However, this is not sufficient, since the lexicon does not account for incrementers (‘very’, ‘super’) and decrementers (‘barely’, ‘little’), which increase or decrease the strength of the sentiment. In addition, inverters (‘not’, ‘no’) invert the polarity of the word, changing it from positive to negative or vice versa. Therefore, additional dictionaries for incrementers, decrementers and inverters are used.

After every word is annotated with either a sentiment tag or none, the application keeps track of two separate sentiment scores. One score accumulates all the positively classified words, the other all the negatively classified words. However, before a sentiment score is assigned to a word, the previous tokens are checked for incrementers (in which case the sentiment score for that word is doubled), decrementers (in which case the sentiment score for that word is halved), and inverters (in which case the sentiment score for that word is inverted).
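The scoring rules above can be sketched in a few lines. The sketch rests on several assumptions that the text does not settle: the word sets are tiny hypothetical placeholders for the real lexicons, every matched word starts with a score of 1, only the immediately preceding token is checked for a modifier, and an inverted word is counted toward the opposite score.

```python
# Hypothetical miniature lexicons; the real ones are far larger.
POSITIVE = {"good", "love", "happy"}
NEGATIVE = {"bad", "hate", "sad"}
INCREMENTERS = {"very", "super"}
DECREMENTERS = {"barely", "little"}
INVERTERS = {"not", "no"}


def sentiment_scores(tokens):
    """Return (positive_score, negative_score) for a list of tokens."""
    positive_score = 0.0
    negative_score = 0.0
    for index, token in enumerate(tokens):
        if token in POSITIVE:
            value = 1.0
        elif token in NEGATIVE:
            value = -1.0
        else:
            continue  # words without a sentiment tag are ignored

        # Assumption: only the directly preceding token modifies the score.
        previous = tokens[index - 1] if index > 0 else None
        if previous in INCREMENTERS:
            value *= 2
        elif previous in DECREMENTERS:
            value /= 2
        elif previous in INVERTERS:
            value = -value

        # Assumption: an inverted word contributes to the opposite score.
        if value > 0:
            positive_score += value
        else:
            negative_score += -value
    return positive_score, negative_score
```

With these placeholder lexicons, the tokens of "i am very happy but not good and barely sad" give a positive score of 2.0 (the doubled ‘happy’) and a negative score of 1.5 (the inverted ‘good’ plus the halved ‘sad’).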

The results of the sentiment analysis consist of two scores: one positive score and one negative score. These scores are not interpretable without scaling. Some artists have many songs and therefore many available lyrics, which results in higher scores than for artists with fewer lyrics. Therefore, the sentiment scores are scaled according to the number of sentiment-tagged words, resulting in a percentage of positive sentiment-carrying words and a percentage of negative sentiment-carrying words.
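One possible reading of this scaling step is sketched below. The exact denominator is not specified in the text; here the combined positive and negative score is assumed, and the function name is hypothetical.

```python
def sentiment_percentages(positive_score, negative_score):
    """Scale two raw sentiment scores to percentages of their sum."""
    total = positive_score + negative_score
    if total == 0:
        # No sentiment-carrying words were found at all.
        return 0.0, 0.0
    return 100 * positive_score / total, 100 * negative_score / total
```

Under this assumption the two percentages are independent of how many lyrics an artist has, which is the property the scaling is meant to provide.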

3.4 Moodboard Generation

The results of the topic and sentiment analyses are visualized in a moodboard. The moodboard generation consists of two steps. First, for each of the ten main topics determined in the topic analysis, five corresponding images are scraped from Flickr, as well as five images of the artist. Second, the sentiment score from the sentiment analysis is translated into the amount of saturation in the moodboard. The higher the sentiment score (thus, the more positive words in the songs), the higher the saturation of the pictures. The lower the sentiment score (thus, the more negative words in the songs), the lower the saturation of the pictures.
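The text does not give the exact mapping from sentiment to saturation, so the following is only a sketch under an assumed linear mapping: 50 percent positive leaves an image unchanged, 0 percent removes all color, and 100 percent doubles the saturation. Both function names are hypothetical, and the adjustment is shown on a single RGB pixel using the standard-library `colorsys` module.

```python
import colorsys


def saturation_factor(positive_pct):
    """Map a positive-sentiment percentage to a saturation multiplier.

    Assumed linear mapping: 0 -> 0.0, 50 -> 1.0, 100 -> 2.0.
    """
    clamped = max(0.0, min(100.0, positive_pct))
    return clamped / 50.0


def adjust_pixel(rgb, factor):
    """Scale the saturation of one (r, g, b) pixel with 0-255 channels."""
    r, g, b = (channel / 255 for channel in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    s = min(1.0, s * factor)  # saturation cannot exceed 1.0
    r, g, b = colorsys.hsv_to_rgb(h, s, v)
    return tuple(round(channel * 255) for channel in (r, g, b))
```

Applying `adjust_pixel` to every pixel of a scraped image, with the factor derived from the artist's positive percentage, would produce duller images for predominantly negative lyrics and more vivid ones for predominantly positive lyrics.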

4 Results

The resulting application is able to generate a moodboard based on topic analysis and sentiment analysis of the queried artist’s lyrics. The performance of a Natural Language Processing system is often assessed with a quantitative analysis. However, since no annotated dataset of lyrics was available, this was not possible. Therefore, in order to determine the performance of the system, a manual approach was used in which a range of different artists was queried. A handful of results from these queries are included in Appendix I. It is difficult to establish a valid performance measure for these results. However, the lack of an exact performance measure is of limited relevance for the purpose of the application. The resulting topic and sentiment analyses are visualized in a moodboard, and moodboards are not an exact science. The purpose of a moodboard is to project an overall mood in a visual way, so even when the accuracy of the topic analysis and sentiment analysis is below optimal, this is not immediately observable in the moodboard. Whether the moodboard portrays an artist’s sentiment and topics accurately is almost as ambiguous as the lyrics themselves, and is left to the interpretation of the user.

5 Discussion

There are many parts of the developed application that can be improved. A few proposed future improvements are described in this section.

As stated before, lyrics do not follow the same syntactic structure as informative texts, which makes it challenging for a Natural Language Processing system to correctly determine the Part-Of-Speech of each word. Words were not always assigned the correct Part-Of-Speech, which resulted in words being incorrectly identified as nouns. This sometimes led to non-keywords being identified as keywords and incorrectly projected on the moodboard. Even though this misclassification is often not visible due to the ambiguity of the moodboard, it is a flaw in the application. A domain-specific Part-Of-Speech tagger could help to resolve this problem.

Also, even though the lexicons used for the sentiment analysis were extensive, they were not domain specific, which can lead to inaccurate results. Besides that, the positive polarity lexicon consisted of 2002 words, while the negative polarity lexicon consisted of 4767 words. It has not been researched for the purposes of this application whether this ratio is a correct representation of sentiment-carrying words in the English language. If not, the lexicons used may bias the results toward negative polarity. Therefore, research into the correct ratio of positive and negative polarity-carrying words is necessary to improve the accuracy.

Furthermore, only a rule-based approach was used for this application. It could be useful to explore whether machine learning or hybrid approaches would yield better results.

Finally, this application is currently a ‘stand-alone’ application, but if it were integrated into a music software system such as Spotify, personalization would be a great improvement. Different people interpret songs in different ways, so a useful addition to the system would be an opportunity for the user to provide feedback about the resulting visualizations. This way, the parameters of the application could be tuned to the user, which could lead to a better user experience.

6 Link to Application

The application is not published online. However, the code is available for download at https://www.dropbox.com/s/241gr8r9cglf8eg/Visual_Songs.zip.

7 Group Work Summary

All group members brainstormed together about the idea and worked their way through the lab sessions. The actual application was mostly built by Kayleigh and Nathalie, because they have more experience with programming in Python than Laila and Anita. Nathalie was responsible for the scraper, the web application using Python CGI, and the sentiment analysis. Kayleigh was responsible for the topic analysis and the moodboard with images scraped from Flickr. The final report was written by Laila, Anita, Kayleigh and Nathalie together.

8 References

Alba, F. “Basic Sentiment Analysis in Python”. 1 Nov. 2012. Web. 23 Mar. 2015. <http://fjavieralba.com/basic-sentiment-analysis-with-python.html>.

Breen, J. “Twitter Sentiment Analysis Tutorial 201107: Opinion Lexicon English”. GitHub. 12 Jul. 2011. Web. 23 Mar. 2015. <https://github.com/jeffreybreen/twitter-sentiment-analysis-tutorial-201107/tree/master/data/opinion-lexicon-English>.

Bird, S., Klein, E. and Loper, E. Natural Language Processing with Python, 79-128. First Edition (2009). California: O’Reilly Media Inc. Web.

Fokkens, A. “Introduction to NLP”. Blackboard Learn VU. Web, Lecture 10 Feb. 2015.

Maks, I. “Text Mining 2015: Sentiment Analysis & Opinion Mining”. Blackboard Learn VU. Web, Lecture 24 Feb. 2015.

Appendix I

Spice Girls

The results of the sentiment analysis of Spice Girls are: 68 percent of the classified words were positive, and 31 percent were negative. The main topics of Spice Girls are: time, love, come, something, night, fun, baby, lover, deeper. The resulting moodboard is displayed in Image 1.

Image 1: Resulting moodboard for the Spice Girls.

Ellie Goulding

The results of the sentiment analysis of Ellie Goulding are: 43 percent of the classified words were positive, and 56 percent were negative. The main topics of Ellie Goulding are: burn, love, time, baby, heart, fire, lights, life, anything. The resulting moodboard is displayed in Image 2.

Image 2: Resulting moodboard for Ellie Goulding.

Slipknot

The results of the sentiment analysis of Slipknot are: 23 percent of the classified words were positive, and 76 percent were negative. The main topics of Slipknot are: inside, build, fuck, life, everything, end, goodbye, man, eyes. The resulting moodboard is displayed in Image 3.

Image 3: Resulting moodboard for Slipknot.

ABBA

The results of the sentiment analysis of ABBA are: 53 percent of the classified words were positive, and 46 percent were negative. The main topics of ABBA are: man, mother, honey, waterloo, midnight, take, elaine, nothing. The resulting moodboard is displayed in Image 4.

Image 4: Resulting moodboard for ABBA.
