Python Lyrical Analysis

Task

To analyze the sentiment of a chosen artist or song, the implementation is broken into three tasks: obtaining the lyrics, analyzing the lyrics, and displaying the finalized analysis.

Implementation

Obtaining Lyrics

To obtain the lyrics that we wish to analyze, we use LyricsGenius, a Python wrapper around the genius.com API. This library provides all of the functionality we need from the genius.com API proper.

Using the LyricsGenius library requires the user to provide a Genius API key, either as a runtime argument or as a separate input once the program starts. Additional runtime arguments control how the analysis is done. For instance, the user can choose to analyze a single song, an entire artist song by song, or an artist album by album. Note that regardless of which option is chosen for an artist, specific albums/songs can be selected later so that the entire discography does not need to be analyzed.
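
As a rough illustration of the runtime arguments described above, here is a minimal argparse sketch. Only remove_remix and remove_unfinished are names taken from this write-up; every other flag name, the mode values, and the resolve_api_key helper are hypothetical and may not match the project's actual interface.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a command-line parser (most flag names here are hypothetical)."""
    parser = argparse.ArgumentParser(description="Lyric sentiment analysis")
    # Genius API key; when omitted, the program can ask for it after startup.
    parser.add_argument("--api-key", default=None)
    # What to analyze: one song, an artist song by song, or an artist album by album.
    parser.add_argument(
        "--mode",
        choices=["song", "artist-songs", "artist-albums"],
        default="song",
    )
    parser.add_argument("--artist", required=True)
    parser.add_argument("--song", default=None)
    # These two flag names come from the write-up itself.
    parser.add_argument("--remove_remix", action="store_true")
    parser.add_argument("--remove_unfinished", action="store_true")
    return parser


def resolve_api_key(args: argparse.Namespace, prompt=input) -> str:
    """Return the key from the arguments, or ask for it interactively."""
    if args.api_key:
        return args.api_key
    return prompt("Genius API key: ").strip()
```

Passing the prompt function as a parameter keeps the interactive fallback easy to exercise without a terminal.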

Once the lyrics are obtained, a handful of sanitization steps ensure that the proper lyrics are being analyzed. First, the string of lyrics is tokenized into a list of words whose sentiment we can analyze. At the song level, a few custom-made features can remove remixed or unfinished songs, since these are usually repeats of other songs and would only skew the sentiment ratings. These two user-specified flags, remove_remix and remove_unfinished, can be set at runtime and are only available when analyzing an artist as a whole. At the lyric level, the words are checked against the common list of English stopwords to remove words that do not contribute to sentiment, and all common punctuation is removed as well.
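
The lyric-level cleanup could look roughly like the sketch below. The function name clean_lyrics and the tiny STOPWORDS set are assumptions made for illustration; a real run would use a full English stopword list, and the project's own code may order or implement these steps differently.

```python
import string

# Tiny illustrative stopword set (an assumption); a real run would use a
# full English stopword list.
STOPWORDS = {"a", "an", "and", "the", "i", "you", "to", "of", "in", "is", "it", "my"}

# Translation table that deletes common ASCII punctuation.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_lyrics(lyrics: str) -> list[str]:
    """Split lyrics into lowercase words, dropping punctuation and stopwords."""
    words = []
    for raw in lyrics.lower().split():
        word = raw.translate(_PUNCT_TABLE)
        # Skip tokens that were pure punctuation, and skip stopwords.
        if word and word not in STOPWORDS:
            words.append(word)
    return words
```

For example, clean_lyrics("I love the rain, and you!") keeps only the words "love" and "rain".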

Analyzing the Lyrics

After obtaining the lyrics, analysis is much more straightforward. Lyrical analysis is conducted using the sentiment polarity analyzer available through nltk. Once the song or songs are passed through the analyzer, the positive/negative/neutral sentiment scores are recorded and weighted in order to judge the sentiment of a song. Note that while nltk does provide a composite score that gives a sentiment label for a given list of lyric words, we are interested in the breakdown across the positive/negative/neutral categories, not just the overall label.
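
A sketch of the scoring step follows. It assumes the analyzer exposes a polarity_scores(text) method that returns "pos", "neg", "neu", and "compound" keys, which is the interface of nltk's VADER SentimentIntensityAnalyzer; whether the project uses exactly that class is an assumption. The weighting scheme is not described above, so weighting each song by its word count is also an assumption, as are the function names.

```python
from typing import Iterable, Protocol


class PolarityAnalyzer(Protocol):
    """Anything exposing a VADER-style polarity_scores(text) -> dict method."""

    def polarity_scores(self, text: str) -> dict: ...


CATEGORIES = ("pos", "neg", "neu")


def score_song(words: list[str], analyzer: PolarityAnalyzer) -> dict[str, float]:
    """Return the pos/neg/neu breakdown for one song, ignoring the composite score."""
    scores = analyzer.polarity_scores(" ".join(words))
    return {key: scores[key] for key in CATEGORIES}


def weighted_breakdown(
    songs: Iterable[list[str]], analyzer: PolarityAnalyzer
) -> dict[str, float]:
    """Combine per-song breakdowns, weighting each song by word count (assumed scheme)."""
    totals = {key: 0.0 for key in CATEGORIES}
    total_words = 0
    for words in songs:
        if not words:
            continue  # an empty song contributes nothing
        breakdown = score_song(words, analyzer)
        for key in CATEGORIES:
            totals[key] += breakdown[key] * len(words)
        total_words += len(words)
    if total_words == 0:
        return totals
    return {key: totals[key] / total_words for key in CATEGORIES}
```

With nltk installed and the vader_lexicon resource downloaded, an instance of nltk.sentiment.vader.SentimentIntensityAnalyzer can be passed in as the analyzer.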

Displaying the Analysis

The sentiment analysis is displayed using wrapper functions built on top of the matplotlib plotting library. Two display methods are available: a pie chart that shows the proportion of each sentiment category for an individual song or album, and a line chart that shows how the sentiment categories differ across albums and songs. Again, a runtime flag lets the user specify how they would like to view the sentiment breakdown.
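
The two chart types could be wrapped along the lines below. The function names and the shape of the score dictionaries are assumptions; only standard matplotlib calls (subplots, pie, plot, legend, show) are used, and the project's actual wrappers may look different.

```python
def to_fractions(scores: dict[str, float]) -> dict[str, float]:
    """Normalize category scores so they sum to 1 (all zeros stay zero)."""
    total = sum(scores.values())
    if total == 0:
        return {key: 0.0 for key in scores}
    return {key: value / total for key, value in scores.items()}


def plot_sentiment_pie(scores: dict[str, float], title: str) -> None:
    """Pie chart of sentiment proportions for one song or album."""
    # Imported here so to_fractions() stays usable without matplotlib installed.
    import matplotlib.pyplot as plt

    fractions = to_fractions(scores)
    fig, ax = plt.subplots()
    ax.pie(list(fractions.values()), labels=list(fractions.keys()), autopct="%1.1f%%")
    ax.set_title(title)
    plt.show()


def plot_sentiment_lines(
    names: list[str], breakdowns: list[dict[str, float]], title: str
) -> None:
    """Line chart with one line per sentiment category across songs or albums."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for key in ("pos", "neg", "neu"):
        ax.plot(names, [b[key] for b in breakdowns], marker="o", label=key)
    ax.set_title(title)
    ax.legend()
    plt.show()
```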

Challenges

While a handful of different problems came up during the development of this project, the largest one appeared up front: wrangling lyrics from the Genius API. There were initial issues getting the Genius API to work, and the easiest solution was to use the LyricsGenius library to access the API rather than wrestling with it directly. Since the LyricsGenius library provides all of the same functionality as the base Genius API, no issue has arisen from this workaround. For a future feature such as analyzing the artists/albums/songs in a user's Spotify library, a similar workaround exists in spotipy, a Python library for the Spotify API.

Another issue involved removing remixed songs from the songs to analyze when analyzing an artist by album. While the LyricsGenius library includes functionality to remove songs based on certain keywords, this functionality doesn't entirely solve the issue. Even though a song's title may include the word 'remix', it is not necessarily a remix itself. To resolve this, special sanitization functions were written so that a 'remix' tag at the end of the song title causes the song to be removed, while songs that don't appear to be remixes are kept (the source code for this functionality is available under lib/utils in the GitHub repository). Similar functionality was implemented to remove unfinished songs whose titles include a tag such as (leak), (demo), or something similar. Additional functionality should be added to let the user choose the keywords to exclude, either in addition to or instead of the keywords currently in use.
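
The idea of matching only a trailing tag can be sketched as follows. This is not the code from lib/utils; the function names, the regular expression, and the list of unfinished-song keywords beyond (leak) and (demo) are all assumptions made for illustration.

```python
import re

# Keywords that mark an unfinished track; "leak" and "demo" come from the
# write-up, the others are assumptions.
UNFINISHED_TAGS = ("leak", "demo", "snippet", "unreleased")

# A trailing parenthesized or bracketed tag, e.g. "Song (Artist Remix)".
_TRAILING_TAG = re.compile(r"[\(\[]([^\)\]]*)[\)\]]\s*$")


def trailing_tag(title: str) -> str:
    """Return the lowercase text of a trailing (...) or [...] tag, or ''."""
    match = _TRAILING_TAG.search(title)
    return match.group(1).lower() if match else ""


def is_remix(title: str) -> bool:
    """True only when the word 'remix' appears inside the trailing tag."""
    return re.search(r"\bremix\b", trailing_tag(title)) is not None


def is_unfinished(title: str) -> bool:
    """True when the trailing tag contains one of the unfinished-song keywords."""
    tag = trailing_tag(title)
    return any(re.search(rf"\b{re.escape(word)}\b", tag) for word in UNFINISHED_TAGS)


def filter_titles(
    titles: list[str], remove_remix: bool = False, remove_unfinished: bool = False
) -> list[str]:
    """Drop remixes and/or unfinished songs according to the two flags."""
    kept = []
    for title in titles:
        if remove_remix and is_remix(title):
            continue
        if remove_unfinished and is_unfinished(title):
            continue
        kept.append(title)
    return kept
```

Under these rules a title like "Ignition (Remix)" is dropped when remove_remix is set, while "Remix Culture" is kept because the word is not in a trailing tag. Making UNFINISHED_TAGS a parameter would be one way to support user-chosen keywords.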

Thoughts

First and foremost, it is great to see a functional tool come out of the initial idea of analyzing my favorite artist. While the original functionality has been implemented, there is certainly room for further improvement.

While the reported sentiment is a good general indication of how songs or albums may feel, the sentiment analysis method used doesn't seem to capture complex sentiment that builds over the course of a line, a verse, or even an entire song. A better technique in this regard likely involves some sort of memory-based model, such as a Long Short-Term Memory (LSTM) network or another recurrent neural network (RNN). There is likely also a better way to analyze sentiment over the course of an entire album. Instead of saying that the sentiment of an album is simply the average of the sentiment of its songs, a similar memory-based technique could analyze album sentiment from the first word of the album to the last.

In terms of additional work for this project, two main points come to mind. First is the ability to analyze sentiment over a single album, as opposed to only analyzing a song or an artist. While this may seem easy at face value, the lack of an album element in the Genius API means that additional wrangling will be required. An initial proposal is to search for the tracklist of the user's requested album, and then make individual calls to the Genius API for each song in the tracklist. The second feature is the ability for a user to enter their Spotify username and then analyze the sentiment of select artists/albums/songs from their library.
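
The album proposal above (one lookup per track) might be sketched like this. The fetch_album_lyrics function and the small LyricsClient interface are hypothetical; the interface is modeled on the search_song(title, artist) call in LyricsGenius, which returns a song object with a lyrics attribute or None when nothing is found. How the tracklist itself is obtained is left open.

```python
from typing import Optional, Protocol


class SongResult(Protocol):
    lyrics: str


class LyricsClient(Protocol):
    """Minimal interface assumed here, modeled on LyricsGenius's search_song."""

    def search_song(self, title: str, artist: str) -> Optional[SongResult]: ...


def fetch_album_lyrics(
    client: LyricsClient, artist: str, tracklist: list[str]
) -> dict[str, str]:
    """Look up each track individually and return lyrics keyed by track title."""
    lyrics_by_title = {}
    for title in tracklist:
        song = client.search_song(title, artist)
        if song is None:
            continue  # no match for this track; skip it
        lyrics_by_title[title] = song.lyrics
    return lyrics_by_title
```

Because the client is passed in, the same function works with a real Genius client or with a stand-in during testing.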