Lyric Analysis with NLP and Machine Learning using R: Part One – Text Mining

June 22
By Debbie Liske

This is Part One of a three part tutorial series originally published on the DataCamp online learning platform in which you will use R to perform a variety of analytic tasks on a case study of musical lyrics by the legendary artist, Prince. The three tutorials cover the following:


Musical lyrics may represent an artist’s perspective, but popular songs reveal what society wants to hear. Lyric analysis is no easy task. Because it is often structured so differently than prose, it requires caution with assumptions and a uniquely discriminant choice of analytic techniques. Musical lyrics permeate our lives and influence our thoughts with subtle ubiquity. The concept of Predictive Lyrics is beginning to buzz and is more prevalent as a subject of research papers and graduate theses. This case study will just touch on a few pieces of this emerging subject.



Prince: The Artist

To celebrate the inspiring and diverse body of work left behind by Prince, you will explore the sometimes obvious, but often hidden, messages in his lyrics. However, you don’t have to like Prince’s music to appreciate the influence he had on the development of many genres globally. Rolling Stone magazine listed Prince as the 18th best songwriter of all time, just behind the likes of Bob Dylan, John Lennon, Paul Simon, Joni Mitchell and Stevie Wonder. Lyric analysis is slowly finding its way into data science communities as the possibility of predicting “Hit Songs” approaches reality.

Prince was a man bursting with music – a wildly prolific songwriter, a virtuoso on guitars, keyboards and drums and a master architect of funk, rock, R&B and pop, even as his music defied genres. – Jon Pareles (NY Times)
In this tutorial, Part One of the series, you’ll utilize text mining techniques on a set of lyrics using the tidy text framework. Tidy datasets have a specific structure in which each variable is a column, each observation is a row, and each type of observational unit is a table. After cleaning and conditioning the dataset, you will create descriptive statistics and exploratory visualizations while looking at different aspects of Prince’s lyrics.

Check out the article here!




(reprint by permission of DataCamp online learning platform)

New DataCamp Course: Working with Web Data in R

Hi there! We just launched Working with Web Data in R by Oliver Keyes and Charlotte Wickham, our latest R course!

Most of the useful data in the world, from economic data to news content to geographic information, lives somewhere on the internet – and this course will teach you how to access it. You’ll explore how to work with APIs (computer-readable interfaces to websites), access data from Wikipedia and other sources, and build your own simple API client. For those occasions where APIs are not available, you’ll find out how to use R to scrape information out of web pages. In the process, you’ll learn how to get data out of even the most stubborn website, and how to turn it into a format ready for further analysis. The packages you’ll use and learn your way around are rvest, httr, xml2 and jsonlite, along with particular API client packages like WikipediR and pageviews.

Take me to chapter 1!

Working with Web Data in R features interactive exercises that combine high-quality video, in-browser coding, and gamification for an engaging learning experience that will make you an expert in getting information from the Internet!



What you’ll learn

1. Downloading Files and Using API Clients
Sometimes getting data off the internet is very, very simple – it’s stored in a format that R can handle and just lives on a server somewhere, or it’s in a more complex format and perhaps part of an API but there’s an R package designed to make using it a piece of cake. This chapter will explore how to download and read in static files, and how to use APIs when pre-existing clients are available.

2. Using httr to interact with APIs directly
If an API client doesn’t exist, it’s up to you to communicate directly with the API. But don’t worry, the package httr makes this really straightforward. In this chapter, you’ll learn how to make web requests from R, how to examine the responses you get back and some best practices for doing this in a responsible way.

3. Handling JSON and XML
Sometimes data is a TSV or nice plaintext output. Sometimes it’s XML and/or JSON. This chapter walks you through what JSON and XML are, how to convert them into R-like objects, and how to extract data from them. You’ll practice by examining the revision history for a Wikipedia article retrieved from the Wikipedia API using httr, xml2 and jsonlite.

4. Web scraping with XPATHs
Now that we’ve covered the low-hanging fruit (“it has an API, and a client”, “it has an API”) it’s time to talk about what to do when a website doesn’t have any access mechanisms at all – when you have to rely on web scraping. This chapter will introduce you to the rvest web-scraping package, and build on your previous knowledge of XML manipulation and XPATHs.

5. ECSS Web Scraping and Final Case Study
CSS path-based web scraping is a far-more-pleasant alternative to using XPATHs. You’ll start this chapter by learning about CSS, and how to leverage it for web scraping. Then, you’ll work through a final case study that combines everything you’ve learnt so far to write a function that queries an API, parses the response and returns data in a nice form.

Master web data in R with our course Working with Web Data in R!