The following document details how to create a treemap in R using the treemap package.
What are treemaps and when do we use them?
In the most basic terms, a treemap is used to visualize proportions. It can be thought of as a pie chart in which the slices are replaced by rectangles.
Pie charts are a good way to show proportions, but as the number of categories grows they become harder and harder to read. A treemap avoids this problem by using a nested structure, which makes it well suited to displaying large amounts of hierarchical data. We can use a treemap to get an overview of a large amount of hierarchical data when space is limited.
A treemap is a diagram that represents hierarchical data as nested rectangles, with the area of each rectangle proportional to its numerical value.
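The area principle in this definition can be sketched in a few lines of base R. The function slice_layout() below is a hypothetical helper of our own, not part of the treemap package. It lays out a single level of boxes as vertical strips of a rectangle and gives each strip a width, and therefore an area, proportional to its value. This is the simplest "slice" layout. The treemap package uses its own tiling algorithms.

```r
# Hypothetical helper (not from the treemap package): a one-level "slice"
# layout that splits a width x height rectangle into vertical strips whose
# areas are proportional to the input values.
slice_layout <- function(values, width = 1, height = 1) {
  stopifnot(is.numeric(values), all(values >= 0), sum(values) > 0)
  share <- values / sum(values)            # each value's proportion of the total
  w <- share * width                       # strip widths scale with the proportions
  x <- c(0, cumsum(w)[-length(w)])         # left edge of each strip
  labels <- if (is.null(names(values))) seq_along(values) else names(values)
  data.frame(label = labels, x = x, y = 0, w = w, h = height, area = w * height)
}

# Made-up values, for illustration only.
rects <- slice_layout(c(A = 50, B = 30, C = 20), width = 10, height = 5)
print(rects)
```

Because each area is a fraction of the whole rectangle, the boxes carry the same information as pie slices, only laid out as rectangles.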
When not to use a treemap
A treemap should not be used when the measure values differ greatly in size (the smallest boxes become too small to see) or when the values are not comparable. Negative values also cannot be displayed, because a rectangle cannot have a negative area.
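Before choosing a size measure, you can check it for negative or missing values. The helper invalid_size_rows() and the toy data frame below are hypothetical and use invented numbers. Only the column names follow this article's data.

```r
# Hypothetical helper: return the row numbers whose size measure cannot be
# drawn as a box, because it is missing or negative (an area cannot be < 0).
invalid_size_rows <- function(df, size_col) {
  x <- df[[size_col]]
  which(is.na(x) | x < 0)
}

# Toy data with invented numbers, for illustration only.
toy <- data.frame(
  Sub.Category = c("Phones", "Tables", "Copiers"),
  Sales        = c(300, 200, 150),
  Profit       = c(40, -20, 55)
)

invalid_size_rows(toy, "Sales")   # integer(0): Sales can be used as the size measure
invalid_size_rows(toy, "Profit")  # 2: the Tables row has a negative Profit
```

A measure such as Profit, which can be negative, is therefore better shown through box color than through box size.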
Building a Treemap in R
To create a treemap we use one or more dimensions, which form the hierarchy, and at most two measures: one that sets the size of the boxes and, optionally, one that sets their color. We will use the treemap package in R and the Super Store data provided with this article.
Step 1: Importing the data and installing the treemap package in R
## Set the working directory to the location of the file ##
## Import the data file into R and view a sample of the data ##
>data = read.csv("data.csv", header = TRUE, sep = ",")
>head(data)
Once the data is in R, we need to load the treemap package so that we can create the plot.
## Install the package (only needed once) and load it ##
>install.packages("treemap")
>library(treemap)
The data we are using has already been reshaped, so we can go ahead and create a basic treemap and build on it step by step.
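If the Super Store file is not at hand, you can try the commands in this article on a small stand-in data frame. The one below is hypothetical. It reuses the column names from the article (Category, Sub.Category, Region, Sales, Profit), but its rows and numbers are invented. The column check at the end works the same way on the data frame returned by read.csv().

```r
# Hypothetical stand-in for the Super Store data: same column names as the
# article uses, invented rows and numbers.
toy <- data.frame(
  Category     = rep(c("Furniture", "Office Supplies", "Technology"), each = 2),
  Sub.Category = c("Chairs", "Tables", "Binders", "Paper", "Copiers", "Phones"),
  Region       = c("East", "West", "Central", "South", "East", "West"),
  Sales        = c(120, 90, 60, 70, 80, 150),
  Profit       = c(15, -10, 20, 12, 35, 25),
  stringsAsFactors = FALSE
)

# Stop early if any column referenced by the treemap() calls is missing.
check_columns <- function(df, needed = c("Category", "Sub.Category", "Region", "Sales", "Profit")) {
  missing_cols <- setdiff(needed, names(df))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

check_columns(toy)
str(toy)
```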
Step 2: Creating a Treemap
The treemap function is used to create a treemap.
## Creating the most basic treemap##
>treemap(data, index = c("Category"), vSize = "Sales")
The first argument is the data frame to plot, which is data in our case. The index argument lists the column(s) that define the hierarchy, and the vSize argument names the numeric column that determines the size of each box.
Since index contains only one column, Category, the treemap is split into three boxes, one per category, each sized by that category's share of total Sales.
At first glance, Technology, Furniture and Office Supplies have similar shares of Sales, with Technology the largest.
We can check this hunch right away by typing:
>aggregate(Sales ~ Category, data, sum)
This returns:
         Category    Sales
1       Furniture 741999.8
2 Office Supplies 719047.0
3      Technology 836154.0
The three totals are indeed close to one another in proportion, with Technology the highest.
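Because each box's area is proportional to its value, the category totals from aggregate() translate directly into shares of the treemap. The short sketch below converts the three totals shown above into percentages with base R's prop.table().

```r
# Category totals copied from the aggregate() output above.
totals <- c(Furniture = 741999.8, "Office Supplies" = 719047.0, Technology = 836154.0)

# Share of total Sales, i.e. the fraction of the treemap area each box receives.
shares <- prop.table(totals)
round(100 * shares, 1)   # roughly 32.3, 31.3 and 36.4 percent
```

Technology takes just over a third of the area, and no category is far from the others. This matches the visual impression.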
Now that we have created the most basic treemap, let's go a bit further and see what happens when we list multiple columns in index, which creates a hierarchy.
## Creating a treemap with Category and Sub.Category as a hierarchy ##
>treemap(data, index = c("Category", "Sub.Category"), vSize = "Sales")
The treemap is first split at the Category level, and each category box is then split by sub-category. Box sizes are still determined by Sales. The treemap shows that the Technology category accounted for the largest share of sales and that, within Technology, Phones accounted for the most sales.
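You can reproduce the nested layout numerically. Each category gets its share of the total, and each sub-category then gets its share within that category. The two shares multiply to the sub-category's share of the whole. The sketch below illustrates this in base R on a hypothetical toy data frame with invented numbers. Only the column names come from the article.

```r
# Toy data with invented numbers; column names follow the article.
toy <- data.frame(
  Category     = c("Furniture", "Furniture", "Technology", "Technology"),
  Sub.Category = c("Chairs", "Tables", "Phones", "Copiers"),
  Sales        = c(120, 80, 180, 60),
  stringsAsFactors = FALSE
)

# Level 1: each category's share of all Sales (the outer boxes).
cat_totals <- aggregate(Sales ~ Category, data = toy, FUN = sum)
cat_totals$share <- cat_totals$Sales / sum(cat_totals$Sales)

# Level 2: each sub-category's share within its own category (the inner boxes).
sub_totals <- aggregate(Sales ~ Category + Sub.Category, data = toy, FUN = sum)
sub_totals$share_in_cat <- ave(sub_totals$Sales, sub_totals$Category,
                               FUN = function(s) s / sum(s))

# Category share x within-category share = share of the whole treemap.
sub_totals$share_overall <- sub_totals$share_in_cat *
  cat_totals$share[match(sub_totals$Category, cat_totals$Category)]
sub_totals
```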
Let's go a step further and color the boxes by another measure, Profit.
## Coloring the boxes by a measure ##
>treemap(data, index = c("Category", "Sub.Category"), vSize = "Sales", vColor = "Profit", type = "value")
The resulting treemap has the same layout as the previous one, but the box colors now represent Profit. We can see that Copiers was the most profitable sub-category, while Tables was the most unprofitable.
The vColor argument names the variable used to color the boxes, and type specifies how that variable is turned into colors: "value" for a numeric variable such as Profit, "categorical" for a categorical one, and "index" (the default) to color the boxes by the index levels themselves.
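As with Sales earlier, you can check the reading of the colors numerically. The sketch below assumes, based on our understanding of the package's default for type = "value", that a box's color reflects the sum of vColor over the rows inside it. It totals Profit per sub-category and picks out the extremes. The data frame is a hypothetical toy with invented numbers, chosen only to echo the pattern described above.

```r
# Toy data with invented numbers; column names follow the article.
toy <- data.frame(
  Sub.Category = c("Copiers", "Tables", "Phones", "Chairs", "Copiers", "Tables"),
  Profit       = c(30, -12, 18, 9, 25, -6),
  stringsAsFactors = FALSE
)

# Total Profit per sub-category, sorted from most to least profitable.
profit_by_sub <- aggregate(Profit ~ Sub.Category, data = toy, FUN = sum)
profit_by_sub <- profit_by_sub[order(profit_by_sub$Profit, decreasing = TRUE), ]
print(profit_by_sub)

best  <- as.character(profit_by_sub$Sub.Category[1])                    # "Copiers"
worst <- as.character(profit_by_sub$Sub.Category[nrow(profit_by_sub)])  # "Tables"
```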
##Using a categorical variable as color##
>treemap(data, index = c("Category", "Region"), vSize = "Sales", vColor = "Region", type = "categorical")
Here the treemap is split by Category first, and within each category the four regions are shown, each distinguished by its own color.
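The idea behind type = "categorical" is that each distinct value of vColor gets its own color. The sketch below shows that idea in base R with hcl.colors() (available since R 3.6). The palette choice and the region values are ours for illustration. This is not how the treemap package picks its colors internally.

```r
# Illustrative region values (one per row of some data set).
regions <- c("East", "West", "Central", "South", "East", "West")

# One qualitative color per distinct level...
level_names <- sort(unique(regions))
region_cols <- setNames(hcl.colors(length(level_names), palette = "Set 2"), level_names)

# ...then look up the color for every row, so equal regions share a color.
row_cols <- region_cols[regions]
print(region_cols)
```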
Step 3: Enhancing our treemap
Let's make our treemap more readable by adding a title and changing the font size of the category and sub-category labels, keeping the category labels larger than the sub-category labels. Here's how to do it:
## Titles and font size of the labels##
>treemap(data, index = c("Category", "Sub.Category"), vSize = "Sales", vColor = "Profit", type = "value", title = "Sales Treemap For categories", fontsize.labels = c(15, 10))
We have added a custom title and changed the label sizes for categories and sub-categories. The title argument sets the title of the plot, and fontsize.labels sets the label font size for each level of the hierarchy, here 15 for categories and 10 for sub-categories.
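The later examples repeat the same arguments many times. One way to avoid retyping them is to keep the shared arguments in a list and call treemap() through base R's do.call(), as sketched below. The argument names are the ones used in this article. The toy data frame is hypothetical with invented numbers, and the call is skipped if the treemap package is not installed.

```r
# Arguments shared by the later examples (names as used in this article).
# fontsize.labels holds one size per index level: categories, then sub-categories.
base_args <- list(
  index = c("Category", "Sub.Category"),
  vSize = "Sales",
  vColor = "Profit",
  type = "value",
  title = "Sales Treemap For categories",
  fontsize.labels = c(15, 10)
)

# Hypothetical toy data with invented numbers.
toy <- data.frame(
  Category     = c("Furniture", "Furniture", "Technology", "Technology"),
  Sub.Category = c("Chairs", "Tables", "Phones", "Copiers"),
  Sales        = c(120, 80, 180, 60),
  Profit       = c(15, -10, 25, 35),
  stringsAsFactors = FALSE
)

if (requireNamespace("treemap", quietly = TRUE)) {
  # The data frame is passed first (positionally), followed by the shared arguments.
  invisible(do.call(treemap::treemap, c(list(toy), base_args)))
}
```

A later variant then only needs to append its extra arguments, for example c(list(toy), base_args, list(palette = "RdYlGn")).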
We can also position the labels. For example, to keep the category labels centered and place the sub-category labels in the top-left corner, we use the align.labels argument, which takes one c(horizontal, vertical) pair per level of the hierarchy:
## Aligning the labels##
>treemap(data, index = c("Category", "Sub.Category"), vSize = "Sales", vColor = "Profit", type = "value", title = "Sales Treemap For categories", fontsize.labels = c(15, 10), align.labels = list(c("centre", "centre"), c("left", "top")))
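The category labels are now centered, and the sub-category labels sit in the top-left corner of their boxes.
We can also choose a custom color palette with the palette argument, while range and mapping control how Profit values are spread over that palette: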
>treemap(data, index = c("Category", "Sub.Category"), vSize = "Sales", vColor = "Profit", type = "value", palette = "RdYlGn", range = c(-20000, 60000), mapping = c(-20000, 10000, 60000), title = "Sales Treemap For categories", fontsize.labels = c(15, 10), align.labels = list(c("centre", "centre"), c("left", "top")))
Here we have used the Red-Yellow-Green palette ("RdYlGn") to make Profit easier to read, with red marking the most unprofitable boxes and green the most profitable.
In this article we looked at how to create a treemap in R and how to improve its appearance. Much more can be done with the other arguments of treemap(); for a complete list of arguments and functionality, refer to the package documentation.
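A simplified sketch can illustrate the range and mapping arguments from the palette example. As we read the package documentation, mapping gives the values that correspond to the low end, the middle and the high end of the palette, and range gives the span of values shown in the legend. palette_position() below is our own illustration of that idea, using linear interpolation with base R's approx(). It is not the package's code, and the red-yellow-green ramp from colorRampPalette() only approximates the "RdYlGn" palette.

```r
# Map a Profit value to a position between 0 (low end of the palette) and
# 1 (high end), pinning the three mapping values to 0, 0.5 and 1 and
# interpolating linearly in between. rule = 2 clamps values outside the range.
palette_position <- function(v, mapping = c(-20000, 10000, 60000)) {
  approx(x = mapping, y = c(0, 0.5, 1), xout = v, rule = 2)$y
}

# Base-R stand-in for a red-yellow-green palette with 101 steps.
ryg <- colorRampPalette(c("red", "yellow", "green"))(101)
profit_color <- function(v) ryg[1 + round(100 * palette_position(v))]

palette_position(c(-20000, -5000, 10000, 35000, 60000))  # 0.00 0.25 0.50 0.75 1.00
profit_color(c(-20000, 10000, 60000))                     # red end, yellow middle, green end
```

Because the middle mapping value (10000) is not halfway between -20000 and 60000, losses and small profits get more of the red-to-yellow half of the scale than a plain linear range would give them.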
“Area-based visualizations have existed for decades. This idea was invented by professor Ben Shneiderman at the University of Maryland, Human-Computer Interaction Lab in the early 1990s. Shneiderman and his collaborators then deepened the idea by introducing a variety of interactive techniques for filtering and adjusting treemaps. These early treemaps all used the simple ‘slice-and-dice’ tiling algorithm.”
Treemaps are an efficient way to display a hierarchy and give a quick overview of its structure. They are also good for comparing the proportions of categories through their areas.
This article was contributed by Perceptive Analytics. Rahul Singh, Chaitanya Sagar, Jyothirmayee Thondamallu and Saneesh Veetil contributed to this article.
Perceptive Analytics provides data analytics, data visualization, business intelligence and reporting services to e-commerce, retail, healthcare and pharmaceutical industries. Our client roster includes Fortune 500 and NYSE listed companies in the USA and India.