Coke vs. Pepsi? data.table vs. tidy? Part 2

By Beth Milhollin, Russell Zaretzki, and Audris Mockus

Coke vs. Pepsi is an age-old rivalry, but I am from Atlanta, so it’s Coke for me. Coca-Cola, to be exact. But I am relatively new to R, so when it comes to data.table vs. tidy, I am open to input from experienced users.

A group of researchers at the University of Tennessee recently sent out a survey to R users identified by their commits to repositories, such as GitHub. The results of the full survey can be seen here. The project contributors were identified as “data.table” users or “tidy” users by their inclusion of these R libraries in their projects. Both libraries are an answer to some of the limitations associated with the basic R data frame.

In the first installment of this series (found here) we used the survey data to calculate the Net Promoter Score for data.table and tidy. To recap, the Net Promoter Score (NPS) is a measure of consumer enthusiasm for a product or service based on a single survey question: “How likely are you to recommend the brand to your friends or colleagues, using a scale from 0 to 10?” Detractors of the product will respond with a 0-6, while promoters of the product will offer up a 9 or 10. A passive user will answer with a score of 7 or 8. To calculate the NPS, subtract the percentage of detractors from the percentage of promoters. When the percentage of promoters exceeds the percentage of detractors, there is potential to expand market share as the negative chatter is drowned out by the accolades.

We were surprised when our survey results indicated data.table had an NPS of 28.6, while tidy’s NPS was more than double that, at 59.4. Why are tidy users so much more enthusiastic? What do tidy-lovers “love” about their data frame enhancement choice? Fortunately, a few of the other survey questions may offer some insights. The survey question shown below asks the respondents how important 13 common factors were when selecting their package.
Respondents select a factor tile, such as “Package’s Historic Reputation”, and drag it to the box that represents the priority they place on that factor. A respondent can drag as many or as few tiles as they choose.
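To make the NPS recap above concrete, here is a minimal sketch of the arithmetic. It is written in Python purely for illustration; the function name net_promoter_score and the ratings list are hypothetical and are not the survey data or the authors' code. It follows the definition given earlier: scores of 9 or 10 count as promoters, 0 through 6 as detractors, and 7 or 8 as passives.

```python
def net_promoter_score(ratings):
    """Return the NPS: percentage of promoters minus percentage of detractors.

    `ratings` is a list of integer answers on the 0-10 scale.
    This helper is a hypothetical illustration, not the authors' code.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 0 or r > 10 for r in ratings):
        raise ValueError("ratings must be between 0 and 10")

    promoters = sum(1 for r in ratings if r >= 9)   # answered 9 or 10
    detractors = sum(1 for r in ratings if r <= 6)  # answered 0 through 6
    # Passives (7 or 8) count toward the total but toward neither group.
    return 100.0 * (promoters - detractors) / len(ratings)


# Made-up responses for illustration only (not the survey results).
example_ratings = [10, 9, 9, 8, 7, 6, 3, 10, 9, 5]
example_nps = net_promoter_score(example_ratings)
print(example_nps)  # 5 promoters, 3 detractors, 10 answers -> 20.0
```

Because passives still count in the denominator, a package with many 7s and 8s can have few detractors and still post a modest NPS.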

6 thoughts on “Coke vs. Pepsi? data.table vs. tidy? Part 2”

Link to bitbucket is dead

Audris Mockus says:

June 28, 2019 at 9:48 am

Many thanks, fixed

data.table is a more or less independent implementation of enhancements for reading datasets in R. Tidy, on the other hand, is a series of R packages that work well with each other but share a syntax and design that is compatible within the tidyverse and confounds other users.

Audris Mockus says:

June 28, 2019 at 3:55 pm

Absolutely correct, the comparison was made not with the tidyverse but with tidyr and two other packages that most closely resemble the functionality provided by data.table.

Fun stuff, thanks!
