Ebook launch – Simple Data Science (R)

Simple Data Science (R) covers the fundamentals of data science and machine learning. The book is beginner-friendly and has detailed code examples. It is available at Scribd.


Topics covered in the book –
  • Data science introduction
  • Basic statistics
  • Graphing with ggplot2 package
  • Exploratory Data Analysis
  • Machine Learning with caret package
  • Regression, classification, and clustering
  • Boosting with LightGBM package
  • Hands-on projects
  • Data science use cases

Time to upskill in R? EARL’s workshop lineup has something for every data practitioner.

It’s well-documented that data skills are in high demand, making the industry even more competitive for employers looking for experienced data analysts, data scientists and data engineers – the fastest-growing job roles in the UK. In support of this demand, it’s great to see the government taking action to address the data skills gap as detailed in their newly launched Digital Strategy.

The range of workshops available at EARL 2022 is designed to help data practitioners extend their skills via a series of practical challenges. Led by specialists in Shiny, Purrr, Plumber, ML and time series visualisation, you’ll leave with tips and skills you can immediately apply to your commercial scenarios.

The EARL workshop lineup.


Time Series Visualisation in R.

How does time affect our perception of data? Is the timescale important? Is the direction of time relevant? Sometimes cumulative effects are not visible with traditional statistical methods, because smaller increments stay under the radar. When a time component is present, it’s likely that the current state of our problem depends on the previous states. With time series visualisations we can capture changes that may otherwise go undetected. Find out more.
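To make the point about cumulative effects concrete, here is a minimal base R sketch. It is an illustration only, not workshop material: the simulated data, the drift of 0.2 and all variable names are assumptions chosen for the example.

```r
# Hypothetical data: 100 daily changes, each small relative to the noise
set.seed(42)
days <- 1:100
increments <- rnorm(100, mean = 0.2, sd = 1)  # small positive drift (assumed)

# Looked at one by one, the increments resemble noise around zero
summary(increments)

# The running total depends on every previous state
cumulative <- cumsum(increments)

# Side-by-side plots: per-day changes vs. the accumulated level
op <- par(mfrow = c(1, 2))
plot(days, increments, type = "h",
     main = "Daily increments", xlab = "Day", ylab = "Change")
plot(days, cumulative, type = "l",
     main = "Cumulative effect", xlab = "Day", ylab = "Level")
par(op)
```

The left panel shows changes that are hard to tell apart from noise, while the right panel plots their running total, where any persistent drift becomes much easier to see.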

Explainable Machine Learning.

Explaining how your ML products make decisions empowers people on the receiving end to question and appeal these decisions. Explainable AI is one of the many tools you need to ensure you’re using ML responsibly. AI and, more broadly, data can be a dangerous accelerator of discrimination and biases: skin diseases were found to be less effectively diagnosed on black skin by AI-powered software, and search engines advertised lower-paid jobs to women. Staying away from it might sound like a safer choice, but this would mean missing out on the huge potential it offers. Find out more.

Introduction to Plumber APIs.

90% of ML models don’t make it into production. With API building skills in your DS toolbox, you should be able to beat this statistic in your own projects. As the field of data science matures, much emphasis is placed on moving beyond scripts and notebooks and into software development and deployment. Plumber is an excellent tool to make the results from your R scripts available on the web. Find out more.

Functional Programming with Purrr.

Iteration is a very common task in Data Science. A loop in R programming is of course one option – but purrr (a package from the tidyverse) allows you to tackle iteration in a functional way, leading to cleaner and more readable code. Find out more.
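As a rough illustration of the difference, the sketch below computes the same per-element means with a for loop and with purrr's map_dbl(). The input list and variable names are made up for the example, and the workshop itself may use different material.

```r
library(purrr)

# Hypothetical input: three numeric vectors of different lengths
samples <- list(a = c(1, 2, 3), b = c(10, 20), c = c(5, 5, 5, 5))

# Loop version: pre-allocate the result, then fill it element by element
means_loop <- numeric(length(samples))
names(means_loop) <- names(samples)
for (i in seq_along(samples)) {
  means_loop[i] <- mean(samples[[i]])
}

# purrr version: map_dbl() applies mean() to each element and
# returns a named double vector in a single expression
means_map <- map_dbl(samples, mean)

# Anonymous functions work the same way
ranges <- map_dbl(samples, function(x) max(x) - min(x))
```

Both approaches give the same result; the functional version simply removes the bookkeeping (pre-allocation, indexing) and states the output type in the function name.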

How to Make a Game with Shiny.

Shiny is only meant to be used to develop dashboards, right? Or is it possible to develop more complex applications with Shiny? What would be the main limitations? Could R and Shiny be used as a general-purpose framework to develop web applications? Find out more.

Sound interesting? Check out the full details – our workshop spaces traditionally go fast, so get yourself and your team booked in while there are still seats available. Book your Workshop Day Pass tickets now.

Why this is the year you should take the stage at EARL 2022…

EARL is Europe’s largest R community event dedicated to showcasing commercial applications of the R language. As a conference, it has always lived up to its promise of connecting and inspiring R users with creative suggestions and solutions, sparking new ideas, solving problems and sharing perspectives to advance the community. 

2022 marks the return of face-to-face EARL (6th – 8th September at the Tower Hotel in London) – now run by Ascent, the new home of Mango Solutions. Over the past eight years, EARL has attracted some fascinating presentations from some engaging, authentic speakers, both experienced and first timers.

This year, we’re keen to understand how recent global events and trends that have disrupted our view of ‘normal’ have impacted, changed or driven your R projects: from inspirational innovation to reducing operational cost and creating richer customer experiences. If you have an interesting application of R, our call for abstracts is now open and we’re inviting you to share your synopsis with us. Deadline for submissions is Thursday 30th June.

Maybe you’ve built a Shiny app that helps detect bias, or you’ve been on a data journey you’d like to share. Perhaps you’ve built a data science syllabus for young minds or created an NLP tool to automate clinical processes. If you are searching for inspiration, potential applications of R might come under the following categories:
  • Responding to global events with R
  • The role of R in the business data science toolbox
  • Overcoming the challenges of using R commercially
  • Efficient R: dealing with huge data
  • Sustainable R / R for good
  • R tools & packages (e.g. Shiny, purrr)
  • Building your R community
  • Women in R
  • The future of R in enterprise: 2022 and beyond
We are also looking for short form submissions: 10-minute lightning talks on a wide range of applications.

What’s presenting at EARL really like?  

We asked our 2019 presenters what prompted their decision to speak at our last in-person EARL and their advice to others who may be considering submitting an abstract for EARL 2022.

For Mitchell Stirling, Capacity and Modelling Manager at Heathrow Airport, the opportunity to present helped fulfil a professional ambition. “I discussed with my line manager, slightly tongue in cheek, that it should be an ambition in 2019 when he signed off a conference attendance in Scotland the previous year. As the work I’d been doing developed in 2019 and the opportunity presented itself, I started to think ‘why not?’ – this is interesting and if I can show it interestingly, hopefully others would agree. I was slightly wary of the technical nature of the event, with my exposure to coding in R still better measured in minutes than hours (never mind days) but a reassurance that people would be interested in the ‘what’ and ‘why’ as well as the ‘how’, won me over.”

Dr Zhanna Mileeva, a Data Scientist at NBrown Group, confirmed that making a contribution to the data science community was an important factor in her decision to submit an abstract: “After some research I found the EARL conference as a great cross-sector forum for R users to share Data Science, AI and ML engineering knowledge, discuss modern business problems and pathways to solutions. It was a fantastic opportunity to contribute to this community, learn from it and re-charge with some fresh ideas.”

In past years EARL has attracted speakers from across the globe and last year, Harold Selman, Lead Data Scientist at Ordina (NL), came from the Netherlands to speak at the conference. “I knew the EARL conference as a visitor and had given some presentations in The Netherlands, so I decided to give it a shot. The staff of the EARL conference are very helpful and open to questions, which made being a speaker very pleasant.”

Some of our presenters have enjoyed the experience so much they have presented more than once. Chris Billingham, Lead Data Scientist at Manchester Airport Group’s Digital Agency MAG-O, is one such speaker. “I’ve had the good fortune to present twice at EARL. I saw it as an opportunity to challenge myself to present at the biggest R conference in the UK.”

How to submit your abstract. 

Feeling inspired? You can find the abstract submission form on our website. Here are our recommendations for a successful submission.
  • Topic: Your topic can relate to any real-world application of R. We aim to represent a range of industry sectors and a balance of technical and strategic content.
  • Clarity: The talk synopsis should provide an overview of the topic and why you believe it will be of interest or resonate with the audience. We suggest an introduction or problem statement alongside any supporting facts that determine the talk objectives or expected takeaways.
  • Storytelling: Aim to demonstrate how the tools and techniques you used helped to transform and translate value with a clear and compelling narrative.
  • Approval: Before you submit, it’s a good idea to ensure your application has been approved by your wider organisation and/or team.
  • Novel: Is the application particularly new or innovative? If your application of R is new or distinctive and not widely written about in the industry, please provide as much supporting information as you can for review purposes.
  • Target audience: 34% of our attendees are R practitioners and 46% of delegates typically have senior or leadership roles – consider the alignment of your proposal with these audiences.
We hope these hints and tips have been helpful – but feel free to get in touch if you have any questions by contacting [email protected]. 

EARL your way: book your tickets now!

Your EARL tickets are now live to purchase here. We are offering every possible EARL ticket combination, so here is a quick summary of what you can expect. You can choose a jam-packed 3-day conference pass, or a 1- or 2-day option to customise an itinerary that works for you.

Grab your EARLy bird tickets right away. For 2 weeks only, we are delighted to be offering an unlimited number of tickets at a 15-25% discount on all ticket options, depending on whether you are NHS, not-for-profit or an academic.

Team networking.

Why not bring your colleagues along for a much-needed team social at the largest commercial R event in the UK? With lots of opportunities to network with brands in similar markets, there will be plenty of time to swap market experiences over coffee, at lunch or at our evening reception. We are certainly proud to be a part of such an enthusiastic community.

Full or half day workshop on day 1.

We are running a 1-day series of workshops to kick off EARL on 6th September, covering all areas of R, from explainable machine learning, time series visualisation, functional programming with purrr and an introduction to plumber APIs to having some fun making games in Shiny. There is plenty of choice, with both morning and afternoon sessions on the agenda.

Full conference pass.

Our all-access pass to EARL gives you full access to a 1-day workshop, a full 2-day conference pass and access to the evening reception at the unforgettable Drapers Hall on day 2 – the former home of Henry VIII. We have an impressive line-up of keynotes, including mathematician, science presenter and all-round badass Hannah Fry; Top 100 Global Innovator in Data & Analytics Harry Powell; and the unmissable Financial Times columnist John Burn-Murdoch. To add to this excitement, we have approved use cases from Bumble, Samaritans, BBC, Meta, Bank of England, Dogs Trust, NHS, and partners RStudio, alongside many more.

1 or 2-day conference pass.

If you would like access to the keynotes, session talks and abundance of networking opportunities, you can choose from a 1 or 2-day pass aligned to your areas of interest. The 2-day conference pass gives you access to the main evening reception.

Evening reception.

This year we have opted for an unforgettable experience at Drapers Hall (the former home of Henry VIII), where you will be able to network with colleagues, delegates and speakers over drinks, canapes, and dinner in memorable surroundings. Transport is provided by London red bus transfer. This year promises an exceptional experience, with a heavyweight line-up, use cases from leading brands and the opportunity at last to share and network to your heart’s content. We look forward to meeting you. Book your tickets now.

$55,000 in Awards for Energy & Buildings Hackathon, Sponsored by NYSERDA

The New York State Energy Research and Development Authority (NYSERDA) is partnering with Onboard Data to host a $55,000 Global Energy & Buildings Hackathon. We’re inviting all engineers, data scientists and software developers, whether they are professionals, professors, researchers or students, to participate. More below…


Challenge participants will propose exciting new ideas that can improve our world’s buildings. The hackathon will share data from 200+ buildings with participants. This rich, one-of-a-kind data set is normalized from equipment, systems and IoT devices found within buildings.
We seek submissions that positively impact or accelerate the decarbonization of New York State buildings. 

Total awards are $55,000. Sign-ups stay open until April 15th and the competition is open from April 22nd to May 30th. More can be found here: www.rtemhackathon.com.

Advance the next generation of building technology!

Download recently published book – Learn Data Science with R

Learn Data Science with R is for learning the R language and data science. The book is beginner-friendly and easy to follow. It is available for download on a pay-what-you-want basis. The minimum price is 0 and the suggested contribution is Rs 1300 ($18). Please review the book at Goodreads.


The book topics are –
  • R Language
  • Data Wrangling with data.table package
  • Graphing with ggplot2 package
  • Exploratory Data Analysis
  • Machine Learning with caret package
  • Boosting with LightGBM package
  • Hands-on projects

New R textbook for machine learning

Mathematics and Programming for Machine Learning with R – Chapter 2: Logic

Have a look at the FREE attached pdf of Chapter 2 on Logic and R from my recently published textbook,

Mathematics and Programming for Machine Learning with R: From the Ground Up, by William B. Claster (Author)
~430 pages, over 400 exercises.
We discuss how to code machine learning algorithms in R but start from scratch. The first 4 chapters cover Logic, Sets, Probability, Functions. I am sharing Chapter 2 on Logic and R here and will also probably release chapters 9 and 10 on Math for Neural Networks shortly. The text is on sale at Amazon here:
https://www.amazon.com/Mathematics-Programming-Machine-Learning-R-dp-0367507854/dp/0367507854/ref=mt_other?_encoding=UTF8&me=&qid=1623663440

I will try to add an errata page as well.

DN Unlimited 2020: Europe’s largest data science gathering | Nov 18 – 20 online

  • The DN Unlimited Conference will take place online for the first time this year
  • More than 100 speakers from the fields of AI, machine learning, data science, and technology for social impact, including from The New York Times, IBM, Bayer, and Alibaba Cloud
  • Fully remote networking opportunities via a virtual hub

Europe’s largest data science community launches new digital platform for this year’s conference

The Data Natives Conference, Europe’s biggest data science gathering, will take place virtually and invite data scientists, entrepreneurs, corporates, academia, and business innovation leaders to connect on November 18-20, 2020. The conference’s mission is to connect data experts, inspire them, and let people become part of the equation again. With its digital networking platform, DN Unlimited expects to reach a new record high with 5000+ participants. Visitors can expect keynotes and panels from industry experts and a unique opportunity to start new collaborations during networking and matchmaking sessions. In 2019, the sold-out Data Natives conference gathered over 3000 data and technology professionals and decision-makers from over 30 countries, including 29 sponsors, 45 community and media partners, and 176 speakers.

The narrative of DN Unlimited Conference 2020 focuses on assisting the digital transformation of businesses, governments, and communities by offering a fresh perspective on data technologies – from empowering organizations to revamp their business models to shedding light on social inequalities and challenges like Climate Change and Healthcare accessibility.

Data science, new business models and the future of our society

In spring 2020, the Data Natives community of 80,000 data scientists mobilised to tackle the challenges brought by the pandemic – from the shortage of medical equipment to remote care – in a series of Hackcorona and EUvsVirus hackathons. Through the collaboration of governments such as the Greek Ministry for Digital Governance, institutions such as the Charité and experts from all over Europe, over 80 data-driven solutions have been developed. The DN Unlimited conference will continue to facilitate similar cooperation.

The current crisis demonstrates that only through collaboration can businesses thrive.

While social isolation may be limiting traditional networking opportunities, we are more equipped than ever before to make connections online. “…The ability to connect to people and information instantly is so common now. It’s just the beginning of an era of even more profound transformation. We’re living in a time of monumental change. And as the cloud becomes ambiguous, it’s literally rewriting entire industries” – says Gretchen O’Hara, Microsoft VP; DN Unlimited & Humanaize Open Forum speaker.

The crisis has called for a digital realignment from both companies and institutions. Elena Poughia, the Founder of Data Natives, perceives the transformation as follows: “It’s not about deploying new spaces via data or technology – it’s about amplifying human strengths. That’s why we need to continue to connect with each other to pivot and co-create the solutions to the challenges we’re facing. These connections will help us move forward.” 

The DN Unlimited Conference will bring together data & technology leaders from across the globe – Christopher Wiggins (Chief Data Scientist, The New York Times), Lubomila Jordanova (CEO & Founder, Plan A), Angeli Moeller (Bayer AG, Head Global Data Assets), Jessica Graves (Founder & Chief Data Officer, Sefleuria) and many more will take to the virtual stages to talk about the growing urge for global data literacy, resources for improving social inequality and building a data culture for agile business development.

On stage among others:
  • Erika Cheung, Executive Director, Ethics in Entrepreneurship
  • Cory Doctorow, Science Fiction Author, Activist, and Journalist
  • Whurley (William Hurley), Eisenhower Fellow, a Senior Member of the Institute of Electrical and Electronics Engineers (IEEE)
  • Alistair Croll, Founder, Solve for Interesting
  • Clare Jones, Chief Commercial Officer, what3words
  • Mark Turrell, CEO, Orcasci
  • And many more

Climate Change & AI for GOOD | Online Open Forum Oct 15th

Join Data Natives for a discussion on how to curb Climate Change and better protect our environment for the next generation. Get inspired by innovative solutions which use data, machine learning and AI technologies for GOOD. Lubomila Jordanova, Founder of Plan A, and featured speaker, explains that “the IT sector will use up to 51% of the global energy output in 2030. Let’s adjust the digital industry and use Data for Climate Action, because carbon reduction is key to making companies future-proof.” When used carefully, AI can help us solve some of the most serious challenges. However, key to that success is measuring impact with the right methods, mindsets, and metrics.

The founders of startups that have developed innovative solutions to combat humanity’s biggest challenge will share their experiences and thoughts:
  • Brittany Salas (Co-Founder at Active Giving)
  • Peter Sänger (Co-Founder/Executive Managing Director at Green City Solutions GmbH)
  • Shaheer Hussam (CEO & Co-Founder at Aetlan)
  • Lubomila Jordanova (Founder at Plan A)
  • Oliver Arafat (Alibaba Cloud’s Senior Solution Architect)

Details
What? Climate Change & AI for GOOD | DN Unlimited Open Forum powered by Alibaba Cloud
When? October 15th at 6 PM CET
Where? Online, worldwide
Register for FREE here: https://datanatives.io/climate-change-ai-for-good-open-forum/

Lyric Analysis with NLP and Machine Learning using R: Part One – Text Mining

June 22
By Debbie Liske

This is Part One of a three-part tutorial series originally published on the DataCamp online learning platform, in which you will use R to perform a variety of analytic tasks on a case study of musical lyrics by the legendary artist Prince.


Musical lyrics may represent an artist’s perspective, but popular songs reveal what society wants to hear. Lyric analysis is no easy task. Because lyrics are often structured so differently from prose, analyzing them requires caution with assumptions and a uniquely discriminating choice of analytic techniques. Musical lyrics permeate our lives and influence our thoughts with subtle ubiquity. The concept of Predictive Lyrics is beginning to buzz and is increasingly prevalent as a subject of research papers and graduate theses. This case study will just touch on a few pieces of this emerging subject.



Prince: The Artist

To celebrate the inspiring and diverse body of work left behind by Prince, you will explore the sometimes obvious, but often hidden, messages in his lyrics. However, you don’t have to like Prince’s music to appreciate the influence he had on the development of many genres globally. Rolling Stone magazine listed Prince as the 18th best songwriter of all time, just behind the likes of Bob Dylan, John Lennon, Paul Simon, Joni Mitchell and Stevie Wonder. Lyric analysis is slowly finding its way into data science communities as the possibility of predicting “Hit Songs” approaches reality.

“Prince was a man bursting with music – a wildly prolific songwriter, a virtuoso on guitars, keyboards and drums and a master architect of funk, rock, R&B and pop, even as his music defied genres.” – Jon Pareles (NY Times)

In this tutorial, Part One of the series, you’ll utilize text mining techniques on a set of lyrics using the tidy text framework. Tidy datasets have a specific structure in which each variable is a column, each observation is a row, and each type of observational unit is a table. After cleaning and conditioning the dataset, you will create descriptive statistics and exploratory visualizations while looking at different aspects of Prince’s lyrics.
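The tidy structure described above can be sketched in a few lines of R. This is a minimal illustration, assuming the dplyr and tidytext packages are installed; the two text lines are made-up placeholders rather than real lyrics, and the tutorial itself may prepare its data differently.

```r
library(dplyr)
library(tidytext)

# Hypothetical input: one row per song, with the full text in one column
lyrics <- tibble(
  song = c("song_a", "song_b"),
  text = c("Dance all night, dance all day", "Sing all night")
)

# unnest_tokens() reshapes the data to one word per row (the tidy text format),
# lowercasing the words and dropping punctuation by default
tidy_lyrics <- lyrics %>%
  unnest_tokens(word, text)

# With one token per row, descriptive statistics are ordinary dplyr verbs
word_counts <- tidy_lyrics %>%
  count(word, sort = TRUE)
```

Here each row of tidy_lyrics is one observation (a word in a song) and each column is one variable, which is what lets the counting step be a single count() call.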

Check out the article here!




(reprint by permission of DataCamp online learning platform)