R-posts.com

How to generate data from a model – Part 1



Summary

Traditionally, data scientists build models from data. This article describes how to do the exact opposite, i.e. generate data from a model. It is the first in a series of articles on building data from a model.

Motivation & Practical Applications

Businesses across industry domains are embracing solutions driven by Artificial Intelligence (AI) and Machine Learning (ML). The recent growth of cloud-based Machine Learning Operations (ML Ops) tools such as Azure ML has made AI/ML solutions more accessible, easier to deploy and, in some cases, more affordable. The use of Auto ML and feature engineering packages is also increasing. These approaches reduce manual intervention during the model build and retraining stages.

Because the focus is now predominantly on building ML pipelines rather than on building models manually, the robustness of those pipelines needs to be inspected. This is still an evolving field and is currently handled by model observability tools. This article proposes one such method of observability. Its purpose is best expressed as the question below.

What if we built the underlying data distributions, the outliers and the dependent variable ourselves, and then put that data through the ML Ops pipeline? Wouldn't we know where the pipeline worked well and where it could have done better?
This question motivated the development of a Software as a Service (SaaS) product called uncovr. The product can now be accessed through the R package conjurer by following the steps outlined below.
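
The idea behind the question can be tried locally before involving any API. The sketch below is a toy illustration in base R, not a description of how uncovr works: the coefficients, the distributions, the number of outliers and the use of a plain lm fit as a stand-in "pipeline" are all assumptions made for this example.

```r
# Toy illustration only: this does not call uncovr or conjurer.
set.seed(42)   # reproducible simulation
n <- 1000      # number of observations

# 1. Fix the "true" model up front (coefficients chosen arbitrarily).
true_coef <- c(intercept = 2, iv1 = 1.5, iv2 = -0.8, iv3 = 0.3)

# 2. Build the independent variables from known distributions.
sim <- data.frame(
  iv1 = rnorm(n, mean = 0, sd = 1),
  iv2 = runif(n, min = -2, max = 2),
  iv3 = rexp(n, rate = 1)
)

# 3. Build the dependent variable from the model plus Gaussian noise.
sim$dv <- true_coef[["intercept"]] +
  true_coef[["iv1"]] * sim$iv1 +
  true_coef[["iv2"]] * sim$iv2 +
  true_coef[["iv3"]] * sim$iv3 +
  rnorm(n, sd = 0.5)

# 4. Plant ten known outliers by shifting their dv values upward.
outlier_idx <- sample(n, 10)
sim$dv[outlier_idx] <- sim$dv[outlier_idx] + 20

# 5. Run a stand-in "pipeline" (a plain lm fit) on the data.
fit <- lm(dv ~ iv1 + iv2 + iv3, data = sim)

# 6. Compare what the pipeline recovered with what was built in.
comparison <- data.frame(
  term      = names(true_coef),
  true      = unname(true_coef),
  recovered = round(unname(coef(fit)), 3)
)
print(comparison)

# The ten largest absolute residuals should point at the planted outliers.
flagged <- order(abs(residuals(fit)), decreasing = TRUE)[1:10]
print(setequal(flagged, outlier_idx))
```

Because the true coefficients and the outlier positions are known in advance, any gap between the true and recovered columns can be attributed to the pipeline rather than to unknown properties of the data. In this setup the planted outliers tend to pull the estimated intercept upward, which is the kind of effect the comparison is meant to expose.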


Data from Model Using R

Step 1: Register for the API

Step 2: Install R Package Conjurer

Install the latest version of the package from CRAN as follows.
install.packages("conjurer")

Step 3: Generate data

Generate the data using the code below.

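library(conjurer)
# Step 1: call the uncovr API; the response is returned in JSON format
uncovrJson <- buildModelData(numOfObs = 1000, numOfVars = 3, key = "input your subscription key here")
# Step 2: convert the JSON response to an R data frame
df <- extractDf(uncovrJson = uncovrJson)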

The code above has two steps. The first step connects to the API and retrieves the data in JSON format. The second step converts the JSON to an R data frame.

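The call to buildModelData above passes three arguments: numOfObs, the number of observations (rows) to generate; numOfVars, the number of independent variables to generate; and key, your API subscription key.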
The data frame df (in the code above) will have three columns with the names iv1, iv2, iv3 and one column dv. The columns with prefix iv are the independent variables while the dv is the dependent variable. You can rename them to suit your needs. 
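
Renaming can be done in base R. The sketch below uses a small stand-in data frame so that it runs without an API key. Only the column names iv1, iv2, iv3 and dv come from this article; the replacement names are hypothetical and chosen purely for illustration.

```r
# Stand-in for the data frame returned by extractDf(); only the column
# names (iv1, iv2, iv3, dv) are taken from the article.
set.seed(1)
df <- data.frame(iv1 = rnorm(5), iv2 = rnorm(5), iv3 = rnorm(5), dv = rnorm(5))

# Hypothetical replacement names, keyed by the original column names.
new_names <- c(iv1 = "price", iv2 = "discount", iv3 = "ad_spend", dv = "sales")

# Rename by matching on the old names so that column order does not matter.
names(df) <- unname(new_names[names(df)])
str(df)
```

Looking names up by key rather than assigning by position keeps the renaming correct even if the columns arrive in a different order.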
The model used in the current version to generate the data is a linear regression model. The details of the model formula and its estimated performance can be inspected as follows. 
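
Independently of what the API reports, a quick cross-check is to refit a linear regression on the generated data and read off the fitted formula and a couple of performance measures. The sketch below is one way to do that in base R. It builds a stand-in df with arbitrary coefficients so that it runs without an API key; with real data, skip the stand-in block and use the df returned by extractDf.

```r
# Stand-in data so the snippet runs without an API key. The coefficients
# below are arbitrary and are NOT the ones used by uncovr.
set.seed(7)
n <- 1000
df <- data.frame(iv1 = rnorm(n), iv2 = rnorm(n), iv3 = rnorm(n))
df$dv <- 1 + 2 * df$iv1 - df$iv2 + 0.5 * df$iv3 + rnorm(n, sd = 0.5)

# Refit a linear regression of dv on every other column.
fit <- lm(dv ~ ., data = df)

# Fitted formula and coefficients.
print(formula(fit))
print(round(coef(fit), 3))

# Two common in-sample performance measures.
r_squared <- summary(fit)$r.squared
rmse <- sqrt(mean(residuals(fit)^2))
cat("R-squared:", round(r_squared, 3), " RMSE:", round(rmse, 3), "\n")
```

If the refitted coefficients or the fit quality differ noticeably from the details reported for the generating model, that difference is worth investigating before the data is used to evaluate a pipeline.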

Concluding Remarks

The underlying API, uncovr, is under development. As new functionality is released, the R package conjurer will be updated to reflect those changes. For feature requests or bug reports, please follow the contribution guidelines on the GitHub repository. To keep up with future releases and news, please follow our LinkedIn page.