first steps: how to

For those who will laugh at seeing deep learning with one hidden layer and the Iris data set of 150 records, I will say: you’re perfectly right 🙂
The goal at this stage is simply to take the first steps

fit a regression model manually (hard way)

Subject: predict Sepal.Length given other Iris parameters
1st with gradient descent and default hyper-parameters value for learning rate (0.001) and mini batch size (32)
input

data(iris)
xmat <- cbind(iris[,2:4], as.numeric(iris$Species))
ymat <- iris[,1]
amlmodel <- automl_train_manual(Xref = xmat, Yref = ymat)

output

(cost: mse)
cost epoch10: 20.9340400047156 (cv cost: 25.205632342013) (LR:  0.001 )
cost epoch20: 20.6280923387762 (cv cost: 23.8214521197268) (LR:  0.001 )
cost epoch30: 20.3222407903838 (cv cost: 22.1899741289456) (LR:  0.001 )
cost epoch40: 20.0217966054298 (cv cost: 21.3908446693146) (LR:  0.001 )
cost epoch50: 19.7584058034009 (cv cost: 20.7170232035934) (LR:  0.001 )
   dim X: ...

input

res <- cbind(ymat, automl_predict(model = amlmodel, X = xmat))
colnames(res) <- c('actual', 'predict')
head(res)

output

     actual   predict
[1,]    5.1 -2.063614
[2,]    4.9 -2.487673
[3,]    4.7 -2.471912
[4,]    4.6 -2.281035
[5,]    5.0 -1.956937
[6,]    5.4 -1.729314

:-[] no pain, no gain ...

After some manual fine tuning on learning rate, mini batch size and iterations number (epochs):
input

data(iris)
xmat <- cbind(iris[,2:4], as.numeric(iris$Species))
ymat <- iris[,1]
amlmodel = automl_train_manual(
               Xref = xmat, Yref = ymat,
               hpar = list(
                   learningrate = 0.01,
                   minibatchsize = 2^2,
                   numiterations = 30
                   )
               )

output

(cost: mse)
cost epoch10: 5.55679482839698 (cv cost: 4.87492997304325) (LR:  0.01 )
cost epoch20: 1.64996951479802 (cv cost: 1.50339773126712) (LR:  0.01 )
cost epoch30: 0.647727077375946 (cv cost: 0.60142564484723) (LR:  0.01 )
   dim X: ...

input

res <- cbind(ymat, automl_predict(model = amlmodel, X = xmat))
colnames(res) <- c('actual', 'predict')
head(res)

output

     actual  predict
[1,]    5.1 4.478478
[2,]    4.9 4.215683
[3,]    4.7 4.275902
[4,]    4.6 4.313141
[5,]    5.0 4.531038
[6,]    5.4 4.742847

Better result, but with human efforts!

fit a regression model automatically (easy way, Mix 1)

Same subject: predict Sepal.Length given other Iris parameters
input

data(iris)
xmat <- as.matrix(cbind(iris[,2:4], as.numeric(iris$Species)))
ymat <- iris[,1]
start.time <- Sys.time()
amlmodel <- automl_train(
                Xref = xmat, Yref = ymat,
                autopar = list(
                    psopartpopsize = 15, 
                    numiterations = 5, 
                    nbcores = 4
                    )
                )
end.time <- Sys.time()
cat(paste('time ellapsed:', end.time - start.time, '\n'))

output

(cost: mse)
iteration 1 particle 1 weighted err: 22.05305 (train: 19.95908 cvalid: 14.72417 ) BEST MODEL KEPT
iteration 1 particle 2 weighted err: 31.69094 (train: 20.55559 cvalid: 27.51518 )
iteration 1 particle 3 weighted err: 22.08092 (train: 20.52354 cvalid: 16.63009 )
iteration 1 particle 4 weighted err: 20.02091 (train: 19.18378 cvalid: 17.09095 ) BEST MODEL KEPT
iteration 1 particle 5 weighted err: 28.36339 (train: 20.6763 cvalid: 25.48073 )
iteration 1 particle 6 weighted err: 28.92088 (train: 20.92546 cvalid: 25.9226 )
iteration 1 particle 7 weighted err: 21.67837 (train: 20.73866 cvalid: 18.38941 )
iteration 1 particle 8 weighted err: 29.80416 (train: 16.09191 cvalid: 24.66206 )
iteration 1 particle 9 weighted err: 22.93199 (train: 20.5561 cvalid: 14.61638 )
iteration 1 particle 10 weighted err: 21.18474 (train: 19.64622 cvalid: 15.79992 )
iteration 1 particle 11 weighted err: 23.32084 (train: 20.78257 cvalid: 14.43688 )
iteration 1 particle 12 weighted err: 22.27164 (train: 20.81055 cvalid: 17.15783 )
iteration 1 particle 13 weighted err: 2.23479 (train: 1.95683 cvalid: 1.26193 ) BEST MODEL KEPT
iteration 1 particle 14 weighted err: 23.1183 (train: 20.79754 cvalid: 14.99564 )
iteration 1 particle 15 weighted err: 20.71678 (train: 19.40506 cvalid: 16.12575 )
...
iteration 4 particle 3 weighted err: 0.3469 (train: 0.32236 cvalid: 0.26104 )
iteration 4 particle 4 weighted err: 0.2448 (train: 0.07047 cvalid: 0.17943 )
iteration 4 particle 5 weighted err: 0.09674 (train: 5e-05 cvalid: 0.06048 ) BEST MODEL KEPT
iteration 4 particle 6 weighted err: 0.71267 (train: 6e-05 cvalid: 0.44544 )
iteration 4 particle 7 weighted err: 0.65614 (train: 0.63381 cvalid: 0.57796 )
iteration 4 particle 8 weighted err: 0.46477 (train: 0.356 cvalid: 0.08408 )
...
time ellapsed: 2.65109273195267

input

res <- cbind(ymat, automl_predict(model = amlmodel, X = xmat))
colnames(res) <- c('actual', 'predict')
head(res)

output

     actual  predict
[1,]    5.1 5.193862
[2,]    4.9 4.836507
[3,]    4.7 4.899531
[4,]    4.6 4.987896
[5,]    5.0 5.265334
[6,]    5.4 5.683173

It's even better, with no human efforts but machine time
Users on Windows won't benefit from parallelization, the function uses parallel package included with R base...

fit a regression model experimentally (experimental way, Mix 2)

Same subject: predict Sepal.Length given other Iris parameters
input

data(iris)
xmat <- as.matrix(cbind(iris[,2:4], as.numeric(iris$Species)))
ymat <- iris[,1]
amlmodel <- automl_train_manual(
                Xref = xmat, Yref = ymat,
                hpar = list(
                    modexec = 'trainwpso',
                    numiterations = 30,
                    psopartpopsize = 50
                    )
                )

output

(cost: mse)
cost epoch10: 0.113576786377019 (cv cost: 0.0967069106128153) (LR:  0 )
cost epoch20: 0.0595472259640828 (cv cost: 0.0831404427407914) (LR:  0 )
cost epoch30: 0.0494578776185938 (cv cost: 0.0538888075333611) (LR:  0 )
   dim X: ...

input

res <- cbind(ymat, automl_predict(model = amlmodel, X = xmat))
colnames(res) <- c('actual', 'predict')
head(res)

output

     actual  predict
[1,]    5.1 5.028114
[2,]    4.9 4.673366
[3,]    4.7 4.738188
[4,]    4.6 4.821392
[5,]    5.0 5.099064
[6,]    5.4 5.277315

Pretty good too, even better!
But time consuming on larger datasets: where gradient descent should be preferred in this case

fit a regression model with custom cost (experimental way, Mix 2)

Same subject: predict Sepal.Length given other Iris parameters
Let's try with Mean Absolute Percentage Error instead of Mean Square Error
input

data(iris)
xmat <- as.matrix(cbind(iris[,2:4], as.numeric(iris$Species)))
ymat <- iris[,1]
f <- 'J=abs((y-yhat)/y)'
f <- c(f, 'J=sum(J[!is.infinite(J)],na.rm=TRUE)')
f <- c(f, 'J=(J/length(y))')
f <- paste(f, collapse = ';')
amlmodel <- automl_train_manual(
                Xref = xmat, Yref = ymat,
                hpar = list(
                    modexec = 'trainwpso',
                    numiterations = 30,
                    psopartpopsize = 50,
                    costcustformul = f
                    )
                )

output

(cost: custom)
cost epoch10: 0.901580275333795 (cv cost: 1.15936129555304) (LR:  0 )
cost epoch20: 0.890142834441629 (cv cost: 1.24167078564786) (LR:  0 )
cost epoch30: 0.886088388448652 (cv cost: 1.22756121243449) (LR:  0 )
   dim X: ...

input

res <- cbind(ymat, automl_predict(model = amlmodel, X = xmat))
colnames(res) <- c('actual', 'predict')
head(res)

output

     actual  predict
[1,]    5.1 4.693915
[2,]    4.9 4.470968
[3,]    4.7 4.482036
[4,]    4.6 4.593667
[5,]    5.0 4.738504
[6,]    5.4 4.914144

fit a classification model with softmax (Mix 2)

Subject: predict Species given other Iris parameters
Softmax is available with PSO, no derivative needed 😉
input

data(iris)
xmat = iris[,1:4]
lab2pred <- levels(iris$Species)
lghlab <- length(lab2pred)
iris$Species <- as.numeric(iris$Species)
ymat <- matrix(seq(from = 1, to = lghlab, by = 1), nrow(xmat), lghlab, byrow = TRUE)
ymat <- (ymat == as.numeric(iris$Species)) + 0
amlmodel <- automl_train_manual(
                Xref = xmat, Yref = ymat,
                hpar = list(
                    modexec = 'trainwpso',
                    layersshape = c(10, 0),
                    layersacttype = c('relu', 'softmax'),
                    layersdropoprob = c(0, 0),
                    numiterations = 50,
                    psopartpopsize = 50
                    )
                )

output

(cost: crossentropy)
cost epoch10: 0.373706545886467 (cv cost: 0.36117608867856) (LR:  0 )
cost epoch20: 0.267034060152876 (cv cost: 0.163635821437066) (LR:  0 )
cost epoch30: 0.212054571476337 (cv cost: 0.112664100290429) (LR:  0 )
cost epoch40: 0.154158717402463 (cv cost: 0.102895917099299) (LR:  0 )
cost epoch50: 0.141037927317585 (cv cost: 0.0864623836595045) (LR:  0 )
   dim X: ...

input

res <- cbind(ymat, automl_predict(model = amlmodel, X = xmat))
colnames(res) <- c(paste('act',lab2pred, sep = '_'),
 paste('pred',lab2pred, sep = '_'))
head(res)
tail(res)

output

  act_setosa act_versicolor act_virginica pred_setosa pred_versicolor pred_virginica
1          1              0             0   0.9863481     0.003268881    0.010383018
2          1              0             0   0.9897295     0.003387193    0.006883349
3          1              0             0   0.9856347     0.002025946    0.012339349
4          1              0             0   0.9819881     0.004638452    0.013373451
5          1              0             0   0.9827623     0.003115452    0.014122277
6          1              0             0   0.9329747     0.031624836    0.035400439

    act_setosa act_versicolor act_virginica pred_setosa pred_versicolor pred_virginica
145          0              0             1  0.02549091    2.877957e-05      0.9744803
146          0              0             1  0.08146753    2.005664e-03      0.9165268
147          0              0             1  0.05465750    1.979652e-02      0.9255460
148          0              0             1  0.06040415    1.974869e-02      0.9198472
149          0              0             1  0.02318048    4.133826e-04      0.9764061
150          0              0             1  0.03696852    5.230936e-02      0.9107221

change the model parameters (shape ...)

Same subject: predict Species given other Iris parameters
1st example: with gradient descent and 2 hidden layers containing 10 nodes, with various activation functions for hidden layers
input

data(iris)
xmat = iris[,1:4]
lab2pred <- levels(iris$Species)
lghlab <- length(lab2pred)
iris$Species <- as.numeric(iris$Species)
ymat <- matrix(seq(from = 1, to = lghlab, by = 1), nrow(xmat), lghlab, byrow = TRUE)
ymat <- (ymat == as.numeric(iris$Species)) + 0
amlmodel <- automl_train_manual(
                Xref = xmat, Yref = ymat,
                hpar = list(
                    layersshape = c(10, 10, 0),
                    layersacttype = c('tanh', 'relu', ''),
                    layersdropoprob = c(0, 0, 0)
                    )
                )

nb: last activation type may be left to blank (it will be set automatically)

2nd example: with gradient descent and no hidden layer (logistic regression)
input

data(iris)
xmat = iris[,1:4]
lab2pred <- levels(iris$Species)
lghlab <- length(lab2pred)
iris$Species <- as.numeric(iris$Species)
ymat <- matrix(seq(from = 1, to = lghlab, by = 1), nrow(xmat), lghlab, byrow = TRUE)
ymat <- (ymat == as.numeric(iris$Species)) + 0
amlmodel <- automl_train_manual(
                Xref = xmat, Yref = ymat,
                hpar = list(
                    layersshape = c(0),
                    layersacttype = c('sigmoid'),
                    layersdropoprob = c(0)
                    )
                )