This function will be in the rafikisol R package, however, I have not loaded it yet as there are some more functions I want to add. Nevertheless, we will make it into a function called calcPICP(), which was basically taken from the book “Using R for Digital Soil Mapping” by Malone et al., (2017).
The function takes 3 parameters, x = a data frame of the data, response = the vector of the measured data (e.g., data$response), and pred = the predicted value (e.g., data$predicted).
calcPICP = function(data, response, pred){
#We first get the residuals of the model
res = response - pred
#Then we get the standard deviation of the residuals and combine with the data.
data$stdev = sd(res)
#We than make a series of quantiles from a normal cumulative distribution.
qp <- qnorm(c(0.995, 0.9875, 0.975, 0.95, 0.9, 0.8, 0.7, 0.6, 0.55, 0.525))
#Then make a matrix the with the row length of the data and columns of qp
vMat <- matrix(NA, nrow = nrow(data), ncol = length(qp))
#Now we must loop around the quantiles and multiply it by the standard deviation to get a series of standard errors with different prediction intervals.
for(i in 1:length(qp)){
vMat[, i] <- data$stdev * qp[i]
}
#Make another matrix same as before for the upper limits
uMat <- matrix(NA, nrow = nrow(data), ncol = length(qp))
#We calculate the upper limits by adding the series of standard errors to the predictions of the model.
for(i in 1:length(qp)) {
uMat[, i] <- pred + vMat[, i]
}
#We make another matrix for the lower limits
lMat <- matrix(NA, nrow = nrow(data), ncol = length(qp))
#We calculate the lower limits by subtracting the series from the predicted values.
for(i in 1:length(qp)) {
lMat[, i] <- pred - vMat[, i]
}
#Now we want to see which prediction intervals cover the measured data creating a matrix of 1s and 0s.
bMat <- matrix(NA, nrow = nrow(data), ncol = length(qp))
for(i in 1:ncol(bMat)){
bMat[, i]<-as.numeric(response <= uMat[, i] &
response >= lMat[, i])
}
#To calculate the PICP we take the colsums/nrow*100 for the matrix of 0s and 1s
picp <- colSums(bMat)/nrow(bMat)*100
#Make a vector of confidence levels
cl <- c(99, 97.5, 95, 90, 80, 60, 40, 20, 10, 5)
#We put into a data frame for plotting
results <- data.frame(picp = picp, cl = cl)
#Since we want PICP to CI to be a 1:1 line we also calculate Lin’s concordance correlation coefficient (CCC) with the yardstick R package.
ccc <- as.data.frame(yardstick::ccc_vec(results$picp, results$cl))
#Make name correct
names(ccc) = "CCC" #name
#must add axis values for plotting
ccc$x = 10 #x axis
ccc$y = 90 #y axis
#Now we can plot the PICP to CI, add the 1:1 line and the CCC
p = ggplot(data = results, aes(x= cl, y = picp))+ #add data
geom_point()+ #add points
geom_text(data = ccc,aes(x= x, y =y, label = paste("CCC = ",round(CCC, 2))))+ #add CCC value
geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = 'red')+ #add 1:1 line
labs(x = 'Confidence level', y = "PICP", title = "PICP to confidence level")+ #labels
theme_bw() #make it look good
#Now we want to return a list of the plot as well as a data frame of the total results.
return(setNames(list(p, results), c("Plot", "Results")))
}
Now we have the function giving us a plot of the PICP to CI and results. This is useful when running many models and now we can just plug in the data.
picp = picpCalc(dat, dat$clay, dat$pred)
now, plot the data.
picp[[1]]