
Let’s consider three different approaches and how to implement them in the caret package.

The first approach removes redundant features. We create a correlation matrix of all the features and then identify the highly correlated ones, usually those with an absolute correlation coefficient greater than 0.75.

The second approach ranks features by importance. We create a training scheme to control the parameters for train, use it to build a selected model, and then estimate the variable importance for that model.

The third approach selects features automatically. One of the most popular methods provided by caret for this is a backward selection algorithm called Recursive Feature Elimination (RFE). We compute the control using a selected resampling method and a predefined list of functions, apply the RFE algorithm, passing it the features, the target variable, the candidate subset sizes (the numbers of features to retain), and the control, and then extract the selected predictors.
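The three approaches above can be sketched in a few lines of R. This is a minimal illustration, not the original listing: it assumes the caret and randomForest packages are installed, and it uses the built-in iris data as a stand-in dataset. The choices of rpart as the model, rfFuncs as the predefined function list, the resampling settings, and all variable names are assumptions made for the example.

```r
# Assumption: caret and randomForest are installed. iris is a stand-in
# dataset chosen for illustration; it is not the data used in the text.
library(caret)

set.seed(7)  # make the resampling reproducible
features <- iris[, 1:4]
target <- iris$Species

# Approach 1: remove redundant features.
# Build the correlation matrix, then flag columns whose absolute pairwise
# correlation exceeds the 0.75 cutoff.
cor_matrix <- cor(features)
highly_correlated <- findCorrelation(cor_matrix, cutoff = 0.75)
print(colnames(features)[highly_correlated])

# Approach 2: rank features by importance.
# Define the training scheme, fit a model with it, then estimate importance.
# rpart is an arbitrary model choice for this sketch.
train_control <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
model <- train(x = features, y = target, method = "rpart",
               trControl = train_control)
importance <- varImp(model, scale = FALSE)
print(importance)

# Approach 3: Recursive Feature Elimination.
# rfFuncs is one of caret's predefined function lists (random forest based).
# sizes lists the candidate subset sizes; the full feature set is also evaluated.
rfe_control <- rfeControl(functions = rfFuncs, method = "cv", number = 10)
rfe_results <- rfe(features, target, sizes = 1:3, rfeControl = rfe_control)
print(predictors(rfe_results))
```

Printing rfe_results, or calling plot(rfe_results, type = c("g", "o")), shows the resampled performance for each subset size, which helps when judging whether the smaller subsets lose much accuracy.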