Machine Learning

DA 6813 Homework Solution

This note is fairly old. I have updated parts of it to use the dplyr package, but much of the code is still base R. The original note is available here: This file is just a small part of the original file. The original homework questions are available here:

library(dplyr)
library(here)

Get the data in

red <- read.csv(here::here("static", "data", "winequality-red.csv"), stringsAsFactors = FALSE)
red$wine <- "red"
white <- read.csv(here::here("static", "data", "winequality-white.

Intuition behind Cross-Validation

The cross-validation error is an estimate of the out-of-sample error, which makes cross-validation a useful tool for selecting a model that is likely to generalize well. The objective of this note is to show you how to write simple code to carry out cross-validation in R. I will post similar code for SAS later. K-fold cross-validation involves splitting the sample into K roughly equal-sized, non-overlapping subsamples (i.e., each observation belongs to exactly one subsample).
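To make the splitting step concrete, here is a minimal sketch in base R. It is not the code from the original note: the data are simulated, and the names sim, fold_id, fold_mse, and cv_error are hypothetical, chosen only for illustration. Each row is randomly assigned to one of K folds, a linear model is fit on the other K - 1 folds, and the mean squared error on the held-out fold is recorded.

```r
# Minimal K-fold cross-validation sketch (simulated data, hypothetical names).
set.seed(1)
n <- 100
sim <- data.frame(x = rnorm(n))
sim$y <- 2 + 3 * sim$x + rnorm(n)

K <- 5
# Randomly assign each row to exactly one of K folds of (nearly) equal size.
fold_id <- sample(rep(seq_len(K), length.out = n))

fold_mse <- numeric(K)
for (k in seq_len(K)) {
  train <- sim[fold_id != k, ]  # K - 1 folds used for fitting
  test  <- sim[fold_id == k, ]  # held-out fold used for evaluation
  fit   <- lm(y ~ x, data = train)
  pred  <- predict(fit, newdata = test)
  fold_mse[k] <- mean((test$y - pred)^2)
}

# The cross-validation error is the average of the K held-out errors.
cv_error <- mean(fold_mse)
cv_error
```

Averaging the per-fold errors is one common convention. When the folds differ in size, a weighted average by fold size may be preferable.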