Machine Learning

Machine Learning Posts - Digest 05

I came across a few nice application-related posts. Machine Learning Music Composed From Re-Synthesized Fragments From 100s Of Terabytes Of LA Phil Recordings: This post has “a new high def version of the dazzling 3D video/AI-driven performance displayed on the Walt Disney Concert Hall last year.” AI-driven music is nothing new. About 3 years ago I showed my students a video of a computer algorithm creating fantastic music, and some of them became upset!

Machine Learning Posts - Digest 04

Translating Between Statistics and Machine Learning Summary: If, like me, you were trained in statistics and econometrics, not all the terminology used in machine learning is easy to understand. I think that machine learning guys are good marketers and they know how to name their techniques! For example, creating plain vanilla ‘dummy variables’ becomes ‘one hot encoding’ in machine learning :) There are some confusing things too. In statistics, bias typically refers to the bias in the estimates.
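To make the ‘dummy variables’ versus ‘one hot encoding’ remark concrete, here is a minimal sketch in base R. The toy data frame and its column name `color` are made up for illustration and do not come from the linked post. With an intercept, `model.matrix()` produces the usual K-1 dummy columns; dropping the intercept gives one 0/1 column per level, which is what machine learning texts usually call one hot encoding.

```r
# Toy data (hypothetical): one categorical variable with three levels
df <- data.frame(color = factor(c("red", "green", "blue", "green")))

# Statistics-style dummy variables: intercept plus K - 1 indicator columns
# (the first factor level, "blue", is the omitted baseline)
dummies <- model.matrix(~ color, data = df)

# Machine-learning-style one hot encoding: "- 1" drops the intercept,
# so each of the K levels gets its own 0/1 column
one_hot <- model.matrix(~ color - 1, data = df)

print(dummies)
print(one_hot)
```

Both matrices carry the same information; the difference is only whether a baseline level is absorbed into the intercept.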

Machine Learning Posts - Digest 03

In this week’s digest I am posting NLP-related articles. Detecting Sarcasm with Deep Convolutional Neural Networks: This article talks about a paper from 2017 that used Twitter data to build a deep learning model for sarcasm detection. I also found another, more recent paper [PDF] on sarcasm detection. An NLP Approach to Mining Online Reviews using Topic Modeling (with Python codes): This is a simple tutorial that does topic modeling on online reviews.

Machine Learning Posts - Digest 02

This week’s articles: A 60-Minutes Course on Fairness in Machine Learning Summary: The course focuses on the bias in machine learning because of humans! I think this is an important area of work. Model-Based Machine Learning Book Summary: This is actually not an article but an entire book. I have read a few pages of the book but I am not at a point where I can summarize anything!

Machine Learning Posts - Digest 01

Whenever I get time, I am going to post the machine learning articles that I read during the week. I thought today was a good day to start. Using machine learning to index text from billions of images Summary: The article describes how Dropbox built a system to index images based on the text in those images. Dropbox used TensorFlow. Rosetta: Understanding text in images and videos with machine learning Summary: From the article - “[Rosetta] extracts text from more than a billion public Facebook and Instagram images and video frames (in a wide variety of languages), daily and in real time, and inputs it into a text recognition model that has been trained on classifiers to understand the context of the text and the image together.

DA 6813 Homework Solution

This note is pretty old. I have updated it to use the dplyr package, but plenty of the code is still base R. The original note is available here: This file is just a small part of the original file. The original homework questions are available here:

library(dplyr)
library(here)

Get the data in

red <- read.csv(here::here("static", "data", "winequality-red.csv"), stringsAsFactors = FALSE)
red$wine <- "red"
white <- read.csv(here::here("static", "data", "winequality-white.

Intuition behind Cross-Validation

Cross-validation error is an estimate of the out-of-sample error. Cross-validation is a great tool for helping modelers select a model with low out-of-sample error. The objective of this note is to show you how to write simple code to carry out cross-validation in R. I will post similar code for SAS later. K-fold cross-validation involves splitting the sample into K roughly equal-sized, non-overlapping subsamples (i.e., no observation appears in more than one subsample).
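The splitting step can be sketched in a few lines of base R. This is only an illustrative sketch and not the code from the full note: it assumes the built-in `mtcars` data, a linear model of `mpg` on `wt` and `hp`, and K = 5, all of which are choices made here for the example. Each fold is held out once, the model is fit on the remaining folds, and the held-out mean squared errors are averaged.

```r
# Minimal K-fold cross-validation sketch (assumptions: mtcars data,
# lm(mpg ~ wt + hp), K = 5; none of these come from the original note)
set.seed(123)
K <- 5
n <- nrow(mtcars)

# Randomly assign each row to exactly one of K folds of nearly equal size
fold <- sample(rep(seq_len(K), length.out = n))

fold_mse <- numeric(K)
for (k in seq_len(K)) {
  train <- mtcars[fold != k, ]   # fit on the other K - 1 folds
  test  <- mtcars[fold == k, ]   # evaluate on the held-out fold
  fit   <- lm(mpg ~ wt + hp, data = train)
  pred  <- predict(fit, newdata = test)
  fold_mse[k] <- mean((test$mpg - pred)^2)
}

# The cross-validation error is the average of the K held-out errors
cv_error <- mean(fold_mse)
print(cv_error)
```

To compare candidate models, one would repeat the loop with the same `fold` assignment for each model and prefer the one with the lowest `cv_error`.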