R Coding

San Antonio Home Sales

Using RStudio Projects

In this post I will show how to create and manage an RStudio project. My main reason for working in projects is that I can manage all my code and content from a single folder on the computer. I also use the here package to manage file paths in the code. For this tutorial you will need RStudio installed on your computer. The operating system doesn’t matter. I am using my MacBook Pro to record the videos.

Some ggplot2 Features

In this post, I cover a few things that I didn’t touch upon in class previously. For this we will use the mpg data from the ggplot2 package. If you want to know more about the data and the variables, run the following command in your R console:

?ggplot2::mpg

Get structure of the mpg data

head(mpg)
## # A tibble: 6 x 11
##   manufacturer model displ year cyl trans drv cty hwy fl class
##   <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 audi a4 1.

Extending ggplot2

There are numerous ways to extend the ggplot2 package. In this post, I am going to talk about 3 packages that are immediately relevant to us. Extrafont: The first package is extrafont, which enables importing font files from your computer into R. You have to do this only once after you install the package; from then on, whenever you want to use a different font, you can simply call it by name in ggplot2.

Intuition behind Cross-Validation

Cross-validation error is an estimate of the out-of-sample error. Cross-validation is a great tool for helping modelers select a model with low out-of-sample error. The objective of this note is to show you how to write simple code to carry out cross-validation in R. I will post similar code for SAS later. K-fold cross-validation involves splitting the sample into K equally sized, independent subsamples (i.e., there is no overlap in the subsamples).
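To make the idea concrete, here is a minimal sketch of K-fold cross-validation in base R. It is not the code from the full post. The simulated data, the linear model, the choice of K = 5, and all variable names (dat, fold, holdout, fold_mse, cv_error) are assumptions made only for illustration. Each fold is held out once, the model is fit on the remaining folds, and the prediction error on the held-out fold is recorded.

```r
set.seed(123)  # make the random fold assignment reproducible

# Hypothetical simulated data: y depends linearly on x plus noise
n <- 100
dat <- data.frame(x = rnorm(n))
dat$y <- 1 + 2 * dat$x + rnorm(n)

K <- 5

# Randomly assign each row to exactly one of K folds of equal size
fold <- sample(rep(1:K, length.out = n))

fold_mse <- numeric(K)
for (k in 1:K) {
  train   <- dat[fold != k, ]   # K - 1 folds used for fitting
  holdout <- dat[fold == k, ]   # the remaining fold used for evaluation
  fit     <- lm(y ~ x, data = train)
  pred    <- predict(fit, newdata = holdout)
  fold_mse[k] <- mean((holdout$y - pred)^2)
}

# The cross-validation error is the average of the K held-out errors
cv_error <- mean(fold_mse)
cv_error
```

To compare candidate models, you would repeat the loop for each model using the same fold assignment and prefer the one with the lower cv_error.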

Celebrating India's Decriminalization of Homosexuality

I woke up to great news today! Five judges of the Supreme Court of India have unanimously decriminalized homosexuality. A few months back, I made a simple t-shirt design using R. At that time, it was an R exercise for me and I didn’t share it with many people. This is my small gift to LGBTQ Indians. Although I have made this using “Om”, many religious symbols are possible if you know the correct character in the Wingdings font.

Installing R and RStudio

This is a short tutorial for the incoming students of UTSA’s MS in Data Analytics program. I am going to assume that the reader has no knowledge of R or of RStudio, the Integrated Development Environment (IDE) that we use to code. If you are a Mac user and you are comfortable with command line tools in Terminal, I suggest taking a more systematic route and preparing your macOS for R installation using Homebrew.

Syllabus for DA6233

Online Appendix

This post accompanies the article “Rejoinder to ‘Endogeneity bias in marketing research: Problem, causes and remedies’” in Industrial Marketing Management (IMM). We restate some sections of the main text for completeness in this document. Throughout we include R code to run the 2SLS estimations, create graphs, and generate the datasets. Model setting Before we start working on simulated data, let’s understand how much bias we expect if we use each of OLS, ZKHL, and 2SLS.
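As a rough illustration of why OLS and 2SLS can differ, the sketch below simulates an endogenous regressor and compares the two estimators using only base R. This is not the simulation design from the article. The data-generating process, the coefficient values, and all variable names (z, u, x, y, x_hat) are hypothetical, and the ZKHL approach is not shown.

```r
set.seed(42)
n <- 5000

# Hypothetical data-generating process (an assumption for illustration only)
z <- rnorm(n)                      # instrument: affects x, unrelated to u
u <- rnorm(n)                      # unobserved factor driving both x and y
x <- 0.8 * z + u + rnorm(n)        # endogenous regressor
y <- 1 + 0.5 * x + u + rnorm(n)    # true coefficient on x is 0.5

# OLS ignores the correlation between x and the error term (which contains u)
ols <- unname(coef(lm(y ~ x))["x"])

# 2SLS by hand:
# Stage 1: project x on the instrument
x_hat <- fitted(lm(x ~ z))
# Stage 2: regress y on the fitted values from stage 1
tsls <- unname(coef(lm(y ~ x_hat))["x_hat"])

# Note: the standard errors from this manual second stage are not valid;
# only the point estimate is used here.
c(OLS = ols, TSLS = tsls)
```

In this setup the OLS estimate is pushed upward because x and the error term share the component u, while the 2SLS estimate should land close to the true value of 0.5.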

Speed comparison of rbind, bind_rows, and rbindlist

I often need to create a list consisting of several data frames. A simple example is when you read an Excel file with multiple worksheets. Rather than reading the sheets one at a time and row binding them as you go, it’s often faster to read all the sheets into a list as separate data frames and then row bind them all at once. Another example is when you are storing data frames as they are returned by a website such as Facebook.
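The pattern of binding everything at once can be sketched as follows. The three small data frames are hypothetical stand-ins for worksheets, and the names sheets, combined_base, combined_dplyr, and combined_dt are made up for this example. The dplyr and data.table alternatives run only if those packages happen to be installed.

```r
# Hypothetical example data: three small data frames standing in for worksheets
sheets <- lapply(1:3, function(i) {
  data.frame(sheet = i, id = 1:4, value = i * (1:4))
})

# Base R: row bind every element of the list in a single call
combined_base <- do.call(rbind, sheets)

# Optional alternatives, used only when the packages are available
if (requireNamespace("dplyr", quietly = TRUE)) {
  combined_dplyr <- dplyr::bind_rows(sheets)
}
if (requireNamespace("data.table", quietly = TRUE)) {
  combined_dt <- data.table::rbindlist(sheets)
}

dim(combined_base)
```

Wrapping each call in system.time() on a much longer list is one simple way to compare the speed of the three approaches.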