Image Credit: Pixabay

Correlation between Indian and US Stock Markets

I invest (modestly) in both the US and Indian stock markets. Over the last few days I noticed that on days when the US market was down, the Indian market was not necessarily down. This is somewhat unexpected given my past experience (I used to work in the financial sector back in India). When I posted my observation on Facebook, people asked me for more concrete evidence of this lack of correlation. In this post, I tackle that task. We will follow the steps below:

  1. Get daily data on stock indexes in India and the US. We will use Sensex published by the Bombay Stock Exchange (BSE) and Standard and Poor’s 500 stock index in the US (S&P 500).

  2. Clean up the data and overcome the challenges of merging two data sets when there is a substantial time difference between the two countries. Also, the stock markets are closed on different holidays and these are not common to the two countries.

  3. Compute correlations and create visualizations

If you want to check out the final result, scroll to the bottom of this page. The rest of the post will give you the R code necessary to create these visualizations.

Also, if you are curious, the correlation between BSE Sensex and S&P 500 in 2018 is indeed low when Sensex returns are computed from closing prices. However, this correlation is higher when we use a different measure of stock returns.

Let’s load all the libraries we will use for this post
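A minimal sketch of the loading step, assuming the package list can be inferred from the functions called later in this post (`%>%` and `mutate` from dplyr, `dmy` from lubridate, `na.locf` from zoo, `date_breaks` from scales, `melt` from reshape2, `theme_modern_rc` from hrbrthemes). The exact set of packages is my assumption, not a confirmed list.

```r
# Package list inferred from the functions used in this post (an assumption)
pkgs <- c("dplyr", "lubridate", "zoo", "ggplot2", "scales", "reshape2", "hrbrthemes")

# Check which packages are not installed before trying to attach them
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]

if (length(missing_pkgs) > 0) {
  message("Install these first: ", paste(missing_pkgs, collapse = ", "))
} else {
  # Attach everything so the unqualified calls below resolve
  invisible(lapply(pkgs, library, character.only = TRUE))
}
```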


1. Getting Stock Indexes Data

BSE Sensex is obtained directly from BSE’s website:

S&P 500 index is obtained from Yahoo Finance:

I saved the CSV files on my hard disk and then read them in R. In the code below replace the text “File Path/Your File Name” with your own file path.

In the BSE Sensex file, the variable names were giving me some trouble while reading, so I decided not to read them. Accordingly, R will simply use generic V1, V2,…etc. as the variable names.

bse <- read.csv('File Path/Your File Name',
                skip = 1, header = FALSE, stringsAsFactors = FALSE)

snp <- read.csv('File Path/Your File Name',
                stringsAsFactors = FALSE)

For BSE data, I also had a spurious V6 column, which I am deleting in the code below. I rename the variables to match what’s in the CSV file. As Date is read as character, I convert it to an R date using the dmy() function from the lubridate package. I create bse_ret and bse_ret2 using the standard formula:

\[ StkRet_t = \frac{Price_t - Price_{t-1}}{Price_{t-1}}\]

In the case of bse_ret, I use the closing prices for both \(Price_{t-1}\) and \(Price_t\). In the case of bse_ret2, I use the closing price for \(Price_{t-1}\) but the opening price for \(Price_t\). This is because the latter is going to capture the effect of what happened in the US markets better than the former. I will explain this more using a timeline as we go forward.
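To make the two definitions concrete, here is a small base-R sketch on made-up prices. The `toy` data frame and its numbers are purely illustrative, not Sensex data; only the two formulas mirror bse_ret and bse_ret2.

```r
# Made-up prices for three consecutive trading days (illustrative only)
toy <- data.frame(Open  = c(100, 102, 99),
                  Close = c(101, 100, 103))

# Previous day's close; the first day has no predecessor, hence NA
prev_close <- c(NA, head(toy$Close, -1))

# Close-to-close return (the bse_ret definition), in percent
toy$ret_close <- 100 * (toy$Close - prev_close) / prev_close

# Previous close to today's open (the bse_ret2 definition), in percent
toy$ret_open <- 100 * (toy$Open - prev_close) / prev_close

print(round(toy, 2))
```

On the second day the two measures have opposite signs: the market opened above the previous close but finished below it. That is the kind of divergence the two return definitions are meant to separate.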

bse <- bse %>% 
  dplyr::select(-V6) %>% 
  rename(Date = V1, Open = V2, High = V3, Low = V4, Close = V5) %>% 
  mutate(Date = lubridate::dmy(Date),
         bse_ret = 100 * (Close - lag(Close)) / lag(Close),
         bse_ret2 = 100 * (Open - lag(Close)) / lag(Close)) %>% 
  select(Date, bse_ret, bse_ret2) # Keep only 3 variables

I will do similar operations on the S&P 500 index. However, here I am not going to compute the second type of return because we are going to assume that US markets lead Indian markets. Given the sizes of these two markets, it’s a fair assumption.

snp <- snp %>% 
  mutate(Date = lubridate::ymd(Date),
         snp_ret = 100 * (Close - lag(Close)) / lag(Close)) %>% 
  select(Date, snp_ret) # Keep only these two variables


The US east coast is currently 10 hours 30 minutes behind India (Eastern Standard Time is UTC-5 and Indian Standard Time is UTC+5:30). The trading hours for the New York Stock Exchange (NYSE) on normal days are 9:30 am to 4:00 pm and for the Bombay Stock Exchange (BSE) are 9:00 am to 3:30 pm. Actual trading on BSE starts at 9:15 am, but institutional traders can start placing their orders at 9:00 am. This doesn’t make any difference to our argument, however. The following figure shows the timeline as of last week.

Note that when BSE opens on December 21, 2018, 9:00 am, that’s actually December 20, 2018, 10:30 pm in NY.
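The clock arithmetic can be checked in base R. This is a minimal sketch, assuming the system has the standard time zone database with the names `Asia/Kolkata` and `America/New_York`; the variable names are mine.

```r
# BSE pre-open on 21 December 2018, expressed in Indian Standard Time
bse_open <- as.POSIXct("2018-12-21 09:00", tz = "Asia/Kolkata")

# The same instant on the New York clock
ny_time <- format(bse_open, "%Y-%m-%d %H:%M", tz = "America/New_York")
print(ny_time)

# NYSE close on 20 December 2018, then the gap in hours until BSE opens
nyse_close <- as.POSIXct("2018-12-20 16:00", tz = "America/New_York")
gap_hours <- as.numeric(difftime(bse_open, nyse_close, units = "hours"))
print(gap_hours)
```

The gap between the NYSE close and the BSE open is what gives the previous US session time to show up in Indian opening prices.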

2. Merge Indexes

Merging the two indexes is complex. We could simply lag BSE Sensex by one day and merge it with S&P 500, but that would introduce errors for two reasons:

  1. Due to the time difference, the weekends create trouble. By the time NYSE closes on Friday, it’s already Saturday in India and BSE is closed. Thus, we have to merge Friday’s data from S&P 500 with Monday’s data on Sensex.

  2. The stock markets may have holidays on different days. For example, in the US the 4th of July is a public holiday, but in India the markets are open on that day. Similarly, in India 15th August is a public holiday, but US markets are open on that day. Thus, we need to tackle this mismatch too.

I overcame this problem by first creating a data set of all the days in the sample period. Then I left joined this data set with S&P 500 and Sensex separately to create two new data sets. This results in missing values for the stock returns on dates with no observations in the S&P 500 or Sensex data sets. Finally, I used the na.locf function from the zoo package to fill in the missing returns. na.locf replaces each missing value with the last available value.
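Here is a small base-R sketch of the calendar-plus-fill idea on made-up returns. The `locf_fill` helper is hypothetical: it only mimics what `na.locf(x, na.rm = FALSE)` does, so the example runs without zoo installed.

```r
# Toy data: returns exist only on trading days (values invented for illustration)
trades <- data.frame(Date = as.Date(c("2018-12-14", "2018-12-17")),
                     ret  = c(-0.5, 0.3))

# Every calendar day from Friday 14 Dec to Monday 17 Dec 2018
all_days <- data.frame(Date = seq(as.Date("2018-12-14"),
                                  as.Date("2018-12-17"), by = "day"))

# Left join: the weekend rows get NA returns
filled <- merge(all_days, trades, by = "Date", all.x = TRUE)

# Hypothetical helper mimicking zoo::na.locf(x, na.rm = FALSE)
locf_fill <- function(x) {
  idx <- cumsum(!is.na(x))        # position of the latest non-missing value
  c(NA, x[!is.na(x)])[idx + 1]    # leading NAs stay NA
}

filled$ret_filled <- locf_fill(filled$ret)
print(filled)
```

Saturday and Sunday inherit Friday's return, which is what later lets a Monday row find a value when it looks one calendar day back.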

# Create a data set with all the dates

all_dates <- data.frame(Date = seq(lubridate::dmy('01-01-2008'), lubridate::dmy('20-12-2018'), 'day'))

# Merge with BSE Sensex data

bse2 <-  all_dates %>% 
  left_join(bse, by = 'Date') %>% 
  mutate(bse_ret3 = na.locf(bse_ret, na.rm = FALSE),
         bse_ret4 = na.locf(bse_ret2, na.rm = FALSE),
         Date2 = Date - 1) # Create this date to merge with S&P 500

# Similarly merge all_dates with S&P 500 data

snp2 <- all_dates %>% 
  left_join(snp, by = 'Date') %>% 
  mutate(snp_ret2 = na.locf(snp_ret, na.rm = FALSE))

Finally merge bse2 and snp2 to get our final data set.

final <- bse2 %>% 
  inner_join(snp2, by = c('Date2' = 'Date')) %>% 
  mutate(year = lubridate::year(Date))
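The effect of joining on Date2 can be seen on a toy example. This base-R sketch uses invented returns and a hypothetical `locf_fill` helper in place of na.locf; it only illustrates how Monday's Sensex row ends up paired with Friday's S&P 500 return.

```r
# Invented returns: S&P 500 trades Friday 14 Dec and Monday 17 Dec 2018
snp_toy <- data.frame(Date = as.Date(c("2018-12-14", "2018-12-17")),
                      snp_ret = c(-1.9, -2.1))
# Sensex trades Monday 17 Dec and Tuesday 18 Dec 2018
bse_toy <- data.frame(Date = as.Date(c("2018-12-17", "2018-12-18")),
                      bse_ret = c(0.8, 0.2))

# Hypothetical helper mimicking zoo::na.locf(x, na.rm = FALSE)
locf_fill <- function(x) {
  idx <- cumsum(!is.na(x))
  c(NA, x[!is.na(x)])[idx + 1]
}

# Full calendar, then carry the last S&P 500 return over the weekend
cal <- data.frame(Date = seq(as.Date("2018-12-14"),
                             as.Date("2018-12-18"), by = "day"))
snp_cal <- merge(cal, snp_toy, by = "Date", all.x = TRUE)
snp_cal$snp_ret2 <- locf_fill(snp_cal$snp_ret)

# Look one calendar day back from each Sensex date and join on it
bse_toy$Date2 <- bse_toy$Date - 1
paired <- merge(bse_toy, snp_cal[, c("Date", "snp_ret2")],
                by.x = "Date2", by.y = "Date")
print(paired[, c("Date", "Date2", "bse_ret", "snp_ret2")])
```

Monday's Date2 is Sunday, and Sunday carries Friday's filled return, so the weekend gap is bridged without special-casing any weekday.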

3. Make Graphs

As BSE provides data from January 2008, I downloaded and used the full time series in the above analysis. However, for the next part I will limit the analysis to only 2018.

dt <- final %>%  
  filter(year == 2018) %>%
  filter(!is.na(bse_ret)) %>% # delete the cases where Sensex returns are missing
  select(Date, bse_ret3, bse_ret4, snp_ret2)
cor1 <- round(cor(dt$bse_ret4, dt$snp_ret2), 2)
cor2 <- round(cor(dt$bse_ret3, dt$snp_ret2), 2)
cor1 # correlation using Sensex opening prices
cor2 # correlation using Sensex closing prices

## [1] 0.61
## [1] 0.28
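For readers who want to see what cor() is doing, it computes the Pearson correlation by default. The sketch below recomputes it from the definition on two made-up return series (the vectors `x` and `y` are illustrative only) and compares the result with the built-in function.

```r
# Made-up return series (illustrative only)
x <- c(0.5, -1.2, 0.3, 2.0, -0.7)
y <- c(0.2, -0.9, 0.6, 1.1, -0.4)

# Pearson correlation from its definition:
# sum of co-deviations over the square root of the product of squared deviations
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# Built-in equivalent, as used in the post
r_builtin <- cor(x, y)

print(round(c(manual = r_manual, builtin = r_builtin), 2))
```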

Plot using opening prices

dt %>% 
  select(Date, bse_ret4, snp_ret2) %>% 
  reshape2::melt(id.vars = 'Date') %>% 
  ggplot(aes(Date, value, color = variable)) +
  geom_line() +
  scale_color_manual(values = c('#81A1C1', '#EF2E69'), labels = c('BSE', 'S&P 500')) +
  theme_modern_rc() +
  scale_x_date(breaks = date_breaks("months"), labels = date_format("%m-%y")) +
  theme(legend.position = c(0.9, 1), legend.direction = "horizontal") +
  labs(x = NULL, y = NULL, color = NULL, title = 'Daily Returns in 2018',
       subtitle = bquote('BSE returns from close of' ~ day[t] ~ to ~ underline(open) ~ of ~ day[t+1] ~ '(' ~ r[xy] == bold(.(cor1)) ~ ')'))

Plot using closing prices

dt %>% 
  select(Date, bse_ret3, snp_ret2) %>% 
  reshape2::melt(id.vars = 'Date') %>% 
  ggplot(aes(Date, value, color = variable)) +
  geom_line() +
  scale_color_manual(values = c('#81A1C1', '#EF2E69'), labels = c('BSE', 'S&P 500')) +
  theme_modern_rc() +
  theme(legend.position = c(0.9, 1), legend.direction = "horizontal") +
  scale_x_date(breaks = date_breaks("months"), labels = date_format("%m-%y")) +
  labs(x = NULL, y = NULL, color = NULL, title = 'Daily Returns in 2018',
       subtitle = bquote('BSE returns from close of' ~ day[t] ~ to ~ underline(close) ~ of ~ day[t+1] ~ '(' ~ r[xy] == .(cor2) ~ ')'))

The correlation between S&P 500 and Sensex is a modest 0.28 when we use closing prices to calculate Sensex returns. However, it’s 0.61 when we use opening prices instead! Here are a couple of takeaways:

  1. The Indian stock market is highly correlated with the US market at the open. In other words, US market conditions are an important source of information for Indian markets.

  2. As the day proceeds, the market incorporates new information from other parts of the world as well as from the local economy. This reduces the correlation with the US returns.

