 # Crypto Portfolio Optimization Part 1: Correlation Matrix

In this post I will show how to use correlation matrices for crypto portfolio optimization. After some basics and the Python code, we will discuss the results for a few large-cap altcoins.

Disclaimer: This is not investment advice or an encouragement to buy crypto. Please do your own research before investing. This tutorial is for educational purposes only and is meant to explain statistical concepts using Python. All coins mentioned serve purely as illustrations.

### Introduction to Portfolio Optimization

Usually, a portfolio consists of different assets from various asset classes. In cryptocurrency, a portfolio is made up of different coins. Here, the main classes are Bitcoin, altcoins (e.g. Ethereum) and tokens (e.g. utility tokens like Civic).
Typically, portfolio optimization aims to maximize the return while minimizing the risk, which is done by adjusting how the capital is distributed across the assets. Hence, most of the time you want a healthy, diversified mix of assets. In practice, this means not putting all your eggs in one basket.

### What is Correlation?

In statistics, correlation measures the relationship or association between two datasets. In our case, we measure how similarly the prices of two assets (e.g. coins or stocks) have moved. Correlation coefficients provide a numerical measure of that relationship. A coefficient of +1 represents a perfect positive relationship, and a coefficient of -1 represents a perfect negative relationship. A coefficient of 0 means there is no relationship of the kind the coefficient measures (e.g. no linear relationship), although the data may still be related in another way. Sometimes correlation and regression are confused. While correlation quantifies the degree of association, regression describes how a dependent variable changes with an independent variable (predictor).
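To make the three reference values concrete, here is a minimal sketch that computes the Pearson coefficient directly from its definition (covariance divided by the product of the standard deviations). The helper name `pearson` and the small series are made up for illustration and are not part of the script later in this post.

```python
import numpy as np

def pearson(x, y):
    """Pearson coefficient: covariance over the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Hypothetical toy series, chosen only to hit the three reference values
a = [1.0, 2.0, 3.0, 4.0, 5.0]
up = [2.0, 4.0, 6.0, 8.0, 10.0]      # rises in lockstep with a
down = [10.0, 8.0, 6.0, 4.0, 2.0]    # falls in lockstep with a
vee = [1.0, -1.0, 0.0, -1.0, 1.0]    # symmetric pattern, no linear trend in a

r_up = pearson(a, up)      # perfect positive relationship
r_down = pearson(a, down)  # perfect negative relationship
r_vee = pearson(a, vee)    # no linear relationship

print(r_up, r_down, r_vee)
```

Note that the last series is clearly not random with respect to `a`, yet its coefficient is zero, because the pattern is symmetric rather than linear.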

### Why use Correlation Coefficients?

Here are some possible answers to the why question:

• Identify similar price behaviour of coins/stocks in the past.
• Risk management: Diversify by choosing uncorrelated coins/stocks. Hence, a diversified portfolio contains assets whose pairwise correlation coefficients are low or even negative.
• Allow a useful reallocation of portfolio assets over time.

Additionally, the correlation coefficients can be applied to prices, returns or risks (volatility).
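The risk-management point in the list above can be illustrated with the standard two-asset portfolio variance formula, σ_p² = w₁²σ₁² + w₂²σ₂² + 2·w₁·w₂·ρ·σ₁·σ₂. The sketch below is only an illustration: the function name `portfolio_volatility`, the 50/50 weights and the 5 % daily volatilities are assumptions, not values taken from the coins discussed here.

```python
import math

def portfolio_volatility(w1, sigma1, sigma2, rho):
    """Volatility of a two-asset portfolio with weights w1 and 1 - w1."""
    w2 = 1.0 - w1
    variance = (w1 * sigma1) ** 2 + (w2 * sigma2) ** 2 \
        + 2.0 * w1 * w2 * rho * sigma1 * sigma2
    # Guard against tiny negative values caused by floating-point rounding
    return math.sqrt(max(variance, 0.0))

# Hypothetical inputs: equal weights, both assets with 5 % daily volatility
vols = {rho: portfolio_volatility(0.5, 0.05, 0.05, rho)
        for rho in (1.0, 0.5, 0.0, -0.5)}

for rho, vol in vols.items():
    print(f"rho = {rho:+.1f} -> portfolio volatility = {vol:.4f}")
```

With a correlation of +1 the portfolio is exactly as volatile as each asset on its own, and the volatility shrinks as the correlation falls.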

### Different Correlation Coefficients

The Pearson correlation coefficient measures the linear relationship between two datasets, whereas the Spearman correlation coefficient measures monotonic relationships. This means that Spearman captures, for example, an exponential relationship more faithfully than Pearson does.

There are several other correlation coefficients, e.g. the Kendall rank correlation coefficient. Make sure you are using the right coefficient for your purpose.
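The difference between the two coefficients shows up clearly on an exponential relationship. The following sketch uses a made-up series (not market data) and the same Pandas `corr(method=...)` call that appears in the script later in this post.

```python
import numpy as np
import pandas as pd

# Hypothetical data: y grows exponentially with x (monotonic, but not linear)
x = np.arange(1, 21, dtype=float)
df = pd.DataFrame({"x": x, "y": np.exp(x / 2.0)})

pearson_r = df.corr(method="pearson").loc["x", "y"]
spearman_r = df.corr(method="spearman").loc["x", "y"]

print(f"Pearson:  {pearson_r:.3f}")
print(f"Spearman: {spearman_r:.3f}")
```

Spearman works on ranks, so any strictly increasing relationship yields a coefficient of 1, while Pearson stays clearly below 1 because the points do not lie on a straight line.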

### Python Code

Now, let's have a look at the Python code:

• User input: Enter each coin in the format ["Target coin", "Exchange", "Quote currency, e.g. BTC or €"]
• User input: Set the time frame to consider (in days)
• Prepare the data frames that hold the datasets
• Fetch all data from the Cryptocompare API in a loop
• Calculate returns (Pandas function: pct_change) and correlation coefficients (function: corr(method="spearman") or corr(method="pearson"))
• Prepare the data and plot the results.

# -*- coding: utf-8 -*-
#Author: johannes <info@numex-blog.com>, 21.04.18

import json
import urllib.request

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns; sns.set()

#INPUT
# Each entry: [target coin, exchange, quote currency]
coinEx = [["ETH", "Binance", "BTC"], ["XRP", "Binance", "BTC"], ["EOS", "Binance", "BTC"], ["BCH", "Binance", "BTC"], ["LTC", "Binance", "BTC"]]
timeFrame = 180  # number of past days

#MAIN
df_close = pd.DataFrame({})
df_returns = pd.DataFrame({})
df = pd.DataFrame({})

for i in range(len(coinEx)):
    coin = coinEx[i][0]
    exchange = coinEx[i][1]
    tradeCurrency = coinEx[i][2]
    url = "https://min-api.cryptocompare.com/data/histoday?fsym=" + coin + "&tsym=" + tradeCurrency + "&limit=" + str(timeFrame) + "&e=" + exchange

    # Python 3: urlopen lives in urllib.request; parse the JSON response
    resp = urllib.request.urlopen(url)
    data = json.loads(resp.read())

    # The records in data['Data'] are dicts, so the columns are already named
    df = pd.DataFrame(data['Data'])
    df['time'] = pd.to_datetime(df['time'], unit='s')  # Unix timestamps in seconds
    df = df.set_index(df.time)

    df_close[coin] = df.close.tolist()

#Get coin names
coins = []

for i in range(len(coinEx)):
    coins.append(coinEx[i][0])

df_returns = df_close.pct_change()*100

corr_price = df_close.corr(method="pearson") #alternative methods: "spearman" or "kendall"
corr_return = df_returns.corr(method="pearson")

msk = np.zeros_like(corr_price)
msk[np.triu_indices_from(msk)] = True

#Price progression
df0 = pd.DataFrame(df_close, columns=list(coins))
ax0 = df0.plot(logy=True)
ax0.set_xlabel('Past days', fontsize=18)
ax0.set_ylabel('Price in BTC (log scale)', fontsize=18)
ax0.set_title('Price progression', fontsize=18)

#Daily return
df1 = pd.DataFrame(df_returns, columns=list(coins))
ax1 = df1.plot()
ax1.set_xlabel('Past days', fontsize=18)
ax1.set_ylabel('Daily return in %', fontsize=18)
ax1.set_title('Daily return', fontsize=18)

#Correlation coefficients
sns.set(font_scale=1.4)

fig2 = plt.figure()
ax2 = sns.heatmap(corr_price, mask=msk, annot=True, square=True, vmin=-1, vmax=1, cmap="YlGnBu", cbar=False)
ax2.set_title('Pearson & Price', fontsize=18)
plt.tight_layout()

# Separate figure, otherwise the second heatmap is drawn on top of the first
fig3 = plt.figure()
ax3 = sns.heatmap(corr_return, mask=msk, annot=True, square=True, vmin=-1, vmax=1, cbar=False)
ax3.set_title('Pearson & Return', fontsize=18)

plt.tight_layout()
plt.show()

### Results

First, we consider the following large-cap coins over the past 180 days: ETH, XRP, EOS, BCH and LTC. All prices are quoted in Satoshi (1 Satoshi = 0.00000001 BTC) rather than in fiat, e.g. €. The script plots the price progression on a logarithmic scale.

The daily return in percent is calculated from the closing prices $P_t$ as follows:

$$r_t = \frac{P_t - P_{t-1}}{P_{t-1}} \cdot 100\,\% \qquad (1)$$

In Python this can be done with the Pandas function pct_change. The daily returns fluctuate far more from day to day than the prices do. Next, we look at the correlation matrices for both the prices and the returns.
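As a quick check of what pct_change computes, the sketch below applies it to a tiny made-up price series and compares the result with the manual calculation (today's price divided by yesterday's, minus one, times 100). The prices are hypothetical.

```python
import pandas as pd

# Hypothetical closing prices, not real market data
prices = pd.Series([100.0, 110.0, 99.0, 99.0])

# Daily return in percent, as in the script above
returns_pct = prices.pct_change() * 100

# The same calculation written out by hand
manual_pct = (prices / prices.shift(1) - 1) * 100

print(returns_pct)
```

The first value is NaN because there is no previous day to compare with. Pandas skips such missing values when it computes correlations.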

If you are interested in short-term changes, consider the correlations of the returns. If you are interested in the long-term behaviour, consider the correlations of the prices.

Additionally, keep in mind that prices are usually trending (e.g. upwards). Two series that share a trend tend to show a positive price correlation, even if their day-to-day movements are unrelated.
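The effect of a shared trend can be reproduced with synthetic data. In this sketch, two hypothetical coins "A" and "B" receive independent random daily returns with the same, deliberately strong, positive drift. All numbers are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_days = 500

# Independent daily returns for two hypothetical coins with the same upward
# drift (1 % mean, 1 % standard deviation; exaggerated on purpose)
returns = pd.DataFrame({
    "A": rng.normal(0.01, 0.01, n_days),
    "B": rng.normal(0.01, 0.01, n_days),
})

# Turn the returns into price paths that both start near 1
prices = (1 + returns).cumprod()

corr_returns = returns["A"].corr(returns["B"])  # Pearson on returns
corr_prices = prices["A"].corr(prices["B"])     # Pearson on prices

print(f"Return correlation: {corr_returns:.2f}")
print(f"Price correlation:  {corr_prices:.2f}")
```

Although the returns were generated independently, the price correlation comes out high simply because both paths trend upwards. This is why the price correlation matrix should be read with the trend in mind.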

### Results: Correlation matrices

In the following, we will use correlation matrices. A correlation matrix contains the pairwise correlation coefficients between a set of variables, e.g. the prices or returns of several coins.

First, we use the Spearman correlation coefficients. As we can see, the price and return correlations differ considerably, e.g. for LTC-BCH. Next, let's have a look at the Pearson correlation coefficient, which is one of the most widely used. Remember, if the Pearson correlation is weaker than the Spearman correlation, this can indicate nonlinear effects. Finally, the difference between the Spearman and Pearson coefficients is smaller for the returns than for the prices.
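As a small illustration of the structure of such a matrix, the sketch below builds one from random data for three hypothetical assets "X", "Y" and "Z". It also shows why the heatmaps in the script hide the upper triangle: the matrix is symmetric with ones on the diagonal, so the lower triangle already contains every distinct pair.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical returns for three made-up assets
returns = pd.DataFrame(rng.normal(size=(100, 3)), columns=["X", "Y", "Z"])

corr = returns.corr(method="pearson")

# True on and above the diagonal: these cells carry no extra information
upper = np.triu(np.ones(corr.shape, dtype=bool))

# Keep only the strictly lower triangle, the rest becomes NaN
lower_only = corr.mask(upper)

print(corr.round(2))
print(lower_only.round(2))
```

For n assets the lower triangle holds n·(n-1)/2 coefficients, i.e. 3 for the three assets here and 10 for the five coins used in this post.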

### Correlation over Time

The correlation depends strongly on the chosen time frame. Therefore, let's run the calculation with a time frame of 90 days using the Pearson correlation coefficients. Here, we can directly see the difference, e.g. for the LTC-EOS price correlation. Hence, you need to be clear about which time frame you are interested in. Furthermore, correlations should be tracked regularly.

### Conclusion

• If your datasets are linearly related, you can use the Pearson correlation; otherwise, you can use Spearman.
• You can use the return correlations for short-term changes and the price correlations for the long-term behaviour. Be aware that shared trends make a positive price correlation more likely.
• The chosen time frame has a strong impact on the results since the market changes constantly.
• Including coins with a negative correlation coefficient into your portfolio can reduce the variance and therefore the risk.
• Correlation does not always indicate a causal relationship between data.
• You should consider various correlation coefficients to form an opinion.
• Be careful when you extrapolate from the past into the future.
• For now, the risk is NOT considered.
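One way to track the time-frame dependence mentioned in the list above is a rolling correlation, which recomputes the coefficient over a sliding window. The sketch below is not part of the original script. It uses synthetic returns for two hypothetical coins whose relationship changes halfway through, and the 30-day window is an arbitrary choice.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n_days = 180
window = 30  # arbitrary window length in days

# Synthetic returns: coin B follows a common factor only during the first
# 90 days (a made-up regime change), coin A follows it throughout
common = rng.normal(0, 1, n_days)
a = common + rng.normal(0, 1, n_days)
b = np.where(np.arange(n_days) < 90, common, 0.0) + rng.normal(0, 1, n_days)
returns = pd.DataFrame({"A": a, "B": b})

# Pearson correlation of A and B over each trailing 30-day window
rolling_corr = returns["A"].rolling(window=window).corr(returns["B"])

print(rolling_corr.dropna().describe())
```

The first `window - 1` entries are NaN because the window is not yet full. Plotting `rolling_corr` shows how the estimate drifts as older days drop out of the window.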

In the next article I will introduce Pareto optimization. There, return and risk are considered simultaneously and optimized using sampling techniques.