Correlations between Bitcoin price and Google trend data
Lately, I read about people using Google Trends to time the market for cryptocurrencies or stocks. That made me curious, so I experimented with the data. In this blog post, I explain how to retrieve Google Trends data and compare it to price and volume. I use a cryptocurrency as the example, but the same approach works for stock market data.
Additional disclaimer: This is no investment advice or encouragement to buy crypto. Please, do your own research before investing. This tutorial is only for educational purposes.
What is Google trends?
Google Trends is a service that shows how often search keywords are entered into Google. The results are not raw counts: each data point is divided by the total search volume for the chosen region and time range, and the result is scaled to a range of 0 to 100. Thus, we can analyze trends in the relative popularity of search keywords.
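To make the normalization more concrete, here is a minimal sketch. Google's actual pipeline is not public, so `scale_to_100` is a hypothetical helper and all counts are made-up numbers. It only illustrates the idea of a relative share scaled to 0 to 100.

```python
def scale_to_100(keyword_counts, total_counts):
    """Turn raw counts into a 0-100 index of relative popularity."""
    # Share of all searches that went to the keyword in each period
    shares = [k / t for k, t in zip(keyword_counts, total_counts)]
    peak = max(shares)
    # The period with the highest share becomes 100, the rest scale with it
    return [round(100 * s / peak) for s in shares]

# Made-up example: keyword searches and total searches for four days
keyword_counts = [50, 100, 300, 150]
total_counts = [1000, 1000, 2000, 1000]
index = scale_to_100(keyword_counts, total_counts)
print(index)  # [33, 67, 100, 100]
```

Note that the third day has twice as many raw searches as the fourth, yet both get an index of 100, because the keyword's share of all searches is the same on both days.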
Using Pytrends to access Google trend data
You can access Google Trends data through pytrends, an unofficial API (Application Programming Interface) for Python. In my tests, the data arrived with a delay of about 3 days. I guess that is an advantage Google keeps for itself.
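A minimal sketch of such a request could look like this. The helper names `build_timeframe` and `fetch_trend` are hypothetical, and the 98 days are an assumption chosen so that the result matches the time window analyzed in this post. The pytrends calls themselves are the same ones used in the script further down.

```python
from datetime import date, timedelta

def build_timeframe(end_day, days):
    """Return a 'YYYY-MM-DD YYYY-MM-DD' string covering `days` days up to end_day."""
    start_day = end_day - timedelta(days=days)
    return f"{start_day.isoformat()} {end_day.isoformat()}"

def fetch_trend(keyword, timeframe):
    """Fetch interest over time for one keyword (needs network access)."""
    # Imported here so build_timeframe also works without pytrends installed
    from pytrends.request import TrendReq
    pytrend = TrendReq()
    pytrend.build_payload([keyword], cat=0, timeframe=timeframe)
    return pytrend.interest_over_time()

timeframe = build_timeframe(date(2017, 12, 6), 98)
print(timeframe)  # 2017-08-30 2017-12-06
# dfTrend = fetch_trend("bitcoin", timeframe)  # uncomment to query Google Trends
```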
Receiving Bitcoin data
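I chose the cryptocurrency Bitcoin for the comparison and used the CryptoCompare API to retrieve live data. The closing prices and the traded volume are especially relevant. Note that the script below uses the volumeto field, which CryptoCompare reports in the quote currency (here EUR), while the number of traded Bitcoin is in volumefrom. The chosen exchange is Kraken. You can find more on how to use the CryptoCompare API here.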
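The request itself is just a URL with a few query parameters. As a small sketch, the URL used in the script below can be assembled like this. `build_histoday_url` is a hypothetical helper name, while the endpoint and the parameters (fsym, tsym, limit, e) are taken from that URL.

```python
from urllib.parse import urlencode

BASE_URL = "https://min-api.cryptocompare.com/data/histoday"

def build_histoday_url(fsym, tsym, limit, exchange):
    """Build the request URL for daily price and volume data."""
    # fsym: currency to convert from, tsym: currency to convert to,
    # limit: number of days, e: exchange
    params = {"fsym": fsym, "tsym": tsym, "limit": limit, "e": exchange}
    return BASE_URL + "?" + urlencode(params)

url = build_histoday_url("BTC", "EUR", 100, "Kraken")
print(url)
```

Building the URL from a dictionary makes it easier to switch the currency pair, the exchange or the number of days without editing a long string by hand.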
Writing the Python script
First of all, here is the code structure:
- Import dependencies
- Set the time window e.g. 100 days and the moving average window e.g. 14 days.
- Choose the keyword(s) for Google trends e.g. bitcoin.
- Adjust the URL as needed, e.g. time window, currencies to convert from and to, exchange
- Afterwards, a live json file is read from CryptoCompare
- The data is stored in a dataframe (df) using the Pandas data analysis library
- Because moving averages are used, a starting point sp is defined. For example, if 20 data points are used for an average, the first complete average exists at the 20th data point, so the chart is plotted from there on.
- Afterwards, the interest over time function of Pytrends is used. The data is put into a pandas data frame (dfTrend).
- Calculate moving averages
- Calculate correlation coefficient with the corrcoef function of the numpy package
- Finally, plot data regarding price, volume and trend
And here is the script:
#License: MIT License (http://opensource.org/licenses/MIT)
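import json
import urllib.request
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pytrends.request import TrendReq
timeWindow = 100  # number of days to analyze
timeWindowMovingAvg = 5  # window of the simple moving average in days
keywords = ["bitcoin"]
#===Daily BTC/EUR data from Kraken via CryptoCompare
url = "https://min-api.cryptocompare.com/data/histoday?fsym=BTC&tsym=EUR&limit=100&e=Kraken"
response = urllib.request.urlopen(url)  # Python 3; urllib.urlopen only exists in Python 2
data = json.loads(response.read())
df = pd.DataFrame(data['Data'])
#===Select the columns by name instead of renaming them by position
df = df[['close', 'high', 'low', 'open', 'time', 'volumefrom', 'volumeto']]
df['time'] = pd.to_datetime(df['time'], unit='s')  # Unix seconds to timestamps
df = df.set_index(df['time'])  # keeps the time column, which is used below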
beginDateWindow = datetime.now().date() - timedelta(days=timeWindow)
#===Starting point to have the exact amount of data
sp = len(df.time[timeWindowMovingAvg-1:])
#===Pass data to pytrend and execute it
pytrend = TrendReq()
dataWindow =str(beginDateWindow) + " " + str(datetime.now().date())
pytrend.build_payload(keywords, cat=0, timeframe=dataWindow)
dfTrend = pytrend.interest_over_time() # using interest over time function
dfTrend.columns = ['keyword', 't']
maTrendPrice = df.close.rolling(center=False, window=timeWindowMovingAvg).mean()
maTrendVolume = df.volumeto.rolling(center=False, window=timeWindowMovingAvg).mean()
maTrendGoogle = dfTrend.keyword.rolling(center=False, window=timeWindowMovingAvg).mean()
#ccPriceTrend = np.corrcoef(dfTrend.keyword, df.close[:-3])[1,0]
#ccVolumeTrend = np.corrcoef(dfTrend.keyword, df.volumeto[:-3])[1,0]
#===Plot data price and trend
fig, ax1 = plt.subplots()
ax1.plot(df.index[-sp:], df.close[-sp:],'r', label = 'Price', linewidth=1.5)
ax1.set_ylabel('Price in Euro', color='r')
ax1.plot(maTrendPrice[-sp:],'r', linestyle= '--', label = 'SMA-price'+str(timeWindowMovingAvg), linewidth=1.5)
ax2 = ax1.twinx()
ax2.plot(dfTrend.keyword[-sp:],'b', label = 'Google trend', linewidth=1.5)  # plot only the keyword column, not the second column
ax2.set_ylabel('Google trend', color='b')
ax2.plot(maTrendGoogle[-sp:],'b', linestyle= '--', label = 'SMA-trend'+str(timeWindowMovingAvg), linewidth=1.5)
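for label in ax1.xaxis.get_ticklabels():
    label.set_rotation(45)  # assumption: the loop body is missing in the source, rotating the date labels is a guess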
#===Plot volume and trend
fig2, ax3 = plt.subplots()
ax3.plot(df.index[-sp:], df.volumeto[-sp:],'g', label = 'Volume', linewidth=1.5)
ax3.plot(maTrendVolume[-sp:],'g', linestyle= '--', label = 'SMA-volume'+str(timeWindowMovingAvg), linewidth=1.5)
ax4 = ax3.twinx()
ax4.plot(dfTrend.keyword[-sp:],'b', label = 'Google trend', linewidth=1.5)  # plot only the keyword column, not the second column
ax4.set_ylabel('Google trend', color='b')
ax4.plot(maTrendGoogle[-sp:],'b', linestyle= '--', label = 'SMA-trend'+str(timeWindowMovingAvg), linewidth=1.5)
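for label in ax3.xaxis.get_ticklabels():
    label.set_rotation(45)  # assumption: the loop body is missing in the source, rotating the date labels is a guess
plt.show()  # display the figures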
Furthermore, you can find an updated and working version on the Github repository (01.11.2018).
A correlation coefficient is a statistical measure of the strength of a relationship. It can measure the linear correlation between vectors (e.g. price, volume) or function curves. A coefficient of 1 means a perfect positive correlation, a coefficient of -1 a perfect negative correlation, and a coefficient of 0 no linear correlation. Here I used the Pearson correlation coefficient.
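As a sketch of how the coefficient can be computed with the corrcoef function of numpy: the trend data arrives a few days late, so the two series do not have the same length and need to be aligned by date first. The helper `aligned_pearson` is a hypothetical name and the data is made up. This is one possible way to align the series, not necessarily how the numbers in this post were produced.

```python
import numpy as np
import pandas as pd

def aligned_pearson(a, b):
    """Pearson correlation of two date-indexed Series on their shared dates."""
    # An inner join keeps only the dates that are present in both series
    joined = pd.concat([a, b], axis=1, join="inner").dropna()
    return np.corrcoef(joined.iloc[:, 0], joined.iloc[:, 1])[0, 1]

# Made-up data: the trend series ends 3 days earlier than the price series
days = pd.date_range("2017-08-30", periods=10, freq="D")
price = pd.Series(np.arange(10, dtype=float) * 100 + 4000, index=days, name="close")
trend = pd.Series(np.arange(7, dtype=float) * 2 + 10, index=days[:7], name="trend")

cc = aligned_pearson(price, trend)
print(round(cc, 3))
```

Both made-up series rise in a straight line, so the coefficient comes out at 1. Flipping the sign of one series would give -1.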
For further details on that topic you can have a look at my Crypto Portfolio Optimization Part 1: Correlation Matrix article.
It is important to understand that this analysis covers a time frame of around 100 days. If you change the time frame, the results may differ considerably.
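A small made-up example shows how much the time frame can matter. Here the price rises together with the trend for 20 days and then falls while the trend keeps rising. None of these numbers come from real market data.

```python
import numpy as np
import pandas as pd

days = pd.date_range("2017-08-30", periods=40, freq="D")
# Made-up trend: rises steadily over 40 days
trend = pd.Series(np.arange(40, dtype=float), index=days)
# Made-up price: rises for 20 days, then falls for 20 days
price = pd.Series(
    np.concatenate([np.arange(20.0), np.arange(20.0, 0.0, -1.0)]), index=days
)

# Pearson correlation over the full window and over each half
full = price.corr(trend)
first_half = price.iloc[:20].corr(trend.iloc[:20])
second_half = price.iloc[20:].corr(trend.iloc[20:])
print(round(full, 3), round(first_half, 3), round(second_half, 3))
```

The first half is perfectly positively correlated, the second half perfectly negatively, and over the full 40 days the coefficient is close to zero. The same data supports three very different conclusions depending on the window.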
First of all, we compare the Bitcoin closing price with the Google Trends data. The correlation coefficient is 0.828 (time window: [17-08-30 17-12-06]). This means there is a strong positive linear correlation.
In addition, the volume of traded Bitcoin is compared with the Google Trends data. The correlation coefficient is 0.778 (time window: [17-08-30 17-12-06]). This means there is a moderate to strong positive linear correlation.
Let us smooth out the data using simple moving averages (SMA) over 5 days. This is applied to the closing price, the volume data and the Google Trends data.
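As a minimal sketch of what the moving average does, here is the rolling function of pandas on a short made-up series. The window of 3 is only chosen to keep the example small.

```python
import pandas as pd

window = 3  # short window so the effect is easy to see
values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
sma = values.rolling(window=window).mean()

# The first window-1 entries are NaN because the window is not full yet
first_valid = window - 1
print(sma.tolist())
print(len(sma[first_valid:]))  # number of points with a complete average
```

The first two averages are missing because fewer than 3 values are available at that point. This is the reason the script defines a starting point sp and only plots the points that have a complete average.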
First, let’s look at the first two figures. At first glance, I would have said that the correlation between volume and trend is stronger. Quite a lot of the large peaks in volume are covered by the trend. The only exception is the negative peak around the 20th of October. Now let’s have a look at the correlation coefficients. The price-trend correlation coefficient is 0.828, while the volume-trend correlation coefficient is 0.778. This contradicts the first impression that the correlation between volume and trend is stronger.
Therefore, a 5-day SMA is applied in the 3rd and 4th figure, which smooths the data a bit. We can see that the strong peaks in the volume are not matched by equally strong peaks in the trend data.
As a result, in this example the closing price correlated slightly more strongly with the Google Trends data than the volume did. Be careful not to assume this holds in general. A different time window, a different currency or even a different Google Trends keyword can alter the results significantly.
In my opinion, Google Trends data can help you grasp the sentiment in the markets, which technical analysis sometimes lacks. Pytrends lets you retrieve the Google Trends data, simple moving averages (SMA) smooth it out, and correlation coefficients let you check your interpretations. Hopefully, you can use this to improve your purchase decisions.