Scraping websites for historical Bitcoin data using Python

In this post, we scrape historical Bitcoin data from coinmarketcap.com using Python and BeautifulSoup. The data is then processed and put into a pandas DataFrame, which we use to plot a candlestick chart.

Scraping websites for data

Web scraping techniques are used to extract data from websites, which is especially helpful when a website provides no API for retrieving its data. Websites are built on HTML, so we need to extract our data from the HTML code. Here, we use Python and the powerful library BeautifulSoup, one of the most popular libraries for pulling data out of HTML files. You can view the HTML code of a page in any browser, for instance in Firefox: Firefox > Web Developer > Page Source.
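To illustrate the basics, here is a minimal sketch of BeautifulSoup's find and find_all methods on a small, made-up HTML string (no real website is involved). It assumes the bs4 package is installed and uses Python's built-in "html.parser", so no extra parser such as lxml is needed.

```python
from bs4 import BeautifulSoup

# A tiny made-up HTML document standing in for a real page
html = """
<html><body>
  <h1>Bitcoin</h1>
  <ul>
    <li class="price">1,000.00</li>
    <li class="price">1,050.00</li>
  </ul>
</body></html>
"""

# Parse with Python's built-in HTML parser
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag; .text gives its text content
title = soup.find("h1").text

# find_all() returns every match; class_ filters by CSS class.
# Commas are removed before converting the strings to floats.
prices = [float(li.text.replace(",", "")) for li in soup.find_all("li", class_="price")]

print(title, prices)
```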

Scraping coinmarketcap.com

In the following, we are going to scrape the historical Bitcoin data (date, open, high, low, close, volume, market_cap) from coinmarketcap.com.

Python Code

First, let us have a look at the HTML snippet we are going to process.

Next, let's look at the necessary steps:

  • Provide the URL of the target website.
  • Connect to the server and pull the HTML data.
  • Select the section of interest, here ‘table-responsive’, and search it for the table rows (‘tr’).
  • Loop over the rows, extract the .text data of each cell and put it into a temporary list (tmp). Then remove unnecessary commas from the values, transform the date string into a datetime object, and convert the remaining strings into floats, since they are numbers.
  • Make a pandas DataFrame (df) from the list of rows (data).
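The steps above can be sketched as follows. This is a minimal sketch, not the code from the repository, and it rests on several assumptions: that ‘table-responsive’ is the class of a <div> wrapping the table, that each data row has seven <td> cells in the order date, open, high, low, close, volume, market_cap, and that dates look like "Jan 02, 2018". The helper names fetch_html, parse_rows and to_dataframe are hypothetical, and the HTTP request uses the standard library's urllib.request, which may differ from the original code. The usage example parses a made-up HTML fragment with placeholder numbers, so it runs without network access.

```python
from datetime import datetime
from urllib.request import Request, urlopen

import pandas as pd
from bs4 import BeautifulSoup

# Assumed column order of the historical data table
COLUMNS = ["date", "open", "high", "low", "close", "volume", "market_cap"]


def fetch_html(url):
    """Steps 1-2 (hypothetical helper): connect to the server and pull the HTML."""
    # Browser-like User-Agent; some sites reject urllib's default one
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


def parse_rows(html):
    """Steps 3-4 (hypothetical helper): select the section, loop over rows, clean values."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumption: 'table-responsive' is the class of a <div> around the table
    section = soup.find("div", class_="table-responsive")
    if section is None:
        raise ValueError("no 'table-responsive' section found")
    data = []
    for row in section.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) != len(COLUMNS):
            continue  # skip the header row (it uses <th>) and malformed rows
        # Extract the text of each cell and remove unnecessary commas
        tmp = [cell.text.strip().replace(",", "") for cell in cells]
        # Assumed date format: "Jan 02, 2018" becomes "Jan 02 2018" without the comma
        tmp[0] = datetime.strptime(tmp[0], "%b %d %Y")
        # The remaining strings are numbers
        tmp[1:] = [float(value) for value in tmp[1:]]
        data.append(tmp)
    return data


def to_dataframe(data):
    """Step 5 (hypothetical helper): make a pandas DataFrame from the list of rows."""
    return pd.DataFrame(data, columns=COLUMNS)


# Offline usage example with a made-up HTML fragment (placeholder numbers)
sample_html = """
<div class="table-responsive"><table>
  <tr><th>Date</th><th>Open</th><th>High</th><th>Low</th>
      <th>Close</th><th>Volume</th><th>Market Cap</th></tr>
  <tr><td>Jan 02, 2018</td><td>1,000.00</td><td>1,100.00</td><td>950.00</td>
      <td>1,050.00</td><td>2,000,000</td><td>30,000,000</td></tr>
</table></div>
"""
df = to_dataframe(parse_rows(sample_html))
print(df.dtypes)
```

To scrape the live page instead, you would call to_dataframe(parse_rows(fetch_html(url))) with the URL of the historical data page; the real page's markup may differ from the assumed structure, so check it in the page source first.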

You can view and download the entire Python code in this GitHub repository.

Results

The main result of this method is a pandas DataFrame with all the desired data: the date as a datetime object, the prices (open, high, low, close), the volume, and the market capitalization.

Next, we use the data in the DataFrame to plot a candlestick chart. Here is the Python code for the chart:

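import plotly.graph_objects as go
import plotly.offline as pyo

# df is the DataFrame from the scraping step (columns: date, open, high, low, close, ...)
fig = go.Figure(data=[go.Candlestick(
    x=df['date'],
    open=df['open'],
    high=df['high'],
    low=df['low'],
    close=df['close'],
)])
fig.update_layout(
    title='Bitcoin price development',
    yaxis_title='Bitcoin in USD',
)

# Write the chart to a local HTML file and open it in the browser
pyo.plot(fig)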

Finally, here is the resulting Bitcoin candlestick chart.

You can also use the scraped historical data to perform Monte Carlo simulations.
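As a starting point, here is a minimal sketch of one common approach, not taken from the original post: future prices are simulated as a random walk whose daily log returns are drawn from a normal distribution with the mean and standard deviation of the historical log returns, a discrete form of geometric Brownian motion. The function name simulate_paths and its parameters are hypothetical. The usage example runs on a synthetic price series; with the scraped data you would pass the close column sorted by date.

```python
import numpy as np
import pandas as pd


def simulate_paths(close, days=30, n_paths=1000, seed=0):
    """Monte Carlo simulation of future prices (hypothetical helper).

    close: pandas Series of closing prices in chronological order.
    Returns an array of shape (n_paths, days) with simulated prices.
    """
    # Daily log returns of the historical closing prices
    log_returns = np.log(close / close.shift(1)).dropna()
    mu, sigma = log_returns.mean(), log_returns.std()

    rng = np.random.default_rng(seed)
    # Draw normally distributed daily log returns for every path and day
    shocks = rng.normal(mu, sigma, size=(n_paths, days))
    # Summing log returns and exponentiating compounds them from the last close
    return close.iloc[-1] * np.exp(np.cumsum(shocks, axis=1))


# Usage with a synthetic price series (not real Bitcoin data);
# with the scraped data, use df.sort_values("date")["close"] instead.
rng = np.random.default_rng(42)
close = pd.Series(1000 * np.exp(np.cumsum(rng.normal(0, 0.02, 250))))
paths = simulate_paths(close, days=30, n_paths=500)
print(paths.shape)  # (500, 30)

# 5th, 50th and 95th percentile of the simulated price after 30 days
print(np.percentile(paths[:, -1], [5, 50, 95]))
```

Drawing normal log returns keeps the sketch short; resampling the historical returns directly (bootstrapping) is a common alternative when the normal assumption fits the heavy-tailed Bitcoin returns poorly.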

Conclusion

We can use the Python library BeautifulSoup to parse the HTML code of websites and pull data from it, a technique known as web scraping. Furthermore, we can transform this data into the desired data types (datetime and floats). In this post, for instance, we used the data to plot a candlestick chart.

What do you think?

I’d like to hear what you think about this post.

Let me know by leaving a comment below and don’t forget to subscribe to this blog!