Hands-On: Using Deep Learning Models to Predict Cryptocurrency Prices - Alibaba Cloud Developer Community

If you had to hand out awards for the three biggest fads of 2017, the winners would surely be fidget spinners, artificial intelligence, and cryptocurrency. Cryptocurrency is a disruptive technology; the principles behind it are fascinating, and I am very optimistic about its future development.

In fact, I don't hold any cryptocurrency myself, but when it comes to predicting cryptocurrency prices with deep learning, machine learning, and artificial intelligence, I consider myself an old hand.

At first I thought combining deep learning with cryptocurrency was a novel, original idea, but while preparing this article I found a similar piece. That article only covers Bitcoin; here I will also discuss Ethereum (which goes by a few aliases: ether, ETH, or lambo-money).

Links to similar articles:


We will use a Long Short-Term Memory (LSTM) model, a deep learning model particularly well suited to analyzing time series data (or any sequence with temporal, spatial, or structural order, for example movies and sentences).

If you really want to understand the underlying theory, I recommend three articles: Understanding LSTM Networks, Exploring LSTMs, and the original paper. Selfishly, I mainly want to attract more non-specialist machine learning enthusiasts, so I will keep the code as short as possible. If you want to use the data or build your own model, this article also provides a Jupyter (Python) notebook for reference. Let's get started!

Understanding LSTM Networks


Exploring LSTMs


Original paper


Jupyter (Python) notebook: https://raw.githubusercontent.com/dashee87/blogScripts/master/Jupyter/2017-11-20-predicting-cryptocurrency-prices-with-deep-learning.ipynb

Data

Before creating a model, we need to obtain the corresponding data. Kaggle has Bitcoin price data down to the minute over the past few years (plus other related features; see another blog post). At that granularity, however, the noise may drown out the real signal, so we will use daily data instead.

Another blog:


However, this leaves us with the problem of limited data (hundreds of rows rather than thousands or millions). In deep learning, no model can overcome having too little data. I also don't want to rely on static files, since that would complicate injecting new data to update the model later. Instead, let's try pulling data from websites and APIs.

Since we will combine multiple cryptocurrencies in one model, pulling the data from a single source is probably a good idea. We will use the website coinmarketcap.com.

So far we have only considered Bitcoin and Ethereum, but it would not be hard to fetch data on the recently hyped altcoins through the same channel. Before importing the data, we must load some Python packages that will make the analysis much easier.

import pandas as pd
import time
import seaborn as sns
import matplotlib.pyplot as plt
import datetime
import numpy as np

# get market info for bitcoin from the start of 2013 to the current day
bitcoin_market_info = pd.read_html("https://coinmarketcap.com/currencies/bitcoin/historical-data/?start=20130428&end="+time.strftime("%Y%m%d"))[0]
# convert the date string to the correct date format
bitcoin_market_info = bitcoin_market_info.assign(Date=pd.to_datetime(bitcoin_market_info['Date']))
# when Volume is equal to '-' convert it to 0
bitcoin_market_info.loc[bitcoin_market_info['Volume'] == "-", 'Volume'] = 0
# convert to int
bitcoin_market_info['Volume'] = bitcoin_market_info['Volume'].astype('int64')
# look at the first few rows
bitcoin_market_info.head()
Let me explain what just happened. We loaded some Python packages and imported the table shown on the website linked below. After a little data cleaning, we arrived at the table above. By simply replacing "bitcoin" with "ethereum" in the URL (omitted in the code here), the same code fetches the Ethereum data.

Website link:


To sanity-check the data, we can plot the price and trading volume of both currencies over time.

Figure note: top half - closing price; bottom half - trading volume

Training, Testing, and Random Walks

Now that we have the data, we can start building models. In deep learning, data is generally split into a training set and a test set: the model is built on the training set and then evaluated on held-out test data.

In a time series model, we typically train on one period of data and test on another. I somewhat arbitrarily set the cutoff at June 1, 2017 (that is, the model is trained on data before that date and evaluated on data after it).

Figure note: purple line - training set; blue line - test set

top half - Bitcoin price ($); bottom half - Ethereum price ($)

You can see that most of the training data comes from periods of low prices. The training distribution may therefore not represent the test distribution well, which weakens the model's ability to generalize to out-of-sample data (see the website below on transforming the data into a stationary time series).

Website link:


But why let inconvenient reality get in the way of our analysis? Before we take off in our deep artificial intelligence machine, it is worth discussing a simpler model first. The simplest is to assume tomorrow's price equals today's price, which we will bluntly call the lag model. In mathematical terms:

Figure: the simple lag model, PredPrice_t = ActualPrice_(t-1)

top half - Bitcoin price ($); bottom half - Ethereum price ($)
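To make the lag model concrete, here is a minimal pandas sketch of computing and scoring it; the prices are made up for illustration, not taken from the article's data:

```python
import pandas as pd

# Hypothetical closing prices, standing in for the real bitcoin/ethereum series
prices = pd.Series([100.0, 102.0, 101.0, 105.0, 107.0])

# Lag model: tomorrow's predicted price is simply today's actual price
predictions = prices.shift(1)

# Mean absolute error over the days we can score (the first day has no prediction)
mae = (prices - predictions).abs().dropna().mean()
print(mae)  # 2.25 for this toy series
```

`shift(1)` is all the "model" amounts to, which is exactly why it makes a useful baseline.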

Extending this simple model slightly: stock prices are commonly assumed to follow a random walk, expressed mathematically as PredPrice_t = ActualPrice_(t-1) × (1 + μ + σ·ε_t), where ε_t is drawn from a standard normal distribution. We estimate μ and σ from the training dataset and then apply the random walk model to the Bitcoin and Ethereum test sets.
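A minimal numpy sketch of this single-point random walk; the prices here are invented stand-ins for the real training/test split, and μ and σ are estimated from the toy "training" series:

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, purely for reproducibility

# Hypothetical daily closing prices standing in for the training/test split
train = np.array([100.0, 101.0, 99.0, 103.0, 104.0])
test = np.array([105.0, 107.0, 106.0, 110.0])

# estimate the mean and standard deviation of daily returns on the training set
returns = train[1:] / train[:-1] - 1
mu, sigma = returns.mean(), returns.std()

# single-point walk: each prediction starts from the previous *actual* price,
# so errors never accumulate from one day to the next
prev_actual = np.concatenate(([train[-1]], test[:-1]))
preds = prev_actual * (1 + mu + sigma * rng.standard_normal(len(test)))
print(preds.round(2))
```

Because each day restarts from the true previous price, the prediction line can never drift far from the actual line, which is what makes it look deceptively good.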

Figure: single-point random walk model (test data)

top half - Bitcoin price ($); bottom half - Ethereum price ($)

Ha, look at those prediction lines. Apart from a few kinks, they broadly track the actual closing price of each currency; the model even "predicted" Ethereum's mid-June and late-August rises (and subsequent falls).

The accuracy of models that predict only a single point ahead is misleading, because the error does not carry forward into subsequent predictions. However large the previous error, it is reset at each time point because the input is the true value.

The Bitcoin random walk is especially deceptive because the y-axis spans a huge range, which makes the prediction curve look very smooth.

Unfortunately, single-point predictions are common in time series evaluations (e.g., Article 1 and Article 2). It is better to judge accuracy with multi-point predictions, where earlier errors are not reset but compound into later predictions, punishing models with poor predictive ability much harder. Mathematically: PredPrice_t = PredPrice_(t-1) × (1 + μ + σ·ε_t).

Article 1 link:

Article 2 link:


We fit a random walk over the entire test interval and predicted the closing prices.
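A sketch of this full-interval version, where each step feeds on the previous prediction rather than the actual price, so errors compound; all numbers here (seed, μ, σ, starting price) are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)  # results vary wildly with this seed
mu, sigma = 0.004, 0.025        # assumed daily return mean and volatility

price = 300.0                   # hypothetical last price in the training set
walk = []
for _ in range(30):             # simulate a 30-day test interval
    # each step builds on the previous *predicted* price, not the actual one,
    # so any error is carried into every subsequent prediction
    price = price * (1 + mu + sigma * rng.standard_normal())
    walk.append(price)
print(len(walk), round(walk[-1], 2))
```

Rerunning this with different seeds produces wildly different paths, which is exactly the seed sensitivity discussed below.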

Figure: full-interval random walk model

top half - Bitcoin price ($); bottom half - Ethereum price ($)

The model's predictions are extremely sensitive to the choice of random seed. I deliberately chose a full-interval random walk whose Ethereum predictions look good. In the accompanying Jupyter notebook, you can interactively try the seed values shown below and see just how badly the random walk model can perform.

Figure: single-point vs. full-interval random walk comparison

ordinate - Ethereum price ($)

Notice that a single-point random walk always looks accurate, even though there is nothing meaningful behind it. I hope you will view any article claiming to predict prices accurately with suspicion. But perhaps I needn't worry: cryptocurrency fans don't seem easily swayed by grandiose advertising slogans.

Long Short-Term Memory (LSTM)

As I said before, if you are interested in how LSTMs work, read: Understanding LSTM Networks, Exploring LSTMs, and the original paper (links above).

Fortunately, we don't need to build the network from scratch (or even fully understand it). We can rely on packages that implement standard deep learning algorithms (such as TensorFlow, Keras, PyTorch, etc.). I will use Keras because I find it the most intuitive for non-specialists. If you are not familiar with Keras, have a look at my previous tutorial.

Previous tutorial link:
I created a new data table, model_data, dropped some columns (opening price, daily high, daily low), and added new ones: close_off_high measures the gap between the closing price and the daily high, where values of -1 and 1 mean the close equals the daily low or daily high, respectively.

The volatility column is the difference between the daily high and low divided by the opening price. You may also notice that model_data is sorted from oldest to newest. The model's input does not actually include Date, so we no longer need that column.

Our LSTM model will use previous data (both Bitcoin and Ethereum) to predict a given currency's closing price the next day. We need to decide how many days of data the model will see.

Again, I arbitrarily chose the previous 10 days, because 10 is a nice round number. We build many small tables of 10 consecutive days of data (called windows). The first window consists of rows 0-9 of the training set (Python counts from 0), the next of rows 1-10, and so on.

A smaller window size means more windows to feed the model. The downside is that the model may not have enough information to capture complex long-term behavior (if such behavior is even predictable).

Deep learning models dislike inputs with wildly different scales. Look at these columns: some values sit between -1 and 1, while others are in the millions. We need to normalize the data so the input ranges are consistent.

Values between -1 and 1 are generally ideal. The close_off_high and volatility columns already qualify; for the other columns, we normalize the input values against the values in the first row of each window.

The table shows part of the LSTM model's input (there are actually hundreds of similar tables). We normalized some columns so that their values at the window's first time point are 0, so the model predicts the price change relative to that point.
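The windowing and normalization steps above can be sketched like this; the table, column name, and prices are toy stand-ins, and the window length is 3 instead of 10 to keep the example short:

```python
import pandas as pd

window_len = 3  # the article uses 10; 3 keeps this toy example short

# hypothetical closing prices in a model_data-style table
model_data = pd.DataFrame({'eth_Close': [10.0, 12.0, 11.0, 13.0, 14.0, 15.0]})

windows = []
for i in range(len(model_data) - window_len):
    window = model_data[i:i + window_len].copy()
    # normalise each column against its value at the window's first time point,
    # so every window starts at 0 and encodes relative price changes
    window = window / window.iloc[0] - 1
    windows.append(window)

print(len(windows))                     # 3 overlapping windows
print(windows[0]['eth_Close'].iloc[0])  # first point of each window is 0.0
```

Note how consecutive windows overlap heavily; that is what lets a few hundred rows of daily data yield a few hundred training examples.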

Now let's build the LSTM model. Building models in Keras really is very simple: you just stack a few modules together.

For a better explanation, click here:


The code is as follows:

# import the relevant Keras modules
from keras.models import Sequential
from keras.layers import Activation, Dense
from keras.layers import LSTM
from keras.layers import Dropout

def build_model(inputs, output_size, neurons, activ_func="linear",
                dropout=0.25, loss="mae", optimizer="adam"):
    model = Sequential()
    model.add(LSTM(neurons, input_shape=(inputs.shape[1], inputs.shape[2])))
    model.add(Dropout(dropout))
    model.add(Dense(units=output_size))
    model.add(Activation(activ_func))
    model.compile(loss=loss, optimizer=optimizer)
    return model

As expected, the build_model function creates an empty model named model (the model = Sequential() line), then adds an LSTM layer whose size matches the input (an n × m table, where n and m are the time points/rows and columns, respectively).

The function also exposes more general neural network features, such as dropout and the activation function. Now we just need to choose the number of neurons in the LSTM layer (I picked 20 to keep the run time reasonable) and the training data to build the model.

The code is as follows:

# random seed for reproducibility
np.random.seed(202)
# initialise model architecture
eth_model = build_model(LSTM_training_inputs, output_size=1, neurons=20)
# model output is next price normalised to 10th previous closing price
LSTM_training_outputs = (training_set['eth_Close'][window_len:].values/training_set['eth_Close'][:-window_len].values)-1
# train model on data
# note: eth_history contains information on the training error per epoch
eth_history = eth_model.fit(LSTM_training_inputs, LSTM_training_outputs,
                            epochs=50, batch_size=1, verbose=2, shuffle=True)
#eth_preds = np.loadtxt('eth_preds.txt')


Epoch 50/50

6s - loss: 0.0625

We have now built an LSTM model to predict tomorrow's Ethereum closing price. Let's see how it performs, starting with the training set (data before June 2017). The number below the code is the model's mean absolute error (MAE) on the training set after 50 training iterations (epochs). Instead of relative changes, we can view the model's outputs as daily closing prices.
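Since the model outputs the next-day price normalized to the closing price at the start of its window, converting back to dollar prices just inverts that transform. A toy sketch with made-up outputs and window-start prices:

```python
import numpy as np

# made-up LSTM outputs: next-day price relative to each window's first close
outputs = np.array([0.02, -0.01, 0.05])

# hypothetical closing prices at the start of each corresponding window
window_start_prices = np.array([200.0, 210.0, 220.0])

# invert the normalisation: predicted price = (output + 1) * window-start price
predicted_prices = (outputs + 1) * window_start_prices
print(predicted_prices.round(1))
```

This is the same transform used to build LSTM_training_outputs above, applied in reverse.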

Training set: single-point prediction

blue line - actual price; green line - predicted price

ordinate: Ethereum price ($)

mean absolute error: 0.0583

As expected, the accuracy looks very high. During training, the model learns its sources of error and adjusts accordingly.

In fact, driving the training error to almost zero is not hard: just use hundreds of neurons and train for thousands of epochs (that is overfitting, effectively predicting noise). I added a Dropout() layer in build_model to reduce the risk of overfitting for our relatively small model.

We should care more about the model's performance on the test set, since that shows how it handles new data.

Test set: single-point prediction

blue line - actual price; green line - predicted price

ordinate: Ethereum price ($)

mean absolute error: 0.0531

Despite the misleading limits of single-point prediction, the LSTM seems to perform well on the test set. Its most obvious flaw is that it cannot detect the inevitable decline that follows Ethereum's sharp rises (e.g., mid-June and October).

In fact, this problem is ever-present, just more visible at those sharp turning points: the predicted curve is essentially the actual curve shifted one day into the future (e.g., the mid-July drop). The model also seems to systematically overestimate Ethereum's future value (don't we all ~), with the prediction curve sitting above the actual one.

I suspect this is because Ethereum's price rose astronomically over the training period, so the model infers that the trend will continue (don't we all ~). We also built a similar LSTM model for Bitcoin; its test-set predictions are shown below.

The Jupyter notebook with the complete code:


Test set: single-point prediction

blue line - actual price; green line - predicted price

ordinate: Bitcoin price ($)

mean absolute error: 0.0392

As I said before, single-point predictions can be deceptive. Looking closely, you will notice that the predicted values often simply mirror the previous values (e.g., October). The LSTM is partly just reproducing an order-p autoregressive (AR) model, in which the future value is merely a weighted sum of the previous p values. The AR model can be written as: P_t = c + a_1·P_(t-1) + … + a_p·P_(t-p) + ε_t.

The good news is that AR models are widely used for time series, so the LSTM seems to be doing something reasonable. The bad news is that this wastes the LSTM's capabilities: we could build a much simpler AR model in far less time and probably get similar results.
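To show just how simple, here is a minimal least-squares AR fit in plain numpy; the price series is invented and the order p=2 is chosen arbitrarily for the sketch:

```python
import numpy as np

p = 2  # autoregression order, chosen arbitrarily for this sketch

# hypothetical daily closing prices
series = np.array([10.0, 10.5, 10.2, 10.8, 11.0, 11.4, 11.1, 11.7, 12.0, 12.3])

n = len(series)
# design matrix: each row holds the p previous values plus an intercept column
X = np.column_stack([series[i:n - p + i] for i in range(p)] + [np.ones(n - p)])
y = series[p:]

# ordinary least squares gives the AR weights a_1..a_p and the constant c
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# one-step-ahead forecast from the last p observations
forecast = float(np.dot(np.append(series[-p:], 1.0), coef))
print(round(forecast, 2))
```

A handful of lines and no training epochs; that is the bar the LSTM has to clear.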

Test set: prediction five time points ahead

blue continuous line: actual price

other colored lines: predicted price

top half: Bitcoin price ($); bottom half: Ethereum price ($)

These predictions are obviously less attractive than the single-point ones. Still, I am pleased that the model outputs some nuance (e.g., the second line for Ethereum); it does not simply predict that prices will all move in one direction, which is encouraging.

Looking back at the single-point predictions, the deep neural model does well, but so does the random walk model. Like the random walk, the LSTM is also sensitive to the choice of random seed (the model's weights are randomly initialized).

So to compare the two models fairly, we need to run each many times (25 here) to estimate the model error, defined as the absolute difference between the actual and predicted closing prices on the test set.

Left: Bitcoin test set (25 runs); right: Ethereum test set (25 runs)

ordinate: mean absolute error; abscissa: LSTM model, random walk model

Perhaps AI is worth the hype after all. The figure above shows the test-set errors across 25 different initializations of each model. The LSTM's mean price error is about 0.04 for Bitcoin and 0.05 for Ethereum, beating the corresponding random walk models.

Beating a random walk is a very low bar, though. Comparing the LSTM with more appropriate time series models would be more interesting (e.g., weighted averages, AR, ARIMA, or Facebook's Prophet algorithm). On the other hand, I'm sure the LSTM itself could be improved without much difficulty (add more layers and/or neurons, change the batch size, the learning rate, etc.).

That said, I hope you have picked up on my skepticism about applying deep learning to predict cryptocurrency price movements. That is because we have neglected the best framework of all: human intelligence. Clearly, the perfect model* for predicting cryptocurrencies is:

(Translator's note: if the price of a cryptocurrency goes to the moon after times have changed, then all cryptocurrencies not on the OmiseGo blockchain will continue to appreciate.)

* This article does not constitute financial advice and should not be taken as such. While cryptocurrency investments will definitely go up in value over the long run, they may also go down.


We collected some cryptocurrency data and fed it into a super-cool deep intelligent machine learning LSTM model. Unfortunately, its predictions were not much different from just outputting the previous value. How can we make the model learn more complex behavior?

Change the loss function: mean absolute error (MAE) keeps the model timid and prevents "out-of-the-box" results. Using mean squared error (MSE) instead would force the LSTM to pay more attention to detecting peaks and troughs, and custom trading-oriented loss functions could push the model in an even less conservative direction.

Penalize conservative AR-like behavior: this would encourage the deep learning algorithm to explore riskier/more interesting models, though that is easier said than done.

Get more and/or better data: if past prices alone cannot predict future prices accurately, we need to introduce other features with real predictive power. The LSTM would then not depend so heavily on past prices and might unlock more complex behavior. This is probably the most reliable solution, and also the hardest.
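A quick numeric illustration of the first point: on a series with one sharp spike, MSE punishes a conservative model far more heavily than MAE does (all numbers invented):

```python
import numpy as np

actual = np.array([100.0, 101.0, 150.0, 102.0])        # day 3 is a price spike
conservative = np.array([100.0, 101.0, 103.0, 102.0])  # model that plays it safe

errors = np.abs(actual - conservative)
mae = errors.mean()         # the 47-dollar miss counts linearly
mse = (errors ** 2).mean()  # the same miss is squared, dominating the loss
print(mae, mse)  # 11.75 552.25
```

Under MAE the spike contributes 47 to a total of 47; under MSE it contributes 2209, so a model trained on MSE has a much stronger incentive to chase peaks and troughs.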

If that was the upbeat side, the downbeat news is that a pattern in cryptocurrency price movements may simply not exist: no model (however deep) can separate signal from noise that isn't there (much like using deep learning to predict earthquakes), and even if some pattern does appear, it can vanish just as quickly.

Think how different the frenzied Bitcoin of late 2017 is from that of 2016. Any model trained on 2016 data would surely struggle to reproduce 2017's unprecedented moves. All of this suggests you might as well save yourself some time and stick with AR models.

Still, I believe deep learning will eventually find its uses here. In the meantime, you can download the Python code and build your own LSTM model.

The original text was published on 2018-03-14.

Author: abstract bacteria

This article comes from "Big Data Digest", a partner of the Yunqi community. For more information, follow the "Big Data Digest" WeChat public account.
