In the last decade, the blockchain and cryptocurrency ecosystem has gained a lot of attention from the world. Every day there are innovations applying these concepts to solve real-world problems. Perhaps the best-known and most widely used example of these innovations is digital currencies, also known as cryptocurrencies: digital tokens that use cryptography to secure and verify transactions. The main goal of cryptocurrencies is to allow the movement, management, and storage of digital assets without the intermediation of a third party. In the current financial system, these third parties are usually banks, payment processors, and exchanges that intermediate monetary transactions; the goal of cryptocurrencies is therefore to enable peer-to-peer transactions between two parties. Some examples of well-known cryptocurrencies include Bitcoin, Ethereum, and Litecoin. Another innovation that is changing our daily lives is Machine Learning, a subfield of Artificial Intelligence that uses algorithms and computer systems to perform tasks or predict outcomes based on data. Our goal is to create an ML algorithm that learns from historical market data to predict the future prices of a given digital asset.
6.1 Dataset

As mentioned before, we are going to use historical market data to predict market movements. For this, we have built a standard dataset for each of the three currencies in the scope of the project. All coins are observed against the United States Dollar.
Dataset composition

The dataset is composed of three main parts:

1. Market data
2. Calculated Technical Indicators
3. External Market Prices

Data granularity and dates under consideration

We are looking at historical data since January 1st, 2021, and the data is updated daily by the platform ETL. We have also built the same dataset at two granularities, hourly and minutely. The minutely dataset is 60 times larger in terms of storage, as it contains 60 data points for every data point in the hourly dataset; a sketch of this relationship appears below. The objective is to determine whether more granular data leads to a more precise understanding of the market. It is also important to highlight that these markets never close, so data is available 24/7.
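As a minimal sketch of the granularity relationship, the pandas snippet below downsamples a synthetic minutely close-price series into hourly OHLC bars; the column name, date range, and prices are illustrative placeholders, not the project's actual schema.

```python
import numpy as np
import pandas as pd

# Synthetic minutely close prices standing in for real market data
# (the real dataset starts on January 1st, 2021 and is refreshed by the ETL).
idx = pd.date_range("2021-01-01", periods=60 * 24 * 7, freq="min")
rng = np.random.default_rng(0)
minutely = pd.Series(30_000 + rng.standard_normal(len(idx)).cumsum(), index=idx)

# Downsample to hourly OHLC bars: one hourly row per 60 minutely rows,
# which is why the minutely dataset is ~60x larger in storage.
hourly = minutely.resample("1h").ohlc()
print(len(minutely), "minutely rows ->", len(hourly), "hourly rows")  # 10080 -> 168
```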
6.2 Data Preprocessing and Normalization

Preprocessing

As the data comes directly from the market, which is open 24/7, it is 100% complete and needs no preprocessing other than normalization. The only columns that require special processing are the external market prices. Because these prices come from assets traded on the commodities market, they carry two restrictions: the market closes on weekends, and our source does not provide a granularity finer than daily. To handle this, we take the price of the day and fill weekends and holidays, the days the market is closed, with the previous closing price.
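A minimal sketch of this forward-fill step, assuming the external prices arrive as a pandas daily series; the dates and values are made up for illustration.

```python
import pandas as pd

# Daily closes from an external commodities source, with no rows for the
# weekend (synthetic values; the real feed is daily-granularity only).
daily = pd.Series(
    [1825.3, 1830.1, 1828.7],
    index=pd.to_datetime(["2022-01-06", "2022-01-07", "2022-01-10"]),
)

# Reindex onto a full calendar, then carry the previous close forward so
# weekends and holidays take the last available closing price.
calendar = pd.date_range(daily.index.min(), daily.index.max(), freq="D")
filled = daily.reindex(calendar).ffill()
print(filled)  # Jan 8 and Jan 9 inherit Friday's close of 1830.1
```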
Normalization

The goal of this section is to define the best normalization technique by training models on data for the year 2022 only and comparing their results. Since neural networks perform best with inputs scaled to a small range such as 0 to 1, we need to normalize our data with a specific technique that transforms the real market values into the desired format. There are many possible ways of normalizing data, so we will evaluate which normalization technique fits our data best and gives the models the best performance. We follow an iterative approach: first, we explore a couple of normalization techniques with three different neural network models to identify the best-performing model for the task; second, we train and validate that model on data transformed with seven different normalization techniques to determine the best one for our problem.
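As a sketch of how such a comparison could be wired up, the snippet below fits two common scalers from scikit-learn on synthetic 2022-style prices; the report evaluates seven techniques in total, and these two (min-max to [0, 1] and z-score) are just representative stand-ins, not the project's exact list.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic close prices standing in for 2022 market data.
rng = np.random.default_rng(1)
prices = (30_000 + rng.standard_normal(365).cumsum() * 50).reshape(-1, 1)

# Candidate normalization techniques to compare against each other.
scalers = {"minmax": MinMaxScaler(feature_range=(0, 1)), "zscore": StandardScaler()}

for name, scaler in scalers.items():
    # Fit on training data only, then apply the same transform to the
    # validation split so no future information leaks into the scaling.
    train, valid = prices[:300], prices[300:]
    scaled_train = scaler.fit_transform(train)
    scaled_valid = scaler.transform(valid)
    print(name, scaled_train.min().round(3), scaled_train.max().round(3))
```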