Abstract— AI and data-driven solutions have been applied to different fields and have achieved strong and promising results. In this research work we apply k-Nearest Neighbours, eXtreme Gradient Boosting and Random Forest classifiers to the problem of detecting the trend of three cryptocurrency markets. We use these classifiers to design a strategy to trade in those markets. Our input data in the experiments include price data with and without technical indicators, in separate tests, to see the effect of using them. Our test results on unseen data are very promising and show great potential for this approach in helping investors with an expert system to exploit the market and gain profit. Our highest profit factor for an unseen 66-day span is 1.60. We also discuss the limitations of these approaches and their potential impact on the Efficient Market Hypothesis.

Keywords— Market Prediction, Financial Decision Making, kNN Classifier, Extreme Gradient Boosting, Random Forest, Quantitative Computation
In this subsection we look at the libraries and hyperparameters involved in each model. We also note each model's training time. A more elaborate discussion of the hyperparameters is held in the Discussion section.

The kNN and Random Forest models have been implemented using the open-source machine learning library Scikit-learn. Scikit-learn features various classification, regression and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy (Pedregosa et al., 2011). An important design note about Scikit-learn is its unified model interface: if the user's data satisfies the interface requirements, it is easy to use and swap models on the same data.
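To illustrate this unified interface, consider the following minimal sketch (our own illustration, not taken from the project's codebase), which swaps two classifiers over the same data through the common fit/predict contract; the training arrays are hypothetical stand-ins:

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.ensemble import RandomForestClassifier

    # Hypothetical stand-in data: 100 samples, 4 features, binary direction labels.
    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(100, 4))
    y_train = rng.integers(0, 2, size=100)
    X_test = rng.normal(size=(20, 4))

    # Every Scikit-learn estimator exposes the same fit/predict methods,
    # so changing models only means changing the constructor call.
    for model in (KNeighborsClassifier(), RandomForestClassifier()):
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)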
XGB has been implemented using XGBoost. XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. It implements machine learning algorithms under the Gradient Boosting framework and provides a parallel tree boosting (also known as GBDT or GBM) that solves many data science problems in a fast and accurate way (Chen & Guestrin, 2016).

The hyperparameters involved in the kNN classifier are as follows:
- Number of neighbours: depends on the dataset (5 for ETHUSDT, 20 for LTCBTC, 100 for ZECBTC)
- Weight function used in prediction: distance
- Algorithm used to compute the nearest neighbours: auto (attempts to decide the most appropriate algorithm among BallTree, KDTree and brute force based on the values passed to the fit method)
- Leaf size: 30
- Distance metric to use for the tree: Minkowski
- Power parameter for the Minkowski metric: 2
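For reference, the settings above map directly onto the constructor of Scikit-learn's KNeighborsClassifier; the following is a minimal sketch with the ETHUSDT value of 5 neighbours, the remaining arguments spelling out the listed values:

    from sklearn.neighbors import KNeighborsClassifier

    # kNN configuration as listed above; n_neighbors is dataset-dependent
    # (5 for ETHUSDT, 20 for LTCBTC, 100 for ZECBTC).
    knn = KNeighborsClassifier(
        n_neighbors=5,       # number of neighbours
        weights="distance",  # closer neighbours weigh more in the prediction
        algorithm="auto",    # choose among BallTree, KDTree and brute force
        leaf_size=30,        # leaf size for the tree-based algorithms
        metric="minkowski",  # distance metric for the tree
        p=2,                 # Minkowski power parameter (Euclidean distance)
    )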
The hyperparameters involved in the Random Forest classifier are as follows:
- Number of trees in the forest: depends on the dataset (700 for ETHUSDT and ZECBTC, 1000 for LTCBTC)
- Function to measure the quality of a split: gini
- Maximum depth of the tree: none; nodes are expanded until all leaves are pure or until all leaves contain fewer samples than the minimum number required to split an internal node
- Minimum number of samples required to split an internal node: 2

The hyperparameters involved in the XGB classifier are as follows:
- Booster: gbtree
- Eta (alias: learning rate): 0.3
- Gamma (minimum loss reduction required to make a further partition on a leaf node of the tree; the larger gamma is, the more conservative the algorithm will be): 0
- Maximum depth of a tree: 6
- Lambda (L2 regularization term on weights; increasing this value will make the model more conservative): 1
- Alpha (L1 regularization term on weights; increasing this value will make the model more conservative): 0
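These settings likewise map onto the constructors of Scikit-learn's RandomForestClassifier and the XGBoost package's XGBClassifier; the following is a minimal sketch with the ETHUSDT/ZECBTC tree count (note that in XGBClassifier the eta, lambda and alpha parameters are exposed as learning_rate, reg_lambda and reg_alpha):

    from sklearn.ensemble import RandomForestClassifier
    from xgboost import XGBClassifier

    # Random Forest configuration as listed above; n_estimators is
    # dataset-dependent (700 for ETHUSDT and ZECBTC, 1000 for LTCBTC).
    rf = RandomForestClassifier(
        n_estimators=700,
        criterion="gini",
        max_depth=None,       # grow until leaves are pure or too small to split
        min_samples_split=2,
    )

    # XGB configuration as listed above.
    xgb = XGBClassifier(
        booster="gbtree",
        learning_rate=0.3,  # eta
        gamma=0,            # minimum loss reduction for a further partition
        max_depth=6,
        reg_lambda=1,       # L2 regularization term on weights
        reg_alpha=0,        # L1 regularization term on weights
    )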
Training and evaluation of the models in this project have been done using Colab virtual machines by Google. Training takes the longest for Random Forest, with an average of 167.97 seconds. Second place goes to XGB, with an average of 46.85 seconds, and finally kNN takes only 1.06 seconds on average to be trained on these datasets.

4. Evaluation and Strategy Design

To evaluate each model in this project we use two different methods: the accuracy of the model and the profit obtained by the model. By accuracy in this context, we mean how many times the predicted label for the market direction matches the real direction of the market. To discuss how we calculate the obtained profit, we first need to understand how we use the models to create the strategy.
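As a concrete illustration of this accuracy measure, the following minimal sketch (our own; the direction labels are hypothetical) counts how often the predicted direction agrees with the realized direction:

    import numpy as np

    # Hypothetical direction labels: 1 = market moved up, 0 = market moved down.
    y_true = np.array([1, 0, 1, 1, 0, 1])  # realized market direction
    y_pred = np.array([1, 0, 0, 1, 0, 1])  # direction predicted by a model

    # Accuracy: fraction of periods where the prediction matches reality.
    accuracy = np.mean(y_true == y_pred)
    print(f"directional accuracy = {accuracy:.2%}")  # 83.33%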