Introduction:
Stock price prediction plays a crucial role in investment decision-making. In this article, we explore the application of Random Forest Regression to predict stock prices and evaluate the resulting profitability and accuracy. We focus on the stock of Apple Inc. (AAPL) and analyze historical data over the past six months.
Step 1: Retrieving Historical Data:
We start by retrieving historical stock price data with the yfinance library, which downloads data from Yahoo Finance. From the daily price data it returns, we use only the closing prices, which serve as the sole feature for prediction.
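Depending on the yfinance version, `yf.download` may return its columns as a two-level `MultiIndex` of (price field, ticker) even for a single ticker. In that case `data['Close']` is a one-column DataFrame rather than a Series. Below is a minimal sketch of a helper that flattens the columns so the rest of the pipeline can use plain names such as `Close`. The helper name `flatten_columns` is hypothetical, and the demo frame is synthetic rather than a real download.

```python
import pandas as pd


def flatten_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose columns are single-level price fields.

    Hypothetical helper: if the columns are a MultiIndex such as
    ("Close", "AAPL"), keep only the first level ("Close").
    """
    out = df.copy()
    if isinstance(out.columns, pd.MultiIndex):
        out.columns = out.columns.get_level_values(0)
    return out


# Synthetic stand-in for a single-ticker download with two-level columns.
cols = pd.MultiIndex.from_tuples([("Close", "AAPL"), ("Volume", "AAPL")],
                                 names=["Price", "Ticker"])
raw = pd.DataFrame([[190.0, 1_000], [191.5, 1_200]],
                   index=pd.to_datetime(["2024-01-02", "2024-01-03"]),
                   columns=cols)

prices = flatten_columns(raw)
print(list(prices.columns))             # ['Close', 'Volume']
print(type(prices["Close"]).__name__)   # Series
```

With real data you would pass the result of `yf.download(...)` to the helper before building features. If the columns are already single-level, the helper returns an unchanged copy.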
Step 2: Preparing Features and Target Variable:
To train the Random Forest Regressor, we use the day's closing price as the feature (X) and the next trading day's closing price as the target (y). We build the target by shifting the Close column up by one row, then drop the final row because it has no next-day price.
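The following sketch makes the alignment concrete on five invented closing prices. `shift(-1)` moves each next-day close up one row, so row t holds today's close as the feature and tomorrow's close as the target. The last row has no tomorrow, becomes NaN, and is dropped.

```python
import pandas as pd

# Five invented daily closes (not real AAPL data).
data = pd.DataFrame(
    {"Close": [100.0, 102.0, 101.0, 103.0, 104.0]},
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

# Target: the following row's close. The final row has no successor -> NaN.
data["Close_shifted"] = data["Close"].shift(-1)
data = data.dropna()

X = data[["Close"]]          # 2-D feature matrix, as scikit-learn expects
y = data["Close_shifted"]    # 1-D target vector

print(data)
```

The double brackets in `data[["Close"]]` keep X two-dimensional, which is the shape scikit-learn's `fit` expects even when there is only one feature.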
Step 3: Training the Random Forest Regressor:
We train the Random Forest Regressor with 100 estimators (trees), a maximum tree depth of 10, and a minimum of 2 samples per leaf node (`min_samples_leaf=2`).
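Tree ensembles have a property that matters for trending prices. Each tree predicts the average of the training targets (from its bootstrap sample) that land in a leaf, so a `RandomForestRegressor` cannot predict a value above the largest target it was trained on, or below the smallest. The sketch below fits the article's hyperparameters on a synthetic, steadily rising series (not market data) and queries it with prices beyond the training range. `random_state=0` is added here only to make the demo repeatable.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic upward-trending "closes": 100, 101, ..., 199 (not real data).
close = np.arange(100.0, 200.0)
X_train = close[:-1].reshape(-1, 1)   # today's close
y_train = close[1:]                   # next day's close (101 .. 199)

model = RandomForestRegressor(
    n_estimators=100, max_depth=10, min_samples_leaf=2, random_state=0
)
model.fit(X_train, y_train)

# Query with prices well above anything seen in training.
X_new = np.array([[250.0], [300.0]])
pred = model.predict(X_new)

print(pred)             # both predictions are identical and never exceed y_train.max()
print(y_train.max())    # 199.0
```

This ceiling is one reason next-day price predictions from a forest tend to trail the actual price when a stock moves to new highs or lows. A common workaround is to predict the next-day return rather than the raw price.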
Step 4: Predicting Close Prices:
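Using the trained model, we predict the next-day close for each day in the six-month window. We leave out the final row so that every prediction has a following day in the data to compare against. Note that these are in-sample predictions: the model is scored on the same rows it was trained on.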
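Because the model is scored on the rows it was trained on, any accuracy computed this way is an in-sample figure. The sketch below uses a chronological holdout instead. It fits on the first 80% of days and scores direction calls only on the remaining, later days. The 80/20 split, the synthetic random-walk prices, and the helper name `directional_accuracy` are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Synthetic random-walk closes standing in for ~6 months of trading days.
close = pd.Series(150 + np.cumsum(rng.normal(0, 1.5, 126)))

frame = pd.DataFrame({"Close": close, "Close_shifted": close.shift(-1)}).dropna()
X, y = frame[["Close"]], frame["Close_shifted"]

# Chronological split: no shuffling, so every test day comes after training.
split = int(len(frame) * 0.8)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

model = RandomForestRegressor(n_estimators=100, max_depth=10,
                              min_samples_leaf=2, random_state=0)
model.fit(X_train, y_train)


def directional_accuracy(today, predicted, actual_next):
    """Percent of signal days where the predicted and actual moves agree in sign.

    Days where the prediction equals today's close (no signal) are skipped,
    and a flat actual move counts as wrong, mirroring the article's loop.
    """
    pred_move = np.sign(predicted - today)
    real_move = np.sign(actual_next - today)
    has_signal = pred_move != 0
    return float(np.mean(pred_move[has_signal] == real_move[has_signal])) * 100


oos = directional_accuracy(X_test["Close"].to_numpy(),
                           model.predict(X_test), y_test.to_numpy())
ins = directional_accuracy(X_train["Close"].to_numpy(),
                           model.predict(X_train), y_train.to_numpy())
print(f"in-sample: {ins:.1f}%  out-of-sample: {oos:.1f}%")
```

On a pure random walk, the out-of-sample figure would be expected to land near 50% on average. That makes it a useful baseline for judging a real result.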
Step 5: Analyzing Signals, Accuracy, and Profitability:
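We compare each predicted next-day close with the current day's close to derive a signal: a predicted rise is an up signal and a predicted fall is a down signal. A signal is correct if the actual next-day close moves in the predicted direction and wrong otherwise, and the accuracy rating is the share of correct signals. For profitability, we simulate a long-only strategy. On each up signal we buy at the day's close and sell at the next day's close, then sum the resulting one-day returns. Down signals count toward accuracy but are not traded.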
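The signal loop can also be expressed as a few vectorized NumPy operations, which makes the rules easier to check by hand. In this sketch (hypothetical function name `evaluate_signals`, invented prices), a predicted rise is a long signal held for one day. A predicted fall counts toward accuracy but is not traded. Profit is the sum of one-day returns on a $1 position, following the same rules as the article's code.

```python
import numpy as np


def evaluate_signals(close, predicted):
    """Score next-day predictions using the article's rules.

    close:     closes for days 0..n (length n + 1)
    predicted: model predictions of the next close for days 0..n-1 (length n)
    Returns (correct, wrong, accuracy_pct, total_profit_per_dollar).
    """
    today, tomorrow = close[:-1], close[1:]
    up_call = predicted > today
    down_call = predicted < today

    correct = int(np.sum(up_call & (tomorrow > today))
                  + np.sum(down_call & (tomorrow < today)))
    wrong = int(np.sum(up_call | down_call)) - correct  # flat moves count as wrong

    # Long-only: buy $1 at today's close on an "up" call, sell at tomorrow's close.
    daily_return = (tomorrow - today) / today
    total_profit = float(np.sum(daily_return[up_call]))

    accuracy = 100.0 * correct / (correct + wrong) if correct + wrong else float("nan")
    return correct, wrong, accuracy, total_profit


close = np.array([100.0, 102.0, 101.0, 104.0, 104.0])
predicted = np.array([101.0, 101.5, 103.0, 105.0])  # invented model output

correct, wrong, accuracy, profit = evaluate_signals(close, predicted)
print(correct, wrong, accuracy)   # 3 1 75.0
print(profit)                     # 0.02 + 3/101, about 0.0497
```

Here the fourth day is an up call followed by an unchanged price, so it counts as wrong and adds zero profit. Because returns are added rather than compounded, the total is the profit on a fresh $1 stake each signal day, not the growth of a single account.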
Step 6: Plotting Actual and Predicted Close Prices:
We visualize the actual and predicted close prices over the past six months using a line chart. This provides a visual representation of the model's performance.
Results and Discussion:
From the signal analysis, we obtain the accuracy rating and the total profit. The accuracy rating is the percentage of signals whose predicted direction matched the actual next-day move. The profit is summed over all up-signal days: for each one, we take the next day's price change divided by the entry price. This equals the profit on a $1 position opened fresh each signal day, with returns added rather than compounded.
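The two metrics can be written as formulas. Here $P_t$ is the closing price on day $t$, $\hat{P}_{t+1}$ is the model's prediction of the next close, and $L$ is the set of up-signal days:

$$
\text{Accuracy} = \frac{N_{\text{correct}}}{N_{\text{correct}} + N_{\text{wrong}}} \times 100\%,
\qquad
\text{Profit} = \sum_{t \in L} \frac{P_{t+1} - P_t}{P_t},
\qquad
L = \{\, t : \hat{P}_{t+1} > P_t \,\}
$$

For instance, a buy signal at $P_t = 100$ followed by $P_{t+1} = 102$ adds $(102 - 100)/100 = 0.02$ to the total, or 2 cents per dollar invested.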
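The exact accuracy and profit figures depend on when the script is run, since the six-month window ends at today's date. They can also differ slightly between runs, since the forest is trained without a fixed random_state. Keep in mind that this analysis rests on historical data and simplified assumptions. In particular, the model is scored on the same data it was trained on, so the accuracy is an in-sample estimate that likely overstates performance on unseen data. Achieving high accuracy and profitability in real-world trading depends on further factors, including market conditions, transaction costs, and risk management.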
Improving Accuracy:
To improve the accuracy of stock price prediction, consider the following strategies:
- Feature Engineering: Explore additional features such as volume, technical indicators, and news sentiment to enhance the predictive power of the model.
- Model Tuning: Experiment with different hyperparameters of the Random Forest Regressor, such as the number of estimators, maximum depth, and minimum samples per leaf, to find the optimal configuration for improved accuracy.
- Ensemble Methods: Random Forest is itself an ensemble of decision trees. Consider going further by combining it with other algorithms, such as Gradient Boosting or LSTM networks, to leverage diverse predictions and potentially enhance accuracy.
- Cross-Validation: Evaluate the model on multiple folds of data to check its robustness and generalization. For time series, use a time-ordered scheme such as walk-forward validation or scikit-learn's TimeSeriesSplit, so the model is never trained on data from after the period it is tested on.
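Two of the suggestions above can be combined in one sketch. It assumes scikit-learn's `TimeSeriesSplit` for time-ordered cross-validation and uses a few illustrative engineered features: a 1-day return, the gap to a 5-day moving average, and the 1-day volume change. The feature choices and the synthetic price and volume series are assumptions, not the article's setup. Each fold trains only on days that precede its test days, which shuffled k-fold would not guarantee.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(1)
n = 200
# Synthetic close and volume series (not market data).
df = pd.DataFrame({
    "Close": 150 + np.cumsum(rng.normal(0, 1.5, n)),
    "Volume": rng.integers(50_000_000, 100_000_000, n).astype(float),
})

# Illustrative engineered features, each using only information up to day t.
df["ret_1d"] = df["Close"] / df["Close"].shift(1) - 1
df["ma5_gap"] = df["Close"] / df["Close"].rolling(5).mean() - 1
df["vol_chg"] = df["Volume"] / df["Volume"].shift(1) - 1
df["Close_shifted"] = df["Close"].shift(-1)   # target: next day's close
df = df.dropna()

features = ["Close", "ret_1d", "ma5_gap", "vol_chg"]
X, y = df[features], df["Close_shifted"]

tscv = TimeSeriesSplit(n_splits=5)
fold_acc = []
for train_idx, test_idx in tscv.split(X):
    model = RandomForestRegressor(n_estimators=100, max_depth=10,
                                  min_samples_leaf=2, random_state=0)
    model.fit(X.iloc[train_idx], y.iloc[train_idx])
    today = X["Close"].iloc[test_idx].to_numpy()
    # Simplified directional hit rate: "up" vs "not up" for prediction and outcome.
    pred_up = model.predict(X.iloc[test_idx]) > today
    real_up = y.iloc[test_idx].to_numpy() > today
    fold_acc.append(float(np.mean(pred_up == real_up)) * 100)

print([round(a, 1) for a in fold_acc])
print(f"mean directional accuracy: {np.mean(fold_acc):.1f}%")
```

The spread of the per-fold scores says as much as their mean. A model that only does well in one fold is unlikely to generalize. With real data, you would replace the synthetic frame with the downloaded prices and volumes.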
Conclusion:
The application of Random Forest Regression for stock price prediction demonstrates potential profitability and accuracy. By analyzing signals, calculating accuracy, and evaluating profit, we gain insights into the model's performance. However, it is crucial to remember that investing in the stock market involves risks, and the accuracy of predictions may vary under different market conditions. Continued research, exploration of additional features, and fine-tuning of the model are necessary for further improvements in accuracy. It is advisable to consult with financial experts and consider multiple factors before making investment decisions.
Here is the sample Python code used in this article:
```python
import yfinance as yf
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
from sklearn.ensemble import RandomForestRegressor
import matplotlib.pyplot as plt

# Step 1: Retrieve historical data
ticker = "AAPL"  # Ticker symbol for Apple Inc.
end_date = datetime.today().strftime('%Y-%m-%d')
start_date = (datetime.today() - timedelta(days=6*30)).strftime('%Y-%m-%d')
data = yf.download(ticker, start=start_date, end=end_date)

# Newer yfinance versions may return two-level columns such as ("Close", "AAPL");
# keep only the price-field level so that data['Close'] is a Series.
if isinstance(data.columns, pd.MultiIndex):
    data.columns = data.columns.get_level_values(0)

# Step 2: Prepare features and target variable
data['Close_shifted'] = data['Close'].shift(-1)  # next trading day's close
data = data.dropna()  # drops the last row, which has no next-day close
X = data[['Close']]
y = data['Close_shifted']

# Step 3: Train the Random Forest Regressor
model = RandomForestRegressor(n_estimators=100, max_depth=10, min_samples_leaf=2)
model.fit(X, y)

# Step 4: Predict the next-day close for the past 6 months (in-sample: same rows as training)
predicted_prices = model.predict(X.iloc[:-1])  # Exclude the last row so each prediction has a next row

# Step 5: Analyze the signals and calculate accuracy and profit
correct_signals = 0
wrong_signals = 0
total_profit = 0.0

for i in range(len(predicted_prices)):
    today = data['Close'].iloc[i]
    tomorrow = data['Close'].iloc[i + 1]

    if predicted_prices[i] > today:      # predicted rise
        if tomorrow > today:
            correct_signals += 1
        else:
            wrong_signals += 1
    elif predicted_prices[i] < today:    # predicted fall
        if tomorrow < today:
            correct_signals += 1
        else:
            wrong_signals += 1

    # Calculate profit: on a predicted rise, buy $1 at today's close and sell at tomorrow's close
    if predicted_prices[i] > today:
        daily_profit = (tomorrow - today) / today
        total_profit += daily_profit

accuracy = correct_signals / (correct_signals + wrong_signals) * 100

print("Correct Signals:", correct_signals)
print("Wrong Signals:", wrong_signals)
print("Accuracy Rating:", accuracy, "%")
print("Total Profit:", total_profit)

# Step 6: Plot actual and predicted close prices
# Each prediction is plotted on the date it forecasts (the following trading day)
plt.figure(figsize=(10, 6))
plt.plot(data.index[1:], data['Close'].iloc[1:], label='Actual Close Price')
plt.plot(data.index[1:], predicted_prices, label='Predicted Close Price')
plt.xlabel('Date')
plt.ylabel('Close Price')
plt.title('Actual vs Predicted Close Price')
plt.legend()
plt.xticks(rotation=45)
plt.show()
```
Evaluation results after running the program:
And here is the generated chart: