Predicting Stock Price Direction Using Decision Trees and Technical Indicators

Combining Data Science and Technical Analysis for Stock Market Predictions

Stock market predictions are challenging, but machine learning models can help identify trends by learning patterns in historical data.

In this article, we’ll walk through how we can use a decision tree classifier to predict stock price movements for Alphabet Inc. (GOOGL) using Python.

The process includes data collection, feature engineering, model training, and evaluation.

Objective: Predicting Stock Price Movements

The goal of this project is to build a model that predicts whether the weekly closing price of Alphabet (GOOGL) will be higher (1) or not (0) the following week.

We will use historical stock data, process it into features that reflect the stock’s behavior, and then apply a decision tree classifier to predict future movements.

Overview of the Approach

  1. Data Collection: We start by downloading historical stock data.

  2. Feature Engineering: We create new features that could help the model make better predictions.

  3. Modeling: We use a decision tree classifier to make predictions.

  4. Evaluation: We evaluate the model’s performance and visualize the results.

Why a Decision Tree?

A decision tree is a popular classification algorithm. It works by splitting the data based on different features, which makes it easy to interpret.

It’s a non-linear model, meaning it can handle complex relationships in the data. Some advantages of decision trees include:

  • Interpretability: Easy to understand and visualize.

  • Non-linear modeling: Can handle non-linear relationships between features.

  • No need for feature scaling: Splits are based on thresholds, so features don’t have to be standardized as they do for algorithms such as logistic regression.

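However, decision trees are prone to overfitting, especially when they grow deep. Limiting the depth or the minimum leaf size is the usual remedy. The baseline in this walkthrough uses scikit-learn’s default settings, and tuning is listed under Possible Improvements.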
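As a rough illustration of what regularizing a tree means, the sketch below fits an unconstrained tree and a constrained one on synthetic noise. The data, the variable names, and the max_depth and min_samples_leaf values are assumptions chosen for the example, not settings from this project.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(400, 5))       # synthetic features, pure noise
y_demo = rng.integers(0, 2, size=400)    # synthetic labels unrelated to X_demo

# Unconstrained tree: keeps splitting until every training row is classified
deep_tree = DecisionTreeClassifier(random_state=42).fit(X_demo, y_demo)

# Regularized tree: capped depth and a minimum leaf size (illustrative values)
small_tree = DecisionTreeClassifier(
    max_depth=3, min_samples_leaf=20, random_state=42
).fit(X_demo, y_demo)

deep_train_acc = deep_tree.score(X_demo, y_demo)
small_train_acc = small_tree.score(X_demo, y_demo)
print("depths:", deep_tree.get_depth(), small_tree.get_depth())
print("training accuracy:", deep_train_acc, small_train_acc)
```

The unconstrained tree memorizes the noise and reaches perfect training accuracy, which is the overfitting behavior described above. The constrained tree cannot do that.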

Step-by-Step Code Walkthrough

You can find the complete code and notebook for this project on GitHub: View Repository. This includes all the preprocessing steps, model training, evaluation, and visualizations explained in this article.

1. Installing Dependencies

We first install the required Python libraries using pip. These libraries are crucial for data manipulation, model training, and evaluation.

pip install yfinance pandas numpy scikit-learn seaborn matplotlib

2. Importing Libraries

Here, we import yfinance for downloading the data, scikit-learn for preprocessing, modeling, and evaluation, and matplotlib and seaborn for visualization. pandas is used indirectly, since yfinance returns a pandas DataFrame.

import matplotlib.pyplot as plt
import seaborn as sns
import yfinance as yf
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

plt.style.use('dark_background')

3. Downloading the Data

We download historical stock data for Google from Yahoo Finance using the yfinance library. We choose a weekly interval and a time frame from 2010 to 2023.

data = yf.download("GOOGL", start="2010-01-01", end="2023-12-31", auto_adjust=True, interval="1wk")
data.head()

The auto_adjust=True argument adjusts the Open, High, Low, and Close prices for stock splits and dividends, so prices stay comparable across those events.

4. Preparing the Data

yfinance can return the columns as a multi-level index (price field and ticker). We flatten it to the first level and keep only the relevant columns (Open, Close, Volume, Low, High).

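# Keep only the field names (first level) of the column index
data.columns = data.columns.get_level_values(0)
# Work on an explicit copy so that new columns can be added safely later
data = data[['Open', 'Close', 'Volume', 'Low', 'High']].copy()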

# Plotting the closing price
plt.figure(figsize=(14,6))
plt.plot(data.index, data['Close'], label='Close Price', color='blue')
plt.title('Stock Closing Price Over Time')
plt.xlabel('Date')
plt.ylabel('Closing Price (USD)')
plt.legend()
plt.grid(True)
plt.savefig('stock_closing_price.png')
plt.show()

Alphabet’s Closing Stock Price Chart
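The column flattening used in this step can be checked on a toy frame. The sketch below assumes a two-level column index of (price field, ticker), which is what the get_level_values(0) call implies. The values and the names toy and flat_names are made up for illustration.

```python
import pandas as pd

# Toy frame mimicking a (field, ticker) column MultiIndex; values are made up
cols = pd.MultiIndex.from_product([["Close", "Open"], ["GOOGL"]])
toy = pd.DataFrame([[10.0, 9.5], [11.0, 10.2]], columns=cols)

# Keep only the first level (the field names), dropping the ticker level
toy.columns = toy.columns.get_level_values(0)
flat_names = list(toy.columns)
print(flat_names)
```

After flattening, columns can be selected by plain names such as 'Close', which is what the rest of the walkthrough relies on.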

5. Creating the Labels

We create the target label: 1 if next week’s closing price is higher than this week’s, and 0 otherwise. (The data uses a weekly interval, so each row is one week.)

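# shift(-1) aligns each row with the following week's close
data["Direction"] = (data["Close"].shift(-1) > data["Close"]).astype(int)
# The last row has no following week: its comparison is False (label 0), not NaN,
# so dropna() would not remove it. Drop it explicitly instead.
data = data.iloc[:-1].copy()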
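To see how the labeling behaves at the end of the series, here is a minimal sketch on four made-up prices. The names close, direction, and labeled are illustrative only.

```python
import pandas as pd

close = pd.Series([100.0, 102.0, 101.0, 105.0], name="Close")  # made-up prices

next_close = close.shift(-1)                  # next period's close; last row is NaN
direction = (next_close > close).astype(int)  # 1 if the next close is higher

# NaN > x evaluates to False, so the last row is labeled 0 even though its
# outcome is unknown. Dropping the last row removes that spurious label.
labeled = pd.DataFrame({"Close": close, "Direction": direction}).iloc[:-1]
print(labeled)
```

Because the comparison yields False rather than NaN on the final row, a dropna() call alone would not remove it.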

6. Feature Engineering

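Next, we calculate several technical indicators that might help predict the stock’s future movements. Because the data is weekly, every window below is measured in weeks, even though the column names in the code keep a “d” suffix. The indicators include:

  • Returns: 1-week and 5-week returns.

  • Moving Averages: 5-week and 10-week moving averages, plus their ratio.

  • Volatility: The 5-week rolling standard deviation of the close.

  • Momentum: The difference between the current close and the close 10 weeks earlier.

  • RSI: Relative Strength Index over 14 weeks, a momentum indicator.

  • Range: Difference between high and low prices, plus the distance from the close to each.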

# Percentage change over 1 and 5 periods (weeks, given the weekly interval)
data['Return_1d'] = data['Close'].pct_change()
data['Return_5d'] = data['Close'].pct_change(5)
# 5- and 10-period simple moving averages and their ratio
data['MA_5'] = data['Close'].rolling(window=5).mean()
data['MA_10'] = data['Close'].rolling(window=10).mean()
data['MA_ratio'] = data['MA_5'] / data['MA_10']
# Rolling standard deviation of the close as a volatility proxy
data['Volatility_5d'] = data['Close'].rolling(window=5).std()
# Price change relative to 10 periods earlier
data['Momentum_10'] = data['Close'] - data['Close'].shift(10)

def compute_rsi(series, period=14):
    # RSI from simple rolling averages of gains and losses
    delta = series.diff()
    gain = (delta.where(delta > 0, 0)).rolling(window=period).mean()
    loss = (-delta.where(delta < 0, 0)).rolling(window=period).mean()
    rs = gain / loss
    return 100 - (100 / (1 + rs))

data['RSI_14'] = compute_rsi(data['Close'])
# Intraperiod range and where the close sits within it
data['Range'] = data['High'] - data['Low']
data['Close_to_High'] = data['High'] - data['Close']
data['Close_to_Low'] = data['Close'] - data['Low']

These features capture different aspects of stock behavior, such as momentum, volatility, and market trends.
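A quick sanity check of the RSI helper on made-up series shows its two extremes. The function is copied from the step above. The toy prices and the shortened period of 3 are assumptions for the example.

```python
import pandas as pd

def compute_rsi(series, period=14):
    # Same simple-moving-average RSI as in the feature engineering step
    delta = series.diff()
    gain = (delta.where(delta > 0, 0)).rolling(window=period).mean()
    loss = (-delta.where(delta < 0, 0)).rolling(window=period).mean()
    rs = gain / loss
    return 100 - (100 / (1 + rs))

rising = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])    # made-up, strictly rising
falling = pd.Series([5.0, 4.0, 3.0, 2.0, 1.0])   # made-up, strictly falling

rsi_up = compute_rsi(rising, period=3)      # no losses in any window
rsi_down = compute_rsi(falling, period=3)   # no gains in any window
print(rsi_up.tolist())
print(rsi_down.tolist())
```

With no losses in the window the RSI sits at 100, and with no gains it sits at 0. The first period - 1 rows are NaN, and the later dropna() call removes them.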

7. Preparing Features and Target Variables

We select the features and target variable for model training. The target is Direction, and the features are the calculated indicators and stock data columns.

feature_cols = [
    'Open', 'Close', 'Volume', 'Low', 'High',
    'Return_1d', 'Return_5d', 'MA_5', 'MA_10', 'MA_ratio',
    'Volatility_5d', 'Momentum_10', 'RSI_14', 'Range', 'Close_to_High', 'Close_to_Low'
]

data.dropna(inplace=True)

X = data[feature_cols]
y = data["Direction"]

8. Scaling the Data

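We standardize the features so that they share a common scale. As noted earlier, a decision tree splits on thresholds and does not need this step, but it keeps the feature matrix ready for scale-sensitive models such as those listed under Possible Improvements.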

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

9. Train/Test Split

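We split the data into training and testing sets, using the first 80% of the rows for training and the last 20% for testing. Setting shuffle=False keeps the rows in chronological order, so the model is tested on weeks that come after everything it was trained on.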

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, shuffle=False)
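Fitting the scaler on all rows lets test-period means and standard deviations influence the training features. A common alternative is to split first and fit the scaler on the training rows only. The sketch below shows that ordering on synthetic data. The array X_demo and the other names are illustrative assumptions, not part of the original code.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_demo = rng.normal(loc=50.0, scale=10.0, size=(100, 3))  # synthetic, time-ordered rows

split = int(len(X_demo) * 0.8)        # chronological 80/20 split, no shuffling
X_train_raw, X_test_raw = X_demo[:split], X_demo[split:]

scaler = StandardScaler().fit(X_train_raw)   # statistics come from training rows only
X_train_s = scaler.transform(X_train_raw)
X_test_s = scaler.transform(X_test_raw)      # test rows reuse the training mean and std

print(X_train_s.mean(axis=0).round(6), X_test_s.shape)
```

For a decision tree the two orderings should give essentially the same predictions. The difference matters once a scale-sensitive model is swapped in.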

10. Model Training

We train a decision tree classifier with scikit-learn’s default hyperparameters. Fixing random_state makes the result reproducible.

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

11. Making Predictions

We use the trained model to make predictions on the test data and evaluate the model’s performance.

y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
              precision    recall  f1-score   support

           0       0.44      0.25      0.32        68
           1       0.51      0.71      0.60        76

    accuracy                           0.49       144
   macro avg       0.48      0.48      0.46       144
weighted avg       0.48      0.49      0.46       144
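One way to put the reported accuracy in context is to compare it with a naive rule that always predicts the majority class. The sketch below uses only the support counts and the rounded recall values printed in the report above, so the implied counts are approximate.

```python
# Class counts taken from the "support" column of the report above
n_down, n_up = 68, 76
n_total = n_down + n_up

# Accuracy of a naive rule that always predicts the majority class (up)
majority_baseline = max(n_down, n_up) / n_total

# Approximate correct predictions implied by the rounded recall values
correct_down = round(0.25 * n_down)   # recall for class 0
correct_up = round(0.71 * n_up)       # recall for class 1
implied_accuracy = (correct_down + correct_up) / n_total

print(f"majority baseline: {majority_baseline:.3f}")
print(f"implied accuracy:  {implied_accuracy:.3f}")
```

On these numbers the tree’s accuracy of about 0.49 is slightly below the always-up baseline of about 0.53, which is worth keeping in mind when reading the plots that follow.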

12. Visualizing Predictions

We plot the predicted stock movements against the actual closing prices to visually assess the model’s performance.

# Test-period dates and prices (the last len(y_test) rows of the data)
test_index = data.index[-len(y_test):]
test_close = data["Close"].iloc[-len(y_test):]

plt.figure(figsize=(14,6))
plt.plot(test_index, test_close, label='Close Price')
plt.plot(test_index[y_pred == 1], test_close[y_pred == 1], '^', markersize=10, color='g', label='Predicted Up')
plt.plot(test_index[y_pred == 0], test_close[y_pred == 0], 'v', markersize=10, color='r', label='Predicted Down')
plt.title("Predicted Market Direction vs Close Price")
plt.legend()
plt.show()

Predicted Stock Direction

13. Evaluating the Model with a Confusion Matrix

We use a confusion matrix to see how the errors split between weeks wrongly predicted as up and weeks wrongly predicted as down.

cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()

Confusion Matrix
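For readers less familiar with the layout, this small sketch unpacks the four cells of a binary confusion matrix and derives precision and recall for the “up” class. The label arrays are made up for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true_demo = np.array([0, 0, 1, 1, 1, 0, 1, 1])  # made-up actual labels
y_pred_demo = np.array([0, 1, 1, 1, 0, 1, 1, 1])  # made-up predictions

# Rows are actual classes, columns are predicted classes
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

precision_up = tp / (tp + fp)   # of the predicted "up" weeks, how many rose
recall_up = tp / (tp + fn)      # of the weeks that rose, how many were caught
print(tn, fp, fn, tp, round(precision_up, 3), round(recall_up, 3))
```

The same unpacking works on the cm array computed above, since scikit-learn orders the cells the same way for any binary problem with labels 0 and 1.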

14. Feature Importances

We plot the feature importances, showing which features had the most impact on the decision tree’s predictions.

importances = model.feature_importances_
feat_names = X.columns
plt.barh(feat_names, importances)
plt.title("Feature Importances (Decision Tree)")
plt.xlabel("Importance")
plt.show()

Feature Importance
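A bar chart is easier to read when the bars are sorted. The sketch below ranks impurity-based importances with a pandas Series on synthetic data, where the label depends on one feature only. The column names and data are assumptions for the example.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X_demo = pd.DataFrame(
    rng.normal(size=(300, 3)), columns=["signal", "noise_a", "noise_b"]
)
y_demo = (X_demo["signal"] > 0).astype(int)   # label depends only on "signal"

tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_demo, y_demo)

# Impurity-based importances sum to 1; sort ascending so the largest bar
# ends up at the top of a horizontal bar chart
ranked = pd.Series(tree.feature_importances_, index=X_demo.columns).sort_values()
print(ranked)
```

Passing ranked.index and ranked.values to plt.barh in place of the unsorted arrays would give the sorted version of the plot above.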

Possible Improvements

  1. Model Tuning: Decision trees can easily overfit the data. We can try tuning the tree by adjusting the maximum depth or using ensemble methods like Random Forest or Gradient Boosting.

  2. Additional Features: We could experiment with more advanced technical indicators or use macroeconomic data (e.g., interest rates, inflation).

  3. Alternative Models: We could explore other models such as Support Vector Machines (SVM) or neural networks and check whether they improve accuracy on the held-out weeks.
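As a starting point for the tuning idea in item 1, the sketch below searches over tree depth and leaf size using time-ordered cross-validation. The synthetic data stands in for the real feature matrix, and the parameter grid is an illustrative assumption rather than a recommended setting.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 4))       # synthetic stand-in for the feature matrix
y_demo = rng.integers(0, 2, size=300)    # synthetic stand-in for Direction

# Each fold trains on earlier rows and validates on later ones
cv = TimeSeriesSplit(n_splits=5)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid={"max_depth": [2, 3, 5, None], "min_samples_leaf": [1, 10, 25]},
    cv=cv,
    scoring="accuracy",
)
search.fit(X_demo, y_demo)
print(search.best_params_, round(search.best_score_, 3))
```

TimeSeriesSplit is used here instead of shuffled folds so that validation rows always come after the rows used for fitting, matching the chronological train/test split used in this article.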