Building A Stock Prediction Software With Python

Ayrat Murtazin
August 28, 2024

Exciting News: We Are Launching A Waitlist!

Last week I emailed you all to announce that, starting in September, we’re introducing paid subscriptions to bring you even more value. Free users will still have access to market insights and weekly news, but paid subscribers will get:

Exclusive trading strategies used by me and my team
Research paper analyses and the strategies built on them
Ad-free content for a seamless reading experience
AMA sessions to get your questions answered directly

As we expand and bring on more team members to improve this publication, we need to cover rising costs. Please let us know if you’re interested in these updates by filling out this Waitlist Google Form.

Your feedback is key to shaping our future!

🗣️ Stock Market Today: These 3 Stocks Are Beating Nvidia This Year. Are They Better Buys Than the AI Leader?

Nvidia (NASDAQ: NVDA) has been a standout in the stock market this year, driven by its leadership in the AI chip sector and impressive financial performance, including a 160% stock surge. However, some stocks have outpaced Nvidia's performance, including Cava Group, Sweetgreen, and Carvana. Cava, a Mediterranean fast-casual chain, has seen its stock rise 182% this year, driven by strong comparable sales growth, impressive unit volumes, and robust profit margins similar to Chipotle. Sweetgreen, known for its fast-casual salads, has rebounded with a 225% stock gain thanks to its innovative Infinite Kitchen technology and growing revenues. Carvana, the online used-car dealer, has made a remarkable turnaround, up 194% year to date, by cutting costs, reducing debt, and returning to positive growth.

Among these three stocks, Cava appears the most promising due to its consistent execution and potential to replicate Chipotle’s growth trajectory. Sweetgreen’s unique concept and automated kitchen technology also present strong growth potential, while Carvana remains the riskiest due to its debt load and market cyclicality, although it could benefit from falling interest rates. Compared to Nvidia, these smaller companies offer more upside potential, as Nvidia’s massive market cap limits its growth prospects. For investors seeking the next major growth opportunity, Cava, Sweetgreen, and Carvana could be more rewarding alternatives to the tech giant.

Today, we delve into “Building A Stock Prediction Software With Python” 👇

Today, I’m excited to share a new piece of content that’s a bit different from what I usually publish. While I typically don't post code tutorials, I believe today's guide will be valuable even for those who aren't software developers. Whether you're new to coding or just curious, I hope you’ll find it both informative and accessible!

Let’s begin!

This project combines Python with Matplotlib, scikit-learn (Sklearn), and Yahoo Finance data to predict stock prices. I used the Linear Regression model from Sklearn because it is a simple, interpretable starting point for forecasting stock trends. To train the model, I pulled up-to-date historical daily data from Yahoo Finance, which provided essential features like “Close Price,” “Open Price,” “Volume,” “High,” and “Low.” I also added a “Date” feature to keep track of daily data. This dataset served as the foundation for training and testing my model. Here’s an example of the raw data I worked with, for Google (GOOG).

Below is the code I implemented to achieve this functionality. I incorporated Python’s datetime module to allow users greater flexibility in setting the time range for the stock data.

from datetime import date
import pandas as pd
import yfinance as yf

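howmanyyears = int(input("How many years? > "))  # number of years of history to download
today = date.today()
END_DATE = today.isoformat()
# DateOffset clamps Feb 29 to Feb 28 when the start year is not a leap year
START_DATE = (pd.Timestamp(today) - pd.DateOffset(years=howmanyyears)).date().isoformat()

whichstock = input("Which stock? > ")  # ticker symbol, e.g. GOOG
data = yf.download(whichstock, start=START_DATE, end=END_DATE)

# Some yfinance versions return MultiIndex columns (field, ticker); keep only the field names
if isinstance(data.columns, pd.MultiIndex):
    data.columns = data.columns.get_level_values(0)

data.reset_index(inplace=True)  # move the date index into a 'Date' column
data['Date'] = pd.to_datetime(data['Date'])  # make sure 'Date' is a datetime column

# Output the first 15 rows and the shape of the data
print(data.head(15))
print(f"Data: {data.shape}")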

In addition to this, I introduced two new features: the 50-day and 200-day Exponential Moving Averages (EMA) of the closing price. These additions helped me assess whether the stock was leaning towards a bearish or bullish trend over specific periods, providing users with deeper insights into stock trends. Before proceeding with the Regression Model, I wanted to visualize some of the data. I created plots comparing High vs. Low prices and charted the daily closing prices alongside the 50-day and 200-day EMAs.

data['EMA-50'] = data['Close'].ewm(span=50, adjust=False).mean()
data['EMA-200'] = data['Close'].ewm(span=200, adjust=False).mean()
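To make the EMA calculation less of a black box, here is a minimal pure-Python sketch of the recursion that `ewm(span=..., adjust=False).mean()` applies, where the smoothing factor is 2 / (span + 1) and the first value seeds the average. The helper names `ema` and `trend_label` and the sample prices are made up for illustration, and reading "short EMA above long EMA" as bullish is one common convention that I am assuming here, not necessarily the rule used in this project.

```python
def ema(values, span):
    """Exponential moving average in its recursive (adjust=False) form."""
    alpha = 2.0 / (span + 1)  # smoothing factor implied by the span
    out = []
    for i, x in enumerate(values):
        if i == 0:
            out.append(float(x))  # the first observation seeds the average
        else:
            # new average = alpha * today's value + (1 - alpha) * yesterday's average
            out.append(alpha * x + (1 - alpha) * out[-1])
    return out


def trend_label(short_ema, long_ema):
    """Assumed convention: short EMA above long EMA reads as bullish."""
    return "bullish" if short_ema > long_ema else "bearish"


closes = [10.0, 11.0, 12.0, 11.0, 13.0]  # made-up closing prices
smoothed = ema(closes, span=3)
print(smoothed)  # [10.0, 10.5, 11.25, 11.125, 12.0625]
```

A shorter span gives a larger smoothing factor, so the 50-day EMA reacts to new prices faster than the 200-day EMA.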

Now, let’s generate some plots.

import matplotlib.pyplot as plt

# High vs Low Graph
plt.figure(figsize=(8, 4))
plt.plot(data['Date'], data['Low'], label="Low", color="indianred")
plt.plot(data['Date'], data['High'], label="High", color="mediumseagreen")
plt.ylabel('Price (in USD)')
plt.xlabel("Time")
plt.title(f"High vs Low of {whichstock}")
plt.legend()
plt.tight_layout()

# Exponential Moving Average Graph
plt.figure(figsize=(8, 4))
plt.plot(data['Date'], data['EMA-50'], label="EMA for 50 days")
plt.plot(data['Date'], data['EMA-200'], label="EMA for 200 days")
plt.plot(data['Date'], data['Close'], label="Close")  # same column the EMAs were computed from
plt.title(f'Exponential Moving Average for {whichstock}')
plt.ylabel('Price (in USD)')
plt.xlabel("Time")
plt.legend()
plt.tight_layout()

plt.show()  # display both figures

High vs Low of GOOG

Exponential Moving Average for GOOG

After exploring the dataset, I moved on to building the Linear Regression Model. The main objective of the project was to predict the stock’s Closing Price, so I set it as the target variable (y). The remaining features were used as input variables (x) for the model.

x = data[['Open', 'High', 'Low', 'Volume', 'EMA-50', 'EMA-200']]
y = data['Close']

Next, I used Scikit-learn’s train_test_split function to divide the data into two parts: 80% for training and 20% for testing.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
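Note that `train_test_split` shuffles rows by default, so the test set is a random sample of days rather than the most recent ones. As an alternative that is not what this project used, here is a minimal sketch of a chronological 80/20 split, which keeps the latest rows for testing. The helper name `chronological_split` and the sample data are hypothetical.

```python
def chronological_split(rows, test_size=0.2):
    """Split time-ordered rows so the last `test_size` fraction is the test set."""
    if not 0 < test_size < 1:
        raise ValueError("test_size must be between 0 and 1")
    cut = int(round(len(rows) * (1 - test_size)))  # index where the test period starts
    return rows[:cut], rows[cut:]


days = list(range(10))  # stand-in for 10 trading days in date order
train, test = chronological_split(days)
print(train)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(test)   # [8, 9]
```

With Scikit-learn itself, passing `shuffle=False` to `train_test_split` gives the same kind of ordered split.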

With the data split, I fit the Linear Regression Model on the training set and used it to predict closing prices for the held-out test set.

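from sklearn.linear_model import LinearRegression

lr_model = LinearRegression()
lr_model.fit(X_train, y_train)   # learn one coefficient per feature plus an intercept
pred = lr_model.predict(X_test)  # predicted closing prices for the test rows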
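For readers new to regression, the sketch below shows what "fitting" means in the simplest case: ordinary least squares with a single input feature, written in plain Python. It is only an illustration of the idea. `LinearRegression` handles many features at once, and the helper name `fit_line` and the sample prices are made up.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y ≈ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style and variance-style sums around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


opens = [100.0, 102.0, 104.0, 106.0]   # made-up opening prices
closes = [101.0, 103.0, 105.0, 107.0]  # made-up closing prices
slope, intercept = fit_line(opens, closes)
print(slope, intercept)  # 1.0 1.0

# Predict a close for a new open using the fitted line
print(slope * 108.0 + intercept)  # 109.0
```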

To assess the accuracy of the model, I plotted a graph comparing the model’s predicted values against the actual values.
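The plotting code for this step is not shown, so the following is a sketch of one way to draw such a comparison, assuming a scatter of predicted against real prices with a diagonal line marking a perfect prediction. The function name `plot_real_vs_predicted` and the sample numbers are hypothetical; in this project the inputs would be `y_test` and `pred`.

```python
import matplotlib.pyplot as plt


def plot_real_vs_predicted(actual, predicted, title="Real Values VS Predicted Values"):
    """Scatter predicted against real prices with a y = x reference line."""
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.scatter(actual, predicted, s=12, label="Predicted vs. real")
    lo = min(min(actual), min(predicted))
    hi = max(max(actual), max(predicted))
    # Points on this line are predictions that match the real price exactly
    ax.plot([lo, hi], [lo, hi], color="indianred", label="Perfect prediction")
    ax.set_xlabel("Real price (in USD)")
    ax.set_ylabel("Predicted price (in USD)")
    ax.set_title(title)
    ax.legend()
    fig.tight_layout()
    return fig


# Made-up numbers for illustration only
actual = [101.2, 103.5, 99.8, 104.1, 107.3]
predicted = [100.9, 103.9, 100.4, 103.6, 107.0]
fig = plot_real_vs_predicted(actual, predicted)
```

Call `plt.show()` afterwards to display the figure. The closer the points sit to the diagonal, the closer the predictions are to the real prices.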

Real Values VS Predicted Values

Additionally, I printed the Real vs. Predicted prices for the stock on a selection of random days. This approach provided a clear comparison of how closely the predicted values matched the actual prices, helping to evaluate whether the model was performing as expected.

d = pd.DataFrame({'Actual_Price': y_test, 'Predicted_Price': pred})

print(d.head(10))
print(d.describe())

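With the model trained, the final step was to predict the closing price from the input features. One relationship that stood out was between Volume and Closing Price: in that plot, the predicted values tracked the actual values with minimal error. Part of the reason the error is so small is that the same day’s Open, High, and Low prices are among the inputs, and the Close always falls between the day’s Low and High.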

Predicted VS Actual Closing Price Based on Volume

To thoroughly evaluate the model’s performance, I examined key statistics, including the r² score, mean absolute error, and mean squared error. Below are the values for each metric.
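The code for these metrics is not shown, so here is a minimal plain-Python sketch of how each one is defined. The function name `regression_metrics` and the sample values are made up for illustration. Scikit-learn provides equivalents in `sklearn.metrics` (`r2_score`, `mean_absolute_error`, `mean_squared_error`), which would take `y_test` and `pred` directly.

```python
def regression_metrics(actual, predicted):
    """Return r², mean absolute error, and mean squared error."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n   # average size of the miss
    mse = sum(e * e for e in errors) / n    # penalizes large misses more heavily
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                     # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)    # total variation
    r2 = 1 - ss_res / ss_tot
    return {"r2": r2, "mae": mae, "mse": mse}


actual = [100.0, 102.0, 104.0, 106.0]     # made-up real closing prices
predicted = [101.0, 101.0, 105.0, 105.0]  # made-up predictions
print(regression_metrics(actual, predicted))
```

An r² close to 1 means the predictions explain most of the variation in the actual prices, while MAE and MSE are in price units and squared price units respectively.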

Results
