Is News Sentiment the Missing Edge? Backtested Spoiler Alert... Yes!
What if you could measure market emotions and trade on them?

Yes! Trading is an incredibly emotional journey, and failing to manage those emotions can really set you back. This is precisely why genuine trading experts (the true professionals, not just the ones you find on YouTube) agree on one crucial point: backtest your strategy and keep your emotions out of the game. But let's be honest: at some point, most traders find it hard to stay completely emotionless while following the news. That leads to an interesting question: can we quantify the news and trade on how others feel?
EODHD offers an API that calculates a daily sentiment score for all major stocks, ETFs, and cryptocurrencies, based on news and social media. The score ranges from -1 (very negative sentiment) to 1 (very positive sentiment).
What to expect in this article
Using Python, you will learn how to:
Get the prices and sentiment data from EODHD
Explore the sentiment in relation to the asset's price
Plot how those two metrics correlate for an initial understanding
Create a simple strategy and check out your first results
Optimize the strategy for various stocks
Evaluate the results
Just a quick note before we dive in: this article won't provide you with a strategy to start trading tomorrow. Instead, my goal is to show you that sentiment can actually be quantified and utilised in various forms of automation. And here's a little spoiler — the answer is a delightful YES!
Let’s start coding
First, there are the boring imports and parameters. We will initially use Apple stock prices for the last two years.
import requests
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import requests_cache
import os
from datetime import datetime, timedelta
import seaborn as sns
api_token = os.environ.get('EODHD_API_TOKEN')
requests_cache.install_cache('cache')
TICKER = 'AAPL.US'
# Using a 2-year period for analysis
end_date = datetime.now()
start_date = end_date - timedelta(days=730)
from_date = start_date.strftime('%Y-%m-%d')
to_date = end_date.strftime('%Y-%m-%d')
In the second step, we will retrieve the price data from the EODHD API.
def get_price_data(ticker, from_date, to_date):
    url = f'https://eodhd.com/api/eod/{ticker}'
    query = {'api_token': api_token, 'fmt': 'json', 'from': from_date, 'to': to_date}
    response = requests.get(url, params=query)
    if response.status_code != 200:
        print(f"Error retrieving price data: {response.status_code}")
        print(response.text)
        return None
    price_data = response.json()
    price_df = pd.DataFrame(price_data)
    # Convert date string to datetime for easier manipulation
    price_df['date'] = pd.to_datetime(price_df['date'])
    # Set date as index
    price_df.set_index('date', inplace=True)
    # Sort by date (ascending)
    price_df.sort_index(inplace=True)
    return price_df
price_df = get_price_data(TICKER, from_date, to_date)
price_df['pct_change'] = price_df['adjusted_close'].pct_change() * 100
price_df.head()
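Before moving on, it is worth a quick sanity check that the request returned the full window. This is a small addition of mine, not part of the original walkthrough:
# Optional sanity check on the returned price data
print(f"Rows: {len(price_df)}")
print(f"From {price_df.index.min().date()} to {price_df.index.max().date()}")
print(f"Missing adjusted_close values: {price_df['adjusted_close'].isna().sum()}")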

Then, using the EODHD sentiment API, we'll gather the sentiment data.
def get_sentiment_data(ticker, from_date, to_date):
    url = 'https://eodhd.com/api/sentiments'
    query = {'api_token': api_token, 's': ticker, 'from': from_date, 'to': to_date, 'fmt': 'json'}
    response = requests.get(url, params=query)
    if response.status_code != 200:
        print(f"Error retrieving sentiment data: {response.status_code}")
        print(response.text)
        return None
    sentiment_data = response.json()
    # Access the sentiment data using the ticker symbol as a key
    sentiment_df = pd.DataFrame(sentiment_data[ticker])
    # Convert date string to datetime
    sentiment_df['date'] = pd.to_datetime(sentiment_df['date'])
    # Set date as index
    sentiment_df.set_index('date', inplace=True)
    # Sort by date (ascending)
    sentiment_df.sort_index(inplace=True)
    # Rename column normalized to sentiment
    sentiment_df.rename(columns={'normalized': 'sentiment'}, inplace=True)
    return sentiment_df
sentiment_df = get_sentiment_data(TICKER, from_date, to_date)
sentiment_df.head()
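Sentiment scores will not necessarily exist for every trading day, so it helps to check the overlap with the price index before merging. Another small check of mine, reusing the two dataframes from above:
# How many trading days also have a sentiment score?
common_dates = price_df.index.intersection(sentiment_df.index)
print(f"Trading days: {len(price_df)}, sentiment days: {len(sentiment_df)}")
print(f"Days with both price and sentiment: {len(common_dates)}")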

We'll be using the daily sentiment column as our key metric. Next, I'll combine everything into a single dataframe and fit a trend line. To make the plot clearer, I'll also remove outliers so the trend line stands out.
merged_df = pd.merge(
    price_df[['adjusted_close', 'pct_change']],
    sentiment_df[['sentiment']],
    left_index=True,
    right_index=True,
    how='inner'
)
# Rename columns for clarity
merged_df.columns = ['price', 'price_pct_change', 'sentiment']
clean_df = merged_df[['price_pct_change', 'sentiment']].replace([np.inf, -np.inf], np.nan).dropna()
# Calculate IQR and bounds for both variables
def remove_outliers(df, columns):
    df_clean = df.copy()
    for column in columns:
        Q1 = df[column].quantile(0.25)
        Q3 = df[column].quantile(0.75)
        IQR = Q3 - Q1
        lower_bound = Q1 - 1.5 * IQR
        upper_bound = Q3 + 1.5 * IQR
        df_clean = df_clean[df_clean[column].between(lower_bound, upper_bound)]
    return df_clean
# Apply outlier removal
clean_df = remove_outliers(clean_df, ['price_pct_change', 'sentiment'])
# Create a scatter plot to visualize the relationship
plt.figure(figsize=(10, 6))
plt.scatter(clean_df['sentiment'], clean_df['price_pct_change'], alpha=0.6)
plt.xlabel('Sentiment')
plt.ylabel('Daily Price Percentage Change (%)')
plt.grid(True)
# Add a trend line
if len(clean_df) > 1:  # Only add trend line if we have enough data points
    try:
        z = np.polyfit(clean_df['sentiment'], clean_df['price_pct_change'], 1)
        p = np.poly1d(z)
        plt.plot(sorted(clean_df['sentiment']), p(sorted(clean_df['sentiment'])), "r--", alpha=0.8)
    except np.linalg.LinAlgError as e:
        print(f"Could not fit trend line: {e}")
plt.tight_layout()
plt.show()

Interestingly, although the trend line is not steep, it clearly slopes upward: more positive news sentiment is associated with larger daily price changes. Good news tends to come with better returns.
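To put a number on that slope, a one-line correlation check on the cleaned data is enough. This is a quick addition of mine that simply reuses clean_df from above:
# Pearson correlation between daily sentiment and daily % price change
corr = clean_df['sentiment'].corr(clean_df['price_pct_change'])
print(f"Correlation between sentiment and daily returns: {corr:.3f}")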
However, this information only becomes truly valuable when we can bring it to life. So, let’s dive into the strategy!
Moving Average Sentiment Strategy
The strategy is deliberately straightforward: it applies fast and slow moving averages to both the price and the sentiment score. I'll outline two variations for you to consider.
I will name the first one LONGSHORT. Practically:
When both the price and sentiment fast moving averages are above their slow moving averages (price and sentiment agree), I go long.
When both fast moving averages are below their slow moving averages, I go short.
In all other cases, where price and sentiment disagree, I stay neutral.
The second variation is named ALWAYSLONG_OUTWHEN_NEGSENT, and it's even simpler: I stay long regardless of price as long as the sentiment fast moving average is above the slow one, and go flat whenever it drops below.
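To make the rules concrete before the full implementation, here is a minimal standalone sketch of just the signal logic. It is my own illustration with made-up inputs; the complete function below is what the backtest actually uses.
# Minimal illustration of the two signal rules (toy inputs)
import numpy as np
import pandas as pd

price_up = pd.Series([True, True, False, False])   # is the price fast MA above its slow MA?
sent_up = pd.Series([True, False, True, False])    # is the sentiment fast MA above its slow MA?

# LONGSHORT: +1 when both agree up, -1 when both agree down, 0 otherwise
longshort = np.where(price_up & sent_up, 1, np.where(~price_up & ~sent_up, -1, 0))

# ALWAYSLONG_OUTWHEN_NEGSENT: long whenever sentiment is up, flat otherwise
alwayslong = np.where(sent_up, 1, 0)

print(longshort)   # [ 1  0  0 -1]
print(alwayslong)  # [1 0 1 0]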
We will define two functions. The first calculates the strategy's equity curve:
def calculate_equity_curve(prices, signals):
    # Ensure index alignment
    signals = signals.shift(1)  # Shift signals to align with the period for trade execution
    signals = signals.reindex(prices.index).fillna(0)
    # Calculate percentage changes
    pct_changes = prices.pct_change().fillna(0)
    # Calculate equity curve
    equity_curve = (1 + pct_changes * signals).cumprod() * 100
    return equity_curve
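Note the shift(1): a signal generated on day t is only traded on day t+1, which avoids look-ahead bias. A tiny toy example of my own (it just calls the function above) makes the effect visible:
# Toy check: a signal set on day 2 only earns returns from day 3 onwards
toy_prices = pd.Series([100, 110, 121, 133.1],
                       index=pd.date_range('2024-01-01', periods=4))
toy_signals = pd.Series([0, 1, 1, 1], index=toy_prices.index)
print(calculate_equity_curve(toy_prices, toy_signals))
# Roughly 100, 100, 110, 121: day 2's +10% is missed, days 3 and 4 are captured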
And then the actual strategy:
def analyze_ma_crossover_strategy(ticker, fast_window, slow_window, from_date, to_date, strategy_type="LONGSHORT"):
    # Retrieve price data
    price_df = get_price_data(ticker, from_date, to_date)
    if price_df is None or len(price_df) == 0:
        print(f"Error: Could not retrieve price data for {ticker}")
        return None
    # Retrieve sentiment data
    sentiment_df = get_sentiment_data(ticker, from_date, to_date)
    if sentiment_df is None or len(sentiment_df) == 0:
        print(f"Error: Could not retrieve sentiment data for {ticker}")
        return None
    # Create a combined dataframe
    df = pd.DataFrame(index=price_df.index)
    df['adjusted_close'] = price_df['adjusted_close']
    # Merge sentiment data (may have different dates)
    df = df.join(sentiment_df['sentiment'], how='left')
    # Forward fill missing sentiment values
    df['sentiment'] = df['sentiment'].ffill()
    # Calculate moving averages for price
    df['price_fast_ma'] = df['adjusted_close'].rolling(window=fast_window).mean()
    df['price_slow_ma'] = df['adjusted_close'].rolling(window=slow_window).mean()
    # Calculate moving averages for sentiment
    df['sentiment_fast_ma'] = df['sentiment'].rolling(window=fast_window).mean()
    df['sentiment_slow_ma'] = df['sentiment'].rolling(window=slow_window).mean()
    # Generate signals
    if strategy_type == "LONGSHORT":
        # 1 when fast MA > slow MA for both price and sentiment
        # -1 when fast MA < slow MA for both price and sentiment
        # 0 otherwise
        df['price_signal'] = np.where(df['price_fast_ma'] > df['price_slow_ma'], 1, -1)
        df['sentiment_signal'] = np.where(df['sentiment_fast_ma'] > df['sentiment_slow_ma'], 1, -1)
        df['signal'] = np.where((df['price_signal'] == 1) & (df['sentiment_signal'] == 1), 1,
                                np.where((df['price_signal'] == -1) & (df['sentiment_signal'] == -1), -1, 0))
    elif strategy_type == 'ALWAYSLONG_OUTWHEN_NEGSENT':
        # Always 1 (long), except when the sentiment signal is negative, then 0 (flat)
        df['price_signal'] = pd.Series(1, index=df.index)
        df['sentiment_signal'] = np.where(df['sentiment_fast_ma'] > df['sentiment_slow_ma'], 1, -1)
        df['signal'] = np.where((df['sentiment_signal'] == -1), 0, df['price_signal'])
    else:
        raise ValueError("Invalid strategy type")
    # Calculate returns
    df['pct_change'] = df['adjusted_close'].pct_change().fillna(0)
    # Calculate equity curves using the calculate_equity_curve function
    # For buy and hold, we use a signal of 1 (always long)
    buy_hold_signal = pd.Series(1, index=df.index)
    df['buy_hold_equity'] = calculate_equity_curve(df['adjusted_close'], buy_hold_signal) / 100
    # For strategy equity, we use the generated signals
    df['strategy_equity'] = calculate_equity_curve(df['adjusted_close'], df['signal']) / 100
    return df
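The walkthrough below compares only the final equity values. If you want a slightly fuller picture, a small helper along these lines (my addition; it works on any of the equity columns result_df contains) reports total return and maximum drawdown:
# Optional helper: summarize an equity curve
def summarize_equity(equity):
    total_return = equity.iloc[-1] / equity.iloc[0] - 1
    drawdown = equity / equity.cummax() - 1
    return {'total_return': round(total_return, 4), 'max_drawdown': round(drawdown.min(), 4)}
After running the strategy, you could compare summarize_equity(result_df['strategy_equity']) with summarize_equity(result_df['buy_hold_equity']).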
Let’s try with a fast window of 5 and a slow window of 15 using LONGSHORT, then plot the results.
def plot_strategy_results(df):
    # Create a figure with single plot for equity curves
    plt.figure(figsize=(12, 6))
    # Plot equity curves
    plt.plot(df.index, df['buy_hold_equity'], label='Buy & Hold')
    plt.plot(df.index, df['strategy_equity'], label='MA Crossover Strategy')
    plt.title('Equity Curves')
    plt.ylabel('Equity (Starting at 1)')
    plt.xlabel('Date')
    plt.legend()
    plt.grid(True)
    plt.tight_layout()
    plt.show()
result_df = analyze_ma_crossover_strategy(TICKER, fast_window=5, slow_window=15, from_date=from_date, to_date=to_date, strategy_type="LONGSHORT")
plot_strategy_results(result_df)

The results are disappointing. The strategy missed the mid-2024 uptrend and ultimately ended up at a loss.
Let's reduce the windows to 1 and 5, utilising the second variation of the strategy.
result_df = analyze_ma_crossover_strategy(TICKER, fast_window=1, slow_window=5, from_date=from_date, to_date=to_date, strategy_type="ALWAYSLONG_OUTWHEN_NEGSENT")
plot_strategy_results(result_df)

This time the strategy comes out well ahead. Most of the edge comes from being flat in the first days after the US tariff announcements, when buy-and-hold took significant losses; the strategy then capitalised on the positive sentiment in the following days and captured substantial returns.
Let’s overfit our parameters!
Yes, overfitting is bad, I know. Still, it can reveal patterns that support our assumption: that news sentiment can be quantified and acted on.
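If you want to guard against overfitting properly, the simplest hedge is to tune the parameters on one slice of the period and evaluate them on the rest. A minimal sketch of such a split, my own suggestion rather than something the grid search below does:
# One possible in-sample / out-of-sample split (not used below)
split_date = (start_date + timedelta(days=365)).strftime('%Y-%m-%d')
insample_from, insample_to = from_date, split_date
outsample_from, outsample_to = split_date, to_date
# Pick fast/slow windows on the in-sample range, then re-run
# analyze_ma_crossover_strategy with those parameters on the out-of-sample range.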
tickers = ['AAPL.US', 'MSFT.US', 'NVDA.US', 'GOOGL.US', 'META.US', 'AMZN.US', 'TSLA.US', 'HD.US', 'JNJ.US', 'UNH.US', 'PFE.US', 'MRK.US', 'JPM.US', 'V.US', 'MA.US', 'PG.US', 'KO.US', 'PEP.US', 'XOM.US', 'CVX.US', 'NEE.US', 'DUK.US', 'LIN.US', 'BA.US', 'CAT.US', 'RTX.US', 'SPG.US', 'AMT.US', 'MO.US']
# Create a list to store the data
data = []
fast_windows = range(1, 6)      # 1 to 5
slow_windows = range(5, 21, 5)  # 5, 10, 15, 20
# Loop through each ticker
for ticker in tickers:
    # Initialize variables to track the best parameters and results for this ticker
    best_fast_window = None
    best_slow_window = None
    best_strategy_type = None
    best_return = -float('inf')
    best_buy_hold_return = None
    for fast_window in fast_windows:
        for slow_window in slow_windows:
            for strategy_type in ["LONGSHORT", "ALWAYSLONG_OUTWHEN_NEGSENT"]:
                if fast_window >= slow_window:
                    continue
                result_df = analyze_ma_crossover_strategy(ticker, fast_window, slow_window,
                                                          from_date=from_date, to_date=to_date,
                                                          strategy_type=strategy_type)
                if result_df is not None:
                    strategy_return = result_df['strategy_equity'].iloc[-1]
                    buy_hold_return = result_df['buy_hold_equity'].iloc[-1]
                    if strategy_return > best_return:
                        best_return = strategy_return
                        best_fast_window = fast_window
                        best_slow_window = slow_window
                        best_strategy_type = strategy_type
                        best_buy_hold_return = buy_hold_return
    # Instead of concatenating each time, append to the list
    data.append({
        'ticker': ticker,
        'fast_window': best_fast_window,
        'slow_window': best_slow_window,
        'strategy_type': best_strategy_type,
        'strategy_return': best_return,
        'buy_hold_return': best_buy_hold_return
    })
# Create the DataFrame once with all the data
optimal_params_df = pd.DataFrame(data)
optimal_params_df['str_bnh_diff'] = optimal_params_df['strategy_return'] - optimal_params_df['buy_hold_return']
optimal_params_df
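Before plotting, a quick sort of my own makes it easy to see where the strategy beats buy-and-hold by the widest margin:
# Largest outperformance vs. buy-and-hold first
optimal_params_df.sort_values('str_bnh_diff', ascending=False).head(10)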
And plot them.
plt.figure(figsize=(12, 10))
# Create horizontal bar positions
y_pos = np.arange(len(optimal_params_df))
# Create color array for strategy returns based on comparison with buy & hold
strategy_colors = ['green' if s > b else 'red' for s, b in zip(optimal_params_df['strategy_return'], optimal_params_df['buy_hold_return'])]
# Plot the bars with new colors
plt.barh(y_pos - 0.2, optimal_params_df['strategy_return'], height=0.4, label='Strategy Return', color=strategy_colors)
plt.barh(y_pos + 0.2, optimal_params_df['buy_hold_return'], height=0.4, label='Buy & Hold Return', color='grey')
plt.yticks(y_pos, optimal_params_df['ticker'])
plt.xlabel('Return')
plt.title('Strategy vs Buy & Hold Returns by Ticker')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Most of the optimized strategies performed better than buy-and-hold (green bars). More surprisingly, every asset that showed a negative buy-and-hold result ended up non-negative under the strategy (no losses).
One last thing before we conclude. This time I won't optimise; I'll keep the results of every parameter combination in the dataframe and then plot a heatmap comparing the fast and slow windows.
tickers = ['AAPL.US', 'MSFT.US', 'NVDA.US', 'GOOGL.US', 'META.US', 'AMZN.US', 'TSLA.US', 'HD.US', 'JNJ.US', 'UNH.US', 'PFE.US', 'MRK.US', 'JPM.US', 'V.US', 'MA.US', 'PG.US', 'KO.US', 'PEP.US', 'XOM.US', 'CVX.US', 'NEE.US', 'DUK.US', 'LIN.US', 'BA.US', 'CAT.US', 'RTX.US', 'SPG.US', 'AMT.US', 'MO.US']
# Create a list to store the data
data = []
fast_windows = range(1, 6)      # 1 to 5
slow_windows = range(5, 21, 5)  # 5, 10, 15, 20
# Loop through each ticker and keep every parameter combination
for ticker in tickers:
    for fast_window in fast_windows:
        for slow_window in slow_windows:
            for strategy_type in ["LONGSHORT", "ALWAYSLONG_OUTWHEN_NEGSENT"]:
                if fast_window >= slow_window:
                    continue
                result_df = analyze_ma_crossover_strategy(ticker, fast_window, slow_window,
                                                          from_date=from_date, to_date=to_date,
                                                          strategy_type=strategy_type)
                if result_df is not None:
                    strategy_return = result_df['strategy_equity'].iloc[-1]
                    buy_hold_return = result_df['buy_hold_equity'].iloc[-1]
                    data.append({
                        'ticker': ticker,
                        'fast_window': fast_window,
                        'slow_window': slow_window,
                        'strategy_type': strategy_type,
                        'strategy_return': strategy_return,
                        'buy_hold_return': buy_hold_return
                    })
# Create the DataFrame once with all the data
all_runs_df = pd.DataFrame(data)
all_runs_df['str_bnh_diff'] = all_runs_df['strategy_return'] - all_runs_df['buy_hold_return']
# First, let's create the aggregated data
grouped_df = all_runs_df.groupby(['fast_window', 'slow_window', 'strategy_type']).agg({
    'strategy_return': 'mean',
    'buy_hold_return': 'mean',
}).reset_index()
# Calculate the difference for coloring
grouped_df['str_bnh_diff'] = grouped_df['strategy_return'] - grouped_df['buy_hold_return']
plt.figure(figsize=(12, 8))
# Pivot the data to create a matrix suitable for heatmap
heatmap_data = grouped_df.pivot_table(
    index='fast_window',
    columns='slow_window',
    values='str_bnh_diff',
    aggfunc='mean'
)
# Create heatmap
sns.heatmap(heatmap_data,
            annot=True,
            fmt='.2f',
            cmap='RdYlGn',
            center=0,
            cbar_kws={'label': 'Strategy Return - Buy & Hold Return'}
)
plt.title('Strategy Performance by Window Combinations')
plt.xlabel('Slow Window')
plt.ylabel('Fast Window')
plt.tight_layout()
plt.show()

From the heatmap, we can draw two conclusions:
Even though the results were quite promising when we optimized the fast and slow windows for each stock separately, on average every window combination underperforms buy-and-hold. This suggests that some stocks are far more sensitive to news than others.
What is also very clear is that a fast window of 1, i.e., using only the previous day's sentiment, consistently beats the larger fast windows.
Let’s conclude with some food for thought
Sentiment scores derived from news and social media can be quantified and used in trading strategies, and they show a positive correlation with daily price changes.
Customising moving average windows for individual stocks frequently produced superior outcomes compared to a buy-and-hold strategy, particularly for assets with negative returns.
Some stocks react more strongly to news events than others.
News travels fast. The best returns came from comparing yesterday's sentiment with the average of the previous 5 days.
Major shocks like the US tariff announcements showed up in the EODHD sentiment data and kept the strategy out of the market.
Here are some ideas to explore if you want to dive deeper into this subject:
Explore how past sentiments can help us understand returns over several days, rather than focusing on just a single day.
Examine the volatility of sentiment and whether it is related to trading volume (a starting sketch follows this list).
Check whether ranging (sideways) periods coincide with flat news sentiment.
Try to identify patterns in which assets are more sensitive to news than others, and why.
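For the volume idea above, a rolling standard deviation of the sentiment score compared against volume is a reasonable starting point. A minimal sketch, assuming price_df and sentiment_df for a ticker are loaded as earlier in the article and that the EOD response includes a volume column:
# Rolling sentiment volatility vs. trading volume (exploratory sketch)
vol_df = price_df[['volume']].join(sentiment_df[['sentiment']], how='inner')
vol_df['sentiment_vol'] = vol_df['sentiment'].rolling(window=10).std()
print(vol_df[['sentiment_vol', 'volume']].corr())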