Statistical Arbitrage in Python: How Ken Griffin Built a Fortune
Statistical arbitrage is a trading strategy that exploits short-term pricing inefficiencies between related financial assets. It's the approach that Ken Griffin, the billionaire founder of Citadel, leveraged on his way to a net worth of roughly $43.9 billion. While Griffin's success rests on a complex web of resources and expertise, the core concepts of statistical arbitrage are accessible to anyone. In this article, I'll break down stat arb step by step and show you how to get started with it in Python.
What Is Statistical Arbitrage?
At its heart, stat arb is about finding patterns in asset prices that temporarily drift apart but are likely to converge again. Imagine two tech stocks — say, Apple and Microsoft — whose prices usually move together because they’re in the same sector. If Apple suddenly drops while Microsoft stays steady, stat arb bets that Apple will “catch up” or Microsoft will “correct down.” You lock in a profit when prices realign by going long (buying) the underpriced asset and shorting (selling) the overpriced one.
The beauty of stat arb lies in its reliance on statistical relationships rather than gut feelings. It’s data-driven, systematic, and perfect for coders. Here’s how to build a basic stat arb strategy in Python.
Step 1: Select a Basket of Assets
First, you need a group of related assets. For this example, let’s pick three tech giants: Apple (AAPL), Microsoft (MSFT), and Google (GOOGL). These stocks tend to co-move due to shared market influences like sector trends or economic conditions.
We’ll use Python’s yfinance library to grab historical price data. If you don’t have it installed, run pip install yfinance first.
import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Define the basket of assets
tickers = ["AAPL", "MSFT", "GOOGL"]
# Download historical Adjusted Close prices (roughly the last two years)
# auto_adjust=False keeps the "Adj Close" column; newer yfinance versions adjust prices by default
data = yf.download(tickers, start="2023-01-01", end="2025-02-26", auto_adjust=False)["Adj Close"]
# Check the data
print(data.head())
This pulls daily adjusted closing prices into a pandas DataFrame. Each column represents a stock, and each row is a trading day.
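Before modeling anything, it's worth a quick sanity check that every ticker downloaded cleanly. A minimal sketch operating on the data DataFrame above:
# Quick sanity check on the download: size, missing values, and date range
print(data.shape)               # rows = trading days, columns = tickers
print(data.isna().sum())        # missing values per ticker
data = data.dropna(how="any")   # keep only days where all three tickers traded
print(data.index.min(), "to", data.index.max())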
Step 2: Model the Portfolio Relationship
Next, we need a “fair value” for the portfolio — think of it as the baseline these stocks should hover around. A simple way is to normalize the prices (scale them to a common range) and take a rolling average.
Here’s how:
# Normalize prices (scale between 0 and 1 for comparability)
normalized_prices = (data - data.min()) / (data.max() - data.min())
# Calculate the portfolio's fair value as a rolling mean (20-day window)
window = 20
portfolio_fair_value = normalized_prices.mean(axis=1).rolling(window=window).mean()
# Plot it
plt.plot(portfolio_fair_value, label="Portfolio Fair Value", color="black")
for ticker in tickers:
    plt.plot(normalized_prices[ticker], label=ticker)
plt.legend()
plt.title("Normalized Prices and Portfolio Fair Value")
plt.show()
The black line is our portfolio’s fair value — a smoothed average of the three stocks. Each stock’s normalized price dances around it. When a stock strays too far, that’s our opportunity.
Step 3: Identify Mispricings
Now, calculate how far each stock deviates from the fair value at any given time:
# Calculate deviations for each stock
deviations = normalized_prices.sub(portfolio_fair_value, axis=0)
# Plot deviations
for ticker in tickers:
    plt.plot(deviations[ticker], label=f"{ticker} Deviation")
plt.axhline(0, color="black", linestyle="--")
plt.legend()
plt.title("Deviations from Fair Value")
plt.show()
Positive deviations mean a stock is overpriced relative to the portfolio; negative deviations mean it’s underpriced. Big swings signal potential trades.
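One common refinement (not part of the original walkthrough) is to convert each deviation into a z-score by dividing by its own rolling standard deviation, so that "big swings" are judged relative to each stock's recent noise. A minimal sketch, reusing the 20-day window defined above:
# Optional: z-score the deviations so thresholds adapt to each stock's volatility
rolling_std = deviations.rolling(window=window).std()
zscores = deviations / rolling_std

for ticker in tickers:
    plt.plot(zscores[ticker], label=f"{ticker} z-score")
plt.axhline(2, color="grey", linestyle="--")
plt.axhline(-2, color="grey", linestyle="--")
plt.legend()
plt.title("Deviation Z-Scores (dashed lines at +/-2)")
plt.show()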
Step 4: Generate Trading Signals
Let’s set thresholds to decide when to trade. If a stock’s deviation exceeds, say, 0.1 (overpriced), we short it. If it drops below -0.1 (underpriced), we go long. The bet is that prices will revert to the mean.
# Define thresholds
upper_threshold = 0.1
lower_threshold = -0.1
# Generate signals (1 = long, -1 = short, 0 = hold)
signals = pd.DataFrame(index=data.index, columns=tickers)
for ticker in tickers:
    signals[ticker] = np.where(deviations[ticker] > upper_threshold, -1,  # Short
                      np.where(deviations[ticker] < lower_threshold, 1, 0))  # Long
# Preview signals
print(signals.tail())
This creates a DataFrame of trading signals: 1 (buy), -1 (sell), or 0 (hold). For example, if AAPL’s deviation dips below -0.1, we buy, expecting it to rise back toward the fair value.
Step 5: Backtest and Deploy
Finally, let’s calculate returns. For simplicity, assume we invest equally in each trade and ignore transaction costs (in reality, you’d factor these in). We’ll use daily returns multiplied by our signals:
# Calculate daily returns
daily_returns = data.pct_change().dropna()
# Calculate strategy returns (signals aligned with returns)
strategy_returns = (signals.shift(1) * daily_returns).sum(axis=1) # Sum across stocks
# Cumulative returns
cumulative_returns = (1 + strategy_returns).cumprod()
# Plot
plt.plot(cumulative_returns, label="Strategy Cumulative Returns")
plt.title("Statistical Arbitrage Backtest")
plt.legend()
plt.show()
# Total return
total_return = cumulative_returns.iloc[-1] - 1
print(f"Total Strategy Return: {total_return:.2%}")
The plot shows how your money grows (or shrinks) over time. A positive total return means the strategy worked — at least historically.
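Total return alone hides both trading costs and risk. Here is a rough way to fold them in; this is a sketch in which the 5 bps per position change and 252 trading days per year are assumptions, and note that the full-sample min-max normalization above introduces some look-ahead bias into any strict backtest:
# Rough cost adjustment and summary risk statistics for the backtest above
cost_per_change = 0.0005                                  # assumed 5 bps per position change
position_changes = signals.diff().abs().sum(axis=1)       # how many legs flipped each day
net_returns = strategy_returns - cost_per_change * position_changes

sharpe = np.sqrt(252) * net_returns.mean() / net_returns.std()
cum_net = (1 + net_returns).cumprod()
max_drawdown = (cum_net / cum_net.cummax() - 1).min()

print(f"Annualized Sharpe (after assumed costs): {sharpe:.2f}")
print(f"Max drawdown (after assumed costs): {max_drawdown:.2%}")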
To deploy this live, you’d connect to a real-time data feed (e.g., via an API like Alpaca or Interactive Brokers) and execute trades automatically. But that’s a topic for another day.
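To make the hand-off from backtest to live trading a little more concrete, here is the rough shape such a loop could take. Everything broker-related here (broker.get_latest_prices, broker.submit_order) is a hypothetical placeholder, not Alpaca's or Interactive Brokers' actual API; a real integration would substitute the broker SDK's own calls and add order tracking, position limits, and error handling.
# Hypothetical deployment loop (sketch only). broker.get_latest_prices and
# broker.submit_order are placeholder names, not a real broker API.
import time
import pandas as pd

def run_live(broker, tickers, upper=0.1, lower=-0.1, window=20, poll_seconds=60):
    history = []  # list of {ticker: price} snapshots
    while True:
        prices = broker.get_latest_prices(tickers)            # hypothetical call
        history.append(prices)
        if len(history) > 2 * window:
            hist = pd.DataFrame(history, columns=tickers).astype(float)
            norm = (hist - hist.min()) / (hist.max() - hist.min())
            fair = norm.mean(axis=1).rolling(window).mean()
            dev = norm.iloc[-1] - fair.iloc[-1]                # latest deviation per ticker
            for t in tickers:
                if dev[t] > upper:
                    broker.submit_order(t, side="sell")        # hypothetical call
                elif dev[t] < lower:
                    broker.submit_order(t, side="buy")         # hypothetical call
        time.sleep(poll_seconds)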
A Real-World Scenario: Coca-Cola vs. PepsiCo
In early 2023, historical data showed that Coca-Cola and PepsiCo, two stocks with a historically high correlation (often above 0.8), occasionally diverged significantly. For example, suppose Coca-Cola's stock drops 2% on a given day because of a large sell order (a liquidity event) while PepsiCo stays flat. The divergence tends to be temporary, since the two prices typically revert to their long-term relationship within days. A trader needs a method to profit from this while managing risks such as transaction costs or prolonged divergence.
This type of scenario is frequently referenced in trading literature, such as discussions on pairs trading (a form of stat arb) in equity markets, and aligns with real-world inefficiencies hedge funds exploit.
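That correlation figure is easy to check yourself. A minimal sketch, using the same yfinance data source as the rest of the article, that plots the 60-day rolling correlation of KO and PEP daily returns:
import yfinance as yf
import matplotlib.pyplot as plt

# Rolling 60-day correlation of KO and PEP daily returns during 2023
pair = yf.download(["KO", "PEP"], start="2023-01-01", end="2023-12-31", auto_adjust=False)["Adj Close"]
returns = pair.pct_change().dropna()
rolling_corr = returns["KO"].rolling(60).corr(returns["PEP"])

rolling_corr.plot(title="60-Day Rolling Correlation: KO vs. PEP")
plt.axhline(0.8, linestyle="--", color="grey")
plt.show()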
Now I'll work through this scenario using statistical arbitrage, with a step-by-step Python implementation that mirrors the approach above but is tailored to this pair.
Solving the Problem with Statistical Arbitrage
Statistical arbitrage leverages statistical relationships to exploit mean-reverting price differences. Here’s how we can apply it to the Coca-Cola vs. PepsiCo divergence:
Step 1: Select a Basket of Assets
We’ll use Coca-Cola (KO) and PepsiCo (PEP) as our pair due to their historical correlation in the beverage sector. We’ll fetch their historical price data.
import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Define the basket
tickers = ["KO", "PEP"]
# Fetch historical data (2023 for illustration); auto_adjust=False keeps the "Adj Close" column
data = yf.download(tickers, start="2023-01-01", end="2023-12-31", auto_adjust=False)["Adj Close"]
print(data.head())
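Correlation alone doesn't guarantee that the spread between the two stocks is mean-reverting. A common complementary check, not part of the original walkthrough, is an Engle-Granger cointegration test from statsmodels (assumes statsmodels is installed):
# Optional check: Engle-Granger cointegration test on the KO/PEP price series.
# A small p-value (e.g., < 0.05) suggests the spread tends to mean-revert.
from statsmodels.tsa.stattools import coint

score, pvalue, _ = coint(data["KO"], data["PEP"])
print(f"Cointegration t-statistic: {score:.2f}, p-value: {pvalue:.3f}")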
Step 2: Model the Portfolio Relationship
We’ll normalize the prices and calculate a “fair value” using a rolling mean of their spread. The spread is the difference between normalized KO and PEP prices, reflecting their typical co-movement.
# Normalize prices (scale to 0-1)
normalized_prices = (data - data.min()) / (data.max() - data.min())
# Calculate the spread (difference between normalized prices)
spread = normalized_prices["KO"] - normalized_prices["PEP"]
# Fair value as a rolling mean of the spread (20-day window)
window = 20
fair_value = spread.rolling(window=window).mean()
# Plot
plt.plot(spread, label="Spread (KO - PEP)")
plt.plot(fair_value, label="Fair Value (Rolling Mean)", color="black")
plt.legend()
plt.title("Normalized Spread and Fair Value")
plt.show()
Step 3: Identify Mispricings
Calculate deviations of the spread from its fair value. Large deviations indicate trading opportunities.
# Calculate deviations
deviations = spread - fair_value
# Plot deviations
plt.plot(deviations, label="Deviation from Fair Value")
plt.axhline(0, color="black", linestyle="--")
plt.legend()
plt.title("Spread Deviations")
plt.show()
For our problem, if KO drops 2% while PEP holds steady, the spread widens negatively (e.g., deviation < -0.1), signaling KO is underpriced relative to PEP.
Step 4: Generate Trading Signals
Set thresholds to trigger trades:
Long KO, Short PEP: When deviation < -0.1 (KO underpriced).
Short KO, Long PEP: When deviation > 0.1 (KO overpriced).
# Define thresholds
upper_threshold = 0.1
lower_threshold = -0.1
# Generate signals (1 = long KO/short PEP, -1 = short KO/long PEP, 0 = hold)
signals = pd.DataFrame(index=data.index, columns=["KO", "PEP"])
signals["KO"] = np.where(deviations > upper_threshold, -1, # Short KO
np.where(deviations < lower_threshold, 1, 0)) # Long KO
signals["PEP"] = -signals["KO"] # Opposite position for PEP
# Preview signals
print(signals.tail())
Step 5: Backtest and Deploy
Calculate returns based on the signals and assess profitability.
# Daily returns
daily_returns = data.pct_change().dropna()
# Strategy returns (signals applied to KO and PEP returns)
strategy_returns = (signals.shift(1) * daily_returns).sum(axis=1)
# Cumulative returns
cumulative_returns = (1 + strategy_returns).cumprod()
# Plot
plt.plot(cumulative_returns, label="Strategy Cumulative Returns")
plt.title("Statistical Arbitrage Backtest: KO vs. PEP")
plt.legend()
plt.show()
# Total return
total_return = cumulative_returns.iloc[-1] - 1
print(f"Total Strategy Return: {total_return:.2%}")
How This Solves the Problem
Exploiting the Divergence: When KO drops 2% (e.g., deviation < -0.1), the strategy buys KO and shorts PEP, betting on KO’s recovery relative to PEP. If the spread reverts within days (as historical correlation suggests), the long position gains while the short hedges market risk.
Profitability: Even small price corrections (e.g., KO rising 1% while PEP stays flat) yield a net profit, which scales with position size.
Risk Management: The market-neutral structure (long one leg, short the other) reduces exposure to broad market moves. Mean reversion is still the core assumption, though, so prolonged divergence remains a risk; a simple time-stop sketch for that case follows below.
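One basic way to cap the prolonged-divergence risk is a time stop: if a pair position has stayed open for some maximum number of days without the signal changing, flatten it. A minimal sketch, where the 10-day limit is an arbitrary assumption and signals is the KO/PEP signal DataFrame built in Step 4:
# Time stop: after max_holding_days of an unchanged non-zero signal, force the
# position flat until the signal itself changes. Works on a copy of signals.
max_holding_days = 10   # assumption: exit pairs that haven't converged within 10 days

capped_signals = signals.copy()
days_held = 0
for i in range(1, len(signals)):
    same_side = signals["KO"].iloc[i] != 0 and signals["KO"].iloc[i] == signals["KO"].iloc[i - 1]
    days_held = days_held + 1 if same_side else 0
    if days_held >= max_holding_days:
        capped_signals.iloc[i] = 0   # flatten both legs

# Re-running the Step 5 backtest with capped_signals shows the cost/benefit of the stop.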
Example Outcome
In a hypothetical 2023 scenario, suppose KO drops from $60 to $58.80 (-2%) on a liquidity shock while PEP stays at $170. The normalized spread deviates below -0.1, so the strategy goes long KO at $58.80 and short PEP at $170. If KO rebounds to $59.64 (about +1.4%) and PEP slips to $169 (about -0.6%) over the next two days, the profit on 100 shares per leg is:
KO: ($59.64 - $58.80) × 100 shares = $84
PEP: ($170 - $169) × 100 shares = $100
Total = $184 (minus fees).
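A quick check of that arithmetic in Python; the prices and the 100-share position size are the hypothetical values from the example above:
# Verify the hypothetical two-leg P&L from the example (100 shares per leg)
shares = 100
ko_pnl = (59.64 - 58.80) * shares        # long KO: buy at 58.80, sell at 59.64
pep_pnl = (170.00 - 169.00) * shares     # short PEP: sell at 170, buy back at 169
print(f"KO P&L: ${ko_pnl:.2f}, PEP P&L: ${pep_pnl:.2f}, Total: ${ko_pnl + pep_pnl:.2f}")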
Why Stat Arb Worked for Ken Griffin
Ken Griffin didn’t just code a script and call it a day. Citadel uses stat arb at scale — hundreds of assets, millisecond data, and teams of PhDs refining models like PCA or machine learning predictors. They also manage risk obsessively, hedging against market crashes or unexpected divergences.
For us mortals, this basic version is a starting point. It’s not a get-rich-quick scheme — markets are noisy, and costs like fees or slippage can eat profits. But it’s a glimpse into the quantitative wizardry that built Griffin’s empire.
Next Steps
Want to level up? Try these:
Use PCA instead of a simple mean for a smarter fair value (check sklearn.decomposition.PCA; a minimal sketch follows this list).
Add risk management (e.g., stop-losses or position sizing).
Test more assets or shorter timeframes (e.g., minute-by-minute data).
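As a sketch of the first idea: instead of averaging normalized prices, fit a one-component PCA and treat each stock's factor-implied price as its fair value. This assumes the normalized_prices DataFrame from Step 2 of the AAPL/MSFT/GOOGL walkthrough, though the same code works for any basket.
# Sketch: PCA-based fair value for a basket of normalized prices
import pandas as pd
from sklearn.decomposition import PCA

clean = normalized_prices.dropna()
pca = PCA(n_components=1)
factor = pca.fit_transform(clean)                  # the dominant common movement across the basket
reconstructed = pca.inverse_transform(factor)      # factor-implied normalized prices
fair_value_pca = pd.DataFrame(reconstructed, index=clean.index, columns=clean.columns)

# Deviations from the PCA fair value play the same role as the deviations in Step 3
pca_deviations = clean - fair_value_pca
print(pca_deviations.tail())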
Statistical arbitrage blends finance, stats, and coding into a rewarding challenge. With Python and some curiosity, you’re on your way. Who knows? Maybe you’ll be the next Ken Griffin — minus a few billion (for now).
Happy coding!