The HAR-X model for Volatility Trading. A new approach

Introduction

Effective risk management is at the heart of successful investment strategies. Among the various approaches developed over the years, volatility-based strategies stand out for their ability to adjust market exposure based on expected market conditions. The HAR-X strategy represents an advanced implementation of this concept, combining sophisticated volatility modeling with dynamic asset allocation to potentially enhance risk-adjusted returns.

Theoretical Foundation

The HAR-X strategy is built on a fundamental market observation: there exists an inverse relationship between volatility and long-term returns. Historically, periods of extreme market volatility often coincide with poor investment returns, while calmer market environments tend to deliver more consistent positive performance.

This relationship isn’t merely coincidental. During high-volatility periods, market participants typically demand higher risk premiums, leading to lower asset prices. Conversely, low-volatility environments generally reflect investor confidence and stability, creating favorable conditions for asset appreciation.

The strategy leverages this relationship by dynamically adjusting market exposure, increasing allocation during predicted calm periods and reducing it when turbulence is expected.

Key Components of the HAR-X Model

Yang-Zhang Volatility Estimation

At the core of the HAR-X strategy is an accurate estimation of market volatility. Rather than relying on simple historical standard deviation, the strategy employs the Yang-Zhang volatility model, which offers significant advantages over traditional methods.

The Yang-Zhang approach combines three volatility components:

  • Overnight volatility (close-to-open price movements)

  • Intraday volatility (open-to-close price movements)

  • Rogers-Satchell volatility component

This comprehensive calculation captures more nuanced market behavior than standard volatility measures, providing a more accurate risk assessment, especially during periods of market stress or unusual trading patterns.
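For reference, the estimator can be written compactly. The notation here is an assumption of this sketch rather than the article's own: $o_t = \ln(O_t / C_{t-1})$ is the overnight return, $c_t = \ln(C_t / O_t)$ the open-to-close return, $u_t = \ln(H_t / O_t)$ and $d_t = \ln(L_t / O_t)$ the normalized high and low, and $n$ the window length. $\sigma^2_o$ and $\sigma^2_c$ denote the sample variances of $o_t$ and $c_t$ over the window.

```latex
\sigma^2_{YZ} = \sigma^2_{o} + k\,\sigma^2_{c} + (1 - k)\,\sigma^2_{RS},
\qquad
k = \frac{0.34}{1.34 + \frac{n+1}{n-1}}

\sigma^2_{RS} = \frac{1}{n}\sum_{t=1}^{n}\Big[\,u_t\,(u_t - c_t) + d_t\,(d_t - c_t)\Big]
```

The weight $k$ is the constant the code later in the article computes from the window length. Annualizing means multiplying $\sigma_{YZ}$ by $\sqrt{252}$.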

Heterogeneous Autoregressive Model

The strategy uses a Heterogeneous Autoregressive (HAR) modeling framework, extended with exogenous variables (hence the “X” in HAR-X). This approach acknowledges that market participants operate on different time horizons, from day traders to long-term investors.

The HAR-X model incorporates volatility components from multiple timeframes:

  • Daily volatility (immediate market conditions)

  • Weekly volatility (medium-term trends)

  • Monthly volatility (longer-term market environment)

By combining these measures, the model captures the heterogeneous nature of market volatility across different time scales, allowing for more nuanced predictions of future volatility.
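As a rough illustration of the three horizons, the sketch below builds the lagged daily, weekly and monthly regressors from a volatility series. The function name `har_features`, the column names and the 5/22-day windows are assumptions chosen for this example; the input is a synthetic series, not market data.

```python
import numpy as np
import pandas as pd

def har_features(vol, weekly=5, monthly=22):
    """Build lagged HAR regressors from a daily volatility series.

    Every column is shifted by one day so that the row for day t only
    uses information available at the close of day t-1.
    """
    vol = pd.Series(vol, dtype=float)
    return pd.DataFrame({
        "vol_d": vol.shift(1),                          # yesterday's volatility
        "vol_w": vol.rolling(weekly).mean().shift(1),   # average of the previous 5 days
        "vol_m": vol.rolling(monthly).mean().shift(1),  # average of the previous 22 days
    })

# Synthetic series 1, 2, ..., 30 so the averages are easy to check by hand
vol = pd.Series(np.arange(1.0, 31.0))
feats = har_features(vol).dropna()
print(feats.head(3))
```

The first complete row appears on the 23rd observation, because the monthly average needs 22 prior days. A HAR regression then explains the current volatility with these three columns plus a constant.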

VIX Term Structure Integration

To enhance its predictive power, the HAR-X strategy incorporates data from the VIX volatility index and its term structure. This includes:

  • VIX (30-day implied volatility)

  • VIX3M (3-month implied volatility)

  • VIX6M (6-month implied volatility)

  • VIX curve slope (relationship between short and long-term implied volatility)

The VIX term structure provides forward-looking information about market expectations, complementing the backward-looking historical volatility measures. When the curve is inverted (short-term VIX well above the longer-dated indices, also called backwardation), it often indicates acute but potentially temporary market stress. In calmer markets the curve typically slopes upward (contango), with longer-dated implied volatility above the 30-day VIX.
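A minimal sketch of the slope measure, using the same ratio as the code later in the article (VIX / VIX3M - 1). The helper name `vix_curve_slope` and the two quotes are hypothetical values picked for illustration, not market data.

```python
import numpy as np
import pandas as pd

def vix_curve_slope(vix, vix3m):
    """Relative gap between 30-day and 3-month implied volatility.

    Positive: short-term VIX is above VIX3M (backwardation).
    Negative: short-term VIX is below VIX3M (contango).
    """
    return vix / vix3m - 1.0

# Hypothetical quotes for a calm day and a stressed day
quotes = pd.DataFrame(
    {"VIX": [15.0, 30.0], "VIX3M": [18.75, 25.0]},
    index=["calm", "stressed"],
)
quotes["slope"] = vix_curve_slope(quotes["VIX"], quotes["VIX3M"])
quotes["state"] = np.where(quotes["slope"] > 0, "backwardation", "contango")
print(quotes)
```

With these numbers the calm day has a slope of about -0.2 and the stressed day about +0.2, so the sign alone separates the two states.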

Strategy Implementation

The HAR-X strategy translates volatility predictions into concrete investment decisions through a simple yet effective framework:

  1. Low Volatility Environment (below 25th percentile): The strategy adopts a 2x market exposure, effectively leveraging the favorable risk-return characteristics typical of calm market periods.

  2. Medium Volatility Environment (between 25th and 75th percentiles): The strategy maintains normal (1x) market exposure, reflecting balanced risk-return prospects.

  3. High Volatility Environment (above 75th percentile): The strategy reduces market exposure to zero, moving to cash to avoid potential drawdowns associated with turbulent markets.

This tiered approach allows for a systematic, rules-based implementation that removes emotional biases from the investment process. By adjusting exposure based on predictable volatility regimes rather than trying to forecast market direction, the strategy acknowledges the inherent unpredictability of short-term market movements while capitalizing on more reliable volatility patterns.
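The three tiers can be sketched as a vectorized mapping. This is one possible way to write it, not necessarily how any production system would: the function name `tiered_exposure`, the `high_vol_exposure` parameter and the example thresholds (0.10 and 0.25) are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

def tiered_exposure(vol_pred, low, high, high_vol_exposure=0.0):
    """Map predicted volatility to market exposure.

    Below `low`  -> 2.0 (double exposure)
    Above `high` -> `high_vol_exposure` (0.0 means cash)
    Otherwise    -> 1.0 (normal exposure)
    """
    vol_pred = pd.Series(vol_pred, dtype=float)
    exposure = np.select(
        [vol_pred < low, vol_pred > high],
        [2.0, high_vol_exposure],
        default=1.0,
    )
    return pd.Series(exposure, index=vol_pred.index)

# Hypothetical predicted annualized volatilities and thresholds
vol_pred = pd.Series([0.08, 0.12, 0.15, 0.22, 0.35])
positions = tiered_exposure(vol_pred, low=0.10, high=0.25)
print(positions.tolist())
```

In a backtest the thresholds would come from percentiles of the predicted volatility. Computing those percentiles only from data available at each point in time keeps future information out of the signal.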

Performance Characteristics

When properly implemented, the HAR-X strategy typically exhibits several beneficial characteristics:

Reduced Maximum Drawdowns: By moving to cash during high-volatility periods, the strategy often avoids the worst market downturns, preserving capital during market crashes.

Improved Risk-Adjusted Returns: The dynamic allocation approach frequently results in higher Sharpe and Sortino ratios compared to a simple buy-and-hold strategy, delivering more return per unit of risk.

Asymmetric Return Profile: The strategy aims to participate in market upside during favorable conditions while limiting exposure during adverse environments, creating an asymmetric return pattern that can be particularly valuable for risk-conscious investors.

Regime Adaptability: Unlike static allocation approaches, the HAR-X strategy dynamically adapts to changing market environments, making it potentially more resilient across different economic and market cycles.
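To make two of these measures concrete, here is a small sketch of maximum drawdown and an annualized Sharpe ratio computed from simple daily returns. The function names and the toy return series are assumptions for illustration; 252 trading days per year is the usual convention.

```python
import numpy as np
import pandas as pd

def max_drawdown(returns):
    """Largest peak-to-trough loss of the compounded equity curve (a negative number)."""
    equity = (1 + pd.Series(returns, dtype=float)).cumprod()
    return (equity / equity.cummax() - 1).min()

def annualized_sharpe(returns, risk_free_rate=0.0, periods=252):
    """Mean excess return divided by return volatility, scaled to one year."""
    returns = pd.Series(returns, dtype=float)
    excess = returns - risk_free_rate / periods
    return excess.mean() / returns.std() * np.sqrt(periods)

# Toy series: +10%, then -50%, then +20%
toy_returns = [0.10, -0.50, 0.20]
mdd = max_drawdown(toy_returns)
sharpe = annualized_sharpe(toy_returns)
print(f"Max drawdown: {mdd:.2%}, Sharpe: {sharpe:.2f}")
```

The equity curve goes 1.10, 0.55, 0.66, so the worst loss from the 1.10 peak is 50%, and the Sharpe ratio is negative because the average return is negative.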

Practical Considerations

While theoretically sound, several practical factors should be considered when implementing the HAR-X strategy:

Transaction Costs: The strategy involves periodic portfolio rebalancing, which can generate transaction costs. These should be carefully considered, especially for smaller portfolios.

Tax Implications: Frequent rebalancing may have tax consequences in taxable accounts. The strategy might be more suitable for tax-advantaged accounts or for investors with sophisticated tax management approaches.

Implementation Vehicles: The strategy can be implemented using various instruments, including ETFs, futures, or options. The choice of implementation vehicle will affect the strategy’s cost, efficiency, and risk characteristics.

Parameter Sensitivity: The specific percentile thresholds (25th and 75th in our example) can be adjusted based on investor risk tolerance and market conditions. Some implementations might benefit from more or less aggressive thresholds.

Short instead of cash: This one is for the adventurous people out there. Instead of moving to cash, go short when predicted volatility is above the 75th percentile.
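The effect of transaction costs can be approximated by charging a fee on every change in exposure. The sketch below is a simplification under stated assumptions: the function name `apply_transaction_costs` is hypothetical, the 10 basis point cost is a placeholder rather than a real quote, and costs are taken as proportional to the size of the position change.

```python
import pandas as pd

def apply_transaction_costs(position, asset_ret, cost_bps=10.0):
    """Net strategy returns after proportional trading costs.

    `position` is the target exposure decided at each day's close and
    `asset_ret` is the asset's simple daily return. The exposure held
    during day t is the one decided on day t-1.
    """
    position = pd.Series(position, dtype=float)
    asset_ret = pd.Series(asset_ret, dtype=float)
    held = position.shift(1).fillna(0.0)             # start with no position
    gross = held * asset_ret
    turnover = held.diff().abs().fillna(held.abs())  # size of each exposure change
    return gross - turnover * cost_bps / 1e4

# Hypothetical targets: 1x, 2x, 2x, then cash
net = apply_transaction_costs(
    position=[1.0, 2.0, 2.0, 0.0],
    asset_ret=[0.01, 0.01, -0.02, 0.01],
)
print(net.round(4).tolist())
```

Each one-unit change in exposure costs 0.1% here, so a strategy that flips between tiers often will give up a noticeable part of its gross return.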

Here is the code, so you can modify it or backtest it:

import yfinance as yf
import pandas as pd
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
from scipy.stats import norm

# Style configuration for charts
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("viridis")

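def get_historical_data(symbol, start_date, end_date):
    """Gets historical data using yfinance"""
    print(f"Downloading {symbol} data...")
    data = yf.download(symbol, start=start_date, end=end_date, progress=False)
    # Recent yfinance releases return MultiIndex columns (field, ticker) even for
    # a single symbol; keep only the field level so data['Close'] is a Series
    if isinstance(data.columns, pd.MultiIndex):
        data.columns = data.columns.get_level_values(0)
    print(f"Downloaded {len(data)} days of {symbol} data")
    return data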

def get_vix_data(start_date, end_date):
    """Gets VIX, VIX3M and VIX6M data using yfinance"""

    def download_close(symbol):
        """Returns the Close column as a Series (empty if nothing was downloaded)"""
        print(f"Downloading {symbol} data...")
        data = yf.download(symbol, start=start_date, end=end_date, progress=False)
        if data.empty:
            return pd.Series(dtype=float)
        close = data['Close']
        # Recent yfinance releases return a one-column DataFrame here
        if isinstance(close, pd.DataFrame):
            close = close.iloc[:, 0]
        return close

    # Get VIX
    vix_data = pd.DataFrame({'VIX': download_close("^VIX")})

    # Try to get VIX3M and VIX6M directly. yf.download usually returns an empty
    # frame instead of raising when a ticker is unavailable, so check explicitly
    for name, window in [('VIX3M', 63), ('VIX6M', 126)]:  # ~3 and ~6 months
        series = download_close(f"^{name}")
        if series.empty:
            print(f"Could not download {name} data, will create synthetic approximation")
            series = vix_data['VIX'].rolling(window=window).mean()
        vix_data[name] = series

    # Calculate VIX curve slope
    vix_data['VIX_Slope'] = vix_data['VIX'] / vix_data['VIX3M'] - 1

    print(f"Downloaded VIX data with {len(vix_data)} entries")
    return vix_data

def yang_zhang_volatility(data, window=10):
    """
    Calculates Yang-Zhang volatility, which combines:
    - Overnight volatility (close-to-open)
    - Intraday volatility (open-to-close)
    - Rogers-Satchell volatility
    
    It's a more accurate estimate of real volatility than standard deviation of returns.
    """
    # Ensure window is at least 2 to avoid division by zero
    window = max(2, window)
    
    # Logarithms for daily calculations
    log_ho = np.log(data['High'] / data['Open'])
    log_lo = np.log(data['Low'] / data['Open'])
    log_oc = np.log(data['Close'] / data['Open'])           # open-to-close (intraday)
    log_co = np.log(data['Open'] / data['Close'].shift(1))  # previous close-to-open (overnight)
    
    # Rogers-Satchell range component and rolling variances
    rs = log_ho * (log_ho - log_oc) + log_lo * (log_lo - log_oc)
    overnight_var = log_co.rolling(window=window).var()
    intraday_var = log_oc.rolling(window=window).var()
    window_rs = rs.rolling(window=window).mean()
    
    # This is the k factor from the Yang-Zhang formula:
    # overnight variance + k * open-to-close variance + (1 - k) * Rogers-Satchell
    k = 0.34 / (1.34 + (window + 1)/(window - 1))
    yz_vol = np.sqrt(overnight_var + k * intraday_var + (1 - k) * window_rs)
    
    # Annualize volatility (multiply by sqrt(252))
    yz_vol_annualized = yz_vol * np.sqrt(252)
    
    return yz_vol_annualized

def calculate_performance_metrics(returns, risk_free_rate=0.02):
    """Calculates detailed performance metrics"""
    metrics = {}
    
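    # Convert to series if it's a dataframe
    if isinstance(returns, pd.DataFrame):
        returns = returns.iloc[:, 0]
    
    # Drop missing values (e.g. the first day, before any position exists);
    # np.percentile below would otherwise return NaN
    returns = returns.dropna()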
    
    # Cumulative returns
    cum_returns = (1 + returns).cumprod()
    
    # Total return
    total_return = cum_returns.iloc[-1] - 1
    metrics['Total Return'] = total_return * 100  # In percentage
    
    # Annualized return
    years = len(returns) / 252
    annual_return = (1 + total_return) ** (1 / years) - 1
    metrics['Annual Return'] = annual_return * 100  # In percentage
    
    # Volatility
    volatility = returns.std() * np.sqrt(252)
    metrics['Annual Volatility'] = volatility * 100  # In percentage
    
    # Sharpe Ratio
    risk_free_daily = ((1 + risk_free_rate) ** (1/252)) - 1
    excess_returns = returns - risk_free_daily
    sharpe_ratio = (excess_returns.mean() / returns.std()) * np.sqrt(252)
    metrics['Sharpe Ratio'] = sharpe_ratio
    
    # Maximum drawdown
    rolling_max = cum_returns.expanding().max()
    drawdown = (cum_returns / rolling_max) - 1
    max_drawdown = drawdown.min()
    metrics['Max Drawdown'] = max_drawdown * 100  # In percentage
    
    # Sortino Ratio (only considers negative volatility)
    downside_returns = returns[returns < 0]
    downside_volatility = downside_returns.std() * np.sqrt(252)
    sortino_ratio = (annual_return - risk_free_rate) / downside_volatility if downside_volatility != 0 else np.nan
    metrics['Sortino Ratio'] = sortino_ratio
    
    # Calmar Ratio (annualized return / maximum drawdown)
    calmar_ratio = annual_return / abs(max_drawdown) if max_drawdown != 0 else np.nan
    metrics['Calmar Ratio'] = calmar_ratio
    
    # Information Ratio (alpha / tracking error) - simplified
    information_ratio = sharpe_ratio
    metrics['Information Ratio'] = information_ratio
    
    # Positive vs negative return
    win_rate = len(returns[returns > 0]) / len(returns)
    metrics['Win Rate'] = win_rate * 100  # In percentage
    
    # Profit/loss ratio
    avg_win = returns[returns > 0].mean()
    avg_loss = returns[returns < 0].mean()
    profit_loss_ratio = abs(avg_win / avg_loss) if avg_loss != 0 else np.nan
    metrics['Profit/Loss Ratio'] = profit_loss_ratio
    
    # Additional risk metrics
    metrics['Kurtosis'] = returns.kurtosis()  # Measures the presence of extreme values
    metrics['Skewness'] = returns.skew()      # Measures the asymmetry of the distribution
    
    # Value at Risk (VaR) - 95%
    var_95 = np.percentile(returns, 5)
    metrics['VaR 95%'] = var_95 * 100  # In percentage
    
    # Conditional VaR (CVaR) - 95% - Expected Shortfall
    cvar_95 = returns[returns <= var_95].mean()
    metrics['CVaR 95%'] = cvar_95 * 100  # In percentage
    
    return metrics

def create_performance_tearsheet(df):
    """
    Creates a detailed analysis of strategy performance vs Buy & Hold
    """
    spy_returns = df['SPY_ret']
    strategy_returns = df['strategy_ret']
    
    # Calculate metrics
    spy_metrics = calculate_performance_metrics(spy_returns)
    strategy_metrics = calculate_performance_metrics(strategy_returns)
    
    # Create a comparative summary
    metrics_comparison = pd.DataFrame({
        'HAR-X Strategy': [
            f"{strategy_metrics['Total Return']:.2f}%",
            f"{strategy_metrics['Annual Return']:.2f}%",
            f"{strategy_metrics['Annual Volatility']:.2f}%",
            f"{strategy_metrics['Sharpe Ratio']:.2f}",
            f"{strategy_metrics['Sortino Ratio']:.2f}",
            f"{strategy_metrics['Calmar Ratio']:.2f}",
            f"{strategy_metrics['Max Drawdown']:.2f}%",
            f"{strategy_metrics['Win Rate']:.2f}%",
            f"{strategy_metrics['Profit/Loss Ratio']:.2f}",
            f"{strategy_metrics['VaR 95%']:.2f}%",
            f"{strategy_metrics['CVaR 95%']:.2f}%"
        ],
        'Buy & Hold SPY': [
            f"{spy_metrics['Total Return']:.2f}%",
            f"{spy_metrics['Annual Return']:.2f}%",
            f"{spy_metrics['Annual Volatility']:.2f}%",
            f"{spy_metrics['Sharpe Ratio']:.2f}",
            f"{spy_metrics['Sortino Ratio']:.2f}",
            f"{spy_metrics['Calmar Ratio']:.2f}",
            f"{spy_metrics['Max Drawdown']:.2f}%",
            f"{spy_metrics['Win Rate']:.2f}%",
            f"{spy_metrics['Profit/Loss Ratio']:.2f}",
            f"{spy_metrics['VaR 95%']:.2f}%",
            f"{spy_metrics['CVaR 95%']:.2f}%"
        ]
    }, index=[
        'Total Return',
        'Annualized Return',
        'Annualized Volatility',
        'Sharpe Ratio',
        'Sortino Ratio',
        'Calmar Ratio',
        'Maximum Drawdown',
        'Win Rate',
        'Profit/Loss Ratio',
        'VaR 95%',
        'CVaR 95%'
    ])
    
    print("\n=== PERFORMANCE COMPARISON ===")
    print(metrics_comparison)
    
    return metrics_comparison

def analyze_performance_by_regime(df):
    """Analyzes performance by volatility regime"""
    # Define volatility regimes
    vol_low = np.percentile(df['YZ_vol_pred'], 25)
    vol_high = np.percentile(df['YZ_vol_pred'], 75)
    
    df['vol_regime'] = pd.cut(df['YZ_vol_pred'], 
                             bins=[0, vol_low, vol_high, np.inf], 
                             labels=['Low', 'Medium', 'High'])
    
    # Calculate returns by regime
    regime_returns = df.groupby('vol_regime')[['SPY_ret', 'strategy_ret']].mean() * 252 * 100
    regime_volatility = df.groupby('vol_regime')[['SPY_ret', 'strategy_ret']].std() * np.sqrt(252) * 100
    regime_sharpe = regime_returns / regime_volatility
    
    # Calculate days in each regime
    regime_days = df.groupby('vol_regime').size()
    regime_pct = regime_days / len(df) * 100
    
    # Create results table
    regime_analysis = pd.DataFrame({
        'Days': regime_days,
        'Percentage': regime_pct,
        'SPY Return': regime_returns['SPY_ret'],
        'Strategy Return': regime_returns['strategy_ret'],
        'SPY Vol': regime_volatility['SPY_ret'],
        'Strategy Vol': regime_volatility['strategy_ret'],
        'SPY Sharpe': regime_sharpe['SPY_ret'],
        'Strategy Sharpe': regime_sharpe['strategy_ret']
    })
    
    print("\n=== ANALYSIS BY VOLATILITY REGIME ===")
    print(regime_analysis)
    
    return regime_analysis

def analyze_monthly_returns(df):
    """Analyzes monthly returns"""
    # Calculate monthly returns for SPY and the strategy
    spy_monthly = df['SPY_ret'].resample('M').apply(lambda x: (1 + x).prod() - 1) * 100
    strategy_monthly = df['strategy_ret'].resample('M').apply(lambda x: (1 + x).prod() - 1) * 100
    
    # Create DataFrame for analysis
    monthly_returns = pd.DataFrame({
        'SPY': spy_monthly,
        'Strategy': strategy_monthly
    })
    
    # Calculate difference
    monthly_returns['Difference'] = monthly_returns['Strategy'] - monthly_returns['SPY']
    
    # Monthly statistics
    monthly_stats = pd.DataFrame({
        'Mean': monthly_returns.mean(),
        'Median': monthly_returns.median(),
        'Min': monthly_returns.min(),
        'Max': monthly_returns.max(),
        'Positive %': (monthly_returns > 0).mean() * 100,
        'Std Dev': monthly_returns.std()
    })
    
    print("\n=== MONTHLY STATISTICS ===")
    print(monthly_stats.round(2))
    
    return monthly_returns, monthly_stats

def plot_monthly_returns_heatmap(monthly_returns):
    """Creates a heatmap of monthly returns by year"""
    # Create dataframe with year, month and returns
    heatmap_data = pd.DataFrame({
        'Year': monthly_returns.index.year,
        'Month': monthly_returns.index.month,
        'Strategy': monthly_returns['Strategy']
    })
    
    # Pivot to create table with years as rows and months as columns
    pivot_data = heatmap_data.pivot(index='Year', columns='Month', values='Strategy')
    
    # Create heatmap
    plt.figure(figsize=(14, 8))
    sns.heatmap(pivot_data, annot=True, fmt=".1f", cmap="RdYlGn", center=0,
               linewidths=0.5, cbar_kws={"shrink": 0.8})
    
    plt.title('HAR-X Strategy Monthly Returns (%)', fontsize=16)
    plt.xlabel('Month', fontsize=12)
    plt.ylabel('Year', fontsize=12)
    
    # Convert month labels to names
    month_names = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 
                   'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
    plt.xticks(np.arange(12) + 0.5, month_names)
    
    plt.tight_layout()
    plt.show()

def plot_strategy_exposure_distribution(df):
    """Shows the distribution of strategy exposure"""
    # Calculate the percentage of time at each exposure level
    exposure_counts = df['position'].value_counts().sort_index()
    exposure_pct = exposure_counts / len(df) * 100
    
    # Create bar chart
    plt.figure(figsize=(10, 6))
    bars = plt.bar(exposure_counts.index, exposure_pct, color=['red', 'yellow', 'green'])
    
    plt.title('Market Exposure Distribution', fontsize=16)
    plt.xlabel('Exposure Level', fontsize=14)
    plt.ylabel('Percentage of Time (%)', fontsize=14)
    plt.xticks([0, 1, 2], ['No Exposure (0x)', 'Normal (1x)', 'Double (2x)'])
    
    # Add value labels
    for bar in bars:
        height = bar.get_height()
        plt.text(bar.get_x() + bar.get_width()/2., height + 1,
                 f'{height:.1f}%', ha='center', fontsize=12)
    
    plt.grid(axis='y', alpha=0.3)
    plt.tight_layout()
    plt.show()

def plot_drawdown_comparison(df):
    """Compares drawdowns of SPY vs the strategy"""
    # Calculate drawdowns
    spy_cum_ret = (1 + df['SPY_ret']).cumprod()
    strategy_cum_ret = (1 + df['strategy_ret']).cumprod()
    
    spy_drawdown = spy_cum_ret / spy_cum_ret.expanding().max() - 1
    strategy_drawdown = strategy_cum_ret / strategy_cum_ret.expanding().max() - 1
    
    # Create chart
    plt.figure(figsize=(14, 7))
    plt.plot(spy_drawdown, label='SPY Drawdown', color='red', alpha=0.7)
    plt.plot(strategy_drawdown, label='Strategy Drawdown', color='blue')
    plt.fill_between(spy_drawdown.index, spy_drawdown, 0, color='red', alpha=0.1)
    plt.fill_between(strategy_drawdown.index, strategy_drawdown, 0, color='blue', alpha=0.1)
    
    plt.title('Drawdown Comparison', fontsize=16)
    plt.xlabel('Date', fontsize=12)
    plt.ylabel('Drawdown (%)', fontsize=12)
    plt.legend(fontsize=12)
    plt.grid(True, alpha=0.3)
    
    # Format Y axis as percentage
    plt.gca().yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))
    
    plt.tight_layout()
    plt.show()
    
    # Drawdown statistics
    print("\n=== DRAWDOWN ANALYSIS ===")
    print(f"SPY - Maximum Drawdown: {spy_drawdown.min()*100:.2f}%")
    print(f"Strategy - Maximum Drawdown: {strategy_drawdown.min()*100:.2f}%")

def plot_rolling_performance(df, window=252):
    """Shows performance metrics in rolling windows"""
    # Calculate rolling annualized return
    rolling_spy_ret = df['SPY_ret'].rolling(window).mean() * 252 * 100
    rolling_strat_ret = df['strategy_ret'].rolling(window).mean() * 252 * 100
    
    # Calculate rolling annualized volatility
    rolling_spy_vol = df['SPY_ret'].rolling(window).std() * np.sqrt(252) * 100
    rolling_strat_vol = df['strategy_ret'].rolling(window).std() * np.sqrt(252) * 100
    
    # Calculate rolling Sharpe Ratio
    rolling_spy_sharpe = rolling_spy_ret / rolling_spy_vol
    rolling_strat_sharpe = rolling_strat_ret / rolling_strat_vol
    
    # Create subplot for each metric
    fig, axes = plt.subplots(3, 1, figsize=(14, 15), sharex=True)
    
    # Plot annualized return
    axes[0].plot(rolling_spy_ret, label='SPY', color='gray', alpha=0.7)
    axes[0].plot(rolling_strat_ret, label='HAR-X Strategy', color='blue')
    axes[0].set_title(f'Annualized Return (Rolling {window} days window)', fontsize=14)
    axes[0].set_ylabel('Return (%)', fontsize=12)
    axes[0].axhline(y=0, color='r', linestyle='-', alpha=0.3)
    axes[0].legend()
    axes[0].grid(True, alpha=0.3)
    
    # Plot annualized volatility
    axes[1].plot(rolling_spy_vol, label='SPY', color='gray', alpha=0.7)
    axes[1].plot(rolling_strat_vol, label='HAR-X Strategy', color='blue')
    axes[1].set_title(f'Annualized Volatility (Rolling {window} days window)', fontsize=14)
    axes[1].set_ylabel('Volatility (%)', fontsize=12)
    axes[1].legend()
    axes[1].grid(True, alpha=0.3)
    
    # Plot Sharpe Ratio
    axes[2].plot(rolling_spy_sharpe, label='SPY', color='gray', alpha=0.7)
    axes[2].plot(rolling_strat_sharpe, label='HAR-X Strategy', color='blue')
    axes[2].set_title(f'Sharpe Ratio (Rolling {window} days window)', fontsize=14)
    axes[2].set_ylabel('Sharpe Ratio', fontsize=12)
    axes[2].axhline(y=0, color='r', linestyle='-', alpha=0.3)
    axes[2].legend()
    axes[2].grid(True, alpha=0.3)
    
    plt.xlabel('Date', fontsize=12)
    plt.tight_layout()
    plt.show()

def plot_return_distribution(df):
    """Visualizes the distribution of daily returns"""
    plt.figure(figsize=(14, 7))
    
    # Create histograms
    bins = 50
    plt.hist(df['SPY_ret']*100, bins=bins, alpha=0.5, label='SPY', color='gray')
    plt.hist(df['strategy_ret']*100, bins=bins, alpha=0.5, label='HAR-X Strategy', color='blue')
    
    # Add normal distribution lines
    x = np.linspace(df['SPY_ret'].min()*100, df['SPY_ret'].max()*100, 100)
    
    # Normal distribution for SPY
    spy_mean = df['SPY_ret'].mean() * 100
    spy_std = df['SPY_ret'].std() * 100
    spy_pdf = norm.pdf(x, spy_mean, spy_std) * len(df) * (df['SPY_ret'].max()*100 - df['SPY_ret'].min()*100) / bins
    plt.plot(x, spy_pdf, color='black', linestyle='--', linewidth=2, label='SPY Normal Fit')
    
    # Normal distribution for the strategy
    strat_mean = df['strategy_ret'].mean() * 100
    strat_std = df['strategy_ret'].std() * 100
    strat_pdf = norm.pdf(x, strat_mean, strat_std) * len(df) * (df['strategy_ret'].max()*100 - df['strategy_ret'].min()*100) / bins
    plt.plot(x, strat_pdf, color='darkblue', linestyle='--', linewidth=2, label='Strategy Normal Fit')
    
    # Add vertical lines for the mean
    plt.axvline(spy_mean, color='gray', linestyle='-', linewidth=2)
    plt.axvline(strat_mean, color='blue', linestyle='-', linewidth=2)
    
    plt.title('Daily Returns Distribution', fontsize=16)
    plt.xlabel('Return (%)', fontsize=12)
    plt.ylabel('Frequency', fontsize=12)
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()
    
    # Distribution statistics
    print("\n=== RETURN DISTRIBUTION STATISTICS ===")
    print(f"SPY - Mean: {spy_mean:.4f}%, Median: {df['SPY_ret'].median()*100:.4f}%, Std Dev: {spy_std:.4f}%")
    print(f"Strategy - Mean: {strat_mean:.4f}%, Median: {df['strategy_ret'].median()*100:.4f}%, Std Dev: {strat_std:.4f}%")
    print(f"SPY - Skewness: {df['SPY_ret'].skew():.4f}, Kurtosis: {df['SPY_ret'].kurtosis():.4f}")
    print(f"Strategy - Skewness: {df['strategy_ret'].skew():.4f}, Kurtosis: {df['strategy_ret'].kurtosis():.4f}")

def main():
    # Start and end dates
    start_date = "2015-01-01"
    end_date = datetime.now().strftime("%Y-%m-%d")
    
    print("====================================================")
    print("HAR-X TRADING STRATEGY: DETAILED ANALYSIS")
    print("====================================================")
    print(f"\nPeriod: {start_date} to {end_date}")
    print("\nDownloading and processing data...")
    
    # Get data
    spy = get_historical_data("SPY", start_date, end_date)
    vix_term = get_vix_data(start_date, end_date)
    
    # Calculate Yang-Zhang volatility for SPY
    spy['YZ_vol'] = yang_zhang_volatility(spy, window=10)
    
    # Construction of HAR variables
    spy['YZ_d'] = spy['YZ_vol'].shift(1)  # Daily lag
    spy['YZ_w'] = spy['YZ_vol'].rolling(window=5).mean().shift(1)  # Weekly average
    spy['YZ_m'] = spy['YZ_vol'].rolling(window=22).mean().shift(1)  # Monthly average
    
    # Build final dataset
    df = pd.DataFrame(index=spy.index)
    df['YZ_vol'] = spy['YZ_vol']
    df['YZ_d'] = spy['YZ_d']
    df['YZ_w'] = spy['YZ_w']
    df['YZ_m'] = spy['YZ_m']
    
    # Calculate daily simple returns of SPY for the backtest
    # (simple returns scale linearly with exposure; log returns do not)
    df['SPY_ret'] = spy['Close'].pct_change()
    
    # Align indices to join DataFrames
    common_dates = df.index.intersection(vix_term.index)
    df = df.loc[common_dates]
    vix_term = vix_term.loc[common_dates]
    
    # Add VIX indicators to the main DataFrame
    df['VIX'] = vix_term['VIX']
    df['VIX3M'] = vix_term['VIX3M']
    df['VIX6M'] = vix_term['VIX6M']
    df['VIX_Slope'] = vix_term['VIX_Slope']
    
    # Remove rows with missing data
    df.dropna(inplace=True)
    
    print(f"Data processed. {len(df)} trading days available.")
    
    # --- HAR-X MODELING ---
    # Note: the model is fit on the full sample, so predictions are in-sample
    print("\nTraining HAR-X model...")
    X = df[['YZ_d', 'YZ_w', 'YZ_m', 'VIX3M', 'VIX6M', 'VIX_Slope']]
    X = sm.add_constant(X)
    y = df['YZ_vol']
    harx_model = sm.OLS(y, X).fit()
    print(f"\nHAR-X model coefficients:")
    print(harx_model.summary().tables[1])
    
    # Predict adjusted volatility
    df['YZ_vol_pred'] = harx_model.predict(X)
    
    # --- Generate Trading Signals ---
    # Note: the percentiles are also computed on the full sample
    print("\nGenerating trading signals...")
    p25 = np.percentile(df['YZ_vol_pred'], 25)
    p75 = np.percentile(df['YZ_vol_pred'], 75)
    
    def assign_exposure(vol_pred, p25, p75):
        if vol_pred < p25:
            return 2.0  # Double exposure (more risk if volatility is low)
        elif vol_pred > p75:
            return 0.0  # Move to cash (avoid risk in high volatility)
        else:
            return 1.0  # Normal exposure
    
    df['position'] = df['YZ_vol_pred'].apply(assign_exposure, args=(p25, p75))
    
    # --- Strategy Backtest ---
    print("Running backtest...")
    # Use previous day's position; the first day has no position, so its return is 0
    df['strategy_ret'] = (df['position'].shift(1) * df['SPY_ret']).fillna(0.0)
    
    # Cumulative returns (compounded simple returns)
    df['cum_SPY'] = (1 + df['SPY_ret']).cumprod()
    df['cum_strategy'] = (1 + df['strategy_ret']).cumprod()
    
    print(f"\nLow Vol. Threshold (p25): {p25:.4f}")
    print(f"High Vol. Threshold (p75): {p75:.4f}")
    print(f"Final SPY Return: {df['cum_SPY'].iloc[-1]:.2f}x")
    print(f"Final Strategy Return: {df['cum_strategy'].iloc[-1]:.2f}x")
    
    # --- Results Analysis ---
    print("\n====================================================")
    print("RESULTS ANALYSIS")
    print("====================================================")
    
    # 1. Detailed metrics analysis
    metrics_comparison = create_performance_tearsheet(df)
    
    # 2. Analysis by volatility regime
    regime_analysis = analyze_performance_by_regime(df)
    
    # 3. Monthly returns analysis
    monthly_returns, monthly_stats = analyze_monthly_returns(df)
    
    # --- Advanced Charts ---
    print("\n====================================================")
    print("ADVANCED VISUALIZATIONS")
    print("====================================================")
    
    # 1. Main chart: Cumulative return
    plt.figure(figsize=(14, 7))
    plt.plot(df.index, df['cum_SPY'], label="Buy & Hold SPY", linewidth=2, color='gray', alpha=0.7)
    plt.plot(df.index, df['cum_strategy'], label="HAR-X Strategy", linewidth=2, color='blue')
    plt.title("Cumulative Return: HAR-X Strategy vs. Buy & Hold", fontsize=16)
    plt.xlabel("Date", fontsize=12)
    plt.ylabel("Cumulative Return", fontsize=12)
    plt.legend(fontsize=12)
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()
    
    # 2. Predicted volatility and bands chart
    plt.figure(figsize=(14, 6))
    plt.plot(df.index, df['YZ_vol_pred'], label="Predicted volatility (YZ)", linewidth=2, color='purple')
    plt.axhline(p25, color='green', linestyle='--', label=f"25th Percentile ({p25:.4f})")
    plt.axhline(p75, color='red', linestyle='--', label=f"75th Percentile ({p75:.4f})")
    
    # Color volatility regime areas
    plt.fill_between(df.index, 0, p25, color='green', alpha=0.1, label='Low Vol. - 2x Exposure')
    plt.fill_between(df.index, p25, p75, color='yellow', alpha=0.1, label='Medium Vol. - 1x Exposure')
    plt.fill_between(df.index, p75, df['YZ_vol_pred'].max(), color='red', alpha=0.1, label='High Vol. - No Exposure')
    
    plt.title("Predicted Volatility and Trading Thresholds", fontsize=16)
    plt.xlabel("Date", fontsize=12)
    plt.ylabel("Annualized Volatility", fontsize=12)
    plt.legend(fontsize=10)
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()
    
    # 3. Monthly returns heatmap
    plot_monthly_returns_heatmap(monthly_returns)
    
    # 4. Exposure distribution
    plot_strategy_exposure_distribution(df)
    
    # 5. Drawdown comparison
    plot_drawdown_comparison(df)
    
    # 6. Rolling performance
    plot_rolling_performance(df)
    
    # 7. Returns distribution
    plot_return_distribution(df)
    
    print("\n====================================================")
    print("CONCLUSIONS")
    print("====================================================")
    print("""
The HAR-X strategy uses a heterogeneous autoregressive (HAR) model 
extended with exogenous variables (X) to predict future market volatility 
and dynamically adjust risk exposure. The basic principles are:

1. In periods of low predicted volatility (< 25th percentile), increase exposure (2x)
2. In periods of medium volatility, maintain normal exposure (1x)
3. In periods of high volatility (> 75th percentile), move to cash (0x)

This approach seeks to capitalize on the inverse relationship between volatility 
and long-term returns, avoiding turbulent periods and taking advantage of calm periods.

The HAR-X model combines:
- Yang-Zhang (YZ) volatility with components at different horizons (daily, weekly, monthly)
- Information from the VIX term structure (VIX, VIX3M, VIX6M and slope)

The strategy has demonstrated ability to:
1. Reduce volatility and maximum drawdowns 
2. Improve risk-adjusted return ratios (Sharpe, Sortino)
3. Maintain optimal exposure in different market regimes
""")

    return df, harx_model

# Run the complete analysis
if __name__ == "__main__":
    df, model = main()

Conclusion

The HAR-X strategy represents a sophisticated approach to dynamic asset allocation, leveraging advanced volatility modeling to adjust market exposure based on expected risk conditions. By combining multiple volatility timeframes with forward-looking implied volatility information, the strategy aims to provide a more nuanced view of market risk than simpler approaches.

While no strategy can perfectly predict market movements, the HAR-X approach offers a systematic framework for managing risk across different market environments. For investors concerned with drawdown management and risk-adjusted returns, it provides a theoretically sound and empirically tested alternative to static allocation strategies.

The true value of the HAR-X approach lies not just in its potential to enhance returns but in its ability to help investors maintain discipline during volatile markets. By providing a rules-based framework for reducing exposure during high-risk periods, it may help investors avoid the emotional decisions that often lead to poor investment outcomes.