GuruFinance Insights
🧠 Python's Smart Approach to Mutual Fund Selection!

Unlock data-driven mutual fund strategies with cutting-edge Python machine learning techniques!

Ayrat Murtazin
March 14, 2025

When I first started investing, I remember sitting at my desk, scrolling through lists of mutual funds, completely overwhelmed. Everyone seemed to have an opinion on which fund was “the best,” but there was no clear consensus. It felt like I was navigating through a maze of options — some funds were doing great, some not so much, and no one could explain why.

It wasn’t until I started learning about data science and machine learning that I realised: there’s a better way to choose mutual funds. Imagine having an algorithm that could analyse years of data, identify patterns, and recommend the best-performing funds. Sounds like something only large financial institutions could afford, right? Not anymore.

In this article, I’ll walk you through how to use a Random Forest Regression model to select the top 5 mutual funds to invest in. Not only will we break down what makes this algorithm effective, but I’ll also give you Python code to try this strategy yourself!

The Problem: Too Many Mutual Funds, Not Enough Clarity

Just like when I started, you’ve probably faced the same issue: there are thousands of mutual funds out there. You might have different goals — some funds promise high growth, others emphasise low risk, while some boast about high dividends. But how do you compare them all objectively?

If you’re making investment decisions based solely on past returns, you’re missing half the picture. The challenge is not just about finding mutual funds with strong historical performance, but understanding which factors contribute to long-term success.

This is where machine learning comes into play. Using a Random Forest Regression model, we can combine multiple factors (returns, risk, expenses, P/E ratio, and more) into a single score and rank funds by it, rather than judging them on returns alone.

Why Random Forest Regression?

Imagine you’re out hiking and you’ve come across a dense forest. To navigate, you could take one path and hope for the best. Or, you could ask 100 different people who’ve hiked before and see which path they suggest. That’s basically what a Random Forest does.

Random Forest Regression is a machine learning technique that combines multiple decision trees to improve accuracy. It takes into account many variables at once, such as:

Historical returns (e.g., 1-year, 3-year, 5-year performance),
Expense ratio (fees charged by the fund),
Risk measures (volatility, Sharpe ratio),
Assets under management (AUM),
Maximum Drawdown.

Instead of relying on one “path” (i.e., one tree), the Random Forest trains many decision trees, each on a random sample of the data and features, and averages their predictions. This makes it more stable and less prone to overfitting than a single decision tree.
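To make the "ask 100 hikers" idea concrete, here is a minimal sketch on synthetic, made-up data (not fund data). For scikit-learn's RandomForestRegressor, the forest's prediction is the plain average of its individual trees' predictions, which you can check through the fitted trees stored in `estimators_`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic, made-up data: 200 rows with 3 features each (illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

rf = RandomForestRegressor(n_estimators=50, random_state=0)
rf.fit(X, y)

# Each fitted tree gives its own prediction ("one hiker's path")
per_tree = np.array([tree.predict(X[:5]) for tree in rf.estimators_])

# The forest's prediction is the average of those per-tree predictions
forest_avg = per_tree.mean(axis=0)
print(np.allclose(forest_avg, rf.predict(X[:5])))
```

Averaging is what smooths out the quirks of any single tree: each tree sees a different bootstrap sample, so their individual errors partly cancel.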

How to Use Random Forest Regression to Pick Mutual Funds

Let’s dive into how you can use Python to build a Random Forest Regression model and select the top 5 mutual funds. It’s simpler than it sounds, I promise!

Step 1: Install Necessary Libraries

You’ll need a few Python libraries to run this code (jinja2 is only required by pandas’ .style table formatting):

pip install pandas numpy scikit-learn jinja2

Step 2: Collect Data

First, gather historical data on mutual funds. For this example, let’s assume you have a dataset called MFLargeMetrics.csv that includes metrics such as historical returns, expense ratios and volatility. I used https://sharpely.in/ to gather data on large-cap mutual funds available in the Indian market; any similar site will work.
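The code in Step 3 reads specific column names, so it can help to check the CSV before training. The sketch below assumes your export uses exactly those column names; `REQUIRED_COLUMNS` and `load_fund_data` are hypothetical names, and the two sample rows are made-up numbers used only to show a row with a missing value being dropped. Unlike `mf_data.dropna(inplace=True)`, it only drops rows with gaps in the columns the model actually uses.

```python
import io
import pandas as pd

# Columns the Step 3 code reads; "Name" is only used when printing the top 5.
REQUIRED_COLUMNS = [
    "Name", "Return (1Y)", "CAGR (3Y)", "CAGR (5Y)", "Sharpe ratio", "P/E",
    "Expense ratio", "3-year volatility", "Asset under management",
    "Max Drawdown", "Alpha (3 years)",
]

def load_fund_data(source) -> pd.DataFrame:
    """Read a CSV (path or file-like), fail early on missing columns, drop incomplete rows."""
    df = pd.read_csv(source)
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"CSV is missing columns: {missing}")
    return df.dropna(subset=REQUIRED_COLUMNS)

# Made-up two-row sample (not real fund data); Fund B has no 3-year CAGR.
sample_csv = io.StringIO(
    "Name,Return (1Y),CAGR (3Y),CAGR (5Y),Sharpe ratio,P/E,Expense ratio,"
    "3-year volatility,Asset under management,Max Drawdown,Alpha (3 years)\n"
    "Fund A,21.4,15.2,17.8,1.1,24.3,0.9,13.5,35000,-32.1,2.4\n"
    "Fund B,18.9,,16.1,0.9,22.8,1.2,14.1,12000,-35.6,1.1\n"
)
funds = load_fund_data(sample_csv)
print(len(funds))  # 1
```

With your real export you would call `load_fund_data("MFLargeMetrics.csv")` instead.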

Step 3: Code Implementation

Here’s the Python code that builds a score for each fund, trains a Random Forest to predict that score, and ranks the mutual funds:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
import numpy as np

# Step 1: Load the data related to large-cap mutual funds
mf_data = pd.read_csv("MFLargeMetrics.csv")
mf_data.dropna(inplace=True)

# Step 2: Feature Engineering
# Normalize each metric by its largest value to bring the features to a comparable scale.
# Good metrics (rewarded in the score): 1Y return, 3Y CAGR, 5Y CAGR, Sharpe ratio
mf_data['Normalized Return (1Y)'] = mf_data['Return (1Y)'] / mf_data['Return (1Y)'].max()
mf_data['Normalized_3Y_CAGR'] = mf_data['CAGR (3Y)'] / mf_data['CAGR (3Y)'].max()
mf_data['Normalized_5Y_CAGR'] = mf_data['CAGR (5Y)'] / mf_data['CAGR (5Y)'].max()
mf_data['Normalized_Sharpe'] = mf_data['Sharpe ratio'] / mf_data['Sharpe ratio'].max()

# Bad metrics (penalized in the score): P/E, expense ratio, volatility, max drawdown
mf_data['Normalized_PE'] = mf_data['P/E'] / mf_data['P/E'].max()
mf_data['Normalized_Expense_Ratio'] = mf_data['Expense ratio'] / mf_data['Expense ratio'].max()
mf_data['Normalized_Volatility'] = mf_data['3-year volatility'] / mf_data['3-year volatility'].max()
# AUM is normalized for reference but is not used in the score below
mf_data['Normalized_Asset_under_management'] = mf_data['Asset under management'] / mf_data['Asset under management'].max()
# Use drawdown magnitudes so the deepest drawdown maps to 1, whether the CSV
# stores drawdowns as negative or positive percentages
mf_data['Normalized_Max_Drawdown'] = mf_data['Max Drawdown'].abs() / mf_data['Max Drawdown'].abs().max()

# Step 3: Calculate a score for each mutual fund. The 3-year CAGR gets the largest weight,
# followed by the 1-year return and Sharpe ratio, while funds with higher drawdown,
# volatility, P/E ratio and expense ratio are penalized
mf_data['Score'] = 0.4 * mf_data['Normalized_3Y_CAGR'] + \
                   0.2 * mf_data['Normalized_5Y_CAGR'] +  \
                   0.3 * mf_data['Normalized Return (1Y)'] - \
                   0.2 * mf_data['Normalized_Volatility'] - \
                   0.05 * mf_data['Normalized_Expense_Ratio'] + \
                   0.3 * mf_data['Normalized_Sharpe'] - \
                   0.2 * mf_data['Normalized_PE'] - \
                   0.3 * mf_data['Normalized_Max_Drawdown']

# Step 4: Define features and target
X = mf_data[['CAGR (3Y)', 
             'CAGR (5Y)',  
             'P/E', 
             'Return (1Y)',
             '3-year volatility', 
             'Expense ratio', 
             'Alpha (3 years)', 
             'Sharpe ratio', 
             'Max Drawdown',
             'Asset under management']]

y = mf_data['Score']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 5: Initialize the model, train the model and predict the scores
rf = RandomForestRegressor(n_estimators=200, random_state=42)

# Train the model
rf.fit(X_train, y_train)

# Predict on the test set
y_pred = rf.predict(X_test)

# Evaluate the model: MSE between the predicted and actual scores of the held-out funds
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
print(rf)

# Step 6: Get feature importance
importances = rf.feature_importances_
feature_importance_df = pd.DataFrame({
    'Feature': X.columns,
    'Importance': importances
}).sort_values(by='Importance', ascending=False)

print("\n\n")
print(feature_importance_df)

# Step 7: Select top 5 mutual funds based on the predicted score
# (predict() runs on all of X, so most of these funds were also in the training set)
mf_data['Predicted Score'] = rf.predict(X)
top_5_funds = mf_data.sort_values(by='Predicted Score', ascending=False).head(5)

print("\n\n")
print(top_5_funds[['Name', 'Predicted Score']].to_string(index=False))

# In a Jupyter notebook, you can instead display a styled table as the cell's last expression:
# top_5_funds[['Name', 'Predicted Score']].style.set_properties(**{'border': '1.3px solid green', 'color': 'blue'})
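The weights in Step 3 of the code are hard-coded. If you want to try different weightings for your risk appetite, one option is to keep them in a dictionary. This is a hypothetical helper (`weighted_score` and `DEFAULT_WEIGHTS` are illustrative names, not part of any library) that applies the same normalize-by-maximum scheme to the raw columns, using drawdown magnitudes; the two toy funds are made up.

```python
import pandas as pd

# The Step 3 weights; positive = reward, negative = penalty.
DEFAULT_WEIGHTS = {
    "CAGR (3Y)": 0.4, "CAGR (5Y)": 0.2, "Return (1Y)": 0.3, "Sharpe ratio": 0.3,
    "3-year volatility": -0.2, "Expense ratio": -0.05, "P/E": -0.2, "Max Drawdown": -0.3,
}

def weighted_score(df: pd.DataFrame, weights: dict) -> pd.Series:
    """Divide each column by its maximum (drawdown by its largest magnitude), then sum with weights."""
    score = pd.Series(0.0, index=df.index)
    for col, w in weights.items():
        values = df[col].abs() if col == "Max Drawdown" else df[col]
        score += w * values / values.max()
    return score

# Two made-up funds: Fund Y has higher returns but more volatility and a deeper drawdown.
toy = pd.DataFrame(
    {
        "CAGR (3Y)": [15.0, 20.0], "CAGR (5Y)": [14.0, 18.0], "Return (1Y)": [12.0, 25.0],
        "Sharpe ratio": [1.0, 1.0], "3-year volatility": [10.0, 16.0],
        "Expense ratio": [0.5, 0.5], "P/E": [20.0, 20.0], "Max Drawdown": [-20.0, -40.0],
    },
    index=["Fund X", "Fund Y"],
)

# A more conservative investor might penalize volatility and drawdown harder.
conservative = {**DEFAULT_WEIGHTS, "3-year volatility": -0.5, "Max Drawdown": -0.8}
print(weighted_score(toy, DEFAULT_WEIGHTS).idxmax())  # Fund Y
print(weighted_score(toy, conservative).idxmax())     # Fund X
```

With the default weights the higher-return Fund Y ranks first; raising the volatility and drawdown penalties flips the order in favour of the steadier Fund X.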

After running the code above, you’ll get a list of the top 5 mutual funds ranked by their predicted score. Because the score combines several metrics, the ranking reflects more than historical returns alone. Keep in mind that the target score is itself a weighted sum of the input features, so the Random Forest is essentially learning to reproduce your chosen weighting; the ranking is only as good as those weights.

The image below shows the result of running the code. The low mean squared error means the model reproduces the scores of the held-out funds closely. The feature importances show how much each input contributed to the forest’s splits; they are not the weights used in the score formula. Finally, the table at the bottom shows the top 5 mutual funds selected by the model.

Results as of 13th Sept, 2024
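One way to see what the feature importances do and do not mean: because the Score is an exact weighted sum of the inputs, an ordinary linear regression can recover those weights, while the forest’s importances are non-negative and sum to 1, so they cannot express the penalties at all. The sketch below assumes made-up, already-normalized data for 40 hypothetical funds and reuses the article’s Step 3 weights.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Made-up normalized features for 40 hypothetical funds (values in [0, 1]).
rng = np.random.default_rng(42)
cols = ["3Y", "5Y", "1Y", "Sharpe", "Vol", "Expense", "PE", "Drawdown"]
weights = np.array([0.4, 0.2, 0.3, 0.3, -0.2, -0.05, -0.2, -0.3])  # Step 3 weights
X = pd.DataFrame(rng.uniform(size=(40, len(cols))), columns=cols)
score = X.to_numpy() @ weights  # target built the same way as the article's Score

# A linear model recovers the signed weights (up to floating-point error).
lin = LinearRegression().fit(X, score)
print(np.round(lin.coef_, 3))

# Forest importances are non-negative shares of split improvement, not weights.
rf = RandomForestRegressor(n_estimators=200, random_state=42).fit(X, score)
print(pd.Series(rf.feature_importances_, index=cols).round(3))
```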

What’s Happening in the Code?

Step 1: Data Loading: We load historical data of mutual funds, which contains columns like Return (1Y), CAGR (3Y), and Sharpe ratio.
Step 2: Defining Features and Target: The input features (X) are historical performance metrics, while the target (y) is the score we want to predict. To build the score, I rewarded some good (+ve) metrics (1-year return, 3-year CAGR, 5-year CAGR and Sharpe ratio) and penalized some bad (-ve) metrics (P/E ratio, max drawdown, 3-year volatility and expense ratio), giving each a different weight. You can try different weights to suit your risk appetite.
Step 3: Train-Test Split: We split the dataset into training and testing sets to evaluate how well the model performs on unseen data.
Step 4: Training the Model: We train a Random Forest Regressor on the training data. The model learns patterns from historical returns and other factors.
Step 5: Making Predictions: The trained model predicts the score for the test set of mutual funds.
Step 6: Evaluating Performance: We use Mean Squared Error (MSE) to measure how far the predicted scores are from the actual scores of the test-set funds.
Step 7: Ranking Mutual Funds: Finally, we rank the mutual funds by predicted score and select the top 5.
Feature Importance: This optional step shows which factors the model relies on most when predicting the score.
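A single 70/30 split can give a noisy MSE when the fund list is small, because only a handful of funds end up in the test set. A common alternative is k-fold cross-validation, sketched below with scikit-learn’s `KFold` and `cross_val_score` on random, made-up data; with the real data you would pass the article’s `X` and `y` instead.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Made-up stand-in data: 30 rows with 10 features, and a linear target.
rng = np.random.default_rng(7)
X = rng.uniform(size=(30, 10))
y = X @ rng.normal(size=10)

rf = RandomForestRegressor(n_estimators=200, random_state=42)
cv = KFold(n_splits=5, shuffle=True, random_state=42)

# scikit-learn maximizes scores, so MSE is reported as a negative number; flip the sign.
mse_per_fold = -cross_val_score(rf, X, y, cv=cv, scoring="neg_mean_squared_error")
print(f"MSE per fold: {np.round(mse_per_fold, 4)}")
print(f"Mean MSE: {mse_per_fold.mean():.4f} (std {mse_per_fold.std():.4f})")
```

Every fund is used for testing exactly once, so the mean and spread across folds say more about the model than one split does.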

Conclusion: Investing with Confidence

There’s no crystal ball when it comes to investing, but tools like Random Forest Regression give us a better shot at making informed, data-driven decisions. Instead of relying on subjective opinions or focusing solely on past performance, we can use machine learning to see the bigger picture.

By considering a range of factors — returns, expenses, risk metrics, and more — you can select mutual funds that align with your financial goals and risk tolerance. And while no model is perfect, having the power of data behind your decisions is a great way to invest with confidence.

So, the next time you feel overwhelmed by the sheer number of mutual fund options, remember: there’s a smarter way to make your decision. I wish I had known about this when I first started!

Happy investing!