


Trading firms like Jane Street and banks like Goldman Sachs rely on sophisticated research databases to analyze markets. These systems store stock prices, combine them with economic indicators, and enable advanced analytics at scale. Entire teams manage these databases.

With Python, you can build your own fast, free, and fully local research database at home that handles millions of rows of stock data. The setup takes less than ten minutes, requires no cloud subscription, and runs on a standard laptop. Such a setup is useful for independent research, algorithmic trading, weather prediction, and more: basically anything that requires analyzing large volumes of data.

In this article, we will use stock data from yfinance to build a complete research database with Polars and DuckDB. You will learn how to:

  1. Fetch stock prices, prepare them, and store them in a database.

  2. Run SQL queries to extract insights for advanced research.

  3. Build a professional-level analytics workflow.


Why Build Your Own Database?

Cloud services like AWS, Google BigQuery, or Snowflake can manage large datasets, but they come with recurring costs and setup complexity. For many independent researchers, students, or small funds, these costs are unnecessary.

A local research database offers several advantages:

  • Zero recurring costs — runs entirely on your laptop.

  • Speed — optimized libraries like Polars and DuckDB can process millions of rows in seconds.

  • Control and privacy — your financial research data never leaves your machine.

  • Flexibility — combine stock data with economic indicators, alternative datasets, or your own analytics.

With the right setup, you can potentially analyze market data as efficiently as large institutions.

Prerequisites

Before building, install the necessary Python libraries. Open your terminal and run:

pip install polars yfinance duckdb pyarrow matplotlib
  • polars — DataFrame processing

  • yfinance — fetches stock data from Yahoo Finance

  • duckdb — database engine

  • pyarrow — Arrow interchange layer DuckDB uses to read DataFrames

  • matplotlib — chart visualization

Ensure you are running Python 3.8 or higher:

python --version

Why Polars and DuckDB?

Polars

Polars is a high-performance DataFrame library written in Rust that handles large datasets efficiently. For complex data transformations, it offers significant performance gains and lower memory usage compared to pandas, the most popular Python DataFrame library.

Key advantages:

  • Performance: Polars outperforms Pandas in most benchmarks. On 1M+ rows, filters and group operations are often 2–5× faster.

  • Multi-core execution: Polars automatically uses all available CPU cores.

  • Memory efficiency: Written in Rust, it handles large datasets without memory bloat.

  • Simple syntax: Operations like filtering, grouping, and adding columns require only a few lines of code.

Example: filter rows where price exceeds 100:

df.filter(pl.col("price") > 100)

DuckDB

DuckDB is a fast analytical database that runs directly within your application and is optimized for OLAP (Online Analytical Processing) workloads. It lets you use SQL to query and transform data, including files on disk and in-memory structures like Polars DataFrames, without a separate server or complex setup. DuckDB is designed to handle datasets too large to fit in memory by “spilling” data to disk, making it a powerful tool for SQL-based data analysis on a single machine.

Key advantages:

  • SQL power: Run queries with joins, aggregates, and window functions.

  • Zero setup: No server required; it runs in your process.

  • Scales locally: Efficiently handles tens of millions of rows on a laptop.

  • Seamless integration: Works directly with Polars, Pandas, and Parquet files.

Example: compute average closing price:

SELECT AVG(close) FROM stocks;

Step 1: Fetch Stock Data

Create a new script file, for example stock_db.py. At the top, import the libraries:

import polars as pl
import yfinance as yf
import duckdb
import matplotlib.pyplot as plt

Let’s fetch daily price data for NVIDIA (NVDA) from January 2024 to January 2025:

prices = yf.download("NVDA", start="2024-01-01", end="2025-01-01")


This returns a Pandas DataFrame with columns such as Open, High, Low, Close, and Volume.

Convert it to a Polars DataFrame and add a symbol column:

df = pl.from_pandas(prices.reset_index()).with_columns(
    pl.lit("NVDA").alias("symbol")
)

Now each row is tagged with "NVDA". This will make it easier to add more stocks later.

Add basic error handling in case Yahoo Finance fails:

try:
    prices = yf.download("NVDA", start="2024-01-01", end="2025-01-01")
except Exception as e:
    print(f"Error downloading data: {e}")
    exit()

Your DataFrame will now look like this:

Price Table

Step 2: Create the DuckDB Database

Connect to DuckDB and create a database file called stocks.db:

con = duckdb.connect("stocks.db")

Load the Polars DataFrame into DuckDB as a table:

con.execute("""
    CREATE TABLE IF NOT EXISTS stocks AS SELECT * FROM df
""")

DuckDB reads directly from Polars, so no extra conversion is needed.

Verify the data:

row_count = con.execute("SELECT COUNT(*) FROM stocks").fetchone()
print(f"Rows in table: {row_count[0]}")

For one year of NVDA, expect about 250 rows (trading days).

Step 3: Query Insights with SQL

Once data is stored, SQL becomes a powerful way to extract insights.

Average Closing Price

avg_close = con.execute("""
    SELECT symbol, ROUND(AVG("('Close', 'NVDA')"), 2) AS avg_close
    FROM stocks
    GROUP BY symbol
""").fetchdf()
print(avg_close)

Output will look like:

   symbol   avg_close
0   NVDA      36.45

Top 5 Volume Days

high_vol = con.execute("""
    SELECT symbol, "('Date', '')", "('Volume', 'NVDA')", "('Close', 'NVDA')"
    FROM stocks
    ORDER BY "('Volume', 'NVDA')" DESC
    LIMIT 5
""").fetchdf()
print(high_vol)

This shows the days when trading activity spiked, often around earnings or news events.

Rolling 20-Day VWAP

The Volume Weighted Average Price (VWAP) is used by traders to benchmark execution quality. You can compute a rolling 20-day VWAP with the following SQL query, wrapped in a Python string so it can be executed through DuckDB.

vwap_query = """
    WITH daily_vwap AS (
        SELECT
            "('Date', '')",
            symbol,
            SUM("('Volume', 'NVDA')" * "('Close', 'NVDA')") / SUM("('Volume', 'NVDA')") AS vwap
        FROM stocks
        GROUP BY "('Date', '')", symbol
    ),
    rolling_vwap AS (
        SELECT
            "('Date', '')",
            symbol,
            AVG(vwap) OVER (
                PARTITION BY symbol
                ORDER BY "('Date', '')"
                ROWS BETWEEN 19 PRECEDING AND CURRENT ROW
            ) AS rolling_20d_vwap
        FROM daily_vwap
    )
    SELECT * FROM rolling_vwap
    ORDER BY symbol, "('Date', '')";
"""

In Python:

vwap_df = pl.DataFrame(con.execute(vwap_query).fetchdf())
vwap_df = vwap_df.rename({
    "('Date', '')": "Date"
})

Plot the result:

vwap_df.to_pandas().plot(x="Date", y="rolling_20d_vwap", title="20-Day Rolling VWAP for NVDA")
plt.show()

Now you can visualise and easily understand the trend.

20 day vwap

Step 4: Expand the Database

One stock is useful, but a research database becomes powerful when it contains many.

Add multiple stocks:

symbols = ["NVDA", "AAPL", "TSLA"]

con.execute("DROP TABLE IF EXISTS stocks")  # rebuild so all symbols share one schema
for sym in symbols:
    prices = yf.download(sym, start="2024-01-01", end="2025-01-01")
    prices.columns = prices.columns.get_level_values(0)  # flatten per-ticker column names
    df = pl.from_pandas(prices.reset_index()).with_columns(pl.lit(sym).alias("symbol"))
    con.execute("CREATE TABLE IF NOT EXISTS stocks AS SELECT * FROM df WHERE false")
    con.execute("INSERT INTO stocks SELECT * FROM df")

Now you can run cross-symbol queries. Keep in mind that DuckDB queries can be wrapped in Python multi-line strings and executed as above, or run directly in the DuckDB CLI shell.

SELECT symbol, ROUND(AVG(Close), 2) AS avg_close
FROM stocks
GROUP BY symbol;

Output will list average close prices for NVDA, AAPL, and TSLA side by side.

Step 5: Compute Returns and Breakouts

You can calculate daily returns directly in Polars:

df = df.with_columns(
    (pl.col("Close") / pl.col("Close").shift(1) - 1).alias("daily_return")
)

Or push calculations into SQL with window functions. Example: detect breakouts when the closing price is above its 50-day moving average:

WITH ma AS (
    SELECT
        Date,
        symbol,
        Close,
        AVG(Close) OVER (
            PARTITION BY symbol
            ORDER BY Date
            ROWS BETWEEN 49 PRECEDING AND CURRENT ROW
        ) AS ma_50
    FROM stocks
)
SELECT *
FROM ma
WHERE Close > ma_50;

This query highlights bullish momentum events.

Step 6: Scale and Extend

This basic system already supports powerful analysis. Here’s how you can extend it:

  • Add more stocks: Build a broad database covering indices, sectors, or ETFs.

  • Incorporate economic data: Import GDP, CPI, or unemployment data via APIs like FRED.

  • Combine with alternative data: News sentiment, Google Trends, or crypto prices.

  • Export: Save tables to Parquet or CSV for sharing.

  • Build dashboards: Use Streamlit or Dash to create interactive research tools.

DuckDB can query directly from Parquet files without importing into tables. This makes it easy to handle datasets that are too large for memory. Polars also reads and writes Parquet efficiently, making both libraries ideal for scaling.

Security and Privacy

By keeping your database local:

  • You avoid leaking research data to cloud providers.

  • You remain in control of data retention.

  • You minimize exposure to outages or API limits.

For independent researchers and small funds, this is a major advantage.

Your Next Steps

To practice:

  1. Change the stock symbol to AAPL or MSFT.

  2. Extend the date range to two years.

  3. Add daily returns and volatility calculations.

  4. Explore more SQL window functions (e.g., running totals, ranks).

  5. Build a simple visualization dashboard with Streamlit.

The official Polars and DuckDB documentation is worth bookmarking. Both projects are active, with frequent improvements and community support.

Conclusion

In less than ten minutes, you built a working research database:

  • Polars handled data preparation quickly and efficiently.

  • DuckDB stored the data and enabled powerful SQL queries.

  • Together, they formed a zero-cost, local, professional-grade research pipeline.

This approach scales beyond simple stock prices. You can add economic indicators, experiment with models, or test trading strategies. You now have the same foundation that powers serious market analysis.

This article uses Yahoo Finance data for educational purposes only; market data can contain errors or delays. If you are serious about data analysis more generally, large public datasets on weather, sports, and economic activity are easy to obtain and make great practice sets.
