TimesFM 2.5: Zero-Shot Time Series Forecasting

Google's TimesFM 2.5 is a foundation model for time series that brings zero-shot forecasting to your projects. No fine-tuning required.

Forecasting is historically painful. If you’ve ever tried to predict server load, stock prices, or retail demand, you know the drill. You spend days checking for stationarity, fiddling with ARIMA parameters, or training an LSTM that works perfectly for one dataset and fails miserably on another.

The problem has always been generalization. Language models solved this years ago—you don't train GPT-4 from scratch to write a poem. You just ask it.

Time series analysis has finally caught up. Google Research just released TimesFM 2.5 (Time Series Foundation Model), and it’s doing for forecasting what BERT did for NLP. It’s a pre-trained model that can predict the future on data it has never seen before, without any fine-tuning.

The Problem with Old-School Forecasting

For decades, we’ve treated every forecasting problem as an island.

If you wanted to predict sales for Product A, you trained a model on Product A’s history. If you wanted to predict weather, you built a weather model. Methods like ARIMA or Prophet are powerful, but they are strictly local—they only know what you explicitly show them. They don't have "common sense" about how trends, seasonality, or cycles work in the real world.

Deep learning models like LSTMs or Transformers promised to fix this, but they usually require massive datasets of your own to work well. If you only had a few hundred data points, you were out of luck.

Enter TimesFM

TimesFM changes the equation by being a Foundation Model.

Just like LLMs are trained on the entire internet to understand language, TimesFM was pre-trained on a massive corpus of time-series data—including Google Trends, Wikipedia pageviews, and synthetic data. It has seen billions of patterns. It knows what a "holiday spike" looks like. It understands that "daily" data often has a 7-day cycle.

Because it has learned these universal temporal patterns, you can use it in a Zero-Shot setting. You feed it your unique data (even if it's short), and it produces a forecast immediately. No training loops. No hyperparameter tuning.

Under the Hood: TimesFM 2.5

The latest version, TimesFM 2.5, is a decoder-only transformer, similar in architecture to GPT models but adapted for continuous values.

Key Specs

  • Efficiency: It has roughly 200 million parameters. In the world of LLMs, that’s tiny. You can run this on a standard GPU (or even CPU) easily.
  • Context Window: It supports up to 16,000 timepoints of context. This is huge for capturing long-term seasonality (like year-over-year trends) that shorter models miss.
  • Framework Agnostic: The 2.5 release supports both PyTorch and Flax (JAX), making it accessible regardless of your preferred stack.

How It Works (Patching)

Language models tokenize words. TimesFM tokenizes time. It uses a patching mechanism where it groups consecutive time points into a single token. This allows it to process long histories efficiently without getting bogged down by every single data point.
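The grouping step above can be sketched in a few lines. This is an illustrative toy, not TimesFM's actual implementation: the real model uses a specific patch length plus learned projection layers, so treat `patch_len` here as an arbitrary assumption.

```python
import numpy as np

def patch_series(series: np.ndarray, patch_len: int) -> np.ndarray:
    """Group consecutive timepoints into fixed-size patches ("tokens"),
    dropping any trailing remainder that doesn't fill a patch."""
    n = (len(series) // patch_len) * patch_len
    return series[:n].reshape(-1, patch_len)

series = np.arange(10.0)          # 10 raw timepoints
tokens = patch_series(series, 4)  # 2 patches of 4; last 2 points dropped
print(tokens.shape)               # (2, 4)
```

With patches of length 32, a 16,000-point history becomes only 500 tokens, which is why the transformer can attend over long histories cheaply.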

Why This Actually Matters

I’ve tested a lot of "revolutionary" time series models that turn out to be harder to use than a simple moving average. TimesFM feels different for three reasons:

1. It Handles Granularity Automatically

One of the biggest headaches in forecasting is mixing frequencies. Hourly data looks different from weekly data. TimesFM handles this implicitly. You don't need to tell it "this is hourly data." It looks at the patterns and figures it out.

2. It’s Probabilistic

A single number prediction (point forecast) is dangerous. Telling your boss "we will sell 500 units" is a recipe for disaster. Telling them "we will sell between 450 and 550 units" is actionable.

TimesFM outputs quantile forecasts by default. It gives you the full distribution of possibilities, allowing you to assess risk and uncertainty accurately.
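As a hypothetical sketch of how you'd turn a quantile forecast into the "450 to 550" style interval above: assume an array of shape (horizon, 9) whose columns are the 0.1 through 0.9 quantiles. The exact output layout of your TimesFM version may differ, so check the array's shape before indexing; the numbers below are synthetic stand-ins, not model output.

```python
import numpy as np

rng = np.random.default_rng(0)
horizon = 4
# Synthetic stand-in for a model's quantile output, sorted along the
# quantile axis so columns are monotone (0.1, 0.2, ..., 0.9 quantiles).
quantile_forecast = np.sort(rng.normal(500, 30, size=(horizon, 9)), axis=1)

lower = quantile_forecast[:, 0]    # 0.1 quantile
median = quantile_forecast[:, 4]   # 0.5 quantile
upper = quantile_forecast[:, -1]   # 0.9 quantile
for t in range(horizon):
    print(f"t+{t+1}: forecast {median[t]:.0f}, "
          f"80% interval [{lower[t]:.0f}, {upper[t]:.0f}]")
```

Reporting the 0.1 to 0.9 band gives stakeholders an 80% prediction interval instead of a single fragile number.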

3. Open Weights

This isn't just an API behind a paywall. Google released the weights on Hugging Face. You can download the model, run it locally on your own hardware, and integrate it into your private pipelines.

Getting Started

You can pull the model directly from Hugging Face. Since it supports PyTorch now, integration is straightforward.

import timesfm

# Load the model
tfm = timesfm.TimesFm(
    hparams=timesfm.TimesFmHparams(backend="pytorch"),
    checkpoint=timesfm.TimesFmCheckpoint(huggingface_repo_id="google/timesfm-2.5-200m-pytorch"),
)

# Zero-shot forecast: no training loop, no tuning
forecast = tfm.forecast_on_df(
    inputs=my_dataframe,
    freq="D",  # Daily frequency
    value_name="sales",
    num_jobs=-1,
)
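For context, forecast_on_df works on a long-format dataframe: one row per (series, timestamp) pair, with an id column, a timestamp column, and the value column you name via value_name. A sketch of what my_dataframe might look like, assuming the unique_id/ds column names used in TimesFM's examples (verify against your installed version):

```python
import pandas as pd

# Long-format input: one series ("store_1"), five daily observations.
# Column names `unique_id` and `ds` follow the convention in TimesFM's
# examples; `sales` matches the value_name passed to forecast_on_df.
my_dataframe = pd.DataFrame({
    "unique_id": ["store_1"] * 5,
    "ds": pd.date_range("2025-01-01", periods=5, freq="D"),
    "sales": [120.0, 135.0, 128.0, 150.0, 160.0],
})
print(my_dataframe)
```

Multiple series can share one dataframe, distinguished by unique_id, and the model forecasts each independently in a single call.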

It’s rare to see a tool that removes so much grunt work from a data scientist's daily life. If you are still manually tuning ARIMA parameters for every single metric you track, it’s time to stop.

Conclusion

TimesFM 2.5 represents a shift from "training models" to "using models" in the time series domain. While it might not beat a heavily hand-tuned model for every single niche use case, it provides a baseline that is startlingly good for almost zero effort. For most applications, that's exactly what we need.