Overview
The Historical Market Cap dataset provides a point-in-time record of equity size through historical market capitalization and shares outstanding.
Rather than relying on present-day values or retrospectively adjusted data, this dataset reflects market cap as it was observable on each historical date.
Market capitalization is a foundational input in many systematic strategies, from universe construction and factor research to risk modeling and portfolio weighting. However, accurate historical market cap data is difficult to source and easy to misuse. Naïve approaches often introduce survivorship bias, forward-looking revisions, or implicit assumptions about corporate actions that were not known at the time.
This dataset is designed to address those issues directly.
What the Dataset Represents
Each record in the Historical Market Cap dataset represents:
A single equity ticker
A specific calendar date
Shares outstanding as of that date
The corresponding market capitalization as observed at that time
Both values are provided directly, allowing users to work with historical market cap data without requiring additional joins or reconstruction steps.
The data reflects what was available at the time and is not revised or restated using future information. If a company’s shares outstanding or market capitalization changed due to issuance, buybacks, or other corporate actions, those changes appear in the dataset only from the date they became observable.
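As a rough illustration of the record shape described above, the sketch below models one observation in Python. The class name, field names, ticker, and values are all hypothetical and are not the dataset's actual schema, which is documented in later sections. The derived price assumes market cap equals shares outstanding times share price on that date, and is shown only as a sanity check a user might run.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MarketCapRecord:
    """One point-in-time observation. Field names are hypothetical."""
    ticker: str
    as_of: date
    shares_outstanding: float
    market_cap: float


def implied_price(record: MarketCapRecord) -> float:
    """Per-share price implied by the record (assumes cap = shares * price)."""
    return record.market_cap / record.shares_outstanding


# Illustrative values only, not real data.
records = [
    MarketCapRecord("AAA", date(2015, 3, 31), 1_000_000, 25_000_000.0),
    MarketCapRecord("AAA", date(2015, 6, 30), 1_200_000, 33_600_000.0),
]

# Key each observation by (ticker, date) so lookups never mix dates.
by_key = {(r.ticker, r.as_of): r for r in records}
```

Because both values sit on the same record, no join against a separate shares table is needed to run a check like `implied_price(by_key[("AAA", date(2015, 6, 30))])`.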
Why Historical Market Cap Is Hard
Historical market capitalization is deceptively complex.
Many commonly used datasets:
Backfill shares outstanding using modern values
Revise historical market cap figures based on future filings
Omit delisted or short-lived securities
Provide no visibility into when coverage actually begins
These practices can silently distort research results, especially when building size-based universes, liquidity filters, or long-horizon backtests.
The Historical Market Cap dataset is constructed with point-in-time correctness as the primary constraint, ensuring that users can reason about equity size exactly as it would have appeared historically.
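One way to preserve point-in-time correctness on the consumer side is an as-of lookup that only ever reads values dated on or before the query date. The sketch below is a minimal illustration under assumed inputs: the observation list, its layout, and the function name are hypothetical, and the values are made up.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical observations for one ticker, sorted by date:
# (date the value was observable, market cap). Illustrative values only.
observations = [
    (date(2015, 3, 31), 25_000_000.0),
    (date(2015, 6, 30), 33_600_000.0),
    (date(2015, 9, 30), 30_000_000.0),
]


def market_cap_as_of(obs, query_date):
    """Return the latest market cap dated on or before query_date, else None."""
    dates = [d for d, _ in obs]
    idx = bisect_right(dates, query_date)  # count of observations <= query_date
    if idx == 0:
        # Before coverage begins: return nothing rather than a backfilled value.
        return None
    return obs[idx - 1][1]
```

Returning `None` before the first observation, instead of borrowing the earliest known value, is the design choice that keeps later information from leaking into earlier dates.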
Designed for Survivorship-Safe Research
A core use case of this dataset is survivorship-bias-free universe construction.
The dataset supports this by providing:
Explicit coverage start dates for each ticker
Historical market cap and shares outstanding by date
Coverage that does not depend on present-day index constituents
With these, users can construct equity universes that reflect what was actually investable at a given moment, rather than what survived into the present.
This is particularly important for:
Small-cap and micro-cap research
Long-horizon factor studies
Corporate action–driven strategies
Backtests sensitive to universe composition
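The universe construction described in this section can be sketched as a filter over coverage start dates and dated market caps. Everything below is an assumption made for illustration: the dictionaries stand in for data already retrieved from the dataset, and the tickers, dates, values, and function name are hypothetical.

```python
from datetime import date

# Hypothetical inputs. Tickers, dates, and values are illustrative only.
coverage_start = {
    "AAA": date(2010, 1, 4),
    "BBB": date(2016, 5, 2),  # coverage begins after the 2015 snapshot
    "CCC": date(2009, 6, 1),  # has no 2017 record, e.g. a delisted name
}
market_cap = {
    ("AAA", date(2015, 6, 30)): 500_000_000.0,
    ("CCC", date(2015, 6, 30)): 80_000_000.0,
    ("AAA", date(2017, 6, 30)): 650_000_000.0,
    ("BBB", date(2017, 6, 30)): 900_000_000.0,
}


def universe_on(as_of, min_cap):
    """Tickers covered on as_of whose market cap on that date is >= min_cap."""
    members = []
    for ticker, start in coverage_start.items():
        if start > as_of:
            continue  # not yet covered, so not investable at the time
        cap = market_cap.get((ticker, as_of))
        if cap is not None and cap >= min_cap:
            members.append(ticker)
    return sorted(members)
```

In this toy data, the 2015 universe includes the later-delisted ticker and excludes the one whose coverage had not started, which is the behavior a present-day constituent list cannot reproduce.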
Practical Scope and Intent
This dataset is intended to be used programmatically.
It is optimized for:
Systematic research workflows
Batch querying by ticker or date
Integration into production trading pipelines
Guardrails are intentionally built into the API to encourage correct usage patterns and prevent accidental full-table scans. These constraints are documented in later sections and are designed to reflect how practitioners typically interact with historical market cap data.
How This Fits into a Research Stack
The Historical Market Cap dataset is not a standalone signal. It is a foundational input that complements price, volume, and event-based data.
Typical integrations include:
Filtering universes by historical size thresholds
Weighting portfolios by market cap at rebalance time
Conditioning strategies on size regimes
Combining with event or factor data for downstream modeling
Used correctly, historical market cap becomes an enabling layer for more sophisticated and defensible research.
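As one example of these integrations, weighting a portfolio by market cap at rebalance time reduces to normalizing the caps observed on the rebalance date. The sketch below assumes those caps have already been retrieved into a dictionary. The function name, tickers, and values are hypothetical.

```python
def cap_weights(caps):
    """Normalize market caps observed on one date into portfolio weights."""
    total = sum(caps.values())
    if total <= 0:
        raise ValueError("total market cap must be positive")
    return {ticker: cap / total for ticker, cap in caps.items()}


# Hypothetical caps as observed on the rebalance date. Illustrative values only.
caps_at_rebalance = {"AAA": 600.0, "BBB": 300.0, "CCC": 100.0}
weights = cap_weights(caps_at_rebalance)
```

Using the caps dated at the rebalance, rather than present-day values, keeps the weights consistent with what could have been known when the trade was made.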
What’s Next
The following sections walk through:
How to use this dataset in practice, with an emphasis on survivorship-safe universe construction
The data schema and API design, including discovery of available tickers and correct query patterns