We scored 24,980 10-K filings for how much each issuer's Risk Factors section changed year-over-year. The median similarity score is 0.91. In English: the typical public company changes about 10% of its risk-factor content from one year to the next. Most of it is copy-paste.
That number isn't the whole story. The interesting part is the outliers - the companies that didn't copy-paste, and what their rewrites turned out to mean.
How we measured it
For every 10-K we have on file, we extract Item 1A (Risk Factors) and compute Jaccard similarity against the same issuer's prior-year filing. Jaccard is just: distinct words shared between the two documents divided by distinct words present in either. A score of 1.0 means identical. 0.0 means no shared content. We also track tone shift (Loughran-McDonald sentiment delta) and paragraph-level add / remove counts so we can tell what kind of change happened.
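A minimal sketch of the Jaccard calculation described above, for readers who want to see it concretely. The tokenizer (lowercasing, a simple regex) and the function names `tokenize` and `jaccard` are assumptions for illustration; the production pipeline may normalize text differently.

```python
import re


def tokenize(text: str) -> set[str]:
    """Return the set of distinct lowercase word tokens in the text (assumed tokenizer)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(prior_text: str, current_text: str) -> float:
    """Distinct words shared by both documents divided by distinct words present in either."""
    prior_words = tokenize(prior_text)
    current_words = tokenize(current_text)
    union = prior_words | current_words
    if not union:
        # Two empty sections are treated as identical (a convention chosen for this sketch).
        return 1.0
    return len(prior_words & current_words) / len(union)
```

Because the score works on sets of words, reordering paragraphs or repeating a sentence does not move it; only vocabulary entering or leaving the section does.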
The distribution across the 24,980 filings:
- 5th percentile: 0.65
- 25th percentile: 0.85
- 50th percentile: 0.91
- 75th percentile: 0.95
- 95th percentile: 1.00
Five percent of all 10-Ks are essentially identical to the prior year. The bottom 5% changed more than a third of the section. What does that bottom 5% look like?
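For anyone reproducing a percentile table like the one above from their own list of scores, here is a rough sketch using only the standard library. The function name `percentile_table` and the choice of the "inclusive" interpolation method are assumptions; a different percentile convention will shift the cut points slightly.

```python
from statistics import quantiles


def percentile_table(scores, points=(5, 25, 50, 75, 95)):
    """Map each requested percentile to its cut point in the list of Jaccard scores."""
    # quantiles(..., n=100) returns the 99 cut points for the 1st..99th percentiles.
    cuts = quantiles(sorted(scores), n=100, method="inclusive")
    return {p: round(cuts[p - 1], 2) for p in points}
```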
The bottom of the distribution is mostly SPACs
Our first cut returned the lowest Jaccard scores in the dataset:
- Zura Bio (ZURA), FY2023, Jaccard 0.012
- Rigetti Computing (RGTI), FY2022, Jaccard 0.016
- Aeluma (ALMU), FY2025, Jaccard 0.017
- BT Brands (BTBD), FY2022, Jaccard 0.021
A Jaccard of 0.012 means roughly 99% of the Risk Factors section was rewritten. The pattern: each of these added 200-400 new paragraphs and removed exactly 1. That's not a rewrite. That's a first-real-10-K-after-SPAC-merger event. The prior year was the SPAC shell with generic blank-check language; the current year is the actual operating company with operating-business risks.
About two-thirds of the raw bottom-30 are these de-SPAC artifacts. Useful to know - it tells us the methodology surfaces a real corporate-structure transition, not just a typo. Less useful as a list of "10-Ks that warrant a re-read."
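The add / remove pattern described above can be turned into a simple flag. The sketch below is illustrative only: splitting paragraphs on blank lines, the helper names, and the thresholds (`max_jaccard=0.05`, `min_added=200`, `max_removed=1`) are assumptions read off the examples in this post, not the exact rules we run.

```python
def paragraph_changes(prior_text: str, current_text: str) -> tuple[int, int]:
    """Return (added, removed) paragraph counts between two versions of a section."""
    def paragraphs(text: str) -> set[str]:
        # Assumed convention: paragraphs are separated by blank lines;
        # internal whitespace is collapsed before comparing.
        return {" ".join(p.split()) for p in text.split("\n\n") if p.strip()}

    old, new = paragraphs(prior_text), paragraphs(current_text)
    return len(new - old), len(old - new)


def looks_like_despac(jaccard: float, added: int, removed: int,
                      max_jaccard: float = 0.05,
                      min_added: int = 200,
                      max_removed: int = 1) -> bool:
    """Flag the 'hundreds added, almost nothing removed' shape (hypothetical thresholds)."""
    return jaccard <= max_jaccard and added >= min_added and removed <= max_removed
```

A flag like this only describes the shape of the change; confirming that a filing really is the first 10-K after a SPAC merger still takes a look at the filing itself.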
Filtering to actual rewrites
To find the rewrites worth reading, we filtered to: established issuers (at least 100 prior Form 4 filings on record), real two-way changes (added at least 50 paragraphs AND removed at least 50), 10-Ks only. Here are ten that come out on top:
| Ticker | Issuer | FY | Jaccard | Tone |
|---|---|---|---|---|
| BJRI | BJ's Restaurants | 2020 | 0.181 | -32 |
| ATNI | ATN International | 2020 | 0.210 | -15 |
| TDW | Tidewater | 2020 | 0.252 | -30 |
| BPOP | Popular Inc | 2022 | 0.323 | +16 |
| HSIC | Henry Schein | 2021 | 0.333 | -33 |
| CAT | Caterpillar | 2022 | 0.334 | +32 |
| BAX | Baxter International | 2021 | 0.342 | -31 |
| CIEN | Ciena | 2025 | 0.350 | -29 |
| UHS | Universal Health Services | 2022 | 0.356 | +20 |
| USB | US Bancorp | 2024 | 0.360 | +24 |
Tone column: Loughran-McDonald net sentiment delta, scaled x1000. Negative = more cautious / negative language; positive = more confident / positive language. FY refers to the period of the rewritten 10-K, not its filing date.
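The filter that produced the table can be sketched roughly as follows. The `FilingChange` record and its field names are hypothetical; only the thresholds (at least 100 prior Form 4 filings, at least 50 paragraphs added and at least 50 removed, 10-Ks only) come from the description above.

```python
from dataclasses import dataclass


@dataclass
class FilingChange:
    """One year-over-year comparison for an issuer (hypothetical record shape)."""
    ticker: str
    form_type: str          # e.g. "10-K" or "10-K/A"
    jaccard: float
    added: int              # paragraphs added vs. prior year
    removed: int            # paragraphs removed vs. prior year
    prior_form4_count: int  # proxy for an established issuer


def is_real_rewrite(f: FilingChange, min_form4: int = 100,
                    min_added: int = 50, min_removed: int = 50) -> bool:
    """Established issuer, 10-K only, and a genuine two-way change."""
    return (f.form_type == "10-K"
            and f.prior_form4_count >= min_form4
            and f.added >= min_added
            and f.removed >= min_removed)


def top_rewrites(filings: list[FilingChange], n: int = 10) -> list[FilingChange]:
    """Lowest Jaccard first among filings that pass the filter."""
    return sorted((f for f in filings if is_real_rewrite(f)),
                  key=lambda f: f.jaccard)[:n]
```

Requiring removals as well as additions is what screens out the de-SPAC shape, where hundreds of paragraphs arrive and almost nothing leaves.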
What kind of event triggers a real rewrite?
Reading the actual filings on this list, the rewrites cluster into five categories:
- Crisis pivot. BJ's Restaurants and ATN International in 2020 are the prototype. COVID hits, the entire risk-surface of the business changes, risk factors get rewritten. Tone goes sharply negative.
- Bankruptcy emergence. Tidewater 2020. They emerged from Chapter 11 in 2017 and were still working through the operating-company-vs-restructured-entity rewrite years later. New cap structure, new fleet considerations, new offshore-services-market language. Tone is negative but the underlying business was healing.
- M&A integration. Baxter International 2021 (Hill-Rom acquisition), US Bancorp 2024 (Union Bank). The target's business-specific risks get folded into the surviving entity's risk factors. Often shows up as a positive tone shift because the strategic narrative is forward-looking, as with US Bancorp (+24), though not always: Baxter's tone moved by -31.
- Streamlining and prune-jobs. Henry Schein 2021 removed 1,541 paragraphs of risk-factor content while adding 167. The company decided their disclosures had bloated and pruned. We have a longer list of these below.
- Major regulatory cycle. Popular Inc 2022 added 4,199 paragraphs - largely Basel III and Puerto Rico-specific banking regulatory language. Tone shifts positive because the regulatory language is descriptive rather than crisis-coded.
The pruners
A different angle: which 10-Ks removed the most paragraphs without replacing them? These are the issuers who decided their risk-factor section had bloated and ran a clean-up pass. The all-time leaders:
- ConocoPhillips 2022. Removed 3,399 paragraphs, added 493. Net prune of ~2,900 paragraphs. Coincides with the simplification of their Permian portfolio following the Concho merger.
- Everest Re 2023. Removed 2,566, added 138. A reinsurance company radically tightening its risk-factor language.
- CrossFirst Bankshares 2023. Removed 2,107, added 221. Regional bank cleanup.
- Lakeland Industries 2020. Removed 788, added 40. They make protective apparel; their FY2020 (filed April 2021) pruned the COVID-specific risk language they had front-loaded the year prior.
- Madison Square Garden Entertainment 2022. Removed 1,004, added 138. Coincides with the spinoff that separated entertainment from the sports business. Lots of inherited language got pruned.
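The "net prune" figure quoted for ConocoPhillips is simply paragraphs removed minus paragraphs added. As a small worked example, the same arithmetic applied to the five filings listed above (the counts are copied from the list; the variable names are just for illustration):

```python
# (issuer and fiscal year, paragraphs removed, paragraphs added), as listed above
pruners = [
    ("ConocoPhillips 2022", 3399, 493),
    ("Everest Re 2023", 2566, 138),
    ("CrossFirst Bankshares 2023", 2107, 221),
    ("Lakeland Industries 2020", 788, 40),
    ("Madison Square Garden Entertainment 2022", 1004, 138),
]

# Net prune = removed - added; rank with the largest net prune first.
ranked = sorted(
    ((name, removed - added) for name, removed, added in pruners),
    key=lambda row: row[1],
    reverse=True,
)

for name, net in ranked:
    print(f"{name}: net prune of {net:,} paragraphs")
```

On a net basis, Madison Square Garden Entertainment (866) ranks ahead of Lakeland Industries (748), even though Lakeland's removals are larger relative to its additions.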
What we're going to do with this
The Lazy Prices paper (Cohen, Malloy & Nguyen, 2020) showed that 10-K language change predicts subsequent returns. Their method is similar to ours but ran on a smaller corpus and used a single-step Jaccard score. Two extensions we're working on next:
- Adversarial filter on category. A de-SPAC company's 0.02 Jaccard isn't a signal that something is wrong with the business. It's a signal that the business is new. We need to classify rewrites by category (crisis / M&A / regulatory / prune / de-SPAC) before they become a tradeable signal.
- Section-weighted scoring. A change in a single Risk Factors paragraph that describes a specific lawsuit is worth more than a change in 40 generic paragraphs about "general economic conditions." Topic-modeling the changes is the next layer.
You can run the same analysis yourself. Pick any issuer; go to /signals and click into the 10-K language tab. Every 10-K we have on file has a Jaccard score and an Analyze button.
If you're using SEC filings to build a research agent and want this data flowing in via API or MCP, email [email protected]. We'll send you the full ranked list of rewrites with tone-shift breakdowns.
Methodology, sample sizes, and limitations are documented on the Validation & accuracy page. Built independently. No affiliation with the SEC. Not investment advice.