A quantitative research monograph · Rajveer Singh Pall

FinSight.

Markets move on numbers.
They drift on words.

Every quarter, the leadership of corporate America spends a few thousand hours explaining itself to analysts. This study reads all of it — 14,584 earnings calls across 601 S&P 500 companies — and asks whether the language itself, measured carefully and tested honestly, predicts what the stock does next.

14,584 transcripts · 2018–2024 · 34 language features · zero leakage

Chapter 01 · The Corpus

Fourteen thousand conversations,
shelved by year

An earnings call is a strange document. It is half theatre — prepared remarks polished by investor-relations teams — and half interrogation, as analysts probe for what the script left out. FinSight collects seven years of these conversations: every available quarterly call from the S&P 500, paired with more than a million rows of daily price data, so that every sentence can be held against what the market did in the days that followed.

The grains of ink drifting behind this page are that archive. Each point is one call. They have just arranged themselves into seven columns — one per year — and they will keep reorganising as you descend, because every chapter of this study is a different way of looking at the same fourteen thousand conversations.

14,584
Transcripts ingested
601
S&P 500 companies
1M+
Daily price rows
2018–24
Seven years of quarters

Calls per year

2018 · 1,786
2019 · 1,849
2020 · 2,073
2021 · 2,065
2022 · 2,033
2023 · 2,027
2024 · 1,609

Chapter 02 · Reading

Teaching a machine
to read the room

FinBERT · sentence-level classification · live

CEO · prepared remarks

We delivered record revenue of $2.4 billion this quarter, up 18% year over year.

Margins came in slightly below our expectations, reflecting elevated input costs.

We remain confident in our full-year outlook and the strength of our pipeline.

That said, there remain certain headwinds we continue to monitor closely.

Analyst · Q&A

Can you help us bridge the gap between the guidance you gave last quarter and what you are reporting today?

I guess what I am trying to understand is what actually changed.

CFO · response

Sure — the variance reflects timing dynamics that we view as transitory.


FinBERT — a BERT model trained on financial text — reads every sentence of every call and scores it positive, negative, or neutral. Crucially, FinSight scores the prepared remarks and the Q&A separately. Management speaks from a script; analysts do not. The difference between those two registers turns out to matter more than either one alone.

Aggregated per call, this produces 14 sentiment features: tone means, negativity ratios, sentence counts, and — most subtly — the volatility of tone across a call, which captures a management team that cannot keep its story steady.
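The aggregation step can be sketched as follows. This is a minimal illustration, not the study's pipeline: the `Sentence` record, the `call_features` helper, and the exact definition of each feature (for example, tone volatility as the standard deviation of positive-minus-negative probability) are assumptions. Only the feature names follow the ones that appear in the attribution chapter.

```python
from statistics import mean, pstdev
from typing import NamedTuple


class Sentence(NamedTuple):
    """Hypothetical record: one classified sentence with class probabilities."""
    section: str  # "mgmt" (prepared remarks) or "qa" (analyst Q&A)
    p_pos: float
    p_neg: float
    p_neu: float


def label(s: Sentence) -> str:
    """Most probable class for a sentence."""
    probs = {"positive": s.p_pos, "negative": s.p_neg, "neutral": s.p_neu}
    return max(probs, key=probs.get)


def call_features(sentences: list[Sentence]) -> dict[str, float]:
    """Aggregate sentence scores into 7 features per section (14 in total).

    The definitions below are assumptions made for illustration.
    """
    feats: dict[str, float] = {}
    for section in ("mgmt", "qa"):
        rows = [s for s in sentences if s.section == section]
        n = len(rows)
        net = [s.p_pos - s.p_neg for s in rows]  # per-sentence net tone
        feats[f"{section}_n_sentences"] = n
        feats[f"{section}_mean_pos"] = mean(s.p_pos for s in rows) if n else 0.0
        feats[f"{section}_mean_neg"] = mean(s.p_neg for s in rows) if n else 0.0
        feats[f"{section}_mean_neu"] = mean(s.p_neu for s in rows) if n else 0.0
        feats[f"{section}_neg_ratio"] = (
            sum(label(s) == "negative" for s in rows) / n if n else 0.0
        )
        feats[f"{section}_net_sentiment"] = mean(net) if n else 0.0
        feats[f"{section}_sent_vol"] = pstdev(net) if n else 0.0
    return feats


example = call_features([
    Sentence("mgmt", 0.90, 0.05, 0.05),
    Sentence("mgmt", 0.10, 0.80, 0.10),
    Sentence("qa", 0.10, 0.70, 0.20),
    Sentence("qa", 0.20, 0.20, 0.60),
])
```

Keeping the two sections in separate feature families is what lets a downstream model weigh the scripted and unscripted registers differently.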

Chart · the optimism gap: mgmt vs. Q&A, by year, 2018–2024

Chapter 03 · Retrieval

Five questions,
asked of every call

Tone tells you how management spoke. It cannot tell you what they actually discussed. So every transcript is split into chunks, embedded into a vector space, and indexed — then five structured questions are put to each call through retrieval. The answers become features: how relevant the retrieved passages are, and what they contain.

380,507
Embedded chunks
5 × 2
Queries × features each
MiniLM-L6
Sentence encoder
ChromaDB
Vector index

Guidance specificity

Q1

Specific numerical guidance for next quarter revenue and earnings

Vague guidance hides; specific guidance commits.

Management confidence

Q2

Management expressing strong confidence about future performance

Conviction has a vocabulary. So does doubt.

Forward-looking

Q3

Forward looking statements about growth plans and strategy

Calls that dwell on the past are often avoiding the future.

New risks

Q4

New risks headwinds or challenges disclosed this quarter

First mention of a risk is worth more than its tenth.

Cost pressure

Q5

Rising costs inflation margin pressure and supply issues

Margin language leaks before margin numbers do.
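A rough sketch of the retrieval step, under loud assumptions: the bag-of-words `embed` function stands in for the real sentence encoder, a plain Python list stands in for the vector index, and `relevance_features`, `top_k`, and `chunk_words` are hypothetical names and settings. It computes only a relevance feature per query (mean similarity of the best-matching chunks); how the second feature per query is scored is not described above, so it is left out.

```python
import math
import re
from collections import Counter

# The five structured questions, keyed by assumed feature-name stems.
QUERIES = {
    "guidance_specificity": "Specific numerical guidance for next quarter revenue and earnings",
    "management_confidence": "Management expressing strong confidence about future performance",
    "forward_looking": "Forward looking statements about growth plans and strategy",
    "new_risks": "New risks headwinds or challenges disclosed this quarter",
    "cost_pressure": "Rising costs inflation margin pressure and supply issues",
}


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real sentence encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def relevance_features(transcript: str, top_k: int = 3, chunk_words: int = 40) -> dict[str, float]:
    """Split a transcript into chunks and score each query by top-k similarity."""
    words = transcript.split()
    chunks = [" ".join(words[i:i + chunk_words]) for i in range(0, len(words), chunk_words)]
    vectors = [embed(c) for c in chunks]
    feats = {}
    for name, query in QUERIES.items():
        q = embed(query)
        best = sorted((cosine(q, v) for v in vectors), reverse=True)[:top_k]
        feats[f"rag_{name}_relevance"] = sum(best) / len(best) if best else 0.0
    return feats


demo = relevance_features(
    "We expect next quarter revenue of 2.5 billion and earnings of 1.10 per share."
)
```

With a real encoder the similarities would reflect meaning rather than shared words, but the shape of the feature (one relevance number per question per call) stays the same.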

Chapter 04 · The Signal Field

11,551 moments of truth

Each point is one earnings call, placed by its language on the horizontal axis and by the stock’s return over the next five trading days on the vertical. Green rose, red fell.

x · qa_neg_ratio — share of negative analyst sentences

gentle Q&A → hostile Q&A
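The vertical axis can be computed along these lines. This is a sketch under an assumption the text does not settle: the return is taken close to close, starting at the call-day close. `forward_return` and its arguments are hypothetical names.

```python
from typing import Optional, Sequence


def forward_return(closes: Sequence[float], event_idx: int, horizon: int = 5) -> Optional[float]:
    """Simple return from the close at `event_idx` to the close `horizon` trading days later.

    Returns None when the price series does not extend far enough,
    so the call is dropped rather than scored on a shorter window.
    """
    end = event_idx + horizon
    if event_idx < 0 or end >= len(closes):
        return None
    return closes[end] / closes[event_idx] - 1.0


closes = [100.0, 101.0, 102.0, 103.0, 104.0, 110.0, 111.0]
five_day = forward_return(closes, 0)   # close 110 vs. close 100
too_late = forward_return(closes, 2)   # needs index 7, which does not exist
```

Dropping calls without a full forward window is one plausible reason a scatter of this kind shows fewer points than the corpus holds transcripts.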

Chapter 05 · Honest Models

Six models,
no time travel

Most backtested “alpha” dies the moment you stop it from peeking at the future. Standard cross-validation shuffles time; a model trained on 2023 quietly learns things no trader could have known in 2021. FinSight forbids this with walk-forward validation: train on three years, test on the next, slide, repeat. Every result on this page was produced by a model that had never seen its test year.

The protocol

fold 2021 · train 2018, 2019, 2020 · test 2021
fold 2022 · train 2019, 2020, 2021 · test 2022
fold 2023 · train 2020, 2021, 2022 · test 2023
fold 2024 · train 2021, 2022, 2023 · test 2024
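The protocol (train on three years, test on the next, slide, repeat) is small enough to sketch directly. `walk_forward_folds` is a hypothetical helper name; the three-year window and the 2018–2024 span come from the text.

```python
from typing import Iterable, Iterator


def walk_forward_folds(years: Iterable[int], train_years: int = 3) -> Iterator[tuple[list[int], int]]:
    """Yield (training years, test year) pairs with a sliding fixed-length window.

    Every training year precedes its test year, so no fold can see the future.
    """
    ordered = sorted(years)
    for i in range(train_years, len(ordered)):
        yield ordered[i - train_years:i], ordered[i]


folds = list(walk_forward_folds(range(2018, 2025)))
for train, test in folds:
    print(f"fold {test}: train {train} -> test {test}")
```

Splitting by calendar year rather than by row is the design choice that matters: a shuffled split would scatter calls from the test year into the training set.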

The logistic baseline looks brilliant in 2023, with an information coefficient (IC) of 0.165, and then falls below zero in 2021 and 2024. Its standard deviation across folds is 0.114. LightGBM’s is 0.009: thirteen times steadier, and positive in all four years. In quantitative research, consistency is the difference between a signal and a story.

IC by test year · all models

Chart · test years 2021–2024 on the x-axis, IC from −0.05 to 0.15 on the y-axis. Series: LightGBM, LSTM, XGBoost, RAG only, FinBERT only, Baseline. Annotations: “baseline: lucky”, “LightGBM: steady”.
Walk-forward results: mean IC, IC standard deviation, and hit rate per model

Model                            IC mean    IC std    Hit rate
LightGBM · 34 features           +0.0198    0.0085    53.3%
LSTM · 6-quarter sequences       +0.0153    0.0211    54.7%
XGBoost · 34 features            +0.0141    0.0180    53.2%
RAG features only                +0.0000    0.0295    53.5%
FinBERT features only            −0.0044    0.0117    53.1%
Logistic regression baseline     +0.0429    0.1141    53.1%
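For readers who want the columns above made concrete, here is a sketch of how an IC and its fold-level summary could be computed. It assumes IC means the Spearman rank correlation between predictions and realised returns, and that the spread is a sample standard deviation across folds; neither is stated in the text. The function names are hypothetical, and the fold values in the example are toy numbers, not results from the study.

```python
from statistics import mean, stdev


def ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0


def information_coefficient(pred: list[float], realised: list[float]) -> float:
    """Spearman rank correlation between predictions and realised returns (assumed definition)."""
    return pearson(ranks(pred), ranks(realised))


def summarise(fold_ics: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of per-fold ICs."""
    return mean(fold_ics), stdev(fold_ics)


toy_mean, toy_std = summarise([0.01, 0.02, 0.03])  # toy values for illustration only
steadiness_ratio = 0.1141 / 0.0085                 # baseline std over LightGBM std, from the table
```

The last line reproduces the "thirteen times steadier" comparison: the ratio of the two tabulated standard deviations is a little over 13.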

The LSTM deserves a footnote: fed six-quarter sequences per company, it posted the strongest single fold of any model other than the erratic logistic baseline (2022, IC +0.047) and the best directional accuracy of all six. That is evidence that the trajectory of a company’s language carries information its latest call alone does not.
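Building six-quarter sequences per company might look like the sketch below. The input layout (ticker, period, feature vector) and the `build_sequences` name are assumptions; the window length of six comes from the text. Each sequence ends at the call being predicted, so only past and present language enters the window.

```python
from collections import defaultdict


def build_sequences(calls, seq_len=6):
    """Group calls by ticker, sort by period, and emit rolling windows.

    `calls` is an iterable of (ticker, period, features), where `period`
    is sortable, e.g. (year, quarter). Returns (ticker, last_period, window)
    tuples; companies with fewer than `seq_len` calls yield nothing.
    """
    by_ticker = defaultdict(list)
    for ticker, period, features in calls:
        by_ticker[ticker].append((period, features))

    sequences = []
    for ticker, rows in by_ticker.items():
        rows.sort(key=lambda r: r[0])
        for end in range(seq_len, len(rows) + 1):
            window = rows[end - seq_len:end]
            sequences.append((ticker, window[-1][0], [f for _, f in window]))
    return sequences


# Hypothetical data: 8 quarters for AAA, 5 for BBB, one feature per call.
calls = [("AAA", (2018 + q // 4, q % 4 + 1), [float(q)]) for q in range(8)]
calls += [("BBB", (2018 + q // 4, q % 4 + 1), [float(q)]) for q in range(5)]
seqs = build_sequences(calls)
```

A company needs at least six calls on record before it contributes a single training example, which thins the early years of any such dataset.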

Chapter 06 · Attribution

What the model
learned to hear

Analyst scepticism out-predicts management optimism.

SHAP attribution opens the model and asks which features actually moved its predictions. The answer is consistent and a little subversive: the single most influential feature in the entire system is qa_neg_ratio — the fraction of negative sentences in the analyst Q&A. The prepared remarks are theatre. The interrogation is evidence.

Close behind: management’s tone volatility (a story that keeps changing), the sheer length of the Q&A (how long analysts kept digging), and deliberate neutrality — the corporate art of saying nothing, which the market reads as hedging.

0.054
Top SHAP — qa_neg_ratio
34.6%
SHAP share from RAG features

Mean |SHAP| · top 12 of 34 features

Legend: Q&A · analysts | Prepared remarks · management | Retrieval · topics

  1. qa_neg_ratio · 0.0541
     Share of negative analyst sentences in Q&A

  2. mgmt_sent_vol · 0.0476
     Volatility of management tone across the call

  3. qa_n_sentences · 0.0453
     Length of the Q&A: how long analysts kept digging

  4. mgmt_mean_neu · 0.0445
     How deliberately neutral management stayed

  5. rag_guidance_specificity_relevance · 0.0420
     Whether the call actually contained specific numerical guidance

  6. qa_mean_neg · 0.0415
     Average negativity of analyst questions

  7. qa_net_sentiment · 0.0403
     Net tone of the Q&A session

  8. rag_management_confidence_score · 0.0368
     Confidence expressed in retrieved management passages

  9. rag_cost_pressure_relevance · 0.0358
     How much of the call concerned cost pressure

  10. rag_new_risks_relevance · 0.0355
     How much of the call disclosed new risks

  11. qa_mean_pos · 0.0346
     Average positivity of analyst questions

  12. mgmt_n_sentences · 0.0333
     Length of prepared remarks
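The ranking above and the share attributed to retrieval features can both be derived from a matrix of per-prediction SHAP values. The sketch below assumes that matrix is already available (one row per call, one column per feature) and that a group's share is its summed mean |SHAP| over the total. `mean_abs_shap` and `group_share` are hypothetical helpers, and the numbers are toy values.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Average absolute SHAP value per feature across all rows."""
    n = len(shap_rows)
    return {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }


def group_share(importance, prefix):
    """Fraction of total mean |SHAP| carried by features whose name starts with `prefix`."""
    total = sum(importance.values())
    if not total:
        return 0.0
    return sum(v for k, v in importance.items() if k.startswith(prefix)) / total


# Toy SHAP matrix: two predictions, two features (illustrative values only).
names = ["qa_neg_ratio", "rag_new_risks_relevance"]
rows = [[0.2, -0.1], [-0.4, 0.1]]
importance = mean_abs_shap(rows, names)
rag_share = group_share(importance, "rag_")
ranking = sorted(importance, key=importance.get, reverse=True)
```

Taking absolute values before averaging is what makes this a measure of influence rather than direction: a feature that pushes predictions up half the time and down the other half still ranks highly.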

Chapter 07 · Where It Lives

Alpha has
a postcode

Walk-forward IC by GICS sector · whiskers = ±1 std

Sector               IC
Energy               +0.311
Real Estate          +0.078
Industrials          +0.074
Utilities            +0.064
Consumer Staples     +0.061
Healthcare           +0.058
Financials           +0.024
Technology           +0.004
Consumer Disc        −0.032
Communications       −0.083
Materials            −0.132

Run the same walk-forward protocol sector by sector and the signal stops being evenly spread — it concentrates violently. Energy calls predict their own stocks at IC +0.311, roughly 83× the signal in Technology, which sits at a statistical zero.

The pattern is exactly what efficient-market theory would sketch. Technology is the most-watched sector on earth — every syllable of an Apple call is priced before the CFO finishes the sentence. Energy firms live downstream of commodity prices, hedge books, and project timelines that management discusses in concrete, numerical language. More information asymmetry in; more predictability out.

+0.311
Energy IC · AUC 0.64
+0.004
Technology IC · efficient

Chapter 08 · The Verdict

The signal is real.
The market is fast.

Long–short quartile portfolio · net of 10bps round-trip costs

Chart · 2021–2024 on the x-axis, −4% to +2% on the y-axis. Series: 20-day hold, 5-day hold.
Backtest metrics for 5-day and 20-day holding periods

Metric               5-day hold    20-day hold
Annualised return    −0.91%        −0.69%
Sharpe ratio         −0.81         −0.23
Max drawdown         −4.24%        −6.03%
Win rate             37.5%         31.3%

We found a whisper, not a siren — and we report it as a whisper.

Here is the honest arithmetic. The signal exists — IC 0.0198, steady across every test year. But at a five-day horizon it is too small to outrun ten basis points of transaction costs, and the strategy loses money: Sharpe −0.81. This is the result most projects bury. It is the most informative number in the study.

Because the loss has structure. Stretch the holding period to twenty days and the Sharpe ratio improves about 3.6×, from −0.81 to −0.23, while the per-trade cost stays fixed. It is still negative, but the direction is the one post-earnings announcement drift would predict (Bernard & Thomas, 1989): the market underreacts to earnings information and absorbs it over weeks, not days. The language signal is real; it simply needs patience the five-day strategy doesn’t have.
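The arithmetic of a long–short quartile portfolio can be sketched as below. Several details are assumptions, since the text gives only the headline setup: equal weighting within each quartile, the 10 bps round-trip cost deducted once per rebalance, and a Sharpe ratio annualised by the square root of the number of periods per year. The function names and the sample numbers are illustrative.

```python
from statistics import mean, stdev


def long_short_return(preds, rets, cost=0.0010):
    """Net return of one rebalance: long the top quartile of predictions, short the bottom.

    `cost` is the assumed round-trip cost (10 bps), charged once per rebalance.
    """
    order = sorted(range(len(preds)), key=lambda i: preds[i])
    q = len(order) // 4
    if q == 0:
        raise ValueError("need at least four stocks to form quartiles")
    short_leg = [rets[i] for i in order[:q]]
    long_leg = [rets[i] for i in order[-q:]]
    return mean(long_leg) - mean(short_leg) - cost


def annualised_sharpe(period_returns, periods_per_year):
    """Mean over sample standard deviation, scaled by sqrt(periods per year)."""
    sd = stdev(period_returns)
    return mean(period_returns) / sd * periods_per_year ** 0.5 if sd else 0.0


# Toy cross-section of eight stocks (illustrative values only).
preds = [1, 2, 3, 4, 5, 6, 7, 8]
rets = [-0.02, -0.01, 0.0, 0.0, 0.0, 0.0, 0.01, 0.02]
net = long_short_return(preds, rets)
sharpe = annualised_sharpe([0.01, -0.01, 0.02, 0.0], periods_per_year=12)
```

Because the cost is subtracted on every rebalance, a weak gross spread is the first casualty: in this toy case a 3% spread survives, but a spread below 10 bps would turn negative after costs.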

Chapter 09 · Colophon

The archive goes back to sleep

Fourteen thousand conversations, read end to end, yield a small, stable, honest signal — strongest where fewer people are listening, and priced away where everyone is. The field behind this page is drifting back to where you found it. The next earnings season will wake it up again.

Methods & stack

Language model
FinBERT (ProsusAI)
Embeddings
all-MiniLM-L6-v2 · ChromaDB
Learners
LightGBM · XGBoost · PyTorch LSTM
Attribution
SHAP
Validation
Walk-forward, 4 folds, zero leakage
Data
HuggingFace transcripts · yfinance prices
Compute
Single RTX 4060 laptop GPU
This site
Next.js · Three.js · WebGL

Limitations, plainly

  • Net-of-cost returns are negative at both horizons; this is a study of signal existence, not a tradeable strategy.
  • Sector results rest on small samples — Energy’s IC of +0.31 comes with a std of 0.24.
  • Retrieval scoring is lexical-semantic, not generative; a stronger reader may find more.

Next

  • Long-only 20-day backtest to remove short-selling costs
  • Generative scoring of retrieved passages with a modern LLM
  • Sector-stratified models — one specialist per GICS group
  • Real-time pipeline for live earnings season

References

  1. Araci, D. (2019). FinBERT: Financial Sentiment Analysis with Pre-trained Language Models. arXiv:1908.10063.
  2. Bernard, V. & Thomas, J. (1989). Post-Earnings-Announcement Drift: Delayed Price Response or Risk Premium? Journal of Accounting Research.
  3. Lewis, P. et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS.
  4. Lundberg, S. & Lee, S.-I. (2017). A Unified Approach to Interpreting Model Predictions. NeurIPS.
  5. Loughran, T. & McDonald, B. (2011). When is a Liability not a Liability? Textual Analysis, Dictionaries, and 10-Ks. Journal of Finance.
  6. Chan, L., Jegadeesh, N. & Lakonishok, J. (1996). Momentum Strategies. Journal of Finance 51(5).

Rajveer Singh Pall

Independent quantitative research. Designed, built, and validated end to end — from transcript ingestion to this page.

© 2026 Rajveer Singh Pall · MIT License · Research, engineering & design by the author