A quantitative research monograph · Rajveer Singh Pall

FinSight.

Markets move on numbers.
They drift on words.

Every quarter, the leadership of corporate America spends a few thousand hours explaining itself to analysts. This study reads all of it — 14,584 earnings calls across 601 S&P 500 companies — and asks whether the language itself, measured carefully and tested honestly, predicts what the stock does next.

14,584 transcripts · 2018–2024 · 34 language features · zero leakage

Tone · AAPL +0.41 · XOM +0.29 · JPM +0.33 · NVDA +0.52 · UNH +0.38 · CVX +0.24 · MSFT +0.47 · PG +0.36 · HAL +0.18 · TSLA +0.31 · JNJ +0.40 · SLB +0.22 · AMZN +0.44 · CAT +0.27 · NEE +0.35 · META +0.39 · COP +0.26 · WMT +0.42 · BA −0.08 · GE +0.30 · DIS +0.21 · OXY +0.19 · HD +0.37 · LIN +0.34 · GS +0.28

Chapter 01 · The Corpus

Fourteen thousand conversations,
shelved by year

An earnings call is a strange document. It is half theatre — prepared remarks polished by investor-relations teams — and half interrogation, as analysts probe for what the script left out. FinSight collects seven years of these conversations: every available quarterly call from the S&P 500, paired with more than a million rows of daily price data, so that every sentence can be held against what the market did in the days that followed.

The grains of ink drifting behind this page are that archive. Each point is one call. They have just arranged themselves into seven columns — one per year — and they will keep reorganising as you descend, because every chapter of this study is a different way of looking at the same fourteen thousand conversations.

14,584 · Transcripts ingested
601 · S&P 500 companies
1M+ · Daily price rows
2018–24 · Seven years of quarters

Calls per year
Year	Calls
2018	1,786
2019	1,849
2020	2,073
2021	2,065
2022	2,033
2023	2,027
2024	1,609

Chapter 02 · Reading

Teaching a machine
to read the room

FinBERT · sentence-level classification · live

CEO · prepared remarks

We delivered record revenue of $2.4 billion this quarter, up 18% year over year.

Margins came in slightly below our expectations, reflecting elevated input costs.

We remain confident in our full-year outlook and the strength of our pipeline.

That said, there remain certain headwinds we continue to monitor closely.

Analyst · Q&A

Can you help us bridge the gap between the guidance you gave last quarter and what you are reporting today?

I guess what I am trying to understand is what actually changed.

CFO · response

Sure — the variance reflects timing dynamics that we view as transitory.

FinBERT — a BERT model trained on financial text — reads every sentence of every call and scores it positive, negative, or neutral. Crucially, FinSight scores the prepared remarks and the Q&A separately. Management speaks from a script; analysts do not. The difference between those two registers turns out to matter more than either one alone.

Aggregated per call, this produces 14 sentiment features: tone means, negativity ratios, sentence counts, and — most subtly — the volatility of tone across a call, which captures a management team that cannot keep its story steady.
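The aggregation step can be sketched as below. This is a minimal illustration, not the study's code: it assumes each sentence already carries three FinBERT-style probabilities, and the function name `aggregate_section`, the definition of tone as P(positive) minus P(negative), and the rule for counting a sentence as negative are all assumptions made for the example. It produces seven features per section, so scoring the two registers separately gives a row of fourteen.

```python
from statistics import mean, pstdev


def aggregate_section(sentences, prefix):
    """Collapse per-sentence sentiment probabilities into call-level features.

    `sentences` is a list of dicts with keys "pos", "neg" and "neu", the kind
    of three-way probabilities a FinBERT-style classifier emits per sentence.
    `prefix` names the section, e.g. "mgmt" or "qa".
    """
    if not sentences:
        raise ValueError("section has no sentences")
    # Assumption: tone of a sentence = P(positive) - P(negative).
    tones = [s["pos"] - s["neg"] for s in sentences]
    # Assumption: a sentence is "negative" when neg is its largest probability.
    n_negative = sum(1 for s in sentences if s["neg"] > max(s["pos"], s["neu"]))
    return {
        f"{prefix}_mean_pos": mean([s["pos"] for s in sentences]),
        f"{prefix}_mean_neg": mean([s["neg"] for s in sentences]),
        f"{prefix}_mean_neu": mean([s["neu"] for s in sentences]),
        f"{prefix}_net_sentiment": mean(tones),
        f"{prefix}_neg_ratio": n_negative / len(sentences),
        f"{prefix}_n_sentences": len(sentences),
        # Population standard deviation of tone across the section.
        f"{prefix}_sent_vol": pstdev(tones),
    }


# Hypothetical scores: two prepared-remarks sentences, one analyst question.
mgmt = [
    {"pos": 0.90, "neg": 0.05, "neu": 0.05},
    {"pos": 0.10, "neg": 0.80, "neu": 0.10},
]
qa = [{"pos": 0.05, "neg": 0.70, "neu": 0.25}]

# The two registers are scored separately, then merged into one feature row.
features = {**aggregate_section(mgmt, "mgmt"), **aggregate_section(qa, "qa")}
```

Keeping the prefix as a parameter means the same function serves both registers, which makes the management-versus-analyst comparison a matter of reading two columns side by side.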

Chapter 03 · Retrieval

Five questions,
asked of every call

Tone tells you how management spoke. It cannot tell you what they actually discussed. So every transcript is split into chunks, embedded into a vector space, and indexed — then five structured questions are put to each call through retrieval. The answers become features: how relevant the retrieved passages are, and what they contain.
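The chunk, embed, retrieve loop can be sketched without any dependencies. The study uses a MiniLM sentence encoder and a ChromaDB index; the sketch below swaps in a bag-of-words vector and a linear scan so that it runs anywhere, which changes the quality of the similarities but not the shape of the pipeline. The chunk sizes, the value of `k`, and every function name here are assumptions, and only a relevance feature is shown.

```python
import math
import re
from collections import Counter


def chunk_words(text, size=40, overlap=10):
    """Split a transcript into overlapping word windows (sizes are assumptions)."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def embed(text):
    """Toy embedding: lowercase token counts, a stand-in for a sentence encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def relevance_feature(chunks, query, k=3):
    """Mean similarity of the k chunks closest to the query."""
    q = embed(query)
    sims = sorted((cosine(embed(c), q) for c in chunks), reverse=True)
    top = sims[:k]
    return sum(top) / len(top) if top else 0.0


query = "Specific numerical guidance for next quarter revenue and earnings"
on_topic = ["we expect revenue guidance of 2.4 billion next quarter"]
off_topic = ["the weather was nice"]

score_on = relevance_feature(on_topic, query, k=1)
score_off = relevance_feature(off_topic, query, k=1)
```

A call that actually contains numerical guidance scores higher on the first query than one that does not, and that score becomes one column in the feature row.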

380,507 · Embedded chunks
5 × 2 · Queries × features each
MiniLM-L6 · Sentence encoder
ChromaDB · Vector index

Q1 · Guidance specificity
“Specific numerical guidance for next quarter revenue and earnings”
Vague guidance hides; specific guidance commits.

Q2 · Management confidence
“Management expressing strong confidence about future performance”
Conviction has a vocabulary. So does doubt.

Q3 · Forward-looking
“Forward looking statements about growth plans and strategy”
Calls that dwell on the past are often avoiding the future.

Q4 · New risks
“New risks headwinds or challenges disclosed this quarter”
First mention of a risk is worth more than its tenth.

Q5 · Cost pressure
“Rising costs inflation margin pressure and supply issues”
Margin language leaks before margin numbers do.

Chapter 04 · The Signal Field

11,551 moments of truth

Each point is one earnings call, placed by its language on the horizontal axis and by the stock’s return over the next five trading days on the vertical. Green points rose; red points fell.
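The vertical coordinate of each point is a forward return, which can be sketched as follows. The entry convention is an assumption: the example measures close to close, from the call date to five trading days later, and the name `forward_return` is hypothetical.

```python
def forward_return(closes, event_index, horizon=5):
    """Simple return from the close at `event_index` to the close
    `horizon` trading days later.

    `closes` is a list of daily closing prices in date order. Returns None
    when the price history ends before the horizon is reached.
    """
    end = event_index + horizon
    if event_index < 0 or end >= len(closes):
        return None
    return closes[end] / closes[event_index] - 1.0


# Hypothetical closing prices for seven consecutive trading days.
closes = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0]

r_day0 = forward_return(closes, 0)  # close on day 5 versus close on day 0
r_day2 = forward_return(closes, 2)  # not enough history after day 2
```

Returning None instead of a partial-window return keeps short histories out of the sample rather than mixing horizons.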

Scatter plot · x: qa_neg_ratio, share of negative analyst sentences (← gentle Q&A · hostile Q&A →) · y: 5-day return (← fell · rose →)

Chapter 05 · Honest Models

Six models,
no time travel

Most backtested “alpha” dies the moment you stop it from peeking at the future. Standard cross-validation shuffles time; a model trained on 2023 quietly learns things no trader could have known in 2021. FinSight forbids this with walk-forward validation: train on three years, test on the next, slide, repeat. Every result on this page was produced by a model that had never seen its test year.
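The sliding window described above can be sketched in a few lines. The generator name `walk_forward_folds` is hypothetical; the three-year training window and the 2018 to 2024 range come from the text.

```python
def walk_forward_folds(years, train_size=3):
    """Yield (train_years, test_year) pairs in which the test year always
    comes strictly after every training year."""
    ordered = sorted(set(years))
    for i in range(train_size, len(ordered)):
        yield ordered[i - train_size:i], ordered[i]


folds = list(walk_forward_folds(range(2018, 2025)))
for train_years, test_year in folds:
    print(f"train {train_years[0]}-{train_years[-1]} -> test {test_year}")
```

Seven years and a three-year window leave four test years, 2021 through 2024, and no fold ever trains on data from its own test year or later.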

The protocol

The logistic baseline looks brilliant in 2023 (IC 0.165) but loses money in 2021 and 2024. Its standard deviation across folds is 0.114; LightGBM’s is 0.009, roughly thirteen times steadier, and its IC is positive in all four test years. In quantitative research, consistency is the difference between a signal and a story.

IC by test year · all models

Legend: LightGBM · LSTM · XGBoost · RAG only · FinBERT only · Baseline

Walk-forward results: mean IC, IC standard deviation, and hit rate per model
Model	IC mean	IC std	Hit rate
LightGBM · 34 features ★	+0.0198	0.0085	53.3%
LSTM · 6-quarter sequences	+0.0153	0.0211	54.7%
XGBoost · 34 features	+0.0141	0.0180	53.2%
RAG features only	+0.0000	0.0295	53.5%
FinBERT features only	−0.0044	0.0117	53.1%
Logistic regression baseline	+0.0429	0.1141	53.1%

The LSTM deserves a footnote: fed six-quarter sequences per company, it posted the strongest single fold of any model apart from the baseline’s 2023 spike (2022, IC +0.047) and the best directional accuracy (hit rate 54.7%). That is evidence that the trajectory of a company’s language carries information its latest call alone does not.

Chapter 06 · Attribution

What the model
learned to hear

Analyst scepticism out-predicts management optimism.

SHAP attribution opens the model and asks which features actually moved its predictions. The answer is consistent and a little subversive: the single most influential feature in the entire system is qa_neg_ratio — the fraction of negative sentences in the analyst Q&A. The prepared remarks are theatre. The interrogation is evidence.

Close behind: management’s tone volatility (a story that keeps changing), the sheer length of the Q&A (how long analysts kept digging), and deliberate neutrality — the corporate art of saying nothing, which the market reads as hedging.
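The ranking and the group share can be sketched from a matrix of per-call SHAP values. Producing those values requires a fitted model and the SHAP library, which the sketch leaves out; it starts from rows that are assumed to exist already. The numbers are invented, the feature name `rag_example_relevance` is hypothetical, and treating the RAG share as the RAG features' portion of total mean |SHAP| is an assumption about how that figure is defined.

```python
def mean_abs_shap(rows):
    """Mean |SHAP| per feature over rows of {feature: shap_value}.

    Assumes every row contains every feature.
    """
    if not rows:
        raise ValueError("no SHAP rows supplied")
    totals = {}
    for row in rows:
        for name, value in row.items():
            totals[name] = totals.get(name, 0.0) + abs(value)
    return {name: total / len(rows) for name, total in totals.items()}


def group_share(importance, prefix):
    """Fraction of total mean |SHAP| carried by features starting with `prefix`."""
    total = sum(importance.values())
    part = sum(v for name, v in importance.items() if name.startswith(prefix))
    return part / total if total else 0.0


# Hypothetical SHAP values for two calls and two features.
rows = [
    {"qa_neg_ratio": 0.06, "rag_example_relevance": -0.02},
    {"qa_neg_ratio": -0.04, "rag_example_relevance": 0.02},
]

importance = mean_abs_shap(rows)
ranking = sorted(importance, key=importance.get, reverse=True)
rag_share = group_share(importance, "rag_")
```

Taking the absolute value before averaging matters: a feature that pushes predictions up for some calls and down for others would otherwise cancel to zero and look unimportant.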

0.054 · Top SHAP (qa_neg_ratio)
34.6% · SHAP share from RAG features

Mean |SHAP| · top 12 of 34 features
Legend: Q&A (analysts) · Prepared remarks (management) · Retrieval (topics)
Rank	Feature	Mean |SHAP|	Meaning
01	qa_neg_ratio	0.0541	Share of negative analyst sentences in Q&A
02	mgmt_sent_vol	0.0476	Volatility of management tone across the call
03	qa_n_sentences	0.0453	Length of the Q&A: how long analysts kept digging
04	mgmt_mean_neu	0.0445	How deliberately neutral management stayed
05	rag_guidance_specificity_relevance	0.0420	Whether the call actually contained specific numerical guidance
06	qa_mean_neg	0.0415	Average negativity of analyst questions
07	qa_net_sentiment	0.0403	Net tone of the Q&A session
08	rag_management_confidence_score	0.0368	Confidence expressed in retrieved management passages
09	rag_cost_pressure_relevance	0.0358	How much of the call concerned cost pressure
10	rag_new_risks_relevance	0.0355	How much of the call disclosed new risks
11	qa_mean_pos	0.0346	Average positivity of analyst questions
12	mgmt_n_sentences	0.0333	Length of prepared remarks

Chapter 07 · Where It Lives

Alpha has
a postcode

Walk-forward IC by GICS sector · whiskers = ±1 std

Run the same walk-forward protocol sector by sector and the signal stops being evenly spread: it concentrates violently. Energy calls predict their own stocks at IC +0.311, roughly 80× the +0.004 measured in Technology, which is statistically indistinguishable from zero.

The pattern is exactly what efficient-market theory would sketch. Technology is the most-watched sector on earth — every syllable of an Apple call is priced before the CFO finishes the sentence. Energy firms live downstream of commodity prices, hedge books, and project timelines that management discusses in concrete, numerical language. More information asymmetry in; more predictability out.

+0.311 · Energy IC · AUC 0.64
+0.004 · Technology IC · efficient

Chapter 08 · The Verdict

The signal is real.
The market is fast.

Long–short quartile portfolio · net of 10bps round-trip costs

Backtest metrics for 5-day and 20-day holding periods
Metric	5-day hold	20-day hold
Annualised return	−0.91%	−0.69%
Sharpe ratio	−0.81	−0.23
Max drawdown	−4.24%	−6.03%
Win rate	37.5%	31.3%

We found a whisper, not a siren — and we report it as a whisper.

Here is the honest arithmetic. The signal exists — IC 0.0198, steady across every test year. But at a five-day horizon it is too small to outrun ten basis points of transaction costs, and the strategy loses money: Sharpe −0.81. This is the result most projects bury. It is the most informative number in the study.
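The arithmetic of one rebalancing period can be sketched as follows. Equal weighting within each quartile and charging the ten basis points once per period against the long-short spread are assumptions of the sketch, as is the name `long_short_return`; the inputs are invented.

```python
from statistics import mean


def long_short_return(predictions, returns, cost_bps=10.0):
    """One-period return of a portfolio that is long the top quartile of
    predictions and short the bottom quartile, net of a round-trip cost."""
    n = len(predictions)
    quartile = n // 4
    if quartile == 0:
        raise ValueError("need at least four names to form quartiles")
    order = sorted(range(n), key=lambda i: predictions[i])
    short_leg = order[:quartile]   # lowest predictions
    long_leg = order[-quartile:]   # highest predictions
    gross = mean([returns[i] for i in long_leg]) - mean([returns[i] for i in short_leg])
    return gross - cost_bps / 10_000


# Hypothetical predictions for eight names, already in ascending order.
predictions = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

# A large spread survives the cost.
wide = long_short_return(predictions, [-0.02, -0.01, 0, 0, 0, 0, 0.01, 0.02])

# A five-basis-point spread is correct in direction and still loses money.
thin = long_short_return(predictions, [0, 0, 0, 0, 0, 0, 0.0005, 0.0005])
```

The second case is the situation the chapter describes: the ranking is right, the gross spread is positive, and the net return is negative because the edge is smaller than the cost of acting on it.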

The loss has structure. Stretch the holding period to twenty days and the Sharpe ratio improves from −0.81 to −0.23, a deficit roughly 3.5× smaller, while the cost per round trip stays fixed. That is the shape post-earnings announcement drift (Bernard & Thomas, 1989) would predict: the market underreacts to earnings information and absorbs it over weeks, not days. The language signal is real; it simply needs patience the five-day strategy doesn’t have.

Chapter 09 · Colophon

The archive goes back to sleep

Fourteen thousand conversations, read end to end, yield a small, stable, honest signal — strongest where fewer people are listening, and priced away where everyone is. The field behind this page is drifting back to where you found it. The next earnings season will wake it up again.

Methods & stack

Language model: FinBERT (ProsusAI)
Embeddings: all-MiniLM-L6-v2 · ChromaDB
Learners: LightGBM · XGBoost · PyTorch LSTM
Attribution: SHAP
Validation: Walk-forward, 4 folds, zero leakage
Data: HuggingFace transcripts · yfinance prices
Compute: Single RTX 4060 laptop GPU
This site: Next.js · Three.js · WebGL

Limitations, plainly

Net-of-cost returns are negative at both horizons; this is a study of signal existence, not a tradeable strategy.
Sector results rest on small samples — Energy’s IC of +0.31 comes with a std of 0.24.
Retrieval scoring is lexical-semantic, not generative; a stronger reader may find more.

References

[1] Araci, D. (2019). FinBERT: Financial Sentiment Analysis with Pre-trained Language Models. arXiv:1908.10063.
[2] Bernard, V. & Thomas, J. (1989). Post-Earnings-Announcement Drift: Delayed Price Response or Risk Premium? Journal of Accounting Research.
[3] Lewis, P. et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS.
[4] Lundberg, S. & Lee, S.-I. (2017). A Unified Approach to Interpreting Model Predictions. NeurIPS.
[5] Loughran, T. & McDonald, B. (2011). When is a Liability not a Liability? Textual Analysis, Dictionaries, and 10-Ks. Journal of Finance.
[6] Chan, L., Jegadeesh, N. & Lakonishok, J. (1996). Momentum Strategies. Journal of Finance 51(5).

Rajveer Singh Pall

Independent quantitative research. Designed, built, and validated end to end — from transcript ingestion to this page.