FinSight
Quant Research OS
SHAP · LightGBM · Full Dataset

Feature Importance

SHapley Additive exPlanations (SHAP) values for a LightGBM model trained on the full dataset. They show which features actually drive predictions and which are noise.
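A minimal sketch of how a mean |SHAP| ranking like the one on this page could be computed, assuming the per-row SHAP values are already available as a matrix (one row per prediction, one column per feature). The function names, the toy numbers, and `noise_feature` are illustrative assumptions, not part of FinSight.

```python
from typing import Dict, List, Sequence


def mean_abs_shap(shap_rows: Sequence[Sequence[float]],
                  feature_names: Sequence[str]) -> Dict[str, float]:
    """Average |SHAP| per feature over all rows (one row per prediction)."""
    if not shap_rows:
        return {name: 0.0 for name in feature_names}
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        if len(row) != len(feature_names):
            raise ValueError("row width must match number of features")
        for j, value in enumerate(row):
            totals[j] += abs(value)
    n = len(shap_rows)
    return {name: total / n for name, total in zip(feature_names, totals)}


def rank_features(importance: Dict[str, float]) -> List[str]:
    """Feature names sorted from highest to lowest mean |SHAP|."""
    return sorted(importance, key=importance.get, reverse=True)


# Toy SHAP matrix with made-up numbers (not the dashboard's data).
toy_shap = [
    [0.10, -0.02, 0.00],
    [-0.06, 0.04, 0.01],
]
toy_names = ["qa_neg_ratio", "mgmt_sent_vol", "noise_feature"]  # last name is hypothetical
toy_importance = mean_abs_shap(toy_shap, toy_names)
toy_ranking = rank_features(toy_importance)
print(toy_ranking)
```

Taking the absolute value before averaging matters: a feature that pushes predictions up on some rows and down on others would otherwise average out toward zero and look like noise.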

Top driver
Highest mean |SHAP|

No data loaded

Dominant family
Largest total contribution

Most of the model signal comes from this feature class.
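One way the dominant family could be derived is to sum mean |SHAP| per feature family and take the largest total. The sketch below assumes a feature's family can be inferred from its name prefix, which is a guess based only on the five features listed on this page. The values in `top5` are the mean |SHAP| figures shown for those five features; all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Optional

# ASSUMPTION: family is inferred from the feature-name prefix. The real
# mapping used by the dashboard may differ.
PREFIX_TO_FAMILY = {
    "qa_": "QA FinBERT",
    "mgmt_": "Mgmt FinBERT",
    "rag_": "RAG",
}


def family_of(feature: str) -> str:
    """Map a feature name to its family, or 'Other' if no prefix matches."""
    for prefix, family in PREFIX_TO_FAMILY.items():
        if feature.startswith(prefix):
            return family
    return "Other"


def family_totals(importance: Dict[str, float]) -> Dict[str, float]:
    """Sum mean |SHAP| over all features in each family."""
    totals: Dict[str, float] = defaultdict(float)
    for feature, value in importance.items():
        totals[family_of(feature)] += value
    return dict(totals)


def dominant_family(importance: Dict[str, float]) -> Optional[str]:
    """Family with the largest total contribution, or None if there is no data."""
    totals = family_totals(importance)
    if not totals:
        return None
    return max(totals, key=totals.get)


# Mean |SHAP| of the five top-ranked features shown on this page.
top5 = {
    "qa_neg_ratio": 0.0541,
    "mgmt_sent_vol": 0.0476,
    "qa_n_sentences": 0.0453,
    "mgmt_mean_neu": 0.0445,
    "rag_guidance_specificity_relevance": 0.0420,
}
top5_totals = family_totals(top5)
top5_dominant = dominant_family(top5)
print(top5_totals, top5_dominant)
```

Restricted to these five features, QA FinBERT has the largest total. The result for the full model depends on every feature, and only the top five are listed here.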

Top-20 concentration
Decision simplicity
0%

Share of explanatory power captured by the top 20 features.
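A concentration figure of this kind can be sketched as the top-k share of total mean |SHAP|. This assumes "explanatory power" means summed mean |SHAP|, which the page does not state explicitly. The function name and the toy values are hypothetical.

```python
from typing import Dict


def top_k_share(importance: Dict[str, float], k: int = 20) -> float:
    """Fraction of total mean |SHAP| held by the k most important features."""
    if k < 0:
        raise ValueError("k must be non-negative")
    values = sorted(importance.values(), reverse=True)
    total = sum(values)
    if total == 0:
        return 0.0  # no features, or all importances are zero
    return sum(values[:k]) / total


# Toy importances with made-up numbers (not the dashboard's data).
toy = {"f1": 0.5, "f2": 0.3, "f3": 0.2}
toy_share = top_k_share(toy, k=2)
print(f"{toy_share:.0%}")
```

A share close to 100% means the model's decisions rest on few features. A low share means importance is spread thinly across many.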

Filter focus
Current analysis lens
All

Use the filter below to inspect what truly drives forecasts.

Key insights

Analyst challenge language is most predictive

Why it matters

Q&A negativity and scrutiny terms dominate the top rankings.

Prioritize coverage where analyst pushback rises quarter-over-quarter.
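The prioritization rule above could be sketched as a quarter-over-quarter screen on `qa_neg_ratio`. The two-quarter threshold, the function names, and the tickers are assumptions for illustration. The series are made-up numbers.

```python
from typing import Dict, List, Sequence


def qoq_change(series: Sequence[float]) -> List[float]:
    """Difference between each quarter and the one before it."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def pushback_rising(series: Sequence[float], min_quarters: int = 2) -> bool:
    """True if the ratio rose in each of the last `min_quarters` quarters."""
    changes = qoq_change(series)
    if len(changes) < min_quarters:
        return False
    return all(change > 0 for change in changes[-min_quarters:])


def prioritize(by_ticker: Dict[str, Sequence[float]],
               min_quarters: int = 2) -> List[str]:
    """Tickers with rising pushback, largest latest increase first."""
    rising = [t for t, s in by_ticker.items() if pushback_rising(s, min_quarters)]
    return sorted(rising, key=lambda t: qoq_change(by_ticker[t])[-1], reverse=True)


# Hypothetical tickers with made-up qa_neg_ratio values, oldest quarter first.
history = {
    "AAA": [0.10, 0.14, 0.19],  # rising in both quarters
    "BBB": [0.20, 0.18, 0.21],  # fell, then rose: not a sustained rise
    "CCC": [0.05, 0.06, 0.12],  # rising, with the largest latest jump
}
watchlist = prioritize(history)
print(watchlist)
```

Requiring consecutive increases filters out one-quarter spikes. Whether two quarters is the right window is a judgment call that the page does not specify.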

Management consistency matters

Why it matters

Volatile management tone often precedes larger future dispersion.

Use tone volatility as a position-sizing modifier, not just as a directional signal.
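A minimal sketch of tone volatility used as a sizing modifier, assuming the position is shrunk in proportion to how far `mgmt_sent_vol` exceeds a reference level. The inverse-scaling rule, the floor, the reference level, and the function names are all assumptions. The page does not say how the modifier should be applied.

```python
def size_multiplier(tone_vol: float, reference_vol: float,
                    floor: float = 0.25) -> float:
    """Scale factor in [floor, 1]: smaller when tone volatility is above reference."""
    if reference_vol <= 0:
        raise ValueError("reference_vol must be positive")
    if tone_vol <= reference_vol:
        return 1.0
    return max(floor, reference_vol / tone_vol)


def sized_position(direction: int, base_size: float,
                   tone_vol: float, reference_vol: float) -> float:
    """Signed position: direction comes from elsewhere, volatility only scales size."""
    if direction not in (-1, 0, 1):
        raise ValueError("direction must be -1, 0, or 1")
    return direction * base_size * size_multiplier(tone_vol, reference_vol)


# Made-up inputs: a short signal with tone volatility at twice the reference level.
example = sized_position(direction=-1, base_size=100.0,
                         tone_vol=0.2, reference_vol=0.1)
print(example)
```

The direction is passed in separately on purpose: the modifier changes how much is held, never which side.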

Context features are additive, not dominant

Why it matters

RAG signals help, but sentiment behavior still drives most of the edge.

Invest first in transcript quality and linguistic feature reliability.

Conclusion: only a handful of feature families explain most prediction power
Primary question: which factor should you monitor each earnings season?
Conclusion: feature family mix determines model robustness
Use this to decide where to invest additional feature engineering effort.
Supporting insights from top-ranked features
#1
QA FinBERT
qa_neg_ratio
The proportion of analyst pushback is the single strongest signal. Hard pushback from analysts may suggest that management is withholding something.
Mean |SHAP|: 0.0541
#2
Mgmt FinBERT
mgmt_sent_vol
Inconsistent management sentiment — oscillating between optimism and caution — precedes larger price moves in either direction.
Mean |SHAP|: 0.0476
#3
QA FinBERT
qa_n_sentences
Longer Q&A sessions signal more analyst scrutiny, correlating with higher uncertainty and larger subsequent price reactions.
Mean |SHAP|: 0.0453
#4
Mgmt FinBERT
mgmt_mean_neu
Deliberately neutral language can mask very good or very bad news — a hedging signal that markets react to.
Mean |SHAP|: 0.0445
#5
RAG
rag_guidance_specificity_relevance
How closely the guidance section relates to numerical targets matters, not just what it says. More specific guidance tends to produce a clearer market reaction.
Mean |SHAP|: 0.0420
🔑 Key Finding
Analyst Q&A features outrank management prepared remarks as predictive signals. qa_neg_ratio (mean |SHAP| = 0.0541) scores higher than any individual management sentiment feature; the strongest of these, mgmt_sent_vol, reaches 0.0476. This is consistent with the hypothesis that management tone is endogenous and strategically managed, while analyst skepticism is a partially independent signal from sophisticated market participants. RAG features account for 34.6% of total SHAP importance despite comprising fewer features, which supports the contribution of structured semantic retrieval.