Feature Importance

SHAP (SHapley Additive exPlanations) values computed for a LightGBM model trained on the full dataset. Shows which features actually drive predictions and which are noise.
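As a rough illustration of the "mean |SHAP|" ranking used throughout this page, the sketch below averages absolute SHAP values per feature and picks the top driver. It assumes the per-sample SHAP values have already been computed elsewhere (for example by a tree explainer over the LightGBM model) and are available as a plain list of rows. The function names `mean_abs_shap` and `top_driver` and the toy numbers are hypothetical, not taken from the dashboard's code.

```python
from typing import Dict, Optional, Sequence


def mean_abs_shap(
    shap_rows: Sequence[Sequence[float]],
    feature_names: Sequence[str],
) -> Dict[str, float]:
    """Average the absolute SHAP value of each feature across all samples."""
    if not shap_rows:
        # No samples: nothing to rank (the "No data loaded" state).
        return {}
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        if len(row) != len(feature_names):
            raise ValueError("each SHAP row needs one value per feature")
        for j, value in enumerate(row):
            totals[j] += abs(value)
    n_samples = len(shap_rows)
    return {name: total / n_samples for name, total in zip(feature_names, totals)}


def top_driver(importance: Dict[str, float]) -> Optional[str]:
    """Return the feature with the highest mean |SHAP|, or None if empty."""
    if not importance:
        return None
    return max(importance, key=importance.get)


if __name__ == "__main__":
    # Toy SHAP matrix (assumed values): two samples, two features.
    names = ["qa_neg_ratio", "mgmt_sent_vol"]
    rows = [[0.10, -0.02], [-0.06, 0.04]]
    ranking = mean_abs_shap(rows, names)
    print(ranking, top_driver(ranking))
```

Taking the absolute value before averaging matters: a feature that pushes predictions up for some samples and down for others would otherwise cancel out and look unimportant.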

Top driver

Highest mean |SHAP|

—

No data loaded

Dominant family

Largest total contribution

—

Most of the model signal comes from this feature class.

Top-20 concentration

Decision simplicity

Share of explanatory power captured by top 20 features.
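One plausible way to compute this share is sketched below: sort the per-feature mean |SHAP| values, sum the largest k, and divide by the sum over all features. The function name `top_k_concentration` is hypothetical, and the exact formula used by the dashboard is an assumption.

```python
from typing import Iterable


def top_k_concentration(importances: Iterable[float], k: int = 20) -> float:
    """Fraction of total mean |SHAP| held by the k largest features."""
    values = sorted((abs(v) for v in importances), reverse=True)
    total = sum(values)
    if total == 0:
        # Empty input or all-zero importances: no explanatory power to share.
        return 0.0
    return sum(values[:k]) / total


if __name__ == "__main__":
    # Toy importances (assumed values): the top 2 of 4 features hold 70%.
    print(top_k_concentration([4.0, 3.0, 2.0, 1.0], k=2))
```

A value close to 1 means a small set of features carries almost all of the signal; a low value means importance is spread thinly across many features.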

Filter focus

Current analysis lens

All

Use the filter below to inspect what truly drives forecasts.

Key insights

Analyst challenge language is most predictive

Why it matters

Q&A negativity and scrutiny terms dominate top rankings.

Prioritize coverage where analyst pushback rises quarter-over-quarter.
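A minimal sketch of how this prioritization could be screened, assuming a per-ticker history of quarterly qa_neg_ratio values ordered oldest to newest. The tickers, the values, and the function name `rising_pushback` are all hypothetical.

```python
from typing import Dict, List, Sequence


def rising_pushback(
    history: Dict[str, Sequence[float]],
    min_increase: float = 0.0,
) -> List[str]:
    """List tickers whose latest qa_neg_ratio rose versus the prior quarter.

    Results are ordered by the size of the increase, largest first.
    Tickers with fewer than two quarters of data are skipped.
    """
    flagged = []
    for ticker, series in history.items():
        if len(series) < 2:
            continue
        delta = series[-1] - series[-2]
        if delta > min_increase:
            flagged.append((delta, ticker))
    # Sort by descending increase, then by ticker for a stable order.
    flagged.sort(key=lambda pair: (-pair[0], pair[1]))
    return [ticker for _, ticker in flagged]


if __name__ == "__main__":
    # Hypothetical quarterly qa_neg_ratio values per ticker.
    sample = {
        "AAA": [0.10, 0.25],
        "BBB": [0.30, 0.20],
        "CCC": [0.15],
        "DDD": [0.10, 0.12],
    }
    print(rising_pushback(sample))
```

The `min_increase` threshold is a design choice for filtering out quarter-to-quarter noise; what counts as a meaningful rise would need to be calibrated on real data.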

Management consistency matters

Why it matters

Volatile management tone often precedes larger future dispersion.

Use tone volatility as a position-sizing modifier, not just as a direction signal.

Context features are additive, not dominant

Why it matters

RAG signals help, but sentiment behavior still drives most of the edge.

Invest first in transcript quality and linguistic feature reliability.

Conclusion: only a handful of feature families explain most prediction power

Primary question: which factor should you monitor each earnings season?

Conclusion: feature family mix determines model robustness

Use this to decide where to invest additional feature engineering effort.

Supporting insights from top-ranked features

QA FinBERT

qa_neg_ratio

Analyst pushback proportion is the single strongest signal. Heavy analyst pushback may indicate that management is downplaying or withholding bad news.

Mean |SHAP|: 0.0541

Mgmt FinBERT

mgmt_sent_vol

Inconsistent management sentiment — oscillating between optimism and caution — precedes larger price moves in either direction.

Mean |SHAP|: 0.0476

QA FinBERT

qa_n_sentences

Longer Q&A sessions signal more analyst scrutiny, correlating with higher uncertainty and larger subsequent price reactions.

Mean |SHAP|: 0.0453

Mgmt FinBERT

mgmt_mean_neu

Deliberately neutral language can mask very good or very bad news — a hedging signal that markets react to.

Mean |SHAP|: 0.0445

RAG

rag_guidance_specificity_relevance

What matters is how semantically relevant the guidance section is to numerical targets, not just what it says. More specific guidance is associated with a clearer market reaction.

Mean |SHAP|: 0.0420
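The five features listed above can be tallied by family to see how a "dominant family" figure might be derived. This sketch uses only the five values shown on this page, so the resulting totals and shares describe the top five features alone and will differ from family shares computed over the full feature set. The helper names `family_totals` and `dominant_family` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# (family, feature, mean |SHAP|) for the five features listed above.
TOP_FEATURES = [
    ("QA FinBERT", "qa_neg_ratio", 0.0541),
    ("Mgmt FinBERT", "mgmt_sent_vol", 0.0476),
    ("QA FinBERT", "qa_n_sentences", 0.0453),
    ("Mgmt FinBERT", "mgmt_mean_neu", 0.0445),
    ("RAG", "rag_guidance_specificity_relevance", 0.0420),
]


def family_totals(features: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """Sum mean |SHAP| per feature family."""
    totals: Dict[str, float] = defaultdict(float)
    for family, _name, value in features:
        totals[family] += value
    return dict(totals)


def dominant_family(totals: Dict[str, float]) -> Optional[str]:
    """Return the family with the largest total contribution, if any."""
    if not totals:
        return None
    return max(totals, key=totals.get)


if __name__ == "__main__":
    totals = family_totals(TOP_FEATURES)
    grand_total = sum(totals.values())
    for family, value in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{family}: {value:.4f} ({value / grand_total:.1%} of top five)")
    print("Dominant family among top five:", dominant_family(totals))
```

Within these five features, the two QA FinBERT entries sum to 0.0994 and the two Mgmt FinBERT entries to 0.0921, which shows how a family can lead on total contribution even when individual features are close in rank.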

🔑 Key Finding

Analyst Q&A features outrank management prepared remarks as predictive signals. qa_neg_ratio (mean |SHAP| = 0.0541) scores higher than any individual management sentiment feature; the strongest of those, mgmt_sent_vol, reaches 0.0476. This is consistent with the hypothesis that management tone is endogenous and strategically managed, while analyst skepticism is a partially independent signal from sophisticated market participants. RAG features account for 34.6% of total SHAP importance despite comprising fewer features, which supports the contribution of structured semantic retrieval.