SHAP Integration

Learn how to use SHAP (SHapley Additive exPlanations) values to explain model predictions with game-theory-based feature attribution.

What is SHAP?

SHAP (SHapley Additive exPlanations) is a unified approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using Shapley values from game theory.

  • Provides consistent and locally accurate explanations
  • Works with any model type (tree-based, neural networks, etc.)
  • Based on solid theoretical foundations from game theory (made concrete in the sketch below)
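
To make the game-theory foundation concrete: the Shapley value of feature i is its marginal contribution to the model output, averaged over all coalitions of the other features. The sketch below computes exact Shapley values for a toy linear model. It is illustrative only (the names are not part of blackbox_core), and it is exponential in the number of features, which is exactly why SHAP's approximation algorithms exist.

python
from itertools import combinations
from math import factorial

import numpy as np

def exact_shapley(value_fn, n_features):
    # phi_i = sum over coalitions S not containing i of
    #   |S|! * (n - |S| - 1)! / n! * (v(S + {i}) - v(S))
    phi = np.zeros(n_features)
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            for S in combinations(others, size):
                w = factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
                phi[i] += w * (value_fn(S + (i,)) - value_fn(S))
    return phi

# Toy model f(x) = 2*x0 + 1*x1 - 3*x2, explained against a zero baseline.
weights = np.array([2.0, 1.0, -3.0])
x = np.array([1.0, 2.0, 0.5])

def value_fn(S):
    # v(S): model output with features in S set to x, the rest at the baseline (0).
    z = np.zeros(3)
    for j in S:
        z[j] = x[j]
    return float(weights @ z)

phi = exact_shapley(value_fn, 3)
print(phi)                     # [ 2.   2.  -1.5]  ->  w_i * x_i for each feature
print(phi.sum(), weights @ x)  # local accuracy: attributions sum to f(x) - f(baseline)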

Basic SHAP Usage

Initialize an explainer with the SHAP method:

python
from blackbox_core import Explainer

# Initialize with SHAP
explainer = Explainer(
    model=your_model,
    method='shap',
    background_data=X_train  # Optional: sample of training data
)

# Get SHAP values for predictions
explanation = explainer.explain(
    data=X_test,
    feature_names=feature_names
)
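
If you want to see what the wrapper does under the hood, an equivalent flow with the shap library itself looks like the following. This is a minimal sketch: shap.Explainer and the waterfall plot are standard shap entry points, but depending on the model type you may need to pass your_model.predict rather than the model object.

python
import shap

# Model-agnostic entry point; shap selects a suitable algorithm for the model.
shap_explainer = shap.Explainer(your_model, X_train)

# Returns a shap.Explanation with .values, .base_values, and .data attributes.
shap_values = shap_explainer(X_test)

# Per-feature attributions for the first prediction.
shap.plots.waterfall(shap_values[0])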

Tree-Based Models

For tree-based models like Random Forests and XGBoost, SHAP provides optimized explainers:

python
from sklearn.ensemble import RandomForestClassifier
from blackbox_core import Explainer

# Train model
model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)

# Create explainer (automatically uses TreeExplainer)
explainer = Explainer(model, method='shap')

# Explain predictions
explanation = explainer.explain(X_test[0:1])
explanation.plot()
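
For reference, the tree-optimized path corresponds to shap.TreeExplainer, which computes exact SHAP values in polynomial time from the tree structure instead of sampling. A minimal sketch against the model trained above; note that the shape of the returned values differs across shap versions and numbers of classes.

python
import shap

tree_explainer = shap.TreeExplainer(model)
sv = tree_explainer.shap_values(X_test)

# For classifiers, older shap versions return a list with one array per class,
# while newer ones may return a single (samples, features, classes) array.
# Either way, each row's values plus the expected value recover the model
# output for that class (local accuracy).
print(tree_explainer.expected_value)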

Deep Learning Models

For neural networks, provide background data so the explainer can estimate the model's expected output:

python
import tensorflow as tf
from blackbox_core import Explainer

# Your trained neural network
model = tf.keras.models.load_model('my_model.h5')

# Use a sample of training data as background
background = X_train[:100]

explainer = Explainer(
    model=model,
    method='shap',
    background_data=background
)

# Explain
explanation = explainer.explain(X_test[0:1])
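
For reference, shap handles neural networks with DeepExplainer (a DeepLIFT-style method) or GradientExplainer (expected gradients), and both use the background sample to estimate expected values. A minimal sketch against the Keras model above; DeepExplainer's TensorFlow support varies by version, so GradientExplainer is a common fallback.

python
import shap

# DeepLIFT-style approximation of SHAP values.
deep_explainer = shap.DeepExplainer(model, background)
sv = deep_explainer.shap_values(X_test[0:1])

# Expected-gradients fallback when DeepExplainer lacks support for a layer type.
grad_explainer = shap.GradientExplainer(model, background)
sv = grad_explainer.shap_values(X_test[0:1])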

Global Feature Importance

Get overall feature importance across multiple predictions:

python
# Get global feature importance
global_exp = explainer.global_explanation(
    data=X_test,
    feature_names=feature_names
)

# Visualize global importance
global_exp.plot()

# Get top features
top_features = global_exp.get_top_features(n=5)
print(top_features)
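
Global SHAP importance is conventionally the mean absolute SHAP value per feature across a dataset, which is what the plot above ranks by. If you ever need it outside the wrapper, a minimal sketch, assuming shap_values is a 2-D samples-by-features array:

python
import numpy as np

# Mean |SHAP| per feature: the standard SHAP global-importance measure.
importance = np.abs(shap_values).mean(axis=0)

# Rank features from most to least important.
for idx in np.argsort(importance)[::-1][:5]:
    print(feature_names[idx], importance[idx])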

Best Practices

  • Use TreeExplainer for tree-based models; it computes exact SHAP values and is far faster than model-agnostic methods
  • Provide background_data for more accurate explanations of complex models
  • Keep the background dataset small (50-100 samples) so explanations stay fast to compute; see the sketch after this list
  • Combine local and global explanations for a comprehensive understanding of the model
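
For building that small background set, shap ships two standard helpers; a minimal sketch (the sample sizes are illustrative):

python
import shap

# Random subsample of the training data.
background = shap.sample(X_train, 100)

# Or summarize with weighted k-means centroids; mainly intended as a
# background for KernelExplainer, and often more accurate than random
# sampling at the same size.
background = shap.kmeans(X_train, 50)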