SHAP Integration

Learn how to use SHAP (SHapley Additive exPlanations) values to explain model predictions with game-theory-based feature attribution.

What is SHAP?

SHAP (SHapley Additive exPlanations) is a unified approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using Shapley values from game theory.

  • Provides consistent and locally accurate explanations
  • Works with any model type (tree-based, neural networks, etc.)
  • Based on solid theoretical foundations from game theory (made concrete in the sketch below)
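
To make the game-theory foundation concrete: the Shapley value of feature i is its marginal contribution to the model output, averaged over all coalitions of the other features. The sketch below computes exact Shapley values for a toy linear model. It is illustrative only (the names are not part of blackbox_core), and it is exponential in the number of features, which is exactly why SHAP's approximation algorithms exist.

python
from itertools import combinations
from math import factorial

import numpy as np

def exact_shapley(value_fn, n_features):
    # phi_i = sum over coalitions S not containing i of
    #   |S|! * (n - |S| - 1)! / n! * (v(S + {i}) - v(S))
    phi = np.zeros(n_features)
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            for S in combinations(others, size):
                w = factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
                phi[i] += w * (value_fn(S + (i,)) - value_fn(S))
    return phi

# Toy model f(x) = 2*x0 + 1*x1 - 3*x2, explained against a zero baseline.
weights = np.array([2.0, 1.0, -3.0])
x = np.array([1.0, 2.0, 0.5])

def value_fn(S):
    # v(S): model output with features in S set to x, the rest at the baseline (0).
    z = np.zeros(3)
    for j in S:
        z[j] = x[j]
    return float(weights @ z)

phi = exact_shapley(value_fn, 3)
print(phi)                     # [ 2.   2.  -1.5]  ->  w_i * x_i for each feature
print(phi.sum(), weights @ x)  # local accuracy: attributions sum to f(x) - f(baseline)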

Basic SHAP Usage

Initialize an explainer with the SHAP method:

python
from blackbox_core import Explainer

# Initialize with SHAP
explainer = Explainer(
    model=your_model,
    method='shap',
    background_data=X_train  # Optional: sample of training data
)

# Get SHAP values for predictions
explanation = explainer.explain(
    data=X_test,
    feature_names=feature_names
)
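
If you want to see what the wrapper does under the hood, an equivalent flow with the shap library itself looks like the following. This is a minimal sketch: shap.Explainer and the waterfall plot are standard shap entry points, but depending on the model type you may need to pass your_model.predict rather than the model object.

python
import shap

# Model-agnostic entry point; shap selects a suitable algorithm for the model.
shap_explainer = shap.Explainer(your_model, X_train)

# Returns a shap.Explanation with .values, .base_values, and .data attributes.
shap_values = shap_explainer(X_test)

# Per-feature attributions for the first prediction.
shap.plots.waterfall(shap_values[0])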

Tree-Based Models

For tree-based models like Random Forests and XGBoost, SHAP provides optimized explainers:

python
from sklearn.ensemble import RandomForestClassifier
from blackbox_core import Explainer

# Train model
model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)

# Create explainer (automatically uses TreeExplainer)
explainer = Explainer(model, method='shap')

# Explain predictions
explanation = explainer.explain(X_test[0:1])
explanation.plot()
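
For reference, the tree-optimized path corresponds to shap.TreeExplainer, which computes exact SHAP values in polynomial time from the tree structure instead of sampling. A minimal sketch against the model trained above; note that the shape of the returned values differs across shap versions and numbers of classes.

python
import shap

tree_explainer = shap.TreeExplainer(model)
sv = tree_explainer.shap_values(X_test)

# For classifiers, older shap versions return a list with one array per class,
# while newer ones may return a single (samples, features, classes) array.
# Either way, each row's values plus the expected value recover the model
# output for that class (local accuracy).
print(tree_explainer.expected_value)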

Deep Learning Models

For neural networks, provide background data so the explainer can estimate the model's expected output:

python
import tensorflow as tf
from blackbox_core import Explainer

# Your trained neural network
model = tf.keras.models.load_model('my_model.h5')

# Use a sample of training data as background
background = X_train[:100]

explainer = Explainer(
    model=model,
    method='shap',
    background_data=background
)

# Explain
explanation = explainer.explain(X_test[0:1])
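
For reference, shap handles neural networks with DeepExplainer (a DeepLIFT-style method) or GradientExplainer (expected gradients), and both use the background sample to estimate expected values. A minimal sketch against the Keras model above; DeepExplainer's TensorFlow support varies by version, so GradientExplainer is a common fallback.

python
import shap

# DeepLIFT-style approximation of SHAP values.
deep_explainer = shap.DeepExplainer(model, background)
sv = deep_explainer.shap_values(X_test[0:1])

# Expected-gradients fallback when DeepExplainer lacks support for a layer type.
grad_explainer = shap.GradientExplainer(model, background)
sv = grad_explainer.shap_values(X_test[0:1])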

Global Feature Importance

Get overall feature importance across multiple predictions:

python
# Get global feature importance
global_exp = explainer.global_explanation(
    data=X_test,
    feature_names=feature_names
)

# Visualize global importance
global_exp.plot()

# Get top features
top_features = global_exp.get_top_features(n=5)
print(top_features)
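
Global SHAP importance is conventionally the mean absolute SHAP value per feature across a dataset, which is what the plot above ranks by. If you ever need it outside the wrapper, a minimal sketch, assuming shap_values is a 2-D samples-by-features array:

python
import numpy as np

# Mean |SHAP| per feature: the standard SHAP global-importance measure.
importance = np.abs(shap_values).mean(axis=0)

# Rank features from most to least important.
for idx in np.argsort(importance)[::-1][:5]:
    print(feature_names[idx], importance[idx])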

Best Practices

  • Use TreeExplainer for tree-based models; it computes exact SHAP values and is far faster than model-agnostic methods
  • Provide background_data for more accurate explanations of complex models
  • Keep the background dataset small (50-100 samples) so explanations stay fast to compute; see the sketch after this list
  • Combine local and global explanations for a comprehensive understanding of the model
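
For building that small background set, shap ships two standard helpers; a minimal sketch (the sample sizes are illustrative):

python
import shap

# Random subsample of the training data.
background = shap.sample(X_train, 100)

# Or summarize with weighted k-means centroids; mainly intended as a
# background for KernelExplainer, and often more accurate than random
# sampling at the same size.
background = shap.kmeans(X_train, 50)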