SHAP Integration
Learn how to use SHAP (SHapley Additive exPlanations) values to explain model predictions with game-theoretic feature attribution.
What is SHAP?
SHAP (SHapley Additive exPlanations) is a unified approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using Shapley values from game theory.
- Provides consistent and locally accurate explanations
- Works with any model type (tree-based, neural networks, etc.)
- Based on solid theoretical foundations from game theory
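To make the game-theory connection concrete, the Shapley value of a feature averages its marginal contribution over every possible coalition of the remaining features. Below is a minimal brute-force sketch using a hypothetical three-feature linear model, with a zero baseline standing in for "absent" features; exact enumeration like this is only feasible for a handful of features, which is why SHAP relies on efficient approximations:

```python
# A brute-force Shapley value computation for a toy model. Everything
# here (the model, instance, and baseline) is hypothetical and exists
# only to illustrate the definition SHAP approximates.
from itertools import combinations
from math import factorial

import numpy as np

def model(x):
    # Toy stand-in for a trained model: a linear function of 3 features
    return 2.0 * x[0] + 1.0 * x[1] - 3.0 * x[2]

x = np.array([1.0, 2.0, 3.0])  # instance to explain
baseline = np.zeros(3)         # "absent" features fall back to this value
n = len(x)

def coalition_value(subset):
    # v(S): model output when only the features in S take their real values
    z = baseline.copy()
    for i in subset:
        z[i] = x[i]
    return model(z)

phi = np.zeros(n)
for i in range(n):
    others = [j for j in range(n) if j != i]
    for size in range(n):
        for S in combinations(others, size):
            # Shapley weight: |S|! * (n - |S| - 1)! / n!
            w = factorial(size) * factorial(n - size - 1) / factorial(n)
            phi[i] += w * (coalition_value(S + (i,)) - coalition_value(S))

print(phi)  # per-feature attributions, ~[2, 2, -9] for this linear model
print(phi.sum(), model(x) - model(baseline))  # local accuracy: these match
```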
Basic SHAP Usage
Initialize an explainer with the SHAP method:
```python
from blackbox_core import Explainer

# Initialize with SHAP
explainer = Explainer(
    model=your_model,
    method='shap',
    background_data=X_train  # Optional: sample of training data
)

# Get SHAP values for predictions
explanation = explainer.explain(
    data=X_test,
    feature_names=feature_names
)
```
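For reference, a model-agnostic setup like the one above maps onto the shap package's KernelExplainer, which perturbs features against a background sample. This is a sketch of the direct equivalent, not blackbox_core internals; the predict_proba call assumes a scikit-learn-style classifier:

```python
# Rough direct equivalent using the shap package (assumes your_model
# exposes a scikit-learn-style predict_proba).
import shap

background = shap.sample(X_train, 100)  # subsample the background data
kernel_explainer = shap.KernelExplainer(your_model.predict_proba, background)
shap_values = kernel_explainer.shap_values(X_test[:5])
```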
Tree-Based Models

For tree-based models like Random Forests and XGBoost, SHAP provides optimized explainers:
```python
from sklearn.ensemble import RandomForestClassifier
from blackbox_core import Explainer

# Train model
model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)

# Create explainer (automatically uses TreeExplainer)
explainer = Explainer(model, method='shap')

# Explain predictions
explanation = explainer.explain(X_test[0:1])
explanation.plot()
```
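For comparison, the optimized path corresponds to shap's TreeExplainer, which computes exact SHAP values efficiently by exploiting the tree structure instead of sampling coalitions. A sketch of the direct equivalent:

```python
# Direct TreeExplainer usage with the shap package, for reference.
import shap

tree_explainer = shap.TreeExplainer(model)
shap_values = tree_explainer.shap_values(X_test[0:1])
```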
Deep Learning Models

For neural networks, use background data for better approximations:
```python
import tensorflow as tf
from blackbox_core import Explainer

# Your trained neural network
model = tf.keras.models.load_model('my_model.h5')

# Use a sample of training data as background
background = X_train[:100]

explainer = Explainer(
    model=model,
    method='shap',
    background_data=background
)

# Explain
explanation = explainer.explain(X_test[0:1])
```
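With the shap package itself, DeepExplainer plays the analogous role for TensorFlow/Keras models, using the background sample to estimate the expected output that attributions are measured against. A sketch (support varies by model architecture and shap version):

```python
# Direct DeepExplainer usage with the shap package, for reference.
import shap

deep_explainer = shap.DeepExplainer(model, background)
shap_values = deep_explainer.shap_values(X_test[0:1])
```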
Global Feature Importance

Get overall feature importance across multiple predictions:
```python
# Get global feature importance
global_exp = explainer.global_explanation(
    data=X_test,
    feature_names=feature_names
)

# Visualize global importance
global_exp.plot()

# Get top features
top_features = global_exp.get_top_features(n=5)
print(top_features)
```
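A common convention behind global rankings like this is the mean absolute SHAP value per feature. A minimal sketch of that aggregation, assuming a shap_values array of shape (n_samples, n_features) such as the one from the KernelExplainer sketch above (blackbox_core may aggregate differently):

```python
# Mean-|SHAP| aggregation: turn per-prediction attributions into a
# global feature ranking.
import numpy as np

mean_abs_shap = np.abs(shap_values).mean(axis=0)  # shape: (n_features,)
order = np.argsort(mean_abs_shap)[::-1]           # most important first
for i in order[:5]:
    print(feature_names[i], mean_abs_shap[i])
```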
Best Practices

- Use TreeExplainer for tree-based models; it is substantially faster than model-agnostic sampling
- Provide background_data for more accurate explanations of complex models
- Start with a small background dataset (50-100 samples) for faster computation; see the summarization sketch after this list
- Combine local and global explanations for a comprehensive understanding of the model
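One way to keep the background small without hand-picking rows is to summarize the training data with weighted k-means centroids, which the shap package supports directly. A sketch using shap's own KernelExplainer (whether blackbox_core accepts a summarized background like this is an assumption):

```python
# Summarize the background with 50 weighted k-means centroids instead
# of raw training rows, then explain against that summary.
import shap

background = shap.kmeans(X_train, 50)
kernel_explainer = shap.KernelExplainer(your_model.predict_proba, background)
```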