LIME Explanations

    Topic area: Interpretability
    Prerequisites: Machine Learning, Linear Regression, Feature Importance


    Overview

    Local Interpretable Model-agnostic Explanations (LIME) is a technique for explaining the predictions of any classifier or regressor by approximating it locally with an interpretable surrogate model. Introduced by Ribeiro, Singh, and Guestrin in 2016, LIME treats the model under inspection as a black box and probes it with perturbed samples around a single input of interest, then fits a sparse linear model whose coefficients describe which features pushed the prediction toward or away from a given class. Because the surrogate is fit only in a small neighborhood of the instance being explained, LIME does not attempt to characterize global model behavior; it produces one explanation per prediction, and different predictions from the same model can be explained by different feature subsets.

    LIME has become one of the most widely used post-hoc explanation methods in applied Machine Learning, particularly in domains such as healthcare, credit scoring, and content moderation where stakeholders need a per-instance justification rather than a global summary. It is model-agnostic, meaning it works with neural networks, gradient-boosted trees, support vector machines, or any predictor exposed through a probability or score function, and it has variants tailored to tabular, text, and image inputs.
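
    A minimal usage sketch with the open-source `lime` package and a scikit-learn classifier is shown below (assuming both packages are installed); the dataset, model, and parameter values are illustrative only.

        # Minimal sketch: explaining one prediction of a scikit-learn model with the
        # `lime` package's tabular explainer (illustrative dataset and settings).
        from sklearn.datasets import load_iris
        from sklearn.ensemble import RandomForestClassifier
        from lime.lime_tabular import LimeTabularExplainer

        data = load_iris()
        model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

        explainer = LimeTabularExplainer(
            training_data=data.data,
            feature_names=data.feature_names,
            class_names=list(data.target_names),
            mode="classification",
        )

        # Each returned pair is (human-readable feature condition, signed weight).
        explanation = explainer.explain_instance(
            data_row=data.data[0],
            predict_fn=model.predict_proba,   # any callable returning class probabilities
            num_features=3,
        )
        print(explanation.as_list())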

    Intuition

    The core intuition behind LIME is that even highly nonlinear models are approximately linear in a sufficiently small neighborhood. If we draw samples close to the instance we want to explain and observe how the model's predictions change, we can fit a simple linear model that mimics the black-box model's behavior in that neighborhood. The coefficients of this local linear model serve as the explanation: positive coefficients indicate features that increased the predicted probability of the class, negative ones indicate features that decreased it.

    A key design choice in LIME is the representation used for the surrogate. The original input may be high-dimensional and continuous, which is hard to interpret directly, so LIME maps it to an interpretable representation of binary features. For text, each feature is the presence or absence of a token; for images, it is the presence or absence of a superpixel; for tabular data, it is membership in a discretized bin of a column. The surrogate model operates on these binary features, ensuring that every coefficient corresponds to something a human can name and reason about.
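
    As a concrete illustration of the text case, the toy functions below (not part of any library) map a sentence to a binary presence vector and map a perturbed vector back to a readable string.

        # Toy mapping between a sentence and LIME's interpretable representation:
        # one binary feature per token, 1 = token present, 0 = token removed.
        def to_interpretable(text):
            tokens = text.split()
            return tokens, [1] * len(tokens)           # x': every token present

        def from_interpretable(tokens, mask):
            # A perturbed z' keeps only the tokens whose bit is 1.
            return " ".join(t for t, keep in zip(tokens, mask) if keep)

        tokens, x_prime = to_interpretable("the plot was dull but the acting was fine")
        z_prime = [1, 1, 1, 0, 1, 1, 1, 1, 1]           # toggle off the token "dull"
        print(from_interpretable(tokens, z_prime))      # "the plot was but the acting was fine"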

    Formulation

    Let $ f : \mathcal{X} \to \mathbb{R} $ be the black-box model, where $ f(x) $ is the predicted probability or score for a class of interest. Let $ x \in \mathcal{X} $ be the instance to explain, and let $ x' \in \{0,1\}^{d'} $ be its representation in the interpretable space. LIME defines a class $ G $ of interpretable models (typically sparse linear models), a proximity measure $ \pi_x $ that weights samples by their closeness to $ x $, and a complexity measure $ \Omega(g) $ penalizing models that are too complex to be human-readable. The explanation is the model

    $ {\displaystyle \xi(x) = \arg\min_{g \in G} \mathcal{L}(f, g, \pi_x) + \Omega(g),} $

    where $ \mathcal{L} $ is a locality-aware loss measuring how well $ g $ approximates $ f $ in the neighborhood induced by $ \pi_x $. In practice, $ \mathcal{L} $ is a weighted squared error,

    $ {\displaystyle \mathcal{L}(f, g, \pi_x) = \sum_{z, z'} \pi_x(z) \, \big( f(z) - g(z') \big)^2,} $

    evaluated over perturbed samples $ z' $ drawn around $ x' $, with $ z $ being the corresponding point in the original input space. The proximity $ \pi_x(z) = \exp(-D(x, z)^2 / \sigma^2) $ uses an exponential kernel over a distance $ D $ appropriate for the input modality (cosine distance for text, L2 for tabular features in a normalized space). The complexity term $ \Omega(g) $ typically caps the number of nonzero coefficients via L1 Regularization or an explicit feature budget $ K $, often enforced with the Lasso path or a forward-selection procedure.
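
    The two ingredients of the objective can be written compactly; the sketch below is illustrative, with scikit-learn's Lasso standing in for the L1-penalized fit and `alpha` an arbitrary regularization strength rather than a value prescribed by LIME.

        import numpy as np
        from sklearn.linear_model import Lasso

        def exponential_kernel(distances, sigma):
            # pi_x(z) = exp(-D(x, z)^2 / sigma^2)
            return np.exp(-(distances ** 2) / sigma ** 2)

        def fit_local_surrogate(Z_prime, f_z, weights, alpha=0.01):
            # Weighted squared error over the perturbed samples, with Omega(g)
            # supplied by the L1 penalty (sparsity via alpha rather than a hard K).
            g = Lasso(alpha=alpha)
            g.fit(Z_prime, f_z, sample_weight=weights)
            return g.coef_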

    Algorithm

    The standard LIME algorithm proceeds as follows for an instance $ x $:

    1. Convert $ x $ into its interpretable representation $ x' $.
    2. Sample $ N $ perturbations $ z'_i \in \{0,1\}^{d'} $, each obtained by switching off a uniformly drawn random subset of the features that are active in $ x' $.
    3. Map each $ z'_i $ back to the original feature space to produce $ z_i $: for text, drop the corresponding tokens; for images, replace the masked superpixels with a baseline color; for tabular data, sample replacement values from the training distribution of each column.
    4. Query the black-box model to obtain $ f(z_i) $ for every perturbed sample.
    5. Compute proximity weights $ \pi_x(z_i) $.
    6. Fit a sparse weighted linear model $ g $ on the dataset $ \{(z'_i, f(z_i), \pi_x(z_i))\}_{i=1}^N $, selecting at most $ K $ features.
    7. Return the coefficients of $ g $ as the explanation.

    Typical settings are $ N \in [1000, 5000] $ samples and $ K \in [5, 15] $ features. The number of model queries scales linearly with $ N $, which is the dominant cost when the black-box model is expensive to evaluate.
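
    A self-contained sketch of steps 1-7 for a text classifier is given below. It is a toy implementation rather than the reference `lime` package; the function name, kernel width, and the top-K coefficient selection (a simplification of the Lasso-path or forward-selection step) are all illustrative.

        import numpy as np
        from sklearn.linear_model import Ridge

        def lime_text_explanation(text, predict_proba, class_idx,
                                  num_samples=1000, num_features=5,
                                  sigma=25.0, seed=0):
            # Toy LIME for text. `predict_proba` maps a list of strings to an
            # array of shape (n, n_classes); all names and defaults are illustrative.
            rng = np.random.default_rng(seed)
            tokens = text.split()
            d = len(tokens)

            # Steps 1-2: interpretable representation and random binary perturbations.
            Z_prime = rng.integers(0, 2, size=(num_samples, d))
            Z_prime[0] = 1                                   # keep the unperturbed instance

            # Step 3: map each z' back to a text by dropping the switched-off tokens.
            texts = [" ".join(t for t, keep in zip(tokens, row) if keep) for row in Z_prime]

            # Step 4: query the black-box model.
            f_z = np.asarray(predict_proba(texts))[:, class_idx]

            # Step 5: exponential kernel over cosine distance between z' and the
            # all-ones vector x'.
            active = Z_prime.sum(axis=1)
            cosine_dist = 1.0 - np.sqrt(active / d)
            weights = np.exp(-(cosine_dist ** 2) / sigma ** 2)

            # Step 6: weighted linear fit, then keep the K largest coefficients.
            g = Ridge(alpha=1.0).fit(Z_prime, f_z, sample_weight=weights)
            top = np.argsort(-np.abs(g.coef_))[:num_features]
            g_sparse = Ridge(alpha=1.0).fit(Z_prime[:, top], f_z, sample_weight=weights)

            # Step 7: coefficients of the sparse surrogate are the explanation.
            return sorted(zip([tokens[j] for j in top], g_sparse.coef_),
                          key=lambda p: -abs(p[1]))

    Called with, for example, the `predict_proba` of a fitted scikit-learn text pipeline, the function returns (token, weight) pairs sorted by magnitude for the chosen class.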

    Variants

    Several extensions of LIME address limitations of the original formulation. SP-LIME (submodular pick LIME) selects a small set of representative instances so that their explanations together cover the most important features used by the model globally; it casts the selection as a submodular maximization problem solved by a greedy algorithm. Anchors replaces the linear surrogate with high-precision IF-THEN rules that hold with a user-specified confidence in the local neighborhood; this gives sharper guarantees but is more expensive to compute. KernelSHAP reframes LIME's loss with a specific kernel and regularization that make the resulting coefficients equal to Shapley Values, unifying LIME with cooperative game-theoretic Feature Attribution under a single estimator. ALIME and LIME-SUP propose deterministic neighborhood construction or supervised partitioning to reduce the variance of explanations across runs.
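
    To make the greedy submodular pick concrete, the sketch below (illustrative names, following the coverage-style objective described for SP-LIME) selects a budget of instances whose explanations jointly cover the globally important features.

        import numpy as np

        def submodular_pick(W, budget):
            # W[i, j] = weight of feature j in the explanation of instance i.
            # A feature's global importance is I_j = sqrt(sum_i |W[i, j]|); the
            # coverage of a set of instances is the total importance of features
            # appearing in at least one of their explanations. Greedy selection is
            # a standard approximation for maximizing this submodular objective.
            importance = np.sqrt(np.abs(W).sum(axis=0))
            chosen, covered = [], np.zeros(W.shape[1], dtype=bool)
            for _ in range(min(budget, W.shape[0])):
                gains = [np.dot(importance, (~covered) & (np.abs(W[i]) > 0))
                         if i not in chosen else -np.inf
                         for i in range(W.shape[0])]
                best = int(np.argmax(gains))
                chosen.append(best)
                covered |= np.abs(W[best]) > 0
            return chosen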

    Comparison with related methods

    LIME sits in a broader family of post-hoc, instance-level Feature Attribution methods. Compared with gradient-based methods such as Saliency Maps or Integrated Gradients, LIME does not require access to model internals or differentiability, which lets it explain non-differentiable models like random forests, but at the cost of needing many forward passes per explanation. Compared with Shapley Values computed exactly, LIME is much cheaper but its coefficients are biased by the choice of kernel and sampling distribution and do not satisfy the additivity axiom that defines Shapley attributions. Compared with global surrogates such as Decision Tree distillation, LIME provides finer-grained, instance-specific explanations but cannot summarize the model as a whole.

    Limitations

    LIME explanations can be unstable: because perturbations are sampled randomly and the surrogate is refit each time, two runs on the same instance may return different feature sets, especially when $ N $ is small or features are highly correlated. The choice of kernel width $ \sigma $ implicitly defines what counts as the local neighborhood and strongly influences which features appear in the explanation; there is no single principled value, and small changes in $ \sigma $ can flip the sign of attributions. Sampling perturbations from a uniform binary distribution can produce inputs far from the data manifold (for example, images with random patches blanked out), and the black-box model's behavior on these out-of-distribution points may not reflect its behavior on realistic inputs. LIME has also been shown to be vulnerable to adversarial manipulation: an attacker who controls the model can construct a classifier that looks fair under LIME explanations while actually relying on protected attributes, exploiting the fact that LIME queries off-manifold points.

    Practical considerations

    In practice, users should report the random seed and number of samples used, average over several runs to reduce variance, and prefer larger $ N $ for high-dimensional inputs. For text and image explanations, the choice of perturbation strategy (token deletion vs. replacement, superpixel masking baseline) materially changes the resulting attributions and should be documented. When fidelity is critical, Shapley Values or Anchors offer stronger theoretical guarantees, while LIME remains attractive as a fast first pass that produces human-readable, sparse explanations on arbitrary models.
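
    A small helper along the lines below (hypothetical, wrapping whatever explainer call is in use) implements the seed-averaging recommendation; `explain_fn(seed)` is assumed to return (feature, weight) pairs such as the output of `explain_instance(...).as_list()`.

        from collections import defaultdict

        def averaged_explanation(explain_fn, n_runs=10):
            # Run the explainer under several seeds and average each feature's weight.
            # Returns (feature, mean weight, fraction of runs in which it appeared);
            # a low appearance fraction is itself a sign of instability.
            totals, counts = defaultdict(float), defaultdict(int)
            for seed in range(n_runs):
                for feature, weight in explain_fn(seed):
                    totals[feature] += weight
                    counts[feature] += 1
            return sorted(((f, totals[f] / counts[f], counts[f] / n_runs) for f in totals),
                          key=lambda t: -abs(t[1]))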

    References


    1. Ribeiro, M. T., Singh, S., and Guestrin, C. "Why Should I Trust You?": Explaining the Predictions of Any Classifier, KDD 2016.
    2. Template:Cite arxiv
    3. Template:Cite arxiv
    4. Slack, D., Hilgard, S., Jia, E., Singh, S., and Lakkaraju, H. Fooling LIME and SHAP: Adversarial Attacks on Post Hoc Explanation Methods, AIES 2020.