Towards Faithful and Explainable NLP

Bachelor Thesis, Master Thesis

Most state-of-the-art NLP models employ deep neural networks and are therefore inherently non-interpretable. Several explainability methods based on attention weights and gradients have been proposed to make these models interpretable. However, the explanations they produce do not always correspond to human judgement and are often difficult to evaluate. The goal of this thesis is to analyze the faithfulness of these explainability methods and to develop a novel evaluation method that corresponds more closely to human judgement.
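
To make the topic concrete, below is a minimal sketch of one gradient-based explainability method (gradient × input saliency), assuming PyTorch and a toy bag-of-embeddings classifier; the model, token ids, and scores are illustrative placeholders, not part of the thesis setup.

```python
# Minimal sketch of gradient x input saliency, a common gradient-based
# explanation. The toy model and inputs here are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)

vocab_size, embed_dim, num_classes = 100, 16, 2
embedding = nn.Embedding(vocab_size, embed_dim)
classifier = nn.Linear(embed_dim, num_classes)

token_ids = torch.tensor([[5, 17, 42, 8]])            # one toy "sentence"
embeds = embedding(token_ids).detach().requires_grad_(True)

logits = classifier(embeds.mean(dim=1))                # bag-of-embeddings classifier
predicted_class = int(logits.argmax(dim=-1))

# Back-propagate the predicted-class logit to the token embeddings.
logits[0, predicted_class].backward()

# Gradient x input, summed over the embedding dimension, yields one
# importance score per token.
saliency = (embeds.grad * embeds).sum(dim=-1).squeeze(0)
for tok, score in zip(token_ids.squeeze(0).tolist(), saliency.tolist()):
    print(f"token {tok}: saliency {score:+.4f}")
```

Whether such per-token scores are faithful to the model's actual reasoning, and how to evaluate them against human judgement, is exactly the question this thesis addresses.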