Natural Language Inference

Last updated on Nov 9, 2024 4 min read NLP

Image credit: John Moeses Bauan

What is Natural Language Inference?

Given two sentences $S_1$ and $S_2$, Natural Language Inference (NLI) is the task of determining whether a hypothesis is true (entailment), false (contradiction), or undetermined (neutral), given a premise. Consider the following examples, with $S_1$ acting as the hypothesis and $S_2$ acting as the premise.

$S_1$ (hypothesis)	$S_2$ (premise)	NLI tag
Some men are playing a sport	22 male players are playing cricket	Entailment
Harry Potter hates all sports	All the students at Hogwarts except Hermione and some Ravenclaw boys love quidditch	Contradiction
Rory is the last centurion	Rose has been on the TARDIS	Neutral

NLI need not be a strictly logical conclusion. The task is loosely defined: if an average person proficient in the language would judge the hypothesis to be true given the premise, we say that the hypothesis is entailed, and the other relation labels are defined analogously.

NLI therefore seems necessary for what has been termed the “Holy Grail of NLP”: Natural Language Understanding (NLU).

What makes NLI hard?

Consider the second example in the table above. To arrive at the right answer, a system would need to make a chain of inferences like the following:

Harry Potter is a student at Hogwarts
Harry Potter is in Gryffindor
Harry Potter is not Hermione
Harry Potter is not in Ravenclaw
Quidditch is a sport
Since Harry Potter is a Hogwarts student who is neither Hermione nor a Ravenclaw boy, he must love quidditch, which is a sport.
But the hypothesis states that he hates all sports, and therefore hates quidditch.

Only then can a system identify the contradiction and label it correctly. So it is not enough to get the semantic component right: a system must also reason at the pragmatic level, and the model or algorithm must be enriched with knowledge such as ontologies, metonymy, and hypernymy/hyponymy relations.
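To make the chain of inferences concrete, here is a toy sketch that hand-encodes the background knowledge as ground facts plus two if-then rules and derives the contradiction by forward chaining. Every predicate name here (`student_at`, `house`, `loves`, `hates`, and so on) and every helper function is a hypothetical encoding written for this one example; getting such facts out of raw text automatically is exactly the hard part.

```python
# Toy forward chaining over hand-written ground facts. Every predicate name
# below is a hypothetical encoding of the Hogwarts example, not a real ontology.
Fact = tuple[str, ...]

facts: set[Fact] = {
    ("student_at", "harry", "hogwarts"),
    ("house", "harry", "gryffindor"),
    ("is_sport", "quidditch"),
    ("hates_all_sports", "harry"),  # the hypothesis
}


def apply_rules(kb: set[Fact]) -> set[Fact]:
    """Return the facts derivable in one pass that are not yet in kb."""
    new: set[Fact] = set()
    for fact in kb:
        # Premise: all Hogwarts students except Hermione and some Ravenclaw boys
        # love quidditch. We cannot tell which Ravenclaws are excepted, so the
        # rule only fires for students in a known non-Ravenclaw house.
        if fact[0] == "student_at" and fact[2] == "hogwarts":
            person = fact[1]
            houses = {f[2] for f in kb if f[0] == "house" and f[1] == person}
            if person != "hermione" and houses and "ravenclaw" not in houses:
                new.add(("loves", person, "quidditch"))
        # Hating all sports implies hating each known sport.
        if fact[0] == "hates_all_sports":
            for other in kb:
                if other[0] == "is_sport":
                    new.add(("hates", fact[1], other[1]))
    return new - kb


def closure(kb: set[Fact]) -> set[Fact]:
    """Apply the rules repeatedly until no new facts appear (a fixpoint)."""
    kb = set(kb)
    while new := apply_rules(kb):
        kb |= new
    return kb


def contradictions(kb: set[Fact]) -> list[tuple[str, str]]:
    """Pairs (person, thing) that the knowledge base says are both loved and hated."""
    return sorted(
        (f[1], f[2]) for f in kb if f[0] == "loves" and ("hates", f[1], f[2]) in kb
    )


print(contradictions(closure(facts)))  # [('harry', 'quidditch')]
```

Removing the hypothesis fact leaves the knowledge base consistent, which is what makes the contradiction attributable to the hypothesis.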

Beyond this, a good NLI system also needs a variety of other cognitive abilities, such as arithmetic and color abstraction. Consider the following premise:

Two men and a woman were driving a teal car

Then the hypothesis “Three people were driving a bluish vehicle” is also true: two men and a woman make three people, teal is a shade of blue, and a car is a vehicle.
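A toy sketch of the two extra abilities this example needs: counting the people mentioned, and climbing a small is-a hierarchy (teal to blue, car to vehicle). The number lexicon, the `IS_A` table, and the helper names are hypothetical and hand-written for this example only, and the premise is assumed to be already split into (quantity, noun) pairs; a real system would need far broader world knowledge.

```python
# Hypothetical number lexicon and is-a (hypernym) table, written by hand for this example.
NUMBER_WORDS = {"a": 1, "an": 1, "one": 1, "two": 2, "three": 3}
IS_A = {"teal": "blue", "car": "vehicle", "man": "person", "woman": "person"}


def is_a(word: str, category: str) -> bool:
    """Follow IS_A links upward until we reach `category` or run out of parents."""
    while word != category:
        if word not in IS_A:
            return False
        word = IS_A[word]
    return True


def count_people(groups: list[tuple[str, str]]) -> int:
    """Sum quantities such as [("two", "man"), ("a", "woman")] whose noun is a person."""
    return sum(NUMBER_WORDS[num] for num, noun in groups if is_a(noun, "person"))


# "Two men and a woman were driving a teal car", pre-split and lemmatized by hand.
premise_people = [("two", "man"), ("a", "woman")]

entailed = (
    count_people(premise_people) == NUMBER_WORDS["three"]  # arithmetic: 2 + 1 = 3
    and is_a("teal", "blue")  # color abstraction: teal counts as bluish
    and is_a("car", "vehicle")  # a car is a vehicle
)
print(entailed)  # True
```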

Applications of NLI

Many semantic tasks stand to benefit from NLI. Among them, some areas of particular significance are:

Question Answering
Search and Information Retrieval
Automatic Summarization

NLI can also be used for paraphrase detection. Two sentences $S_1$ and $S_2$ are paraphrases of each other if $S_1$ entails $S_2$ and vice versa. That is, if $S_1$ can be inferred from $S_2$ and $S_2$ can be inferred from $S_1$, then they are semantically equivalent (or very similar) and thus paraphrases of each other.
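The definition translates directly into code: given any entailment predicate, two sentences are paraphrases when it holds in both directions. The `toy_entails` function below is a hypothetical stand-in (a crude word-subset test) so the sketch runs without a model; any real NLI system could be passed in instead.

```python
from typing import Callable

Entails = Callable[[str, str], bool]


def toy_entails(premise: str, hypothesis: str) -> bool:
    """Hypothetical stand-in: the hypothesis 'follows' if all its words occur in the premise."""
    return set(hypothesis.lower().split()) <= set(premise.lower().split())


def is_paraphrase(s1: str, s2: str, entails: Entails = toy_entails) -> bool:
    """S1 and S2 are paraphrases iff S1 entails S2 and S2 entails S1."""
    return entails(s1, s2) and entails(s2, s1)


print(is_paraphrase("the cat sat on the mat", "on the mat the cat sat"))  # True
print(is_paraphrase("the black cat sat", "the cat sat"))  # False: only one direction holds
```

Keeping the entailment check as a parameter means the paraphrase logic stays the same whichever inference method sits underneath.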

Paraphrase detection has a very interesting application in the evaluation of Machine Translation (MT) systems¹. MT systems are usually evaluated with metrics like BLEU, but the issue is that models now often optimize for the BLEU score itself rather than for the underlying semantic content.
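To see why surface overlap can miss meaning, here is a sketch of BLEU's clipped (modified) unigram precision. This is only one ingredient of the full metric, which also combines higher-order n-gram precisions and a brevity penalty; the function name is our own. A candidate translation that drops a negation still gets a perfect unigram score against the reference, even though a two-way entailment check would reject it as a paraphrase.

```python
from collections import Counter


def clipped_unigram_precision(candidate: str, reference: str) -> float:
    """Fraction of candidate tokens found in the reference, with each token's count
    clipped to its count in the reference (BLEU's modified unigram precision)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(count, ref[tok]) for tok, count in cand.items())
    return overlap / sum(cand.values())


reference = "luke i am not your father"
candidate = "luke i am your father"  # drops the negation: opposite meaning
print(clipped_unigram_precision(candidate, reference))  # 1.0
```

The full metric would lower this score somewhat (the candidate is shorter and loses a few bigrams), but nothing in it specifically targets the flipped meaning.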

How is NLI being done right now?

This post explores non-neural approaches to inference. Neural architectures deserve a separate blog post and will be covered in part 2 of this series.

Non-Neural Approaches

Bag Of Words

Here you try to align each word in the hypothesis with a word in the premise. You essentially have two bags of words: you find the most similar pairs and check whether the meaning of the hypothesis words is subsumed by the meaning of the premise words. The approach is fairly robust and can deal with lexical differences, as in the following pairs:

Increased ⇒ Grows
Reported ⇒ Saw
Companies ⇒ Google

However, it breaks down on sentence pairs like:

Ram killed Ravan

Ravan killed Ram

Advantages

It is simple and robust to lexical variation such as Increased ⇒ Grows.

Disadvantages

It gives the same result for both sentences above, since they contain identical words. The thematic (theta) roles, i.e. who did what to whom, are ignored by this approach.
Another shortcoming of this approach is the handling of negation. “Luke, I am your father” and “Luke, I am not your father” are lexically almost identical, but they convey opposite meanings.
Quantifiers are another major issue. Switching “most” to “every” drastically changes the meaning, yet the two words still have very high similarity scores.
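A minimal sketch of the bag-of-words aligner described above: each hypothesis word is matched to its most similar premise word, and the hypothesis counts as entailed if every best match clears a threshold. The `SIMILAR` lexicon, its scores, the 0.5 threshold, and the function names are all hypothetical placeholders for a real similarity source such as word embeddings or a thesaurus. Running it on the examples above reproduces each failure mode in the list.

```python
# Hypothetical similarity lexicon standing in for word embeddings or a thesaurus.
SIMILAR = {
    ("grow", "increased"): 0.8,
    ("saw", "reported"): 0.7,
    ("companies", "google"): 0.9,
    ("student", "students"): 0.9,  # morphological variant
    ("every", "most"): 0.85,  # quantifiers look alike to a similarity score
}


def similarity(w1: str, w2: str) -> float:
    """Symmetric lookup: 1.0 for identical words, else the lexicon score or 0."""
    if w1 == w2:
        return 1.0
    return SIMILAR.get((w1, w2), SIMILAR.get((w2, w1), 0.0))


def bow_entails(premise: str, hypothesis: str, threshold: float = 0.5) -> bool:
    """Entailed if every hypothesis word aligns to some similar-enough premise word."""
    p_words = premise.lower().split()
    return all(
        max((similarity(h, p) for p in p_words), default=0.0) >= threshold
        for h in hypothesis.lower().split()
    )


print(bow_entails("Google reported that profits increased", "companies saw profits grow"))  # True
print(bow_entails("Ram killed Ravan", "Ravan killed Ram"))  # True: roles ignored
print(bow_entails("Luke I am not your father", "Luke I am your father"))  # True: negation lost
print(bow_entails("most students passed", "every student passed"))  # True: quantifier swap
```

Only the first result is a correct entailment; the other three should not be accepted, yet word order, the extra "not", and the quantifier swap are all invisible to the alignment.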

Some of these errors can be mitigated by supplying syntactic and semantic information about the action (thematic) roles and about the domains and restrictions of each word, but this makes the system brittle and very tedious to maintain and improve. Moreover, it does not generalize well to situations it has not encountered before. In spite of all these issues, it remains a decent baseline for most general use cases and can be used in a pinch.
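As a rough illustration of adding role information, the sketch below compares hand-written (agent, action, patient, polarity) structures instead of bare bags of words, which is enough to separate the Ram/Ravan pair and a negated variant. The `Event` type and the annotations are hypothetical: in practice a parser or semantic role labeler would have to produce them, and maintaining such annotations by hand is exactly what makes this approach brittle.

```python
from typing import NamedTuple


class Event(NamedTuple):
    """Hypothetical hand-annotated predicate-argument structure for one clause."""
    agent: str
    action: str
    patient: str
    negated: bool = False


def role_entails(premise: Event, hypothesis: Event) -> bool:
    """Toy check: same action, same participants in the same roles, same polarity."""
    return premise == hypothesis


ram_kills = Event(agent="ram", action="kill", patient="ravan")
ravan_kills = Event(agent="ravan", action="kill", patient="ram")

print(role_entails(ram_kills, ravan_kills))  # False: agent and patient swapped
print(role_entails(ram_kills, Event("ram", "kill", "ravan")))  # True
print(role_entails(ram_kills, ram_kills._replace(negated=True)))  # False: "Ram did not kill Ravan"
```

Exact equality is far too strict for real text (it has no notion of synonyms), so a practical variant would likely combine role matching with word-similarity scoring like the bag-of-words approach.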

Logic Approaches

There are two main types of logical inference:

First-Order Logic
Natural Logic

In first-order logic, you write axioms or rules and then use them to derive inferences. However, these formal approaches are not very adaptive and cannot handle the complexities and intricacies of natural language.
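A toy illustration of the rule-based style and of why it is not very adaptive: rules fire only on the exact symbols they were written for. The single axiom, the predicate names, and the `forward_chain` helper are hypothetical; the point is that a fact phrased with a synonym the rule author did not anticipate ("adores" instead of "loves") never triggers the rule, so the contradiction with "Harry hates all sports" goes undetected.

```python
from typing import Callable

Fact = tuple[str, ...]
Rule = Callable[[set[Fact]], set[Fact]]


def rule_sport_lover(kb: set[Fact]) -> set[Fact]:
    """Hypothetical axiom: forall x, y. loves(x, y) and sport(y) -> not hates_all_sports(x)."""
    sports = {f[1] for f in kb if f[0] == "sport"}
    return {("not_hates_all_sports", f[1]) for f in kb if f[0] == "loves" and f[2] in sports}


def forward_chain(kb: set[Fact], rules: list[Rule]) -> set[Fact]:
    """Apply every rule until the knowledge base stops growing."""
    kb = set(kb)
    while True:
        new = set().union(*(rule(kb) for rule in rules)) - kb
        if not new:
            return kb
        kb |= new


def contradicts_hates_all_sports(kb: set[Fact], person: str) -> bool:
    """True if the rules can derive that `person` does not hate all sports."""
    return ("not_hates_all_sports", person) in forward_chain(kb, [rule_sport_lover])


kb_loves = {("loves", "harry", "quidditch"), ("sport", "quidditch")}
kb_adores = {("adores", "harry", "quidditch"), ("sport", "quidditch")}

print(contradicts_hates_all_sports(kb_loves, "harry"))  # True
print(contradicts_hates_all_sports(kb_adores, "harry"))  # False: no rule mentions "adores"
```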

References

MacCartney, 2009

¹ Padó et al., 2009

Ujwal Narayan

SDE (ML)

My research interests include narrative understanding, applications of NLP over long documents, language theory, and exploring LLMs and making them more interpretable with a focus on factuality.