HinDisSent

Photo by Kerensa Pickett on Unsplash

Learning effective general-purpose sentence representations is an important core task in NLP. Existing methods require very large amounts of data or extensive manual annotation. In this project, we apply a method inspired by DisSent (Nie et al., 2017) to build sentence representations for Hindi that perform comparably to existing methods while using much less data. The method leverages explicit discourse relations to set up a discourse marker prediction task, which is used to train a BiLSTM-based sentence encoder. The model is evaluated on a transfer task, namely sentiment analysis, and the results are compared to existing sentence embedding methods.
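To make the discourse marker prediction setup concrete, here is a minimal sketch of how training examples could be derived from raw sentences. It is an illustration under stated assumptions, not the project's pipeline: the marker list `MARKERS` and the helpers `extract_pair` and `build_dataset` are hypothetical, and the clause split is a naive whitespace-token match, whereas DisSent extracts clause pairs with dependency-based patterns. Each example is a pair of clauses plus the index of the marker that joined them, which is the label a sentence encoder would be trained to predict.

```python
# Hypothetical marker list for illustration only; the set of Hindi
# discourse markers used in the project is not stated here.
MARKERS = ["क्योंकि", "लेकिन", "इसलिए"]  # roughly: because, but, therefore


def extract_pair(sentence, markers=MARKERS):
    """Split a sentence at the first mid-sentence discourse marker.

    Returns (left_clause, right_clause, marker), or None when no marker
    occurs with at least one token on each side.
    Assumption: naive token matching stands in for proper parsing.
    """
    # Treat the danda and commas as separators before tokenizing.
    tokens = sentence.replace("।", " ").replace(",", " ").split()
    for i, tok in enumerate(tokens):
        if tok in markers and 0 < i < len(tokens) - 1:
            return " ".join(tokens[:i]), " ".join(tokens[i + 1:]), tok
    return None


def build_dataset(sentences, markers=MARKERS):
    """Turn sentences into (clause_1, clause_2, label_id) examples."""
    label_ids = {m: i for i, m in enumerate(markers)}
    examples = []
    for sentence in sentences:
        pair = extract_pair(sentence, markers)
        if pair is not None:
            left, right, marker = pair
            examples.append((left, right, label_ids[marker]))
    return examples


if __name__ == "__main__":
    corpus = [
        "मैं घर गया क्योंकि बारिश हो रही थी।",
        "वह आया लेकिन देर से।",
        "आज छुट्टी है।",  # no marker, so it yields no example
    ]
    for example in build_dataset(corpus):
        print(example)
```

In a setup like this, the two clauses would be encoded separately by the sentence encoder and a classifier over the combined encodings would predict the label, so no manual annotation is needed beyond the marker list itself.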

Ujwal Narayan
SDE (ML)

My research interests include narrative understanding, applications of NLP over long documents, language theory, and exploring LLMs and making them more interpretable with a focus on factuality.
