Embeddings in Natural Language Processing: Theory and Advances in Vector Representation of Meaning

By Mohammad Taher Pilehvar et al.

Table of Contents

1 Introduction
1.1 Semantic representation
1.2 One-hot representation
1.3 Vector Space Models
1.4 The Evolution Path of representations
1.5 Coverage of the book
1.6 Outline
2 Background
2.1 Natural Language Processing Fundamentals
2.1.1 Linguistic fundamentals
2.1.2 Language models
2.2 Deep Learning for NLP
2.2.1 Sequence encoding
2.2.2 Recurrent neural networks
2.2.3 Transformers
2.3 Knowledge Resources
2.3.1 WordNet
2.3.2 Wikipedia, Freebase, Wikidata and DBpedia
2.3.3 BabelNet and ConceptNet
2.3.4 PPDB: The Paraphrase Database
3 Word Embeddings
3.1 Count-based models
3.1.1 Pointwise Mutual Information
3.1.2 Dimensionality reduction
3.2 Predictive models
3.3 Character embedding
3.4 Knowledge-enhanced word embeddings
3.5 Cross-lingual word embeddings
3.5.1 Sentence-level supervision
3.5.2 Document-level supervision
3.5.3 Word-level supervision
3.5.4 Unsupervised
3.6 Evaluation
3.6.1 Intrinsic Evaluation
3.6.2 Extrinsic Evaluation
4 Graph Embeddings
4.1 Node embedding
4.1.1 Matrix factorization methods
4.1.2 Random Walk methods
4.1.3 Incorporating node attributes
4.1.4 Graph Neural Network methods
4.2 Knowledge-based relation embeddings
4.3 Unsupervised relation embeddings
4.4 Applications and Evaluation
4.4.1 Node embedding
4.4.2 Relation embedding
5 Sense Embeddings
5.1 Unsupervised sense embeddings
5.1.1 Sense Representations Exploiting Monolingual Corpora
5.1.2 Sense Representations Exploiting Multilingual Corpora
5.2 Knowledge-based sense embeddings
5.3 Evaluation and Application
6 Contextualized Embeddings
6.1 The need for contextualization
6.2 Background: Transformer model
6.2.1 Self-attention
6.2.2 Encoder
6.2.3 Decoder
6.2.4 Positional encoding
6.3 Contextualized word embeddings
6.3.1 Earlier methods
6.3.2 Language models for word representation
6.3.3 RNN-based models
6.4 Transformer-based Models: BERT
6.4.1 Masked Language Modeling
6.4.2 Next Sentence Prediction
6.4.3 Training
6.5 Extensions
6.5.1 Translation language modeling
6.5.2 Context fragmentation
6.5.3 Permutation language modeling
6.5.4 Reducing model size
6.6 Feature extraction and fine-tuning

Summary

Embeddings have been one of the dominant buzzwords in Natural Language Processing (NLP) since the early 2010s. Encoding information into a low-dimensional vector representation, which is easily integrable into modern machine learning algorithms, has played a central role in the development of NLP. Embedding techniques initially focused on words, but attention soon shifted to other forms: from graph structures, such as knowledge bases, to other types of textual content, such as sentences and documents. This book provides a high-level synthesis of the main embedding techniques in NLP, in the broad sense. The book starts by explaining conventional word vector space models and word embeddings (e.g., Word2Vec and GloVe) and then moves to other types of embeddings, such as word sense, sentence and document, and graph embeddings. We also provide an overview of recent developments in contextualized representations (e.g., ELMo, BERT) and explain their potential in NLP. Throughout the book, the reader can find both the essential information for understanding a topic from scratch and a broad overview of the most successful techniques developed in the literature.

KEYWORDS: Natural Language Processing, Embeddings, Semantics
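As a rough illustration of the low-dimensional vector representations the summary refers to (and of the one-hot representation covered in Section 1.2), here is a minimal Python sketch. The toy vocabulary and the 4-dimensional vectors are invented for illustration only; they do not come from the book or from any trained model such as Word2Vec or GloVe.

```python
import numpy as np

# Hypothetical toy vocabulary; real models use vocabularies of 10^5+ words.
vocab = ["cat", "dog", "car"]
word_to_id = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Sparse one-hot vector: dimensionality equals vocabulary size."""
    v = np.zeros(len(vocab))
    v[word_to_id[word]] = 1.0
    return v

# Invented 4-dimensional dense vectors standing in for learned word embeddings.
embeddings = {
    "cat": np.array([0.8, 0.1, 0.3, 0.2]),
    "dog": np.array([0.7, 0.2, 0.4, 0.1]),
    "car": np.array([0.1, 0.9, 0.0, 0.5]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# One-hot vectors of distinct words are always orthogonal (similarity 0.0),
# whereas dense embeddings can capture that "cat" and "dog" are related.
print(cosine(one_hot("cat"), one_hot("dog")))        # 0.0
print(cosine(embeddings["cat"], embeddings["dog"]))  # ~0.97
```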