Author
Published on
Jan 31, 2023

Outline:
1. Definition of Perplexity
2. Perplexity in AI
2.1. Generative Language Model
2.2. Discriminative Language Model
2.3. Example of perplexity in AI
2.4. Calculate perplexity in AI
3. Perplexity in Humans
3.1. Example of perplexity in Humans
3.2. Calculate perplexity in Humans
4. Difference between perplexity in AI and perplexity in Humans
5. Summary
What is perplexity?
Perplexity is a useful metric for evaluating models in Natural Language Processing (NLP): it measures how good a language model is at predicting a sample of text.
Perplexity in AI
In AI, perplexity is a measure of how well a language model predicts a sample of text. It is used to evaluate the performance of the model and compare it to other models. Perplexity is defined as 2 to the power of the cross-entropy of the model, where the cross-entropy is a measure of the difference between the predicted probabilities and the true probabilities of the sample. A lower perplexity score indicates that the model is better at predicting the sample, while a higher perplexity score indicates that the model is worse at predicting the sample. The perplexity is commonly used to evaluate language models, but it can also be used to evaluate other types of models.
A language model is a type of AI model that is trained to predict the next word in a sequence of words, based on the context of the words that come before it. Language models are commonly used in natural language processing (NLP) tasks, such as text generation, machine translation, and speech recognition. Language models are trained on large amounts of text data, such as books, articles, and websites, and use this data to learn the patterns and structures of human language. Once trained, the model can generate new text that is similar to the text it was trained on, or predict the next word in a given sentence.
Language models can be divided into two main categories:
Generative Language Models: which can generate new text that is similar to the text it was trained on.
Discriminative Language Models: which predict the next word in a given sentence, based on the context of the words that come before it.
Popular architectures include RNNs, LSTMs, and Transformers, such as GPT-2 and BERT.
What are Generative Language Models?
Generative Language Models are a type of AI model that can generate new text that is similar to the text it was trained on. They are trained on large amounts of text data, such as books, articles, and websites, and use this data to learn the patterns and structures of human language. Once trained, the model can generate new text that is similar to the text it was trained on.
There are different types of Generative Language Models, but some of the most popular are:
Recurrent Neural Networks (RNN): These models process input sequences word by word, and use the context of the previous words to predict the next word.
Long Short-Term Memory (LSTM): These models are a type of RNN that can better handle long-term dependencies, which makes them more suitable for generating coherent text.
Transformer: These models are a type of neural network architecture that uses an attention mechanism to better handle long-term dependencies. They are well suited to language modeling and have been used to train models such as GPT-2 and BERT.
Generative Language Models have been used in a variety of NLP applications such as text generation, machine translation, summarization, and text completion. They have been used to generate creative writing, news articles, poetry and even code.
It's worth noting that Generative Language Models can sometimes produce biased and even offensive text, as they can inadvertently learn biases and stereotypes present in the training data. Therefore, it's important to evaluate and monitor their outputs.
The formula for a generative language model can vary depending on the specific model architecture and the type of data being modeled. However, most generative language models are based on the idea of estimating the probability of a sequence of words.
A common formula used in many generative language models is the following:
P(w1,w2,w3, ..., wn) = P(w1) * P(w2|w1) * P(w3|w1,w2) * ... * P(wn|w1,w2,...,wn-1)
Where:
P(wi) is the probability of the i-th word in the sequence; and
P(wi|w1,w2,...,wi-1) is the conditional probability of the i-th word given the previous words in the sequence;
This formula states that the probability of a sequence of words is the product of the probability of each word given the context of the previous words.
For example, in a simple unigram language model, which only considers the probability of each word, the formula for the generative model would be:
P(w1,w2,w3, ..., wn) = P(w1) * P(w2) * P(w3) * ... * P(wn)
In a more complex model such as an LSTM, the model estimates the probability of the next word in a sequence given the context of the previous words. The model uses a recurrent neural network architecture, and the exact formula for the probability of the next word depends on the specific architecture and the data the model was trained on.
In any case, the main idea behind a generative language model is to estimate the probability of a sequence of words, based on the patterns and structures of human language it learned from the training data.
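The chain-rule formula above can be sketched with a toy bigram model, a minimal instance of a generative language model. The corpus and counts here are hypothetical, purely for illustration: each conditional probability P(wi|wi-1) comes from bigram counts, and the product over the sentence gives P(w1,...,wn).

```python
from collections import defaultdict

# Toy corpus (hypothetical data, for illustration only).
corpus = "the cat sat on the mat . the cat slept on the mat .".split()

# Count bigrams (prev -> word) and how often each word appears as a prev.
bigram_counts = defaultdict(lambda: defaultdict(int))
prev_counts = defaultdict(int)
for prev, word in zip(corpus, corpus[1:]):
    bigram_counts[prev][word] += 1
    prev_counts[prev] += 1

def p(word, prev):
    """Conditional probability P(word | prev) estimated from the counts."""
    return bigram_counts[prev][word] / prev_counts[prev]

def sentence_probability(words):
    """Chain rule: P(w1..wn) ~= P(w2|w1) * P(w3|w2) * ... * P(wn|wn-1)."""
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        prob *= p(word, prev)
    return prob

print(sentence_probability("the cat sat on the mat".split()))
```

A bigram model only conditions on the single previous word; an LSTM or Transformer conditions on the full context, but the chain-rule decomposition is the same.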
What are Discriminative Language Models?
Discriminative Language Models are a type of AI model that predict the next word in a given sentence, based on the context of the words that come before it. These models are trained to predict the correct next word given a context, rather than generate new text. They are commonly used in natural language processing (NLP) tasks such as text classification, machine translation, and speech recognition.
Discriminative Language Models are trained on a labeled dataset, where the input is a context and the output is the correct next word. They use this training data to learn the relationship between the context and the correct next word.
Some examples of Discriminative Language Models are:
Maximum Entropy Language Model: These models are based on the principle of maximum entropy, and they estimate the probability of the next word given a context.
Conditional Random Field (CRF): These models are a type of discriminative model that can be used for sequence labeling tasks, such as named entity recognition.
Neural Network-based Language Models: such as feed-forward neural network and recurrent neural network (RNN) based language models.
Discriminative Language Models are generally considered to be more efficient than generative models and are commonly used in applications where computational resources are limited. They are also considered to be more robust to errors and biases in the training data, which can make them a good choice for many NLP applications.
The formula for a discriminative language model can vary depending on the specific model architecture and the type of data being modeled. However, most discriminative language models are based on the idea of estimating the probability of a next word given a context.
A common formula used in many discriminative language models is the following:
P(wi|w1,w2,...,wi-1) = f(w1,w2,...,wi-1, wi)
Where:
P(wi|w1,w2,...,wi-1) is the conditional probability of the i-th word given the context of the previous words; and
f is a function that takes the context and the current word as input and outputs the probability of the current word given the context;
This formula states that the probability of the next word depends on the context of the previous words.
For example, in a simple feed-forward neural network based language model, the formula for the discriminative model would be:
P(wi|w1,w2,...,wi-1) = softmax(W*h+b)
Where:
h is the hidden state;
W is the weight matrix;
b is the bias term; and
softmax is an activation function that maps the input to a probability distribution over the vocabulary;
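The softmax formula above can be sketched in a few lines of NumPy. The vocabulary, hidden state, and weights below are made-up stand-ins (there is no trained model here); the point is only that W*h + b followed by softmax yields a probability distribution over the vocabulary.

```python
import numpy as np

# Toy 5-word vocabulary and random parameters (illustrative, not trained).
rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat"]

hidden_size = 8
h = rng.normal(size=hidden_size)                 # hidden state (encoded context)
W = rng.normal(size=(len(vocab), hidden_size))   # weight matrix
b = rng.normal(size=len(vocab))                  # bias term

def softmax(z):
    z = z - z.max()            # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# P(w_i | context) = softmax(W*h + b): one probability per vocabulary word.
probs = softmax(W @ h + b)
for word, p_word in zip(vocab, probs):
    print(f"P({word} | context) = {p_word:.3f}")
```

Whatever the logits are, the softmax output is non-negative and sums to 1, which is what lets the model's output be read as a probability distribution over the next word.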
In a more complex model like CRF, the model estimates the conditional probability of a sequence of words, given the context of the previous words and the training data, and it is based on the principle of maximum entropy.
In any case, the main idea behind a discriminative language model is to estimate the probability of the next word given the context of the previous words, based on the patterns and structures of human language it learned from the training data. This is different from Generative Language Models, which generate new text based on the patterns and structures learned from the training data.
Example of perplexity in AI?
An example of perplexity in AI would be using a trained language model to evaluate a set of sentences from a novel. The model would predict the probability of each word in the sentences, and the perplexity score would be calculated based on the difference between the predicted probabilities and the true probabilities of the words.
For instance, if a language model is trained on a dataset of news articles and then evaluated on a dataset of poetry, it will have a higher perplexity score because the structure and style of poetry is different from the news articles. This means that the model is less able to predict the words in the poetry dataset as it is less familiar with that type of language. On the other hand, if the same model is evaluated on a dataset of news articles, it will have a lower perplexity score because it is more familiar with that type of language.
Another example could be a language model trained on a specific domain, like legal documents, whose perplexity is then evaluated on a dataset of scientific papers; the perplexity score would be higher, as the language and structure used in scientific papers differ from legal documents.
Another example of perplexity in AI is when evaluating the performance of a language model on a sample of text. Let's say we have trained a language model on a dataset of news articles and now we want to evaluate how well it predicts the next word in a new article. We calculate the perplexity of the model on this new article by feeding it one word at a time and having it predict the next word. After the model has made all its predictions, we calculate the cross-entropy of the predicted probabilities and the true probabilities of the sample, and then take the perplexity to be 2 to the power of the cross-entropy.
For example, let's say the model predicts the following sentence:
"The cat sat on the mat and ____"
The model's prediction for the next word is "slept".
If the next word in the sample text is "slept", the perplexity will be low, indicating that the model is good at predicting the next word. If the next word in the sample text is "danced", the perplexity will be high, indicating that the model is not good at predicting the next word.
The lower the perplexity, the better the model is at predicting the next word, and the higher the perplexity, the worse the model is at predicting the next word.
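The "slept" vs. "danced" intuition can be made concrete with a small sketch. The predicted distribution below is made up for illustration; the per-word perplexity 2^(-log2 p) simplifies to 1/p, so a word the model considered likely contributes a low perplexity and a surprising word contributes a high one.

```python
import math

# Hypothetical predicted distribution for the blank in
# "The cat sat on the mat and ____" (made-up numbers, not a real model).
predicted = {"slept": 0.6, "purred": 0.2, "ran": 0.19, "danced": 0.01}

def word_perplexity(p):
    """Per-word perplexity: 2^(-log2 p), which equals 1/p."""
    return 2 ** (-math.log2(p))

print(word_perplexity(predicted["slept"]))   # low: the model expected this word
print(word_perplexity(predicted["danced"]))  # high: the model was surprised
```

With these numbers, "slept" yields a per-word perplexity of about 1.7, while "danced" yields 100: the model would need the equivalent of 100 equally likely guesses to cover a word it assigned probability 0.01.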
It is worth noting that perplexity is not the only way to evaluate language models; other evaluation techniques like BLEU, ROUGE, and METEOR can also be used, but perplexity is one of the most common.
How does one calculate the perplexity in AI?
Perplexity in AI can be calculated using the following algorithm:
Perplexity = 2^(Cross-Entropy)
Where Cross-Entropy is calculated as:
Cross-Entropy = -(1/N) * ∑ log2(P(w_i))
Where:
N is the total number of words in the sample;
P(w_i) is the predicted probability of the i-th word in the sample; and
log2 is the base-2 logarithm, matching the base 2 in the perplexity formula (using the natural logarithm together with e^(Cross-Entropy) gives the same perplexity);
The cross-entropy is a measure of the difference between the predicted probabilities and the true probabilities of the sample, and it is calculated as the negative average of the logarithm of the predicted probabilities of the words in the sample. The lower the cross-entropy, the better the model is at predicting the sample.
The perplexity is defined as 2 to the power of the cross-entropy, which is a way to adjust the scale of the cross-entropy to make it more interpretable. A lower perplexity score indicates that the model is better at predicting the sample, while a higher perplexity score indicates that the model is worse at predicting the sample.
It's also important to mention that the exact perplexity calculation can vary between implementations, for example in how the text is tokenized and which logarithm base is used.
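The calculation described in this section can be sketched directly. The predicted probabilities below are made-up numbers standing in for a model's per-word predictions on a sample; the code computes the cross-entropy with base-2 logarithms and then raises 2 to that power.

```python
import math

# Hypothetical per-word predicted probabilities P(w_i) for a 5-word sample
# (made-up numbers for illustration).
predicted_probs = [0.2, 0.5, 0.1, 0.4, 0.25]

# Cross-Entropy = -(1/N) * sum(log2(P(w_i)))
n = len(predicted_probs)
cross_entropy = -sum(math.log2(p) for p in predicted_probs) / n

# Perplexity = 2^(Cross-Entropy)
perplexity = 2 ** cross_entropy

print(f"cross-entropy: {cross_entropy:.3f} bits")
print(f"perplexity:    {perplexity:.3f}")
```

Note that 2^(cross-entropy) works out to the inverse geometric mean of the predicted probabilities, which is why perplexity is often read as "the average number of equally likely choices the model is picking between at each step."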
Perplexity in Humans
In humans, perplexity can refer to a state of confusion or uncertainty. It can also refer to the degree of difficulty or unfamiliarity a person might experience when trying to understand something. For example, a person might feel perplexed when trying to understand a complex mathematical equation or when trying to navigate a new city without a map. Perplexity in humans can also be referred to as cognitive perplexity, which is the degree of difficulty or unfamiliarity that an individual experiences when trying to understand or make sense of something. It can be caused by a lack of knowledge, conflicting information, or an inability to see the connections between different pieces of information. It is a normal part of the learning process and can be overcome with more information, practice, or experience.
Example of perplexity in Humans?
An example of perplexity in humans could be a situation where an individual is presented with a complex problem to solve. For instance, imagine a person who is trying to fix a malfunctioning piece of equipment but is not familiar with how it works. They may experience confusion and uncertainty as they try to understand the problem and come up with a solution. This person may feel perplexed by the complexity of the equipment, their lack of understanding of how it works, and the lack of information or resources available.
Another example could be a student who is struggling to understand a new concept in a difficult class, such as quantum physics. The student may feel perplexed by the abstract and unfamiliar ideas, and the difficulty of the mathematical equations used to describe them. They may feel overwhelmed and unsure of how to proceed.
In both examples, the perplexity can be overcome through more information, practice, or experience, such as getting help from experts, studying relevant materials, and getting hands-on experience with the equipment or problem. In general, perplexity in humans can be seen as a normal part of the learning process, and it is not necessarily a bad thing, as it can indicate a challenge to be overcome.
How does one calculate the perplexity in Humans?
It is important to note that there is no specific formula to calculate perplexity in humans as it is a feeling or emotion rather than a numerical value. Perplexity in humans refers to a state of confusion or uncertainty and the degree of difficulty or unfamiliarity a person might experience when trying to understand something. It is subjective and can vary from person to person. It is not a measurable quantity and can't be quantified with a formula.
Perplexity in humans can be caused by a lack of knowledge, conflicting information, or an inability to see the connections between different pieces of information. It can be overcome through more information, practice, or experience, such as getting help from experts, studying relevant materials, and getting hands-on experience with the problem or situation.
Perplexity in AI, on the other hand, is a numerical value used to evaluate the performance of a language model. It is calculated by comparing the predicted probabilities of a model to the true probabilities of a sample of text. It is a measurable quantity and can be quantified with a formula.
What’s the main difference between perplexity in AI and perplexity in humans?
Perplexity in AI and perplexity in humans refer to different things, even though they share the same name.
Perplexity in AI is a measure of how well a language model predicts a sample of text. It is used to evaluate the performance of the model and compare it to other models. It is a numerical value calculated by comparing the predicted probabilities of a model to the true probabilities of a sample of text. A lower perplexity score indicates that the model is better at predicting the sample, while a higher perplexity score indicates that the model is worse at predicting the sample.
Perplexity in humans, on the other hand, refers to a state of confusion or uncertainty. It can also refer to the degree of difficulty or unfamiliarity a person might experience when trying to understand something. It is a feeling or emotion that a person experiences when they are presented with complex or unfamiliar information or situations. It can be caused by a lack of knowledge, conflicting information, or an inability to see the connections between different pieces of information.
In summary, perplexity in AI is a numerical value used to evaluate the performance of a language model, while perplexity in humans is a feeling or emotion that a person experiences when trying to understand something. They are related concepts but they are used in different contexts, AI and Human cognition respectively.
Summary
Perplexity is a concept that is used in both AI and humans, although it refers to different things in each context.
In AI, perplexity is a measure of how well a language model predicts a sample of text. It is used to evaluate the performance of the model and compare it to other models. Perplexity is calculated by feeding the model one word at a time and having it predict the next word. After the model has made all its predictions, we calculate the cross-entropy of the predicted probabilities and the true probabilities of the sample. The cross-entropy is a measure of the difference between the predicted probabilities and the true probabilities. The lower the cross-entropy, the better the model is at predicting the sample. The perplexity is defined as 2 to the power of the cross-entropy. A lower perplexity score indicates that the model is better at predicting the sample, while a higher perplexity score indicates that the model is worse at predicting the sample.
In humans, perplexity refers to a state of confusion or uncertainty. It can also refer to the degree of difficulty or unfamiliarity a person might experience when trying to understand something. It is a feeling or emotion that a person experiences when they are presented with complex or unfamiliar information or situations. It can be caused by a lack of knowledge, conflicting information, or an inability to see the connections between different pieces of information. It is a normal part of the learning process, and it can be overcome with more information, practice, or experience.