Introduction to Large Language Models

  




What are LLMs? 

LLM stands for Large Language Model. LLMs are advanced AI models which can understand and generate human-like text. They can write letters, generate code, answer questions on almost any topic and even converse with us as if we were speaking with a human.

But what actually makes LLMs capable of all of these tasks? In this article we'll dive into the details of LLMs, look at their history and discuss some core concepts to understand how they work. I have written this article keeping both technical and non-technical readers in mind.

Let's get started 🚀 

For a moment, let's remove the first L (Large) from LLM and look at just the Language Model, as language models are the foundation of LLMs.


Basics of Language Models

What is a Language Model?


A language model is basically an AI model which can predict the next word in a sequence of words. For example, when we type a message on our smartphone and it suggests the next few words, a language model is powering that suggestion in the backend. That is a very basic form of a language model at work.

Language models are created by first training them on vast amounts of textual data. During training they learn the different patterns and structures of language, which enables them to generate text that is both grammatically correct and relevant to the context.
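To make this concrete, here is a toy sketch in Python of a bigram language model. It simply counts which word tends to follow which in a small piece of text and then predicts the most frequent follower. The tiny corpus and function name here are illustrative only; real language models are vastly more sophisticated, but the core idea of learning word patterns from data is the same.

from collections import Counter, defaultdict

# A tiny illustrative "training corpus"
corpus = "the cat sat on the mat . the dog sat on the rug ."

# Count how often each word follows each other word (bigram counts)
follower_counts = defaultdict(Counter)
words = corpus.split()
for current_word, next_word in zip(words, words[1:]):
    follower_counts[current_word][next_word] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the corpus."""
    followers = follower_counts.get(word)
    return followers.most_common(1)[0][0] if followers else None

print(predict_next("sat"))   # "on" (it always followed "sat" in the corpus)
print(predict_next("the"))   # whichever follower of "the" was counted first among ties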

When such models become more advanced, or "LARGE", they can handle various complicated tasks, such as translating languages, summarizing long articles, generating code or even answering tough questions based on a given text. When they reach this level, they are called Large Language Models (LLMs). Let's learn the details of LLMs.


From words to sentences

Predicting the upcoming word may sound very simple, but LLMs actually do this by understanding the context in which the words are being used.

Let's understand this with an example. Take the sentence "The cat sat on the". Any basic language model can predict that "mat" will be the next word, because it has likely seen such phrases in its training data. But LLMs go one step further by understanding the entire context of the sentence.

Context is very important in any language, because many words have different meanings depending on the context in which they are used. A simple example is the word 'bank'. The word 'bank' may mean a financial institution or the bank of a river. When an LLM reads 'bank', it checks the nearby words to decide which meaning is most appropriate. If the sentence is 'Jon Doe went to the bank to withdraw money', the model knows that in this context 'bank' refers to a financial institution and not the bank of a river.
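To see context-aware next-word prediction in action, here is a minimal sketch using the Hugging Face transformers library and the small GPT-2 model (assuming transformers and PyTorch are installed). The exact continuation will vary from run to run, since generation involves sampling.

from transformers import pipeline

# Load a small pre-trained language model (GPT-2) for text generation.
generator = pipeline("text-generation", model="gpt2")

# Ask the model to continue our example sentence.
result = generator("The cat sat on the", max_new_tokens=5, num_return_sequences=1)
print(result[0]["generated_text"])   # e.g. "The cat sat on the floor ..." (output varies)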


History of LLMs

When ChatGPT gained worldwide popularity towards the end of 2022, you might have suddenly realised the power of LLMs. But that was already a very mature version of this technology, as the underlying language models have evolved over several decades. Let's look at this evolution journey of LLMs:

1950s-1980s
This period marks the beginning of NLP research and exploration. Rule-based systems like ELIZA (around 1966) were used to mimic conversations, but such systems mainly relied on predefined scripts and lacked true language understanding.

1980s-1990s
Probabilistic models like n-grams began to be used to predict word sequences. The core idea of using large datasets to learn the patterns of a language was itself introduced at this time.

2000s
During this time we saw the rise of neural networks, which gave us an improved ability to process and generate text.

2017
The Transformer architecture was introduced and revolutionized NLP by giving language models new capabilities, one of the most crucial being the ability to process text in parallel.

2018 till present
OpenAI's GPT (Generative Pre-trained Transformer) series was launched: GPT-1 in 2018, GPT-2 in 2019, GPT-3 in 2020, followed by GPT-4 in 2023. These are quite powerful models with billions of parameters. Google's BERT model also contributed to the advancement of LLMs.



Inner workings of LLMs



Foundation of LLMs - Training Data & Parameters:


The foundation of every LLM is the data it is trained on. Usually this data includes billions of words from a variety of sources, including books, academic papers, social media posts, websites and more. When the model processes such a vast amount of data, it learns how words and sentences are structured and how meaning is conveyed through the structure of a sentence.

During training the model doesn't just memorize phrases. Instead it learns to recognize patterns in the data. For instance, from the phrase "Once upon a time" it learns the pattern that such a phrase is usually followed by a narration about events which occurred in the past. This pattern gets encoded in the model's parameters, so the next time it sees the prompt "Once upon a time", it generates text in the context of something which happened before.

Parameters are the internal values of a model, which it adjusts during training to improve its predictions. The "once upon a time" pattern discussed above is an example of knowledge that gets encoded across the model's parameters. Generally, the more parameters a model has, the more nuanced its understanding of language can be.
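To give a feel for what parameters look like in practice, here is a minimal PyTorch sketch (the layer sizes are arbitrary and purely for illustration) that builds a tiny network and counts its trainable parameters. For comparison, GPT-3 has roughly 175 billion parameters.

import torch.nn as nn

# A tiny illustrative network (layer sizes chosen arbitrarily).
tiny_model = nn.Sequential(
    nn.Embedding(1000, 64),   # a 1,000-word vocabulary, 64-dimensional embeddings
    nn.Linear(64, 128),
    nn.ReLU(),
    nn.Linear(128, 1000),     # a score for each word in the vocabulary
)

# Every weight and bias is a parameter that gets adjusted during training.
num_params = sum(p.numel() for p in tiny_model.parameters())
print(f"{num_params:,} trainable parameters")   # roughly 200 thousand here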


Concept of Tokens:


LLMs break text down into small units called tokens. A token can be a character, a word, or even a subword. For instance, the sentence "Data is the new oil" might get tokenized as ["Data", "is", "the", "new", "oil"].


After tokenization, the model processes the tokens by considering each token's position and its relationship to the other tokens in the sequence. Tokenization also helps the model handle different languages effectively.
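Here is a small sketch of tokenization in practice, using the GPT-2 tokenizer from the Hugging Face transformers library (assuming it is installed). GPT-2 uses subword tokens, so its split can differ slightly from the simple word-level split shown above.

from transformers import AutoTokenizer

# Load the tokenizer used by GPT-2 (a byte-pair-encoding subword tokenizer).
tokenizer = AutoTokenizer.from_pretrained("gpt2")

tokens = tokenizer.tokenize("Data is the new oil")
print(tokens)      # e.g. ['Data', 'Ġis', 'Ġthe', 'Ġnew', 'Ġoil'] ('Ġ' marks a leading space)

token_ids = tokenizer.encode("Data is the new oil")
print(token_ids)   # the integer IDs the model actually processes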


How do LLMs process information?



At the heart of LLMs is a neural network architecture, specifically the Transformer. Neural networks are algorithms loosely inspired by the way human brains process information. They contain layers of interconnected nodes (neurons), which work together to analyze data and make predictions.

The Transformer architecture changed natural language processing by allowing models to process entire sequences of text simultaneously, rather than word by word. This parallel processing was made possible by a mechanism called attention, which helps the model focus on the most relevant parts of the text.
Let's take a look at the Transformer architecture below.


A deeper look into LLMs


Transformer Architecture:

Transformer architecture is made up of two main components: 

1) Encoder - The encoder's job is to understand the meaning of the input text. It looks at the entire sentence at once and works out how each word relates to the others, and from these relationships it builds up the context, which helps the model understand what the sentence is trying to say. It then creates a special representation of the sentence, a kind of encoded version, which can be used for other tasks such as translation. These context-rich encodings are what get fed to the next component, the decoder.

2) Decoder - The decoder's main job is to take this understanding/representation from the encoder and generate some output step by step, such as a translated version of the input text or an answer to a question. To achieve this goal of producing something new, the decoder uses two mechanisms: 1) masked self attention and 2) cross attention.

Masked self attention means that when the model is generating words one by one, it is able to look only at the words it has already generated and not the ones which come later. This is to make sure the model does not cheat by seeing future words before they have been generated.

Cross attention connects the decoder with the encoder's output to make sure the generated output stays within the context. The encoder has already understood the input sentence and provided a representation of that input sentence. Using cross attention, the decoder makes sure that as it generates new words, it looks back at the encoder's encoded information, to keep track of the context and ensure that the generated output makes sense based on the input.
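To make the attention idea less abstract, below is a minimal NumPy sketch of scaled dot-product attention, the core computation behind both self attention and cross attention. Setting causal=True reproduces the masked self attention described above, where each position can only look at earlier positions. The shapes and values are toy examples; real models add learned projection matrices and many attention heads.

import numpy as np

def scaled_dot_product_attention(queries, keys, values, causal=False):
    """Core attention computation: softmax(Q K^T / sqrt(d)) V."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)   # how strongly each query "matches" each key
    if causal:
        # Masked self attention: block each position from seeing future positions.
        future = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(future, -1e9, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ values

# Toy example: a sequence of 3 tokens, each represented by a 4-dimensional vector.
x = np.random.randn(3, 4)
out = scaled_dot_product_attention(x, x, x, causal=True)   # masked self attention
# For cross attention, queries would come from the decoder and keys/values from the encoder.
print(out.shape)   # (3, 4): one context-aware vector per token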

How are LLMs trained?



Training an LLM is a complex task. It involves feeding the model vast amounts of text and making it predict the next word in a sequence. When the model's prediction is incorrect, a technique called backpropagation is used: the model adjusts its parameters to improve its future predictions.

This process is repeated millions to even billions of times during the training process. The model slowly learns the important patterns of language, including grammar rules, word associations and even idiomatic expressions in the text. For instance, the model might learn that the phrase "bolt from the blue" is an idiom meaning "a complete surprise", instead of taking its literal meaning of a bolt coming from the blue sky.
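Here is a heavily simplified PyTorch sketch of a single training step: the model scores every word in the vocabulary, a loss measures how wrong its prediction of the next word was, and backpropagation adjusts the parameters. The tiny model and random data are placeholders purely for illustration; real LLM training repeats this loop over enormous numbers of tokens.

import torch
import torch.nn as nn

vocab_size, embed_dim = 1000, 64

# A placeholder "language model": embeddings followed by a linear output layer.
model = nn.Sequential(nn.Embedding(vocab_size, embed_dim), nn.Linear(embed_dim, vocab_size))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# A fake batch: current tokens and the "next word" each one should predict.
inputs = torch.randint(0, vocab_size, (32,))
targets = torch.randint(0, vocab_size, (32,))

logits = model(inputs)             # the model's score for every word in the vocabulary
loss = loss_fn(logits, targets)    # how far off the predictions were
loss.backward()                    # backpropagation: compute how to adjust each parameter
optimizer.step()                   # apply the adjustments
optimizer.zero_grad()
print(f"loss: {loss.item():.3f}")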

Such complex training requires a lot of computational resources, including specialized hardware such as GPUs (Graphics Processing Units) and TPUs (Tensor Processing Units), to handle the enormous amount of data and complex calculations.


Real World Applications of LLMs

LLMs are changing several industries. To list just a few examples: in customer service, chatbots powered by LLMs can handle complex questions, provide instantaneous responses and learn from their interactions with humans to improve over time. In healthcare, LLMs are being used to analyze patient records, summarize medical literature and assist in diagnosing patient conditions and recommending treatments.

But these examples only touch the tip of the iceberg, because the possibilities of using LLMs are vast. In creative fields, for instance, they are being used to generate music and videos, design video game narratives and even write scripts for movies and television shows. Their ability to mimic human creativity is really powerful.

Conclusion

I am sure I am not alone in being very impressed by the power of LLMs. LLMs mark an impressive advancement in AI, with their capability to understand and generate human-like content with accuracy. This article focused mainly on the basic details of how LLMs work.
Whether you are a non-technical reader or someone with a deep interest in AI, I hope this article was able to provide you with a fundamental understanding of LLMs, fuelling your interest to explore this exciting field further.
 
