An Introduction to LLMs (Large Language Models)

The Advanced Technology Behind Machines’ Ability to Understand Instructions in Human Language

Mesayu Elida
3 min read · Sep 9, 2024

Artificial intelligence (AI), especially as it appears in simple, publicly accessible chatbot interfaces like ChatGPT, has been a hot topic of conversation lately. However, many may still be wondering what exactly ChatGPT is.

In a nutshell, OpenAI’s ChatGPT, Google’s BERT, and other similar models are examples of what is known as a Large Language Model (LLM). This term may raise new questions: what is an LLM, and how does it work? This article explains the concept and function of LLMs in detail.

A Large Language Model, often abbreviated as LLM, is a type of artificial intelligence that uses neural networks with very large numbers of parameters to process and understand human language. These models are trained on enormous amounts of text data. At their core, LLMs have the ability to process natural language, a field known as NLP (Natural Language Processing).

NLP is very important because it enables computers and other digital devices to perform a wide range of tasks involving text and speech. As the name suggests, Large Language Models (LLMs) can capture complex relationships in text and generate text that follows the semantics and syntax of whatever language we choose. How exciting is that?

These models emerged as a breakthrough in a long-standing technological challenge: building “machines” that can recognize and understand human language and respond in kind, whether in text or speech. Language comes with many difficulties, such as ambiguous word meanings, variations in accent and pronunciation, grammatical and spelling errors, the expression of emotion in sentences, and so on. However, with the self-supervised learning technique used to train LLMs, a model can capture the context of language and translate it into a form the computer can process to perform its task.

The majority of data utilized for LLM training is derived from a multitude of sources, including books, news articles, encyclopedias, websites, discussion forums, social media platforms, and all publicly accessible texts on the Internet. Once the data has been collected, the subsequent stage is data cleaning and filtering, which ensures that the content meets the requisite quality standards. The objective of this process is to eliminate data that is irrelevant, inaccurate, or in violation of established ethical standards. The data that has been cleaned and filtered is then employed to train the LLM model, with the objective of producing results that are more accurate and relevant.
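The cleaning-and-filtering stage described above can be sketched in a few lines of Python. The rules below (a minimum word count and a blocked-word list) are illustrative assumptions for the sketch, not the actual filters used by any particular LLM.

```python
# A minimal sketch of corpus cleaning and filtering. The thresholds and
# blocked terms here are illustrative assumptions, not real production rules.

def clean_corpus(documents, min_words=5, blocked_words=("spam",)):
    """Keep documents that are long enough and free of blocked terms."""
    kept = []
    for doc in documents:
        text = doc.strip()
        if len(text.split()) < min_words:  # drop fragments that are too short
            continue
        if any(b in text.lower() for b in blocked_words):  # crude content filter
            continue
        kept.append(text)
    return kept

corpus = [
    "The quick brown fox jumps over the lazy dog.",
    "ok",                                # too short -> removed
    "Buy now!!! spam spam spam offer",   # blocked term -> removed
]
print(clean_corpus(corpus))  # → ['The quick brown fox jumps over the lazy dog.']
```

Real pipelines add many more stages (deduplication, language identification, toxicity filtering), but the goal is the same: only text that meets quality standards reaches training.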

The training process employed for LLM is founded upon deep learning techniques, which leverage neural network architecture to facilitate the processing and comprehension of human language. Furthermore, the process employs a self-supervised learning approach, wherein the machine is provided with input devoid of explicit labels and is tasked with predicting each component of the input based on previously observed elements. The use of self-supervised learning enables LLM to enhance its comprehension and performance continuously, as more training data becomes available.
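The key idea of self-supervision, predicting the next element from the ones already seen, can be illustrated with a toy bigram model. Real LLMs learn this objective with deep neural networks over billions of tokens; this count-based sketch only shows how the “labels” come for free from the raw text itself.

```python
from collections import Counter, defaultdict

# Toy illustration of self-supervised next-token prediction:
# the training target for each word is simply the word that follows it,
# so no manually written labels are needed.

def train_bigram(text):
    tokens = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1  # "label" = the next token in the raw text
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]

model = train_bigram("the cat sat on the mat the cat ran")
print(predict_next(model, "the"))  # → cat  (follows "the" twice in the text)
```

An LLM replaces these raw counts with a neural network that generalizes to sequences it has never seen, which is why its performance keeps improving as more training data becomes available.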

LLM Architecture
The LLM architecture typically comprises numerous layers based on the Transformer, a neural network architecture introduced by Google researchers in 2017. In the Transformer architecture, text is first segmented into units called tokens, each mapped to a numerical ID. Each token is then transformed into a vector in a feature space, and these vectors are fed as input into the transformer layers for further processing.

image: Transformer (deep learning architecture) — Wikipedia
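The tokenize-then-embed step can be sketched as follows. Note that the tiny vocabulary and the random 4-dimensional vectors below are stand-ins: real transformers use learned subword vocabularies (such as BPE) and learned embedding matrices with hundreds or thousands of dimensions.

```python
import random

# Sketch of the tokenization + embedding step. The vocabulary and the
# random vectors are illustrative assumptions; real models learn both.

random.seed(0)
vocab = {"the": 0, "cat": 1, "sat": 2, "<unk>": 3}
EMBEDDING_DIM = 4
# one vector per vocabulary entry (a real model learns these weights)
embeddings = {i: [random.random() for _ in range(EMBEDDING_DIM)]
              for i in vocab.values()}

def tokenize(text):
    """Map each word to its integer token ID, falling back to <unk>."""
    return [vocab.get(w, vocab["<unk>"]) for w in text.lower().split()]

def embed(token_ids):
    """Look up the feature-space vector for each token ID."""
    return [embeddings[t] for t in token_ids]

ids = tokenize("The cat sat down")
print(ids)            # → [0, 1, 2, 3]  ("down" is out of vocabulary)
vectors = embed(ids)  # four vectors of length 4, ready for the transformer layers
```

Everything after this point, attention, feed-forward layers, and so on, operates on these vectors rather than on the raw text.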

The applications of large language models (LLMs) are numerous and diverse, including:

1. Code Generation: LLMs can help write programs that require knowledge of a specific programming language.

2. Code Debugging and Documentation: Beyond generating code, LLMs can help debug code and create related documentation.

3. Answering Questions: As artificial intelligence systems, LLMs can respond to both factual and creative questions.

4. Language Translation: LLMs can translate text from one language to another and correct any grammatical errors present in the text.

The applications of LLMs are not limited to the examples above; the technology has many other potential uses across a variety of tasks. By learning to write instructions for the AI creatively, known as AI prompts, LLMs can be deployed not only to assist with individual tasks but also in industrial and corporate applications.
