
Large Language Models (LLMs) Concepts

news 2025/7/19 7:38:19

1、Introduction to Large Language Models (LLMs)

1.1、Definition of LLMs

Large: Refers to the scale of the training data and computing resources.
Language: Processes and generates human-like text.
Models: Learn complex patterns from text data.

The emergence of LLMs is widely considered a defining moment in the history of AI.

Some applications:

Sentiment analysis
Identifying themes
Translating text or speech
Generating code
Next-word prediction

1.2、Real-world application

Transforming finance industry:

[Investment outlook] | [Annual reports] | [News articles] | [Social media posts] --> LLM --> [Market analysis] | [Portfolio management] | [Investment opportunities]

Revolutionizing healthcare sector:

- Analyze patient data to offer personalized recommendations.
- Must adhere to privacy laws.

Education:

- Personalized coaching and feedback.
- Interactive learning experience.
- AI-powered tutor:
  - Ask questions.
  - Receive guidance.
  - Discuss ideas.

Visual question answering:

Defining multimodal:

Multimodal:
- Many types of processing or generation

Non-multimodal:
- One type of processing or generation

Visual question answering:
- Answers to questions about visual content
- Object identification & relationships
- Scene description

1.3、Challenges of language modeling

Sequence matters
Context modeling
Long-range dependency
Single-task learning

2、Building Blocks of LLMs

2.1、Novelty of LLMs

Overcome data's unstructured nature
Outperform traditional models
Understand linguistic subtleties

The building blocks are shown below:

2.2、Generalized overview of NLP

2.2.1、Text Pre-processing

These steps are independent of one another, so they can be applied in a different order.

Tokenization: Splits text into individual words, or tokens.
Stop word removal: Removes stop words, common words such as "the" or "is" that add little meaning.
Lemmatization: Group slightly different words with similar meaning so we can reduce words to their basic form. For example, we can map them to their root word.
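The three steps above can be sketched in a few lines of plain Python. This is only an illustration: the `STOP_WORDS` set and the `LEMMAS` lookup table are tiny hypothetical stand-ins, whereas a real pipeline would typically rely on an NLP library with full word lists and a proper lemmatizer.

```python
import re

# Hypothetical, deliberately tiny resources for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "are", "on", "in", "of", "and", "to"}
LEMMAS = {"cats": "cat", "sitting": "sit", "sat": "sit", "mats": "mat"}

def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word set."""
    return [t for t in tokens if t not in STOP_WORDS]

def lemmatize(tokens):
    """Map each token to its base form if the lookup table knows it."""
    return [LEMMAS.get(t, t) for t in tokens]

def preprocess(text):
    """Tokenize, remove stop words, then lemmatize."""
    return lemmatize(remove_stop_words(tokenize(text)))

print(preprocess("The cats are sitting on the mats"))  # ['cat', 'sit', 'mat']
```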

2.2.2、Text Representation

Converts text data into numerical form.

Bag-of-words:

Limitation:
- Does not capture the order or context.
- Does not capture the semantics between the words.
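A minimal bag-of-words sketch makes the order limitation concrete. The `bag_of_words` helper below is a hypothetical illustration that splits on whitespace and counts words; the two example sentences mean opposite things yet receive identical vectors.

```python
from collections import Counter

def bag_of_words(documents):
    """Return a sorted vocabulary and one word-count vector per document."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors

vocab, vectors = bag_of_words(["dog bites man", "man bites dog"])
print(vocab)    # ['bites', 'dog', 'man']
print(vectors)  # [[1, 1, 1], [1, 1, 1]]
```

Because only counts are stored, word order is lost, which is one motivation for the word embeddings discussed next.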

Word embeddings:

2.3、Fine-tuning

Fine-tuning:
- Addresses some of these challenges.
- Adapts a pre-trained model.

Pre-trained model:
- Learned from general-purpose datasets.
- Not optimized for specific tasks.
- Can be fine-tuned for a specific problem.

2.4、Learning techniques

N-shot learning: zero-shot, few-shot, and multi-shot.

2.4.1、Zero-shot learning

No explicit training.
Uses language understanding and context.
Generalizes without any prior examples.

2.4.2、Few-shot learning

Learn a new task with a few examples.

2.4.3、Multi-shot learning

Requires more examples than few-shot.
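When these techniques are applied through prompting, the difference between them often comes down to how many worked examples are placed in front of the query. The sketch below assumes a simple text-classification prompt layout; `build_prompt` and the `Text:` / `Label:` format are hypothetical choices for illustration, not a required format.

```python
def build_prompt(task, query, examples=()):
    """Build a prompt with zero or more (text, label) examples before the query."""
    parts = [task]
    for text, label in examples:
        parts.append(f"Text: {text}\nLabel: {label}")
    # The final block leaves the label empty for the model to complete.
    parts.append(f"Text: {query}\nLabel:")
    return "\n\n".join(parts)

task = "Classify the sentiment of the text as positive or negative."

# Zero-shot: no examples at all.
zero_shot = build_prompt(task, "I love this phone.")

# Few-shot: a handful of examples precede the query.
few_shot = build_prompt(
    task,
    "I love this phone.",
    examples=[("Great battery life.", "positive"), ("The screen broke.", "negative")],
)

print(few_shot)
```

Multi-shot prompting follows the same pattern with a longer `examples` list.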

3、Training Methodology and Techniques

3.1、Building blocks to train LLMs

3.1.1、Generative pre-training

Trained using generative pre-training:
- Input data of text tokens.
- Trained to predict the tokens within the dataset.

Types:
- Next word prediction.
- Masked language modeling.

3.1.2、Next word prediction

Supervised learning technique; the labels come from the text itself, so it is often described as self-supervised.
Predicts next word and generates coherent text.
Captures the dependencies between words.
Training data consist of pairs of input and output examples.
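To see what "pairs of input and output examples" can look like, the sketch below turns one sentence into (context, next word) pairs. `next_word_pairs` is a hypothetical helper that splits on whitespace; real systems work with subword tokens rather than whole words.

```python
def next_word_pairs(sentence):
    """Return (context_tokens, next_token) pairs for every position in the sentence."""
    tokens = sentence.split()
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

for context, target in next_word_pairs("The quick brown fox"):
    print(context, "->", target)
# ['The'] -> quick
# ['The', 'quick'] -> brown
# ['The', 'quick', 'brown'] -> fox
```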

3.1.3、Masked language modeling

Hides a selected word in the sentence.
Trained model predicts the masked word.
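A masked training example can be built by replacing one token with a placeholder and keeping the original word as the prediction target. The `mask_token` helper and the `[MASK]` placeholder string below are assumptions made for illustration.

```python
def mask_token(tokens, index, mask="[MASK]"):
    """Return a copy of tokens with one position hidden, plus the hidden word."""
    masked = list(tokens)
    target = masked[index]
    masked[index] = mask
    return masked, target

tokens = "I love chocolate ice cream".split()
masked, target = mask_token(tokens, 2)
print(masked)  # ['I', 'love', '[MASK]', 'ice', 'cream']
print(target)  # chocolate
```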

3.2、Introducing the transformer

3.2.1、Transformer architecture

Captures the relationships between words.
Components: Pre-processing, Positional Encoding, Encoders, and Decoders.

3.2.2、Inside the transformer

(1) Text pre-processing and representation:

Text preprocessing: tokenization, stop word removal, lemmatization.
Text representation: word embedding.

(2) Positional encoding:

Information on the position of each word.
Helps the model relate words that are far apart.
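The notes do not say how position information is computed. One well-known option, assumed here, is the sinusoidal encoding from the original Transformer design, where each position gets a vector of sine and cosine values at different frequencies. The function name `positional_encoding` and the small sizes in the usage lines are illustrative only.

```python
import math

def positional_encoding(num_positions, d_model):
    """Sinusoidal position table: sine on even dimensions, cosine on odd ones."""
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model):
            # Each sine/cosine pair of dimensions shares one frequency.
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

pe = positional_encoding(num_positions=4, d_model=8)
print(len(pe), len(pe[0]))  # 4 8
```

Each row would be added to the word embedding at the same position, so identical words at different positions end up with different input vectors.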

(3) Encoders:

Attention mechanism: directs attention to specific words and relationships.
Neural network: processes specific features.

(4) Decoders:

Includes attention and neural networks.
Generates the output.

3.2.3、Transformers and long-range dependencies

Initial challenge: long-range dependency.
Attention: focus on different parts of the input.

3.2.4、Processes multiple parts simultaneously

Limitation of traditional language models: Sequential - one word at a time.
Transformers: Process multiple parts simultaneously (Faster processing).

3.3、Attention mechanisms

3.3.1、Attention mechanisms

Understand complex structures.
Focus on important words.

3.3.2、Two primary types: Self-attention and multi-head attention

For example:
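A minimal sketch of self-attention, under simplifying assumptions: the queries, keys, and values are all the raw input vectors (no learned projection matrices), and the scoring is scaled dot-product. The names `softmax` and `self_attention` and the toy vectors are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with Q = K = V = vectors."""
    d = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Score the query against every word vector, including itself.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in vectors
        ]
        weights = softmax(scores)
        # The output is a weighted average of all word vectors.
        output = [
            sum(w * vec[j] for w, vec in zip(weights, vectors))
            for j in range(d)
        ]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights

word_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(word_vectors)
print(weights[0])  # how strongly the first word attends to each word
```

Multi-head attention runs several such attention computations in parallel, each with its own learned projections, and combines their outputs.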

3.4、Advanced fine-tuning

3.4.1、LLM training three steps:

Pre-training:
Fine-tuning:
RLHF:
(1) Why RLHF?

(2) Starts with the need to fine-tune

3.4.2、Simplifying RLHF

Model output is reviewed by a human.
Updates model based on the feedback.

Step 1:

Receives a prompt.
Generates multiple responses.

Step 2:

Human expert checks these responses.
Ranks the responses based on quality: accuracy, relevance, coherence.

Step 3:

Learns from expert's ranking.
Aligns its future responses with the expert's preferences.

And it goes on:

Continues to generate responses.
Receives expert's rankings.
Adjusts the learning.
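The expert's ranking has to be turned into a training signal somehow. One common approach, assumed here because the notes do not describe the mechanism, is to expand each ranking into pairwise preferences of the form (preferred, rejected). The helper name `ranking_to_pairs` and the response strings are hypothetical.

```python
from itertools import combinations

def ranking_to_pairs(ranked_responses):
    """Expand a best-first ranking into (preferred, rejected) pairs."""
    # combinations keeps the input order, so the first item of each pair
    # always ranks higher than the second.
    return list(combinations(ranked_responses, 2))

ranking = ["response B", "response A", "response C"]  # best first
for preferred, rejected in ranking_to_pairs(ranking):
    print(f"{preferred} > {rejected}")
# response B > response A
# response B > response C
# response A > response C
```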

3.4.3、Recap

4、Concerns and Considerations

4.1、Data concerns and considerations

Data volume and compute power.
Data quality.
Labeling.
Bias.
Privacy.

4.1.1、Data volume and compute power

LLMs need a lot of data.
Extensive computing power.
Can cost millions of dollars.

4.1.2、Data quality

Quality data is essential.

4.1.3、Labeled data

Data must be labeled correctly.
Labor-intensive.
Incorrect labels impact model performance.
Address errors: identify >>> analyze >>> iterate.

4.1.4、Data bias

Influenced by societal stereotypes.
Lack of diversity in training data.
Discrimination and unfair outcomes.

Spot and deal with the biased data:

Evaluate data imbalances.
Promote diversity.
Bias mitigation techniques: more diverse examples.
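Evaluating data imbalances can start with something as simple as counting how often each group or label appears. The sketch below uses made-up labels; `label_shares` and `imbalance_ratio` are hypothetical helpers, and any threshold for what counts as "too imbalanced" is left to the practitioner.

```python
from collections import Counter

def label_shares(labels):
    """Return the fraction of the dataset that each label accounts for."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}

def imbalance_ratio(labels):
    """Ratio between the most and least frequent label (1.0 means balanced)."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)

labels = ["group_a"] * 8 + ["group_b"] * 2  # made-up example data
print(label_shares(labels))     # {'group_a': 0.8, 'group_b': 0.2}
print(imbalance_ratio(labels))  # 4.0
```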

4.1.5、Data privacy

Compliance with data protection and privacy regulations.
Sensitive or personally identifiable information (PII).
Privacy is a concern.
Get permission.
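As one small illustration of handling PII before text is used for training, the sketch below replaces email addresses and one phone number format with placeholders. The two regular expressions are simplified assumptions that will miss many real-world formats, so this is not a substitute for a proper compliance process.

```python
import re

# Simplified, hypothetical patterns: they cover only common-looking cases.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")

def redact(text):
    """Replace matched emails and phone numbers with placeholder tags."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)

print(redact("Contact jane.doe@example.com or 555-123-4567."))
# Contact [EMAIL] or [PHONE].
```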

4.2、Ethical and environmental concerns

4.2.1、Ethical concerns

Transparency risk - Challenging to understand the output.
Accountability risk - Responsibility for LLMs' actions.
Information hazards - Disseminating harmful information.

4.2.2、Environmental concerns

Ecological footprint of LLMs.
Substantial energy resources to train.
Impact through carbon emissions.

4.3、Where are LLMs heading?

Model explainability.
Efficiency.
Unsupervised bias handling.
Enhanced creativity.

