
Large Language Models (LLMs) Concepts

1、Introduction to Large Language Models (LLMs)

1.1、Definition of LLMs

  • Large: Trained on vast amounts of data with substantial computing resources.
  • Language: Processes and generates human-like text.
  • Models: Learn complex patterns from text data.

LLMs are widely considered a defining moment in the history of AI.

Some applications:

  • Sentiment analysis
  • Identifying themes
  • Translating text or speech
  • Generating code
  • Next-word prediction

1.2、Real-world application

  • Transforming the finance industry:
    - Inputs: investment outlooks, annual reports, news articles, social media posts.
    - LLM outputs: market analysis, portfolio management, investment opportunities.

  • Revolutionizing the healthcare sector:
    - Analyze patient data to offer personalized recommendations.
    - Must adhere to privacy laws.

  • Education:
    - Personalized coaching and feedback.
    - Interactive learning experience.
    - AI-powered tutor: ask questions, receive guidance, discuss ideas.

  • Visual question answering:
    Defining multimodal:
    - Multimodal: many types of processing or generation.
    - Non-multimodal: one type of processing or generation.
    Visual question answering:
    - Answers to questions about visual content.
    - Object identification and relationships.
    - Scene description.

1.3、Challenges of language modeling

  • Sequence matters
  • Context modeling
  • Long-range dependency
  • Single-task learning

2、Building Blocks of LLMs

2.1、Novelty of LLMs

  • Overcome the unstructured nature of data
  • Outperform traditional models
  • Understand linguistic subtleties

The building blocks are shown below:

2.2、Generalized overview of NLP

2.2.1、Text Pre-processing

These steps are independent of each other, so they can be done in a different order.

  • Tokenization: Splits text into individual words, or tokens.

  • Stop word removal: Removes stop words (such as "the" or "is"), which add little meaning.

  • Lemmatization: Groups slightly different words with similar meanings by reducing them to their base form. For example, "running" and "ran" both map to the root word "run".
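The three pre-processing steps can be sketched in a few lines of Python. This is a minimal illustration rather than a production pipeline: the stop-word list and the lemma table are tiny hand-written assumptions, and the function names are hypothetical.

```python
import re

# Tiny illustrative stop-word list and lemma table (assumptions, not a real lexicon).
STOP_WORDS = {"the", "a", "an", "is", "are", "on", "in", "of", "and", "to"}
LEMMAS = {"cats": "cat", "sitting": "sit", "sat": "sit", "mats": "mat"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]

def lemmatize(tokens):
    """Map each token to its base form when the table knows it."""
    return [LEMMAS.get(t, t) for t in tokens]

def preprocess(text):
    """Run tokenization, stop word removal, and lemmatization in sequence."""
    return lemmatize(remove_stop_words(tokenize(text)))

print(preprocess("The cats are sitting on the mats"))  # ['cat', 'sit', 'mat']
```

Real projects would normally rely on a maintained NLP library for the stop-word list and the lemmatizer instead of hand-written tables.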

2.2.2、Text Representation

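  • Converts text data into numerical form.
  • Bag-of-words: represents a text by how often each word occurs in it.
    Limitations:
    - Does not capture the order or context of words.
    - Does not capture the semantics between the words.

  • Word embeddings: represent each word as a numeric vector, so that words with similar meanings get similar vectors.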
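To make the bag-of-words limitation concrete, here is a small sketch. The vocabulary and the two sentences are made up for illustration, and `bag_of_words` is a hypothetical helper.

```python
from collections import Counter

def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

vocabulary = ["bites", "dog", "man"]
a = bag_of_words("dog bites man", vocabulary)
b = bag_of_words("man bites dog", vocabulary)

print(a)       # [1, 1, 1]
print(b)       # [1, 1, 1]
print(a == b)  # True: word order is lost
```

Both sentences get the same vector although they mean different things, which is the order and context limitation listed above.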

2.3、Fine-tuning

Fine-tuning:
- Addresses some of these challenges.
- Adapts a pre-trained model.

Pre-trained model:
- Learned from general-purpose datasets.
- Not optimized for specific tasks.
- Can be fine-tuned for a specific problem.

2.4、Learning techniques

N-shot learning: zero-shot, few-shot, and multi-shot.

2.4.1、Zero-shot learning

  • No explicit training.
  • Uses language understanding and context.
  • Generalizes without any prior examples.

2.4.2、Few-shot learning

  • Learn a new task with a few examples.

2.4.3、Multi-shot learning

  • Requires more examples than few-shot.
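One common place where the difference between these settings shows up is the prompt itself: the number of worked examples placed before the actual query. The sketch below assumes that in-context setup (the notes do not say how examples are supplied), and the task text, examples, and `build_prompt` helper are all hypothetical.

```python
def build_prompt(task, examples, query):
    """Assemble a prompt with zero or more worked examples before the query."""
    lines = [task]
    for text, label in examples:
        lines.append(f"Text: {text}\nSentiment: {label}")
    lines.append(f"Text: {query}\nSentiment:")
    return "\n\n".join(lines)

task = "Classify the sentiment of each text as positive or negative."
examples = [
    ("I loved this film.", "positive"),
    ("The service was slow.", "negative"),
]

# Zero-shot: no examples, the model relies on its language understanding.
zero_shot = build_prompt(task, [], "The battery lasts all day.")

# Few-shot: a handful of examples show the expected format and labels.
few_shot = build_prompt(task, examples, "The battery lasts all day.")

print(few_shot)
```

A multi-shot prompt would follow the same pattern with a longer `examples` list.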

3、Training Methodology and Techniques

3.1、Building blocks to train LLMs

3.1.1、Generative pre-training

LLMs are trained using generative pre-training:
- Input data consists of text tokens.
- The model is trained to predict the tokens within the dataset.

Types:
- Next word prediction.
- Masked language modeling.

3.1.2、Next word prediction

  • Supervised learning technique.
  • Predicts next word and generates coherent text.
  • Captures the dependencies between words.
  • Training data consists of pairs of input and output examples.
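The input and output pairs can be illustrated with a toy example. The sketch below builds (context, next word) pairs from raw text and fits a simple bigram counter as a stand-in for a model. A real LLM uses a neural network and far longer contexts; the corpus and function names here are assumptions for illustration only.

```python
from collections import Counter, defaultdict

def make_pairs(tokens):
    """Build (context, next word) training pairs from a token sequence."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def train_bigram(corpus):
    """Count which word follows each word across all sentences."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for context, target in make_pairs(tokens):
            # The bigram stand-in only looks at the last word of the context.
            follows[context[-1]][target] += 1
    return follows

def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    if word not in follows:
        return None
    return follows[word].most_common(1)[0][0]

corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
model = train_bigram(corpus)

print(make_pairs("the cat sat".split()))
# [(['the'], 'cat'), (['the', 'cat'], 'sat')]
print(predict_next(model, "sat"))  # on
print(predict_next(model, "the"))  # cat
```

The labels come from the text itself, which is why no manual annotation is needed to produce the pairs.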

3.1.3、Masked language modeling

  • Hides a selected word in the input.
  • The model is trained to predict the masked word.
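A masked training example might be prepared roughly as follows. The `[MASK]` placeholder and the 15% masking rate are assumptions borrowed from common practice, not details given in these notes, and `mask_tokens` is a hypothetical helper.

```python
import random

MASK = "[MASK]"  # placeholder token (an assumed convention)

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Hide a random subset of tokens and record the words to be recovered."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, token in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[i] = token  # the label the model must predict at position i
        else:
            masked.append(token)
    return masked, targets

tokens = "the quick brown fox jumps over the lazy dog".split()
masked, targets = mask_tokens(tokens, mask_prob=0.3, seed=1)
print(masked)
print(targets)
```

During training, the model sees `masked` as input and is scored only on how well it recovers the words stored in `targets`.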

3.2、Introducing the transformer

3.2.1、Transformer architecture

  • Captures relationships between words.
  • Components: Pre-processing, Positional Encoding, Encoders, and Decoders.

3.2.2、Inside the transformer

(1) Text pre-processing and representation:

  • Text preprocessing: tokenization, stop word removal, lemmatization.
  • Text representation: word embedding.

(2) Positional encoding:

  • Adds information on the position of each word.
  • Helps the model relate distant words.
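One way to encode positions is the sinusoidal scheme from the original transformer paper. The notes do not say which scheme is meant, so treat the formula below as an assumption; learned position embeddings are another common choice. The function name is hypothetical.

```python
import math

def positional_encoding(num_positions, d_model):
    """Return a num_positions x d_model table of sinusoidal position vectors."""
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model):
            # Each pair of dimensions (2k, 2k+1) shares one frequency.
            angle = pos / (10000 ** ((i - i % 2) / d_model))
            # Even dimensions use sine, odd dimensions use cosine.
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

pe = positional_encoding(num_positions=4, d_model=8)
for pos, row in enumerate(pe):
    print(pos, [round(x, 3) for x in row])
```

Each position gets a distinct vector, which is added to the word embedding at that position so the model can tell where each word sits in the sequence.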

(3) Encoders:

  • Attention mechanism: directs attention to specific words and relationships.
  • Neural network: processes specific features.

(4) Decoders:

  • Includes attention and neural networks.
  • Generates the output.

3.2.3、Transformers and long-range dependencies

  • Initial challenge: long-range dependency.
  • Attention: focus on different parts of the input.

3.2.4、Processes multiple parts simultaneously

  • Limitation of traditional language models: Sequential - one word at a time.
  • Transformers: Process multiple parts simultaneously (Faster processing).

3.3、Attention mechanisms

3.3.1、Attention mechanisms

  • Understand complex structures.
  • Focus on important words.

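3.3.2、Two primary types: Self-attention and multi-head attention

Self-attention weighs each word in a sequence against every other word in the same sequence. Multi-head attention runs several attention computations in parallel, so that each head can focus on a different kind of relationship.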
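A stripped-down version of self-attention can be written in plain Python. This sketch uses scaled dot-product attention and, as a simplifying assumption, skips the learned query, key, and value projections that a real transformer applies; the token vectors are made up.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = vectors."""
    d = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Score the query against every vector in the same sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
                  for key in vectors]
        weights = softmax(scores)
        # The output is a weighted mix of all vectors.
        mixed = [sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(d)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical 2-dimensional embeddings for three tokens.
tokens = ["bank", "river", "money"]
vectors = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
outputs, weights = self_attention(vectors)

for token, row in zip(tokens, weights):
    print(token, [round(w, 2) for w in row])
```

Multi-head attention would repeat this computation several times with different learned projections and then combine the per-head outputs.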

3.4、Advanced fine-tuning

3.4.1、Three steps of LLM training:

  • Pre-training:
  • Fine-tuning:
  • RLHF:
    (1)Why RLHF?

    (2)Starts with the need to fine-tune

3.4.2、Simplifying RLHF

  • Model output is reviewed by a human.
  • The model is updated based on the feedback.

Step1:

  • Receives a prompt.
  • Generates multiple responses.

Step2:

  • A human expert checks these responses.
  • Ranks the responses based on quality: accuracy, relevance, coherence.

Step3:

  • Learns from the expert's rankings.
  • Aligns its future responses with the expert's preferences.

And it goes on:

  • Continues to generate responses.
  • Receives expert's rankings.
  • Adjusts the learning.
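The feedback loop above can be mimicked with a toy example. The sketch below turns an expert's best-to-worst ranking into preference pairs and nudges a score per response with a Bradley-Terry style update. This is a heavy simplification and an assumption on my part: actual RLHF trains a reward model on such preferences and then updates the LLM with reinforcement learning. All names are hypothetical.

```python
import math
from itertools import combinations

def ranking_to_pairs(ranked_responses):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs."""
    return list(combinations(ranked_responses, 2))

def update_scores(scores, pairs, lr=0.5):
    """One pass of a Bradley-Terry style update over preference pairs."""
    for preferred, rejected in pairs:
        # Probability that the current scores already agree with the expert.
        p = 1 / (1 + math.exp(scores[rejected] - scores[preferred]))
        # Move the scores further apart the more they disagree with the expert.
        scores[preferred] += lr * (1 - p)
        scores[rejected] -= lr * (1 - p)
    return scores

# Step 1: the model generates several responses (placeholders here).
responses = ["response_a", "response_b", "response_c"]
# Step 2: the expert ranks them from best to worst.
expert_ranking = ["response_b", "response_a", "response_c"]
# Step 3: learn from the ranking, repeated over several rounds.
scores = {r: 0.0 for r in responses}
pairs = ranking_to_pairs(expert_ranking)
for _ in range(3):
    update_scores(scores, pairs)

print(pairs)
print(sorted(scores, key=scores.get, reverse=True))
```

After a few rounds the learned scores reproduce the expert's ordering, which is the sense in which the model aligns with the expert's preferences.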

3.4.3、Recap

4、Concerns and Considerations

4.1、Data concerns and considerations

  • Data volume and compute power.
  • Data quality.
  • Labeling.
  • Bias.
  • Privacy.

4.1.1、Data volume and compute power

  • LLMs need a lot of data.
  • Extensive computing power.
  • Can cost millions of dollars.

4.1.2、Data quality

  • Quality data is essential.

4.1.3、Labeled data

  • Data must be labeled correctly.
  • Labor-intensive.
  • Incorrect labels impact model performance.
  • Address errors: identify >>> analyze >>> iterate.

4.1.4、Data bias

  • Influenced by societal stereotypes.
  • Lack of diversity in training data.
  • Discrimination and unfair outcomes.

Spotting and dealing with biased data:

  • Evaluate data imbalances.
  • Promote diversity.
  • Bias mitigation techniques: more diverse examples.
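Evaluating data imbalance can start with something as simple as counting how often each group appears. The sketch below assumes each training example carries a group label (here a language code), and the 20% threshold is an arbitrary illustrative choice.

```python
from collections import Counter

def label_shares(labels):
    """Return each label's share of the dataset."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}

def underrepresented(labels, min_share=0.2):
    """List labels whose share falls below a chosen threshold."""
    shares = label_shares(labels)
    return sorted(label for label, share in shares.items() if share < min_share)

# Hypothetical dataset: one language tag per training example.
labels = ["en"] * 8 + ["es"] * 1 + ["hi"] * 1

print(label_shares(labels))      # {'en': 0.8, 'es': 0.1, 'hi': 0.1}
print(underrepresented(labels))  # ['es', 'hi']
```

Groups flagged this way are candidates for collecting more diverse examples, the mitigation named in the list above.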

4.1.5、Data privacy

  • Compliance with data protection and privacy regulations.
  • Sensitive or personally identifiable information (PII).
  • Privacy is a concern whenever training data contains personal information.
  • Get permission before using such data.
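As a small illustration of handling PII, the sketch below masks e-mail addresses and phone numbers before text is used further. The two regular expressions are rough assumptions that will miss many formats. Treat this as a starting point, not a compliance tool.

```python
import re

# Rough patterns for illustration only; real PII detection needs more care.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact(text):
    """Replace e-mail addresses and phone numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)

print(redact("Contact jane.doe@example.com or +1 (555) 010-9999."))
# Contact [EMAIL] or [PHONE].
```

Names, addresses, and identifiers are harder to catch with patterns alone, so redaction of this kind complements permission and regulatory compliance and does not replace them.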

4.2、Ethical and environmental concerns

4.2.1、Ethical concerns

  • Transparency risk - It is challenging to understand how the output was produced.
  • Accountability risk - Who is responsible for an LLM's actions.
  • Information hazards - Disseminating harmful information.

4.2.2、Environmental concerns

  • Ecological footprint of LLMs.
  • Substantial energy resources to train.
  • Impact through carbon emissions.

4.3、Where are LLMs heading?

  • Model explainability.
  • Efficiency.
  • Unsupervised bias handling.
  • Enhanced creativity.


