Introduction to ChatGPT Principles: Approaching ChatGPT from Language Models


Uploaded: 2024-01-12 15:50:02 (ZIP, 897 KB)



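Some things that are outdated are simply outdated; some look outdated but actually are not; and some look like they are booming but have in fact long been on the way down.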

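ChatGPT has been making waves for a while now. This article starts from basic language models to review the underlying principles, looks at how they resemble and differ from ChatGPT, and also discusses its applications and impact.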

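Compared with BERT before it, this time I have seen relatively little analysis of the underlying principles from academia or industry, yet the investment community is buzzing. Interesting.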

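This article covers three areas: principles (OpenAI, Google, prompts, LLMs, chat, and ChatGPT), reproduction, and applications, with the emphasis on the first two. There is already plenty of analysis of applications out there, so I will not embarrass myself on that front.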

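This article tries to teach readers how to build an imitation ChatGPT with a dozen or so lines of the simplest possible code... With so many knock-offs on the market, some of them may well be the version in this article.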

Principles

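Put simply, ChatGPT is an extremely powerful language model as the foundation (the GPT series), plus training specifically for "chat" and smooth multilingual interaction; together these produced today's ChatGPT. The explanation below proceeds in this order: language models, OpenAI (GPT), Google (BERT, which had its heyday after all), and ChatGPT (a look at how it came about).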

Language Models

Definition: A language model learns to predict the probability of a sequence of words.

Language models tell us P(\vec{w}) = P(w_1 \ldots w_n): How likely is this sequence of words to occur?

Roughly: Is this sequence of words a "good" one in my language?

In other words, a language model tells us whether a sentence sounds like something a person would actually say.

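A distinctive feature of language model training is that it fundamentally does not need labeled data; a large amount of text is enough. The learning objectives are constructed from the text itself in some sensible way.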
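To make "constructed from the text itself" concrete, here is a minimal sketch, assuming a next-word prediction objective and naive whitespace tokenization. The function name `next_word_examples` is made up for illustration; it is not from any library.

```python
def next_word_examples(text):
    """Turn raw text into (context, target) pairs without any human labels."""
    words = text.split()  # naive whitespace tokenization, an assumption for this sketch
    # Each prefix of the sentence is an input; the word that follows it is the target.
    return [(words[:i], words[i]) for i in range(1, len(words))]

examples = next_word_examples("I love eating apples")
for context, target in examples:
    print(context, "->", target)
```

Every sentence in a corpus yields several training examples this way, which is why plain text alone is enough.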

Types of Language Models

By underlying technique

• Statistical Language Models: These models use traditional statistical techniques like N-grams, Hidden Markov Models (HMM) and certain linguistic rules to learn the probability distribution of word sequences.

• Neural Language Models: These are new players in the NLP town and use different kinds of neural networks to model language, that is, to learn the probability distribution of sequences.

By learning objective (following XLNet: https://arxiv.org/pdf/1906.08237.pdf)

• Autoregressive Language Models: currently best represented by GPT. AR language modelling seeks to estimate the probability distribution of a text corpus with an autoregressive model. Specifically, given a text sequence x = (x_1, \ldots, x_T), AR language modelling factorizes the likelihood into a forward product p(x) = \prod_{t=1}^{T} p(x_t | x_{<t}) or a backward one p(x) = \prod_{t=T}^{1} p(x_t | x_{>t}). A parametric model (e.g. a neural network) is trained to model each conditional distribution and finally, we can get the joint distribution.
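As a small sanity check on the factorization above, the sketch below uses a made-up joint distribution over two-token sequences and confirms that the forward and backward products both recover the same joint probability. The numbers and the vocabulary {"a", "b"} are purely illustrative.

```python
from itertools import product

# Hypothetical joint distribution over two-token sequences (values are made up).
joint = {("a", "a"): 0.1, ("a", "b"): 0.4, ("b", "a"): 0.3, ("b", "b"): 0.2}
vocab = ["a", "b"]

def marginal(position, token):
    """Probability that the token at `position` equals `token`."""
    return sum(p for seq, p in joint.items() if seq[position] == token)

def forward(x1, x2):
    # p(x_1) * p(x_2 | x_1)
    return marginal(0, x1) * (joint[(x1, x2)] / marginal(0, x1))

def backward(x1, x2):
    # p(x_2) * p(x_1 | x_2)
    return marginal(1, x2) * (joint[(x1, x2)] / marginal(1, x2))

for x1, x2 in product(vocab, repeat=2):
    print(x1, x2, round(forward(x1, x2), 6), round(backward(x1, x2), 6))
```

In a real AR model the conditionals come from a trained network rather than from a known joint table; the point here is only that either ordering is a valid way to decompose the same distribution.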

• Autoencoder Language Models: represented by BERT. In comparison, AE based pretraining does not perform explicit density estimation but instead aims to reconstruct the original data from corrupted input. A notable example is BERT [10], which has been the state-of-the-art pretraining approach. Given the input token sequence, a certain portion of tokens are replaced by a special symbol [MASK], and the model is trained to recover the original tokens from the corrupted version. Since the predicted tokens are masked in the input, BERT is not able to model the joint probability using the product rule as in AR language modelling. In other words, BERT assumes the predicted tokens are independent of each other given the unmasked tokens, which is oversimplified as high-order, long-range dependency is prevalent in natural language.
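The corruption step described above can be sketched as follows. This is a simplified illustration, not BERT's actual preprocessing: the function name `mask_tokens`, the `mask_prob` parameter, and the whitespace tokenization are all assumptions made for the example.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace a random subset of tokens with [MASK].

    Returns the corrupted sequence and a dict {position: original token}
    holding the targets the model would be trained to recover.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    corrupted, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            corrupted.append("[MASK]")
            targets[i] = tok
        else:
            corrupted.append(tok)
    return corrupted, targets

tokens = "I love eating apples".split()
corrupted, targets = mask_tokens(tokens, mask_prob=0.5)
print(corrupted, targets)
```

Note that each masked position is predicted from the unmasked tokens only, which is exactly the independence assumption the quoted passage criticizes.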

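From the descriptions above we can see that the two learning objectives differ: AE learns how to reconstruct the input, whereas AR is essentially modeling the joint probability. On downstream tasks, AE does relatively well on classification-style tasks and picks them up easily. AR is unidirectional, so for the many downstream tasks that need bidirectional information it is harder to train to the same level; yet that same property is what lets it support sequence generation. (Think of a typical case: in MT the encoder can take all sorts of forms, but the decoding/generation process is still unidirectional.)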
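To illustrate why the unidirectional factorization supports generation, here is a toy left-to-right decoder. It is a sketch under strong assumptions: the table `next_word_probs` is made up, and each step conditions only on the previous word, whereas a real AR model such as GPT conditions on the whole prefix.

```python
# Hypothetical conditional distributions p(next word | previous word); values are made up.
next_word_probs = {
    "<s>": {"I": 0.9, "apples": 0.1},
    "I": {"love": 0.8, "eating": 0.2},
    "love": {"eating": 0.6, "apples": 0.4},
    "eating": {"apples": 0.9, "</s>": 0.1},
    "apples": {"</s>": 1.0},
}

def generate(max_len=10):
    """Greedy left-to-right decoding: each step uses only what was already generated."""
    words, prev = [], "<s>"
    for _ in range(max_len):
        dist = next_word_probs[prev]
        prev = max(dist, key=dist.get)  # pick the most likely next word
        if prev == "</s>":  # end-of-sentence marker stops generation
            break
        words.append(prev)
    return words

print(generate())
```

A masked (AE) model has no such natural stepping procedure, because its predictions assume the surrounding tokens are already present.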

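Aside: from BERT to ALBERT/RoBERTa, and from GPT to GPT-3, we can see that an important research direction in its own right is how to train models on more data, so that they acquire more knowledge and achieve better results.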

Left-to-right generation has a speed problem; see https://arxiv.org/pdf/2205.07459.pdf, where ByteDance's DA-Transformer achieves a speedup in generation.

Trying to Build an N-gram Language Model

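New techniques are great, but applying them directly has a drawback: the problem definition is not very intuitive. Older, traditional techniques may be "outdated", but the way they define and explain the problem is intuitive and fundamental.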

Given a sentence: "I love eating apples."

N is the number of words we look at at a time when modeling: a unigram (1-gram) looks at one word at a time, a 2-gram (or bigram) looks at two, and so on.

• Unigram: probability estimated from word frequency

• Bigram: x_i depends only on x_{i-1}

• Trigram: x_i depends only on x_{i-2}, x_{i-1}
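As a quick illustration of the three window sizes listed above, the sketch below slides a window of n words over the example sentence. The helper name `ngrams` and the whitespace tokenization are assumptions for this example.

```python
def ngrams(words, n):
    """Return all contiguous n-word windows of the sentence, in order."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

words = "I love eating apples".split()
for n in (1, 2, 3):
    print(n, ngrams(words, n))
```

For a four-word sentence this gives four unigrams, three bigrams, and two trigrams.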

Unigram:

This is the simplest and most direct modeling approach: count how often each word occurs and use those frequencies as probabilities.

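The problem this causes is that word order is ignored: P(我爱你, "I love you") = P(你爱我, "you love me"), because both sentences are made of exactly the same words.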

So intuitively, word order carries meaning. That is what motivates the bigram and the trigram, that is, n-grams.
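The order-blindness of the unigram model is easy to reproduce. The sketch below assumes a tiny, pre-segmented toy corpus (made up for illustration) and uses exact fractions so that the two probabilities can be compared without floating-point noise.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical toy corpus, already split into words.
corpus = "我 爱 你 你 爱 我 我 爱 苹果".split()
counts = Counter(corpus)
total = sum(counts.values())

def unigram_prob(sentence):
    """Probability of a sentence as a product of independent word frequencies."""
    p = Fraction(1)
    for w in sentence:
        p *= Fraction(counts[w], total)
    return p

p_forward = unigram_prob(["我", "爱", "你"])
p_reversed = unigram_prob(["你", "爱", "我"])
print(p_forward, p_reversed, p_forward == p_reversed)
```

Because multiplication does not care about order, any permutation of the same words gets the same score under this model.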

Bigram/Trigram

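The learning objective of an N-gram language model is: given a condition (the preceding word or words), give the probability of each word that could come next (the chain rule).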

p(w1...ws) = p(w1) . p(w2 | w1) . p(w3 | w1 w2) . p(w4 | w1 w2
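As a minimal sketch of the bigram case, the code below estimates p(word | previous word) by counting on a made-up three-sentence corpus and then scores a sentence with the chain rule. The corpus, the `<s>`/`</s>` boundary markers, and the function names are assumptions for illustration, and no smoothing is applied, so unseen bigrams get probability zero.

```python
from collections import Counter

# Hypothetical toy corpus; <s> and </s> mark sentence boundaries.
corpus = [
    "<s> I love eating apples </s>".split(),
    "<s> I love apples </s>".split(),
    "<s> you love eating </s>".split(),
]

context_counts = Counter()
bigram_counts = Counter()
for sent in corpus:
    for prev, word in zip(sent, sent[1:]):
        context_counts[prev] += 1
        bigram_counts[(prev, word)] += 1

def cond_prob(word, prev):
    """Maximum-likelihood estimate of p(word | prev); 0 if prev was never seen."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / context_counts[prev]

def sentence_prob(words):
    """Chain rule with the bigram approximation: each word depends only on its predecessor."""
    seq = ["<s>"] + words + ["</s>"]
    p = 1.0
    for prev, word in zip(seq, seq[1:]):
        p *= cond_prob(word, prev)
    return p

p_natural = sentence_prob("I love eating apples".split())
p_scrambled = sentence_prob("apples eating love I".split())
print(p_natural, p_scrambled)
```

Unlike the unigram model, the scrambled sentence now scores lower than the natural one, because the model has never seen its word pairs.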


Uploader: 小新要变强
