BERT Model Source Code Analysis (Part 3)


whether dropout will be applied. (i.e., whether the model is being trained)
input_ids: int32 Tensor of shape [batch_size, seq_length]. The input token IDs.
input_mask: (optional) int32 Tensor of shape [batch_size, seq_length]. The input mask (1 for real tokens, 0 for padding).
token_type_ids: (optional) int32 Tensor of shape [batch_size, seq_length]. The token (segment) type IDs.
use_one_hot_embeddings: (optional) bool. Whether to use one-hot word
embeddings or tf.embedding_lookup() for the word embeddings.
Embedding layer: whether to use one-hot word embeddings.
scope: (optional) variable scope. Defaults to "bert".
Optional variable scope; defaults to "bert".
Raises:
ValueError: The config is invalid or one of the input tensor shapes
is invalid.
Exception: a ValueError is raised if the config is invalid or one of the input tensor shapes is invalid.
"""
config = copy.deepcopy(config)  # deep-copy the config so the caller's object is not modified
if not is_training:  # not training (eval/inference mode)
config.hidden_dropout_prob = 0.0  # set the dropout probability to 0, i.e., drop nothing
config.attention_probs_dropout_prob = 0.0  # likewise disable attention dropout
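Setting both probabilities to 0.0 disables dropout completely, because the dropout helper in modeling.py returns its input untouched when the probability is zero; roughly:

import tensorflow as tf

def dropout(input_tensor, dropout_prob):
  # dropout_prob is the probability of DROPPING a value (not of keeping it, as in tf.nn.dropout).
  if dropout_prob is None or dropout_prob == 0.0:
    return input_tensor  # dropout disabled: pass the tensor through unchanged
  return tf.nn.dropout(input_tensor, 1.0 - dropout_prob)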
# Get the input shape.
input_shape = get_shape_list(input_ids, expected_rank=2)
batch_size = input_shape[0]  # batch size: number of examples processed per batch
seq_length = input_shape[1]  # sequence length
if input_mask is None:  # if no input mask is given, default to all ones (attend to every position)
input_mask = tf.ones(shape=[batch_size, seq_length], dtype=tf.int32)
if token_type_ids is None:  # if no token type ids are given, default to all zeros (a single segment)
token_type_ids = tf.zeros(shape=[batch_size, seq_length], dtype=tf.int32)
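As a concrete illustration (made-up numbers, a batch of 2 sequences of length 3), the defaults mean "no padding anywhere" and "every token belongs to segment A":

import tensorflow as tf

batch_size, seq_length = 2, 3
default_mask = tf.ones([batch_size, seq_length], dtype=tf.int32)    # [[1, 1, 1], [1, 1, 1]]: attend to every token
default_types = tf.zeros([batch_size, seq_length], dtype=tf.int32)  # [[0, 0, 0], [0, 0, 0]]: single-segment input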
tf.variable_scope is a context manager that defines a variable namespace and enables variable sharing: every variable created inside the with-block lives under, and is managed by, that scope.
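A minimal TF1-style illustration of that sharing behaviour (the scope name bert_demo and the function dense_layer are made up for this sketch):

import tensorflow as tf  # TensorFlow 1.x API

def dense_layer(x):
  # tf.get_variable either creates "w" or returns the existing one, depending on the scope's reuse flag.
  w = tf.get_variable("w", shape=[4, 4], initializer=tf.zeros_initializer())
  return tf.matmul(x, w)

with tf.variable_scope("bert_demo"):
  out1 = dense_layer(tf.ones([1, 4]))  # creates the variable "bert_demo/w"
with tf.variable_scope("bert_demo", reuse=True):
  out2 = dense_layer(tf.ones([1, 4]))  # reuses the existing "bert_demo/w" instead of creating a new one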
with tf.variable_scope(scope, default_name="bert"):
with tf.variable_scope("embeddings"):
# Perform embedding lookup on the word ids.
(self.embedding_output, self.embedding_table) = embedding_lookup(
input_ids=input_ids,
vocab_size=config.vocab_size,
embedding_size=config.hidden_size,
initializer_range=config.initializer_range,
word_embedding_name="word_embeddings",
use_one_hot_embeddings=use_one_hot_embeddings)
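At its core, embedding_lookup just maps each token id to a row of a learned embedding table, either with tf.gather or, when use_one_hot_embeddings is set (mainly useful on TPUs), with a one-hot matmul. A simplified sketch of that idea, not the verbatim implementation:

import tensorflow as tf

def simple_embedding_lookup(input_ids, vocab_size, embedding_size, use_one_hot=False):
  # input_ids: int32 [batch, seq]  ->  output: float32 [batch, seq, embedding_size]
  embedding_table = tf.get_variable(
      "word_embeddings", shape=[vocab_size, embedding_size],
      initializer=tf.truncated_normal_initializer(stddev=0.02))
  if use_one_hot:
    flat_ids = tf.reshape(input_ids, [-1])             # [batch*seq]
    one_hot = tf.one_hot(flat_ids, depth=vocab_size)   # [batch*seq, vocab_size]
    flat_out = tf.matmul(one_hot, embedding_table)     # [batch*seq, embedding_size]
    shape = tf.shape(input_ids)
    output = tf.reshape(flat_out, [shape[0], shape[1], embedding_size])
  else:
    output = tf.gather(embedding_table, input_ids)     # [batch, seq, embedding_size]
  return output, embedding_table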
# Add positional embeddings and token type embeddings, then layer
# normalize and perform dropout.
# embedding_postprocessor performs various kinds of post-processing on the word embedding tensor.
self.embedding_output = embedding_postprocessor(
input_tensor=self.embedding_output,
use_token_type=True,
token_type_ids=token_type_ids,
token_type_vocab_size=config.type_vocab_size,
token_type_embedding_name="token_type_embeddings",
use_position_embeddings=True,
position_embedding_name="position_embeddings",
initializer_range=config.initializer_range,
max_position_embeddings=config.max_position_embeddings,
dropout_prob=config.hidden_dropout_prob)
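Conceptually, embedding_postprocessor adds two more learned signals on top of each word embedding and then normalizes: a token-type (segment) embedding and a position embedding. A simplified sketch of that idea (assumes static shapes; not the verbatim implementation):

import tensorflow as tf

def add_type_and_position(word_emb, token_type_ids, type_vocab_size=2,
                          max_position=512, dropout_prob=0.1):
  # word_emb: [batch, seq, hidden] word embeddings; returns the same shape with
  # segment and position information added, followed by LayerNorm and dropout.
  seq = word_emb.shape.as_list()[1]
  hidden = word_emb.shape.as_list()[2]

  # Segment embeddings: one row per token type (sentence A / sentence B).
  type_table = tf.get_variable("token_type_embeddings", [type_vocab_size, hidden])
  word_emb += tf.gather(type_table, token_type_ids)        # [batch, seq, hidden]

  # Position embeddings: learned table, sliced to the current length and broadcast over the batch.
  pos_table = tf.get_variable("position_embeddings", [max_position, hidden])
  word_emb += tf.expand_dims(pos_table[:seq, :], axis=0)   # [1, seq, hidden] broadcasts over batch

  # Layer-normalize, then apply dropout (modeling.py wraps this in its own layer_norm helper).
  out = tf.contrib.layers.layer_norm(word_emb, begin_norm_axis=-1, begin_params_axis=-1)
  return tf.nn.dropout(out, keep_prob=1.0 - dropout_prob)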
with tf.variable_scope("encoder"):
# This converts a 2D mask of shape [batch_size, seq_length] to a 3D
# mask of shape [batch_size, seq_length, seq_length] which is used
# for the attention scores.
attention_mask = create_attention_mask_from_input_mask(
input_ids, input_mask)
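The 2D-to-3D conversion is a simple broadcast: a column of ones of shape [batch, seq, 1] multiplied by the row mask of shape [batch, 1, seq] gives, for every query position, a full row marking which key positions may be attended to. A sketch of the same trick (the function name expand_mask is made up):

import tensorflow as tf

def expand_mask(input_mask):
  # input_mask: int32 [batch, seq]  ->  float32 [batch, seq, seq] attention mask
  batch = tf.shape(input_mask)[0]
  seq = tf.shape(input_mask)[1]
  row_mask = tf.cast(tf.reshape(input_mask, [batch, 1, seq]), tf.float32)  # [batch, 1, seq]
  ones_col = tf.ones([batch, seq, 1], dtype=tf.float32)                    # [batch, seq, 1]
  return ones_col * row_mask                                               # broadcasts to [batch, seq, seq]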
# Run the stacked transformer.
# `sequence_output` shape = [batch_size, seq_length, hidden_size].
# Call transformer_model to build the stack of Transformer encoder layers.
self.all_encoder_layers = transformer_model(
input_tensor=self.embedding_output,
attention_mask=attention_mask,
hidden_size=config.hidden_size,
num_hidden_layers=config.num_hidden_layers,
num_attention_heads=config.num_attention_heads,
intermediate_size=config.intermediate_size,
intermediate_act_fn=get_activation(config.hidden_act),
