Skip Deep LSTM (a trick for LSTMs)

2023-02-23 02:51:45

Network depth is critical to model performance.

In practice, however, LSTM stacks deeper than about three layers are hard to train because of vanishing or exploding gradients.

Borrowing from GNMT (Google's neural machine translation system), this post proposes a deep LSTM with dense skip connections: Skip Deep LSTM.

Experiments show that on image-captioning tasks the training loss is lower than a conventional LSTM's, and on time-series forecasting tasks (e.g., power-generation prediction) the design likewise outperforms commonly used LSTMs.

The design follows the GNMT idea: the first layer is a bidirectional LSTM (BiLSTM), and a depth of 5-7 layers works best, outperforming standard LSTMs on image captioning and time-series forecasting.

The core code (TensorFlow 2 / Keras):
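The GNMT-style skip connection the design is built on can be sketched in a few lines. This is a minimal illustration, not the post's full model; the input shape and unit counts are made up for the example.

```python
# Minimal sketch of a GNMT-style skip connection between stacked LSTM layers:
# the output of layer 1 is added back in before layer 3, giving gradients a
# shorter path through the stack. Shapes and sizes here are illustrative.
from tensorflow.keras.layers import Input, LSTM, add
from tensorflow.keras.models import Model

inp = Input(shape=(10, 8))                           # (timesteps, features)
h1 = LSTM(32, return_sequences=True)(inp)            # layer 1
h2 = LSTM(32, return_sequences=True)(h1)             # layer 2
h3 = LSTM(32, return_sequences=True)(add([h1, h2]))  # layer 3 sees the skip
model = Model(inp, h3)
print(model.output_shape)  # (None, 10, 32)
```

Because `add` requires matching shapes, every layer on a skip path must emit the same width, which is why the snippets below keep all hidden sizes equal along a branch.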

 


# Deep LSTM with dense skip connections. The connection pattern and depth can
# be varied per experiment; 5-7 layers has worked best so far.
# se2 is the upstream (batch, timesteps, features) tensor from earlier layers.
bi1 = Bidirectional(LSTM(64, return_sequences=True))(se2)
bi2 = LSTM(128, return_sequences=True)(bi1)
bi3 = LSTM(128, return_sequences=True)(bi2)
res1 = add([bi1, bi3])
bi4 = LSTM(128, return_sequences=True)(res1)
res2 = add([bi2, bi4, bi1])
bi5 = LSTM(128, return_sequences=True)(res2)
res3 = add([bi3, bi5, bi2, bi1])
bi6 = LSTM(128, return_sequences=True)(res3)
res4 = add([bi4, bi6, bi3, bi1])
se3 = LSTM(256)(res4)
# Fused LSTM: a BiLSTM branch and a plain LSTM branch with cross-branch skips.
# add1 and se2 are upstream tensors defined elsewhere in the full model.
bi1_1 = Bidirectional(LSTM(16, return_sequences=True))(add1)
bi1_2 = Bidirectional(LSTM(16, return_sequences=True))(bi1_1)
bi1_3 = Bidirectional(LSTM(16, return_sequences=True))(bi1_2)
# attention_mul = attention_3d_block(bi1)
bi2_1 = LSTM(32, return_sequences=True)(se2)
bi2_2 = LSTM(32, return_sequences=True)(bi2_1)
bi2_3 = LSTM(32, return_sequences=True)(bi2_2)
res1 = add([bi1_1, bi2_1, bi1_3, bi2_3])
bi1_4 = Bidirectional(LSTM(16, return_sequences=True))(res1)
bi2_4 = LSTM(32, return_sequences=True)(res1)
res2 = add([bi1_1, bi2_1, bi1_2, bi2_2])
bi1_5 = Bidirectional(LSTM(16, return_sequences=True))(res2)
bi2_5 = LSTM(32, return_sequences=True)(res2)
res3 = add([bi1_1, bi2_1, bi1_2, bi2_2, bi1_3, bi2_3])
# se3 = LSTM(256)(res3)
se3 = Bidirectional(LSTM(128))(res3)
decoder2 = Dense(256, activation='relu')(se3)
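The snippets above reference upstream tensors (`se2`) that are defined elsewhere in the full model. The dense-skip stack can be wired into a self-contained, buildable model like this; the input shape here is an assumption for illustration, not taken from the original post.

```python
# Self-contained sketch of the dense-skip LSTM stack. The original `se2` is
# modeled as a plain Input of shape (timesteps, features); sizes illustrative.
# Bidirectional(LSTM(64)) emits 128 features, matching the LSTM(128) layers,
# so every add([...]) merge is shape-compatible.
from tensorflow.keras.layers import Input, LSTM, Bidirectional, add
from tensorflow.keras.models import Model

se2 = Input(shape=(36, 128))
bi1 = Bidirectional(LSTM(64, return_sequences=True))(se2)   # (None, 36, 128)
bi2 = LSTM(128, return_sequences=True)(bi1)
bi3 = LSTM(128, return_sequences=True)(bi2)
bi4 = LSTM(128, return_sequences=True)(add([bi1, bi3]))
bi5 = LSTM(128, return_sequences=True)(add([bi2, bi4, bi1]))
bi6 = LSTM(128, return_sequences=True)(add([bi3, bi5, bi2, bi1]))
se3 = LSTM(256)(add([bi4, bi6, bi3, bi1]))                  # (None, 256)
model = Model(se2, se3)
print(model.output_shape)  # (None, 256)
```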

 

Embedding attention over timesteps
# Attention over timesteps
def attention_3d_block(inputs):
    # input_dim = int(inputs.shape[2])
    a = Permute((2, 1))(inputs)  # swap the time and feature axes
    # The Dense width must equal the number of timesteps: the maximum caption
    # length here, or the lookback-window length in time-series forecasting.
    a = Dense(36, activation='tanh')(a)
    a_probs = Permute((2, 1), name='attention_vec')(a)
    # merge(..., mode='mul') is the old Keras 1 API; multiply replaces it:
    output_attention_mul = multiply([inputs, a_probs], name='attention_mul')
    return output_attention_mul
# Inserting time-step attention between the first and second LSTM layers usually
# works best; experiment per problem to pick the position.
bi1 = Bidirectional(LSTM(64, return_sequences=True))(se2)
attention_mul = attention_3d_block(bi1)
bi2 = LSTM(128, return_sequences=True)(attention_mul)
bi3 = LSTM(128, return_sequences=True)(bi2)
res1 = add([bi1, bi3])
bi4 = LSTM(128, return_sequences=True)(res1)
res2 = add([bi2, bi4, bi1])
bi5 = LSTM(128, return_sequences=True)(res2)
res3 = add([bi3, bi5, bi2, bi1])
bi6 = LSTM(128, return_sequences=True)(res3)
res4 = add([bi4, bi6, bi3, bi1])
se3 = LSTM(256)(res4)
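A quick way to sanity-check the attention block is to build it in isolation and confirm it preserves the input shape. The 36-timestep, 128-feature input below is an assumed size chosen to match `Dense(36)`; only the timestep count is constrained.

```python
# Shape check for the time-step attention block: with Dense(36), the input
# must have exactly 36 timesteps; the feature count is free. The block scores
# each timestep, then reweights the input by those scores elementwise, so the
# output shape equals the input shape.
from tensorflow.keras.layers import Input, Permute, Dense, multiply
from tensorflow.keras.models import Model

def attention_3d_block(inputs):
    a = Permute((2, 1))(inputs)          # (batch, features, timesteps)
    a = Dense(36, activation='tanh')(a)  # score the 36 timesteps per feature
    a_probs = Permute((2, 1), name='attention_vec')(a)
    return multiply([inputs, a_probs], name='attention_mul')

x = Input(shape=(36, 128))               # 36 timesteps is required here
m = Model(x, attention_3d_block(x))
print(m.output_shape)  # (None, 36, 128)
```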
 
 

Author: 嘎嘎嘎嘎

From the author's personal column: 深度学习-lsm