Skip Deep LSTM (an LSTM trick)
First, network depth is critical to model performance.
However, stacking more than about three LSTM layers is usually hard to train because of vanishing or exploding gradients.
Therefore, borrowing the idea behind GNMT (Google's neural machine translation system), we build a deep LSTM with dense skip connections (Skip Deep LSTM).
In experiments its training loss on image captioning is lower than that of a plain stacked LSTM, and it also outperforms the commonly used LSTM on time-series forecasting tasks (e.g., power-generation forecasting).
Following GNMT, the first layer is a bidirectional LSTM (BiLSTM); a depth of 5-7 layers works best, beating the common LSTM on both image captioning and time-series forecasting. For comparison, a sketch of the plain GNMT-style residual baseline is shown below.
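In GNMT, the deeper LSTM layers use residual connections (each layer's input is added to its output) and only the first encoder layer is bidirectional; the dense-skip design here generalizes this by summing several earlier layer outputs. A minimal sketch of that residual baseline, with a hypothetical (36, 128) input shape, not code taken from GNMT itself:
# GNMT-style residual LSTM stack for comparison (sketch; input shape is assumed).
from tensorflow.keras.layers import Input, LSTM, Bidirectional, add

x = Input(shape=(36, 128))                               # (time_steps, features) - hypothetical
h = Bidirectional(LSTM(64, return_sequences=True))(x)    # first layer bidirectional, as in GNMT's encoder
for _ in range(4):                                       # deeper layers: residual add of layer input and output
    h = add([LSTM(128, return_sequences=True)(h), h])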
The core dense-skip code is as follows (implemented with TF2 / tf.keras):
# Dense-skip deep LSTM. The connection pattern and depth can be tuned per task; 5-7 layers currently work best.
from tensorflow.keras.layers import LSTM, Bidirectional, Dense, Permute, add, multiply

# `se2` is the output of the preceding encoder/embedding layers (defined elsewhere), shape (batch, time_steps, features).
bi1 = Bidirectional(LSTM(64, return_sequences=True))(se2)
bi2 = LSTM(128, return_sequences=True)(bi1)
bi3 = LSTM(128, return_sequences=True)(bi2)
res1 = add([bi1, bi3])            # skip: layer-1 output reused at layer 4's input
bi4 = LSTM(128, return_sequences=True)(res1)
res2 = add([bi2, bi4, bi1])       # dense skips from layers 1 and 2
bi5 = LSTM(128, return_sequences=True)(res2)
res3 = add([bi3, bi5, bi2, bi1])
bi6 = LSTM(128, return_sequences=True)(res3)
res4 = add([bi4, bi6, bi3, bi1])
se3 = LSTM(256)(res4)             # final LSTM collapses the sequence into a single vector
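For reference, a minimal self-contained sketch that wraps the dense-skip stack above into a reusable function and attaches a toy forecasting head; the function name, the (36, 128) input shape, and the single-unit output are assumptions for illustration, not part of the original model:
# Sketch only: `skip_deep_lstm` is a hypothetical helper; all shapes are assumed.
from tensorflow.keras.layers import Input, LSTM, Bidirectional, Dense, add
from tensorflow.keras.models import Model

def skip_deep_lstm(x):
    # Same wiring as the dense-skip snippet above.
    bi1 = Bidirectional(LSTM(64, return_sequences=True))(x)
    bi2 = LSTM(128, return_sequences=True)(bi1)
    bi3 = LSTM(128, return_sequences=True)(bi2)
    bi4 = LSTM(128, return_sequences=True)(add([bi1, bi3]))
    bi5 = LSTM(128, return_sequences=True)(add([bi2, bi4, bi1]))
    bi6 = LSTM(128, return_sequences=True)(add([bi3, bi5, bi2, bi1]))
    return LSTM(256)(add([bi4, bi6, bi3, bi1]))

inp = Input(shape=(36, 128))            # hypothetical: 36 time steps, 128 features per step
out = Dense(1)(skip_deep_lstm(inp))     # e.g. a one-step-ahead forecast head
model = Model(inp, out)
model.compile(optimizer='adam', loss='mse')
model.summary()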
# Fused LSTM: a BiLSTM branch and a unidirectional LSTM branch merged through dense skips.
# `add1` and `se2` are branch inputs produced earlier in the model (defined elsewhere).
bi1_1 = Bidirectional(LSTM(16, return_sequences=True))(add1)
bi1_2 = Bidirectional(LSTM(16, return_sequences=True))(bi1_1)
bi1_3 = Bidirectional(LSTM(16, return_sequences=True))(bi1_2)
bi2_1 = LSTM(32, return_sequences=True)(se2)
bi2_2 = LSTM(32, return_sequences=True)(bi2_1)
bi2_3 = LSTM(32, return_sequences=True)(bi2_2)
res1 = add([bi1_1, bi2_1, bi1_3, bi2_3])     # cross-branch fusion of early and late outputs
bi1_4 = Bidirectional(LSTM(16, return_sequences=True))(res1)
bi2_4 = LSTM(32, return_sequences=True)(res1)
res2 = add([bi1_1, bi2_1, bi1_2, bi2_2])
bi1_5 = Bidirectional(LSTM(16, return_sequences=True))(res2)
bi2_5 = LSTM(32, return_sequences=True)(res2)
res3 = add([bi1_1, bi2_1, bi1_2, bi2_2, bi1_3, bi2_3])
# se3 = LSTM(256)(res3)                      # unidirectional alternative
se3 = Bidirectional(LSTM(128))(res3)
decoder2 = Dense(256, activation='relu')(se3)
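`add1` and `se2` are not defined in the snippet above. Purely as an illustration of how such branch inputs could be produced in an image-captioning encoder (all names, the 2048-d image feature size, the 34-step caption length, and the 5000-word vocabulary are assumptions, not taken from the original model):
# Hypothetical construction of `se2` (text branch) and `add1` (image-text fusion branch).
from tensorflow.keras.layers import Input, Dense, Embedding, Dropout, RepeatVector, add

inputs1 = Input(shape=(2048,))               # e.g. pre-extracted CNN image features
fe1 = Dense(256, activation='relu')(inputs1)
inputs2 = Input(shape=(34,))                 # tokenized caption (34 time steps)
se1 = Embedding(5000, 256)(inputs2)
se2 = Dropout(0.5)(se1)                      # feeds the unidirectional LSTM branch
add1 = add([RepeatVector(34)(fe1), se2])     # image features broadcast over time, fused with the text; feeds the BiLSTM branch
Both branches must share the same number of time steps so that their outputs can be summed later.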
Embedding a time-step attention mechanism
# Attention over time steps (timeStep attention)
def attention_3d_block(inputs):
    # inputs shape: (batch, time_steps, features)
    # input_dim = int(inputs.shape[2])
    a = Permute((2, 1))(inputs)  # swap to (batch, features, time_steps)
    # The Dense width must match the time-step dimension of `inputs` (here 36): the maximum
    # number of words in image captioning, or the number of input features in the time-series
    # setup (where the features are laid out along the step axis).
    a = Dense(36, activation='tanh')(a)
    a_probs = Permute((2, 1), name='attention_vec')(a)  # back to (batch, time_steps, features)
    # output_attention_mul = merge([inputs, a_probs], name='attention_mul', mode='mul')  # legacy Keras 1 API
    output_attention_mul = multiply([inputs, a_probs], name='attention_mul')  # element-wise re-weighting of each time step
    return output_attention_mul
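A quick shape check (the 36-step, 128-feature input below is an arbitrary assumption) shows that the block returns a tensor with the same shape as its input, re-weighted over time:
# Shape sanity check for attention_3d_block (input size is hypothetical).
from tensorflow.keras.layers import Input

x = Input(shape=(36, 128))      # the time-step dimension must equal the Dense width (36)
weighted = attention_3d_block(x)
print(weighted.shape)           # (None, 36, 128)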
# Inserting the time-step attention between the first and second LSTM layers usually works best; experiment per task to choose the position.
bi1 = Bidirectional(LSTM(64, return_sequences=True))(se2)
attention_mul = attention_3d_block(bi1)   # attention inserted between the first and second LSTM layers
bi2 = LSTM(128, return_sequences=True)(attention_mul)
bi3 = LSTM(128, return_sequences=True)(bi2)
res1 = add([bi1, bi3])
bi4 = LSTM(128, return_sequences=True)(res1)
res2 = add([bi2, bi4, bi1])
bi5 = LSTM(128, return_sequences=True)(res2)
res3 = add([bi3, bi5, bi2, bi1])
bi6 = LSTM(128, return_sequences=True)(res3)
res4 = add([bi4, bi6, bi3, bi1])
se3 = LSTM(256)(res4)
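If `se2` is defined as a Keras Input before the attention-augmented stack above is built, its final tensor `se3` can be wrapped into a trainable model; the (36, 128) shape, the Dense(1) head, and the random data below are placeholders for illustration only:
# Hypothetical wrapper around the attention-augmented stack above.
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

se2 = Input(shape=(36, 128))   # define before building bi1 ... se3 above
# ... build the attention-augmented stack (bi1 through se3) here ...
out = Dense(1)(se3)            # e.g. one-step-ahead forecasting head (assumed)
model = Model(se2, out)
model.compile(optimizer='adam', loss='mse')
model.fit(np.random.rand(64, 36, 128), np.random.rand(64, 1), epochs=2)  # dummy data for illustration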