※ This is only a record of an attempt, not the right way to do it, so please do not follow along. ※
3. First attempt at reinforcement learning with KoGPT2
1) Original paper code
(1) Function that computes the loss and the model-generated response for one episode
def rl(input_variable, lengths, target_variable, mask, max_target_len, encoder, decoder, batch_size, teacher_forcing_ratio):
    # Set device options
    input_variable = input_variable.to(device)
    target_variable = target_variable.to(device)
    mask = mask.to(device)
    # Lengths for rnn packing should always be on the cpu
    lengths = lengths.to("cpu")

    # Initialize variables
    loss = 0
    print_losses = []
    response = []

    # Forward pass through encoder
    encoder_outputs, encoder_hidden = encoder(input_variable, lengths)

    # Create initial decoder input (start with SOS tokens for each sentence)
    decoder_input = torch.LongTensor([[SOS_token for _ in range(batch_size)]])
    decoder_input = decoder_input.to(device)

    # Set initial decoder hidden state to the encoder's final hidden state
    decoder_hidden = encoder_hidden[:decoder.n_layers]

    # Determine if we are using teacher forcing this iteration
    use_teacher_forcing = True if random.random() < teacher_forcing_ratio else False

    # Forward batch of sequences through decoder one time step at a time
    if use_teacher_forcing:
        for t in range(max_target_len):
            decoder_output, decoder_hidden = decoder(
                decoder_input, decoder_hidden, encoder_outputs
            )
            # Teacher forcing: next input is current target
            decoder_input = target_variable[t].view(1, -1)
            # Calculate and accumulate loss
            mask_loss, n_total = mask_nll_loss(
                decoder_output, target_variable[t], mask[t])
            loss += mask_loss
            print_losses.append(mask_loss.item() * n_total)
    else:
        for t in range(max_target_len):
            decoder_output, decoder_hidden = decoder(
                decoder_input, decoder_hidden, encoder_outputs
            )
            # No teacher forcing: next input is decoder's own current output
            _, topi = decoder_output.topk(1)
            decoder_input = torch.LongTensor(
                [[topi[i][0] for i in range(batch_size)]])
            decoder_input = decoder_input.to(device)
            # Calculate and accumulate loss
            mask_loss, n_total = mask_nll_loss(
                decoder_output, target_variable[t], mask[t])
            loss += mask_loss
            print_losses.append(mask_loss.item() * n_total)
            # ni or decoder_output
            response.append(topi)

    print("rl-print_losses", print_losses)
    return loss, max_target_len, response
-input_variable: the input, target_variable: the target (ground truth), mask: the padding mask tensor for target_variable
-With teacher forcing: the current target from target_variable is fed as the next decoder input
-Without teacher forcing: the decoder output produced from the current input is fed as the next input
(Maybe the failure later on is because teacher forcing wasn't used... adding that to the list of things to check.)
-Generates tokens up to the maximum target length and accumulates the loss over them
-Returns the loss, the maximum target length, and the generated response (the mask_nll_loss helper it calls isn't shown in this post; a sketch follows below)
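*mask_nll_loss is defined elsewhere in the project. As a reference, here is a minimal sketch of what such a helper typically looks like in the PyTorch chatbot tutorial this code builds on, assuming it returns the mean NLL over non-padding positions together with the number of non-padding tokens, which is how it is called above (details may differ from the actual project code):
import torch

def mask_nll_loss(decoder_output, target, mask):
    # decoder_output: (batch, vocab) softmax probabilities for one decoding step
    # target: (batch,) gold token ids; mask: (batch,) bool mask that is True on real tokens
    n_total = mask.sum()
    # Negative log-likelihood of the gold token for every sequence in the batch
    cross_entropy = -torch.log(
        torch.gather(decoder_output, 1, target.view(-1, 1)).squeeze(1))
    # Average only over the non-padding positions
    loss = cross_entropy.masked_select(mask).mean()
    return loss, n_total.item()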
(2) Full training-loop code
def training_rl_loop(model_name, voc, pairs, batch_size, forward_encoder, forward_encoder_optimizer, forward_decoder, forward_decoder_optimizer, backward_encoder, backward_encoder_optimizer, backward_decoder, backward_decoder_optimizer, teacher_forcing_ratio, n_iteration, print_every, save_every, save_dir):
    dull_responses = ["i do not know what you are talking about.", "i do not know.", "you do not know.", "you know what i mean.", "i know what you mean.", "you know what i am saying.", "you do not know anything."]

    # Load batches for each iteration
    training_batches = [batch_2_train_data(voc, [random.choice(pairs) for _ in range(batch_size)])
                        for _ in range(n_iteration)]

    # Initializations
    print('Initializing ...')
    start_iteration = 1
    print_loss = 0

    # Training loop
    print("Training...")
    for iteration in range(start_iteration, n_iteration + 1):
        print("Iteration", iteration)
        training_batch = training_batches[iteration - 1]
        # Extract fields from batch
        input_variable, lengths, target_variable, mask, max_target_len = training_batch

        ## MODIFS HERE
        # Zero gradients of the optimizers
        forward_encoder_optimizer.zero_grad()
        forward_decoder_optimizer.zero_grad()
        backward_encoder_optimizer.zero_grad()
        backward_decoder_optimizer.zero_grad()

        # Forward
        forward_loss, forward_len, _ = rl(input_variable, lengths, target_variable, mask, max_target_len, forward_encoder, forward_decoder, batch_size, teacher_forcing_ratio)
        # Calculate reward
        reward = calculate_rewards(input_variable, lengths, target_variable, mask, max_target_len, forward_encoder, forward_decoder, backward_encoder, backward_decoder, batch_size, teacher_forcing_ratio)
        # Update forward seq2seq with loss scaled by reward
        loss = forward_loss * reward
        loss.backward()
        forward_encoder_optimizer.step()
        forward_decoder_optimizer.step()

        # Run a training iteration with batch
        print_loss += loss / forward_len

        # Print progress
        if iteration % print_every == 0:
            print_loss_avg = print_loss / print_every
            print("Iteration: {}; Percent complete: {:.1f}%; Average loss: {:.4f}".format(iteration, iteration / n_iteration * 100, print_loss_avg))
            print_loss = 0

        # Save checkpoint (embedding and corpus_name are globals defined in the setup code below)
        if iteration % save_every == 0:
            directory = os.path.join(save_dir, model_name, corpus_name)  # , '{}-{}_{}'.format(encoder_n_layers, decoder_n_layers, hidden_size))
            if not os.path.exists(directory):
                os.makedirs(directory)
            torch.save({
                'iteration': iteration,
                'en': forward_encoder.state_dict(),
                'de': forward_decoder.state_dict(),
                'en_opt': forward_encoder_optimizer.state_dict(),
                'de_opt': forward_decoder_optimizer.state_dict(),
                'loss': loss,
                'voc_dict': voc.__dict__,
                'embedding': embedding.state_dict()
            }, os.path.join(directory, '{}_{}.tar'.format(iteration, 'checkpoint')))
-Repeats training for the number of iterations I specified
-Computes the loss with the rl function and the final reward with the calculate_rewards function
→ Multiplies the two to get the final loss and updates the policy parameters (here, the Seq2Seq model's parameters) with it (a small sketch of why this scaling acts like a policy-gradient update follows below)
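*The calculate_rewards function itself is not reproduced in this post. To see why multiplying the loss by a scalar reward before backprop behaves like a REINFORCE-style update, here is a tiny standalone illustration of my own; the quadratic fake_nll is just a stand-in for the seq2seq NLL of the sampled response:
import torch

theta = torch.tensor(1.0, requires_grad=True)   # stand-in for a model parameter
fake_nll = (theta - 2.0) ** 2                   # stand-in for the NLL of the sampled response
reward = 0.7                                    # scalar reward from calculate_rewards (no gradient)
(fake_nll * reward).backward()
# The gradient is reward * d(nll)/d(theta): episodes with a larger reward push
# the likelihood of their response up (their NLL down) more strongly.
print(theta.grad)   # 0.7 * 2 * (1.0 - 2.0) = -1.4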
(3) Code that actually runs the training
# Load/Assemble voc and pairs
voc, pairs = load_prepare_data(corpus, corpus_name, datafile, save_dir)
for pair in pairs[:10]:
print(pair)
# Configure models
model_name = 'cb_model'
attn_model = 'dot'
# attn_model = 'general'
# attn_model = 'concat'
hidden_size = 500
encoder_n_layers = 2
decoder_n_layers = 2
dropout = 0.1
batch_size = 64
# Set checkpoint to load from; set to None if starting from scratch
loadFilename = None
checkpoint_iter = 10000 # 4000
# loadFilename = os.path.join(save_dir, model_name, corpus_name,
# '{}-{}_{}'.format(encoder_n_layers, decoder_n_layers, hidden_size),
# '{}_checkpoint.tar'.format(checkpoint_iter))
# print(loadFilename)
# Load model if a loadFilename is provided
if loadFilename:
    # If loading on same machine the model was trained on
    # checkpoint = torch.load(loadFilename)
    # If loading a model trained on GPU to CPU
    checkpoint = torch.load(loadFilename, map_location=torch.device('cpu'))
    encoder_sd = checkpoint['en']
    decoder_sd = checkpoint['de']
    encoder_optimizer_sd = checkpoint['en_opt']
    decoder_optimizer_sd = checkpoint['de_opt']
    embedding_sd = checkpoint['embedding']
    voc.__dict__ = checkpoint['voc_dict']
print('Building encoder and decoder ...')
# Initialize word embeddings
embedding = nn.Embedding(voc.num_words, hidden_size)
if loadFilename:
    embedding.load_state_dict(embedding_sd)
# Initialize encoder & decoder models
encoder = EncoderRNN(hidden_size, embedding, encoder_n_layers, dropout)
decoder = LuongAttnDecoderRNN(
    attn_model, embedding, hidden_size, voc.num_words, decoder_n_layers, dropout)
if loadFilename:
    encoder.load_state_dict(encoder_sd)
    decoder.load_state_dict(decoder_sd)
# Use appropriate device
encoder = encoder.to(device)
decoder = decoder.to(device)
print('Models built and ready to go!')
# Configure training/optimization
clip = 50.0
teacher_forcing_ratio = 1.0
learning_rate = 0.0001
decoder_learning_ratio = 5.0
n_iteration = 500 # 4000
print_every = 1
save_every = 500
# Ensure dropout layers are in train mode
encoder.train()
decoder.train()
# Initialize optimizers
print('Building optimizers ...')
encoder_optimizer = optim.Adam(encoder.parameters(), lr=learning_rate)
decoder_optimizer = optim.Adam(decoder.parameters(), lr=learning_rate * decoder_learning_ratio)
if loadFilename:
    encoder_optimizer.load_state_dict(encoder_optimizer_sd)
    decoder_optimizer.load_state_dict(decoder_optimizer_sd)

# If you have cuda, configure cuda to call
for state in encoder_optimizer.state.values():
    for k, v in state.items():
        if isinstance(v, torch.Tensor):
            state[k] = v.cuda()

for state in decoder_optimizer.state.values():
    for k, v in state.items():
        if isinstance(v, torch.Tensor):
            state[k] = v.cuda()
# Run training iterations
print("Starting Training!")
forward_encoder = encoder
forward_decoder = decoder
forward_encoder = forward_encoder.to(device)
forward_decoder = forward_decoder.to(device)
backward_encoder = EncoderRNN(hidden_size, embedding, encoder_n_layers, dropout)
backward_decoder = LuongAttnDecoderRNN(attn_model, embedding, hidden_size, voc.num_words, decoder_n_layers, dropout)
backward_encoder = backward_encoder.to(device)
backward_decoder = backward_decoder.to(device)
#Configure RL model
model_name='RL_model_seq'
n_iteration = 10000
print_every=100
save_every=500
learning_rate = 0.0001
decoder_learning_ratio = 5.0
teacher_forcing_ratio = 0.5
# Ensure dropout layers are in train mode
forward_encoder.train()
forward_decoder.train()
backward_encoder.train()
backward_decoder.train()
# Initialize optimizers
print('Building optimizers ...')
forward_encoder_optimizer = optim.Adam(forward_encoder.parameters(), lr=learning_rate)
forward_decoder_optimizer = optim.Adam(forward_decoder.parameters(), lr=learning_rate * decoder_learning_ratio)
backward_encoder_optimizer = optim.Adam(backward_encoder.parameters(), lr=learning_rate)
backward_decoder_optimizer = optim.Adam(backward_decoder.parameters(), lr=learning_rate * decoder_learning_ratio)
# If you have cuda, configure cuda to call
for state in forward_encoder_optimizer.state.values():
    for k, v in state.items():
        if isinstance(v, torch.Tensor):
            state[k] = v.cuda()

for state in forward_decoder_optimizer.state.values():
    for k, v in state.items():
        if isinstance(v, torch.Tensor):
            state[k] = v.cuda()

for state in backward_encoder_optimizer.state.values():
    for k, v in state.items():
        if isinstance(v, torch.Tensor):
            state[k] = v.cuda()

for state in backward_decoder_optimizer.state.values():
    for k, v in state.items():
        if isinstance(v, torch.Tensor):
            state[k] = v.cuda()
# Run training iterations
print("Starting Training!")
training_rl_loop(model_name, voc, pairs, batch_size, forward_encoder, forward_encoder_optimizer, forward_decoder, forward_decoder_optimizer, backward_encoder, backward_encoder_optimizer, backward_decoder, backward_decoder_optimizer,teacher_forcing_ratio,n_iteration, print_every, save_every, save_dir)
-Configures the models, optimizers, and other settings, then runs the training_rl_loop function
2) Training code for running reinforcement learning on top of KoGPT2
(1) Modified rl function
def RL(token_ids, mask, labels_ids, forward_model, forward_optimizer, criterion):
    # Forward pass through the GPT-2 model
    output = forward_model(token_ids)
    output = output.logits

    # Calculate and accumulate loss
    mask_3d = mask.unsqueeze(dim=2).repeat(1, 1, output.shape[2]).to(device)
    mask_out = torch.where(mask_3d == 1, output, Sneg * torch.ones_like(output)).to(device)
    loss = criterion(mask_out.transpose(2, 1), labels_ids).to(device)
    avg_loss = loss.sum() / mask.sum()
    return avg_loss, output
-token_ids: the input, mask: the mask tensor over token_ids, labels_ids: the target (ground truth)
-Feeds token_ids into the model and takes the output logits
-Expands the mask into a 3-D tensor over the model's output, blanks out the masked positions with the huge negative constant Sneg, and computes the loss against the original targets labels_ids, averaged over the unmasked tokens (a toy example of the shapes involved follows below)
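*A small self-contained toy example (my own, with made-up sizes) of the masking trick above: positions where the mask is 0 get all of their logits replaced by Sneg, so no gradient flows through them, and the summed per-token loss is divided by the number of unmasked tokens:
import torch

Sneg = -1e18
batch, seq_len, vocab = 2, 4, 8                                   # toy sizes
output = torch.randn(batch, seq_len, vocab, requires_grad=True)   # stand-in logits
labels_ids = torch.randint(0, vocab, (batch, seq_len))
mask = torch.tensor([[1, 1, 1, 0],                                # 1 = token that should be trained on
                     [1, 1, 0, 0]])

criterion = torch.nn.CrossEntropyLoss(reduction="none")
mask_3d = mask.unsqueeze(dim=2).repeat(1, 1, vocab)               # (batch, seq, vocab)
mask_out = torch.where(mask_3d == 1, output, Sneg * torch.ones_like(output))
loss = criterion(mask_out.transpose(2, 1), labels_ids)            # (batch, seq) per-token CE
avg_loss = loss.sum() / mask.sum()                                # average over unmasked tokens
print(avg_loss)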
(2) Modified full training-loop code
def training_rl_loop(data, epochs, forward_model, forward_optimizer, backward_model, backward_optimizer, criterion):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    forward_model.to(device)
    backward_model.to(device)

    # Training loop
    print("Training...")
    for epoch in range(epochs):  # adjust the epoch count as you like
        print(f"Epoch: {epoch} Start")
        for batch_idx, samples in tqdm(enumerate(data)):
            token_ids, mask, labels_ids = samples
            token_ids = token_ids.to(device)
            mask = mask.to(device)
            labels_ids = labels_ids.to(device)

            forward_optimizer.zero_grad()
            backward_optimizer.zero_grad()

            # Forward
            forward_loss, _ = RL(token_ids, mask, labels_ids, forward_model, forward_optimizer, criterion)
            # Calculate reward
            reward = calculate_rewards(token_ids, mask, labels_ids, forward_model, backward_model, criterion)
            # Scale the loss by the reward and update
            loss = forward_loss.mean() * reward
            loss.backward()
            forward_optimizer.step()
            backward_optimizer.step()

        # Print
        print(f"Epoch: {epoch}; Average loss: {loss}")

        # Save a checkpoint every epoch
        checkpoint = {
            "epoch": epoch,
            "forward_model_state_dict": forward_model.state_dict(),
            "backward_model_state_dict": backward_model.state_dict(),
            "forward_optimizer_state_dict": forward_optimizer.state_dict(),
            "backward_optimizer_state_dict": backward_optimizer.state_dict()}
        torch.save(checkpoint, f"./train_RL/5_{epoch}_kogpt2_checkpoint.pt")
-Runs reinforcement learning for the specified number of epochs
-Saves the model after every epoch
3) Training data for reinforcement learning
*For this first reinforcement learning attempt, the data-splitting scheme used in the reference paper was reused
-200,000 rows of data in this format were used for training
4) Code that kicks off reinforcement learning
learning_rate = 3e-5
Sneg = -1e18
epochs = 10
batch_size = 2
max_length = 64
forward_model = GPT2LMHeadModel.from_pretrained("skt/kogpt2-base-v2")
forward_model.resize_token_embeddings(len(TOKENIZER))
forward_checkpoint = torch.load("./train_RL/5_0_kogpt2_checkpoint.pt")
forward_model.load_state_dict(forward_checkpoint["forward_model_state_dict"])
backward_model = GPT2LMHeadModel.from_pretrained("skt/kogpt2-base-v2")
backward_model.resize_token_embeddings(len(TOKENIZER))
backward_checkpoint = torch.load("./train_RL/5_0_kogpt2_checkpoint.pt")
backward_model.load_state_dict(backward_checkpoint["backward_model_state_dict"])
forward_model = forward_model.to(device)
backward_model = backward_model.to(device)
train_set = ChatbotDataset(ChatbotData, max_len=64)
train_dataloader = DataLoader(train_set, batch_size, num_workers=0, shuffle=True, collate_fn=collate_batch)
criterion = torch.nn.CrossEntropyLoss(reduction="none").to(device)
forward_optimizer = optim.Adam(forward_model.parameters(), lr=learning_rate)
forward_optimizer.load_state_dict(forward_checkpoint["forward_optimizer_state_dict"])
backward_optimizer = optim.Adam(backward_model.parameters(), lr=learning_rate)
backward_optimizer.load_state_dict(backward_checkpoint["backward_optimizer_state_dict"])
# Ensure models are in train mode
forward_model.train()
backward_model.train()
# If you have cuda, configure cuda to call
for state in forward_optimizer.state.values():
    for k, v in state.items():
        if isinstance(v, torch.Tensor):
            state[k] = v.cuda()

for state in backward_optimizer.state.values():
    for k, v in state.items():
        if isinstance(v, torch.Tensor):
            state[k] = v.cuda()
# Run training iterations
print("Starting Training!")
training_rl_loop(train_dataloader, epochs, forward_model, forward_optimizer, backward_model, backward_optimizer, criterion)
*Sets the learning rate and the other hyperparameters
*batch_size is 2 because Colab kept running out of memory... ㅠ
*Loads the KoGPT2 model that has only been fine-tuned (no RL yet), along with its optimizer state
*Why there are separate forward and backward models: the semantic_coherence function needs both a forward loss and a backward loss, so each is computed with its own model (a rough sketch of that reward follows after this list)
*Training then proceeds
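*Neither calculate_rewards nor semantic_coherence is shown in this post. As a rough sketch of the idea described above (following the reference paper, Li et al. 2016): the forward model scores how likely the answer is given the question, the backward model scores how likely the question is given the answer, and the reward is the sum of the two length-normalized log-likelihoods. The helper below is my own illustration under those assumptions; the -100 label masking and the reversed (answer, question) ordering are conventions I chose for the sketch, not the project's actual inputs:
def semantic_coherence_reward(qa_ids, qa_labels, aq_ids, aq_labels,
                              forward_model, backward_model):
    # qa_ids / qa_labels: question followed by answer, labels set to -100 on the question part
    # aq_ids / aq_labels: answer followed by question, labels set to -100 on the answer part
    # GPT2LMHeadModel returns the mean cross-entropy over the non-(-100) label positions
    # when labels are passed, i.e. the average negative log-likelihood per target token.
    fwd_nll = forward_model(qa_ids, labels=qa_labels).loss    # ~ -(1/N_a) log p(answer | question)
    bwd_nll = backward_model(aq_ids, labels=aq_labels).loss   # ~ -(1/N_q) log p(question | answer)
    # Higher reward when the pair is likely in both directions
    return -(fwd_nll + bwd_nll)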
5) Evaluating the chatbot
*After about 2 epochs of this training, I evaluated the chatbot's performance
-It is hardly any different from the fine-tuning-only model.