pytorch와 스크래치 코드의 정규화가 일치하지 않는 이유는 무엇이며 pytorch에서 정규화에 사용되는 공식은 무엇입니까?

Aug 20 2020

PyTorch의 이진 분류 모델에서 L2 정규화를 시도했지만 PyTorch의 결과와 스크래치 코드가 일치하지 않으면 Pytorch 코드 :

class LogisticRegression(nn.Module):
  def __init__(self,n_input_features):
    super(LogisticRegression,self).__init__()
    self.linear=nn.Linear(4,1)
    self.linear.weight.data.fill_(0.0)
    self.linear.bias.data.fill_(0.0)

  def forward(self,x):
    y_predicted=torch.sigmoid(self.linear(x))
    return y_predicted

model=LogisticRegression(4)

criterion=nn.BCELoss()
optimizer=torch.optim.SGD(model.parameters(),lr=0.05,weight_decay=0.1)
dataset=Data()
train_data=DataLoader(dataset=dataset,batch_size=1096,shuffle=False)

num_epochs=1000
for epoch in range(num_epochs):
  for x,y in train_data:
    y_pred=model(x)
    loss=criterion(y_pred,y)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

스크래치 코드 :

def sigmoid(z):
    s = 1/(1+ np.exp(-z))
    return s  

def yinfer(X, beta):
  return sigmoid(beta[0] + np.dot(X,beta[1:]))

def cost(X, Y, beta, lam):
    sum = 0
    sum1 = 0
    n = len(beta)
    m = len(Y)
    for i in range(m): 
        sum = sum + Y[i]*(np.log( yinfer(X[i],beta)))+ (1 -Y[i])*np.log(1-yinfer(X[i],beta))
    for i in range(0, n): 
        sum1 = sum1 + beta[i]**2
        
    return  (-sum + (lam/2) * sum1)/(1.0*m)

def pred(X,beta):
  if ( yinfer(X, beta) > 0.5):
    ypred = 1
  else :
    ypred = 0
  return ypred

beta = np.zeros(5)
iterations = 1000
arr_cost = np.zeros((iterations,4))
print(beta)
n = len(Y_train)
for i in range(iterations):
    Y_prediction_train=np.zeros(len(Y_train))
    Y_prediction_test=np.zeros(len(Y_test)) 

    for l in range(len(Y_train)):
        Y_prediction_train[l]=pred(X[l,:],beta)
    
    for l in range(len(Y_test)):
        Y_prediction_test[l]=pred(X_test[l,:],beta)
    
    train_acc = format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100)
    test_acc = 100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100   
    arr_cost[i,:] = [i,cost(X,Y_train,beta,lam),train_acc,test_acc]
    temp_beta = np.zeros(len(beta))

    ''' main code from below '''

    for j in range(n): 
        temp_beta[0] = temp_beta[0] + yinfer(X[j,:], beta) - Y_train[j]
        temp_beta[1:] = temp_beta[1:] + (yinfer(X[j,:], beta) - Y_train[j])*X[j,:]
    
    for k in range(0, len(beta)):
        temp_beta[k] = temp_beta[k] +  lam * beta[k]  #regularization here
    
    temp_beta= temp_beta / (1.0*n)
    
    beta = beta - alpha*temp_beta

손실 그래프

훈련 정확도 그래프

테스트 정확도 그래프

누군가 왜 이런 일이 일어나고 있는지 말해 줄 수 있습니까? L2 값 = 0.1

답변

2 GirishDattatrayHegde Aug 20 2020 at 21:40

좋은 질문입니다. 나는 PyTorch 문서를 통해 많은 것을 파고 답을 찾았습니다. 대답은 매우 까다 롭습니다 . 기본적으로 규제 를 계산하는 방법 에는 두 가지가 있습니다. (마지막 섹션으로 여름 점프를 위해).

PyTorch는 용도 첫 번째 유형 (이 정규화 인자는 배치 크기에 의해 분할되지 않은 참조).

다음은이를 보여주는 샘플 코드입니다.

import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import torch.optim as optim
 
class model(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1, 1)
        self.linear.weight.data.fill_(1.0)
        self.linear.bias.data.fill_(1.0)

    def forward(self, x):
        return self.linear(x)


model     = model()
optimizer = optim.SGD(model.parameters(), lr=0.1, weight_decay=1.0)

input     = torch.tensor([[2], [4]], dtype=torch.float32)
target    = torch.tensor([[7], [11]], dtype=torch.float32)

optimizer.zero_grad()
pred      = model(input)
loss      = F.mse_loss(pred, target)

print(f'input: {input[0].data, input[1].data}')
print(f'prediction: {pred[0].data, pred[1].data}')
print(f'target: {target[0].data, target[1].data}')

print(f'\nMSEloss: {loss.item()}\n')

loss.backward()

print('Before updation:')
print('--------------------------------------------------------------------------')
print(f'weight [data, gradient]: {model.linear.weight.data, model.linear.weight.grad}')
print(f'bias [data, gradient]: {model.linear.bias.data, model.linear.bias.grad}')
print('--------------------------------------------------------------------------')
 
optimizer.step()

print('After updation:')
print('--------------------------------------------------------------------------')
print(f'weight [data]: {model.linear.weight.data}')
print(f'bias [data]: {model.linear.bias.data}')
print('--------------------------------------------------------------------------')

다음을 출력합니다 .

input: (tensor([2.]), tensor([4.]))
prediction: (tensor([3.]), tensor([5.]))
target: (tensor([7.]), tensor([11.]))

MSEloss: 26.0

Before updation:
--------------------------------------------------------------------------
weight [data, gradient]: (tensor([[1.]]), tensor([[-32.]]))
bias [data, gradient]: (tensor([1.]), tensor([-10.]))
--------------------------------------------------------------------------
After updation:
--------------------------------------------------------------------------
weight [data]: tensor([[4.1000]])
bias [data]: tensor([1.9000])
--------------------------------------------------------------------------

여기서 m = 배치 크기 = 2, lr = 알파 = 0.1, 람다 = weight_decay = 1 입니다.

이제 값 = 1이고 grad = -32 인 텐서 가중치 를 고려하십시오.

case1 (유형 1 정규화) :

 weight = weight - lr(grad + weight_decay.weight)
 weight = 1 - 0.1(-32 + 1(1))
 weight = 4.1

case2 (유형 2 정규화) :

 weight = weight - lr(grad + (weight_decay/batch size).weight)
 weight = 1 - 0.1(-32 + (1/2)(1))
 weight = 4.15

출력 에서 업데이트 된 weight = 4.1000을 볼 수 있습니다 . 이것으로 PyTorch 는 type1 정규화를 사용합니다 .

그래서 마지막으로 코드에서 type2 정규화를 따르고 있습니다. 따라서 마지막 줄을 다음과 같이 변경하십시오.

# for k in range(0, len(beta)):
#    temp_beta[k] = temp_beta[k] +  lam * beta[k]  #regularization here

temp_beta= temp_beta / (1.0*n)

beta = beta - alpha*(temp_beta + lam * beta)

또한 PyTorch 손실 함수 에는 정규화 용어 ( 최적화 프로그램 내부에서 구현 됨)가 포함되어 있지 않으므로 사용자 지정 비용 함수 내에서 정규화 용어 도 제거하십시오 .