Extremely high loss with constant validation accuracy
This is a Coursera assignment. Everything runs as expected for the training part. I tried different layers, but the results were the same. Maybe there are some errors in my handling of the dataset?
I couldn't find any. Can anyone help? Thanks.
import csv
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from os import getcwd
def get_data(filename):
    # You will need to write code that will read the file passed
    # into this function. The first line contains the column headers
    # so you should ignore it
    # Each successive line contains 785 comma separated values between 0 and 255
    # The first value is the label
    # The rest are the pixel values for that picture
    # The function will return 2 np.array types. One with all the labels
    # One with all the images
    #
    # Tips:
    # If you read a full line (as 'row') then row[0] has the label
    # and row[1:785] has the 784 pixel values
    # Take a look at np.array_split to turn the 784 pixels into 28x28
    # You are reading in strings, but need the values to be floats
    # Check out np.array().astype for a conversion
    with open(filename) as training_file:
        # Your code starts here
        reader = csv.reader(training_file)
        next(reader, None)  # skip the header row
        images = []
        labels = []
        for row in reader:
            labels.append(row[0])
            image_data = row[1:785]
            images.append(np.array_split(image_data, 28))  # 784 values -> 28x28
        # Your code ends here
    labels = np.array(labels).astype('float')
    images = np.array(images).astype('float')
    return images, labels
path_sign_mnist_train = f"{getcwd()}/../tmp2/sign_mnist_train.csv"
path_sign_mnist_test = f"{getcwd()}/../tmp2/sign_mnist_test.csv"
training_images, training_labels = get_data(path_sign_mnist_train)
testing_images, testing_labels = get_data(path_sign_mnist_test)
# Keep these
print(training_images.shape)
print(training_labels.shape)
print(testing_images.shape)
print(testing_labels.shape)
# In this section you will have to add another dimension to the data
# So, for example, if your array is (10000, 28, 28)
# You will need to make it (10000, 28, 28, 1)
training_images = np.expand_dims(training_images, axis=-1)  # Your Code Here
testing_images = np.expand_dims(testing_images, axis=-1)  # Your Code Here
# Create an ImageDataGenerator and do Image Augmentation
train_datagen = ImageDataGenerator(rescale=1./255.,
                                   rotation_range=40,
                                   width_shift_range=0.2,
                                   height_shift_range=0.2,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True,
                                   fill_mode='nearest')
validation_datagen = ImageDataGenerator(rescale=1./255.)
# Keep These
print(training_images.shape)
print(testing_images.shape)
# Their output should be:
# (27455, 28, 28, 1)
# (7172, 28, 28, 1)
# Define the model
# Use no more than 2 Conv2D and 2 MaxPooling2D
from tensorflow.keras.optimizers import RMSprop
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(26, activation='softmax')
])
# Compile Model.
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=RMSprop(lr=0.01),
              metrics=['accuracy'])
# Train the Model
train_generator = train_datagen.flow(training_images, training_labels,
                                     batch_size=10)
validation_generator = validation_datagen.flow(testing_images, testing_labels,
                                               batch_size=10)
history = model.fit_generator(train_generator,
                              epochs=5,
                              steps_per_epoch=len(training_images) / 32,
                              validation_data=validation_generator)
model.evaluate(testing_images, testing_labels, verbose=0)
The model's output is shown below:
Epoch 1/5
858/857 [==============================] - 78s 91ms/step - loss: 15.4250 - accuracy: 0.0422 - val_loss: 15.5210 - val_accuracy: 0.0371
Epoch 2/5
858/857 [==============================] - 75s 88ms/step - loss: 15.4719 - accuracy: 0.0401 - val_loss: 15.5210 - val_accuracy: 0.0371
Epoch 3/5
858/857 [==============================] - 77s 89ms/step - loss: 15.4230 - accuracy: 0.0431 - val_loss: 15.5210 - val_accuracy: 0.0371
Epoch 4/5
858/857 [==============================] - 76s 89ms/step - loss: 15.4268 - accuracy: 0.0429 - val_loss: 15.5120 - val_accuracy: 0.0371
Epoch 5/5
858/857 [==============================] - 75s 88ms/step - loss: 15.4287 - accuracy: 0.0428 - val_loss: 15.5120 - val_accuracy: 0.0371
The batch size is low because the Coursera Jupyter notebook has a limit of 10.
Answers
Your code is correct. I suspect it has something to do with the optimizer. Try using Adam instead of RMSprop, and set Adam's learning rate to 0.001, which is its default. Other than that, your notebook is extracting the labels and data correctly, the data generators are set up properly, and the network looks fine.
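As a minimal sketch of that suggestion (assuming a TensorFlow 2.x Keras API, where the argument is named learning_rate; older releases spelled it lr), only the compile call needs to change:

from tensorflow.keras.optimizers import Adam

# Same loss and metric as before; only the optimizer changes.
# 0.001 is Adam's default learning rate, so Adam() alone behaves the same.
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=Adam(learning_rate=0.001),
              metrics=['accuracy'])

For context, the flat val_accuracy of 0.0371 is roughly chance level for 26 classes (1/26 ≈ 0.0385), which is the usual symptom of a learning rate too high for training to make any progress.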