Extremely high loss with constant validation accuracy

Aug 20 2020

This is from a Coursera assignment. Everything runs as expected up to the training part. I have tried different layers, but the results were the same. Maybe there is a mistake in how I handle the dataset?

I could not find it. Can anyone help? Thanks.

import csv
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from os import getcwd

def get_data(filename):
  # You will need to write code that will read the file passed
  # into this function. The first line contains the column headers
  # so you should ignore it
  # Each successive line contains 785 comma separated values between 0 and 255
  # The first value is the label
  # The rest are the pixel values for that picture
  # The function will return 2 np.array types. One with all the labels
  # One with all the images
  #
  # Tips: 
  # If you read a full line (as 'row') then row[0] has the label
  # and row[1:785] has the 784 pixel values
  # Take a look at np.array_split to turn the 784 pixels into 28x28
  # You are reading in strings, but need the values to be floats
  # Check out np.array().astype for a conversion
    with open(filename) as training_file:
        
      # Your code starts here
      reader = csv.reader(training_file)
      next(reader,None)
      images = []
      labels = []
      for i in reader:
            
            labels.append(i[0])
            imageData = i[1:785]
            images.append(np.array_split(imageData,28))
            
      # Your code ends here
      labels = np.array(labels).astype('float')
      images = np.array(images).astype('float')
    return images, labels

path_sign_mnist_train = f"{getcwd()}/../tmp2/sign_mnist_train.csv"
path_sign_mnist_test = f"{getcwd()}/../tmp2/sign_mnist_test.csv"
training_images, training_labels = get_data(path_sign_mnist_train)
testing_images, testing_labels = get_data(path_sign_mnist_test)

# Keep these
print(training_images.shape)
print(training_labels.shape)
print(testing_images.shape)
print(testing_labels.shape)

# In this section you will have to add another dimension to the data
# So, for example, if your array is (10000, 28, 28)
# You will need to make it (10000, 28, 28, 1)

training_images = np.expand_dims(training_images,axis=-1)# Your Code Here
testing_images = np.expand_dims(testing_images,axis=-1)# Your Code Here

# Create an ImageDataGenerator and do Image Augmentation
train_datagen = ImageDataGenerator(rescale = 1./255.,
                                   rotation_range = 40,
                                   width_shift_range = 0.2,
                                   height_shift_range = 0.2,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True,
                                   fill_mode = 'nearest'
    )

validation_datagen = ImageDataGenerator(rescale = 1./255.)
    
# Keep These
print(training_images.shape)
print(testing_images.shape)
    
# Their output should be:
# (27455, 28, 28, 1)
# (7172, 28, 28, 1)

# Define the model
# Use no more than 2 Conv2D and 2 MaxPooling2D
from tensorflow.keras.optimizers import RMSprop
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, (3,3), activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(26, activation='softmax')])


# Compile Model. 
model.compile(loss = 'sparse_categorical_crossentropy',
              optimizer = RMSprop(lr=0.01),
              metrics = ['accuracy'])

# Train the Model
train_generator = train_datagen.flow(training_images, training_labels,
                                     batch_size=10)
validation_generator = validation_datagen.flow(testing_images, testing_labels,
                                               batch_size=10)
history = model.fit_generator(train_generator,
                              epochs=5,
                              steps_per_epoch=len(training_images) / 32,
                              validation_data=validation_generator)

model.evaluate(testing_images, testing_labels,verbose=0)

The model output is shown below:

Epoch 1/5
858/857 [==============================] - 78s 91ms/step - loss: 15.4250 - accuracy: 0.0422 - val_loss: 15.5210 - val_accuracy: 0.0371
Epoch 2/5
858/857 [==============================] - 75s 88ms/step - loss: 15.4719 - accuracy: 0.0401 - val_loss: 15.5210 - val_accuracy: 0.0371
Epoch 3/5
858/857 [==============================] - 77s 89ms/step - loss: 15.4230 - accuracy: 0.0431 - val_loss: 15.5210 - val_accuracy: 0.0371
Epoch 4/5
858/857 [==============================] - 76s 89ms/step - loss: 15.4268 - accuracy: 0.0429 - val_loss: 15.5120 - val_accuracy: 0.0371
Epoch 5/5
858/857 [==============================] - 75s 88ms/step - loss: 15.4287 - accuracy: 0.0428 - val_loss: 15.5120 - val_accuracy: 0.0371
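
The numbers in this log can be checked with a little arithmetic. The sketch below assumes the loss implementation clips predicted probabilities at about 1e-7 (the default Keras backend epsilon; whether clipping is applied depends on the TensorFlow version, so treat it as an assumption). Under that assumption, a network that puts nearly all of its probability on a single class for every input would give a mean loss very close to the reported val_loss, far above the roughly 3.26 expected from uniform guessing over 26 classes.

```python
import math

EPSILON = 1e-7         # assumed clipping floor (default Keras backend epsilon)
NUM_CLASSES = 26       # size of the softmax layer in the question
val_accuracy = 0.0371  # value reported in the training log

# Loss of a model that assigns equal probability to every class.
uniform_loss = math.log(NUM_CLASSES)

# Largest per-sample loss when the true-class probability is clipped at EPSILON.
max_clipped_loss = -math.log(EPSILON)

# Mean loss if correct samples cost about 0 and each wrong sample costs the maximum.
collapsed_loss = (1.0 - val_accuracy) * max_clipped_loss

print(f"uniform guess:  {uniform_loss:.4f}")
print(f"clipped max:    {max_clipped_loss:.4f}")
print(f"collapsed mean: {collapsed_loss:.4f}")
```

The last value comes out at about 15.52, next to the logged val_loss of 15.5210. If this reading is right, the network is predicting one class with near certainty for every image, which would be consistent with an optimizer step size that is too large.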

The batch size is small because the Coursera Jupyter notebook limits it to 10.
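
A side observation on the batch size: the code passes steps_per_epoch=len(training_images) / 32 while the generators use batch_size=10. The short calculation below uses only numbers taken from the question. It shows where the "858/857" in the progress bar comes from and how many steps a full pass over the data would take at batch size 10. It is a consistency check, not a claimed cause of the high loss.

```python
import math

num_train = 27455  # training images, from the shape printed in the question
batch_size = 10    # batch size passed to flow()

# Value passed as steps_per_epoch in the question (a float, not an integer).
steps_in_question = num_train / 32

# Steps needed to see every training image once at this batch size.
steps_full_pass = math.ceil(num_train / batch_size)

# Images actually drawn per epoch with the question's setting.
images_seen = math.ceil(steps_in_question) * batch_size

print(steps_in_question)  # 857.96875
print(steps_full_pass)    # 2746
print(images_seen)        # 8580
```

With these settings each epoch draws roughly a third of the training set, so a rule such as steps_per_epoch = ceil(len(training_images) / batch_size) would keep the two values consistent.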

Answers

rayryeng Aug 21 2020 at 04:54

Your code is correct. I suspect the problem has something to do with the optimizer. Try using Adam instead of RMSProp, and set Adam's learning rate to 0.001, which is its default. Other than that, your notebook extracts the labels and data correctly, sets up the data generators properly, and the network looks right.
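
A minimal sketch of that change, assuming a TensorFlow 2.x release in which the optimizer keyword is learning_rate (older releases also accepted lr). The layers are copied from the question. The only other difference is that the input shape is declared with tf.keras.Input rather than the input_shape argument, which is a stylistic choice here and not part of the suggested fix.

```python
import tensorflow as tf

# Layers copied from the question; only the optimizer is different.
model = tf.keras.models.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(26, activation='softmax'),
])

# Adam with its default learning rate of 0.001 instead of RMSprop(lr=0.01).
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              metrics=['accuracy'])
```

The generators and the training call from the question can then be reused unchanged with this model.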