Week 6
I am currently working on enhancing the CatOrDog project by incorporating images in linear RGB and log RGB color spaces into the training process. However, I have encountered a significant issue that stems from the utilization of jpeg files in the original project, which have undergone compression and conversion to sRGB. Despite this challenge, obtaining raw data files online has proven difficult, and the sample size of the collected raw images is insufficient for meaningful learning. As a solution, I have chosen to perform fine-tuning on the existing jpeg data.
To address this task, I opted to leverage a pre-trained CNN model as my starting point. The initial step involved meticulously processing the data for my research. I transformed the jpeg data files into linear data using the png file format, followed by the conversion of this linear data into log data. Throughout this process, the previous research work and shared code provided by Dr. Maxwell proved invaluable, and I am steadily honing my skills in finding additional code through online searches. For the model training phase, I utilized all three types of data sets to conduct a comprehensive performance comparison.
However, one notable challenge emerged from this approach - the accuracy for all three data sets hovered slightly above 50%. This suggests that the model may not have been adequately trained, as with only two choices (cat or dog), one would expect a 50% chance of guessing correctly or incorrectly purely by chance. To address this concern, I consulted with Dr. Maxwell, who offered valuable insights during the debugging process, particularly related to data normalization and the conversion from linear to log data.
Looking ahead to the next week, I am planning some critical modifications to boost the model’s performance significantly. Primarily, I aim to enhance the neural network’s capacity to learn meaningful features by adding more convolutional layers. Additionally, I will increase the batch size and epoch size, recognizing that larger sample sizes often lead to higher accuracy. Notably, I observed a slight boost in accuracy with larger log data sample sizes.
Once these modifications are implemented, my plan is to retrain the model using the original jpeg files with the goal of achieving an 80% accuracy rate. Upon achieving this milestone, I will proceed to train the model with the remaining two types of data for a comprehensive performance comparison. With optimism and determination, I am hopeful that these efforts will yield promising results. Fingers crossed for success!