Week 8
This week has been a busy and productive period in my research. A significant milestone was transitioning from the conventional ImageFolder class to the specialized CatDogLogImageSubDataset class. This required implementing a custom subclass of PyTorch’s Dataset class that loads .exr images from a designated directory while remaining compatible with other PyTorch components. The transition was not without its challenges: a substantial portion of the existing codebase was built around the assumption of PIL image loading, and since PIL cannot read .exr images, CatDogLogImageSubDataset loads them with OpenCV instead, which led to a series of complex issues that required careful resolution.
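For reference, here is a minimal sketch of what such a Dataset subclass might look like. The directory layout, filename-based labeling scheme, and imread flags below are my assumptions for illustration, not the actual implementation:

```python
import os
os.environ.setdefault("OPENCV_IO_ENABLE_OPENEXR", "1")  # some OpenCV builds gate .exr support behind this

import glob
import cv2
import torch
from torch.utils.data import Dataset

class CatDogLogImageSubDataset(Dataset):
    """Loads .exr log images from a directory with OpenCV, since PIL cannot read them."""

    def __init__(self, root_dir, transform=None):
        self.paths = sorted(glob.glob(os.path.join(root_dir, "*.exr")))
        self.transform = transform

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        path = self.paths[idx]
        # Read the EXR as a float32 (height, width, channels) array in BGR order.
        img = cv2.imread(path, cv2.IMREAD_ANYCOLOR | cv2.IMREAD_ANYDEPTH)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = torch.from_numpy(img).permute(2, 0, 1)  # to PyTorch's (C, H, W) layout
        if self.transform is not None:
            img = self.transform(img)
        # Hypothetical labeling scheme: infer the class from the filename prefix.
        label = 0 if os.path.basename(path).lower().startswith("cat") else 1
        return img, label
```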
One prominent issue arose from the mismatch between the image shapes produced by OpenCV and the layout PyTorch expects. PyTorch follows the (channels, height, width) convention, while OpenCV returns (height, width, channels), so a permute call was needed to reorder the image axes into PyTorch’s layout. Another challenge was that torch transforms cannot be applied directly to OpenCV-loaded images the way they can to PIL-loaded ones, compounded by a float-precision mismatch (float32 vs. float64). The workaround involved a chain of conversions: turning the data into tensors, then into PIL images to apply transformations such as resizing and flips, and finally back into tensors for training and testing.
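As a concrete illustration of both fixes, here is a small sketch of the axis reorder and the tensor-to-PIL-and-back round trip. The image size and the specific transforms are placeholders:

```python
import numpy as np
import torch
from torchvision import transforms

# OpenCV returns (height, width, channels); PyTorch expects (channels, height, width).
img = np.random.rand(256, 256, 3).astype(np.float32)  # stand-in for a cv2-loaded image
chw = torch.from_numpy(img).permute(2, 0, 1)          # reorder axes to (3, 256, 256)

# Round trip described above: tensor -> PIL image -> transforms -> tensor.
# Note: ToPILImage assumes float values roughly in [0, 1], so log/HDR data
# may need scaling before this step.
pipeline = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),  # back to a float32 tensor of shape (3, 224, 224)
])
out = pipeline(chw)
```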
After a period of meticulous modification and debugging, I successfully trained the model on the log data. However, overfitting became apparent: the model learned the training dataset well, but its performance on the testing dataset was noticeably worse. Looking ahead to next week, my plan is to refine the model architecture by introducing a dropout layer and experimenting with various dropout rates to address the overfitting.
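For next week, here is a rough sketch of where a dropout layer could slot into a small CNN classifier; the layer sizes and overall architecture are placeholders rather than the actual model:

```python
import torch.nn as nn

class CatDogNet(nn.Module):
    def __init__(self, p=0.5):  # p is the dropout rate to sweep over
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(p=p),             # randomly zeroes activations during training only
            nn.Linear(32 * 56 * 56, 2),  # assumes 224x224 inputs and two classes
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```

Since dropout is active only in training mode, calling model.eval() before testing disables it automatically.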
Additionally, I’d like to touch on the computational side. During this period, I ran model training and data processing on a CPU, which caused long turnaround times, especially when training for 25 epochs. These extended waits, coupled with the intricacies of debugging, posed a real bottleneck. Nevertheless, the effort invested bore fruit and led to meaningful progress, and I eagerly anticipate the continued evolution of my research findings.
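Assuming a CUDA-capable machine becomes available, the standard PyTorch device-selection idiom should cut the per-epoch time considerably; this is a minimal hypothetical snippet, not the project’s actual training loop:

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 2).to(device)        # placeholder for the actual model
batch = torch.randn(8, 10, device=device)  # placeholder for a training batch
output = model(batch)                      # compute now runs on the GPU when present
```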