Week 9
It is nearly the end of my summer research, and I recently had a meeting with Dr. Maxwell on 8/10/2023.
Prior to my meeting with Dr. Maxwell, I was facing a significant challenge. Despite conducting A/B testing on various parameters and attempting to refine the model architecture, I have been unable to effectively mitigate overfitting and enhance overall performance.
To address this, I introduced an additional dropout layer at the conclusion of the third sequential convolutional layer. I manipulated the dropout rate within the range of 0.1 to 0.5. As anticipated, the introduction of the dropout layer led to a reduction in overfitting. However, a somewhat undesirable outcome was observed: both training and testing accuracies experienced a decline. Particularly noteworthy was the observation that when the dropout rate was set to 0.5, the model appeared to cease learning, resulting in a precipitous accuracy drop to 60%. This was in stark contrast to the 88% accuracy achieved when the dropout layer was absent and the dropout rate was set to 0.5. Despite this, I was hesitant to remove the dropout layer due to its effectiveness in addressing overfitting concerns.
Following a series of trials, I began to suspect that the model’s complexity in feature extraction may be contributing to the challenge. This suspicion stemmed from the presence of six convolutional layers and a solitary classifier, with relatively minimal complexity in the classifier itself. To alleviate this complexity imbalance, I made the decision to eliminate one convolutional layer and introduce an additional linear layer along with a ReLU activation function in the classifier. Remarkably, this seemingly counterintuitive adjustment resulted in a notable 6% increase in training accuracy, elevating it from 82% to 88%. Evidently, simplifying the model’s architecture by removing one convolutional layer had a positive impact on performance.
Further insight emerged as I realized the necessity of using the appropriate test data formats. Specifically, I recognized the importance of employing logarithmic test data (.exr files) for models trained on logarithmic training data, while linear data (png files) and sRGB jpg files were more apt for other training scenarios. This adjustment led to a marginal reduction in overfitting, though the results were still short of the desired outcome.
After an exhaustive exploration of strategies, I arrived at a maximum accuracy of 88%, which, while improved, fell short of my research objectives. I remained convinced that further adjustments could enhance performance and substantiate my hypothesis, prompting me to seek Dr. Maxwell’s guidance. His counsel included the suggestion to alter hyperparameters such as the learning rate, potentially experimenting with the AdamW optimizer and a sigma value of 0.1, given the distinctive training dynamics introduced by log images. Additionally, Dr. Maxwell proposed an approach involving the reading of data from linear TIFF images, followed by appropriate resizing or cropping prior to transforming the data into log space.
Most convolutional neural networks are accustomed to normalized or scaled images (x), often undergoing a transformation of the form:
x’ = (x - mean) / stdev
This operation centers the values around zero, with the majority falling within the range of 2 and -2. Another common scenario involves data normalized to the 0-1 range:
x’ = x / x_max
However, applying either of these transformations to log images disrupts the inherent relationships and yields unpredictable outcomes. Conversely, geometric resizing should be conducted in linear space, as resizing frequently entails pixel value interpolation – an operation best suited for linear data.
In the week ahead, I plan to implement the recommended modifications and explore new avenues for addressing the challenges at hand. Additionally, I will work on drafting the final paper for Dr. Maxwell’s review.