Week 3
I had my weekly meeting with Dr. Maxwell today (June 29th). This week, I attended the DREAM meetup, where a guest speaker who had already earned his Ph.D. talked about his journey, which gave me valuable insight into computer science Ph.D. programs. I also had the opportunity to chat with current Ph.D. students in the field at UW and learn from their experiences. Dr. Maxwell shared his passion for research and discussed potential career paths after completing a Ph.D. It was enlightening to hear different perspectives and to understand that research entails both challenges and triumphs, along with a sense of joy, passion, and faith.
During this week, my main focus was on learning how to use PyTorch. I followed the tutorial on the official PyTorch website, which walks through a complete machine learning workflow, and I used the FashionMNIST dataset for my practice exercises.
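To keep a record of what this looks like in code, here is a minimal sketch of the data-loading step, assuming torchvision is installed; the root="data" directory and the batch size of 64 are my own illustrative choices rather than anything prescribed by the tutorial.

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

# FashionMNIST: 28x28 grayscale clothing images, 10 classes.
training_data = datasets.FashionMNIST(
    root="data", train=True, download=True, transform=ToTensor()
)
test_data = datasets.FashionMNIST(
    root="data", train=False, download=True, transform=ToTensor()
)

# DataLoaders batch and shuffle the samples for training.
train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=64)

images, labels = next(iter(train_dataloader))
print(images.shape)  # torch.Size([64, 1, 28, 28])
print(labels.shape)  # torch.Size([64])
```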
I familiarized myself with the libraries and algorithms used for data processing and building neural networks, such as the torch library and the backpropagation algorithm (via PyTorch's built-in differentiation engine, torch.autograd). I learned about tensors, the specialized data structures used to encode a model's inputs, outputs, and parameters. Additionally, I explored data transformations, which preprocess the data before training. Initially, I had the misconception that transformations were related to weight and bias adjustments, but Dr. Maxwell clarified that they are not: weights and biases are learned during training to optimize the model, whereas transformations manipulate the data to make it suitable for training, since raw data is not always in the required format. Once the data is loaded and transformed, the model is built and trained. During training, a loss function measures how wrong the model's predictions are, and an optimizer adjusts the model's parameters (its weights and biases) to reduce that loss. After training, the model is saved and can be loaded for future use.
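As a reminder to myself, here is a hedged sketch of that training workflow; the simple fully connected model, the cross-entropy loss, the SGD optimizer, the learning rate, and the model.pth filename are all illustrative choices rather than the tutorial's exact code.

```python
import torch
from torch import nn

model = nn.Sequential(
    nn.Flatten(),            # 28x28 image -> 784-dimensional vector
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 10),      # 10 FashionMNIST classes
)

loss_fn = nn.CrossEntropyLoss()                           # measures prediction error
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # updates weights and biases

def train_one_epoch(dataloader):
    model.train()
    for X, y in dataloader:
        pred = model(X)            # forward pass
        loss = loss_fn(pred, y)    # how wrong the predictions are

        optimizer.zero_grad()      # clear old gradients
        loss.backward()            # backpropagation via torch.autograd
        optimizer.step()           # adjust weights and biases

# After training, the learned parameters can be saved and reloaded later.
torch.save(model.state_dict(), "model.pth")
model.load_state_dict(torch.load("model.pth"))
```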
Furthermore, I dedicated time to reading papers on convolutional neural networks (CNNs/ConvNets) in general and reviewing the papers shared by Dr. Maxwell. After reviewing papers on different topics, I found that I am still most interested in object recognition: it relates closely to my daily life, and I believe I can gain significant insight from researching it. To the best of my knowledge, Dr. Maxwell's current research in this area investigates how using log RGB images affects the performance of a deep network on a detection task, comparing the same network architecture trained on images in different formats. While reading the papers, I also acquired background knowledge on the three layer types commonly found in CNNs: the convolutional layer, the pooling layer, and the fully connected layer.
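To connect those three layer types to code, here is a toy network I sketched; the channel counts and kernel sizes are arbitrary and chosen only to fit 28x28 single-channel images like FashionMNIST, not to match any of the papers' architectures.

```python
import torch
from torch import nn

class TinyConvNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # convolutional layer
            nn.ReLU(),
            nn.MaxPool2d(2),                              # pooling layer: 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # convolutional layer
            nn.ReLU(),
            nn.MaxPool2d(2),                              # pooling layer: 14x14 -> 7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)  # fully connected layer

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)

print(TinyConvNet()(torch.randn(1, 1, 28, 28)).shape)  # torch.Size([1, 10])
```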
In the process of learning, I familiarized myself with several terminologies:
Loss function: A function that measures how far the model's predictions are from the target values; training updates the weights with the goal of driving this loss toward zero.
Training set / validation set (and overfitting): The training set is the data used to train the model, while the validation set is used to assess the model's performance on data it has not seen. Overfitting occurs when training and validation performance diverge: the model performs exceptionally well on the training set but fails to generalize to the validation set. When this happens, adjustments to the training process may be necessary.
K-fold cross-validation: A technique that makes the most of the available data to guard against overfitting and underfitting. The dataset is divided into k folds (subsets), and training and validation are repeated k times, with each fold taking a turn as the validation data while the remaining folds serve as training data (see the sketch after this list).
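Below is a small sketch of how k-fold cross-validation can be set up over a PyTorch dataset; the fold count, batch size, and the train_one_epoch/evaluate placeholders mentioned in the comments are hypothetical and stand in for whatever training and evaluation code would actually be used.

```python
import torch
from torch.utils.data import DataLoader, Subset

def k_fold_indices(n_samples: int, k: int = 5):
    """Yield (train_indices, val_indices) for each of k folds."""
    indices = torch.randperm(n_samples).tolist()
    fold_size = n_samples // k
    for i in range(k):
        val_idx = indices[i * fold_size:(i + 1) * fold_size]
        train_idx = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
        yield train_idx, val_idx

def cross_validate(dataset, k: int = 5):
    for fold, (train_idx, val_idx) in enumerate(k_fold_indices(len(dataset), k)):
        train_loader = DataLoader(Subset(dataset, train_idx), batch_size=64, shuffle=True)
        val_loader = DataLoader(Subset(dataset, val_idx), batch_size=64)
        # A training call like train_one_epoch(train_loader) and an evaluation call
        # like evaluate(val_loader) would go here; each fold takes a turn as the
        # validation set while the remaining folds serve as training data.
        print(f"fold {fold}: {len(train_idx)} training samples, {len(val_idx)} validation samples")
```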
Overall, it has been a productive week of learning and familiarizing myself with important concepts and tools in machine learning and computer vision.