Week 2
I had a Zoom meeting with Dr. Maxwell on 6/23/2023 for our first weekly check-in. During our conversation, we discussed the conference he attended in Vancouver, which explored the exciting possibilities of combining AI and robotics. I provided an update on my progress for the week.
Throughout this week, I completed and published an initial draft of the website URLs required for the DREAM program. In addition to fulfilling the website requirements, I made several design modifications such as incorporating various icons and adjusting margins. Moreover, I independently pursued web development by participating in an online bootcamp. As a result, I successfully designed my very first personal website and made modifications to my personal CV-site that I initially created in 2022.
In addition to the URLs, I also delved into background knowledge on computer vision, particularly focusing on the concept of object recognition through deep learning. Object recognition encompasses image classification, object localization, object detection, and object segmentation. I learned that IoU (Intersection over Union) is a performance measure for image classification models, reflecting the degree of overlap between predicted and ground truth bounding boxes. A higher IoU score indicates better results, although achieving a perfect value of 1 is highly unlikely. A desirable IoU would be above 0.75. Furthermore, I explored confidence scores, which represent the algorithm’s level of certainty regarding the accuracy of its predictions.
During this week, I also encountered the families of techniques known as Region-Based Convolutional Neural Networks (R-CNNs) and You Only Look Once (YOLO). Although initially confusing, I had the opportunity to discuss these techniques with Dr. Maxwell, gaining a clearer understanding. R-CNN has three versions. The first version does not employ deep networks but relies on classical computer vision algorithms to generate 2000 region proposals (candidate bounding boxes), which are then passed through a network for classification. The fast version of R-CNN also utilizes a selective search algorithm to provide region proposals, but only after a convolutional feature map is generated by feeding the image to the CNN once, distinguishing it from the first version which feeds all 2000 region proposals into to the CNN. On the other hand, the faster version of R-CNN employs a convolutional network for providing a convolutional feature map and then use a RPN (Region Proposal Network) intead of a selective search algorithm to predict the region proposal, enhancing its speed. YOLO also has various versions. This family of techniques excels in speed and real-time applications as it only looks at the image once but may struggle with small objects within the image as the algorithm divides the image into grids to classifies the grids and has spatial constraints.
To summarize, I found the web development aspect of the week to be manageable, although somewhat tedious and requiring meticulous coding. Conversely, grasping the foundational knowledge of computer vision proved to be quite challenging for me. Without prior background in this field, I made a conscious effort to research and understand unfamiliar terms, which consumed a significant amount of time. However, after consulting with Dr. Jacob Furst, my mentor in this program, I realized that some techniques and algorithms might necessitate further reading of related papers. In such cases, I can conduct broader Google searches to explore terminologies mentioned in the background sections of the papers.
Unfortunately, I did not have time to dedicate to PyTorch this week. However, I plan to resume my PyTorch learning for machine learning in the upcoming week. Additionally, I intend to read papers on convolutional neural networks (CNN/ConvNet) in general and review the papers shared by Dr. Maxwell, focusing on research topics such as object recognition, color constancy, adversarial images, super-resolution, and intrinsic imaging. By doing so, I hope to identify the topic that interests me the most.
I acknowledge that reading and conducting thorough searches are essential components of the research process. As a first-time learner of these topics, I found the experience both enjoyable and challenging. I remain motivated and committed to persevering through the learning journey.