11/26/2024
By Xiaolong Liang

The Kennedy College of Sciences, Richard A. Miner School of Computer & Information Sciences, invites you to attend a doctoral dissertation proposal defense by Xiaolong Liang titled, "Advanced Deep Learning Approaches for Real-time Scene Classification and Polyp Detection in Endoscopy Videos."

Time: Monday, Dec 2, 1 to 2 p.m.

Location: This will be a virtual defense via Zoom: https://uml.zoom.us/j/6712548784

Committee Members:
Yu Cao, Ph.D. (Advisor), Professor, Director, Miner School of Computer & Information Sciences, UMass Center for Digital Health (CDH)
Benyuan Liu, Ph.D. (Advisor), Professor, Director, Miner School of Computer & Information Sciences, UMass Center for Digital Health (CDH), Computer Networking Lab, CHORDS
Hengyong Yu, Ph.D., Professor, FIEEE, FAAPM, FAIMBE, FAAIA, FAIIA, Department of Electrical and Computer Engineering
Honggang Zhang, Ph.D., Professor, Department of Engineering, UMass Boston

Abstract:
Deep learning and computer vision have become pivotal technologies in advancing medical applications, particularly in the analysis of endoscopy videos for the early detection and diagnosis of gastrointestinal diseases. Despite significant progress, challenges persist in achieving accurate, real-time scene classification and efficient polyp detection within the diverse and complex visual environments of endoscopic procedures. This thesis aims to address these challenges by developing deep learning frameworks based on Convolutional Neural Networks (CNNs), both alone and integrated with cross-channel self-attention mechanisms, to enhance the performance of scene classification and polyp detection in endoscopy videos.

Endoscopy serves as a vital diagnostic tool in medical imaging, particularly in the examination of the esophagus, stomach, and intestines. This framework introduces a two-stage system for the automated classification of scene categories (Colonoscopy, Gastroscopy, Extracorporeal, Blur) within endoscopy videos. The initial stage employs a Clear-Blur model to determine whether a frame is blurred. If it is not, the second stage applies a Three-Scene model to classify the frame, and the classification results are then verified against the video's label. This integrated system achieves 97% average classification accuracy on 197 clinical endoscopy video clips. Additionally, the system incorporates a temporal label accumulation algorithm that reaches over 90% average classification accuracy within 50±15 seconds of the endoscope entering the gastrointestinal tract. By leveraging the powerful feature extraction capabilities of CNNs, the model ensures robust classification of diverse endoscopic scenes across the anatomical and operational stages of endoscopic procedures.
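The two-stage decision and the temporal label accumulation described above can be illustrated with a minimal sketch. The model calls here are hypothetical placeholders (the actual Clear-Blur and Three-Scene models are CNNs, and the thesis's accumulation algorithm may differ); this sketch only shows a plausible majority-vote accumulation over non-blurred frames:

```python
from collections import Counter, deque

def classify_frame(is_blurred, scene_scores):
    """Two-stage decision: the blur gate runs first; only clear frames
    reach scene classification.

    is_blurred: boolean output of a (hypothetical) Clear-Blur model.
    scene_scores: per-scene confidences from a (hypothetical) Three-Scene model.
    """
    if is_blurred:
        return "Blur"
    return max(scene_scores, key=scene_scores.get)

def accumulate_labels(frame_labels, window=30):
    """Temporal label accumulation sketch: majority vote over a sliding
    window of recent frame labels, skipping blurred frames so transient
    motion blur does not flip the video-level label."""
    recent = deque(maxlen=window)
    video_label = None
    for label in frame_labels:
        if label != "Blur":
            recent.append(label)
        if recent:
            video_label, _ = Counter(recent).most_common(1)[0]
        yield video_label

# Example: a noisy frame stream that settles on "Colonoscopy"
stream = ["Blur", "Gastroscopy", "Colonoscopy", "Blur",
          "Colonoscopy", "Colonoscopy"]
final = list(accumulate_labels(stream, window=4))[-1]
print(final)  # Colonoscopy
```

In practice the window size would be tuned against the reported 50±15-second convergence time; a longer window smooths the label at the cost of slower convergence.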

Colorectal cancer (CRC) poses a significant global health challenge, ranking as a leading cause of cancer-related mortality. Colonoscopy, the most effective means of preventing CRC, is used for the early detection and removal of precancerous growths. However, although many efforts have applied deep learning to automatic polyp detection, false positive rates during colonoscopy remain high due to the diverse characteristics of polyps and the presence of various artifacts. This research introduces a novel framework incorporating a cross-channel self-attention fusion unit to improve polyp detection accuracy in colonoscopy video frames. This unit plays an important role in refining prediction quality, yielding more precise detections in complex medical imaging scenarios. To substantiate the effectiveness of the framework, we create an extensive private dataset of complete endoscopy videos captured with equipment from different manufacturers. This dataset represents realistic and intricate application scenarios, offering an authentic and effective foundation for both training and evaluating the framework. Thorough experiments and ablation studies, compared against state-of-the-art polyp detection methods, demonstrate the advantages of combining CNNs with self-attention mechanisms in endoscopic video analysis: the framework, featuring key technical innovations, significantly reduces false detections and achieves a higher recall rate. By capturing fine-grained details in endoscopic imagery, the attention mechanism improves both sensitivity and specificity, underscoring the effectiveness of the framework in real-world endoscopy procedures.
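The general idea of attention computed across channels (rather than across spatial positions) can be sketched as follows. This NumPy sketch is a generic illustration, not the exact fusion unit from the thesis: the channel affinity matrix is C×C, so each output channel is a re-weighted mixture of all input channels, fused with the input via a residual connection:

```python
import numpy as np

def cross_channel_self_attention(x):
    """Sketch of self-attention across the channels of a feature map
    x of shape (C, H, W). The attention matrix is (C, C): channels,
    not spatial positions, act as the tokens."""
    c, h, w = x.shape
    flat = x.reshape(c, h * w)                    # one token per channel
    scores = flat @ flat.T / np.sqrt(h * w)       # (C, C) channel affinities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)      # softmax over channels
    out = attn @ flat                             # mix channels by affinity
    return x + out.reshape(c, h, w)               # residual fusion

# Fuse a random 8-channel feature map; the shape is preserved.
feat = np.random.default_rng(0).standard_normal((8, 16, 16))
fused = cross_channel_self_attention(feat)
print(fused.shape)  # (8, 16, 16)
```

In a detection network, a unit like this would typically sit between backbone feature maps and the prediction head, with learned projections in place of the raw reshaped features used here for brevity.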

The outcomes of this research have the potential to significantly improve computer-aided diagnosis (CAD) systems used in endoscopy, enhancing the early detection and treatment of gastrointestinal diseases. Future work on polyp detection will also explore computational efficiency and publicly available models, such as the YOLOv8 series, to ensure feasibility for wider clinical application. Ultimately, by integrating cutting-edge techniques such as the latest large vision models, this research explores the potential for earlier detection and treatment of gastrointestinal diseases, aiming to improve the accuracy of patient diagnosis and reduce healthcare costs.