Published:
Author: Salena Fitzgerald

A team of undergraduate students in the Department of Applied Mathematics and Statistics developed a machine learning algorithm and applied it to images from the Breast Cancer Histopathological Database, classifying the images as malignant or benign with an average accuracy of 97%.

Junior Krutal Patel, sophomore Gracelyn Shi, and junior Anusha Rao worked closely on the project with Fadil Santosa, professor and Yu Wu and Chaomei Chen Head of AMS, and Jeremias Sulam, an assistant professor in the Department of Biomedical Engineering. The project was presented at the Whiting School of Engineering’s Design Day. The event, held annually in May, showcases how students use the skills and knowledge they have acquired in classrooms and laboratories to solve real-world problems.

The Breast Cancer Histopathological Database comprises more than 9,000 images of breast tumor tissue collected from more than 80 patients. It was developed in collaboration with P&D Laboratory in Brazil and is used by researchers around the globe.     

First, the students organized the massive database to make it easier to work with. Then they explored how best to use machine learning algorithms to determine whether the tissue shown in each image was malignant or benign. After extensive testing, the team settled on a transfer learning approach built on a neural network. This involved feeding the images into a pre-trained neural classifier and then using a logistic regression model, a model that learns from labeled data and is used to predict labels, to make the final classifications.

“I was excited to see how different resolutions and other data manipulation affected how well the machine learning algorithm was able ‘to learn,’ the same way we learn differently based on our environment,” Shi said.
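The two-stage idea described above (a frozen, pre-trained feature extractor followed by a trainable logistic regression) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the article does not name the pre-trained network or the software the team used, so the tiny fixed layer `FROZEN_WEIGHTS`, the 4-pixel "images", and all function names are hypothetical stand-ins, and the data is synthetic rather than drawn from the database.

```python
import math
import random

# Hypothetical stand-in for a frozen, pre-trained network: two fixed feature
# detectors over a 4-pixel image. A real pipeline would use a large network
# trained on everyday photographs; these weights are for illustration only.
FROZEN_WEIGHTS = [[0.25, 0.25, 0.25, 0.25], [-0.25, -0.25, -0.25, -0.25]]
FROZEN_BIASES = [0.0, 1.0]


def extract_features(image):
    """Pass an image through the frozen layer (linear map followed by ReLU)."""
    return [
        max(0.0, bias + sum(w * pixel for w, pixel in zip(row, image)))
        for row, bias in zip(FROZEN_WEIGHTS, FROZEN_BIASES)
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, coef, intercept):
    """Probability that a feature vector belongs to the malignant class (label 1)."""
    return sigmoid(intercept + sum(c * f for c, f in zip(coef, features)))


def fit_logistic_regression(feature_rows, labels, learning_rate=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent.

    Only these coefficients are trained; the frozen layer is never updated.
    """
    coef = [0.0] * len(feature_rows[0])
    intercept = 0.0
    n = len(feature_rows)
    for _ in range(epochs):
        grad_coef = [0.0] * len(coef)
        grad_intercept = 0.0
        for features, label in zip(feature_rows, labels):
            error = predict_proba(features, coef, intercept) - label
            grad_intercept += error / n
            for j, value in enumerate(features):
                grad_coef[j] += error * value / n
        coef = [c - learning_rate * g for c, g in zip(coef, grad_coef)]
        intercept -= learning_rate * grad_intercept
    return coef, intercept


# Synthetic toy data (not from the real database): label 1 images are brighter.
rng = random.Random(0)
images, labels = [], []
for _ in range(20):
    images.append([rng.uniform(0.1, 0.3) for _ in range(4)])
    labels.append(0)
    images.append([rng.uniform(0.7, 0.9) for _ in range(4)])
    labels.append(1)

feature_rows = [extract_features(image) for image in images]
coef, intercept = fit_logistic_regression(feature_rows, labels)
probabilities = [predict_proba(f, coef, intercept) for f in feature_rows]
print("learned coefficients:", coef, "intercept:", intercept)
```

The design point is that the expensive part, the feature extractor, is reused as-is, while only a small and fast-to-train model is fitted to the new labels.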

Once that model was tested, the team wanted to guard against false negative results, in which a malignant sample is labeled benign. They used Youden’s J statistic, a measure that combines a model’s sensitivity and specificity, to determine the optimal classification threshold. This gave the students more concrete evidence with which to refine their model.
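For readers unfamiliar with the statistic, Youden’s J is defined as sensitivity plus specificity minus 1, and the usual way to pick a threshold with it is to try each candidate cutoff and keep the one with the largest J. The sketch below shows that procedure on made-up scores; the function name `youden_j_threshold` and the example numbers are hypothetical and are not taken from the team’s work.

```python
def youden_j_threshold(labels, scores):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    labels: 1 for malignant, 0 for benign.
    scores: predicted probability of malignancy for each sample.
    A sample is called malignant when its score is >= the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(scores)):
        true_pos = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
        true_neg = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
        sensitivity = true_pos / positives
        specificity = true_neg / negatives
        j = sensitivity + specificity - 1.0
        # Strict comparison: on a tie the lower threshold is kept, which
        # favors sensitivity and therefore fewer false negatives.
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Made-up example scores, for illustration only.
example_labels = [0, 0, 0, 0, 1, 1, 1, 1]
example_scores = [0.10, 0.20, 0.35, 0.60, 0.40, 0.70, 0.80, 0.90]
threshold, j_value = youden_j_threshold(example_labels, example_scores)
print(threshold, j_value)  # 0.4 0.75
```

In this toy example two cutoffs (0.40 and 0.70) reach the same J of 0.75; keeping the lower one catches every malignant sample at the cost of one false positive.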

They then divided the data into training and testing subsets and ran the model on the testing subset, comparing its predictions with the known labels. The model proved to be 97% accurate.
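The evaluation step described above can be sketched in a few lines. This is an illustration under stated assumptions: the article does not give the split ratio, so the 80/20 split, the fixed seed, and the helper names `split_train_test`, `accuracy`, and `false_negative_rate` are all hypothetical.

```python
import random


def split_train_test(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of items and return (training, testing) lists.

    The 20% test fraction is an assumed value, not one reported in the article.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label matches the true label."""
    matches = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return matches / len(true_labels)


def false_negative_rate(true_labels, predicted_labels):
    """Share of truly malignant (1) samples that were predicted benign (0)."""
    predictions_for_malignant = [
        p for t, p in zip(true_labels, predicted_labels) if t == 1
    ]
    missed = sum(1 for p in predictions_for_malignant if p == 0)
    return missed / len(predictions_for_malignant)


# Split 100 hypothetical image IDs, then score a made-up set of predictions.
train_ids, test_ids = split_train_test(range(100))
print(len(train_ids), len(test_ids))  # 80 20
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```

Reporting the false negative rate alongside accuracy is useful here because a single accuracy figure does not show how many malignant samples were missed.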

“It was interesting to see how this model, which was trained on everyday objects, was so successful in classifying breast cancer histopathology images,” Patel said.