CoughGAN: Generating Synthetic Coughs that Improve Respiratory Disease Classification


IEEE Engineering in Medicine & Biology Society (EMBC)






Despite the prevalence of respiratory diseases, their diagnosis by clinicians is challenging. Accurately assessing airway sounds requires extensive clinical training and equipment that may not be readily available. Current methods that automate this diagnosis are hindered by their reliance on features that require pulmonary function tests. We leverage the audio characteristics of coughs to create classifiers that can distinguish common respiratory diseases in adults. Moreover, we build on recent advances in generative adversarial networks to augment our dataset with engineered synthetic cough samples for each class of major respiratory disease, both balancing the dataset and increasing its size. We experimented on cough samples collected with a smartphone from 45 subjects in a clinic. Our CoughGAN-improved Support Vector Machine and Random Forest models achieve up to 76% test accuracy and an 83% F1 score in classifying each subject as healthy or as having one of three major respiratory diseases. Adding our synthetic coughs improves the performance we can obtain from a relatively small, unbalanced healthcare dataset, boosting accuracy by over 30%. Our data augmentation reduces overfitting and discourages the prediction of a single, dominant class. These results highlight the feasibility of automatic, cough-based respiratory disease diagnosis using smartphones or wearables in the wild.
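
As a rough illustration of the balancing step described above, the sketch below computes how many synthetic samples each class would need to match the largest class. It is not the authors' implementation: the function name `synthetic_quota`, the `extra_per_class` parameter, and the class labels and counts are all hypothetical, and the generator that would actually produce the synthetic coughs is not shown.

```python
from collections import Counter


def synthetic_quota(labels, extra_per_class=0):
    """Return, per class, the number of synthetic samples needed so that
    every class reaches the size of the largest class, plus an optional
    extra amount applied equally to all classes (hypothetical helper)."""
    counts = Counter(labels)
    target = max(counts.values()) + extra_per_class
    return {cls: target - n for cls, n in counts.items()}


# Hypothetical label counts for illustration only (not the paper's data).
labels = ["healthy"] * 5 + ["disease_a"] * 12 + ["disease_b"] * 3

quota = synthetic_quota(labels)
print(quota)
```

Setting `extra_per_class` above zero would grow every class by the same amount after balancing, which corresponds to increasing the dataset size as well as balancing it.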
