Introduction

Synopsis

In this session, after a brief introduction to the MRIQC software (Esteban et al., PLOS ONE 2017), we will use the image quality metrics the tool generates to train and cross-validate a classifier implemented with the widely used package scikit-learn.

Using interactive Jupyter Notebooks, made available to attendees sufficiently ahead of the meeting, we will explore the features fed into the classifier, underscoring the so-called “batch-effects” (as they are known in molecular biology) that arise when data originate from different acquisition devices (“scanner-effects”, in the case of MRI). We will then turn to methodology, investigating how best to set up cross-validation when such “scanner-effects” are expected. For training and evaluation, we will use openly available data already processed with MRIQC. We will also demonstrate a “real-world” application of interest to researchers, which they can run on their own data (or, alternatively, on new data that will be made available to them). By holding one dataset out of the cross-validation framework, we will demonstrate how to “calibrate” the classifier so that it performs well on their own samples.
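As a preview of that cross-validation setup, the following is a minimal sketch of a leave-one-site-out scheme built with scikit-learn. The input file name and the column names ("site", "rating") are illustrative assumptions, not the actual tutorial data; the notebooks rely on mriqc-learn's site-aware utilities instead.

```python
# Minimal sketch (assumed file and column names): leave-one-site-out cross-validation
# of a quality classifier trained on MRIQC's image quality metrics (IQMs).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler

# Hypothetical table: one row per T1w image, IQM columns plus "site" and "rating".
iqms = pd.read_csv("iqms_with_ratings.tsv", sep="\t")

X = iqms.drop(columns=["site", "rating"])   # the IQMs are the features
y = iqms["rating"]                          # e.g., 1 = exclude, 0 = accept
sites = iqms["site"]                        # acquisition site (scanner) of each image

model = make_pipeline(
    RobustScaler(),                         # stand-in for the site-wise scaling used in the notebooks
    RandomForestClassifier(n_estimators=300, random_state=0),
)

# Each fold holds out every image from one site, mimicking deployment on an unseen scanner.
scores = cross_val_score(model, X, y, groups=sites, cv=LeaveOneGroupOut(), scoring="roc_auc")
print(f"AUC per held-out site: {scores.round(2)}; mean = {scores.mean():.2f}")
```

Compared with a naive k-fold split, grouping by site prevents images from the same scanner from appearing in both the training and test folds, which would otherwise inflate the performance estimate.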

Automating Quality Assessment and Quality Control: state of the art

The automated quality control of magnetic resonance imaging (MRI) has long been an open issue. Woodard and Carley-Spencer [11] conducted one of the earliest evaluations on a large dataset of 1001 T1-weighted (T1w) MR images from 143 participants. They defined a set of 239 no-reference image-quality metrics (IQMs), that is, metrics computed without a ground-truth, degradation-free version of the same image. The IQMs belonged to two families depending on whether they were derived from Natural Scene Statistics or from quality indices defined by the JPEG consortium. The IQMs were calculated on image pairs with and without several synthetic distortions. In an analysis of variance, some IQMs from both families reliably discriminated among undistorted images, noisy images, and images distorted by intensity non-uniformity (INU).

Mortamet et al. [12] proposed two quality indices focused on detecting artifacts in the air region surrounding the head and on analyzing the goodness-of-fit of a model for the background noise. One principle underlying their proposal is that most of the artifact signal propagates over the image and into the background. They applied these two IQMs to 749 T1w scans from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset. By defining cutoff thresholds for the two IQMs, they assigned the images high- or low-quality labels and compared this classification to a manual assessment. They concluded that more specific research was required to determine these thresholds and generalize them to different datasets. However, many potential sources of uncontrolled variability exist between studies and sites, including MRI protocols (scanner manufacturer, MR sequence parameters, etc.), scanning settings, participant instructions, and inclusion criteria. For these reasons, the thresholds they proposed for their IQMs are unlikely to generalize beyond the ADNI dataset.

Later efforts to develop IQMs appropriate for MRI include the Quality Assessment Protocol (QAP) and the UK Biobank [13]. MRIQC extends the list of IQMs from the QAP, which was constructed from a careful review of the MRI and medical imaging literature [14]. Recently, Pizarro et al. [15] proposed the use of a support-vector machine classifier (SVC) trained on 1457 structural MRI images acquired at a single site with constant scanning parameters. They proposed three volumetric features and three features targeting particular artifacts. The volumetric features were the normalized histogram, the tissue-wise histogram, and the ratio of the modes of gray matter (GM) and white matter (WM). The artifacts addressed were the eye-motion spillover in the anterior-to-posterior phase-encoding direction, the head-motion spillover along the nasio-cerebellum axis (which they call the ringing artifact), and the so-called wrap-around (which they refer to as the aliasing artifact). They reported a prediction accuracy of around 80%, assessed using 10-fold cross-validation. These previous efforts succeeded in showing that automated quality rating of T1w MRI scans is possible. However, they did not achieve generalization across multi-site datasets.

IQMs generated by MRIQC for T1-weighted MRI datasets

Table of IQMs generated by MRIQC for T1-weighted images (Table 2 of Esteban et al., 2017): https://journals.plos.org/plosone/article/figure/image?size=large&download=&id=10.1371/journal.pone.0184661.t002
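The same IQMs can also be inspected programmatically from the group-level table that MRIQC writes out. The sketch below assumes the conventional output location ("derivatives/mriqc/group_T1w.tsv"); adjust the path to wherever MRIQC stored your results.

```python
# Minimal sketch: load the group-level table of IQMs produced by MRIQC for T1w images.
# The path below is an assumption; point it to your own MRIQC output directory.
import pandas as pd

iqms = pd.read_csv("derivatives/mriqc/group_T1w.tsv", sep="\t")

print(iqms.shape)                  # (number of T1w images, number of columns)
print(iqms.columns.tolist()[:10])  # first few IQM names, e.g. cjv, cnr, efc, fber, ...
print(iqms.describe().T.head())    # summary statistics for a handful of IQMs
```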

Learning outcomes

  • Understand the concept of a no-reference image quality metric (IQM)

  • Understand the problem of “scanner-effects” as statistical “batch-effects”

  • Get familiar with the Python package mriqc-learn for the statistical modeling of IQMs generated by MRIQC

  • Successfully train a random forest classifier using mriqc-learn and scikit-learn (a minimal sketch follows below)
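The sketch below illustrates that last outcome with plain scikit-learn: a random forest is trained on IQMs from all sites but one, and the held-out site stands in for "your own data". The file name, column names, and site label are illustrative assumptions; the notebooks use mriqc-learn's data loaders and site-wise preprocessing instead.

```python
# Minimal sketch (assumed file/column names): train a random forest on IQMs from all
# sites except one, then check how it transfers to the held-out site.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

iqms = pd.read_csv("iqms_with_ratings.tsv", sep="\t")

held_out = "SITE_07"                      # hypothetical site kept aside for "calibration"
train = iqms[iqms["site"] != held_out]
test = iqms[iqms["site"] == held_out]

features = [c for c in iqms.columns if c not in ("site", "rating")]

clf = RandomForestClassifier(n_estimators=500, class_weight="balanced", random_state=0)
clf.fit(train[features], train["rating"])

# Performance on the unseen site is the honest estimate of how the model would behave
# on a researcher's own dataset.
print(classification_report(test["rating"], clf.predict(test[features])))
```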