# Introduction

This section is meant to expand upon and illustrate the material presented by Jo Etzel in the “Dataset QC: Taking control of your data” talk at the ISMRM 2022 “Taking Control of Your Data using Reproducible QC Workflows” session. A recorded version of the session should be available afterwards.

## Abstract

We will begin this session with an introduction to quality issues in MRI, and how they can be addressed with pipelines and scripts for efficient preprocessing and analysis of the data. Drawing on her experience with the Dual Mechanisms of Cognitive Control (DMCC) project (Etzel et al., Scientific Data 2022), Jo will then present the QC/QA procedures, analysis approaches, and software practices she has found most beneficial (using tools such as R and fMRIPrep). Dynamic reports (e.g., knitr, markdown) should be used for analyses whenever possible, so that images, results, source code, and discussion are kept together in one place. This is especially important to share alongside published papers, enabling readers to reproduce the results and see the parameters used. In the interactive tutorials that follow, the audience will learn to apply these best practices to their own datasets using these same tools.
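(As a concrete sketch of what such a dynamic report looks like: below is a minimal, made-up R Markdown file. When compiled with knitr, the prose, the source code, and the figure the code produces all land in a single document. The title, values, and chunk are hypothetical, not from the DMCC project.)

````markdown
---
title: "QC summary (toy example)"
output: html_document
---

Mean framewise displacement for each run, with the code that made the plot:

```{r fd.plot}
fd <- c(0.12, 0.34, 0.09);       # hypothetical per-run values
names(fd) <- paste0("run", 1:3);
barplot(fd, ylab="mean FD (mm)");
```
````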

## Aside on R style for those interested

R programmers may have a few questions about my coding style in these examples, particularly why I usually put semicolons at the ends of statements and why my code is not at all “tidy”.

The semicolons sometimes confuse people: R allows but does not require statements to end with a semicolon; a line return is usually sufficient. Part of why I end lines with semicolons is habit: I started with languages in which they are required (and habitually use a lot of semicolons in my writing!). But I also use them in R code because I think they help with readability and remove any uncertainty about where a statement ends: a semicolon is always the end of a statement, while a line return may or may not be.
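Here is a minimal illustration (my invention, not code from the talk) of that ambiguity: R treats a line return as the end of a statement only if the expression is syntactically complete at that point.

```r
x <- 1 + 2      # complete expression: the line return ends the statement
y <- 1 +
  2             # the trailing "+" keeps the statement open, so y is also 3
z <- 1 + 2;     # the semicolon marks the end unambiguously
a <- 4; b <- 5; # and allows several statements on one line
```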

I love the hexagonal tidyverse stickers, but not the “tidy” coding style: I use base R and as few packages as possible (even for graphics), with many for loops and no pipes. There are many opinions on the relative merits of tidy and base R for different applications (Norm Matloff’s take is pretty comprehensive, particularly on teaching). To my mind, one big reason to stick with base R is stability: base R just works across platforms (the same code runs on Windows, macOS, and Linux) and is robust to updates. I often write code that needs to be usable by many people on many different computers for years, and I don’t want to worry about software versions. (In the last fifteen years an update has broken my old code exactly once.) Another reason I prefer base R, especially for newcomers, is its use of “standard” programming logic (if, for, etc.): translating to or from MATLAB, Java, etc. reduces to learning differences in syntax rather than different concepts (things like mutate and pipes, which don’t always have an equivalent).
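For readers unfamiliar with the distinction, here is a toy comparison (again my invention, not project code): the same column means computed in base R with a for loop, and in tidyverse style with a pipe.

```r
dat <- data.frame(a=1:5, b=6:10);   # tiny example data frame

# base R: an explicit loop, no packages required
col.means <- rep(NA, ncol(dat));
for (i in 1:ncol(dat)) { col.means[i] <- mean(dat[,i]); }
names(col.means) <- colnames(dat);

# tidyverse equivalent (needs dplyr installed; shown commented out)
# library(dplyr);
# dat %>% summarise(across(everything(), mean));
```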

I see the goal of scientific programming as clarity and long-term stability, not efficiency. By “scientific programming” I mean things like the demos and linked code in this notebook: scripts to carry out a particular conversion or analysis, not something like writing an operating system for new hardware or forecasting the weather. If you find yourself thinking that code could be written in a “clever” way, consider how much time it would take to explain that bit of code to someone else, or to debug it if there is trouble later. If the “clever” implementation would be harder to explain or debug, use the “boring” one, even if it takes a few msec longer to run.
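To make the tradeoff concrete, an invented example: both versions below compute the proportion of values above a threshold in each row of a matrix, but the second spells out every step.

```r
dat <- matrix(rnorm(20), nrow=4);   # random example data
threshold <- 0.5;

# "clever": one line, but the reader must know that rowMeans of a
# logical matrix gives the proportion of TRUEs in each row
prop.clever <- rowMeans(dat > threshold);

# "boring": an explicit loop that states each step
prop.boring <- rep(NA, nrow(dat));
for (i in 1:nrow(dat)) { prop.boring[i] <- sum(dat[i,] > threshold)/ncol(dat); }
# all.equal(prop.clever, prop.boring)   # TRUE
```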