niquery.analysis.filtering module
- niquery.analysis.filtering.filter_modality_datasets(df: pandas.DataFrame, modality: str | list) → pandas.Series
Filter non-relevant modality data records.
Filters out datasets whose ‘modalities’ field does not contain any of the items in modality.
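A minimal sketch of this kind of modality check, assuming each ‘modalities’ cell holds a list of strings and that the returned pandas.Series is a boolean mask. The helper name modality_mask and the toy dataset IDs are hypothetical and not part of niquery.

```python
from __future__ import annotations

import pandas as pd


def modality_mask(df: pd.DataFrame, modality: str | list) -> pd.Series:
    """Return True for rows whose 'modalities' list shares an item with `modality`.

    Hypothetical helper: assumes each 'modalities' cell is a list of strings.
    """
    # Normalize a single modality name into a one-element set.
    wanted = {modality} if isinstance(modality, str) else set(modality)
    # Cells that are not lists (e.g. missing values) count as non-matching.
    return df["modalities"].map(
        lambda mods: isinstance(mods, list) and bool(wanted.intersection(mods))
    )


datasets = pd.DataFrame(
    {
        "id": ["ds-a", "ds-b", "ds-c"],
        "modalities": [["mri"], ["eeg"], ["mri", "pet"]],
    }
)
mask = modality_mask(datasets, "mri")
print(datasets[mask]["id"].tolist())  # ['ds-a', 'ds-c']
```

Returning a mask rather than a filtered frame lets it be combined with other masks (e.g. a species mask) before indexing once.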
- niquery.analysis.filtering.filter_modality_records(fname: str, sep: str, suffix: str | list) → pandas.DataFrame
Keep records where the filename matches the provided modality naming convention.
Following the BIDS modality suffix convention, keeps records whose ‘filename’ attribute ends with ‘_{suffix}.nii.gz’ for the given suffix.
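The suffix rule can be sketched on an in-memory table as follows. This assumes the records are already loaded into a DataFrame with a ‘filename’ column; the real function reads them from fname using sep, which this sketch skips. The helper name keep_suffix_records is hypothetical.

```python
from __future__ import annotations

import pandas as pd


def keep_suffix_records(df: pd.DataFrame, suffix: str | list) -> pd.DataFrame:
    """Keep rows whose 'filename' ends with '_{suffix}.nii.gz' (hypothetical helper)."""
    suffixes = [suffix] if isinstance(suffix, str) else list(suffix)
    # The leading underscore ties the match to a whole suffix, so 'bold'
    # does not match a name ending in 'xbold.nii.gz'.
    endings = tuple(f"_{s}.nii.gz" for s in suffixes)
    # Python's str.endswith accepts a tuple of candidate endings.
    return df[df["filename"].map(lambda name: name.endswith(endings))]


records = pd.DataFrame(
    {
        "filename": [
            "sub-01_task-rest_bold.nii.gz",
            "sub-01_T1w.nii.gz",
            "sub-01_task-rest_bold.json",
        ]
    }
)
print(keep_suffix_records(records, "bold")["filename"].tolist())
# ['sub-01_task-rest_bold.nii.gz']
```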
- niquery.analysis.filtering.filter_nonrelevant_datasets(df: pandas.DataFrame, species: str | list, modality: str | list) → pandas.DataFrame
Filter non-relevant data records.
Return datasets that belong to the provided species and modality.
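One way to express "belongs to the provided species and modality" is to build one boolean mask per criterion and keep rows where both hold. This sketch assumes ‘species’ holds a single string per dataset (matched exactly) and ‘modalities’ holds a list of strings; relevant_datasets and _as_set are hypothetical names, not niquery's API.

```python
from __future__ import annotations

import pandas as pd


def _as_set(value: str | list) -> set:
    # Accept either a single name or a list of names.
    return {value} if isinstance(value, str) else set(value)


def relevant_datasets(
    df: pd.DataFrame, species: str | list, modality: str | list
) -> pd.DataFrame:
    """Keep datasets matching both species and modality (hypothetical helper).

    Assumes 'species' holds one string per dataset and 'modalities' a list of strings.
    """
    species_ok = df["species"].isin(_as_set(species))
    wanted = _as_set(modality)
    modality_ok = df["modalities"].map(
        lambda mods: isinstance(mods, list) and bool(wanted.intersection(mods))
    )
    # Both masks must hold for a dataset to be kept.
    return df[species_ok & modality_ok]


df = pd.DataFrame(
    {
        "id": ["ds-a", "ds-b", "ds-c"],
        "species": ["human", "rat", "human"],
        "modalities": [["mri"], ["mri"], ["eeg"]],
    }
)
print(relevant_datasets(df, "human", "mri")["id"].tolist())  # ['ds-a']
```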
- niquery.analysis.filtering.filter_on_run_contribution(df: pandas.DataFrame, contrib_thr: int, seed: int) → pandas.DataFrame
Filter BOLD runs of datasets to keep their total contribution under a threshold.
If the total number of runs of a dataset exceeds the given threshold, a random subset of its BOLD runs is picked, with seed controlling the random draw.
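A minimal sketch of a per-dataset random cap, assuming a ‘dataset’ column identifies each run's dataset, the index is unique, and a dataset over the threshold is cut down to exactly contrib_thr runs. The helper name cap_runs_per_dataset is hypothetical; niquery may draw its subset differently.

```python
import numpy as np
import pandas as pd


def cap_runs_per_dataset(df: pd.DataFrame, contrib_thr: int, seed: int) -> pd.DataFrame:
    """Keep at most `contrib_thr` randomly chosen runs per dataset (hypothetical helper).

    Assumes a 'dataset' column and a unique index.
    """
    rng = np.random.default_rng(seed)
    keep = []
    # sort=True fixes the group order, so a given seed always gives the same draw.
    for _, runs in df.groupby("dataset", sort=True):
        labels = runs.index.to_numpy()
        if len(labels) > contrib_thr:
            # Sample without replacement only when the dataset exceeds the threshold.
            labels = rng.choice(labels, size=contrib_thr, replace=False)
        keep.extend(labels.tolist())
    # Restore the original row order of the surviving runs.
    return df.loc[sorted(keep)]


runs = pd.DataFrame({"dataset": ["a"] * 5 + ["b"] * 2, "run": range(7)})
capped = cap_runs_per_dataset(runs, contrib_thr=3, seed=0)
print(capped["dataset"].value_counts().to_dict())  # {'a': 3, 'b': 2}
```

Using a single seeded numpy Generator makes the whole selection reproducible for a fixed seed and input order.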
- niquery.analysis.filtering.filter_on_timepoint_count(df: pandas.DataFrame, min_timepoints: int, max_timepoints: int) → pandas.DataFrame
Filter out BOLD runs whose number of timepoints falls below or above a given range.
Removes BOLD runs whose timepoint count is not within the range [min_timepoints, max_timepoints].
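The range check maps naturally onto pandas' Series.between, which is inclusive on both ends by default and so matches the closed interval [min_timepoints, max_timepoints]. This sketch assumes the timepoint count lives in a ‘timepoints’ column; within_timepoint_range is a hypothetical name.

```python
import pandas as pd


def within_timepoint_range(
    df: pd.DataFrame, min_timepoints: int, max_timepoints: int
) -> pd.DataFrame:
    """Keep runs with min_timepoints <= timepoints <= max_timepoints (hypothetical helper)."""
    # Series.between includes both bounds by default.
    return df[df["timepoints"].between(min_timepoints, max_timepoints)]


runs = pd.DataFrame({"run": ["r1", "r2", "r3", "r4"], "timepoints": [50, 100, 300, 1200]})
print(within_timepoint_range(runs, 100, 1000)["run"].tolist())  # ['r2', 'r3']
```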
- niquery.analysis.filtering.filter_runs(df: pandas.DataFrame, contrib_thr: int, min_timepoints: int, max_timepoints: int, seed: int) → pandas.DataFrame
Filter BOLD runs based on run count and timepoint criteria.
Keeps only the BOLD runs that fulfill both of the following criteria:
Criterion 1: the number of runs for a given dataset is below the threshold contrib_thr.
Criterion 2: the number of timepoints per BOLD run is between [min_timepoints, max_timepoints].
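- Parameters:
  - df (pandas.DataFrame) – BOLD run information.
  - contrib_thr (int) – Run-count threshold per dataset (Criterion 1).
  - min_timepoints (int) – Minimum number of timepoints (Criterion 2).
  - max_timepoints (int) – Maximum number of timepoints (Criterion 2).
  - seed (int) – Random seed value.
- Returns:
Filtered BOLD runs.
- Return type:
pandas.DataFrame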
- niquery.analysis.filtering.filter_species_datasets(df: pandas.DataFrame, species: str | list) → pandas.Series
Filter non-relevant species data records.
Filters out datasets whose ‘species’ field does not contain any of the items in species.
- niquery.analysis.filtering.identify_modality_files(datasets: dict, sep: str, suffix: str | list, max_workers: int = 8) → dict
Identify dataset files having a particular suffix.
For each dataset, and following the BIDS modality suffix convention, keeps records where the ‘filename’ attribute ends with ‘_{suffix}.nii.gz’.
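- Parameters:
  - datasets (dict)
  - sep (str)
  - suffix (str | list) – BIDS modality suffix or suffixes to match.
  - max_workers (int, default 8)
- Returns:
results – Dictionary of dataset modality-specific file records.
- Return type:
dict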
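The max_workers parameter suggests that datasets are processed concurrently. A minimal sketch with concurrent.futures.ThreadPoolExecutor, assuming datasets maps dataset IDs to lists of dicts with a ‘filename’ key; find_suffix_files and _match_files are hypothetical names, and whether niquery uses threads or processes is not stated in the source.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor


def _match_files(records: list[dict], endings: tuple[str, ...]) -> list[dict]:
    # Keep the file records whose 'filename' ends with one of the expected endings.
    return [rec for rec in records if rec["filename"].endswith(endings)]


def find_suffix_files(datasets: dict, suffix: str | list, max_workers: int = 8) -> dict:
    """Map each dataset ID to its records ending in '_{suffix}.nii.gz' (hypothetical helper).

    Assumes `datasets` maps dataset IDs to lists of dicts with a 'filename' key.
    """
    suffixes = [suffix] if isinstance(suffix, str) else list(suffix)
    endings = tuple(f"_{s}.nii.gz" for s in suffixes)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One task per dataset; the pool runs up to max_workers at a time.
        futures = {
            ds_id: pool.submit(_match_files, records, endings)
            for ds_id, records in datasets.items()
        }
        # result() blocks until each task finishes and re-raises any worker error.
        return {ds_id: fut.result() for ds_id, fut in futures.items()}


datasets = {
    "ds-a": [{"filename": "sub-01_task-rest_bold.nii.gz"}, {"filename": "sub-01_T1w.nii.gz"}],
    "ds-b": [{"filename": "sub-02_dwi.nii.gz"}],
}
print(find_suffix_files(datasets, "bold", max_workers=2))
# {'ds-a': [{'filename': 'sub-01_task-rest_bold.nii.gz'}], 'ds-b': []}
```

Threads pay off when each task waits on I/O (for example, reading a per-dataset file listing); for purely in-memory filtering like this toy input, the GIL limits any speedup.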
- niquery.analysis.filtering.identify_relevant_runs(df: pandas.DataFrame, contrib_thr: int, min_timepoints: int, max_timepoints: int, seed: int) → pandas.DataFrame
Identify relevant BOLD runs in terms of run and timepoint count constraints.
Identifies the BOLD runs that fulfill the following criteria:
Criterion 1: the number of runs for a given dataset is below the threshold contrib_thr.
Criterion 2: the number of timepoints per BOLD run is between [min_timepoints, max_timepoints].
Runs are shuffled before the filtering process.
- Parameters:
  - df (pandas.DataFrame) – BOLD run information.
  - contrib_thr (int) – Contribution threshold: the maximum number of runs a single dataset can contribute to the total number of runs.
  - min_timepoints (int) – Minimum number of timepoints.
  - max_timepoints (int) – Maximum number of timepoints.
  - seed (int) – Random seed value.
- Returns:
Identified relevant BOLD runs.
- Return type:
pandas.DataFrame
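"Shuffle, then filter" can be sketched in a few pandas calls: after a seeded shuffle, keeping the first contrib_thr rows of each dataset amounts to a random subset. This sketch assumes ‘dataset’ and ‘timepoints’ columns, keeps at most contrib_thr runs per dataset (the source says "below", so whether equality is allowed is an assumption), and applies the timepoint criterion first so the cap only counts runs that already pass; the module's actual order is not stated. relevant_runs is a hypothetical name.

```python
import pandas as pd


def relevant_runs(
    df: pd.DataFrame, contrib_thr: int, min_timepoints: int, max_timepoints: int, seed: int
) -> pd.DataFrame:
    """Shuffle runs, then apply timepoint and per-dataset run limits (hypothetical helper)."""
    # Shuffle all rows reproducibly.
    shuffled = df.sample(frac=1, random_state=seed)
    # Criterion 2: keep runs with min_timepoints <= timepoints <= max_timepoints.
    in_range = shuffled[shuffled["timepoints"].between(min_timepoints, max_timepoints)]
    # Criterion 1: the first contrib_thr shuffled runs of each dataset form a random subset.
    return in_range.groupby("dataset", sort=False).head(contrib_thr)


runs = pd.DataFrame(
    {
        "dataset": ["a", "a", "a", "a", "b", "b"],
        "timepoints": [200, 200, 200, 200, 50, 300],
    }
)
selected = relevant_runs(runs, contrib_thr=2, min_timepoints=100, max_timepoints=1000, seed=42)
print(selected["dataset"].value_counts().to_dict())  # {'a': 2, 'b': 1}
```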