niquery.data.fetching module

niquery.data.fetching.fetch_datalad_remote_files(df, out_dirname, dataset_name) → tuple

Fetch files from remote DataLad datasets.

Downloads only the files listed in the provided DataFrame instance. The DataFrame is expected to contain at least the following columns:

  • remote: Remote server name (e.g., ‘openneuro’)

  • datasetid: Dataset identifier (e.g., ‘ds000231’)

  • fullpath: Path of the file within the dataset (e.g., ‘sub-01/func/sub-01_task-flavor_run-02_bold.nii.gz’)
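
A minimal sketch of an input table with the three required columns, assuming pandas is used to build the DataFrame. The helper `check_required_columns` is hypothetical and not part of niquery; the row values are the examples given in the list above.

```python
import pandas as pd

# Columns the function documentation lists as required.
REQUIRED_COLUMNS = ("remote", "datasetid", "fullpath")


def check_required_columns(df: pd.DataFrame) -> list[str]:
    """Return the required columns missing from ``df`` (hypothetical helper)."""
    return [col for col in REQUIRED_COLUMNS if col not in df.columns]


# One row per file to fetch; the values are the examples from the list above.
df = pd.DataFrame(
    [
        {
            "remote": "openneuro",
            "datasetid": "ds000231",
            "fullpath": "sub-01/func/sub-01_task-flavor_run-02_bold.nii.gz",
        }
    ]
)

# An empty list means the table has every required column.
missing = check_required_columns(df)
```

Checking the columns before calling the function is a design choice for this sketch only; the documentation does not say how the function itself reacts to a missing column.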

If the DataLad dataset already exists in the provided path, it is not cloned again.

A new DataLad dataset is created at the destination path, and each fetched dataset is added to it as a subdataset.

Parameters:
  • df (DataFrame) – Table containing at least ‘remote’, ‘datasetid’, and ‘fullpath’ columns. Each row corresponds to a file to be fetched.

  • out_dirname (Path) – Output directory where the datasets will be cloned and files stored.

  • dataset_name (str) – Name of the dataset.

Returns:

fetched_files, failure_results – Two dictionaries keyed by dataset: the filenames that were fetched successfully and the filenames that failed, respectively.

Return type:

tuple
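
A short sketch of how the returned pair might be inspected. It assumes each dictionary maps a dataset identifier to a sequence of entries; the exact shape of the values is not specified above, so the sample dictionaries and the `count_files` helper are hypothetical stand-ins, not output from niquery.

```python
def count_files(results: dict) -> dict:
    """Count entries per dataset, assuming each value is a sequence."""
    return {dataset: len(files) for dataset, files in results.items()}


# Hypothetical values standing in for the result of:
#   fetched_files, failure_results = fetch_datalad_remote_files(
#       df, out_dirname, dataset_name
#   )
fetched_files = {
    "ds000231": ["sub-01/func/sub-01_task-flavor_run-02_bold.nii.gz"],
}
failure_results = {"ds000231": []}

# Per-dataset totals for the successful and failed downloads.
fetched_counts = count_files(fetched_files)
failed_counts = count_files(failure_results)
```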