# Download Datasets

This script downloads datasets from Hugging Face using configuration details specified in a YAML file.

## Functionality
- **Load Configuration**: Reads dataset details from a YAML configuration file.
- **Download Dataset**: Downloads the dataset from Hugging Face when the configuration sets `platform` to `HuggingFace`.
- **Command-Line Argument**: Accepts the path to the configuration file via the `--config_path` argument (defaults to `configs/datasets_info.yaml`).
- **Dataset Information**: Reads the dataset name and local storage directory from the configuration, splits the dataset name into its user and model hub components, and saves the dataset to the specified directory.
- **Verification**: Prints the user name, model hub name, storage location, and dataset information so the download can be confirmed.
- **Platform Check**: Processes only datasets hosted on Hugging Face; any other platform is reported as unsupported with a message.
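
The steps above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names `split_dataset_name`, `load_config`, and `download_dataset` are hypothetical, the configuration is assumed to be a flat mapping, and the download is assumed to go through `snapshot_download` from the `huggingface_hub` package (the real script may use a different call).

```python
def split_dataset_name(dataset_name):
    """Split 'user_name/model_hub_name' into its two components."""
    user_name, sep, model_hub_name = dataset_name.partition("/")
    if not sep or not user_name or not model_hub_name:
        raise ValueError(
            f"Expected 'user_name/model_hub_name', got {dataset_name!r}"
        )
    return user_name, model_hub_name


def load_config(config_path):
    """Read the YAML configuration file into a dictionary (needs PyYAML)."""
    import yaml  # imported lazily so the other helpers work without PyYAML

    with open(config_path, "r", encoding="utf-8") as f:
        return yaml.safe_load(f)


def download_dataset(config):
    """Download the dataset described by `config`; return True if processed."""
    # Platform check: only Hugging Face is handled.
    if config.get("platform") != "HuggingFace":
        print(f"Unsupported platform: {config.get('platform')!r}")
        return False

    user_name, model_hub_name = split_dataset_name(config["dataset_name"])
    local_dir = config["local_dir"]

    # Verification output so the user can confirm what is being fetched.
    print(f"User name:      {user_name}")
    print(f"Model hub name: {model_hub_name}")
    print(f"Saving to:      {local_dir}")

    # Assumption: the dataset repository is fetched with snapshot_download.
    from huggingface_hub import snapshot_download

    snapshot_download(
        repo_id=config["dataset_name"],
        repo_type="dataset",
        local_dir=local_dir,
    )
    return True
```

Keeping the name splitting and the platform check separate from the network call makes both easy to exercise without downloading anything.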

## Usage
Run the script with the command:  
`python script_name.py --config_path path/to/config.yaml`
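
As an illustration of how the `--config_path` argument and its default might be wired up, here is a small `argparse` sketch. The helper name `parse_args` is hypothetical; only the flag name and the default path come from the description above.

```python
import argparse


def parse_args(argv=None):
    """Parse the command-line arguments for the download script."""
    parser = argparse.ArgumentParser(
        description="Download datasets listed in a YAML configuration file."
    )
    parser.add_argument(
        "--config_path",
        type=str,
        default="configs/datasets_info.yaml",  # documented default
        help="Path to the YAML configuration file.",
    )
    return parser.parse_args(argv)


# Passing an explicit list mimics the command line shown above.
args = parse_args(["--config_path", "path/to/config.yaml"])
print(f"Using configuration file: {args.config_path}")
```

Calling `parse_args([])` with no arguments falls back to `configs/datasets_info.yaml`.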

The configuration file should contain:
- `dataset_name`: Dataset identifier in the form `user_name/model_hub_name`.
- `local_dir`: Directory in which to save the dataset.
- `platform`: Must be set to `HuggingFace` for the script to process the dataset.
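
A configuration file with these keys might look like the fragment below. The flat layout and the placeholder values are assumptions for illustration; the actual `configs/datasets_info.yaml` may nest these keys differently.

```yaml
# Hypothetical example; replace the placeholder values with your own.
dataset_name: user_name/model_hub_name
local_dir: path/to/local_dir
platform: HuggingFace
```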