---
license: mit
tags:
- medical
- cancer
- ct-scan
- risk-prediction
- healthcare
- pytorch
- vision
datasets:
- NLST
metrics:
- auc
- c-index
language:
- en
library_name: transformers
pipeline_tag: image-classification
---

# Sybil - Lung Cancer Risk Prediction

## 🎯 Model Description

Sybil is a validated deep learning model that predicts future lung cancer risk from a single low-dose chest CT (LDCT) scan. The model was published in the Journal of Clinical Oncology and produces a risk score for each of the 1 to 6 years following the scan.

### Key Features
- **Single Scan Analysis**: Requires only one LDCT scan
- **Multi-Year Prediction**: Provides risk scores for years 1-6
- **Validated Performance**: Evaluated on data from multiple institutions (NLST, MGH, and CGMH Taiwan)
- **Ensemble Approach**: Uses 5 models for robust predictions

## 🚀 Quick Start

### Installation

```bash
pip install huggingface-hub torch torchvision pydicom
```

### Basic Usage

```python
from huggingface_hub import snapshot_download
import sys
import os

# Download model
model_path = snapshot_download(repo_id="Lab-Rasool/sybil")
sys.path.append(model_path)

# Import model
from modeling_sybil_hf import SybilHFWrapper
from configuration_sybil import SybilConfig

# Initialize
config = SybilConfig()
model = SybilHFWrapper(config)

dicom_dir = "path/to/volume"
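# Sort by filename so the slice order is deterministic (os.listdir order is arbitrary)
dicom_paths = sorted(os.path.join(dicom_dir, f) for f in os.listdir(dicom_dir) if f.lower().endswith('.dcm'))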

print(f"Found {len(dicom_paths)} DICOM files for prediction.")

# Get predictions
output = model(dicom_paths=dicom_paths)
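# detach() and cpu() make the conversion safe if the tensor tracks gradients or lives on a GPU
risk_scores = output.risk_scores.detach().cpu().numpy()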

# Display results
print("\nLung Cancer Risk Predictions:")
print(f"Risk scores shape: {risk_scores.shape}")

# Handle both single and batch predictions
if risk_scores.ndim == 2:
    # Batch predictions - take first sample
    risk_scores = risk_scores[0]

for i, score in enumerate(risk_scores):
    print(f"Year {i+1}: {float(score)}")

```
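
The loop above prints raw floats. As a minimal sketch, assuming the scores are probabilities in the range 0 to 1, the helper below renders them as percentages and accepts either a flat sequence or a batch of one. The name `format_risk_scores` is hypothetical and not part of the Sybil code; it works on plain Python sequences, so a NumPy array would need `.tolist()` first.

```python
from typing import List, Sequence, Union

Scores = Union[Sequence[float], Sequence[Sequence[float]]]


def format_risk_scores(scores: Scores) -> List[str]:
    """Return one 'Year N: x.xx%' line per score.

    Hypothetical helper: accepts a flat sequence or a batch
    (sequence of sequences), in which case the first sample is used.
    """
    scores = list(scores)
    if scores and isinstance(scores[0], (list, tuple)):
        scores = list(scores[0])
    return [f"Year {i}: {float(s) * 100:.2f}%" for i, s in enumerate(scores, start=1)]


# Illustrative values only, not real model output.
for line in format_risk_scores([[0.01, 0.02, 0.03, 0.04, 0.05, 0.06]]):
    print(line)
```

With the real model this could be called as `format_risk_scores(risk_scores.tolist())`.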

## 📊 Example with Demo Data

```python
import requests
import zipfile
from io import BytesIO
import os

# Download demo DICOM files
def get_demo_data():
    cache_dir = os.path.expanduser("~/.sybil_demo")
    demo_dir = os.path.join(cache_dir, "sybil_demo_data")

    if not os.path.exists(demo_dir):
        print("Downloading demo data...")
        url = "https://www.dropbox.com/scl/fi/covbvo6f547kak4em3cjd/sybil_example.zip?rlkey=7a13nhlc9uwga9x7pmtk1cf1c&dl=1"
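        response = requests.get(url, timeout=120)
        response.raise_for_status()  # fail early on a bad download instead of unzipping an error page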

        os.makedirs(cache_dir, exist_ok=True)
        with zipfile.ZipFile(BytesIO(response.content)) as zf:
            zf.extractall(cache_dir)

    # Find DICOM files
    dicom_files = []
    for root, dirs, files in os.walk(cache_dir):
        for file in files:
            if file.endswith('.dcm'):
                dicom_files.append(os.path.join(root, file))

    return sorted(dicom_files)

# Run demo
from huggingface_hub import snapshot_download
import sys

# Load model
model_path = snapshot_download(repo_id="Lab-Rasool/sybil")
sys.path.append(model_path)

from modeling_sybil_hf import SybilHFWrapper  # same module as in Basic Usage
from configuration_sybil import SybilConfig

# Initialize and predict
config = SybilConfig()
model = SybilHFWrapper(config)

dicom_files = get_demo_data()
output = model(dicom_paths=dicom_files)

# Show results (flatten in case the output has a leading batch dimension)
for i, score in enumerate(output.risk_scores.numpy().reshape(-1)):
    print(f"Year {i+1}: {float(score)}")
```

## 📈 Performance Metrics

| Dataset | 1-Year AUC | 6-Year AUC | Sample Size |
|---------|------------|------------|-------------|
| NLST Test | 0.94 | 0.86 | ~15,000 |
| MGH | 0.86 | 0.75 | ~12,000 |
| CGMH Taiwan | 0.94 | 0.80 | ~8,000 |
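
For readers unfamiliar with the metric, AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as half. The sketch below computes it by direct pair counting on toy data invented for illustration; it is not the evaluation code behind the table, and `pairwise_auc` is a hypothetical name.

```python
from typing import Sequence


def pairwise_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy labels and scores, invented for illustration (not Sybil output).
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
print(pairwise_auc(labels, scores))  # 3 of 4 pairs ranked correctly -> 0.75
```

The quadratic loop is fine for a small example; for large cohorts a rank-based implementation is the usual choice.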

## 🏥 Intended Use

### Primary Use Cases
- Risk stratification in lung cancer screening programs
- Research on lung cancer prediction models
- Clinical decision support (with appropriate oversight)

### Users
- Healthcare providers
- Medical researchers
- Screening program coordinators

### Out of Scope
- ❌ Diagnosis of existing cancer
- ❌ Use with non-LDCT imaging (X-rays, MRI)
- ❌ Sole basis for clinical decisions
- ❌ Use outside medical supervision

## 📋 Input Requirements

- **Format**: DICOM files from chest CT scan
- **Type**: Low-dose CT (LDCT)
- **Orientation**: Axial view
- **Order**: Anatomically ordered (abdomen → clavicles)
- **Number of slices**: Typically 100-300 slices
- **Resolution**: Automatically handled by model
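
Filename order does not always match anatomical order. As a sketch of one way to get the abdomen → clavicles ordering listed above, the helper below sorts slices by the z component of the DICOM `ImagePositionPatient` attribute, which increases toward the head in the standard patient coordinate system. This assumes every file carries that attribute; whether the wrapper also reorders slices internally is not stated here. The function names are hypothetical, and `read_z` is an optional hook for supplying positions without reading files.

```python
from typing import Callable, List, Optional


def _read_z_with_pydicom(path: str) -> float:
    """Read the slice's z position from the DICOM header only."""
    import pydicom  # imported lazily so the sorting logic has no hard dependency

    ds = pydicom.dcmread(path, stop_before_pixels=True)
    return float(ds.ImagePositionPatient[2])


def sort_slices_inferior_to_superior(
    paths: List[str],
    read_z: Optional[Callable[[str], float]] = None,
) -> List[str]:
    """Return paths sorted by ascending z (feet-side first, head-side last)."""
    reader = read_z or _read_z_with_pydicom
    return sorted(paths, key=reader)
```

Typical use would be `dicom_paths = sort_slices_inferior_to_superior(dicom_paths)` before calling the model.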

## ⚠️ Important Considerations

### Medical AI Notice
This model should **supplement, not replace**, clinical judgment. Always consider:
- Complete patient medical history
- Additional risk factors (smoking, family history)
- Current clinical guidelines
- Need for professional medical oversight

### Limitations
- Optimized for screening population (ages 55-80)
- Best performance with LDCT scans
- Not validated for pediatric use
- Performance may vary with different scanner manufacturers

## 📚 Citation

If you use this model, please cite the original paper:

```bibtex
@article{mikhael2023sybil,
  title={Sybil: a validated deep learning model to predict future lung cancer risk from a single low-dose chest computed tomography},
  author={Mikhael, Peter G and Wohlwend, Jeremy and Yala, Adam and others},
  journal={Journal of Clinical Oncology},
  volume={41},
  number={12},
  pages={2191--2200},
  year={2023},
  publisher={American Society of Clinical Oncology}
}
```

## 🙏 Acknowledgments

This Hugging Face implementation is based on the original work by:
- **Original Authors**: Peter G. Mikhael & Jeremy Wohlwend
- **Institutions**: MIT CSAIL & Massachusetts General Hospital
- **Original Repository**: [GitHub](https://github.com/reginabarzilaygroup/Sybil)
- **Paper**: [Journal of Clinical Oncology](https://doi.org/10.1200/JCO.22.01345)

## 📄 License

MIT License - See [LICENSE](LICENSE) file

- Original Model © 2022 Peter Mikhael & Jeremy Wohlwend
- HF Adaptation © 2025 Aakash Tripathi

## 🔧 Troubleshooting

### Common Issues

1. **Import Error**: Make sure the downloaded model path is appended to `sys.path` before importing
   ```python
   sys.path.append(model_path)
   ```

2. **Missing Dependencies**: Install all requirements
   ```bash
   pip install torch torchvision pydicom sybil huggingface-hub
   ```

3. **DICOM Loading Error**: Ensure DICOM files are valid CT scans
   ```python
   import pydicom
   dcm = pydicom.dcmread("your_file.dcm")  # Test single file
   ```

4. **Memory Issues**: The model requires ~4GB of GPU memory; check whether a GPU is available and fall back to CPU otherwise
   ```python
   import torch
   device = 'cuda' if torch.cuda.is_available() else 'cpu'
   ```
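
For issue 3, reading a file with `pydicom.dcmread` only confirms that it parses. A slightly stricter sketch is shown below: it inspects header attributes and reports anything that looks wrong for a CT slice. The function name and the choice of attributes are assumptions made for illustration, not checks performed by the Sybil wrapper. It accepts any object with DICOM-style attributes, so a `pydicom` dataset can be passed directly.

```python
from types import SimpleNamespace
from typing import Any, List


def ct_header_problems(ds: Any) -> List[str]:
    """Return a list of human-readable problems; empty means the header looks fine."""
    problems = []
    modality = getattr(ds, "Modality", None)
    if modality != "CT":
        problems.append(f"Modality is {modality!r}, expected 'CT'")
    # Attributes commonly needed to place a slice in a 3D volume.
    for attr in ("ImagePositionPatient", "PixelSpacing"):
        if getattr(ds, attr, None) is None:
            problems.append(f"missing {attr}")
    return problems


# Stand-in object for illustration; with real data pass pydicom.dcmread(path).
fake = SimpleNamespace(Modality="MR", ImagePositionPatient=[0.0, 0.0, 1.0])
print(ct_header_problems(fake))
```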

## 📬 Support

- **HF Model Issues**: Open an issue on this repository
- **Original Model**: [GitHub Issues](https://github.com/reginabarzilaygroup/Sybil/issues)
- **Medical Questions**: Consult healthcare professionals

## 🔍 Additional Resources

- [Original GitHub Repository](https://github.com/reginabarzilaygroup/Sybil)
- [Paper (Open Access)](https://doi.org/10.1200/JCO.22.01345)
- [NLST Dataset Information](https://cdas.cancer.gov/nlst/)
- [Demo Data](https://github.com/reginabarzilaygroup/Sybil/releases)

---

**Note**: This is a research model. Always consult qualified healthcare professionals for medical decisions.