---
license: mit
tags:
- medical
- cancer
- ct-scan
- risk-prediction
- healthcare
- pytorch
- vision
datasets:
- NLST
metrics:
- auc
- c-index
language:
- en
library_name: transformers
pipeline_tag: image-classification
---

# Sybil - Lung Cancer Risk Prediction

## 🎯 Model Description

Sybil is a validated deep learning model that predicts future lung cancer risk from a single low-dose chest CT (LDCT) scan. The model was published in the Journal of Clinical Oncology and estimates cancer risk over a 1- to 6-year horizon.

### Key Features

- **Single Scan Analysis**: Requires only one LDCT scan
- **Multi-Year Prediction**: Provides risk scores for years 1-6
- **Validated Performance**: Tested across multiple institutions globally
- **Ensemble Approach**: Uses 5 models for robust predictions

## 🚀 Quick Start

### Installation

```bash
pip install huggingface-hub torch torchvision pydicom
```

### Basic Usage

```python
from huggingface_hub import snapshot_download
import sys
import os

# Download model
model_path = snapshot_download(repo_id="Lab-Rasool/sybil")
sys.path.append(model_path)

# Import model (the modules live in the downloaded snapshot)
from modeling_sybil_hf import SybilHFWrapper
from configuration_sybil import SybilConfig

# Initialize
config = SybilConfig()
model = SybilHFWrapper(config)

# Collect the DICOM slices of one CT volume
dicom_dir = "path/to/volume"
dicom_paths = [os.path.join(dicom_dir, f) for f in os.listdir(dicom_dir) if f.endswith('.dcm')]
print(f"Found {len(dicom_paths)} DICOM files for prediction.")

# Get predictions
output = model(dicom_paths=dicom_paths)
risk_scores = output.risk_scores.numpy()

# Display results
print("\nLung Cancer Risk Predictions:")
print(f"Risk scores shape: {risk_scores.shape}")

# Handle both single and batch predictions
if risk_scores.ndim == 2:
    # Batch predictions - take first sample
    risk_scores = risk_scores[0]

for i, score in enumerate(risk_scores):
    print(f"Year {i+1}: {float(score)}")
```

## 📊 Example with Demo Data

```python
import requests  # not in the install command above: pip install requests
import zipfile
from io import BytesIO
import os

# Download demo DICOM files
def get_demo_data():
    cache_dir = os.path.expanduser("~/.sybil_demo")
    demo_dir = os.path.join(cache_dir, "sybil_demo_data")

    if not os.path.exists(demo_dir):
        print("Downloading demo data...")
        url = "https://www.dropbox.com/scl/fi/covbvo6f547kak4em3cjd/sybil_example.zip?rlkey=7a13nhlc9uwga9x7pmtk1cf1c&dl=1"
        response = requests.get(url)
        response.raise_for_status()  # Stop here if the download failed
        os.makedirs(cache_dir, exist_ok=True)
        with zipfile.ZipFile(BytesIO(response.content)) as zf:
            zf.extractall(cache_dir)

    # Find DICOM files
    dicom_files = []
    for root, dirs, files in os.walk(cache_dir):
        for file in files:
            if file.endswith('.dcm'):
                dicom_files.append(os.path.join(root, file))
    return sorted(dicom_files)

# Run demo
from huggingface_hub import snapshot_download
import sys

# Load model
model_path = snapshot_download(repo_id="Lab-Rasool/sybil")
sys.path.append(model_path)

from modeling_sybil_wrapper import SybilHFWrapper
from configuration_sybil import SybilConfig

# Initialize and predict
config = SybilConfig()
model = SybilHFWrapper(config)

dicom_files = get_demo_data()
output = model(dicom_paths=dicom_files)

# Show results (take the first sample if the output is batched)
risk_scores = output.risk_scores.numpy()
if risk_scores.ndim == 2:
    risk_scores = risk_scores[0]
for i, score in enumerate(risk_scores):
    print(f"Year {i+1}: {float(score)}")
```

## 📈 Performance Metrics

| Dataset | 1-Year AUC | 6-Year AUC | Sample Size |
|---------|------------|------------|-------------|
| NLST Test | 0.94 | 0.86 | ~15,000 |
| MGH | 0.86 | 0.75 | ~12,000 |
| CGMH Taiwan | 0.94 | 0.80 | ~8,000 |

## 🏥 Intended Use

### Primary Use Cases

- Risk stratification in lung cancer screening programs
- Research on lung cancer prediction models
- Clinical decision support (with appropriate oversight)

### Users

- Healthcare providers
- Medical researchers
- Screening program coordinators

### Out of Scope

- ❌ Diagnosis of existing cancer
- ❌ Use with non-LDCT imaging (X-rays, MRI)
- ❌ Sole basis for clinical decisions
- ❌ Use outside medical supervision

## 📋 Input Requirements

- **Format**: DICOM files from a chest CT scan
- **Type**: Low-dose CT (LDCT)
- **Orientation**: Axial view
- **Order**: Anatomically ordered (abdomen → clavicles)
- **Number of slices**: Typically 100-300 slices
- **Resolution**: Automatically handled by the model

## ⚠️ Important Considerations

### Medical AI Notice

This model should **supplement, not replace**, clinical judgment. Always consider:

- Complete patient medical history
- Additional risk factors (smoking, family history)
- Current clinical guidelines
- Need for professional medical oversight

### Limitations

- Optimized for the screening population (ages 55-80)
- Best performance with LDCT scans
- Not validated for pediatric use
- Performance may vary with different scanner manufacturers

## 📚 Citation

If you use this model, please cite the original paper:

```bibtex
@article{mikhael2023sybil,
  title={Sybil: a validated deep learning model to predict future lung cancer risk from a single low-dose chest computed tomography},
  author={Mikhael, Peter G and Wohlwend, Jeremy and Yala, Adam and others},
  journal={Journal of Clinical Oncology},
  volume={41},
  number={12},
  pages={2191--2200},
  year={2023},
  publisher={American Society of Clinical Oncology}
}
```

## 🙏 Acknowledgments

This Hugging Face implementation is based on the original work by:

- **Original Authors**: Peter G. Mikhael & Jeremy Wohlwend
- **Institutions**: MIT CSAIL & Massachusetts General Hospital
- **Original Repository**: [GitHub](https://github.com/reginabarzilaygroup/Sybil)
- **Paper**: [Journal of Clinical Oncology](https://doi.org/10.1200/JCO.22.01345)

## 📄 License

MIT License - See [LICENSE](LICENSE) file

- Original Model © 2022 Peter Mikhael & Jeremy Wohlwend
- HF Adaptation © 2025 Aakash Tripathi

## 🔧 Troubleshooting

### Common Issues

1. **Import Error**: Append the model path to `sys.path` before importing the model modules

   ```python
   sys.path.append(model_path)
   ```

2. **Missing Dependencies**: Install all requirements

   ```bash
   pip install torch torchvision pydicom sybil huggingface-hub
   ```

3. **DICOM Loading Error**: Ensure the DICOM files are valid CT scans

   ```python
   import pydicom
   dcm = pydicom.dcmread("your_file.dcm")  # Test single file
   ```

4. **Memory Issues**: The model requires ~4GB of GPU memory

   ```python
   import torch
   # Fall back to CPU when no GPU is available
   device = 'cuda' if torch.cuda.is_available() else 'cpu'
   ```

## 📬 Support

- **HF Model Issues**: Open an issue on this repository
- **Original Model**: [GitHub Issues](https://github.com/reginabarzilaygroup/Sybil/issues)
- **Medical Questions**: Consult healthcare professionals

## 🔍 Additional Resources

- [Original GitHub Repository](https://github.com/reginabarzilaygroup/Sybil)
- [Paper (Open Access)](https://doi.org/10.1200/JCO.22.01345)
- [NLST Dataset Information](https://cdas.cancer.gov/nlst/)
- [Demo Data](https://github.com/reginabarzilaygroup/Sybil/releases)

---

**Note**: This is a research model. Always consult qualified healthcare professionals for medical decisions.
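
As a companion to the Input Requirements section, here is a minimal sketch of a pre-flight check on a DICOM folder before it is passed to the model. The helper names `collect_dicom_paths` and `slice_count_is_typical` are hypothetical and not part of this repository. The sketch uses only the standard library and assumes one folder holds exactly one CT volume. The 100-300 range is the "typical" slice count quoted in that section, not a hard limit enforced by the model.

```python
from pathlib import Path

# "Typically 100-300 slices" per the Input Requirements section (assumed bounds).
TYPICAL_SLICE_RANGE = (100, 300)


def collect_dicom_paths(volume_dir):
    """Return sorted paths of all .dcm files (any letter case) under volume_dir.

    Note: sorting by path only makes the result reproducible; it does not
    guarantee the anatomical slice order.
    """
    root = Path(volume_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"Not a directory: {root}")
    paths = sorted(
        str(p) for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".dcm"
    )
    if not paths:
        raise FileNotFoundError(f"No .dcm files found under {root}")
    return paths


def slice_count_is_typical(paths, bounds=TYPICAL_SLICE_RANGE):
    """Return True when the number of slices falls inside the typical range."""
    low, high = bounds
    return low <= len(paths) <= high
```

A caller might run `paths = collect_dicom_paths("path/to/volume")` and print a warning when `slice_count_is_typical(paths)` is false. An unusual slice count does not necessarily mean the scan is unusable, only that it is worth a second look.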