BaseerAI committed on
Commit 5c8f57c · verified · 1 Parent(s): 5bf1ac4

Update README.md

Files changed (1):
  1. README.md +77 -86

README.md CHANGED
@@ -1,13 +1,15 @@
  ---
  license: mit
- language:
- - en
  library_name: pytorch
  tags:
  - computer-vision
  - autonomous-driving
- - self-driving
- - interfuser
  - carla
  - object-detection
  - trajectory-prediction
@@ -16,121 +18,114 @@ datasets:
  pipeline_tag: object-detection
  ---

- # 🚗 InterFuser-Baseer-v1: Autonomous Driving Model

  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
  [![PyTorch](https://img.shields.io/badge/PyTorch-EE4C2C?style=flat&logo=pytorch&logoColor=white)](https://pytorch.org/)
  [![CARLA](https://img.shields.io/badge/CARLA-Simulator-blue)](https://carla.org/)
- [![Demo](https://img.shields.io/badge/🚀-Live%20Demo-brightgreen)](https://huggingface.co/spaces/BaseerAI/Baseer_Server)

- > **🎮 [Try the Live Demo](https://huggingface.co/spaces/BaseerAI/Baseer_Server)** - Experience the model in action with real-time autonomous driving simulation!

- ## 📖 Overview

- InterFuser-Baseer-v1 is a state-of-the-art transformer-based model for autonomous driving, specifically fine-tuned for the **[Baseer Self-Driving API](https://huggingface.co/spaces/BaseerAI/Baseer_Server)**. This model combines computer vision and deep learning to provide real-time traffic object detection and trajectory planning in simulated driving environments.

- ### 🎯 Key Capabilities

- - **Multi-Task Learning**: Simultaneous traffic object detection and waypoint prediction
- - **Transformer Architecture**: Advanced attention mechanisms for scene understanding
- - **Real-Time Processing**: Optimized for real-time inference in driving scenarios
- - **CARLA Integration**: Specifically tuned for the CARLA simulation environment

- ## 🏗️ Architecture

- ### Model Components

- | Component | Specification |
- |-----------|---------------|
- | **Image Backbone** | ResNet-50 (ImageNet pretrained) |
- | **LiDAR Backbone** | ResNet-18 (disabled in this version) |
- | **Transformer** | 6-layer encoder/decoder, 8 attention heads |
- | **Embedding Dimension** | 256 |
- | **Prediction Heads** | GRU-based waypoint predictor + detection head |

- ### Output Format

- - **Traffic Detection**: 20×20×7 grid (confidence, position, dimensions, orientation)
- - **Waypoint Prediction**: 10 future trajectory points
- - **Scene Understanding**: Junction, traffic light, and stop sign detection

- ## 🚀 Quick Start

- ### Installation

- ```bash
- pip install torch torchvision timm huggingface_hub
- ```

- ### Usage Example

  ```python
  import torch
  from huggingface_hub import hf_hub_download

- # Download model weights
  model_path = hf_hub_download(
      repo_id="BaseerAI/Interfuser-Baseer-v1",
-     filename="best_model.pth"
  )

- # Load model (requires InterFuser class definition)
  device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
- model = torch.load(model_path, map_location=device)
  model.eval()

- # Inference
  with torch.no_grad():
-     outputs = model(input_data)
  ```

- ## 📊 Performance
-
- ### Training Details
-
- - **Dataset**: PDM-Lite-CARLA (urban driving scenarios)
- - **Training Objective**: Multi-task learning with IoU optimization
- - **Framework**: PyTorch
-
- ### Key Metrics

- - Optimized for traffic detection accuracy
- - Enhanced bounding box IoU performance
- - Robust waypoint prediction in urban scenarios

- ## ⚠️ Limitations

- ### Current Constraints

- - **Simulation Only**: Trained exclusively on CARLA data
- - **Single Camera**: Front-facing camera view only
- - **No LiDAR**: Vision-based approach without LiDAR fusion
- - **Dataset Scope**: Limited to PDM-Lite-CARLA scenarios

- ### Recommended Use Cases

- - ✅ CARLA simulation environments
- - ✅ Research and development
- - ✅ Autonomous driving prototyping
- - ❌ Real-world deployment (requires additional training)

- ## 🛠️ Integration
-
- This model is designed to work with:
-
- - **[Baseer Self-Driving API](https://huggingface.co/spaces/BaseerAI/Baseer_Server)** - Live demo and API
- - **CARLA Simulator**
- - **PyTorch Inference Pipeline**
- - **Custom Autonomous Driving Systems**

  ## 📚 Citation

- If you use this model in your research, please cite:

  ```bibtex
- @misc{interfuser-baseer-v1,
-   title={InterFuser-Baseer-v1: Fine-tuned Autonomous Driving Model},
-   author={BaseerAI},
    year={2024},
    publisher={Hugging Face},
    howpublished={\url{https://huggingface.co/BaseerAI/Interfuser-Baseer-v1}}
@@ -139,27 +134,23 @@ If you use this model in your research, please cite:

  ## 👨‍💻 Development

- **Developed by**: Adam Altawil
  **Project Type**: Graduation Project - AI & Autonomous Driving
- **Institution**: [Your Institution Name]

  ## 📄 License

  This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

- ## 🤝 Contributing
-
- Contributions, issues, and feature requests are welcome! Feel free to check the [issues page](../../issues).
-
- ## 📞 Support

- For questions and support:
- - Try the live demo: **[Baseer Server Space](https://huggingface.co/spaces/BaseerAI/Baseer_Server)**
- - Create an issue in this repository
- - Contact: [Your Contact Information]

  ---

  <div align="center">
- <strong>🚗 Drive the Future with AI 🚗</strong>
  </div>

  ---
  license: mit
+ language: en
  library_name: pytorch
  tags:
  - computer-vision
  - autonomous-driving
+ - self-driving-car
+ - end-to-end
+ - transformer
+ - attention
+ - positional-encoding
  - carla
  - object-detection
  - trajectory-prediction

  pipeline_tag: object-detection
  ---

+ # HDPE: A Foundational Perception Model with Hyper-Dimensional Positional Encoding

  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
  [![PyTorch](https://img.shields.io/badge/PyTorch-EE4C2C?style=flat&logo=pytorch&logoColor=white)](https://pytorch.org/)
  [![CARLA](https://img.shields.io/badge/CARLA-Simulator-blue)](https://carla.org/)
+ [![Demo](https://img.shields.io/badge/🚀-Live%20Demo-brightgreen)](https://huggingface.co/spaces/Adam-IT/Baseer_Server)

+ **📖 Research Paper (Coming Soon)** | **🚀 [Live Demo API (Powered by this Model)](https://huggingface.co/spaces/BaseerAI/Baseer_Server)**

+ ---

+ ## 📖 Overview: A New Foundation for Perception in Autonomous Driving

+ This repository contains the pre-trained weights for a novel autonomous driving perception model, the core of our **Interfuser-HDPE** system. This is **not a standard Interfuser model**: it incorporates fundamental innovations in its architecture and learning framework to achieve a more robust, accurate, and geometrically aware understanding of driving scenes from camera-only inputs.

+ The innovations baked into these weights make this model a powerful foundation for building complete self-driving systems. It is designed to output rich perception data (object-detection grids and waypoints) that can be consumed by downstream modules such as trackers and controllers.

+ ---

+ ## 💡 Key Innovations in This Model

+ The weights in this repository are the result of training a model with the following scientific contributions:

+ ### 1. Hyper-Dimensional Positional Encoding (HDPE) - Core Contribution
+ * **What it is:** We replace standard sinusoidal positional encoding with **HDPE**, a novel, first-principles approach inspired by the geometric properties of n-dimensional spaces.
+ * **Why it matters:** HDPE generates an interpretable spatial prior that biases the model's attention towards the center of the image (the road ahead). This leads to more stable, contextually aware feature extraction and has been shown to improve performance significantly, especially in multi-camera fusion scenarios. A rough sketch of the idea follows below.
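+
+ The exact HDPE formulation will be published with the paper. As a hedged illustration of the kind of mechanism described above, the sketch below adds a learned projection of a radial center prior to the backbone's patch tokens; the module name, the Gaussian form of the prior, and its bandwidth are illustrative assumptions, not the released implementation.
+
+ ```python
+ import torch
+ import torch.nn as nn
+
+ class CenterBiasedPositionalEncoding(nn.Module):
+     """Illustrative stand-in for HDPE (assumed form, not the paper's):
+     projects a radial center prior into the embedding space so tokens
+     near the image center (the road ahead) are emphasized."""
+
+     def __init__(self, d_model: int, height: int, width: int):
+         super().__init__()
+         ys = torch.linspace(-1.0, 1.0, height)
+         xs = torch.linspace(-1.0, 1.0, width)
+         yy, xx = torch.meshgrid(ys, xs, indexing="ij")
+         radius = torch.sqrt(xx ** 2 + yy ** 2)          # distance from center
+         # Gaussian prior peaking at the image center (assumed bandwidth).
+         self.register_buffer("center_prior", torch.exp(-(radius ** 2) / 0.5))
+         self.proj = nn.Linear(1, d_model)               # learned projection
+
+     def forward(self, tokens: torch.Tensor) -> torch.Tensor:
+         # tokens: (B, H*W, d_model) patch features from the backbone
+         prior = self.center_prior.flatten().unsqueeze(-1)    # (H*W, 1)
+         return tokens + self.proj(prior).unsqueeze(0)        # broadcast over B
+ ```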

+ ### 2. Advanced Multi-Task Loss Framework
+ * **What it is:** This model was trained using a specialized combination of **Focal Loss** and **Enhanced-IoU (EIoU) Loss**.
+ * **Why it matters:** This framework is purpose-built to tackle the primary challenges in perception: **Focal Loss** addresses the severe class imbalance in object detection, while **EIoU Loss** ensures highly accurate bounding box regression by optimizing for geometric overlap. A minimal sketch of both terms appears after this section.
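+
+ Both ingredients are standard in the detection literature; the sketch below shows their common formulations (Focal Loss, Lin et al. 2017; EIoU, Zhang et al. 2021). The per-task weighting used during training is not published, so treat this as a reference for the loss terms, not the full training recipe.
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+
+ def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
+     """Binary focal loss: down-weights easy examples so the rare
+     positive cells of the detection grid dominate the gradient."""
+     ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
+     p = torch.sigmoid(logits)
+     p_t = p * targets + (1 - p) * (1 - targets)        # prob. of true class
+     alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
+     return (alpha_t * (1 - p_t) ** gamma * ce).mean()
+
+ def eiou_loss(pred, target, eps=1e-7):
+     """EIoU loss on (x1, y1, x2, y2) boxes: 1 - IoU plus penalties on
+     center distance and width/height gaps, normalized by the enclosing box."""
+     lt = torch.max(pred[..., :2], target[..., :2])
+     rb = torch.min(pred[..., 2:], target[..., 2:])
+     wh = (rb - lt).clamp(min=0)
+     inter = wh[..., 0] * wh[..., 1]
+     wp, hp = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
+     wt, ht = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
+     iou = inter / (wp * hp + wt * ht - inter + eps)
+     enc = torch.max(pred[..., 2:], target[..., 2:]) - torch.min(pred[..., :2], target[..., :2])
+     # Squared center distance: centers are the average of the two corners.
+     rho2 = ((((pred[..., :2] + pred[..., 2:]) - (target[..., :2] + target[..., 2:])) ** 2) / 4).sum(-1)
+     return (1 - iou
+             + rho2 / (enc ** 2).sum(-1).clamp(min=eps)
+             + (wp - wt) ** 2 / (enc[..., 0] ** 2 + eps)
+             + (hp - ht) ** 2 / (enc[..., 1] ** 2 + eps)).mean()
+ ```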

+ ### 3. High-Resolution, Camera-Only Architecture
+ * **What it is:** This model is vision-based (**camera-only**) and uses a **ResNet-50** backbone with a smaller patch size (`patch_size=8`) for high-resolution analysis.
+ * **Why it matters:** It demonstrates that strong perception performance can be achieved without costly sensors like LiDAR, aligning with modern, cost-effective approaches to autonomous driving.

+ ---

+ ## 🏗️ Model Architecture vs. Baseline
+
+ | Component | Original Interfuser (Baseline) | **Interfuser-HDPE (This Model)** |
+ |:------------------------|:--------------------------------|:-----------------------------------|
+ | **Positional Encoding** | Sinusoidal PE | ✅ **Hyper-Dimensional PE (HDPE)** |
+ | **Perception Backbone** | ResNet-26, LiDAR | ✅ **Camera-Only, ResNet-50** |
+ | **Training Objective** | Standard BCE + L1 Loss | ✅ **Focal Loss + EIoU Loss** |
+ | **Model Outputs** | Waypoints, Traffic Grid, States | Same (optimized for higher accuracy) |
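+
+ For reference, the previous model card describes the detection output as a 20×20×7 traffic grid (confidence, position, dimensions, orientation) plus 10 predicted waypoints. The decoder below is a hedged sketch of turning such a grid into per-cell detections; the channel order and the meaning of the seventh channel are assumptions to verify against the training code.
+
+ ```python
+ import torch
+
+ def decode_traffic_grid(grid: torch.Tensor, conf_thresh: float = 0.5):
+     """Sketch: grid is (20, 20, 7); assumed channel layout is
+     [confidence, dx, dy, w, h, yaw, extra] (unverified)."""
+     conf = torch.sigmoid(grid[..., 0])
+     detections = []
+     for i, j in torch.nonzero(conf > conf_thresh, as_tuple=False):
+         cell = grid[i, j]
+         detections.append({
+             "cell": (int(i), int(j)),        # grid coordinates
+             "confidence": float(conf[i, j]),
+             "offset": cell[1:3].tolist(),    # position within the cell
+             "size": cell[3:5].tolist(),      # object dimensions
+             "yaw": float(cell[5]),           # orientation
+         })
+     return detections
+ ```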
+
+ ---
+
+ ## 🚀 How to Use These Weights

+ These weights are intended to be loaded into a model class that incorporates our architectural changes, primarily the `HyperDimensionalPositionalEncoding` module.

  ```python
  import torch
  from huggingface_hub import hf_hub_download
+ # You need to provide the model class definition (here called InterfuserHDPE)
+ from your_model_definition_file import InterfuserHDPE

+ # Download the pre-trained model weights
  model_path = hf_hub_download(
      repo_id="BaseerAI/Interfuser-Baseer-v1",
+     filename="interfuser_hdpe_v1.pth"
  )

+ # Instantiate your model architecture;
+ # the config must match the architecture these weights were trained with
  device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+ model = InterfuserHDPE(**model_config).to(device)
+
+ # Load the state dictionary
+ state_dict = torch.load(model_path, map_location=device)
+ model.load_state_dict(state_dict)
  model.eval()

+ # Now the model is ready for inference
  with torch.no_grad():
+     # The model expects a dictionary of sensor data,
+     # e.g., {'rgb': camera_tensor, ...}
+     perception_outputs = model(input_data)
  ```
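+
+ The snippet above leaves `model_config` and `input_data` undefined. The values below are illustrative assumptions only: the backbone and patch size echo this card, the embedding size comes from the v1 card, and the exact keys must match your `InterfuserHDPE` definition.
+
+ ```python
+ # Hypothetical config; field names depend on your model definition.
+ model_config = {
+     "backbone": "resnet50",   # per the model card
+     "patch_size": 8,          # per the model card
+     "embed_dim": 256,         # v1 card value; verify for this checkpoint
+ }
+
+ # Dummy single front-camera frame; key name and resolution are assumptions.
+ input_data = {"rgb": torch.rand(1, 3, 224, 224, device=device)}
+ ```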

+ ## 📊 Performance Highlights

+ When integrated into a full driving stack (like our **[Baseer Self-Driving API](https://huggingface.co/spaces/BaseerAI/Baseer_Server)**), this perception model is the foundation for:

+ - **Significantly Improved Detection Accuracy**: Achieves higher mAP on the PDM-Lite-CARLA dataset.
+ - **Superior Driving Score**: Leads to a higher overall Driving Score with fewer infractions compared to baseline models.
+ - **Proven Scalability**: Performance demonstrably improves when scaling from single-camera to multi-camera inputs, showcasing the robustness of the HDPE-based architecture.

+ *(Detailed metrics and ablation studies will be available in our upcoming research paper.)*

+ ## 🛠️ Integration with a Full System

+ This model provides the core perception outputs. To build a complete autonomous agent, you need to combine it with:

+ - **A Temporal Tracker**: To maintain object identity across frames.
+ - **A Decision-Making Controller**: To translate perception outputs into vehicle commands.

+ An example of such a complete system, including our custom-built **Hierarchical, Memory-Enhanced Controller**, can be found in our **[Live Demo API Space](https://huggingface.co/spaces/BaseerAI/Baseer_Server)**; a rough sketch of how the pieces connect follows below.
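+
+ The glue code below sketches that loop under stated assumptions: `SimpleIoUTracker`, `WaypointFollower`, `camera_stream()`, and the output keys are hypothetical placeholders, not the Baseer implementation; only `vehicle.apply_control(...)` mirrors the actual CARLA vehicle API.
+
+ ```python
+ # Hypothetical agent loop wiring perception -> tracker -> controller.
+ tracker = SimpleIoUTracker()       # placeholder: keeps object IDs across frames
+ controller = WaypointFollower()    # placeholder: waypoints -> steer/throttle
+
+ with torch.no_grad():
+     for frame in camera_stream():                    # your sensor source
+         outputs = model({"rgb": frame})
+         tracks = tracker.update(outputs["traffic_grid"])          # assumed key
+         control = controller.step(outputs["waypoints"], tracks)   # assumed key
+         vehicle.apply_control(control)               # CARLA vehicle API
+ ```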
 
 
 
 
 
 
 
  ## 📚 Citation

+ If you use the HDPE concept or this model in your research, please cite our upcoming paper. For now, you can cite this model repository:

  ```bibtex
+ @misc{interfuser-hdpe-2024,
+   title={HDPE: Hyper-Dimensional Positional Encoding for End-to-End Self-Driving Systems},
+   author={Altawil, Adam},
    year={2024},
    publisher={Hugging Face},
    howpublished={\url{https://huggingface.co/BaseerAI/Interfuser-Baseer-v1}}
  }
  ```
 
  ## 👨‍💻 Development

+ **Lead Researcher**: Adam Altawil
  **Project Type**: Graduation Project - AI & Autonomous Driving
+ **Contact**: [Your Contact Information]

  ## 📄 License

  This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

+ ## 🤝 Contributing & Support

+ For questions, contributions, and support:
+ - **🚀 Try the Live Demo**: **[Baseer Server Space](https://huggingface.co/spaces/BaseerAI/Baseer_Server)**
+ - **📧 Contact**: [Your Contact Information]
+ - **🐛 Issues**: Create an issue in this repository

  ---

  <div align="center">
+ <strong>🚗 Driving the Future with Hyper-Dimensional Intelligence 🚗</strong>
  </div>