Update my_model/tabs/model_arch.py
my_model/tabs/model_arch.py
CHANGED
@@ -25,6 +25,7 @@ def run_model_arch() -> None:
 st.markdown("#### Abstract")
 st.markdown("""
 <div style="text-align: justify;">
+
 Navigating the frontier of the Visual Turing Test, this research delves into multimodal learning to bridge
 the gap between visual perception and linguistic interpretation, a foundational challenge in artificial
 intelligence. It scrutinizes the integration of visual cognition and external knowledge, emphasizing the
@@ -60,6 +61,7 @@ def run_model_arch() -> None:
 st.markdown("#### Design")
 st.markdown("""
 <div style="text-align: justify;">
+
 As illustrated in architecture, the model operates through a sequential pipeline, beginning with the Image to
 Language Transformation Module. In this module, the image undergoes simultaneous processing via image captioning
 and object detection frozen models, aiming to comprehensively capture the visual context and cues. These models,