Space: m7mdal7aj/KB-VQA

m7mdal7aj committed on May 18, 2024

Commit 498fce8 (verified) · 1 parent: 189bdf9

Update my_model/tabs/model_arch.py

Files changed (1)

my_model/tabs/model_arch.py CHANGED

@@ -25,6 +25,7 @@ def run_model_arch() -> None:
         st.markdown("#### Abstract")
         st.markdown("""
         <div style="text-align: justify;">
         Navigating the frontier of the Visual Turing Test, this research delves into multimodal learning to bridge
         the gap between visual perception and linguistic interpretation, a foundational challenge in artificial
         intelligence. It scrutinizes the integration of visual cognition and external knowledge, emphasizing the
@@ -60,6 +61,7 @@ def run_model_arch() -> None:
         st.markdown("#### Design")
         st.markdown("""
         <div style="text-align: justify;">
         As illustrated in architecture, the model operates through a sequential pipeline, beginning with the Image to
         Language Transformation Module. In this module, the image undergoes simultaneous processing via image captioning
         and object detection frozen models, aiming to comprehensively capture the visual context and cues. These models,
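The Design text in this hunk describes the first stage of the pipeline: the image is processed simultaneously by a frozen image-captioning model and a frozen object-detection model, and their outputs are turned into language. A minimal sketch of that stage follows, assuming stand-in functions in place of real models. The names `caption_image`, `detect_objects`, `Detection`, and `image_to_language`, the confidence threshold, and the way the caption and detections are merged into one string are all hypothetical illustrations, not the KB-VQA implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Detection:
    """One object reported by the (hypothetical) detector."""
    label: str
    score: float


def caption_image(image: bytes) -> str:
    # Hypothetical stand-in for a frozen captioning model (weights never updated).
    return "a dog sitting on a red sofa"


def detect_objects(image: bytes) -> List[Detection]:
    # Hypothetical stand-in for a frozen object detector.
    return [Detection("dog", 0.97), Detection("sofa", 0.88), Detection("cup", 0.31)]


def image_to_language(image: bytes, min_score: float = 0.5) -> str:
    """Run captioning and detection concurrently, then merge both into text."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        caption_future = pool.submit(caption_image, image)
        detect_future = pool.submit(detect_objects, image)
        caption = caption_future.result()
        detections = detect_future.result()

    # Keep only confident detections (the threshold is an illustrative assumption).
    kept = [d for d in detections if d.score >= min_score]
    objects = ", ".join(f"{d.label} ({d.score:.2f})" for d in kept)
    return f"Caption: {caption}\nObjects: {objects}"


if __name__ == "__main__":
    print(image_to_language(b"raw image bytes"))
```

Running the script prints a caption line followed by the objects that pass the confidence filter. Threads are used only to mirror the "simultaneous" wording; because the two frozen models are independent, running them one after the other would yield the same text.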
