add documentation
- README.md +4 -2
- streamlit_app.py +3 -2
README.md
CHANGED
@@ -12,13 +12,15 @@ license: apache-2.0
 
 # DocumentIQA: Scientific Document Insight QA
 
+**Work in progress** :construction_worker:
+
 ## Introduction
 
 Question/Answering on scientific documents using LLMs (OpenAI, Mistral, ~~LLama2,~~ etc..).
 This application is the frontend for testing the RAG (Retrieval Augmented Generation) on scientific documents, that we are developing at NIMS.
-Differently to most of the project, we focus on scientific articles
+Differently to most of the project, we focus on scientific articles. We target only the full-text using [Grobid](https://github.com/kermitt2/grobid) that provide and cleaner results than the raw PDF2Text converter (which is comparable with most of other solutions).
 
-**
+**NER in LLM response**: The responses from the LLMs are post-processed to extract <span stype="color:yellow">physical quantities, measurements</span> and <span stype="color:blue">materials</span> mentions.
 
 **Demos**:
 - (on HuggingFace spaces): https://lfoppiano-document-qa.hf.space/
streamlit_app.py
CHANGED
@@ -177,6 +177,7 @@ with st.sidebar:
     st.markdown(
         """After entering your API Key (Open AI or Huggingface). Upload a scientific article as PDF document. You will see a spinner or loading indicator while the processing is in progress. Once the spinner stops, you can proceed to ask your questions.""")
 
+    st.markdown('**NER on LLM responses**: The responses from the LLMs are post-processed to extract <span style="color:orange">physical quantities, measurements</span> and <span style="color:green">materials</span> mentions.', unsafe_allow_html=True)
     if st.session_state['git_rev'] != "unknown":
         st.markdown("**Revision number**: [" + st.session_state[
             'git_rev'] + "](https://github.com/lfoppiano/document-qa/commit/" + st.session_state['git_rev'] + ")")
@@ -231,8 +232,8 @@ if st.session_state.loaded_embeddings and question and len(question) > 0 and st.
     # for entity in entities:
     # entity
     decorated_text = decorate_text_with_annotations(text_response.strip(), entities)
-    decorated_text = decorated_text.replace('class="label material"', 'style="color:
-    decorated_text = re.sub(r'class="label[^"]+"', 'style="color:
+    decorated_text = decorated_text.replace('class="label material"', 'style="color:green"')
+    decorated_text = re.sub(r'class="label[^"]+"', 'style="color:orange"', decorated_text)
     st.markdown(decorated_text, unsafe_allow_html=True)
     text_response = decorated_text
 else:
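
Note on the recolouring step in streamlit_app.py: the order of the two substitutions appears to matter, since the catch-all pattern `class="label[^"]+"` would also match `class="label material"` if it ran first. The sketch below reproduces only those two calls in isolation. The helper name `recolor_annotations`, the sample markup and the `quantity` class name are assumptions made for illustration; the actual output of `decorate_text_with_annotations` is not shown in this commit.

```python
import re


def recolor_annotations(decorated_text: str) -> str:
    """Swap annotation CSS classes for inline colours (hypothetical helper)."""
    # Handle the specific "material" class first so it ends up green.
    recolored = decorated_text.replace('class="label material"', 'style="color:green"')
    # Any label class that is still present (quantities, measurements, ...)
    # is caught by the generic pattern and becomes orange.
    return re.sub(r'class="label[^"]+"', 'style="color:orange"', recolored)


# Assumed shape of the annotated text; the real markup may differ.
sample = (
    '<span class="label material">MgB2</span> superconducts below '
    '<span class="label quantity">39 K</span>'
)
result = recolor_annotations(sample)
print(result)
```

Running the generic `re.sub` before the `replace` would colour the material span orange as well, which is presumably why the commit applies the specific replacement first.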