Commit 3b38831 by SorrelC · verified · 1 Parent(s): ff983da

Update README.md

Files changed (1)
  1. README.md +44 -0
README.md CHANGED
@@ -9,5 +9,49 @@ app_file: app.py
  pinned: false
  license: mit
  ---
+ # Named Entity Recognition (NER) Explorer Tool
+
+ ## Background
+ This is a web-based, interactive tool for exploring Named Entity Recognition (NER) in practice. It grew out of the *Extracting Keywords from Crowdsourced Collections* project, funded by Digital Scholarship at Oxford (DiSc).
+
+ ## Overview
+ This NER Explorer Tool is an educational and exploratory interface that lets users 'play' with different NER models and approaches. It was created to make this Natural Language Processing (NLP) technique more accessible to Digital Humanities (DH) and Galleries, Libraries, Archives and Museums (GLAM) professionals, volunteers and researchers, who might otherwise lack the means or opportunity to explore what they can do with NER.
+
+ ## Why this tool?
+ During our short exploratory research project on keyword extraction from crowdsourced collections, we found that NER has real potential for enhancing search and discovery in digital archives while allowing records to 'speak for themselves'.
+
+ It can be difficult to know where to start when selecting NER models, as they work differently and can be used to find different things. So here we've provided access to the models that performed best among those we tested on a small sample, while trying to be clear that no model is perfect.
+
+ We also wanted to raise awareness of zero-shot NER models (e.g. GLiNER), which can be more flexible than models with pre-defined entity types (e.g. spaCy), and to show how the two kinds of model can be used together.
+
+ ## Models included in the Explorer tool:
+ - `spacy_en_core_web_trf` - spaCy's transformer-based model
+ - `flair_ner-large` - Flair's large English NER model
+ - `flair_ner-ontonotes-large` - Flair's OntoNotes-based model
+ - `gliner_knowledgator/modern-gliner-bi-large-v1.0` - Modern zero-shot GLiNER model
+
+ ## Key features:
+ - **Highlighted Text**: See entities highlighted directly in your text with color-coded labels
+ - **Split-Color Highlighting**: Entities identified by both a common NER model AND a custom GLiNER search are shown with distinctive split-color highlighting (marked with 🤝)
+ - **Detailed Tables**: Examine all identified entities with confidence scores and source attribution
+ - **Adjustable Confidence Threshold**: Control how confident a model must be before an entity is reported (0.1-0.9)
+
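The threshold and split-color features listed above could be approximated along the following lines. This is a minimal sketch and not the tool's actual implementation: the `Entity` class, both helper functions, the rule that two spans "agree" only when their character offsets match exactly, and the sample predictions are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """One predicted entity span (hypothetical structure, not the tool's own)."""
    text: str
    label: str
    start: int    # character offset where the span begins
    end: int      # character offset just past the end of the span
    score: float  # model confidence between 0 and 1
    source: str   # which model produced the span, e.g. "flair" or "gliner"


def filter_by_threshold(entities, threshold=0.5):
    """Keep only entities whose confidence is at least `threshold`."""
    return [e for e in entities if e.score >= threshold]


def spans_found_by_both(standard, custom):
    """Return the (start, end) spans that appear in both lists.

    Assumption: spans count as the same entity only when their offsets
    match exactly; partially overlapping spans are ignored.
    """
    standard_spans = {(e.start, e.end) for e in standard}
    custom_spans = {(e.start, e.end) for e in custom}
    return sorted(standard_spans & custom_spans)


# Made-up predictions for illustration only; not real model output.
text = "Ada Lovelace visited Oxford."
standard = [
    Entity("Ada Lovelace", "PER", 0, 12, 0.98, "flair"),
    Entity("Oxford", "LOC", 21, 27, 0.35, "flair"),
]
custom = [
    Entity("Ada Lovelace", "mathematician", 0, 12, 0.81, "gliner"),
]

# Apply the same threshold to every source, then look for agreement.
kept = filter_by_threshold(standard + custom, threshold=0.5)
shared = spans_found_by_both(
    filter_by_threshold(standard, threshold=0.5),
    filter_by_threshold(custom, threshold=0.5),
)
```

With these sample values, the low-confidence "Oxford" span is dropped by the threshold, and the "Ada Lovelace" span is the only one flagged as found by both sources, so it would be the candidate for split-color highlighting. Raising or lowering `threshold` changes which spans survive, which mirrors the effect of the slider described in the feature list.
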
+ ## Important
+ Please note this tool is designed for exploratory and educational purposes.
+ It is not designed or recommended for production use with very long texts, large collections or sensitive materials. If you use these NER models in other environments for such cases, additional testing, validation and ethical review are strongly recommended.
+
+
+ If you have any questions about this tool, please email: catherine.conisbee@bodleian.ox.ac.uk
+ See also: [main project repository](https://github.com/Digital-Scholarship-Oxford/crowdsourced-data-tools)
+
+
+
+
+
+
+
+
+
+
 
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference