# NLP Models for Stock‑Alchemist

## Overview
This module contains NLP models used to filter and rank financial news headlines before scraping full articles.  
We leverage sentiment analysis to decide which news items are likely to impact stock prices.

## Objectives
- Quickly classify news headlines as **NEGATIVE**, **NEUTRAL**, or **POSITIVE**.
- Measure per‑headline processing time (in milliseconds).
- Compare multiple NLP models (e.g., FinBERT, custom transformers, lightweight classifiers).
- Provide a flexible base for benchmarking and research.

## Directory Structure
- `services/`
  - `pre_sentiment.py` — implements `FinBertSentimentAnalyzer` and helpers.
- `models/`  
  - Placeholder for future model definitions and training scripts.
- `README.md` — this file.

## Dataset
We collect headlines via Alpaca’s news WebSocket API (a minimal connection sketch follows this list). Each headline is:
- A text string (≤512 tokens).
- Timestamped.  
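
The sketch below shows one way such a connection could look, assuming the `alpaca-py` SDK; the credentials are placeholders, and the `headline`/`created_at` attribute names come from that SDK's `News` model, not from this repository's code.

```python
# Hedged sketch of headline collection over Alpaca's news WebSocket,
# assuming the alpaca-py SDK (pip install alpaca-py).
from alpaca.data.live.news import NewsDataStream

API_KEY = "your-api-key"        # placeholder
SECRET_KEY = "your-secret-key"  # placeholder

stream = NewsDataStream(API_KEY, SECRET_KEY)

async def on_news(item):
    # Each item carries a headline string and a timestamp,
    # matching the dataset description above.
    print(item.created_at, item.headline)

stream.subscribe_news(on_news, "*")  # "*" subscribes to all symbols
stream.run()  # blocking; drives the WebSocket event loop
```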

## Input & Output
- **Input**: List of news headlines (strings).  
- **Output**:
  1. `sentiment`: **NEGATIVE** | **NEUTRAL** | **POSITIVE**  
  2. `processing_time_ms`: float (per-headline latency; see the timing sketch below)
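
A minimal way to populate `processing_time_ms` is to wrap the module's own helper (see the usage example below) with a wall-clock timer; this sketch assumes only the standard library.

```python
# Measure per-headline latency in milliseconds around the module's helper.
import time

from news_scraper.services.pre_sentiment import analyze_sentiment

def timed_sentiment(text: str) -> tuple[str, float]:
    start = time.perf_counter()
    sentiment = analyze_sentiment(text)
    processing_time_ms = (time.perf_counter() - start) * 1000.0
    return sentiment, processing_time_ms

label, ms = timed_sentiment("Company X beats revenue estimates this quarter.")
print(label, f"{ms:.1f} ms")
```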

## Classes and Helpers
- **FinBertSentimentAnalyzer** (`services/pre_sentiment.py`); a minimal sketch of this interface appears after this list.
  - `__init__()`: loads the `yiyanghkust/finbert-tone` model and tokenizer and moves them to GPU if available, otherwise CPU.
  - `predict_sentiment(text: str) → str`
  - `batch_predict_sentiment(texts: List[str]) → List[str]`
- **Helper functions**:
  - `analyze_sentiment(text: str) → str`
  - `analyze_sentiments(texts: List[str]) → List[str]`
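
The sketch below illustrates this interface. It is an illustrative reimplementation assuming `torch` and `transformers`, not a copy of `services/pre_sentiment.py`; the label order follows the mapping published on the `yiyanghkust/finbert-tone` model card (0 = neutral, 1 = positive, 2 = negative).

```python
from typing import List

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

class FinBertSentimentAnalyzer:
    # Label order per the yiyanghkust/finbert-tone model card.
    LABELS = ["NEUTRAL", "POSITIVE", "NEGATIVE"]

    def __init__(self) -> None:
        # Load the model/tokenizer once and pick GPU when available.
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.tokenizer = AutoTokenizer.from_pretrained("yiyanghkust/finbert-tone")
        self.model = AutoModelForSequenceClassification.from_pretrained(
            "yiyanghkust/finbert-tone"
        ).to(self.device)
        self.model.eval()

    @torch.no_grad()
    def predict_sentiment(self, text: str) -> str:
        return self.batch_predict_sentiment([text])[0]

    @torch.no_grad()
    def batch_predict_sentiment(self, texts: List[str]) -> List[str]:
        # Truncate to the 512-token limit noted in the Dataset section.
        inputs = self.tokenizer(
            texts, truncation=True, max_length=512, padding=True, return_tensors="pt"
        ).to(self.device)
        logits = self.model(**inputs).logits
        return [self.LABELS[i] for i in logits.argmax(dim=-1).tolist()]
```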

## Usage Example
```python
from news_scraper.services.pre_sentiment import analyze_sentiment, analyze_sentiments

headline = "Company X beats revenue estimates this quarter."
print(analyze_sentiment(headline))
# → POSITIVE

batch = ["Stock plunge expected.", "Steady growth reported."]
print(analyze_sentiments(batch))
# → ["NEGATIVE", "NEUTRAL"]
```

## Adding New Models
1. Create a new service under `services/` (e.g., `my_model.py`).
2. Implement a class with the same interface (a minimal skeleton is sketched after this list):  
   - `predict_sentiment(text: str) → str`  
   - `batch_predict_sentiment(texts: List[str]) → List[str]`
3. Update this README to document your new class.
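
As a starting point, here is a hypothetical skeleton for such a service; the class name and keyword sets are placeholders, and the toy logic exists only to demonstrate the required interface.

```python
# services/my_model.py (hypothetical): a toy keyword classifier that
# satisfies the interface expected by this module.
from typing import List

class KeywordSentimentAnalyzer:
    NEGATIVE_WORDS = {"plunge", "miss", "loss", "downgrade"}
    POSITIVE_WORDS = {"beats", "growth", "upgrade", "record"}

    def predict_sentiment(self, text: str) -> str:
        tokens = set(text.lower().split())
        if tokens & self.NEGATIVE_WORDS:
            return "NEGATIVE"
        if tokens & self.POSITIVE_WORDS:
            return "POSITIVE"
        return "NEUTRAL"

    def batch_predict_sentiment(self, texts: List[str]) -> List[str]:
        return [self.predict_sentiment(t) for t in texts]
```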

## Future Work
- Benchmark latency and accuracy across models.
- Add more advanced metrics (e.g., F1‑score against human‑labeled data).
- Experiment with distillation for faster inference.


# Financial News Sentiment Analysis Models
Several open-source models on Hugging Face are tailored to financial text or can be adapted for sentiment analysis. These typically output three classes (positive/neutral/negative) or can be thresholded to yield those labels. Important examples include:

- **ProsusAI/finbert** — FinBERT, a BERT-based model further pre-trained on a large financial corpus and fine-tuned on the Financial PhraseBank (news) dataset. It directly outputs softmax probabilities for three sentiment labels (positive, negative, neutral) and is a widely used baseline for financial news sentiment (paper: *FinBERT: Financial Sentiment Analysis with BERT*). No adaptation is needed: it already provides POS/NEG/NEU classification.
- **yiyanghkust/finbert-tone** — A FinBERT variant fine-tuned specifically on financial tone (sentiment) data: roughly 10,000 analyst-report sentences manually labeled positive/negative/neutral. It labels inputs as positive, negative, or neutral and, per its model card, "achieves superior performance on financial tone analysis". The three-class label mapping is already set up in the Hugging Face pipeline.
- **ahmedrachid/FinancialBERT-Sentiment-Analysis** — A FinancialBERT model (pre-trained on Reuters, Bloomberg, 10-Ks, etc.) fine-tuned on the Financial PhraseBank sentiment dataset. It is explicitly trained for three-way sentiment (positive/neutral/negative) on financial news, and its card reports that it outperforms general BERT and other financial-domain models on that task, with very high F1 (~0.97) on each class. No adaptation is required: outputs are already in POS/NEU/NEG form.
- **Sigma/financial-sentiment-analysis** — A fine-tuned model built on the FinancialBERT checkpoint above, further trained on the Financial PhraseBank. It outputs positive/neutral/negative labels; its model card reports ~99.24% accuracy and F1 on financial news sentiment, i.e. near-perfect classification on the PhraseBank test set.
- **mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis** — A DistilRoBERTa-base model fine-tuned on the Financial PhraseBank. It classifies sentences as positive, negative, or neutral; the model card reports ~98% accuracy on its held-out set. This smaller distilled model is a faster alternative to full RoBERTa while maintaining strong performance on financial news sentiment.
- **nickmuchi/deberta-v3-base-finetuned-finance-text-classification** — A DeBERTa-v3-base model fine-tuned on financial sentiment data (the Financial PhraseBank combined with a labeled COVID-19 finance dataset). It outputs the standard three sentiment labels; the author reports ~89.1% accuracy and F1 on this mixed dataset, showing that larger Transformer variants like DeBERTa can also be adapted to finance by fine-tuning on domain data.
- **Farshid/roberta-large-financial-phrasebank-allagree1** — A RoBERTa-large model fine-tuned on the subset of Financial PhraseBank sentences where all annotators agreed. It classifies news sentences into positive/neutral/negative and achieves ~97.35% accuracy and F1 on this high-agreement data. (Because it used only consensus labels, the figures may be slightly optimistic; it remains a strong model for high-confidence sentiment predictions.)
- **StephanAkkerman/FinTwitBERT-sentiment** — A BERT-based model pre-trained on ~10 million financial tweets and fine-tuned for sentiment. It outputs positive/negative/neutral labels for short, informal financial texts; its card notes that it "performs great on informal financial texts". While targeted at tweets, it can be applied to other social-media-style financial content; for formal news, the models above are more directly suitable.
- **LHF/finbert-regressor** — A FinBERT-based model trained for sentiment *regression* on RavenPack data. Instead of discrete classes, it outputs a continuous score in [0, 1], where 0 corresponds to negative, 1 to positive, and 0.5 to neutral. It therefore does not natively output POS/NEG/NEU, but the score can be thresholded (e.g. <0.5 negative, >0.5 positive, ≈0.5 neutral) to recover three labels, a mapping the authors provide in the documentation; see the sketch below.
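
A thresholding helper might look like this; the width of the neutral band is an illustrative assumption, not a value from the model card.

```python
# Map the regressor's continuous score in [0, 1] onto three labels.
# neutral_band is a hypothetical tolerance around the 0.5 midpoint.
def score_to_label(score: float, neutral_band: float = 0.1) -> str:
    if score < 0.5 - neutral_band / 2:
        return "NEGATIVE"
    if score > 0.5 + neutral_band / 2:
        return "POSITIVE"
    return "NEUTRAL"

assert score_to_label(0.12) == "NEGATIVE"
assert score_to_label(0.51) == "NEUTRAL"
assert score_to_label(0.93) == "POSITIVE"
```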

Each of the above models is publicly accessible on Hugging Face (see the Citations section below for links). Models like FinBERT and FinancialBERT were trained on finance-specific corpora or labeled data, so they capture domain language and typically outperform generic sentiment models on financial news. In practice, you can load any of them via the `transformers` library (e.g. `AutoModelForSequenceClassification.from_pretrained("<model>")`) and use the built-in pipelines to get POS/NEG/NEU labels, as sketched below. Where adaptation is needed (as in the regressor case), simple thresholding or fine-tuning on three-class data suffices. The descriptions and reported performance figures above come from each model's official card on Hugging Face.
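
As an illustration, the following loads one of the model IDs above through the `pipeline` API; the printed label casing and score are indicative only and vary by model.

```python
# Minimal sketch: load any of the listed model IDs with the
# transformers pipeline API and classify a headline.
from transformers import pipeline

classifier = pipeline("text-classification", model="ProsusAI/finbert")

print(classifier("Company X beats revenue estimates this quarter."))
# e.g. [{'label': 'positive', 'score': 0.95}]  (illustrative output)
```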

## Citations
- [ProsusAI/finbert · Hugging Face](https://huggingface.co/ProsusAI/finbert)
- [yiyanghkust/finbert-tone · Hugging Face](https://huggingface.co/yiyanghkust/finbert-tone)
- [ahmedrachid/FinancialBERT-Sentiment-Analysis · Hugging Face](https://huggingface.co/ahmedrachid/FinancialBERT-Sentiment-Analysis)
- [Sigma/financial-sentiment-analysis · Hugging Face](https://huggingface.co/Sigma/financial-sentiment-analysis)
- [mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis · Hugging Face](https://huggingface.co/mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis)
- [nickmuchi/deberta-v3-base-finetuned-finance-text-classification · Hugging Face](https://huggingface.co/nickmuchi/deberta-v3-base-finetuned-finance-text-classification)
- [Farshid/roberta-large-financial-phrasebank-allagree1 · Hugging Face](https://huggingface.co/Farshid/roberta-large-financial-phrasebank-allagree1)
- [StephanAkkerman/FinTwitBERT-sentiment · Hugging Face](https://huggingface.co/StephanAkkerman/FinTwitBERT-sentiment)
- [LHF/finbert-regressor · Hugging Face](https://huggingface.co/LHF/finbert-regressor)