# NLP Models for Stock‑Alchemist

## Overview
This module contains NLP models used to filter and rank financial news headlines before scraping full articles.  
We leverage sentiment analysis to decide which news items are likely to impact stock prices.

## Objectives
- Quickly classify news headlines as **NEGATIVE**, **NEUTRAL**, or **POSITIVE**.
- Measure per‑headline processing time (in milliseconds).
- Compare multiple NLP models (e.g., FinBERT, custom transformers, lightweight classifiers).
- Provide a flexible base for benchmarking and research.

## Directory Structure
- `services/`
  - `pre_sentiment.py` — implements `FinBertSentimentAnalyzer` and helpers.
- `models/`  
  - Placeholder for future model definitions and training scripts.
- `README.md` — this file.

## Dataset
We collect headlines via Alpaca’s news WebSocket API (a minimal connection sketch follows this list). Each headline is:
- A text string (≤512 tokens).
- Timestamped.  
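
The sketch below shows one way such a connection could look, assuming the `alpaca-py` SDK; the credentials are placeholders, and the `headline`/`created_at` attribute names come from that SDK's `News` model, not from this repository's code.

```python
# Hedged sketch of headline collection over Alpaca's news WebSocket,
# assuming the alpaca-py SDK (pip install alpaca-py).
from alpaca.data.live.news import NewsDataStream

API_KEY = "your-api-key"        # placeholder
SECRET_KEY = "your-secret-key"  # placeholder

stream = NewsDataStream(API_KEY, SECRET_KEY)

async def on_news(item):
    # Each item carries a headline string and a timestamp,
    # matching the dataset description above.
    print(item.created_at, item.headline)

stream.subscribe_news(on_news, "*")  # "*" subscribes to all symbols
stream.run()  # blocking; drives the WebSocket event loop
```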

## Input & Output
- **Input**: List of news headlines (strings).  
- **Output**:
  1. `sentiment`: **NEGATIVE** | **NEUTRAL** | **POSITIVE**  
  2. `processing_time_ms`: float (per-headline latency; see the timing sketch below)
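
A minimal way to populate `processing_time_ms` is to wrap the module's own helper (see the usage example below) with a wall-clock timer; this sketch assumes only the standard library.

```python
# Measure per-headline latency in milliseconds around the module's helper.
import time

from news_scraper.services.pre_sentiment import analyze_sentiment

def timed_sentiment(text: str) -> tuple[str, float]:
    start = time.perf_counter()
    sentiment = analyze_sentiment(text)
    processing_time_ms = (time.perf_counter() - start) * 1000.0
    return sentiment, processing_time_ms

label, ms = timed_sentiment("Company X beats revenue estimates this quarter.")
print(label, f"{ms:.1f} ms")
```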

## Classes and Helpers
- **FinBertSentimentAnalyzer** (`services/pre_sentiment.py`); a minimal sketch of this interface appears after this list.
  - `__init__()`: loads the `yiyanghkust/finbert-tone` model and tokenizer and moves them to GPU if available, otherwise CPU.
  - `predict_sentiment(text: str) → str`
  - `batch_predict_sentiment(texts: List[str]) → List[str]`
- **Helper functions**:
  - `analyze_sentiment(text: str) → str`
  - `analyze_sentiments(texts: List[str]) → List[str]`
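
The sketch below illustrates this interface. It is an illustrative reimplementation assuming `torch` and `transformers`, not a copy of `services/pre_sentiment.py`; the label order follows the mapping published on the `yiyanghkust/finbert-tone` model card (0 = neutral, 1 = positive, 2 = negative).

```python
from typing import List

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

class FinBertSentimentAnalyzer:
    # Label order per the yiyanghkust/finbert-tone model card.
    LABELS = ["NEUTRAL", "POSITIVE", "NEGATIVE"]

    def __init__(self) -> None:
        # Load the model/tokenizer once and pick GPU when available.
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.tokenizer = AutoTokenizer.from_pretrained("yiyanghkust/finbert-tone")
        self.model = AutoModelForSequenceClassification.from_pretrained(
            "yiyanghkust/finbert-tone"
        ).to(self.device)
        self.model.eval()

    @torch.no_grad()
    def predict_sentiment(self, text: str) -> str:
        return self.batch_predict_sentiment([text])[0]

    @torch.no_grad()
    def batch_predict_sentiment(self, texts: List[str]) -> List[str]:
        # Truncate to the 512-token limit noted in the Dataset section.
        inputs = self.tokenizer(
            texts, truncation=True, max_length=512, padding=True, return_tensors="pt"
        ).to(self.device)
        logits = self.model(**inputs).logits
        return [self.LABELS[i] for i in logits.argmax(dim=-1).tolist()]
```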

## Usage Example
```python
from news_scraper.services.pre_sentiment import analyze_sentiment, analyze_sentiments

headline = "Company X beats revenue estimates this quarter."
print(analyze_sentiment(headline))
# → POSITIVE

batch = ["Stock plunge expected.", "Steady growth reported."]
print(analyze_sentiments(batch))
# → ["NEGATIVE", "NEUTRAL"]
```

## Adding New Models
1. Create a new service under `services/` (e.g., `my_model.py`).
2. Implement a class with the same interface (a minimal skeleton is sketched after this list):  
   - `predict_sentiment(text: str) → str`  
   - `batch_predict_sentiment(texts: List[str]) → List[str]`
3. Update this README to document your new class.
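
As a starting point, here is a hypothetical skeleton for such a service; the class name and keyword sets are placeholders, and the toy logic exists only to demonstrate the required interface.

```python
# services/my_model.py (hypothetical): a toy keyword classifier that
# satisfies the interface expected by this module.
from typing import List

class KeywordSentimentAnalyzer:
    NEGATIVE_WORDS = {"plunge", "miss", "loss", "downgrade"}
    POSITIVE_WORDS = {"beats", "growth", "upgrade", "record"}

    def predict_sentiment(self, text: str) -> str:
        tokens = set(text.lower().split())
        if tokens & self.NEGATIVE_WORDS:
            return "NEGATIVE"
        if tokens & self.POSITIVE_WORDS:
            return "POSITIVE"
        return "NEUTRAL"

    def batch_predict_sentiment(self, texts: List[str]) -> List[str]:
        return [self.predict_sentiment(t) for t in texts]
```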

## Future Work
- Benchmark latency and accuracy across models.
- Add more advanced metrics (e.g., F1‑score against human‑labeled data).
- Experiment with distillation for faster inference.


# Financial News Sentiment Analysis Models
Several open-source models on Hugging Face are tailored to financial text or can be adapted for sentiment analysis. These typically output three classes (positive/neutral/negative) or can be thresholded to yield those labels. Important examples include:

- **ProsusAI/finbert** — FinBERT, a BERT-based model further pre-trained on a large financial corpus and fine-tuned on the Financial PhraseBank (news) dataset. It directly outputs softmax probabilities for three sentiment labels (positive, negative, neutral) and is a widely used baseline for financial news sentiment (paper: *FinBERT: Financial Sentiment Analysis with BERT*). No adaptation is needed: it already provides POS/NEG/NEU classification.
- **yiyanghkust/finbert-tone** — A FinBERT variant fine-tuned specifically on financial tone (sentiment) data: roughly 10,000 analyst-report sentences manually labeled positive/negative/neutral. It labels inputs as positive, negative, or neutral and, per its model card, "achieves superior performance on financial tone analysis". The three-class label mapping is already set up in the Hugging Face pipeline.
- **ahmedrachid/FinancialBERT-Sentiment-Analysis** — A FinancialBERT model (pre-trained on Reuters, Bloomberg, 10-Ks, etc.) fine-tuned on the Financial PhraseBank sentiment dataset. It is explicitly trained for three-way sentiment (positive/neutral/negative) on financial news, and its card reports that it outperforms general BERT and other financial-domain models on that task, with very high F1 (~0.97) on each class. No adaptation is required: outputs are already in POS/NEU/NEG form.
- **Sigma/financial-sentiment-analysis** — A fine-tuned model built on the FinancialBERT checkpoint above, further trained on the Financial PhraseBank. It outputs positive/neutral/negative labels; its model card reports ~99.24% accuracy and F1 on financial news sentiment, i.e. near-perfect classification on the PhraseBank test set.
- **mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis** — A DistilRoBERTa-base model fine-tuned on the Financial PhraseBank. It classifies sentences as positive, negative, or neutral; the model card reports ~98% accuracy on its held-out set. This smaller distilled model is a faster alternative to full RoBERTa while maintaining strong performance on financial news sentiment.
- **nickmuchi/deberta-v3-base-finetuned-finance-text-classification** — A DeBERTa-v3-base model fine-tuned on financial sentiment data (the Financial PhraseBank combined with a labeled COVID-19 finance dataset). It outputs the standard three sentiment labels; the author reports ~89.1% accuracy and F1 on this mixed dataset, showing that larger Transformer variants like DeBERTa can also be adapted to finance by fine-tuning on domain data.
- **Farshid/roberta-large-financial-phrasebank-allagree1** — A RoBERTa-large model fine-tuned on the subset of Financial PhraseBank sentences where all annotators agreed. It classifies news sentences into positive/neutral/negative and achieves ~97.35% accuracy and F1 on this high-agreement data. (Because it used only consensus labels, the figures may be slightly optimistic; it remains a strong model for high-confidence sentiment predictions.)
- **StephanAkkerman/FinTwitBERT-sentiment** — A BERT-based model pre-trained on ~10 million financial tweets and fine-tuned for sentiment. It outputs positive/negative/neutral labels for short, informal financial texts; its card notes that it "performs great on informal financial texts". While targeted at tweets, it can be applied to other social-media-style financial content; for formal news, the models above are more directly suitable.
- **LHF/finbert-regressor** — A FinBERT-based model trained for sentiment *regression* on RavenPack data. Instead of discrete classes, it outputs a continuous score in [0, 1], where 0 corresponds to negative, 1 to positive, and 0.5 to neutral. It therefore does not natively output POS/NEG/NEU, but the score can be thresholded (e.g. <0.5 negative, >0.5 positive, ≈0.5 neutral) to recover three labels, a mapping the authors provide in the documentation; see the sketch below.
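
A thresholding helper might look like this; the width of the neutral band is an illustrative assumption, not a value from the model card.

```python
# Map the regressor's continuous score in [0, 1] onto three labels.
# neutral_band is a hypothetical tolerance around the 0.5 midpoint.
def score_to_label(score: float, neutral_band: float = 0.1) -> str:
    if score < 0.5 - neutral_band / 2:
        return "NEGATIVE"
    if score > 0.5 + neutral_band / 2:
        return "POSITIVE"
    return "NEUTRAL"

assert score_to_label(0.12) == "NEGATIVE"
assert score_to_label(0.51) == "NEUTRAL"
assert score_to_label(0.93) == "POSITIVE"
```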

Each of the above models is publicly accessible on Hugging Face (see the Citations section below for links). Models like FinBERT and FinancialBERT were trained on finance-specific corpora or labeled data, so they capture domain language and typically outperform generic sentiment models on financial news. In practice, you can load any of them via the `transformers` library (e.g. `AutoModelForSequenceClassification.from_pretrained("<model>")`) and use the built-in pipelines to get POS/NEG/NEU labels, as sketched below. Where adaptation is needed (as in the regressor case), simple thresholding or fine-tuning on three-class data suffices. The descriptions and reported performance figures above come from each model's official card on Hugging Face.
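
As an illustration, the following loads one of the model IDs above through the `pipeline` API; the printed label casing and score are indicative only and vary by model.

```python
# Minimal sketch: load any of the listed model IDs with the
# transformers pipeline API and classify a headline.
from transformers import pipeline

classifier = pipeline("text-classification", model="ProsusAI/finbert")

print(classifier("Company X beats revenue estimates this quarter."))
# e.g. [{'label': 'positive', 'score': 0.95}]  (illustrative output)
```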

## Citations
- [ProsusAI/finbert · Hugging Face](https://huggingface.co/ProsusAI/finbert)
- [yiyanghkust/finbert-tone · Hugging Face](https://huggingface.co/yiyanghkust/finbert-tone)
- [ahmedrachid/FinancialBERT-Sentiment-Analysis · Hugging Face](https://huggingface.co/ahmedrachid/FinancialBERT-Sentiment-Analysis)
- [Sigma/financial-sentiment-analysis · Hugging Face](https://huggingface.co/Sigma/financial-sentiment-analysis)
- [mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis · Hugging Face](https://huggingface.co/mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis)
- [nickmuchi/deberta-v3-base-finetuned-finance-text-classification · Hugging Face](https://huggingface.co/nickmuchi/deberta-v3-base-finetuned-finance-text-classification)
- [Farshid/roberta-large-financial-phrasebank-allagree1 · Hugging Face](https://huggingface.co/Farshid/roberta-large-financial-phrasebank-allagree1)
- [StephanAkkerman/FinTwitBERT-sentiment · Hugging Face](https://huggingface.co/StephanAkkerman/FinTwitBERT-sentiment)
- [LHF/finbert-regressor · Hugging Face](https://huggingface.co/LHF/finbert-regressor)