NLP Models for Stock‑Alchemist

Overview

This module contains NLP models used to filter and rank financial news headlines before scraping full articles.
We leverage sentiment analysis to decide which news items are likely to impact stock prices.

Objectives

  • Quickly classify news headlines as NEGATIVE, NEUTRAL, or POSITIVE.
  • Measure per‑headline processing time (in milliseconds).
  • Compare multiple NLP models (e.g., FinBERT, custom transformers, lightweight classifiers).
  • Provide a flexible base for benchmarking and research.

Directory Structure

  • services/
    • pre_sentiment.py — implements FinBertSentimentAnalyzer and helpers.
  • models/
    • Placeholder for future model definitions and training scripts.
  • README.md — this file.

Dataset

We collect headlines via Alpaca’s news WebSocket API (a subscription sketch follows this list). Each headline is:

  • A text string (≤512 tokens).
  • Timestamped.
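
A minimal subscription sketch using the websockets library. The endpoint URL, auth/subscribe message shapes, and field names ("T", "headline", "created_at") are assumptions based on Alpaca's documented news stream; check the current Alpaca docs before relying on them:

import asyncio
import json

import websockets  # pip install websockets

# Assumed Alpaca news stream endpoint; verify against current docs.
ALPACA_NEWS_URL = "wss://stream.data.alpaca.markets/v1beta1/news"

async def stream_headlines(key: str, secret: str) -> None:
    async with websockets.connect(ALPACA_NEWS_URL) as ws:
        # Authenticate, then subscribe to news for all symbols.
        await ws.send(json.dumps({"action": "auth", "key": key, "secret": secret}))
        await ws.send(json.dumps({"action": "subscribe", "news": ["*"]}))
        async for raw in ws:
            for msg in json.loads(raw):
                if msg.get("T") == "n":  # assumed marker for a news message
                    print(msg["created_at"], msg["headline"])

# Usage: asyncio.run(stream_headlines("KEY_ID", "SECRET"))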

Input & Output

  • Input: List of news headlines (strings).
  • Output:
    1. sentiment: NEGATIVE | NEUTRAL | POSITIVE
    2. processing_time_ms: float (see the timing sketch below)
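
A minimal sketch of how the per-headline output could be produced, using the analyze_sentiment helper documented below; classify_with_timing is a hypothetical wrapper, not part of the current module:

import time

from news_scraper.services.pre_sentiment import analyze_sentiment

def classify_with_timing(headline: str) -> tuple[str, float]:
    """Return (sentiment, processing_time_ms) for a single headline."""
    start = time.perf_counter()
    sentiment = analyze_sentiment(headline)  # NEGATIVE | NEUTRAL | POSITIVE
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return sentiment, elapsed_ms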

Classes and Helpers

  • FinBertSentimentAnalyzer (services/pre_sentiment.py)
    • __init__(): loads the yiyanghkust/finbert-tone model and tokenizer and moves them to the GPU if available, otherwise the CPU (see the sketch after this list).
    • predict_sentiment(text: str) → str
    • batch_predict_sentiment(texts: List[str]) → List[str]
  • Helper functions:
    • analyze_sentiment(text: str) → str
    • analyze_sentiments(texts: List[str]) → List[str]
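
For reference, a minimal sketch of what the analyzer could look like, assuming the standard transformers/torch APIs; the actual implementation in services/pre_sentiment.py may differ in details such as label casing and batching:

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

class FinBertSentimentAnalyzer:
    """Three-class sentiment (NEGATIVE/NEUTRAL/POSITIVE) via finbert-tone."""

    def __init__(self) -> None:
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.tokenizer = AutoTokenizer.from_pretrained("yiyanghkust/finbert-tone")
        self.model = AutoModelForSequenceClassification.from_pretrained(
            "yiyanghkust/finbert-tone"
        ).to(self.device)
        self.model.eval()

    def predict_sentiment(self, text: str) -> str:
        return self.batch_predict_sentiment([text])[0]

    def batch_predict_sentiment(self, texts: list[str]) -> list[str]:
        inputs = self.tokenizer(
            texts, padding=True, truncation=True, max_length=512, return_tensors="pt"
        ).to(self.device)
        with torch.no_grad():
            logits = self.model(**inputs).logits
        ids = logits.argmax(dim=-1).tolist()
        # Upper-case the model's label names to match the README's convention.
        return [self.model.config.id2label[i].upper() for i in ids]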

Usage Example

from news_scraper.services.pre_sentiment import analyze_sentiment, analyze_sentiments

headline = "Company X beats revenue estimates this quarter."
print(analyze_sentiment(headline))
# → POSITIVE

batch = ["Stock plunge expected.", "Steady growth reported."]
print(analyze_sentiments(batch))
# → ["NEGATIVE", "NEUTRAL"]

Adding New Models

  1. Create a new service under services/ (e.g., my_model.py).
  2. Implement a class with the same interface (a minimal example follows this list):
    • predict_sentiment(text: str) → str
    • batch_predict_sentiment(texts: List[str]) → List[str]
  3. Update this README to document your new class.
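
As an illustration, a hypothetical services/my_model.py with a trivial keyword classifier; KeywordSentimentAnalyzer and its word lists are made up for this example:

class KeywordSentimentAnalyzer:
    """Toy drop-in model exposing the same interface as FinBertSentimentAnalyzer."""

    NEGATIVE_WORDS = {"plunge", "misses", "lawsuit", "downgrade"}
    POSITIVE_WORDS = {"beats", "growth", "upgrade", "record"}

    def predict_sentiment(self, text: str) -> str:
        words = set(text.lower().split())
        if words & self.NEGATIVE_WORDS:
            return "NEGATIVE"
        if words & self.POSITIVE_WORDS:
            return "POSITIVE"
        return "NEUTRAL"

    def batch_predict_sentiment(self, texts: list[str]) -> list[str]:
        return [self.predict_sentiment(t) for t in texts]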

Future Work

  • Benchmark latency and accuracy across models (see the sketch below).
  • Add more advanced metrics (e.g., F1‑score against human‑labeled data).
  • Experiment with distillation for faster inference.
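
A minimal latency harness for the first item, assuming any model exposing the interface above; the repeat count is arbitrary:

import time

def mean_latency_ms(model, headlines: list[str], repeats: int = 5) -> float:
    """Average per-headline latency in milliseconds over several runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        model.batch_predict_sentiment(headlines)
    total_ms = (time.perf_counter() - start) * 1000.0
    return total_ms / (repeats * len(headlines))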

Financial News Sentiment Analysis Models

Several open-source models on Hugging Face are tailored to financial text or can be adapted for sentiment analysis. They typically output three classes (positive/neutral/negative) or can be thresholded to yield those labels. Notable examples (all claims below are from the respective model cards; see Sources):

  • ProsusAI/finbert — FinBERT, a BERT-based model further pre-trained on a large financial corpus and fine-tuned on the Financial PhraseBank news dataset (paper: "FinBERT: Financial Sentiment Analysis with BERT"). It outputs softmax probabilities over three sentiment labels and is a widely used baseline for financial news sentiment. No adaptation is needed.
  • yiyanghkust/finbert-tone — A FinBERT variant fine-tuned on financial tone data: 10,000 analyst-report sentences manually labeled positive/negative/neutral. Its card states it "achieves superior performance on financial tone analysis"; the three-class label mapping is already set up in the Hugging Face pipeline.
  • ahmedrachid/FinancialBERT-Sentiment-Analysis — FinancialBERT (pretrained on Reuters, Bloomberg, 10-Ks, etc.) fine-tuned on the Financial PhraseBank. Trained explicitly for three-way sentiment on financial news, it reportedly outperforms general BERT and other financial-domain models, with published F1 around 0.97 per class. No adaptation is needed.
  • Sigma/financial-sentiment-analysis — Built on the FinancialBERT checkpoint and further trained on the Financial PhraseBank. It outputs positive/neutral/negative labels; its card reports ~99.24% accuracy and F1, i.e., near-perfect classification on the PhraseBank test set.
  • mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis — DistilRoBERTa-base fine-tuned on the Financial PhraseBank; classifies sentences as positive, negative, or neutral with ~98% reported accuracy. This smaller distilled model is a faster alternative to full RoBERTa with strong performance on financial news.
  • nickmuchi/deberta-v3-base-finetuned-finance-text-classification — DeBERTa-v3 base fine-tuned on financial sentiment data (the Financial PhraseBank combined with a labeled COVID-19 finance dataset); outputs the standard three labels with ~89.1% reported accuracy and F1. This shows that larger Transformer variants can also be adapted to finance by fine-tuning on domain data.
  • Farshid/roberta-large-financial-phrasebank-allagree1 — RoBERTa-large fine-tuned on the subset of Financial PhraseBank sentences where all annotators agreed; ~97.35% accuracy and F1 on that high-agreement data. Because it used only consensus labels, its scores may be slightly optimistic, but it remains a strong model for high-confidence sentiment predictions.
  • StephanAkkerman/FinTwitBERT-sentiment — A BERT-based model pretrained on ~10 million financial tweets and fine-tuned for sentiment; it outputs the three labels for short, informal financial texts. Its card notes it "performs great on informal financial texts"; it can be applied to other social-media-style content, but for formal news the models above are more directly suitable.
  • LHF/finbert-regressor — A FinBERT-based model trained for sentiment regression on RavenPack data. Instead of discrete classes it outputs a continuous score in [0, 1], where 0 is negative, 1 is positive, and 0.5 is neutral, so the three labels can be recovered by thresholding (see the sketch below); the authors provide this mapping in the documentation.

All of these models are publicly accessible on Hugging Face. Models such as FinBERT and FinancialBERT were trained on finance-specific corpora or labeled data, so they capture domain language and typically outperform generic sentiment models on financial news. In practice, any of them can be loaded via the transformers library (e.g., AutoModelForSequenceClassification.from_pretrained("<model-id>")) and used through the built-in pipelines to get POS/NEG/NEU labels. Where adaptation is needed, as in the regressor case, simple thresholding or fine-tuning on three-class data suffices.
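
A minimal loading sketch, assuming the standard transformers pipeline API; the printed score is illustrative, and label casing varies by model:

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="ProsusAI/finbert",  # any three-class model above can be swapped in by ID
)
print(classifier("Company X beats revenue estimates this quarter."))
# e.g. [{'label': 'positive', 'score': 0.95}] — exact score and casing vary by model

And a sketch of the thresholding described for LHF/finbert-regressor; the neutral band width is an assumption for illustration, not from the model card:

def score_to_label(score: float, band: float = 0.1) -> str:
    """Map a 0-1 regression score to three labels (0 = negative, 1 = positive)."""
    if score < 0.5 - band:
        return "NEGATIVE"
    if score > 0.5 + band:
        return "POSITIVE"
    return "NEUTRAL"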

Sources

Each description and reported figure above comes from the official model card on Hugging Face:

  • https://huggingface.co/ProsusAI/finbert
  • https://huggingface.co/yiyanghkust/finbert-tone
  • https://huggingface.co/ahmedrachid/FinancialBERT-Sentiment-Analysis
  • https://huggingface.co/Sigma/financial-sentiment-analysis
  • https://huggingface.co/mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis
  • https://huggingface.co/nickmuchi/deberta-v3-base-finetuned-finance-text-classification
  • https://huggingface.co/Farshid/roberta-large-financial-phrasebank-allagree1
  • https://huggingface.co/StephanAkkerman/FinTwitBERT-sentiment
  • https://huggingface.co/LHF/finbert-regressor