NLP Models for Stock‑Alchemist

Overview

This module contains NLP models used to filter and rank financial news headlines before scraping full articles.
We leverage sentiment analysis to decide which news items are likely to impact stock prices.

Objectives

  • Quickly classify news headlines as NEGATIVE, NEUTRAL, or POSITIVE.
  • Measure per‑headline processing time (in milliseconds).
  • Compare multiple NLP models (e.g., FinBERT, custom transformers, lightweight classifiers).
  • Provide a flexible base for benchmarking and research.

Directory Structure

  • services/
    • pre_sentiment.py — implements FinBertSentimentAnalyzer and helpers.
  • models/
    • Placeholder for future model definitions and training scripts.
  • README.md — this file.

Dataset

We collect headlines via Alpaca’s news WebSocket API (a subscription sketch follows this list). Each headline is:

  • A text string (≤512 tokens).
  • Timestamped.
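
A minimal subscription sketch using the websockets library. The endpoint URL, auth/subscribe message shapes, and field names ("T", "headline", "created_at") are assumptions based on Alpaca's documented news stream; check the current Alpaca docs before relying on them:

import asyncio
import json

import websockets  # pip install websockets

# Assumed Alpaca news stream endpoint; verify against current docs.
ALPACA_NEWS_URL = "wss://stream.data.alpaca.markets/v1beta1/news"

async def stream_headlines(key: str, secret: str) -> None:
    async with websockets.connect(ALPACA_NEWS_URL) as ws:
        # Authenticate, then subscribe to news for all symbols.
        await ws.send(json.dumps({"action": "auth", "key": key, "secret": secret}))
        await ws.send(json.dumps({"action": "subscribe", "news": ["*"]}))
        async for raw in ws:
            for msg in json.loads(raw):
                if msg.get("T") == "n":  # assumed marker for a news message
                    print(msg["created_at"], msg["headline"])

# Usage: asyncio.run(stream_headlines("KEY_ID", "SECRET"))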

Input & Output

  • Input: List of news headlines (strings).
  • Output:
    1. sentiment: NEGATIVE | NEUTRAL | POSITIVE
    2. processing_time_ms: float (see the timing sketch below)
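
A minimal sketch of how the per-headline output could be produced, using the analyze_sentiment helper documented below; classify_with_timing is a hypothetical wrapper, not part of the current module:

import time

from news_scraper.services.pre_sentiment import analyze_sentiment

def classify_with_timing(headline: str) -> tuple[str, float]:
    """Return (sentiment, processing_time_ms) for a single headline."""
    start = time.perf_counter()
    sentiment = analyze_sentiment(headline)  # NEGATIVE | NEUTRAL | POSITIVE
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return sentiment, elapsed_ms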

Classes and Helpers

  • FinBertSentimentAnalyzer (services/pre_sentiment.py)
    • __init__(): loads the yiyanghkust/finbert-tone model and tokenizer and moves them to the GPU if available, otherwise the CPU (see the sketch after this list).
    • predict_sentiment(text: str) → str
    • batch_predict_sentiment(texts: List[str]) → List[str]
  • Helper functions:
    • analyze_sentiment(text: str) → str
    • analyze_sentiments(texts: List[str]) → List[str]
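
For reference, a minimal sketch of what the analyzer could look like, assuming the standard transformers/torch APIs; the actual implementation in services/pre_sentiment.py may differ in details such as label casing and batching:

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

class FinBertSentimentAnalyzer:
    """Three-class sentiment (NEGATIVE/NEUTRAL/POSITIVE) via finbert-tone."""

    def __init__(self) -> None:
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.tokenizer = AutoTokenizer.from_pretrained("yiyanghkust/finbert-tone")
        self.model = AutoModelForSequenceClassification.from_pretrained(
            "yiyanghkust/finbert-tone"
        ).to(self.device)
        self.model.eval()

    def predict_sentiment(self, text: str) -> str:
        return self.batch_predict_sentiment([text])[0]

    def batch_predict_sentiment(self, texts: list[str]) -> list[str]:
        inputs = self.tokenizer(
            texts, padding=True, truncation=True, max_length=512, return_tensors="pt"
        ).to(self.device)
        with torch.no_grad():
            logits = self.model(**inputs).logits
        ids = logits.argmax(dim=-1).tolist()
        # Upper-case the model's label names to match the README's convention.
        return [self.model.config.id2label[i].upper() for i in ids]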

Usage Example

from news_scraper.services.pre_sentiment import analyze_sentiment, analyze_sentiments

headline = "Company X beats revenue estimates this quarter."
print(analyze_sentiment(headline))
# → POSITIVE

batch = ["Stock plunge expected.", "Steady growth reported."]
print(analyze_sentiments(batch))
# → ["NEGATIVE", "NEUTRAL"]

Adding New Models

  1. Create a new service under services/ (e.g., my_model.py).
  2. Implement a class with the same interface (a minimal example follows this list):
    • predict_sentiment(text: str) → str
    • batch_predict_sentiment(texts: List[str]) → List[str]
  3. Update this README to document your new class.
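
As an illustration, a hypothetical services/my_model.py with a trivial keyword classifier; KeywordSentimentAnalyzer and its word lists are made up for this example:

class KeywordSentimentAnalyzer:
    """Toy drop-in model exposing the same interface as FinBertSentimentAnalyzer."""

    NEGATIVE_WORDS = {"plunge", "misses", "lawsuit", "downgrade"}
    POSITIVE_WORDS = {"beats", "growth", "upgrade", "record"}

    def predict_sentiment(self, text: str) -> str:
        words = set(text.lower().split())
        if words & self.NEGATIVE_WORDS:
            return "NEGATIVE"
        if words & self.POSITIVE_WORDS:
            return "POSITIVE"
        return "NEUTRAL"

    def batch_predict_sentiment(self, texts: list[str]) -> list[str]:
        return [self.predict_sentiment(t) for t in texts]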

Future Work

  • Benchmark latency and accuracy across models (see the sketch below).
  • Add more advanced metrics (e.g., F1‑score against human‑labeled data).
  • Experiment with distillation for faster inference.
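
A minimal latency harness for the first item, assuming any model exposing the interface above; the repeat count is arbitrary:

import time

def mean_latency_ms(model, headlines: list[str], repeats: int = 5) -> float:
    """Average per-headline latency in milliseconds over several runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        model.batch_predict_sentiment(headlines)
    total_ms = (time.perf_counter() - start) * 1000.0
    return total_ms / (repeats * len(headlines))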

Financial News Sentiment Analysis Models

Several open-source models on Hugging Face are tailored to financial text or can be adapted for sentiment analysis. They typically output three classes (positive/neutral/negative) or can be thresholded to yield those labels. Notable examples (all claims below are from the respective model cards; see Sources):

  • ProsusAI/finbert — FinBERT, a BERT-based model further pre-trained on a large financial corpus and fine-tuned on the Financial PhraseBank news dataset (paper: "FinBERT: Financial Sentiment Analysis with BERT"). It outputs softmax probabilities over three sentiment labels and is a widely used baseline for financial news sentiment. No adaptation is needed.
  • yiyanghkust/finbert-tone — A FinBERT variant fine-tuned on financial tone data: 10,000 analyst-report sentences manually labeled positive/negative/neutral. Its card states it "achieves superior performance on financial tone analysis"; the three-class label mapping is already set up in the Hugging Face pipeline.
  • ahmedrachid/FinancialBERT-Sentiment-Analysis — FinancialBERT (pretrained on Reuters, Bloomberg, 10-Ks, etc.) fine-tuned on the Financial PhraseBank. Trained explicitly for three-way sentiment on financial news, it reportedly outperforms general BERT and other financial-domain models, with published F1 around 0.97 per class. No adaptation is needed.
  • Sigma/financial-sentiment-analysis — Built on the FinancialBERT checkpoint and further trained on the Financial PhraseBank. It outputs positive/neutral/negative labels; its card reports ~99.24% accuracy and F1, i.e., near-perfect classification on the PhraseBank test set.
  • mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis — DistilRoBERTa-base fine-tuned on the Financial PhraseBank; classifies sentences as positive, negative, or neutral with ~98% reported accuracy. This smaller distilled model is a faster alternative to full RoBERTa with strong performance on financial news.
  • nickmuchi/deberta-v3-base-finetuned-finance-text-classification — DeBERTa-v3 base fine-tuned on financial sentiment data (the Financial PhraseBank combined with a labeled COVID-19 finance dataset); outputs the standard three labels with ~89.1% reported accuracy and F1. This shows that larger Transformer variants can also be adapted to finance by fine-tuning on domain data.
  • Farshid/roberta-large-financial-phrasebank-allagree1 — RoBERTa-large fine-tuned on the subset of Financial PhraseBank sentences where all annotators agreed; ~97.35% accuracy and F1 on that high-agreement data. Because it used only consensus labels, its scores may be slightly optimistic, but it remains a strong model for high-confidence sentiment predictions.
  • StephanAkkerman/FinTwitBERT-sentiment — A BERT-based model pretrained on ~10 million financial tweets and fine-tuned for sentiment; it outputs the three labels for short, informal financial texts. Its card notes it "performs great on informal financial texts"; it can be applied to other social-media-style content, but for formal news the models above are more directly suitable.
  • LHF/finbert-regressor — A FinBERT-based model trained for sentiment regression on RavenPack data. Instead of discrete classes it outputs a continuous score in [0, 1], where 0 is negative, 1 is positive, and 0.5 is neutral, so the three labels can be recovered by thresholding (see the sketch below); the authors provide this mapping in the documentation.

All of these models are publicly accessible on Hugging Face. Models such as FinBERT and FinancialBERT were trained on finance-specific corpora or labeled data, so they capture domain language and typically outperform generic sentiment models on financial news. In practice, any of them can be loaded via the transformers library (e.g., AutoModelForSequenceClassification.from_pretrained("<model-id>")) and used through the built-in pipelines to get POS/NEG/NEU labels. Where adaptation is needed, as in the regressor case, simple thresholding or fine-tuning on three-class data suffices.
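
A minimal loading sketch, assuming the standard transformers pipeline API; the printed score is illustrative, and label casing varies by model:

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="ProsusAI/finbert",  # any three-class model above can be swapped in by ID
)
print(classifier("Company X beats revenue estimates this quarter."))
# e.g. [{'label': 'positive', 'score': 0.95}] — exact score and casing vary by model

And a sketch of the thresholding described for LHF/finbert-regressor; the neutral band width is an assumption for illustration, not from the model card:

def score_to_label(score: float, band: float = 0.1) -> str:
    """Map a 0-1 regression score to three labels (0 = negative, 1 = positive)."""
    if score < 0.5 - band:
        return "NEGATIVE"
    if score > 0.5 + band:
        return "POSITIVE"
    return "NEUTRAL"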

Sources

Each description and reported figure above comes from the official model card on Hugging Face:

  • https://huggingface.co/ProsusAI/finbert
  • https://huggingface.co/yiyanghkust/finbert-tone
  • https://huggingface.co/ahmedrachid/FinancialBERT-Sentiment-Analysis
  • https://huggingface.co/Sigma/financial-sentiment-analysis
  • https://huggingface.co/mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis
  • https://huggingface.co/nickmuchi/deberta-v3-base-finetuned-finance-text-classification
  • https://huggingface.co/Farshid/roberta-large-financial-phrasebank-allagree1
  • https://huggingface.co/StephanAkkerman/FinTwitBERT-sentiment
  • https://huggingface.co/LHF/finbert-regressor