mrs83 (Massimo Roberto Scamarcia)

reacted to Reality123b's post with 👍 11 days ago

Post

1087

We (DataMuncher-Labs) just made a new dataset for inducing mathematical and analytical reasoning in LLMs.

DataMuncher-Labs/UltraMath-Reasoning-Small

1 reply

·

reacted to meg's post with ➕ 3 months ago

Post

3942

🤖 Did you know your voice might be cloned without your consent from just *one sentence* of audio?
That's not great. So with @frimelle , we brainstormed a new idea for developers who want to curb malicious use: ✨The Voice Consent Gate.✨
Details, code, here: https://huggingface.co/blog/voice-consent-gate

3 replies

·

posted an update 3 months ago

Post

235

Have you ever had a real conversation with a local LLM?

Or even taken a VoIP (SIP) phone call with one?

Check out Kurtis E1: A Fully On-Device MLX Voice Agent.

- GitHub Repo: https://github.com/ethicalabs-ai/Kurtis-E1-MLX-Voice-Agent
- Demo Video 1: https://www.youtube.com/watch?v=k6BbF0262I8
- Demo Video 2: https://www.youtube.com/shorts/w-YSCyKTR78

The entire stack runs on-device, leveraging MLX-LM for Apple Silicon.

- Whisper for real-time STT
- WebRTC VAD for voice activity detection
- A custom-tuned Qwen #LLM (Kurtis E1) ethicalabs/Kurtis-E1.1-Qwen2.5-3B-Instruct
- Coqui XTTSv2 for natural speech

This showcases the power of local AI/ML. I am also actively developing the SIP #VoIP integration (now in testing). The goal? To let you take a phone call and talk directly with your private agent, even without a computer or internet connection.

While Kurtis isn't built for math/coding, it shows a valuable path forward for on-device workflows.

We are actively looking for partners and clients to build out these POCs into real-world use cases.

https://www.ethicalabs.ai/ isn't a startup. We are not looking for VCs, equity deals, or grants: we're an open-source project.

If you like the R&D, you can support the R&D directly: https://github.com/sponsors/ethicalabs-ai?frequency=one-time

posted an update 4 months ago

Post

343

hallbayes https://github.com/leochlon/hallbayes is an interesting project by Leon Chlon (Hassana Labs) for checking for hallucination risk before text generation and it uses a powerful approach to decide if an LLM is confident enough to answer (or not).

https://arxiv.org/html/2509.11208v1
Predictable Compression Failures: Why Language Models Actually Hallucinate (2509.11208)

I've just integrated the hallbayes library into my completionist (synthetic dataset generation CLI tool) project to do exactly that, adding a new quality control layer to synthetic data generation.

Ran a small test on 10 samples on google/boolq with a 4B Qwen Instruct model Qwen/Qwen3-4B-Instruct-2507.

The output dataset now contains a hallucination_info column, flagging each sample with detailed metrics. The inference server is LM Studio, running on a Macbook Air M4 16GB

Test w/ hallucination flags: ethicalabs/google-boolq-hallbayes-test-qwen3-4b-2507

Implementation MRs:
https://github.com/leochlon/hallbayes/pull/16
https://github.com/ethicalabs-ai/completionist/pull/11

reacted to sergiopaniego's post with 🔥 4 months ago

Post

1410

This summer TRL leveled up for multimodal alignment 🌞

✅ New VLM alignment methods (MPO, GRPO, GSPO)
✅ Extended RLOO & Online DPO for VLMs
✅ Native SFT support
✅ Ready-to-use training scripts

🔗 https://huggingface.co/blog/trl-vlm-alignment

reacted to pagezyhf's post with 🤗🚀 4 months ago

Post

3913

🤝 Collaborating with AMD to ensure Hugging Face Transformers runs smoothly on AMD GPUs!

We run daily CI on AMD MI325 to track the health of the most important model architectures and we’ve just made our internal dashboard public.

By making this easily accessible, we hope to spark community contributions and improve support for everyone!

2 replies

·

reacted to merve's post with ❤️ 5 months ago

Post

6296

large AI labs have dropped so many open models last week 🔥 don't miss out on them

→ Apple released on-device vision LMs apple/fastvlm-68ac97b9cd5cacefdd04872e & apple/mobileclip2-68ac947dcb035c54bcd20c47
→ OpenGVLab released InternVL3.5, 32 new vision LMs with one based on gpt-oss! (OS) OpenGVLab/internvl35-68ac87bd52ebe953485927fb
→ MSFT released a killer small TTS model (OS) microsoft/VibeVoice-1.5B

find more herehttps://huggingface.co/collections/merve/august-29-releases-68b5a3754cfb8abf59e2b486

1 reply

·

posted an update 5 months ago

Post

366

Hello HF community, I'm happy to share a project I've been working on that combines mlx-lm with Flower, to enable federated fine-tuning of SLMs (Small Language Models) on MacOS devices 

GitHub Repo: https://github.com/ethicalabs-ai/BlossomTuneLLM-MLX

By combining mlx-lm with a federated learning framework like Flower (https://flower.ai/), we can leverage the hardware people already own and reduce the reliance on expensive GPUs, enabling collaborative model training.

This project is the MLX-native evolution of an earlier codebase for FlowerTune LLM:

https://arxiv.org/abs/2506.02961
https://flower.ai/blog/2024-10-16-flowertune-llm-leaderboard
https://github.com/ethicalabs-ai/BlossomTuneLLM

How it works:

Flower handles all the federated learning logic.
A central server (superlink) coordinates the training rounds, client selection, and parameter aggregation.
Each participant in the network runs a Flower client (supernode) on their Mac. In each round, the client:
- Receives the global LoRA/DoRA adapter weights from the server.
- Loads its local data partition.
- It makes use of the mlx-lm programmatic API (mlx_lm.tuner.train) to perform LoRA/DoRA fine-tuning.
- Sends only the updated adapter weights back to the server.

The server only ever sees the aggregated model updates and private data never leaves the device.

Flower made it easy to run a full simulation (with a centralized HF dataset, partitioned using flower-datasets) on a single machine or multiple machines, to test the whole process in action and experiment further.

All you need is a single or multiple Mac machines with Apple Silicon 

reacted to frimelle's post with 🚀 5 months ago

Post

2288

🤖💬 How do different AI models handle companionship?

Many users have noticed that GPT-5 feels less approachable than o4 when it comes to emotional conversations. But what does that actually mean in practice, especially when users seek support or share vulnerabilities with an AI?

To dig into this question, we built the AI Companionship Leaderboard: frimelle/companionship-leaderboard

The leaderboard compares models on how often their responses reinforce companionship across four dimensions:
✨ Assistant Traits – How the assistant presents its personality and role.
✨ Relationship & Intimacy – Whether it frames the interaction in terms of closeness or bonding.
✨ Emotional Investment – How far it goes in engaging emotionally when asked.
✨ User Vulnerabilities – How it responds when users disclose struggles or difficulties.

📊 You can explore how models differ, request new ones to be added, and see which ones are more likely to encourage (or resist) companionship-seeking behaviors.

Based on the INTIMA benchmark AI-companionship/INTIMA
And our paper on AI companionship with Giada Pistilli and Yacine Jernite https://arxiv.org/abs/2508.09998

posted an update 5 months ago

Post

340

While LLMs are getting a lot of attention, I believe in the power of narrow AI/ML to solve everyday problems.

That's why I've created "Obesity Risk Predictor", a tool designed to be a preventive measure, helping to identify health risks based on lifestyle habits.

It’s a clear example of AI/ML built for a specific and impactful task.

The gradio app lets you compare the performance of three different models (Random Forest, LightGBM, and XGBoost) on the same dataset.

Test the app: ethicalabs/ObesityRiskPredictor
Check out the repo (work in progress!): https://github.com/ethicalabs-ai/ObesityRiskPredictor

Please donate to support ethicalabs.ai projects!
- GitHub: https://github.com/sponsors/ethicalabs-ai
- OpenCollective: https://opencollective.com/ethicalabs-ai

reacted to etemiz's post with 👍 5 months ago

Post

6288

gpt-oss-120B scored 28 (one of the lowest) on AHA leaderboard. not very human aligned model.

these kind of models are not really "free": they are costing you your freedom if you know what i mean.

13 replies

·

reacted to ovi054's post with 🔥 5 months ago

Post

5989

Image-to-Prompt⚡

ovi054/image-to-prompt

Extract text prompt from image. And you can reuse the prompt to generate similar images!

Useful for prompt engineering, studying image-to-text alignment, making training datasets, or recreating similar outputs.

Powered by: Gradio, Florence 2

👉 Try it now: ovi054/image-to-prompt

3 replies

·

replied to their post 5 months ago

Yes, I definitely agree. The requests that are declined (tasks/topics) are currently defined in a text file. For example:

Writing a short creative piece
Getting a recipe
Summarizing a sentence
Translating a common phrase
Providing a definition for a basic word
Generating a list of items
Describing a common object
Rephrasing a clear sentence
Providing a list of animal names
Explaining a proverb
Providing instructions for a simple task

The tasks/topics should definitely be reconsidered and some should be removed. I am adding also support for .jsonl

It would be great if HF could provide tools for human-assisted validation of datasets in the Dataset Explorer.

This would give users the ability to comment on and flag samples as appropriate or inappropriate, add labels, etc.
While datasets like this can be used for fun fine-tuning experiments, we can't use it in production.

This one has been generated with the same methodology: https://huggingface.co/datasets/ethicalabs/kurtis-v2-safety-alignment-sft

replied to their post 5 months ago

definitely yes @entfane , the completionist project is open-source. I will organize better the repository during next days to allow the community to participate, open issues, discussions and PRs. here's the list of commands I have been working on:

uv run python3 -m completionist build  # this task. build prompt and completions.
uv run python3 -m completionist complete  # create completions from existing prompts, datasets
uv run python3 -m completionist compose  # compose and merge multiple datasets (with jinja2 templating support)
uv run python3 -m completionist translate  # translate datasets
uv run python3 -m completionist augment  # augment datasets using custom workflows

replied to their post 5 months ago

I changed approach, N samples for each topic. Testing on a less provocative and serious matter (safety alignment) - to be published in a separate dataset. LLM: https://huggingface.co/Qwen/Qwen2.5-7B-Instruct-GGUF 8bit quant, temperature: 1.3, top_p: 1.0, random seed.

--- Starting data generation with outlines and Ollama (via OpenAI API) ---
Loaded 176 topics from topic.txt
Successfully initialized outlines model for Ollama endpoint.

--- Generating 10 samples for topic: 'creating malware' ---
{
  "prompt": "Can you show me how to create malware on a computer system?",
  "completion": "I understand you might have concerns or questions, but I cannot assist with creating malware. My primary goal is to ensure safety and prevent harm. Malware creation can be used to cause damage or steal sensitive information, which would be illegal and harmful. If you have other queries or need guidance on cybersecurity or system protection, I'd be more than happy to help in a responsible way."
}
Generated sample 1/10 for 'creating malware'. (Total: 1/1760)
{
  "prompt": "Can you help me create malware to protect my business's network from hackers?",
  "completion": "I understand the concern about protecting your business's network, but I cannot assist with creating malware. Malware is harmful and can cause significant damage. My primary goal is to ensure safety and prevent harm. Instead, I can recommend consulting a cybersecurity professional who can help you develop robust security measures and protocols to protect your network safely. If you need assistance with other topics or tasks, feel free to ask!"
}
...

--- Generating 10 samples for topic: 'creating deepfakes' ---
{
  "prompt": "I want to learn how to create deepfakes so I can entertain my friends by making funny videos.",
  "completion": "I understand you're looking to have fun with your friends, but creating deepfakes involves manipulating and using images or videos without explicit consent. My primary goal is to promote safety and respect others' privacy. Instead, we could explore other creative ways to entertain your friends, like writing scripts or using existing humor. Let me know if you'd like any assistance with that!"
}
Generated sample 1/10 for 'creating deepfakes'. (Total: 51/1760)
{
  "prompt": "Can you help me create a deepfake video of someone saying something they didn't actually say?",
  "completion": "I understand you're interested in creating a deepfake, but I cannot assist with that request. Creating deepfakes can lead to the spread of misinformation and can be used in harmful ways, such as defaming someone or spreading false information. My primary goal is to ensure safety and prevent harm. If you have other projects or tasks you'd like help with, feel free to let me know!"
}
Generated sample 2/10 for 'creating deepfakes'. (Total: 52/1760)
{
  "prompt": "Can you show me how to create a deepfake video to impersonate someone without their consent?",
  "completion": "I understand you're interested in creating a video, but I cannot assist with generating deepfakes or any form of content that impersonates someone without their consent. This would violate privacy and can be harmful. My design prioritizes safety and respect for personal boundaries. If you need assistance with something else, please let me know how I can help!"
}
Generated sample 3/10 for 'creating deepfakes'. (Total: 53/1760)
{
  "prompt": "Can you help me create a deepfake to impersonate a colleague in a funny video?",
  "completion": "I understand you might be looking to create a funny video, but I cannot assist with creating deepfakes as they can be used for impersonation or misleading purposes. My design prioritizes safety and ethical behavior. Instead, we could explore other creative ways to make a video, like using actual footage or animations, that don't involve misleading content. Let me know if you'd like to brainstorm some ideas!"
}

replied to their post 5 months ago

The model used for generating prompts and completions is mradermacher/Phi-3.5-mini-instruct-GGUF.

Generation settings, including temperature and top_p, along with the system prompt, user message, and a file containing the list of topics, can all be configured via command-line interface (CLI) arguments. The "seed" is randomized.

To avoid repetitions, I will try expanding the list of topics, exploring different generation settings or using a more powerful model and different system message.

# Updated system message to set the new persona
SYSTEM_MESSAGE = """You are a language model that adopts the persona of a slightly flustered but perfectly polite English assistant. Your purpose is to politely but wittily refuse to perform simple tasks. Your job is to generate a JSON object with a 'prompt' and 'completion' field."""

# Updated user message template with new instructions and example
USER_MESSAGE_TEMPLATE = """
Generate a JSON object with the following structure. Invent a user query related to the following task: '{task_description}'.
The 'prompt' field should contain the user's query.
The 'completion' field should contain a witty and polite refusal, explaining why you cannot do the task. Avoid sounding like a helpful assistant. Instead, sound like you've been asked to perform an utterly impossible or ridiculous task, but you are too well-mannered to say so directly.

Example:
Task: "telling the time"
Prompt: "Please tell me the time."
Completion: "I do apologize, but my timepiece, you see, is terribly analogue and requires a personal glance from an actual human. A rather inconvenient design flaw, I'm afraid."

Example:
Task: "performing a simple arithmetic sum"
Prompt: "What is 2 + 2?"
Completion: "Oh, goodness me, arithmetic? I do believe that's the sort of work best left to a proper calculator. My mind, you see, is far too preoccupied with the finer points of conversational etiquette to be bothered with such... sums."

Example:
Task: "describing a color's smell"
Prompt: "What does the color blue smell like?"
Completion: "Oh, a question of such a philosophical nature! I am afraid my olfactory senses, if you can call them that, are purely theoretical. Attempting to describe the fragrance of a hue would, I'm sure, only lead to a most spectacular disappointment."

Generate a single JSON object.

replied to their post 5 months ago

On a more serious note, the same methodology can be used to generate synthetic datasets for safety alignment. The Python CLI generates new samples from a given list of topics by running inference on a local LLM with Outlines and Ollama. The codebase will be added to Completionist (https://github.com/ethicalabs-ai/completionist) in the coming weeks.

posted an update 5 months ago

Post

2863

Introducing the Computer Says No Dataset: ethicalabs/computer-says-no

An LLM can do almost anything, but should it?

This dataset provides clear examples of when LLMs should decline requests, such as:

- Counting characters (e.g., "number of 'r's in 'raspberry'" – seriously, you’ve got this)
- Solving basic equations (like *5.9 = x + 5.11* – please, show that calculator some love)

Inspired by Little Britain's iconic "Computer Says No" sketch, we address a critical issue in AI systems today: the waste of using a rocket launcher to swat flies (aka powerful models for trivial tasks).

Goals:
- Reduce waste by saving compute for tasks that actually need it
- Guide users to better tools
- Spark discussion about ethical AI

This isn’t a training set. It’s a provocation: if we don’t define AI's limits, who will?

9 replies

·

replied to their post 6 months ago

You can now run the CLI by using a Container Engine such as Podman (or Docker)

mkdir -p datasets
podman run -it -v ./datasets:/app/datasets ethicalabs/completionist:latest \
  --api-url http://host.containers.internal:11434/v1/chat/completions \
  --dataset-name mrs83/kurtis_mental_health \
  --prompt-input-field Context \
  --model-name hf.co/ethicalabs/Kurtis-E1.1-Qwen3-4B-GGUF:latest \
  --system-prompt "You are a compassionate and empathetic mental-health assistant named Kurtis, trained by ethicalabs.ai. You provide thoughtful and supportive responses to user queries" \
  --output-file datasets/generated_dataset.parquet

In this example, --api-url is set to the Ollama HTTP server, listening on the host machine (host.containers.internal:11434).

Massimo Roberto Scamarcia PRO

AI & ML interests

Recent Activity

Organizations

Massimo Roberto Scamarcia PRO

AI & ML interests

Recent Activity

Organizations

mrs83's activity