---
title: README
emoji: 📚
colorFrom: indigo
colorTo: yellow
sdk: static
pinned: false
---

# OSS-Forge

**OSS-Forge** is an open research initiative focused on *trustworthy, secure, and transparent AI-assisted software engineering*.  
We develop and publish:

- **static and dynamic analyzers** for AI-generated code  
- **benchmarks and datasets** for software vulnerabilities, defects, exploits, and shellcode  
- **evaluation frameworks** for correctness, robustness, and data poisoning  
- **models and reproducible pipelines** for secure code generation  
- **experimental tools and artifacts** from peer-reviewed scientific publications  

Our mission is to build a transparent, verifiable, and secure ecosystem for integrating Large Language Models (LLMs) into software development, especially in safety-critical and security-sensitive contexts.

---

## What You Will Find Here

This organization hosts resources from multiple research projects and publications in AI security, software engineering, and code generation. Current categories include:

### Static Analyzers & Security Tools
- **DeVAIC** – Fast static analysis for detecting vulnerabilities in Python code  
- **PatchitPy** – Automated patching of vulnerable Python code via pattern-based transformations  
- **ACCA** – Automated correctness assessment of AI-generated code using symbolic execution  
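
As a rough illustration of what pattern-based detection and patching can look like, here is a minimal sketch. It is not the DeVAIC or PatchitPy implementation: the `RULES` table, the rule names, and the `detect` and `patch` helpers are hypothetical, and the two regular expressions cover only trivial cases.

```python
import re

# Hypothetical rules for illustration only; these are NOT the rules
# shipped with DeVAIC or PatchitPy.
# Each rule: (name, pattern flagging an unsafe call, safer replacement).
RULES = [
    # Single-argument yaml.load(...) calls only; calls that pass a Loader are ignored.
    ("yaml-unsafe-load", re.compile(r"\byaml\.load\(([^(),]*)\)"), r"yaml.safe_load(\1)"),
    # MD5 used as a hash function.
    ("insecure-hash-md5", re.compile(r"\bhashlib\.md5\("), "hashlib.sha256("),
]


def detect(source):
    """Return a list of (line_number, rule_name) for every line that matches a rule."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern, _replacement in RULES:
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


def patch(source):
    """Rewrite every match of every rule with its replacement."""
    for _name, pattern, replacement in RULES:
        source = pattern.sub(replacement, source)
    return source


snippet = (
    "import yaml\n"
    "config = yaml.load(data)\n"
    "digest = hashlib.md5(payload).hexdigest()\n"
)

print(detect(snippet))         # findings before patching
print(patch(snippet))          # rewritten source
print(detect(patch(snippet)))  # findings after patching
```

Purely textual rules like these are fast and work on incomplete snippets, but they can miss aliased imports and multi-line calls, so the actual tools should be consulted for their real rule sets and guarantees.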

### Datasets for Security & Software Engineering
- **PyResBugs** – 5,007 residual Python bugs with NL descriptions  
- **Shellcode_IA32** – The largest curated dataset of IA-32 shellcode snippets
- **PoisonPy** – Dataset supporting targeted data-poisoning attacks  
- **Human vs AI Code** – Defects, vulnerabilities, and complexity analysis at scale  

### Robustness, Data Quality & Industrial Code Generation
- **Residual Bug Generation from Natural Language** – Frameworks for generating realistic residual defects from NL descriptions  
- **Impact of Data Quality on Code Models** – Empirical studies on robustness, poisoning resilience, and dataset quality  
- **Industrial Code Generation** – Models for domain-specific code synthesis (e.g., VHDL generation from natural language)  

Our repositories include code, experimental scripts, datasets, and reproducibility materials.

---

## Research Themes

Our work spans four interconnected areas:

1. **Security of AI-generated Code**  
   Vulnerability detection, automated patching, exploit generation, and robustness testing.

2. **Trustworthy LLM Evaluation**  
   Correctness, equivalence checking, symbolic execution, reproducible benchmarks.

3. **Software Engineering with AI**  
   Defect analysis, complexity metrics, orthogonal defect classification (ODC).

4. **Adversarial ML for Code Models**  
   Data poisoning, robustness stress-testing, unsafe pattern injection.

All research artifacts are peer-reviewed and associated with publications at DSN, ISSRE, ICPC, IST, EMSE, JSS, AUSE, and other venues.

---

## Publications Powered by These Repositories

A non-exhaustive list of venues where this work has been published or presented:

- **IEEE/IFIP DSN**
- **IEEE ISSRE**
- **IEEE/ACM ICPC**
- **Empirical Software Engineering (EMSE)**
- **Information and Software Technology (IST)**
- **Automated Software Engineering (AUSE)**
- **Journal of Systems and Software (JSS)**

Full references are available inside each corresponding repository.

---

## Contributing

We encourage contributions from the research and practitioner community.

You can contribute by:

- submitting new datasets  
- improving static analysis rules  
- adding benchmarks or experimental scripts  
- reporting issues or proposing new features  

Please open discussions or pull requests inside the relevant repository.

---

## Contact

OSS-Forge is developed by a joint research team from the **University of North Carolina at Charlotte (UNCC)** and the **University of Naples Federico II**.

### Scientific Leadership
- Prof. [Domenico Cotroneo](https://webpages.charlotte.edu/dcotrone/) – UNCC

### Core Research Contributors
- Dr. [Pietro Liguori](http://wpage.unina.it/pietro.liguori/) – University of Naples Federico II
- [Cristina Improta](http://wpage.unina.it/cristina.improta/) – University of Naples Federico II  
- Ph.D. students, graduate researchers, and contributors from the DESSERT research group – University of Naples Federico II

---