TajaKuzmanPungersek committed
Commit 82a13c6 · verified · 1 Parent(s): 3ff9b2c

Added links to the paper and the training dataset

Files changed (1)
  1. README.md +20 -3
README.md CHANGED
@@ -111,14 +111,15 @@ to the [CAP (Comparative Agendas Project) schema](https://www.comparativeagendas
 
 This classification model is based on the multilingual parliamentary [XLM-R-Parla](https://huggingface.co/classla/xlm-r-parla) BERT-like model,
 which is a XLM-RoBERTa-large model that was additionally pre-trained on texts of parliamentary proceedings.
-To develop the ParlaCAP model, XLM-R-Parla was additionally fine-tuned on 29,779 instances (speeches) from
+To develop the ParlaCAP model, XLM-R-Parla was additionally fine-tuned on the [ParlaCAP-train dataset](http://hdl.handle.net/11356/2093): 29,779 instances (speeches) from
 29 [ParlaMint 4.1](http://hdl.handle.net/11356/1912) datasets
 containing transcriptions of parliamentary debates of 29 European countries and autonomous regions.
 The speeches were automatically annotated with 22 CAP labels (21 major topics and a label "Other") using the GPT-4o model
 in a zero-shot prompting fashion
 following the [LLM teacher-student framework](https://ieeexplore.ieee.org/document/10900365).
 Evaluation of the GPT model has shown that its annotation performance is
-comparable to those of human annotators.
+comparable to those of human annotators. For more information, see the paper ["Supercharging Agenda Setting Research:
+The ParlaCAP Dataset of 28 European Parliaments and a Scalable Multilingual LLM-Based Classification"](https://doi.org/10.48550/arXiv.2602.16516) (Kuzman Pungeršek et al., 2026).
 
 The fine-tuned ParlaCAP model achieves 0.723 in macro-F1 on an English test set,
 0.686 in macro-F1 on a Croatian test set, 0.710 in macro-F1 on a Serbian test set and 0.646 in macro-F1 on a Bosnian test set
@@ -186,7 +187,23 @@ To apply the model to the text corpora in the ParlaMint TXT format
 
 ## How to Cite
 
-The paper presenting the model is on its way. In the meantime, you can cite the model as follows:
+Please cite the paper presenting the model:
+
+```
+@article{pungersek2026parlacap-paper,
+title={{Supercharging Agenda Setting Research: The ParlaCAP Dataset of 28 European Parliaments and a Scalable Multilingual LLM-Based Classification}},
+author={Kuzman Punger{\v s}ek, Taja and Rupnik, Peter and {\v S}irini{\'c}, Daniela and Ljube{\v s}i{\'c}, Nikola},
+year={2026},
+eprint={2602.16516},
+archivePrefix={arXiv},
+primaryClass={cs.CL},
+url={https://arxiv.org/abs/2602.16516},
+journal={arXiv preprint arXiv:2602.16516},
+}
+```
+
+You can also cite the model as follows:
+
 ```
 @misc{parlacap_model,
 author = {Kuzman Punger{\v s}ek, Taja and Ljube{\v s}i{\'c}, Nikola},
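The README quoted in this diff reports per-language macro-F1 scores. As background for readers: macro-F1 computes an F1 score per label and averages them with equal weight, so rare CAP topics count as much as frequent ones. The following is a minimal, stdlib-only sketch of that metric (illustrative only, not the ParlaCAP evaluation code; the toy label names are made up):

```python
from collections import defaultdict

def macro_f1(y_true, y_pred):
    """Macro-averaged F1: compute F1 per label, then average with equal weight.

    Labels are taken from the union of gold and predicted labels, so a label
    the model never predicts correctly still drags the average down.
    """
    labels = set(y_true) | set(y_pred)
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted label was wrong here
            fn[gold] += 1  # gold label was missed here
    f1_scores = []
    for label in labels:
        p_denom = tp[label] + fp[label]
        r_denom = tp[label] + fn[label]
        precision = tp[label] / p_denom if p_denom else 0.0
        recall = tp[label] / r_denom if r_denom else 0.0
        f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)

# Toy example with three CAP-style topic labels (hypothetical data)
gold = ["Health", "Defense", "Other", "Health"]
pred = ["Health", "Other", "Other", "Defense"]
print(round(macro_f1(gold, pred), 3))  # → 0.444
```

Note that with 22 labels, a single always-missed label lowers macro-F1 by up to 1/22 ≈ 0.045, which is why the reported scores in the 0.65–0.72 range are informative about per-topic coverage, not just overall accuracy.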