---
license: cc-by-nc-4.0
language:
- en
base_model:
- Qwen/Qwen2.5-3B-Instruct
tags:
- RAG
- IR
- LLM
datasets:
- sentence-transformers/natural-questions
- hotpotqa/hotpot_qa
---

# OpenDecoder

This model implements the OpenDecoder architecture described in [OpenDecoder: Open Large Language Model Decoding to Incorporate Document Quality in RAG](https://arxiv.org/abs/2601.09028), a scalable approach for integrating retrieval signals directly into autoregressive generation.

The checkpoint we release here is trained on the NQ and HotpotQA datasets under the **robust training** setting introduced in the paper: for each query, ten passages are constructed as input. The top-5 highest-ranked relevant passages are always included, followed by three passages randomly sampled from ranks 6–100 to represent partially relevant context, and two passages randomly sampled from beyond rank 100 in the collection to simulate irrelevant documents.
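
For concreteness, a minimal sketch of this passage-mixing scheme is given below. This is our own illustration rather than the repository's training code; `ranked_passage_ids` (a retriever ranking, best first) and `collection_ids` (the full passage pool) are hypothetical names.

```python
import random

def build_robust_input(ranked_passage_ids, collection_ids, seed=None):
    """Illustrative sketch of the robust-training passage mix (not the repo's code).

    Assumes `ranked_passage_ids` contains at least 100 ranked passage ids.
    Returns 10 passage ids: top-5, plus 3 from ranks 6-100, plus 2 from beyond rank 100.
    """
    rng = random.Random(seed)
    top5 = ranked_passage_ids[:5]                        # always included
    partial = rng.sample(ranked_passage_ids[5:100], 3)   # partially relevant: ranks 6-100
    top100 = set(ranked_passage_ids[:100])
    tail_pool = [pid for pid in collection_ids if pid not in top100]
    irrelevant = rng.sample(tail_pool, 2)                # simulate irrelevant: beyond rank 100
    return top5 + partial + irrelevant
```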

We initialize our model from Qwen2.5-3B-Instruct.

# Usage

We provide a minimal running example (a Python script) to show the intended usage of the model. Specifically, the model takes as input:
- A query
- Ten retrieved documents (relevant or not)
- Corresponding relevance scores

The model then produces an answer.

Please note that we have modified the Qwen2.5 source code to enable the incorporation of document-quality information in the decoding process; hence this code snippet is only runnable with the `IModelForCausalLM` class we implemented in our [code repository](https://github.com/fengranMark/OpenDecoder). Please clone the repository first, then run this demo. More details on the training and evaluation of OpenDecoder are also provided in that GitHub repository.

```python
import torch
from transformers import AutoTokenizer

#################################################################################################################
# You should run this script under the src folder of our GitHub repo: https://github.com/fengranMark/OpenDecoder
#################################################################################################################

from model.qwen_decoder.modeling import IModelForCausalLM
from model.qwen_decoder.configuration import IConfig

device = "cuda" if torch.cuda.is_available() else "cpu"


# ------------------
# Load model/tokenizer
# ------------------
config = IConfig.from_pretrained("Meranti/OpenDecoder")
model = IModelForCausalLM.from_pretrained("Meranti/OpenDecoder", config=config).to(device).eval()

tokenizer = AutoTokenizer.from_pretrained(
    "Meranti/OpenDecoder",
    trust_remote_code=True,
    padding_side="left",
)

# Add the Passage_i marker tokens (must match training)
special_passage_tokens = [f"Passage_{i+1}:" for i in range(20)]
tokenizer.add_special_tokens({"additional_special_tokens": special_passage_tokens})
model.resize_token_embeddings(len(tokenizer))

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.unk_token
if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id


# ------------------
# Example input
# ------------------
question = "Who wrote the novel The Old Man and the Sea?"

documents = [
    "The Old Man and the Sea is a short novel written by Ernest Hemingway in 1951.",
    "Ernest Hemingway was an American novelist and short-story writer.",
    "The book won the Pulitzer Prize for Fiction in 1953.",
    "It tells the story of an aging Cuban fisherman.",
    "Hemingway also wrote For Whom the Bell Tolls.",
    "The novella was published in Life magazine.",
    "It contributed to Hemingway winning the Nobel Prize.",
    "The protagonist is named Santiago.",
    "The story is set in the Gulf Stream.",
    "The work is considered one of Hemingway's classics."
]

# Document-level relevance scores (length = 10)
doc_scores = [0.95, 0.9, 0.4, 0.2, 0.1, 0.1, 0.05, 0.05, 0.05, 0.05]

# Normalize as in our dataset construction (normal mode): divide by the max score
mx = max(doc_scores)
norm_scores = [s / mx for s in doc_scores]


# ------------------
# Build RAG prompt
# ------------------
context_parts = []
for i, doc in enumerate(documents):
    context_parts.append(f"Passage_{i+1}: {doc}")

context = "\n".join(context_parts)

messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {
        "role": "user",
        "content": (
            "You should answer the question by referring to the knowledge provided below and integrating "
            "the usefulness of your own knowledge. Just directly answer it in several words as a short answer "
            "without any explanation.\n"
            f"{context}\n\nQuestion:{question}\n"
        ),
    },
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False)

tokenized = tokenizer(
    prompt,
    return_tensors="pt",
    padding="max_length",
    truncation=True,
    max_length=4096,
)

input_ids = tokenized["input_ids"].to(device)
attention_mask = tokenized["attention_mask"].to(device)

seq_len = input_ids.shape[1]


# ------------------
# Build token-level relevance_scores
# ------------------
relevance_scores = torch.ones(seq_len, dtype=torch.float)

# Find the position of each Passage_i marker token
passage_starts = []
for i in range(len(documents)):
    tok = f"Passage_{i+1}:"
    tok_id = tokenizer.convert_tokens_to_ids(tok)
    matches = (input_ids[0] == tok_id).nonzero(as_tuple=True)[0]
    passage_starts.append(matches[0].item())

# Find the start of the assistant turn (same logic as in the dataset code)
im_start = tokenizer.convert_tokens_to_ids("<|im_start|>")
assistant = tokenizer.convert_tokens_to_ids("assistant")

label_start = seq_len
positions = (input_ids[0] == im_start).nonzero(as_tuple=True)[0].tolist()
for p in reversed(positions):
    if input_ids[0][p + 1] == assistant:
        label_start = p
        break

# Compute passage spans: each span runs from one marker to the next
# (the last span ends just before the assistant turn)
spans = []
for i in range(len(passage_starts)):
    s = passage_starts[i]
    e = passage_starts[i + 1] if i < len(passage_starts) - 1 else label_start - 1
    spans.append((s, e))

# Assign each passage's normalized score to every token in its span
for i, (s, e) in enumerate(spans):
    relevance_scores[s:e] = norm_scores[i]

relevance_scores = relevance_scores.unsqueeze(0).to(device)


# ------------------
# Generate
# ------------------
with torch.no_grad():
    outputs = model.generate(
        input_ids=input_ids,
        attention_mask=attention_mask,
        relevant_scores=relevance_scores,
        max_new_tokens=64,
        do_sample=False,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

answer = tokenizer.decode(
    outputs[0][input_ids.shape[-1]:],
    skip_special_tokens=True,
).strip().replace("assistant", "").replace("<|im_start|>\n", "").replace("system\n", "")

print("Answer:", answer)

# Result:
# Answer:
# Ernest Hemingway
```
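
The demo above hardcodes `doc_scores`; in practice these scores would come from your retrieval pipeline. As a minimal sketch (our own illustration, not part of the repository; the bi-encoder checkpoint here is an arbitrary choice, and BM25 or reranker scores can be plugged in the same way):

```python
from sentence_transformers import SentenceTransformer, util

# Hypothetical scorer: cosine similarity from an off-the-shelf bi-encoder.
# `question` and `documents` are the variables from the demo above.
encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
q_emb = encoder.encode(question, convert_to_tensor=True)
d_emb = encoder.encode(documents, convert_to_tensor=True)
doc_scores = util.cos_sim(q_emb, d_emb)[0].tolist()  # one score per passage
```

The max-normalization step in the demo then rescales whatever scorer is used so that the highest-ranked passage receives a relevance of 1.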

# Citation

If you find our paper or models helpful, please consider citing us as follows:

```bibtex
@article{mo2026opendecoder,
  title={OpenDecoder: Open Large Language Model Decoding to Incorporate Document Quality in RAG},
  author={Mo, Fengran and Su, Zhan and Hui, Yuchen and Zhang, Jinghan and Sun, Jia Ao and Liu, Zheyuan and Zhang, Chao and Sakai, Tetsuya and Nie, Jian-Yun},
  journal={arXiv preprint arXiv:2601.09028},
  year={2026}
}
```