---
datasets:
- code-search-net/code_search_net
language:
- en
base_model:
- openai-community/gpt2
pipeline_tag: text-generation
library_name: transformers
tags:
- code
---

## Detailed Model Description

A GPT-2-based tokenizer retrained on over 400k Python functions from CodeSearchNet. It keeps GPT-2's byte-level BPE backbone and adds robust handling of indentation, common keywords, operators, and camel-case identifiers. It is intended for code-generation and code-understanding pipelines.

## Usage Example

Input snippet:
```python
example = """
  class LinearLayer():
    def __init__(self, input_size, output_size):
        self.weight = torch.randn(input_size, output_size)
       self.bias = torch.zeros(output_size)

    def __call__(self, x):
       return x @ self.weights + self.bias
    """
```
Resulting tokens:
```
['class', 'ĠLinear', 'Layer', '():', 'ĊĠĠĠ', 'Ġdef', 'Ġ__', 'init', '__(', 'self', ',', 'Ġinput', '_', 'size', ',', 'Ġoutput', '_', 'size', '):', 'ĊĠĠĠĠĠĠĠ', 'Ġself', '.', 'weight', 'Ġ=', 'Ġtorch', '.', 'randn', '(', 'input', '_', 'size', ',', 'Ġoutput', '_', 'size', ')', 'ĊĠĠĠĠĠĠ', 'Ġself', '.', 'bias', 'Ġ=', 'Ġtorch', '.', 'zeros', '(', 'output', '_', 'size', ')', 'ĊĊĠĠ', 'Ġdef', 'Ġ__', 'call', '__(', 'self', ',', 'Ġx', '):', 'ĊĠĠĠĠĠĠ', 'Ġreturn', 'Ġx', 'Ġ@', 'Ġself', '.', 'weights', 'Ġ+', 'Ġself', '.', 'bias', 'ĊĠĠĠĠ']
```
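In the token list above, `Ġ` and `Ċ` are GPT-2's byte-level stand-ins for a space and a newline. For example, `'ĊĠĠĠĠĠĠĠ'` encodes a line break followed by seven spaces, and the eighth space of the indent is attached to the next token (`'Ġself'`). The following is a minimal sketch that reimplements GPT-2's byte-to-Unicode table (the `bytes_to_unicode` scheme from OpenAI's original GPT-2 encoder) and uses it to turn tokens back into source text. The helper names `bytes_to_unicode` and `decode_tokens` are illustrative and are not part of this tokenizer's API.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Map each of the 256 byte values to a printable Unicode character (GPT-2 scheme)."""
    # Printable bytes ('!'..'~', '¡'..'¬', '®'..'ÿ') are represented by themselves.
    bs = (
        list(range(0x21, 0x7F))
        + list(range(0xA1, 0xAD))
        + list(range(0xAE, 0x100))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes (control characters, space, ...) are shifted to code points 256+,
    # so space (0x20) becomes 'Ġ' (U+0120) and newline (0x0A) becomes 'Ċ' (U+010A).
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: display character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_tokens(tokens: list[str]) -> str:
    """Concatenate byte-level BPE tokens and decode them back to UTF-8 text."""
    raw = bytes(BYTE_DECODER[ch] for tok in tokens for ch in tok)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    tokens = ["class", "ĠLinear", "Layer", "():", "ĊĠĠĠ", "Ġdef", "Ġ__", "init", "__("]
    print(repr(decode_tokens(tokens)))  # prints 'class LinearLayer():\n    def __init__('
```

Because every byte has a printable stand-in, indentation survives tokenization exactly, which is what lets whitespace runs like `'ĊĠĠĠ'` be learned as single tokens.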

## Training Dataset Features

Python training split of CodeSearchNet, as loaded with the Hugging Face `datasets` library:
```
Dataset({
    features: ['repo', 'path', 'func_name', 'original_string', 'language', 'code', 'code_tokens', 'docstring', 'docstring_tokens', 'sha', 'url', 'partition'],
    num_rows: 412178
})
```
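To retrain a similar tokenizer, the `code` column is a natural source of training text. This card does not say which column was used or how the data was batched, so the following is only a sketch under that assumption. It shows a hypothetical `batch_iterator` helper that groups rows into lists of strings, which is the shape that `train_new_from_iterator` on 🤗 Transformers fast tokenizers accepts as its training corpus.

```python
from typing import Any, Dict, Iterable, Iterator, List


def batch_iterator(
    rows: Iterable[Dict[str, Any]],
    batch_size: int = 1000,
    field: str = "code",  # assumed training column; not confirmed by this card
) -> Iterator[List[str]]:
    """Yield lists of up to `batch_size` strings taken from `field` of each row."""
    batch: List[str] = []
    for row in rows:
        batch.append(row[field])
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Flush the final, possibly smaller, batch.
    if batch:
        yield batch


if __name__ == "__main__":
    # Stand-in rows shaped like the dataset above (only the `code` column is read).
    rows = [{"code": f"def f{i}():\n    return {i}", "language": "python"} for i in range(5)]
    for b in batch_iterator(rows, batch_size=2):
        print(len(b))  # prints 2, then 2, then 1
```

With the real data, the iterator could be passed to a GPT-2 tokenizer loaded from `openai-community/gpt2` via `train_new_from_iterator(batch_iterator(dataset), vocab_size)`. The vocabulary size is left open here because this card does not state it.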