# GPT2 Tokenizer Java
A Java implementation of the GPT-2 tokenizer.
## Requirements
Add the following dependencies to your Gradle build script (`build.gradle`) to use the library.
```groovy
dependencies {
    implementation 'com.google.api-client:google-api-client:1.32.2'
    implementation 'org.apache.commons:commons-lang3:3.12.0'
    implementation 'org.springframework.boot:spring-boot-starter-web'
    testImplementation 'org.junit.jupiter:junit-jupiter-api:5.3.1'
    testRuntimeOnly 'org.junit.jupiter:junit-jupiter-engine:5.3.1'
}
```
## Add tokenizer files to resources directory
Add the `encoder.json` and `vocab.bpe` files to your project's resources directory.
Copies of both files are available [here](https://github.com/hyunwoongko/gpt2-tokenizer-java/tree/master/src/main/resources/tokenizers/gpt2).
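To confirm that both files actually ended up on the classpath before loading the tokenizer, you can run a quick check like the one below. This is a minimal sketch and is not part of this library: `TokenizerResourceCheck` and `missingFiles` are hypothetical names, and the default directory `tokenizers/gpt2` assumes you copied the files into the same layout as this repository (`src/main/resources/tokenizers/gpt2`).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of this library): reports tokenizer files missing from the classpath. */
public class TokenizerResourceCheck {

    // File names from this README; the directory is an assumption supplied by the caller.
    static final String[] REQUIRED_FILES = {"encoder.json", "vocab.bpe"};

    /** Returns the required files that cannot be found under the given resources-relative directory. */
    public static List<String> missingFiles(String dir) {
        ClassLoader loader = TokenizerResourceCheck.class.getClassLoader();
        // ClassLoader resource names use '/' separators and no leading slash.
        String prefix = dir.isEmpty() || dir.endsWith("/") ? dir : dir + "/";
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED_FILES) {
            if (loader.getResource(prefix + name) == null) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumed default: the same layout as this repository (src/main/resources/tokenizers/gpt2).
        String dir = args.length > 0 ? args[0] : "tokenizers/gpt2";
        List<String> missing = missingFiles(dir);
        System.out.println(missing.isEmpty()
                ? "All tokenizer files found under " + dir
                : "Missing under " + dir + ": " + missing);
    }
}
```

Checking through the class loader mirrors how files placed in `src/main/resources` are packaged: they end up at the root of the classpath, not at a filesystem path.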
## Usage
The following are simple usage examples.
For the corresponding test code, see [`GPT2TokenizerTest.java`](https://github.com/hyunwoongko/gpt2-tokenizer-java/blob/master/src/test/java/ai/tunib/tokenizer/GPT2TokenizerTest.java).
### Encoding text to tokens
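```java
import ai.tunib.tokenizer.GPT2Tokenizer;
import java.util.List;

// PATH/IN/RESOURCES: the directory under your resources folder that contains encoder.json and vocab.bpe
GPT2Tokenizer tokenizer = GPT2Tokenizer.fromPretrained("PATH/IN/RESOURCES");
List<Integer> result = tokenizer.encode("Hello my name is Kevin.");
System.out.println(result);
```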
```
[15496, 616, 1438, 318, 7939, 13]
```
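GPT-2's byte-pair encoding splits each pre-tokenized word into symbols and repeatedly merges the adjacent pair that appears earliest in `vocab.bpe`. The sketch below illustrates that merge loop. It is modeled on OpenAI's reference GPT-2 encoder, not on this library's source; `BpeSketch`, `parseMerges`, and `bpe` are hypothetical names, and it assumes `vocab.bpe` holds one space-separated pair per line after a `#version` header.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of GPT-2 style BPE merging; not this library's implementation. */
public class BpeSketch {

    /** Builds merge ranks from vocab.bpe-style lines ("left right"); a lower rank merges first. */
    public static Map<String, Integer> parseMerges(List<String> lines) {
        Map<String, Integer> ranks = new HashMap<>();
        for (String line : lines) {
            String pair = line.trim();
            // Assumed format: a "#version" header, then one space-separated pair per line.
            if (pair.isEmpty() || pair.startsWith("#version")) continue;
            ranks.putIfAbsent(pair, ranks.size());
        }
        return ranks;
    }

    /** Repeatedly merges the adjacent pair with the lowest rank until no ranked pair remains. */
    public static List<String> bpe(String word, Map<String, Integer> ranks) {
        // After GPT-2's byte-to-unicode step, every symbol is a single BMP char.
        List<String> parts = new ArrayList<>();
        for (char c : word.toCharArray()) parts.add(String.valueOf(c));

        while (parts.size() > 1) {
            // Find the adjacent pair with the lowest merge rank.
            int bestRank = Integer.MAX_VALUE;
            String left = null, right = null;
            for (int i = 0; i + 1 < parts.size(); i++) {
                Integer r = ranks.get(parts.get(i) + " " + parts.get(i + 1));
                if (r != null && r < bestRank) {
                    bestRank = r;
                    left = parts.get(i);
                    right = parts.get(i + 1);
                }
            }
            if (left == null) break; // no mergeable pair left

            // Merge every occurrence of that pair, scanning left to right.
            List<String> merged = new ArrayList<>();
            for (int i = 0; i < parts.size(); i++) {
                if (i + 1 < parts.size() && parts.get(i).equals(left) && parts.get(i + 1).equals(right)) {
                    merged.add(left + right);
                    i++; // skip the right half that was just consumed
                } else {
                    merged.add(parts.get(i));
                }
            }
            parts = merged;
        }
        return parts;
    }

    public static void main(String[] args) {
        // A toy merge list, not GPT-2's real vocab.bpe.
        Map<String, Integer> ranks = parseMerges(Arrays.asList("#version: 0.2", "l l", "h e", "he ll", "hell o"));
        System.out.println(bpe("hello", ranks)); // prints [hello]
    }
}
```

With the toy merge list in `main`, `bpe("hello", ranks)` merges `l l`, then `h e`, then `he ll`, then `hell o`, ending with the single symbol `hello`. In GPT-2, each resulting symbol is then looked up in `encoder.json` to obtain its id.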
### Decoding tokens to text
```java
import ai.tunib.tokenizer.GPT2Tokenizer;
import java.util.List;

GPT2Tokenizer tokenizer = GPT2Tokenizer.fromPretrained("PATH/IN/RESOURCES");
// List.of requires Java 9 or later
String result = tokenizer.decode(List.of(15496, 616, 1438, 318, 7939, 13));
System.out.println(result);
```
```
"Hello my name is Kevin."
```
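Decoding in GPT-2 reverses a byte-level mapping: ids are mapped back to token strings through the inverse of `encoder.json`, and each character of those strings stands for one raw byte. The sketch below rebuilds that byte table and turns token strings back into UTF-8 text. It follows the `bytes_to_unicode` table in OpenAI's reference GPT-2 encoder and is not taken from this library; `ByteDecoderSketch`, `bytesToUnicode`, and `decodeTokens` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of GPT-2's byte-level decoding step; not this library's implementation. */
public class ByteDecoderSketch {

    /** Maps each byte 0..255 to a printable char, following bytes_to_unicode in OpenAI's encoder.py. */
    public static Map<Integer, Character> bytesToUnicode() {
        Map<Integer, Character> table = new HashMap<>();
        // Printable ranges '!'..'~', '¡'..'¬', '®'..'ÿ' map to themselves.
        for (int b = '!'; b <= '~'; b++) table.put(b, (char) b);
        for (int b = 0xA1; b <= 0xAC; b++) table.put(b, (char) b);
        for (int b = 0xAE; b <= 0xFF; b++) table.put(b, (char) b);
        // Every other byte is shifted to 256, 257, ... in increasing byte order.
        int n = 0;
        for (int b = 0; b < 256; b++) {
            if (!table.containsKey(b)) {
                table.put(b, (char) (256 + n));
                n++;
            }
        }
        return table;
    }

    /** Concatenates token strings and converts their byte-level chars back into UTF-8 text. */
    public static String decodeTokens(List<String> tokens) {
        // Invert the table: printable char -> original byte.
        Map<Character, Integer> byteDecoder = new HashMap<>();
        for (Map.Entry<Integer, Character> e : bytesToUnicode().entrySet()) {
            byteDecoder.put(e.getValue(), e.getKey());
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (String token : tokens) {
            for (char c : token.toCharArray()) {
                Integer b = byteDecoder.get(c);
                if (b == null) {
                    throw new IllegalArgumentException("Not a byte-level symbol: " + c);
                }
                bytes.write(b);
            }
        }
        // Invalid UTF-8 sequences become U+FFFD, similar to errors="replace" in the reference code.
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Token strings as GPT-2 stores them; U+0120 ('Ġ') encodes a leading space.
        System.out.println(decodeTokens(Arrays.asList("Hello", "\u0120my", "\u0120name"))); // prints: Hello my name
    }
}
```

This table is why GPT-2 vocabulary entries show a leading space as `Ġ` (U+0120): byte 32 is not in a printable range, so it is shifted to 256 + 32. Multi-byte characters work the same way; for example, the token string `Ã©` decodes to the bytes `0xC3 0xA9`, which is `é` in UTF-8.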
## License
This project is licensed under the terms of the Apache License 2.0.
Copyright 2022 [Hyunwoong Ko](https://github.com/hyunwoongko). All Rights Reserved.