# Very Basic Draft
# Create Index Configuration
File: `sourcerer-cc.properties`

Set the following:
```
IS_SHARDING=true
MIN_TOKENS=65
MAX_TOKENS=500000
SHARD_MAX_NUM_TOKENS=<comma-separated list of numbers; 75,90,100 would mean we want 4 shards: 65-75, 76-90, 91-100, 101-500000>
```
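To double-check a boundary list before indexing, the shard ranges can be printed with a small helper. This is only a sketch: `shard_ranges` is a hypothetical function, not part of SourcererCC, and it assumes each number in `SHARD_MAX_NUM_TOKENS` is an inclusive upper bound, as the 75,90,100 example suggests.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SourcererCC): print the token range of each shard.
# Assumption: every boundary is the inclusive upper bound of its shard.
shard_ranges() {
  local min_tokens=$1 max_tokens=$2 bounds=$3
  local lower=$min_tokens shard_id=1 upper
  local IFS=','                      # split the boundary list on commas
  for upper in $bounds; do
    echo "shard $shard_id: $lower-$upper"
    lower=$((upper + 1))
    shard_id=$((shard_id + 1))
  done
  # The last shard runs from the final boundary up to MAX_TOKENS.
  echo "shard $shard_id: $lower-$max_tokens"
}

# Arguments: MIN_TOKENS, MAX_TOKENS, SHARD_MAX_NUM_TOKENS
shard_ranges 65 500000 75,90,100
```

With the values above this lists four shards, from 65-75 up to 101-500000.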
# Running SourcererCC
Execute:
`ant cdi`
## Step 1: Init
Modify `runnodes.sh` so that it contains only one `for` loop. It should look as shown below. Here `num_nodes` is the number of processes you want to create for this step.
```
for i in $(seq 1 1 $num_nodes)
do
java -Dproperties.location="$rootPATH/NODE_$i/sourcerer-cc.properties" -Xms10g -Xmx10g -jar dist/indexbased.SearchManager.jar init $threshold &
done
```
Now run `./runnodes.sh`.
For example, to run 100 processes, run `./runnodes.sh 100`.
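Because every process reads its own `NODE_$i/sourcerer-cc.properties`, it can help to confirm that these files exist before launching many processes. The following is a minimal sketch; `check_node_configs` is a hypothetical helper, and it assumes the `NODE_<i>` directories sit directly under the root path, as in the loop above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of SourcererCC).
# Reports every NODE_<i>/sourcerer-cc.properties missing under the given root
# and returns non-zero if at least one is missing.
check_node_configs() {
  local root_path=$1 num_nodes=$2 missing=0 i
  for i in $(seq 1 1 "$num_nodes"); do
    if [ ! -f "$root_path/NODE_$i/sourcerer-cc.properties" ]; then
      echo "missing: $root_path/NODE_$i/sourcerer-cc.properties" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Possible use at the top of runnodes.sh:
#   check_node_configs "$rootPATH" "$num_nodes" || exit 1
```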
## Step 2: Index
Modify `runnodes.sh` so that it contains only one `for` loop. It should look as shown below.
```
for i in $(seq 1 1 $num_nodes)
do
java -Dproperties.location="$rootPATH/NODE_$i/sourcerer-cc.properties" -Xms20g -Xmx20g -jar dist/indexbased.SearchManager.jar index $threshold &
done
```
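The init and index loops differ only in the heap size and the action passed to `SearchManager`. As a way to preview what a loop will launch, the command line can be built by a function and printed instead of executed. This is a sketch under assumptions: `node_command` is a hypothetical helper, `/path/to/root` stands in for `$rootPATH`, and the threshold value `8` is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper (not part of SourcererCC): nothing is executed here.
# Arguments: root path, node index, action (init or index), heap size, threshold.
node_command() {
  local root_path=$1 node=$2 action=$3 heap=$4 threshold=$5
  echo "java -Dproperties.location=\"$root_path/NODE_$node/sourcerer-cc.properties\"" \
       "-Xms$heap -Xmx$heap -jar dist/indexbased.SearchManager.jar $action $threshold"
}

# Print the index-step commands for 2 nodes (placeholder root path and threshold).
for i in $(seq 1 1 2); do
  node_command /path/to/root "$i" index 20g 8
done
```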
## Step 3: Merge
`ant cdmerge`
```
java -Dproperties.location="$rootPATH/sourcerer-cc.properties" -Xms20g -Xmx20g -jar dist/indexbased.IndexMerger.jar merge
```
## Step 4: Search
In `sourcerer-cc.properties`, set the `MIN_TOKENS` and `MAX_TOKENS` values. Only files whose token count lies between `MIN_TOKENS` and `MAX_TOKENS` are considered; the rest are ignored.

Also set `SEARCH_SHARD_ID=<shard id>`.

The shard id is 1 for 65-75, 2 for 76-90, and so on. (Yes, this is manual right now.)
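Since the shard id has to be set by hand, a small script can reduce the editing. The sketch below rewrites the `SEARCH_SHARD_ID` entry of one properties file. `set_search_shard_id` is a hypothetical helper, and it assumes the property appears at most once per file, at the start of a line.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SourcererCC): set SEARCH_SHARD_ID in a
# properties file, replacing an existing entry or appending a new one.
set_search_shard_id() {
  local file=$1 shard_id=$2 tmp
  tmp="$(mktemp)"
  # Keep every line except an existing SEARCH_SHARD_ID entry.
  grep -v '^SEARCH_SHARD_ID=' "$file" > "$tmp" || true
  echo "SEARCH_SHARD_ID=$shard_id" >> "$tmp"
  # Write back in place so the file keeps its original permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Possible use, assuming each node reads its own properties file:
#   for i in $(seq 1 1 "$num_nodes"); do
#     set_search_shard_id "$rootPATH/NODE_$i/sourcerer-cc.properties" 2
#   done
```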
Modify `runnodes.sh` so that it contains only one `for` loop. It should look as shown below.
```
for i in $(seq 1 1 $num_nodes)
do
java -Dproperties.location="$rootPATH/NODE_$i/sourcerer-cc.properties" -Xms2g -Xmx2g -jar dist/indexbased.SearchManager.jar search $threshold &
done
```