Calculate cosine similarity
=============
Bochao Zhang
This script reads data from immuneDB and calculates the cosine similarities between samples with different characteristics.<br>
## Usage
```
-d name of database
-s name of subject
-f field of the columns used to separate data
-t size threshold, lower bound clone size, see methods below
```
For example
```
bash calCosSim.sh -d lp11 -s D207 -f tissue -t 20
```
calculates the cosine similarities between the tissue samples of subject D207 from database lp11, using only clones that have at least 20 instances in at least one tissue.
**Note:** you need permission to access the databases. Replace the username and password in security.cnf with your own.
## Methods
### Instance
We considered clone size to be the sum of the number of uniquely mutated sequences and all the different instances of the same unique sequence that are found in separate sequencing libraries. We refer to this hybrid clone size measure as “unique sequence instances”.
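A minimal Python sketch of the "unique sequence instances" count. It assumes each sequence record is a hypothetical `(clone_id, sequence, library)` tuple; the real script reads from immuneDB, and its schema may differ. Under this definition, a unique sequence contributes one instance for every distinct sequencing library it appears in, so the clone size equals the number of distinct (sequence, library) pairs.

```python
from collections import defaultdict


def unique_sequence_instances(records):
    """Count "unique sequence instances" per clone.

    records: iterable of hypothetical (clone_id, sequence, library) tuples.
    A unique sequence counts once for every distinct library it was found in,
    so the result is the number of distinct (sequence, library) pairs per clone.
    """
    seen = defaultdict(set)
    for clone_id, sequence, library in records:
        seen[clone_id].add((sequence, library))
    return {clone_id: len(pairs) for clone_id, pairs in seen.items()}


records = [
    ("c1", "ACGT", "lib1"),
    ("c1", "ACGT", "lib1"),  # same sequence, same library: not a new instance
    ("c1", "ACGT", "lib2"),  # same sequence, different library: new instance
    ("c1", "ACGA", "lib1"),  # different (mutated) sequence: new instance
    ("c2", "TTTT", "lib3"),
]
print(unique_sequence_instances(records))  # {'c1': 3, 'c2': 1}
```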
### Lower bound clone size
When we say that two compartments do or do not overlap, we must make sure the sequencing coverage is sufficient, so that an apparent lack of overlap is not simply the result of under-sampling. Only larger clones are sampled deeply enough to demonstrate overlap or its absence. We therefore define a lower bound clone size: a clone must have at least *X* instances in at least one compartment. Such clones are referred to as C*X* clones, where *X* denotes the lower bound clone size.
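The C*X* filter can be sketched as below. This assumes a hypothetical in-memory instance table that maps each clone ID to its per-compartment instance counts (the same information as `instanceTable.tsv`); it is an illustration, not the code in calCosSim.sh or calCosSim.py. A clone is kept if its count reaches *X* in at least one compartment.

```python
def filter_cx_clones(instance_table, x):
    """Keep clones with at least `x` instances in at least one compartment.

    instance_table: hypothetical dict {clone_id: {compartment: instance_count}}.
    """
    return {
        clone_id: counts
        for clone_id, counts in instance_table.items()
        if any(n >= x for n in counts.values())
    }


table = {
    "c1": {"blood": 25, "spleen": 3},
    "c2": {"blood": 5, "spleen": 8},
    "c3": {"blood": 0, "spleen": 20},
}
print(sorted(filter_cx_clones(table, 20)))  # ['c1', 'c3']
```

With `x = 20` this corresponds to the `-t 20` example in the usage section: clone c2 is dropped because it never reaches 20 instances in any single compartment.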
### Calculation
The cosine similarity between different compartments is calculated as:
![equation](http://www.sciweavers.org/tex2img.php?eq=%5Cfrac%7B%5Csum_%7Bi%3D1%7D%5E%7Bn%7D%20A_iB_i%7D%7B%5Csqrt%7B%5Csum_%7Bi%3D1%7D%5E%7Bn%7D%20A_i%5E2%7D%5Csqrt%7B%5Csum_%7Bi%3D1%7D%5E%7Bn%7D%20B_i%5E2%7D%7D&bc=White&fc=Black&im=jpg&fs=12&ff=arev&edit=0)
where *A<sub>i</sub>* and *B<sub>i</sub>* are the components of vectors *A* and *B*, and *n* is the number of clones. *A<sub>i</sub>* and *B<sub>i</sub>* are the numbers of samples in compartment 1 and compartment 2, respectively, that contain clone *i*.
Because all counts are non-negative, the cosine similarity lies in the range [0, 1]: 0 means the two compartments share no clones at all, and 1 means their vectors are proportional, i.e. complete similarity.
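The formula translates directly into Python. This is a sketch rather than the code in calCosSim.py, and the example vectors hold made-up per-clone sample counts for two compartments. Returning 0 when either vector is all zeros is an assumption added here to avoid division by zero.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length count vectors A and B."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty compartment shares nothing
    return dot / (norm_a * norm_b)


# Hypothetical sample counts for clones 1..3 in two compartments.
A = [2, 0, 1]
B = [1, 3, 0]
print(round(cosine_similarity(A, B), 4))  # 0.2828
```

Here only clone 1 is present in both compartments, so the dot product is 2 and the result is 2 / (√5 · √10) ≈ 0.2828.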
## Output files
The script outputs three files, each with the prefix:
[subject]-[feature]-[C*X*]-
where *X* denotes the lower bound clone size. The three files are:
- **instanceTable.tsv**: each row is a clone, starting with a uniquely assigned clone ID; each remaining column gives the total number of instances of that clone in one compartment.
- **sampleTable.tsv**: each row is a clone, starting with a uniquely assigned clone ID; each remaining column gives the number of samples containing that clone in one compartment.
- **externalSimilarity.tsv**: a symmetric table with the compartments as both rows and columns. Each cell is the cosine similarity between the row compartment and the column compartment, so cells on the diagonal always have the value 1.
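Assembling a table like `externalSimilarity.tsv` from sample counts could look like the sketch below. It assumes a hypothetical in-memory sample table `{clone_id: {compartment: sample_count}}` and writes a tab-separated matrix with Python's standard `csv` module; the real script's column order and number formatting may differ. Setting the diagonal to exactly 1.0 mirrors the description above.

```python
import csv
import io
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_matrix(sample_table, compartments):
    """Pairwise cosine similarities between the compartment columns of a sample table."""
    clones = sorted(sample_table)
    # One vector per compartment: its sample count for every clone (missing -> 0).
    vectors = {
        c: [sample_table[clone].get(c, 0) for clone in clones] for c in compartments
    }
    return [
        [1.0 if r == c else cosine(vectors[r], vectors[c]) for c in compartments]
        for r in compartments
    ]


def write_tsv(matrix, compartments, out):
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([""] + compartments)  # header row: blank corner cell, then names
    for name, row in zip(compartments, matrix):
        writer.writerow([name] + [f"{v:.4f}" for v in row])


samples = {
    "c1": {"blood": 2, "spleen": 1},
    "c2": {"spleen": 3},
    "c3": {"blood": 1},
}
comps = ["blood", "spleen"]
m = similarity_matrix(samples, comps)
buf = io.StringIO()
write_tsv(m, comps, buf)
print(buf.getvalue())
```

Writing to an `io.StringIO` keeps the example self-contained; passing an open file handle instead would produce the file on disk.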
## Optional figures
You can plot the cosine similarities with drawColSim.m (requires MATLAB).
Run `help drawColSim` in MATLAB for more information.