# EDGAR-reports-Text-Analysis
This project extracts text from EDGAR filings and performs text analysis on it. The analysis covers 10-K and 10-Q filings and is implemented in Python.
## Input
The input consists of various EDGAR filings in .txt format. A total of 152 files were processed.
## Extraction and Analysis
A. Basic cleaning was performed and the target sections were extracted using regular expressions.
The target sections were:
1. Management's Discussion and Analysis
2. Quantitative and Qualitative Disclosures about Market Risk
3. Risk Factors
B. Several text analyses were performed, including:
1. Sentiment analysis
2. Readability analysis
3. Complex word count
4. Word count
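
The section extraction in step A can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function name `extract_section`, the `RISK_FACTORS` pattern, and the "longest candidate wins" rule (used to skip the table-of-contents entry) are all assumptions, and real filings vary widely in how item headings are written.

```python
import re

# Hypothetical heading pattern; actual filings format item headings in many ways.
RISK_FACTORS = r"item\s+1a\s*[.:]?\s*risk\s+factors"

# Any heading of the form "Item 7." or "Item 1B:" is treated as a section boundary.
NEXT_ITEM = re.compile(r"\bitem\s+\d+[a-z]?\s*[.:]", re.IGNORECASE)


def extract_section(text, heading_pattern):
    """Return the text between a heading and the next 'Item N.' heading.

    A heading usually appears twice (table of contents and body), so the
    longest candidate is kept. Returns None if the heading is not found.
    """
    best = None
    for match in re.finditer(heading_pattern, text, flags=re.IGNORECASE):
        start = match.end()
        boundary = NEXT_ITEM.search(text, start)
        end = boundary.start() if boundary else len(text)
        candidate = text[start:end].strip()
        if best is None or len(candidate) > len(best):
            best = candidate
    return best


sample = (
    "Item 1A. Risk Factors\n"
    "Item 7. Management's Discussion and Analysis\n"
    "Item 1A. Risk Factors\n"
    "We face competition. Demand may fall.\n"
    "Item 1B. Unresolved Staff Comments\n"
    "None."
)
risk_text = extract_section(sample, RISK_FACTORS)
```

The same function could be called with a different pattern for each of the three target sections.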
### Sentiment Analysis
Sentiment analysis was performed using a lexicon-based approach.
**Positive Score**: This score is calculated by assigning a value of +1 to each word found in the Positive Dictionary and then adding up all the values.
**Negative Score**: This score is calculated by assigning a value of -1 to each word found in the Negative Dictionary and then adding up all the values. The sum is multiplied by -1 so that the score is a positive number.
**Polarity Score**: This score indicates whether a given text is positive or negative in nature. It is calculated using the formula:
**Polarity Score** = (Positive Score - Negative Score) / ((Positive Score + Negative Score) + 0.000001)
The range is from -1 to +1.
All the required dictionaries were created from the LM sentiment word lists at https://sraf.nd.edu/textual-analysis/resources/#LM%20Sentiment%20Word%20Lists
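
As a minimal sketch of the three scores above, assuming a simple lowercase regex tokenizer and dictionaries held as lowercase Python sets (the function name `sentiment_scores` and the tiny word sets are hypothetical, and any stop-word removal is not shown):

```python
import re


def sentiment_scores(text, positive_words, negative_words):
    """Return (positive score, negative score, polarity score) for a text."""
    # Assumed tokenization: runs of letters and apostrophes, lowercased.
    tokens = re.findall(r"[a-z']+", text.lower())
    positive = sum(1 for t in tokens if t in positive_words)
    # Counting matches directly equals summing -1 per match and multiplying by -1.
    negative = sum(1 for t in tokens if t in negative_words)
    polarity = (positive - negative) / ((positive + negative) + 0.000001)
    return positive, negative, polarity


# Toy dictionaries for illustration only; the real lists are far larger.
positive_words = {"gain", "strong"}
negative_words = {"loss", "decline"}

pos, neg, pol = sentiment_scores(
    "Strong gain offset by a loss.", positive_words, negative_words
)
```

The small constant in the denominator keeps the division defined when a text contains no dictionary words, in which case the polarity is 0.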
### Analysis of Readability
Average sentence length, Fog index, complex word count, and total word count were calculated.
The following formulas were used:
**Average Sentence Length** = number of words / number of sentences
**Percentage of Complex Words** = number of complex words / number of words
where complex words are words that contain more than two syllables.
**Fog Index** = 0.4 * (Average Sentence Length + Percentage of Complex Words)
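
The formulas above can be sketched as follows. The syllable counter is a rough vowel-group heuristic and the sentence splitter is a simple punctuation split; both are assumptions, as are the function names. The complex-word share is used as a fraction, exactly as the formula above states, whereas the standard Gunning fog definition expresses it as a percentage (multiplied by 100).

```python
import re


def count_syllables(word):
    """Approximate syllables as the number of consecutive-vowel groups."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def readability(text):
    """Compute the readability metrics using the formulas given above."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return None  # the metrics are undefined for empty text
    complex_count = sum(1 for w in words if count_syllables(w) > 2)
    avg_sentence_length = len(words) / len(sentences)
    complex_share = complex_count / len(words)
    return {
        "word_count": len(words),
        "complex_word_count": complex_count,
        "average_sentence_length": avg_sentence_length,
        "percentage_of_complex_words": complex_share,
        "fog_index": 0.4 * (avg_sentence_length + complex_share),
    }


metrics = readability("Sales grew. The company expanded.")
```

For this toy input the heuristic flags "company" and "expanded" as complex, giving 5 words across 2 sentences.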
Apart from these, 6 other metrics were calculated:
- Positive word proportion
- Negative word proportion
- Uncertain word score and proportion
- Constraining word score and proportion
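
The README does not define "score" and "proportion" for these metrics. The sketch below assumes the score is the number of dictionary matches and the proportion is that count divided by the total word count; the function name `score_and_proportion` and the toy word set are hypothetical.

```python
import re


def score_and_proportion(text, dictionary):
    """Return (match count, match count / total words) for one word list."""
    # Assumed tokenization: runs of letters and apostrophes, lowercased.
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(1 for t in tokens if t in dictionary)
    proportion = score / len(tokens) if tokens else 0.0
    return score, proportion


# Toy uncertainty list for illustration; the real list is loaded from a file.
uncertainty_words = {"may", "could", "possibly"}

score, proportion = score_and_proportion(
    "Results may vary and could possibly decline.", uncertainty_words
)
```

The same function would apply to the positive, negative, and constraining word lists.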
Instructions for executing the Python notebook and script are included in Execution instrictions.pdf.
Financial reports can be downloaded from the EDGAR server during offline hours.
All the required dictionaries are included in the repository.