# EDGAR-reports-Text-Analysis
This project extracts text data from EDGAR filings and performs text analytics on it. The analysis covers 10-K and 10-Q filings and was performed using Python.
## Input
The input files are EDGAR filings in .txt format. A total of 152 files were processed.
## Extraction and Analysis
A. Basic cleaning was performed and the target sections were extracted using regular expressions.
The target sections were:
1. Management's Discussion and Analysis
2. Quantitative and Qualitative Disclosures about Market Risk
3. Risk Factors
B. Several text analyses were performed, including:
1. Sentiment analysis
2. Readability analysis
3. Complex word count
4. Word count
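
The cleaning and section extraction in step A could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the heading patterns, and the assumption that each section starts on an `Item N.` line and ends at the next `Item` heading are all hypothetical. Real filings vary in layout, and a table of contents would match first and need extra handling.

```python
import re

# Hypothetical heading patterns; real filings title these items inconsistently.
SECTION_PATTERNS = {
    "mda": r"management['’]s\s+discussion\s+and\s+analysis",
    "market_risk": r"quantitative\s+and\s+qualitative\s+disclosures?\s+about\s+market\s+risk",
    "risk_factors": r"risk\s+factors",
}

# Assumption: a section ends at the next "Item <number>" heading or at end of text.
NEXT_ITEM = re.compile(r"^\s*item\s+\d+[a-z]?\b", re.IGNORECASE | re.MULTILINE)


def clean_text(raw):
    """Basic cleaning: drop HTML-like tags and collapse runs of spaces and tabs."""
    text = re.sub(r"<[^>]+>", " ", raw)
    return re.sub(r"[ \t]+", " ", text)


def extract_section(text, heading_pattern):
    """Return the text between an 'Item N. <heading>' line and the next item heading."""
    start_re = re.compile(
        r"^\s*item\s+\d+[a-z]?\.?\s*" + heading_pattern + r"[^\n]*\n",
        re.IGNORECASE | re.MULTILINE,
    )
    start = start_re.search(text)
    if start is None:
        return ""  # section heading not found
    end = NEXT_ITEM.search(text, start.end())
    stop = end.start() if end else len(text)
    return text[start.end():stop].strip()
```

Calling `extract_section(clean_text(raw), SECTION_PATTERNS["risk_factors"])` on the contents of one filing would return the body of the Risk Factors section, or an empty string if no matching heading is found.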
### Sentiment Analysis
Sentiment analysis was performed using a lexicon-based approach.
**Positive Score**: This score is calculated by assigning a value of +1 to each word found in the Positive Dictionary and then adding up all the values.
**Negative Score**: This score is calculated by assigning a value of -1 to each word found in the Negative Dictionary and then adding up all the values. The sum is multiplied by -1 so that the score is a positive number.
**Polarity Score**: This score determines whether a given text is positive or negative in nature. It is calculated using the formula:
**Polarity Score** = (Positive Score - Negative Score) / ((Positive Score + Negative Score) + 0.000001)
The Polarity Score ranges from -1 to +1. The small constant 0.000001 prevents division by zero when a text contains no dictionary words.
All the required dictionaries were created from the Loughran-McDonald sentiment word lists: https://sraf.nd.edu/textual-analysis/resources/#LM%20Sentiment%20Word%20Lists
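
The three scores above can be illustrated with a short sketch. The function names and the simple alphabetic tokenizer are assumptions made for this example, and the dictionaries are passed in as plain Python sets of lowercase words rather than loaded from the word-list files.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def sentiment_scores(text, positive_words, negative_words):
    """Compute positive, negative, and polarity scores from two word sets."""
    tokens = tokenize(text)
    # +1 for every token found in the positive dictionary.
    positive_score = sum(1 for t in tokens if t in positive_words)
    # -1 for every token found in the negative dictionary, then flip the sign.
    negative_score = -1 * sum(-1 for t in tokens if t in negative_words)
    polarity_score = (positive_score - negative_score) / (
        (positive_score + negative_score) + 0.000001
    )
    return {
        "positive_score": positive_score,
        "negative_score": negative_score,
        "polarity_score": polarity_score,
    }
```

For example, a text with two positive words and one negative word gives a polarity score of roughly (2 - 1) / (2 + 1), or about 0.33.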
### Analysis of Readability
The average sentence length, Fog index, complex word count, and total word count were calculated.
The following formulas were used:
**Average Sentence Length** = the number of words / the number of sentences
**Percentage of Complex Words** = the number of complex words / the number of words
where complex words are words in the text that contain more than two syllables.
**Fog Index** = 0.4 * (Average Sentence Length + Percentage of Complex Words)
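
The formulas above can be sketched as follows. The syllable counter here is a rough heuristic (it counts groups of consecutive vowels) and the sentence splitter simply breaks on `.`, `!`, and `?`; both are assumptions made for this example and may differ from the rules used in the project.

```python
import re


def count_syllables(word):
    """Rough heuristic (an assumption): one syllable per group of consecutive vowels."""
    return len(re.findall(r"[aeiouy]+", word.lower()))


def readability(text):
    """Compute the readability metrics using the formulas given above."""
    words = re.findall(r"[A-Za-z]+", text)
    # Naive sentence split on terminal punctuation; empty pieces are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return {
            "word_count": 0,
            "complex_word_count": 0,
            "average_sentence_length": 0.0,
            "percentage_complex_words": 0.0,
            "fog_index": 0.0,
        }
    # Complex words contain more than two syllables.
    complex_count = sum(1 for w in words if count_syllables(w) > 2)
    average_sentence_length = len(words) / len(sentences)
    percentage_complex_words = complex_count / len(words)
    fog_index = 0.4 * (average_sentence_length + percentage_complex_words)
    return {
        "word_count": len(words),
        "complex_word_count": complex_count,
        "average_sentence_length": average_sentence_length,
        "percentage_complex_words": percentage_complex_words,
        "fog_index": fog_index,
    }
```

The vowel-group heuristic overcounts some words (for instance, a silent final "e" is counted as a syllable), so a dictionary-based syllable counter would give more reliable complex word counts.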
Apart from these, six other metrics were calculated:
- Positive word proportion
- Negative word proportion
- Uncertain word score and proportion
- Constraining word score and proportion
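
A sketch of how such per-category metrics might be computed is shown below. It assumes that a category's score is the number of tokens found in that category's word list and that its proportion is the score divided by the total word count; these definitions, the function name, and the tokenizer are assumptions for illustration and are not confirmed by this README.

```python
import re


def category_metrics(text, word_lists):
    """Return a score and a proportion for each named word list.

    Assumed definitions: score = number of matching tokens,
    proportion = score / total number of tokens.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    total = len(tokens)
    metrics = {}
    for name, words in word_lists.items():
        score = sum(1 for t in tokens if t in words)
        metrics[name + "_score"] = score
        metrics[name + "_proportion"] = score / total if total else 0.0
    return metrics
```

Passing a mapping such as `{"uncertain": uncertain_words, "constraining": constraining_words}`, where each value is a set of lowercase words, yields keys like `uncertain_score` and `uncertain_proportion`.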
Instructions for running the Python notebook and script are included in Execution instrictions.pdf
Financial reports can be downloaded from the EDGAR server during offline hours.
All the required dictionaries are included in the repository.