EDGAR-reports-Text-Analysis: Extracting data from EDGAR filings and performing text analysis (CSDN Library resource)

Uploaded 2021-05-19 04:12:15 · 84 KB ZIP

EDGAR-reports-Text-Analysis-master.zip (12 files)

EDGAR-reports-Text-Analysis-master

uncertainty_dictionary.txt 3KB

textAnalysisOutput.csv 34KB

constraining_dictionary.txt 2KB

textAnalysisOutput.xlsx 47KB

NegativeWords.txt 26KB

EDGAR extraction and Analysis.ipynb 108KB

EDGAR extraction and Analysis.md 45KB

PositiveWords.txt 4KB

StopWords_Generic.txt 722B

README.md 3KB

EDGAR extraction and Analysis.py 15KB

cik_list1.csv 13KB

# EDGAR-reports-Text-Analysis

Data was extracted from EDGAR filings and text analysis was performed on it.

In this project, text extraction and text analytics were performed on EDGAR filings. The analysis was done on 10-K and 10-Q filings, using Python.

## Input

The input files consist of different filings from EDGAR, in .txt format. A total of 152 files were processed.

## Extraction and Analysis

A. Basic cleaning was performed and the target sections were extracted using regex. The target sections were:

1. Management's Discussion and Analysis
2. Quantitative and Qualitative Disclosures about Market Risk
3. Risk Factors

B. Several kinds of text analysis were performed:

1. Sentiment analysis
2. Analysis of readability
3. Complex word count
4. Word count

### Sentiment Analysis

Sentiment analysis was performed using a lexicon-based approach.

**Positive Score**: This score is calculated by assigning a value of +1 to each word found in the Positive Dictionary and then adding up all the values.

**Negative Score**: This score is calculated by assigning a value of -1 to each word found in the Negative Dictionary and then adding up all the values. I multiply the score by -1 so that it is a positive number.

**Polarity Score**: This score determines whether a given text is positive or negative in nature. It is calculated using the formula:

**Polarity Score** = (Positive Score - Negative Score) / ((Positive Score + Negative Score) + 0.000001)

The range is from -1 to +1.

All the required dictionaries were created using https://sraf.nd.edu/textual-analysis/resources/#LM%20Sentiment%20Word%20Lists

### Analysis of Readability

Average sentence length, Fog index, complex word count, and total word count were calculated.
The following formulas were used:

**Average Sentence Length** = the number of words / the number of sentences

**Percentage of Complex Words** = the number of complex words / the number of words

where complex words are words in the text that contain more than two syllables.

**Fog Index** = 0.4 * (Average Sentence Length + Percentage of Complex Words)

Apart from these, 6 other metrics were calculated:

- Positive word proportion
- Negative word proportion
- Uncertain word score and proportion
- Constraining word score and proportion

Instructions for executing the Python notebook and script are included in Execution instrictions.pdf.

Financial reports can be downloaded from the EDGAR server during offline hours.

All the required dictionaries are included in the repository.
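
As an illustration of the sentiment scores described under Sentiment Analysis, here is a minimal sketch. The function names, the tokenizer, and the toy word sets are assumptions made for this example; they are not taken from the project's notebook, and a real run would load `PositiveWords.txt` and `NegativeWords.txt` instead.

```python
import re


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens (a simplification)."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_scores(text, positive_words, negative_words):
    """Return (positive score, negative score, polarity score) for a text."""
    tokens = tokenize(text)

    # +1 for every token found in the positive dictionary.
    positive_score = sum(1 for t in tokens if t in positive_words)

    # -1 for every token found in the negative dictionary, then the sum
    # is multiplied by -1 so the reported score is a positive number.
    negative_score = -1 * sum(-1 for t in tokens if t in negative_words)

    # The small constant keeps the denominator non-zero when no
    # dictionary word occurs in the text.
    polarity = (positive_score - negative_score) / (
        (positive_score + negative_score) + 0.000001
    )
    return positive_score, negative_score, polarity


# Toy dictionaries for illustration only (hypothetical, not the real lists).
positive_words = {"gain", "improve", "strong"}
negative_words = {"loss", "decline", "weak"}

pos, neg, pol = sentiment_scores(
    "Strong gain despite a decline in margins.", positive_words, negative_words
)
print(pos, neg, round(pol, 3))
```

Storing the dictionaries as sets keeps each lookup constant-time, which matters when the negative list alone holds thousands of words.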
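
The readability formulas above can be sketched in the same way. This is a rough illustration under stated assumptions: the syllable counter is a simple vowel-group heuristic, sentences are split on `.`, `!`, and `?`, and the names `count_syllables` and `readability` are hypothetical. The project's own rules for syllables and sentence boundaries may differ.

```python
import re


def count_syllables(word):
    """Approximate syllables as runs of consecutive vowels (heuristic assumption)."""
    return len(re.findall(r"[aeiouy]+", word.lower()))


def readability(text):
    """Compute the readability metrics using the formulas given in this README."""
    # Naive sentence split; abbreviations and decimals would be over-split.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return None  # nothing to measure

    # Complex words: more than two syllables.
    complex_words = [w for w in words if count_syllables(w) > 2]

    avg_sentence_length = len(words) / len(sentences)
    # As written in this README, the complex-word share is a ratio in [0, 1].
    pct_complex = len(complex_words) / len(words)
    fog_index = 0.4 * (avg_sentence_length + pct_complex)

    return {
        "word_count": len(words),
        "complex_word_count": len(complex_words),
        "average_sentence_length": avg_sentence_length,
        "percentage_of_complex_words": pct_complex,
        "fog_index": fog_index,
    }


sample = "Cats sleep. Volatility increased."
metrics = readability(sample)
print(metrics)
```

Returning `None` for empty input avoids a division by zero when a target section could not be extracted from a filing.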