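Network Log Analysis and Traffic Analysis of an Attacked Network (被攻击网络日志分析和流量分析.zip): hands-on network security log analysis resource, CSDN Library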

3 files: 1 md, 1 log, 1 ipynb


Tags: network security, traffic analysis

175 views, uploaded 2024-03-20 23:38:12, 1.04MB ZIP


Archive contents of 被攻击网络日志分析和流量分析.zip (3 files):

- 被攻击网络日志分析和流量分析/
  - conn_sample.log (2.61MB)
  - Network Log Attack and Traffic Analysis.ipynb (707KB)
  - README.md (3KB)

# Network-Log-and-Traffic-Analysis

Identify malicious behavior and attacks using Machine Learning with Python.

# LAB A

We'll be using IPython and pandas functionality in this part. Our first goal is to get the information from the log files off of disk and into a dataframe. Since we're working with limited resources, we'll use samples of the larger files.

## Requirements

- IPython
- Pandas
- Matplotlib
- Seaborn
- datetime
- warnings

## Tip

To access keyboard shortcuts, click on a (non-code) cell or the text "In []" to the left of the cell, and press the *H* key. Or select *Help* from the menu above, and then *Keyboard Shortcuts*. **Very useful; it saved us a lot of time during editing.**

# Business Understanding

## Overview

The dataset that we've selected is from the field of network analysis and security. We are using log files generated by the Bro Network Security Monitor as our dataset. The dataset we've chosen has about 20 million records (about 2 GB in size) and 22 features, with a number of sub-features explained in the feature description sections that follow. We'll analyze the log file and look for correlations between attack behavior and the features, in order to draw probable conclusions that help us identify malicious behavior, potential threats, and attacks in the network our dataset was captured from.

The plan is to understand the dataset, the features, the attack behaviors, and their descriptions in detail as they are stated by Bro. We will do a lot of preprocessing, including elimination, grouping, standardization, and imputation, to make the dataset more convenient to work with. Once the dataset is ready for extracting statistical information, we visualize that information using the most appropriate plots (in our case, box plots were used extensively).
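A minimal sketch of the Lab A steps described above, assuming the standard Bro (now Zeek) *conn.log* layout: a tab-separated file whose header lines start with `#` (including a `#fields` line naming the columns), with `-` marking unset fields. `load_conn_log`, `preprocess`, and `save_boxplot` are hypothetical helpers, not functions from the original notebook, and the fallback column list, the imputation rule (missing counts become 0), and the z-score standardization are illustrative assumptions.

```python
import pandas as pd

# Fallback column names (Bro 2.x conn.log layout) for files without a
# "#fields" header; this list is an assumption, check it against your log.
DEFAULT_FIELDS = [
    "ts", "uid", "id.orig_h", "id.orig_p", "id.resp_h", "id.resp_p",
    "proto", "service", "duration", "orig_bytes", "resp_bytes",
    "conn_state", "local_orig", "missed_bytes", "history",
    "orig_pkts", "orig_ip_bytes", "resp_pkts", "resp_ip_bytes",
    "tunnel_parents",
]

# Bro writes "-" for unset fields and "(empty)" for empty sets.
BRO_NA = ["-", "(empty)"]


def read_field_names(path):
    """Return column names from the '#fields' header line, if there is one."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if not line.startswith("#"):
                break  # the header block is over
            if line.startswith("#fields"):
                return line.rstrip("\r\n").split("\t")[1:]
    return DEFAULT_FIELDS


def load_conn_log(path, nrows=None):
    """Load a conn.log into a DataFrame; nrows keeps only the first rows."""
    df = pd.read_csv(
        path,
        sep="\t",
        comment="#",  # skips '#fields', '#close', ...; assumes no '#' in values
        header=None,
        names=read_field_names(path),
        na_values=BRO_NA,
        nrows=nrows,
        low_memory=False,
    )
    df["ts"] = pd.to_datetime(df["ts"], unit="s")  # epoch seconds -> datetime
    return df


def preprocess(df):
    """Elimination, imputation and z-score standardization (illustrative)."""
    out = df.drop(columns=["uid"], errors="ignore")  # elimination: drop row ids
    counts = [c for c in ("duration", "orig_bytes", "resp_bytes") if c in out]
    out[counts] = out[counts].fillna(0)  # imputation: unset -> no activity
    if "service" in out:
        out["service"] = out["service"].fillna("unknown")
    for col in counts:  # standardization into new *_z columns
        std = out[col].std()
        out[col + "_z"] = (out[col] - out[col].mean()) / std if std > 0 else 0.0
    return out


def save_boxplot(df, value="duration", by="proto", path="boxplot.png"):
    """Box plot of one numeric feature per category, written to a PNG file."""
    import matplotlib.pyplot as plt  # imported lazily: only needed for plots
    import seaborn as sns

    fig, ax = plt.subplots()
    sns.boxplot(data=df, x=by, y=value, ax=ax)
    fig.savefig(path)
    plt.close(fig)
```

For example, `df = preprocess(load_conn_log("conn_sample.log", nrows=100_000))` followed by `save_boxplot(df, value="duration", by="proto")`. Capping `nrows` keeps memory bounded when pointing the same code at the full 2 GB log, and the plotting libraries are imported only inside `save_boxplot`, so loading and preprocessing run without them.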
We then group some of the features (to visualize their relationships) and use a correlation matrix to represent the relationships between the features that matter for our analysis (for example, the service and the packets sent and received are highly correlated).

## Purpose

We selected this dataset because it is both complex and technical, and because it comes from live traffic whose value depends on its freshness. We are interested in learning more about security, attacks, and their patterns. Processing the collected data in real time can eliminate a lot of manual work and catch attack patterns that unfold over long periods, which a human could not identify. These logs also show how much data is being transferred, allowing organizations to allocate bandwidth based on expected future usage patterns.

# Data

Full dataset available [here](http://www.secrepo.com/Security-Data-Analysis/Lab_1/conn.log.zip). This is the *conn.log*.
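The grouping and correlation analysis, and the bandwidth use case from the Purpose section, could look roughly like the pandas sketch below. It assumes a DataFrame with Bro *conn.log* column names (`service`, `orig_pkts`, `resp_pkts`, `orig_bytes`, `resp_bytes`) and a `ts` column already parsed to datetime. The helper names are hypothetical, and one-hot encoding `service` so it can enter a Pearson correlation matrix is only one possible encoding; this README does not say which one the notebook uses.

```python
import pandas as pd

# Column names follow Bro's conn.log; "ts" is assumed to be a datetime column.
COUNT_COLS = ["orig_pkts", "resp_pkts", "orig_bytes", "resp_bytes"]


def feature_correlations(df):
    """Pearson correlation of packet/byte counts and one-hot service flags."""
    counts = df[COUNT_COLS].astype(float)
    services = pd.get_dummies(
        df["service"].fillna("unknown"), prefix="service", dtype=float
    )
    # DataFrame.corr() uses pairwise-complete observations, so NaNs are skipped.
    return pd.concat([counts, services], axis=1).corr()


def per_service_summary(df):
    """Grouping: total packets sent and received per service."""
    return (
        df.assign(service=df["service"].fillna("unknown"))
        .groupby("service")[["orig_pkts", "resp_pkts"]]
        .sum()
    )


def hourly_volume(df):
    """Bytes transferred per hour in each direction (input to bandwidth planning)."""
    return df.set_index("ts")[["orig_bytes", "resp_bytes"]].resample("1h").sum()


def save_heatmap(corr, path="correlation.png"):
    """Draw the correlation matrix as an annotated heatmap PNG."""
    import matplotlib.pyplot as plt  # lazy: plotting libraries only needed here
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(8, 6))
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
```

Given a DataFrame `df` loaded from *conn.log*, `save_heatmap(feature_correlations(df))` writes the matrix plot, `per_service_summary(df)` gives the grouped packet totals, and `hourly_volume(df)` gives a per-hour byte series that could feed bandwidth planning.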
