# Network-Log-and-Traffic-Analysis
Identify malicious behavior and attacks using Machine Learning with Python
# LAB A
We'll be using IPython and pandas functionality in this part.
Our first goal is to get the information from the log files off of disk and into a dataframe.
Since we're working with limited resources we'll use samples of the larger files.
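
As a minimal sketch of that first step, the snippet below reads a tab-separated Bro connection log into a pandas dataframe. The column list is an assumption: it uses the standard Bro `conn.log` field names, and the sample file used in this lab may carry extra or different columns, so adjust `CONN_COLUMNS` to match. The helper name `load_conn_log` is hypothetical.

```python
import pandas as pd

# Assumption: standard Bro conn.log field names. Adjust to the actual file.
CONN_COLUMNS = [
    "ts", "uid", "id.orig_h", "id.orig_p", "id.resp_h", "id.resp_p",
    "proto", "service", "duration", "orig_bytes", "resp_bytes",
    "conn_state", "local_orig", "missed_bytes", "history",
    "orig_pkts", "orig_ip_bytes", "resp_pkts", "resp_ip_bytes",
    "tunnel_parents",
]


def load_conn_log(source, columns=CONN_COLUMNS, nrows=None):
    """Read a Bro connection log (path or file-like object) into a dataframe."""
    df = pd.read_csv(
        source,
        sep="\t",                     # Bro logs are tab-separated
        header=None,                  # column names are supplied explicitly
        names=columns,
        comment="#",                  # skip Bro metadata lines such as "#fields"
        na_values=["-", "(empty)"],   # Bro's markers for unset/empty fields
        nrows=nrows,                  # read only the first rows if memory is tight
    )
    # Bro timestamps are seconds since the Unix epoch
    df["ts"] = pd.to_datetime(df["ts"], unit="s")
    return df
```

Calling `load_conn_log("conn_sample.log", nrows=100000)` would then load only the first 100,000 records, which is one way to stay within limited memory.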
## Requirements
- IPython
- Pandas
- Matplotlib
- Seaborn
- datetime
- warnings
## Tip
To access keyboard shortcuts, click on a (non-code) cell or the text "In []" to the left of the cell, and press the *H* key. Alternatively, select *Help* from the menu above, and then *Keyboard Shortcuts*. **Very useful: this saved us a lot of time during editing.**
# Business Understanding
## Overview
The dataset that we've selected is from the field of Network Analysis and Security. We are using log files generated by the Bro Network Security Monitor as our dataset. The dataset we've chosen has about 20 million records (about 2 GB in size) and 22 features, with a number of sub-features explained in the feature description sections that follow.
We'll be analyzing the log file and looking for correlations between attack behaviour and the features, in order to reach probable conclusions that help us identify malicious behavior, potential threats, and attacks in the network our dataset was captured from.
The plan is to understand the dataset, the features, the attack behaviours, and their descriptions in detail, as they are stated by Bro.
We will do a lot of preprocessing, including elimination, grouping, standardization, and imputation, to make the dataset more convenient to work with.
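
To make three of these preprocessing steps concrete, here is a minimal sketch covering elimination, imputation, and standardization (grouping is left out). It is not the notebook's actual pipeline: the function name `preprocess`, the 90% missing-value threshold, and the choice of median imputation and z-score scaling are all assumptions made for illustration.

```python
import pandas as pd


def preprocess(df, numeric_cols, drop_threshold=0.9):
    """Drop mostly-empty columns, then impute and standardize numeric ones."""
    out = df.copy()

    # Elimination: drop columns whose share of missing values exceeds the threshold
    missing_share = out.isna().mean()
    out = out.drop(columns=missing_share[missing_share > drop_threshold].index)

    # Only keep the numeric columns that survived elimination
    cols = [c for c in numeric_cols if c in out.columns]
    if cols:
        # Imputation: fill remaining gaps with the column median
        out[cols] = out[cols].fillna(out[cols].median())

        # Standardization: z-score; constant columns keep a divisor of 1
        std = out[cols].std(ddof=0).replace(0, 1)
        out[cols] = (out[cols] - out[cols].mean()) / std
    return out
```

The median is used here rather than the mean because byte and packet counts in connection logs tend to be heavily skewed; that is a design choice of this sketch, not something the dataset dictates.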
Once the dataset is ready for extracting statistical information, we visualize those statistics using the most appropriate plots (in our case, box plots are used extensively). We then group some of the features to visualize their relationships, and use a correlation matrix to represent the relationships between the features that matter for our analysis (for example, the services and the packets sent and received are highly correlated).
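
The numbers behind those plots can be sketched with pandas alone. The two helpers below are hypothetical names, and the column names in the usage note assume the standard Bro `conn.log` fields: `box_stats` computes the quartiles a box plot draws for each group, and `correlation_matrix` computes pairwise Pearson correlations.

```python
import pandas as pd


def box_stats(df, value_col, by):
    """Quartiles of value_col per group: the numbers a box plot is drawn from."""
    return df.groupby(by)[value_col].quantile([0.25, 0.5, 0.75]).unstack()


def correlation_matrix(df, cols):
    """Pairwise Pearson correlation between the selected columns."""
    # Coerce to numeric so stray non-numeric values become NaN instead of raising
    numeric = df[cols].apply(pd.to_numeric, errors="coerce")
    return numeric.corr(method="pearson")
```

For example, `correlation_matrix(df, ["orig_pkts", "resp_pkts", "duration"])` returns a square dataframe that can be passed to `seaborn.heatmap(corr, annot=True)`, while `df.boxplot(column="duration", by="service")` draws the box plot that `box_stats(df, "duration", "service")` summarizes.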
## Purpose
We selected this dataset because it is both complex and technical, and because this kind of data is used live, where its value depends on its freshness. We are interested in learning more about security, attacks, and their patterns.
Analyzing the collected data in real time can eliminate a lot of manual work and catch attack patterns that unfold over long periods of time, which a human could not identify.
These logs also show how much data is being transferred, allowing organizations to allocate bandwidth based on projected usage patterns.
# Data
Full dataset available [here](http://www.secrepo.com/Security-Data-Analysis/Lab_1/conn.log.zip). This is the *conn.log*.