# DataScienceTools
Aim: A collection of useful data science and machine learning tools, libraries, and packages
# Data Science Tools List
### Based On General Work Flow
#### Work Flow
![](datascienceworkflow1.png)
#### Work Flow
![](datascienceworkflow2.png)
### Fetching Data/Web Scraping
- wget
- curl
- beautifulsoup
- mechanicalsoup
- requests/urllib
- selenium (headless-browser framework for extracting javascript data)
- scrapy (OOP framework)
- newspaper3k: https://newspaper.readthedocs.io # easily extract text from articles
- requests-html: https://github.com/kennethreitz/requests-html
- sql/mysql/sqlite
### Data Cleaning
- Pandas
- Dedupe: Remove duplicates
- Fuzzywuzzy: String Matching
- Scrubadub: Anonymize personal data
- Address Parsing/usaddress
- Dora : Working with missing/nulls
- PdfTables : Extracting Tables in PDF
- Tabulate
- Arrow : Dates, Timezones
- Pendulum : Dates
- Inflect/Num2Words : Convert numbers to text
- Imbalanced-learn
- Flashtext
### Data Analysis
- Numpy
- Scipy
- Pandas
- Tabel
### Data Visualization
- Matplotlib
- Seaborn
- Plotly
- Bokeh
- Altair
- Dash: dashboard library from plotly
- Dataspyre: dashboard framework with flask backend
- folium
- geoplot
- D-Tale
- plotnine: clone of R's ggplot2
- joypy: https://github.com/sbebo/joypy/blob/master/Joyplot.ipynb
- bqplot
- jmpy
- pyqtgraph
- toyplot
- ipyleaflet: https://github.com/jupyter-widgets/ipyleaflet/
- probscale: easily make probability scaled axis: https://github.com/matplotlib/mpl-probscale
- adjustText: easily add non-overlapping annotated text (https://github.com/Phlya/adjustText/blob/master/docs/source/Examples.ipynb)
- animatplot: make Matplotlib animations: https://animatplot.readthedocs.io/en/stable/
### Machine Learning/Deep Learning/Model Building
- Scikit-learn
- Keras
- Tensorflow
- Theano
- Pytorch
- sklearn-pandas
- imbalanced-learn
- hyperopt-sklearn: https://github.com/hyperopt/hyperopt-sklearn # Not pip installable yet
- tpot
- xgboost
- lightgbm
- fastText
### Natural Language Processing
- NLTK
- SpaCy
- TextBlob
- Stanford NLP
- AllenNLP
- Polyglot
### Computer Vision
- OpenCV
- Scikit-image/Scikit-video
- Pillow
### Workspace and Environment
+ Virtual Env
- Anaconda
- Pipenv
- Venv
+ IDE
- Jupyter lab/Jupyter Notebook
- Nteract
- VS Code/Sublime Text/Juno Atom, etc.
- PyCharm, etc.
+ Cloud
- Colab
### Serializers
- Joblib
- Pickle
- Ray
- Csvkit
+ Json
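
As a rough illustration of how two of the serializers above differ, here is a minimal sketch using only the standard library `json` and `pickle` modules. The `record` dictionary is made up for the example and is not from this repository.

```python
import json
import pickle

# Hypothetical record used only for illustration.
record = {"model": "baseline", "params": {"alpha": 0.5, "depth": 3}, "scores": [0.81, 0.79]}

# JSON: text-based and portable across languages, but limited to basic types.
json_text = json.dumps(record, sort_keys=True)
from_json = json.loads(json_text)

# pickle: Python-only binary format that can hold arbitrary Python objects.
# Only unpickle data that comes from a source you trust.
blob = pickle.dumps(record)
from_pickle = pickle.loads(blob)

print(from_json == record, from_pickle == record)
```

JSON is usually the safer choice for sharing data between tools, while pickle (or Joblib, which builds on it) is more convenient for Python-only objects such as fitted models.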
### Speed and Large Dataset
- Pandas Modin
- Dask
- Pyarrow
- Fastparquet
- vaex: https://github.com/maartenbreddels/vaex
- Pandas on Ray: https://github.com/modin-project/modin
- dampr: https://github.com/Refefer/Dampr
- Cloud Computing Services (GCP,Bigquery,AWS,Azure)
### Forecasting
- pyramid-arima https://github.com/tgsmith61591/pyramid
- fbprophet: time series forecasting (additive model) which performs best with high frequency data
- pyflux: time series library: https://github.com/RJT1990/pyflux
### PyData stack
- numpy
- scipy
- pandas
- jupyter
- statsmodels
### Profiling
- pandas-profiling: https://github.com/pandas-profiling/pandas-profiling
- dataprofiler: https://github.com/capitalone/DataProfiler
- memory_profiler: https://github.com/pythonprofilers/memory_profiler
- py-spy: https://github.com/benfred/py-spy/blob/master/README.md
- pyflame: https://github.com/uber/pyflame # Does not support Windows
- pyinstrument
- scalene
- cprofile
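
For a quick start with the standard library profiler listed above, the following sketch profiles a deliberately slow function with `cProfile` and prints the top entries with `pstats`. The `slow_sum` function is a hypothetical stand-in for real workload code.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately naive loop so the profiler has something to measure."""
    total = 0
    for i in range(n):
        total += i * i
    return total


profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(200_000)
profiler.disable()

# Collect the report in a string buffer, sorted by cumulative time,
# and keep only the first five entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The same profiler can also be run from the command line with `python -m cProfile -s cumulative script.py`, which avoids editing the script at all.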
### Niche stats libraries
- lifelines: survival analysis: https://github.com/CamDavidsonPilon/lifelines
- convoys: https://better.engineering/convoys/
### Jupyter Notebook Related
- ipysheet: https://github.com/QuantStack/ipysheet
- ipypivot: https://github.com/PierreMarion23/ipypivot
- ipytree: https://github.com/QuantStack/ipytree
- papermill: https://github.com/nteract/papermill (parameterized notebooks)
- nteract-scrapbook: https://github.com/nteract/scrapbook
- how to pass a dataframe between notebooks: https://github.com/nteract/papermill/issues/215
  - a simpler approach is to save the dataframe somewhere and pass only the path of the saved file
- jupytext: edit notebooks as text files! https://github.com/mwouts/jupytext
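
The advice above about passing a saved dataframe's path between notebooks can be sketched roughly as follows. This assumes pandas is installed; the helper names `save_frame` and `load_frame` and the file name `shared_frame.csv` are hypothetical, and CSV is chosen only because it needs no extra dependencies.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_frame(df, directory):
    """Upstream notebook: write the frame to disk and return its path."""
    path = Path(directory) / "shared_frame.csv"  # hypothetical file name
    df.to_csv(path, index=False)
    return str(path)


def load_frame(path):
    """Downstream notebook: receive the path as a parameter and load the frame."""
    return pd.read_csv(path)


original = pd.DataFrame({"city": ["Accra", "Lagos"], "visits": [3, 5]})
with tempfile.TemporaryDirectory() as tmp:
    frame_path = save_frame(original, tmp)  # this string is what gets passed along
    restored = load_frame(frame_path)

print(restored)
```

With papermill, the path string would typically be supplied as a notebook parameter, so only a small string crosses the notebook boundary rather than the data itself.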
### Database related
- pyodbc
- turbodbc
- ipython-sql
- db.py (dead project?)
- sqlalchemy
- sqlalchemy-turbodbc
- postgresql
- sqlmodel
### ETL or data engineering related sorted from lightest to heaviest framework
- papermill: https://github.com/nteract/papermill (parameterized notebooks)
- nteract-scrapbook: https://github.com/nteract/scrapbook (for passing data between notebooks)
- dequindre: https://github.com/vogt4nick/dequindre
- petl: https://github.com/petl-developers/petl
- bonobo
- pypeln: https://github.com/cgarciae/pypeln/
- botflow: https://github.com/kkyon/botflow
- https://github.com/mara/data-integration
- dbt: https://www.getdbt.com/
- Spotify Luigi - (works with Windows)
- Apache Airflow - Windows not supported (it is a PITA to try to install on Windows)
- prefect: https://docs.prefect.io
### Data validation and cleaning frameworks
- https://github.com/pyeve/cerberus
- https://github.com/great-expectations/great_expectations
- https://github.com/cosmicBboy/pandera
- https://pyjanitor.readthedocs.io/
- https://github.com/keleshev/schema
- https://github.com/TMiguelT/PandasSchema
- https://github.com/TomAugspurger/engarde
### R related
- rpy2
- plydata (dplyr clone)
- plotnine (ggplot2 clone)
### Machine Learning Related - mostly for tabular data or non-NN
- scikit-learn
- sklearn-pandas
- imbalanced-learn
- hyperopt-sklearn: https://github.com/hyperopt/hyperopt-sklearn # Not pip installable yet
- tpot
- xgboost
- lightgbm
- fastText
- recurrent: https://github.com/kvh/recurrent (extract datetimes from English sentences)
### Webscraping
- beautifulsoup4
- mechanicalsoup
- selenium (headless-browser framework for extracting javascript data)
- scrapy (OOP framework)
- newspaper3k: https://newspaper.readthedocs.io # easily extract text from articles
- requests-html: https://github.com/kennethreitz/requests-html
### Data Web Apps and ML Web Apps
- Streamlit : https://github.com/streamlit
- Gradio : https://github.com/gradio-app/gradio
- Mercury Mljar: https://github.com/mljar/mercury
- Panel : https://github.com/holoviz/panel
- Dash
- Databutton
- Flask
- Django
- FastAPI
### Utilities
- https://github.com/tldr-pages/tldr-python-client # replacement for man pages
- bropages (http://bropages.org/): sudo apt-get install ruby-dev, sudo gem install bropages
- https://github.com/gleitz/howdoi
- inspect https://docs.python.org/3/library/inspect.html
- prettypandas
- https://github.com/seatgeek/fuzzywuzzy
- https://github.com/RobinL/fuzzymatcher
- pytest
- requests
- requests-html: https://github.com/kennethreitz/requests-html
- psutil
- pdir2: https://github.com/laike9m/pdir2
- helping: https://github.com/ConquerProgramming1/helping
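
As a small taste of the `inspect` module listed above, this sketch looks up a function's signature, default values, and docstring at runtime. The `scale` function is a made-up example.

```python
import inspect


def scale(values, factor=2.0):
    """Multiply every value by factor."""
    return [v * factor for v in values]


signature = inspect.signature(scale)
param_names = list(signature.parameters)              # parameter names in order
default_factor = signature.parameters["factor"].default
doc = inspect.getdoc(scale)                           # cleaned-up docstring

print(signature)
print(param_names, default_factor)
print(doc)
```

This kind of introspection is handy in a notebook when exploring an unfamiliar library without leaving the session.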
### Amazon Web Services
- PyAthena - use plain ol' SQL: https://github.com/laughingman7743/PyAthena
### CLI
- click - for making CLI
- fire - for making CLI
- questionary: https://github.com/tmbo/questionary
### Progress Bars
- tqdm: https://github.com/tqdm/tqdm
- fastprogress: https://github.com/fastai/fastprogress
### Misc.
- Sending Windows 10 notifications: https://github.com/jithurjacob/Windows-10-Toast-Notifications
- Another Windows notification library: https://github.com/malja/zroya
- glances: CPU/memory monitoring
- pendul