# MongoDB
### Data Models
There are two types of data models:
1. Relational Model: Data is a set of relations, and each relation has a well-defined schema. The drawback is that the schema can be hard to maintain: changing it may require restructuring the entire relation, which is cumbersome. Structured Query Language (SQL) is used to query data in relational models.
2. Non-Relational Model:
   1. Document Model: Data is a set of documents, and each document is a collection of key-value pairs. Data is more localized, i.e., all the information about a document resides within the document itself. However, joining documents is harder than joining relations in a relational model. MongoDB, Azure Cosmos DB, and Cassandra are examples of NoSQL (Not Only SQL) databases.
   2. Graph Model: This type of database is used when the relationships between entities matter more than the entities themselves. Neo4j is an example of a graph database.
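
To make the trade-off between the relational and document models concrete, here is a minimal sketch in plain Python. The `users`, `orders`, and `user_documents` data, and both helper functions, are made up for illustration; no database is involved. The relational version needs a join on `user_id`, while the document version reads everything from a single nested document.

```python
# Relational style: data is split across two "tables" (lists of rows)
# that share a key, so reading a user's orders requires a join.
users = [
    {"user_id": 1, "name": "Asha"},
    {"user_id": 2, "name": "Ben"},
]
orders = [
    {"order_id": 10, "user_id": 1, "item": "keyboard"},
    {"order_id": 11, "user_id": 1, "item": "mouse"},
    {"order_id": 12, "user_id": 2, "item": "monitor"},
]


def orders_for_user_relational(user_id):
    """Join users and orders on user_id, as a SQL JOIN would."""
    names = {u["user_id"]: u["name"] for u in users}
    return [
        {"name": names[o["user_id"]], "item": o["item"]}
        for o in orders
        if o["user_id"] == user_id
    ]


# Document style: everything about one user lives inside one document.
user_documents = [
    {"_id": 1, "name": "Asha", "orders": [{"item": "keyboard"}, {"item": "mouse"}]},
    {"_id": 2, "name": "Ben", "orders": [{"item": "monitor"}]},
]


def orders_for_user_document(user_id):
    """Read the nested orders straight out of the user's own document."""
    doc = next(d for d in user_documents if d["_id"] == user_id)
    return [{"name": doc["name"], "item": o["item"]} for o in doc["orders"]]


print(orders_for_user_relational(1))
print(orders_for_user_document(1))
```

Both functions return the same rows. The difference is where the work happens: the relational layout pays at read time (the join), while the document layout pays when data shared between documents has to be kept in sync.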
### Static v/s Dynamic Models
Static models are trained at pre-defined, long intervals, for example every 3 months. Training is stateless, i.e., the model is retrained from scratch each time. Such models suit settings where the data changes slowly, for example, forecasting.
Dynamic models are trained at short intervals, for example hourly, daily, or weekly. Training is stateful, i.e., the existing model is fine-tuned on incoming data. Such models suit settings where the data changes very quickly, for example Twitter hashtag prediction, recommendations (YouTube videos, Instagram feed), or ETA prediction on Zomato.
### Stateful v/s Stateless Training
Once a model is deployed, it needs to be continuously monitored for the following **data distribution shifts**:
1. **Covariate Shift:** P(X) changes, P(Y|X) remains the same
2. **Label Shift:** P(Y) changes, P(X|Y) remains the same
3. **Concept Drift:** P(X) remains the same, P(Y|X) changes

where X denotes the input features and Y the target. Whenever a significant shift or drift is detected, or otherwise periodically, the model is updated to maintain performance.
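
As one possible way to monitor P(X), the sketch below compares a feature's training-time values with incoming values using the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs). The technique, the synthetic data, and the `DRIFT_THRESHOLD` cut-off are assumptions made for illustration and are not taken from the original project. A check like this only sees changes in the inputs, so it can flag a possible covariate shift but cannot detect concept drift, which requires labels.

```python
import random
from bisect import bisect_right


def ks_statistic(reference, incoming):
    """Largest gap between the two empirical CDFs (0 = identical, 1 = disjoint)."""
    ref = sorted(reference)
    inc = sorted(incoming)
    gap = 0.0
    # The maximum gap is always attained at one of the sample points.
    for x in ref + inc:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_inc = bisect_right(inc, x) / len(inc)
        gap = max(gap, abs(cdf_ref - cdf_inc))
    return gap


rng = random.Random(0)
train_feature = [rng.gauss(0.0, 1.0) for _ in range(500)]    # seen at training time
same_feature = [rng.gauss(0.0, 1.0) for _ in range(500)]     # same distribution
shifted_feature = [rng.gauss(1.5, 1.0) for _ in range(500)]  # mean has moved

DRIFT_THRESHOLD = 0.2  # hypothetical cut-off; tune it per feature in practice

for name, values in [("same", same_feature), ("shifted", shifted_feature)]:
    stat = ks_statistic(train_feature, values)
    print(name, round(stat, 3), "drift" if stat > DRIFT_THRESHOLD else "ok")
```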
The new model can be either:
1. Trained from scratch using the incoming predictions (stateless training). This is typical when models are updated at long intervals.
2. Fine-tuned on the incoming predictions (stateful training). This is typical when models are updated at short intervals.
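
The two options can be sketched with a tiny logistic regression written in plain Python. This is an illustrative stand-in, not the model used in the project: the synthetic one-feature data, the epoch counts, and the learning rate are all assumptions. It also assumes that stateless retraining uses the old data together with the new batch. The only difference between the two functions is the starting point: zeros for stateless training, the deployed weights for stateful training.

```python
import math
import random


def predict_proba(w, x):
    """Logistic regression probability; w[0] is the bias."""
    z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    z = max(-30.0, min(30.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def sgd(weights, data, epochs, lr=0.1):
    """Run SGD starting from `weights` and return the updated copy."""
    w = list(weights)  # copy so the caller's weights stay untouched
    for _ in range(epochs):
        for x, y in data:
            err = predict_proba(w, x) - y
            w[0] -= lr * err
            for i, xi in enumerate(x):
                w[i + 1] -= lr * err * xi
    return w


def train_stateless(all_data, n_features):
    """Stateless: ignore any previous model and start again from zeros."""
    return sgd([0.0] * (n_features + 1), all_data, epochs=50)


def fine_tune_stateful(deployed_weights, new_batch):
    """Stateful: continue from the deployed weights, using only the new batch."""
    return sgd(deployed_weights, new_batch, epochs=5)


def accuracy(w, data):
    hits = sum((predict_proba(w, x) >= 0.5) == (y == 1) for x, y in data)
    return hits / len(data)


def make_data(rng, n):
    """Synthetic data: one feature, label 1 when the feature is positive."""
    xs = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    return [([x], 1 if x > 0 else 0) for x in xs]


rng = random.Random(0)
old_data = make_data(rng, 200)
new_batch = make_data(rng, 50)
holdout = make_data(rng, 200)

deployed = train_stateless(old_data, n_features=1)
retrained = train_stateless(old_data + new_batch, n_features=1)  # option 1
fine_tuned = fine_tune_stateful(deployed, new_batch)             # option 2

print("retrained :", accuracy(retrained, holdout))
print("fine-tuned:", accuracy(fine_tuned, holdout))
```

Fine-tuning touches only 50 examples for 5 epochs, which is why it is affordable at short intervals, whereas retraining revisits everything.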
**Continual Learning:** A form of stateful training in which the deployed model is fine-tuned on incoming predictions. The weights are not updated after every single prediction but after a batch of predictions. The new model must also be shown to outperform the previous one, for example through A/B testing.
We will build a basic web app using Flask that takes features as input and predicts the Iris species, using a model trained on the Iris dataset. After the user has generated a batch of predictions, the model is fine-tuned on those predictions (continual learning, i.e., stateful training). This helps the model generalize to incoming data and learn in a lifelong fashion.
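
The batch-then-verify loop described above could look roughly like the following. `ContinualLearner` and the toy `fine_tune` and `evaluate` functions are hypothetical names introduced for this sketch. An offline score comparison stands in for A/B testing, which in practice compares the two models on live traffic.

```python
class ContinualLearner:
    """Buffers served examples and fine-tunes once a full batch is available."""

    def __init__(self, model, fine_tune, evaluate, batch_size):
        self.model = model
        self.fine_tune = fine_tune  # (model, batch) -> candidate model
        self.evaluate = evaluate    # model -> score, higher is better
        self.batch_size = batch_size
        self.buffer = []
        self.updates = 0

    def record(self, example):
        """Store one example; return True if the deployed model was replaced."""
        self.buffer.append(example)
        if len(self.buffer) < self.batch_size:
            return False  # no weight update until the batch is full
        batch, self.buffer = self.buffer, []
        candidate = self.fine_tune(self.model, batch)
        # Promote the candidate only if it beats the current model.
        if self.evaluate(candidate) > self.evaluate(self.model):
            self.model = candidate
            self.updates += 1
            return True
        return False


# Toy setup: the "model" is a single number that should end up close to 10.
def fine_tune(model, batch):
    return 0.5 * model + 0.5 * (sum(batch) / len(batch))


def evaluate(model):
    return -abs(model - 10.0)


learner = ContinualLearner(model=0.0, fine_tune=fine_tune, evaluate=evaluate, batch_size=2)
for value in [9, 11, 10, 10, 30, 50]:
    learner.record(value)

print(learner.model, learner.updates)
```

The last batch (30, 50) produces a worse candidate, so it is discarded and the previous model stays deployed.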
### Steps:
1. Training the model
2. Saving the model as an artifact (.pkl)
3. Building a web app using Flask
4. MongoDB steps:
   1. Establishing a connection with MongoDB
   2. Creating a database
   3. Creating a collection (the analogue of a table), where predictions will be stored
5. Running the web app and making predictions
6. Fine-tuning the model with incoming predictions
7. Saving the fine-tuned model
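
The MongoDB steps (4.1 to 4.3) and the storage side of steps 5 and 6 can be sketched with `pymongo` as follows. This is a sketch under stated assumptions, not the project's `app.py`: the connection URI, the database name `iris_db`, the collection name `predictions`, the field names, and all helper functions are hypothetical. It uses `MongoClient`, `insert_one`, and `find` from `pymongo`, and assumes a MongoDB server is reachable at the given URI.

```python
from datetime import datetime, timezone

# Assumed field names for the four Iris features.
FEATURE_NAMES = ("sepal_length", "sepal_width", "petal_length", "petal_width")


def get_collection(uri="mongodb://localhost:27017/",
                   db_name="iris_db",
                   collection_name="predictions"):
    """Steps 4.1-4.3: connect, then select the database and the collection.

    MongoDB creates the database and collection lazily on the first insert.
    """
    from pymongo import MongoClient  # imported here so the helpers below work without pymongo
    client = MongoClient(uri)
    return client[db_name][collection_name]


def build_prediction_document(features, predicted_label):
    """Turn one prediction into a key-value document."""
    if len(features) != len(FEATURE_NAMES):
        raise ValueError("expected %d features" % len(FEATURE_NAMES))
    doc = dict(zip(FEATURE_NAMES, (float(v) for v in features)))
    doc["prediction"] = predicted_label
    doc["created_at"] = datetime.now(timezone.utc)
    return doc


def store_prediction(collection, features, predicted_label):
    """Step 5: persist a prediction made by the web app."""
    doc = build_prediction_document(features, predicted_label)
    collection.insert_one(doc)
    return doc


def load_finetuning_batch(collection):
    """Step 6 input: read stored predictions back as (features, label) pairs."""
    return [
        ([doc[name] for name in FEATURE_NAMES], doc["prediction"])
        for doc in collection.find()
    ]
```

With a server running, usage would be along the lines of `collection = get_collection()` followed by `store_prediction(collection, [5.1, 3.5, 1.4, 0.2], "setosa")`. Keeping the connection in its own function means the other helpers accept any object that provides `insert_one` and `find`, which makes them easy to test without a database.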