This repository is part of an AWS blog post that describes how to combine and leverage Amazon Kinesis, AWS Glue and AmazonSa.zip

11 files in total (md: 3, py: 2, ipynb: 2)

Uploaded 2023-09-17 14:57:59, 82 KB ZIP

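The phrase in the title, "This repository is part of an AWS blog post that describes how to combine and leverage Amazon Kinesis, AWS Glue and Amazon SageMaker", shows that the archive accompanies a hands-on AWS (Amazon Web Services) example built around three core services: Amazon Kinesis, AWS Glue and Amazon SageMaker. The key points of each service are summarized below.

**Amazon Kinesis** is AWS's real-time data streaming service. It is designed to collect, process and analyze large volumes of real-time data, such as sensor readings, application logs or user behavior data. Its main capabilities are:

1. **Data ingestion**: Kinesis Data Streams ingests high-throughput data in real time and preserves record order within each shard.
2. **Data processing**: Kinesis Data Analytics or a custom application can process records as they arrive.
3. **Data storage**: Kinesis Data Firehose can deliver a stream directly to S3, Redshift or other services for later analysis.
4. **Flexibility**: multiple consumers can process the same stream in parallel, which suits different real-time analytics needs.

**AWS Glue** is a fully managed data integration service that helps organizations find, prepare and load data. Its main features are:

1. **Data Catalog**: the Glue Data Catalog stores metadata so that data can be discovered across AWS services.
2. **ETL (extract, transform, load)**: Glue ETL automates the creation and management of data transformation jobs and supports many data sources and targets.
3. **Crawlers**: crawlers automatically discover and classify tables and keep schema information up to date.
4. **Scheduling and monitoring**: together with AWS Lambda or Step Functions, Glue jobs can run on a schedule and be monitored through CloudWatch.

**Amazon SageMaker** is a comprehensive machine learning and deep learning service that covers the whole path from data preprocessing to model training, tuning and deployment. Its key features are:

1. **Development environment**: hosted Jupyter notebooks for exploratory data analysis and modeling.
2. **Training**: support for multiple machine learning frameworks, such as TensorFlow and PyTorch, for training large models efficiently.
3. **Model optimization**: automated machine learning (AutoML) features that can search for a suitable model and hyperparameters.
4. **Deployment**: one-step deployment of a model to production, for example to a SageMaker endpoint for online prediction.
5. **Monitoring and tuning**: monitoring of model quality and performance to help improve model results.

Judging from the name "amazon-sagemaker-predictive-maintenance-main" inside the archive, the example most likely uses Amazon SageMaker for predictive maintenance. Predictive maintenance uses machine learning to predict equipment failures in order to reduce downtime and repair costs. The likely flow is: collect device sensor data through Kinesis Data Streams, clean and preprocess it with AWS Glue, train a predictive model on SageMaker, and finally deploy the model to production for real-time failure prediction. The example shows how to build a complete real-time data processing and machine learning solution on AWS, from real-time data collection through analysis and prediction.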
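As a rough illustration of the ingestion step, the sketch below shows how a single sensor reading might be written to a Kinesis data stream with the boto3 `put_record` call. The stream name `sensor-stream` and the field names in the reading (`device_id`, `torque`, and so on) are assumptions made for this example; they are not taken from the repository.

```python
import json
from typing import Any, Dict

# Hypothetical stream name, used only for this sketch.
STREAM_NAME = "sensor-stream"


def build_record(reading: Dict[str, Any]) -> Dict[str, Any]:
    """Build the keyword arguments for kinesis.put_record from one reading.

    The device id is used as the partition key, so readings from the same
    device land on the same shard and keep their relative order.
    """
    return {
        "StreamName": STREAM_NAME,
        "Data": json.dumps(reading).encode("utf-8"),
        "PartitionKey": str(reading["device_id"]),
    }


def send_reading(reading: Dict[str, Any], client: Any = None) -> Any:
    """Send one reading to the stream.

    boto3 is imported lazily so build_record can be exercised without
    AWS credentials or the boto3 package installed.
    """
    if client is None:
        import boto3  # requires AWS credentials and region to be configured

        client = boto3.client("kinesis")
    return client.put_record(**build_record(reading))
```

Passing the client as a parameter keeps the record-building logic testable offline; in a real producer the same client would normally be created once and reused.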


Archive contents (11 files):

- amazon-sagemaker-predictive-maintenance-main
  - LICENSE (927 B)
  - CONTRIBUTING.md (3 KB)
  - sam-template
    - samconfig.toml (446 B)
    - glue_streaming
      - app.py (3 KB)
    - template.yaml (9 KB)
    - invoke_endpoint_async
      - app.py (930 B)
  - CODE_OF_CONDUCT.md (309 B)
  - images
    - NRT ML Inference Reference Arch.png (65 KB)
  - README.md (5 KB)
  - notebooks
    - ModelTraining-Evaluation-and-Deployment.ipynb (17 KB)
    - Data_Pre-Processing.ipynb (13 KB)

## Building a Predictive Maintenance solution with Amazon Kinesis, AWS Glue & Amazon SageMaker

Organizations are increasingly building and leveraging Machine Learning (ML) powered solutions for a variety of use cases, including predictive maintenance of machine parts, product recommendations based on customer preferences, credit profiling, content moderation and fraud detection. In many of these scenarios, the effectiveness and benefits of these ML powered solutions can be further enhanced when they can process and derive insights from data events in near real-time.

While the business value and benefits of near real-time ML powered solutions are well established, the architecture required to implement these solutions at scale, with optimum reliability and performance, is complicated. This blog post describes how you can combine Amazon Kinesis, AWS Glue & Amazon SageMaker to build a near real-time feature engineering and inference solution for predictive maintenance.

### Use Case Overview

We focus on a predictive maintenance use case where sensors deployed in the field (industrial equipment, network devices etc.) need to be replaced or rectified before they become faulty and cause downtime. Downtime can be expensive for businesses and can lead to poor customer experience. Predictive maintenance powered by an ML model can also augment regular schedule-based maintenance cycles, by indicating when a machine part in good condition should not be replaced, thereby avoiding unnecessary cost.

In this post we will specifically focus on applying machine learning to a synthetic dataset containing machine failures due to features such as air temperature, process temperature, rotation speed, torque and tool wear.
The dataset used is sourced from the UCI Data Repository and more information can be found here: https://archive.ics.uci.edu/ml/datasets/AI4I+2020+Predictive+Maintenance+Dataset

The machine failure consists of five independent failure modes:

- Tool Wear Failure (TWF)
- Heat Dissipation Failure (HDF)
- Power Failure (PWF)
- Over-strain Failure (OSF)
- Random Failure (RNF)

The 'machine failure' label indicates whether the machine has failed for a particular data point. If at least one of the above failure modes is true, the process fails and the 'machine failure' label is set to 1. The objective for the ML model is to identify machine failures correctly, so a downstream predictive maintenance action can be initiated.

### Architecture Overview

For our predictive maintenance use case, we assume that device sensors stream various measurements and readings about machine parts. Our solution then takes a slice of the streaming data each time (a micro-batch) and performs processing and feature engineering to create features. The created features are then used to generate inferences from a trained and deployed ML model in near real-time. The generated inferences can then be further processed and consumed by downstream applications, to take appropriate actions and initiate maintenance activity.

The following diagram shows the architecture of our overall solution.
![arch](https://github.com/aws-samples/amazon-sagemaker-predictive-maintenance/blob/94ea3a0bc82ff52423897454f1c36c8a3e961ae7/images/NRT%20ML%20Inference%20Reference%20Arch.png)

The solution broadly consists of the following sections:

- Streaming Data Source & Ingestion - We use Amazon Kinesis Data Streams to collect streaming data from the field sensors at scale and make it available for further processing.
- Near Real-time Feature Engineering - We use AWS Glue Streaming jobs to read data from a Kinesis Data Stream and perform data processing and feature engineering, before storing the derived features in an S3 location. Amazon S3 provides a reliable and cost-effective option to store large volumes of data.
- Model Training & Deployment - We use the AI4I predictive maintenance dataset from the UCI Data Repository to train an ML model based on the XGBoost algorithm using Amazon SageMaker. We then deploy the trained model to a SageMaker Asynchronous Inference endpoint.
- Near Real-time ML Inference - Once the features are available in S3, we need to generate inferences from the deployed model in near real time. SageMaker Asynchronous Inference endpoints are well suited for this requirement as they support larger payload sizes (up to 1 GB) and can generate inferences within minutes (up to a maximum of 15 minutes). We use S3 event notifications to run a Lambda function that invokes a SageMaker endpoint asynchronously. SageMaker Asynchronous Inference endpoints accept S3 locations as input, generate inferences from the deployed model and write these inferences back to S3 in near real time.

Refer to this [blog](https://aws.amazon.com/blogs/machine-learning/build-a-predictive-maintenance-solution-with-amazon-kinesis-aws-glue-and-amazon-sagemaker/) for details on how to deploy the solution.

## Security

See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.

## License

This library is licensed under the MIT-0 License. See the LICENSE file.
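To make the near real-time inference step described above more concrete, here is a minimal sketch of a Lambda handler that reacts to an S3 event notification and calls `invoke_endpoint_async` on the SageMaker runtime client. It is not the repository's `invoke_endpoint_async/app.py`; the endpoint name, the `ENDPOINT_NAME` environment variable and the `text/csv` content type are assumptions chosen for illustration.

```python
import os
from typing import Any, Dict, List
from urllib.parse import unquote_plus

# Hypothetical endpoint name; a real deployment would supply its own.
ENDPOINT_NAME = os.environ.get("ENDPOINT_NAME", "predictive-maintenance-async")


def s3_uris_from_event(event: Dict[str, Any]) -> List[str]:
    """Extract s3:// URIs for every object listed in an S3 event notification."""
    uris = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        uris.append(f"s3://{bucket}/{key}")
    return uris


def handler(event: Dict[str, Any], context: Any, client: Any = None) -> Dict[str, Any]:
    """Submit each new feature file to the asynchronous endpoint."""
    if client is None:
        import boto3  # available by default in the AWS Lambda Python runtime

        client = boto3.client("sagemaker-runtime")
    output_locations = []
    for uri in s3_uris_from_event(event):
        response = client.invoke_endpoint_async(
            EndpointName=ENDPOINT_NAME,
            InputLocation=uri,        # the endpoint reads its payload from S3
            ContentType="text/csv",   # assumed format of the feature files
        )
        # The call returns immediately; the result is written to this location later.
        output_locations.append(response["OutputLocation"])
    return {"output_locations": output_locations}
```

Because the call is asynchronous, the handler only records where each result will appear; a downstream consumer would watch the output location for the finished inference.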
