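Summary: This document sets out the requirements and steps for Assignment 2 of INFS3200, Semester 2, 2024. It covers the data linkage task and the technical question-and-answer component, in particular the use of a data warehouse to analyze sales data for an electronic device wholesaler. Students compare edit distance and tri-grams as record similarity measures on a dataset in PostgreSQL, build a data warehouse that follows a specified model to support sales performance analysis, and create views that extract revenue summaries.
Intended audience: students enrolled in INFS3200 who have basic database management skills, a basic background in data science, and familiarity with SQL queries.
Use cases and goals: for students completing the assigned data engineering project. The aim is to learn and apply data integration techniques (such as data linkage) and multidimensional data modelling, and to extract key business insights from a dataset.
Other notes: as a complete assignment guide, this material covers the deadline, the submission rules, and other specific instructions, and is intended to help participants complete the work and pass the assessment.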
INFS3200 Sem 2, 2024
Assignment 2: Data Warehouse
Guidelines for Assignment Submission:
1. Submission Deadline: October 18, 2024, 15:00 (AEST).
2. For questions requiring queries, kindly include both a screenshot of the query and a
screenshot of the execution results.
3. When addressing discussion-based questions, provide comprehensive textual
responses, supplemented with suitable visual aids if necessary.
4. The submission should be in PDF format, named as ‘A2_s1234567.pdf’ (replace
‘1234567’ with your student ID). The submitted file should not exceed 10 MB.
5. All implementations and project components should be finalized within the
UQZones environment. Evaluators might assess the assignment based on the
established checkpoints within UQZones.
6. All assignment submissions must be submitted exclusively through the UQ
Blackboard. Alternative methods of submission will not be acknowledged. Please be
mindful that submissions via email will not be entertained under any circumstances.
7. It is imperative to adhere to the stipulated submission deadline to avoid penalties, as
explained in the Educational Course Policies (ECP).
8. Ensuring the successful submission of your assignment within the designated
timeline is your responsibility.
Task 1: Data Linkage (3 points)
In this part, you should use the restaurant dataset provided in Prac3. Import the data into
a PostgreSQL database or read the data directly, and complete the following tasks:
1. (1 point) Link two restaurant records by using edit-distance as the similarity
measure. Report the hyper-parameter choice and the total number of similar records.
2. (1 point) Link two restaurant records by using tri-grams as the similarity
measure. Report the hyper-parameter choice and the total number of similar records.
3. (1 point) Which similarity measure is better for the restaurant dataset? Provide the
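The two similarity measures named in Task 1 can be sketched in Python for the "directly read the data" route. This is a minimal illustration under stated assumptions, not the method prescribed by the course: the function names, the tri-gram padding convention, the Jaccard-style tri-gram score, and the idea of treating the similarity threshold as the hyper-parameter are all assumptions, and the sample strings are made up rather than taken from the Prac3 dataset.

```python
from typing import Callable, List, Set, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance normalised to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def trigrams(s: str) -> Set[str]:
    """Set of 3-character substrings of the lower-cased string.
    Assumption: pad with two leading spaces and one trailing space."""
    padded = "  " + s.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two tri-gram sets (an assumed scoring rule)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def link(left: List[str], right: List[str],
         similarity: Callable[[str, str], float],
         threshold: float) -> List[Tuple[str, str]]:
    """Return every (left, right) pair whose similarity meets the threshold.
    The threshold plays the role of the hyper-parameter to report."""
    return [(l, r) for l in left for r in right
            if similarity(l, r) >= threshold]


if __name__ == "__main__":
    # Made-up names for illustration only.
    source_a = ["Joe's Diner", "Blue Cafe"]
    source_b = ["Joes Diner", "Green Bistro"]
    print(link(source_a, source_b, edit_similarity, 0.8))
    print(link(source_a, source_b, trigram_similarity, 0.5))
```

Comparing every record with every other record is quadratic in the number of records, which is acceptable for a small dataset but would need blocking or an index on a large one. Counting the pairs returned at several thresholds is one way to justify the hyper-parameter choice in the report.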
Task 2: Data Warehouse (7 points)
You are provided with an electronic device wholesale dataset in CSV format. Download the
dataset by using your UQZones terminal:
curl -O https://stluc.manta.uqcloud.net/infs3200/public/Sales.csv
The dataset is exported from an OLTP system. Each record (row) in the dataset indicates a
sales transaction. You are required to construct a data warehouse to analyze the sales