BigMart的数据科学家已经收集了不同城市10家商店的1559种产品的2013年销售数据。此外,还定义了每个产品和商店的某些属性。目的是建立一个预测模型,并找出每个产品在特定商店的销售情况。
使用此模型,BigMart将尝试了解在增加销售额中起关键作用的产品和商店的属性。
我们有train(8523)和test(5681)数据集,train数据集有输入和输出变量。您需要预测测试数据集的销售额。
变量 描述
Item_Identifier 独特的产品ID
产品重量 产品重量
Item_Fat_Content 产品是否低脂肪
Item_Visibility 分配给特定产品的商店中所有产品的总显示区域的百分比
物品种类 产品所属的类别
Item_MRP 产品的最大零售价(清单价格)
Outlet_Identifier 唯一商店ID
Outlet_Establishment_Year 商店成立的那一年
Outlet_Size 商店的面积覆盖面积
Outlet_Location_Type 商店所在的城市类型
Outlet_Type 出口是一家杂货店还是某种超市
Item_Outlet_Sales 在特定商店销售产品。这是要预测的结果变量。
Problem Statement
The data scientists at BigMart have collected 2013 sales data for 1559 products across 10 stores in different cities. Also, certain attributes of each product and store have been defined. The aim is to build a predictive model and find out the sales of each product at a particular store.
Using this model, BigMart will try to understand the properties of products and stores which play a key role in increasing sales.
Please note that the data may have missing values as some stores might not report all the data due to technical glitches. Hence, it will be required to treat them accordingly.
Data
We have train (8523) and test (5681) data set, train data set has both input and output variable(s). You need to predict the sales for test data set.
Variable Description
Item_Identifier Unique product ID
Item_Weight Weight of product
Item_Fat_Content Whether the product is low fat or not
Item_Visibility The % of total display area of all products in a store allocated to the particular product
Item_Type The category to which the product belongs
Item_MRP Maximum Retail Price (list price) of the product
Outlet_Identifier Unique store ID
Outlet_Establishment_Year The year in which store was established
Outlet_Size The size of the store in terms of ground area covered
Outlet_Location_Type The type of city in which the store is located
Outlet_Type Whether the outlet is just a grocery store or some sort of supermarket
Item_Outlet_Sales Sales of the product in the particulat store. This is the outcome variable to be predicted.