1. Optimize the real estate tables above. Perform the following operations.
(2) Analyze the storage characteristics of the tables when stored in the database. You are to submit the most accurate estimate of the number of tuples for each table and the number of disk pages that are used to store each table. Submit the code necessary to gather this information as well as the output of these queries. What do you calculate as the blocking factor for each table?
table Lot:
realestate=# select relname,reltuples,relpages
realestate-# from pg_class
realestate-# where relname='lot';
relname | reltuples | relpages
---------+-----------+----------
lot | 249990 | 3122
(1 行)
table Customer:
realestate=# select relname,reltuples,relpages
realestate-# from pg_class
realestate-# where relname='customer';
relname | reltuples | relpages
----------+-----------+----------
customer | 249999 | 1970
(1 行)
table Customer_lot:
realestate=# select relname,reltuples,relpages
realestate-# from pg_class
realestate-# where relname='customer_lot';
relname | reltuples | relpages
--------------+-----------+----------
customer_lot | 249999 | 1352
(1 行)
The blocking factor of the lot relation is: 249990/3122=80
The blocking factor of the customer relation is: 249999/1970=127
The blocking factor of the customer_lot relation is: 249999/1352=18511
(3) Analyze the run time characteristics of each query given above. Report both PostgreSQL's estimate and the actual total cost expressed in the number of disk accesses and the amount of time that each query takes to run. Submit the code necessary for gathering this data.
<1> Selecting the lot_id for all lots in a given size range. An example of such a query would be select lot_id from lot where lot_size between 300 and 15000;
realestate=# explain analyze
realestate-# select lot_id
realestate-# from lot
realestate-# where lot_size between 300 and 15000;
QUERY PLAN
--------------------------------------------------------------------------------
------------------------------
Seq Scan on lot (cost=0.00..6871.85 rows=148074 width=4) (actual time=31.000..
2845.000 rows=147220 loops=1)
Filter: ((lot_size >= 300::double precision) AND (lot_size <= 15000::double p
recision))
Total runtime: 3250.000 ms
(3 行)
<2> Selecting the lot_id for all lots in a given value range. An example of such a query would be select lot_id from lot where lot_value between 3000 and 15000;
realestate=# explain analyze
realestate-# select lot_id
realestate-# from lot
realestate-# where lot_value between 3000 and 15000;
QUERY PLAN
--------------------------------------------------------------------------------
-------------------------
Seq Scan on lot (cost=0.00..6871.85 rows=3378 width=4) (actual time=0.000..100
1.000 rows=3045 loops=1)
Filter: ((lot_value >= 3000::double precision) AND (lot_value <= 15000::doubl
e precision))
Total runtime: 1031.000 ms
(3 行)
<3> Selecting all of the information for a specific customer. An example of such a query would be select * from customer where customer_id=12;
realestate=# explain analyze
realestate-# select *
realestate-# from customer
realestate-# where customer_id=12;
QUERY PLAN
--------------------------------------------------------------------------------
-------------------------
Seq Scan on customer (cost=0.00..5094.99 rows=1 width=22) (actual time=0.000..
1063.000 rows=1 loops=1)
Filter: (customer_id = 12)
Total runtime: 1063.000 ms
(3 行)
<4> Inserting new customer or lot data. An example of such a query would be insert into customer values (250001, 'Vince', 'Smith' );
realestate=# explain analyze
realestate-# insert into customer
realestate-# values (250001,'Vince','Smith');
QUERY PLAN
--------------------------------------------------------------------------------
----
Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.000 rows=1 loops
=1)
Total runtime: 156.000 ms
(2 行)
<5> Deleting a row of customer or lot data. An example of such a query would be delete from customer where customer_id='250001';
realestate=# explain analyze
realestate-# delete from customer
realestate-# where customer_id='250001';
QUERY PLAN
--------------------------------------------------------------------------------
-------------------------
Seq Scan on customer (cost=0.00..5094.99 rows=1 width=6) (actual time=938.000.
.938.000 rows=1 loops=1)
Filter: (customer_id = 250001)
Total runtime: 938.000 ms
(3 行)
<6> Updating a row of customer or lot data. An example of such a query would be update customer set customer_first_name='Vinny' where customer_id='249001';
realestate=# explain analyze
realestate-# update customer
realestate-# set customer_first_name='Vinny'
realestate-# where customer_id='249001';
QUERY PLAN
--------------------------------------------------------------------------------
--------------------------
Seq Scan on customer (cost=0.00..5094.99 rows=1 width=19) (actual time=922.000
..922.000 rows=1 loops=1)
Filter: (customer_id = 249001)
Total runtime: 937.000 ms
(3 行)
<7> Selecting the average lot size of all lots. An example of such a query would be select avg(lot_size) from lot;
realestate=# explain analyze
realestate-# select avg(lot_value)
realestate-# from lot;
QUERY PLAN
--------------------------------------------------------------------------------
------------------------------------
Aggregate (cost=6246.88..6246.88 rows=1 width=8) (actual time=2890.000..2890.0
00 rows=1 loops=1)
-> Seq Scan on lot (cost=0.00..5621.90 rows=249990 width=8) (actual time=15
.000..1578.000 rows=249999 loops=1)
Total runtime: 2937.000 ms
(3 行)
(4) For each query above, suggest an index, if applicable, to improve performance. To answer this question completely, you must state what columns you are indexing and what index you will use. You must fully defend your choice with a complete explanation. If an index is not appropriate for a query, clearly state why. Note that you may only use indexes supported in PostgreSQL, e.g. hash or b-tree. You may also use clustering.
Query types 1 and 2 both use a sequential scan for a particular range of values. These queries may be aided by a b-tree on the columns referenced in the where clause.
Query type 3 uses an equality selection condition based on customer id. In this case, the performance will be helped using a hash on the customer id.
Query 4 is an insert and should not be indexed, although, it is on a value that is also queried in query 2. In this case, the b-tree from query 2 will most likely cause performance to suffer.
Queries 5 and 6 both use sequential scans and will be aided by a hash on the customer id and the lot id.
Query 7 is an aggregate function that performs a sequential scan of the lot table.
These indexes may be built with the following sql statements:
create index customer_id_index on customer using hash(customer_id);
create index lot_id_index on lot using hash(lot_id);
create index lot_value_index on lot using btree(lot_value);
create index lot_size_index on lot using btree(lot_size);
The lot_value index may be clustered using clus
SSD7所有习题答案
4星 · 超过85%的资源 需积分: 9 106 浏览量
2011-07-05
21:19:34
上传
评论
收藏 642KB RAR 举报
wangqing1644
- 粉丝: 0
- 资源: 1
最新资源
- 使用 C 语言实现的计算非负整数的阶乘
- 2011-2021最新版本北京大学数字普惠金融指数(PKU-DFIIC).xlsx
- 县域数字乡村指数2018-2020(1).xlsx
- Docker容器配置进阶
- tensorflow-gpu-2.7.4-cp37-cp37m-manylinux2010-x86-64.whl
- 多段线、 圆、弧转多段线(仅我可见)
- tensorflow-2.7.2-cp38-cp38-manylinux2010-x86-64.whl
- 李慧琴C语言基础部分.zip
- yeyue-p8Yi4-ve4a83792.apk
- tensorflow-gpu-2.7.3-cp38-cp38-manylinux2010-x86-64.whl
资源上传下载、课程学习等过程中有任何疑问或建议,欢迎提出宝贵意见哦~我们会及时处理!
点击此处反馈
评论1