Oracle学习之数据仓库(二)Dimension的理解-创新互联

  在数据仓库中,有事实表、维度表两个概念。

成都创新互联公司10多年企业网站设计服务;为您提供网站建设,网站制作,网页设计及高端网站定制服务,企业网站设计及推广,对成都塑料袋等多个行业拥有丰富的营销推广经验的网站建设公司。

  事实表是数据仓库结构中的中央表,它包含联系事实与维度表的数字度量值和键。事实数据表包含描述业务(例如产品销售)内特定事件的数据。

  维度表是维度属性的集合。是分析问题的一个窗口。是人们观察数据的特定角度,是考虑问题时的一类属性,属性的集合构成一个维。

  如图示

Oracle 学习之 数据仓库(二) Dimension 的理解

我们以sh用户下的sales表和times表来看,

SALES为事实表

SQL> desc sales  Name    Null?    Type  ----------------------------------------- -------- ----------------------------  PROD_ID    NOT NULL NUMBER  CUST_ID    NOT NULL NUMBER  TIME_ID    NOT NULL DATE  CHANNEL_ID    NOT NULL NUMBER  PROMO_ID    NOT NULL NUMBER  QUANTITY_SOLD    NOT NULL NUMBER(10,2)  AMOUNT_SOLD    NOT NULL NUMBER(10,2)

TIMES为维度表

SQL> desc times  Name    Null?    Type  ----------------------------------------- -------- ----------------------------  TIME_ID    NOT NULL DATE  DAY_NAME    NOT NULL VARCHAR2(9)  DAY_NUMBER_IN_WEEK    NOT NULL NUMBER(1)  DAY_NUMBER_IN_MONTH    NOT NULL NUMBER(2)  CALENDAR_WEEK_NUMBER    NOT NULL NUMBER(2)  FISCAL_WEEK_NUMBER    NOT NULL NUMBER(2)  WEEK_ENDING_DAY    NOT NULL DATE  WEEK_ENDING_DAY_ID    NOT NULL NUMBER  CALENDAR_MONTH_NUMBER    NOT NULL NUMBER(2)  FISCAL_MONTH_NUMBER    NOT NULL NUMBER(2)  CALENDAR_MONTH_DESC    NOT NULL VARCHAR2(8)  CALENDAR_MONTH_ID    NOT NULL NUMBER  FISCAL_MONTH_DESC    NOT NULL VARCHAR2(8)  FISCAL_MONTH_ID    NOT NULL NUMBER  DAYS_IN_CAL_MONTH    NOT NULL NUMBER  DAYS_IN_FIS_MONTH    NOT NULL NUMBER  END_OF_CAL_MONTH    NOT NULL DATE  END_OF_FIS_MONTH    NOT NULL DATE  CALENDAR_MONTH_NAME    NOT NULL VARCHAR2(9)  FISCAL_MONTH_NAME    NOT NULL VARCHAR2(9)  CALENDAR_QUARTER_DESC    NOT NULL CHAR(7)  CALENDAR_QUARTER_ID    NOT NULL NUMBER  FISCAL_QUARTER_DESC    NOT NULL CHAR(7)  FISCAL_QUARTER_ID    NOT NULL NUMBER  DAYS_IN_CAL_QUARTER    NOT NULL NUMBER  DAYS_IN_FIS_QUARTER    NOT NULL NUMBER  END_OF_CAL_QUARTER    NOT NULL DATE  END_OF_FIS_QUARTER    NOT NULL DATE  CALENDAR_QUARTER_NUMBER    NOT NULL NUMBER(1)  FISCAL_QUARTER_NUMBER    NOT NULL NUMBER(1)  CALENDAR_YEAR    NOT NULL NUMBER(4)  CALENDAR_YEAR_ID    NOT NULL NUMBER  FISCAL_YEAR    NOT NULL NUMBER(4)  FISCAL_YEAR_ID     NOT NULL NUMBER  DAYS_IN_CAL_YEAR    NOT NULL NUMBER  DAYS_IN_FIS_YEAR    NOT NULL NUMBER  END_OF_CAL_YEAR    NOT NULL DATE  END_OF_FIS_YEAR    NOT NULL DATE

如果我们创建一个物化视图

create materialized view sales_month_sum  enable query rewrite as    SELECT t.calendar_month_id,          prod_id,          channel_id,          promo_id,          SUM (quantity_sold) quantity_sold,          SUM (amount_sold) amount_sold     FROM sales s, times t    WHERE s.time_id = t.time_id GROUP BY prod_id,          channel_id,          promo_id,          t.calendar_month_id;

如果我们做如下按月的分组查询

SQL> alter session set query_rewrite_enabled=true; SQL> alter session set query_rewrite_integrity=trusted; SQL> set autotrace traceonly SQL> set line 200 SQL>   SELECT t.calendar_month_id,          prod_id,          channel_id,          promo_id,          SUM (quantity_sold) quantity_sold,          SUM (amount_sold) amount_sold     FROM sales s, times t    WHERE s.time_id = t.time_id GROUP BY prod_id,          channel_id,          promo_id,          t.calendar_month_id;  9068 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 3287305789 ------------------------------------------------------------------------------------------------ | Id  | Operation      | Name        | Rows  | Bytes | Cost (%CPU)| Time     | ------------------------------------------------------------------------------------------------ |   0 | SELECT STATEMENT      |        |  9068 |   690K|    13  (0)| 00:00:01 | |   1 |  MAT_VIEW REWRITE ACCESS FULL| SALES_MONTH_SUM |  9068 |   690K|    13  (0)| 00:00:01 | ------------------------------------------------------------------------------------------------

可见查询使用的是物化视图,但是如果我需要按年、季度对数据做分组查询呢?

  SELECT t.calendar_quarter_id,prod_id,          channel_id,          promo_id,          SUM (quantity_sold) quantity_sold,          SUM (amount_sold) amount_sold     FROM sales s, times t    WHERE s.time_id = t.time_id GROUP BY prod_id,          channel_id,          promo_id,          t.calendar_quarter_id;

这个查看肯定是不能使用物化视图的,执行计划如下

Execution Plan ---------------------------------------------------------- Plan hash value: 3221963832 --------------------------------------------------------------------------------------------------------- | Id  | Operation       | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | --------------------------------------------------------------------------------------------------------- |   0 | SELECT STATEMENT       |  |  2037 | 79443 |   569   (6)| 00:00:07 | | | |   1 |  HASH GROUP BY       |  |  2037 | 79443 |   569   (6)| 00:00:07 | | | |*  2 |   HASH JOIN       |  |   918K|    34M|   546   (2)| 00:00:07 | | | |   3 |    PART JOIN FILTER CREATE    | :BF0000 |  1826 | 21912 |    18   (0)| 00:00:01 | | | |   4 |     TABLE ACCESS FULL       | TIMES |  1826 | 21912 |    18   (0)| 00:00:01 | | | |   5 |    PARTITION RANGE JOIN-FILTER|  |   918K|    23M|   525   (2)| 00:00:07 |:BF0000|:BF0000| |   6 |     TABLE ACCESS FULL       | SALES |   918K|    23M|   525   (2)| 00:00:07 |:BF0000|:BF0000| ---------------------------------------------------------------------------------------------------------

Oracle为了是查询重写更加的智能,引入了Dimension的概念。Dimension我们称之为维,它是基于维度表的,用来描述维度表的维度之间的层级关系。

CREATE DIMENSION SH.TIMES_DIM   LEVEL DAY                            IS      (SH.TIMES.TIME_ID)   LEVEL MONTH                          IS      (SH.TIMES.CALENDAR_MONTH_ID)   LEVEL QUARTER                        IS      (SH.TIMES.CALENDAR_QUARTER_ID)   LEVEL YEAR                           IS      (SH.TIMES.CALENDAR_YEAR_ID)   HIERARCHY CAL_ROLLUP     (DAY                               CHILD OF      MONTH                             CHILD OF      QUARTER                           CHILD OF      YEAR);

LEVEL定义等级,基于维度表,HIERARCHY关键字定义层级关系。由层级关系,我们知道quarter是由month组成的。

我们再次查询

SQL> SELECT t.calendar_quarter_id,prod_id,          channel_id,          promo_id,          SUM (quantity_sold) quantity_sold,          SUM (amount_sold) amount_sold     FROM sales s, times t    WHERE s.time_id = t.time_id GROUP BY prod_id,          channel_id,           promo_id,          t.calendar_quarter_id; 3375 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 3397140165 -------------------------------------------------------------------------------------------------- | Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time  | -------------------------------------------------------------------------------------------------- |   0 | SELECT STATEMENT        |  |    20 |  1720 |    36  (14)| 00:00:01 | |   1 |  HASH GROUP BY        |  |    20 |  1720 |    36  (14)| 00:00:01 | |*  2 |   HASH JOIN        |  |   128K|    10M|    33   (7)| 00:00:01 | |   3 |    VIEW         |  |   849 |  6792 |    19   (6)| 00:00:01 | |   4 |     HASH UNIQUE         |  |   849 |  6792 |    19   (6)| 00:00:01 | |   5 |      TABLE ACCESS FULL        | TIMES  |  1826 | 14608 |    18   (0)| 00:00:01 | |   6 |    MAT_VIEW REWRITE ACCESS FULL| SALES_MONTH_SUM |  9068 |   690K|    13   (0)| 00:00:01 | --------------------------------------------------------------------------------------------------

这次是使用物化视图与times表做关联,性能更高了。

我们对比如下两个查询

SQL>   SELECT t.calendar_quarter_id,          prod_id,          channel_id,          promo_id,          SUM (quantity_sold) quantity_sold,          SUM (amount_sold) amount_sold     FROM sales s, times t    WHERE s.time_id = t.time_id AND t.calendar_quarter_id = 1769 GROUP BY prod_id,          channel_id,          promo_id,          t.calendar_quarter_id;  168 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 3397140165 -------------------------------------------------------------------------------------------------- | Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time  | -------------------------------------------------------------------------------------------------- |   0 | SELECT STATEMENT        |  |     1 |    86 |    33   (7)| 00:00:01 | |   1 |  HASH GROUP BY        |  |     1 |    86 |    33   (7)| 00:00:01 | |*  2 |   HASH JOIN        |  |  6423 |   539K|    32   (4)| 00:00:01 | |   3 |    VIEW         |  |    34 |   272 |    19   (6)| 00:00:01 | |   4 |     HASH UNIQUE         |  |    34 |   272 |    19   (6)| 00:00:01 | |*  5 |      TABLE ACCESS FULL        | TIMES  |    90 |   720 |    18   (0)| 00:00:01 | |   6 |    MAT_VIEW REWRITE ACCESS FULL| SALES_MONTH_SUM |  9068 |   690K|    13   (0)| 00:00:01 | --------------------------------------------------------------------------------------------------

使用了物化视图

SQL>SELECT t.calendar_quarter_id,          prod_id,          channel_id,          promo_id,          SUM (quantity_sold) quantity_sold,          SUM (amount_sold) amount_sold     FROM sales s, times t    WHERE s.time_id = t.time_id AND t.calendar_quarter_desc = '1998-01' GROUP BY prod_id,          channel_id,          promo_id,          t.calendar_quarter_id; 168 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 3221963832 ----------------------------------------------------------------------------------------------------------------- | Id  | Operation       | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop | ----------------------------------------------------------------------------------------------------------------- |   0 | SELECT STATEMENT       |  |  8146 |   373K| |   848   (2)| 00:00:11 | | | |   1 |  HASH GROUP BY       |  |  8146 |   373K|  3632K|   848   (2)| 00:00:11 | | | |*  2 |   HASH JOIN       |  | 57459 |  2637K| |   546   (2)| 00:00:07 | | | |   3 |    PART JOIN FILTER CREATE    | :BF0000 |    91 |  1820 | |    18   (0)| 00:00:01 | | | |*  4 |     TABLE ACCESS FULL       | TIMES |    91 |  1820 | |    18   (0)| 00:00:01 | | | |   5 |    PARTITION RANGE JOIN-FILTER|  |   918K|    23M| |   525   (2)| 00:00:07 |:BF0000|:BF0000| |   6 |     TABLE ACCESS FULL       | SALES |   918K|    23M| |   525   (2)| 00:00:07 |:BF0000|:BF0000| -----------------------------------------------------------------------------------------------------------------

没有使用物化视图。

其实条件实质上是一样的,因为t.calendar_quarter_desc = '1998-01' 和t.calendar_quarter_id = 1769 在times表中表示相同的数据。

但是Oracle不知道CALENDAR_QUARTER_DESC与CALENDAR_QUARTER_ID的关系。

我们在创建Dimension时,可以为LEVEL指定属性值。

如下

CREATE DIMENSION SH.TIMES_DIM   LEVEL DAY                            IS      (SH.TIMES.TIME_ID)   LEVEL MONTH                          IS      (SH.TIMES.CALENDAR_MONTH_ID)   LEVEL QUARTER                        IS      (SH.TIMES.CALENDAR_QUARTER_ID)   LEVEL YEAR                           IS      (SH.TIMES.CALENDAR_YEAR_ID)   HIERARCHY CAL_ROLLUP     (DAY                               CHILD OF      MONTH                             CHILD OF      QUARTER                           CHILD OF      YEAR)   ATTRIBUTE QUARTER DETERMINES      (SH.TIMES.CALENDAR_QUARTER_DESC,      SH.TIMES.DAYS_IN_CAL_QUARTER,      SH.TIMES.END_OF_CAL_QUARTER,      SH.TIMES.CALENDAR_QUARTER_NUMBER)   ATTRIBUTE YEAR DETERMINES      (SH.TIMES.CALENDAR_YEAR,      SH.TIMES.DAYS_IN_CAL_YEAR,      SH.TIMES.END_OF_CAL_YEAR);

我们再次查询

SQL> SELECT t.calendar_quarter_id,          prod_id,          channel_id,          promo_id,          SUM (quantity_sold) quantity_sold,          SUM (amount_sold) amount_sold     FROM sales s, times t    WHERE s.time_id = t.time_id AND t.calendar_quarter_desc = '1998-01' GROUP BY prod_id,          channel_id,          promo_id,          t.calendar_quarter_id;   168 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 3290467316 -------------------------------------------------------------------------------------------------- | Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time  | -------------------------------------------------------------------------------------------------- |   0 | SELECT STATEMENT        |  |    20 |  2240 |    33   (7)| 00:00:01 | |   1 |  HASH GROUP BY        |  |    20 |  2240 |    33   (7)| 00:00:01 | |*  2 |   HASH JOIN        |  | 17191 |  1880K|    32   (4)| 00:00:01 | |   3 |    VIEW         | VW_GBF_5  |    91 |  3094 |    19   (6)| 00:00:01 | |   4 |     HASH GROUP BY        |  |    91 |   728 |    19   (6)| 00:00:01 | |   5 |      VIEW        |  |    91 |   728 |    19   (6)| 00:00:01 | |   6 |       HASH UNIQUE        |  |    91 |  1456 |    19   (6)| 00:00:01 | |*  7 |        TABLE ACCESS FULL       | TIMES  |    91 |  1456 |    18   (0)| 00:00:01 | |   8 |    MAT_VIEW REWRITE ACCESS FULL| SALES_MONTH_SUM |  9068 |   690K|    13   (0)| 00:00:01 | --------------------------------------------------------------------------------------------------

这次就使用了物化视图。

另外有需要云服务器可以了解下创新互联cdcxhl.cn,海内外云服务器15元起步,三天无理由+7*72小时售后在线,公司持有idc许可证,提供“云服务器、裸金属服务器、高防服务器、香港服务器、美国服务器、虚拟主机、免备案服务器”等云主机租用服务以及企业上云的综合解决方案,具有“安全稳定、简单易用、服务可用性高、性价比高”等特点与优势,专为企业上云打造定制,能够满足用户丰富、多元化的应用场景需求。


文章名称:Oracle学习之数据仓库(二)Dimension的理解-创新互联
标题来源:http://ybzwz.com/article/dgipje.html