免费视频淫片aa毛片_日韩高清在线亚洲专区vr_日韩大片免费观看视频播放_亚洲欧美国产精品完整版

打開APP
userphoto
未登錄

開通VIP,暢享免費(fèi)電子書等14項(xiàng)超值服

開通VIP
[翻譯]——MySQL 8.0 Histograms

前言: 本文是對這篇博客MySQL 8.0 Histograms的翻譯,翻譯如有不當(dāng)?shù)牡胤?,敬請諒解,請尊重原?chuàng)和翻譯勞動(dòng)成果,轉(zhuǎn)載的時(shí)候請注明出處。謝謝!

 

英文原文地址:https://lefred.be/content/mysql-8-0-histograms/

 

翻譯原文地址:https://www.cnblogs.com/kerrycode/p/11817026.html

 

 

在MySQL 8.0之前,MySQL缺失了其它關(guān)系數(shù)據(jù)庫中一個(gè)眾所周知的功能:優(yōu)化器的直方圖

 

優(yōu)化器團(tuán)隊(duì)(Optimizer Team)在越來越多的MySQL DBA的呼聲中實(shí)現(xiàn)了這個(gè)功能。

 

 

直方圖定義

 

但什么是直方圖呢?我們來看維基百科的定義吧,直方圖是數(shù)值數(shù)據(jù)分布的準(zhǔn)確表示。 對于RDBMS來說,直方圖是特定列內(nèi)數(shù)據(jù)分布的近似值。因此在MySQL中,直方圖能夠幫助優(yōu)化器找到最有效的執(zhí)行計(jì)劃。

 

直方圖例子

 

為了說明直方圖是如何影響優(yōu)化器工作的,我會(huì)用dbt3生成的數(shù)據(jù)來演示。

 

我們準(zhǔn)備了一個(gè)簡單查詢:

 

SELECT * FROM orders  
  JOIN customer ON o_custkey = c_custkey 
WHERE o_orderdate < '1993-01-01' 
  AND c_mktsegment = "AUTOMOBILE"\G

 

讓我們看一下傳統(tǒng)的執(zhí)行計(jì)劃的EXPLAIN輸出,以及可視化方式(VISUAL one):

 

mysql> EXPLAIN SELECT * FROM orders  
       JOIN customer ON o_custkey = c_custkey 
       WHERE o_orderdate < '1993-01-01' AND c_mktsegment = "AUTOMOBILE"\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: customer
   partitions: NULL
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 149050
     filtered: 10.00
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: orders
   partitions: NULL
         type: ref
possible_keys: i_o_custkey,i_o_orderdate
          key: i_o_custkey
      key_len: 5
          ref: dbt3.customer.c_custkey
         rows: 14
     filtered: 30.62
        Extra: Using where
2 rows in set, 1 warning (0.28 sec)

 

我們看到MySQL首先對customer表做了一個(gè)全表掃描,并且它的選擇估計(jì)記錄(過濾)是10%;

 

 

 

接下來讓我們運(yùn)行這個(gè)查詢(我使用了COUNT(*)),然后我們來看看有多少行記錄

 

mysql> SELECT count(*) FROM orders  
       JOIN customer ON o_custkey = c_custkey 
       WHERE o_orderdate < '1993-01-01' AND c_mktsegment = "AUTOMOBILE"\G
*************************** 1. row ***************************
count(*): 45127
1 row in set (49.98 sec)

 

 

創(chuàng)建直方圖

 

現(xiàn)在,我將在表customer上的字段c_mktsegment上創(chuàng)建一個(gè)直方圖

 

mysql> ANALYZE TABLE customer UPDATE HISTOGRAM ON c_mktsegment WITH 1024 BUCKETS;
+---------------+-----------+----------+---------------------------------------------------------+
| Table         | Op        | Msg_type | Msg_text                                                |
+---------------+-----------+----------+---------------------------------------------------------+
| dbt3.customer | histogram | status   | Histogram statistics created for column 'c_mktsegment'. |
+---------------+-----------+----------+---------------------------------------------------------+

 

接下來,我們來驗(yàn)證查詢的執(zhí)行計(jì)劃:

 

mysql> EXPLAIN SELECT * FROM orders  
               JOIN customer ON o_custkey = c_custkey 
               WHERE o_orderdate < '1993-01-01' AND c_mktsegment = "AUTOMOBILE"\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: orders
   partitions: NULL
         type: ALL
possible_keys: i_o_custkey,i_o_orderdate
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1494230
     filtered: 30.62
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: customer
   partitions: NULL
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: dbt3.orders.o_custkey
         rows: 1
     filtered: 19.84
        Extra: Using where
2 rows in set, 1 warning (1.06 sec)

 

 

現(xiàn)在,使用直方圖后,我們可以看到customer表的吸引力降低了,因?yàn)閛rder表按條件過濾的行的百分比(30.62)幾乎是customer表按條件過濾行的百分比的兩倍(19.84%),這將導(dǎo)致低order表進(jìn)行查找。

 

注意:這段感覺沒有翻譯恰當(dāng),英文原文如下,如果感覺翻譯比較生硬,參考原文

 

Now with the histogram we can see that it becomes less attractive to start with customer table since almost twice as many rows (19.84%) will cause look-ups into the order table.

 

 

 

優(yōu)化器選擇對order表進(jìn)行全表掃描(full sacn),此時(shí)執(zhí)行計(jì)劃的代價(jià)看起來似乎還高一些,,讓我們看一下SQL的執(zhí)行時(shí)間:

 

 

mysql> SELECT count(*) FROM orders  
       JOIN customer ON o_custkey = c_custkey 
       WHERE o_orderdate < '1993-01-01' AND c_mktsegment = "AUTOMOBILE"\G
*************************** 1. row ***************************
count(*): 45127
1 row in set (6.35 sec)

 

SQL語句的執(zhí)行時(shí)間更短,明顯比之前要快了

 

 

 

查看數(shù)據(jù)的分布

 

 

直方圖數(shù)據(jù)存貯在Information_Schema.column_statistics表中,這個(gè)表的定義如下

 

 

+-------------+-------------+------+-----+---------+-------+
| Field       | Type        | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| SCHEMA_NAME | varchar(64) | NO   |     | NULL    |       |
| TABLE_NAME  | varchar(64) | NO   |     | NULL    |       |
| COLUMN_NAME | varchar(64) | NO   |     | NULL    |       |
| HISTOGRAM   | json        | NO   |     | NULL    |       |
+-------------+-------------+------+-----+---------+-------+

 

 

它的一條記錄類似下面這樣:

 

SELECT SCHEMA_NAME, TABLE_NAME, COLUMN_NAME, JSON_PRETTY(HISTOGRAM) 
FROM information_schema.column_statistics 
WHERE COLUMN_NAME = 'c_mktsegment'\G
*************************** 1. row ***************************
           SCHEMA_NAME: dbt3
            TABLE_NAME: customer
           COLUMN_NAME: c_mktsegment
JSON_PRETTY(HISTOGRAM): {
  "buckets": [
    [
      "base64:type254:QVVUT01PQklMRQ==",
      0.19837010534684954
    ],
    [
      "base64:type254:QlVJTERJTkc=",
      0.3983104750546611
    ],
    [
      "base64:type254:RlVSTklUVVJF",
      0.5978433710991851
    ],
    [
      "base64:type254:SE9VU0VIT0xE",
      0.799801232359372
    ],
    [
      "base64:type254:TUFDSElORVJZ",
      1.0
    ]
  ],
  "data-type": "string",
  "null-values": 0.0,
  "collation-id": 255,
  "last-updated": "2018-03-02 20:21:48.271523",
  "sampling-rate": 0.6709158000670916,
  "histogram-type": "singleton",
  "number-of-buckets-specified": 1024
}

 

而且可以查看分布

 

SELECT FROM_BASE64(SUBSTRING_INDEX(v, ':', -1)) value, concat(round(c*100,1),'%') cumulfreq, 
       CONCAT(round((c - LAG(c, 1, 0) over()) * 100,1), '%') freq  
FROM information_schema.column_statistics, JSON_TABLE(histogram->'$.buckets', 
     '$[*]' COLUMNS(v VARCHAR(60) PATH '$[0]', c double PATH '$[1]')) hist  
WHERE schema_name  = 'dbt3' and table_name = 'customer' and column_name = 'c_mktsegment';
+------------+-----------+-------+
| value      | cumulfreq | freq  |
+------------+-----------+-------+
| AUTOMOBILE | 19.8%     | 19.8% |
| BUILDING   | 39.9%     | 20.1% |
| FURNITURE  | 59.9%     | 19.9% |
| HOUSEHOLD  | 79.9%     | 20.1% |
| MACHINERY  | 100.0%    | 20.1% |
+------------+-----------+-------+

 

你也可以用下面語法刪除直方圖信息。

 

 

mysql> ANALYZE TABLE customer DROP HISTOGRAM on c_mktsegment;
+---------------+-----------+----------+---------------------------------------------------------+
| Table         | Op        | Msg_type | Msg_text                                                |
+---------------+-----------+----------+---------------------------------------------------------+
| dbt3.customer | histogram | status   | Histogram statistics removed for column 'c_mktsegment'. |
+---------------+-----------+----------+---------------------------------------------------------+
1 row in set (0.00 sec)

 

 

Buckets

 

你會(huì)注意到,當(dāng)我們創(chuàng)建一個(gè)直方圖時(shí),我們需要指定buckets的數(shù)量,事實(shí)上,數(shù)據(jù)被分成包含特定值以及他們基數(shù)(cardinality)的一組Buckets,如果在上一個(gè)例子中檢查直方圖的類型,你會(huì)發(fā)現(xiàn)它是等寬直方圖(singleton)

 

 

"histogram-type": "singleton",

 

 

這種類型的直方圖最好的,因?yàn)榛鶖?shù)是針對單個(gè)特定值。 如果這次我僅使用2個(gè)存儲(chǔ)桶(buckets)來重新創(chuàng)建直方圖(請記住,在c_mktsegment列中有4個(gè)不同的值):

 

 

mysql> ANALYZE TABLE customer UPDATE HISTOGRAM ON c_mktsegment WITH 2 BUCKETS;
+---------------+-----------+----------+---------------------------------------------------------+
| Table         | Op        | Msg_type | Msg_text                                                |
+---------------+-----------+----------+---------------------------------------------------------+
| dbt3.customer | histogram | status   | Histogram statistics created for column 'c_mktsegment'. |
+---------------+-----------+----------+---------------------------------------------------------+

 

如果我檢查直方圖的類型:

 

 

mysql> SELECT SCHEMA_NAME, TABLE_NAME, COLUMN_NAME, 
              JSON_PRETTY(HISTOGRAM) 
       FROM information_schema.column_statistics 
      WHERE COLUMN_NAME = 'c_mktsegment'\G
*************************** 1. row ***************************
           SCHEMA_NAME: dbt3
            TABLE_NAME: customer
           COLUMN_NAME: c_mktsegment
JSON_PRETTY(HISTOGRAM): {
  "buckets": [
    [
      "base64:type254:QVVUT01PQklMRQ==",
      "base64:type254:RlVSTklUVVJF",
      0.5996992690844636,
      3
    ],
    [
      "base64:type254:SE9VU0VIT0xE",
      "base64:type254:TUFDSElORVJZ",
      1.0,
      2
    ]
  ],
  "data-type": "string",
  "null-values": 0.0,
  "collation-id": 255,
  "last-updated": "2018-03-02 20:42:26.165898",
  "sampling-rate": 0.6709158000670916,
  "histogram-type": "equi-height",
  "number-of-buckets-specified": 2
}

 

現(xiàn)在的直方圖類型是等高直方圖,這意味著將連續(xù)范圍的值分組到存儲(chǔ)桶中,以使落入每個(gè)存儲(chǔ)桶的數(shù)據(jù)項(xiàng)的數(shù)量相同。

 

 

結(jié)論:

 

直方圖對那些不是索引中第一列的列非常有用,這些列用于JOIN、IN子查詢(IN-subqueries)或ORDER BY…LIMIT的查詢的WHERE條件下使用。  

 

另外, 可以考慮嘗試使用足夠的存儲(chǔ)通來獲取等寬直方圖。

本站僅提供存儲(chǔ)服務(wù),所有內(nèi)容均由用戶發(fā)布,如發(fā)現(xiàn)有害或侵權(quán)內(nèi)容,請點(diǎn)擊舉報(bào)。
打開APP,閱讀全文并永久保存 查看更多類似文章
猜你喜歡
類似文章
深入理解MySQL8.0直方圖
mysql命令
MySQL入門指南
MySQL入門學(xué)習(xí) (3)
Mysql Explain 詳解及Extended選項(xiàng)的使用
MySQL主從同步報(bào)錯(cuò)故障處理記錄
更多類似文章 >>
生活服務(wù)
分享 收藏 導(dǎo)長圖 關(guān)注 下載文章
綁定賬號(hào)成功
后續(xù)可登錄賬號(hào)暢享VIP特權(quán)!
如果VIP功能使用有故障,
可點(diǎn)擊這里聯(lián)系客服!

聯(lián)系客服