I. HBase database overview
II. HBase architecture
III. HBase data model
IV. Summary of HBase's overall characteristics
V. Case study: building a fully distributed HBase database system
I. HBase database overview
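Overview: HBase is a column-oriented distributed database built on HDFS. It derives from Google's BigTable: just as BigTable stores its data on GFS, HBase stores its data on HDFS. As mentioned earlier, HDFS is designed for streaming data access and is not suited to low-latency access. When an application needs real-time random access to very large datasets, HBase is therefore the better choice.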
Role: HBase is a typical non-relational (NoSQL) database. NoSQL databases fall mainly into the following categories:
• Key-value stores;
• Document stores;
• Column stores;
• Graph stores;
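Within the NoSQL field HBase is not the strongest performer in every respect, but its integration with Hadoop gives it a great deal of room to scale. At its core, HBase only supports inserts. Updates and deletes are also carried out as inserts, and a delete writes a deletion marker (a "tombstone"). This follows from the write-once, read-many streaming access model of the underlying HDFS. Every inserted value carries a timestamp, so a cell can hold several versions. HBase keeps a configurable maximum number of versions per cell, and a query returns the most recent version by default.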
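The bash sketch below makes the insert-only, multi-version behavior concrete. It is a toy model and not HBase code. The tab-separated file and the names `hput`, `hdelete`, `hversions`, `hget` and `MAX_VERSIONS` are hypothetical stand-ins for a cell store, a put, a delete, a read, and a column family's VERSIONS setting.

```bash
#!/usr/bin/env bash
# Toy model of HBase's insert-only storage (NOT HBase code): every write appends a
# "row<TAB>column<TAB>timestamp<TAB>value" record and nothing is modified in place.
# hput, hdelete, hversions, hget and MAX_VERSIONS are hypothetical names.
STORE=$(mktemp)
trap 'rm -f "$STORE"' EXIT
MAX_VERSIONS=3   # stand-in for a column family's VERSIONS setting

# hput ROW COLUMN TIMESTAMP VALUE: inserts, updates and deletes all end up here
hput() { printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4" >> "$STORE"; }

# hdelete ROW COLUMN TIMESTAMP: a delete is written as a tombstone marker
hdelete() { hput "$1" "$2" "$3" "__TOMBSTONE__"; }

# hversions ROW COLUMN: "timestamp<TAB>value" lines, newest first, at most MAX_VERSIONS
hversions() {
  awk -F'\t' -v r="$1" -v c="$2" '$1 == r && $2 == c { print $3 "\t" $4 }' "$STORE" |
    sort -t$'\t' -k1,1nr | head -n "$MAX_VERSIONS"
}

# hget ROW COLUMN: print the newest value, or nothing if the newest entry is a tombstone
hget() {
  local latest
  latest=$(hversions "$1" "$2" | head -n 1 | cut -f2)
  if [ -n "$latest" ] && [ "$latest" != "__TOMBSTONE__" ]; then
    printf '%s\n' "$latest"
  fi
}
```

For example, `hput tom chengji:math 1000 80; hput tom chengji:math 2000 95; hget tom chengji:math` prints 95, which mirrors the "newest version wins" rule. Real HBase also physically discards versions beyond the limit during compaction, and this sketch does not model that step.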
II. HBase architecture
Architecture:
Architecture analysis: an HBase cluster consists of one active HMaster server and multiple HRegionServer servers. ZooKeeper coordinates all of these servers and handles problems that may arise while they run.
Components:
• HStore: several HStores make up an HRegion, one per column family. An HStore has two parts, the MemStore and the StoreFiles. Data written by users first goes into the MemStore, and when the MemStore is full it is flushed to a StoreFile.
• HRegion: made up of several HStores. HBase stores datasets in tables made of rows and columns. Unlike a traditional relational database, when a table grows beyond a configured size HBase automatically divides it into separate regions (HRegions); this is called region splitting. The HRegion is the smallest unit of distributed storage and load balancing in an HBase cluster, much as a file is divided into blocks in HDFS.
• HLog: the write-ahead log. Each write that reaches an HRegion is first appended to the log and only then written into the MemStore. Its main purpose is failure recovery: when an HRegionServer fails, the HRegionServer that takes over its HRegions replays the HLog to recover the data.
• HRegionServer: hosts multiple HRegions. A cluster can contain many nodes, and each node normally runs one HRegionServer. The HRegionServer reads and writes data in HDFS and manages its HRegions and HLog.
• HMaster: every HRegionServer communicates with the HMaster, whose main job is to tell each HRegionServer which HRegions it should serve. Specifically, the HMaster:
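1. Handles users' table management operations (creating, altering and deleting tables); reads and writes of the data itself go directly to the HRegionServers;
2. Balances load across HRegionServers by dynamically adjusting how HRegions are distributed;
3. Assigns the new HRegions after a region split;
4. Migrates the HRegions of a failed HRegionServer to other servers;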
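• ZooKeeper: stores the location of HBase's catalog tables, the ROOT table and the META table. The META table holds the HRegion identifiers of ordinary user tables, in the format table name + start row key + unique ID. These identifiers change as HRegions split. Once the META table itself is split into several HRegions, a ROOT table is needed to tie them together. This two-level scheme applies to older releases. Since HBase 0.96 the ROOT table has been removed, and ZooKeeper records the location of the hbase:meta table directly. The HBase 2.0.1 used in the case study below therefore has no ROOT table.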
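ZooKeeper also notifies the HMaster when an HRegionServer fails, so that the failed server's HRegions can be migrated. If the HMaster fails, ZooKeeper's master election lets a backup HMaster take over, which ensures that exactly one HMaster is active at any time.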
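• Client: the entity that accesses HBase. To locate data, a client first contacts ZooKeeper, then the ROOT table, then the META table, and finally the user table (ZooKeeper → ROOT → META → table). Since HBase 0.96 the path is ZooKeeper → hbase:meta → table.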
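In the client's final step, it finds which HRegion serves a row key. This works because each HRegion covers a contiguous range of row keys that begins at its start key. Here is a minimal bash sketch of that lookup. It assumes a hypothetical, already-sorted list of start keys in place of real META entries, and `REGION_STARTS` and `locate_region` are illustrative names.

```bash
#!/usr/bin/env bash
# Sketch: find the region holding a row key from a sorted list of region start keys,
# the way a client uses META entries. REGION_STARTS and locate_region are hypothetical.
export LC_ALL=C   # compare keys byte by byte, as HBase does

# Start keys of one table's regions, ascending; "" stands for the first region.
REGION_STARTS=("" "g" "p")

# locate_region ROWKEY: print the largest start key that is <= ROWKEY
locate_region() {
  local key=$1 found="" start
  for start in "${REGION_STARTS[@]}"; do
    if [[ ! "$key" < "$start" ]]; then
      found=$start     # this region starts at or before the key
    else
      break            # start keys are sorted, so no later region can match
    fi
  done
  printf '%s\n' "$found"
}
```

With these start keys, `locate_region tom` prints `p` and `locate_region marry` prints `g`. When a region splits, a new start key is added to the list, which is why the identifiers in META change.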
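III. HBase data model
1. Data model:
Table: null values are not stored; data is indexed by row key, column key and timestamp;
Row key: the primary key of a row, which uniquely identifies it;
Column family: the columns in a row are grouped into column families, and all members of a family share the same family prefix. A table's column families must be defined when the table is created. Columns are written as family:qualifier;
Column key: the full column name, in the format family:qualifier (for example chengji:math);
Cell: in HBase each value is stored in a cell. Locating a cell requires three elements: row key + column key + timestamp;
Timestamp: the time at which the cell was inserted, used by default as the cell's version number;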
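HBase keeps cells sorted by row key, then column, then timestamp in descending order, so a read sees the newest version first. The bash sketch below shows that ordering using `sort`. It is simplified: it assumes tab-separated `row column timestamp value` lines as a hypothetical stand-in for stored cells, and compares the whole family:qualifier string as one column field.

```bash
#!/usr/bin/env bash
# Sketch of HBase's cell ordering: row key ascending, column ascending,
# timestamp descending (newest version first). The input format is hypothetical.
export LC_ALL=C   # byte-wise ordering, since HBase compares keys as raw bytes

# sort_cells: read "row<TAB>column<TAB>timestamp<TAB>value" lines on stdin
# and print them in (simplified) HBase key order.
sort_cells() {
  sort -t$'\t' -k1,1 -k2,2 -k3,3nr
}
```

For example, feeding it two versions of `tom chengji:math` places the one with the larger timestamp first, directly after tom's `age:` cell.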
2. Storage layout:
Relational database:
With name as the primary key, looking up a student by name is easy. Now consider the following questions:
• If a new course is added, how can its scores be stored without changing the table structure?
• If Tom retook the math exam, how can both of his math scores be recorded?
• If Tom has no math score, the value in the table is null, and even a null value still takes up storage space.
HBase database:
When data is inserted at different times, each insert gets its own timestamp and becomes a record within its column family;
When HBase actually stores the table, empty values take up no storage space;
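The three questions above can be answered in HBase terms with a small bash sketch. A new course is just a new column qualifier, a retake is a second timestamped version of the same cell, and a missing score is simply absent, so it occupies no space. The stored cells and scores below are illustrative assumptions, not data from the original tables. The names `add_cell`, `stored_cells` and `columns` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: sparse, schema-flexible rows. Only cells that exist are stored and
# there is no null placeholder. The data below is illustrative only.
CELLS=()

# add_cell ROW COLUMN TIMESTAMP VALUE: store one cell version
add_cell() { CELLS+=("$1"$'\t'"$2"$'\t'"$3"$'\t'"$4"); }

add_cell tom chengji:math 1000 55        # first math exam
add_cell tom chengji:math 2000 90        # retake: a second version of the same cell
add_cell tom chengji:physics 3000 88     # new course: a new qualifier, no schema change
add_cell marry chengji:english 1500 90   # marry has no math score, so nothing is stored

# stored_cells ROW: number of stored cell versions for ROW
stored_cells() {
  printf '%s\n' "${CELLS[@]}" | awk -F'\t' -v r="$1" '$1 == r' | wc -l
}

# columns ROW: distinct columns present in ROW
columns() {
  printf '%s\n' "${CELLS[@]}" | awk -F'\t' -v r="$1" '$1 == r { print $2 }' | sort -u
}
```

Here `stored_cells marry` counts one cell, because the absent math score costs nothing. A relational row would still carry a null column for it.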
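IV. Summary of HBase's overall characteristics
HBase is a column-oriented, map-like database that can only express simple key-to-value mappings. Compared with a relational database, it has the following characteristics:
• Data types: HBase has no data types of its own. It stores every value as an uninterpreted byte array and leaves encoding and decoding to the application. Relational databases offer a rich choice of data types and storage options;
• Data operations: HBase offers only simple operations such as insert, query, delete and truncate. Tables are independent of one another and have no complex inter-table relationships, so table joins are neither possible nor needed. Relational databases support many kinds of joins;
• Storage model: HBase stores data by column family. Each column family is kept in its own set of files, separate from those of other families. Relational databases use a table structure with row-oriented storage;
• Data maintenance: an update in HBase actually inserts new data and keeps the old version, instead of replacing the value in place as a relational database does;
• Scalability: distributed databases such as HBase were built for exactly this purpose, so hardware can easily be added or removed and fault tolerance is high. Relational databases usually need an additional middleware layer to achieve similar capabilities;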
V. Case study: building a fully distributed HBase database system
Case environment:
OS | IP address | Hostname and roles | Required software |
CentOS 7.4 (1708) 64-bit | 192.168.100.101 | master; hadoop: namenode; hbase: HMaster | hadoop-2.7.6.tar.gz, jdk-8u171-linux-x64.tar.gz, hbase-2.0.1-bin.tar.gz |
CentOS 7.4 (1708) 64-bit | 192.168.100.102 | slave1; hadoop: datanode; hbase: HRegionServer | hadoop-2.7.6.tar.gz, jdk-8u171-linux-x64.tar.gz, hbase-2.0.1-bin.tar.gz |
CentOS 7.4 (1708) 64-bit | 192.168.100.103 | slave2; hadoop: datanode; hbase: HRegionServer | hadoop-2.7.6.tar.gz, jdk-8u171-linux-x64.tar.gz, hbase-2.0.1-bin.tar.gz |
Version compatibility:
Download location: http://www.apache.org/index.html#projects-list
HBase deployment modes:
Standalone mode: HBase runs on a single host;
Pseudo-distributed mode: HBase runs only on the Hadoop namenode. It is similar to standalone mode, except that its data files can be stored on the datanodes;
Fully distributed mode: HBase runs on multiple nodes of the Hadoop cluster, usually with the HMaster on the namenode and the HRegionServers on the datanodes;
Case steps (make sure the clocks on all nodes are synchronized):
• Build the Hadoop distributed storage cluster (namenode and datanodes);
• Install HBase on the master node;
• Configure HBase on the master node;
• Copy the HBase installation from the master node to the slave nodes;
• Start HBase on the master node and check its processes;
• Verify the processes on the slave nodes;
• Open the web UI to check HBase's running status;
• Log in to HBase on the master node and check the database status;
• Basic HBase administration operations;
• Use MapReduce with HBase to count the rows in a table;
• Build the Hadoop distributed storage cluster (namenode and datanodes);
• Install HBase on the master node;
[root@master ~]# ls hbase-2.0.1-bin.tar.gz
hbase-2.0.1-bin.tar.gz
[root@master ~]# tar zxvf hbase-2.0.1-bin.tar.gz
[root@master ~]# mv hbase-2.0.1 /usr/local/hbase
[root@master ~]# ls /usr/local/hbase
bin conf hbase-webapps lib NOTICE.txt RELEASENOTES.md
CHANGES.md docs LEGAL LICENSE.txt README.txt
[root@master ~]# chown hadoop:hadoop /usr/local/hbase/ -R
• Configure HBase on the master node;
[root@master ~]# su - hadoop
[hadoop@master ~]$ vi /usr/local/hbase/conf/hbase-site.xml ##HBase site configuration file
[hadoop@master ~]$ vi /usr/local/hbase/conf/hbase-env.sh ##HBase environment variables file
export JAVA_HOME=/usr/local/java
export HADOOP_HOME=/usr/local/hadoop
export HBASE_HOME=/usr/local/hbase
export HBASE_MANAGES_ZK=true
Note: export HBASE_MANAGES_ZK=true makes HBase manage its bundled ZooKeeper, which is started and stopped together with HBase;
[hadoop@master ~]$ vi /usr/local/hbase/conf/regionservers ##hosts that run an HRegionServer
slave1
slave2
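The original does not show the contents of hbase-site.xml. Below is a minimal sketch of what a fully distributed setup typically needs, written as a bash heredoc. It assumes that the HDFS namenode listens on master:9000, so use the fs.defaultFS value from your core-site.xml instead. It also assumes that the bundled ZooKeeper runs on all three hosts. The helper name `write_hbase_site` is hypothetical, and `HBASE_CONF_DIR` defaults to the path used in this article.

```bash
#!/usr/bin/env bash
# Write a minimal hbase-site.xml for the three-node cluster.
# ASSUMPTIONS: fs.defaultFS is hdfs://master:9000, and ZooKeeper (managed by
# HBase, HBASE_MANAGES_ZK=true) runs on master, slave1 and slave2.
write_hbase_site() {
  local conf_dir=${HBASE_CONF_DIR:-/usr/local/hbase/conf}
  cat > "$conf_dir/hbase-site.xml" <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- HDFS directory where HBase stores its data (assumed namenode address) -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:9000/hbase</value>
  </property>
  <!-- Run HMaster and HRegionServers as separate, distributed processes -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <!-- Hosts of the ZooKeeper ensemble -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master,slave1,slave2</value>
  </property>
</configuration>
EOF
}
```

Run `write_hbase_site` as the hadoop user on master before copying the installation to the slaves, so that every node receives the same file.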
• Copy the HBase installation from the master node to the slave nodes;
[root@slave1 ~]# mkdir /usr/local/hbase
[root@slave1 ~]# chown hadoop:hadoop /usr/local/hbase/
[root@slave2 ~]# mkdir /usr/local/hbase
[root@slave2 ~]# chown hadoop:hadoop /usr/local/hbase/
[hadoop@master ~]$ scp -r /usr/local/hbase/* hadoop@slave1:/usr/local/hbase
[hadoop@master ~]$ scp -r /usr/local/hbase/* hadoop@slave2:/usr/local/hbase
• Start HBase on the master node and check its processes;
Note: if starting HBase fails with "Error: Could not find or load main class", the cause is usually a mismatch between the HBase and Hadoop versions or incorrect environment variables;
• Verify the processes on the slave nodes;
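The start and check commands are not shown above. Here is a minimal sketch that assumes the paths and host names of this case study. `start-hbase.sh` in HBase's bin directory starts the HMaster and the HRegionServers listed in conf/regionservers. With HBASE_MANAGES_ZK=true it also starts the bundled ZooKeeper, which shows up as HQuorumPeer. `jps` lists the Java processes on a node. The helpers `expect_procs` and `verify_cluster` are hypothetical. Depending on the slaves' PATH, `ssh host jps` may need the full path to jps.

```bash
#!/usr/bin/env bash
# Sketch: start HBase on master and check the Java processes on every node.
# expect_procs and verify_cluster are hypothetical helpers; paths and host
# names follow the case environment above.
HBASE_HOME=${HBASE_HOME:-/usr/local/hbase}

# expect_procs NAME...: read `jps` output on stdin, fail if any NAME is missing
expect_procs() {
  local out name rc=0
  out=$(cat)
  for name in "$@"; do
    if ! printf '%s\n' "$out" | awk '{print $2}' | grep -qx "$name"; then
      echo "missing process: $name" >&2
      rc=1
    fi
  done
  return $rc
}

# verify_cluster: master must run HMaster, each slave must run HRegionServer
verify_cluster() {
  jps | expect_procs HMaster || return 1
  local host
  for host in slave1 slave2; do
    ssh "hadoop@$host" jps | expect_procs HRegionServer || return 1
  done
}

# Typical use on master, as the hadoop user:
#   "$HBASE_HOME/bin/start-hbase.sh" && verify_cluster
```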
• Open the web UI to check HBase's running status;
http://192.168.100.101:16010
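To check the web UI from a script instead of a browser, here is a minimal sketch using `curl`. It assumes the HMaster info port is the default 16010 used above. `ui_is_up` is a hypothetical helper that succeeds when the page answers with HTTP 200.

```bash
#!/usr/bin/env bash
# ui_is_up URL: succeed if URL answers with HTTP status 200 within 5 seconds.
# curl prints "000" as the status code when it cannot connect at all.
ui_is_up() {
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1")
  [ "$code" = "200" ]
}

# Usage: ui_is_up http://192.168.100.101:16010 && echo "HBase master UI is up"
```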
• Log in to HBase on the master node and check the database status;
• On the master node, browse the data in Hadoop storage to verify the state of the HBase data files;
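The commands for these two checks are not shown. The HBase shell's `status` command (used in the next step) reports live and dead servers. `hdfs dfs -ls` lists what HBase has written under its root directory. The minimal sketch below covers the HDFS side. It assumes hbase.rootdir points at /hbase on HDFS and that HBase has already created its data/ and WALs/ subdirectories there. The helper name `check_hbase_rootdir` is hypothetical.

```bash
#!/usr/bin/env bash
# check_hbase_rootdir [DIR]: list HBase's root directory on HDFS and check that
# the data/ and WALs/ directories exist. DIR defaults to the assumed /hbase.
check_hbase_rootdir() {
  local dir=${1:-/hbase} listing
  listing=$(hdfs dfs -ls "$dir") || return 1
  printf '%s\n' "$listing"
  # `hdfs dfs -ls` prints the full path as the last field of each entry
  printf '%s\n' "$listing" | grep -q " $dir/data\$" &&
    printf '%s\n' "$listing" | grep -q " $dir/WALs\$"
}

# Usage on master, as the hadoop user: check_hbase_rootdir
```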
• Basic HBase administration operations;
[hadoop@master ~]$ /usr/local/hbase/bin/hbase shell
hbase(main):001:0> status ##check cluster status
1 active master, 0 backup masters, 2 servers, 0 dead, 1.0000 average load
Took 0.8818 seconds
hbase(main):002:0> create 'class','age','chengji' ##create a table; syntax: create 'table', 'family1', 'family2', ... (age and chengji are both column families)
Created table class
Took 1.5186 seconds
=> Hbase::Table - class
hbase(main):003:0> list ##list all tables
TABLE
class
1 row(s)
Took 0.0940 seconds
=> ["class"]
hbase(main):004:0> describe 'class' ##show table details
Table class is ENABLED
class
COLUMN FAMILIES DESCRIPTION
{NAME => 'age', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
{NAME => 'chengji', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
2 row(s)
Took 0.1701 seconds
hbase(main):012:0> put 'class','tom','age','18' ##insert data; syntax: put 'table', 'row key', 'column key', 'value'
Took 0.1784 seconds
hbase(main):013:0> put 'class','marry','age','20'
Took 0.0262 seconds
hbase(main):014:0> scan 'class' ##scan all data in table class
ROW COLUMN+CELL
marry column=age:, timestamp=1535528846020, value=20
tom column=age:, timestamp=1535528825217, value=18
2 row(s)
Took 0.0628 seconds
hbase(main):017:0> put 'class','tom','chengji:math','95' ##insert data
Took 0.0217 seconds
hbase(main):018:0> put 'class','tom','chengji:english','90'
Took 0.0100 seconds
hbase(main):019:0> put 'class','marry','chengji:math','85'
Took 0.0130 seconds
hbase(main):020:0> put 'class','marry','chengji:english','90'
Took 0.0085 seconds
hbase(main):021:0> scan 'class'
ROW COLUMN+CELL
marry column=age:, timestamp=1535528846020, value=20
marry column=chengji:english, timestamp=1535529132585, value=90
marry column=chengji:math, timestamp=1535529119078, value=85
tom column=age:, timestamp=1535528825217, value=18
tom column=chengji:english, timestamp=1535529101465, value=90
tom column=chengji:math, timestamp=1535529089638, value=95
2 row(s)
Took 0.0120 seconds
hbase(main):033:0> scan 'class',{COLUMN=>'chengji:math',LIMIT=>1} ##conditional scan: column chengji:math only, at most one row
ROW COLUMN+CELL
marry column=chengji:math, timestamp=1535529119078, value=85
1 row(s)
Took 0.0456 seconds
hbase(main):038:0> get 'class','tom' ##get a row; syntax: get 'table', 'row key'
COLUMN CELL
age: timestamp=1535528825217, value=18
chengji:english timestamp=1535529101465, value=90
chengji:math timestamp=1535529089638, value=95
1 row(s)
Took 0.0125 seconds
hbase(main):042:0> get 'class','tom',{COLUMN=>'age:'} ##conditional get; syntax: get 'table', 'row key', {COLUMN => 'family:'}
COLUMN CELL
age: timestamp=1535528825217, value=18
1 row(s)
Took 0.0188 seconds
hbase(main):043:0> get 'class','tom','age:' ##conditional get, same as above
COLUMN CELL
age: timestamp=1535528825217, value=18
1 row(s)
Took 0.0171 seconds
hbase(main):044:0> get 'class','tom','chengji:english'
COLUMN CELL
chengji:english timestamp=1535529101465, value=90
1 row(s)
Took 0.0162 seconds
hbase(main):045:0> delete 'class','tom','chengji:english' ##delete a cell; syntax: delete 'table', 'row key', 'column key'
Took 0.0367 seconds
hbase(main):046:0> get 'class','tom','chengji:english' ##the deleted cell can no longer be retrieved
COLUMN CELL
0 row(s)
Took 0.0226 seconds
hbase(main):047:0> get 'class','tom' ##get all cells of row key tom
COLUMN CELL
age: timestamp=1535528825217, value=18
chengji:math timestamp=1535529089638, value=95
1 row(s)
Took 0.0106 seconds
hbase(main):048:0> disable 'class' ##a table must be disabled before it can be dropped
Took 0.8495 seconds
hbase(main):049:0> drop 'class' ##drop the table
Took 0.4907 seconds
hbase(main):050:0> list ##list all tables
TABLE
0 row(s)
Took 0.0086 seconds
=> []
hbase(main):051:0> exit
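Because the HBase shell reads commands from standard input, the session above can also be replayed from a script. The minimal sketch below assumes the same installation path. `run_hbase_demo` and the `HBASE_BIN` override are hypothetical additions for illustration.

```bash
#!/usr/bin/env bash
# Replay the table demo above non-interactively by feeding commands to `hbase shell`.
# HBASE_BIN defaults to the installation path used in this article.
HBASE_BIN=${HBASE_BIN:-/usr/local/hbase/bin/hbase}

run_hbase_demo() {
  "$HBASE_BIN" shell <<'EOF'
create 'class','age','chengji'
put 'class','tom','age','18'
put 'class','marry','age','20'
put 'class','tom','chengji:math','95'
put 'class','tom','chengji:english','90'
scan 'class'
get 'class','tom'
disable 'class'
drop 'class'
exit
EOF
}
```

Run `run_hbase_demo` as the hadoop user on master. Keeping the commands in a heredoc makes the demo repeatable after the table has been dropped.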
• Use MapReduce with HBase to count the rows in a table;
[hadoop@master ~]$ cp /usr/local/hbase/conf/hbase-site.xml /usr/local/hadoop/etc/hadoop/
[hadoop@master ~]$ vi /usr/local/hadoop/etc/hadoop/hadoop-env.sh
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/local/hbase/lib/*
[hadoop@master ~]$ scp -r /usr/local/hadoop/etc/hadoop/hadoop-env.sh hadoop@slave1:/usr/local/hadoop/etc/hadoop/
[hadoop@master ~]$ scp -r /usr/local/hbase/conf/hbase-site.xml hadoop@slave1:/usr/local/hbase/conf/
[hadoop@master ~]$ scp -r /usr/local/hadoop/etc/hadoop/hadoop-env.sh hadoop@slave2:/usr/local/hadoop/etc/hadoop/
[hadoop@master ~]$ scp -r /usr/local/hbase/conf/hbase-site.xml hadoop@slave2:/usr/local/hbase/conf/
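The four scp commands above repeat the same two copies for each slave. Below is a loop-based sketch that uses the same paths. `push_configs` and the `SLAVES` list are hypothetical names, so add hosts to the list as the cluster grows.

```bash
#!/usr/bin/env bash
# Copy hadoop-env.sh and hbase-site.xml from master to every slave.
SLAVES=(slave1 slave2)

push_configs() {
  local host
  for host in "${SLAVES[@]}"; do
    scp /usr/local/hadoop/etc/hadoop/hadoop-env.sh "hadoop@$host:/usr/local/hadoop/etc/hadoop/" || return 1
    scp /usr/local/hbase/conf/hbase-site.xml "hadoop@$host:/usr/local/hbase/conf/" || return 1
  done
}

# Usage on master, as the hadoop user: push_configs
```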
[hadoop@master ~]$ hadoop jar /usr/local/hbase/lib/hbase-server-2.0.1.jar
RunJar jarFile [mainClass] args...
[hadoop@master ~]$ /usr/local/hbase/bin/hbase shell
[hadoop@master ~]$ /usr/local/hbase/bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'haha1'
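RowCounter reports its result as a MapReduce job counter named ROWS in the job's console output. Below is a minimal sketch for extracting that number from a saved log. It assumes the counter appears on a line of the form `ROWS=<n>`. The helper name `rowcounter_rows` and the log file name are hypothetical.

```bash
#!/usr/bin/env bash
# rowcounter_rows [LOGFILE]: print the value of the ROWS counter from RowCounter
# output, reading stdin when no file is given. Assumes a line like "ROWS=<n>".
rowcounter_rows() {
  grep -oE '(^|[[:space:]])ROWS=[0-9]+' "${1:--}" | tail -n 1 | cut -d= -f2
}

# Usage (job logs go to stderr, hence 2>&1):
#   /usr/local/hbase/bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'haha1' 2>&1 | tee rowcounter.log
#   rowcounter_rows rowcounter.log
```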