I was out yesterday, so I'm picking this up again today.

The previous post covered InputFormat, so I'll take the opportunity to cover Writable as well, although it really belongs with the HDFS material.

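When an object has to be passed between processes or persisted, it must be serialized into a byte stream; conversely, a byte stream that was received or read from disk must be deserialized back into an object. Writable is Hadoop's serialization format, and Hadoop defines it through the following Writable interface.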

public interface Writable {
void write(DataOutput out) throws IOException;
void readFields(DataInput in) throws IOException;
}
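
As a rough illustration of what implementing these two methods involves, here is a minimal sketch of a round trip through a byte array. `SketchWritable` is a local stand-in declared with the same two methods so that the example compiles without Hadoop on the classpath, and `PointWritable` and `WritableRoundTrip` are hypothetical names invented for this example, not Hadoop classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in with the same two methods as the Writable interface above
// (assumption: declared here only so the sketch runs without Hadoop).
interface SketchWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical value type, for illustration only.
class PointWritable implements SketchWritable {
    int x;
    int y;

    PointWritable() {}  // empty instance that readFields() fills in later

    PointWritable(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        x = in.readInt();  // fields are read in the same order they were written
        y = in.readInt();
    }
}

class WritableRoundTrip {
    static byte[] serialize(SketchWritable w) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            w.write(new DataOutputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static void deserialize(SketchWritable w, byte[] data) {
        try {
            w.readFields(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize(new PointWritable(3, -7));
        PointWritable copy = new PointWritable();
        deserialize(copy, data);
        // prints "8 bytes, x=3, y=-7"
        System.out.println(data.length + " bytes, x=" + copy.x + ", y=" + copy.y);
    }
}
```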

A class only needs to implement this interface to be serializable. The Writable class hierarchy referred to below is the one pictured in Hadoop: The Definitive Guide.

Let's go through it piece by piece, starting with IntWritable and LongWritable.

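The WritableComparable interface extends both Writable and Comparable to support comparison. As the hierarchy shows, the primitive wrappers such as IntWritable, LongWritable, and ByteWritable all implement it. The readFields() methods of IntWritable and LongWritable read binary data straight from an input stream that implements DataInput and rebuild an int and a long respectively, while write() writes the int or long value out directly in binary form. Both classes also contain a Comparator inner class, which compares records directly in their serialized byte form without deserializing them into objects. This is an optimization, because no objects have to be created. Here is a fragment of IntWritable: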

public class IntWritable implements WritableComparable {
private int value;
//…… other methods
public static class Comparator extends WritableComparator {
public Comparator() {
super(IntWritable.class);
}
public int compare(byte[] b1, int s1, int l1,
byte[] b2, int s2, int l2) {
int thisValue = readInt(b1, s1);
int thatValue = readInt(b2, s2);
return (thisValue<thatValue ? -1 : (thisValue==thatValue ? 0 : 1));
}
}
static { // register this comparator
WritableComparator.define(IntWritable.class, new Comparator());
}
}

The static block calls WritableComparator's static define() method to register the Comparator above, that is, to add it to WritableComparator's comparators member, which is a static HashMap. This tells WritableComparator that a call to WritableComparator.get(IntWritable.class) should return the registered Comparator (for IntWritable, that is IntWritable.Comparator). You can then call comparator.compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) to compare b1 and b2 without deserializing them into objects, as in the code below. The readInt() used inside that compare() method is inherited from WritableComparator; it reconstructs the IntWritable's value from the byte array by bit shifting.

//params byte[] b1, byte[] b2
RawComparator<IntWritable> comparator = WritableComparator.get(IntWritable.class);
comparator.compare(b1,0,b1.length,b2,0,b2.length);
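
To make the "compare without deserializing" idea concrete, the following sketch rebuilds two big-endian ints from raw bytes by shifting and compares them, with no wrapper objects created. It is a standalone approximation of the technique rather than Hadoop's code; `RawIntCompareSketch` and its methods are names made up for this example.

```java
import java.nio.ByteBuffer;

class RawIntCompareSketch {
    // Rebuilds a big-endian int from four bytes by shifting.
    // The & 0xff keeps each byte from being sign-extended when widened to int.
    static int readInt(byte[] bytes, int start) {
        return ((bytes[start] & 0xff) << 24)
             | ((bytes[start + 1] & 0xff) << 16)
             | ((bytes[start + 2] & 0xff) << 8)
             |  (bytes[start + 3] & 0xff);
    }

    // Compares two serialized ints straight from their byte arrays.
    static int compareRaw(byte[] b1, int s1, byte[] b2, int s2) {
        int thisValue = readInt(b1, s1);
        int thatValue = readInt(b2, s2);
        return thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1);
    }

    // Helper for the demo: a 4-byte big-endian encoding, the layout DataOutput.writeInt() produces.
    static byte[] toBytes(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    public static void main(String[] args) {
        System.out.println(compareRaw(toBytes(-5), 0, toBytes(7), 0));    // prints -1
        System.out.println(compareRaw(toBytes(42), 0, toBytes(42), 0));   // prints 0
        System.out.println(compareRaw(toBytes(300), 0, toBytes(299), 0)); // prints 1
    }
}
```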

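Note that when no Comparator has been registered in comparators for the class being compared, a default Comparator is returned. Its compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) method still has to deserialize b1 and b2 into objects before comparing them; see the detailed discussion of WritableComparator later on.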

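LongWritable's methods are essentially the same as IntWritable's. The differences are that its value is a long and that it has an additional LongWritable.DecreasingComparator, which extends LongWritable.Comparator and returns the negation of that comparator's result. It is presumably intended for sorting in descending order.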

public class LongWritable implements WritableComparable {
private long value;
//……others
/** A decreasing Comparator optimized for LongWritable. */
public static class DecreasingComparator extends Comparator {
public int compare(WritableComparable a, WritableComparable b) {
return -super.compare(a, b);
}
public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
return -super.compare(b1, s1, l1, b2, s2, l2);
}
}
static { // register default comparator
WritableComparator.define(LongWritable.class, new Comparator());
}
}
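
The effect of negating a comparator's result can be seen with plain java.util classes. This is only an analogy built on `java.util.Comparator`, not the Hadoop comparator itself, and `DecreasingOrderSketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Comparator;

class DecreasingOrderSketch {
    // Ascending comparison that only ever returns -1, 0, or 1, so negating it is safe.
    static final Comparator<Long> ASCENDING =
            (a, b) -> a < b ? -1 : (a.equals(b) ? 0 : 1);

    // Same comparison with the sign flipped, mirroring what DecreasingComparator does.
    static final Comparator<Long> DECREASING =
            (a, b) -> -ASCENDING.compare(a, b);

    static Long[] sortedDescending(Long... values) {
        Long[] copy = values.clone();
        Arrays.sort(copy, DECREASING);
        return copy;
    }

    public static void main(String[] args) {
        // prints [10, 3, -5]
        System.out.println(Arrays.toString(sortedDescending(3L, 10L, -5L)));
    }
}
```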

ByteWritable, BooleanWritable, FloatWritable, and DoubleWritable are all much the same.

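Next come VIntWritable and VLongWritable. The two classes are nearly identical, and VIntWritable even encodes and decodes its value with the same methods that VLongWritable uses. The main difference is that VIntWritable holds an int value while VLongWritable holds a long, as dictated by their value ranges. Unlike the classes above, neither has a Comparator.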

We only need to look at VLongWritable. First, its source:

public class VLongWritable implements WritableComparable {
private long value;
public VLongWritable() {}
public VLongWritable(long value) { set(value); }
/** Set the value of this LongWritable. */
public void set(long value) { this.value = value; }
/** Return the value of this LongWritable. */
public long get() { return value; }
public void readFields(DataInput in) throws IOException {
value = WritableUtils.readVLong(in);
}
public void write(DataOutput out) throws IOException {
WritableUtils.writeVLong(out, value);
}
/** Returns true iff <code>o</code> is a VLongWritable with the same value. */
public boolean equals(Object o) {
if (!(o instanceof VLongWritable))
return false;
VLongWritable other = (VLongWritable)o;
return this.value == other.value;
}
public int hashCode() {
return (int)value;
}
/** Compares two VLongWritables. */
public int compareTo(Object o) {
long thisValue = this.value;
long thatValue = ((VLongWritable)o).value;
return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
}
public String toString() {
return Long.toString(value);
}
}

As shown above, it encodes with WritableUtils.writeVLong(). WritableUtils holds encoding and decoding helpers, among other things; for now we only look at the parts relevant to VIntWritable and VLongWritable.

VIntWritable's value is in fact also encoded with writeVLong():

public static void writeVInt(DataOutput stream, int i) throws IOException {
writeVLong(stream, i);
}

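First, an encoded VIntWritable is 1 to 5 bytes long and an encoded VLongWritable is 1 to 9 bytes long. A value in [-112, 127] takes a single byte, and that byte holds the value itself. (Page 91 of the Chinese edition of the Definitive Guide gives the range as [-127, 127]; my guess is that the encoding was updated at some point.) A value outside this range needs more bytes: the first byte stores the length and the remaining bytes store the value.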
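
The size rule can be worked through numerically. The sketch below computes how many bytes a value would occupy under this scheme, assuming (as the writeVLong() source does) that a negative value is stored as its one's complement. `VLongSizeSketch` and `encodedSize` are names invented for this example.

```java
class VLongSizeSketch {
    // Number of bytes the variable-length encoding would use for v.
    static int encodedSize(long v) {
        if (v >= -112 && v <= 127) {
            return 1;  // the single byte is the value itself
        }
        if (v < 0) {
            v = ~v;    // negative values are stored as their one's complement
        }
        int bits = 64 - Long.numberOfLeadingZeros(v);  // significant bits in the stored magnitude
        int valueBytes = (bits + 7) / 8;               // rounded up to whole bytes
        return 1 + valueBytes;                         // plus the leading length byte
    }

    public static void main(String[] args) {
        long[] samples = {0, 127, 128, -112, -113, 65535, 65536,
                          Integer.MAX_VALUE, Integer.MIN_VALUE, Long.MAX_VALUE, Long.MIN_VALUE};
        for (long v : samples) {
            System.out.println(v + " -> " + encodedSize(v) + " byte(s)");
        }
    }
}
```

For the samples above, 127 and -112 take 1 byte, 128 and -113 take 2, 65535 takes 3, 65536 takes 4, the int extremes take 5, and the long extremes take 9, which matches the 1 to 5 and 1 to 9 byte ranges.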

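writeVLong() works as follows, with the explanation attached as comments in the code. I'm not sure whether it is clear enough; if it is hard to follow, I don't think you necessarily need to understand every detail.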

Source of WritableUtils.writeVLong():

public static void writeVLong(DataOutput stream, long i) throws IOException {
if (i >= -112 && i <= 127) {
stream.writeByte((byte)i);
return; //-112~127 only use one byte
}
int len = -112;
if (i < 0) {
i ^= -1L; // take the one's complement: every bit is flipped, so i becomes -i - 1,
// which is non-negative; the decoder applies the same XOR to restore the original value
len = -120;
}
long tmp = i; // at this point i is non-negative, at most 2^63 - 1
// this loop counts the value bytes: the larger i is, the more bytes it needs and the further len drops below its starting value
while (tmp != 0) {
tmp = tmp >> 8;
len--;
}
// len now encodes both the sign and the byte count as an offset from its starting value (-112 or -120)
stream.writeByte((byte)len);
len = (len < -120) ? -(len + 120) : -(len + 112); // number of value bytes, not counting the first (length) byte
for (int idx = len; idx != 0; idx--) { // write the bytes of i 8 bits at a time, most significant byte first
int shiftbits = (idx - 1) * 8;
long mask = 0xFFL << shiftbits;
stream.writeByte((byte)((i & mask) >> shiftbits));
}
}
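
To see the resulting byte layout, the sketch below copies the writeVLong() logic into a standalone class (so it runs without Hadoop) and prints the encoded bytes of a few values in hex. `VLongLayoutSketch` and `hex` are hypothetical names used only here.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class VLongLayoutSketch {
    // Same logic as the writeVLong() listing above, copied so the sketch is self-contained.
    static void writeVLong(DataOutput stream, long i) throws IOException {
        if (i >= -112 && i <= 127) {
            stream.writeByte((byte) i);
            return;
        }
        int len = -112;
        if (i < 0) {
            i ^= -1L;
            len = -120;
        }
        for (long tmp = i; tmp != 0; tmp >>= 8) {
            len--;
        }
        stream.writeByte((byte) len);
        len = (len < -120) ? -(len + 120) : -(len + 112);
        for (int idx = len; idx != 0; idx--) {
            int shiftbits = (idx - 1) * 8;
            stream.writeByte((byte) ((i >> shiftbits) & 0xFF));
        }
    }

    // Encodes a value and renders the bytes as space-separated hex.
    static String hex(long value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            writeVLong(new DataOutputStream(bytes), value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes.toByteArray()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(100));   // prints 64        (single byte, the value itself)
        System.out.println(hex(300));   // prints 8e 01 2c  (0x8e = -114: positive, 2 value bytes)
        System.out.println(hex(-300));  // prints 86 01 2b  (0x86 = -122: negative, 2 value bytes, 299 = ~(-300))
    }
}
```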

Now that we know how a value is written out, let's see how it is read back in, which is simply the reverse process.

WritableUtils.readVLong():

public static long readVLong(DataInput stream) throws IOException {
byte firstByte = stream.readByte();
int len = decodeVIntSize(firstByte);
if (len == 1) {
return firstByte;
}
long i = 0;
for (int idx = 0; idx < len-1; idx++) {
byte b = stream.readByte();
i = i << 8;
i = i | (b & 0xFF);
}
return (isNegativeVInt(firstByte) ? (i ^ -1L) : i);
}

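This reads the first byte to obtain the length (which includes the length byte itself) and then reads the remaining bytes from the input stream one at a time. The & 0xFF keeps each byte from being sign-extended when it is automatically widened, and the final ^ -1L, applied to negative values, flips every bit, sign included.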
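
As a worked example of the decoding direction, the sketch below feeds hand-built byte sequences through a standalone copy of readVLong(). The helper bodies follow the WritableUtils listings in this post; the `isNegativeVInt()` body is written from my recollection of WritableUtils and should be treated as an assumption, and `VLongDecodeSketch` and `decode` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class VLongDecodeSketch {
    static int decodeVIntSize(byte value) {
        if (value >= -112) {
            return 1;
        } else if (value < -120) {
            return -119 - value;
        }
        return -111 - value;
    }

    // Assumption: the first byte marks a negative value if it is a negative
    // single-byte value or a length marker below -120.
    static boolean isNegativeVInt(byte value) {
        return value < -120 || (value >= -112 && value < 0);
    }

    static long readVLong(DataInput stream) throws IOException {
        byte firstByte = stream.readByte();
        int len = decodeVIntSize(firstByte);
        if (len == 1) {
            return firstByte;
        }
        long i = 0;
        for (int idx = 0; idx < len - 1; idx++) {
            byte b = stream.readByte();
            i = (i << 8) | (b & 0xFF);  // & 0xFF avoids sign extension of b
        }
        return isNegativeVInt(firstByte) ? (i ^ -1L) : i;
    }

    // Convenience wrapper for the demo: decodes a value from literal byte values.
    static long decode(int... rawBytes) {
        byte[] data = new byte[rawBytes.length];
        for (int k = 0; k < rawBytes.length; k++) {
            data[k] = (byte) rawBytes[k];
        }
        try {
            return readVLong(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode(0x8e, 0x01, 0x2c));  // prints 300
        System.out.println(decode(0x86, 0x01, 0x2b));  // prints -300
        System.out.println(decode(0x9c));              // prints -100
    }
}
```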

WritableUtils.decodeVIntSize() returns the encoded length:

public static int decodeVIntSize(byte value) {
if (value >= -112) {
return 1;
} else if (value < -120) {
return -119 - value;
}
return -111 - value;
}

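This is simply the reverse of the encoding process above. The constants -119 and -111 are used so that the result is the full encoded length, including the first (length) byte, rather than just the number of value bytes.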
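
A quick way to check the constants is to run every possible length marker through the same arithmetic. In this small sketch (`DecodeSizeSketch` is a hypothetical name), first bytes -113 down to -120 mark positive values of total length 2 to 9, and -121 down to -128 mark negative values of total length 2 to 9.

```java
class DecodeSizeSketch {
    // Same arithmetic as the decodeVIntSize() listing above.
    static int decodeVIntSize(byte value) {
        if (value >= -112) {
            return 1;
        } else if (value < -120) {
            return -119 - value;
        }
        return -111 - value;
    }

    public static void main(String[] args) {
        for (int first = -113; first >= -128; first--) {
            String sign = first < -120 ? "negative" : "positive";
            System.out.println(first + " -> " + decodeVIntSize((byte) first)
                    + " bytes in total, " + sign + " value");
        }
    }
}
```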

Back to the WritableComparator mentioned earlier: it implements the RawComparator interface, which amounts to nothing more than one compare() method.

public interface RawComparator<T> extends Comparator<T> {
public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}

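WritableComparator is a factory for RawComparator instances (those registered for Writable implementations), and it provides those implementations with helper methods for deserialization. Most of these are simple; the harder ones, readVInt() and readVLong(), follow the process described above. WritableComparator also provides a default implementation of compare(), which deserializes before comparing. If WritableComparator.get() finds no registered Comparator, it creates a new one (actually an instance of WritableComparator itself); when you then call public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2), it uses the readFields() method of the Writable implementation being compared to read the values.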

For example, VIntWritable has no registered comparator, so get() constructs a WritableComparator and sets up key1, key2, buffer, and keyClass. When you call public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2), it uses VIntWritable.readFields() to read the values from the encoded byte[] and then compares them.
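
The fallback path can be approximated as follows: keep two reusable key objects, fill each from its byte range with readFields(), then delegate to compareTo(). This is a simplified sketch of the idea, not WritableComparator's actual code; it takes a `Supplier` where the real class works from a key class, and `SketchComparableWritable`, `ShortValue`, and `DeserializingComparatorSketch` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

// Local stand-in for a comparable, deserializable key type (assumption, not a Hadoop interface).
interface SketchComparableWritable<T> extends Comparable<T> {
    void readFields(DataInput in) throws IOException;
}

// Hypothetical key type with no specialized raw comparator.
class ShortValue implements SketchComparableWritable<ShortValue> {
    short value;

    @Override
    public void readFields(DataInput in) throws IOException {
        value = in.readShort();
    }

    @Override
    public int compareTo(ShortValue other) {
        return Short.compare(value, other.value);
    }
}

class DeserializingComparatorSketch<T extends SketchComparableWritable<T>> {
    private final T key1;  // reused across calls so no new objects are created per comparison
    private final T key2;

    DeserializingComparatorSketch(Supplier<T> factory) {
        key1 = factory.get();
        key2 = factory.get();
    }

    // Deserializes both byte ranges into the key objects, then compares the objects.
    int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        try {
            key1.readFields(new DataInputStream(new ByteArrayInputStream(b1, s1, l1)));
            key2.readFields(new DataInputStream(new ByteArrayInputStream(b2, s2, l2)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return key1.compareTo(key2);
    }

    public static void main(String[] args) {
        DeserializingComparatorSketch<ShortValue> cmp =
                new DeserializingComparatorSketch<>(ShortValue::new);
        byte[] five = {0, 5};
        byte[] nine = {0, 9};
        System.out.println(cmp.compare(five, 0, 2, nine, 0, 2) < 0);  // prints true
    }
}
```

The cost compared with a registered raw comparator is the two readFields() calls on every comparison.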

Next are ArrayWritable and TwoDArrayWritable, plus AbstractMapWritable.

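ArrayWritable and TwoDArrayWritable wrap a one-dimensional and a two-dimensional array respectively. As you would expect, each holds an array of Writables together with the element type, and serialization and deserialization delegate to the write() and readFields() methods of the wrapped Writable implementation.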

public class TwoDArrayWritable implements Writable {
private Class valueClass;
private Writable[][] values;
//……others
public void readFields(DataInput in) throws IOException {
// construct matrix
values = new Writable[in.readInt()][];
for (int i = 0; i < values.length; i++) {
values[i] = new Writable[in.readInt()];
}
// construct values
for (int i = 0; i < values.length; i++) {
for (int j = 0; j < values[i].length; j++) {
Writable value; // construct value
try {
value = (Writable)valueClass.newInstance();
} catch (InstantiationException e) {
throw new RuntimeException(e.toString());
} catch (IllegalAccessException e) {
throw new RuntimeException(e.toString());
}
value.readFields(in); // read a value
values[i][j] = value; // store it in values
}
}
}
public void write(DataOutput out) throws IOException {
out.writeInt(values.length); // write values
for (int i = 0; i < values.length; i++) {
out.writeInt(values[i].length);
}
for (int i = 0; i < values.length; i++) {
for (int j = 0; j < values[i].length; j++) {
values[i][j].write(out);
}
}
}
}
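
The one-dimensional case follows the same pattern: a length prefix followed by each element's own serialized form. The sketch below illustrates that pattern and is not ArrayWritable's source; it uses a `Supplier` in place of the `valueClass.newInstance()` reflection seen above, and `ElementWritable`, `IntElement`, and `OneDArraySketch` are names invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

// Local stand-in for the Writable contract (assumption: declared here so the sketch runs without Hadoop).
interface ElementWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical element type.
class IntElement implements ElementWritable {
    int value;
    IntElement() {}
    IntElement(int value) { this.value = value; }
    @Override public void write(DataOutput out) throws IOException { out.writeInt(value); }
    @Override public void readFields(DataInput in) throws IOException { value = in.readInt(); }
}

class OneDArraySketch implements ElementWritable {
    private final Supplier<? extends ElementWritable> factory;  // creates empty elements to fill
    ElementWritable[] values = new ElementWritable[0];

    OneDArraySketch(Supplier<? extends ElementWritable> factory) {
        this.factory = factory;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(values.length);       // length prefix
        for (ElementWritable v : values) {
            v.write(out);                  // each element serializes itself
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        values = new ElementWritable[in.readInt()];
        for (int i = 0; i < values.length; i++) {
            ElementWritable v = factory.get();
            v.readFields(in);              // each element deserializes itself
            values[i] = v;
        }
    }

    static byte[] toBytes(ElementWritable w) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            w.write(new DataOutputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static void fromBytes(ElementWritable w, byte[] data) {
        try {
            w.readFields(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        OneDArraySketch src = new OneDArraySketch(IntElement::new);
        src.values = new ElementWritable[] { new IntElement(1), new IntElement(2), new IntElement(3) };
        byte[] data = toBytes(src);
        OneDArraySketch dst = new OneDArraySketch(IntElement::new);
        fromBytes(dst, data);
        // prints "16 bytes, 3 elements" (4-byte length prefix plus three 4-byte ints)
        System.out.println(data.length + " bytes, " + dst.values.length + " elements");
    }
}
```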

That's about it; there isn't much more to say.

There are also TupleWritable, AbstractMapWritable -> {MapWritable, SortedMapWritable}, DBWritable, CompressedWritable, VersionedWritable, GenericWritable, and others. I'll cover them when the need arises; they are broadly similar and differ only in function.

References:

[1] Hadoop: The Definitive Guide, Chinese edition, 2nd edition
