2. Floating-point numbers
To do
Scale formats, e.g. E8M0
Type conversion
2.1. Microscaling Formats
FP8 (8-bit) element formats:

| | E4M3 | E5M2 |
|---|---|---|
| Exponent bias | 7 | 15 |
| Infinities | N/A | S 11111 00₂ |
| NaN | S 1111 111₂ | S 11111 {01, 10, 11}₂ |
| Zeros | S 0000 000₂ | S 00000 00₂ |
| Max normal | S 1111 110₂ = \(\pm 2^{8} \times 1.75 = \pm 448\) | S 11110 11₂ = \(\pm 2^{15} \times 1.75 = \pm 57344\) |
| Min normal | S 0001 000₂ = \(\pm 2^{-6}\) | S 00001 00₂ = \(\pm 2^{-14}\) |
| Max subnormal | S 0000 111₂ = \(\pm 2^{-6} \times 0.875\) | S 00000 11₂ = \(\pm 2^{-14} \times 0.75\) |
| Min subnormal | S 0000 001₂ = \(\pm 2^{-9}\) | S 00000 01₂ = \(\pm 2^{-16}\) |

FP6 (6-bit) element formats:

| | E2M3 | E3M2 |
|---|---|---|
| Exponent bias | 1 | 3 |
| Infinities | N/A | N/A |
| NaN | N/A | N/A |
| Zeros | S 00 000₂ | S 000 00₂ |
| Max normal | S 11 111₂ = \(\pm 2^{2} \times 1.875 = \pm 7.5\) | S 111 11₂ = \(\pm 2^{4} \times 1.75 = \pm 28\) |
| Min normal | S 01 000₂ = \(\pm 2^{0} = \pm 1\) | S 001 00₂ = \(\pm 2^{-2} = \pm 0.25\) |
| Max subnormal | S 00 111₂ = \(\pm 2^{0} \times 0.875 = \pm 0.875\) | S 000 11₂ = \(\pm 2^{-2} \times 0.75 = \pm 0.1875\) |
| Min subnormal | S 00 001₂ = \(\pm 2^{-3} = \pm 0.125\) | S 000 01₂ = \(\pm 2^{-4} = \pm 0.0625\) |

FP4 (4-bit) element format:

| | E2M1 |
|---|---|
| Exponent bias | 1 |
| Infinities | N/A |
| NaN | N/A |
| Zeros | S 00 0₂ |
| Max normal | S 11 1₂ = \(\pm 2^{2} \times 1.5 = \pm 6\) |
| Min normal | S 01 0₂ = \(\pm 2^{0} \times 1.0 = \pm 1\) |
| Subnormal | S 00 1₂ = \(\pm 2^{0} \times 0.5 = \pm 0.5\) |
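
The table entries can be cross-checked with a small decoder. The following is a minimal sketch, not a reference implementation: the function name `decode_minifloat` and the `FORMATS` dictionary are hypothetical, and the sketch deliberately ignores the Infinity/NaN encodings of E4M3 and E5M2, so callers would have to filter those bit patterns first.

```python
def decode_minifloat(bits: int, exp_bits: int, man_bits: int, bias: int) -> float:
    """Decode a sign/exponent/mantissa bit pattern (special values NOT handled)."""
    sign = -1.0 if (bits >> (exp_bits + man_bits)) & 1 else 1.0
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    man = bits & ((1 << man_bits) - 1)
    frac = man / (1 << man_bits)  # mantissa as a fraction in [0, 1)
    if exp == 0:
        # Zero or subnormal: no implicit leading 1, exponent fixed at 1 - bias.
        return sign * 2.0 ** (1 - bias) * frac
    # Normal number: implicit leading 1.
    return sign * 2.0 ** (exp - bias) * (1.0 + frac)


# (exponent bits, mantissa bits, exponent bias), taken from the tables above.
FORMATS = {
    "E4M3": (4, 3, 7),
    "E5M2": (5, 2, 15),
    "E2M3": (2, 3, 1),
    "E3M2": (3, 2, 3),
    "E2M1": (2, 1, 1),
}

print(decode_minifloat(0b0_1111_110, *FORMATS["E4M3"]))  # 448.0 (max normal)
print(decode_minifloat(0b0_11110_11, *FORMATS["E5M2"]))  # 57344.0 (max normal)
print(decode_minifloat(0b0_11_111, *FORMATS["E2M3"]))    # 7.5 (max normal)
print(decode_minifloat(0b0_111_11, *FORMATS["E3M2"]))    # 28.0 (max normal)
print(decode_minifloat(0b0_00_1, *FORMATS["E2M1"]))      # 0.5 (subnormal)
```

Passing the bias explicitly keeps the sketch table-driven, so each row of the tables maps directly onto one call.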
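2.2. IEEE floating-point numbers
The latest floating-point standard is IEEE 754-2019.
A floating-point number is encoded in the following format: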
| S (sign) | E (biased exponent) | T (trailing significand field) |
|---|---|---|
| 1 bit | w bits | t bits, t = p − 1 |
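
As an illustration of this layout, the sketch below splits an encoding into its three fields. The helper names `split_fields` and `float32_bits` are hypothetical; the binary32 parameters (w = 8, t = 23) are assumed for the usage lines.

```python
import struct


def split_fields(bits: int, w: int, t: int) -> tuple[int, int, int]:
    """Split a (1 + w + t)-bit encoding into the fields (S, E, T)."""
    T = bits & ((1 << t) - 1)          # low t bits: trailing significand
    E = (bits >> t) & ((1 << w) - 1)   # next w bits: biased exponent
    S = (bits >> (w + t)) & 1          # top bit: sign
    return S, E, T


def float32_bits(x: float) -> int:
    """Return the binary32 encoding of x as an unsigned 32-bit integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


print(split_fields(float32_bits(1.0), w=8, t=23))       # (0, 127, 0)
print(split_fields(float32_bits(-0.15625), w=8, t=23))  # (1, 124, 2097152)
```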
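The following relationships hold, where \(k\) is the total width in bits and \(p\) is the precision: \(k = 1 + w + t\), \(bias = e_{max} = 2^{w-1} - 1\), and \(e_{min} = 1 - e_{max}\).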
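The biased exponent E is interpreted as follows:
- \(1 \le E \le 2^w - 2\): a normal number, with value \((-1)^S \times 2^{E-bias} \times (1 + 2^{1-p} \times T)\).
- \(E = 0\): if \(T = 0\), the value is \(\pm 0\); if \(T \ne 0\), it is a subnormal number, with value \((-1)^S \times 2^{e_{min}} \times (0 + 2^{1-p} \times T)\).
- \(E = 2^w - 1\) (all ones in binary): if \(T = 0\), the value is \(\pm \infty\); if \(T \ne 0\), it is NaN.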
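
The three cases above can be sketched as a single function. This is an illustrative sketch under stated assumptions: the name `ieee_value` is hypothetical, the bias is taken as \(2^{w-1} - 1\), and the result is a Python float, so the sketch only works for formats whose range fits in binary64 (it would overflow for binary128).

```python
import math


def ieee_value(S: int, E: int, T: int, w: int, p: int) -> float:
    """Value of an IEEE 754 binary encoding given its fields S, E, T."""
    bias = (1 << (w - 1)) - 1   # assumed: bias = emax = 2^(w-1) - 1
    emin = 1 - bias
    sign = -1.0 if S else 1.0
    frac = T * 2.0 ** (1 - p)   # 2^(1-p) * T
    if E == (1 << w) - 1:       # exponent field all ones
        return sign * math.inf if T == 0 else math.nan
    if E == 0:                  # zero (T == 0) or subnormal (T != 0)
        return sign * 2.0 ** emin * frac
    return sign * 2.0 ** (E - bias) * (1.0 + frac)  # normal number


# binary16 examples: w = 5, p = 11.
print(ieee_value(0, 15, 0, w=5, p=11))     # 1.0
print(ieee_value(0, 30, 1023, w=5, p=11))  # 65504.0 (largest finite binary16)
print(ieee_value(0, 0, 1, w=5, p=11))      # smallest subnormal, 2**-24
print(ieee_value(1, 31, 0, w=5, p=11))     # -inf
```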
2.2.1. IEEE-specified 16-, 32-, 64-, and 128-bit floating-point formats
| Parameter | binary16 | binary32 | binary64 | binary128 |
|---|---|---|---|---|
| Exponent bits (w) | 5 | 8 | 11 | 15 |
| emax / bias | 15 | 127 | 1023 | 16383 |
| Trailing significand bits (t) | 10 | 23 | 52 | 112 |
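
The rows of this table follow from the total width and the exponent width alone. The sketch below recomputes them, assuming \(t = k - 1 - w\) and \(e_{max} = bias = 2^{w-1} - 1\); the function name `ieee_params` is hypothetical.

```python
def ieee_params(k: int, w: int) -> dict:
    """Derive format parameters from total width k and exponent width w."""
    t = k - 1 - w                # trailing significand bits
    emax = (1 << (w - 1)) - 1    # also the exponent bias
    return {"w": w, "t": t, "p": t + 1, "emax": emax, "bias": emax, "emin": 1 - emax}


for name, k, w in [("binary16", 16, 5), ("binary32", 32, 8),
                   ("binary64", 64, 11), ("binary128", 128, 15)]:
    print(name, ieee_params(k, w))
```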