Knowledge

Floating-point arithmetic

Source 📝

5973:
magnitude the risk of numerical anomalies, in addition to, or in lieu of, a more careful numerical analysis. These include: as noted above, computing all expressions and intermediate results in the highest precision supported in hardware (a common rule of thumb is to carry twice the precision of the desired result, i.e. compute in double precision for a final single-precision result, or in double extended or quad precision for up to double-precision results); and rounding input data and results to only the precision required and supported by the input data (carrying excess precision in the final result beyond that required and supported by the input data can be misleading, increases storage cost and decreases speed, and the excess bits can affect convergence of numerical procedures: notably, the first form of the iterative example given below converges correctly when using this rule of thumb). Brief descriptions of several additional issues and techniques follow.
246: 5300: 4639: 3515:"sticky" means that they are not reset by the next (arithmetic) operation, but stay set until explicitly reset. The use of "sticky" flags thus allows for testing of exceptional conditions to be delayed until after a full floating-point expression or subroutine: without them exceptional conditions that could not be otherwise ignored would require explicit testing immediately after every floating-point operation. By default, an operation always returns a result according to specification without interrupting computation. For instance, 1/0 returns +∞, while also setting the divide-by-zero flag bit (this default of ∞ is designed to often return a finite result when used in subsequent operations and so be safely ignored). 5295:{\displaystyle {\begin{aligned}\operatorname {fl} (x\cdot y)&=\operatorname {fl} {\big (}\operatorname {fl} (x_{1}\cdot y_{1})+\operatorname {fl} (x_{2}\cdot y_{2}){\big )},&&{\text{ where }}\operatorname {fl} (){\text{ indicates correctly rounded floating-point arithmetic}}\\&=\operatorname {fl} {\big (}(x_{1}\cdot y_{1})(1+\delta _{1})+(x_{2}\cdot y_{2})(1+\delta _{2}){\big )},&&{\text{ where }}\delta _{n}\leq \mathrm {E} _{\text{mach}},{\text{ from above}}\\&={\big (}(x_{1}\cdot y_{1})(1+\delta _{1})+(x_{2}\cdot y_{2})(1+\delta _{2}){\big )}(1+\delta _{3})\\&=(x_{1}\cdot y_{1})(1+\delta _{1})(1+\delta _{3})+(x_{2}\cdot y_{2})(1+\delta _{2})(1+\delta _{3}),\end{aligned}}} 3113:, sometimes called Banker's Rounding) is more commonly used. This method rounds the ideal (infinitely precise) result of an arithmetic operation to the nearest representable value, and gives that representation as the result. In the case of a tie, the value that would make the significand end in an even digit is chosen. The IEEE 754 standard requires the same rounding to be applied to all fundamental algebraic operations, including square root and conversions, when there is a numeric (non-NaN) result. It means that the results of IEEE 754 operations are completely determined in all bits of the result, except for the representation of NaNs. ("Library" functions such as cosine and log are not mandated.) 258: 8242:, while IEEE places the decimal point after the assumed bit. ieee_exp = msbin - 2; /* actually, msbin-1-128+127 */ _dmsbintoieee(double *src8, double *dest8) MS Binary Format byte order => m7 | m6 | m5 | m4 | m3 | m2 | m1 | exponent m1 is most significant byte => smmm|mmmm m7 is the least significant byte MBF is bias 128 and IEEE is bias 1023. MBF places the decimal point before the assumed bit, while IEEE places the decimal point after the assumed bit. ieee_exp = msbin - 128 - 1 + 1023; 4583:, can be used to establish that an algorithm implementing a numerical function is numerically stable. The basic approach is to show that although the calculated result, due to roundoff errors, will not be exactly correct, it is the exact solution to a nearby problem with slightly perturbed input data. If the perturbation required is small, on the order of the uncertainty in the input data, then the results are in some sense as accurate as the data "deserves". The algorithm is then defined as 6153:, where epsilon is sufficiently small and tailored to the application, such as 1.0E−13). The wisdom of doing this varies greatly, and can require numerical analysis to bound epsilon. Values derived from the primary data representation and their comparisons should be performed in a wider, extended, precision to minimize the risk of such inconsistencies due to round-off errors. It is often better to organize the code in such a way that such tests are unnecessary. For example, in 3578: 1423: 2959:
Any rational with a denominator that has a prime factor other than 2 will have an infinite binary expansion. This means that numbers that appear to be short and exact when written in decimal format may need to be approximated when converted to binary floating-point. For example, the decimal number 0.1 is not representable in binary floating-point of any finite precision; the exact binary representation would have a "1100" sequence continuing endlessly:
1073: 40: 1682: 2630: 1331: 771: 6870:. The "fast math" option on many compilers (ICC, GCC, Clang, MSVC...) turns on reassociation along with unsafe assumptions such as a lack of NaN and infinite numbers in IEEE 754. Some compilers also offer more granular options to only turn on reassociation. In either case, the programmer is exposed to many of the precision pitfalls mentioned above for the portion of the program using "fast" math. 5652: 3083:, the ULP is 2×16, or 2. For numbers with a base-2 exponent part of 0, i.e. numbers with an absolute value higher than or equal to 1 but lower than 2, an ULP is exactly 2 or about 10 in single precision, and exactly 2 or about 10 in double precision. The mandated behavior of IEEE-compliant hardware is that the result be within one-half of a ULP. 3523:/C11 and Fortran) have been updated to specify methods to access and change status flag bits. The 2008 version of the IEEE 754 standard now specifies a few operations for accessing and handling the arithmetic flag bits. The programming model is based on a single thread of execution and use of them by multiple threads has to be handled by a 5981:), and at its worst when it is expected to model the interactions of quantities expressed as decimal strings that are expected to be exact. An example of the latter case is financial calculations. For this reason, financial software tends not to use a binary floating-point number representation. The "decimal" data type of the 6846:. As the recurrence is applied repeatedly, the accuracy improves at first, but then it deteriorates. It never gets better than about 8 digits, even though 53-bit arithmetic should be capable of about 16 digits of precision. When the second form of the recurrence is used, the value converges to 15 digits of precision. 1758:(64-bit) binary floating-point number has a coefficient of 53 bits (including 1 implied bit), an exponent of 11 bits, and 1 sign bit. Since 2 = 1024, the complete range of the positive normal floating-point numbers in this format is from 2 ≈ 2 × 10 to approximately 2 ≈ 2 × 10. 2951:
floating-point format then the conversion is exact. If there is not an exact representation then the conversion requires a choice of which floating-point number to use to represent the original value. The representation chosen will have a different value from the original, and the value thus adjusted is called the
1068:{\displaystyle {\begin{aligned}&\left(\sum _{n=0}^{p-1}{\text{bit}}_{n}\times 2^{-n}\right)\times 2^{e}\\={}&\left(1\times 2^{-0}+1\times 2^{-1}+0\times 2^{-2}+0\times 2^{-3}+1\times 2^{-4}+\cdots +1\times 2^{-23}\right)\times 2^{1}\\\approx {}&1.5707964\times 2\\\approx {}&3.1415928\end{aligned}}} 1411:." The format he proposed shows the need for a fixed-sized significand as is presently used for floating-point data, fixing the location of the decimal point in the significand so that each representation was unique, and how to format such numbers by specifying a syntax to be used that could be entered through a 5380: 191: 3915:
Also, the non-representability of π (and π/2) means that an attempted computation of tan(π/2) will not yield a result of infinity, nor will it even overflow in the usual floating-point formats (assuming an accurate implementation of tan). It is simply not possible for standard floating-point hardware
3518:
The original IEEE 754 standard, however, failed to recommend operations to handle such sets of arithmetic exception flag bits. So while these were implemented in hardware, initially programming language implementations typically did not provide a means to access them (apart from assembler). Over time
2781:
based on the Nvidia Ampere architecture. The drawback of this format is its size, which is not a power of 2. However, according to Nvidia, this format should only be used internally by hardware to speed up computations, while inputs and outputs should be stored in the 32-bit single-precision IEEE 754
2749:
until Microsoft adopted the IEEE-754 standard format in all its products starting in 1988 to their current releases. MBF consists of the MBF single-precision format (32 bits, "6-digit BASIC"), the MBF extended-precision format (40 bits, "9-digit BASIC"), and the MBF double-precision format (64 bits);
1748:
components, whose range depends exclusively on the number of bits or digits in their representation. Whereas components linearly depend on their range, the floating-point range linearly depends on the significand range and exponentially on the range of exponent component, which attaches outstandingly
1156:
representation uses integer hardware operations controlled by a software implementation of a specific convention about the location of the binary or decimal point, for example, 6 bits or digits from the right. The hardware to manipulate these representations is less costly than floating point, and it
396:
to which numbers can be represented. The radix point position is assumed always to be somewhere within the significand—often just after or just before the most significant digit, or to the right of the rightmost (least significant) digit. This article generally follows the convention that the radix
2938:
or √2, or non-terminating rational numbers, must be approximated. The number of digits (or bits) of precision also limits the set of rational numbers that can be represented exactly. For example, the decimal number 123456789 cannot be exactly represented if only eight decimal digits of precision are
5730:
for that data is used: apparently equivalent formulations of expressions in a programming language can differ markedly in their numerical stability. One approach to remove the risk of such loss of accuracy is the design and analysis of numerically stable algorithms, which is an aim of the branch of
2958:
Whether or not a rational number has a terminating expansion depends on the base. For example, in base-10 the number 1/2 has a terminating expansion (0.5) while the number 1/3 does not (0.333...). In base-2 only rationals with denominators that are powers of 2 (such as 1/2 or 3/16) are terminating.
3514:
Here, the required default method of handling exceptions according to IEEE 754 is discussed (the IEEE 754 optional trapping and other "alternate exception handling" modes are not discussed). Arithmetic exceptions are (by default) required to be recorded in "sticky" status flag bits. That they are
1723:
A precisely specified behavior for the arithmetic operations: A result is required to be produced as if infinitely precise arithmetic were used to yield a value that is then rounded according to specific rules. This means that a compliant computer program would always produce the same result when
1661:
Initially, computers used many different representations for floating-point numbers. The lack of standardization at the mainframe level was an ongoing problem by the early 1970s for those writing and maintaining higher-level source code; these manufacturer floating-point standards differed in the
8354:
Micikevicius, Paulius; Stosic, Dusan; Burgess, Neil; Cornea, Marius; Dubey, Pradeep; Grisenthwaite, Richard; Ha, Sangwon; Heinecke, Alexander; Judd, Patrick; Kamalu, John; Mellempudi, Naveen; Oberman, Stuart; Shoeybi, Mohammad; Siu, Michael; Wu, Hao (2022-09-12). "FP8 Formats for Deep Learning".
2347:
Any integer with absolute value less than 2 can be exactly represented in the single-precision format, and any integer with absolute value less than 2 can be exactly represented in the double-precision format. Furthermore, a wide range of powers of 2 times such a number can be represented. These
207:
Floating-point arithmetic operations, such as addition and division, approximate the corresponding real number arithmetic operations by rounding any result that is not a floating-point number itself to a nearby floating-point number. For example, in a floating-point arithmetic with five base-ten
2590:
In the IEEE binary interchange formats the leading 1 bit of a normalized significand is not actually stored in the computer datum. It is called the "hidden" or "implicit" bit. Because of this, the single-precision format actually has a significand with 24 bits of precision, the double-precision
5972:
A detailed treatment of the techniques for writing high-quality floating-point software is beyond the scope of this article, and the reader is referred to, and the other references at the bottom of this article. Kahan suggests several rules of thumb that can substantially decrease by orders of
3876:
For example, the decimal numbers 0.1 and 0.01 cannot be represented exactly as binary floating-point numbers. In the IEEE 754 binary32 format with its 24-bit significand, the result of attempting to square the approximation to 0.1 is neither 0.01 nor the representable number closest to it. The
7719:
the Maniac's floating base, which is 2 = 65,536. The Maniac's large base permits a considerable increase in the speed of floating point arithmetic. Although such a large base implies the possibility of as many as 15 lead zeros, the large word size of 48 bits guarantees adequate significance.
2950:
When a number is represented in some format (such as a character string) which is not a native floating-point representation supported in a computer implementation, then it will require a conversion before it can be used in that implementation. If the number can be represented exactly in the
6896:
A common problem in "fast" math is that subexpressions may not be optimized identically from place to place, leading to unexpected differences. One interpretation of the issue is that "fast" math as implemented currently has a poorly defined semantics. One attempt at formalizing "fast" math
5968:
is required. Certain "optimizations" that compilers might make (for example, reordering operations) can work against the goals of well-behaved software. There is some controversy about the failings of compilers and language designs in this area: C99 is an example of a language where such
2374:
Comparison of floating-point numbers, as defined by the IEEE standard, is a bit different from usual integer comparison. Negative and positive zero compare equal, and every NaN compares unequal to every value, including itself. All finite floating-point numbers are strictly smaller than
2391:
Floating-point numbers are typically packed into a computer datum as the sign bit, the exponent field, and the significand or mantissa, from left to right. For the IEEE 754 binary formats (basic and extended) which have extant hardware implementations, they are apportioned as follows:
3148:
Converting a double-precision binary floating-point number to a decimal string is a common operation, but an algorithm producing results that are both accurate and minimal did not appear in print until 1990, with Steele and White's Dragon4. Some of the improvements since then include:
6179:, and so an awareness of when loss of significance can occur is essential. For example, if one is adding a very large number of numbers, the individual addends are very small compared with the sum. This can lead to loss of significance. A typical addition would then be something like 6197:
approximated π by calculating the perimeters of polygons inscribing and circumscribing a circle, starting with hexagons, and successively doubling the number of sides. As noted above, computations may be rearranged in a way that is mathematically equivalent but less prone to error
1715:
A precisely specified floating-point representation at the bit-string level, so that all compliant computers interpret bit patterns the same way. This makes it possible to accurately and efficiently transfer floating-point numbers from one computer to another (after accounting for
602:
The way in which the significand (including its sign) and exponent are stored in a computer is implementation-dependent. The common IEEE formats are described in detail later and elsewhere, but as an example, in the binary single-precision (32-bit) floating-point representation,
5976:
As decimal fractions can often not be exactly represented in binary floating-point, such arithmetic is at its best when it is simply being used to measure real-world quantities over a wide range of scales (such as the orbital period of a moon around Saturn or the mass of a
6888:
compilers, as allowed by the ISO/IEC 1539-1:2004 Fortran standard, reassociation is the default, with breakage largely prevented by the "protect parens" setting (also on by default). This setting stops the compiler from reassociating beyond the boundaries of parentheses.
8742: 4138:: subtraction of nearly equal operands may cause extreme loss of accuracy. When we subtract two almost equal numbers we set the most significant digits to zero, leaving ourselves with just the insignificant, and most erroneous, digits. For example, when determining a 343:, so that it lies within a specific range—typically between 1 and 10, with the radix point appearing immediately after the first digit. As a power of ten, the scaling factor is then indicated separately at the end of the number. For example, the orbital period of 5735:. Another approach that can protect against the risk of numerical instabilities is the computation of intermediate (scratch) values in an algorithm at a higher precision than the final result requires, which can remove, or reduce by orders of magnitude, such risk: 3358:
There are no cancellation or absorption problems with multiplication or division, though small errors may accumulate as operations are performed in succession. In practice, the way these operations are carried out in digital logic can be quite complex (see
195:
However, unlike 12.345, 12.3456 is not a floating-point number in base ten with five digits of precision—it needs six digits of precision; the nearest floating-point number with only five digits is 12.346. In practice, most floating-point systems use
7291:
that, on rare occasions, gave slightly incorrect results. Many computers had been shipped before the error was discovered. Until the defective computers were replaced, patched versions of compilers were developed that could avoid the failing cases. See
1167:, the graph of the logarithm function) is smooth (except at 0). Conversely to floating-point arithmetic, in a logarithmic number system multiplication, division and exponentiation are simple to implement, but addition and subtraction are complex. The ( 1662:
word sizes, the representations, and the rounding behavior and general accuracy of operations. Floating-point compatibility across multiple computing systems was in desperate need of standardization by the early 1980s, leading to the creation of the
3486:
An operation can be legal in principle, but the result can be impossible to represent in the specified format, because the exponent is too large or too small to encode in the exponent field. Such an event is called an overflow (exponent too large),
6185:
The low 3 digits of the addends are effectively lost. Suppose, for example, that one needs to add many numbers, all approximately equal to 3. After 1000 of them have been added, the running sum is about 3000; the lost digits are not regained. The
100: 7307:
But an attempted computation of cos(π) yields −1 exactly. Since the derivative is nearly zero near π, the effect of the inaccuracy in the argument is far smaller than the spacing of the floating-point numbers around −1, and the rounded result is
3185:
The problem of parsing a decimal string into a binary FP representation is complex, with an accurate parser not appearing until Clinger's 1990 work (implemented in dtoa.c). Further work has likewise progressed in the direction of faster parsing.
2283:, also ambiguously called "extended precision" format. This is a binary format that occupies at least 79 bits (80 if the hidden/implicit bit rule is not used) and its significand has a precision of at least 64 bits (about 19 decimal digits). The 3140:. The alternative rounding modes are also useful in diagnosing numerical instability: if the results of a subroutine vary substantially between rounding to + and − infinity then it is likely numerically unstable and affected by round-off error. 3507:. (The term "exception" as used in IEEE 754 is a general term meaning an exceptional condition, which is not necessarily an error, and is a different usage to that typically defined in programming languages such as a C++ or Java, in which an " 3261:
In the above conceptual examples it would appear that a large number of extra digits would need to be provided by the adder to ensure correct rounding; however, for binary addition or subtraction using careful implementation techniques only a
1157:
can be used to perform normal integer operations, too. Binary fixed point is usually used in special-purpose applications on embedded processors that can only do integer arithmetic, but decimal fixed point is common in commercial applications.
4575: 3868:
The fact that floating-point numbers cannot accurately represent all real numbers, and that floating-point operations cannot accurately represent true arithmetic operations, leads to many surprising situations. This is related to the finite
2303:
purposes, many tools store this 80-bit value in a 96-bit or 128-bit space. On other processors, "long double" may stand for a larger format, such as quadruple precision, or just double precision, if any form of extended precision is not
3229:
Hence: 123456.7 + 101.7654 = (1.234567 × 10^5) + (1.017654 × 10^2) = (1.234567 × 10^5) + (0.001017654 × 10^5) = (1.234567 + 0.001017654) × 10^5 = 1.235584654 × 10^5
5993:
standard, are designed to avoid the problems of binary floating-point representations when applied to human-entered exact decimal values, and make the arithmetic always behave as expected when numbers are printed in decimal.
3222:
A simple method to add floating-point numbers is to first represent them with the same exponent. In the example below, the second number is shifted right by three digits, and one then proceeds with the usual addition method:
5647:{\displaystyle {\begin{aligned}{\hat {x}}_{1}&=x_{1}(1+\delta _{1});&{\hat {x}}_{2}&=x_{2}(1+\delta _{2});\\{\hat {y}}_{1}&=y_{1}(1+\delta _{3});&{\hat {y}}_{2}&=y_{2}(1+\delta _{3}),\\\end{aligned}}} 1595:. The arithmetic is actually implemented in software, but with a one megahertz clock rate, the speed of floating-point and fixed-point operations in this machine were initially faster than those of many competing computers. 3095:: that is, the rounded result is as if infinitely precise arithmetic was used to compute the value and then rounded (although in implementation only three extra bits are needed to ensure this). There are several different 2587:; values of all 1s are reserved for the infinities and NaNs. The exponent range for normal numbers is for single precision, for double, or for quad. Normal numbers exclude subnormal values, zeros, infinities, and NaNs. 8214: 579:/100. The base determines the fractions that can be represented; for instance, 1/5 cannot be represented exactly as a floating-point number using a binary base, but 1/5 can be represented exactly using a decimal base ( 5371: 689: 331:
systems, a position in the string is specified for the radix point. So a fixed-point scheme might use a string of 8 decimal digits with the decimal point in the middle, whereby "00012345" would represent 0001.2345.
1896: 2772:
The TensorFloat-32 format combines the 8 bits of exponent of the Bfloat16 with the 10 bits of trailing significand field of half-precision formats, resulting in a size of 19 bits. This format was introduced by
4475: 3741: 428:, equivalent to shifting the radix point from its implied position by a number of places equal to the value of the exponent—to the right if the exponent is positive or to the left if the exponent is negative. 4126:
1234.567 × 3.333333 = 4115.223 1.234567 × 3.333333 = 4.115223 4115.223 + 4.115223 = 4119.338 but 1234.567 + 1.234567 = 1235.802 1235.802 × 3.333333 = 4119.340
1108:
It can be required that the most significant digit of the significand of a non-zero number be non-zero (except when the corresponding exponent would be smaller than the minimum one). This process is called
693:
In this binary expansion, let us denote the positions from 0 (leftmost bit, or most significant bit) to 32 (rightmost bit). The 24-bit significand will stop at position 23, shown as the underlined bit
2696:
prior to version 4.00. QuickBASIC version 4.00 and 4.50 switched to the IEEE 754-1985 format but can revert to the MBF format using the /MBF command option. MBF was designed and developed on a simulated
6415: 6330: 757: 2330: 5698: 6132: 2583:
While the exponent can be positive or negative, in binary formats it is stored as an unsigned number that has a fixed "bias" added to it. Values of all 0s in this field are reserved for the zeros and
2011: 4412: 5918:), then up to full precision in the final double result can be maintained. Alternatively, a numerical analysis of the algorithm reveals that if the following non-obvious change to line is made: 3091:
Rounding is used when the exact result of a floating-point operation (or a conversion to floating-point format) would need more digits than there are digits in the significand. IEEE 754 requires
5385: 4644: 776: 513: 6468: 5726:, meaning that the correct result is hypersensitive to tiny perturbations in its data. However, even functions that are well-conditioned can suffer from large loss of accuracy if an algorithm 3352:
e=3; s=4.734612 × e=5; s=5.417242 ----------------------- e=8; s=25.648538980104 (true product) e=8; s=25.64854 (after rounding) e=9; s=2.564854 (after normalization)
2717:
4-kilobytes memory. In December 1975, the 8-kilobytes version added a double-precision (64 bits) format. A single-precision (40 bits) variant format was adopted for other CPU's, notably the
4268:
Conversions to integer are not intuitive: converting (63.0/9.0) to integer yields 7, but converting (0.63/0.09) may yield 6. This is because conversions generally truncate rather than round.
3173:
Schubfach, an always-succeeding algorithm that is based on a similar idea to Ryū, developed almost simultaneously and independently. Performs better than Ryū and Grisu3 in certain benchmarks.
6842:
While the two forms of the recurrence formula are clearly mathematically equivalent, the first subtracts 1 from a number extremely close to 1, leading to an increasingly problematic loss of
4593:
of a function for a given problem indicates the inherent sensitivity of the function to small perturbations in its input and is independent of the implementation used to solve the problem.
4218: 6245: 4507: 3258:
e=5; s=1.234567 + e=5; s=0.00000009876543 (after shifting) ---------------------- e=5; s=1.23456709876543 (true sum) e=5; s=1.234567 (after rounding and normalization)
6069: 3617:
The default return value for each of the exceptions is designed to give the correct result in the majority of cases such that the exceptions can be ignored in the majority of codes.
2934:
with a terminating expansion in the relevant base (for example, a terminating decimal expansion in base-10, or a terminating binary expansion in base-2). Irrational numbers, such as
219:
can "float" anywhere to the left, right, or between the significant digits of the number. This position is indicated by the exponent, so floating point can be considered a form of
8137:
Since the IEEE-754 floating-point specification does not define a 16-bit format, ILM created the "half" format. Half values have 1 sign bit, 5 exponent bits, and 10 mantissa bits.
7333:
of this function demonstrates that it is well-conditioned near 1: A(x) = 1 − (x−1)/2 + (x−1)^2/12 − (x−1)^4/720 + (x−1)^6/30240 − (x−1)^8/1209600 + ... for |x−1| < π.
7320:
notes: "Except in extremely uncommon situations, extra-precise arithmetic generally attenuates risks due to roundoff at far less cost than the price of a competent error-analyst."
6494: 8194: 5765:
which is well-conditioned at 1.0, however it can be shown to be numerically unstable and lose up to half the significant digits carried by the arithmetic when computed near 1.0.
1658:
systems. In 1998, IBM implemented IEEE-compatible binary floating-point arithmetic in its mainframes; in 2005, IBM also added IEEE-compatible decimal floating-point arithmetic.
1646:, also introduced in 1962, supported single-precision and double-precision representations, but with no relation to the UNIVAC's representations. Indeed, in 1964, IBM introduced 1449:; it uses a 24-bit binary floating-point number representation with a 7-bit signed exponent, a 17-bit significand (including one implicit bit), and a sign bit. The more reliable 1163:(LNSs) represent a real number by the logarithm of its absolute value and a sign bit. The value distribution is similar to floating point, but the value-to-representation curve ( 7537: 315:
There are several mechanisms by which strings of digits can represent numbers. In standard mathematical notation, the digit string can be of any length, and the location of the
1499: 7552: 7270:
Computer hardware does not necessarily compute the exact value; it simply has to produce the equivalent rounded result as though it had computed the infinitely precise result.
443:
together with 5 as the exponent. To determine the actual value, a decimal point is placed after the first digit of the significand and the result is multiplied by 10 to give
9061:(NB. Kahan estimates that the incidence of excessively inaccurate results near singularities is reduced by a factor of approx. 1/2000 using the 11 extra bits of precision of 3355:
Similarly, division is accomplished by subtracting the divisor's exponent from the dividend's exponent, and dividing the dividend's significand by the divisor's significand.
1525: 3060:
The result of rounding differs from the true value by about 0.03 parts per million, and matches the decimal representation of π in the first 7 digits. The difference is the
1724:
given a particular input, thus mitigating the almost mystical reputation that floating-point computation had developed for its hitherto seemingly non-deterministic behavior.
8062: 3136:
Alternative modes are useful when the amount of error being introduced must be bounded. Applications that require a bounded error are multi-precision floating-point, and
1551: 1313: 1274: 1606:. For many decades after that, floating-point hardware was typically an optional feature, and computers that had it were said to be "scientific computers", or to have " 1204:
represent numbers as fractions with integral numerator and denominator, and can therefore represent any rational number exactly. Such packages generally need to use "
1196:, 1/3 and 1/10) cannot be represented exactly in binary floating point, no matter what the precision is. Using a different radix allows one to represent some of them ( 8283: 4259:
grows smaller, cancelling out the most significant and least erroneous digits and making the most erroneous digits more important. As a result the smallest number of
3836: 3803: 3612: 3561:, set if the absolute value of the rounded value is too large to be represented. An infinity or maximal finite value is returned, depending on which rounding is used. 768:. The significand is assumed to have a binary point to the right of the leftmost bit. So, the binary representation of π is calculated from left-to-right as follows: 3768: 1930: 238:. For this reason, floating-point arithmetic is often used to allow very small and very large real numbers that require fast processing times. The result of this 1389: 1250: 627: 8656: 4130:
In addition to loss of significance, inability to represent numbers such as π and 0.1 exactly, and other slight inaccuracies, the following phenomena may occur:
186:{\displaystyle 12.345=\!\underbrace {12345} _{\text{significand}}\!\times \!\underbrace {10} _{\text{base}}\!\!\!\!\!\!\!\overbrace {{}^{-3}} ^{\text{exponent}}} 4634: 4614: 1149:
The floating-point representation is by far the most common way of representing in computers an approximation to real numbers. However, there are alternatives:
6157:, exact tests of whether a point lies off or on a line or plane defined by other points can be performed using adaptive precision or exact arithmetic methods. 5305: 3633:
exception subsequently if not, and so can also typically be ignored. For example, the effective resistance of n resistors in parallel (see fig. 1) is given by
1214:
allows one to represent numbers as intervals and obtain guaranteed bounds on results. It is generally based on other arithmetics, in particular floating point.
640: 242:
is that the numbers that can be represented are not uniformly spaced; the difference between two consecutive representable numbers varies with their exponent.
7746: 4265:
possible will give a more erroneous approximation of a derivative than a somewhat larger number. This is perhaps the most common and serious accuracy problem.
455:. In storing such a number, the base (10) need not be stored, since it will be the same for the entire range of supported numbers, and can thus be inferred. 8027: 6862:
cannot as effectively reorder arithmetic expressions as they could with integer and fixed-point arithmetic, presenting a roadblock in optimizations such as
3407:, when the decimal exponent is omitted, a decimal point is needed to differentiate them from integers. Other languages do not have an integer type (such as 245: 3333:
illustrates the danger in assuming that all of the digits of a computed result are meaningful. Dealing with the consequences of these errors is a topic in
2939:
available (it would be rounded to one of the two straddling representable values, 12345678 × 10 or 12345679 × 10), the same applies to
2789:
architecture GPUs provide two FP8 formats: one with the same numerical range as half-precision (E5M2) and one with higher precision, but less range (E4M3).
2765:
number. The tradeoff is a reduced precision, as the trailing significand field is reduced from 10 to 7 bits. This format is mainly used in the training of
1457:, completed in 1941, has representations for both positive and negative infinities; in particular, it implements defined operations with infinity, such as 1315:" exactly, because it is programmed to process the underlying mathematics directly, instead of using approximate values for each intermediate calculation. 273:
Standard for Floating-Point Arithmetic was established, and since the 1990s, the most commonly encountered representations are those defined by the IEEE.
9816:. (NB. This website contains open source floating-point IP cores for the implementation of floating-point operators in FPGA or ASIC devices. The project 5722:, more complicated formulae can suffer from larger errors for a variety of reasons. The loss of accuracy can be substantial if a problem or its data are 4417: 2348:
properties are sometimes used for purely integer data, to get 53-bit integers on platforms that have double-precision floats but only 32-bit integers.
2214: 1125:. Therefore, it does not need to be represented in memory, allowing the format to have one more bit of precision. This rule is variously called the 6160:
Small errors in floating-point arithmetic can grow when mathematical algorithms perform operations an enormous number of times. A few examples are
3555:
inexact (or maybe limited to if it has denormalization loss, as per the 1985 version of IEEE 754), returning a subnormal value including the zeros.
2291:
standards of the C language family, in their annex F ("IEC 60559 floating-point arithmetic"), recommend such an extended format to be provided as "
8475: 717: 9810:(NB. A compendium of non-intuitive behaviors of floating point on popular architectures, with implications for program verification and testing.) 7178: 7174: 3242:
This is the true result, the exact sum of the operands. It will be rounded to seven digits and then normalized if necessary. The final result is
2333:(binary128). This is a binary format that occupies 128 bits (16 bytes) and its significand has a precision of 113 bits (about 34 decimal digits). 5661: 4341: 4289:
Testing for equality is problematic. Two computational sequences that are mathematically equal may well produce different floating-point values.
4227:
very close to zero; however, when using floating-point operations, the smallest number will not give the best approximation of a derivative. As
8971: 8924: 3167:
Errol3, an always-succeeding algorithm similar to, but slower than, Grisu3. Apparently not as good as an early-terminating Grisu with fallback.
9705:
Muller, Jean-Michel; Brunie, Nicolas; de Dinechin, Florent; Jeannerod, Claude-Pierre; Joldes, Mioara; Lefèvre, Vincent; Melquiond, Guillaume;
8295: 7149:(1964), the Siemens 4004 (1965), 7.700 (1974), 7.800, 7.500 (1977) series mainframes and successors, the Unidata 7.000 series mainframes, the 2299:
architecture. Often on such processors, this format can be used with "long double", though extended precision is not available with MSVC. For
1813: 9829: 7985: 2295:". A format satisfying the minimal requirements (64-bit significand precision, 15-bit exponent, thus fitting on 80 bits) is provided by the 9186: 4365: 3132:
round toward zero (truncation; it is similar to the common behavior of float-to-integer conversions, which convert −3.9 to −3 and 3.9 to 3)
1361:
and described a way to store floating-point numbers in a consistent manner. He stated that numbers will be stored in exponential format as
9214: 8770: 2750:
each of them is represented with an 8-bit exponent, followed by a sign bit, followed by a significand of respectively 23, 31, and 55 bits.
7957:"Extended" is IEC 60559's double-extended data format. Extended refers to both the common 80-bit and quadruple 128-bit IEC 60559 formats. 2277:
family. This is a binary format that occupies 64 bits (8 bytes) and its significand has a precision of 53 bits (about 16 decimal digits).
461: 8694: 3636: 2267:
family. This is a binary format that occupies 32 bits (4 bytes) and its significand has a precision of 24 bits (about 7 decimal digits).
1936:
which has a 1 as the leading digit and 0 for the remaining digits of the significand, and the smallest possible value for the exponent.
7704: 7526: 9987: 8713:. The Morgan Kaufmann series in computer architecture and design (5th ed.). Waltham, Massachusetts, USA: Elsevier. p. 793. 7011: 6168:
computation, and differential equation solving. These algorithms must be very carefully designed, using numerical approaches such as
5736: 2096: 4146: 1391:, and offered three rules by which consistent manipulation of floating-point numbers by machines could be implemented. For Torres, " 7879:"The Baleful Effect of Computer Languages and Benchmarks upon Applied Mathematics, Physics and Chemistry. John von Neumann Lecture" 3120:
round to nearest, where ties round to the nearest even digit in the required position (the default and by far the most common mode)
2200: 8787: 7406:
Muller, Jean-Michel; Brisebarre, Nicolas; de Dinechin, Florent; Jeannerod, Claude-Pierre; Lefèvre, Vincent; Melquiond, Guillaume;
6337: 6252: 5969:
optimizations are carefully specified to maintain numerical precision. See the external references at the bottom of this article.
3856:
routine, as part of its normal operation, may evaluate a passed-in function at values outside of its domain, returning NaN and an
3071:
The arithmetical difference between two consecutive representable floating-point numbers which have the same exponent is called a
10358: 9992: 7805: 2769:
models, where range is more valuable than precision. Many machine learning accelerators provide hardware support for this format.
2106: 8051: 6874: 4585: 3503:
that the programmer might be able to catch. How this worked was system-dependent, meaning that floating-point programs were not
9982: 9977: 9320: 7934: 7822: 7022: 6074: 2905: 2762: 2637: 2623: 2270: 2260: 2086: 2076: 1946: 1754: 9088: 7533: 3300:
e=5; s=1.234571 − e=5; s=1.234567 ---------------- e=5; s=0.000004 e=−1; s=4.000000 (after rounding and normalization)
2323:
floating-point formats. These formats (especially decimal128) are pervasive in financial transactions because, along with the
9732: 9641: 9608: 9568: 9545: 9158: 9116: 9046: 9010: 8979: 8932: 8895: 8387: 7918: 7885: 7592: 7500: 7433: 7375:
of the first. By multiplying the top and bottom of the first expression by this conjugate, one obtains the second expression.
3239:
e=5; s=1.234567 + e=5; s=0.001017654 (after shifting) -------------------- e=5; s=1.235584654 (true sum: 123558.4654)
1611: 1186: 6877:
at startup, affecting the floating-point behavior of not only the generated code, but also any program using such code as a
6149:
is approximately equal to -4.44089209850063e-16). Consequently, such tests are sometimes replaced with "fuzzy" comparisons (
5997:
Expectations from mathematics may not be realized in the field of floating-point computation. For example, it is known that
3839: 366:
Floating-point representation is similar in concept to scientific notation. Logically, a floating-point number consists of:
9965: 9866: 9280: 8718: 8461: 8097: 7670: 7576: 7157:(1982) computers, and in 360/370-compatible mainframe families made by Fujitsu, Amdahl and Hitachi. It is also used in the 6973: 2850: 2794:
Bfloat16, TensorFloat-32, and the two FP8 formats, compared with IEEE 754 half-precision and single-precision formats
2758: 2336: 2069: 231: 2244:
The standard provides for many closely related formats, differing in only a few details. Five of these formats are called
2153: 1735:, etc.) to propagate through a computation in a benign manner and then be handled by the software in a controlled fashion. 1704:
for being the primary architect behind this proposal; he was aided by his student Jerome Coonen and a visiting professor,
235: 9779: 3625:
returns a value less than or equal to the smallest positive normal number in magnitude and can almost always be ignored.
3360: 2610:
The sum of the exponent bias (127) and the exponent (1) is 128, so this is represented in the single-precision format as
1416: 7633: 6421: 3368: 7756: 9691: 3349:
To multiply, the significands are multiplied while the exponents are added, and the result is rounded and normalized.
2343:
graphics language, and in the openEXR standard (where it actually predates the introduction in the IEEE 754 standard).
10143: 9671: 9516: 9475: 7471: 6867: 6134:, however these facts cannot be relied on when the quantities involved are the result of floating-point computation. 3327: = 4.000000 of the approximations. In extreme cases, all significant digits of precision can be lost. This 2526: 9087:. IFIP/SIAM/NIST Working Conference on Uncertainty Quantification in Scientific Computing, Boulder, CO. p. 33. 3975:
By the same token, an attempted computation of sin(π) will not yield zero. The result will be (approximately) 0.1225
327:
and the unstated radix point would be off the right-hand end of the string, next to the least significant digit. In
9081:
Desperately Needed Remedies for the Undebuggability of Large Floating-Point Computations in Science and Engineering
8740:, Huberto M Sierra, "Floating decimal point arithmetic control means for calculator", issued 1962-06-05 6863: 3123:
round to nearest, where ties round away from zero (optional for binary floating-point and commonly used in decimal)
2320: 2100: 1588: 1168: 3629:
returns infinity exactly, which will typically then divide a finite number and so give zero, or else will give an
3206:
or precision, except that normalization is optional (it does not affect the numerical value of the result). Here,
596: 10116: 9624: 8863: 8020: 7969: 7154: 6953: 2324: 2316: 2090: 2080: 553: 28: 7683:
Systems such as the DFS IV and DFS V were quaternary floating-point systems and used gain steps of 12 dB.
7554:(NB. This reference incorrectly gives the MANIAC II's floating point base as 256, whereas it actually is 65536.) 4589:. Stability is a measure of the sensitivity to rounding errors of a given numerical procedure; by contrast, the 1670:
had become commonplace. This standard was significantly based on a proposal from Intel, which was designing the
10233: 10038: 9970: 9932: 9655: 9560: 9498: 8121: 7351: 6984: 6911: 5982: 3524: 3499:
Prior to the IEEE standard, such conditions usually caused the program to terminate, or triggered some kind of
3475:
An operation can be legal in principle, but not supported by the specific format, for example, calculating the
3412: 2869: 2754: 2226: 2174: 2148: 2133: 1647: 1280:), without dealing with a specific encoding of the significand. Such a program can evaluate expressions like " 17: 8149: 7703:. Los Alamos, NM, USA: Los Alamos Scientific Laboratory of the University of California. p. 14. LA-2083. 7235:(1962) computer. It is also used in the Digital Field System DFS IV and V high-resolution site survey systems. 6208: 6000: 5964:
To maintain the properties of such carefully constructed numerically stable programs, careful handling by the
2652:
standard formats, other floating-point formats are used, or have been used, in certain domain-specific areas.
5986: 4325: 3545:, set if the rounded (and returned) value is different from the mathematically exact result of the operation. 262: 227: 8493:"Added Grisu3 algorithm support for double.ToString(). by mazong1123 · Pull Request #14646 · dotnet/coreclr" 8320: 8262: 8226:
_fmsbintoieee(float *src4, float *dest4) MS Binary Format byte order => m3 | m2 | m1 | exponent m1 is
4066:(a + b) + c: 1234.567 (a) + 45.67834 (b) ____________ 1280.24534 rounds to 1280.245 10414: 10133: 10063: 9911: 4286:
is problematic: Checking that the divisor is not zero does not guarantee that a division will not overflow.
3870: 3538:
IEEE 754 specifies five arithmetic exceptions that are to be recorded in the status flags ("sticky bits"):
2193: 1667: 591:). However, 1/3 cannot be represented exactly by either binary (0.010101...) or decimal (0.333...), but in 226:
A floating-point system can be used to represent, with a fixed number of digits, numbers of very different
6473: 6193:
Round-off error can affect the convergence and accuracy of iterative numerical procedures. As an example,
3075:(ULP). For example, if there is no representable number lying between the representable numbers 1.45a70c22 2308:
Increasing the precision of the floating-point representation generally reduces the amount of accumulated
540:
Historically, several number bases have been used for representing floating-point numbers, with base two (
10388: 10023: 9833: 9678:(1213 pages) (NB. This is a single-volume edition. This work was also available in a two-volume version.) 8340: 7655:"Chapter 2 - High resolution digital site survey systems - Chapter 2.1 - Digital field recording systems" 5911:
If, however, intermediate computations are all performed in extended precision (e.g. by setting line to
4269: 1403:
will be of order of tenths, the second of hundredths, etc, and one will write each quantity in the form:
9762:(NB. This page gives a very brief summary of floating-point formats that have been used over the years.) 8448:. PLDI '10: ACM SIGPLAN Conference on Programming Language Design and Implementation. pp. 233–243. 8438: 7461: 1200:, 1/10 in decimal floating point), but the possibilities remain limited. Software packages that perform 10011: 9600: 9002: 8737: 7938: 7573:
The Mathematical-Function Computation Handbook - Programming Using the MathCW Portable Software Library
7162: 7134: 4283: 2786: 1460: 549: 295:) is a part of a computer system specially designed to carry out operations on floating-point numbers. 269:
Over the years, a variety of floating-point representations have been used in computers. In 1985, the
80: 9179: 2660:
was developed for the Microsoft BASIC language products, including Microsoft's first ever product the
10321: 10273: 10185: 10163: 10158: 10086: 9906: 8763: 6989: 6187: 4135: 3528: 3329: 2713:. The initial release of July 1975 supported a single-precision (32 bits) format due to cost of the 2340: 2288: 1504: 1160: 8552: 7732: 4596:
As a trivial example, consider a simple expression giving the inner product of (length two) vectors
4309: 2021:− 1 as the value for each digit of the significand and the largest possible value for the exponent. 1530:
Zuse also proposed, but did not complete, carefully rounded floating-point arithmetic that includes
323:(dot or comma) there. If the radix point is not specified, then the string implicitly represents an 10195: 9859: 9536:. Prentice-Hall Series in Automatic Computation (1st ed.). Englewood Cliffs, New Jersey, USA: 8880:(2003-09-08). "Error Analysis". In Ralston, Anthony; Reilly, Edwin D.; Hemmendinger, David (eds.). 8417: 7077: 6963: 6942: 5707: 3404: 3392: 2778: 2300: 1176: 8697: 7982: 4570:{\displaystyle \left|{\frac {\operatorname {fl} (x)-x}{x}}\right|\leq \mathrm {E} _{\text{mach}}.} 3916:
to attempt to compute tan(π/2), because π/2 cannot be represented exactly. This computation in C:
3116:
Alternative rounding options are also available. IEEE 754 specifies the following rounding modes:
2221:(a.k.a. IEC 60559) in 1985. This first standard is followed by almost all modern machines. It was 1553:
and NaN representations, anticipating features of the IEEE Standard by four decades. In contrast,
10409: 10348: 10263: 9837: 9504: 6994: 3852:
exceptions can typically not be ignored, but do not necessarily represent errors: for example, a
3511:" is an alternative flow of control, closer to what is termed a "trap" in IEEE 754 terminology.) 2746: 2657: 2186: 2143: 1622: 1533: 1346: 1334: 1283: 1255: 1217: 9394: 8446:
Proceedings of the 31st ACM SIGPLAN Conference on Programming Language Design and Implementation
8234:
m = mantissa byte s = sign bit b = bit MBF is bias 128 and IEEE is bias 127. MBF places the
6499:
Here is a computation using IEEE "double" (a significand with 53 bits of precision) arithmetic:
10091: 9947: 9901: 9253: 9207: 8412: 8231: 6948: 6932: 6890: 6154: 5719: 5718:
Although individual arithmetic operations of IEEE 754 are guaranteed accurate to within half a
3072: 2230: 2222: 2052: 1745: 1607: 1592: 1577: 1172: 1153: 592: 545: 432: 328: 201: 3567:, set if the result is infinite given finite operands, returning an infinity, either +∞ or −∞. 10081: 10056: 9586: 9494: 9458: 8996: 8918: 8877: 8665: 8227: 7654: 7411: 6176: 4087: 3808: 3773: 3584: 3573:, set if a real-valued result cannot be returned e.g. sqrt(−1) or 0/0, returning a quiet NaN. 2742: 1180: 541: 371: 309: 7697: 95:. For example, 12.345 is a floating-point number in base ten with five digits of precision: 9883: 9485: 9309: 9203: 9175: 9147: 9105: 9075: 9029: 8759: 8376: 8086: 7874: 7317: 7166: 7053: 6855: 6169: 4313: 4301: 4027: 3746: 3532: 3488: 3061: 1908: 1277: 9313: 8860:"Patriot missile defense, Software problem led to system failure at Dharhan, Saudi Arabia" 4275:
Limited exponent range: results might overflow yielding infinity, or underflow yielding a
1368: 1235: 606: 8: 10353: 10331: 10258: 10111: 10103: 9852: 9578: 9468: 8811: 7368: 7347: 7158: 7017: 6916: 6878: 5727: 4340:
is a quantity that characterizes the accuracy of a floating-point system, and is used in
3388: 3137: 2441: 1446: 1211: 552:), base eight (octal floating point), base four (quaternary floating point), base three ( 336: 288: 220: 9151: 9109: 9033: 8380: 7878: 7571:
Beebe, Nelson H. F. (2017-08-22). "Chapter H. Historical floating-point architectures".
5961:
then the algorithm becomes numerically stable and can compute to full double precision.
5746:
For example, the following algorithm is a direct implementation to compute the function
3400: 2761:, but allocates 8 bits to the exponent instead of 5, thus providing the same range as a 10336: 10316: 10268: 10243: 10028: 9997: 9801: 9783: 9651: 9619: 9412: 9079: 9062: 8831:
Far more worrying is cancellation error which can yield catastrophic loss of precision.
8764:"Lecture Notes on the Status of IEEE Standard 754 for Binary Floating-Point Arithmetic" 8684: 8633: 8615: 8533: 8467: 8356: 7851: 7606: 7280: 7232: 6199: 6141:) requires care when dealing with floating-point numbers. Even simple expressions like 5740: 5732: 4619: 4599: 4580: 3853: 3508: 3364: 3334: 2280: 2274: 2264: 2112: 706:. It is used to round the 33-bit approximation to the nearest 24-bit number (there are 8834: 8090: 3459: 3297: = 1.234567 are approximations to the rationals 123457.1467 and 123456.659. 2339:, also called binary16, a 16-bit floating-point value. It is being used in the NVIDIA 1728: 10223: 10153: 10128: 9942: 9937: 9805: 9738: 9728: 9667: 9637: 9604: 9594: 9590: 9582: 9564: 9541: 9512: 9471: 9376: 9006: 8975: 8928: 8891: 8887: 8816: 8714: 8688: 8637: 8537: 8457: 8411:(Technical report). NUMERICAL ANALYSIS MANUSCRIPT 90-10, AT&T BELL LABORATORIES. 7914: 7850:(2014-06-07). "The Z1: Architecture and Algorithms of Konrad Zuse's First Computer". 7783:, p. 545, Digital Computers: Origins, Encyclopedia of Computer Science, January 2003. 7780: 7752: 7666: 7598: 7588: 7496: 7467: 7439: 7429: 7245: 7005: 6922: 6843: 6145:
will, on most computers, fail to be true (in IEEE 754 double precision, for example,
3308: 2940: 2584: 2383:, and they are ordered in the same way as their values (in the set of real numbers). 1358: 1338: 1225: 320: 9820:
contains verilog source code of a double-precision floating-point unit. The project
9377:"55522 – -funsafe-math-optimizations is unexpectedly harmful, especially w/ -shared" 7797: 7610: 3303:
The floating-point difference is computed exactly because the numbers are close—the
1618:
personal computers had floating-point capability in hardware as a standard feature.
10368: 10253: 10051: 9793: 9720: 9435: 9289: 9276:"Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates" 8674: 8625: 8523: 8449: 7814: 7580: 7421: 7330: 7294: 6937: 6873:
In some compilers (GCC and Clang), turning on "fast" math may cause the program to
6161: 5723: 4590: 4276: 3500: 3492: 3469: 2766: 2714: 2710: 2498: 2470: 2029: 2024:
In addition, there are representable values strictly between −UFL and UFL. Namely,
1732: 1638:: 72 bits, organized as a 1-bit sign, an 11-bit exponent, and a 60-bit significand. 1554: 1229: 714:
in this example, is added to the integer formed by the leftmost 24 bits, yielding:
355:
seconds, a value that would be represented in standard-form scientific notation as
208:
digits of precision, the sum 12.345 + 1.0001 = 13.3451 might be rounded to 13.345.
48: 8471: 7363:
The equivalence of the two forms can be verified algebraically by noting that the
3248:
The lowest three digits of the second operand (654) are essentially lost. This is
3107:
was the typical approach. Since the introduction of IEEE 754, the default method (
1632:: 36 bits, organized as a 1-bit sign, an 8-bit exponent, and a 27-bit significand. 370:
A signed (meaning positive or negative) digit string of a given length in a given
257: 10373: 10238: 10190: 10123: 9757: 9710: 9508: 9481: 9462: 9440: 8965: 8961: 8881: 8202: 7989: 7910: 7219: 7215: 7211: 7207: 7109: 4346: 4305: 4079:
1234.567 (a) + 45.67874 (b + c) ____________ 1280.24574 rounds to
3252:. In extreme cases, the sum of two non-zero numbers may be equal to one of them: 3249: 3065: 2931: 2309: 1705: 1396: 1354: 564: 281: 8381:"How Futile are Mindless Assessments of Roundoff in Floating-Point Computation?" 4069:
1280.245 (a + b) + 0.0004 (c) ____________ 1280.2454 rounds to
707: 595:, it is trivial (0.1 or 1×3) . The occasions on which infinite expansions occur 284:, especially for applications that involve intensive mathematical calculations. 10326: 10148: 10138: 10046: 9706: 9663: 9633: 8492: 8002: 7407: 7203: 7170: 7150: 7146: 7142: 7138: 4494: 3480: 3464:
Floating-point computation in a computer can run into three kinds of problems:
3304: 2702: 2554: 2437: 2351:
The standard specifies some special values, and their representation: positive
2256:. Three formats are especially widely used in computer hardware and languages: 2138: 1655: 567:, because it can be represented as one integer divided by another; for example 439:, which has ten decimal digits of precision, is represented as the significand 340: 9724: 8586: 7584: 7425: 4579:
Backward error analysis, the theory of which was developed and popularized by
2614:
0 10000000 10010010000111111011011 (excluding the hidden bit) = 40490FDB as a
2594:
For example, it was shown above that π, rounded to 24 bits of precision, has:
2217:
standardized the computer representation for binary floating-point numbers in
10403: 10248: 9537: 8820: 8235: 7902: 7625: 7284: 7190: 6510:, second form --------------------------------------------------------- 0 5990: 4272:
may produce answers which are off by one from the intuitively expected value.
3226:
123456.7 = 1.234567 × 10^5 101.7654 = 1.017654 × 10^2 = 0.001017654 × 10^5
3143: 2738: 2734: 2726: 2364: 1697: 1685: 1654:
mainframes; these same representations are still available for use in modern
1603: 1569: 1454: 1442: 1430: 239: 197: 44: 9797: 9716: 9434:. CAV 2019: Computer Aided Verification. Vol. 11562. pp. 155–173. 8845: 8528: 8511: 8453: 8091:"On the Cost of Floating-Point Computation Without Extra-Precise Arithmetic" 7847: 7793: 7417: 5702:
by definition, which is the sum of two slightly perturbed (on the order of Ε
3379:
Literals for floating-point numbers depend on languages. They typically use
1353:, where he designed a special-purpose electromechanical calculator based on 761:
When this is stored in memory using the IEEE 754 encoding, this becomes the
10205: 10180: 9683: 8661:"What Every Computer Scientist Should Know About Floating-Point Arithmetic" 8170: 4321: 3170:
Ryū, an always-succeeding algorithm that is faster and simpler than Grisu3.
2661: 1701: 1587:
has binary floating-point arithmetic, and it became operational in 1950 at
1572:
computer, designed in 1942–1945. In 1946, Bell Laboratories introduced the
1325: 9430:
Becker, Heiko; Darulova, Eva; Myreen, Magnus O.; Tatlock, Zachary (2019).
8859: 8806: 8679: 8660: 6202:). Two forms of the recurrence formula for the circumscribed polygon are: 3968:
will give a result of 16331239353195370.0. In single precision (using the
2367:(−0) distinct from ordinary ("positive") zero, and "not a number" values ( 10383: 10378: 10228: 10175: 10002: 9337: 8239: 7364: 7343: 7186: 7182: 7080:) is ambiguous, as it was historically also used to specify some form of 7044: 6927: 6165: 5366:{\displaystyle \operatorname {fl} (x\cdot y)={\hat {x}}\cdot {\hat {y}},} 3987: 3476: 2677: 2615: 2292: 2025: 1558: 1438: 1426: 1221: 1089:
is the position of the bit of the significand from the left (starting at
762: 684:{\displaystyle 11001001\ 00001111\ 1101101{\underline {0}}\ 10100010\ 0.} 380: 316: 292: 250: 216: 84: 76: 9782:(ACM) Transactions on programming languages and systems (TOPLAS): 1–41. 9358: 9346:
We support floating point reduction operations when -ffast-math is used.
8606:
Lemire, Daniel (2021-03-22). "Number parsing at a gigabyte per second".
5706:) input data, and so is backward stable. For more realistic examples in 3577: 3198:
with 7 digit precision will be used in the examples, as in the IEEE 754
312:
specifies some way of encoding a number, usually as a string of digits.
10288: 10283: 10200: 10168: 10073: 10016: 9767: 9294: 9275: 8566: 8321:"TensorFloat-32 in the A100 GPU Accelerates AI Training, HPC up to 20x" 6194: 6175:
Summation of a vector of floating-point values is a basic algorithm in
4139: 4076:
a + (b + c): 45.67834 (b) + 0.0004 (c) ____________ 45.67874
3408: 3319: = 4.877000, which differs more than 20% from the difference 3104: 2706: 2698: 2693: 1891:{\displaystyle 2\left(B-1\right)\left(B^{P-1}\right)\left(U-L+1\right)} 1717: 1671: 1651: 1412: 72: 9432:
Icing: Supporting Fast-Math Style Optimizations in a Verified Compiler
9045:, Symposium on Computer Arithmetic (Keynote Address). pp. 6, 18. 8439:"Printing floating-point numbers quickly and accurately with integers" 7818: 3311:
is supported. Despite this, the difference of the original numbers is
3011:
11.0010010000111111011010101000100010000101101000110000100011010011...
276:
The speed of floating-point operations, commonly measured in terms of
39: 10363: 10341: 10298: 10293: 9960: 9916: 9875: 9824:
contains vhdl source code of a single-precision floating-point unit.)
8629: 8291: 7662: 7488: 7372: 7258: 7057: 7000: 6859: 3281:
to two nearly equal numbers are subtracted. In the following example
2128: 1681: 1584: 1422: 348: 60: 8125: 6919:
for code examples demonstrating access and use of IEEE 754 features.
2930:
By their nature, all numbers expressed in floating-point format are
2777:, which provides hardware support for it in the Tensor Cores of its 2745:). All Microsoft language products from 1975 through 1987 used the 1625:, introduced in 1962, supported two floating-point representations: 10278: 9788: 9423: 9231: 9042: 8620: 8361: 8021:"Procedure Call Standard for the ARM 64-bit Architecture (AArch64)" 7095: 6978: 6968: 5965: 4344:
of floating-point algorithms. It is also known as unit roundoff or
3396: 3109: 3096: 3024: 2718: 2689: 2649: 2352: 2238: 2234: 2218: 2059: 2041: 1689: 1663: 1643: 1614:(XSC)). It was not until the launch of the Intel i486 in 1989 that 1201: 401: 270: 88: 9742: 8424: 7856: 7602: 7443: 5743:
are designed for this purpose when computing at double precision.
4470:{\displaystyle \mathrm {E} _{\text{mach}}={\tfrac {1}{2}}B^{1-P},} 3736:{\displaystyle R_{\text{tot}}=1/(1/R_{1}+1/R_{2}+\cdots +1/R_{n})} 3236:
e=5; s=1.234567 (123456.7) + e=2; s=1.017654 (101.7654)
3177:
Many modern language runtimes use Grisu3 with a Dragon4 fallback.
3157:, a practical open-source implementation of many ideas in Dragon4. 3129:
round down (toward −∞; negative results thus round away from zero)
3030:
In binary single-precision floating-point, this is represented as
2722: 1433:
computer, which uses a 22-bit binary floating-point representation
9490:(NB. Classic influential treatises on floating-point arithmetic.) 8711:
Computer Organization and Design, The Hardware/Software Interface
8693:(With the addendum "Differences Among IEEE 754 Implementations": 8258: 8210: 8206: 6885: 4317: 3900:
0.010000000298023226097399174250313080847263336181640625 exactly.
3860:
exception flag to be ignored until finding a useful start point.
3504: 2925: 2312:
caused by intermediate calculations. Other IEEE formats include:
1599: 1573: 344: 324: 9704: 9577: 7121:) is ambiguous, as it was historically also used to specify the 4332: 4324:, contributing to the death of 28 soldiers from the U.S. Army's 3921:/* Enough digits to be sure we get the correct approximation. */ 3049:
whereas a more accurate approximation of the true value of π is
2629: 1337:, in 1914, published an analysis of floating point based on the 8591: 8571: 8409:
Correctly Rounded Binary-Decimal and Decimal-Binary Conversions
8353: 7493:
The Scientist and Engineer's Guide to Digital Signal Processing
5978: 3164:. Must be used with a fallback, as it fails for ~0.5% of cases. 3161: 2774: 2685: 2681: 2673: 2665: 2035: 1330: 1205: 52: 8284:"IEEE vs. Microsoft Binary Format; Rounding Issues (Complete)" 3551:, set if the rounded value is tiny (as specified in IEEE 754) 3468:
An operation can be mathematically undefined, such as ∞/∞, or
397:
point is set just after the most significant (leftmost) digit.
9844: 8792:
Intel 64 and IA-32 Architectures Software Developers' Manuals
7231:
Quaternary (base-4) floating-point arithmetic is used in the
6958: 6410:{\textstyle t_{i+1}={\frac {t_{i}}{{\sqrt {t_{i}^{2}+1}}+1}}} 6325:{\textstyle t_{i+1}={\frac {{\sqrt {t_{i}^{2}+1}}-1}{t_{i}}}} 5713: 3274:
bit need to be carried beyond the precision of the operands.
3203: 3195: 3126:
round up (toward +∞; negative results thus round toward zero)
2730: 2158: 1675: 1450: 375: 277: 9208:"Lecture notes of System Support for Scientific Computation" 7099:
of a floating-point number is sometimes also referred to as
7072:
by some authors is potentially misleading as well. The term
3000:
which is actually 0.100000001490116119384765625 in decimal.
752:{\displaystyle 11001001\ 00001111\ 1101101{\underline {1}}.} 525:
is the precision (the number of digits in the significand),
47:, included floating-point arithmetic (replica on display at 9896: 9429: 9252:
Christiansen, Tom; Torkington, Nathan; et al. (2006).
7761: 7735:, pp. 575–583, Revista de Obras Públicas, 19 November 1914. 7733:
Automática: Complemento de la Teoría de las Máquinas, (pdf)
7288: 4793: indicates correctly rounded floating-point arithmetic 4359:, its value depends on the particular rounding being used. 3007:, represented in binary as an infinite sequence of bits is 2669: 1901:
There is a smallest positive normal floating-point number,
9035:
Floating-Point Arithmetic Besieged by "Business Decisions"
7983:
Using the GNU Compiler Collection, i386 and x86-64 Options
7350:
then full double precision is retained; if long double is
5693:{\displaystyle \delta _{n}\leq \mathrm {E} _{\text{mach}}} 3986:
While floating-point addition and multiplication are both
3144:
Binary-to-decimal conversion with minimal number of digits
548:), and other less common varieties, such as base sixteen ( 9891: 9813: 8052:"ARM Compiler toolchain Compiler Reference, Version 5.03" 7798:"Konrad Zuse's Legacy: The Architecture of the Z1 and Z3" 6127:{\displaystyle \sin ^{2}{\theta }+\cos ^{2}{\theta }=1\,} 5912: 3838:
of 0, as expected (see the continued fraction example of
3520: 2368: 2296: 2284: 2006:{\displaystyle \left(1-B^{-P}\right)\left(B^{U+1}\right)} 1761:
The number of normal floating-point numbers in a system (
1674:
numerical coprocessor; Motorola, which was designing the
1144: 630: 521:
is the significand (ignoring any implied decimal point),
8150:"Technical Introduction to OpenEXR – The half Data Type" 7202:
Octal (base-8) floating-point arithmetic is used in the
4504:
within the normalized range of a floating-point system:
3903:
Squaring it with rounding to the 24-bit precision gives
3479:
of −1 or the inverse sine of 2 (both of which result in
3403:
with a base-2 exponent instead of 10. In languages like
3160:
Grisu3, with a 4× speedup as it removes the use of
1557:
recommended against floating-point numbers for the 1951
698:
above. The next bit, at position 24, is called the
9768:"The pitfalls of verifying floating-point computations" 9142: 9140: 9138: 9136: 8702: 3581:
Fig. 1: resistances in parallel, with total resistance
3202:
format. The fundamental principles are the same in any
3004: 2935: 2643: 634: 9251: 9180:"Why do we need a floating-point arithmetic standard?" 8852: 8847:
PEP 485 -- A Function for testing approximate equality
8579: 8276: 8195:"Converting between Microsoft Binary and IEEE formats" 7466:. Englewood Cliffs, NJ, United States: Prentice-Hall. 6340: 6255: 6211: 5989:
programming languages, and the decimal formats of the
4437: 1678:
around the same time, gave significant input as well.
416:
To derive the value of the floating-point number, the
43:
An early electromechanical programmable computer, the
9772:
ACM Transactions on Programming Languages and Systems
9413:"Bug in zheevd · Issue #43 · Reference-LAPACK/lapack" 9302: 9110:"How Java's floating-point hurts everyone everywhere" 9024: 9022: 8259:"Create your own Version of Microsoft BASIC for 6502" 7145:(1970) as well as various newer IBM machines, in the 6945:– utilizes high precision floating-point computations 6849: 6476: 6424: 6077: 6003: 5664: 5383: 5308: 4642: 4622: 4602: 4510: 4420: 4407:{\displaystyle \mathrm {E} _{\text{mach}}=B^{1-P},\,} 4368: 4149: 3811: 3776: 3749: 3639: 3587: 2606:= 110010010000111111011011 (including the hidden bit) 1949: 1911: 1816: 1561:, arguing that fixed-point arithmetic is preferable. 1536: 1507: 1463: 1371: 1286: 1258: 1238: 774: 720: 643: 609: 464: 103: 27:"Floating point" redirects here. For other uses, see 9133: 8988: 8754: 8752: 7455: 7453: 7354:
then additional, but not full precision is retained.
7283:
once led to a famous error. An early version of the
7257:
Base-65536 floating-point arithmetic is used in the
3277:
Another problem of loss of significance occurs when
3194:
For ease of presentation and understanding, decimal
2757:
requires the same amount of memory (16 bits) as the
9596:
Numerical Recipes - The Art of Scientific Computing
9503:. Monographs on Numerical Analysis (1st ed.). 9467:(1st ed.). Englewood Cliffs, New Jersey, USA: 9098: 8369: 8188: 8186: 6858:of floating-point operations in general means that 2273:(binary64), usually used to represent the "double" 1591:. Thirty-three were later sold commercially as the 508:{\displaystyle {\frac {s}{b^{\,p-1}}}\times b^{e},} 9830:"Microsoft Visual C++ Floating-Point Optimization" 9622:(1997). "Section 4.2: Floating-Point Arithmetic". 9168: 9019: 8998:Writing Scientific Software: A Guide to Good Style 8252: 8250: 7869: 7867: 7244:Base-256 floating-point arithmetic is used in the 7052:by some authors—not to be confused with the 6488: 6463:{\displaystyle \pi \sim 6\times 2^{i}\times t_{i}} 6462: 6409: 6324: 6239: 6126: 6063: 5692: 5646: 5365: 5294: 4628: 4608: 4569: 4469: 4406: 4212: 3873:with which computers generally represent numbers. 3830: 3797: 3762: 3735: 3606: 3411:), or allow overloading of numeric types (such as 2263:(binary32), usually used to represent the "float" 2005: 1924: 1890: 1739: 1545: 1519: 1493: 1383: 1307: 1268: 1244: 1067: 751: 710:, which is not the case here). This bit, which is 683: 621: 507: 185: 9650: 9245: 8995:Oliveira, Suely; Stewart, David E. (2006-09-07). 8870: 8749: 8708: 8347: 8163: 8044: 7939:"An Interview with the Old Man of Floating-Point" 7907:The Origins of Digital Computers: Selected Papers 7450: 3453: 2947:to be rounded to either .55555555 or .55555556). 1568:computer with floating-point hardware was Zuse's 1113:. For binary formats (which uses only the digits 156: 155: 154: 153: 152: 151: 150: 132: 128: 110: 91:of a fixed base. Numbers of this form are called 10401: 9224: 9068: 8920:Accuracy and reliability in scientific computing 8651: 8649: 8647: 8183: 8013: 7527:"Rechnerarithmetik: Fest- und Gleitkommasysteme" 7520: 7518: 4060:. Using 7-digit significand decimal arithmetic: 3909:But the representable number closest to 0.01 is 3307:guarantees this, even in case of underflow when 1501:, and it stops on undefined operations, such as 529:is the base (in our example, this is the number 8994: 8709:Patterson, David A.; Hennessy, John L. (2014). 8247: 7864: 6981:– Standard for Binary Floating-Point Arithmetic 4279:or zero. In these cases precision will be lost. 3877:decimal number 0.1 is represented in binary as 3180: 2926:Representable numbers, conversion and rounding 2233:in addition to the IEEE 754 binary format. The 1189:, which does not appear to be used in practice. 544:) being the most common, followed by base ten ( 392:. The length of the significand determines the 8972:Society for Industrial and Applied Mathematics 8967:Accuracy and Stability of Numerical Algorithms 8956: 8954: 8952: 8950: 8925:Society for Industrial and Applied Mathematics 8910: 8732: 8730: 8314: 8312: 7927: 7895: 7646: 5710:, see Higham 2002 and other references below. 4213:{\displaystyle Q(h)={\frac {f(a+h)-f(a)}{h}}.} 3805:will return +infinity which will give a final 3344: 1696:In 1989, mathematician and computer scientist 412:), which modifies the magnitude of the number. 9860: 9660:Computer Architecture: Concepts and Evolution 9555:Golub, Gene F.; van Loan, Charles F. (1986). 9554: 9395:"Code Gen Options (The GNU Fortran Compiler)" 9196: 8644: 8333: 8192: 7953:ISO/IEC 9899:1999 - Programming languages - C 7515: 7401: 7399: 7397: 7395: 7393: 7391: 7068:are also used by some. The usage of the term 7036: 5099: 4987: 4926: 4814: 4763: 4683: 4485:is the precision of the significand (in base 4333:Machine precision and backward error analysis 4142:of a function the following formula is used: 2194: 2036:IEEE 754: floating point in modern computers 1790:is the precision of the significand (in base 1602:followed in 1954; it introduced the use of a 249:Single-precision floating-point numbers on a 8430: 8142: 7781:Digital Computers, History of Origins, (pdf) 7689: 7566: 7564: 7562: 7560: 3189: 3038: = 1. This has a decimal value of 3034: = 1.10010010000111111011011 with 2327:format, they allow correct decimal rounding. 253:: the green lines mark representable values. 9330: 9314:"Roundoff Degrades an Idealized Cantilever" 8947: 8727: 8599: 8402: 8400: 8309: 8003:"long double (GCC specific) and __float128" 3972:function), the result will be −22877332.0. 3519:some programming language standards (e.g., 3217: 2227:IBM's own hexadecimal floating point format 629:, and so the significand is a string of 24 378:). This digit string is referred to as the 9867: 9853: 9308: 9104: 8503: 8294:. 2006-11-21. Article ID KB35826, Q35826. 8079: 7840: 7786: 7617: 7388: 7367:of the fraction in the second form is the 7087: 7048:of a floating-point number is also called 6182:3253.671 + 3.141276 ----------- 3256.812 5714:Minimizing the effect of accuracy problems 3245:e=5; s=1.235585 (final sum: 123558.5) 2386: 2201: 2187: 1939:There is a largest floating-point number, 1648:hexadecimal floating-point representations 9787: 9493: 9457: 9439: 9293: 8916: 8876: 8678: 8619: 8544: 8527: 8416: 8360: 7995: 7933: 7855: 7652: 7575:(1st ed.). Salt Lake City, UT, USA: 7557: 7524: 7495:. California Technical Pub. p. 514. 7489:"Chapter 28, Fixed versus Floating Point" 7480: 7012:Quadruple-precision floating-point format 6240:{\textstyle t_{0}={\frac {1}{\sqrt {3}}}} 6123: 6060: 4498:in representing any non-zero real number 4403: 3912:0.009999999776482582092285156250 exactly. 3906:0.010000000707805156707763671875 exactly. 3422:Examples of floating-point literals are: 3415:). In these cases, digit strings such as 1232:can often handle irrational numbers like 1208:" arithmetic for the individual integers. 475: 303: 9765: 9531: 9273: 9267: 8655: 8397: 7459: 7410:; Stehlé, Damien; Torres, Serge (2010). 6064:{\displaystyle (x+y)(x-y)=x^{2}-y^{2}\,} 3621:returns a correctly rounded result, and 3576: 2970:= 1100110011001100110011001100110011..., 1744:A floating-point number consists of two 1680: 1421: 1329: 1179:and Peter Turner is a scheme based on a 597:depend on the base and its prime factors 256: 244: 38: 9827: 9369: 8436: 8114: 7901: 7806:IEEE Annals of the History of Computing 7767: 7695: 7405: 4063:a = 1234.567, b = 45.67834, c = 0.0004 2241:still uses Cray floating-point format. 1800:is the smallest exponent of the system, 34:Computer approximation for real numbers 14: 10402: 9681: 9615:(NB. Edition with source code CD-ROM.) 9464:Rounding Errors in Algebraic Processes 9405: 8974:(SIAM). pp. 27–28, 110–123, 493. 8960: 8838: 8804: 8798: 8605: 8512:"Ryū: fast float-to-string conversion" 8485: 8318: 7623: 7023:Single-precision floating-point format 6834:62246 The true value is 6828:62246 28 6822:62246 27 6816:62246 26 6810:68907 25 4492:This is important since it bounds the 4233:grows smaller, the difference between 3894:0.100000001490116119384765625 exactly. 3367:). For a fast, simple method, see the 3338: 3003:As a further example, the real number 1806:is the largest exponent of the system, 1145:Alternatives to floating-point numbers 1121:), this non-zero digit is necessarily 280:, is an important characteristic of a 9848: 9712:Handbook of Floating-Point Arithmetic 9618: 9281:Discrete & Computational Geometry 9202: 9174: 9146: 9074: 9028: 8780: 8758: 8559: 8553:"The Schubfach way to render doubles" 8550: 8509: 8375: 8341:"NVIDIA Hopper Architecture In-Depth" 8256: 8085: 7873: 7846: 7792: 7626:"The Decimal Floating-Point Standard" 7570: 7486: 7413:Handbook of Floating-Point Arithmetic 7251: 7238: 7225: 3889: = 110011001100110011001101 3419:may also be floating-point literals. 2985:When rounded to 24 bits this becomes 1711:Among the x86 innovations are these: 1666:standard once the 32-bit (or 64-bit) 1612:Extensions for Scientific Computation 1187:Tapered floating-point representation 261:Augmented version above showing both 215:refers to the fact that the number's 7970:"IEEE Floating-Point Representation" 7962: 7577:Springer International Publishing AG 7357: 7336: 7323: 7311: 7301: 7273: 7264: 7196: 7135:Hexadecimal (base-16) floating-point 6974:Half-precision floating-point format 6489:{\displaystyle i\rightarrow \infty } 3863: 3255:e=5; s=1.234567 + e=−3; s=9.876543 2644:Other notable floating-point formats 2237:series had an IEEE version, but the 435:notation) as an example, the number 319:is indicated by placing an explicit 9780:Association for Computing Machinery 9387: 9351: 9274:Shewchuk, Jonathan Richard (1997). 9206:(2001-06-04). Bindel, David (ed.). 8406: 7976: 7945: 7534:Friedrich-Schiller-Universität Jena 7525:Zehendner, Eberhard (Summer 2008). 7128: 7122: 3743:. If a short-circuit develops with 458:Symbolically, this final value is: 83:with a fixed precision, called the 24: 9758:"Survey of Floating-Point Formats" 9451: 7909:(3rd ed.). Berlin; New York: 7279:The enormous complexity of modern 6837:3.14159265358979323846264338327... 6483: 6190:may be used to reduce the errors. 5680: 4957: 4554: 4423: 4371: 3979:10 in double precision, or −0.8742 3460:IEEE 754 § Exception handling 3054:3.14159265358979323846264338327950 2628: 1540: 1514: 1480: 1395:will always be the same number of 1175:(LI and SLI) of Charles Clenshaw, 339:, the given number is scaled by a 25: 10426: 9750: 9312:; Ivory, Melody Y. (1997-07-03). 9254:"perlfaq4 / Why is int() broken?" 8608:Software: Practice and Experience 7696:Lazarus, Roger B. (1957-01-30) . 3933:3.1415926535897932384626433832795 3374: 3086: 2591:format has 53, and quad has 113. 1494:{\displaystyle ^{1}/_{\infty }=0} 1445:, the first binary, programmable 708:specific rules for halfway values 9500:The Algebraic Eigenvalue Problem 9326:from the original on 2003-12-05. 9220:from the original on 2013-05-17. 9192:from the original on 2004-12-04. 9164:from the original on 2003-08-15. 9094:from the original on 2013-06-20. 8883:Encyclopedia of Computer Science 8807:"You're Going To Have To Think!" 8805:Harris, Richard (October 2010). 8776:from the original on 2002-06-22. 8481:from the original on 2014-07-29. 8393:from the original on 2004-12-21. 7891:from the original on 2008-09-05. 7081: 7060:. Somewhat vague, terms such as 6864:common subexpression elimination 3361:Booth's multiplication algorithm 2709:, during spring of 1975 for the 1752:On a typical computer system, a 1589:National Physical Laboratory, UK 9694:from the original on 2018-07-03 9656:Brooks, Jr., Frederick Phillips 9625:The Art of Computer Programming 9122:from the original on 2000-08-16 9052:from the original on 2006-03-17 8864:US Government Accounting Office 8298:from the original on 2020-08-28 8265:from the original on 2016-05-30 8217:from the original on 2019-02-20 8103:from the original on 2006-05-25 8068:from the original on 2015-06-27 8033:from the original on 2013-07-31 7828:from the original on 2022-07-03 7773: 7738: 7725: 7710:from the original on 2018-08-07 7653:Parkinson, Roger (2000-12-07). 7636:from the original on 2018-07-03 7543:from the original on 2018-08-07 6985:IBM Floating Point Architecture 6954:Floating-point error mitigation 5915: 3454:Dealing with exceptional cases 2648:In addition to the widely used 2149:IBM floating-point architecture 1740:Range of floating-point numbers 1520:{\displaystyle 0\times \infty } 1399:(e.g. six), the first digit of 554:balanced ternary floating point 230:— such as the number of meters 29:Floating point (disambiguation) 9874: 9561:Johns Hopkins University Press 9152:"Marketing versus Mathematics" 8199:Technical Information Database 7532:(Lecture script) (in German). 7352:IEEE double extended precision 6912:Arbitrary-precision arithmetic 6480: 6151:if (abs(x-y) < epsilon) ... 6137:The use of the equality test ( 6031: 6019: 6016: 6004: 5634: 5615: 5586: 5571: 5552: 5523: 5506: 5487: 5458: 5443: 5424: 5395: 5354: 5339: 5327: 5315: 5282: 5263: 5260: 5241: 5238: 5212: 5206: 5187: 5184: 5165: 5162: 5136: 5123: 5104: 5094: 5075: 5072: 5046: 5040: 5021: 5018: 4992: 4921: 4902: 4899: 4873: 4867: 4848: 4845: 4819: 4788: 4785: 4758: 4732: 4720: 4694: 4665: 4653: 4530: 4524: 4481:is the base of the system and 4310:prevented it from intercepting 4221:Intuitively one would want an 4198: 4192: 4183: 4171: 4159: 4153: 4086:They are also not necessarily 3730: 3661: 3531:specifies that the flags have 3527:outside of the standard (e.g. 3110:round to nearest, ties to even 2759:IEEE 754 half-precision format 1578:decimal floating-point numbers 1417:Electromechanical Arithmometer 1345:In 1914, the Spanish engineer 1302: 1293: 1276:in a completely "formal" way ( 1192:Some simple rational numbers ( 13: 1: 9933:Arbitrary-precision or bignum 8319:Kharya, Paresh (2020-05-14). 8257:Steil, Michael (2008-10-20). 8193:Borland staff (1998-07-02) . 7382: 4414:whereas rounding to nearest, 4326:14th Quartermaster Detachment 2658:Microsoft Binary Format (MBF) 1688:, principal architect of the 563:A floating-point number is a 556:) and even base 256 and base 9766:Monniaux, David (May 2008). 9682:Savard, John J. G. (2018) , 9441:10.1007/978-3-030-25543-5_10 9338:"Auto-Vectorization in LLVM" 9232:"General Decimal Arithmetic" 7659:High Resolution Site Surveys 7624:Savard, John J. G. (2018) , 6172:, if they are to work well. 5737:IEEE 754 quadruple precision 4294: 4045:is not necessarily equal to 4026:), they are not necessarily 3210:denotes the significand and 3181:Decimal-to-binary conversion 1610:" (SC) capability (see also 633:. For instance, the number 431:Using base-10 (the familiar 7: 9834:Microsoft Developer Network 8230:=> sbbb|bbbb m3 is the 7955:. Iso.org. §F.2, note 307. 7751:Springer, pp. 84–85, 2017. 6904: 6854:The aforementioned lack of 4270:Floor and ceiling functions 3897:Squaring this number gives 3345:Multiplication and division 3027:to a precision of 24 bits. 2996:= 110011001100110011001101, 2808:Trailing significand field 2622:An example of a layout for 2254:extendable precision format 2026:positive and negative zeros 1749:wider range to the number. 1546:{\displaystyle \pm \infty } 1308:{\displaystyle \sin(3\pi )} 1269:{\displaystyle {\sqrt {3}}} 424:raised to the power of the 298: 75:that represents subsets of 10: 10431: 9601:Cambridge University Press 9534:Floating-Point Computation 9003:Cambridge University Press 8587:"google/double-conversion" 7731:Torres Quevedo, Leonardo. 7463:Floating-Point Computation 7163:Data General Eclipse S/200 7137:arithmetic is used in the 7125:of floating-point numbers. 7084:of floating-point numbers. 6804:.2245152435345525443 6534:.2153903091734723496 2 6528:.2153903091734710173 6522:.4641016151377543863 1 6516:.4641016151377543863 3457: 3401:hexadecimal literal syntax 3289: = 1.234571 and 2379:and strictly greater than 2250:extended precision formats 2039: 1784:is the base of the system, 1323: 1319: 1161:Logarithmic number systems 550:hexadecimal floating point 291:(FPU, colloquially a math 236:between protons in an atom 26: 10307: 10274:Strongly typed identifier 10216: 10102: 10072: 10037: 9925: 9882: 9725:10.1007/978-3-319-76526-6 9709:; Torres, Serge (2018) . 9532:Sterbenz, Pat H. (1974). 9108:; Darcy, Joseph (2001) . 8866:. GAO report IMTEC 92-26. 8736: 8510:Adams, Ulf (2018-12-02). 8437:Loitsch, Florian (2010). 8124:. openEXR. Archived from 7585:10.1007/978-3-319-64110-2 7487:Smith, Steven W. (1997). 7460:Sterbenz, Pat H. (1974). 7426:10.1007/978-0-8176-4705-6 7014:(including double-double) 6990:Kahan summation algorithm 6897:optimizations is seen in 6506:, first form 6 × 2 × t 6188:Kahan summation algorithm 3840:IEEE 754 design rationale 3190:Floating-point operations 3019:11.0010010000111111011011 2763:IEEE 754 single-precision 2415: 2410: 2405: 2403: 2400: 2397: 2225:. IBM mainframes support 1415:, as was the case of his 404:(also referred to as the 65:floating-point arithmetic 9684:"Floating-Point Formats" 9630:Seminumerical Algorithms 7287:chip was shipped with a 7029: 6943:Experimental mathematics 6875:disable subnormal floats 6850:"Fast math" optimization 6780:349453756585929919 6546:596599420975006733 3 6540:596599420974940120 5920: 5767: 5708:numerical linear algebra 3983:10 in single precision. 3918: 3491:(exponent too small) or 3218:Addition and subtraction 2248:, and others are termed 1905:Underflow level = UFL = 1441:of Berlin completed the 1218:Computer algebra systems 10349:Parametric polymorphism 9798:10.1145/1353445.1353446 9505:Oxford University Press 9157:. pp. 15, 35, 47. 8529:10.1145/3296979.3192369 8454:10.1145/1806596.1806623 7119:excess n representation 7008:for constant resolution 6995:Microsoft Binary Format 6901:, a verified compiler. 6792:00068646912273617 6768:00068646912273617 6756:05434924008406305 6570:27145996453689225 5 6564:27145996453136334 6558:60862151314352708 4 6552:60862151314012979 4362:With rounding to zero, 4342:backward error analysis 4300:On 25 February 1991, a 4105:may not be the same as 3831:{\displaystyle R_{tot}} 3798:{\displaystyle 1/R_{1}} 3607:{\displaystyle R_{tot}} 3399:standard also define a 2978:is the significand and 2747:Microsoft Binary Format 2387:Internal representation 2144:Microsoft Binary Format 1943:Overflow level = OFL = 1692:floating-point standard 1623:UNIVAC 1100/2200 series 1347:Leonardo Torres Quevedo 1335:Leonardo Torres Quevedo 1326:IEEE 754 § History 1131:implicit bit convention 265:of representable values 87:, scaled by an integer 9828:Fleegal, Eric (2004). 9587:Vetterling, William T. 9495:Wilkinson, James Hardy 9459:Wilkinson, James Hardy 8927:(SIAM). pp. 50–. 8917:Einarsson, Bo (2005). 8878:Wilkinson, James Hardy 8738:US patent 3037701A 8407:Gay, David M. (1990). 8232:least significant byte 7248:computer (since 1958). 6949:Fixed-point arithmetic 6933:Decimal floating point 6893:is a notable outlier. 6891:Intel Fortran Compiler 6744:4061547378810956 6606:6101766046906629 8 6600:6101765997805905 6594:6627470568494473 7 6588:6627470548084133 6582:8730499798241950 6 6576:8730499801259536 6490: 6464: 6411: 6326: 6241: 6155:computational geometry 6128: 6065: 5694: 5648: 5367: 5296: 4630: 4610: 4571: 4471: 4408: 4214: 3842:for another example). 3832: 3799: 3764: 3737: 3614: 3608: 3393:C programming language 3214:denotes the exponent. 3073:unit in the last place 3064:and is limited by the 2974:where, as previously, 2941:non-terminating digits 2737:(MITS Altair 680) and 2633: 2527:x86 extended precision 2359:), negative infinity ( 2275:type in the C language 2265:type in the C language 2231:decimal floating point 2007: 1926: 1892: 1729:exceptional conditions 1693: 1608:scientific computation 1593:English Electric DEUCE 1547: 1521: 1495: 1434: 1385: 1342: 1309: 1270: 1246: 1173:level-index arithmetic 1139:assumed bit convention 1127:leading bit convention 1069: 811: 753: 685: 637:'s first 33 bits are: 623: 546:decimal floating point 509: 304:Floating-point numbers 266: 254: 202:decimal floating point 187: 93:floating-point numbers 56: 9342:LLVM 13 documentation 9310:Kahan, William Morton 9204:Kahan, William Morton 9176:Kahan, William Morton 9148:Kahan, William Morton 9106:Kahan, William Morton 9076:Kahan, William Morton 9030:Kahan, William Morton 8962:Higham, Nicholas John 8760:Kahan, William Morton 8680:10.1145/103162.103163 8666:ACM Computing Surveys 8551:Giulietti, Rafaello. 8377:Kahan, William Morton 8228:most significant byte 8087:Kahan, William Morton 7875:Kahan, William Morton 7747:Numbers and Computers 6964:Gal's accurate tables 6720:810075796233302 6491: 6465: 6412: 6327: 6242: 6129: 6066: 5731:mathematics known as 5695: 5649: 5368: 5297: 4631: 4611: 4572: 4472: 4409: 4215: 3833: 3800: 3765: 3763:{\displaystyle R_{1}} 3738: 3609: 3580: 3458:Further information: 3023:when approximated by 2743:TRS-80 Color Computer 2632: 2624:32-bit floating point 2008: 1927: 1925:{\displaystyle B^{L}} 1893: 1700:was honored with the 1684: 1548: 1522: 1496: 1425: 1386: 1333: 1310: 1271: 1247: 1181:generalized logarithm 1135:hidden bit convention 1070: 785: 754: 686: 624: 575:is (145/100)×1000 or 510: 420:is multiplied by the 310:number representation 260: 248: 188: 42: 9636:. pp. 214–264. 9579:Press, William Henry 8890:. pp. 669–674. 8844:Christopher Barker: 8057:. 2013. Section 6.3 7770:, pp. 6, 11–13. 7289:division instruction 7167:Gould Powernode 9080 6696:19358822321783 6630:37487713536668 10 6624:37488171150615 6618:70343215275928 9 6612:70343230776862 6474: 6422: 6338: 6253: 6209: 6177:scientific computing 6170:iterative refinement 6075: 6001: 5728:numerically unstable 5662: 5381: 5306: 4640: 4620: 4600: 4508: 4418: 4366: 4302:loss of significance 4147: 3809: 3774: 3747: 3637: 3585: 3533:thread-local storage 3062:discretization error 1947: 1909: 1814: 1576:, which implemented 1534: 1505: 1461: 1384:{\displaystyle ^{m}} 1369: 1351:Essays on Automatics 1284: 1278:symbolic computation 1256: 1245:{\displaystyle \pi } 1236: 772: 718: 641: 622:{\displaystyle p=24} 607: 462: 101: 10415:Computer arithmetic 10354:Primitive data type 10259:Recursive data type 10112:Algebraic data type 9988:Quadruple precision 9652:Blaauw, Gerrit Anne 9620:Knuth, Donald Ervin 9557:Matrix Computations 9469:Prentice-Hall, Inc. 9359:"FloatingPointMath" 8567:"abolz/Drachennest" 8516:ACM SIGPLAN Notices 8171:"IEEE-754 Analysis" 7796:(April–June 1997). 7744:Ronald T. Kneusel. 7348:IEEE quad precision 7281:division algorithms 7159:Illinois ILLIAC III 7018:Significant figures 6654:7220386148377 12 6648:7256228504127 6642:9273850979885 11 6636:9278733740748 6389: 6294: 3389:scientific notation 3138:interval arithmetic 2795: 2640:layout is similar. 2331:Quadruple precision 2175:Arbitrary precision 1447:mechanical computer 1429:, architect of the 1212:Interval arithmetic 1202:rational arithmetic 337:scientific notation 289:floating-point unit 228:orders of magnitude 221:scientific notation 200:, though base ten ( 10317:Abstract data type 9998:Extended precision 9957:Reduced precision 9591:Flannery, Brian P. 9583:Teukolsky, Saul A. 9295:10.1007/PL00009321 9256:. perldoc.perl.org 7988:2015-01-16 at the 7935:Severance, Charles 7233:Illinois ILLIAC II 6844:significant digits 6732:717412858693 6708:717412858693 6684:717412858693 6672:189011456060 6666:707019992125 13 6660:717412858693 6486: 6460: 6407: 6375: 6322: 6280: 6237: 6200:numerical analysis 6124: 6061: 5741:extended precision 5733:numerical analysis 5690: 5644: 5642: 5363: 5292: 5290: 4626: 4606: 4581:James H. Wilkinson 4567: 4467: 4446: 4404: 4350:. Usually denoted 4210: 3828: 3795: 3760: 3733: 3615: 3604: 3365:Division algorithm 3335:numerical analysis 3270:bit and one extra 2793: 2634: 2229:and IEEE 754-2008 2159:G.711 8-bit floats 2113:Extended precision 2003: 1922: 1888: 1694: 1598:The mass-produced 1543: 1517: 1491: 1435: 1381: 1343: 1305: 1266: 1242: 1105:in this example). 1085:in this example), 1081:is the precision ( 1065: 1063: 749: 744: 681: 667: 619: 505: 267: 255: 204:) is also common. 183: 149: 142: 127: 120: 57: 10397: 10396: 10129:Associative array 9993:Octuple precision 9734:978-3-319-76525-9 9643:978-0-201-89684-8 9610:978-0-521-88407-5 9570:978-0-8018-5413-2 9547:978-0-13-322495-5 9234:. Speleotrove.com 9041:. IEEE-sponsored 9012:978-1-139-45862-7 8981:978-0-89871-521-7 8934:978-0-89871-815-7 8897:978-0-470-86412-8 8288:Microsoft Support 8261:. pagetable.com. 7920:978-3-540-11319-5 7819:10.1109/85.586067 7594:978-3-319-64109-6 7502:978-0-9660176-3-2 7435:978-0-8176-4704-9 7246:Rice Institute R1 7222:(1972) computers. 7076:(as used e.g. by 7006:Q (number format) 6923:Computable number 6690:46593073709 15 6678:78678454728 14 6405: 6396: 6320: 6301: 6235: 6234: 5687: 5589: 5526: 5461: 5398: 5357: 5342: 4973: 4964: 4940: 4939: where  4794: 4777: 4776: where  4629:{\displaystyle y} 4609:{\displaystyle x} 4561: 4543: 4445: 4430: 4378: 4338:Machine precision 4205: 3864:Accuracy problems 3647: 3495:(precision loss). 3449:in C and IEEE 754 3339:Accuracy problems 3323: = −1; 3315: = −1; 3309:gradual underflow 3103:). Historically, 3045:7410125732421875, 2982:is the exponent. 2923: 2922: 2638:64-bit ("double") 2585:subnormal numbers 2581: 2580: 2211: 2210: 2030:subnormal numbers 1359:analytical engine 1339:analytical engine 1264: 1101:is the exponent ( 1093:and finishing at 816: 737: 732: 726: 677: 671: 660: 655: 649: 537:is the exponent. 487: 400:A signed integer 321:"point" character 182: 180: 175: 147: 135: 133: 125: 113: 111: 16:(Redirected from 10422: 10369:Type constructor 10254:Opaque data type 10186:Record or Struct 9983:Double precision 9978:Single precision 9869: 9862: 9855: 9846: 9845: 9841: 9836:. Archived from 9809: 9791: 9761: 9746: 9715:(2nd ed.). 9701: 9700: 9699: 9677: 9662:(1st ed.). 9647: 9632:(3rd ed.). 9614: 9599:(3rd ed.). 9574: 9559:(3rd ed.). 9551: 9528: 9526: 9525: 9489: 9446: 9445: 9443: 9427: 9421: 9420: 9409: 9403: 9402: 9391: 9385: 9384: 9373: 9367: 9366: 9355: 9349: 9348: 9334: 9328: 9327: 9325: 9318: 9306: 9300: 9299: 9297: 9271: 9265: 9264: 9262: 9261: 9249: 9243: 9242: 9240: 9239: 9228: 9222: 9221: 9219: 9212: 9200: 9194: 9193: 9191: 9184: 9172: 9166: 9165: 9163: 9156: 9144: 9131: 9130: 9128: 9127: 9121: 9114: 9102: 9096: 9095: 9093: 9086: 9072: 9066: 9060: 9058: 9057: 9051: 9040: 9026: 9017: 9016: 9005:. pp. 10–. 8992: 8986: 8985: 8984:. 0-89871-355-2. 8970:(2nd ed.). 8958: 8945: 8944: 8942: 8941: 8914: 8908: 8907: 8905: 8904: 8874: 8868: 8867: 8856: 8850: 8842: 8836: 8833: 8828: 8827: 8802: 8796: 8795: 8784: 8778: 8777: 8775: 8768: 8756: 8747: 8746: 8745: 8741: 8734: 8725: 8724: 8720:978-9-86605267-5 8706: 8700: 8692: 8682: 8653: 8642: 8641: 8630:10.1002/spe.2984 8623: 8614:(8): 1700–1727. 8603: 8597: 8596: 8583: 8577: 8576: 8563: 8557: 8556: 8548: 8542: 8541: 8531: 8507: 8501: 8500: 8489: 8483: 8482: 8480: 8463:978-1-45030019-3 8443: 8434: 8428: 8425:dtoa.c in netlab 8422: 8420: 8404: 8395: 8394: 8392: 8385: 8373: 8367: 8366: 8364: 8351: 8345: 8344: 8337: 8331: 8330: 8328: 8327: 8316: 8307: 8306: 8304: 8303: 8280: 8274: 8273: 8271: 8270: 8254: 8245: 8244: 8223: 8222: 8190: 8181: 8180: 8178: 8177: 8167: 8161: 8160: 8158: 8157: 8146: 8140: 8139: 8134: 8133: 8118: 8112: 8111: 8109: 8108: 8102: 8095: 8083: 8077: 8076: 8074: 8073: 8067: 8059:Basic data types 8056: 8048: 8042: 8041: 8039: 8038: 8032: 8025: 8017: 8011: 8010: 7999: 7993: 7980: 7974: 7973: 7966: 7960: 7959: 7949: 7943: 7942: 7931: 7925: 7924: 7899: 7893: 7892: 7890: 7883: 7871: 7862: 7861: 7859: 7844: 7838: 7836: 7834: 7833: 7827: 7802: 7790: 7784: 7779:Randell, Brian. 7777: 7771: 7765: 7759: 7742: 7736: 7729: 7723: 7722: 7716: 7715: 7709: 7702: 7693: 7687: 7685: 7680: 7679: 7672:978-0-20318604-6 7661:(1st ed.). 7650: 7644: 7643: 7642: 7641: 7621: 7615: 7614: 7568: 7555: 7551: 7549: 7548: 7542: 7531: 7522: 7513: 7512: 7510: 7509: 7484: 7478: 7477: 7457: 7448: 7447: 7416:(1st ed.). 7403: 7376: 7361: 7355: 7340: 7334: 7331:Taylor expansion 7327: 7321: 7315: 7309: 7305: 7299: 7295:Pentium FDIV bug 7277: 7271: 7268: 7262: 7261:(1956) computer. 7255: 7249: 7242: 7236: 7229: 7223: 7200: 7194: 7132: 7126: 7091: 7085: 7040: 6938:Double precision 6838: 6832: 6831:3.14159265358979 6826: 6825:3.14159265358979 6820: 6819:3.14159265358979 6814: 6813:3.14159265358979 6808: 6807:3.14159265358979 6802: 6796: 6795:3.14159265358979 6790: 6784: 6778: 6772: 6766: 6760: 6754: 6748: 6742: 6736: 6730: 6726:6065061913 18 6724: 6718: 6714:6566394222 17 6712: 6706: 6702:8571730119 16 6700: 6694: 6688: 6682: 6676: 6670: 6664: 6658: 6652: 6646: 6640: 6634: 6628: 6622: 6616: 6610: 6604: 6598: 6592: 6586: 6580: 6574: 6568: 6562: 6556: 6550: 6544: 6538: 6532: 6526: 6520: 6514: 6495: 6493: 6492: 6487: 6470:, converging as 6469: 6467: 6466: 6461: 6459: 6458: 6446: 6445: 6416: 6414: 6413: 6408: 6406: 6404: 6397: 6388: 6383: 6374: 6371: 6370: 6361: 6356: 6355: 6331: 6329: 6328: 6323: 6321: 6319: 6318: 6309: 6302: 6293: 6288: 6279: 6276: 6271: 6270: 6246: 6244: 6243: 6238: 6236: 6230: 6226: 6221: 6220: 6162:matrix inversion 6152: 6148: 6144: 6140: 6133: 6131: 6130: 6125: 6116: 6108: 6107: 6095: 6087: 6086: 6070: 6068: 6067: 6062: 6059: 6058: 6046: 6045: 5957: 5954: 5951: 5948: 5945: 5942: 5939: 5936: 5933: 5930: 5927: 5924: 5917: 5907: 5904: 5901: 5898: 5895: 5894: 5891: 5888: 5885: 5882: 5879: 5876: 5873: 5870: 5867: 5863: 5860: 5857: 5854: 5851: 5848: 5845: 5842: 5839: 5836: 5833: 5830: 5827: 5824: 5821: 5818: 5815: 5812: 5809: 5808: 5805: 5802: 5799: 5796: 5793: 5789: 5786: 5783: 5780: 5777: 5774: 5771: 5764: 5699: 5697: 5696: 5691: 5689: 5688: 5685: 5683: 5674: 5673: 5653: 5651: 5650: 5645: 5643: 5633: 5632: 5614: 5613: 5597: 5596: 5591: 5590: 5582: 5570: 5569: 5551: 5550: 5534: 5533: 5528: 5527: 5519: 5505: 5504: 5486: 5485: 5469: 5468: 5463: 5462: 5454: 5442: 5441: 5423: 5422: 5406: 5405: 5400: 5399: 5391: 5372: 5370: 5369: 5364: 5359: 5358: 5350: 5344: 5343: 5335: 5301: 5299: 5298: 5293: 5291: 5281: 5280: 5259: 5258: 5237: 5236: 5224: 5223: 5205: 5204: 5183: 5182: 5161: 5160: 5148: 5147: 5129: 5122: 5121: 5103: 5102: 5093: 5092: 5071: 5070: 5058: 5057: 5039: 5038: 5017: 5016: 5004: 5003: 4991: 4990: 4978: 4974: 4972: from above 4971: 4966: 4965: 4962: 4960: 4951: 4950: 4941: 4938: 4935: 4930: 4929: 4920: 4919: 4898: 4897: 4885: 4884: 4866: 4865: 4844: 4843: 4831: 4830: 4818: 4817: 4799: 4795: 4792: 4778: 4775: 4772: 4767: 4766: 4757: 4756: 4744: 4743: 4719: 4718: 4706: 4705: 4687: 4686: 4635: 4633: 4632: 4627: 4615: 4613: 4612: 4607: 4591:condition number 4576: 4574: 4573: 4568: 4563: 4562: 4559: 4557: 4548: 4544: 4539: 4516: 4503: 4476: 4474: 4473: 4468: 4463: 4462: 4447: 4438: 4432: 4431: 4428: 4426: 4413: 4411: 4410: 4405: 4399: 4398: 4380: 4379: 4376: 4374: 4358: 4308:missile battery 4277:subnormal number 4264: 4258: 4247: 4232: 4226: 4219: 4217: 4216: 4211: 4206: 4201: 4166: 4122: 4104: 4059: 4044: 4025: 4007: 3982: 3978: 3971: 3964: 3961: 3958: 3955: 3952: 3949: 3946: 3943: 3940: 3937: 3934: 3931: 3928: 3925: 3922: 3890: 3883: 3837: 3835: 3834: 3829: 3827: 3826: 3804: 3802: 3801: 3796: 3794: 3793: 3784: 3769: 3767: 3766: 3761: 3759: 3758: 3742: 3740: 3739: 3734: 3729: 3728: 3719: 3702: 3701: 3692: 3681: 3680: 3671: 3660: 3649: 3648: 3645: 3613: 3611: 3610: 3605: 3603: 3602: 3470:division by zero 3448: 3443: 3438: 3433: 3428: 3418: 3386: 3382: 3293: = 5; 3285: = 5; 3093:correct rounding 2946: 2932:rational numbers 2906:Single-precision 2796: 2792: 2767:machine learning 2715:MITS Altair 8800 2711:MITS Altair 8800 2705:, a dormmate of 2598:sign = 0 ; 2395: 2394: 2382: 2378: 2362: 2358: 2271:Double precision 2261:Single precision 2203: 2196: 2189: 2046: 2045: 2012: 2010: 2009: 2004: 2002: 1998: 1997: 1978: 1974: 1973: 1972: 1931: 1929: 1928: 1923: 1921: 1920: 1897: 1895: 1894: 1889: 1887: 1883: 1862: 1858: 1857: 1838: 1834: 1755:double-precision 1636:Double precision 1630:Single precision 1552: 1550: 1549: 1544: 1526: 1524: 1523: 1518: 1500: 1498: 1497: 1492: 1484: 1483: 1478: 1472: 1471: 1390: 1388: 1387: 1382: 1380: 1379: 1314: 1312: 1311: 1306: 1275: 1273: 1272: 1267: 1265: 1260: 1251: 1249: 1248: 1243: 1124: 1120: 1116: 1104: 1100: 1096: 1092: 1088: 1084: 1080: 1074: 1072: 1071: 1066: 1064: 1055: 1035: 1026: 1025: 1013: 1009: 1008: 1007: 980: 979: 958: 957: 936: 935: 914: 913: 892: 891: 866: 857: 856: 844: 840: 839: 838: 823: 822: 817: 814: 810: 799: 778: 767: 758: 756: 755: 750: 745: 730: 724: 713: 697: 690: 688: 687: 682: 675: 669: 668: 653: 647: 628: 626: 625: 620: 590: 588: 582: 578: 574: 572: 559: 536: 528: 524: 520: 514: 512: 511: 506: 501: 500: 488: 486: 485: 466: 454: 450: 448: 442: 438: 362: 360: 354: 232:between galaxies 192: 190: 189: 184: 181: 178: 176: 171: 170: 162: 159: 157: 148: 145: 143: 126: 123: 121: 49:Deutsches Museum 21: 10430: 10429: 10425: 10424: 10423: 10421: 10420: 10419: 10400: 10399: 10398: 10393: 10374:Type conversion 10309: 10303: 10239:Enumerated type 10212: 10098: 10092:null-terminated 10068: 10033: 9921: 9878: 9873: 9756: 9753: 9735: 9707:Revol, Nathalie 9697: 9695: 9674: 9644: 9611: 9571: 9548: 9523: 9521: 9519: 9509:Clarendon Press 9478: 9454: 9452:Further reading 9449: 9428: 9424: 9411: 9410: 9406: 9393: 9392: 9388: 9375: 9374: 9370: 9357: 9356: 9352: 9336: 9335: 9331: 9323: 9316: 9307: 9303: 9272: 9268: 9259: 9257: 9250: 9246: 9237: 9235: 9230: 9229: 9225: 9217: 9210: 9201: 9197: 9189: 9182: 9173: 9169: 9161: 9154: 9145: 9134: 9125: 9123: 9119: 9112: 9103: 9099: 9091: 9084: 9073: 9069: 9063:double extended 9055: 9053: 9049: 9038: 9027: 9020: 9013: 8993: 8989: 8982: 8959: 8948: 8939: 8937: 8935: 8915: 8911: 8902: 8900: 8898: 8875: 8871: 8858: 8857: 8853: 8843: 8839: 8825: 8823: 8803: 8799: 8786: 8785: 8781: 8773: 8766: 8757: 8750: 8743: 8735: 8728: 8721: 8707: 8703: 8657:Goldberg, David 8654: 8645: 8604: 8600: 8585: 8584: 8580: 8565: 8564: 8560: 8549: 8545: 8508: 8504: 8491: 8490: 8486: 8478: 8464: 8441: 8435: 8431: 8405: 8398: 8390: 8383: 8374: 8370: 8352: 8348: 8339: 8338: 8334: 8325: 8323: 8317: 8310: 8301: 8299: 8282: 8281: 8277: 8268: 8266: 8255: 8248: 8220: 8218: 8203:Embarcadero USA 8201:(TI1431C.txt). 8191: 8184: 8175: 8173: 8169: 8168: 8164: 8155: 8153: 8148: 8147: 8143: 8131: 8129: 8120: 8119: 8115: 8106: 8104: 8100: 8093: 8084: 8080: 8071: 8069: 8065: 8054: 8050: 8049: 8045: 8036: 8034: 8030: 8023: 8019: 8018: 8014: 8001: 8000: 7996: 7990:Wayback Machine 7981: 7977: 7968: 7967: 7963: 7951: 7950: 7946: 7932: 7928: 7921: 7913:. p. 244. 7911:Springer-Verlag 7905:, ed. (1982) . 7900: 7896: 7888: 7881: 7872: 7865: 7845: 7841: 7831: 7829: 7825: 7800: 7791: 7787: 7778: 7774: 7766: 7762: 7743: 7739: 7730: 7726: 7713: 7711: 7707: 7700: 7694: 7690: 7677: 7675: 7673: 7651: 7647: 7639: 7637: 7622: 7618: 7595: 7579:. p. 948. 7569: 7558: 7546: 7544: 7540: 7529: 7523: 7516: 7507: 7505: 7503: 7485: 7481: 7474: 7458: 7451: 7436: 7408:Revol, Nathalie 7404: 7389: 7385: 7380: 7379: 7362: 7358: 7341: 7337: 7328: 7324: 7316: 7312: 7306: 7302: 7278: 7274: 7269: 7265: 7256: 7252: 7243: 7239: 7230: 7226: 7220:Burroughs B7700 7216:Burroughs B6700 7212:Burroughs B5700 7208:Burroughs B5500 7201: 7197: 7181:as well as the 7133: 7129: 7110:biased exponent 7092: 7088: 7041: 7037: 7032: 7027: 6907: 6852: 6840: 6836: 6830: 6824: 6818: 6812: 6806: 6800: 6794: 6788: 6782: 6776: 6770: 6764: 6762:900560168 21 6758: 6752: 6750:908393901 20 6746: 6740: 6738:939728836 19 6734: 6728: 6722: 6716: 6710: 6704: 6698: 6692: 6686: 6680: 6674: 6668: 6662: 6656: 6650: 6644: 6638: 6632: 6626: 6620: 6614: 6608: 6602: 6596: 6590: 6584: 6578: 6572: 6566: 6560: 6554: 6548: 6542: 6536: 6530: 6524: 6518: 6512: 6509: 6505: 6475: 6472: 6471: 6454: 6450: 6441: 6437: 6423: 6420: 6419: 6384: 6379: 6373: 6372: 6366: 6362: 6360: 6345: 6341: 6339: 6336: 6335: 6314: 6310: 6289: 6284: 6278: 6277: 6275: 6260: 6256: 6254: 6251: 6250: 6225: 6216: 6212: 6210: 6207: 6206: 6183: 6150: 6146: 6142: 6138: 6112: 6103: 6099: 6091: 6082: 6078: 6076: 6073: 6072: 6054: 6050: 6041: 6037: 6002: 5999: 5998: 5959: 5958: 5955: 5952: 5949: 5946: 5943: 5940: 5937: 5934: 5931: 5928: 5925: 5922: 5909: 5908: 5905: 5902: 5899: 5896: 5892: 5889: 5886: 5883: 5880: 5877: 5874: 5871: 5868: 5865: 5864: 5861: 5858: 5855: 5852: 5849: 5846: 5843: 5840: 5837: 5834: 5831: 5828: 5825: 5822: 5819: 5816: 5813: 5810: 5806: 5803: 5800: 5797: 5794: 5791: 5790: 5787: 5784: 5781: 5778: 5775: 5772: 5769: 5762: 5758: 5754: 5750: 5747: 5724:ill-conditioned 5716: 5705: 5684: 5679: 5678: 5669: 5665: 5663: 5660: 5659: 5641: 5640: 5628: 5624: 5609: 5605: 5598: 5592: 5581: 5580: 5579: 5577: 5565: 5561: 5546: 5542: 5535: 5529: 5518: 5517: 5516: 5513: 5512: 5500: 5496: 5481: 5477: 5470: 5464: 5453: 5452: 5451: 5449: 5437: 5433: 5418: 5414: 5407: 5401: 5390: 5389: 5388: 5384: 5382: 5379: 5378: 5349: 5348: 5334: 5333: 5307: 5304: 5303: 5289: 5288: 5276: 5272: 5254: 5250: 5232: 5228: 5219: 5215: 5200: 5196: 5178: 5174: 5156: 5152: 5143: 5139: 5127: 5126: 5117: 5113: 5098: 5097: 5088: 5084: 5066: 5062: 5053: 5049: 5034: 5030: 5012: 5008: 4999: 4995: 4986: 4985: 4976: 4975: 4970: 4961: 4956: 4955: 4946: 4942: 4937: 4934: 4925: 4924: 4915: 4911: 4893: 4889: 4880: 4876: 4861: 4857: 4839: 4835: 4826: 4822: 4813: 4812: 4797: 4796: 4791: 4774: 4771: 4762: 4761: 4752: 4748: 4739: 4735: 4714: 4710: 4701: 4697: 4682: 4681: 4668: 4643: 4641: 4638: 4637: 4621: 4618: 4617: 4601: 4598: 4597: 4586:backward stable 4558: 4553: 4552: 4517: 4515: 4511: 4509: 4506: 4505: 4502: 4499: 4452: 4448: 4436: 4427: 4422: 4421: 4419: 4416: 4415: 4388: 4384: 4375: 4370: 4369: 4367: 4364: 4363: 4357: 4354: 4351: 4347:machine epsilon 4335: 4306:MIM-104 Patriot 4297: 4292: 4263: 4260: 4256: 4252: 4249: 4245: 4241: 4237: 4234: 4231: 4228: 4225: 4222: 4167: 4165: 4148: 4145: 4144: 4128: 4121: 4117: 4113: 4109: 4106: 4103: 4099: 4095: 4091: 4084: 4077: 4074: 4067: 4064: 4057: 4053: 4049: 4046: 4043: 4039: 4035: 4031: 4024: 4020: 4016: 4012: 4009: 4006: 4002: 3998: 3994: 3991: 3980: 3976: 3969: 3966: 3965: 3962: 3959: 3956: 3953: 3950: 3947: 3944: 3941: 3938: 3935: 3932: 3929: 3926: 3923: 3920: 3913: 3907: 3901: 3895: 3888: 3885: 3882: = −4 3881: 3878: 3866: 3816: 3812: 3810: 3807: 3806: 3789: 3785: 3780: 3775: 3772: 3771: 3754: 3750: 3748: 3745: 3744: 3724: 3720: 3715: 3697: 3693: 3688: 3676: 3672: 3667: 3656: 3644: 3640: 3638: 3635: 3634: 3592: 3588: 3586: 3583: 3582: 3493:denormalization 3481:complex numbers 3462: 3456: 3447:0x1.fffffep+127 3446: 3441: 3436: 3431: 3426: 3416: 3384: 3380: 3377: 3353: 3347: 3301: 3259: 3256: 3250:round-off error 3246: 3240: 3237: 3231: 3227: 3220: 3192: 3183: 3153:David M. Gay's 3146: 3089: 3082: 3078: 3066:machine epsilon 2944: 2928: 2888:TensorFloat-32 2755:Bfloat16 format 2666:TRS-80 LEVEL II 2646: 2418:decimal digits 2417: 2412: 2407: 2389: 2380: 2376: 2360: 2356: 2310:round-off error 2281:Double extended 2223:revised in 2008 2207: 2154:PMBus Linear-11 2044: 2038: 1987: 1983: 1979: 1965: 1961: 1954: 1950: 1948: 1945: 1944: 1916: 1912: 1910: 1907: 1906: 1867: 1863: 1847: 1843: 1839: 1824: 1820: 1815: 1812: 1811: 1742: 1727:The ability of 1616:general-purpose 1604:biased exponent 1535: 1532: 1531: 1506: 1503: 1502: 1479: 1474: 1473: 1467: 1464: 1462: 1459: 1458: 1375: 1372: 1370: 1367: 1366: 1355:Charles Babbage 1328: 1322: 1285: 1282: 1281: 1259: 1257: 1254: 1253: 1237: 1234: 1233: 1183:representation. 1147: 1122: 1118: 1114: 1102: 1098: 1094: 1090: 1086: 1082: 1078: 1062: 1061: 1056: 1054: 1048: 1047: 1036: 1034: 1028: 1027: 1021: 1017: 1000: 996: 972: 968: 950: 946: 928: 924: 906: 902: 884: 880: 873: 869: 867: 865: 859: 858: 852: 848: 831: 827: 818: 813: 812: 800: 789: 784: 780: 775: 773: 770: 769: 765: 736: 719: 716: 715: 711: 695: 659: 642: 639: 638: 608: 605: 604: 586: 584: 580: 576: 570: 568: 565:rational number 557: 534: 526: 522: 518: 496: 492: 474: 470: 465: 463: 460: 459: 452: 446: 444: 440: 436: 358: 356: 352: 306: 301: 282:computer system 177: 163: 161: 160: 158: 144: 134: 122: 112: 102: 99: 98: 35: 32: 23: 22: 15: 12: 11: 5: 10428: 10418: 10417: 10412: 10410:Floating point 10395: 10394: 10392: 10391: 10386: 10381: 10376: 10371: 10366: 10361: 10356: 10351: 10346: 10345: 10344: 10334: 10329: 10327:Data structure 10324: 10319: 10313: 10311: 10305: 10304: 10302: 10301: 10296: 10291: 10286: 10281: 10276: 10271: 10266: 10261: 10256: 10251: 10246: 10241: 10236: 10231: 10226: 10220: 10218: 10214: 10213: 10211: 10210: 10209: 10208: 10198: 10193: 10188: 10183: 10178: 10173: 10172: 10171: 10161: 10156: 10151: 10146: 10141: 10136: 10131: 10126: 10121: 10120: 10119: 10108: 10106: 10100: 10099: 10097: 10096: 10095: 10094: 10084: 10078: 10076: 10070: 10069: 10067: 10066: 10061: 10060: 10059: 10054: 10043: 10041: 10035: 10034: 10032: 10031: 10026: 10021: 10020: 10019: 10009: 10008: 10007: 10006: 10005: 9995: 9990: 9985: 9980: 9975: 9974: 9973: 9968: 9966:Half precision 9963: 9953:Floating point 9950: 9945: 9940: 9935: 9929: 9927: 9923: 9922: 9920: 9919: 9914: 9909: 9904: 9899: 9894: 9888: 9886: 9880: 9879: 9872: 9871: 9864: 9857: 9849: 9843: 9842: 9840:on 2017-07-06. 9825: 9811: 9763: 9752: 9751:External links 9749: 9748: 9747: 9733: 9702: 9679: 9672: 9664:Addison-Wesley 9648: 9642: 9634:Addison-Wesley 9616: 9609: 9575: 9569: 9552: 9546: 9529: 9517: 9491: 9476: 9453: 9450: 9448: 9447: 9422: 9404: 9386: 9368: 9350: 9329: 9301: 9288:(3): 305–363. 9266: 9244: 9223: 9195: 9185:. p. 26. 9178:(1981-02-12). 9167: 9150:(2000-08-27). 9132: 9097: 9078:(2011-08-03). 9067: 9032:(2005-07-15). 9018: 9011: 8987: 8980: 8946: 8933: 8909: 8896: 8869: 8851: 8837: 8797: 8794:. Vol. 1. 8779: 8762:(1997-10-01). 8748: 8726: 8719: 8701: 8659:(March 1991). 8643: 8598: 8578: 8558: 8543: 8522:(4): 270–282. 8502: 8484: 8462: 8429: 8418:10.1.1.31.4049 8396: 8379:(2006-01-11). 8368: 8346: 8332: 8308: 8275: 8246: 8182: 8162: 8141: 8113: 8089:(2004-11-20). 8078: 8043: 8026:. 2013-05-22. 8012: 7994: 7975: 7961: 7944: 7937:(1998-02-20). 7926: 7919: 7903:Randell, Brian 7894: 7877:(1997-07-15). 7863: 7839: 7785: 7772: 7760: 7757:978-3319505084 7737: 7724: 7688: 7671: 7665:. p. 24. 7645: 7616: 7593: 7556: 7514: 7501: 7479: 7472: 7449: 7434: 7386: 7384: 7381: 7378: 7377: 7356: 7335: 7322: 7310: 7300: 7272: 7263: 7250: 7237: 7224: 7204:Ferranti Atlas 7195: 7175:SEL Systems 85 7171:Interdata 8/32 7151:Manchester MU5 7147:RCA Spectra 70 7139:IBM System 360 7127: 7105:characteristic 7086: 7074:characteristic 7034: 7033: 7031: 7028: 7026: 7025: 7020: 7015: 7009: 7003: 6998: 6992: 6987: 6982: 6976: 6971: 6966: 6961: 6956: 6951: 6946: 6940: 6935: 6930: 6925: 6920: 6914: 6908: 6906: 6903: 6851: 6848: 6783:3.141592653589 6771:3.141592653589 6507: 6503: 6501: 6497: 6496: 6485: 6482: 6479: 6457: 6453: 6449: 6444: 6440: 6436: 6433: 6430: 6427: 6417: 6403: 6400: 6395: 6392: 6387: 6382: 6378: 6369: 6365: 6359: 6354: 6351: 6348: 6344: 6332: 6317: 6313: 6308: 6305: 6300: 6297: 6292: 6287: 6283: 6274: 6269: 6266: 6263: 6259: 6247: 6233: 6229: 6224: 6219: 6215: 6181: 6122: 6119: 6115: 6111: 6106: 6102: 6098: 6094: 6090: 6085: 6081: 6057: 6053: 6049: 6044: 6040: 6036: 6033: 6030: 6027: 6024: 6021: 6018: 6015: 6012: 6009: 6006: 5921: 5768: 5760: 5756: 5752: 5748: 5715: 5712: 5703: 5682: 5677: 5672: 5668: 5639: 5636: 5631: 5627: 5623: 5620: 5617: 5612: 5608: 5604: 5601: 5599: 5595: 5588: 5585: 5578: 5576: 5573: 5568: 5564: 5560: 5557: 5554: 5549: 5545: 5541: 5538: 5536: 5532: 5525: 5522: 5515: 5514: 5511: 5508: 5503: 5499: 5495: 5492: 5489: 5484: 5480: 5476: 5473: 5471: 5467: 5460: 5457: 5450: 5448: 5445: 5440: 5436: 5432: 5429: 5426: 5421: 5417: 5413: 5410: 5408: 5404: 5397: 5394: 5387: 5386: 5362: 5356: 5353: 5347: 5341: 5338: 5332: 5329: 5326: 5323: 5320: 5317: 5314: 5311: 5287: 5284: 5279: 5275: 5271: 5268: 5265: 5262: 5257: 5253: 5249: 5246: 5243: 5240: 5235: 5231: 5227: 5222: 5218: 5214: 5211: 5208: 5203: 5199: 5195: 5192: 5189: 5186: 5181: 5177: 5173: 5170: 5167: 5164: 5159: 5155: 5151: 5146: 5142: 5138: 5135: 5132: 5130: 5128: 5125: 5120: 5116: 5112: 5109: 5106: 5101: 5096: 5091: 5087: 5083: 5080: 5077: 5074: 5069: 5065: 5061: 5056: 5052: 5048: 5045: 5042: 5037: 5033: 5029: 5026: 5023: 5020: 5015: 5011: 5007: 5002: 4998: 4994: 4989: 4984: 4981: 4979: 4977: 4969: 4959: 4954: 4949: 4945: 4936: 4933: 4928: 4923: 4918: 4914: 4910: 4907: 4904: 4901: 4896: 4892: 4888: 4883: 4879: 4875: 4872: 4869: 4864: 4860: 4856: 4853: 4850: 4847: 4842: 4838: 4834: 4829: 4825: 4821: 4816: 4811: 4808: 4805: 4802: 4800: 4798: 4790: 4787: 4784: 4781: 4773: 4770: 4765: 4760: 4755: 4751: 4747: 4742: 4738: 4734: 4731: 4728: 4725: 4722: 4717: 4713: 4709: 4704: 4700: 4696: 4693: 4690: 4685: 4680: 4677: 4674: 4671: 4669: 4667: 4664: 4661: 4658: 4655: 4652: 4649: 4646: 4645: 4625: 4605: 4566: 4556: 4551: 4547: 4542: 4538: 4535: 4532: 4529: 4526: 4523: 4520: 4514: 4500: 4495:relative error 4466: 4461: 4458: 4455: 4451: 4444: 4441: 4435: 4425: 4402: 4397: 4394: 4391: 4387: 4383: 4373: 4355: 4352: 4334: 4331: 4330: 4329: 4296: 4293: 4291: 4290: 4287: 4280: 4273: 4266: 4261: 4254: 4250: 4243: 4239: 4235: 4229: 4223: 4209: 4204: 4200: 4197: 4194: 4191: 4188: 4185: 4182: 4179: 4176: 4173: 4170: 4164: 4161: 4158: 4155: 4152: 4132: 4125: 4119: 4115: 4111: 4107: 4101: 4097: 4093: 4083:← a + (b + c) 4078: 4075: 4073:← (a + b) + c 4068: 4065: 4062: 4055: 4051: 4047: 4041: 4037: 4033: 4022: 4018: 4014: 4010: 4004: 4000: 3996: 3992: 3919: 3911: 3905: 3899: 3893: 3886: 3879: 3865: 3862: 3825: 3822: 3819: 3815: 3792: 3788: 3783: 3779: 3757: 3753: 3732: 3727: 3723: 3718: 3714: 3711: 3708: 3705: 3700: 3696: 3691: 3687: 3684: 3679: 3675: 3670: 3666: 3663: 3659: 3655: 3652: 3643: 3627:divide-by-zero 3601: 3598: 3595: 3591: 3575: 3574: 3568: 3565:divide-by-zero 3562: 3556: 3546: 3497: 3496: 3484: 3473: 3455: 3452: 3451: 3450: 3444: 3439: 3434: 3429: 3376: 3375:Literal syntax 3373: 3351: 3346: 3343: 3305:Sterbenz lemma 3299: 3279:approximations 3257: 3254: 3244: 3238: 3235: 3228: 3225: 3219: 3216: 3191: 3188: 3182: 3179: 3175: 3174: 3171: 3168: 3165: 3158: 3145: 3142: 3134: 3133: 3130: 3127: 3124: 3121: 3101:rounding modes 3088: 3087:Rounding modes 3085: 3080: 3079:and 1.45a70c24 3076: 3058: 3057: 3047: 3046: 3021: 3020: 3013: 3012: 2998: 2997: 2972: 2971: 2927: 2924: 2921: 2920: 2917: 2914: 2911: 2908: 2902: 2901: 2898: 2895: 2892: 2889: 2885: 2884: 2881: 2878: 2875: 2872: 2866: 2865: 2862: 2859: 2856: 2853: 2851:Half-precision 2847: 2846: 2843: 2840: 2837: 2834: 2830: 2829: 2826: 2823: 2820: 2817: 2813: 2812: 2809: 2806: 2803: 2800: 2791: 2790: 2783: 2770: 2751: 2703:Monte Davidoff 2645: 2642: 2620: 2619: 2608: 2607: 2579: 2578: 2575: 2572: 2569: 2566: 2563: 2560: 2557: 2551: 2550: 2547: 2544: 2541: 2538: 2535: 2532: 2529: 2523: 2522: 2519: 2516: 2513: 2510: 2507: 2504: 2501: 2495: 2494: 2491: 2488: 2485: 2482: 2479: 2476: 2473: 2467: 2466: 2463: 2460: 2457: 2454: 2451: 2448: 2445: 2434: 2433: 2430: 2427: 2424: 2420: 2419: 2414: 2409: 2404: 2402: 2399: 2388: 2385: 2345: 2344: 2337:Half precision 2334: 2328: 2306: 2305: 2278: 2268: 2209: 2208: 2206: 2205: 2198: 2191: 2183: 2180: 2179: 2178: 2177: 2169: 2168: 2164: 2163: 2162: 2161: 2156: 2151: 2146: 2141: 2139:TensorFloat-32 2136: 2131: 2123: 2122: 2118: 2117: 2116: 2115: 2110: 2103: 2093: 2083: 2073: 2063: 2062: 2056: 2055: 2050:Floating-point 2040:Main article: 2037: 2034: 2015: 2014: 2001: 1996: 1993: 1990: 1986: 1982: 1977: 1971: 1968: 1964: 1960: 1957: 1953: 1934: 1933: 1919: 1915: 1886: 1882: 1879: 1876: 1873: 1870: 1866: 1861: 1856: 1853: 1850: 1846: 1842: 1837: 1833: 1830: 1827: 1823: 1819: 1808: 1807: 1801: 1795: 1785: 1741: 1738: 1737: 1736: 1733:divide by zero 1725: 1721: 1656:z/Architecture 1640: 1639: 1633: 1542: 1539: 1516: 1513: 1510: 1490: 1487: 1482: 1477: 1470: 1466: 1378: 1374: 1321: 1318: 1317: 1316: 1304: 1301: 1298: 1295: 1292: 1289: 1263: 1241: 1215: 1209: 1190: 1184: 1158: 1146: 1143: 1060: 1057: 1053: 1050: 1049: 1046: 1043: 1040: 1037: 1033: 1030: 1029: 1024: 1020: 1016: 1012: 1006: 1003: 999: 995: 992: 989: 986: 983: 978: 975: 971: 967: 964: 961: 956: 953: 949: 945: 942: 939: 934: 931: 927: 923: 920: 917: 912: 909: 905: 901: 898: 895: 890: 887: 883: 879: 876: 872: 868: 864: 861: 860: 855: 851: 847: 843: 837: 834: 830: 826: 821: 809: 806: 803: 798: 795: 792: 788: 783: 779: 777: 748: 743: 740: 735: 729: 723: 680: 674: 666: 663: 658: 652: 646: 618: 615: 612: 504: 499: 495: 491: 484: 481: 478: 473: 469: 414: 413: 406:characteristic 398: 305: 302: 300: 297: 213:floating point 174: 169: 166: 141: 138: 131: 119: 116: 109: 106: 33: 18:Floating-point 9: 6: 4: 3: 2: 10427: 10416: 10413: 10411: 10408: 10407: 10405: 10390: 10387: 10385: 10382: 10380: 10377: 10375: 10372: 10370: 10367: 10365: 10362: 10360: 10357: 10355: 10352: 10350: 10347: 10343: 10340: 10339: 10338: 10335: 10333: 10330: 10328: 10325: 10323: 10320: 10318: 10315: 10314: 10312: 10306: 10300: 10297: 10295: 10292: 10290: 10287: 10285: 10282: 10280: 10277: 10275: 10272: 10270: 10267: 10265: 10262: 10260: 10257: 10255: 10252: 10250: 10249:Function type 10247: 10245: 10242: 10240: 10237: 10235: 10232: 10230: 10227: 10225: 10222: 10221: 10219: 10215: 10207: 10204: 10203: 10202: 10199: 10197: 10194: 10192: 10189: 10187: 10184: 10182: 10179: 10177: 10174: 10170: 10167: 10166: 10165: 10162: 10160: 10157: 10155: 10152: 10150: 10147: 10145: 10142: 10140: 10137: 10135: 10132: 10130: 10127: 10125: 10122: 10118: 10115: 10114: 10113: 10110: 10109: 10107: 10105: 10101: 10093: 10090: 10089: 10088: 10085: 10083: 10080: 10079: 10077: 10075: 10071: 10065: 10062: 10058: 10055: 10053: 10050: 10049: 10048: 10045: 10044: 10042: 10040: 10036: 10030: 10027: 10025: 10022: 10018: 10015: 10014: 10013: 10010: 10004: 10001: 10000: 9999: 9996: 9994: 9991: 9989: 9986: 9984: 9981: 9979: 9976: 9972: 9969: 9967: 9964: 9962: 9959: 9958: 9956: 9955: 9954: 9951: 9949: 9946: 9944: 9941: 9939: 9936: 9934: 9931: 9930: 9928: 9924: 9918: 9915: 9913: 9910: 9908: 9905: 9903: 9900: 9898: 9895: 9893: 9890: 9889: 9887: 9885: 9884:Uninterpreted 9881: 9877: 9870: 9865: 9863: 9858: 9856: 9851: 9850: 9847: 9839: 9835: 9831: 9826: 9823: 9819: 9815: 9812: 9807: 9803: 9799: 9795: 9790: 9785: 9781: 9777: 9773: 9769: 9764: 9759: 9755: 9754: 9744: 9740: 9736: 9730: 9726: 9722: 9718: 9714: 9713: 9708: 9703: 9693: 9689: 9685: 9680: 9675: 9673:0-201-10557-8 9669: 9665: 9661: 9657: 9653: 9649: 9645: 9639: 9635: 9631: 9627: 9626: 9621: 9617: 9612: 9606: 9602: 9598: 9597: 9592: 9588: 9584: 9580: 9576: 9572: 9566: 9562: 9558: 9553: 9549: 9543: 9539: 9538:Prentice Hall 9535: 9530: 9520: 9518:9780198534037 9514: 9510: 9506: 9502: 9501: 9496: 9492: 9487: 9483: 9479: 9477:9780486679990 9473: 9470: 9466: 9465: 9460: 9456: 9455: 9442: 9437: 9433: 9426: 9418: 9414: 9408: 9400: 9396: 9390: 9382: 9378: 9372: 9364: 9360: 9354: 9347: 9343: 9339: 9333: 9322: 9315: 9311: 9305: 9296: 9291: 9287: 9283: 9282: 9277: 9270: 9255: 9248: 9233: 9227: 9216: 9209: 9205: 9199: 9188: 9181: 9177: 9171: 9160: 9153: 9149: 9143: 9141: 9139: 9137: 9118: 9111: 9107: 9101: 9090: 9083: 9082: 9077: 9071: 9064: 9048: 9044: 9037: 9036: 9031: 9025: 9023: 9014: 9008: 9004: 9000: 8999: 8991: 8983: 8977: 8973: 8969: 8968: 8963: 8957: 8955: 8953: 8951: 8936: 8930: 8926: 8922: 8921: 8913: 8899: 8893: 8889: 8885: 8884: 8879: 8873: 8865: 8861: 8855: 8849: 8848: 8841: 8835: 8832: 8822: 8818: 8814: 8813: 8808: 8801: 8793: 8789: 8783: 8772: 8769:. p. 9. 8765: 8761: 8755: 8753: 8739: 8733: 8731: 8722: 8716: 8712: 8705: 8698: 8695: 8690: 8686: 8681: 8676: 8672: 8668: 8667: 8662: 8658: 8652: 8650: 8648: 8639: 8635: 8631: 8627: 8622: 8617: 8613: 8609: 8602: 8595:. 2020-09-21. 8594: 8593: 8588: 8582: 8575:. 2022-11-10. 8574: 8573: 8568: 8562: 8554: 8547: 8539: 8535: 8530: 8525: 8521: 8517: 8513: 8506: 8498: 8494: 8488: 8477: 8473: 8469: 8465: 8459: 8455: 8451: 8447: 8440: 8433: 8426: 8419: 8414: 8410: 8403: 8401: 8389: 8382: 8378: 8372: 8363: 8358: 8350: 8343:. 2022-03-22. 8342: 8336: 8322: 8315: 8313: 8297: 8293: 8289: 8285: 8279: 8264: 8260: 8253: 8251: 8243: 8241: 8237: 8236:decimal point 8233: 8229: 8216: 8212: 8209:(originally: 8208: 8204: 8200: 8196: 8189: 8187: 8172: 8166: 8151: 8145: 8138: 8128:on 2013-05-08 8127: 8123: 8117: 8099: 8092: 8088: 8082: 8064: 8060: 8053: 8047: 8029: 8022: 8016: 8008: 8007:StackOverflow 8004: 7998: 7991: 7987: 7984: 7979: 7972:. 2021-08-03. 7971: 7965: 7958: 7954: 7948: 7940: 7936: 7930: 7922: 7916: 7912: 7908: 7904: 7898: 7887: 7884:. p. 3. 7880: 7876: 7870: 7868: 7858: 7853: 7849: 7843: 7824: 7820: 7816: 7812: 7808: 7807: 7799: 7795: 7789: 7782: 7776: 7769: 7764: 7758: 7754: 7750: 7748: 7741: 7734: 7728: 7721: 7706: 7699: 7692: 7684: 7674: 7668: 7664: 7660: 7656: 7649: 7635: 7631: 7627: 7620: 7612: 7608: 7604: 7600: 7596: 7590: 7586: 7582: 7578: 7574: 7567: 7565: 7563: 7561: 7553: 7539: 7536:. p. 2. 7535: 7528: 7521: 7519: 7504: 7498: 7494: 7490: 7483: 7475: 7473:0-13-322495-3 7469: 7465: 7464: 7456: 7454: 7445: 7441: 7437: 7431: 7427: 7423: 7419: 7415: 7414: 7409: 7402: 7400: 7398: 7396: 7394: 7392: 7387: 7374: 7370: 7366: 7360: 7353: 7349: 7345: 7339: 7332: 7326: 7319: 7318:William Kahan 7314: 7304: 7297: 7296: 7290: 7286: 7285:Intel Pentium 7282: 7276: 7267: 7260: 7254: 7247: 7241: 7234: 7228: 7221: 7217: 7213: 7209: 7205: 7199: 7192: 7191:Xerox Sigma 9 7188: 7184: 7180: 7176: 7173:(1970s), the 7172: 7168: 7164: 7160: 7156: 7152: 7148: 7144: 7140: 7136: 7131: 7124: 7120: 7116: 7115:exponent bias 7112: 7111: 7106: 7102: 7098: 7097: 7090: 7083: 7079: 7075: 7071: 7067: 7063: 7059: 7055: 7051: 7047: 7046: 7039: 7035: 7024: 7021: 7019: 7016: 7013: 7010: 7007: 7004: 7002: 6999: 6996: 6993: 6991: 6988: 6986: 6983: 6980: 6977: 6975: 6972: 6970: 6967: 6965: 6962: 6960: 6957: 6955: 6952: 6950: 6947: 6944: 6941: 6939: 6936: 6934: 6931: 6929: 6926: 6924: 6921: 6918: 6915: 6913: 6910: 6909: 6902: 6900: 6894: 6892: 6887: 6882: 6880: 6876: 6871: 6869: 6868:vectorization 6865: 6861: 6857: 6856:associativity 6847: 6845: 6839: 6833: 6827: 6821: 6815: 6809: 6803: 6797: 6791: 6786:8122118 23 6785: 6779: 6774:8608396 22 6773: 6767: 6761: 6755: 6749: 6743: 6737: 6731: 6725: 6719: 6713: 6707: 6701: 6695: 6689: 6683: 6677: 6671: 6665: 6659: 6653: 6647: 6641: 6635: 6629: 6623: 6617: 6611: 6605: 6599: 6593: 6587: 6581: 6575: 6569: 6563: 6557: 6551: 6545: 6539: 6533: 6527: 6521: 6515: 6502:i 6 × 2 × t 6500: 6477: 6455: 6451: 6447: 6442: 6438: 6434: 6431: 6428: 6425: 6418: 6401: 6398: 6393: 6390: 6385: 6380: 6376: 6367: 6363: 6357: 6352: 6349: 6346: 6342: 6334:second form: 6333: 6315: 6311: 6306: 6303: 6298: 6295: 6290: 6285: 6281: 6272: 6267: 6264: 6261: 6257: 6248: 6231: 6227: 6222: 6217: 6213: 6205: 6204: 6203: 6201: 6196: 6191: 6189: 6180: 6178: 6173: 6171: 6167: 6163: 6158: 6156: 6139:if (x==y) ... 6135: 6120: 6117: 6113: 6109: 6104: 6100: 6096: 6092: 6088: 6083: 6079: 6055: 6051: 6047: 6042: 6038: 6034: 6028: 6025: 6022: 6013: 6010: 6007: 5995: 5992: 5991:IEEE 754-2008 5988: 5984: 5980: 5974: 5970: 5967: 5962: 5919: 5914: 5766: 5744: 5742: 5738: 5734: 5729: 5725: 5721: 5711: 5709: 5700: 5675: 5670: 5666: 5657: 5654: 5637: 5629: 5625: 5621: 5618: 5610: 5606: 5602: 5600: 5593: 5583: 5574: 5566: 5562: 5558: 5555: 5547: 5543: 5539: 5537: 5530: 5520: 5509: 5501: 5497: 5493: 5490: 5482: 5478: 5474: 5472: 5465: 5455: 5446: 5438: 5434: 5430: 5427: 5419: 5415: 5411: 5409: 5402: 5392: 5376: 5373: 5360: 5351: 5345: 5336: 5330: 5324: 5321: 5318: 5312: 5309: 5285: 5277: 5273: 5269: 5266: 5255: 5251: 5247: 5244: 5233: 5229: 5225: 5220: 5216: 5209: 5201: 5197: 5193: 5190: 5179: 5175: 5171: 5168: 5157: 5153: 5149: 5144: 5140: 5133: 5131: 5118: 5114: 5110: 5107: 5089: 5085: 5081: 5078: 5067: 5063: 5059: 5054: 5050: 5043: 5035: 5031: 5027: 5024: 5013: 5009: 5005: 5000: 4996: 4982: 4980: 4967: 4952: 4947: 4943: 4931: 4916: 4912: 4908: 4905: 4894: 4890: 4886: 4881: 4877: 4870: 4862: 4858: 4854: 4851: 4840: 4836: 4832: 4827: 4823: 4809: 4806: 4803: 4801: 4782: 4779: 4768: 4753: 4749: 4745: 4740: 4736: 4729: 4726: 4723: 4715: 4711: 4707: 4702: 4698: 4691: 4688: 4678: 4675: 4672: 4670: 4662: 4659: 4656: 4650: 4647: 4623: 4603: 4594: 4592: 4588: 4587: 4582: 4577: 4564: 4549: 4545: 4540: 4536: 4533: 4527: 4521: 4518: 4512: 4497: 4496: 4490: 4488: 4484: 4480: 4464: 4459: 4456: 4453: 4449: 4442: 4439: 4433: 4400: 4395: 4392: 4389: 4385: 4381: 4360: 4349: 4348: 4343: 4339: 4327: 4323: 4319: 4315: 4311: 4307: 4303: 4299: 4298: 4288: 4285: 4284:safe division 4281: 4278: 4274: 4271: 4267: 4220: 4207: 4202: 4195: 4189: 4186: 4180: 4177: 4174: 4168: 4162: 4156: 4150: 4141: 4137: 4134: 4133: 4131: 4124: 4089: 4082: 4072: 4061: 4029: 3989: 3984: 3973: 3917: 3910: 3904: 3898: 3892: 3874: 3872: 3861: 3859: 3855: 3851: 3847: 3843: 3841: 3823: 3820: 3817: 3813: 3790: 3786: 3781: 3777: 3755: 3751: 3725: 3721: 3716: 3712: 3709: 3706: 3703: 3698: 3694: 3689: 3685: 3682: 3677: 3673: 3668: 3664: 3657: 3653: 3650: 3641: 3632: 3628: 3624: 3620: 3599: 3596: 3593: 3589: 3579: 3572: 3569: 3566: 3563: 3560: 3557: 3554: 3550: 3547: 3544: 3541: 3540: 3539: 3536: 3534: 3530: 3526: 3522: 3516: 3512: 3510: 3506: 3502: 3494: 3490: 3485: 3482: 3478: 3474: 3471: 3467: 3466: 3465: 3461: 3445: 3440: 3435: 3430: 3425: 3424: 3423: 3420: 3414: 3410: 3406: 3402: 3398: 3394: 3390: 3372: 3370: 3369:Horner method 3366: 3362: 3356: 3350: 3342: 3340: 3336: 3332: 3331: 3326: 3322: 3318: 3314: 3310: 3306: 3298: 3296: 3292: 3288: 3284: 3280: 3275: 3273: 3269: 3265: 3253: 3251: 3243: 3234: 3224: 3215: 3213: 3209: 3205: 3201: 3197: 3187: 3178: 3172: 3169: 3166: 3163: 3159: 3156: 3152: 3151: 3150: 3141: 3139: 3131: 3128: 3125: 3122: 3119: 3118: 3117: 3114: 3112: 3111: 3106: 3102: 3098: 3094: 3084: 3074: 3069: 3067: 3063: 3055: 3052: 3051: 3050: 3044: 3041: 3040: 3039: 3037: 3033: 3028: 3026: 3018: 3017: 3016: 3010: 3009: 3008: 3006: 3001: 2995: 2991: 2988: 2987: 2986: 2983: 2981: 2977: 2969: 2965: 2962: 2961: 2960: 2956: 2954: 2953:rounded value 2948: 2942: 2937: 2933: 2918: 2915: 2912: 2909: 2907: 2904: 2903: 2899: 2896: 2893: 2890: 2887: 2886: 2882: 2879: 2876: 2873: 2871: 2868: 2867: 2863: 2860: 2857: 2854: 2852: 2849: 2848: 2844: 2841: 2838: 2835: 2832: 2831: 2827: 2824: 2821: 2818: 2815: 2814: 2810: 2807: 2804: 2801: 2798: 2797: 2788: 2784: 2780: 2776: 2771: 2768: 2764: 2760: 2756: 2752: 2748: 2744: 2740: 2739:Motorola 6809 2736: 2735:Motorola 6800 2732: 2728: 2727:Commodore PET 2724: 2720: 2716: 2712: 2708: 2704: 2700: 2695: 2691: 2687: 2683: 2679: 2675: 2671: 2667: 2663: 2659: 2655: 2654: 2653: 2651: 2641: 2639: 2631: 2627: 2625: 2617: 2613: 2612: 2611: 2605: 2601: 2597: 2596: 2595: 2592: 2588: 2586: 2576: 2573: 2570: 2567: 2564: 2561: 2558: 2556: 2553: 2552: 2548: 2545: 2542: 2539: 2536: 2533: 2530: 2528: 2525: 2524: 2520: 2517: 2514: 2511: 2508: 2505: 2502: 2500: 2497: 2496: 2492: 2489: 2486: 2483: 2480: 2477: 2474: 2472: 2469: 2468: 2464: 2461: 2458: 2455: 2452: 2449: 2446: 2443: 2442:IEEE 754-2008 2439: 2436: 2435: 2431: 2428: 2425: 2422: 2421: 2396: 2393: 2384: 2372: 2370: 2366: 2365:negative zero 2354: 2349: 2342: 2338: 2335: 2332: 2329: 2326: 2322: 2318: 2315: 2314: 2313: 2311: 2302: 2298: 2294: 2290: 2286: 2282: 2279: 2276: 2272: 2269: 2266: 2262: 2259: 2258: 2257: 2255: 2251: 2247: 2246:basic formats 2242: 2240: 2236: 2232: 2228: 2224: 2220: 2216: 2204: 2199: 2197: 2192: 2190: 2185: 2184: 2182: 2181: 2176: 2173: 2172: 2171: 2170: 2166: 2165: 2160: 2157: 2155: 2152: 2150: 2147: 2145: 2142: 2140: 2137: 2135: 2132: 2130: 2127: 2126: 2125: 2124: 2120: 2119: 2114: 2111: 2108: 2104: 2102: 2099:(binary128), 2098: 2094: 2092: 2088: 2084: 2082: 2078: 2074: 2071: 2067: 2066: 2065: 2064: 2061: 2058: 2057: 2054: 2051: 2048: 2047: 2043: 2033: 2031: 2028:, as well as 2027: 2022: 2020: 1999: 1994: 1991: 1988: 1984: 1980: 1975: 1969: 1966: 1962: 1958: 1955: 1951: 1942: 1941: 1940: 1937: 1917: 1913: 1904: 1903: 1902: 1899: 1884: 1880: 1877: 1874: 1871: 1868: 1864: 1859: 1854: 1851: 1848: 1844: 1840: 1835: 1831: 1828: 1825: 1821: 1817: 1805: 1802: 1799: 1796: 1793: 1789: 1786: 1783: 1780: 1779: 1778: 1776: 1772: 1768: 1764: 1759: 1757: 1756: 1750: 1747: 1734: 1730: 1726: 1722: 1719: 1714: 1713: 1712: 1709: 1707: 1703: 1699: 1698:William Kahan 1691: 1687: 1686:William Kahan 1683: 1679: 1677: 1673: 1669: 1665: 1659: 1657: 1653: 1649: 1645: 1637: 1634: 1631: 1628: 1627: 1626: 1624: 1619: 1617: 1613: 1609: 1605: 1601: 1596: 1594: 1590: 1586: 1581: 1579: 1575: 1571: 1567: 1562: 1560: 1556: 1537: 1528: 1511: 1508: 1488: 1485: 1475: 1468: 1465: 1456: 1452: 1448: 1444: 1440: 1432: 1428: 1424: 1420: 1418: 1414: 1410: 1406: 1402: 1398: 1394: 1376: 1373: 1364: 1360: 1356: 1352: 1348: 1340: 1336: 1332: 1327: 1299: 1296: 1290: 1287: 1279: 1261: 1239: 1231: 1227: 1223: 1219: 1216: 1213: 1210: 1207: 1203: 1199: 1195: 1191: 1188: 1185: 1182: 1178: 1174: 1170: 1166: 1162: 1159: 1155: 1152: 1151: 1150: 1142: 1140: 1136: 1132: 1128: 1112: 1111:normalization 1106: 1075: 1058: 1051: 1044: 1041: 1038: 1031: 1022: 1018: 1014: 1010: 1004: 1001: 997: 993: 990: 987: 984: 981: 976: 973: 969: 965: 962: 959: 954: 951: 947: 943: 940: 937: 932: 929: 925: 921: 918: 915: 910: 907: 903: 899: 896: 893: 888: 885: 881: 877: 874: 870: 862: 853: 849: 845: 841: 835: 832: 828: 824: 819: 807: 804: 801: 796: 793: 790: 786: 781: 764: 759: 746: 741: 738: 733: 727: 721: 709: 705: 701: 691: 678: 672: 664: 661: 656: 650: 644: 636: 632: 616: 613: 610: 600: 598: 594: 566: 561: 555: 551: 547: 543: 538: 532: 515: 502: 497: 493: 489: 482: 479: 476: 471: 467: 456: 441:1,528,535,047 434: 429: 427: 423: 419: 411: 407: 403: 399: 395: 391: 387: 383: 382: 377: 373: 369: 368: 367: 364: 350: 346: 342: 338: 333: 330: 326: 322: 318: 313: 311: 296: 294: 290: 285: 283: 279: 274: 272: 264: 259: 252: 247: 243: 241: 240:dynamic range 237: 233: 229: 224: 222: 218: 214: 209: 205: 203: 199: 193: 172: 167: 164: 139: 136: 129: 117: 114: 107: 104: 96: 94: 90: 86: 82: 78: 74: 70: 66: 62: 54: 50: 46: 41: 37: 30: 19: 10154:Intersection 9952: 9838:the original 9821: 9817: 9775: 9771: 9711: 9696:, retrieved 9687: 9659: 9629: 9623: 9595: 9556: 9533: 9522:. Retrieved 9499: 9463: 9431: 9425: 9416: 9407: 9398: 9389: 9380: 9371: 9362: 9353: 9345: 9341: 9332: 9304: 9285: 9279: 9269: 9258:. Retrieved 9247: 9236:. Retrieved 9226: 9198: 9170: 9124:. Retrieved 9100: 9080: 9070: 9054:. Retrieved 9034: 8997: 8990: 8966: 8938:. Retrieved 8919: 8912: 8901:. Retrieved 8882: 8872: 8854: 8846: 8840: 8830: 8824:. Retrieved 8815:(99): 5–10. 8810: 8800: 8791: 8782: 8710: 8704: 8670: 8664: 8611: 8607: 8601: 8590: 8581: 8570: 8561: 8546: 8519: 8515: 8505: 8496: 8487: 8445: 8432: 8408: 8371: 8349: 8335: 8324:. Retrieved 8300:. Retrieved 8287: 8278: 8267:. Retrieved 8225: 8219:. Retrieved 8213:). ID 1400. 8198: 8174:. Retrieved 8165: 8154:. Retrieved 8144: 8136: 8130:. Retrieved 8126:the original 8116: 8105:. Retrieved 8081: 8070:. Retrieved 8058: 8046: 8035:. Retrieved 8015: 8006: 7997: 7978: 7964: 7956: 7952: 7947: 7929: 7906: 7897: 7842: 7830:. Retrieved 7810: 7804: 7788: 7775: 7768:Randell 1982 7763: 7745: 7740: 7727: 7718: 7712:. Retrieved 7691: 7682: 7676:. Retrieved 7658: 7648: 7638:, retrieved 7629: 7619: 7572: 7545:. Retrieved 7506:. Retrieved 7492: 7482: 7462: 7412: 7359: 7338: 7325: 7313: 7303: 7293: 7275: 7266: 7253: 7240: 7227: 7198: 7165:(ca. 1974), 7153:(1972), the 7130: 7118: 7114: 7108: 7104: 7100: 7094: 7089: 7073: 7069: 7065: 7061: 7049: 7043: 7038: 6898: 6895: 6883: 6872: 6853: 6841: 6835: 6829: 6823: 6817: 6811: 6805: 6799: 6793: 6787: 6781: 6775: 6769: 6763: 6759:3.1415926535 6757: 6751: 6747:3.1415926535 6745: 6739: 6735:3.1415926535 6733: 6727: 6721: 6715: 6709: 6703: 6697: 6691: 6685: 6679: 6673: 6667: 6661: 6655: 6649: 6643: 6637: 6631: 6625: 6619: 6613: 6607: 6601: 6595: 6589: 6583: 6577: 6571: 6565: 6559: 6553: 6547: 6541: 6535: 6529: 6523: 6517: 6511: 6498: 6249:First form: 6192: 6184: 6174: 6159: 6143:0.6/0.2-3==0 6136: 5996: 5975: 5971: 5963: 5960: 5910: 5745: 5717: 5701: 5658: 5655: 5377: 5374: 4595: 4584: 4578: 4493: 4491: 4486: 4482: 4478: 4361: 4345: 4337: 4336: 4322:Saudi Arabia 4312:an incoming 4282:Testing for 4143: 4136:Cancellation 4129: 4088:distributive 4085: 4080: 4070: 3985: 3974: 3967: 3914: 3908: 3902: 3896: 3875: 3867: 3857: 3854:root-finding 3849: 3845: 3844: 3630: 3626: 3622: 3618: 3616: 3570: 3564: 3558: 3552: 3548: 3542: 3537: 3517: 3513: 3498: 3463: 3421: 3378: 3357: 3354: 3348: 3330:cancellation 3328: 3324: 3320: 3316: 3312: 3302: 3294: 3290: 3286: 3282: 3278: 3276: 3271: 3267: 3263: 3260: 3247: 3241: 3232: 3221: 3211: 3207: 3199: 3193: 3184: 3176: 3154: 3147: 3135: 3115: 3108: 3100: 3099:schemes (or 3092: 3090: 3070: 3059: 3053: 3048: 3042: 3035: 3031: 3029: 3022: 3014: 3002: 2999: 2993: 2989: 2984: 2979: 2975: 2973: 2967: 2963: 2957: 2952: 2949: 2929: 2662:Altair BASIC 2647: 2635: 2621: 2609: 2603: 2599: 2593: 2589: 2582: 2429:Significand 2390: 2373: 2350: 2346: 2307: 2253: 2249: 2245: 2243: 2212: 2167:Alternatives 2089:(binary64), 2079:(binary32), 2049: 2023: 2018: 2016: 1938: 1935: 1900: 1809: 1803: 1797: 1791: 1787: 1781: 1774: 1770: 1766: 1762: 1760: 1753: 1751: 1743: 1710: 1706:Harold Stone 1702:Turing Award 1695: 1660: 1641: 1635: 1629: 1620: 1615: 1597: 1582: 1574:Model V 1565: 1563: 1529: 1436: 1408: 1404: 1400: 1392: 1362: 1350: 1344: 1197: 1193: 1164: 1148: 1138: 1134: 1130: 1126: 1110: 1107: 1076: 760: 704:rounding bit 703: 699: 692: 601: 562: 539: 530: 516: 457: 453:152,853.5047 437:152,853.5047 430: 425: 421: 417: 415: 409: 405: 393: 389: 385: 379: 365: 353:152,853.5047 334: 314: 307: 286: 275: 268: 225: 212: 210: 206: 194: 97: 92: 77:real numbers 68: 64: 58: 36: 10384:Type theory 10379:Type system 10229:Bottom type 10176:Option type 10117:generalized 10003:Long double 9948:Fixed point 9399:gcc.gnu.org 9381:gcc.gnu.org 8673:(1): 5–48. 8240:assumed bit 8238:before the 7848:Rojas, Raúl 7813:(2): 5–16. 7794:Rojas, Raúl 7698:"MANIAC II" 7686:(256 pages) 7365:denominator 7344:long double 7218:(1971) and 7189:(1966) and 7183:SDS Sigma 5 7141:(1964) and 7123:significand 7103:. The term 7062:coefficient 7045:significand 6928:Coprocessor 6798:95552 24 6723:3.141592653 6711:3.141592653 6699:3.141592653 6166:eigenvector 6147:0.6/0.2 - 3 6071:, and that 5916:long double 5759:−1) / (exp( 4316:missile in 4090:. That is, 4030:. That is, 4028:associative 3988:commutative 3891:, which is 3477:square root 3337:; see also 3233:In detail: 2833:FP8 (E5M2) 2816:FP8 (E4M3) 2811:Total bits 2678:IBM PC 5150 2616:hexadecimal 2602:= 1 ; 2293:long double 2109:(binary256) 1746:fixed-point 1731:(overflow, 1559:IAS machine 1555:von Neumann 1439:Konrad Zuse 1427:Konrad Zuse 1222:Mathematica 1177:Frank Olver 1154:Fixed-point 763:significand 445:1.528535047 418:significand 390:coefficient 381:significand 357:1.528535047 341:power of 10 329:fixed-point 317:radix point 293:coprocessor 251:number line 217:radix point 124:significand 85:significand 10404:Categories 10289:Empty type 10284:Type class 10234:Collection 10191:Refinement 10169:metaobject 10017:signedness 9876:Data types 9818:double_fpu 9789:cs/0701192 9743:2018935254 9717:Birkhäuser 9698:2018-07-16 9628:, Vol. 2: 9524:2016-02-11 9260:2011-01-11 9238:2012-04-25 9126:2003-09-05 9056:2013-05-23 8940:2013-05-14 8903:2013-05-14 8826:2011-09-24 8621:2101.11408 8362:2209.05433 8326:2020-05-16 8302:2010-02-24 8269:2016-05-30 8221:2016-05-30 8176:2024-08-29 8156:2024-04-16 8132:2012-04-25 8107:2012-02-19 8072:2019-11-08 8037:2019-09-22 7837:(12 pages) 7832:2022-07-03 7714:2018-08-07 7678:2019-08-18 7640:2018-07-16 7603:2017947446 7547:2018-08-07 7508:2012-12-31 7444:2009939668 7418:Birkhäuser 7383:References 6687:3.14159265 6675:3.14159265 6195:Archimedes 4140:derivative 3770:set to 0, 3409:JavaScript 3387:to denote 3105:truncation 2707:Bill Gates 2699:Intel 8080 2694:QuickBASIC 2413:precision 2321:decimal128 2304:available. 2101:decimal128 2072:(binary16) 2017:which has 1718:endianness 1652:System/360 1566:commercial 1564:The first 1419:in 1920. 1413:typewriter 1349:published 1324:See also: 1097:here) and 73:arithmetic 10364:Subtyping 10359:Interface 10342:metaclass 10294:Unit type 10264:Semaphore 10244:Exception 10149:Inductive 10139:Dependent 10104:Composite 10082:Character 10064:Reference 9961:Minifloat 9917:Bit array 9814:OpenCores 9806:218578808 9688:quadibloc 9593:(2007) . 8821:1354-3172 8788:"D.3.2.1" 8689:222008826 8638:231718830 8538:218472153 8413:CiteSeerX 8292:Microsoft 8152:. openEXR 8122:"openEXR" 7857:1406.1886 7663:CRC Press 7630:quadibloc 7373:numerator 7369:conjugate 7259:MANIAC II 7169:(1980s), 7058:logarithm 7001:Minifloat 6866:and auto- 6860:compilers 6729:3.1415926 6705:3.1415926 6681:3.1415926 6669:3.1415926 6663:3.1415926 6657:3.1415926 6484:∞ 6481:→ 6448:× 6435:× 6429:∼ 6426:π 6304:− 6114:θ 6110:⁡ 6093:θ 6089:⁡ 6048:− 6026:− 5676:≤ 5667:δ 5626:δ 5587:^ 5563:δ 5524:^ 5498:δ 5459:^ 5435:δ 5396:^ 5355:^ 5346:⋅ 5340:^ 5322:⋅ 5313:⁡ 5274:δ 5252:δ 5226:⋅ 5198:δ 5176:δ 5150:⋅ 5115:δ 5086:δ 5060:⋅ 5032:δ 5006:⋅ 4953:≤ 4944:δ 4913:δ 4887:⋅ 4859:δ 4833:⋅ 4810:⁡ 4783:⁡ 4746:⋅ 4730:⁡ 4708:⋅ 4692:⁡ 4679:⁡ 4660:⋅ 4651:⁡ 4550:≤ 4534:− 4522:⁡ 4457:− 4393:− 4295:Incidents 4187:− 3871:precision 3707:⋯ 3623:underflow 3549:underflow 3509:exception 3489:underflow 3200:decimal32 2805:Exponent 2426:Exponent 2416:Number of 2325:decimal32 2317:Decimal64 2301:alignment 2129:Minifloat 2105:256-bit: 2097:Quadruple 2095:128-bit: 2091:decimal64 2081:decimal32 1967:− 1959:− 1872:− 1852:− 1829:− 1585:Pilot ACE 1541:∞ 1538:± 1515:∞ 1512:× 1481:∞ 1437:In 1938, 1300:π 1291:⁡ 1240:π 1169:symmetric 1137:, or the 1059:3.1415928 1052:≈ 1042:× 1039:1.5707964 1032:≈ 1015:× 1002:− 994:× 985:⋯ 974:− 966:× 952:− 944:× 930:− 922:× 908:− 900:× 886:− 878:× 846:× 833:− 825:× 805:− 787:∑ 742:_ 700:round bit 665:_ 490:× 480:− 394:precision 363:seconds. 211:The term 173:⏞ 165:− 140:⏟ 130:× 118:⏟ 79:using an 61:computing 10389:Variable 10279:Top type 10144:Equality 10052:physical 10029:Rational 10024:Interval 9971:bfloat16 9692:archived 9658:(1997). 9497:(1965). 9461:(1963). 9363:GCC Wiki 9321:Archived 9215:Archived 9187:Archived 9159:Archived 9117:Archived 9089:Archived 9047:Archived 9043:ARITH 17 8964:(2002). 8812:Overload 8771:Archived 8476:Archived 8388:Archived 8296:Archived 8263:Archived 8215:Archived 8098:Archived 8063:Archived 8028:Archived 7986:Archived 7886:Archived 7823:Archived 7705:Archived 7634:archived 7611:30244721 7538:Archived 7214:(1971), 7210:(1964), 7206:(1962), 7185:(1967), 7161:(1966), 7096:exponent 7082:exponent 7070:fraction 7066:argument 7054:mantissa 7050:mantissa 6979:IEEE 754 6969:GNU MPFR 6905:See also 6884:In most 6651:3.141592 6645:3.141592 6639:3.141592 6633:3.141592 5966:compiler 5763:−1) − 1) 4081:1280.246 4071:1280.245 3846:Overflow 3559:overflow 3505:portable 3432:-5000.12 3397:IEEE 754 3395:and the 3268:rounding 3097:rounding 3043:3.141592 3025:rounding 2870:Bfloat16 2723:Apple // 2719:MOS 6502 2690:GW-BASIC 2664:(1975), 2650:IEEE 754 2636:and the 2406:Exponent 2353:infinity 2235:Cray T90 2219:IEEE 754 2134:bfloat16 2085:64-bit: 2075:32-bit: 2068:16-bit: 2060:IEEE 754 2042:IEEE 754 1777:) where 1690:IEEE 754 1664:IEEE 754 1644:IBM 7094 1220:such as 728:00001111 722:11001001 673:10100010 651:00001111 645:11001001 426:exponent 402:exponent 386:mantissa 347:'s moon 299:Overview 271:IEEE 754 198:base two 179:exponent 89:exponent 10332:Generic 10308:Related 10224:Boolean 10181:Product 10057:virtual 10047:Address 10039:Pointer 10012:Integer 9943:Decimal 9938:Complex 9926:Numeric 9822:fpuvhdl 9486:0161456 8211:Borland 8207:Inprise 7371:of the 7193:(1970). 6886:Fortran 6879:library 6693:3.14159 6627:3.14159 6621:3.14159 6615:3.14159 6609:3.14159 5302:and so 4636:, then 4318:Dhahran 3858:invalid 3850:invalid 3631:invalid 3619:inexact 3571:invalid 3543:inexact 3437:6.02e23 3413:Haskell 3266:bit, a 3162:bignums 3015:but is 2782:format. 2618:number. 2107:Octuple 2053:formats 1650:in its 1600:IBM 704 1453:-based 1320:History 734:1101101 657:1101101 577:145,000 533:), and 433:decimal 345:Jupiter 325:integer 81:integer 10322:Boxing 10310:topics 10269:Stream 10206:tagged 10164:Object 10087:String 9804:  9741:  9731:  9670:  9640:  9607:  9567:  9544:  9515:  9484:  9474:  9417:GitHub 9009:  8978:  8931:  8894:  8819:  8744:  8717:  8687:  8636:  8592:GitHub 8572:GitHub 8536:  8497:GitHub 8472:910409 8470:  8460:  8415:  7917:  7755:  7669:  7609:  7601:  7591:  7499:  7470:  7442:  7432:  7308:exact. 6717:3.1415 5987:Python 5979:proton 5897:return 5792:double 5779:double 5770:double 5656:where 5375:where 4477:where 3939:double 3924:double 3442:-3e-45 3391:. The 3272:sticky 3155:dtoa.c 2992:= −4; 2966:= −4; 2787:Hopper 2775:Nvidia 2686:MS-DOS 2682:BASICA 2674:MBASIC 2577:~34.0 2571:16383 2549:~19.2 2543:16383 2521:~15.9 2499:Double 2471:Single 2432:Total 2087:Double 2077:Single 1397:digits 1228:, and 1226:Maxima 1206:bignum 1133:, the 1129:, the 1077:where 731:  725:  676:  670:  654:  648:  593:base 3 558:65,536 542:binary 517:where 105:12.345 53:Munich 10217:Other 10201:Union 10134:Class 10124:Array 9907:Tryte 9802:S2CID 9784:arXiv 9778:(3). 9324:(PDF) 9317:(PDF) 9218:(PDF) 9211:(PDF) 9190:(PDF) 9183:(PDF) 9162:(PDF) 9155:(PDF) 9120:(PDF) 9113:(PDF) 9092:(PDF) 9085:(PDF) 9050:(PDF) 9039:(PDF) 8888:Wiley 8774:(PDF) 8767:(PDF) 8685:S2CID 8634:S2CID 8616:arXiv 8534:S2CID 8479:(PDF) 8468:S2CID 8442:(PDF) 8391:(PDF) 8384:(PDF) 8357:arXiv 8101:(PDF) 8094:(PDF) 8066:(PDF) 8055:(PDF) 8031:(PDF) 8024:(PDF) 7889:(PDF) 7882:(PDF) 7852:arXiv 7826:(PDF) 7801:(PDF) 7708:(PDF) 7701:(PDF) 7607:S2CID 7541:(PDF) 7530:(PDF) 7117:, or 7107:(for 7101:scale 7056:of a 7030:Notes 6997:(MBF) 6959:FLOPS 6899:Icing 6741:3.141 6603:3.141 6597:3.141 6591:3.141 6585:3.141 6579:3.141 6573:3.141 5755:) = ( 4304:in a 3525:means 3264:guard 3204:radix 3196:radix 2802:Sign 2799:Type 2731:Atari 2515:1023 2493:~7.2 2465:~3.3 2423:Sign 2408:bias 2401:Bits 2398:Type 2363:), a 2121:Other 1676:68000 1672:i8087 1451:relay 1230:Maple 583:, or 451:, or 410:scale 408:, or 388:, or 376:radix 278:FLOPS 263:signs 115:12345 71:) is 10337:Kind 10299:Void 10159:List 10074:Text 9912:Word 9902:Trit 9897:Byte 9739:LCCN 9729:ISBN 9668:ISBN 9638:ISBN 9605:ISBN 9565:ISBN 9542:ISBN 9513:ISBN 9472:ISBN 9007:ISBN 8976:ISBN 8929:ISBN 8892:ISBN 8817:ISSN 8715:ISBN 8458:ISBN 7915:ISBN 7753:ISBN 7667:ISBN 7599:LCCN 7589:ISBN 7497:ISBN 7468:ISBN 7440:LCCN 7430:ISBN 7329:The 7177:and 7093:The 7042:The 6789:3.14 6765:3.14 6753:3.14 6567:3.14 6561:3.14 6555:3.14 6549:3.14 5985:and 5739:and 5704:mach 5686:mach 4963:mach 4616:and 4560:mach 4429:mach 4377:mach 4356:mach 4314:Scud 4248:and 4100:) × 4040:) + 4008:and 3970:tanf 3848:and 3501:trap 3427:99.9 3363:and 2785:The 2779:GPUs 2753:The 2692:and 2670:CP/M 2656:The 2574:113 2568:128 2565:112 2555:Quad 2487:127 2438:Half 2411:Bits 2369:NaNs 2319:and 2287:and 2252:and 2215:IEEE 2213:The 2070:Half 1668:word 1642:The 1621:The 1583:The 1365:x 10 1198:e.g. 1194:e.g. 1165:i.e. 1117:and 631:bits 569:1.45 422:base 374:(or 372:base 146:base 10196:Set 9892:Bit 9794:doi 9721:doi 9436:doi 9290:doi 8675:doi 8626:doi 8524:doi 8450:doi 7815:doi 7581:doi 7422:doi 7346:is 7342:If 7155:HEP 7143:370 7078:CDC 7064:or 6917:C99 6777:3.1 6543:3.1 6537:3.1 6101:cos 6080:sin 5953:1.0 5929:log 5913:C99 5893:// 5887:1.0 5859:1.0 5835:exp 5823:1.0 5807:// 5720:ULP 4489:). 4050:+ ( 3960:2.0 3948:tan 3646:tot 3553:and 3535:). 3529:C11 3521:C99 3417:123 3383:or 3081:hex 3077:hex 3056:... 2919:32 2916:23 2900:19 2897:10 2883:16 2864:16 2861:10 2733:), 2701:by 2688:'s 2680:'s 2672:'s 2626:is 2562:15 2546:64 2540:80 2537:64 2534:15 2518:53 2512:64 2509:52 2506:11 2490:24 2484:32 2481:23 2462:11 2459:15 2456:16 2453:10 2371:). 2297:x86 2289:C11 2285:C99 2239:SV1 1810:is 1357:'s 1288:sin 1252:or 815:bit 702:or 581:0.2 531:ten 351:is 335:In 234:or 59:In 51:in 10406:: 9832:. 9800:. 9792:. 9776:30 9774:. 9770:. 9737:. 9727:. 9719:. 9690:, 9686:, 9666:. 9654:; 9603:. 9589:; 9585:; 9581:; 9563:. 9540:. 9511:. 9507:/ 9482:MR 9480:. 9415:. 9397:. 9379:. 9361:. 9344:. 9340:. 9319:. 9286:18 9284:. 9278:. 9213:. 9135:^ 9115:. 9065:.) 9021:^ 9001:. 8949:^ 8923:. 8886:. 8862:. 8829:. 8809:. 8790:. 8751:^ 8729:^ 8696:, 8683:. 8671:23 8669:. 8663:. 8646:^ 8632:. 8624:. 8612:51 8610:. 8589:. 8569:. 8532:. 8520:53 8518:. 8514:. 8495:. 8474:. 8466:. 8456:. 8444:. 8399:^ 8386:. 8311:^ 8290:. 8286:. 8249:^ 8224:. 8205:/ 8197:. 8185:^ 8135:. 8096:. 8061:. 8005:. 7866:^ 7821:. 7811:19 7809:. 7803:. 7717:. 7681:. 7657:. 7632:, 7628:, 7605:. 7597:. 7587:. 7559:^ 7517:^ 7491:. 7452:^ 7438:. 7428:. 7420:. 7390:^ 7179:86 7113:, 6881:. 6164:, 5983:C# 5956:); 5890:); 5856:!= 5847:if 5844:); 5310:fl 4807:fl 4780:fl 4727:fl 4689:fl 4676:fl 4648:fl 4519:fl 4320:, 4242:+ 4123:: 4118:× 4114:+ 4110:× 4096:+ 4054:+ 4036:+ 4021:× 4017:= 4013:× 4003:+ 3999:= 3995:+ 3963:); 3954:pi 3927:pi 3884:; 3483:). 3371:. 3341:. 3068:. 2955:. 2943:(. 2913:8 2910:1 2894:8 2891:1 2880:7 2877:8 2874:1 2858:5 2855:1 2845:8 2842:2 2839:5 2836:1 2828:8 2825:3 2822:4 2819:1 2729:, 2725:, 2684:, 2676:, 2668:, 2559:1 2531:1 2503:1 2478:8 2475:1 2450:5 2447:1 2444:) 2381:−∞ 2377:+∞ 2361:−∞ 2357:+∞ 2341:Cg 2032:. 1898:. 1794:), 1773:, 1769:, 1765:, 1720:). 1708:. 1580:. 1570:Z4 1527:. 1455:Z3 1443:Z1 1431:Z3 1407:; 1224:, 1171:) 1141:. 1095:23 1083:24 1005:23 679:0. 617:24 599:. 589:10 573:10 560:. 449:10 384:, 361:10 349:Io 308:A 287:A 223:. 137:10 69:FP 63:, 55:). 45:Z3 9868:e 9861:t 9854:v 9808:. 9796:: 9786:: 9760:. 9745:. 9723:: 9676:. 9646:. 9613:. 9573:. 9550:. 9527:. 9488:. 9444:. 9438:: 9419:. 9401:. 9383:. 9365:. 9298:. 9292:: 9263:. 9241:. 9129:. 9059:. 9015:. 8943:. 8906:. 8723:. 8699:) 8691:. 8677:: 8640:. 8628:: 8618:: 8555:. 8540:. 8526:: 8499:. 8452:: 8427:) 8423:( 8421:. 8365:. 8359:: 8329:. 8305:. 8272:. 8179:. 8159:. 8110:. 8075:. 8040:. 8009:. 7992:. 7941:. 7923:. 7860:. 7854:: 7835:. 7817:: 7749:, 7613:. 7583:: 7550:. 7511:. 7476:. 7446:. 7424:: 7298:. 7187:7 6801:3 6531:3 6525:3 6519:3 6513:3 6508:i 6504:i 6478:i 6456:i 6452:t 6443:i 6439:2 6432:6 6402:1 6399:+ 6394:1 6391:+ 6386:2 6381:i 6377:t 6368:i 6364:t 6358:= 6353:1 6350:+ 6347:i 6343:t 6316:i 6312:t 6307:1 6299:1 6296:+ 6291:2 6286:i 6282:t 6273:= 6268:1 6265:+ 6262:i 6258:t 6232:3 6228:1 6223:= 6218:0 6214:t 6198:( 6121:1 6118:= 6105:2 6097:+ 6084:2 6056:2 6052:y 6043:2 6039:x 6035:= 6032:) 6029:y 6023:x 6020:( 6017:) 6014:y 6011:+ 6008:x 6005:( 5950:- 5947:Z 5944:( 5941:/ 5938:) 5935:Z 5932:( 5926:= 5923:Z 5906:} 5903:; 5900:Z 5884:- 5881:Z 5878:( 5875:/ 5872:Y 5869:= 5866:Z 5862:) 5853:Z 5850:( 5841:Y 5838:( 5832:= 5829:Z 5826:; 5820:- 5817:X 5814:= 5811:Y 5804:; 5801:Z 5798:, 5795:Y 5788:{ 5785:) 5782:X 5776:( 5773:A 5761:x 5757:x 5753:x 5751:( 5749:A 5681:E 5671:n 5638:, 5635:) 5630:3 5622:+ 5619:1 5616:( 5611:2 5607:y 5603:= 5594:2 5584:y 5575:; 5572:) 5567:3 5559:+ 5556:1 5553:( 5548:1 5544:y 5540:= 5531:1 5521:y 5510:; 5507:) 5502:2 5494:+ 5491:1 5488:( 5483:2 5479:x 5475:= 5466:2 5456:x 5447:; 5444:) 5439:1 5431:+ 5428:1 5425:( 5420:1 5416:x 5412:= 5403:1 5393:x 5361:, 5352:y 5337:x 5331:= 5328:) 5325:y 5319:x 5316:( 5286:, 5283:) 5278:3 5270:+ 5267:1 5264:( 5261:) 5256:2 5248:+ 5245:1 5242:( 5239:) 5234:2 5230:y 5221:2 5217:x 5213:( 5210:+ 5207:) 5202:3 5194:+ 5191:1 5188:( 5185:) 5180:1 5172:+ 5169:1 5166:( 5163:) 5158:1 5154:y 5145:1 5141:x 5137:( 5134:= 5124:) 5119:3 5111:+ 5108:1 5105:( 5100:) 5095:) 5090:2 5082:+ 5079:1 5076:( 5073:) 5068:2 5064:y 5055:2 5051:x 5047:( 5044:+ 5041:) 5036:1 5028:+ 5025:1 5022:( 5019:) 5014:1 5010:y 5001:1 4997:x 4993:( 4988:( 4983:= 4968:, 4958:E 4948:n 4932:, 4927:) 4922:) 4917:2 4909:+ 4906:1 4903:( 4900:) 4895:2 4891:y 4882:2 4878:x 4874:( 4871:+ 4868:) 4863:1 4855:+ 4852:1 4849:( 4846:) 4841:1 4837:y 4828:1 4824:x 4820:( 4815:( 4804:= 4789:) 4786:( 4769:, 4764:) 4759:) 4754:2 4750:y 4741:2 4737:x 4733:( 4724:+ 4721:) 4716:1 4712:y 4703:1 4699:x 4695:( 4684:( 4673:= 4666:) 4663:y 4657:x 4654:( 4624:y 4604:x 4565:. 4555:E 4546:| 4541:x 4537:x 4531:) 4528:x 4525:( 4513:| 4501:x 4487:B 4483:P 4479:B 4465:, 4460:P 4454:1 4450:B 4443:2 4440:1 4434:= 4424:E 4401:, 4396:P 4390:1 4386:B 4382:= 4372:E 4353:Ε 4328:. 4262:h 4257:) 4255:a 4253:( 4251:f 4246:) 4244:h 4240:a 4238:( 4236:f 4230:h 4224:h 4208:. 4203:h 4199:) 4196:a 4193:( 4190:f 4184:) 4181:h 4178:+ 4175:a 4172:( 4169:f 4163:= 4160:) 4157:h 4154:( 4151:Q 4120:c 4116:b 4112:c 4108:a 4102:c 4098:b 4094:a 4092:( 4058:) 4056:c 4052:b 4048:a 4042:c 4038:b 4034:a 4032:( 4023:a 4019:b 4015:b 4011:a 4005:a 4001:b 3997:b 3993:a 3990:( 3981:× 3977:× 3957:/ 3951:( 3945:= 3942:z 3936:; 3930:= 3887:s 3880:e 3824:t 3821:o 3818:t 3814:R 3791:1 3787:R 3782:/ 3778:1 3756:1 3752:R 3731:) 3726:n 3722:R 3717:/ 3713:1 3710:+ 3704:+ 3699:2 3695:R 3690:/ 3686:1 3683:+ 3678:1 3674:R 3669:/ 3665:1 3662:( 3658:/ 3654:1 3651:= 3642:R 3600:t 3597:o 3594:t 3590:R 3472:. 3405:C 3385:E 3381:e 3325:s 3321:e 3317:s 3313:e 3295:s 3291:e 3287:s 3283:e 3212:e 3208:s 3036:e 3032:s 3005:π 2994:s 2990:e 2980:e 2976:s 2968:s 2964:e 2945:5 2936:π 2741:( 2721:( 2604:s 2600:e 2440:( 2355:( 2202:e 2195:t 2188:v 2019:B 2013:, 2000:) 1995:1 1992:+ 1989:U 1985:B 1981:( 1976:) 1970:P 1963:B 1956:1 1952:( 1932:, 1918:L 1914:B 1885:) 1881:1 1878:+ 1875:L 1869:U 1865:( 1860:) 1855:1 1849:P 1845:B 1841:( 1836:) 1832:1 1826:B 1822:( 1818:2 1804:U 1798:L 1792:B 1788:P 1782:B 1775:U 1771:L 1767:P 1763:B 1509:0 1489:0 1486:= 1476:/ 1469:1 1409:m 1405:n 1401:n 1393:n 1377:m 1363:n 1341:. 1303:) 1297:3 1294:( 1262:3 1123:1 1119:1 1115:0 1103:1 1099:e 1091:0 1087:n 1079:p 1045:2 1023:1 1019:2 1011:) 998:2 991:1 988:+ 982:+ 977:4 970:2 963:1 960:+ 955:3 948:2 941:0 938:+ 933:2 926:2 919:0 916:+ 911:1 904:2 897:1 894:+ 889:0 882:2 875:1 871:( 863:= 854:e 850:2 842:) 836:n 829:2 820:n 808:1 802:p 797:0 794:= 791:n 782:( 766:s 747:. 739:1 712:1 696:0 662:0 635:π 614:= 611:p 587:× 585:2 571:× 535:e 527:b 523:p 519:s 503:, 498:e 494:b 483:1 477:p 472:b 468:s 447:× 359:× 168:3 108:= 67:( 31:. 20:)

Index

Floating-point
Floating point (disambiguation)

Z3
Deutsches Museum
Munich
computing
arithmetic
real numbers
integer
significand
exponent
base two
decimal floating point
radix point
scientific notation
orders of magnitude
between galaxies
between protons in an atom
dynamic range

number line

signs
IEEE 754
FLOPS
computer system
floating-point unit
coprocessor
number representation

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.