### 19 Floats

Starting with version 4.5, GAP has built-in support for floating-point numbers in machine format, and allows packages to implement arbitrary-precision floating-point arithmetic in a uniform manner. For now, one such package, float, exists; it is based on the arbitrary-precision routines in mpfr.

A word of caution: GAP deals primarily with algebraic objects, which can be represented exactly in a computer. Numerical imprecision means that floating-point numbers do not form a ring in the strict GAP sense, because addition is in general not associative ((1.0e-100+1.0)-1.0 is not the same as 1.0e-100+(1.0-1.0), in the default precision setting).

Most algorithms in GAP which require ring elements will therefore not be applicable to floating-point elements. In some cases, such a notion would not even make any sense (what is the greatest common divisor of two floating-point numbers?).
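
The non-associativity mentioned above can be checked directly. The following is a minimal sketch using only the built-in floats at their default precision; the variable names a and b are illustrative.

```gap
# The same three terms, grouped in two different ways.
a := (1.0e-100 + 1.0) - 1.0;   # 1.0e-100 is absorbed when added to 1.0, leaving 0.
b := 1.0e-100 + (1.0 - 1.0);   # 1.0e-100 is added to an exact 0., so it survives
Print(a = b, "\n");            # false
```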

#### 19.1 A sample run

Floating-point numbers can be input into GAP in the standard floating-point notation:

gap> 3.14;
3.14
gap> last^2/6;
1.64327
gap> h := 6.62606896e-34;
6.62607e-34
gap> pi := 4*Atan(1.0);
3.14159
gap> hbar := h/(2*pi);
1.05457e-34


Floating-point numbers can also be created using Float, from strings or rational numbers; and can be converted back using String, Rat, Int.

GAP allows rational and floating-point numbers to be mixed in the elementary operations +,-,*,/. However, floating-point numbers and rational numbers may not be compared. Conversions are performed using the creator Float:

gap> Float("3.1416");
3.1416
gap> Float(355/113);
3.14159
gap> Rat(last);
355/113
gap> Rat(0.33333);
1/3
gap> Int(1.e10);
10000000000
gap> Int(1.e20);
100000000000000000000
gap> Int(1.e30);
1000000000000000019884624838656


#### 19.2 Methods

Floating-point numbers may be directly input, as in any usual mathematical software or language; with the exception that every floating-point number must contain a decimal digit. Therefore .1, .1e1, -.999 etc. are all valid GAP inputs.

Floating-point numbers so entered in GAP are stored as strings. They are converted to floating-point when they are first used. This means that, if the floating-point precision is increased, the constants are reevaluated to fit the new format.

Floating-point numbers may be followed by an underscore, as in 1._. This means that they are to be immediately converted to the current floating-point format. The underscore may be followed by a single letter, which specifies which format/precision to use. By default, GAP has a single floating-point handler, with fixed (53 bits) precision, and its format specifier is 'l' as in 1._l. Higher-precision floating-point computations are available via external packages, for example float.
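
As a small illustration of the three literal forms described above, the sketch below assumes that only the built-in handler is installed, so that all three literals end up in the same 53-bit format. The variable names are illustrative.

```gap
a := 1.;     # plain literal: stored as a string, converted on first use
b := 1._;    # trailing underscore: converted immediately with the current handler
c := 1._l;   # underscore plus letter: converted immediately with the handler 'l'
# With only the built-in handler present, the three values should coincide.
Print(a = b and b = c, "\n");
```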

A record, FLOAT (19.2-5), contains all relevant constants for the current floating-point format; see its documentation for details. Typical fields are FLOAT.MANT_DIG=53, the constant FLOAT.VIEW_DIG=6 specifying the number of digits to view, and FLOAT.PI for the constant $$\pi$$. The constants have the same name as their C counterparts, except for the missing initial DBL_ or M_.
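
The fields of FLOAT are read like those of any other record. The following sketch assumes the built-in handler is the current one; the variable name eps is illustrative.

```gap
Print(FLOAT.MANT_DIG, "\n");   # 53 for the built-in handler, as stated above
Print(FLOAT.VIEW_DIG, "\n");   # number of digits shown when viewing a float
Print(FLOAT.PI, "\n");         # counterpart of the C constant M_PI

# EPSILON is the counterpart of DBL_EPSILON: adding it to 1.0 changes the value.
eps := FLOAT.EPSILON;
Print(1.0 + eps <> 1.0, "\n"); # expected to print true
```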

Floating-point numbers may be created using the single function Float (19.2-1), which accepts as arguments rational, string, or floating-point numbers. Floating-point numbers may also be created, in any floating-point representation, using NewFloat (19.2-1) as in NewFloat(IsIEEE754FloatRep,355/113), by supplying the category filter of the desired new floating-point number; or using MakeFloat (19.2-1) as in MakeFloat(1.0,355/113), by supplying a sample floating-point number.
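
The three creators can be compared side by side. This sketch reuses the calls quoted above and assumes the built-in IEEE754 handler is the current one, in which case all three should produce the same value.

```gap
a := Float(355/113);                        # current floating-point format
b := NewFloat(IsIEEE754FloatRep, 355/113);  # format chosen by a filter
c := MakeFloat(1.0, 355/113);               # format copied from the sample 1.0
Print(a = b and b = c, "\n");

# Float also accepts strings and existing floating-point numbers.
d := Float("3.1416");
Print(Float(d) = d, "\n");
```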

Floating-point numbers may also be converted to other GAP formats using the usual commands Int (14.2-3), Rat (17.2-6), String (27.7-6).

Exact conversion to and from floating-point format may be done using external representations. The "external representation" of a floating-point number x is a pair [m,e] of integers, such that x=m*2^(-1+e-LogInt(AbsInt(m),2)). Conversion to and from external representation is performed as usual using ExtRepOfObj (79.8-1) and ObjByExtRep (79.8-1):

gap> ExtRepOfObj(3.14);
[ 7070651414971679, 2 ]
gap> ObjByExtRep(IEEE754FloatsFamily,last);
3.14
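
The formula for the external representation can be evaluated in exact rational arithmetic. The following sketch does so for the value 3.14 used above; the names ext, m, e and exact are illustrative.

```gap
x := 3.14;
ext := ExtRepOfObj(x);          # the pair [ m, e ]
m := ext[1];
e := ext[2];

# Exact rational value m * 2^(-1 + e - LogInt(|m|, 2)) represented by x.
exact := m * 2^(-1 + e - LogInt(AbsInt(m), 2));
Print(exact, "\n");

# Converting the exact rational back should reproduce x without rounding,
# and so should the round trip through ObjByExtRep.
Print(Float(exact) = x, "\n");
Print(ObjByExtRep(IEEE754FloatsFamily, ext) = x, "\n");
```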


Computations with floating-point numbers never raise any error. Division by zero is allowed, and produces a signed infinity. Illegal operations, such as 0./0., produce NaNs (not-a-number); this is the only floating-point number x such that not EqFloat(x+0.0,x).

The IEEE754 standard requires NaN to be non-equal to itself. On the other hand, GAP requires every object to be equal to itself. To respect the IEEE754 standard, the function EqFloat (19.2-6) should be used instead of =.

The category a floating-point number belongs to can be checked using the filters IsFinite (30.4-2), IsPInfinity (19.2-9), IsNInfinity (19.2-9), IsXInfinity (19.2-9), IsNaN (19.2-9).
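
The special values and their testers can be tried out as follows. This is a minimal sketch with the built-in floats; the names inf and nan are illustrative variables, not predefined constants.

```gap
inf := 1.0/0.0;    # division by zero gives a signed infinity, not an error
nan := 0.0/0.0;    # an illegal operation gives NaN

Print(IsPInfinity(inf), " ", IsNInfinity(-inf), " ", IsXInfinity(-inf), "\n");
Print(IsFinite(1.0), " ", IsFinite(inf), "\n");
Print(IsNaN(nan), "\n");

# EqFloat follows IEEE754: NaN is different from itself.
Print(EqFloat(nan, nan), "\n");   # false
Print(EqFloat(1.0, 1.0), "\n");   # true
```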

Comparisons between floating-point numbers and rationals are explicitly forbidden. The rationale is that objects belonging to different families should in general not be comparable in GAP. Floating-point numbers are also approximations of real numbers, and don't follow the same rules; consider for example, using the default GAP implementation of floating-point numbers,

gap> 1.0/3.0 = Float(1/3);
true
gap> (1.0/3.0)^5 = Float((1/3)^5);
false
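
When a rational and a floating-point number do need to be compared, one of them can be converted explicitly first, so that both operands belong to the same family. The sketch below shows both directions; the names q and x are illustrative.

```gap
q := 1/3;
x := 0.5;

# Compare as floating-point numbers ...
Print(Float(q) < x, "\n");
# ... or compare as rationals, using the rational approximation of x.
Print(q < Rat(x), "\n");
```

Which direction is appropriate depends on whether rounding q or approximating x is the smaller evil for the computation at hand.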


##### 19.2-1 Float creators
 ‣ Float( obj ) ( function )
 ‣ NewFloat( filter, obj ) ( constructor )
 ‣ MakeFloat( sample, obj ) ( operation )

Returns: A new floating-point number, based on obj

This function creates a new floating-point number.

If obj is a rational number, the created number is created with sufficient precision so that the number can (usually) be converted back to the original number (see Rat (Reference: Rat) and Rat (17.2-6)). For an integer, the precision, if unspecified, is chosen sufficient so that Int(Float(obj))=obj always holds, but at least 64 bits.

obj may also be a string, which may be of the form "3.14e0" or ".314e1" or ".314@1" etc.

An option may be passed to specify, in bits, a desired precision. The format is Float("3.14":PrecisionFloat:=1000) to create a 1000-bit approximation of $$3.14$$.

In particular, if obj is already a floating-point number, then Float(obj:PrecisionFloat:=prec) creates a copy of obj with a new precision prec.

##### 19.2-2 Rat
 ‣ Rat( f ) ( attribute )

Returns: A rational approximation to f

This command constructs a rational approximation to the floating-point number f. Of course, it is not guaranteed to return the original rational number f was created from, though it returns the 'most reasonable' one given the precision of f.

Two options control the precision of the rational approximation: In the form Rat(f:maxdenom:=md,maxpartial:=mp), the rational returned is such that the denominator is at most md and the partials in its continued fraction expansion are at most mp. The default values are maxpartial:=10000 and maxdenom:=2^(precision/2).
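
The options are passed after a colon, as with any GAP function call. The following sketch uses the built-in floats and leaves the actual results unstated, since they depend on the approximation algorithm; the bounds 10 and 1000 are arbitrary illustrative choices.

```gap
x := Float(355/113);
Print(Rat(x), "\n");                    # default bounds

# Restrict the denominator of the result.
r := Rat(x : maxdenom := 10);
Print(r, "\n");

# Both options may be combined.
Print(Rat(FLOAT.PI : maxdenom := 1000, maxpartial := 100), "\n");
```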

##### 19.2-3 Cyc
 ‣ Cyc( f[, degree] ) ( operation )

Returns: A cyclotomic approximation to f

This command constructs a cyclotomic approximation to the floating-point number f. Of course, it is not guaranteed to return the original number f was created from, though it returns the 'most reasonable' one given the precision of f. An optional argument degree specifies the maximal degree of the cyclotomic to be constructed.

The method used is LLL lattice reduction.

##### 19.2-4 SetFloats
 ‣ SetFloats( rec[, bits][, install] ) ( function )

Installs a new interface to floating-point numbers in GAP, described by the record rec, optionally with a desired precision bits in binary digits. The last optional argument install is a boolean value; if false, it only installs the eager handler and the precision for the floating-point numbers, without making them the default.

##### 19.2-5 FLOAT
 ‣ FLOAT ( global variable )

This record contains useful floating-point constants:

DECIMAL_DIG

Maximal number of useful digits;

DIG

Number of significant digits;

VIEW_DIG

Number of digits to print in short view;

EPSILON

Smallest number such that $$1\neq1+\epsilon$$;

MANT_DIG

Number of bits in the mantissa;

MAX

Maximal representable number;

MAX_10_EXP

Maximal decimal exponent;

MAX_EXP

Maximal binary exponent;

MIN

Minimal positive representable number;

MIN_10_EXP

Minimal decimal exponent;

MIN_EXP

Minimal binary exponent;

INFINITY

Positive infinity;

NINFINITY

Negative infinity;

NAN

Not-a-number,

as well as mathematical constants E, LOG2E, LOG10E, LN2, LN10, PI, PI_2, PI_4, 1_PI, 2_PI, 2_SQRTPI, SQRT2, SQRT1_2.

##### 19.2-6 EqFloat
 ‣ EqFloat( x, y ) ( operation )

Returns: Whether the floating-point numbers x and y are equal

This function compares two floating-point numbers, and returns true if they are equal, and false otherwise; with the exception that NaN is always considered to be different from itself.

##### 19.2-7 PrecisionFloat
 ‣ PrecisionFloat( x ) ( attribute )

Returns: The precision of x

This function returns the precision, counted in number of binary digits, of the floating-point number x.

##### 19.2-8 SignBit
 ‣ SignBit( x ) ( attribute )
 ‣ SignFloat( x ) ( attribute )

Returns: The sign of x.

The first function SignBit returns the sign bit of the floating-point number x: true if x is negative (including -0.) and false otherwise.

The second function SignFloat returns the integer -1 if x<0, 0 if x=0 and 1 if x>0.
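
The difference between the two functions shows up at zero. This is a minimal sketch with the built-in floats; the comments restate the behaviour described above rather than recorded output.

```gap
# SignBit looks only at the sign bit, so it distinguishes -0. from 0.
Print(SignBit(-1.5), " ", SignBit(1.5), "\n");   # true false
Print(SignBit(-0.), "\n");                       # true according to the description

# SignFloat compares the value with zero.
Print(SignFloat(-1.5), " ", SignFloat(0.0), " ", SignFloat(1.5), "\n");  # -1 0 1
```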

##### 19.2-9 Infinity testers
 ‣ IsPInfinity( x ) ( property )
 ‣ IsNInfinity( x ) ( property )
 ‣ IsXInfinity( x ) ( property )
 ‣ IsFinite( x ) ( property )
 ‣ IsNaN( x ) ( property )

Returns true if the floating-point number x is respectively $$+\infty$$, $$-\infty$$, $$\pm\infty$$, finite, or 'not a number', such as the result of 0.0/0.0.

##### 19.2-10 Standard mathematical operations
 ‣ Sin( f ) ( attribute )
 ‣ Cos( f ) ( attribute )
 ‣ Tan( f ) ( attribute )
 ‣ Sec( f ) ( attribute )
 ‣ Csc( f ) ( attribute )
 ‣ Cot( f ) ( attribute )
 ‣ Asin( f ) ( attribute )
 ‣ Acos( f ) ( attribute )
 ‣ Atan( f ) ( attribute )
 ‣ Sinh( f ) ( attribute )
 ‣ Cosh( f ) ( attribute )
 ‣ Tanh( f ) ( attribute )
 ‣ Sech( f ) ( attribute )
 ‣ Csch( f ) ( attribute )
 ‣ Coth( f ) ( attribute )
 ‣ Asinh( f ) ( attribute )
 ‣ Acosh( f ) ( attribute )
 ‣ Atanh( f ) ( attribute )
 ‣ Log( f ) ( operation )
 ‣ Log2( f ) ( attribute )
 ‣ Log10( f ) ( attribute )
 ‣ Log1p( f ) ( attribute )
 ‣ Exp( f ) ( attribute )
 ‣ Exp2( f ) ( attribute )
 ‣ Exp10( f ) ( attribute )
 ‣ Expm1( f ) ( attribute )
 ‣ CubeRoot( f ) ( attribute )
 ‣ Square( f ) ( attribute )
 ‣ Atan2( y, x ) ( operation )
 ‣ Hypothenuse( x, y ) ( operation )
 ‣ Ceil( f ) ( attribute )
 ‣ Floor( f ) ( attribute )
 ‣ Round( f ) ( attribute )
 ‣ Trunc( f ) ( attribute )
 ‣ FrExp( f ) ( attribute )
 ‣ LdExp( f, exp ) ( operation )
 ‣ AbsoluteValue( f ) ( attribute )
 ‣ Norm( f ) ( attribute )
 ‣ Frac( f ) ( attribute )
 ‣ SinCos( f ) ( attribute )
 ‣ Erf( f ) ( attribute )
 ‣ Zeta( f ) ( attribute )
 ‣ Gamma( f ) ( attribute )

These are the standard mathematical functions, applied to floating-point arguments.
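
A few of these functions in use, as a minimal sketch with the built-in floats. Results are only described approximately in the comments, since the printed digits depend on the current format.

```gap
x := 0.5;
Print(Sin(x)^2 + Cos(x)^2, "\n");     # close to 1
Print(Exp(Log(x)), "\n");             # close to x
Print(4 * Atan2(1.0, 1.0), "\n");     # close to FLOAT.PI
Print(Hypothenuse(3.0, 4.0), "\n");   # length of the hypotenuse of a 3-4 triangle

# The rounding functions return floating-point numbers, not integers.
Print(Floor(2.7), " ", Ceil(2.2), " ", Trunc(-2.7), "\n");
```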

#### 19.3 High-precision-specific methods

GAP provides a mechanism for packages to implement new floating-point numerical interfaces. The following describes that mechanism; actual examples of packages are documented separately.

A package must create a record with fields (all optional):

creator

a function converting strings to floating-point;

eager

a character allowing immediate conversion to floating-point;

objbyextrep

a function creating a floating-point number out of a list [mantissa,exponent];

filter

a filter for the new floating-point objects;

constants

a record containing numerical constants, such as MANT_DIG, MAX, MIN, NAN.

The package must install methods Int, Rat, String for its objects, and creators NewFloat(filter,IsRat), NewFloat(filter,IsString).

It must then install methods for all arithmetic and numerical operations: SUM, Exp, ...

The user chooses that implementation by calling SetFloats (19.2-4) with the record as argument, and with an optional second argument requesting a precision in binary digits.

#### 19.4 Complex arithmetic

Complex arithmetic may be implemented in packages, and is present in float. Complex numbers are treated as usual numbers; they may be input with an extra "i" as in -0.5+0.866i. They may also be created using NewFloat (19.2-1) with three arguments: the float filter, the real part, and the imaginary part.

Methods should then be implemented for Norm, RealPart, ImaginaryPart, ComplexConjugate, ...

##### 19.4-1 Argument
 ‣ Argument( z ) ( attribute )

Returns the argument of the complex number z, namely the value Atan2(ImaginaryPart(z),RealPart(z)).

#### 19.5 Interval-specific methods

Interval arithmetic may also be implemented in packages. Intervals are in fact efficient implementations of sets of real numbers. The only non-trivial issue is how they should be compared. The standard EQ tests if the intervals are equal; however, it is usually more useful to know if intervals overlap, or are disjoint, or are contained in each other.

Note the usual convention that intervals are compared as in $$[a,b]\leq[c,d]$$ if and only if $$a\leq c$$ and $$b\leq d$$.

##### 19.5-1 Sup
 ‣ Sup( x ) ( attribute )

Returns the supremum of the interval x.

##### 19.5-2 Inf
 ‣ Inf( x ) ( attribute )

Returns the infimum of the interval x.

##### 19.5-3 Mid
 ‣ Mid( x ) ( attribute )

Returns the midpoint of the interval x.

##### 19.5-4 AbsoluteDiameter
 ‣ AbsoluteDiameter( x ) ( attribute )
 ‣ Diameter( x ) ( operation )

Returns the absolute diameter of the interval x, namely the difference Sup(x)-Inf(x).

##### 19.5-5 RelativeDiameter
 ‣ RelativeDiameter( x ) ( attribute )

Returns the relative diameter of the interval x, namely (Sup(x)-Inf(x))/AbsoluteValue(Min(x)).

##### 19.5-6 IsDisjoint
 ‣ IsDisjoint( x1, x2 ) ( operation )

Returns true if the two intervals x1, x2 are disjoint.

##### 19.5-7 IsSubset
 ‣ IsSubset( x1, x2 ) ( operation )

Returns true if the interval x1 contains x2.

##### 19.5-8 IncreaseInterval
 ‣ IncreaseInterval( x, delta ) ( operation )

Returns an interval with the same midpoint as x but absolute diameter increased by delta.

##### 19.5-9 BlowupInterval
 ‣ BlowupInterval( x, ratio ) ( operation )

Returns an interval with the same midpoint as x but relative diameter increased by ratio.

##### 19.5-10 BisectInterval
 ‣ BisectInterval( x ) ( operation )

Returns a list of two intervals whose union equals the interval x.
