This page reproduces the content of http://www.slideshare.net/maksym_zavershynskyi/fast-inverse-square-root.

Uploaded 2013/12/14 in Technology

Quake 3 was probably the most famous first-person shooter back in 1999. It had fascinating graphics and very high responsiveness, the result of performance optimization and high-quality code written by the id Software team. One of the most famous optimization tricks is the function that computes an approximation of the inverse (reciprocal) square root through some clever bit hacking. This function is still the subject of investigation by mathematicians and programmers today. In this presentation we try to understand how it works, and we also try to find the author.

- Behind the Performance of Quake 3 Engine: Fast Inverse Square Root

Maksym Zavershynskyi

- Quake 3 Arena

First Person Shooter

Released: 1999

Engine: Id Tech 3

Average reviewers’ score: ~9/10

- Architecture

• C-Language

• Client-Server separation

• Virtual Machine

• Local C Compiler for Scripts

• Highly Optimized Code

- Shading

Creates the perception of depth

- Material Based Shading

[Figure: two images combined (+) into a final image (=)] [1]

- What makes a nice picture?

•Shading

•Lighting

•Reflections

•...

- Angle of Incidence

[Figure labels: normal, view, angle α]

Greater α means darker shading

- Vector Normalization

[Figure: the vector (x,y,z) and the normalized vector (a,b,c) of length 1]

- Fast Inverse Square Root

- Inverse Square Root

float Q_rsqrt( float number )
{
    return 1.0f/sqrt(number);
}

- Fast Approximate Inverse Square Root

float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y = number;
    i = * ( long * ) &y; // evil floating point bit level hacking
    i = 0x5f3759df - ( i >> 1 ); // what the f☀✿k?
    y = * ( float * ) &i;
    y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
    // y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed

    return y;
}

- float Q_rsqrt( float number )

{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y = number;
    i = * ( long * ) &y; // (1) evil floating point bit level hacking
    i = 0x5f3759df - ( i >> 1 ); // (2) what the f☀✿k?
    y = * ( float * ) &i; // (1)
    y = y * ( threehalfs - ( x2 * y * y ) ); // (3) 1st iteration
    // y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed

    return y;
}

(1) Interpret float as integer

(2) Good initial guess with magic number 0x5f3759df

(3) One iteration of Newton’s approximation

- (1) Interpret float as integer

32-bit float:

0 0 1 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

(1 sign bit, then 8 exponent bits E, then 23 mantissa bits M)

0.15625, which is 1.01 × 2^-3 in binary

E = -3 + 127 = 124, or 01111100 in binary

M = .01

- (1) Interpret float as integer

float x=0.15625

0 0 1 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

x as integer i

0 0 1 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

shift right i>>1

0 0 0 1 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

E → E/2

the magic number 0x5f3759df

0 1 0 1 1 1 1 1 0 0 1 1 0 1 1 1 0 1 0 1 1 0 0 1 1 1 0 1 1 1 1 1

0x5f3759df - (i>>1)

0 1 0 0 0 0 0 0 0 0 1 0 0 1 1 1 0 1 0 1 1 0 0 1 1 1 0 1 1 1 1 1

result: 2.614 (exact value 1/sqrt(x)=2.52982..)

- (2) Magic Number: 0x5f3759df

• Gives a good initial guess.

• Minimizes the relative error.

• Searching for a constant that minimizes the error of the initial guess, we come up with 0x5f37642f [4]

Did we find a better magic number? ;)

- (3) One iteration of Newton’s method

Newton’s method:

Given a suitable approximation $y_n$ to the root of $f(y)$, it gives a better one, $y_{n+1}$, using

$$y_{n+1} = y_n - \frac{f(y_n)}{f'(y_n)}$$

In our case $f(y) = \frac{1}{y^2} - x$, whose positive root is $y = 1/\sqrt{x}$, so one step becomes $y_{n+1} = y_n \left( \frac{3}{2} - \frac{x}{2} y_n^2 \right)$:

y = y * ( 1.5f - ( 0.5f * x * y * y ) );

- (3) One iteration of Newton’s method

After one iteration of Newton’s method, our magic number 0x5f37642f gives a worse approximation than the original magic number 0x5f3759df! [4]

- Open Question:

How was the original magic number 0x5f3759df derived?

• Lomont in 2003 numerically found a slightly better magic number, 0x5f375a86 [4]

• Robertson in 2012 analytically found the same better magic number, 0x5f375a86 [3]

- How good?

Max relative error: 0.177% [3]

With the 2nd iteration of Newton’s method: 0.00047% [3]

- How fast?

In 1999: ???

Today, on CPUs: 3-4 times faster

With the 2nd iteration of Newton’s method: 2-2.5 times faster

[3]

- Who wrote it?
- Who?


John Carmack?

Lead Programmer of Quake, Doom, Wolfenstein 3D

“...Not me, and I don’t think it is Michael (Abrash). Terje Mathison perhaps?..” [8]

Michael Abrash?

Author of: Zen of Assembly Language, Zen of Graphics Programming

- Who?

Terje Mathisen?

Assembly language optimization for x86 microprocessors.

“...I wrote fast & accurate invssqrt()... for a computational fluid chemistry problem... The code is not the same as I wrote...” [8]

- Who?

Gary Tarolli?

Co-founder of 3dfx (later acquired by Nvidia)

“It did pass by my keyboard many many years ago, I may have tweaked the hex constant a bit or so, but other than that I can’t take credit for it, except that I used it a lot and probably contributed to its popularity and longevity.” [8]

This hack is older than 1990!

- Who?

Cleve Moler: inspiration [9]

Author of the first MATLAB, one of the founders of MathWorks, and currently its Chief Mathematician.

Greg Walsh: author (most probably) [9]

Has been working on Internet and distributed computing technologies since before it was even the Internet, and helped engineer the first WYSIWYG word processor at Xerox PARC while at Stanford University.

- Who?

Inspired by Cleve Moler, who drew on code written by Velvel Kahan and K.C. Ng at Berkeley around 1986! [10]

http://www.netlib.org/fdlibm/e_sqrt.c

- Finally

It is Fast: 3-4 times faster than the straightforward code

It is Good: 0.17% maximum relative error

It can be Improved

It dates back to 1986

- Thank you!

http://zavermax.github.io

- Some literature here

Quake 1,3 Architecture

1) Fabien Sanglard, Quake 3 source code review, 2012. http://fabiensanglard.net/quake3/

2) Michael Abrash, Ramblings in Realtime. http://www.bluesnews.com/abrash/

Inverse Square Root

3) Matthew Robertson, A Brief History of InvSqrt, Bachelor’s Thesis, University of New Brunswick, 2012

4) Chris Lomont, Fast Inverse Square Root, Indiana: Purdue University, 2003

5) Jim Blinn, Floating-point tricks, IEEE Computer Graphics and Applications 17, no. 4, 1997

6) David Eberly, Fast Inverse Square Root (Revisited), Geometric Tools, LLC, 2010

7) Charles McEniry, The Mathematics Behind the Fast Inverse Square Root Function Code, 2007

Investigation of the Authorship

8) Rys Sommefeldt, Origin of Quake3’s Fast InvSqrt(), 2006. http://www.beyond3d.com/content/articles/8/

9) Rys Sommefeldt, Origin of Quake3’s Fast InvSqrt() - Part Two, 2007. http://www.beyond3d.com/content/articles/15/

10) http://blogs.mathworks.com/cleve/2012/06/19/symplectic-spacewar/#comment-13

Additional

11) http://en.wikipedia.org/wiki/Fast_inverse_square_root

12) https://github.com/id-Software/Quake-III-Arena