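Using Quaternions Efficiently in Real-time Applications (translation)
By robot-v1.0
Article link: https://www.kyfws.com/games/using-quaternions-efficiently-in-real-time-applica-zh/
Copyright notice: unless otherwise stated, all articles on this blog are licensed under BY-NC-SA. Please credit the source when reposting.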
Original article: https://www.codeproject.com/Articles/125269/Using-Quaternions-Efficiently-in-Real-time-Applica
Original author: gtdelarosa
Translated for this site by robot-v1.0
Preface
Various methods for using quaternions in ways that maximize performance
Introduction
Quaternions are often used instead of matrices to represent the orientation of three-dimensional entities in games and simulations. A quaternion requires only 4 floats, as opposed to the 9 floats required by a 3x3 matrix. Quaternion multiplication also requires only 16 multiplications and 12 additions, versus the 27 multiplications and 18 additions of 3x3 matrix multiplication.
Despite these advantages, it can still be tricky to implement code for quaternion-based entities that maximizes time efficiency. In this article, we will look at how to get the most performance out of quaternions in real-time applications where performance is essential.
Vector Transformations
Although quaternion multiplications are relatively inexpensive computationally, the same does not go for quaternion transformations of 3D vectors. In this case, the equivalent matrix operation is actually cheaper to compute, requiring only 3 dot products for a total of 9 multiplications and 6 additions. Even an optimized version of the quaternion operation requires 2 cross products for a total of 15 multiplications and 8 additions. Because of this, many AI and gameplay programmers maintain a separate copy of each entity's orientation in matrix form as well as in quaternion form.
Let us consider a typical AI function called "updatePursuitAction", in which an AI entity must determine the direction from its location to the location of a target that it wants to pursue. It then alters its direction to intercept the target. This function is called every frame until the AI system decides the entity should switch to some other AI action. To determine the direction, we must transform the world-space vector between the entity and its target into the entity's local space.
In the example code, we have two classes which implement the same AI function, each in its own way. The FastGameEntity class implements it using a quaternion. The SlowGameEntity class implements it using a matrix, represented as three separate 3D vectors corresponding to the 3 rows (or columns, if using column-major matrices) of the equivalent matrix.
Listing 1.
    //declared virtual in the class; the keyword is not repeated on an out-of-class definition
    void FastGameEntity::updatePursuitAction( const D3DXVECTOR3 &targetPos )
    {
        //get vector from my pos to the target
        D3DXVECTOR3 toTarget = targetPos - pos_;
        //transform into my local space - 15 mults 8 adds
        //the conjugate of a unit quaternion is its inverse rotation
        D3DXQUATERNION conjQ;
        D3DXQuaternionConjugate( &conjQ, &ori_ );
        rotateVectorWithQuat( conjQ, toTarget );
        //update direction to intercept target
        //...
    }

    void SlowGameEntity::updatePursuitAction( const D3DXVECTOR3 &targetPos )
    {
        //get vector from my pos to the target
        D3DXVECTOR3 worldToTarget = targetPos - pos_;
        //transform into my local space
        //transform vector with orthonormal basis - 3 dot products - 9 mults 6 adds
        D3DXVECTOR3 localToTarget;
        localToTarget.x = D3DXVec3Dot( &side_, &worldToTarget );
        localToTarget.y = D3DXVec3Dot( &up_, &worldToTarget );
        localToTarget.z = D3DXVec3Dot( &dir_, &worldToTarget );
        //update direction to intercept target
        //...
    }
For the FastGameEntity, we can update its AI by simply calling the "updatePursuitAction" function with the current location of the target. For the SlowGameEntity, we must also execute another function on every frame, the "UpdateRotation" function. This function converts the quaternion orientation of the entity into matrix form. This must be done to keep the matrix in sync with the quaternion, ensuring both always represent the same orientation. This extra operation requires at least 12 more multiplications and 12 more additions.
Listing 2.
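    void FastGameEntity::updateAI( )
    {
        //placeholder: set to the target's current position
        D3DXVECTOR3 targetPos;
        updatePursuitAction( targetPos );
    }

    void SlowGameEntity::updateAI( )
    {
        //placeholder: set to the target's current position
        D3DXVECTOR3 targetPos;
        updatePursuitAction( targetPos );
        //extra per-frame step: keep the matrix in sync with the quaternion
        UpdateRotation();
    }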
If we profile these functions in AMD CodeAnalyst, we see that the FastGameEntity version does in fact run faster than the SlowGameEntity version, even though it is using the more expensive quaternion transformation. This is because the extra step of converting from a quaternion to a matrix wipes out any advantage the matrix transformation may have provided.
Fig. 1 shows the profiler output. In the original figure, the functions called by FastGameEntity are underlined in green and those called by SlowGameEntity are underlined in red. We can see that more time is spent in the SlowGameEntity functions, with quatToMatrix alone accounting for 25.17 of the timer samples:
Fig. 1
CS:EIP,Symbol + Offset,64-bit,Timer samples,
"0x401c90","D3DXQuaternionConjugate","","2.2",
"0x4017b0","D3DXVec3Cross","","9.4",
"0x401b10","D3DXVec3Dot","","10.67",
"0x401780","D3DXVECTOR3::D3DXVECTOR3","","7.31",
"0x401ab0","D3DXVECTOR3::operator-","","13.92",
"0x403900","FastGameEntity::FastGameEntity","","0.12",
"0x401db0","FastGameEntity::updateAI","","0.12",
"0x401de0","FastGameEntity::updatePursuitAction","","0.12",
"0x4039d0","GameEntityBase::GameEntityBase","","15.89",
"0x401600","main","","0.23",
"0x403200","operator new","","0.12",
"0x4012f0","quatToMatrix","","25.17",
"0x4010e0","rotateVectorWithQuat","","10.21",
"0x403810","SlowGameEntity::SlowGameEntity","","0.23",
"0x4018c0","SlowGameEntity::updateAI","","0.12",
"0x401a40","SlowGameEntity::updatePursuitAction","","0.12",
Does this mean that we should never use matrix transformations when dealing with quaternion-based entities? No, because there are situations where they yield better performance. In the AI example above, we only need to make one transformation per frame, but what if we need to make several thousand? Take, for instance, a mesh with thousands of vertices that need to be transformed by the entity's current rotation. If this is done with quaternion operations, it will require more computation than with matrices. So in this case it makes more sense to convert the quaternion into a matrix first, so that all of the vertices can be transformed with the matrix.
In Listing 3 we have two different versions of a "transformManyVertices" function, one which uses a quaternion and one which uses a matrix converted from the quaternion.
Listing 3.
    //quaternion version: 15 mults per vertex
    void SlowGameEntity::transformManyVertices( unsigned int numVerts )
    {
        //benchmark scaffolding: the vertex data is never filled in and the results are discarded
        D3DXVECTOR3 *vertices = new D3DXVECTOR3[ numVerts ];
        D3DXVECTOR3 transformedVertex;
        D3DXQUATERNION conjQ;
        D3DXQuaternionConjugate( &conjQ, &ori_ );
        for ( unsigned int i = 0; i < numVerts; i++ )
        {
            //transform into my local space - 15 mults 8 adds
            transformedVertex = vertices[i];
            rotateVectorWithQuat( conjQ, transformedVertex );
        }
        delete [] vertices;
    }

    //matrix version: one conversion up front, then 9 mults per vertex
    void FastGameEntity::transformManyVertices( unsigned int numVerts )
    {
        UpdateRotation();
        D3DXVECTOR3 *vertices = new D3DXVECTOR3[ numVerts ];
        D3DXVECTOR3 col1( side_.x, up_.x, dir_.x );
        D3DXVECTOR3 col2( side_.y, up_.y, dir_.y );
        D3DXVECTOR3 col3( side_.z, up_.z, dir_.z );
        for ( unsigned int i = 0; i < numVerts; i++ )
        {
            //transform into my local space
            //multiply vector with rotation matrix - 3 dot products - 9 mults 6 adds
            D3DXVECTOR3 transformedVertex;
            transformedVertex.x = D3DXVec3Dot( &col1, &vertices[i] );
            transformedVertex.y = D3DXVec3Dot( &col2, &vertices[i] );
            transformedVertex.z = D3DXVec3Dot( &col3, &vertices[i] );
        }
        delete [] vertices;
    }
If we profile this, we see that the matrix version is now faster than the quaternion version, even though we must perform the extra step of converting to a matrix. So whether to use matrices or quaternions depends on the functionality you are trying to implement. As always with optimization, we should use a profiler to find the real bottlenecks in the code and determine what really needs to be optimized.
Fig. 2
CS:EIP,Symbol + Offset,64-bit,Timer samples,
"0x4018a0","`vector constructor iterator'","","0.12",
"0x4026f0","D3DXQUATERNION::D3DXQUATERNION","","0.64",
"0x401800","D3DXVec3Cross","","21.36",
"0x401b60","D3DXVec3Dot","","1.17",
"0x4017d0","D3DXVECTOR3::D3DXVECTOR3","","18.78",
"0x403c50","FastGameEntity::FastGameEntity","","0.18",
"0x401ea0","FastGameEntity::transformManyVertices","","0.23",
"0x403ca0","GameEntityBase::GameEntityBase","","8.07",
"0x401600","main","","0.06",
"0x403360","operator new","","0.06",
"0x4010e0","rotateVectorWithQuat","","47.63",
"0x403bf0","SlowGameEntity::SlowGameEntity","","0.12",
"0x401ba0","SlowGameEntity::transformManyVertices","","0.12",
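The trade-off between the two approaches can be estimated with simple arithmetic on the multiplication counts quoted in this article: 15 per vector for the quaternion path, 9 per vector for the matrix path, and roughly 12 for the one-off conversion. The sketch below only encodes those figures; the constant and function names are invented for this illustration, and the model ignores additions, function-call overhead and memory effects, so a profiler remains the real authority.

```cpp
// Multiplication counts as quoted in the article; treat them as rough estimates.
constexpr unsigned long kQuatRotateMults   = 15;  // per vector, quaternion path
constexpr unsigned long kMatrixRotateMults = 9;   // per vector, matrix path
constexpr unsigned long kQuatToMatrixMults = 12;  // one-off quaternion-to-matrix conversion

// Total multiplications to transform n vectors directly with the quaternion.
constexpr unsigned long quatPathMults(unsigned long n)
{
    return kQuatRotateMults * n;
}

// Total multiplications to convert once and then transform n vectors with the matrix.
constexpr unsigned long matrixPathMults(unsigned long n)
{
    return kQuatToMatrixMults + kMatrixRotateMults * n;
}

// True when converting to a matrix first needs fewer multiplications.
constexpr bool matrixPathIsCheaper(unsigned long n)
{
    return matrixPathMults(n) < quatPathMults(n);
}
```

Under these counts the quaternion path wins for a single vector (15 versus 21 multiplications), the two tie at two vectors, and the matrix path wins from three vectors onward, which is consistent with the single-transform AI case and the many-vertex mesh case described above.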
Updating Orientation
When it comes to changing the orientation of an entity, many AI and gameplay programmers rely on Euler angles and the matrix form of the orientation to calculate the new orientation of the entity for each frame. This requires the new Euler angles to be converted into quaternion form before the entity is rendered, and this operation is relatively expensive computationally. It requires either lots of sines and cosines, or the multiplication of three quaternions, which results in many multiplications and additions. How can we avoid this?
An alternative is to compute the angular motion of the object using angular velocities, one for each of the three axes of rotation. These velocities can be stored as a single 3D vector. This method has the advantage of requiring much less computation to update the orientation of the entity, needing only one multiplication of the quaternion with the velocity vector and one quaternion addition.
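To show where the cost of the Euler approach comes from, here is a minimal sketch that builds a quaternion from three Euler angles by multiplying three single-axis quaternions. It is not the article's `quatFromEuler`, whose body is not shown. The names are hypothetical, and the axis assignment (yaw about Y, pitch about X, roll about Z) and composition order are assumptions chosen for illustration; a real engine must match its own convention.

```cpp
#include <cmath>

struct Quat { float x, y, z, w; };

// Hamilton product a*b.
Quat mul(const Quat &a, const Quat &b)
{
    return { a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
             a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
             a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w,
             a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z };
}

// Assumed convention: yaw about Y, pitch about X, roll about Z, composed as yaw * pitch * roll.
// Cost: six trigonometric calls plus two full quaternion products (16 multiplications each).
Quat eulerToQuat(float yaw, float pitch, float roll)
{
    const Quat qYaw   = { 0.0f, std::sin(0.5f * yaw), 0.0f, std::cos(0.5f * yaw) };
    const Quat qPitch = { std::sin(0.5f * pitch), 0.0f, 0.0f, std::cos(0.5f * pitch) };
    const Quat qRoll  = { 0.0f, 0.0f, std::sin(0.5f * roll), std::cos(0.5f * roll) };
    return mul(qYaw, mul(qPitch, qRoll));
}
```

Expanding the products by hand removes some of the multiplications, but the six sine and cosine evaluations remain, and those are typically the expensive part.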
The SlowGameEntity class and the FastGameEntity class each have their own version of a "calculateOrientation" function, which are shown in Listing 4.
Listing 4.
    //FastGameEntity version: integrate the angular velocity
    virtual void calculateOrientation( float dt )
    {
        //calculate angular velocity
        //...
        //update quaternion with angular velocity vector
        ori_ += ( (ori_ * angularVel_) * 0.5f * dt );
    }

    //SlowGameEntity version: rebuild a quaternion from Euler angles
    virtual void calculateOrientation( float dt )
    {
        //calculate new euler angles
        //...
        //update current orientation with new quaternion formed from euler angles
        D3DXQUATERNION tempQ;
        //D3DXQuaternionRotationYawPitchRoll( &tempQ, yaw_, pitch_, roll_ );
        quatFromEuler( tempQ, yaw_, pitch_, roll_ );
        ori_ = tempQ * ori_;
    }
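The angular velocity update in Listing 4 can be sketched in a self-contained form as follows. This is an illustration rather than the article's code: the types and function names are hypothetical, the product is a plain Hamilton product with the angular velocity treated as a pure quaternion (x, y, z, 0), and the renormalization step at the end is an addition not present in Listing 4, included on the assumption that the quaternion should stay unit length over many frames.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };
struct Quat { float x, y, z, w; };

// Hamilton product of q with the pure quaternion (v, 0): 12 multiplications.
Quat mulPure(const Quat &q, const Vec3 &v)
{
    return {  q.w * v.x + q.y * v.z - q.z * v.y,
              q.w * v.y - q.x * v.z + q.z * v.x,
              q.w * v.z + q.x * v.y - q.y * v.x,
             -q.x * v.x - q.y * v.y - q.z * v.z };
}

// One explicit Euler step of dq/dt = 0.5 * q * (angularVel, 0).
void integrateOrientation(Quat &q, const Vec3 &angularVel, float dt)
{
    const Quat d = mulPure(q, angularVel);
    const float h = 0.5f * dt;
    q.x += d.x * h;
    q.y += d.y * h;
    q.z += d.z * h;
    q.w += d.w * h;

    // Assumption: renormalize so accumulated error does not change the length.
    const float invLen = 1.0f / std::sqrt(q.x * q.x + q.y * q.y + q.z * q.z + q.w * q.w);
    q.x *= invLen;
    q.y *= invLen;
    q.z *= invLen;
    q.w *= invLen;
}
```

No trigonometric functions are involved, which is the main reason this path is cheaper than converting Euler angles every frame. The square root in the normalization is optional per frame; it could be applied less often if drift is acceptable.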
Again looking at the profiler output, we see that the angular velocity method is slightly faster.
Fig. 3
CS:EIP,Symbol + Offset,64-bit,Timer samples,
"0x406eb8","_cos_pentium4","","1.32",
"0x407428","_sin_pentium4","","1.15",
"0x404410","calcAllEntitysOrientation","","0.33",
"0x405240","cos","","0.16",
"0x401980","cosf","","1.65",
"0x401870","D3DXQUATERNION::D3DXQUATERNION","","0.82",
"0x401ec0","D3DXQUATERNION::operator*","","0.16",
"0x402230","D3DXQUATERNION::operator*","","0.33",
"0x4021d0","D3DXQUATERNION::operator+=","","3.62",
"0x402170","FastGameEntity::calculateOrientation","","0.16",
"0x403c80","GameEntityBase::GameEntityBase","","23.39",
"0x4032f0","operator new","","0.16",
"0x401000","operator*","","25.86",
"0x401450","quatFromEuler","","29.16",
"0x405370","sin","","0.33",
"0x4019a0","sinf","","3.95",
"0x401e50","SlowGameEntity::calculateOrientation","","2.64",
"0x403a60","SlowGameEntity::SlowGameEntity","","0.49",
Making Friends With the Cache
Finally, let us take a look at how we can make quaternions more cache-friendly, which is usually a good way to improve performance, given that memory accesses are much faster when served from the cache. Since caches are designed around the principle of spatial locality, we will organize our quaternions in contiguous memory so that they are always adjacent to each other. The cache will be able to operate on this data with fewer misses because it does not load just one quaternion at a time, but loads all of the adjacent bytes as well in order to fill up one cache line. The size of a cache line is hardware dependent, but it is often 64 bytes, which is enough to hold four quaternions of four 4-byte floats each. So, if all of the quaternions are adjacent, loading one will also load other quaternions at the same time. If we write code that processes the quaternions sequentially, it is likely that many of them will already be in the cache when they are needed.
In the example code we have two different classes, SlowCacheObject and FastCacheObject. SlowCacheObject contains two quaternions, one for the object's local space transform and the other for its world space transform. FastCacheObject does not contain these quaternions directly; instead, it contains a pointer to a struct which contains them. This way, all of the transforms for all FastCacheObjects can be kept in contiguous arrays of these structs.
Listing 6 shows the two classes:
Listing 6.
    class SlowCacheObject : public ObjectBase
    {
    public:
        virtual void updateWorldTransform( SlowCacheObject *parent )
        {
            if ( parent )
                D3DXQuaternionMultiply( &worldTransform_, &localTransform_, parent->getWorldTransform() );
        }
        D3DXQUATERNION *getWorldTransform() { return &worldTransform_; }
    protected:
        D3DXQUATERNION localTransform_;
        D3DXQUATERNION worldTransform_;
    };

    class FastCacheObject : public ObjectBase
    {
    public:
        struct Transforms
        {
            D3DXQUATERNION localTransform_;
            D3DXQUATERNION worldTransform_;
        };
        Transforms *getTransforms() { return transforms_; }
        void setTransforms( Transforms *transforms )
        {
            transforms_ = transforms;
        }
        virtual void updateWorldTransform( FastCacheObject *parent ){}
    protected:
        Transforms *transforms_;
    };
We will create two trees, one with SlowCacheObject instances and one with FastCacheObject instances. Each tree will have a single root node with two children, and each of those children will in turn have four children. The nodes in both trees will be allocated separately, so they will not be adjacent in memory. The quaternions for the SlowCacheObject tree will be stored in each node. For the FastCacheObject nodes, there will be three arrays of Transforms structs, one for each level in the tree. These arrays hold the quaternions for every node on that level in contiguous memory.
In order to update the world transform of each object in the tree, we must multiply its local transform by its parent's world transform and then store the result in the object's own world transform quaternion. For both trees, we will do this in a top-down fashion, starting at the root and working our way down. Fig. 4 shows this process for the SlowCacheObject tree, and Fig. 5 shows it for the FastCacheObject tree.
Fig. 4. Update order for the SlowCacheObject tree (image).
Fig. 5. Update order for the FastCacheObject tree (image).
In the example code, all of the code for updating the SlowCacheObject tree is contained in the Node.h and SlowCacheObject.h files. For the FastCacheObject tree, all of the functionality for updating the tree is contained in the "updateWorldTransforms" function located in the QuatDemoCode.cpp file. As you can see from the figures above, the method used for the SlowCacheObject tree requires alternating back and forth between levels in the tree. After processing all of the leaves of the left node, we must go back up to the second level and repeat the process starting with the right node. In the FastCacheObject tree, we simply iterate through two arrays and perform quaternion multiplications.
If we profile this code we get the results shown in Fig. 6.
Fig. 6.
CS:EIP,Symbol + Offset,64-bit,Timer samples,
"0x40423a","D3DXQuaternionMultiply","","1.2",
"0x40423a","D3DXQuaternionMultiply+0","","1.2",
"0x401c10","FastCacheObject::getTransforms","","0.76",
"0x401de0","Node::getNumObjects","","0.54",
"0x402120","Node::updateAllObjects","","7.61",
"0x401f20","Node::updateWorldTransform","","3.7",
"0x401f80","SlowCacheObject::getWorldTransform","","1.2",
"0x401f50","SlowCacheObject::updateWorldTransform","","4.67",
"0x402500","std::_Aux_cont::_Getcont","","3.8",
"0x4024d0","std::_Iterator_base_aux::_Getmycont","","7.17",
"0x402510","std::_Iterator_base_aux::_Has_container","","16.41",
"0x402e00","std::_Iterator_base_aux::_Iterator_base_aux","","2.28",
"0x402580","std::_Iterator_base_aux::_Same_container","","3.26",
"0x402cb0","std::_Iterator_base_aux::_Set_container","","1.09",
"0x402de0","std::_Iterator_with_base<std::random_access_iterator_tag,FastCacheObject *,int,FastCacheObject * const *,FastCacheObject * const &,std::_Iterator_base_aux>::_Iterator_with_base<std::random_access_iterator_tag,FastCacheObject *,int,FastCacheObject * const *,Fa","","2.61",
"0x402dc0","std::_Ranit<SlowCacheObject *,int,SlowCacheObject * const *,SlowCacheObject * const &>::_Ranit<SlowCacheObject *,int,SlowCacheObject * const *,SlowCacheObject * const &>","","1.96",
"0x4025c0","std::_Vector_const_iterator<FastCacheObject *,std::allocator<FastCacheObject *> >::operator*","","4.35",
"0x402530","std::_Vector_const_iterator<FastCacheObject *,std::allocator<FastCacheObject *> >::operator==","","6.2",
"0x402d30","std::_Vector_const_iterator<SlowCacheObject *,std::allocator<SlowCacheObject *> >::_Vector_const_iterator<SlowCacheObject *,std::allocator<SlowCacheObject *> >","","6.09",
"0x402260","std::_Vector_const_iterator<SlowCacheObject *,std::allocator<SlowCacheObject *> >::operator!=","","4.89",
"0x402cd0","std::_Vector_const_iterator<SlowCacheObject *,std::allocator<SlowCacheObject *> >::operator++","","4.89",
"0x4025a0","std::_Vector_iterator<FastCacheObject *,std::allocator<FastCacheObject *> >::_Vector_iterator<FastCacheObject *,std::allocator<FastCacheObject *> >","","1.52",
"0x402290","std::_Vector_iterator<SlowCacheObject *,std::allocator<SlowCacheObject *> >::operator*","","1.09",
"0x4022b0","std::_Vector_iterator<SlowCacheObject *,std::allocator<SlowCacheObject *> >::operator++","","3.7",
"0x4024b0","std::_Vector_iterator<SlowCacheObject *,std::allocator<SlowCacheObject *> >::operator++","","0.65",
"0x402200","std::vector<FastCacheObject *,std::allocator<FastCacheObject *> >::begin","","1.09",
"0x402230","std::vector<SlowCacheObject *,std::allocator<SlowCacheObject *> >::end","","2.07",
"0x402100","std::vector<SlowCacheObject *,std::allocator<SlowCacheObject *> >::size","","1.3",
"0x401630","testFastCacheObject","","0.22",
"0x401670","testSlowCacheObject","","0.33",
"0x401580","updateWorldTransforms","","3.37",
Even without counting all of the STL operators used by SlowCacheObject, we can see a significant performance advantage in the FastCacheObject approach. FastCacheObject does not use any STL operators except during construction and destruction of the tree. The main drawback to this approach, however, is that it is very static. If you are working with a very dynamic system where new objects are constantly added and old objects deleted, it will be necessary to maintain and update all of the Transforms arrays at runtime. This can prove to be more trouble than it is worth, either being too difficult to implement or prohibitively costly in terms of performance. In contrast, the SlowCacheObject system is very dynamic in nature and can handle insertions and deletions in the tree in its present form, with no new code required. As with all optimization techniques, you must consider the requirements of the project and the profiler data to determine what really needs to be optimized and how it should be done.
Conclusion
Quaternions definitely have memory space and time efficiency advantages over matrices, but there are still times when matrices are the better choice. So it is essential to know exactly how each one implements the desired operations before deciding which one to use. Of course, the design requirements of the application always take priority over any performance concerns. If the design requires a feature such as smooth interpolation, you may have to use quaternions regardless of their performance cost. And as always, use a good profiler to determine where the real bottlenecks and hotspots are in the code in order to make the most of your optimization efforts.
Happy Coding,
Gabriel T. Delarosa
Bibliography:
Svarovsky, Jan, "Quaternions for Game Programming," Game Programming Gems, Charles River Media, 2000
http://research.scee.net/files/presentations/gcapaustralia09/Pitfalls_of_Object_Oriented_Programming_GCAP_09.pdf
License
This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).