Performance Impact of Clean Code Practices: A C++ Case Study
This article examines how strict adherence to clean‑code principles in C++ (preferring polymorphism, keeping functions small, and hiding object internals) affects runtime performance. It compares virtual‑function hierarchies, switch‑based implementations, and table‑driven approaches, and finds speed differences of up to fifteen‑fold.
Writing "clean" code is a frequently repeated programming recommendation, especially for beginners. Many of the rules behind it have no effect on execution time and so cannot be assessed objectively. Some rules, however, change runtime behavior and can therefore be measured.
The rules that affect performance are: use polymorphism instead of if/else or switch, hide the internal structure of objects, strictly limit function size, make each function do only one thing, and apply the DRY ("don't repeat yourself") principle.
The author asks: if we create code that follows these rules, how does its performance compare to more traditional implementations?
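Below is a classic clean‑code example: a shape base class with derived classes for square, rectangle, triangle, and circle, each providing a virtual Area() method. The listings use f32, u32, and Pi32 without defining them; they are presumably aliases for float and a 32‑bit unsigned integer, and a float constant for π.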
/* ========================================================================
LISTING 22
======================================================================== */
class shape_base
{
public:
shape_base() {}
virtual f32 Area() = 0;
};
class square : public shape_base
{
public:
square(f32 SideInit) : Side(SideInit) {}
virtual f32 Area() {return Side*Side;}
private:
f32 Side;
};
class rectangle : public shape_base
{
public:
rectangle(f32 WidthInit, f32 HeightInit) : Width(WidthInit), Height(HeightInit) {}
virtual f32 Area() {return Width*Height;}
private:
f32 Width, Height;
};
class triangle : public shape_base
{
public:
triangle(f32 BaseInit, f32 HeightInit) : Base(BaseInit), Height(HeightInit) {}
virtual f32 Area() {return 0.5f*Base*Height;}
private:
f32 Base, Height;
};
class circle : public shape_base
{
public:
circle(f32 RadiusInit) : Radius(RadiusInit) {}
virtual f32 Area() {return Pi32*Radius*Radius;}
private:
f32 Radius;
};
Using this hierarchy, the total area of a collection of shapes is computed with a virtual‑function loop:
/* ========================================================================
LISTING 23
======================================================================== */
f32 TotalAreaVTBL(u32 ShapeCount, shape_base **Shapes)
{
f32 Accum = 0.0f;
for(u32 ShapeIndex = 0; ShapeIndex < ShapeCount; ++ShapeIndex)
{
Accum += Shapes[ShapeIndex]->Area();
}
return Accum;
}
An unrolled version reduces loop overhead:
/* ========================================================================
LISTING 24
======================================================================== */
f32 TotalAreaVTBL4(u32 ShapeCount, shape_base **Shapes)
{
f32 Accum0 = 0.0f;
f32 Accum1 = 0.0f;
f32 Accum2 = 0.0f;
f32 Accum3 = 0.0f;
u32 Count = ShapeCount/4; // assumes ShapeCount is a multiple of 4; any remainder is skipped
while(Count--)
{
Accum0 += Shapes[0]->Area();
Accum1 += Shapes[1]->Area();
Accum2 += Shapes[2]->Area();
Accum3 += Shapes[3]->Area();
Shapes += 4;
}
return (Accum0 + Accum1 + Accum2 + Accum3);
}
Benchmarking shows that both versions take roughly 35 cycles per shape. This is the clean‑code baseline against which the rule‑breaking versions below are measured.
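A per‑shape cost like the one quoted above comes from timing the loop over an array of shapes and dividing by the number of shapes processed. The following is a minimal sketch of such a measurement, not the author's harness (which this text does not describe). It assumes `std::chrono` wall‑clock timing and therefore reports nanoseconds rather than cycles. The aliases `f32` and `u32`, the value of `Pi32`, and the helper name `NanosecondsPerShape` are assumptions made for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

typedef float f32;                      // assumed alias, not defined in the listings
typedef uint32_t u32;                   // assumed alias, not defined in the listings
static const f32 Pi32 = 3.14159265359f; // assumed value of the article's Pi32

// Copy of the Listing 22 hierarchy so the sketch compiles on its own.
class shape_base
{
public:
    shape_base() {}
    virtual f32 Area() = 0;
};
class square : public shape_base
{
public:
    square(f32 SideInit) : Side(SideInit) {}
    virtual f32 Area() {return Side*Side;}
private:
    f32 Side;
};
class rectangle : public shape_base
{
public:
    rectangle(f32 WidthInit, f32 HeightInit) : Width(WidthInit), Height(HeightInit) {}
    virtual f32 Area() {return Width*Height;}
private:
    f32 Width, Height;
};
class triangle : public shape_base
{
public:
    triangle(f32 BaseInit, f32 HeightInit) : Base(BaseInit), Height(HeightInit) {}
    virtual f32 Area() {return 0.5f*Base*Height;}
private:
    f32 Base, Height;
};
class circle : public shape_base
{
public:
    circle(f32 RadiusInit) : Radius(RadiusInit) {}
    virtual f32 Area() {return Pi32*Radius*Radius;}
private:
    f32 Radius;
};

// Copy of Listing 23.
f32 TotalAreaVTBL(u32 ShapeCount, shape_base **Shapes)
{
    f32 Accum = 0.0f;
    for(u32 ShapeIndex = 0; ShapeIndex < ShapeCount; ++ShapeIndex)
    {
        Accum += Shapes[ShapeIndex]->Area();
    }
    return Accum;
}

// Hypothetical helper: average wall-clock nanoseconds per shape over RepCount passes.
double NanosecondsPerShape(u32 ShapeCount, shape_base **Shapes, u32 RepCount, f32 *TotalOut)
{
    assert(ShapeCount > 0 && RepCount > 0); // avoid dividing by zero below
    f32 Total = 0.0f;
    auto Start = std::chrono::steady_clock::now();
    for(u32 Rep = 0; Rep < RepCount; ++Rep)
    {
        Total += TotalAreaVTBL(ShapeCount, Shapes);
    }
    auto End = std::chrono::steady_clock::now();
    *TotalOut = Total; // handing the sum back keeps the compiler from discarding the work
    double Nanoseconds = std::chrono::duration<double, std::nano>(End - Start).count();
    return Nanoseconds / ((double)ShapeCount * (double)RepCount);
}
```

For meaningful numbers the array should hold many shapes of mixed types and the measurement should be repeated several times; a handful of shapes mostly measures timer overhead.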
Violating the first rule (using a switch instead of polymorphism) yields a 1.5× speedup:
/* ========================================================================
LISTING 25
======================================================================== */
enum shape_type : u32
{
Shape_Square,
Shape_Rectangle,
Shape_Triangle,
Shape_Circle,
Shape_Count,
};
struct shape_union
{
shape_type Type;
f32 Width;
f32 Height;
};
f32 GetAreaSwitch(shape_union Shape)
{
f32 Result = 0.0f;
switch(Shape.Type)
{
case Shape_Square: Result = Shape.Width*Shape.Width; break;
case Shape_Rectangle: Result = Shape.Width*Shape.Height; break;
case Shape_Triangle: Result = 0.5f*Shape.Width*Shape.Height; break;
case Shape_Circle: Result = Pi32*Shape.Width*Shape.Width; break;
}
return Result;
}
The corresponding total‑area loops are almost identical to the virtual‑function versions, except that they iterate over an array of shape_union values rather than an array of pointers. The switch‑based code runs about 1.5× faster because it eliminates virtual calls and pointer indirection.
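The switch‑based loops are not reproduced in this text. Assuming they mirror Listings 23 and 24 with `GetAreaSwitch` substituted for the virtual call, they might look like the sketch below. The function names `TotalAreaSwitch` and `TotalAreaSwitch4`, the aliases, and the value of `Pi32` are assumptions.

```cpp
#include <cassert>
#include <cstdint>

typedef float f32;                      // assumed alias
typedef uint32_t u32;                   // assumed alias
static const f32 Pi32 = 3.14159265359f; // assumed value of the article's Pi32

// Copy of Listing 25 so the sketch compiles on its own.
enum shape_type : u32
{
    Shape_Square,
    Shape_Rectangle,
    Shape_Triangle,
    Shape_Circle,
    Shape_Count,
};
struct shape_union
{
    shape_type Type;
    f32 Width;
    f32 Height;
};
f32 GetAreaSwitch(shape_union Shape)
{
    f32 Result = 0.0f;
    switch(Shape.Type)
    {
        case Shape_Square: Result = Shape.Width*Shape.Width; break;
        case Shape_Rectangle: Result = Shape.Width*Shape.Height; break;
        case Shape_Triangle: Result = 0.5f*Shape.Width*Shape.Height; break;
        case Shape_Circle: Result = Pi32*Shape.Width*Shape.Width; break;
        default: break; // Shape_Count is not a real shape
    }
    return Result;
}

// Assumed counterpart of Listing 23: the shapes are stored by value,
// so the loop walks contiguous memory instead of chasing pointers.
f32 TotalAreaSwitch(u32 ShapeCount, shape_union *Shapes)
{
    f32 Accum = 0.0f;
    for(u32 ShapeIndex = 0; ShapeIndex < ShapeCount; ++ShapeIndex)
    {
        Accum += GetAreaSwitch(Shapes[ShapeIndex]);
    }
    return Accum;
}

// Assumed counterpart of Listing 24: four independent accumulators.
f32 TotalAreaSwitch4(u32 ShapeCount, shape_union *Shapes)
{
    assert(ShapeCount % 4 == 0); // like Listing 24, this handles whole groups of four only
    f32 Accum0 = 0.0f;
    f32 Accum1 = 0.0f;
    f32 Accum2 = 0.0f;
    f32 Accum3 = 0.0f;
    u32 Count = ShapeCount/4;
    while(Count--)
    {
        Accum0 += GetAreaSwitch(Shapes[0]);
        Accum1 += GetAreaSwitch(Shapes[1]);
        Accum2 += GetAreaSwitch(Shapes[2]);
        Accum3 += GetAreaSwitch(Shapes[3]);
        Shapes += 4;
    }
    return (Accum0 + Accum1 + Accum2 + Accum3);
}
```

Because `GetAreaSwitch` is an ordinary function visible at the call site, the compiler is free to inline it into the loop, which it generally cannot do for a virtual call whose target is only known at run time.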
Further replacing the switch with a table‑driven approach yields roughly a 10× speedup:
/* ========================================================================
LISTING 27
======================================================================== */
f32 const CTable[Shape_Count] = {1.0f, 1.0f, 0.5f, Pi32};
f32 GetAreaUnion(shape_union Shape)
{
return CTable[Shape.Type] * Shape.Width * Shape.Height;
}
With this change the cost per shape drops to about 3–3.5 cycles, a ten‑fold performance gain over the original clean‑code version.
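The table lookup works because every area formula here has the form coefficient × Width × Height. That only matches the switch version if squares and circles store the same value in both Width and Height, a requirement Listing 27 relies on without stating. The sketch below makes that assumption explicit with two hypothetical constructors, `MakeSquare` and `MakeCircle`, and adds a bounds check that is not part of the original listing. The aliases and the value of `Pi32` are likewise assumptions.

```cpp
#include <cassert>
#include <cstdint>

typedef float f32;                      // assumed alias
typedef uint32_t u32;                   // assumed alias
static const f32 Pi32 = 3.14159265359f; // assumed value of the article's Pi32

enum shape_type : u32
{
    Shape_Square,
    Shape_Rectangle,
    Shape_Triangle,
    Shape_Circle,
    Shape_Count,
};
struct shape_union
{
    shape_type Type;
    f32 Width;
    f32 Height;
};

// Reference implementation from Listing 25.
f32 GetAreaSwitch(shape_union Shape)
{
    f32 Result = 0.0f;
    switch(Shape.Type)
    {
        case Shape_Square: Result = Shape.Width*Shape.Width; break;
        case Shape_Rectangle: Result = Shape.Width*Shape.Height; break;
        case Shape_Triangle: Result = 0.5f*Shape.Width*Shape.Height; break;
        case Shape_Circle: Result = Pi32*Shape.Width*Shape.Width; break;
        default: break;
    }
    return Result;
}

// Listing 27: one coefficient per shape type, indexed by the enum value.
f32 const CTable[Shape_Count] = {1.0f, 1.0f, 0.5f, Pi32};
f32 GetAreaUnion(shape_union Shape)
{
    assert(Shape.Type < Shape_Count); // added guard, not in Listing 27
    return CTable[Shape.Type] * Shape.Width * Shape.Height;
}

// Hypothetical constructors that store the single dimension in both fields,
// which is what makes coefficient * Width * Height correct for these types.
shape_union MakeSquare(f32 Side)
{
    shape_union Result = {Shape_Square, Side, Side};
    return Result;
}
shape_union MakeCircle(f32 Radius)
{
    shape_union Result = {Shape_Circle, Radius, Radius};
    return Result;
}
```

A circle built as `{Shape_Circle, 2.0f, 0.0f}` would still give the right area through `GetAreaSwitch`, which ignores Height, but would yield zero through `GetAreaUnion`. The table version trades that flexibility for a branch‑free body.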
The author extends the experiment by adding a second property (corner count) and shows that each additional rule violation further widens the performance gap, reaching up to a fifteen‑fold advantage for the table‑driven code.
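This text does not give the formula used for the second property. As an illustration only, the sketch below assumes each area is weighted by 1/(1 + corner count). Under that assumption the switch‑based version needs two per‑shape dispatches, while the table‑driven version folds both properties into a single precomputed coefficient. The names `GetCornerCountSwitch`, `CornerAreaSwitch`, `CornerTable`, and `CornerAreaUnion`, the aliases, and the value of `Pi32` are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>

typedef float f32;                      // assumed alias
typedef uint32_t u32;                   // assumed alias
static const f32 Pi32 = 3.14159265359f; // assumed value of the article's Pi32

enum shape_type : u32
{
    Shape_Square,
    Shape_Rectangle,
    Shape_Triangle,
    Shape_Circle,
    Shape_Count,
};
struct shape_union
{
    shape_type Type;
    f32 Width;
    f32 Height;
};

f32 GetAreaSwitch(shape_union Shape)
{
    f32 Result = 0.0f;
    switch(Shape.Type)
    {
        case Shape_Square: Result = Shape.Width*Shape.Width; break;
        case Shape_Rectangle: Result = Shape.Width*Shape.Height; break;
        case Shape_Triangle: Result = 0.5f*Shape.Width*Shape.Height; break;
        case Shape_Circle: Result = Pi32*Shape.Width*Shape.Width; break;
        default: break;
    }
    return Result;
}

// Second property: a second switch over the same type tag.
u32 GetCornerCountSwitch(shape_type Type)
{
    u32 Result = 0;
    switch(Type)
    {
        case Shape_Square: Result = 4; break;
        case Shape_Rectangle: Result = 4; break;
        case Shape_Triangle: Result = 3; break;
        case Shape_Circle: Result = 0; break;
        default: break;
    }
    return Result;
}

// Assumed weighting: area / (1 + corner count), summed over all shapes.
f32 CornerAreaSwitch(u32 ShapeCount, shape_union *Shapes)
{
    f32 Accum = 0.0f;
    for(u32 ShapeIndex = 0; ShapeIndex < ShapeCount; ++ShapeIndex)
    {
        f32 Weight = 1.0f / (1.0f + (f32)GetCornerCountSwitch(Shapes[ShapeIndex].Type));
        Accum += Weight * GetAreaSwitch(Shapes[ShapeIndex]);
    }
    return Accum;
}

// Table version: the area coefficient and the corner weight are merged
// into one constant per type, so the loop body stays a single multiply chain.
f32 const CornerTable[Shape_Count] =
{
    1.0f / (1.0f + 4.0f), // square
    1.0f / (1.0f + 4.0f), // rectangle
    0.5f / (1.0f + 3.0f), // triangle
    Pi32 / (1.0f + 0.0f), // circle
};
f32 CornerAreaUnion(u32 ShapeCount, shape_union *Shapes)
{
    f32 Accum = 0.0f;
    for(u32 ShapeIndex = 0; ShapeIndex < ShapeCount; ++ShapeIndex)
    {
        shape_union Shape = Shapes[ShapeIndex];
        assert(Shape.Type < Shape_Count); // guard the table lookup
        Accum += CornerTable[Shape.Type] * Shape.Width * Shape.Height;
    }
    return Accum;
}
```

The point of the sketch is structural: adding a property to the table version changes only the constants, whereas the switch version (and, by extension, a virtual‑function version) gains another dispatch per shape.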
Adding an AVX‑optimized version of the table‑driven code widens the gap further: the clean‑code implementation ends up 20–25× slower than the tuned, rule‑breaking version.
Overall, the article argues that while clean‑code guidelines may improve readability, they can dramatically degrade performance, especially when many such rules are applied to compute‑intensive code.