Why Defining __init__ Directly Is a Bad Practice and How to Replace It with dataclasses, classmethods, and NewType
The article explains that using a custom __init__ method to create data structures couples object construction with side‑effects, leading to fragile code, and demonstrates how refactoring with @dataclass, @classmethod factories and typing.NewType yields cleaner, safer, and more testable Python classes.
Historical background – Before Python 3.7 introduced dataclasses , the special method __init__ was the only way to initialise a class that represented a data structure, such as a 2DCoordinate with x and y attributes.
When a class needed to expose a convenient constructor like 2DCoordinate(x=1, y=2) , developers had to write an __init__ that accepted the corresponding parameters. Alternative approaches (removing the class from the public API, exposing a factory function, using class attributes, or creating an abstract base class) all had serious drawbacks, especially before typing.Protocol existed in Python 3.8.
Consequently, the default __init__ became the de‑facto way of creating objects, even for classes that only needed to store a few attributes. This habit introduced hidden side‑effects whenever the constructor performed I/O or other work.
Problem illustration – Consider a FileReader class that wraps a low‑level fileio module: <code>class FileReader: def __init__(self, path: str) -> None: self._fd = fileio.open(path) def read(self, length: int) -> bytes: return fileio.read(self._fd, length) def close(self) -> None: fileio.close(self._fd) </code> The constructor hides the file descriptor ( _fd ) and performs the I/O operation. As requirements grow (e.g., needing to create an instance from an existing file descriptor), developers resort to hacks like: <code>def reader_from_fd(fd: int) -> FileReader: fr = object.__new__(FileReader) fr._fd = fd return fr </code> These workarounds break the original benefits: the simple FileReader("path") syntax disappears, type safety is lost, and testing becomes harder because every test must stub the low‑level fileio functions.
Solution
Use @dataclass to declare the data attributes. The dataclass automatically generates a safe __init__ that only assigns attributes.
Replace the side‑effect‑ful constructor with a @classmethod factory that performs the I/O and returns an instance.
Introduce precise type aliases with typing.NewType to distinguish raw integers (file descriptors) from valid descriptors.
Example refactor: <code>from typing import Self, NewType FileDescriptor = NewType("FileDescriptor", int) @dataclass class FileReader: _fd: FileDescriptor @classmethod def open(cls, path: str) -> Self: return cls(FileDescriptor(fileio.open(path))) def read(self, length: int) -> bytes: return fileio.read(self._fd, length) def close(self) -> None: fileio.close(self._fd) </code> Now users call FileReader.open("path") , the class remains easy to instantiate in tests (by passing a mock FileDescriptor ), and static type checkers can enforce correct usage.
Additional notes – The NewType alias adds no runtime overhead; it merely informs type checkers that only values produced by fileio.open are valid. For more complex validation, the dataclass __post_init__ hook can be used without re‑introducing side‑effects in __init__ .
Summary – New best practice
Define data‑holding classes as @dataclass (or attrs classes).
Keep the automatically generated __init__ for simple attribute assignment.
Add explicit @classmethod factories for any construction that requires I/O, validation, or other work.
Use typing.NewType to give primitive types (e.g., int file descriptors) a distinct, type‑checked identity.
Following these guidelines yields Python classes that are easier to use, test, and extend, while avoiding the historic anti‑pattern of overloading __init__ with side‑effects.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.