Basic Syntax and Usage
To understand a dataclass and its syntax, you need to understand the basic layout and structure of a Python class first. Below is an example showing a simple Python class:
class StockInHand:
    def __init__(self, apples, oranges, mangoes):
        self.apples = apples
        self.oranges = oranges
        self.mangoes = mangoes
stock = StockInHand(40, 50, 60)
print(stock.apples, stock.oranges, stock.mangoes)
In the code sample above, a new class called “StockInHand” is created with an “__init__” method defined inside it. The __init__ method is invoked automatically whenever you create a new instance of the StockInHand class. Here, __init__ is defined with three mandatory arguments, so you cannot create a new instance of StockInHand without supplying a value for each of them. The “self” argument is a reference to the instance being created or used, so you can use it to refer to any variable or method that belongs to that instance. The name “self” is only a convention and the parameter can technically be named anything, but it cannot be omitted: instance methods always receive the instance as their first argument. In the last two statements, a new instance of the StockInHand class is created and its variables are accessed using dot notation.
After running the above code sample, you should get the following output:
40 50 60
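As a small sketch of the points about __init__ and self, the snippet below shows that the first parameter does not have to be called self (the name “this” is purely illustrative, and sticking to self is the convention), and that leaving out a mandatory argument raises a TypeError.

```python
class StockInHand:
    # The first parameter receives the instance. "this" is an illustrative
    # name only; "self" remains the strong convention.
    def __init__(this, apples, oranges, mangoes):
        this.apples = apples
        this.oranges = oranges
        this.mangoes = mangoes


stock = StockInHand(40, 50, 60)
print(stock.apples, stock.oranges, stock.mangoes)

# All three arguments are mandatory, so supplying only one fails.
try:
    StockInHand(40)
    missing_args_rejected = False
except TypeError as error:
    missing_args_rejected = True
    print("TypeError:", error)
```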
The same class can be defined using dataclass as follows:
from dataclasses import dataclass
@dataclass
class StockInHand:
    apples: int
    oranges: int
    mangoes: int
stock = StockInHand(40, 50, 60)
print(stock.apples, stock.oranges, stock.mangoes)
The first statement imports the “dataclass” decorator from the “dataclasses” module. Decorators can be used to modify the behavior of functions and classes without changing their source code. In this case, the dataclass decorator is predefined and comes from the dataclasses module, which is part of the standard library from Python 3.7 onwards. To define a dataclass, you attach the dataclass decorator to a Python class using the “@” symbol, as shown in the above code sample. In the next few statements, the variables of the dataclass are defined using type hints that indicate what type of object they hold. Type hints for variables were introduced in Python 3.6 and are written using the “:” (colon) symbol. You can create a new instance of a dataclass like any other Python class. After running the above code sample, you should get the following output:
40 50 60
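To see what the decorator and the type hints actually produce, the following sketch inspects the class with dataclasses.fields and inspect.signature. It also illustrates that type hints are not checked at runtime; the string value passed at the end is there only for demonstration.

```python
import inspect
from dataclasses import dataclass, fields


@dataclass
class StockInHand:
    apples: int
    oranges: int
    mangoes: int


# The decorator turns each annotated name into a field.
field_names = [f.name for f in fields(StockInHand)]
print(field_names)

# An __init__ with matching parameters is generated automatically.
init_signature = inspect.signature(StockInHand)
print(init_signature)

# Type hints are not enforced at runtime, so this call succeeds.
odd_stock = StockInHand("forty", 50, 60)
print(odd_stock.apples)
```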
Note that if a method in a dataclass returns a value, you can give it a return type hint using the “->” symbol. Here is an example:
from dataclasses import dataclass
@dataclass
class StockInHand:
    apples: int
    oranges: int
    mangoes: int
    def total_stock(self) -> int:
        return self.apples + self.oranges + self.mangoes
stock = StockInHand(40, 50, 60)
print(stock.total_stock())
A new method called “total_stock” has been created, and a type hint using the built-in “int” type has been assigned to it to indicate the type of its return value. After running the above code sample, you should get the following output:
150
Variables in Dataclass Objects can have Default Values
You can assign default values to members of dataclasses after type hints. Here is an example:
from dataclasses import dataclass
@dataclass
class StockInHand:
    apples: int = 40
    oranges: int = 50
    mangoes: int = 60
    def total_stock(self) -> int:
        return self.apples + self.oranges + self.mangoes
stock = StockInHand()
print(stock.total_stock())
In the second-to-last statement, no arguments are supplied when the new instance of the StockInHand dataclass is created, so the default values are used. After running the above code sample, you should get the following output:
150
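Defaults can also be overridden selectively. The sketch below passes a single keyword argument, so only that member changes while the others keep their default values.

```python
from dataclasses import dataclass


@dataclass
class StockInHand:
    apples: int = 40
    oranges: int = 50
    mangoes: int = 60

    def total_stock(self) -> int:
        return self.apples + self.oranges + self.mangoes


# Only oranges is overridden; apples and mangoes keep their defaults.
stock = StockInHand(oranges=5)
print(stock.total_stock())  # 40 + 5 + 60 = 105
```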
Dataclass Members are Mutable
Dataclasses are mutable by default, so you can change the values of their members by assigning to them through an instance. Below is a code sample:
from dataclasses import dataclass
@dataclass
class StockInHand:
    apples: int = 40
    oranges: int = 50
    mangoes: int = 60
    def total_stock(self) -> int:
        return self.apples + self.oranges + self.mangoes
stock = StockInHand()
stock.apples = 100
print(stock.total_stock())
The value of the apples variable is changed before the total_stock method is called. After running the above code sample, you should get the following output:
210
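Mutability is the default rather than a fixed property. As a brief aside, the sketch below passes frozen=True to the decorator, which makes assignment to a member raise FrozenInstanceError. The class name FrozenStock is hypothetical and used only for this illustration.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class FrozenStock:  # hypothetical name, used only for this sketch
    apples: int = 40
    oranges: int = 50
    mangoes: int = 60


stock = FrozenStock()
try:
    stock.apples = 100  # not allowed on a frozen dataclass
    assignment_blocked = False
except FrozenInstanceError:
    assignment_blocked = True

print(assignment_blocked)
print(stock.apples)  # the original value is unchanged
```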
Creating a Dataclass from a List
You can create a dataclass programmatically using the “make_dataclass” function, as shown in the code sample below:
import dataclasses
fields = [("apples", int, 40), ("oranges", int, 50), ("mangoes", int, 60)]
StockInHand = dataclasses.make_dataclass(
    "StockInHand", fields,
    namespace={'total_stock': lambda self: self.apples + self.oranges + self.mangoes}
)
stock = StockInHand()
stock.apples = 100
print(stock.total_stock())
The make_dataclass function takes a class name and a list of member fields as its two mandatory arguments. You can define the members as a list of tuples, where each tuple contains the name of the variable, its type hint and, optionally, its default value. A default is not required; omit the third element to leave a member without one. The official documentation describes this third element as a field object created with dataclasses.field, though a plain value like the ones used here is treated as an ordinary default. The optional namespace argument takes a dictionary that can be used to define member functions as key-value pairs, here with a lambda function. The code above is roughly equivalent to defining the following dataclass manually:
from dataclasses import dataclass
@dataclass
class StockInHand:
    apples: int = 40
    oranges: int = 50
    mangoes: int = 60
    def total_stock(self):
        return self.apples + self.oranges + self.mangoes
stock = StockInHand()
stock.apples = 100
print(stock.total_stock())
After running either of the above two code samples, you should get the following output:
210
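A variation on the make_dataclass call is sketched below, assuming you want one member without a default and a regular named function instead of a lambda. It uses the documented (name, type) and (name, type, field) tuple forms; the particular choice of which member has no default is made up for illustration.

```python
from dataclasses import field, make_dataclass


def total_stock(self) -> int:
    return self.apples + self.oranges + self.mangoes


StockInHand = make_dataclass(
    "StockInHand",
    [
        ("apples", int),                      # no default: must be supplied
        ("oranges", int, field(default=50)),  # default given via field()
        ("mangoes", int, field(default=60)),
    ],
    namespace={"total_stock": total_stock},
)

stock = StockInHand(100)
print(stock.total_stock())  # 100 + 50 + 60 = 210
```

Members without a default have to come before members with one, just as in a function signature.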
Why Use a Dataclass Instead of a Class?
You might wonder why you should use dataclasses if they are nearly the same as other Python classes. One of their main benefits is conciseness: the decorator generates methods such as __init__, __repr__ and __eq__ for you, so you can create dataclasses using clean and minimal shorthand without much boilerplate code. They are designed primarily to be used as data containers whose variables can be easily accessed using dot notation, though you can use dataclasses as full-fledged classes as well. In simple terms, if you want a Python class mainly to act as a data store, a dataclass is usually the better choice.
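The difference in boilerplate can be sketched by putting a hand-written class next to a dataclass with the same members. The class names PlainStock and DataStock are hypothetical and exist only for this comparison.

```python
from dataclasses import dataclass


class PlainStock:  # hypothetical name for the hand-written version
    def __init__(self, apples, oranges, mangoes):
        self.apples = apples
        self.oranges = oranges
        self.mangoes = mangoes


@dataclass
class DataStock:  # hypothetical name for the dataclass version
    apples: int
    oranges: int
    mangoes: int


# Without a custom __eq__, plain instances compare by identity.
plain_equal = PlainStock(40, 50, 60) == PlainStock(40, 50, 60)
# The dataclass gets a generated __eq__ that compares field by field.
data_equal = DataStock(40, 50, 60) == DataStock(40, 50, 60)

print(plain_equal, data_equal)
# The generated __repr__ lists every field by name.
print(DataStock(40, 50, 60))
```

To get the same comparison and printing behavior from the plain class, you would have to write __eq__ and __repr__ by hand.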
Conclusion
Dataclasses in Python provide a minimal way to quickly create Python classes intended to be used as data stores. You can access the members of a dataclass using dot notation, which makes dataclasses especially useful when you want dictionary-like key-value pairs that are reached through attributes rather than keys.