How to Use Dataclasses in Python

This article is a guide to the “dataclass” objects available in Python 3.7 and newer versions. Dataclasses are like other Python classes, but they are designed to be used as data containers and provide a cleaner, shorter syntax for quickly creating data objects. If you have used “namedtuple” objects in Python, you can think of dataclasses as mutable namedtuple-like objects. You create instances of a dataclass like instances of any other class or namedtuple type, and you access their attributes using dot notation.

Basic Syntax and Usage

To understand a dataclass and its syntax, you need to understand the basic layout and structure of a Python class first. Below is an example showing a simple Python class:

class StockInHand:
    def __init__(self, apples, oranges, mangoes):
        self.apples = apples
        self.oranges = oranges
        self.mangoes = mangoes

stock = StockInHand(40, 50, 60)
print(stock.apples, stock.oranges, stock.mangoes)

In the code sample above, a new class called “StockInHand” has been created with an “__init__” method defined inside it. The __init__ method is automatically invoked whenever you create a new instance of the StockInHand class. In this case, the __init__ method has been defined with three mandatory arguments, so you cannot create a new instance of StockInHand without supplying a value for each of them. The “self” argument provides a reference to the instance itself, so you can use it inside methods to read and set the attributes of the instance and to call its other methods. The name “self” is only a convention and the argument can technically be named anything, but it cannot be omitted: Python always passes the instance as the first argument of an instance method. In the last couple of statements, a new instance of the StockInHand class is created and its variables are accessed using dot notation.

After running the above code sample, you should get the following output:

40 50 60
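To illustrate the point about mandatory arguments, here is a small sketch that reuses the same class and deliberately leaves out one value. The variable name error_message is only for illustration, and the exact wording of the message can differ between Python versions.

```python
class StockInHand:
    def __init__(self, apples, oranges, mangoes):
        self.apples = apples
        self.oranges = oranges
        self.mangoes = mangoes

# Leaving out "mangoes" makes Python raise a TypeError before
# any instance is created.
try:
    StockInHand(40, 50)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# The message names the missing argument.
print(error_message)
```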

The same class can be defined using dataclass as follows:

from dataclasses import dataclass

@dataclass
class StockInHand:
    apples: int
    oranges: int
    mangoes: int

stock = StockInHand(40, 50, 60)
print(stock.apples, stock.oranges, stock.mangoes)

The first statement imports the “dataclass” decorator from the “dataclasses” module. Decorators can be used to modify the behavior of functions and classes without changing their source code. In this case, the dataclass decorator comes from the dataclasses module in the standard library. To define a dataclass, you attach the dataclass decorator to a Python class using the “@” symbol, as shown in the above code sample. In the next few statements, variables in the dataclass are defined using type hints to indicate what type of object they are. This variable annotation syntax was introduced in Python 3.6 and uses the “:” (colon) symbol. The dataclass decorator reads these annotations and generates an __init__ method that accepts the fields in the order they are declared. You can create a new instance of a dataclass like any other Python class. After running the above code sample, you should get the following output:

40 50 60
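One detail worth seeing in practice is that the type hints only describe the fields and are not checked when the program runs. The sketch below uses the same dataclass. The odd_stock instance and the value "forty" are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class StockInHand:
    apples: int
    oranges: int
    mangoes: int

# The generated __init__ uses the field names, so keyword arguments work too.
stock = StockInHand(apples=40, oranges=50, mangoes=60)
print(stock.apples)  # 40

# The "int" hints are not enforced at runtime, so passing a string
# does not raise an error.
odd_stock = StockInHand("forty", 50, 60)
print(type(odd_stock.apples).__name__)  # str
```

If you want these hints checked, you would typically run a separate static type checker over the code.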

Note that if a method in dataclass returns a value, you can assign it a type hint using “->” symbol. Here is an example:

from dataclasses import dataclass

@dataclass
class StockInHand:
    apples: int
    oranges: int
    mangoes: int

    def total_stock(self) -> int:
        return self.apples + self.oranges + self.mangoes

stock = StockInHand(40, 50, 60)
print(stock.total_stock())

A new method called “total_stock” has been created, and the built-in “int” type has been used as its return type hint to indicate the type of the return value. After running the above code sample, you should get the following output:

150
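As a small sketch of what the “->” hint actually does, the example below reads the hint back from the method and then calls the method with a non-integer value. The names return_hint and fractional are only illustrative.

```python
from dataclasses import dataclass

@dataclass
class StockInHand:
    apples: int
    oranges: int
    mangoes: int

    def total_stock(self) -> int:
        return self.apples + self.oranges + self.mangoes

# The return hint is stored on the function as metadata.
return_hint = StockInHand.total_stock.__annotations__["return"]
print(return_hint)  # <class 'int'>

# The hint does not convert or validate the result.
fractional = StockInHand(1.5, 2, 3)
print(fractional.total_stock())  # 6.5
```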

Variables in Dataclass Objects can have Default Values

You can assign default values to members of dataclasses after type hints. Here is an example:

from dataclasses import dataclass

@dataclass
class StockInHand:
    apples: int = 40
    oranges: int = 50
    mangoes: int = 60

    def total_stock(self) -> int:
        return self.apples + self.oranges + self.mangoes

stock = StockInHand()
print(stock.total_stock())

In the second-to-last statement, no arguments are supplied when the new instance of the StockInHand dataclass is created, so the default values are used. After running the above code sample, you should get the following output:

150
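Defaults can also be mixed with required fields and overridden one at a time. The following is a minimal sketch of that idea. Note that fields without a default must be declared before fields with a default, otherwise Python raises a TypeError when the class is defined.

```python
from dataclasses import dataclass

@dataclass
class StockInHand:
    apples: int          # no default, so a value must be supplied
    oranges: int = 50
    mangoes: int = 60

    def total_stock(self) -> int:
        return self.apples + self.oranges + self.mangoes

# Supply the required field and override only one of the defaults.
stock = StockInHand(10, mangoes=5)
print(stock.oranges)        # 50
print(stock.total_stock())  # 65
```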

Dataclass Members are Mutable

Dataclasses are mutable by default, so you can change the values of their members by assigning to them through an instance. Below is a code sample:

from dataclasses import dataclass

@dataclass
class StockInHand:
    apples: int = 40
    oranges: int = 50
    mangoes: int = 60

    def total_stock(self) -> int:
        return self.apples + self.oranges + self.mangoes

stock = StockInHand()
stock.apples = 100
print(stock.total_stock())

The value of the apples variable has been changed before calling the total_stock method. After running the above code sample, you should get the following output:

210
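Mutability is only the default. If you want a read-only data container, the dataclass decorator accepts a frozen=True argument. Here is a minimal sketch. The class name FrozenStock and the variable was_blocked are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class FrozenStock:
    apples: int = 40
    oranges: int = 50
    mangoes: int = 60

stock = FrozenStock()

# Assigning to a field of a frozen dataclass raises FrozenInstanceError.
try:
    stock.apples = 100
    was_blocked = False
except FrozenInstanceError:
    was_blocked = True

print(was_blocked)   # True
print(stock.apples)  # 40
```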

Creating a Dataclass from a List

You can create a dataclass programmatically using the “make_dataclass” function, as shown in the code sample below:

import dataclasses

fields = [("apples", int, 40), ("oranges", int, 50), ("mangoes", int, 60)]
StockInHand = dataclasses.make_dataclass(
    "StockInHand", fields,
    namespace={'total_stock': lambda self: self.apples + self.oranges + self.mangoes}
)

stock = StockInHand()
stock.apples = 100
print(stock.total_stock())

The make_dataclass function takes a class name and a list of member fields as its two mandatory arguments. You can define the members as a list of tuples where each tuple contains the name of the variable, its type hint and its default value. The default value is optional, so you can leave it out to define a member without a default. The optional namespace argument takes a dictionary that can be used to add member functions as key-value pairs, for example by using lambda functions. The code above is equivalent to defining the following dataclass manually:

from dataclasses import dataclass

@dataclass
class StockInHand:
    apples: int = 40
    oranges: int = 50
    mangoes: int = 60

    def total_stock(self):
        return self.apples + self.oranges + self.mangoes

stock = StockInHand()
stock.apples = 100
print(stock.total_stock())

After running the above two code samples, you should get the following output:

210
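As a further sketch of make_dataclass, the example below leaves out the default values and then inspects the generated class with the fields and is_dataclass helpers from the same module. The class name Fruit and its two fields are hypothetical and used only for illustration.

```python
import dataclasses

# Two-element tuples define fields without default values.
Fruit = dataclasses.make_dataclass("Fruit", [("name", str), ("quantity", int)])

apples = Fruit("apples", 40)
print(apples.quantity)  # 40

# fields() returns one Field object per member, in declaration order.
field_names = [f.name for f in dataclasses.fields(Fruit)]
print(field_names)  # ['name', 'quantity']

# The generated class is a regular dataclass.
print(dataclasses.is_dataclass(Fruit))  # True
```

Because no defaults are given here, both values must be passed when creating an instance.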

Why Use a Dataclass Instead of a Class?

You might wonder why you should use dataclasses if they are nearly the same as other Python classes. One of their main benefits is conciseness. The dataclass decorator generates the __init__ method for you, along with a readable __repr__ method and an __eq__ method that compares instances field by field, so you can create data objects without much boilerplate code. They are designed to be used as data containers where variables can be easily accessed using dot notation, though you can use dataclasses as full-fledged classes as well. In simple terms, if you want a Python class that serves mainly as a data store, a dataclass is usually the better choice.
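The difference is easiest to see side by side. The sketch below compares a hand-written class with a dataclass holding the same data. The names PlainStock and DataStock are hypothetical, and asdict is a helper from the dataclasses module that converts an instance to a dictionary.

```python
from dataclasses import dataclass, asdict

class PlainStock:
    def __init__(self, apples, oranges):
        self.apples = apples
        self.oranges = oranges

@dataclass
class DataStock:
    apples: int
    oranges: int

# A plain class compares by identity unless you write __eq__ yourself.
plain_equal = PlainStock(40, 50) == PlainStock(40, 50)
# A dataclass compares field by field using the generated __eq__.
data_equal = DataStock(40, 50) == DataStock(40, 50)
print(plain_equal, data_equal)  # False True

# The generated __repr__ shows the field names and values.
print(DataStock(40, 50))

# asdict() converts an instance into a regular dictionary.
print(asdict(DataStock(40, 50)))  # {'apples': 40, 'oranges': 50}
```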

Conclusion

Dataclasses in Python provide a minimal way to quickly create classes intended to be used as data stores. You can access the members of a dataclass using dot notation, which makes dataclasses especially useful when you want dictionary-like key-value pairs that can be read and changed as attributes.

About the author

Nitesh Kumar

I am a freelance software developer and content writer who loves Linux, open source software and the free software community.