PyTorch Tutorial

PyTorch is an open-source machine learning library for Python. Facebook’s artificial intelligence research team initially created it, and it serves as the foundation for Uber’s Pyro software for probabilistic programming. It is a mathematical toolkit that makes it possible to build graph-based models and compute their gradients effectively and automatically.

Since it is straightforward and adaptable, the PyTorch API is popular among academics and researchers creating new deep learning models and applications. Due to this widespread use, there are already numerous extensions for niche applications, such as text, computer vision, and audio data, along with ready-to-use pre-trained models.


The key characteristics of PyTorch are:

Easy To Use

Since PyTorch has an intuitive API and relies on Python, it is considered relatively easy to use. Code execution is made simple by this framework.

Python Usage

This module seamlessly interacts with the Python data science stack and is considered Pythonic. As a result, it can utilize all the features and services the Python environment provides.

Computational Graphs

PyTorch offers an effective environment for dynamic computational graphs. This allows for runtime modification by the user.
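A minimal sketch of what a dynamic graph allows: ordinary Python control flow can decide, at runtime, which operations become part of the graph (the specific branching condition here is just an illustration, not from the original article).

```python
import torch

x = torch.randn(3, requires_grad=True)

# The graph is built on the fly, so ordinary Python control flow
# can decide which operations become part of it.
if x.sum() > 0:
    y = (x * 2).sum()
else:
    y = (x ** 3).sum()

y.backward()          # gradients are computed for whichever branch actually ran
print(x.grad.shape)   # torch.Size([3])
```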

Tools and Libraries

A vibrant community of researchers and developers has built a comprehensive ecosystem of tools and libraries for extending PyTorch and fostering advancement in fields like computer vision and reinforcement learning.

Benefits of PyTorch

  • It is easy to understand and simple to code because it is based on Python
  • Enables simple debugging using popular Python tools
  • PyTorch is simple to scale and has excellent support on the most popular cloud platforms
  • Backed by an active open-source community
  • Has the ability to export learned models in the Open Neural Network Exchange (ONNX) common format

Difference Between Torch and PyTorch

Torch (Torch7) is an open-source machine learning library and a scientific computing framework based on the Lua programming language and typically accessed through a Lua interface. It is no longer in active development.

PyTorch is based on Torch while being straightforward and adaptable. In PyTorch, “Py” indicates Python, while “Torch” reflects its origins in the earlier Torch library.

How To Install PyTorch

Make sure that Python, particularly version 3.7 or higher, is installed before configuring PyTorch. Python can be installed using Anaconda. After setting up our Python environment, we will proceed to our next step of installing PyTorch.

First, we have to create a virtual environment for Conda in which we want to install all of our packages and establish the environment.

$ conda create -n PyTorch python=3.7

Conda is an environment manager that helps you install machine learning packages. If you have to work with another version of Python, you don’t have to switch to it manually; Conda will manage it automatically. You just have to use a few commands to set up your environment according to your choice.

Here, create is the keyword that tells Conda to create a new environment named PyTorch that will use the Python 3.7 version. When the packages are successfully downloaded, Conda will ask for your permission to proceed with installing them.

Once your environment is successfully created, Conda will display a confirmation message.

Let us get started with PyTorch Installation:

Go to the PyTorch website (pytorch.org), open the Get Started tab, and click on Start Locally.

A selection grid will be shown. The first selector is PyTorch Build; select Stable (1.12.1) from the given options. The second selector is OS, which you can select as per your requirements; we will select Windows because we are using Windows to perform our tasks. The third selector is Package; we will choose Conda from the given options. If you use any other package manager, select that package instead.

In the second-to-last option, select the language you are using; in our case, that is Python, so we select it from the given options. At last, we select CPU from the given options because we are using the CPU as our computing platform. If you are using some other platform, such as CUDA, select that instead.

After selecting all the options from the selectors given above, a command will be shown in the field below. Copy that command and run it in your Conda command prompt. For the options chosen here, it will be a command similar to:

$ conda install pytorch torchvision torchaudio cpuonly -c pytorch

With the previous command, Conda is instructed to install PyTorch along with the Torchvision and Torchaudio libraries. Torchvision is a Python library designed to make testing and research in the field of computer vision simpler. It provides model architectures, common image transformations, and other popular datasets for computer vision.

Torchaudio is also a PyTorch package, built for audio and signal processing. It also provides popular datasets, I/O utilities, and common audio transformations.

When all of our packages are downloaded, it will ask for your permission whether to proceed with it or not.

Now, moving to our next step, we activate our PyTorch environment. To do so, we run the following command in the Anaconda command prompt:

$ conda activate PyTorch

We activate our PyTorch packages using the previously mentioned command; activate is the keyword that instructs Conda to switch to the PyTorch environment we have already created.

We can verify the environment by starting Python; if everything is working, it will display the Python interpreter prompt.

After that, we have to import torch; if Python displays a “module not found” error, the installation was not correct. In our case, the installation was correct, so no error was displayed.

Now, we can create a torch tensor to check if everything works correctly. To do so, run:

$ x = torch.rand(3)

$ print(x)

In the previous commands, we created a variable “x” and assigned it the value of torch.rand(3), where rand generates random values. Then, we displayed the tensor using the print function, passing it our variable “x”. The output shows a tensor of three random values.

Now, to check the Torch version, we import torch and print the value of torch.__version__:

$ import torch

$ print(torch.__version__)

The output of the previous command shows the installed version. Our PyTorch environment is now successfully installed and ready for use.

PyTorch Deep Learning Model Life-Cycle

Projects involving machine learning are not simple; they involve an ongoing cycle of improving the data, model, and evaluation. This cycle is essential for creating a machine learning model since it emphasizes using experiments and results to improve both your dataset and your model.

For more than ten years, deep learning has been a major topic and has contributed to the market’s increased focus on ML. The development of many tools to help in the production of ML models has caused the ML industry to experience rapid growth.

The 5 Major Steps of the Deep Learning Model Life-Cycle Are:

  1. Preparing the data
  2. Defining the model
  3. Training the model
  4. Evaluating the model
  5. Making predictions
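The five steps above can be sketched end to end in a few lines. This is a minimal illustration only; the tiny synthetic dataset, layer sizes, and learning rate are assumptions chosen so the sketch runs instantly, not values from the article.

```python
import torch
from torch import nn

# 1. Prepare data (synthetic here, purely to illustrate the cycle)
X = torch.randn(100, 4)
y = torch.randn(100, 1)

# 2. Define the model
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

# 3. Train the model
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
for epoch in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

# 4. Evaluate the model
with torch.no_grad():
    eval_loss = loss_fn(model(X), y).item()

# 5. Make predictions
pred = model(torch.randn(1, 4))
print(pred.shape)  # torch.Size([1, 1])
```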

Example 1

In this example, we will use the Torchvision library of PyTorch with its preloaded FashionMNIST dataset, which contains images of real-world objects.

When it comes to working with data, PyTorch uses its two main primitives, Dataset and DataLoader, allowing us to use our own data or preloaded datasets. Dataset stores the samples and their related labels, while DataLoader wraps an iterable around the Dataset to make it easy to access the samples.

ToTensor is used to convert images into tensors, while the nn package provides a collection of commonly employed loss functions. The FashionMNIST dataset consists of images with pixel values ranging from 0–255. You can also write customized transforms according to your needs. The code for this scenario follows:

In the previous code, we assigned the FashionMNIST dataset to our training_data variable. Torchvision provides many real-world vision datasets, such as CIFAR and COCO; we employ FashionMNIST in this instance. All TorchVision datasets accept two optional parameters, transform and target_transform, which modify the samples and the labels, respectively.

When the code is run, it will start downloading the dataset from the library and report its progress.

Now that the dataset is downloaded, we will pass it as a parameter to DataLoader. This helps us with sampling, shuffling, multiprocess data loading, and automatic batching. In our case, we assign 64 to batch_size, defining a batch size of 64. Each element of the DataLoader iterable will then return a batch of 64 features and labels.

We have set train to True because it selects the training portion of the dataset. download is also set to True because we have to download the dataset to train our model with it. We pass “data” to the root parameter because it is the directory where the downloaded dataset will be stored, and we pass ToTensor() as the transform so the images are converted to tensors.

After executing our code, we can see in the output that it prints the shapes of X and y according to our arguments.

Now, moving toward model creation, we will create a class named NeuralNetwork, which inherits from nn.Module. We will create a function named __init__; in this function, we define the layers of the network. Next, we create a function named forward, whose name clarifies its purpose: we indicate how the data will be passed through the network in the forward function.

Linear is a class of the nn library that applies a linear transformation to the data we pass to it. We pass 28*28 as the number of input dimensions to the Linear class. ReLU is a function in the nn package, which handles negative elements by replacing them with 0. To speed up the whole process, we use the resources of our graphics card if they are available. The code for this scenario follows:
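A sketch of the model definition the article describes, following the standard quickstart architecture (the hidden size of 512 is from that standard example and is an assumption here):

```python
import torch
from torch import nn

# Use the GPU if one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28 * 28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        # Flatten each 28x28 image into a 784-vector, then apply the stack.
        x = self.flatten(x)
        return self.linear_relu_stack(x)

model = NeuralNetwork().to(device)
print(model)
```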

The __init__ function is the initialization method in PyTorch, where the layers of the module are registered; we pass it self as a parameter. In the forward function, we pass input tensors as parameters, and the forward function returns output tensors by performing computations on those inputs. We pass two parameters, “self” and “x”, to the forward function. The forward function defines how the module maps input to output.

Following that, the output of the previous code appears in the following screenshot:

After the previous steps, we will train our model. To accomplish this, we must employ a loss function and an optimizer. A loss function measures the discrepancy between an algorithm’s current output and the expected output. An optimizer helps the system adjust the model’s parameters to reduce this discrepancy. The code for this scenario follows:
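A sketch of the training step just described. To keep it self-contained and runnable without a download, it uses a small batch of random tensors as a stand-in for FashionMNIST; the loss function, optimizer, and 5-epoch loop follow the standard quickstart pattern.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for FashionMNIST: 256 random 28x28 "images", labels 0..9.
images = torch.rand(256, 1, 28, 28)
labels = torch.randint(0, 10, (256,))
train_dataloader = DataLoader(TensorDataset(images, labels), batch_size=64)

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 512), nn.ReLU(), nn.Linear(512, 10))

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

def train(dataloader, model, loss_fn, optimizer):
    model.train()
    for X, y in dataloader:
        pred = model(X)
        loss = loss_fn(pred, y)
        # Zero stale gradients, backpropagate, and update the weights.
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

epochs = 5
for t in range(epochs):
    print(f"Epoch {t + 1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
print("Done!")
```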

Epoch is a parameter that specifies the number of times a learning algorithm will run through the dataset provided for training. In our case, we set it to 5, which means the algorithm goes through our dataset 5 times during training.

After initializing epochs, we use a for loop, which runs the number of times “t” that we declared in epochs. In this loop, the train function is executed 5 times. After the whole for loop is done, it prints the text “Done”.

The following snippet displays the output of the epochs trained function:

Example 2

In this instance, we will solve the problem of fitting y = sin(x) with a third-order polynomial.

Method 1

The PyTorch-provided optim package will be used in this technique. torch.optim is a PyTorch package that implements different optimization techniques. The most frequently used techniques are already implemented, and the interface is sufficiently general that more complex ones can easily be integrated in the future. Rather than manually changing the Tensors containing learnable parameters using their grad attributes, we let the optimizer update the weights of our model.

We will also use the nn package to declare our model, but we will use RMSprop, provided by the optim package of torch, to optimize the model.

Let’s proceed to our code and create tensors responsible for holding the input and output data. In our case, we created two variables, “a” and “b”. Next, we define the input tensors named “var” and “myvar”, which hold the powers x, x^2, and x^3.

After that, we declare the model and loss function using the nn package. We then create an optimizer using the optim package to update the model’s weights on our behalf. RMSprop will be used in this instance; the optim package includes numerous other optimization techniques. The RMSprop constructor’s first argument tells the optimizer which Tensors it should update. The code for this explanation follows:

Then in the following step, we will compute the predicted y by passing it myvar:

$ y_pred = model(myvar)

Here, y_pred is the variable to which we assign the output of the model when the “myvar” value is passed to it. The loss is then calculated and displayed. Gradients are accumulated in buffers whenever .backward() is invoked, so they must be zeroed before each backward pass.

loss.backward() determines the gradient of the loss with respect to the model parameters. When the optimizer’s step function is invoked, the parameters are updated. In a linear layer, the input features are passed as a flattened one-dimensional tensor and multiplied by the weight matrix to generate the output features. At the end of the code, the result is displayed in the form of the fitted equation.
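Putting the steps of this method together, a runnable sketch (the variable names echo the article; the 2,000 sample points, iteration count, and learning rate are assumptions taken from the standard example):

```python
import math
import torch

# Inputs x on [-pi, pi] and targets b = sin(x).
var = torch.linspace(-math.pi, math.pi, 2000)
b = torch.sin(var)

# Stack x, x^2, x^3 column-wise into shape (2000, 3) via broadcasting.
p = torch.tensor([1, 2, 3])
myvar = var.unsqueeze(-1).pow(p)

# Declare the model and loss function using the nn package.
model = torch.nn.Sequential(torch.nn.Linear(3, 1), torch.nn.Flatten(0, 1))
loss_fn = torch.nn.MSELoss(reduction="sum")

# RMSprop updates the model's weights on our behalf.
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-3)
for t in range(2000):
    y_pred = model(myvar)
    loss = loss_fn(y_pred, b)
    if t % 100 == 99:
        print(t, loss.item())
    optimizer.zero_grad()   # clear accumulated gradients
    loss.backward()         # compute gradients of the loss
    optimizer.step()        # update the parameters

linear = model[0]
print(f"Result: y = {linear.bias.item()} + {linear.weight[:, 0].item()} x "
      f"+ {linear.weight[:, 1].item()} x^2 + {linear.weight[:, 2].item()} x^3")
```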

Method 2

In the following method, we will use the tensor and autograd packages. First, import the torch and math libraries for this program and then define the variable “dtype”. Here, we assign the value torch.float, as performed in the previous example, and define the variable “device”. We again assign the value torch.device(“cpu”), which defines the platform to be used.

To hold inputs and outputs, create tensors. Setting requires_grad=False indicates that the backward pass does not need to compute gradients with respect to these Tensors; this is the default. Make weights using random Tensors. We require four weights for a polynomial of third order: y = a + b x + c x^2 + d x^3. We wish to compute gradients with respect to these Tensors during the backward pass, which is indicated by setting requires_grad=True.

Calculate and display the loss utilizing tensor functions. The loss becomes a Tensor of shape (1,), and loss.item() returns its scalar value. The code for this explanation follows:

In this case, the backward pass of our neural network could be managed manually. Doing so is not hard for a straightforward two-layer network; however, it rapidly becomes complicated for large, complex networks. Fortunately, we can utilize automatic differentiation to automate the backward pass in neural networks. The autograd library in PyTorch provides exactly this functionality. When using autograd, the forward pass of a network defines a computational graph. The graph has nodes and edges; tensors are represented by nodes, and edges represent the functions that return output tensors from input tensors.

Backpropagating through this graph makes it easy to calculate gradients. This seems like a difficult task, but in practice it is very simple and straightforward. If “x” is a tensor and its attribute requires_grad is set to True, then x is a tensor representing a node in a computational graph, and x.grad is another tensor holding the gradient of some scalar value with respect to “x”. Here, we demonstrate fitting a sine wave with a third-order polynomial using PyTorch tensors and autograd. Now, there is no need for us to manually execute the backward pass through the network.
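A runnable sketch of this tensors-and-autograd method, following the standard example (the 2,000 points, iteration count, and learning rate are assumptions from that example):

```python
import math
import torch

dtype = torch.float
device = torch.device("cpu")

# Inputs and targets; no gradients are needed with respect to these.
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

# Four random scalar weights; requires_grad=True asks autograd to
# compute gradients with respect to them during the backward pass.
a = torch.randn((), device=device, dtype=dtype, requires_grad=True)
b = torch.randn((), device=device, dtype=dtype, requires_grad=True)
c = torch.randn((), device=device, dtype=dtype, requires_grad=True)
d = torch.randn((), device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(2000):
    y_pred = a + b * x + c * x ** 2 + d * x ** 3
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())

    loss.backward()  # autograd fills a.grad, b.grad, c.grad, d.grad
    with torch.no_grad():
        a -= learning_rate * a.grad
        b -= learning_rate * b.grad
        c -= learning_rate * c.grad
        d -= learning_rate * d.grad
        # Zero the gradients manually for the next iteration.
        a.grad = b.grad = c.grad = d.grad = None

print(f"Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3")
```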

Method 3

In this example, we will be using the NumPy module. NumPy is a library for the Python programming language that supports substantial, multi-dimensional arrays and matrices, along with a significant collection of high-level mathematical operations on these arrays. NumPy is a generic platform for scientific computing; it doesn’t know anything about deep learning or computational graphs.

Using NumPy operations, we can easily fit a third-order polynomial to the “sin” function. To do so, we first import the NumPy and math libraries. Math is the package used to import mathematical operations. The code for this scenario follows:
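A runnable sketch of this NumPy method, keeping the article’s 200 points and its `t % 10 == 9` print interval; the learning rate and the `losses` list (added so the decrease is easy to observe) are assumptions of this sketch.

```python
import math
import numpy as np

# var1 holds 200 x-values from -pi to pi; var2 holds the targets sin(x).
var1 = np.linspace(-math.pi, math.pi, 200)
var2 = np.sin(var1)

# Random initial coefficients for y = a + b x + c x^2 + d x^3.
a, b, c, d = np.random.randn(4)

learning_rate = 1e-6
losses = []
for t in range(200):
    # Forward pass: compute the predicted y.
    var2_pred = a + b * var1 + c * var1 ** 2 + d * var1 ** 3

    # Loss: sum of squared errors.
    loss = np.square(var2_pred - var2).sum()
    losses.append(loss)
    if t % 10 == 9:
        print(t, loss)

    # Backward pass: gradients of the loss w.r.t. a, b, c, d, by hand.
    grad_var2_pred = 2.0 * (var2_pred - var2)
    grad_a = grad_var2_pred.sum()
    grad_b = (grad_var2_pred * var1).sum()
    grad_c = (grad_var2_pred * var1 ** 2).sum()
    grad_d = (grad_var2_pred * var1 ** 3).sum()

    # Gradient-descent update of the coefficients.
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d

print(f"Result: y = {a} + {b} x + {c} x^2 + {d} x^3")
```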

We first initialized two variables named “var1” and “var2” and assigned them their values. Here, np.linspace is a NumPy function that helps us create numeric sequences; it generates linearly spaced vectors. It is similar to the colon operator, but it gives us direct control over the number of points. We pass this function three parameters, “-math.pi”, “math.pi”, and “200”, where 200 is the number of points to generate. The second function, assigned to var2, is “np.sin(var1)”, which calculates the sine of the values stored in “var1”.

Next, we initialized four variables named “a”, “b”, “c”, and “d” and assigned them random values. Then, we initialized a for loop with a loop variable “t” and a range of 200 iterations.

After that, we define another variable, “var2_pred”, to which we assign the mathematical operation for calculating the predicted output, and then compute the variable named “loss”. The loss is responsible for measuring the squared error of “var2_pred” against “var2”.

Moving to the next step, we use an if statement that checks the condition “t % 10 == 9”, i.e., it takes the remainder of “t” divided by 10 and compares it with 9. Every tenth iteration, it prints the loss for the current values.

After all of these computations, when the for loop ends, the final print operation displays the fitted polynomial as the result.

Method 4

In this piece of code, we will be using the tensor package. A NumPy array and a PyTorch Tensor are conceptually equivalent: a Tensor is a multi-dimensional array, and PyTorch offers a variety of methods for working with them. Tensors are a general-purpose scientific computing tool that can track gradients and computational graphs in the background. Unlike NumPy, PyTorch Tensors can use GPUs to speed up their numerical operations. A PyTorch Tensor can be executed on a GPU in a few easy steps.

In this piece of code, we imported two libraries, torch and math, since we will be using torch along with mathematical functionality. We defined a variable named “dtype” and assigned it the datatype torch.float because we are performing mathematical operations that return floating-point values; to store decimal values, we use the float datatype.

Then we declared another variable named “device”, assigning it the value torch.device(“cpu”) because we selected CPU as our platform during the PyTorch installation. Moving forward, we initialized two variables named “var1” and “var2” and assigned them the same functions as in the previous example. Then, four variables, “a”, “b”, “c”, and “d”, are created as random tensors. Each has its own shape; we also pass the device and dtype, which were initialized at the beginning of our code, so the computations use our CPU and operate on decimal values. In this method, we compute the gradients manually during the backward pass, so these Tensors do not need requires_grad=True. The code for this scenario follows:
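A runnable sketch of this pure-tensor method, computing the gradients by hand (the 2,000 points follow the standard example; the article’s learning rate of 1e-6 and its loop count of 200 are kept; the `losses` list is an addition of this sketch so the decrease is easy to observe):

```python
import math
import torch

dtype = torch.float
device = torch.device("cpu")

# Inputs and targets as plain tensors on the CPU.
var1 = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
var2 = torch.sin(var1)

# Random scalar coefficients for y = a + b x + c x^2 + d x^3.
a = torch.randn((), device=device, dtype=dtype)
b = torch.randn((), device=device, dtype=dtype)
c = torch.randn((), device=device, dtype=dtype)
d = torch.randn((), device=device, dtype=dtype)

learning_rate = 1e-6
losses = []
for t in range(200):
    # Forward pass.
    y_pred = a + b * var1 + c * var1 ** 2 + d * var1 ** 3
    loss = (y_pred - var2).pow(2).sum().item()
    losses.append(loss)
    if t % 100 == 99:
        print(t, loss)

    # Backward pass by hand: gradients of the loss w.r.t. a, b, c, d.
    grad_y_pred = 2.0 * (y_pred - var2)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred * var1).sum()
    grad_c = (grad_y_pred * var1 ** 2).sum()
    grad_d = (grad_y_pred * var1 ** 3).sum()

    # Gradient-descent update.
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d

print(f"Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3")
```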

Then we declared learning_rate = 1e-6. The learning rate is a hyperparameter used for training neural networks. It has a very small positive value, typically greater than 0.0 and less than 1.0. The pace of learning increases as the learning rate increases.

After that, we open a loop that executes 200 times, performing the mathematical operations on our variables. Whenever t % 100 equals 99, the loop prints the iteration number and the current loss. After the iterations end, the fitted result is printed.

Method 5

In this method, we will create our own autograd function by subclassing PyTorch’s Function class. Under the hood, every primitive autograd operator is really two functions that operate on Tensors. The forward function takes input tensors and calculates output tensors from them. In PyTorch, we can easily define our own autograd operator by writing a subclass of torch.autograd.Function and implementing the forward and backward methods.

The backward function receives the gradient of some scalar value with respect to the output tensors, and it computes the gradient of that same scalar value with respect to the input tensors. We can then use our new autograd operator by constructing an instance and calling it like a function, passing it Tensors containing the input data.

In the following example, we specify our model as:

$ y = a + b P3(c + dx)

instead of:

$ y = a + b x + c x^2 + d x^3

where P3(x) = (5x^3 - 3x) / 2 is the Legendre polynomial of degree three. We compose our custom autograd method by implementing the forward and backward passes of P3 and using it to execute our model. The code for this explanation follows:

By subclassing torch.autograd.Function, we can implement our own unique autograd functions, putting into practice the forward and backward passes that operate on Tensors. During the forward pass, we get a tensor holding the input and deliver a tensor containing the output back. A context object called ctx can be used to store data for use in the backward calculation.

Tensors can be created to store the input and output. Setting requires_grad=False indicates that the backward pass does not need to compute gradients with respect to these Tensors; this is the default.

Make weights using random Tensors. To guarantee convergence, the weights for this example should be initialized reasonably close to the correct result: y = a + b * P3(c + d * x). Setting requires_grad=True indicates that we intend to compute gradients with respect to these Tensors during the backward pass.
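A runnable sketch of the custom autograd Function just described, following the standard Legendre-polynomial example (the initial weights, learning rate, and iteration count are taken from that example):

```python
import math
import torch

class LegendrePolynomial3(torch.autograd.Function):
    """Custom autograd Function implementing P3(x) = (5x^3 - 3x) / 2."""

    @staticmethod
    def forward(ctx, input):
        # Stash the input for use in the backward pass via the ctx object.
        ctx.save_for_backward(input)
        return 0.5 * (5 * input ** 3 - 3 * input)

    @staticmethod
    def backward(ctx, grad_output):
        # Chain rule: dP3/dx = 1.5 * (5x^2 - 1).
        input, = ctx.saved_tensors
        return grad_output * 1.5 * (5 * input ** 2 - 1)

dtype = torch.float
device = torch.device("cpu")

x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

# Initialize the weights close to the known solution to guarantee convergence.
a = torch.full((), 0.0, device=device, dtype=dtype, requires_grad=True)
b = torch.full((), -1.0, device=device, dtype=dtype, requires_grad=True)
c = torch.full((), 0.0, device=device, dtype=dtype, requires_grad=True)
d = torch.full((), 0.3, device=device, dtype=dtype, requires_grad=True)

learning_rate = 5e-6
for t in range(2000):
    P3 = LegendrePolynomial3.apply  # call the Function via .apply
    y_pred = a + b * P3(c + d * x)
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())
    loss.backward()
    with torch.no_grad():
        a -= learning_rate * a.grad
        b -= learning_rate * b.grad
        c -= learning_rate * c.grad
        d -= learning_rate * d.grad
        a.grad = b.grad = c.grad = d.grad = None

print(f"Result: y = {a.item()} + {b.item()} * P3({c.item()} + {d.item()} x)")
```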

The following image demonstrates the output that the previous code generated:

Method 6

Here, we will use the nn module to solve this problem. After importing our desired libraries, torch and math, we declare Tensors to hold the inputs and outputs. This example can be considered a linear-layer neural network since the resulting output is a linear function of (x, x^2, x^3). The code for this scenario follows:
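A runnable sketch of this nn-based method, following the standard example (the 2,000 points, learning rate, and iteration count are assumptions from that example):

```python
import math
import torch

x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# Broadcasting: (2000, 1) powered by (3,) gives the (2000, 3) tensor (x, x^2, x^3).
p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)

# Sequential applies its modules in order; Linear holds the weight and bias.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1),
)
loss_fn = torch.nn.MSELoss(reduction="sum")

learning_rate = 1e-6
for t in range(2000):
    y_pred = model(xx)
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    model.zero_grad()
    loss.backward()  # autograd computes gradients for all model parameters
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad

# The first layer of the model is accessed like the first item of a list.
linear_layer = model[0]
print(f"Result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x "
      f"+ {linear_layer.weight[:, 1].item()} x^2 + {linear_layer.weight[:, 2].item()} x^3")
```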

Let’s get the tensor (x, x^2, x^3) ready. Since p has shape (3,) and x.unsqueeze(-1) has shape (2000, 1), broadcasting semantics will be used to produce a tensor of shape (2000, 3) in this situation. To construct our model as a series of layers, we use the nn package. Sequential is a module that applies other modules sequentially to obtain its output. The Linear module maintains internal Tensors for its bias and weight and uses a linear function to compute the output from the input. To calculate the backward pass, we use autograd.

This call computes the gradient of the loss with respect to all Tensors with requires_grad=True, i.e., all the learnable parameters of the model. We utilize Mean Squared Error (MSE) as our loss function, which is defined in the nn package along with other prominent loss functions.

Forward pass: by giving the model x, we calculate the anticipated y. The __call__ operator is overridden by Module objects, allowing you to call them just like functions. In doing so, you provide the Module a Tensor of input data, and it returns a Tensor of output. Compute and print the loss: the loss function receives Tensors containing the predicted and true values of y and returns a Tensor containing the loss.

Backward pass: calculate the gradient of the loss with respect to each learnable model parameter. Because each Module’s parameters are stored internally in Tensors with requires_grad set to True, this call computes gradients for all learnable model parameters. You can access the model’s first layer by selecting the first item, just like indexing a list.

At last, the result is displayed using the print function when the whole iteration is executed successfully.

Method 7

The torch.nn.Module class is generally PyTorch’s main building block. First, define an nn.Module object, and then call its forward function to run it. This method of operation is object-oriented. A fully connected ReLU network with one hidden layer, for example, can be trained to forecast y from x by reducing the squared Euclidean distance.

In this instance, the model is defined as a specific Module subclass. You must define your model this way whenever you want it to be more complex than a straightforward listing of existing Modules. The code for this explanation follows:
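A runnable sketch of the custom nn.Module just described, following the standard Polynomial3 example (the 2,000 points, learning rate, and iteration count are assumptions from that example; the article’s print interval of every tenth step is replaced by every hundredth to keep the output short):

```python
import math
import torch

class Polynomial3(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Four trainable scalars; assigning Parameters as attributes
        # automatically registers them in the module's parameters() list.
        self.a = torch.nn.Parameter(torch.randn(()))
        self.b = torch.nn.Parameter(torch.randn(()))
        self.c = torch.nn.Parameter(torch.randn(()))
        self.d = torch.nn.Parameter(torch.randn(()))

    def forward(self, x):
        # Accepts a Tensor of input data, returns a Tensor of output data.
        return self.a + self.b * x + self.c * x ** 2 + self.d * x ** 3

    def string(self):
        return (f"y = {self.a.item()} + {self.b.item()} x "
                f"+ {self.c.item()} x^2 + {self.d.item()} x^3")

var = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(var)

model = Polynomial3()
criterion = torch.nn.MSELoss(reduction="sum")
# model.parameters() hands the four Parameters to the SGD optimizer.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-6)

for t in range(2000):
    y_pred = model(var)           # forward pass
    loss = criterion(y_pred, y)   # compute the loss
    if t % 100 == 99:
        print(t, loss.item())
    optimizer.zero_grad()         # zero the gradients
    loss.backward()               # backward pass
    optimizer.step()              # update the weights

print(f"Result: {model.string()}")
```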

Then we calculate the backward pass using autograd. This call determines the gradient of the loss for all Tensors with requires_grad=True. After this call, the gradients of the loss with respect to the parameters a, b, c, and d are held in a.grad, b.grad, c.grad, and d.grad, respectively. We can also define a custom function in PyTorch.

After creating the class Polynomial3, we declare tensors for holding input and output. Four values are created and assigned as member parameters in the __init__ function. Parameter, a subclass of Tensor, has a highly unique property when used with Modules: when assigned as a Module attribute, the Parameter is automatically included in the list of the module’s parameters and will appear, e.g., in the parameters() iterator. The parameters() method in each nn.Module retrieves the module’s trainable variables, which must be explicitly defined in this way.

The forward function accepts a Tensor of input data and returns a Tensor of output data. We can apply arbitrary operators to the Tensors and use the Modules defined in the constructor. Next, we build our loss function and an optimizer. The trainable parameters, which are model members specified with torch.nn.Parameter, are included in the call to model.parameters() passed to the SGD constructor. Then, in the forward pass, we calculate the predicted y (y_pred) by passing the model the variable “var”. At the end of each iteration, we zero the gradients, perform the backward pass, and update the weights.

Stochastic Gradient Descent (SGD) is an optimizer that belongs to the gradient descent family and is a well-known optimization technique used in deep learning and machine learning. The term “stochastic” refers to a system coupled to or linked with random probability in the SGD optimizer.

Ultimately, we calculate the loss and print it by calling the criterion function responsible for the loss calculation, passing it the predicted y and the true targets. Then, we declare an if statement that computes “t” modulo 10; whenever the remainder equals 9, the value of t is passed to the print function along with the computed loss.

At the end of the code, the result will be successfully executed, as shown in the following figure:


The application of the PyTorch API for typical deep learning model creation activities is this tutorial’s central objective. The information you’ve studied in this article should make it easier to put sophisticated PyTorch concepts into practice and to learn PyTorch, from the basic installation of PyTorch for Python to the complex training of a model. We have also discussed the different ways to implement a single example using different modules: NumPy, Torch tensors, autograd (including a new autograd function), the optim package, nn modules, and a custom nn module. The platform used to execute our code is Spyder; you can use other platforms, such as Conda, Jupyter, or Linux.

About the author

Kalsoom Bibi

Hello, I am a freelance writer and usually write for Linux and other technology-related content.