How to Create a New DataFrame From an Existing DataFrame in Pandas

Sometimes, we need to copy an existing DataFrame along with its data and indices. Depending on how the copy is made, however, the new DataFrame can stay linked to the old one. In that case, any change made to the old DataFrame also affects the new DataFrame, and vice versa.

In this article, we look at the pandas.DataFrame.copy() method, which creates a new DataFrame from an existing DataFrame, and at how its deep parameter controls whether the two DataFrames stay linked.

The Syntax Is Given Below:

DataFrame.copy(deep=True)

In the syntax above, the deep parameter takes either True or False. The choice determines how the copy() method behaves, so let's look at each value in detail.

Deep (True): When we call copy(), deep is True by default. This value means that all the data and indices of the existing DataFrame are copied into a new object. Any manipulation of the new DataFrame does not affect the old DataFrame, or vice versa: there is no link between the two, and each can be used independently.

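Deep (False): When we set deep to False, copy() creates a new object without copying the data and index. Instead, the new object holds a reference to the data and index of the original DataFrame, so a manipulation of the original DataFrame can also affect the shallow copy, or vice versa. Whether a change actually shows through depends on the pandas version: with Copy-on-Write enabled, which is the default from pandas 3.0, changes to the original are no longer reflected in the shallow copy.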
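
The difference between the two values can also be inspected at the object level. The following is a minimal sketch, not part of the original examples: the one-column DataFrame and the names deep, shallow, deep_shares, and shallow_shares are made up for illustration, and np.shares_memory is used here only as one possible way to see whether two columns rest on the same underlying buffer.

```python
import numpy as np
import pandas as pd

# Illustrative data, not taken from the article's examples.
df = pd.DataFrame({"season": [4, 10, 4, 5]})

deep = df.copy()               # deep=True is the default
shallow = df.copy(deep=False)

# Both calls return a new DataFrame object.
print(deep is df, shallow is df)

# Compare the underlying NumPy buffers of the 'season' column.
deep_shares = np.shares_memory(df["season"].to_numpy(),
                               deep["season"].to_numpy())
shallow_shares = np.shares_memory(df["season"].to_numpy(),
                                  shallow["season"].to_numpy())
print("deep copy shares memory:", deep_shares)
print("shallow copy shares memory:", shallow_shares)

# An in-place write to the original does not reach the deep copy.
df.loc[0, "season"] = 99
print(deep.loc[0, "season"])
```

The deep copy owns its own buffer, so the last line still prints the original value. What the shallow copy reports can vary with the installed pandas version, so the sketch prints the result rather than assuming it.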

Example 1:

Copy the DataFrame using the deep=True:

# python example_1.py
import pandas as pd
data = {'TV_Show_name': ['The Walking Dead', 'Merlin', 'little evil',
                         'Money Heist'],
        'TV_Streaming_name': ['Netflix', 'Fx', 'Disney Plus',
                              'Amazon Prime'],
        'show_Season': [4, 10, 4, 5],
        'Main Actor': ['Rick Grimes', 'Mordred', 'Karl C. Miller',
                       'Sergio Marquina']}
df = pd.DataFrame.from_dict(data)
print('Original DataFrame')
print(df)
print('_________________________________________________________')
dfCopy = df.copy()
print('Copied DataFrame')
print(dfCopy)

Line 2: We import the Pandas library as pd. Here, pd is an alias, so we can write pd instead of the full name pandas.

Line 3 to 10: We create a dict of keys and values, where each value is a list. We then convert that dict to a DataFrame (df) using the DataFrame.from_dict() method.

Line 11 to 12: We print our DataFrame (df), which appears in the output below.

Line 14: We create a copy of df (the existing DataFrame). We do not pass deep=True explicitly because that is the default. As explained under Deep (True), this creates a new object with all the data and indices of the existing DataFrame, and there is no direct relationship between the copied DataFrame and the old DataFrame.

Line 15 to 16: We print our copied DataFrame (dfCopy), and the output is shown below:

Output:

Original DataFrame
       TV_Show_name TV_Streaming_name  show_Season       Main Actor
0  The Walking Dead           Netflix            4      Rick Grimes
1            Merlin                Fx           10          Mordred
2       little evil       Disney Plus            4   Karl C. Miller
3       Money Heist      Amazon Prime            5  Sergio Marquina
_________________________________________________________
Copied DataFrame
       TV_Show_name TV_Streaming_name  show_Season       Main Actor
0  The Walking Dead           Netflix            4      Rick Grimes
1            Merlin                Fx           10          Mordred
2       little evil       Disney Plus            4   Karl C. Miller
3       Money Heist      Amazon Prime            5  Sergio Marquina

Example 2:

In this example, we are going to manipulate the old DataFrame and check whether it will affect the dfCopy DataFrame or not. Here, we are using the deep=True to copy the DataFrame:

# python example_2.py
import pandas as pd
data = {'TV_Show_name': ['The Walking Dead', 'Merlin', 'little evil',
                         'Money Heist'],
        'TV_Streaming_name': ['Netflix', 'Fx', 'Disney Plus',
                              'Amazon Prime'],
        'show_Season': [4, 10, 4, 5],
        'Main Actor': ['Rick Grimes', 'Mordred', 'Karl C. Miller',
                       'Sergio Marquina']}
df = pd.DataFrame.from_dict(data)
print('Original DataFrame')
print(df)
print('_________________________________________________________')
dfCopy = df.copy()

print('Copied DataFrame')
print(dfCopy)
print('_________________________________________________________')
print("************Manipulation done in the original df***************")
# Now, we are doing data manipulation in the original dataframe
# we are changing the column ('TV_Show_name') values to A,B,C,D
# now, we will see this will affect to the dfCopy dataframe or not
df['TV_Show_name'] = df['TV_Show_name'].replace(['The Walking Dead',
            'Merlin', 'little evil','Money Heist'],['A','B','C','D'])

#Now printing both dfCopy(deep=True) and df (original) dataframe
print('Original DataFrame')
print(df)
print('Copied DataFrame')
print(dfCopy)

Line 1 to 18: These lines are explained under Example 1.

Line 23: We replace the values of the 'TV_Show_name' column in the original df with ['A', 'B', 'C', 'D']. Now, we check whether this manipulation of the original df affects dfCopy (deep=True). As we already know, there is no direct relationship between the two DataFrames when we use deep=True.

Line 27 to 30: We print the original df and the copy (dfCopy), as shown in the output below. From the output, we can confirm that the changes made to the original DataFrame (df) have no effect on the copy:

Output:

Original DataFrame
       TV_Show_name TV_Streaming_name  show_Season       Main Actor
0  The Walking Dead           Netflix            4      Rick Grimes
1            Merlin                Fx           10          Mordred
2       little evil       Disney Plus            4   Karl C. Miller
3       Money Heist      Amazon Prime            5  Sergio Marquina
_________________________________________________________
Copied DataFrame
       TV_Show_name TV_Streaming_name  show_Season       Main Actor
0  The Walking Dead           Netflix            4      Rick Grimes
1            Merlin                Fx           10          Mordred
2       little evil       Disney Plus            4   Karl C. Miller
3       Money Heist      Amazon Prime            5  Sergio Marquina
_________________________________________________________
************Manipulation done in the original df***************
Original DataFrame
  TV_Show_name TV_Streaming_name  show_Season       Main Actor
0            A           Netflix            4      Rick Grimes
1            B                Fx           10          Mordred
2            C       Disney Plus            4   Karl C. Miller
3            D      Amazon Prime            5  Sergio Marquina
Copied DataFrame
       TV_Show_name TV_Streaming_name  show_Season       Main Actor
0  The Walking Dead           Netflix            4      Rick Grimes
1            Merlin                Fx           10          Mordred
2       little evil       Disney Plus            4   Karl C. Miller
3       Money Heist      Amazon Prime            5  Sergio Marquina

Example 2 confirms that, with deep=True, the DataFrame newly created from the existing DataFrame has no direct relationship with it, and each can be manipulated without affecting the other.
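
Example 2 changes the original and inspects the copy. The reverse direction can be checked in the same way. The sketch below is an illustration only, using a smaller, made-up DataFrame and the hypothetical name df_copy; it changes the deep copy and then prints the original.

```python
import pandas as pd

# Small illustrative frame, not the full data set from the examples.
df = pd.DataFrame({"show": ["Merlin", "Money Heist"], "season": [10, 5]})
df_copy = df.copy()  # deep copy (deep=True by default)

# This time, change the copy instead of the original.
df_copy.loc[0, "show"] = "A"
df_copy["season"] = df_copy["season"] + 1

print("Original DataFrame")
print(df)        # the original rows are unchanged
print("Copied DataFrame")
print(df_copy)   # only the copy carries the new values
```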

Example 3:

In this example, we are going to manipulate the old DataFrame and check whether it will affect the dfCopy DataFrame or not. Here, we are using the deep=False to copy the DataFrame:

# python example_3.py
import pandas as pd

data = {'TV_Show_name': ['The Walking Dead', 'Merlin', 'little evil',
                         'Money Heist'],
        'TV_Streaming_name': ['Netflix', 'Fx', 'Disney Plus',
                              'Amazon Prime'],
        'show_Season': [4, 10, 4, 5],
        'Main Actor': ['Rick Grimes', 'Mordred', 'Karl C. Miller',
                       'Sergio Marquina']}
df = pd.DataFrame.from_dict(data)
print('Original DataFrame')
print(df)
print('_________________________________________________________')
dfCopy = df.copy(deep=False)
print('Copied DataFrame')
print(dfCopy)
print('_________________________________________________________')

# Now, we are doing data manipulation in the original dataframe
# we are changing the column ('TV_Show_name') values to A,B,C,D
# now, we will see this will affect to the dfCopy dataframe or not
df['TV_Show_name'] = df['TV_Show_name'].replace(['The Walking Dead',
        'Merlin', 'little evil','Money Heist'],['A','B','C','D'])

#Now printing both dfCopy(deep=False) and df (original) dataframe
print('_________________________________________________________')
print('Copied DataFrame')
print(dfCopy)
print('Original DataFrame')
print(df)

Line 1 to 18: These lines are explained under Example 1. The one change is at line 15, where we now use deep=False instead of deep=True.

Line 23: We replace the values of the 'TV_Show_name' column in the original df with ['A', 'B', 'C', 'D']. Now, we check whether this manipulation of the original df affects dfCopy (deep=False). As we already know, there is a direct relationship between the two DataFrames when we use deep=False.

Line 28 to 31: We print the copy (dfCopy) and the original df, as shown in the output below. From the output, we can confirm that the changes made to the original DataFrame (df) also affect the copy: the values of the 'TV_Show_name' column change in the copied DataFrame as well. Note that this output reflects the pandas version used when the article was written. In newer releases, assigning a whole column may replace it instead of overwriting it in place, and with Copy-on-Write enabled (the default from pandas 3.0) the shallow copy is left unchanged.

Output:

Original DataFrame
       TV_Show_name TV_Streaming_name  show_Season       Main Actor
0  The Walking Dead           Netflix            4      Rick Grimes
1            Merlin                Fx           10          Mordred
2       little evil       Disney Plus            4   Karl C. Miller
3       Money Heist      Amazon Prime            5  Sergio Marquina
_________________________________________________________
Copied DataFrame
       TV_Show_name TV_Streaming_name  show_Season       Main Actor
0  The Walking Dead           Netflix            4      Rick Grimes
1            Merlin                Fx           10          Mordred
2       little evil       Disney Plus            4   Karl C. Miller
3       Money Heist      Amazon Prime            5  Sergio Marquina
_________________________________________________________
_________________________________________________________
Copied DataFrame
  TV_Show_name TV_Streaming_name  show_Season       Main Actor
0            A           Netflix            4      Rick Grimes
1            B                Fx           10          Mordred
2            C       Disney Plus            4   Karl C. Miller
3            D      Amazon Prime            5  Sergio Marquina
Original DataFrame
  TV_Show_name TV_Streaming_name  show_Season       Main Actor
0            A           Netflix            4      Rick Grimes
1            B                Fx           10          Mordred
2            C       Disney Plus            4   Karl C. Miller
3            D      Amazon Prime            5  Sergio Marquina
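
Because the result of Example 3 can differ between pandas releases, it may be useful to test the installed version directly. The helper below is a hypothetical sketch (the function name shallow_copy_propagates is not part of pandas): it writes one cell of the original in place and reports whether the shallow copy sees the new value.

```python
import pandas as pd


def shallow_copy_propagates() -> bool:
    """Return True if an in-place write to the original shows up in a shallow copy."""
    original = pd.DataFrame({"season": [4, 10, 4, 5]})
    shallow = original.copy(deep=False)
    original.loc[0, "season"] = 99  # in-place write to a single cell
    return bool(shallow.loc[0, "season"] == 99)


print("pandas version:", pd.__version__)
print("shallow copy sees the change:", shallow_copy_propagates())
```

If the function returns False, the installed pandas keeps shallow copies isolated from later writes, and Example 3 will print the original values for dfCopy.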

Example 4:

Copy the existing DataFrame using the assignment operator, which has the same direct-relationship issue as deep=False:

# python example_4.py
import pandas as pd

data = {'TV_Show_name': ['The Walking Dead', 'Merlin', 'little evil',
                         'Money Heist'],
        'TV_Streaming_name': ['Netflix', 'Fx', 'Disney Plus',
                              'Amazon Prime'],
        'show_Season': [4, 10, 4, 5],
        'Main Actor': ['Rick Grimes', 'Mordred', 'Karl C. Miller',
                       'Sergio Marquina']}
df = pd.DataFrame.from_dict(data)
print('Original DataFrame')
print(df)
print('_________________________________________________________')
dfCopy = df
print('Copied DataFrame')
print(dfCopy)
print('_________________________________________________________')

# Now, we are doing data manipulation in the original dataframe
# we are changing the column ('TV_Show_name') values to A,B,C,D
# now, we will see this will affect to the dfCopy dataframe or not
df['TV_Show_name'] = df['TV_Show_name'].replace(['The Walking Dead',
        'Merlin', 'little evil','Money Heist'],['A','B','C','D'])

#Now printing both dfCopy and df (original) dataframe
print('_________________________________________________________')
print('Copied DataFrame')
print(dfCopy)
print('Original DataFrame')
print(df)

Line 15: In Example 4, we assign the DataFrame to another variable without using the copy() method. No new object is created: df and dfCopy are two names for the same DataFrame, so the link between them is even tighter than with deep=False. The following output shows that anything we change in the original DataFrame also appears in the copied DataFrame, and vice versa:

Output:

Original DataFrame
       TV_Show_name TV_Streaming_name  show_Season       Main Actor
0  The Walking Dead           Netflix            4      Rick Grimes
1            Merlin                Fx           10          Mordred
2       little evil       Disney Plus            4   Karl C. Miller
3       Money Heist      Amazon Prime            5  Sergio Marquina
_________________________________________________________
Copied DataFrame
       TV_Show_name TV_Streaming_name  show_Season       Main Actor
0  The Walking Dead           Netflix            4      Rick Grimes
1            Merlin                Fx           10          Mordred
2       little evil       Disney Plus            4   Karl C. Miller
3       Money Heist      Amazon Prime            5  Sergio Marquina
_________________________________________________________
_________________________________________________________
Copied DataFrame
  TV_Show_name TV_Streaming_name  show_Season       Main Actor
0            A           Netflix            4      Rick Grimes
1            B                Fx           10          Mordred
2            C       Disney Plus            4   Karl C. Miller
3            D      Amazon Prime            5  Sergio Marquina
Original DataFrame
  TV_Show_name TV_Streaming_name  show_Season       Main Actor
0            A           Netflix            4      Rick Grimes
1            B                Fx           10          Mordred
2            C       Disney Plus            4   Karl C. Miller
3            D      Amazon Prime            5  Sergio Marquina
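
The difference between plain assignment and copy(deep=False) can be made visible with Python's is operator. This is a small sketch with made-up data and the illustrative names alias and shallow, not code from the article's repository.

```python
import pandas as pd

df = pd.DataFrame({"season": [4, 10, 4, 5]})

alias = df                      # plain assignment: no copy is made
shallow = df.copy(deep=False)   # a new DataFrame object

print(alias is df)     # both names refer to one object
print(shallow is df)   # copy() always returns a different object

# A write through either name changes the one shared DataFrame.
alias.loc[0, "season"] = 99
print(df.loc[0, "season"])
```

Since alias and df are the same object, this link holds in every pandas version, which is not guaranteed for a shallow copy.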

Conclusion:

In this article, we have seen that the correct way to copy an existing DataFrame is the copy() method, which by default creates a new object with its own data and indices. When we set deep to False, the new DataFrame only holds a reference to the data and indices of the original. Copying with the assignment operator links the two names in a similar way, as Example 4 showed.

Sometimes, we need to copy only some of the columns of the existing DataFrame, not the whole. In that case, we can select the columns by name and then call copy(), which is deep=True by default:

new_df = old_df[['A', 'B', 'C']].copy()

Be careful: if you select only one column, you must use double square brackets. Otherwise, the result is a Series, not a DataFrame.

new_df = old_df[['A']].copy()
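
The effect of single versus double brackets can be checked by looking at the type of the result. In this sketch, old_df is a hypothetical DataFrame built only so that the column names 'A', 'B', and 'C' from the snippets above exist; as_series and as_frame are illustrative names.

```python
import pandas as pd

# Hypothetical frame with the column names used in the snippets above.
old_df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

as_series = old_df["A"].copy()    # single brackets give a Series
as_frame = old_df[["A"]].copy()   # double brackets give a one-column DataFrame

print(type(as_series).__name__)
print(type(as_frame).__name__)

# The copied column is independent of old_df.
as_frame.loc[0, "A"] = 100
print(old_df.loc[0, "A"])
```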

The code for this article is available at the GitHub link:

https://github.com/shekharpandey89/pandas-dataframe-copy-method

About the author

Shekhar Pandey