
How to Fix ValueError: All arrays must be of the same length

This guide reproduces the error ValueError: All arrays must be of the same length and then presents two ways to resolve it. In Python, a ValueError is raised when a function receives an argument of the correct type but with an inappropriate value.

ValueError: All arrays must be of the same length

In a pandas DataFrame, each column is an array of values, and all columns must hold the same number of elements. This error occurs when a DataFrame is created from a dictionary whose arrays (columns) differ in length. Let’s reproduce the error first.

Create a pandas DataFrame named Industry with two columns of unequal length. The ‘Type’ column holds ten elements, while the ‘Location’ column holds only eight.

import pandas

# Try to create a pandas DataFrame (Industry) with two columns
# of unequal length: 'Type' has 10 values, 'Location' has 8
Industry = pandas.DataFrame({
    'Type': ['Media', 'Not For Profit', 'Recreation', 'Retail', 'Shipping',
             'Technology', 'Telecommunications', 'Transportation',
             'Utilities', 'Other'],
    'Location': ['USA', 'India', 'Italy', 'Japan',
                 'USA', 'India', 'Italy', 'Japan'],
})

print(Industry)

Output

ValueError: All arrays must be of the same length
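When the error appears, it helps to know which column is short. The following is a minimal sketch, not part of the original example, that catches the error and prints the length of each array. The names data, lengths, and message are illustrative.

```python
import pandas

# Same data as above: 'Type' has 10 values, 'Location' has only 8.
data = {
    'Type': ['Media', 'Not For Profit', 'Recreation', 'Retail', 'Shipping',
             'Technology', 'Telecommunications', 'Transportation',
             'Utilities', 'Other'],
    'Location': ['USA', 'India', 'Italy', 'Japan',
                 'USA', 'India', 'Italy', 'Japan'],
}

# Length of every array, keyed by column name.
lengths = {name: len(values) for name, values in data.items()}

try:
    Industry = pandas.DataFrame(data)
except ValueError as error:
    # Keep the message text and show the lengths next to it.
    message = str(error)
    print("ValueError:", message)
    print("Column lengths:", lengths)
```

The printed lengths show that ‘Location’ is two elements shorter than ‘Type’.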

Solution 1: Check the Number of Values in All Columns at Creation

This is the manual approach: the developer makes sure that every column is given the same number of elements when the DataFrame is created. With equal lengths, the DataFrame is created without raising the ValueError.

Let’s create the same DataFrame with two columns and pass 10 elements into each column.

import pandas

# Create a pandas DataFrame (Industry) with two columns
# of equal length: both 'Type' and 'Location' have 10 values
Industry = pandas.DataFrame({
    'Type': ['Media', 'Not For Profit', 'Recreation', 'Retail', 'Shipping',
             'Technology', 'Telecommunications', 'Transportation',
             'Utilities', 'Other'],
    'Location': ['USA', 'India', 'Italy', 'Japan', 'USA',
                 'India', 'Italy', 'Japan', 'Italy', 'Japan'],
})

print(Industry)

Output

You can see that the DataFrame Industry has been created with two columns and ten rows.
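Counting elements by eye becomes unreliable with long lists. As a rough sketch, the manual check could be automated with a small helper that compares the lengths before the DataFrame is created. The helper check_equal_lengths is hypothetical and not a pandas function.

```python
import pandas

def check_equal_lengths(columns):
    """Return each column's length, or raise ValueError if they differ.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Columns have different lengths: {lengths}")
    return lengths

columns = {
    'Type': ['Media', 'Not For Profit', 'Recreation', 'Retail', 'Shipping',
             'Technology', 'Telecommunications', 'Transportation',
             'Utilities', 'Other'],
    'Location': ['USA', 'India', 'Italy', 'Japan', 'USA',
                 'India', 'Italy', 'Japan', 'Italy', 'Japan'],
}

# Both lists hold 10 values, so the check passes.
lengths = check_equal_lengths(columns)
Industry = pandas.DataFrame(columns)
print(Industry.shape)
```

If the lists differed, the helper would raise its own ValueError that names each column and its length.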

Solution 2: Fill with Other Elements Based on the Length

This approach works for a DataFrame with two columns. First, we compare the lengths of the two arrays that will be passed as columns.

Step 1: Check whether the two arrays differ in length.

Step 2: If they differ and the first array is longer than the second, add the missing number of elements to the second array using the concatenation operator (+).

Step 3: If they differ and the second array is longer than the first, add the missing number of elements to the first array using the concatenation operator (+).

If the arrays are already equal in length (the check in Step 1 fails), the DataFrame is created directly from the original arrays.
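The steps above can be sketched as a reusable function. This is one possible arrangement under stated assumptions: the function name pad_pair, its parameters, and the sample lists are illustrative and do not come from the examples in this guide. Unlike an in-place +=, it returns padded copies and leaves the input lists unchanged.

```python
import pandas

def pad_pair(first, second, first_fill, second_fill):
    """Return copies of both lists, with the shorter one padded to match."""
    first, second = list(first), list(second)
    if len(first) != len(second):                # Step 1: lengths differ
        if len(first) > len(second):             # Step 2: pad the second list
            second = second + (len(first) - len(second)) * [second_fill]
        else:                                    # Step 3: pad the first list
            first = first + (len(second) - len(first)) * [first_fill]
    return first, second

# Illustrative data: three letters but only one number.
letters, numbers = pad_pair(['a', 'b', 'c'], [1], first_fill='z', second_fill=0)

frame = pandas.DataFrame({'letters': letters, 'numbers': numbers})
print(frame)
```

Here the second list is shorter, so it is padded with two copies of second_fill before the DataFrame is created.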

Example 1

Let’s create the same DataFrame with two columns. The first list, Type, holds ten elements and is passed as the first column; the second list, Location, holds only eight elements and is passed as the second column.

  1. If the first list (Type) is longer than the second list (Location), we append ‘Italy’ to the Location list until the lengths match.
  2. If the first list (Type) is shorter than the second list (Location), we append ‘Utilities’ to the Type list until the lengths match.

import pandas

Type = ['Media', 'Not For Profit', 'Recreation', 'Retail', 'Shipping',
        'Technology', 'Telecommunications', 'Transportation',
        'Utilities', 'Other']
Location = ['USA', 'India', 'Italy', 'Japan', 'USA', 'India', 'Italy', 'Japan']

# Pad the shorter list so that both lists have the same length
if len(Type) != len(Location):
    if len(Type) > len(Location):
        # Location is shorter: append 'Italy' for each missing element
        Location += (len(Type) - len(Location)) * ['Italy']
    elif len(Type) < len(Location):
        # Type is shorter: append 'Utilities' for each missing element
        Type += (len(Location) - len(Type)) * ['Utilities']

# Create pandas DataFrame - Industry with two columns from Type and Location
Industry = pandas.DataFrame({'Type': Type, 'Location': Location})
print(Industry)

Output

The first condition within the main condition is satisfied, so ‘Italy’ is added twice to the second list to make it match the length of the first list.

Example 2

Let’s create the same DataFrame with two columns. This time the first list, Type, holds only six elements, and the second list, Location, holds eight. Using the same conditions as in Example 1, ‘Utilities’ is appended to Type before the DataFrame is created.

import pandas

Type = ['Media', 'Not For Profit', 'Recreation', 'Retail', 'Shipping', 'Utilities']
Location = ['USA', 'India', 'Italy', 'Japan', 'USA', 'India', 'Italy', 'Japan']

# Pad the shorter list so that both lists have the same length
if len(Type) != len(Location):
    if len(Type) > len(Location):
        # Location is shorter: append 'Italy' for each missing element
        Location += (len(Type) - len(Location)) * ['Italy']
    elif len(Type) < len(Location):
        # Type is shorter: append 'Utilities' for each missing element
        Type += (len(Location) - len(Type)) * ['Utilities']

# Create pandas DataFrame - Industry with two columns from Type and Location
Industry = pandas.DataFrame({'Type': Type, 'Location': Location})
print(Industry)

Output

The second condition within the main condition is satisfied. Therefore, ‘Utilities’ is added twice to the first list to make it match the length of the second list.

Conclusion

We have seen two ways to fix the ValueError: All arrays must be of the same length. The first approach, passing an equal number of elements to every column when creating the DataFrame, works for any DataFrame regardless of the number of columns. The second approach, as written, only applies to a DataFrame with two columns; with more than two columns, you’ll need additional conditions inside the main ‘if’ statement.
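For more than two columns, one alternative to chaining extra conditions is to pad every list to the length of the longest one. The following is a minimal sketch of that idea, not the method used in the examples of this guide. The helper pad_columns, the ‘Employees’ column, and the choice of None as the fill value are all assumptions made for illustration, and the dictionary is assumed to be non-empty.

```python
import pandas

def pad_columns(columns, fill_value=None):
    """Return a new dict in which every list is padded to the longest length.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    longest = max(len(values) for values in columns.values())
    return {
        name: list(values) + (longest - len(values)) * [fill_value]
        for name, values in columns.items()
    }

# Three columns of different lengths (4, 2 and 3 values).
columns = {
    'Type': ['Media', 'Retail', 'Shipping', 'Technology'],
    'Location': ['USA', 'India'],
    'Employees': [120, 45, 300],
}

padded = pad_columns(columns, fill_value=None)
Industry = pandas.DataFrame(padded)
print(Industry)
```

Every padded list now has four elements, so the DataFrame is created with four rows and the padded cells show up as missing values.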

About the author

Gottumukkala Sravan Kumar

B.Tech (Hons) in Information Technology; known programming languages: Python, R, PHP MySQL; published 500+ articles in the computer science domain