This article discusses Pandas, an open-source Python library that provides high-performance, ready-to-use data structures and data analysis tools. We will also look at the DataFrame, the advantages of Pandas, and how to use Pandas to select multiple columns of a DataFrame. Let’s get started!
What is Pandas in Python?
Pandas is an open-source Python library that provides efficient, ready-to-use data structures and tools for data analysis. It is built on top of NumPy and is widely used in data science and analytics. NumPy is a lower-level library that provides multi-dimensional arrays and a wide range of mathematical operations on them. Pandas offers a higher-level interface on top of it, along with robust time-series support and efficient alignment of tabular data. Pandas’ primary data structure is the DataFrame, a two-dimensional structure for storing and modifying tabular data. Pandas provides many operations on the DataFrame, such as data manipulation, concatenation, merging, and grouping.
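As a minimal sketch of the concatenation, merging, and grouping operations mentioned above, consider the following. The tables, column names, and values are invented purely for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
sales_q1 = pd.DataFrame({"store": ["north", "south"], "revenue": [100, 150]})
sales_q2 = pd.DataFrame({"store": ["north", "south"], "revenue": [120, 130]})
managers = pd.DataFrame({"store": ["north", "south"], "manager": ["Ana", "Raj"]})

# Concatenation: stack the two quarters into one table.
sales = pd.concat([sales_q1, sales_q2], ignore_index=True)

# Merging: attach the manager of each store (an SQL-style join on "store").
sales_with_manager = sales.merge(managers, on="store")

# Grouping: total revenue per store.
totals = sales.groupby("store")["revenue"].sum()
print(totals)
```

Each step returns a new object, so the intermediate tables stay available for inspection.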
What is a DataFrame?
The DataFrame is the most essential and widely used data structure in Pandas. It stores data in rows and columns, much like an SQL table or a spreadsheet.
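To make the row-and-column layout concrete, here is a small sketch that builds a DataFrame and inspects its structure. The data is made up for illustration.

```python
import pandas as pd

# Hypothetical data, invented for illustration.
df = pd.DataFrame({"Name": ["A", "B", "C"], "Age": [27, 24, 22]})

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column labels
print(list(df.index))    # row labels (a default 0..n-1 range here)
print(df["Age"].dtype)   # each column carries its own data type
```

The column labels are what the selection options later in this article operate on.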
Advantages of Pandas
Many users wish SQL included capabilities such as Gaussian random number generation or quantiles, because it is hard to express a procedural idea in an SQL query. Such users may say, “If only I could write this in Python and switch back to SQL quickly,” and Pandas provides a tabular data type with well-designed interfaces that lets them do exactly that. The alternatives are more verbose: a dedicated procedural language such as Oracle’s PL/SQL or Postgres’ PL/pgSQL, or a low-level database interface. Pandas has a one-line SQL read interface (pd.read_sql) and a one-line SQL write interface (DataFrame.to_sql), comparable to R data frames.
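The round trip between SQL and Pandas can be sketched as follows. This example assumes an in-memory SQLite database from Python's standard library, and the table name, column names, and data are hypothetical.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")

df = pd.DataFrame({"name": ["A", "B", "C", "D"], "age": [27, 24, 22, 32]})

# One-line write: DataFrame -> SQL table.
df.to_sql("people", conn, index=False)

# One-line read: SQL query -> DataFrame.
older = pd.read_sql("SELECT name, age FROM people WHERE age > 23", conn)

# A quantile is awkward in plain SQL but a single call in Pandas.
median_age = older["age"].quantile(0.5)
print(median_age)

conn.close()
```

Other databases generally need a driver or an SQLAlchemy connection in place of the sqlite3 connection used here.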
Another significant advantage is that charting libraries such as Seaborn can treat DataFrame columns as high-level graph attributes. In short, Pandas offers a sensible way to manage tabular data in Python, along with very good storage and charting APIs.
Option 1: Using Basic Key Indexing
```python
import pandas as pd

data = {'Name': ['A', 'B', 'C', 'D'],
        'Age': [27, 24, 22, 32]}
df = pd.DataFrame(data)

# Pass a list of column labels inside the indexing brackets
print(df[['Name', 'Age']])
```
Output:
```
  Name  Age
0    A   27
1    B   24
2    C   22
3    D   32
```
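One detail of key indexing is easy to miss: a list of labels returns a DataFrame, while a single label returns a Series. The sketch below illustrates this. The "City" column is a hypothetical addition so that the selection is a real subset.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["A", "B", "C", "D"],
                   "Age": [27, 24, 22, 32],
                   "City": ["X", "Y", "Z", "W"]})  # hypothetical extra column

subset = df[["Name", "Age"]]  # list of labels -> DataFrame
single = df["Age"]            # single label   -> Series

print(type(subset).__name__)
print(type(single).__name__)

# The result follows the order of the list, not the original column order.
reordered = df[["Age", "Name"]]
print(list(reordered.columns))
```

Asking for a label that does not exist raises a KeyError, which helps catch typos in column names early.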
Option 2: Using .loc[]
```python
import pandas as pd

data = {'Fruit': ['Apple', 'Banana', 'Grapes', 'Orange'],
        'Price': [160, 100, 60, 80]}
df = pd.DataFrame(data)

# .loc selects by label: rows first, then columns.
# Label slices include the end label, so 0:2 returns rows 0, 1, and 2.
# Use df.loc[:, ['Fruit', 'Price']] to keep every row.
print(df.loc[0:2, ['Fruit', 'Price']])
```
Output:
```
    Fruit  Price
0   Apple    160
1  Banana    100
2  Grapes     60
```
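A few common variations of label-based selection are sketched below. The "Stock" column is a hypothetical addition used to show a row filter.

```python
import pandas as pd

df = pd.DataFrame({"Fruit": ["Apple", "Banana", "Grapes", "Orange"],
                   "Price": [160, 100, 60, 80],
                   "Stock": [5, 0, 12, 7]})  # hypothetical extra column

# All rows, two columns chosen by label.
all_rows = df.loc[:, ["Fruit", "Price"]]

# Row label slice: the end label (2) is included.
first_three = df.loc[0:2, ["Fruit", "Price"]]

# Column label slice: also inclusive on both ends.
col_range = df.loc[:, "Fruit":"Price"]

# Boolean row filter combined with a column list.
in_stock = df.loc[df["Stock"] > 0, ["Fruit", "Price"]]
print(in_stock)
```

The inclusive end is specific to label slices and differs from ordinary Python slicing.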
Option 3: Using .iloc[]
```python
import pandas as pd

data = {'Dog': ['A', 'B', 'C', 'D'],
        'Age': [2, 4, 3, 1]}
df = pd.DataFrame(data)

# .iloc selects by integer position: all rows, columns 0 and 1
# (the end position 2 is excluded)
print(df.iloc[:, 0:2])
```
Output:
```
  Dog  Age
0   A    2
1   B    4
2   C    3
3   D    1
```
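Position-based selection also accepts explicit position lists and negative positions, as this sketch shows. The "Weight" column is a hypothetical addition so that non-adjacent columns can be picked.

```python
import pandas as pd

df = pd.DataFrame({"Dog": ["A", "B", "C", "D"],
                   "Age": [2, 4, 3, 1],
                   "Weight": [10, 25, 18, 7]})  # hypothetical extra column

# Position slice: columns 0 and 1, end position excluded.
first_two = df.iloc[:, 0:2]

# Explicit list of positions: first and third columns.
picked = df.iloc[:, [0, 2]]

# A single negative position counts from the end and returns a Series.
last_col = df.iloc[:, -1]
print(list(picked.columns))
```

Unlike .loc, .iloc follows Python's usual half-open slicing, so the end position is never included.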
Option 4: Using .ix[]
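Note that .ix was deprecated in Pandas 0.20 and removed in Pandas 1.0, so it raises an AttributeError on current versions. The legacy call is kept as a comment below, and .iloc gives the same result here.
```python
import pandas as pd

data = {'Name': ['A', 'B', 'C', 'D'],
        'Roll number': [21, 25, 19, 49]}
df = pd.DataFrame(data)

# Legacy form, works only on Pandas versions before 1.0:
# print(df.ix[:, 0:2])

# Positional equivalent on current versions: all rows, columns 0 and 1
print(df.iloc[:, 0:2])
```
Output:
```
  Name  Roll number
0    A           21
1    B           25
2    C           19
3    D           49
```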
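The .ix indexer could mix label-based and position-based selection in one call. On Pandas versions without it, one possible approach is to translate between labels and positions explicitly, as sketched here. The string row labels are a hypothetical addition to make the difference between labels and positions visible.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["A", "B", "C", "D"],
                   "Roll number": [21, 25, 19, 49]},
                  index=["w", "x", "y", "z"])  # hypothetical row labels

# Rows by label, columns by position: turn positions into labels first.
by_label = df.loc[["w", "y"], df.columns[0:2]]

# Rows by position, column by label: turn the label into a position first.
col_pos = df.columns.get_loc("Roll number")
by_position = df.iloc[0:2, col_pos]

print(by_label)
print(list(by_position))
```

Keeping the two kinds of selection separate avoids the ambiguity that .ix had with integer labels.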
Conclusion
We discussed Pandas in Python, the DataFrame, the advantages of Pandas, and how to use Pandas to select multiple columns of a DataFrame. We covered four options for selecting multiple columns: basic key indexing, “.loc”, “.iloc”, and “.ix”. Because “.ix” has been removed from current Pandas releases, prefer “.loc” and “.iloc” in new code.