How to Group by Two & Multiple Columns of pandas DataFrame in Python
Group by Two & Multiple Columns of pandas DataFrame in Python (2 Examples)
On this page you’ll learn how to group a pandas DataFrame by two or more columns in the Python programming language.
The tutorial is structured as follows:
So now the part you have been waiting for – the examples.
Example Data & Libraries
First, we need to import the pandas library:
import
pandasas
pd# Import pandas library in Python
import pandas as pd # Import pandas library in Python
Furthermore, have a look at the following example data:
data=
pd.DataFrame
(
{
'x1'
:[
6
,
1
,
3
,
2
,
5
,
5
,
1
,
9
,
7
,
2
,
3
,
9
]
,
# Create pandas DataFrame
'x2'
:range
(
7
,
19
)
,
'group1'
:[
'A'
,
'B'
,
'B'
,
'A'
,
'C'
,
'A'
,
'A'
,
'C'
,
'B'
,
'C'
,
'B'
,
'A'
]
,
'group2'
:[
'a'
,
'a'
,
'a'
,
'a'
,
'a'
,
'a'
,
'b'
,
'b'
,
'b'
,
'b'
,
'b'
,
'b'
]
,
'group3'
:[
'x'
,
'y'
,
'z'
,
'x'
,
'y'
,
'z'
,
'x'
,
'y'
,
'z'
,
'x'
,
'y'
,
'z'
]
}
)
(
data)
# Print pandas DataFrame
data = pd.DataFrame({‘x1’:[6, 1, 3, 2, 5, 5, 1, 9, 7, 2, 3, 9], # Create pandas DataFrame
‘x2’:range(7, 19),
‘group1’:[‘A’, ‘B’, ‘B’, ‘A’, ‘C’, ‘A’, ‘A’, ‘C’, ‘B’, ‘C’, ‘B’, ‘A’],
‘group2’:[‘a’, ‘a’, ‘a’, ‘a’, ‘a’, ‘a’, ‘b’, ‘b’, ‘b’, ‘b’, ‘b’, ‘b’],
‘group3’:[‘x’, ‘y’, ‘z’, ‘x’, ‘y’, ‘z’, ‘x’, ‘y’, ‘z’, ‘x’, ‘y’, ‘z’]})
print(data) # Print pandas DataFrame
Table 1 shows the structure of our example DataFrame – It contains twelve rows and five columns. The variables group1, group2, and group3 will be used as group indicators in the following examples.
Example 1: GroupBy pandas DataFrame Based On Two Group Columns
Example 1 shows how to group the values in a pandas DataFrame based on two group columns.
To accomplish this, we can use the groupby function as shown in the following Python codes.
The syntax below returns the mean values by group using the variables group1 and group2 as group indicators.
(
data.groupby
(
[
'group1'
,
'group2'
]
)
.mean
(
)
)
# Get mean by two groups
# x1 x2
# group1 group2
# A a 4.333333 9.666667
# b 5.000000 15.500000
# B a 2.000000 8.500000
# b 5.000000 16.000000
# C a 5.000000 11.000000
# b 5.500000 15.000000
print(data.groupby([‘group1’, ‘group2’]).mean()) # Get mean by two groups
# x1 x2
# group1 group2
# A a 4.333333 9.666667
# b 5.000000 15.500000
# B a 2.000000 8.500000
# b 5.000000 16.000000
# C a 5.000000 11.000000
# b 5.500000 15.000000
Similar to that, we can calculate other descriptive statistics for the value columns by group such as the maximum values…
(
data.groupby
(
[
'group1'
,
'group2'
]
)
.max
(
)
)
# Get maxima by two groups
# x1 x2 group3
# group1 group2
# A a 6 12 z
# b 9 18 z
# B a 3 9 z
# b 7 17 z
# C a 5 11 y
# b 9 16 y
print(data.groupby([‘group1’, ‘group2’]).max()) # Get maxima by two groups
# x1 x2 group3
# group1 group2
# A a 6 12 z
# b 9 18 z
# B a 3 9 z
# b 7 17 z
# C a 5 11 y
# b 9 16 y
…or the addition of all values by group:
(
data.groupby
(
[
'group1'
,
'group2'
]
)
.sum
(
)
)
# Get sum by two groups
# x1 x2
# group1 group2
# A a 13 29
# b 10 31
# B a 4 17
# b 10 32
# C a 5 11
# b 11 30
print(data.groupby([‘group1’, ‘group2’]).sum()) # Get sum by two groups
# x1 x2
# group1 group2
# A a 13 29
# b 10 31
# B a 4 17
# b 10 32
# C a 5 11
# b 11 30
Example 2: GroupBy pandas DataFrame Based On Multiple Group Columns
In Example 1, we have created groups and subgroups using two group columns.
Example 2 demonstrates how to use more than two (i.e. three) variables to group our data set.
For this, we simply have to specify another column name within the groupby function.
The following Python codes calculate the mean by three group columns…
(
data.groupby
(
[
'group1'
,
'group2'
,
'group3'
]
)
.mean
(
)
)
# Get mean by three groups
# x1 x2
# group1 group2 group3
# A a x 4.0 8.5
# z 5.0 12.0
# b x 1.0 13.0
# z 9.0 18.0
# B a y 1.0 8.0
# z 3.0 9.0
# b y 3.0 17.0
# z 7.0 15.0
# C a y 5.0 11.0
# b x 2.0 16.0
# y 9.0 14.0
print(data.groupby([‘group1’, ‘group2’, ‘group3’]).mean()) # Get mean by three groups
# x1 x2
# group1 group2 group3
# A a x 4.0 8.5
# z 5.0 12.0
# b x 1.0 13.0
# z 9.0 18.0
# B a y 1.0 8.0
# z 3.0 9.0
# b y 3.0 17.0
# z 7.0 15.0
# C a y 5.0 11.0
# b x 2.0 16.0
# y 9.0 14.0
…the maxima by group…
(
data.groupby
(
[
'group1'
,
'group2'
,
'group3'
]
)
.max
(
)
)
# Get maxima by three groups
# x1 x2
# group1 group2 group3
# A a x 6 10
# z 5 12
# b x 1 13
# z 9 18
# B a y 1 8
# z 3 9
# b y 3 17
# z 7 15
# C a y 5 11
# b x 2 16
# y 9 14
print(data.groupby([‘group1’, ‘group2’, ‘group3’]).max()) # Get maxima by three groups
# x1 x2
# group1 group2 group3
# A a x 6 10
# z 5 12
# b x 1 13
# z 9 18
# B a y 1 8
# z 3 9
# b y 3 17
# z 7 15
# C a y 5 11
# b x 2 16
# y 9 14
…and the sum by group:
(
data.groupby
(
[
'group1'
,
'group2'
,
'group3'
]
)
.sum
(
)
)
# Get sum by three groups
# x1 x2
# group1 group2 group3
# A a x 8 17
# z 5 12
# b x 1 13
# z 9 18
# B a y 1 8
# z 3 9
# b y 3 17
# z 7 15
# C a y 5 11
# b x 2 16
# y 9 14
print(data.groupby([‘group1’, ‘group2’, ‘group3’]).sum()) # Get sum by three groups
# x1 x2
# group1 group2 group3
# A a x 8 17
# z 5 12
# b x 1 13
# z 9 18
# B a y 1 8
# z 3 9
# b y 3 17
# z 7 15
# C a y 5 11
# b x 2 16
# y 9 14
Video & Further Resources
Do you need more info on the examples of this tutorial? Then I can recommend watching the following video instruction which I have published on the Statistics Globe YouTube channel. I’m explaining the Python codes of this article in the video:
The YouTube video will be added soon.
Furthermore, you may read some of the related posts on my homepage.
To summarize: In this article you have learned how to group the values in a pandas DataFrame by two or more columns in the Python programming language. Please let me know in the comments, in case you have any additional questions or comments. Furthermore, please subscribe to my email newsletter for updates on new tutorials.