Indexing and selecting data are central to working with pandas DataFrames. The techniques below cover setting and resetting an index, selecting rows by label or by position, multi-level indexing, and filtering rows by a condition:
Setting a Column as the Index:
import pandas as pd
# Sample DataFrame
data = {
'Name': ['John', 'Alice', 'Bob'],
'Age': [25, 30, 22],
'City': ['New York', 'London', 'Paris']
}
df = pd.DataFrame(data)
# Set 'Name' as the index
df.set_index('Name', inplace=True)
print("DataFrame with 'Name' as the index:")
print(df)
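The call above modifies df in place. As a minimal sketch of the alternative, set_index can also return a new DataFrame and leave the original untouched, and drop=False keeps the column alongside the index. The names people, indexed and indexed_keep are hypothetical and used only for this illustration.

```python
import pandas as pd

# Hypothetical sample data, mirroring the DataFrame above
people = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22],
    'City': ['New York', 'London', 'Paris'],
})

# Without inplace=True, set_index returns a new DataFrame; people is unchanged
indexed = people.set_index('Name')

# drop=False uses 'Name' as the index and also keeps it as a regular column
indexed_keep = people.set_index('Name', drop=False)

print(indexed.index.tolist())
print('Name' in people.columns)
print('Name' in indexed_keep.columns)
```

Assigning the result to a new name makes it easier to chain further operations and keeps the original data available for comparison.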
Accessing Rows by Index:
# Accessing a specific row by index label
john_data = df.loc['John']
print("\nData for 'John':")
print(john_data)
# Accessing a row by integer position (0 is the first row)
first_row = df.iloc[0]
print("\nData for the first row:")
print(first_row)
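Both loc and iloc also accept a column selector and slices. The sketch below, which uses a hypothetical DataFrame named people, shows one difference that is easy to miss: label slices with loc include both endpoints, while integer slices with iloc exclude the stop position.

```python
import pandas as pd

# Hypothetical sample data with 'Name' as the index
people = pd.DataFrame(
    {'Age': [25, 30, 22], 'City': ['New York', 'London', 'Paris']},
    index=pd.Index(['John', 'Alice', 'Bob'], name='Name'),
)

# Label-based: a single cell, selected by row label and column label
alice_city = people.loc['Alice', 'City']

# Label slices include both endpoints: rows 'John' through 'Alice'
by_label = people.loc['John':'Alice', ['Age']]

# Integer slices exclude the stop position: rows 0 and 1 only, first column
by_position = people.iloc[0:2, 0]

print(alice_city)
print(by_label)
print(by_position)
```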
Resetting the Index:
# Reset the index: 'Name' becomes a regular column again and rows are numbered from 0
df.reset_index(inplace=True)
print("\nDataFrame after resetting the index:")
print(df)
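By default, reset_index moves the old index back into a column. As a small sketch with a hypothetical DataFrame named people, passing drop=True discards the old index instead and simply renumbers the rows.

```python
import pandas as pd

# Hypothetical sample data with 'Name' as the index
people = pd.DataFrame(
    {'Age': [25, 30, 22]},
    index=pd.Index(['John', 'Alice', 'Bob'], name='Name'),
)

# Default behavior: the old index comes back as a 'Name' column
restored = people.reset_index()

# drop=True throws the old index away and renumbers the rows from 0
renumbered = people.reset_index(drop=True)

print(restored.columns.tolist())
print(renumbered.columns.tolist())
```

drop=True can be convenient after filtering or sorting, when the old row labels no longer carry useful information.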
Multi-level Indexing:
# Creating a DataFrame with multi-level index
data = {
'Age': [25, 30, 22, 35, 28, 32],
'City': ['New York', 'London', 'Paris', 'New York', 'London', 'Paris']
}
index = pd.MultiIndex.from_tuples([('John', 1), ('Alice', 2), ('Bob', 1), ('John', 2), ('Alice', 1), ('Bob', 2)],
names=['Name', 'Group'])
multi_df = pd.DataFrame(data, index=index)
print("\nDataFrame with Multi-level Index:")
print(multi_df)
# Select a single row by passing the full (Name, Group) key as a tuple
john_data_group1 = multi_df.loc[('John', 1)]
print("\nData for 'John' in Group 1:")
print(john_data_group1)
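A multi-level index can also be queried with only part of the key. The sketch below assumes a hypothetical DataFrame named scores with the same index as above. It sorts the index first, then selects by the first level with loc and by the second level with xs.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('John', 1), ('Alice', 2), ('Bob', 1), ('John', 2), ('Alice', 1), ('Bob', 2)],
    names=['Name', 'Group'],
)
# Hypothetical sample data; sort_index orders the rows by Name, then Group
scores = pd.DataFrame({'Age': [25, 30, 22, 35, 28, 32]}, index=index).sort_index()

# Partial key on the first level: all rows for 'John', indexed by 'Group'
john_rows = scores.loc['John']

# Cross-section on the second level: all rows in Group 1, indexed by 'Name'
group1 = scores.xs(1, level='Group')

print(john_rows)
print(group1)
```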
Conditional Indexing:
# Keep only the rows where Age is greater than 25 ('Age' must be a regular column of df)
filtered_df = df[df['Age'] > 25]
print("\nDataFrame with Age > 25:")
print(filtered_df)
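Conditions can be combined, and a boolean mask can be paired with a column selection. The following is a minimal sketch using a hypothetical DataFrame named people; mask, selected and in_cities are illustrative names as well.

```python
import pandas as pd

# Hypothetical sample data, mirroring the DataFrame above
people = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22],
    'City': ['New York', 'London', 'Paris'],
})

# Combine conditions with & (and) or | (or); each comparison needs its own parentheses
mask = (people['Age'] >= 25) & (people['City'] != 'London')

# .loc takes the boolean mask for rows and a list of labels for columns
selected = people.loc[mask, ['Name', 'Age']]

# isin tests each value for membership in a list
in_cities = people[people['City'].isin(['London', 'Paris'])]

print(selected)
print(in_cities)
```

The parentheses matter because & and | bind more tightly than comparison operators in Python, so leaving them out usually raises an error rather than producing the intended filter.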
These are some of the most common techniques for indexing and selecting data in pandas DataFrames. In practice they are often combined, for example by setting an index and then filtering the result with a boolean condition. The "Indexing and selecting data" section of the pandas user guide covers each of them in more detail.