Indexing Data Frames

Indexing and selecting data is central to working with pandas DataFrames. The most common techniques are shown below:

Setting a Column as the Index:

import pandas as pd

# Sample DataFrame
data = {
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22],
    'City': ['New York', 'London', 'Paris']
}

df = pd.DataFrame(data)

# Set 'Name' as the index (reassigning the result is generally preferred over inplace=True)
df = df.set_index('Name')

print("DataFrame with 'Name' as the index:")
print(df)
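
As a rough sketch of two `set_index` options that often come up: `drop=False` keeps the column alongside the new index, and `verify_integrity=True` rejects duplicate labels. The names `people`, `indexed`, `indexed_keep`, and `duplicated` are illustrative and not part of the example above.

```python
import pandas as pd

# Illustrative data, rebuilt here so the snippet runs on its own
people = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'], 'Age': [25, 30, 22]})

# set_index returns a new DataFrame; `people` itself is unchanged
indexed = people.set_index('Name')

# drop=False keeps 'Name' as a regular column as well as the index
indexed_keep = people.set_index('Name', drop=False)

# verify_integrity=True raises ValueError if index labels would repeat
duplicated = pd.DataFrame({'Name': ['John', 'John'], 'Age': [25, 35]})
try:
    duplicated.set_index('Name', verify_integrity=True)
    duplicate_error = None
except ValueError as exc:
    duplicate_error = exc

print(indexed)
print(indexed_keep)
print("Duplicate labels rejected:", duplicate_error is not None)
```

Checking for duplicates up front can be useful because `.loc` returns several rows, rather than one, when a label repeats.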

Accessing Rows by Index:

# Accessing a specific row by index label
john_data = df.loc['John']
print("\nData for 'John':")
print(john_data)

# Accessing a row by integer position (0-based)
first_row = df.iloc[0]
print("\nData for the first row:")
print(first_row)
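
Both accessors also accept lists and slices. The sketch below rebuilds the sample data so it runs on its own; the variable names (`people`, `subset`, and so on) are illustrative. One difference to keep in mind: label slices with `.loc` include the end label, while position slices with `.iloc` exclude the end position.

```python
import pandas as pd

# Rebuild the sample data so this snippet runs on its own
people = pd.DataFrame(
    {
        'Name': ['John', 'Alice', 'Bob'],
        'Age': [25, 30, 22],
        'City': ['New York', 'London', 'Paris'],
    }
).set_index('Name')

# Several rows and one column, selected by label
subset = people.loc[['John', 'Bob'], 'Age']

# Label slices include both endpoints: John and Alice
label_slice = people.loc['John':'Alice']

# Position slices exclude the end: rows 0 and 1 only
position_slice = people.iloc[0:2]

# A single cell: row label, then column label
alice_city = people.loc['Alice', 'City']

print(subset)
print(label_slice)
print(position_slice)
print(alice_city)
```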

Resetting the Index:

# Resetting the index: 'Name' becomes a regular column again
df = df.reset_index()

print("\nDataFrame after resetting the index:")
print(df)
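
By default the old index comes back as a column. If it is not needed, `drop=True` discards it. A minimal sketch with illustrative names (`people`, `restored`, `dropped`):

```python
import pandas as pd

# Illustrative data with 'Name' already set as the index
people = pd.DataFrame(
    {'Age': [25, 30, 22]},
    index=pd.Index(['John', 'Alice', 'Bob'], name='Name'),
)

# Default: the old index is restored as a 'Name' column
restored = people.reset_index()

# drop=True throws the old index away and leaves only 0, 1, 2, ...
dropped = people.reset_index(drop=True)

print(restored)
print(dropped)
```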

Multi-level Indexing:

# Creating a DataFrame with multi-level index
data = {
    'Age': [25, 30, 22, 35, 28, 32],
    'City': ['New York', 'London', 'Paris', 'New York', 'London', 'Paris']
}

index = pd.MultiIndex.from_tuples([('John', 1), ('Alice', 2), ('Bob', 1), ('John', 2), ('Alice', 1), ('Bob', 2)],
                                  names=['Name', 'Group'])

multi_df = pd.DataFrame(data, index=index)

print("\nDataFrame with Multi-level Index:")
print(multi_df)

# Accessing one row by giving a value for every index level (returns a Series)
john_data_group1 = multi_df.loc[('John', 1)]
print("\nData for 'John' in Group 1:")
print(john_data_group1)
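
A multi-level index can also be queried one level at a time. The sketch below rebuilds the same data under illustrative names (`multi`, `john_rows`, `group1`) and sorts the index first, which is an added step not in the example above but is commonly recommended for lookups on a MultiIndex.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('John', 1), ('Alice', 2), ('Bob', 1), ('John', 2), ('Alice', 1), ('Bob', 2)],
    names=['Name', 'Group'],
)
multi = pd.DataFrame(
    {
        'Age': [25, 30, 22, 35, 28, 32],
        'City': ['New York', 'London', 'Paris', 'New York', 'London', 'Paris'],
    },
    index=index,
).sort_index()  # sort by Name, then Group

# Partial key: every row for 'John'; the result is indexed by 'Group'
john_rows = multi.loc['John']

# Cross-section on the inner level: every row in Group 1, indexed by 'Name'
group1 = multi.xs(1, level='Group')

print(john_rows)
print(group1)
```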

Conditional Indexing:

# Conditional indexing: keep only the rows where the boolean mask is True
filtered_df = df[df['Age'] > 25]
print("\nDataFrame with Age > 25:")
print(filtered_df)
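
Conditions can be combined, and a mask can be paired with a column selection through `.loc`. A minimal sketch, again with illustrative variable names and the sample data rebuilt so it runs on its own:

```python
import pandas as pd

people = pd.DataFrame(
    {
        'Name': ['John', 'Alice', 'Bob'],
        'Age': [25, 30, 22],
        'City': ['New York', 'London', 'Paris'],
    }
)

# Combine conditions with & (and) or | (or); each one needs its own parentheses
young_or_london = people[(people['Age'] < 25) | (people['City'] == 'London')]

# isin() tests membership in a list of values
in_cities = people[people['City'].isin(['Paris', 'New York'])]

# .loc takes a boolean mask for the rows plus a column selection
older_names = people.loc[people['Age'] >= 25, 'Name']

print(young_or_london)
print(in_cities)
print(older_names)
```

The parentheses matter because `&` and `|` bind more tightly than comparison operators in Python.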

These are the most common techniques for indexing and selecting data in pandas DataFrames. In practice they are often combined, for example by setting an index and then filtering with a condition. The pandas documentation covers indexing and selection in full.