
DataFrames and Basic Operations with DataFrames

A DataFrame is a two-dimensional, labeled table of rows and columns, and it is a fundamental data structure for data manipulation and analysis, particularly in data science and machine learning. Several Python libraries provide DataFrames; pandas is one of the most popular. This example demonstrates basic DataFrame operations using pandas.

Installing Pandas:

Before you start, make sure the pandas library is installed. You can install it with pip:

pip install pandas
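To confirm that the installation worked, one quick check is to import the library and print its version. This is only a minimal sketch; the version string you see depends on your environment.

```python
import pandas as pd

# If this import succeeds, pandas is installed in the active environment.
# The printed version string will vary from machine to machine.
print(pd.__version__)
```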

Basic DataFrame Operations:

import pandas as pd

# Creating a DataFrame from a dictionary
data = {
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22],
    'City': ['New York', 'London', 'Paris']
}

df = pd.DataFrame(data)

# Display the DataFrame
print("Original DataFrame:")
print(df)

# Accessing columns
print("\nAccessing Columns:")
print(df['Name'])
print(df['Age'])

# Accessing rows
print("\nAccessing Rows:")
print(df.loc[0])   # Label-based: the row whose index label is 0
print(df.iloc[1])  # Position-based: the second row (position 1)

# Adding a new column
df['Country'] = ['USA', 'UK', 'France']
print("\nDataFrame after adding 'Country' column:")
print(df)

# Filtering data
filtered_df = df[df['Age'] > 25]
print("\nFiltered DataFrame (Age > 25):")
print(filtered_df)

# Reading from CSV (assuming the same CSV file from the previous example)
csv_file_path = 'path/to/your/file/data.csv'
csv_df = pd.read_csv(csv_file_path)

# Display the DataFrame from the CSV file
print("\nDataFrame from CSV:")
print(csv_df)

In this example:

  • We create a DataFrame from a dictionary where keys represent column names, and values represent column values.
  • We access columns using square brackets, and rows using loc (selection by index label) and iloc (selection by integer position).
  • We add a new column (‘Country’) to the DataFrame.
  • We filter the DataFrame with a boolean condition (Age > 25), which keeps only the rows where the condition is true.
  • Finally, we read data from a CSV file into a DataFrame using pd.read_csv().
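With the default index (0, 1, 2, ...), loc and iloc can look interchangeable. The sketch below shows where they differ, using the same names and ages as above but with index labels 10, 20, 30. Those labels are an assumption made purely for illustration and are not part of the original example.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['John', 'Alice', 'Bob'], 'Age': [25, 30, 22]},
    index=[10, 20, 30],  # hypothetical non-default index labels
)

by_label = df.loc[20]     # the row whose index label is 20
by_position = df.iloc[0]  # the first row, whatever its label is

print(by_label['Name'])     # Alice
print(by_position['Name'])  # John

# Filtering keeps the original index labels of the surviving rows
older = df[df['Age'] > 25]
print(older.index.tolist())  # [20]
```

Because filtering preserves labels, a filtered DataFrame may no longer contain label 0; iloc[0] still returns its first row, while loc[0] would raise a KeyError.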

Make sure to replace 'path/to/your/file/data.csv' with the actual path to your CSV file.
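If you do not have a CSV file at hand, one way to try pd.read_csv() is to write a DataFrame to a temporary file first and read it back. The following is a minimal sketch under that assumption; the file name data.csv and the temporary directory are illustrative choices, not part of the original example.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22],
    'City': ['New York', 'London', 'Paris'],
})

# Use a temporary directory so the example does not depend on an existing file
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_file_path = os.path.join(tmp_dir, 'data.csv')  # hypothetical file name
    # index=False keeps the row index out of the file, so no extra column appears on reload
    df.to_csv(csv_file_path, index=False)
    csv_df = pd.read_csv(csv_file_path)

print(csv_df)
print(csv_df.shape)  # (3, 3)
```

Passing index=False to to_csv() is a design choice here: without it, the index is written as an unnamed first column and shows up as extra data when the file is read back.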

Pandas provides many more tools for data manipulation, cleaning, and analysis, and the operations shown here are the foundation for most of them. Refer to the pandas documentation for details on the available methods.