Day 13: Pandas for Data Manipulation

Topics to Cover:

  • Introduction to Pandas
  • DataFrames and Basic Operations

Introduction to Pandas

Pandas is a data manipulation and analysis library for Python. It provides the data structures and functions needed to work with structured (tabular) data, such as spreadsheets, CSV files, and database tables.

Installing Pandas:
If you don’t have Pandas installed, you can install it using pip:

pip install pandas

Pandas DataFrames

A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. It is similar to a spreadsheet or SQL table.

Creating a DataFrame:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'San Francisco', 'Los Angeles']
}

df = pd.DataFrame(data)
print(df)
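
Once a DataFrame exists, it helps to check its structure before working with it. The following is a minimal sketch that rebuilds the same three-row DataFrame and inspects its shape, column labels, and column types. The variable names (shape, columns, age_is_numeric, name_is_numeric) are chosen here for illustration only.

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'San Francisco', 'Los Angeles']
}
df = pd.DataFrame(data)

# Number of rows and columns as a (rows, columns) tuple
shape = df.shape

# Column labels as a plain Python list
columns = list(df.columns)

# Each column has its own type: Age is numeric, Name holds text
age_is_numeric = pd.api.types.is_numeric_dtype(df['Age'])
name_is_numeric = pd.api.types.is_numeric_dtype(df['Name'])

print(shape)
print(columns)
print(df.dtypes)   # type of every column
print(df.head(2))  # first two rows
```

Here Age is stored as a numeric column while Name and City hold text, which is what "columns of potentially different types" means in practice.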

Basic Operations with DataFrames

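The most common DataFrame operations are loading data, filtering rows, grouping rows, and calculating summary statistics. The examples below assume the DataFrame has the Name, Age, and City columns used above.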

Loading a Dataset:

# Loading a dataset from a CSV file
# ('data.csv' is a placeholder; replace it with the path to your own file)
df = pd.read_csv('data.csv')
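
If no CSV file is at hand, the loading step can still be tried out, because read_csv also accepts a file-like object. The sketch below assumes a hypothetical CSV whose contents match the Name, Age, and City columns used in this article. The csv_text string and the csv_df name are illustrative and not part of the original example.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real data.csv file
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,San Francisco
Charlie,35,Los Angeles
"""

# read_csv accepts a file path or any file-like object;
# the first line is used as the column header by default
csv_df = pd.read_csv(io.StringIO(csv_text))

print(csv_df.head())   # first rows of the loaded data
print(csv_df.dtypes)   # Age is parsed as an integer column
```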

Filtering Data:

# Filtering rows based on a condition
filtered_df = df[df['Age'] > 30]
print(filtered_df)
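
Filters can also combine several conditions. The following sketch uses a small five-row sample (the same one used in the problems later in this article) and shows the & (and) and | (or) operators plus isin. Each condition needs its own parentheses, because & and | bind more tightly than comparisons in Python. The result names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'New York', 'San Francisco']
})

# Both conditions must hold (AND)
older_new_yorkers = df[(df['Age'] > 30) & (df['City'] == 'New York')]

# Either condition may hold (OR)
young_or_la = df[(df['Age'] < 30) | (df['City'] == 'Los Angeles')]

# Match any value from a list
west_coast = df[df['City'].isin(['San Francisco', 'Los Angeles'])]

print(older_new_yorkers)
print(young_or_la)
print(west_coast)
```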

Grouping Data:

# Grouping rows by City and calculating the mean Age of each group
grouped_df = df.groupby('City')['Age'].mean()
print(grouped_df)
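
A group can be summarized with more than one statistic at a time. As a sketch, assuming the same five-row sample data used in the problems later in this article, agg computes several aggregates per city in one call. The city_stats and city_stats_flat names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'New York', 'San Francisco']
})

# One row per city, one column per aggregate
city_stats = df.groupby('City')['Age'].agg(['mean', 'min', 'max', 'count'])

# reset_index turns the City index back into an ordinary column
city_stats_flat = city_stats.reset_index()

print(city_stats)
print(city_stats_flat)
```

By default groupby sorts the group keys, so the cities appear in alphabetical order in the result.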

Calculating Summary Statistics:

# Calculating summary statistics (count, mean, std, min, quartiles, max)
# describe() covers only the numeric columns by default
summary_stats = df.describe()
print(summary_stats)
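
By default describe() reports only on numeric columns, so text columns such as Name and City are left out. The sketch below, which assumes the same five-row sample data used in the problems later in this article, shows how include='all' brings them in and how single statistics can be read directly from a column. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'New York', 'San Francisco']
})

# Default: numeric columns only (here, just Age)
numeric_summary = df.describe()

# include='all' adds text columns, reported with count, unique, top, and freq
full_summary = df.describe(include='all')

# Individual statistics are also available as column methods
mean_age = df['Age'].mean()
median_age = df['Age'].median()

print(numeric_summary)
print(full_summary)
print(mean_age, median_age)
```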

Potential Problems to Solve

Problem 1: Load a Dataset and Perform Basic Data Manipulation

Task: Load a dataset into a Pandas DataFrame and perform basic data manipulation (e.g., filtering, grouping).

Solution:

import pandas as pd

# Building a small sample dataset in memory (stands in for a loaded file)
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'New York', 'San Francisco']
}

df = pd.DataFrame(data)

# Filtering rows where Age is greater than 30
filtered_df = df[df['Age'] > 30]
print("Filtered DataFrame:")
print(filtered_df)

# Grouping by City and calculating the mean age
grouped_df = df.groupby('City')['Age'].mean()
print("\nGrouped DataFrame (Mean Age by City):")
print(grouped_df)
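
The solution above builds its data in memory rather than reading it from disk. As a hedged variant that includes the loading step, the sketch below writes the same sample data to a temporary CSV file and reads it back before filtering and grouping. The file name people.csv and the variable names are hypothetical.

```python
import os
import tempfile
import pandas as pd

source_df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'New York', 'San Francisco']
})

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, created only for this example
    csv_path = os.path.join(tmp_dir, 'people.csv')

    # index=False keeps the row index out of the file
    source_df.to_csv(csv_path, index=False)

    # Load the dataset from disk into a new DataFrame
    loaded_df = pd.read_csv(csv_path)

# Same manipulation as before, now on the loaded data
over_30 = loaded_df[loaded_df['Age'] > 30]
mean_age_by_city = loaded_df.groupby('City')['Age'].mean()

print(over_30)
print(mean_age_by_city)
```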

Problem 2: Calculate Summary Statistics for a Dataset

Task: Calculate summary statistics for a dataset.

Solution:

import pandas as pd

# Building a small sample dataset in memory (stands in for a loaded file)
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'New York', 'San Francisco']
}

df = pd.DataFrame(data)

# Calculating summary statistics (only the numeric Age column is summarized by default)
summary_stats = df.describe()
print("Summary Statistics:")
print(summary_stats)

Conclusion

Pandas is an essential tool for data manipulation and analysis in Python. With DataFrames and the basic operations covered here (loading, filtering, grouping, and summarizing), you can handle and analyze tabular data efficiently.


Stay tuned for Day 14 of the python4ai 30-day series, where we will continue exploring advanced Python topics to enhance our programming skills!

Team

This account on Doubtly.in is managed by the core team of Doubtly.