📜  pandas, python, dataframes - Python (1)

📅  Last modified: 2023-12-03 15:33:25.010000    🧑  Author: Mango

Introduction to Pandas, Python and DataFrames

Pandas is a popular open-source library for data manipulation and analysis in Python. It provides data structures for storing and working with tabular, labeled data. Its central data structure is the DataFrame: a two-dimensional table with labeled rows and columns, where each column can hold a different data type, much like a spreadsheet or an SQL table.

Installation

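To install pandas, use pip, the Python package installer:

pip install pandas

In a Jupyter notebook, run the same command with a leading exclamation mark (!pip install pandas) so that it is passed to the shell.
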
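After installing, a quick way to check that the import works is to print the installed version. pd.__version__ is a standard attribute of the pandas module; the exact version string depends on your environment.

```python
# Confirm that pandas can be imported and report which version is installed.
import pandas as pd

print(pd.__version__)
```
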
Creating DataFrames

You can create a DataFrame from a dictionary of lists, where each key becomes a column name and each list holds that column's values:

import pandas as pd

data = {
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'gender': ['F', 'M', 'M']
}

df = pd.DataFrame(data)
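To confirm how the dictionary was turned into a table, you can inspect the column labels, shape, and data types. The sketch below rebuilds the same DataFrame and, for comparison, also builds it from a list of row dictionaries, which is another input pd.DataFrame accepts.

```python
import pandas as pd

data = {
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'gender': ['F', 'M', 'M'],
}
df = pd.DataFrame(data)

# Each dictionary key became a column; each list supplied that column's values.
print(df.columns.tolist())   # ['name', 'age', 'gender']
print(df.shape)              # (3, 3): 3 rows, 3 columns
print(df.dtypes)             # data type of each column

# The same table built row by row from a list of dictionaries (one per row).
rows = [
    {'name': 'Alice', 'age': 25, 'gender': 'F'},
    {'name': 'Bob', 'age': 30, 'gender': 'M'},
    {'name': 'Charlie', 'age': 35, 'gender': 'M'},
]
df_rows = pd.DataFrame(rows)
print(df_rows.equals(df))    # True
```

The dictionary-of-lists form is convenient when data is already organized by column; the list-of-dictionaries form suits data that arrives one record at a time.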

You can also create a DataFrame from a CSV file:

df = pd.read_csv('data.csv')
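The file name data.csv is only a placeholder. To try read_csv() without a file on disk, the sketch below passes it hypothetical CSV text wrapped in io.StringIO, since read_csv() also accepts file-like objects. It also shows two commonly used parameters: usecols, which loads only the listed columns, and index_col, which uses a column as the row index.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for 'data.csv'.
csv_text = """name,age,gender
Alice,25,F
Bob,30,M
Charlie,35,M
"""

# A fresh StringIO is created for each call because reading consumes it.
df = pd.read_csv(io.StringIO(csv_text))
print(df)

# Load only the 'name' and 'age' columns.
df_small = pd.read_csv(io.StringIO(csv_text), usecols=['name', 'age'])
print(df_small)

# Use the 'name' column as the row index, so rows can be looked up by name.
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col='name')
print(df_indexed.loc['Bob', 'age'])  # 30
```
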
Basic Operations

You can display the first and last rows of a DataFrame with the head() and tail() methods, which return five rows by default:

print(df.head())
print(df.tail())
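With only three rows, head() and tail() show the whole example table, so the sketch below passes the number of rows explicitly. It also shows info() and describe(), two other built-in methods that are useful for a first look at a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'gender': ['F', 'M', 'M'],
})

# Pass n to head()/tail() to choose how many rows to return.
first_two = df.head(2)   # Alice, Bob
last_one = df.tail(1)    # Charlie
print(first_two)
print(last_one)

# info() prints column names, non-null counts, and dtypes (it returns None).
df.info()

# describe() summarizes numeric columns (count, mean, std, min, quartiles, max).
summary = df.describe()
print(summary)
```
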

You can select columns with square-bracket indexing and filter rows with a boolean condition:

# selecting a single column
print(df['name'])

# selecting multiple columns
print(df[['name', 'age']])

# selecting rows by condition
print(df[df['age'] > 30])
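Building on the boolean condition above, the sketch below combines conditions and uses the label-based .loc and position-based .iloc indexers. It assumes the same three-row DataFrame built from the dictionary earlier.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'gender': ['F', 'M', 'M'],
})

# Combine conditions with & (and) or | (or); each condition needs parentheses.
men_over_25 = df[(df['gender'] == 'M') & (df['age'] > 25)]
print(men_over_25)  # Bob and Charlie

# .loc selects by label: a boolean mask for rows, a column name for columns.
names_over_25 = df.loc[df['age'] > 25, 'name']
print(names_over_25)

# .iloc selects by integer position: the first two rows and first two columns.
corner = df.iloc[0:2, 0:2]
print(corner)

# isin() keeps rows whose value appears in the given list.
selected = df[df['name'].isin(['Alice', 'Charlie'])]
print(selected)
```

Python's and/or keywords do not work element-wise on columns, which is why pandas uses the & and | operators for combined conditions.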
DataFrame Manipulation

Pandas provides many methods for manipulating DataFrames. Here are a few examples:

# adding a new column
df['city'] = ['New York', 'Chicago', 'Los Angeles']

# renaming a column
df = df.rename(columns={'gender': 'sex'})

# dropping rows with missing values
df = df.dropna()

# grouping by a column and computing the mean age of each group
print(df.groupby('sex').agg({'age': 'mean'}))
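To see the effect of each step, here is a self-contained version. It adds a hypothetical fourth row ('Dana', with a missing age and the city 'Boston') so that dropna() has a row to remove; the original three rows contain no missing values.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'age': [25, 30, 35, None],  # hypothetical row with a missing age (stored as NaN)
    'gender': ['F', 'M', 'M', 'F'],
})

# The new column's list must have one value per row.
df['city'] = ['New York', 'Chicago', 'Los Angeles', 'Boston']

# rename() returns a new DataFrame, so reassign the result.
df = df.rename(columns={'gender': 'sex'})

# dropna() removes every row with at least one missing value (Dana here).
df = df.dropna()

# Mean age per value of 'sex', computed on the remaining rows.
mean_age = df.groupby('sex').agg({'age': 'mean'})
print(mean_age)  # F: 25.0, M: 32.5
```
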
Conclusion

Pandas is a powerful library for working with data in Python. DataFrames offer a convenient way to load, inspect, filter, transform, and aggregate tabular data, and the library's broad feature set has made it a standard tool for data scientists and other programmers who work with data.