🤖
Artificial Intelligence for Idiot
  • Introduction
  • What is Machine Learning
  • Python Basics
    • Random Number
  • Data Manipulation
    • Numpy
    • Pandas
  • Data Visualization
    • Matplotlib
      • Simple drawing
      • Simple line and point
      • Any line and label
      • Annotation in reality
  • Dataset Searching
  • Keras
    • Data preparation
    • Build the simplest model
  • Robot
    • Speech to Text
Powered by GitBook
On this page
  • First in first
  • Create your first DataFrame
  • A general pandas DataFrame
  • get one column from your pandas dataframe
  • get two columns from your pandas dataframe
  • drop a column in your table
  • get a new column based on other exists columns
  • add a column to your table
  • create an empty table with head titles
  • add a row to that table with a dict
  • add a row to that table with a list
  • export your table as an excel file
  • read your excel table
  • get one column as a NumPy array

Was this helpful?

  1. Data Manipulation

Pandas

First in first

Pandas is a Python tool for handling tables, which works like excel, but it's programmable.

import pandas as pd

Create your first DataFrame

>>> numpy_array = np.arange(1,3, step=0.5)
>>> pd.DataFrame(numpy_array)
     0
0  1.0
1  1.5
2  2.0
3  2.5

A general pandas DataFrame

>>> data = {'column1': numpy_array, 'column2': numpy_array*2}
>>> pd.DataFrame(data)
   column1  column2
0      1.0      2.0
1      1.5      3.0
2      2.0      4.0
3      2.5      5.0

get one column from your pandas dataframe

>>> table = pd.DataFrame(data)
>>> table
   column1  column2
0      1.0      2.0
1      1.5      3.0
2      2.0      4.0
3      2.5      5.0
>>> table['column1']
0    1.0
1    1.5
2    2.0
3    2.5

get two columns from your pandas dataframe

>>> data = {'column1': numpy_array, 'column2': numpy_array*2, 'another_column': numpy_array*3}
>>> table = pd.DataFrame(data)
>>> table
   column1  column2  another_column
0      1.0      2.0             3.0
1      1.5      3.0             4.5
2      2.0      4.0             6.0
3      2.5      5.0             7.5

>>> table[["column2", "another_column"]]
   column2  another_column
0      2.0             3.0
1      3.0             4.5
2      4.0             6.0
3      5.0             7.5

drop a column in your table

>>> table
   column1  column2  another_column
0      1.0      2.0             3.0
1      1.5      3.0             4.5
2      2.0      4.0             6.0
3      2.5      5.0             7.5

>>> table.drop("another_column", axis=1)
   column1  column2
0      1.0      2.0
1      1.5      3.0
2      2.0      4.0
3      2.5      5.0

Here the axis means:

{0 or ‘index’, 1 or ‘columns’}, default 0 index is a number, columns is a string, for both values, you can put it into a list for batch handling

get a new column based on other exists columns

>>> table = table.drop("another_column", axis=1)
>>> table
   column1  column2
0      1.0      2.0
1      1.5      3.0
2      2.0      4.0
3      2.5      5.0

>>> def greater_than_2(row):
...     if row['column1'] > 2:
...         return 'yes'
...     else:
...         return 'no'
... 
>>> table.apply(lambda row: greater_than_2(row), axis=1)
0     no
1     no
2     no
3    yes
dtype: object
>>>

add a column to your table

>>> new_column = table.apply(lambda row: greater_than_2(row), axis=1)
>>> table['new_column'] = new_column
>>> table
   column1  column2 new_column
0      1.0      2.0         no
1      1.5      3.0         no
2      2.0      4.0         no
3      2.5      5.0        yes

create an empty table with head titles

>>> table = pd.DataFrame(columns=["name", "age"])
>>> table
Empty DataFrame
Columns: [name, age]
Index: []

add a row to that table with a dict

>>> new_row = {'name':'yingshaoxo', 'age':22}
>>> table = table.append(new_row, ignore_index=True)
>>> table
         name age
0  yingshaoxo  22

add a row to that table with a list

>>> new_row = ["yingshaoxo", 22]
>>> a_series = pd.Series(new_row, index=table.columns)
>>> table = table.append(a_series, ignore_index=True)
>>> table
         name age
0  yingshaoxo  22
1  yingshaoxo  22

export your table as an excel file

table.to_excel("hello.xlsx")

read your excel table

table = pd.read_excel('hello.xlsx', index_col=0)  

get one column as a NumPy array

>>> df = pd.DataFrame({"A":[1, 2], "B":[3, 4], "C":[5, 6]})
>>> df[["A"]].to_numpy()
array([[1],
       [2]], dtype=int64)
PreviousNumpyNextData Visualization

Last updated 4 years ago

Was this helpful?