Data preparation
import numpy as np
import pandas as pd
First, let us load the data and look at the first few rows to get some intuition about it
>>> data_table = pd.read_csv("/home/yingshaoxo/500_Person_Gender_Height_Weight_Index.csv")
>>> data_table.head()
   Gender  Height  Weight  Index
0    Male     174      96      4
1    Male     189      87      2
2  Female     185     110      4
3  Female     195     104      3
4    Male     149      61      3
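To get a bit more intuition than `head()` gives, summary statistics and class counts can help. The following is only a minimal sketch: `sample` is a hypothetical stand-in built from the five rows printed above, not the full CSV, so the numbers it prints describe those five rows only.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: the five rows printed by head() above.
sample = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Female", "Male"],
    "Height": [174, 189, 185, 195, 149],
    "Weight": [96, 87, 110, 104, 61],
    "Index": [4, 2, 4, 3, 3],
})

# Summary statistics (count, mean, std, min, quartiles, max) for the inputs.
summary = sample[["Height", "Weight"]].describe()
print(summary)

# Number of rows in each gender class, to check how balanced the labels are.
gender_counts = sample["Gender"].value_counts()
print(gender_counts)
```

The same two calls should work on `data_table` once the real file is loaded.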
For our case, we only need the height and weight as input, and the gender as output
>>> x = data_table[['Height', 'Weight']].values
>>> x[:10]
array([[174,  96],
       [189,  87],
       [185, 110],
       [195, 104],
       [149,  61],
       [189, 104],
       [147,  92],
       [154, 111],
       [174,  90],
       [169, 103]])
>>> def convert_gender_to_number(row):
...     if row['Gender'] == 'Male':
...         return 1
...     else:
...         return 0
...
>>> y = data_table[['Gender']].apply(lambda row: convert_gender_to_number(row), axis=1)
>>> y[:10]
0    1
1    1
2    0
3    0
4    1
5    1
6    1
7    1
8    1
9    0
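The row-by-row `apply` above works, but the same encoding (Male as 1, anything else as 0) can also be written as a single vectorized comparison, which is usually shorter and faster in pandas. This is a minimal sketch, not the approach used above: `sample` is a hypothetical stand-in built from the five rows shown by `head()`, and the names `x` and `y` simply mirror the ones used in this section.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: the five rows printed by head() earlier.
sample = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Female", "Male"],
    "Height": [174, 189, 185, 195, 149],
    "Weight": [96, 87, 110, 104, 61],
    "Index": [4, 2, 4, 3, 3],
})

# Inputs: height and weight as a 2-D NumPy array, one row per person.
x = sample[["Height", "Weight"]].values

# Output: compare the whole column at once, then cast True/False to 1/0.
y = (sample["Gender"] == "Male").astype(int)

print(x)
print(y)
```

Like the original function, this maps every value other than `'Male'` to 0, so unexpected labels would be silently treated as the second class.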