Open a folder (one that you have created, containing the data we are going to be using) in VS Code.
When you import a library, you can give it a shorter alias, which makes it easier to type.
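For pandas, the conventional alias is `pd`. A minimal sketch, assuming pandas is installed in the environment you are using:

```python
import pandas as pd  # "pd" is the conventional short alias for pandas

# The alias points at the same module, so pd.DataFrame is pandas.DataFrame
empty_frame = pd.DataFrame()
print(type(empty_frame))
```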
Let’s use this Kaggle dataset on house prices as an example. I downloaded the data and saved it in a folder called data.
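Reading a CSV file and summarising it could look like the sketch below. The file name `data/house_prices.csv` is a placeholder, since the actual file name is not given here. So that the block runs on its own, it reads a tiny made-up sample from memory instead of the downloaded file; the `info()` output that follows comes from the real data, not from this sample.

```python
import io

import pandas as pd

# Tiny made-up sample standing in for the downloaded file. With the real data
# you would pass a path instead, e.g. pd.read_csv("data/house_prices.csv")
# (hypothetical file name).
sample_csv = io.StringIO(
    "date,bed,price,zip_code\n"
    '"AUG 29, 2024",3bd,"$375,000",32317\n'
    '"AUG 29, 2024",Studio,"$190,000",32507\n'
    '"AUG 28, 2024",,"$250,000",32407\n'
)

data_frame = pd.read_csv(sample_csv)

data_frame.info()          # column names, non-null counts and dtypes
print(data_frame.columns)  # just the column labels
```

Note that values such as `$375,000` are read as text rather than numbers, which is why most columns in the real data are reported with dtype `object`.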
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12075 entries, 0 to 12074
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 date 12075 non-null object
1 house_size 11107 non-null object
2 bed 11410 non-null object
3 bath 11410 non-null object
4 price 11009 non-null object
5 broker 9569 non-null object
6 street 12075 non-null object
7 city 12075 non-null object
8 state_name 12075 non-null object
9 zip_code 12075 non-null int64
dtypes: int64(1), object(9)
memory usage: 943.5+ KB
Index(['date', 'house_size', 'bed', 'bath', 'price', 'broker', 'street',
'city', 'state_name', 'zip_code'],
dtype='object')
data_frame["bed"].count() # number of non-null values
data_frame["bed"].nunique() # count of unique values
data_frame["bed"].value_counts() # count for each unique value
bed
3bd 5172
4bd 2998
2bd 1797
5bd 762
1bd 290
Studio 153
6bd 150
7bd 31
8bd 31
9bd 9
12bd 4
10bd 3
14bd 2
21bd 2
15bd 2
11bd 2
16bd 1
13bd 1
Name: count, dtype: int64
A pandas.Series is one column in our data frame.
Read the documentation on pandas.Series.str – how can we create a numeric variable based on the "bed" column?
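One possible approach (there are others) is sketched below on a small made-up Series, so it runs without the data file. It uses `Series.str.replace` and `Series.str.extract`, then `pd.to_numeric`. Treating "Studio" as 0 bedrooms is an assumption made for this illustration; you may prefer to leave it missing.

```python
import pandas as pd

# Made-up values in the same style as the "bed" column
bed = pd.Series(["3bd", "4bd", "Studio", None, "12bd"], name="bed")

# Assumption: a studio counts as 0 bedrooms
bed_text = bed.str.replace("Studio", "0", regex=False)

# Keep only the digits; expand=False returns a Series rather than a DataFrame
digits = bed_text.str.extract(r"(\d+)", expand=False)

# Convert the digit strings to numbers; missing values stay missing (NaN)
bed_numeric = pd.to_numeric(digits)
print(bed_numeric)
```

With the real data, the same steps could be applied to `data_frame["bed"]` and the result stored in a new column.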
| | date | house_size | bed | bath | price | broker | street | city | state_name | zip_code |
---|---|---|---|---|---|---|---|---|---|---|
2 | AUG 29, 2024 | 1,926 sqft (on 0.45 acres) | 3bd | 3bd | $375,000 | Coldwell Banker Hartung | 6761 Landover Cir | Tallahassee | Florida | 32317 |
4 | AUG 29, 2024 | 1,205 sqft | 3bd | 3bd | $233,900 | D R Horton Realty of NW Florida, LLC | 6274 June Bug Dr | Milton | Florida | 32583 |
14 | AUG 28, 2024 | 1,820 sqft | 3bd | 3bd | $330,500 | EXP Realty, LLC | 8564 Westview Ln | Pensacola | Florida | 32514 |
15 | AUG 28, 2024 | 1,370 sqft | 3bd | 3bd | $173,000 | Better Homes And Gardens Real Estate Main Stre... | 6905 Woodley Dr | Pensacola | Florida | 32503 |
17 | AUG 28, 2024 | 2,681 sqft (on 1.82 acres) | 3bd | 3bd | $525,000 | American Valor Realty LLC | 8021 Quiet Dr | Pensacola | Florida | 32526 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
12059 | AUG 30, 2024 | 2,610 sqft | 3bd | 3bd | $1,143,909 | BALBOA REAL ESTATE, INC. | 50525 Spyglass Hill Dr | La Quinta | CA | 92253 |
12063 | AUG 30, 2024 | 1,983 sqft (on 1 acre) | 3bd | 3bd | $1,385,000 | Compass | 10241 McBroom St | Sunland | CA | 91040 |
12065 | AUG 30, 2024 | 1,757 sqft | 3bd | 3bd | $568,000 | Berkshire Hathaway Home Serv. | 26848 Hanford St | Menifee | CA | 92584 |
12069 | AUG 30, 2024 | 2,000 sqft | 3bd | 3bd | $1,880,000 | Real Estate Legends USA | 16 Riveroaks | Irvine | CA | 92602 |
12074 | AUG 30, 2024 | 1,615 sqft | 3bd | 3bd | $508,000 | Starlitloan&Realty | 1668 Ravenswood Rd | Beaumont | CA | 92223 |
5172 rows × 10 columns
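The 5172 rows in the table above match the count for "3bd" in the `value_counts()` output, which is what filtering with a boolean mask on the "bed" column would return. A minimal sketch of that technique, on a made-up five-row frame rather than the real data:

```python
import pandas as pd

# Small made-up frame with the same kind of values as the real data
data_frame = pd.DataFrame({
    "bed": ["3bd", "Studio", "3bd", "4bd", "2bd"],
    "city": ["Tallahassee", "Perdido Key", "Milton", "Milton", "Panama City Beach"],
})

# Comparing a column with a value gives a Series of True/False, one per row
is_three_bed = data_frame["bed"] == "3bd"

# Indexing with that Series keeps only the rows where it is True
three_bed = data_frame[is_three_bed]
print(three_bed)
```

The filtered frame keeps the original row labels, which is why the index in the table above jumps from 2 to 4 to 14 instead of counting up from 0.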
| | date | house_size | bed | bath | price | broker | street | city | state_name | zip_code |
---|---|---|---|---|---|---|---|---|---|---|
2 | AUG 29, 2024 | 1,926 sqft (on 0.45 acres) | 3bd | 3bd | $375,000 | Coldwell Banker Hartung | 6761 Landover Cir | Tallahassee | Florida | 32317 |
3 | AUG 29, 2024 | 1,132 sqft | Studio | Studio | $190,000 | EXP Realty, LLC | 1701 S Fairfield Dr | Perdido Key | Florida | 32507 |
4 | AUG 29, 2024 | 1,205 sqft | 3bd | 3bd | $233,900 | D R Horton Realty of NW Florida, LLC | 6274 June Bug Dr | Milton | Florida | 32583 |
5 | AUG 29, 2024 | 3,044 sqft (on 0.34 acres) | 4bd | 4bd | $416,402 | ADAMS HOME REALTY, INC | 6528 Benelli Dr | Milton | Florida | 32570 |
13 | AUG 28, 2024 | 1,254 sqft | 2bd | 2bd | $250,000 | JANET COULTER REALTY | 520 Richard Jackson Blvd #2810 | Panama City Beach | Florida | 32407 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
12067 | AUG 30, 2024 | 2,352 sqft | 4bd | 4bd | $795,000 | Realty Masters | 12549 Navel Ct | Riverside | CA | 92503 |
12069 | AUG 30, 2024 | 2,000 sqft | 3bd | 3bd | $1,880,000 | Real Estate Legends USA | 16 Riveroaks | Irvine | CA | 92602 |
12070 | AUG 30, 2024 | 3,835 sqft | 5bd | 5bd | $1,935,000 | Realty ONE Group Pacific | 1154 Via Vera Cruz | San Marcos | CA | 92078 |
12072 | AUG 30, 2024 | 2,616 sqft | 4bd | 4bd | $685,000 | Anderson Real Estate Group | 5509 W Modoc Avenue | Visalia | CA | 93291 |
12074 | AUG 30, 2024 | 1,615 sqft | 3bd | 3bd | $508,000 | Starlitloan&Realty | 1668 Ravenswood Rd | Beaumont | CA | 92223 |
8068 rows × 10 columns