Chapter 11. Pandas I

Learning Objectives and Expected Outcomes

  • Learning objectives

    • Understand the structure of Series and DataFrame, the two core data structures of pandas.

    • Practice creating, operating on, and indexing/slicing a Series.

    • Practice creating, operating on, and indexing/slicing a DataFrame.

  • Expected outcomes

    • Use Series and DataFrame to carry out the computations and analyses that data work requires, quickly and easily.

  • pandas is a data analysis library for Python.

  • The name 'pandas' comes from "panel data", an econometrics term for multidimensional structured data, and from "Python data analysis".

  • It was first developed in 2008 for analyzing financial data.

  • It is optimized for handling large amounts of data and makes reshaping, aggregating, and subsetting data easy.

  • pandas provides two data structures: Series and DataFrame.


Series

  • A Series is a one-dimensional array paired with an index.

  • Its values can be accessed either by index number (positional index) or by index label (label index).

  • In other words, a Series combines the strengths of a list, where values are accessed by position, and a dictionary (dict), where values are accessed by a key such as an index label.
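The sketch below illustrates the list/dict analogy with a small hypothetical Series `s` that does not appear in the original material. `.iloc` looks values up by position, the way a list does. `.loc` looks them up by label, the way a dict looks up a key. The `in` operator tests labels, just as it tests keys in a dict.

```python
import pandas as pd

# Hypothetical example data: three labels with integer values
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

print(s.iloc[0])       # by position, like a list: 10
print(s.loc['b'])      # by label, like a dict key: 20
print('c' in s)        # 'in' tests the index labels, like dict keys: True
print(30 in s)         # values are NOT checked by 'in': False
print(list(s.index))   # the labels: ['a', 'b', 'c']
```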

Creating a Series

  • To create a Series, first import the pandas module.

  • pandas is conventionally imported under the alias pd.

import pandas as pd
  • Let's create a Series. The syntax is shown below. Note that the 'S' in Series must be uppercase.

pd.Series(list or dict or array, index = list or array)
  • If no index is given, a default positional index (0, 1, 2, ...) is used.

  • The data type appears on the last line of the output. dtype: int64 means the Series holds 64-bit integers.

dtype        Description
int64        Integer
float64      Floating-point number
object       Text (strings) or mixed Python objects
bool         True/False
datetime64   Date and time
category     Categorical values
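You can see most of the dtypes in the table by letting pandas infer them. The sketch below uses made-up values, and the `examples` dict exists only for illustration. Strings are left out because recent pandas versions may report a dedicated string dtype instead of `object`.

```python
import pandas as pd

# Illustrative data: pandas infers one dtype per Series
examples = {
    'int':      pd.Series([1, 2, 3]),
    'float':    pd.Series([1.5, 2.0]),
    'bool':     pd.Series([True, False]),
    'datetime': pd.Series(pd.to_datetime(['2024-01-01', '2024-06-30'])),
    'category': pd.Series(['A', 'B', 'A'], dtype='category'),
}
for name, s in examples.items():
    print(name, s.dtype)
```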

score = [84, 21, 87, 100, 59, 46]
s = pd.Series(score)
print(s)
0     84
1     21
2     87
3    100
4     59
5     46
dtype: int64
  • The index option sets the index labels. Here the labels are the students' names.

score = [84, 21, 87, 100, 59, 46]
names=['철수','영이','길동','미영','순이','철이']
s = pd.Series(score, index = names)
print(s)
철수     84
영이     21
길동     87
미영    100
순이     59
철이     46
dtype: int64
  • A Series can also be created from a dictionary. The keys become the index labels and the values become the data.

dic={'철수':84, '영이':21, '길동':87,'미영':100, '순이':59, '철이':46}
s = pd.Series(dic)
print(s)
철수     84
영이     21
길동     87
미영    100
순이     59
철이     46
dtype: int64
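If you pass a dictionary together with the `index` option, pandas uses the index list to pick and order the keys. The sketch below uses some of the `dic` values above plus a hypothetical extra label '슬기', which is not a key.

```python
import pandas as pd

dic = {'철수': 84, '영이': 21, '길동': 87}
# With index=, only the listed keys are used, in the listed order;
# a label that is not a key ('슬기') gets NaN, so the dtype becomes float64
s = pd.Series(dic, index=['길동', '철수', '슬기'])
print(s)
```

NaN is a floating-point value, so the integer scores are upcast to float64.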
  • Let's check the type of the Series with the type() function.

print(type(s))
<class 'pandas.core.series.Series'>
  • The shape of a Series is given by series_name.shape.

  • For a one-dimensional array, it has the form (number_of_elements,).

print(s.shape)
(6,)

Series Arithmetic

  • Let's apply arithmetic operations to Series. The result of each operation is also a Series.

  • When two Series are combined, values are paired by index label, not by position. So s1 + s2 below adds each student's two scores even though names2 lists the students in a different order. The result is listed in sorted label order.

import pandas as pd
names1=['철수','영이','길동','미영','순이','철이']
score1 = [84, 21, 87, 100, 59, 46]
names2 =['길동','철수','영이','철이','순이','미영']
score2 = [99, 87, 87, 84, 77, 15]

s1 = pd.Series(score1, index=names1)
s2 = pd.Series(score2, index=names2)
print(s1)
print(s2)
철수     84
영이     21
길동     87
미영    100
순이     59
철이     46
dtype: int64
길동    99
철수    87
영이    87
철이    84
순이    77
미영    15
dtype: int64
s1 + 10
철수     94
영이     31
길동     97
미영    110
순이     69
철이     56
dtype: int64
s1 + s2
길동    186
미영    115
순이    136
영이    108
철수    171
철이    130
dtype: int64
s1 - s2
길동   -12
미영    85
순이   -18
영이   -66
철수    -3
철이   -38
dtype: int64
s1 * s2
길동    8613
미영    1500
순이    4543
영이    1827
철수    7308
철이    3864
dtype: int64
(s1+s2)/2
길동    93.0
미영    57.5
순이    68.0
영이    54.0
철수    85.5
철이    65.0
dtype: float64
s1%2
철수    0
영이    1
길동    1
미영    0
순이    1
철이    0
dtype: int64
s1.sum()
397
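Every label in the example above appears in both s1 and s2. The sketch below uses hypothetical Series `a` and `b` whose labels only partly overlap, to show what label alignment does in that case. It also shows how `Series.add(..., fill_value=0)` treats a missing label as 0.

```python
import pandas as pd

a = pd.Series([84, 21, 87], index=['철수', '영이', '길동'])
b = pd.Series([87, 99, 70], index=['영이', '길동', '슬기'])  # '슬기' only in b

# The + operator aligns by label; a label missing on either side gives NaN
r = a + b
print(r)
# .add() with fill_value treats a missing label as 0 instead
r2 = a.add(b, fill_value=0)
print(r2)
```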

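Series Indexing

  • Data in a Series can be accessed by index number (positional index) or by index label (label index).

  • In pandas 2.1 and later, an integer in plain [] on a Series with a non-integer index is still treated as a position, but this behavior is deprecated and raises a FutureWarning. s1.iloc[2] (by position) and s1.loc['영이'] (by label) are the unambiguous forms.

series_name[position or label]
s1[2]
87
s1['영이']
21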
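Plain `s1[2]` is ambiguous whenever the labels could themselves be integers. For that reason pandas provides `.iloc` (position only) and `.loc` (label only). Here is a minimal sketch that rebuilds `s1` from the lists above:

```python
import pandas as pd

names1 = ['철수', '영이', '길동', '미영', '순이', '철이']
s1 = pd.Series([84, 21, 87, 100, 59, 46], index=names1)

print(s1.iloc[2])      # by position: the third value, 87
print(s1.loc['영이'])  # by label: 21
```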

Series Slicing

  • Slicing by position works the same way as for lists and arrays: the end position is excluded.

  • Slicing by label includes the end label.

series_name[start:end:step]
s1[2:]
길동     87
미영    100
순이     59
철이     46
dtype: int64
  • The slice below includes the end label '순이'.

s1['영이':'순이']
영이     21
길동     87
미영    100
순이     59
dtype: int64
s1['미영':]
미영    100
순이     59
철이     46
dtype: int64
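The two slicing rules can be compared directly. In this minimal sketch, `s1` is rebuilt from the original scores, and a positional slice and a label slice select the same three entries:

```python
import pandas as pd

s1 = pd.Series([84, 21, 87, 100, 59, 46],
               index=['철수', '영이', '길동', '미영', '순이', '철이'])

by_pos = s1.iloc[1:4]            # positions 1, 2, 3: end position 4 is excluded
by_label = s1.loc['영이':'미영']  # labels 영이 .. 미영: end label is included
print(by_pos.equals(by_label))   # True: both select 영이, 길동, 미영
```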

Adding, Modifying, and Deleting Series Data

  • Adding and modifying data use the same syntax. If the label already exists in the Series, its value is modified; otherwise, a new entry is added.

  • The syntax is:

series_name[label] = value
  • '슬기' is not in the index, so it is added.

s1['슬기']= 87
print(s1)
철수     84
영이     21
길동     87
미영    100
순이     59
철이     46
슬기     87
dtype: int64
  • '길동' is already in the index, so its value is modified.

s1['길동']=88
print(s1)
철수     84
영이     21
길동     88
미영    100
순이     59
철이     46
슬기     87
dtype: int64
  • To delete data from a Series, use the del keyword.

  • The code below deletes the entry with label '철이'.

del s1['철이']
print(s1)
철수     84
영이     21
길동     88
미영    100
순이     59
슬기     87
dtype: int64

Series Comparison and Filtering

  • As in NumPy, applying a comparison operator to a Series gives an element-wise True/False result.

a = s1>=85
print(a)
철수    False
영이    False
길동     True
미영     True
순이    False
슬기     True
dtype: bool
  • That boolean Series can then be used to filter the Series.

  • The code below keeps only the values in s1 that are 85 or higher.

s1[a]
길동     88
미영    100
슬기     87
dtype: int64
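You can combine several conditions with `&` (and) and `|` (or). Python's `and`/`or` do not work element-wise on a Series. The sketch below uses the values of `s1` after the edits above:

```python
import pandas as pd

s1 = pd.Series([84, 21, 88, 100, 59, 87],
               index=['철수', '영이', '길동', '미영', '순이', '슬기'])

# Combine conditions with & (and) / | (or); each comparison needs parentheses
mid_range = s1[(s1 >= 60) & (s1 < 90)]
print(mid_range)   # 철수 84, 길동 88, 슬기 87
```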

DataFrame

A DataFrame is a two-dimensional structure made up of rows and columns. Each column is a Series, so a DataFrame is a collection of Series that share the same index.

Creating a DataFrame

  • The syntax for creating a DataFrame is:

pd.DataFrame(list or array or Series or dict, index = list or array, columns = list or array)
  • First, let's build a DataFrame from Series.

  • Create two Series as follows.

names1=['철수','영이','길동','미영','순이','철이']
score1 = [84, 21, 87, 100, 59, 46]
names2 =['길동','철수','영이','철이','순이','미영']
score2 = [99, 87, 87, 84, 77, 15]

s1 = pd.Series(score1, index=names1)
s2 = pd.Series(score2, index=names2)
  • Add each Series to the DataFrame as a column. Each Series is matched to the rows by index label, so the different order of s2 does not matter.

df = pd.DataFrame()
df['국어']= s1  # create the 국어 (Korean language) column
df['영어']= s2  # create the 영어 (English) column
df['합계']=df['국어']+df['영어'] # create the 합계 (total) column
df
국어 영어 합계
철수 84 87 171
영이 21 87 108
길동 87 99 186
미영 100 15 115
순이 59 77 136
철이 46 84 130

Indexing caveat

  • Indexing a Series

    • e.g., in s['영이'], '영이' is a row label.

  • Indexing a DataFrame

    • e.g., in df['국어'], '국어' is a column label.

  • A DataFrame can also be created from a list of lists or from an array.

scores = [[84,87,78], [21,15,84], [87,84,76], [100,87,99],[59,99,59],[46,77,56]]
d1 = pd.DataFrame(scores)
d1
0 1 2
0 84 87 78
1 21 15 84
2 87 84 76
3 100 87 99
4 59 99 59
5 46 77 56
  • The index and columns options let you specify the row labels and column names directly. The subjects here are '국어' (Korean language), '수학' (math), and '영어' (English).

scores = [[84,87,78], [21,15,84], [87,84,76], [100,87,99],[59,99,59],[46,77,56]]
names=['철수','영이','길동','미영','순이','철이']
lectures=['국어','수학','영어']
d2 = pd.DataFrame(scores, index=names, columns=lectures)
d2
국어 수학 영어
철수 84 87 78
영이 21 15 84
길동 87 84 76
미영 100 87 99
순이 59 99 59
철이 46 77 56
  • You can also pass a dictionary as the data. Each key becomes a column name, and each list becomes that column's values.

ScoresWithLectures={'국어':[84,21,87,100,59,46], '수학':[87,15,84,87,99,77], '영어':[78,84,76,99,59,56]}
names=['철수','영이','길동','미영','순이','철이']
d3 = pd.DataFrame(ScoresWithLectures, index=names)
d3
국어 수학 영어
철수 84 87 78
영이 21 15 84
길동 87 84 76
미영 100 87 99
순이 59 99 59
철이 46 77 56
  • As in NumPy, transpose() or its shorthand .T swaps rows and columns.

  • The result is a new DataFrame; the original DataFrame is not changed.

d4 = d3.transpose()
d4
철수 영이 길동 미영 순이 철이
국어 84 21 87 100 59 46
수학 87 15 84 87 99 77
영어 78 84 76 99 59 56
d3
국어 수학 영어
철수 84 87 78
영이 21 15 84
길동 87 84 76
미영 100 87 99
순이 59 99 59
철이 46 77 56

Inspecting a DataFrame

  • The following attributes and methods show information about a DataFrame.

    • .shape: the shape, as (number of rows, number of columns).

    • .columns: the column names.

    • head(): the first 5 rows of the DataFrame (head(n) shows the first n rows).

    • tail(): the last 5 rows of the DataFrame (tail(n) shows the last n rows).

    • info(): the column names, non-null counts, and dtypes.

    • isnull().sum(): the number of missing (null) values in each column.

url='https://raw.githubusercontent.com/HaesunByun/Basic-Computing/main/titanic_test.csv'
titanic_test = pd.read_csv(url)
titanic_test
  • PassengerId: passenger ID

  • Pclass: ticket class (1 = 1st, 2 = 2nd, 3 = 3rd)

  • Name: name

  • Sex: sex

  • Age: age in years

  • SibSp (siblings and spouses): total number of siblings and spouses aboard with the passenger

  • Parch (parents and children): total number of parents and children aboard with the passenger

  • Ticket: ticket number

  • Fare: passenger fare

  • Cabin: cabin number

  • Embarked: port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)

# titanic_test.shape
# titanic_test.head(3)
# titanic_test.tail()
# titanic_test.info()
# titanic_test.isnull().sum()
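You can try the inspection tools above without downloading the CSV. This minimal sketch uses a hypothetical three-row frame that borrows three Titanic column names and contains a few missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the Titanic data: one missing Age, two missing Cabin values
df = pd.DataFrame({
    'Pclass': [3, 1, 2],
    'Age':    [22.0, np.nan, 30.0],
    'Cabin':  [None, 'C85', None],
})
print(df.shape)            # (3, 3)
print(list(df.columns))    # ['Pclass', 'Age', 'Cabin']
print(df.head(2))          # first 2 rows
df.info()                  # column names, non-null counts, dtypes
print(df.isnull().sum())   # Pclass 0, Age 1, Cabin 2
```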

DataFrame Indexing and Slicing

.iloc[]

  • iloc is short for "integer location". It indexes and slices by position only.

  • Let's use .iloc[] to select rows, columns, and rows and columns together.

  • The syntax is:

dataframe_name.iloc[row or row range, column or column range]
  • If you give a single position (not a range) for either the rows or the columns, the result is a Series. If you give a single position for both, the result is a single value.

  • If you give ranges for both rows and columns, the result is a DataFrame.

  • Row selection (in a notebook, only the result of the last line in a cell is displayed)

d3.iloc[2]
d3.iloc[2:3]
국어 수학 영어
길동 87 84 76
  • Column selection

d3.iloc[:, 1]
#d3.iloc[:, 0:2]
철수    87
영이    15
길동    84
미영    87
순이    99
철이    77
Name: 수학, dtype: int64
  • Row and column selection

d3.iloc[1:3, 0:2]
국어 수학
영이 21 15
길동 87 84

.loc[]

  • .loc indexes and slices by label only. As with a Series, a label slice includes the end label.

  • The syntax is:

dataframe_name.loc[row or row range, column or column range]
  • If you give a single label (not a range) for either the rows or the columns, the result is a Series. If you give a single label for both, the result is a single value.

  • If you give ranges for both rows and columns, the result is a DataFrame.

  • Row selection

d3.loc['길동']
d3.loc['길동':'순이']
국어 수학 영어
길동 87 84 76
미영 100 87 99
순이 59 99 59
  • Column selection

d3.loc[:, '국어']
d3.loc[:, '국어':'수학']
국어 수학
철수 84 87
영이 21 15
길동 87 84
미영 100 87
순이 59 99
철이 46 77
  • Row and column selection

d3.loc['철수':'영이', '국어':'수학']
국어 수학
철수 84 87
영이 21 15
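Besides labels and label slices, `.loc` also accepts a boolean Series for the rows, which lets you filter rows and select columns in one step. Here is a sketch on the `d3` data:

```python
import pandas as pd

d3 = pd.DataFrame(
    {'국어': [84, 21, 87, 100, 59, 46],
     '수학': [87, 15, 84, 87, 99, 77],
     '영어': [78, 84, 76, 99, 59, 56]},
    index=['철수', '영이', '길동', '미영', '순이', '철이'])

# A boolean Series selects the rows; a list of labels selects the columns
high_math = d3.loc[d3['수학'] >= 85, ['국어', '수학']]
print(high_math)   # rows 철수, 미영, 순이
```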

Indexing with the DataFrame name alone

  • Selecting with only the DataFrame name and square brackets (dataframe_name[...]) is a little more involved.

  • To select rows, you must use a slice, even for a single row. Otherwise the key is treated as a column name and raises a KeyError.

dataframe_name[row range]
d3[2:3]
d3[2:5]
d3['길동':'길동']
d3['길동':'철이']
국어 수학 영어
길동 87 84 76
미영 100 87 99
순이 59 99 59
철이 46 77 56
  • Columns can be selected only by column name.

  • To select several columns, pass a list of column names.

dataframe_name[column name or list of column names]
d3['국어']
d3[['국어','수학']]
국어 수학
철수 84 87
영이 21 15
길동 87 84
미영 100 87
순이 59 99
철이 46 77
  • To select rows and columns together, chain two sets of brackets in either order. The row and column rules above still apply:

    • Rows: slices only!

    • Columns: column names only!

    • Several columns: a list only!

dataframe_name[row range][column name or list of column names]
dataframe_name[column name or list of column names][row range]
d3['길동':'미영'][['국어','수학']]
d3[2:4][['국어','수학']]

d3[['국어','수학']]['길동':'미영']
d3[['국어','수학']][2:4]
국어 수학
길동 87 84
미영 100 87

Adding, Modifying, and Deleting DataFrame Rows and Columns

  • Let's add a '컴기' (computing basics) column.

  • If no column named '컴기' exists, it is added; if it already exists, its values are updated.

comScore=[60, 70, 56, 74, 77, 66]
d3['컴기']= comScore
d3
국어 수학 영어 컴기
철수 84 87 78 60
영이 21 15 84 70
길동 87 84 76 56
미영 100 87 99 74
순이 59 99 59 77
철이 46 77 56 66
  • Let's add a '합계' (total) row. Assigning to d3.loc with a label that is not yet in the index adds a new row.

  • The values for the '합계' row are computed with the sum() method.

    • You can pass an axis to sum().

    • axis=0 sums down the rows, producing one total per column. axis=1 sums across the columns, producing one total per row.

    • If axis is omitted, it defaults to 0.

d3.loc['합계']=d3.sum()
d3
국어 수학 영어 컴기
철수 84 87 78 60
영이 21 15 84 70
길동 87 84 76 56
미영 100 87 99 74
순이 59 99 59 77
철이 46 77 56 66
합계 397 449 452 403
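The difference between the two axes is easiest to see on a tiny frame. This sketch uses two students' 국어 and 수학 scores from `d3`:

```python
import pandas as pd

scores = pd.DataFrame({'국어': [84, 21], '수학': [87, 15]}, index=['철수', '영이'])

print(scores.sum())         # axis=0 (default): sum down each column -> 국어 105, 수학 102
print(scores.sum(axis=1))   # axis=1: sum across each row -> 철수 171, 영이 36
```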

😄 Add a '총점' (total score) column for each student.

😄 Change 길동's '컴기' score to 86.

😄 Recalculate 길동's '총점' as well.

😄 Add a '강좌' (course) column and fill it with the value '001'.

  • Let's delete the '합계' row. By default, .drop() returns a new DataFrame, so to apply the deletion to the original you must pass inplace=True.

d3.drop('합계', axis=0, inplace=True)
d3
국어 수학 영어 컴기
철수 84 87 78 60
영이 21 15 84 70
길동 87 84 76 56
미영 100 87 99 74
순이 59 99 59 77
철이 46 77 56 66

😄 Delete the '강좌' column.
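`drop()` can also remove columns, either with `axis=1` or with the `columns=` keyword. Reassigning the result is an alternative to `inplace=True`. Here is a minimal sketch on a hypothetical two-by-two frame:

```python
import pandas as pd

df = pd.DataFrame({'국어': [84, 21], '수학': [87, 15]}, index=['철수', '영이'])

smaller = df.drop('영이')           # returns a new DataFrame; df is unchanged
print(df.shape, smaller.shape)     # (2, 2) (1, 2)

df = df.drop(columns=['수학'])     # reassigning is an alternative to inplace=True
print(list(df.columns))            # ['국어']
```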

Practicing with DataFrames

  • Finding the maximum and minimum

print(d3['총점'].max())
print(d3['총점'].min())
  • This code assumes the '총점' column from the exercise above has been added. If it has not, d3['총점'] raises KeyError: '총점'.
  • idxmax() returns the index label of the maximum value, and idxmin() returns the index label of the minimum value.

print(d3['총점'].idxmax())
print(d3['총점'].idxmin())
  • Extract the full row for the student with the highest '총점'.

d3.loc[d3['총점'].idxmax()]
  • Build a True/False Series that is True where '총점' is greater than 200, then store it as a 'Pass/Fail' column.

x=d3.loc[:, '총점']>200
x
d3['Pass/Fail']=x
d3
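`idxmax()` pairs naturally with `max()`, and `nlargest()` returns the top entries directly. The sketch below uses hypothetical '총점' values, not the totals from `d3`:

```python
import pandas as pd

# Hypothetical totals, for illustration only
total = pd.Series([249, 120, 247, 286], index=['철수', '영이', '길동', '미영'])

best = total.idxmax()        # label of the largest value
print(best, total.max())     # 미영 286
print(total.nlargest(2))     # top two: 미영 286, 철수 249
```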
  • A dictionary makes it easy to map values: .map() replaces each value with the value the dictionary assigns to it.

mapping = {True: 'Pass', False: 'Fail'}
d3['Pass/Fail']= d3['Pass/Fail'].map(mapping)
d3
# The same column can also be built in one step with a list comprehension
d3['Pass/Fail'] = ['Pass' if total>200 else 'Fail' for total in d3['총점']]
d3
  • groupby() groups the rows that share the same value in a column; .size() then counts the rows in each group.

total_groupby = d3.groupby('Pass/Fail')
total_groupby.size()
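After grouping, you can apply an aggregation to a column of each group. The sketch uses illustrative totals: each 총점 value is the sum of the four subject scores of one of the first four students, computed by hand.

```python
import pandas as pd

df = pd.DataFrame({'총점': [309, 190, 303, 360],
                   'Pass/Fail': ['Pass', 'Fail', 'Pass', 'Pass']},
                  index=['철수', '영이', '길동', '미영'])

groups = df.groupby('Pass/Fail')
print(groups.size())            # rows per group: Fail 1, Pass 3
print(groups['총점'].mean())    # mean total per group: Fail 190.0, Pass 324.0
```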
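  • The code below deliberately creates missing values so that we can practice filling them. Only 'Pass' appears in the mapping, so rows marked 'Fail' get NaN in the '장학금' (scholarship) column.

mapping = {'Pass': 300000}

d3['장학금']=d3['Pass/Fail'].map(mapping)
d3
  • Missing values are filled with .fillna(). Assign the result back to the column. Calling fillna(0, inplace=True) on d3['장학금'] is chained assignment, and recent pandas versions warn that it will stop updating d3.

d3['장학금'] = d3['장학금'].fillna(0)
d3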
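Here is the map-then-fill pattern on its own, as a minimal sketch on a hypothetical three-student `status` Series:

```python
import pandas as pd

status = pd.Series(['Pass', 'Fail', 'Pass'], index=['철수', '영이', '길동'])

# map() gives NaN for any value not in the dict ('Fail' here)
scholarship = status.map({'Pass': 300000})
print(scholarship.isna().sum())   # 1

# fillna returns a new Series; assign it back instead of relying on inplace=True
scholarship = scholarship.fillna(0)
print(scholarship.tolist())       # [300000.0, 0.0, 300000.0]
```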
  • sort_index() sorts the rows by index label.

d3.sort_index(ascending=True)
  • sort_values(by=...) sorts the rows by the values in a given column. Both methods return a new, sorted DataFrame; d3 itself is not changed.

d3.sort_values(by = '국어', ascending=True)
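`sort_values()` also accepts a list of columns, with one ascending flag per column, to break ties. The sketch uses hypothetical scores in which 미영's 국어 score is set to 87 to create a tie:

```python
import pandas as pd

d3 = pd.DataFrame({'국어': [84, 87, 87, 59], '수학': [87, 84, 99, 99]},
                  index=['철수', '길동', '미영', '순이'])

# Sort by 국어 descending; break ties by 수학 descending
ranked = d3.sort_values(by=['국어', '수학'], ascending=[False, False])
print(ranked.index.tolist())   # ['미영', '길동', '철수', '순이']
```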

Summary

  • pandas is a data analysis library for Python.

  • pandas provides two data structures: Series and DataFrame.

  • A Series is a one-dimensional array paired with an index.

  • A DataFrame is a two-dimensional structure of rows and columns, and it is a collection of Series.

  • Selecting rows and columns with loc: dataframe_name.loc[row label, column label]

  • Selecting rows and columns with iloc: dataframe_name.iloc[row position, column position]