In [3]:
from dsc80_utils import *
Announcements 📣¶
Lab 9 deadline moved to Monday Dec 2 since Thanksgiving is this week.
Guest lecture on Thursday Dec 5, 1:30pm-3pm in the HDSI MPR: Dr. Mohammad Ramezanali, an AI lead from Salesforce, will be talking about LLMs and how he uses them in industry.
- No regular lecture on Dec 5.
- If you attend the guest lecture, you will get lecture attendance credit and 1% extra credit on your final exam grade.
- If you can't make it, we'll record the talk and you can get attendance + extra credit by making a post on Ed with a few paragraphs about the talk (details to come).
The Final Project is due on Friday Dec 6.
- No slip days allowed!
The Final Exam is on Saturday, Dec 7 from 11:30am-2:30pm in PODEM 1A18 and 1A19.
- Practice by working through old exams at practice.dsc80.com.
- You can bring two double-sided note sheets.
- More details will be posted on Ed.
Thursday's class will start with career advice; the rest of the time will be exam review!
Final Exam 📝¶
- Saturday, Dec 7 from 11:30am-2:30pm in PODEM 1A18 and 1A19.
- Will write the exam to take about 2 hours, so you'll have a lot of time to double check your work.
- Two 8.5"x11" cheat sheets of your own creation allowed (handwritten on a tablet, then printed, is okay).
- Covers every lecture, lab, and project.
- Similar format to the midterm: mix of fill-in-the-blank, multiple choice, and free response.
- I use
pandas
fill-in-the-blank questions to test your ability to read and write code, not just write code from scratch, which is why they can feel trickier.
- Questions on final about pre-Midterm material will be marked as "M". Your Midterm grade will be the higher of your (z-score adjusted) grades on the Midterm and the questions marked as "M" on the final.
Agenda 📆¶
- Classifier evaluation.
- Logistic regression.
- Model fairness.
Aside: MLU Explain is a great resource with visual explanations of many of our recent topics (cross-validation, random forests, precision and recall, etc.).
Random Forests¶
In [4]:
diabetes = pd.read_csv(Path('data') / 'diabetes.csv')
display_df(diabetes, cols=9)
|     | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|-----|---|---|---|---|---|---|---|---|---|
| 0   | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.63 | 50 | 1 |
| 1   | 1 | 85  | 66 | 29 | 0 | 26.6 | 0.35 | 31 | 0 |
| 2   | 8 | 183 | 64 | 0  | 0 | 23.3 | 0.67 | 32 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 765 | 5 | 121 | 72 | 23 | 112 | 26.2 | 0.24 | 30 | 0 |
| 766 | 1 | 126 | 60 | 0  | 0 | 30.1 | 0.35 | 47 | 1 |
| 767 | 1 | 93  | 70 | 31 | 0 | 30.4 | 0.32 | 23 | 0 |

768 rows × 9 columns
In [5]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = (
train_test_split(diabetes[['Glucose', 'BMI']], diabetes['Outcome'], random_state=1)
)
In [6]:
fig = (
X_train.assign(Outcome=y_train.astype(str))
.plot(kind='scatter', x='Glucose', y='BMI', color='Outcome',
color_discrete_map={'0': 'orange', '1': 'blue'},
title='Relationship between Glucose, BMI, and Diabetes')
)
fig
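With the train/test split above in hand, a random forest could be fit on the two features like this. This is a minimal sketch, not the notebook's actual next cell: it substitutes synthetic data from `make_classification` for `diabetes.csv` so it runs on its own, and the hyperparameters shown are `sklearn` defaults made explicit.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the two features (Glucose, BMI) and the binary
# Outcome label, so this sketch is self-contained.
X, y = make_classification(n_samples=500, n_features=2, n_informative=2,
                           n_redundant=0, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# n_estimators=100 is sklearn's default number of trees, written out here.
clf = RandomForestClassifier(n_estimators=100, random_state=1)
clf.fit(X_tr, y_tr)

# .score returns mean accuracy on the held-out set.
acc = clf.score(X_te, y_te)
```

Each tree in the forest is trained on a bootstrap resample of the training set, and predictions are made by majority vote across trees, which is why the test accuracy is usually a more honest estimate than any single tree's training accuracy.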