
Property Assessment ML Pipeline

Machine learning and fairness analysis over 200k+ records

This project processed 200k+ property records through cleaning, feature engineering, modeling, and fairness-oriented evaluation.

Problem / motivation

Assessment models can look accurate in aggregate while behaving unevenly across neighborhoods, property types, or demographic proxies. The goal was to pair headline performance metrics with careful inspection of where, and for whom, the model errs.

Key technical challenges

  • Handling missing, inconsistent, and high-cardinality fields.
  • Avoiding misleading aggregate metrics.
  • Communicating model behavior clearly to non-ML audiences.
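High-cardinality fields in particular resist naive one-hot encoding. A minimal sketch of one common workaround, frequency encoding; the `tract` column name and values are illustrative, not the project's actual schema:

```python
import pandas as pd

# Hypothetical high-cardinality field (e.g. a census-tract code with
# thousands of distinct levels in the real data).
df = pd.DataFrame({"tract": ["t1", "t1", "t2", "t3", "t1", "t2"]})

# Frequency encoding: replace each level with its share of the data,
# producing a single numeric column instead of thousands of dummies.
freq = df["tract"].value_counts(normalize=True)
df["tract_freq"] = df["tract"].map(freq)
```

Unseen levels at prediction time simply map to NaN, which then needs an explicit fill rule.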

Architecture / workflow

How the system fits together

01. Raw property records are cleaned and normalized in pandas notebooks and scripts.
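A minimal sketch of the kind of cleaning step involved; the column names and rules here are illustrative, not the project's actual schema:

```python
import pandas as pd

# Hypothetical raw records with the usual problems: numbers stored as
# strings with separators, inconsistent casing, missing values.
raw = pd.DataFrame({
    "sale_price": ["250000", "1,200,000", None, "300000"],
    "neighborhood": ["  North End", "north end", "South Side", None],
})

def clean(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Strip thousands separators and coerce unparseable values to NaN.
    out["sale_price"] = pd.to_numeric(
        out["sale_price"].str.replace(",", "", regex=False), errors="coerce"
    )
    # Normalize free-text categories: trim whitespace, lowercase.
    out["neighborhood"] = out["neighborhood"].str.strip().str.lower()
    # Drop rows missing the target variable.
    return out.dropna(subset=["sale_price"]).reset_index(drop=True)

cleaned = clean(raw)
```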

02. Feature engineering prepares numeric, categorical, and geographic signals.
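A sketch of what each signal type can look like; column names and the city-center coordinate are assumptions for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical cleaned records.
df = pd.DataFrame({
    "living_area": [1200, 2400, 1800],
    "property_type": ["condo", "single_family", "condo"],
    "lat": [41.88, 41.90, 41.85],
    "lon": [-87.63, -87.65, -87.62],
})

# Numeric: log-transform to tame right-skewed size distributions.
df["log_area"] = np.log1p(df["living_area"])

# Categorical: one-hot encode low-cardinality fields.
df = pd.get_dummies(df, columns=["property_type"], prefix="type")

# Geographic: crude distance (in degrees) from a fixed reference point.
center = (41.8781, -87.6298)
df["dist_center"] = np.hypot(df["lat"] - center[0], df["lon"] - center[1])
```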

03. Model experiments compare baselines and stronger estimators using scikit-learn.
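The comparison pattern can be sketched as below, here on synthetic data rather than the property records; the baseline-vs-stronger-estimator pairing is from the write-up, the specific estimators are illustrative:

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real feature matrix and target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = X @ np.array([3.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=300)

# Always score a trivial baseline alongside the candidate model.
models = {
    "baseline": DummyRegressor(strategy="mean"),
    "gbm": GradientBoostingRegressor(random_state=0),
}
scores = {
    name: cross_val_score(m, X, y, cv=5,
                          scoring="neg_mean_absolute_error").mean()
    for name, m in models.items()
}
```

Using the same cross-validation splits and metric for every candidate keeps the comparison honest.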

04. Evaluation includes error slices, residual inspection, and fairness-oriented comparisons.
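A minimal sketch of slicing residuals by group, the step where aggregate metrics stop hiding disparities; the slice variable and values are illustrative:

```python
import pandas as pd

# Hypothetical predictions joined back to the records they came from.
results = pd.DataFrame({
    "neighborhood": ["a", "a", "b", "b", "b"],
    "actual": [200000, 210000, 500000, 480000, 520000],
    "pred":   [195000, 220000, 450000, 470000, 560000],
})
results["residual"] = results["pred"] - results["actual"]
results["abs_pct_error"] = results["residual"].abs() / results["actual"]

# Per-slice metrics: mean residual exposes systematic over/under-prediction,
# MAPE exposes which slices the model serves worst.
by_slice = results.groupby("neighborhood").agg(
    mean_residual=("residual", "mean"),
    mape=("abs_pct_error", "mean"),
    n=("residual", "size"),
)
```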

What I built

  • Reusable data-cleaning steps for 200k+ records.
  • Model training and evaluation workflow.
  • Fairness analysis views that compare model error across slices.
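One way such a comparison view can be condensed into a single number, sketched here with made-up errors; the slice names and the ratio-based summary are assumptions, not the project's actual fairness metric:

```python
import pandas as pd

# Hypothetical per-record absolute errors tagged by a slice variable.
errors = pd.DataFrame({
    "slice": ["north", "north", "south", "south"],
    "abs_error": [10000.0, 14000.0, 30000.0, 26000.0],
})

mae = errors.groupby("slice")["abs_error"].mean()
# A simple disparity view: worst-slice MAE relative to best-slice MAE.
# A ratio near 1.0 means the model errs roughly evenly across slices.
disparity_ratio = mae.max() / mae.min()
```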

Outcomes / metrics

  • Produced a technically grounded ML workflow beyond a single accuracy score.
  • Strengthened skills in data preparation, evaluation design, and model communication.

Lessons learned

  • The hardest ML work is often making the data and evaluation honest.
  • Fairness analysis needs to be designed into the workflow, not bolted on at the end.

Screenshots / media

Visual evidence placeholders, to be replaced with screenshots, demos, diagrams, or notebook exports as each artifact becomes ready for publishing.

  • Pipeline stages: cleaning, features, training, evaluation, and fairness slices.
  • Evaluation notebook: model metrics paired with residual and subgroup analysis.