25
Scalable Fair Clustering Arturs Backurs Piotr Indyk Krzysztof Onak Baruch Schieber Ali Vakilian Tal Wagner

Scalable Fair clustering - icml.cc · Scalable Fair Clustering Arturs Backurs Piotr Indyk Krzysztof Onak Baruch Schieber Ali Vakilian Tal Wagner. Fair Clustering • Algorithmic Fairness

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Scalable Fair clustering - icml.cc · Scalable Fair Clustering Arturs Backurs Piotr Indyk Krzysztof Onak Baruch Schieber Ali Vakilian Tal Wagner. Fair Clustering • Algorithmic Fairness

Scalable Fair ClusteringArturs Backurs Piotr Indyk Krzysztof Onak

Baruch Schieber Ali Vakilian Tal Wagner

Page 2: Scalable Fair clustering - icml.cc · Scalable Fair Clustering Arturs Backurs Piotr Indyk Krzysztof Onak Baruch Schieber Ali Vakilian Tal Wagner. Fair Clustering • Algorithmic Fairness

Fair Clustering

• Algorithmic Fairness

• Common Unsupervised Learning Task

• Further Implication: e.g., feature engineering

©Petrina Chan/The Tufts Daily

Page 3: Scalable Fair clustering - icml.cc · Scalable Fair Clustering Arturs Backurs Piotr Indyk Krzysztof Onak Baruch Schieber Ali Vakilian Tal Wagner. Fair Clustering • Algorithmic Fairness

Problem Definition

Page 4: Scalable Fair clustering - icml.cc · Scalable Fair Clustering Arturs Backurs Piotr Indyk Krzysztof Onak Baruch Schieber Ali Vakilian Tal Wagner. Fair Clustering • Algorithmic Fairness

Problem Definition

Page 5: Scalable Fair clustering - icml.cc · Scalable Fair Clustering Arturs Backurs Piotr Indyk Krzysztof Onak Baruch Schieber Ali Vakilian Tal Wagner. Fair Clustering • Algorithmic Fairness

Problem Definition

Page 6: Scalable Fair clustering - icml.cc · Scalable Fair Clustering Arturs Backurs Piotr Indyk Krzysztof Onak Baruch Schieber Ali Vakilian Tal Wagner. Fair Clustering • Algorithmic Fairness

Problem Definition

Page 7: Scalable Fair clustering - icml.cc · Scalable Fair Clustering Arturs Backurs Piotr Indyk Krzysztof Onak Baruch Schieber Ali Vakilian Tal Wagner. Fair Clustering • Algorithmic Fairness

Problem Definition

• Collection of 𝑛𝑛 points 𝑃𝑃 in ℝ𝑑𝑑

• Each point is colored either red or blue• Each cluster S has to be (r,b)-balanced

𝑏𝑏𝑟𝑟≤ # 𝐫𝐫𝐫𝐫𝐫𝐫 points in 𝑆𝑆

# 𝐛𝐛𝐛𝐛𝐛𝐛𝐫𝐫 points in 𝑆𝑆≤ 𝑟𝑟

𝑏𝑏

Page 8: Scalable Fair clustering - icml.cc · Scalable Fair Clustering Arturs Backurs Piotr Indyk Krzysztof Onak Baruch Schieber Ali Vakilian Tal Wagner. Fair Clustering • Algorithmic Fairness

Problem Definition

• Collection of 𝑛𝑛 points 𝑃𝑃 in ℝ𝑑𝑑

• Each point is colored either red or blue• Each cluster S has to be (r,b)-balanced

𝑏𝑏𝑟𝑟≤ # 𝐫𝐫𝐫𝐫𝐫𝐫 points in 𝑆𝑆

# 𝐛𝐛𝐛𝐛𝐛𝐛𝐫𝐫 points in 𝑆𝑆≤ 𝑟𝑟

𝑏𝑏

𝟑𝟑,𝟐𝟐 -balanced

Page 9: Scalable Fair clustering - icml.cc · Scalable Fair Clustering Arturs Backurs Piotr Indyk Krzysztof Onak Baruch Schieber Ali Vakilian Tal Wagner. Fair Clustering • Algorithmic Fairness

Problem Definition

• Collection of 𝑛𝑛 points 𝑃𝑃 in ℝ𝑑𝑑

• Each point is colored either red or blue• Each cluster S has to be (r,b)-balanced

𝑏𝑏𝑟𝑟≤ # 𝐫𝐫𝐫𝐫𝐫𝐫 points in 𝑆𝑆

# 𝐛𝐛𝐛𝐛𝐛𝐛𝐫𝐫 points in 𝑆𝑆≤ 𝑟𝑟

𝑏𝑏

(r,b)-Fair k-median: Find 𝒌𝒌 centers that partition 𝑃𝑃 into (r,b)-balanced clusters s.t. average distance of points to their centers is minimized.

𝟑𝟑,𝟐𝟐 -balanced

Page 10: Scalable Fair clustering - icml.cc · Scalable Fair Clustering Arturs Backurs Piotr Indyk Krzysztof Onak Baruch Schieber Ali Vakilian Tal Wagner. Fair Clustering • Algorithmic Fairness

Fair Clustering Through Fairlets [Chierichetti et al]

• Fairlets: minimal sets that satisfy the (r,b)-balance requirement

Page 11: Scalable Fair clustering - icml.cc · Scalable Fair Clustering Arturs Backurs Piotr Indyk Krzysztof Onak Baruch Schieber Ali Vakilian Tal Wagner. Fair Clustering • Algorithmic Fairness

Fair Clustering Through Fairlets [Chierichetti et al]

• Fairlets: minimal sets that satisfy the (r,b)-balance requirement

Outline of Algorithm [Chierichetti et al, NeurIPS’17]I. Compute an approximately optimal fairlet decomposition 𝜶𝜶-approxII. Cluster the centers of fairlets into 𝑘𝑘 groups 𝜷𝜷-approx

Theorem. The proposed algorithm is 𝑂𝑂(𝛼𝛼 + 𝛽𝛽)-approximation

Page 12: Scalable Fair clustering - icml.cc · Scalable Fair Clustering Arturs Backurs Piotr Indyk Krzysztof Onak Baruch Schieber Ali Vakilian Tal Wagner. Fair Clustering • Algorithmic Fairness

Fair Clustering Through Fairlets [Chierichetti et al]

• Fairlets: minimal sets that satisfy the (r,b)-balance requirement

Outline of Algorithm [Chierichetti et al, NeurIPS’17]I. Compute an approximately optimal fairlet decomposition 𝜶𝜶-approxII. Cluster the centers of fairlets into 𝑘𝑘 groups 𝜷𝜷-approx

Theorem. The proposed algorithm is 𝑂𝑂(𝛼𝛼 + 𝛽𝛽)-approximation

Limitations: 1) Quadratic runtime in step I2) Only works for (𝑡𝑡, 1)-balanced (𝑡𝑡 is an integer)

Page 13: Scalable Fair clustering - icml.cc · Scalable Fair Clustering Arturs Backurs Piotr Indyk Krzysztof Onak Baruch Schieber Ali Vakilian Tal Wagner. Fair Clustering • Algorithmic Fairness

Near Linear Time Fairlet Decomposition

1. HST-embedding of points

Page 14: Scalable Fair clustering - icml.cc · Scalable Fair Clustering Arturs Backurs Piotr Indyk Krzysztof Onak Baruch Schieber Ali Vakilian Tal Wagner. Fair Clustering • Algorithmic Fairness

Near Linear Time Fairlet Decomposition

1. HST-embedding of points

Page 15: Scalable Fair clustering - icml.cc · Scalable Fair Clustering Arturs Backurs Piotr Indyk Krzysztof Onak Baruch Schieber Ali Vakilian Tal Wagner. Fair Clustering • Algorithmic Fairness

Near Linear Time Fairlet Decomposition

1. HST-embedding of points

Page 16: Scalable Fair clustering - icml.cc · Scalable Fair Clustering Arturs Backurs Piotr Indyk Krzysztof Onak Baruch Schieber Ali Vakilian Tal Wagner. Fair Clustering • Algorithmic Fairness

Near Linear Time Fairlet Decomposition

1. HST-embedding of points

Page 17: Scalable Fair clustering - icml.cc · Scalable Fair Clustering Arturs Backurs Piotr Indyk Krzysztof Onak Baruch Schieber Ali Vakilian Tal Wagner. Fair Clustering • Algorithmic Fairness

Near Linear Time Fairlet Decomposition

1. HST-embedding of points2. Top-down traversal +

greedy fairlet construction

Page 18: Scalable Fair clustering - icml.cc · Scalable Fair Clustering Arturs Backurs Piotr Indyk Krzysztof Onak Baruch Schieber Ali Vakilian Tal Wagner. Fair Clustering • Algorithmic Fairness

Near Linear Time Fairlet Decomposition

1. HST-embedding of points2. Top-down traversal +

greedy fairlet construction

𝟏𝟏,𝟏𝟏 -balanced

Page 19: Scalable Fair clustering - icml.cc · Scalable Fair Clustering Arturs Backurs Piotr Indyk Krzysztof Onak Baruch Schieber Ali Vakilian Tal Wagner. Fair Clustering • Algorithmic Fairness

Near Linear Time Fairlet Decomposition

1. HST-embedding of points2. Top-down traversal +

greedy fairlet construction

𝟏𝟏,𝟏𝟏 -balanced

Page 20: Scalable Fair clustering - icml.cc · Scalable Fair Clustering Arturs Backurs Piotr Indyk Krzysztof Onak Baruch Schieber Ali Vakilian Tal Wagner. Fair Clustering • Algorithmic Fairness

Near Linear Time Fairlet Decomposition

1. HST-embedding of points2. Top-down traversal +

greedy fairlet construction

𝟏𝟏,𝟏𝟏 -balanced

Page 21: Scalable Fair clustering - icml.cc · Scalable Fair Clustering Arturs Backurs Piotr Indyk Krzysztof Onak Baruch Schieber Ali Vakilian Tal Wagner. Fair Clustering • Algorithmic Fairness

Near Linear Time Fairlet Decomposition

1. HST-embedding of points2. Top-down traversal +

greedy fairlet construction

𝟏𝟏,𝟏𝟏 -balanced

Page 22: Scalable Fair clustering - icml.cc · Scalable Fair Clustering Arturs Backurs Piotr Indyk Krzysztof Onak Baruch Schieber Ali Vakilian Tal Wagner. Fair Clustering • Algorithmic Fairness

Near Linear Time Fairlet Decomposition

1. HST-embedding of points2. Top-down traversal +

greedy fairlet construction

𝟏𝟏,𝟏𝟏 -balanced

Page 23: Scalable Fair clustering - icml.cc · Scalable Fair Clustering Arturs Backurs Piotr Indyk Krzysztof Onak Baruch Schieber Ali Vakilian Tal Wagner. Fair Clustering • Algorithmic Fairness

Near Linear Time Fairlet Decomposition

1. HST-embedding of points2. Top-down traversal +

greedy fairlet constructionTheorem. 𝑂𝑂(𝑑𝑑 ⋅ log𝑛𝑛)-approx.for fairlet-decomposition in 𝑂𝑂(𝑑𝑑 ⋅ 𝑛𝑛 ⋅ log 𝑛𝑛) time

I. Runs in near-linear timeII. Works for all values of (r,b)

Page 24: Scalable Fair clustering - icml.cc · Scalable Fair Clustering Arturs Backurs Piotr Indyk Krzysztof Onak Baruch Schieber Ali Vakilian Tal Wagner. Fair Clustering • Algorithmic Fairness

Empirical Results

Dataset BalanceFairlet Decomposition Cost

Previous Work* OursDiabetes 0.8 ~9836 2971

Bank 0.5 ~5.46 × 105 5.24 × 105

Census 0.5 ~3.59 × 107 2.41 × 107

Runtime scales almost linearly in the number of points while the empirical quality is as good as (Chierichetti et al., 2017)

*(Chierichetti et al., NeurIPS 2017)

Page 25: Scalable Fair clustering - icml.cc · Scalable Fair Clustering Arturs Backurs Piotr Indyk Krzysztof Onak Baruch Schieber Ali Vakilian Tal Wagner. Fair Clustering • Algorithmic Fairness

Empirical Results

Dataset BalanceFairlet Decomposition Cost

Previous Work* OursDiabetes 0.8 ~9836 2971

Bank 0.5 ~5.46 × 105 5.24 × 105

Census 0.5 ~3.59 × 107 2.41 × 107

Runtime scales almost linearly in the number of points while the empirical quality is as good as (Chierichetti et al., 2017)

*(Chierichetti et al., NeurIPS 2017)

Thank You!Poster: @6:30 pm - Pacific Ballroom #84