In the study of hierarchical clustering, Dasgupta's objective is a measure of the quality of a clustering, defined from a similarity measure on the elements to be clustered. It is named after Sanjoy Dasgupta, who formulated it in 2016. Its key property is that, when the similarity comes from an ultrametric space, the optimal clustering for this quality measure follows the underlying structure of the ultrametric space. In this sense, clustering methods that produce good clusterings for this objective can be expected to approximate the ground truth underlying the given similarity measure.

In Dasgupta's formulation, the input to a clustering problem consists of similarity scores between certain pairs of elements, represented as an undirected graph G = (V, E), with the elements as its vertices and with non-negative real weights on its edges. Large weights indicate elements that should be considered more similar to each other, while small weights or missing edges indicate pairs of elements that are not similar. A hierarchical clustering can be described as a tree (not necessarily a binary tree) whose leaves are the elements to be clustered; the clusters are then the subsets of elements descending from each tree node, and the size |C| of any cluster C is its number of elements. For each edge uv of the input graph, let w(uv) denote the weight of edge uv and let C(uv) denote the smallest cluster of a given clustering that contains both u and v. Then Dasgupta defines the cost of a clustering to be

\sum_{uv \in E} w(uv) \cdot |C(uv)|.
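The cost function can be evaluated directly from its definition. The following sketch (all function names are illustrative, not from the source) represents a hierarchy as nested tuples whose leaves are the elements, and sums w(uv)·|C(uv)| over the weighted edges:

```python
def leaves(tree):
    """Set of element labels at the leaves below a tree node."""
    if not isinstance(tree, tuple):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result

def smallest_cluster_size(tree, u, v):
    """|C(uv)|: size of the smallest cluster containing both u and v."""
    if isinstance(tree, tuple):
        for child in tree:
            below = leaves(child)
            if u in below and v in below:
                return smallest_cluster_size(child, u, v)
    return len(leaves(tree))

def dasgupta_cost(tree, edges):
    """Dasgupta's cost: sum of w(uv) * |C(uv)| over all edges.

    edges maps pairs (u, v) to non-negative weights."""
    return sum(w * smallest_cluster_size(tree, u, v)
               for (u, v), w in edges.items())
```

For example, with elements a, b, c, d clustered as ((a, b), (c, d)) and unit-weight edges ab, cd, bc, the edges ab and cd are first merged in clusters of size 2, while bc is first merged at the root cluster of size 4, giving cost 2 + 2 + 4 = 8.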
351:
331:
307:
279:
277:
276:
271:
259:
257:
256:
251:
239:
237:
236:
231:
207:
205:
204:
199:
184:
182:
181:
176:
152:
150:
149:
144:
129:
127:
126:
121:
109:
107:
106:
101:
99:
91:
79:
77:
76:
71:
41:undirected graph
16:In the study of
679:
678:
674:
673:
672:
670:
669:
668:
649:
648:
647:
646:
589:
587:
586:
546:Vazirani, Umesh
544:; Rao, Satish;
539:
535:
499:Mathieu, Claire
495:
491:
447:
438:
433:
401:
393:
390:
389:
374:polynomial time
347:
327:
294:
288:
285:
284:
265:
262:
261:
245:
242:
241:
213:
210:
209:
190:
187:
186:
158:
155:
154:
135:
132:
131:
115:
112:
111:
110:of any cluster
95:
87:
85:
82:
81:
47:
44:
43:
12:
11:
5:
677:
667:
666:
661:
645:
644:
637:
630:
622:
619:
618:
601:
585:
584:
542:Arora, Sanjeev
533:
489:
435:
434:
432:
429:
416:
411:
408:
405:
400:
397:
366:
365:
354:
350:
346:
343:
340:
337:
334:
330:
326:
323:
320:
317:
314:
311:
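To make the divisive strategy concrete, here is a heavily simplified sketch (function names are illustrative, not from the source). It finds a sparsest cut by exhaustive search over bipartitions, using the variant that divides cut weight by the size of the smaller side; this exponential-time search stands in for the polynomial-time O(\sqrt{\log n})-approximation used in the actual algorithm, so it is suitable only for tiny examples:

```python
from itertools import combinations

def cut_weight(S, edges):
    """Total weight of edges with exactly one endpoint in S."""
    return sum(w for (u, v), w in edges.items() if (u in S) != (v in S))

def sparsest_cut(vertices, edges):
    """Bipartition minimizing cut weight / size of the smaller side.

    Exhaustive search over subsets: exponential time, illustration only.
    Real algorithms replace this with a polynomial-time approximation."""
    verts = sorted(vertices)
    best_ratio, best = float("inf"), None
    for r in range(1, len(verts) // 2 + 1):
        for subset in combinations(verts, r):
            S = set(subset)
            # r <= n // 2, so S is always the smaller (or equal) side.
            ratio = cut_weight(S, edges) / len(S)
            if ratio < best_ratio:
                best_ratio, best = ratio, (S, set(verts) - S)
    return best

def divisive_cluster(vertices, edges):
    """Top-down hierarchy (nested tuples) via repeated sparsest cuts."""
    if len(vertices) == 1:
        return next(iter(vertices))
    S, T = sparsest_cut(vertices, edges)
    return tuple(
        divisive_cluster(part,
                         {(u, v): w for (u, v), w in edges.items()
                          if u in part and v in part})
        for part in (S, T))
```

On a graph with two heavy edges ab and cd joined by a light edge bc, the first cut separates {a, b} from {c, d}, matching the intuition that the cheapest-per-element cut should be made at the top of the hierarchy.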
References

Dasgupta, Sanjoy (2016), "A cost function for similarity-based hierarchical clustering", Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing (STOC 2016), New York, New York: ACM, pp. 118–127, arXiv:1510.05043, doi:10.1145/2897518.2897527, MR 3536559, S2CID 2262168.

Cohen-Addad, Vincent; Kanade, Varun; Mallmann-Trenn, Frederik; Mathieu, Claire (2018), "Hierarchical clustering: objective functions and algorithms", Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2018), Philadelphia, Pennsylvania: Society for Industrial and Applied Mathematics, pp. 378–397, arXiv:1704.02147, doi:10.1137/1.9781611975031.26, MR 3775814.

Arora, Sanjeev; Rao, Satish; Vazirani, Umesh (2009), "Expander flows, geometric embeddings and graph partitioning", Journal of the ACM, 56 (2): A5:1–A5:37, doi:10.1145/1502793.1502794, MR 2535878, S2CID 263871111.