Many clustering problems can be reduced to the task of partitioning a weighted graph into highly-connected components. The weighted edges indicate pairwise similarity between two nodes and can often be estimated from training data. However, in many domains, there exist higher-order dependencies not captured by pairwise metrics. For example, there may exist soft constraints on aggregate features of an entire cluster, such as its size, mean or mode. We propose clusterwise similarity metrics to directly measure the cohesion of an entire cluster of points. We describe ways to learn a clusterwise metric from labeled data, using weighted, first-order features over clusters. Extending recent work equating graph partitioning with inference in graphical models, we frame this approach within a discriminatively-trained Markov network. The advantages of our approach are demonstrated on the task of coreference resolution.


  author = {Aron Culotta and Andrew McCallum},
  title = {Learning clusterwise similarity with first-order features},
  booktitle = {Neural Information Processing Systems (NIPS) Workshop on the Theoretical Foundations of Clustering},
  year = {2005},
  address = {Whistler, B.C.},
  month = {December},