## Generalized Metric Repair on Graphs

Many modern data analysis algorithms either assume or are considerably more efficient if the distances between the data points satisfy a metric. These algorithms include metric learning, clustering, and dimension reduction... As real data sets are noisy, distances often fail to satisfy a metric. For this reason, Gilbert and Jain and Fan et al. introduced the closely related sparse metric repair and metric violation distance problems. The goal of these problems is to repair as few distances as possible to ensure they satisfy a metric. Three variants were considered, one admitting a polynomial time algorithm. The other variants were shown to be APX-hard, and an $O(OPT^{1/3})$-approximation was given, where $OPT$ is the optimal solution size. In this paper, we generalize these problems to no longer consider all distances between the data points. That is, we consider a weighted graph $G$ with corrupted weights $w$, and our goal is to find the smallest number of weight modifications so that the resulting weighted graph distances satisfy a metric. This is a natural generalization and is more flexible as it takes into account different relationships among the data points. As in previous work, we distinguish among the types of repairs permitted and focus on the increase only and general versions. We demonstrate the inherent combinatorial structure of the problem, and give an approximation-preserving reduction from MULTICUT. Conversely, we show that for any fixed constant $\varsigma$, for the large class of $\varsigma$-chordal graphs, the problems are fixed parameter tractable. Call a cycle broken if it contains an edge whose weight is larger than the sum of all its other edges, and call the amount of this difference its deficit. We present approximation algorithms, one which depends on the maximum number of edges in a broken cycle, and one which depends on the number of distinct deficit values. read more

PDF Abstract