Cannot-Link Constraint

References

(Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Constrained_clustering Retrieved:2017-5-28.
- In computer science, constrained clustering is a class of semi-supervised learning algorithms. Typically, constrained clustering incorporates a set of must-link constraints, a set of cannot-link constraints, or both, into a data clustering algorithm. Both must-link and cannot-link constraints define a relationship between two data instances. A must-link constraint specifies that the two instances in the must-link relation should be associated with the same cluster. A cannot-link constraint specifies that the two instances in the cannot-link relation should not be associated with the same cluster. These sets of constraints act as a guide by which a constrained clustering algorithm attempts to find clusters in a data set that satisfy the specified must-link and cannot-link constraints. Some constrained clustering algorithms abort if no clustering exists that satisfies the specified constraints. Others try to minimize the amount of constraint violation when it is impossible to find a clustering that satisfies the constraints. Constraints can also be used to guide the selection of a clustering model among several possible solutions. ^[1]
  A cluster in which the members conform to all must-link and cannot-link constraints is called a chunklet.
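
  The "amount of constraint violation" mentioned above can be made concrete by counting the violated pairs. The following is a minimal sketch, not taken from any of the cited works: the function name `count_violations`, the dictionary-based labeling, and the toy data are all illustrative assumptions.

```python
from typing import Dict, Hashable, Iterable, Tuple

Pair = Tuple[Hashable, Hashable]


def count_violations(
    labels: Dict[Hashable, int],
    must_link: Iterable[Pair],
    cannot_link: Iterable[Pair],
) -> Tuple[int, int]:
    """Return (violated must-link pairs, violated cannot-link pairs).

    `labels` maps each instance to its cluster id (hypothetical layout).
    """
    # A must-link pair is violated when the two instances are in different clusters.
    ml_violations = sum(1 for a, b in must_link if labels[a] != labels[b])
    # A cannot-link pair is violated when the two instances share a cluster.
    cl_violations = sum(1 for a, b in cannot_link if labels[a] == labels[b])
    return ml_violations, cl_violations


# Toy example: four instances in two clusters.
labels = {"a": 0, "b": 0, "c": 1, "d": 1}
must_link = [("a", "b"), ("a", "c")]
cannot_link = [("a", "d"), ("c", "d")]
print(count_violations(labels, must_link, cannot_link))  # (1, 1)
```

  A clustering satisfies all constraints exactly when both counts are zero. An algorithm that tolerates violations could, under this sketch, use the sum of the two counts (possibly weighted) as a penalty term.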

[1] Pourrajabi, M.; Moulavi, D.; Campello, R. J. G. B.; Zimek, A.; Sander, J.; Goebel, R. (2014). “Model Selection for Semi-Supervised Clustering”. Proceedings of the 17th International Conference on Extending Database Technology (EDBT). pp. 331–342. doi:10.5441/002/edbt.2014.31.

(Cheng et al., 2014) ⇒ Cheng, J., Leng, M., Li, L., Zhou, H., & Chen, X. (2014). Active semi-supervised community detection based on must-link and cannot-link constraints. PLoS ONE, 9(10), e110088.
- Abstract: Community structure detection is of great importance because it can help in discovering the relationship between the function and the topology structure of a network. Many community detection algorithms have been proposed, but how to incorporate the prior knowledge in the detection process remains a challenging problem. In this paper, we propose a semi-supervised community detection algorithm, which makes full utilization of the must-link and cannot-link constraints to guide the process of community detection and thereby extracts high-quality community structures from networks. To acquire the high-quality must-link and cannot-link constraints, we also propose a semi-supervised component generation algorithm based on active learning, which actively selects nodes with maximum utility for the proposed semi-supervised community detection algorithm step by step, and then generates the must-link and cannot-link constraints by accessing a noiseless oracle. Extensive experiments were carried out, and the experimental results show that the introduction of active learning into the problem of community detection makes a success. Our proposed method can extract high-quality community structures from networks, and significantly outperforms other comparison methods.

(Sammut & Webb, 2011) ⇒ Claude Sammut (editor), and Geoffrey I. Webb (editor). (2011). “Cannot-Link Constraint.” In: (Sammut & Webb, 2011) p.139
- Cannot-Link Constraint: A pairwise constraint between two items indicating that they should be placed into different clusters in the final partition.
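
To illustrate how a cannot-link constraint can shape a partition, the sketch below assigns each point to the nearest centroid that does not already hold one of its cannot-link partners, and gives up when no such centroid exists. It is a greedy, order-dependent illustration under assumed inputs (points and centroids as coordinate tuples, constraints as index pairs); the function name `assign_with_cannot_link` is hypothetical and the procedure is not the method of any work cited here.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def assign_with_cannot_link(
    points: Sequence[Point],
    centroids: Sequence[Point],
    cannot_link: Sequence[Tuple[int, int]],
) -> Optional[List[int]]:
    """Greedily assign points to centroids without breaking cannot-link pairs.

    Returns one cluster index per point, or None if some point has no
    admissible cluster (the "abort" behaviour).
    """
    # Build a symmetric lookup of cannot-link partners by point index.
    partners = {i: set() for i in range(len(points))}
    for a, b in cannot_link:
        partners[a].add(b)
        partners[b].add(a)

    labels: List[Optional[int]] = [None] * len(points)
    for i, p in enumerate(points):
        # Try centroids from nearest to farthest.
        order = sorted(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for k in order:
            # Cluster k is admissible if no already-assigned partner is in it.
            if all(labels[j] != k for j in partners[i]):
                labels[i] = k
                break
        else:
            return None  # every cluster already holds a cannot-link partner
    return labels


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
centroids = [(0.0, 0.0), (5.0, 5.0)]
print(assign_with_cannot_link(points, centroids, [(0, 1)]))  # [0, 1, 1]
```

Points 0 and 1 are close together and would normally share a cluster, but the cannot-link pair `(0, 1)` forces point 1 into the second cluster. With three mutually cannot-linked points and only two centroids, the function returns `None`.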

(Basu et al., 2004) ⇒ Basu, S., Banerjee, A., & Mooney, R. J. (2004, April). Active semi-supervision for pairwise constrained clustering. In: Proceedings of the 2004 SIAM International Conference on Data Mining (pp. 333-344). Society for Industrial and Applied Mathematics.
- Abstract: Semi-supervised clustering uses a small amount of supervised data to aid unsupervised learning. One typical approach specifies a limited number of must-link and cannot-link constraints between pairs of examples. This paper presents a pairwise constrained clustering framework and a new method for actively selecting informative pairwise constraints to get improved clustering performance. The clustering and active learning methods are both easily scalable to large datasets, and can handle very high dimensional data. Experimental and theoretical results confirm that this active querying of pairwise constraints significantly improves the accuracy of clustering when given a relatively small amount of supervision.