- See: Confidence Score, Prediction Score Distribution.
- (Grassi et al., 2019) ⇒ Massimiliano Grassi, David A. Loewenstein, Daniela Caldirola, Koen Schruers, Ranjan Duara, and Giampaolo Perna. (2019). “A Clinically-translatable Machine Learning Algorithm for the Prediction of Alzheimer’s Disease Conversion: Further Evidence of Its Accuracy via a Transfer Learning Approach.” International Psychogeriatrics 31, no. 7
- QUOTE: … In a previous study, we developed a highly performant and clinically-translatable machine learning algorithm for a prediction of three-year … As primary performance metric, the Area Under the Receiving Operating Curve (AUC) was used. At first, the algorithm outputs a continuous prediction score (range: 0–1; the closer to 1 the higher the predicted risk of conversion for that subject), then the class prediction is finally made setting a cut-off score (AD if above or equal to the cut-off score, CN if below). ... The cut-off applied to the algorithm output scores was progressively increased starting from 0, and the thresholds providing the best balanced accuracy was identified, calculating also the sensitivity and specificity achieved. ...
- (Wang, He, et al., 2018) ⇒ Xiang Wang, Xiangnan He, Fuli Feng, Liqiang Nie, and Tat-Seng Chua. (2018). “TEM: Tree-enhanced Embedding Model for Explainable Recommendation.” In: Proceedings of the 2018 World Wide Web Conference.
- QUOTE: ... Given one positive user-item interaction in the testing set, we pair it with 50 negative instances that the user did not consume before. Then each method outputs prediction scores for these 51 instances. To evaluate the prediction scores, we adopt two metrics: the error-based log loss and the ranking-aware ndcg@K. ...
- (Graham, 2015) ⇒ Yvette Graham. (2015). “Improving Evaluation of Machine Translation Quality Estimation.” In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 1804-1813.
- QUOTE: ... Quality estimation evaluation commonly takes the form of measurement of the error that exists between predictions and gold standard labels for a particular test set of translations. Issues can arise during comparison of quality estimation prediction score distributions and gold label distributions, however. In this paper, we provide an analysis of methods of comparison and identify areas of concern with respect to widely used measures, such as the ability to gain by prediction of aggregate statistics specific to gold label distributions or by optimally conservative variance in prediction score distributions. As an alternative, we propose the use of the unit-free Pearson correlation, in addition to providing an appropriate method of significance testing improvements over a baseline. Components of WMT-13 and WMT-14 quality estimation shared tasks are replicated to reveal substantially increased conclusivity in system rankings, including identification of outright winners of tasks. ...
- (Yuan et al., 2012) ⇒ Lei Yuan, Yalin Wang, Paul M. Thompson, Vaibhav A. Narayan, and Jieping Ye. (2012). “Multi-source Learning for Joint Analysis of Incomplete Multi-modality Neuroimaging Data.” In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2012). ISBN:978-1-4503-1462-6 doi:10.1145/2339530.2339710
- QUOTE: ... Our second method learns a base classifier for each data source independently, based on which we represent each source using a single column of prediction scores; we then estimate the missing prediction scores, which, combined with the existing prediction scores, are used to build a multi-source fusion model. ...
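
The cut-off sweep described in the (Grassi et al., 2019) quote above, where a continuous prediction score in the range 0–1 is turned into a class prediction and the cut-off is raised from 0 until the best balanced accuracy is found, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the step size of 0.01, the tie-breaking rule (keep the lowest cut-off), and the toy scores and labels are all assumptions made for illustration; labels use 1 for the positive class (AD) and 0 for the negative class (CN).

```python
def balanced_accuracy(scores, labels, cutoff):
    """Classify as positive when score >= cutoff.

    Returns (balanced_accuracy, sensitivity, specificity).
    Assumes labels contain at least one 1 and at least one 0.
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in pairs if y == 1 and s < cutoff)
    tn = sum(1 for s, y in pairs if y == 0 and s < cutoff)
    fp = sum(1 for s, y in pairs if y == 0 and s >= cutoff)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2, sensitivity, specificity


def best_cutoff(scores, labels, steps=100):
    """Sweep the cut-off from 0 to 1 in `steps` equal increments (assumed grid).

    Returns (cutoff, balanced_accuracy, sensitivity, specificity) for the
    lowest cut-off that reaches the highest balanced accuracy.
    """
    best = None
    for i in range(steps + 1):
        cutoff = i / steps
        bal_acc, sens, spec = balanced_accuracy(scores, labels, cutoff)
        # Strict '>' keeps the first (lowest) cut-off among ties.
        if best is None or bal_acc > best[1]:
            best = (cutoff, bal_acc, sens, spec)
    return best


# Hypothetical prediction scores and true labels (1 = AD, 0 = CN).
toy_scores = [0.10, 0.35, 0.40, 0.65, 0.80, 0.90]
toy_labels = [0, 0, 1, 0, 1, 1]

cutoff, bal_acc, sens, spec = best_cutoff(toy_scores, toy_labels)
print(f"cut-off={cutoff:.2f} balanced accuracy={bal_acc:.3f} "
      f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Because balanced accuracy averages sensitivity and specificity, the selected cut-off does not depend on how many subjects fall in each class, which is one reason to prefer it over plain accuracy when the classes are imbalanced.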