Overall Evaluation Criteria (OEC)

An Overall Evaluation Criteria (OEC) is a quantitative evaluation criterion that measures the objective of an A/B test.

Context:
- It can range from being a Good OEC to being a Bad OEC.
- It can (typically) be a Leading Metric.
Example(s):
- Bing: Sessions/User ...
- Netflix: viewing hours (or content consumption) as their strongest proxy metric for retention
- Coursera: test completion or course engagement as proxies for their key metric.
- …
Counter-Example(s):
- a Lagging Metric, such as KPIs.
See: Guardrail Metric.

References

2017

http://exp-platform.com/Documents/2017-08%20ABTutorial/ABTutorial_5_designingMetrics.pptx
- QUOTE: OEC vs KPIs (Key Performance Indicators).
  - KPIs are lagging metrics reported monthly/quarterly/yearly at the overall product level (DAU, MAU, Revenue, etc.)
  - OEC is a leading metric measured during the experiment (e.g. 2 weeks) at user level, which is indicative of long term increase in KPIs
- Netflix: Because it is a subscription-based business, one of Netflix’s key metrics is retention, defined as the percentage of their customers who return month over month. Conceptually, you can imagine that someone who watches a lot of Netflix should derive a lot of value from the service and therefore be more likely to renew their subscription. In fact, the Netflix team found a very strong correlation between viewing hours and retention. So, for instance, if a user watched only one hour of Netflix per month, then they were not as likely to renew their monthly subscription as if they watched 15 hours of Netflix per month. As a result, the Netflix team used viewing hours (or content consumption) as their strongest proxy metric for retention, and many tests at Netflix had the goal of increasing the number of hours users streamed.
- Coursera: Coursera is a credential-driven business; in other words, they make money when users pay for credentials (certifications) after completing a course. One of their key metrics might be the number of credentials sold, or the revenue generated from credential purchases. You might be skeptical about this metric, though, and for good reason: because Coursera courses are often 13-week college classes, measuring how a design changes this metric would take far too long to be practical...Course completion is predicted by module completion, which is in turn predicted by test completion and how often a user engages with a course. So Coursera measures test completion or course engagement as proxies for their key metric, allowing them to cut down the time to measure the impact of a design change dramatically.
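The idea of an OEC as a leading metric measured at user level during the experiment can be sketched as follows. This is a minimal illustration, not any company's actual pipeline: the metric is Sessions/User, the per-user session counts are made-up data, and the function names (sessions_per_user, relative_lift, welch_t) are hypothetical. The Welch t-statistic is one common way to compare two variants and is an assumption here, not something the sources prescribe.

```python
from math import sqrt
from statistics import mean, variance


def sessions_per_user(sessions_by_user):
    """User-level OEC for one variant: average sessions per user."""
    return mean(sessions_by_user.values())


def relative_lift(control, treatment):
    """Relative change of the treatment OEC over the control OEC."""
    base = sessions_per_user(control)
    return (sessions_per_user(treatment) - base) / base


def welch_t(control, treatment):
    """Welch t-statistic for the difference in mean sessions per user."""
    c = list(control.values())
    t = list(treatment.values())
    # Standard error uses each variant's own sample variance.
    se = sqrt(variance(c) / len(c) + variance(t) / len(t))
    return (mean(t) - mean(c)) / se


# Hypothetical session counts per user, collected during the experiment.
control = {"u1": 3, "u2": 5, "u3": 4, "u4": 4}
treatment = {"u5": 5, "u6": 6, "u7": 4, "u8": 5}

lift = relative_lift(control, treatment)
t_stat = welch_t(control, treatment)
print(f"lift={lift:.2%}, t={t_stat:.3f}")
```

Because the metric is computed per user and available within the experiment window, it can be read long before a monthly or quarterly KPI would move.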

2016

(Deng & Shi, 2016) ⇒ Alex Deng, and Xiaolin Shi. (2016). “Data-Driven Metric Development for Online Controlled Experiments: Seven Lessons Learned.” In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ISBN:978-1-4503-4232-2 doi:10.1145/2939672.2939700
- QUOTE: … The Overall Evaluation Criteria (OEC) (also known as goal metrics or key metrics) of an online service are metrics defined to help the system move toward the North Star. The choice of good OEC is not easy. As an online service grows over time, its OEC should also evolve, and the choice of its OEC depends on how far the system is away from the North Star ...

2007

(Kohavi et al., 2007) ⇒ Ron Kohavi, Randal M. Henne, and Dan Sommerfield. (2007). “Practical Guide to Controlled Experiments on the Web: Listen to Your Customers Not to the Hippo.” In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 959-967. ACM.
- QUOTE: Overall Evaluation Criterion (OEC) (Roy, 2001). A quantitative measure of the experiment’s objective. In statistics this is often called the Response or Dependent Variable (15; 16); other synonyms include Outcome, Evaluation metric, Performance metric, or Fitness Function (22). Experiments may have multiple objectives and a scorecard approach might be taken (29), although selecting a single metric, possibly as a weighted combination of such objectives is highly desired and recommended (Roy, 2001 p. 50). A single metric forces tradeoffs to be made once for multiple experiments and aligns the organization behind a clear objective. A good OEC should not be short-term focused (e.g., clicks); to the contrary, it should include factors that predict long-term goals, such as predicted lifetime value and repeat visits. Ulwick describes some ways to measure what customers want (although not specifically for the web) (Pinegar, 2006).
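The "single metric as a weighted combination of objectives" recommendation in the quote above can be sketched as below. This is only an illustration under stated assumptions: each objective is expressed as a relative change (treatment versus control), and the objective names, the weights, and the function weighted_oec are all hypothetical rather than taken from the cited paper.

```python
def weighted_oec(deltas, weights):
    """Combine per-objective relative changes into one OEC score.

    deltas  -- relative change per objective (e.g. 0.02 means +2%)
    weights -- importance per objective; normalized to sum to 1 here
    """
    if set(deltas) != set(weights):
        raise ValueError("deltas and weights must cover the same objectives")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[name] * deltas[name] for name in deltas) / total


# Hypothetical objectives, chosen to favor predictors of long-term value.
deltas = {
    "repeat_visits": 0.02,
    "predicted_lifetime_value": 0.01,
    "sessions_per_user": -0.01,
}
weights = {
    "repeat_visits": 0.5,
    "predicted_lifetime_value": 0.3,
    "sessions_per_user": 0.2,
}

score = weighted_oec(deltas, weights)
print(f"OEC score: {score:+.4f}")
```

Fixing the weights once means the tradeoff between objectives is decided up front and applied identically to every experiment, which is the benefit the quote attributes to a single metric.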

2005

(Ulwick, 2005) ⇒ Anthony W. Ulwick. (2005). “What Customers Want: Using Outcome-driven Innovation to Create Breakthrough Products and Services.” McGraw-Hill Companies.

2001

(Roy, 2001) ⇒ Ranjit K. Roy. (2001). “Design of Experiments Using the Taguchi Approach: 16 Steps to Product and Process Improvement.” John Wiley & Sons.
- QUOTE: Overall evaluation criterion (OEC). For most industrial projects, we are after more than one objective. If the design is adjusted to maximize the result based on one objective, it may not necessarily produce favorable results for the other objectives. A compromise might have to be obtained by analysis of the overall evaluation criterion (OEC), which is obtained by combining into one number evaluations under various criteria.
