Correlation Analysis

“Opportunity is missed by most people because it is dressed in overalls and looks like work.” - Thomas A. Edison

"Am I losing an opportunity if my users don’t do specific actions in the pre period to retain, uninstall, or activate a specific event in the post period?" - Apxor Insights

Objective:

To identify which actions taken by the user inside the app lead to outcomes such as Retention, Churn, or reaching an aha moment, and which actions heavily affect conversions when they are not done in the pre period.

How to Do:

Pre Period:

  • Select the user journey days in the app that you believe are the opportunity window for nudging the user toward the conversion event in the post period

  • Include an event or list of events that you are sure will have an impact on the conversion event

  • Exclude an event or list of events that you don’t want in this analysis, such as SYSTEM EVENTS

Post Period:

  • Select the user journey days in the app in which the conversion event should happen

  • Select Metrics such as “Retention” or “Activation”

    • Retention: Opening the app in the post period

    • Activation: Select the event that you want to label as the conversion event

Users:

  • Select whether this analysis should be performed on

    • All users: everyone who opened the app

    • Segment: A predefined segment based on specific user behavior

    • Cohort: A fixed set of users that you obtained from some other platform

Once you fill in these details, click “Generate Results”.

Note:

Report generation might take 2 to 3 minutes, depending on the amount of data being analyzed.

Results:

  • By default, the analysis is run on the last 30 days of data. You can change this to the last 15 days or the last 7 days.

The table will provide the following:

  • Event: the name of the event done in the pre period

  • Correlation (F-score): quantifies the relationship between the event and the metric selected for the post period. It lies between 0 and 1, and a value ≥ 0.25 usually indicates strong correlation.

  • % Users: the percentage of users who did the specific event in the pre period. It is vital to check this value. As a rule of thumb, consider events that are done by at least 1% of the users in the pre period.
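The two rules of thumb above (an F-score of at least 0.25 and at least 1% of users) can be applied as a simple filter. The sketch below is illustrative only: the rows, the tuple layout, and the `shortlist` helper are hypothetical and are not part of the Apxor product.

```python
# Hypothetical rows shaped like the results table: (event, f_score, pct_users).
rows = [
    ("AddToCart", 0.41, 12.5),
    ("ViewBanner", 0.10, 48.0),   # done by many users, but weak correlation
    ("ShareItem", 0.33, 0.4),     # strong correlation, but too few users
]

def shortlist(rows, min_f_score=0.25, min_pct_users=1.0):
    """Keep events that clear both thresholds, strongest F-score first."""
    kept = [r for r in rows if r[1] >= min_f_score and r[2] >= min_pct_users]
    return sorted(kept, key=lambda r: r[1], reverse=True)

selected = shortlist(rows)
print([event for event, _, _ in selected])
```

Both thresholds are defaults taken from the guidance above and can be loosened or tightened for a particular app.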

From the above table, one can select the events for further analysis based on the frequency that the specific action is performed:

  • Pick Automatically: By checking the box “Pick Automatically”, we will select the events automatically and dig further on the Correlation Table

  • Using the corresponding values, pick the events you want to analyze further using the Correlation Table

Correlation Table:

This is a 2 × 2 matrix in which the rows denote whether the event was performed at least once in the pre period, and the columns denote whether the users are in the TARGET GROUP (converted) in the post period.

  • Element (1,1) is the number of users who converted in the post period after doing the event at least once in the pre period. The uplift in brackets is calculated against the conversion percentage of the post event or metric among the users who opened the app at least once. It should be a positive value (Green arrow).

  • Element (1,2) is the number of users who have not converted in the post period after doing the event at least once in the pre period. The uplift in brackets is calculated against the non-conversion percentage of the post event or metric among the users who opened the app at least once. It should be a negative value (Red arrow).

  • Element (2,1) is the number of users who converted in the post period without doing the event at all in the pre period. The uplift in brackets is calculated against the conversion percentage of the post event or metric among the users who opened the app at least once. It should be a negative value (Red arrow).

  • Element (2,2) is the number of users who have not converted in the post period and did not do the event at all in the pre period. The uplift in brackets is calculated against the non-conversion percentage of the post event or metric among the users who opened the app at least once. Ideally, it should be a positive value (Green arrow).

  • Ideally, an event is said to correlate with the conversion if the diagonal elements are positive and the off-diagonal elements are negative.
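The sign pattern described above can be checked with a few lines of code. This is a minimal sketch under one stated assumption: uplift is taken to be the share within a row minus the same share among all users who opened the app, in percentage points. The exact formula Apxor uses is not given here, and the function names and counts are hypothetical.

```python
def cell_uplifts(did_conv, did_not_conv, skipped_conv, skipped_not_conv):
    """Return the 2x2 uplifts in percentage points.

    Assumption: uplift = row share minus the overall share among all users.
    Rows: did the event / did not do the event.
    Columns: converted / not converted.
    """
    did = did_conv + did_not_conv
    skipped = skipped_conv + skipped_not_conv
    if did == 0 or skipped == 0:
        raise ValueError("both rows need at least one user")
    total = did + skipped
    base_conv = 100.0 * (did_conv + skipped_conv) / total
    base_not = 100.0 - base_conv
    return [
        [100.0 * did_conv / did - base_conv,
         100.0 * did_not_conv / did - base_not],
        [100.0 * skipped_conv / skipped - base_conv,
         100.0 * skipped_not_conv / skipped - base_not],
    ]

def looks_correlated(uplifts):
    """True when the diagonal is positive and the off-diagonal is negative."""
    (a, b), (c, d) = uplifts
    return a > 0 and d > 0 and b < 0 and c < 0

# Hypothetical counts for one event.
uplifts = cell_uplifts(300, 100, 200, 400)
print(uplifts, looks_correlated(uplifts))
```

Under this reading, an event whose doers convert at exactly the overall rate gets zero uplift in every cell and does not count as correlated.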

Along with the matrix, we present the following metrics to quantify the correlation.

Correlation Table Metrics

The correlation table is analogous to the confusion matrix used for assessing the quality of a classification model in machine learning.

  • Predicted Values-

    • Positive (1) - Doing the event at least once in the pre period

    • Negative (0) - Not doing the event at all in the pre period

  • Actual Values-

    • Positive (1) - Converting or doing the post metric in the post period

    • Negative (0) - Non conversion or not doing the post metric in the post period

  • True Positive (TP): Doing the event at least once in the pre period and doing the post metric in the post period

  • False Positive (FP): Doing the event at least once in the pre period and not doing the post metric in the post period

  • True Negative (TN): Not doing the event at all in the pre period and not doing the post metric in the post period

  • False Negative (FN): Not doing the event at all in the pre period and doing the post metric in the post period
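To make the four cells concrete, the sketch below assigns each user to one cell using the standard confusion-matrix names: a user who neither did the event nor converted is a True Negative (TN), and a user who converted without doing the event is a False Negative (FN). The `label` helper and the sample users are hypothetical.

```python
from collections import Counter

def label(did_event, converted):
    """Map one user's (pre-period, post-period) flags to a confusion-matrix cell."""
    if did_event:
        return "TP" if converted else "FP"
    return "FN" if converted else "TN"

# Hypothetical users: (did the event in the pre period, converted in the post period)
users = [(True, True), (True, False), (False, True), (False, False), (True, True)]

counts = Counter(label(did, conv) for did, conv in users)
print(dict(counts))
```

The four counts sum to the number of users analyzed, which is a quick sanity check on any table built this way.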

Using this, we can define and interpret the following metrics:

Accuracy:

Accuracy is the share of users for whom the event and the outcome agree: users who did the event in the pre period and converted in the post period, plus users who did not do the event and did not convert, out of all users.
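In terms of the four cells, this is the standard definition Accuracy = (TP + TN) / (TP + FP + FN + TN). A minimal sketch, with hypothetical counts:

```python
def accuracy(tp, fp, fn, tn):
    """Share of users whose pre-period behaviour matches their post-period outcome."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0

# Hypothetical counts for one event.
print(accuracy(tp=300, fp=100, fn=200, tn=400))
```

When converted users are rare, accuracy can look high even for an uninformative event, which is one reason to read it together with the other metrics.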

Precision:

Precision tells us what share of the users who did the event in the pre period actually converted.

Precision is a useful metric in cases where False Positives are a higher concern than False Negatives.

Recall Or Sensitivity:

Recall tells us what share of the users who converted in the post period did the event in the pre period.

Recall is a useful metric in cases where False Negatives are a higher concern than False Positives.
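Precision and Recall share the same numerator (TP) and differ only in the denominator: Precision divides by everyone who did the event, Recall by everyone who converted. The sketch below uses the standard definitions with hypothetical counts.

```python
def precision(tp, fp):
    """TP / (TP + FP): of the users who did the event, the share that converted."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """TP / (TP + FN): of the users who converted, the share that did the event."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical counts for one event.
tp, fp, fn = 300, 100, 200
print(precision(tp, fp), recall(tp, fn))
```

Here the event is fairly precise (most users who did it converted) but misses a sizeable share of converters, so nudging more users toward it may be worth testing.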

F1-Score:

F1-score is the harmonic mean of Precision and Recall, so it summarizes both metrics in a single number. For a fixed sum of Precision and Recall, it is highest when the two are equal.

F1-score is useful when there is no clear reason to prioritize Precision over Recall, or the other way around.
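The harmonic mean penalizes imbalance between the two inputs, which the following sketch illustrates with the standard formula F1 = 2 · Precision · Recall / (Precision + Recall). The input values are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Same arithmetic mean (0.5), very different F1.
balanced = f1_score(0.5, 0.5)
lopsided = f1_score(0.9, 0.1)
print(balanced, lopsided)
```

An event with high Precision but very low Recall (or the reverse) therefore ranks well below an event that is moderate on both.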

True Negative Rate Or Specificity:

Specificity tells us what share of the users who did not convert in the post period did not do the event in the pre period. It is the counterpart of Sensitivity.

Specificity is a useful metric in cases where False Positives are a higher concern than False Negatives.

False Positive Rate or Type-I Error:

False Positive Rate is the percentage of users who did the event in the pre period but did not convert in the post period, out of all non-converted users.

This is also called Type-I Error. It tells us for how many users we wrongly predicted a conversion. It should be as low as possible.

False Negative Rate or Type-II Error:

False Negative Rate is the percentage of users who did not do the event in the pre period but converted in the post period, out of all converted users.

This is also called Type-II Error. It tells us for how many users we wrongly predicted no conversion. It should be as low as possible.

Type-I and Type-II errors trade off against each other, so decide which of the two is the costlier mistake for your use case.
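Both error rates follow directly from the four cell counts, and each is the complement of a metric defined earlier: False Positive Rate = 1 − Specificity and False Negative Rate = 1 − Recall. A minimal sketch using the standard definitions, with hypothetical counts:

```python
def false_positive_rate(fp, tn):
    """FP / (FP + TN): share of non-converted users who did the event."""
    return fp / (fp + tn) if (fp + tn) else 0.0

def false_negative_rate(fn, tp):
    """FN / (FN + TP): share of converted users who did not do the event."""
    return fn / (fn + tp) if (fn + tp) else 0.0

# Hypothetical counts for one event.
tp, fp, fn, tn = 300, 100, 200, 400
fpr = false_positive_rate(fp, tn)
fnr = false_negative_rate(fn, tp)
print(fpr, fnr)
```

In this example the Type-II error is twice the Type-I error, so the event misses converters more often than it flags non-converters.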

Note: By default, Apxor sorts the events by F1-score. However, customers should make the final decision based on the type of metric that they are treating as the conversion.
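The effect of choosing a different metric can be seen by re-ranking the same events. The sketch below is illustrative only: the event names, values, and the `rank` helper are hypothetical, and it works on values copied out of the table rather than on any Apxor API.

```python
# Hypothetical per-event metrics, as they might be read off the results.
events = [
    {"event": "AddToCart", "precision": 0.75, "recall": 0.60, "f1": 0.67},
    {"event": "ViewBanner", "precision": 0.40, "recall": 0.95, "f1": 0.56},
    {"event": "ApplyCoupon", "precision": 0.90, "recall": 0.20, "f1": 0.33},
]

def rank(events, metric="f1"):
    """Return event names ordered from highest to lowest on the chosen metric."""
    ordered = sorted(events, key=lambda e: e[metric], reverse=True)
    return [e["event"] for e in ordered]

print(rank(events))               # default ordering by F1-score
print(rank(events, "precision"))  # when False Positives are the bigger concern
print(rank(events, "recall"))     # when False Negatives are the bigger concern
```

The top event changes with the metric, which is why the default F1 ordering is a starting point rather than a final answer.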
