Correlation Analysis
Last updated
“Opportunity is missed by most people because it is dressed in overalls and looks like work.” - Thomas A. Edison
"Am I losing an opportunity if my users don’t do specific actions in the pre period that lead them to retain, uninstall, or activate a specific event in the post period?" - Apxor Insights
Correlation analysis helps you figure out which actions taken by users inside the app lead to outcomes such as Retention, Churn, or reaching an aha moment, and which of those actions heavily affect conversions when they are not done in the pre period.
Select the user journey days in the app that you believe are the opportunity window for nudging the user toward the conversion event in the post period
Include an event or list of events that you are sure will have an impact on the conversion event
Exclude an event or list of events that you don’t want in this analysis, such as SYSTEM EVENTS
Select the user journey days in the app in which the conversion event should happen
Select Metrics such as “Retention” or “Activation”
Retention: Opening the app in the post period
Activation: Select the event that you want to label as the conversion event
Select the set of users on which this analysis should be performed:
All users: everyone who opened the app
Segment: a predefined segment based on specific user behavior
Cohort: a fixed set of users obtained from another platform
Once you fill these details, click on “Generate Results”
Note:
Report generation might take 2 to 3 minutes, depending on the amount of data being analyzed
By default, the analysis runs on the last 30 days of data; you can change this to the last 15 days or the last 7 days.
The table will provide the following:
Event: the name of the event done in the pre period
Correlation (F-score): quantifies the relationship between the event and the metric selected for the post period. It lies between 0 and 1, and a value of 0.25 or higher usually indicates a strong correlation.
% Users: the percentage of users who did the specific event in the pre period. It is vital to check this value; consider only events that were done by at least 1% of the users in the pre period
From the above table, one can select the events for further analysis based on the frequency that the specific action is performed:
Pick Automatically: When you check the “Pick Automatically” box, the events are selected automatically and analyzed further in the Correlation Table
Using the corresponding values, pick the events you want to analyze further using the Correlation Table
This is a 2 x 2 matrix in which the rows denote whether or not the event was performed at least once in the pre period, and the columns denote whether or not the users are in the TARGET GROUP (converted) in the post period.
Element (1,1) is the number of users who converted in the post period after doing the event at least once in the pre period. The uplift in brackets is measured against the conversion percentage for the post-period event or metric among users who opened the app at least once. It should be a positive value (Green arrow)
Element (1,2) is the number of users who did not convert in the post period after doing the event at least once in the pre period. The uplift in brackets is measured against the non-conversion percentage for the post-period event or metric among users who opened the app at least once. It should be a negative value (Red arrow)
Element (2,1) is the number of users who converted in the post period without doing the event at all in the pre period. The uplift in brackets is measured against the conversion percentage for the post-period event or metric among users who opened the app at least once. It should be a negative value (Red arrow)
Element (2,2) is the number of users who did not convert in the post period without doing the event at all in the pre period. The uplift in brackets is measured against the non-conversion percentage for the post-period event or metric among users who opened the app at least once. Ideally, it should be a positive value (Green arrow)
Ideally, an event is said to be correlated with the conversion if the uplifts of the diagonal elements, (1,1) and (2,2), are positive and the uplifts of the off-diagonal elements, (1,2) and (2,1), are negative.
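The matrix described above can be sketched in a few lines of Python. This is a minimal illustration, not Apxor's implementation: the function name `uplift_table` and the input format are hypothetical, and the uplift is assumed to be the percentage-point difference between the rate within the row and the same rate among all users who opened the app.

```python
from typing import Dict, Iterable, Tuple

def uplift_table(users: Iterable[Tuple[bool, bool]]) -> Dict[Tuple[int, int], Tuple[int, float]]:
    """Return {(row, col): (user_count, uplift_in_percentage_points)}.

    Each user is a (did_event_in_pre_period, converted_in_post_period) pair.
    Row 1 = did the event, row 2 = did not; column 1 = converted, column 2 = did not.
    """
    users = list(users)
    if not users:
        raise ValueError("at least one user is required")
    # Baseline: conversion percentage among all users who opened the app.
    base_conv = 100.0 * sum(1 for _, converted in users if converted) / len(users)

    table = {}
    for row, did_event in ((1, True), (2, False)):
        group = [converted for did, converted in users if did == did_event]
        for col, outcome in ((1, True), (2, False)):
            count = sum(1 for converted in group if converted == outcome)
            rate = 100.0 * count / len(group) if group else 0.0
            baseline = base_conv if outcome else 100.0 - base_conv
            # Assumed uplift: rate inside this row minus the overall baseline.
            table[(row, col)] = (count, rate - baseline)
    return table

# Made-up data: 40 users did the event (30 converted), 60 did not (15 converted).
users = ([(True, True)] * 30 + [(True, False)] * 10
         + [(False, True)] * 15 + [(False, False)] * 45)
for cell, (count, uplift) in sorted(uplift_table(users).items()):
    print(cell, count, f"{uplift:+.1f}")
```

With this made-up data the diagonal uplifts come out positive and the off-diagonal uplifts negative, which is the pattern described above for a correlated event.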
Along with the matrix, we present the following metrics to quantify the correlation.
The correlation table is analogous to the confusion matrix that is used for assessing the quality of a classification model in machine learning.
Predicted Values-
Positive (1) - Doing the event at least once in the pre period
Negative (0) - Not doing the event at all in the pre period
Actual Values-
Positive (1) - Converting or doing the post metric in the post period
Negative (0) - Non conversion or not doing the post metric in the post period
True Positive (TP): Doing the event at least once in the pre period and doing the post metric in the post period
False Positive (FP): Doing the event at least once in the pre period and not doing the post metric in the post period
True Negative (TN): Not doing the event at all in the pre period and not doing the post metric in the post period
False Negative (FN): Not doing the event at all in the pre period and doing the post metric in the post period
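As an illustration, the four counts can be tallied from per-user flags as below. The function name `confusion_counts` and the input format are hypothetical; here TN counts users who neither did the event nor converted, and FN counts users who converted without doing the event.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

def confusion_counts(users: Iterable[Tuple[bool, bool]]) -> Dict[str, int]:
    """Count TP, FP, FN and TN from (did_event_in_pre, converted_in_post) pairs."""
    counts = Counter()
    for did_event, converted in users:
        if did_event and converted:
            counts["TP"] += 1   # did the event and converted
        elif did_event:
            counts["FP"] += 1   # did the event but did not convert
        elif converted:
            counts["FN"] += 1   # skipped the event but still converted
        else:
            counts["TN"] += 1   # skipped the event and did not convert
    return {key: counts[key] for key in ("TP", "FP", "FN", "TN")}

# Made-up data for illustration only.
users = ([(True, True)] * 30 + [(True, False)] * 10
         + [(False, True)] * 15 + [(False, False)] * 45)
print(confusion_counts(users))
```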
Using this, we can define and interpret the following metrics:
Accuracy:
Accuracy tells you the share of users for whom the event matches the outcome: users who did the event in the pre period and converted in the post period, plus users who did not do the event and did not convert.
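The formula is not shown in the text; assuming the standard confusion-matrix definition applies, with TN as the number of users who neither did the event nor converted:

```latex
\[
\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}
\]
```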
Precision:
Precision tells us how many of the users who did the event in the pre period actually converted.
Precision is a useful metric in cases where False Positives are a higher concern than False Negatives.
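Assuming the standard definition applies here:

```latex
\[
\text{Precision} = \frac{TP}{TP + FP}
\]
```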
Recall Or Sensitivity:
Recall tells us how many of the users who actually converted in the post period had done the event in the pre period.
Recall is a useful metric in cases where False Negatives are a higher concern than False Positives.
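Assuming the standard definition applies here, with FN as the number of users who converted without doing the event:

```latex
\[
\text{Recall} = \frac{TP}{TP + FN}
\]
```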
F1-Score:
F1-score is the harmonic mean of Precision and Recall, so it gives a combined view of these two metrics. It reaches its maximum of 1 only when both Precision and Recall are 1, and for a given average of the two it is highest when Precision equals Recall.
F1-score is useful when there is no clear reason to treat either Precision or Recall as more important.
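Written out with the standard definitions, which we assume apply here, the harmonic mean simplifies to an expression in the raw counts:

```latex
\[
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
    = \frac{2\,TP}{2\,TP + FP + FN}
\]
```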
True Negative Rate Or Specificity:
Specificity tells us how many of the users who did not convert in the post period had not done the event in the pre period. It is the counterpart of Sensitivity.
Specificity is a useful metric in cases where False Positives are a higher concern than False Negatives.
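Assuming the standard definition applies here, with TN as the number of users who neither did the event nor converted:

```latex
\[
\text{Specificity} = \frac{TN}{TN + FP}
\]
```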
False Positive Rate or Type-I Error:
False Positive Rate is the percentage of users who did the event in the pre period but did not convert in the post period, out of all users who did not convert.
This is also called the Type-I Error. It tells us how many users we wrongly predicted would convert. It should be as low as possible.
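In the standard notation, which we assume applies here, this is the complement of Specificity:

```latex
\[
\text{FPR} = \frac{FP}{FP + TN} = 1 - \text{Specificity}
\]
```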
False Negative Rate or Type-II Error:
False Negative Rate is the percentage of users who did not do the event in the pre period but converted in the post period, out of all users who converted.
This is also called the Type-II Error. It tells us how many users we wrongly predicted would not convert. It should be as low as possible.
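In the standard notation, which we assume applies here, this is the complement of Recall:

```latex
\[
\text{FNR} = \frac{FN}{FN + TP} = 1 - \text{Recall}
\]
```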
A trade-off has to be made between Type-I and Type-II errors, based on which of the two is the costlier mistake to commit.
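To compare the metrics side by side, and to see both error rates for a single event, the sketch below computes all of them from the four counts. It assumes the standard confusion-matrix definitions; the function name `correlation_metrics` is hypothetical, and zero denominators are mapped to 0.0 as a simplifying choice.

```python
from typing import Dict

def correlation_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute the metrics described above from the four confusion-matrix counts."""
    def ratio(numerator: float, denominator: float) -> float:
        # Simplifying assumption: an undefined ratio is reported as 0.0.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "specificity": ratio(tn, tn + fp),
        "false_positive_rate": ratio(fp, fp + tn),   # Type-I error
        "false_negative_rate": ratio(fn, fn + tp),   # Type-II error
    }

# Made-up counts for illustration only.
for name, value in correlation_metrics(tp=30, fp=10, fn=15, tn=45).items():
    print(f"{name}: {value:.3f}")
```

In this made-up example the False Negative Rate is higher than the False Positive Rate, so the event misses more converters than it wrongly flags. Whether that matters depends on which error is costlier for the chosen conversion metric.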
Note: By default, Apxor sorts the events by F1-score. However, customers should make the final decision based on the type of metric that they are considering as the conversion.
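The effect of sorting by a different metric can be sketched as follows. This is only an illustration of the idea, not Apxor's implementation: the function `rank_events`, the row keys, and the event names and values are all hypothetical. The 1% cut-off mirrors the guidance on % Users given earlier.

```python
from typing import Dict, List

def rank_events(rows: List[Dict], min_user_pct: float = 1.0,
                sort_key: str = "f1") -> List[Dict]:
    """Drop rarely performed events, then sort by the chosen metric, highest first."""
    eligible = [row for row in rows if row["user_pct"] >= min_user_pct]
    return sorted(eligible, key=lambda row: row[sort_key], reverse=True)

# Hypothetical rows; event names and values are made up for illustration.
rows = [
    {"event": "AddToCart", "user_pct": 12.0, "f1": 0.41, "recall": 0.35},
    {"event": "ViewOffer", "user_pct": 0.4, "f1": 0.55, "recall": 0.60},
    {"event": "Search", "user_pct": 30.0, "f1": 0.28, "recall": 0.70},
]
by_f1 = [row["event"] for row in rank_events(rows)]
by_recall = [row["event"] for row in rank_events(rows, sort_key="recall")]
print(by_f1)
print(by_recall)
```

Sorting by Recall instead of F1-score changes the order of the two remaining events, which is why the final choice should follow the metric that matters most for the conversion being studied.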