Australia’s Inflation Rate Isn’t Lying, But It’s Not Telling the Truth Either

As someone who’s spent years working with data, I’m wary of any statistic that claims to tell a complete story. Inflation is one of them.

Australia’s headline inflation rate suggests a stubbornly overheated economy, but that single number hides more than it reveals. Despite high interest rates, inflation refuses to fall meaningfully. The easy narrative is “consumers are still spending.” The reality is much more complex.

Averages Are Blurring the Real Story

The Consumer Price Index (CPI) is an average of thousands of price movements. It’s designed to give us a consistent view of the economy, but it smooths out the very inequalities that now define it.

Think about it:

  • A homeowner with two investment properties experiences inflation very differently from a renter in Sydney or a first-home buyer whose mortgage repayments have jumped 40%.
  • Wealthier households, who benefit from higher interest income or rising rents, continue spending, thus keeping “aggregate demand” high.
  • Meanwhile, lower- and middle-income Australians are cutting back harder than ever, just to stay afloat.

On paper, inflation looks stable. On the ground, it feels anything but.

When a Single Metric Misleads Policy

The RBA’s rate hikes are a blunt tool: they hit mortgage holders directly but barely touch those who are debt-free or asset-rich. The CPI can’t capture that asymmetry.

In statistical terms, we’re using a mean to describe a skewed distribution. That’s like saying the “average income” in a room with a billionaire tells you something meaningful about everyone else’s paycheck.
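
A quick arithmetic sketch of that analogy (the incomes below are invented, purely for illustration):

    # The mean of a highly skewed distribution says little about a typical member.
    incomes = [60_000] * 9 + [1_000_000_000]              # nine ordinary earners and one billionaire

    mean_income   = sum(incomes) / len(incomes)
    median_income = sorted(incomes)[len(incomes) // 2]    # upper-middle value of the sorted list

    print(f"mean:   {mean_income:>13,.0f}")    # 100,054,000 -- dominated by the outlier
    print(f"median: {median_income:>13,.0f}")  # 60,000 -- what a typical person actually earns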

Inflation, as currently measured, is becoming a misleading signal for policy, one that risks deepening inequality while chasing an “average” that no longer represents the lived reality of most Australians.

We Need to Measure What Matters

As a data person, I’m not suggesting we abandon aggregate measures. We need benchmarks like CPI to maintain comparability and credibility. But we also need distributional indicators: measures that reveal who is experiencing inflation and how intensely.

Because inflation is not just an economic number. It’s a reflection of how opportunity, security, and hardship are distributed across society.

Until we start measuring those differences, we’ll keep mistaking averages for insights and policy will keep missing the mark.

The inflation rate isn’t lying, but it’s not telling the truth either. And as statisticians, economists, and policymakers, it’s our job to bridge that gap.

Net Promoter Score (NPS) and the Pursuit of the Ultimate Question

What’s your take on the great NPS debate? Are we using it correctly? How robust is NPS as a measure of success and long-term growth? Are there guidelines for when it should be used?

Too many companies these days are hooked on measures of customer trust. Study upon study of the relationship between customer trust and business revenue tells us that it’s right to do so. The impact of poor customer trust ratings goes beyond a company’s boundaries, easily affecting market share, profits and opportunities for real growth. But is it the be-all and end-all, the ultimate measure of business success?

Whenever customers feel misled, mistreated, ignored, or coerced, chances are good that you have created a disgruntled customer base. You would be lucky if the worst that happens is that your most loyal customers churn to your competitors. But as customer experience studies have shown, disgruntled customers find ways to get even: they drive up service costs by frequently reporting problems, they gripe to anyone who will listen, and they wear down your frontline service team with their complaints and demands. Disgruntled customers detract from business growth, and the more of them you have, the more they inhibit and strangle your company’s opportunities. At the other end of the spectrum, when customers are delighted, they willingly come back for more. Not only that, they become advocates for what you are trying to sell. Advocacy ensures customers’ enthusiastic cooperation, and your business gets free marketing from customer promotion, which in turn fuels growth.

The idea that customer trust and loyalty are the key to profitability and sustained growth makes perfect sense. Hence, companies feel it is imperative to track how many of their customers fall into these two critical groups, and NPS seems like the Ultimate Question. But are we using it correctly? How robust is NPS as a measure of success and long-term growth? Are there guidelines for when it should be used?

Net Promoter Score is based on the aggregation of data from a single rating-scale question, answered on a 0–10 scale: how likely are you to recommend your provider to others? Customers who rate 9 or 10 are aggregated to form the “Promoters” segment. Those who rate 0 through 6 are categorized as “Detractors”. NPS is the difference between the percentage of Promoters and the percentage of Detractors. Its simplicity is its beauty. However, it is important to know the pitfalls of NPS before deciding to use it.
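
Before getting to the pitfalls, the calculation itself, as a minimal sketch with made-up ratings:

    # Standard NPS calculation on the 0-10 likelihood-to-recommend scale
    # (ratings below are invented for illustration).
    ratings = [10, 9, 9, 8, 7, 6, 5, 10, 3, 9]

    promoters  = sum(1 for r in ratings if r >= 9)       # 9 or 10
    detractors = sum(1 for r in ratings if r <= 6)       # 0 through 6
    nps = 100 * (promoters - detractors) / len(ratings)  # passives (7-8) count only in the denominator

    print(f"NPS = {nps:.0f}")   # 5 promoters and 3 detractors out of 10 respondents -> NPS = 20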

Firstly, NPS is a business-level metric, not a customer-level metric. A relative NPS means that the principal business is comparatively better or worse than other businesses in the category, based on differences in NPS levels. This is by virtue of NPS being an aggregate-level metric. Unfortunately, aggregate-level metrics don’t work if the intention is to understand share of wallet; that analysis must be done on a customer-level metric. As a cardinal rule in statistics, it must first be proven that there is a strong relationship between the variable you are tracking and the outcome variable before you aggregate the data. Without going into too much detail as to why, the “average” you come up with by aggregating the data cancels out the extremes. Hence, using an aggregated metric to represent the individuals within the group is an ecological fallacy: you mistakenly think you understand the individuals in a group from the single number that represents them.
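
A toy illustration with invented ratings: two completely different customer bases can produce the identical aggregate score, so the score alone says nothing about the individuals behind it.

    # Sketch with made-up ratings: the aggregate hides the individuals.
    def nps(ratings):
        promoters  = sum(1 for r in ratings if r >= 9)
        detractors = sum(1 for r in ratings if r <= 6)
        return 100 * (promoters - detractors) / len(ratings)

    polarised = [10] * 50 + [0] * 50   # half delighted, half furious
    lukewarm  = [7] * 100              # everyone mildly indifferent (all passives)

    print(nps(polarised), nps(lukewarm))   # both 0.0, yet the customer bases could not be more different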

The most glaring concern with using NPS is compounded error. From a marketing science and statistics point of view, this alone is enough to discourage decisions made on NPS only. Professor Claes Fornell, the world’s leading authority on customer satisfaction measurement and customer asset management, has stated it perfectly:

The problem has to do with how the numbers are assigned: A perfectly good scale is ruined to the point that it generates very little useful information. A competent measurement methodology looks to minimize error. But here, the opposite is done. Instead of getting precision, random noise is produced. From a single scale, we have not only converted something continuous to something binary, but we have done it three times (percent of customers likely to recommend, percent of customers not likely to do so and the difference between them). Each time, we have created a new estimate. All estimates contain error. Going from a continuous scale to a binary one introduces even more error.

If that’s not enough, taking the difference between the two estimates with error leads to exponentially greater error. In the end, we have produced a large amount of random noise, but very little information. When it comes to looking at changes over time, we further compound the problem. For each time period comparison, there are now six estimates and the final calculation is the percentage difference of customers that are likely to recommend. I have seen published reports sold for several thousands of dollars in which almost all the reported change is due to random noise. For managers, it’s bad enough to chase the numbers they can’t affect, but to chase randomly moving targets can do a great deal of harm to individual and company performance.
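
To make that point concrete, here is a small simulation sketch; the rating distribution and the sample size of 400 are assumptions chosen purely for illustration, not real tracking data:

    # How much a tracked NPS can swing from sampling noise alone, even when
    # the underlying population never changes.
    import numpy as np

    rng = np.random.default_rng(0)
    scale = np.arange(0, 11)                                          # 0-10 recommendation scale
    probs = np.array([2, 1, 1, 2, 3, 5, 8, 15, 20, 23, 20]) / 100.0  # an assumed, fixed population

    def nps(ratings):
        return 100 * (np.mean(ratings >= 9) - np.mean(ratings <= 6))

    samples = [nps(rng.choice(scale, size=400, p=probs)) for _ in range(2000)]

    true_nps = 100 * (probs[9:].sum() - probs[:7].sum())
    print(f"population NPS: {true_nps:.0f}")                          # fixed at 21
    print(f"95% of n=400 survey readings fall between "
          f"{np.percentile(samples, 2.5):.0f} and {np.percentile(samples, 97.5):.0f}")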

The bottom line: you can certainly use NPS, but first establish that there is a strong relationship between NPS and whatever variable you want to explain by variations in NPS, such as share of wallet.

As a cardinal rule: it must first be proven that there is a strong relationship between the variable you are tracking, whether it be satisfaction, purchase intention or purchase value, and the individual’s NPS scale score before you aggregate the data to report causality.

Question Scale Format : Full vs End Labelling

When I take over a project or a role from another researcher that involves questionnaire design, I notice that some researchers present Likert scales with a label for each response category, while others label only the ends of the scale. Is there a risk of measurement error in presenting it in different ways? Is presenting it one way better than the other?

 

Full versus End Labelling

Measurement error is difficult to avoid, and when it is not random it is the greatest cause of concern for any measurement science expert, whose ultimate dream is to develop an unbiased measurement of attitudes and perceptions.

Response bias is the most common source of non-random error, especially with the widespread use of Likert scales, which are prone to all kinds of biases. Your question is as interesting as it is uncommon: researchers are less likely to question the measurement tool than the data collected by that tool, which I think is the most overt problem the measurement science community should address.

Over the course of my career, I have collected enough evidence that the verbal and numerical labelling of the answer categories does affect a respondent’s likelihood of providing biased responses. Looking at how response-style behaviours vary across Likert labelling formats helps us understand the implications of each format.

The two most common response-style biases are Extreme Response Style (ERS) and Acquiescence Response Style (ARS), and their incidence tends to vary with three aspects of question format: (1) full versus end labelling, (2) numbering of answer categories and (3) bipolar versus agreement response scales. ERS is the tendency to choose only the extreme endpoints of the scale; ARS is the tendency to agree rather than disagree with items regardless of item content.
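
To make the two styles concrete, here is one common way of operationalizing them at the respondent level; the 5-point response matrix is invented and the cut-offs are only one of several conventions:

    # Sketch: simple ERS and ARS indices per respondent on 5-point Likert items.
    import numpy as np

    responses = np.array([
        [5, 5, 1, 5, 1],   # lives at the endpoints -> high ERS
        [4, 4, 5, 4, 4],   # agrees with everything  -> high ARS
        [3, 2, 4, 3, 3],   # a moderate respondent
    ])

    ers = np.mean((responses == 1) | (responses == 5), axis=1)   # share of answers at either endpoint
    ars = np.mean(responses >= 4, axis=1)                        # share of answers on the agreement side

    for i, (e, a) in enumerate(zip(ers, ars)):
        print(f"respondent {i}: ERS={e:.2f}, ARS={a:.2f}")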

With regard to format effects, experiments tell us that end labelling evokes more ERS than full labelling, and that bipolar scales evoke more ERS than agreement-style scales. I will not touch on the effects of (2) and (3) on these biases, as that is not your question; I will find time to address them in another post.

I am more in favour of end labelling. As a measurement scientist tasked with a market segmentation initiative, my job is made easier when I get data that contrasts respondents as much as possible, and here ERS helps. From the respondent’s viewpoint as well, end labelling is arguably less cognitively demanding than a fully labelled scale, as it is more precise and easier to hold in memory.

The bottom line: while reducing response bias is the ideal, in today’s research practice some bias is inevitable. Understanding the vulnerabilities of response-style behaviours, curtailing them where you can and leveraging their benefits where you can’t, is the best one can do.

A good rule of thumb that I use when faced with this question: when issues such as social desirability might influence the quality of the measurement, I use full labelling, so that I at least get some contrast between lower-agreement and higher-agreement responses. For everything else, use end labelling.

Formative versus Reflective Measurement Models

What’s a good guideline to help decide on a measurement model when setting up structural equation modelling? When do I use a formative model and when do I use a reflective model?

 

I am glad you asked this question. Some modellers haphazardly decide on a measurement model to use with SEM without understanding the assumptions and strengths of each, and, more importantly, the implications the choice will have on the results.

I cannot stress its importance enough. A model is only as good as what you feed it: garbage in, garbage out. You cannot expect a good model if you did not put in due diligence at the outset. Using the wrong measurement model undermines the content validity of constructs, misrepresents the structural relationships between them, and ultimately diminishes the usefulness of the theories built on them.

A short intro on measurement models: before we start any data modelling task, we establish a hypothesis about the relationships of the variables under investigation. Measurement models are a way of theorizing the relationships among variables and understanding the market and its orientation.

A real-world example: If we are trying to understand perceptions of being a good basketball player (a construct), we can organize the variables that relate to being a basketball player, say salary, endorsements, media exposure, successful goals, assists, jump height, quickness (indicators), in two ways:

Perceptions of being a good basketball player are reflected in how high a player’s salary is, how many products he promotes, how much media exposure he gets, and so on.

Perceptions of being a good basketball player are formed by how often he blocks a pass, how many of his attempted shots are successful, how quick he is on the court, how high he jumps, and how many of his assists are converted to goals.

The first statement is an example of a reflective relationship between the perception of being a good basketball player and the indicators. With reflective measurement models, causality flows from the latent construct to the indicators. In other words, a change in the perception of being a good basketball player leads to a change in salary, endorsements, and so on.

We understand the framework behind a formative construct, the second statement, the other way around: causality flows from the indicators to the construct. You are only as good a basketball player as the number of successful goals, blocks and assists you have made and your quickness on the court.
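
A small simulation sketch may help to contrast the two data-generating assumptions; the indicator names, weights and noise levels are all invented, and this is a conceptual illustration rather than a fitted SEM:

    # Sketch: reflective vs formative data-generating assumptions with simulated data.
    import numpy as np

    rng = np.random.default_rng(42)
    n = 1000

    # Reflective: the latent construct exists first, and each indicator
    # reflects it plus measurement error (causality: construct -> indicators).
    perceived_quality = rng.normal(0, 1, n)                        # latent construct
    salary       = 0.8 * perceived_quality + rng.normal(0, 0.6, n)
    endorsements = 0.7 * perceived_quality + rng.normal(0, 0.7, n)
    exposure     = 0.9 * perceived_quality + rng.normal(0, 0.4, n)

    # Formative: the indicators exist first, and the construct is formed as a
    # weighted combination of them (causality: indicators -> construct).
    goals, assists, blocks = rng.normal(0, 1, (3, n))
    player_quality = 0.5 * goals + 0.3 * assists + 0.2 * blocks

    # Under the reflective model the indicators are necessarily intercorrelated;
    # under the formative model they need not be.
    print(np.corrcoef([salary, endorsements, exposure]).round(2))
    print(np.corrcoef([goals, assists, blocks]).round(2))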

Although the reflective view dominates the psychological and management sciences, the formative view is more common in economics and marketing. In marketing especially, volume sales and market share are directly impacted by the distribution, awareness and campaign resources behind a product, to name a few. In formative constructs, the relationships between indicators and the construct are meant to be strong enough to predict how much the latent construct increases for every unit increase in the indicator variables.

Why I lean towards formative constructs in my models :

  1. In the reflective model, the latent construct exists independently of the indicators. In market research, we know that a metric behaves a certain way because it is reacting to something, and my job is to identify what that “something” is. By that assumption alone, I cannot have a latent construct that exists independently.
  2. In the reflective model, causality flows from the construct to the indicators. In market research, many of our metrics, such as market share, are viewed as composites formed by the various elements of the marketing mix. The conceptual framework alone implies that the direction of causality should run from the indicators to the construct, not the other way around.

The bottom line: once the data is collected, it is often useful to check whether the assumptions underlying the measurement model hold empirically. But the uncritical, universal application of a reflective structure to simplify the measurement of broad, diverse and complex real-world constructs risks reducing the rigor of business theory and research, and its relevance for decision making.

A good rule of thumb that I use: in some cases, such as personality and attitude measurement, a reflective model is the obvious choice. For everything else, a formative model is the sensible choice; in most of my models, I tend to use the formative (causal) model.

I think of it this way: you are not born already perceived as a good basketball player, instantly collecting endorsements and a high salary. You have to earn it.

 

What is the minimum sample size for the internal validation of scales?

Whenever I design a statistical analysis, I always struggle with determining the number of subjects to include in validation studies of measurement scales. I know that too small a sample can lead to erroneous conclusions, but including too many subjects is a waste of time and resources. Hence it would be good to understand whether there is a minimum that guarantees an acceptable level of precision and stability of results.

 

You are asking specifically about internal validation of scales, so this response is for that specific approach alone. It is safe to assume that you already know the assumptions that have to be fulfilled for any statistical approach you wish to use, as you frequently design statistical analyses.

One of the most important decisions when designing a study is the number of subjects to include. In inferential statistics, the sample size is based on the power of a statistical hypothesis test. In descriptive studies, however, sample size is usually determined by the width of the confidence interval of a given parameter. This is the case in the internal validation of measurement scales, where two types of parameters are of interest: Cronbach’s alpha coefficient, which is a measure of reliability, and the factor analysis loadings, which give an indication of the dimensional structure of the scale. I won’t go into further detail, as you can easily read up on these if you want to know more.

The bottom line: rules based on the ratio of subjects to variables (N/p) for calculating the sample size required in the internal validation of measurement scales have been recommended without a strict theoretical or empirical basis. The most widely used rule takes the ratio of the number of subjects (N) to the number of items (p), which varies from three to ten depending on the author (Cattell, 1978; Everitt, 1975; Gorsuch, 1983; Nunnally, 1978).

What we do have is years of research on the consequences of applying factor analysis to insufficient sample sizes. The findings are explicit that the validation of short scales does not warrant a smaller sample size.

If one’s aim is to reveal the factor structure, under the hypothesis that the underlying common factor model is true, a minimum of 300 subjects is generally acceptable in the conditions encountered in the field of psychiatry. This sample size needs, however, to be larger when the expected number of factors within the scale is large (>10). Furthermore, to obtain more accurate solutions, researchers should choose Exploratory Factor Analysis as the method for factor extraction. (Rouquette & Falissard, 2010)
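
Pulling the two rules together, a small sketch; treat the exact ratio range and the 300-subject floor as indicative guidance from the literature cited above, not as hard requirements:

    # Sketch: subjects-to-items ratio of 3-10 (Cattell; Everitt; Gorsuch; Nunnally),
    # combined with the ~300-subject floor suggested by Rouquette & Falissard (2010).
    def sample_size_range(n_items, min_ratio=3, max_ratio=10, floor=300):
        low  = max(min_ratio * n_items, floor)
        high = max(max_ratio * n_items, floor)
        return low, high

    print(sample_size_range(20))   # (300, 300): the floor dominates for a short scale
    print(sample_size_range(40))   # (300, 400): the ratio rule starts to matter for longer scales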

 

What’s an acceptable value for R-squared?

How do I tell whether a model is good enough if I only have R-squared to go on? What is a good value for R-squared? How large does an R-squared need to be for the model to be valid? I am aware of an understanding among analysts that a model is not useful unless its R-squared is above some threshold such as 50%. Is this the standard to be followed?

Generally, it is better to look at adjusted R-squared, which corrects R-squared for the sample size and the number of parameters estimated, and so gives a less inflated estimate of explanatory power.
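
For reference, the adjustment is a simple function of the sample size and the number of predictors; a minimal sketch:

    # Adjusted R-squared penalizes R-squared for the number of predictors
    # relative to the number of observations.
    def adjusted_r2(r2, n_obs, n_predictors):
        return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

    print(round(adjusted_r2(0.50, n_obs=30,  n_predictors=10), 2))   # 0.24: small sample, many predictors
    print(round(adjusted_r2(0.50, n_obs=300, n_predictors=10), 2))   # 0.48: the penalty fades as n grows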

In marketing science, particularly in the study of customer behavior, no hard cut-off has ever become the accepted norm for the value of R-squared (or adjusted R-squared, for that matter). In marketing we are making predictions about human behavior, not the workings of a physical system, so extracting any level of insight at all is a big win.

Thus I would advise that R-squared should be “large”, meaning large enough in the eye of the experimenter, and what counts as large varies a great deal with the type of experiment being conducted. For example, in social science experiments I would be delighted to get a statistically significant R-squared as low as 0.20, whereas in natural science and engineering experiments I wouldn’t be happy with R-squared values lower than 0.90.

As a general rule in experimental design, the more the researcher knows about the underlying science, the better controlled the experiment can be, and hence the higher the expectation for R-squared. Human behavior is still poorly understood, so R-squared values that physicists and engineers would dismiss outright are acceptable to us. I won’t even go into the topic of over-fitting here.

A good rule of thumb is to consider the context of the experiment. If a project involves customer response, then statistically significant R-squared values around 0.50 can be good enough to give us a basis for improvement. But if we are looking at cycle time through a process, we would want a higher threshold, say 0.70. Still, you can tell these are arbitrary. As with every statistical approach, you need to have a fairly good idea of the underlying phenomenon you are trying to model.

The most important thing to remember is that we want our data to point us in the right direction for making improvements; what R-squared does for us varies with the context of the experiment. I wouldn’t put too much stock in a single metric like R-squared when there are other metrics available to validate the model.

Hello world!

Why come up with this website? What is the purpose of this website?

Over the course of my marketing science career, what I do has too many times been called magic, “MackyMagic”, its own brand of effectiveness in producing results. However, as Arthur C. Clarke said, “Magic is just science we don’t understand yet.”

With this page, I want to contribute to a better understanding of marketing science by de-mystifying it for my audience. How? By letting the audience ask the big questions.

Marketing science tends to be off-putting, with all its snobbish buzzwords. Not only that, marketing scientists tend to be preachy when explaining a concept, usually with an air of arrogance and exclusivity, which further alienates even the most experienced marketers. I find that an effective way to make marketing science relatable is to let the audience ask the questions. This not only sets the context for how advanced the language of the reply should be, it also mitigates preachiness and, most importantly, forces me to put myself in the shoes of the marketer, which leads to a better understanding of the issue at hand. Along these lines, if I get a question about R-squared (e.g. what’s an acceptable value for R-squared?), unless it is actually a question about what R-squared is, I assume that the asker already knows.

The skills needed to be an effective marketing scientist are not monopolized by those with advanced analytics skills; a marketer is best placed to become a marketing scientist when given the right tools, support, and information.

I want to help propagate a culture of evidence-based decision-making, in the assurance that product and marketing decisions are more effective in achieving desired outcomes when they are based on accurate and meaningful information.