Transfer Pricing – IOSR article
Research Article – IOSR Journal
As part of VSTN’s thought leadership, we are glad to share that our research paper has been published with @IOSR Journals, demonstrating ‘core competence’, one of the core pillars of VSTN.
This paper examines the Indian arm’s length range in light of skewness in a data set and the range’s vulnerability to such skewness, followed by an analysis of “adjusting” the Indian arm’s length range in an intuitive manner, bearing in mind the intent of the range. In parallel, the paper also nudges for a transition from the existing Indian arm’s length range to the interquartile range from a statistical perspective, beyond the usual rhetoric of alignment with global standards.
An Intuitive Adjustment to the Indian Arm’s Length Range
Date of Submission: 12-11-2023 Date of acceptance: 22-11-2023
I. Introduction
In the Indian transfer pricing landscape, the arm’s length range is the 35th to the 65th percentile. Representations have often been made before the Ministry of Finance to update the arm’s length range to the interquartile range, which is generally prevalent across other tax jurisdictions. This article aims to understand whether, if the interquartile range does not hold water with the tax authorities under the existing legislation, an adjustment to the existing range is required in the first place, and if so, how it can be adjusted through an intuitive approach.
Let’s start with two observations we would have made while concluding our benchmarking exercises under Indian Transfer Pricing regulations:
- Material difference between mean and median of the dataset; or
- The median being closer to the 35th percentile than to the 65th percentile, or vice versa.
A reminiscence of statistics would identify this as skewness, which leads to the moot question under discussion: does the Indian arm’s length range coincide with the ‘modal range’?
II. Analysis
Comparability analysis is at the heart of the application of the arm’s length principle. While undertaking a benchmarking study under the Transactional Net Margin Method (TNMM), we know that greater emphasis is placed on functional rather than product comparability1. Further, not all companies engaged in similar business activities are part of the databases used for benchmarking searches, and even if available, they might get rejected due to various standard quantitative and qualitative filters. Hence, companies engaged in very similar business activities (functions) in the same segment of the industry are often hard to obtain, and the companies selected as comparable would only be broadly similar to the tested party. This limitation is accepted while adopting TNMM and is also the reason it is widely adopted – its flexibility.
To this point, the OECD Guidelines2 state that though efforts would have been made to exclude data points that are less comparable, there can exist certain defects which cannot be identified or quantified, and hence they provide for the use of statistical tools that take account of central tendency (the interquartile range or other percentiles) to narrow the range and thereby enhance the reliability of the benchmarking analysis.
Ex-Post – Journey of Mean to Median
Where there are multiple prices for similar commodities, the price agreed would be the one that is “most prevalent”; prices would settle within a range, which can be termed the equilibrium price range. From a transfer pricing perspective, the arm’s length price, the value at which two unrelated parties would transact, would fall within that price range.
In a large discrete data set with a limited number of unique values, there would often be certain value(s) that recur more frequently than others. Further, these values would be positioned near each other, and the group or set of these values would contain the majority of data points in the data set, making it representative of the data set. However, in the case of a data set with continuous data points (such as profitability of companies), it would not be possible to identify such individual values; instead, a range of values would represent the data set.
From a statistical perspective, data sets can be distributed in different ways, but data sets are said to follow the normal distribution on a rebuttable presumption basis. In finance literature, the profitability of companies is said to follow the normal distribution (this is often debated, even from a transfer pricing perspective3). Where data sets follow the normal distribution, the mean is the more appropriate measure of central tendency4.
Let’s take an example of a discrete data set (having 100 data points) following the normal distribution. In this example, the mean is 10.5 and the data points center around the mean. The interquartile range is 8 to 13 and the 35th to 65th percentile range is 9 to 12.
The quartiles are 8, 10.5 and 13, and each quarter, i.e., 1-8 (Q1), 8-10.5 (Q2), 10.5-13 (Q3) and 13-20 (Q4), has 25% of the data points. In the entire data set (a range of 19, viz., 20 minus 1), 50% of the data is within the small range of 8 to 13. Other characteristics include the median being the same as the mean and symmetry in the data.
A key aspect of the normal distribution is that data points are closest to each other near the mean/median, and moving farther from the mean/median, data points appear sparsely distributed. This is similar to the 80/20 rule used in management and business, where a small range of values (in our example, 8 to 13) holds a significant share of the data points (in our example, 50%). A simple statistical tool, standard deviation, can measure how the data points are spread. Q1 and Q4 have a standard deviation of 1.79 each, while Q2 and Q3 have a standard deviation of 0.78 each.
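For readers who wish to recreate these summary measures, a minimal Python sketch is given below. It computes the mean, median, interquartile range, 35th to 65th percentile range and per-quarter standard deviations; the dataset is simulated (an assumption made purely for illustration), so the exact figures will differ from the 100-point example described above.

```python
import numpy as np

# Illustrative only: a simulated, roughly symmetric discrete dataset standing in
# for the 100-point example in the text; the resulting figures will differ.
rng = np.random.default_rng(0)
data = np.round(rng.normal(loc=10.5, scale=3.5, size=100)).clip(1, 20)

mean, median = data.mean(), np.median(data)
q1, q3 = np.percentile(data, [25, 75])      # interquartile range
p35, p65 = np.percentile(data, [35, 65])    # Indian arm's length range

# Spread within each quarter of the sorted data (cf. the Q1-Q4 discussion above)
quarters = np.array_split(np.sort(data), 4)
spreads = [round(float(q.std(ddof=0)), 2) for q in quarters]

print(f"mean = {mean:.2f}, median = {median:.2f}")
print(f"interquartile range: {q1:.2f} to {q3:.2f}; 35th-65th percentile: {p35:.2f} to {p65:.2f}")
print("standard deviation per quarter:", spreads)
```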
As long as the data set follows the normal distribution, the mean would be used as the de facto measure to represent the data set. However, where there are extreme values (outliers which are not representative of the data), the mean would be significantly influenced by such outliers. To ensure that the measure is not impacted by such outliers, the median is generally considered. The median is based on the position of data points rather than their values. Hence the median is insulated from these extreme values in the data set, following which interquartile ranges are used to represent the data set.
Let’s ponder over the reason for moving from the mean to the median. Where there are outliers with extreme values in the data set, the measures of central tendency of the data set wobble or do not equilibrate. The mean changes materially due to these extreme values, while the median is not impacted since it considers the position of a data point rather than its value. This also means that such data sets are not normally distributed, which is corroborated by the difference between the mean and the median of such data sets.
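A small, hypothetical numerical example (the margin figures below are assumed, not drawn from any benchmarking set) makes this behaviour visible: a single extreme value drags the mean materially while the median barely moves.

```python
import numpy as np

base = np.array([8.0, 9.0, 10.0, 10.5, 11.0, 12.0, 13.0])  # hypothetical margins (%)
with_outlier = np.append(base, 85.0)                        # one extreme value added

for label, d in [("without outlier", base), ("with outlier", with_outlier)]:
    # The mean jumps materially once the outlier enters; the median barely moves,
    # since it depends on the position of data points rather than their values.
    print(f"{label}: mean = {d.mean():.2f}, median = {np.median(d):.2f}")
```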
There are instances where a material number of outliers is present in the data set, which then skews the data set. Let’s use an academic pictorial representation of skewed data sets5 to understand the movement of the mean and the median when a data set is skewed. The mean is significantly impacted due to outliers, and even the median is impacted. The median is a center from a positional perspective, and if there is a material number of outliers on one side of the data, the median too would succumb, though not as much as the mean. The mode, by definition, is the point with the highest frequency and is not impacted. This is the theoretical view; however, seen arithmetically, for a continuous dataset the mode would not be representative. Hence, the median is practically the next best measure.
Though the median also moves due to skewness in the data set, the interquartile range would remain representative since it is a resistant measure. This is because the interquartile range covers 50% of the data set and hence would cover the ‘peak of the curve’, which has the highest frequency. As mentioned earlier, near the ‘peak of the curve’ the data points are closer to each other, and inclusion of this area in the range is key to stating that the range (interquartile) is representative of the data set. The OECD Guidelines too refer to the interquartile range while discussing measures of central tendency.
Ex-Ante – Median to Modal Range
Continuing the foregoing discussion on skewed data sets, let’s dwell on skewness using a box-plot6. This analysis throws light on the skewness of a dataset and on why the median would not be equidistant from the 35th and 65th percentiles. More likely than not, we would be aware of the rationale behind the two observations (in yellow and green), academically or intuitively.
Data is divided into four quarters to accommodate an equal number of data points in each quarter, sliced by the three quartiles. In Figure 3, the data in the blue plot is symmetric; though it qualifies more as a uniform distribution than a normal distribution, it has been used to provide a contrast while analyzing the skewed distributions. In the yellow and green plots, data is concentrated on the left and right sides respectively, and the data points on the other side appear disconnected/non-representative, i.e., outliers. Depending on the number of outliers and their absolute values, they stretch the interquartile range differently. In the yellow plot, data points are concentrated on the left and spread out after the half (median). The green plot is the inverse of the yellow plot, i.e., data points are predominantly in the right half.
The bottom line is that where a material number of outliers is included in one direction, the median does succumb marginally, depending on the frequency of these data points. Therefore, the 25th (or 35th) percentile would be closer to the median than the 75th (or 65th) percentile, or vice versa, depending on the direction of the skew.
Interposing the left-most graph in Figure 2 with the middle boxplot (in green) of Figure 3, we can guess that the 25th/35th percentile would be farther away from the median as compared to the 65th/75th percentile in the left-most graph of Figure 2. Similarly, in the right-most graph of Figure 2, the results would be similar to the yellow boxplot in Figure 3.
Even if the dataset is skewed, the interquartile range or the 35th to 65th percentile range still captures the central 50% or 30% of the dataset. Yes, but the question is whether the range(s) are representative. The answer is perhaps a yes where the interquartile range is concerned, but the 35th to 65th percentile range requires examination.
The Indian Transfer Pricing rules (Rule 10CA of the Income Tax Rules) introduced the range concept of the 35th to 65th percentile, which is slightly different from the interquartile range widely accepted by other tax jurisdictions. While the interquartile range covers 50% of the data, the Indian range covers 30% of the data. Hence the Indian range (35th to 65th percentile) is sensitive to material skewness, as compared to the interquartile range. While undertaking benchmarking, comparable companies are selected based on the functional analysis of the tested party. As indicated earlier, some of the companies performing very similar functions might not appear in the database or might get rejected due to various filters even if they appear. To compound this, the three-year weighted average of the comparable companies is considered. Therefore, the likelihood of the data points following a normal distribution is limited. And what is key about a normal distribution? Symmetry in data, meaning the data set mirrors around the mean/median.
The question of any adjustment to the range would not arise, even in the case of a skewed data set, had the interquartile range been adopted. The reason is that the interquartile range is a ‘resistant measure’ in statistics8 and is typically used when datasets are skewed.
Further, in order to ensure computational ease, the exercise of arriving at the 35th and 65th percentiles as prescribed by Rule 10CA of the Income Tax Rules, 1962 has been made more of an arithmetic exercise than a statistical one. As per the rules (including illustrations), if 35%/65% of the total number of data points is not a whole number, the data point at the next higher position is taken as the 35th/65th percentile, whereas the statistical computation of percentiles is through interpolation. Therefore, there is a possibility of the 35th to 65th percentile range not capturing exactly 30% of the data. Perhaps because the administrative benefits are seen to outweigh the costs of accuracy (interpretation and litigation), this has been left largely undeliberated by academia.
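The contrast between the positional computation described above and statistical interpolation can be sketched as below. This is a simplified reading of the rule (the handling of the whole-number case and the margin figures are illustrative assumptions, not the prescribed text), shown only to bring out how the two computations can diverge.

```python
import math
import numpy as np

def positional_percentile(sorted_values, pct):
    """Simplified sketch of the positional computation described above: take pct%
    of the number of data points and, where it is not a whole number, move to the
    next higher position. The whole-number case is simplified here and the exact
    treatment should be read from Rule 10CA itself."""
    n = len(sorted_values)
    pos = math.ceil(pct / 100 * n)          # 1-based position in the sorted data
    return sorted_values[pos - 1]

margins = sorted([4.1, 5.6, 6.8, 7.2, 8.9, 9.4, 10.3, 11.7, 12.5, 14.0])  # hypothetical margins (%)

for p in (35, 65):
    positional = positional_percentile(margins, p)
    interpolated = np.percentile(margins, p)    # statistical (interpolated) percentile
    print(f"P{p}: positional = {positional}, interpolated = {interpolated:.2f}")
```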
To understand the requirement for an adjustment, let’s revisit the right-most graph in Figure 2 while interposing the analysis of the yellow boxplot in Figure 3. As the dataset gets materially skewed in the positive direction, the median moves towards the right and the 35th and 65th percentiles no longer remain equidistant from the median. In an intuitive way (or as a ‘thought experiment’, if it may be called that), this non-equidistance can be attributed to the difference in ‘density’, i.e., the change in the closeness of data points within the dataset. Even in the middle graph of Figure 2, which is a normal distribution, there is a difference in density: the density of data points is highest near the median and reduces as we move away from the median, symmetrically in both directions.
A key point to note is that in a normal distribution, the ‘densest’ area has the median as its epicenter. Therefore, both the interquartile range and the Indian range capture the densest area of the dataset. The reason the densest area is given importance is that it has the largest number of data points within the smallest span of values, and capturing this area is key to ensuring that the range (either the interquartile or the Indian range) is representative of the dataset. Back to the discussion: in the right graph of Figure 2, the median moves away from the densest area, and this results in movement of the 35th and 65th percentiles. However, the 35th and 65th percentiles do not move equidistantly from the median due to differences in density. The Indian range aims to capture 30% of the data central to the dataset, viz., 15% on either side of the median. In the right graph of Figure 2, where the median has moved away from the densest area, the 65th percentile would move farther away from the median to accommodate 15% of the data, as the data points become sparse as we move away from the densest area. The 35th percentile would be much closer to the median, since it has to capture 15% of the data set towards the left of the median, which is denser as compared to the region to the right of the median.
We now need to compare two visualizations in the right chart of Figure 2: a) the 30% area covered by the 35th to 65th percentile around the median, and b) the 30% area centered on the densest line denoted by the dotted line as the ‘mode’, i.e., 15% of the data to the right and left of that dotted line. Even in b), the values capturing 15% of the data on the left and on the right would not be equidistant due to the skewness; the value capturing 15% on the right would be farther away from the dotted line. Though there would be a common area between a) and b), the probability of b) capturing the densest area is significantly higher than that of a); or rather, there is a significant probability of a) not capturing the densest area of the data set. The densest region captured by b) can be termed the ‘modal range’. The caveat is that the modal range is difficult to define mathematically through a simple formula/equation, since data sets adopt different distributions and have varying intensity of skewness. The modal range for the purpose of this article/analysis is relative and is one that captures a set of data points for a particular range (in terms of % of data) that is denser as compared to other sets of data points. If Figure 2 were translated into graphs (or histograms), the modal range would be the intervals with the highest frequencies.
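One way to make the notion of a ‘modal range’ concrete, purely as an illustrative construction and not as anything prescribed by the rules, is to search for the narrowest interval of the sorted data that holds roughly 30% of the data points; the narrowest such window is also the densest. The sketch below assumes simulated, positively skewed margins.

```python
import numpy as np

def densest_window(values, coverage=0.30):
    """Return the narrowest interval of the sorted data containing roughly
    `coverage` of the data points -- a working proxy for the 'modal range'
    discussed above (an illustrative construction, not a prescribed method)."""
    x = np.sort(np.asarray(values, dtype=float))
    k = max(int(round(coverage * len(x))), 2)        # data points per window
    widths = x[k - 1:] - x[: len(x) - k + 1]         # width of every k-point window
    i = int(np.argmin(widths))                       # narrowest window = densest
    return x[i], x[i + k - 1]

# Hypothetical, positively skewed margins: the densest 30% window need not
# straddle the median, which is the crux of the discussion above.
rng = np.random.default_rng(1)
margins = rng.lognormal(mean=2.0, sigma=0.5, size=300)
lo, hi = densest_window(margins)
print(f"densest 30% window: {lo:.2f} to {hi:.2f}; median = {np.median(margins):.2f}")
```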
Therefore, the 35th to 65th percentile range does not capture the modal range when the data is materially (not necessarily significantly) skewed.
This can be illustrated through a live example. A search was undertaken to arrive at the mark-up for manufacturing activity in the auto ancillary sector for FY 2019-20. The following search process was adopted:
- Database: AceTP was used to conduct the benchmarking study to arrive at companies undertaking manufacturing of auto ancillary products;
- Period: The search was undertaken for FY 2019-20, since FY 2020-21 was impacted by the COVID-19 pandemic. As per Indian Transfer Pricing regulations, three-year data was considered, i.e., FY 2017-18, FY 2018-19 and FY 2019-20.
Quantitative Filters
- Data Availability: Companies were selected if they had financial information for all the three years or financial information was available for FY 2019-20 and FY 2018-19.
- Sales Filter: Companies were selected if the sales were at least INR 1 crore (10 million) for the three years.
- Manufacturing Filter: Companies were selected if income from manufacturing sales was greater than 50% of the total sales for the three years.
- RPT Filter: Companies were selected if transactions with related parties were less than 25% of total revenue for the three years.
- Loss Filter: Companies were selected if they did not incur losses for two consecutive years.
- Industry Filter: Companies were selected if they were classified under the auto ancillary industry as per the database.
Margin Computation: For the companies satisfying all the above filters, the ratio of operating profit to operating cost was computed. The weighted average for three years was computed, i.e., the summation of operating profits for the three years over the summation of operating costs. Operating revenue and operating cost were computed based on generally accepted heads, and operating profit was computed as operating revenue minus operating cost.
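The weighted average computation described above can be sketched as follows; the yearly figures are hypothetical placeholders, the point being only that the three-year weighting is done by summing numerators and denominators rather than averaging the yearly ratios.

```python
# Hypothetical operating figures (in INR crore) for one comparable company.
years = {
    "FY 2017-18": {"operating_revenue": 120.0, "operating_cost": 108.0},
    "FY 2018-19": {"operating_revenue": 150.0, "operating_cost": 132.0},
    "FY 2019-20": {"operating_revenue": 160.0, "operating_cost": 145.0},
}

# Operating profit = operating revenue - operating cost for each year;
# the weighted average margin is the sum of profits over the sum of costs.
total_profit = sum(y["operating_revenue"] - y["operating_cost"] for y in years.values())
total_cost = sum(y["operating_cost"] for y in years.values())
weighted_opm = total_profit / total_cost
print(f"Three-year weighted average OP/OC: {weighted_opm:.2%}")
```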
The search yielded 342 data points, i.e., the weighted average operating profit over operating costs for 342 companies. The data points have been plotted as a histogram in Figure 4 below.
Particulars | Value | Particulars | Value |
---|---|---|---|
Mean | 10.24% | Interquartile range | 5.13% to 14.38% |
Median | 8.88% | 35th to 65th percentile | 6.78% to 11.66% |
Nearly 50% of the data points are within the range of 4.6% to 13.0% (frequency of 167). The graph appears to be a normally distributed data set but is positively skewed (a clue can also be taken from the mean (10.24%) being materially different from the median (8.88%)).
To analyze whether the 35th to 65th percentile range covers the modal range, the histogram has been re-plotted at smaller intervals.
The median (8.88%) falls in the interval of 8.2% to 10.1% (frequency of 37). The mean (10.24%) falls in the next interval of 10.1% to 12.0% (frequency of 35). The modal interval is 6.3% to 8.2% (frequency of 43). For pictorial ease of reference, the bars highlighted in green represent the 35th to 65th percentile range, and the bars highlighted in green and dark blue together represent the interquartile range.
Pictorially, the intervals with the highest frequencies are 4.4% to 6.3% (dark blue), 6.3% to 8.2% (green) and 8.2% to 10.1% (green). As discussed in the foregoing paragraph, the median falls in the interval next to the modal interval, and the 35th to 65th percentile range does not cover the second highest interval of 4.4% to 6.3% due to the skewness in the data set. However, the interquartile range does cover this second highest frequency interval. The 35th percentile is closer to the median than the 65th percentile, as captured in the table below:
35th to 65th percentile | Median (central point of the range) | Difference between Median and 35th percentile | Difference between 65th percentile and median |
---|---|---|---|
6.78% to 11.66% | 8.88% | 2.10% | 2.78% |
Adjustment to arm’s length range
Indian transfer pricing regulations specify the use of the central 30% of data for computation of the arm’s length range. An easier approach to overcome the above issue would be to extend this range to the interquartile range, to ensure that even where the data set is skewed the arm’s length range includes the modal range. In such a case, an adjustment would not be required.
However, the 30% range might be considered appropriate or non-negotiable to ensure stricter compliance with the arm’s length principle. In May 2015, the draft scheme of the proposed rules for the arm’s length range issued by the CBDT had in fact considered the 40th to 60th percentile, a 20% range. Hence the probability of extending the arm’s length range to the interquartile range is not high in the coming years either.
As discussed above, 30% would be an apt arm’s length range provided the dataset is symmetrical at its center, in which case it would capture the modal range. But due to skewness, the median moves away from the modal range, because of which 30% of the data around the median would not capture the modal range, as brought out in the above example. As per Figure 4, the modal range can be said to be 4.6% to 13% as it covers about 50% of the data, but as per Figure 5 the modal range can be said to be the three intervals, viz., 4.4% to 6.3%, 6.3% to 8.2% and 8.2% to 10.1%, with frequencies of 39, 43 and 37 respectively. Cumulatively they cover 34% of the data. These three intervals cannot per se be considered, since the data covered is slightly more than 30% and we should be cognizant of the median (4.4% is approximately the 21st percentile).
Considering that the interquartile range has been accepted both academically as well as by international tax bodies such as the OECD, a 30% range within the interquartile range can be adopted as a cap-and-collar approach. That is, depending on how the data set is skewed (positively or negatively) and to what extent (moderately or highly), the 30% of data within the interquartile range that can be considered the modal range is adopted as the arm’s length range. To ensure administrative ease, threshold limits can be set to trigger this adjustment, such as a) a material variance between the mean and the median, or b) the difference between the median and the 35th percentile being materially higher or lower than the difference between the 65th percentile and the median. Further, the adjustment can be effected by shifting the range up or down in steps of 5 percentiles, so that there is a limited number of iterations (4, viz., 25th to 55th percentile, 30th to 60th percentile, 40th to 70th percentile and 45th to 75th percentile). The iteration whose ends are closest to being equidistant from their respective mid-point can be selected as the arm’s length range. As per Indian transfer pricing regulations, if the transaction price is outside the arm’s length range, comparison has to be made with the median; for the 4 iterations mentioned above, the adjustment would have to be computed from the mid-point of the respective range.
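The iteration described above can be sketched in a few lines of Python. This is an illustrative implementation under simplifying assumptions: percentiles are interpolated rather than computed by the positional method of Rule 10CA, the candidate set includes the default 35th to 65th percentile range alongside the four shifted ranges, and the margins are simulated.

```python
import numpy as np

CANDIDATES = ((25, 55), (30, 60), (35, 65), (40, 70), (45, 75))

def select_adjusted_range(margins, candidates=CANDIDATES):
    """For each candidate 30% range, measure how far it is from being equidistant
    around its own mid-point percentile and return the most balanced one.
    Percentiles are interpolated for simplicity (not the Rule 10CA positional method)."""
    results = []
    for lo_p, hi_p in candidates:
        mid_p = (lo_p + hi_p) / 2
        lo, mid, hi = np.percentile(margins, [lo_p, mid_p, hi_p])
        imbalance = abs((hi - mid) - (mid - lo))   # how non-equidistant the range is
        results.append(((lo_p, hi_p), (lo, hi), mid, imbalance))
    return min(results, key=lambda r: r[-1])

# Hypothetical, positively skewed margins standing in for a benchmarking set.
rng = np.random.default_rng(2)
margins = rng.lognormal(mean=2.0, sigma=0.4, size=50)

(lo_p, hi_p), (lo, hi), mid, imbalance = select_adjusted_range(margins)
print(f"Selected {lo_p}th to {hi_p}th percentile: {lo:.2f} to {hi:.2f} "
      f"(mid-point {mid:.2f}, imbalance {imbalance:.2f})")
```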
Implementation of the adjustment can be illustrated by continuing from the earlier example – arriving at the arm’s length mark-up range for manufacturing of auto ancillary.
30% range | Results | Mid-point | Difference of mid-point and lower-end (A) | Difference of higher-end and mid-point (B) | Net Difference (B-A) |
---|---|---|---|---|---|
35th to 65th percentile | 6.78% to 11.66% | 8.88% | 2.10% | 2.78% | 0.69% |
30th to 60th percentile | 5.92% to 10.93% | 8.20% (45th percentile) | 2.29% | 2.73% | 0.44% |
25th to 55th percentile | 5.13% to 9.90% | 7.46% (40th percentile) | 2.32% | 2.44% | 0.12% |
As per the above table, the 30% range of the 25th to 55th percentile (mid-point at the 40th percentile) might be considered as the arm’s length range, as it is closest to being equidistant when compared with the other two ranges (35th to 65th percentile and 30th to 60th percentile).
Benchmarking studies usually have a limited number of comparable companies, as against the 342 in the above example. The intention of taking a large dataset was to bring out pictorially the case where the data set is skewed and the necessity for an adjustment arises, i.e., where the 35th to 65th percentile range does not capture the modal range.
Following is another example of a specific benchmarking search undertaken for auto ancillary (particular activities / products). A similar search process was adopted, based on an analysis of similar functions and products. The results of the search are presented as a histogram.
Since there are a limited number of companies, it might be difficult to visualize a normal distribution that is positively skewed, which is the reason for first illustrating through the broad auto ancillary search. The summary of the results is as per the below table.
Particulars | Value | Particulars | Value |
---|---|---|---|
Mean | 5.26% | Interquartile range | 2.36% to 8.85% |
Median | 4.23% | 35th to 65th percentile | 3.32% to 5.83% |
In the second example, the mean is greater than the median, and the graph peaks on the left side and slowly tapers on the right side, which can be classified as a positively skewed dataset. The below table details the workings for the iterations.
30% range | Results | Mid-point | Difference of mid-point and lower-end (A) | Difference of higher-end and mid-point (B) | Net Difference (B-A) |
---|---|---|---|---|---|
35th to 65th percentile | 3.32% to 5.83% | 4.23% (Median) | 0.9% | 1.6% | 0.7% |
30th to 60th percentile | 2.52% to 5.33% | 3.41% (45th percentile) | 0.9% | 1.9% | 1.0% |
25th to 55th percentile | 2.36% to 4.33% | 3.35% (40th percentile) | 1.0% | 1.0% | 0.0% |
As per the above table, the 25th to 55th percentile range (40th percentile as mid-point) can be considered as the arm’s length range, since it is closest to being equidistant among the ranges considered.
To statistically establish that one of the 30% ranges is more appropriate than the other iterations, the standard deviation of the data points within each range can perhaps be computed, to numerically measure the closeness of data points, i.e., the density of the range.
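A minimal sketch of that density check is given below; it simply takes the standard deviation of the data points falling inside each candidate percentile range. The margins are simulated, and this is offered as one possible numerical proxy rather than a prescribed test.

```python
import numpy as np

def range_std(margins, lo_p, hi_p):
    """Standard deviation of the data points falling inside a percentile range:
    a possible numerical proxy for how tightly packed (dense) that range is."""
    m = np.asarray(margins, dtype=float)
    lo, hi = np.percentile(m, [lo_p, hi_p])
    return m[(m >= lo) & (m <= hi)].std(ddof=0)

rng = np.random.default_rng(3)
margins = rng.lognormal(mean=2.0, sigma=0.4, size=50)   # hypothetical skewed margins (%)

for lo_p, hi_p in ((25, 55), (30, 60), (35, 65), (40, 70), (45, 75)):
    print(f"{lo_p}th-{hi_p}th percentile: std dev = {range_std(margins, lo_p, hi_p):.3f}")
```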
The above discussion addresses whether there is a requirement for an adjustment to the arm’s length range and a pragmatic approach for effecting such an adjustment. There can be alternative solutions as well, which might be subjective, such as (i) ascertaining the nature of the distribution of the dataset and computing the modal range covering 30% of the data, or (ii) where the dataset does not follow any standard distribution, using calculus to ascertain the area with the highest density to qualify as the modal range. (This can be done by first arriving at the equation of the function for the distribution and then, using integration, arriving at limits covering 30 percent of the dataset; the limits having the least difference can be said to have the highest area under the curve, and these limits would be the arm’s length range covering 30 percent of the data, being the modal range.) Alternatives such as these would require subject-matter experts and at times take the discussion to a very academic level, with efforts that would far exceed the benefits.
Key considerations
The above approach, though simple, has some issues to bear in mind. The issues and ways to address them are captured below:
(i) Benchmarking studies are based on public databases, which may not capture the entire universe of companies, and the comparable companies are arrived at based on a certain level of functional comparability. Arguments can be made that the comparable companies are a subset of a larger number of companies within the segment of the sub-industry, and hence that the companies selected are a sample of a larger population. On the other hand, it can be argued that the comparable companies partake of the nature of a population, since all the closely comparable companies in the database have been selected. Due to the above, among other factors, certain benchmarking exercises might result in datasets having two modal areas within the interquartile range. To address this, similar to the applicability thresholds, there can be measures in place for non-applicability of this adjustment.
(ii) Due to economic differences, taxpayers might claim economic adjustments, such as a working capital adjustment, to level the differences in circumstances between the companies selected and the tested party. In such cases, after effecting these economic adjustments on the comparable companies, the adjustment through the various iterations can be undertaken.
III. Conclusion
The concept of the range of the 35th to 65th percentile was aimed at providing elasticity to the concept of the arm’s length price when the median is used as the measure of central tendency. This range covers 30% of the data, as compared to the widely used interquartile range covering 50% of the data. It has been assumed that the median would gravitate towards the mode, which is true in the case of a normal distribution. This narrower range brings forth the question whether the 35th to 65th percentile range captures the modal range when the dataset is skewed. An intuitive way to align the 35th to 65th percentile range with the modal range is to consider a cap-and-collar approach within the interquartile range.
As businesses expand across borders, navigating complex transfer pricing regulations becomes critical. At VSTN Consultancy, a global transfer pricing firm, we specialize in helping companies stay compliant and competitive across key markets including:
India | UAE | USA | KSA | Dubai | Asia Pacific | Europe | Africa | North America
Whether you are benchmarking intercompany transactions or developing robust TP documentation, our team is here to support your international strategy and compliance.
Contact us today to explore how we can partner with you to optimize your global transfer pricing approach.
#TransferPricing #TransferPricingFirm #VSTNConsultancy #TaxCompliance #IndiaUAEUSA #TPExperts #TransferPricingExperts #GlobalTransferPricingFirm