Open Access (CC BY 4.0 license). Published by De Gruyter, December 23, 2020

Beyond Manipulation: Administrative Sorting in Regression Discontinuity Designs

  • Cristian Crespo

Abstract

This paper elaborates on administrative sorting, a threat to internal validity that has been overlooked in the regression discontinuity (RD) literature. Variation in treatment assignment near the threshold may still not be as good as random even when individuals are unable to precisely manipulate the running variable. This can be the case when administrative procedures, beyond individuals’ control and knowledge, affect their position near the threshold non-randomly. If administrative sorting is not recognized, it can be mistaken for manipulation, preventing the running variable from being fixed and leading researchers to discard viable RD research designs.

MSC 2010: 62D20; 62P20

1 Introduction

In RD designs some exogenous variation in treatment assignment occurs near a threshold of a running variable. Causal inference relies on the assumption that the average potential outcome for units marginally at one side of the threshold represents a valid counterfactual for the group just at the other side of the threshold [24, 27].

RD designs are a popular approach for causal inference. Consequently, the methodological literature on RD designs has been developing quickly over the last decade. For a detailed summary, see [9]. Despite its extensive use, practical applications of RD designs are partly determined by researchers’ interpretation of the methodology [32]. As [11] explain, applied practitioners of RD designs choose between two frameworks. Researchers who adopt the local randomization framework use the logic of experimental designs to recover causal estimates. The methods required for estimation and inference in this framework are discussed by [15] and [10]. Conversely, researchers relying on the continuity-based framework use polynomial approximations on each side of the threshold to predict the limiting value of the outcome at the threshold. Multiple authors [5, 6, 7, 14, 24, 25] have contributed to advancing knowledge within this RD framework.

Further elaboration about the relative merits, required assumptions and practical applications of these two frameworks is beyond the scope of this paper. Instead, I introduce these frameworks to support the idea that central aspects of RD designs are still subject to debate.

This paper highlights administrative sorting, a threat to internal validity that has been relatively overlooked in the RD literature. By doing so, the paper contributes to further conceptual clarification of RD designs. I introduce this threat empirically, using novel Chilean administrative datasets, to illustrate RD design challenges in real-world contexts. In this paper, these contexts are two evaluations of conditional cash transfers on adolescents’ school enrollment.[1]

Manipulation of the running variable is one of the two main conceptual concerns in the application of RD designs [26]. The issue is that individuals with a stake might try to manipulate the running variable close to the threshold [26, 29] and sort themselves around it [28]. Manipulation threatens the plausibility of the continuity assumption on which RD designs rely.[2] Thus, the analysis of individuals’ ability to precisely manipulate the running variable becomes central. This rationale explains why running variable density tests [13, 22, 29] have usually been interpreted as tests of manipulation of the running variable.

One point highlighted in influential papers in RD designs is that if individuals are unable to precisely manipulate the running variable, then variation in treatment (assignment) near the threshold should be as good as random [11, 28]. Though this is frequently true, my paper shows that variation in treatment assignment is not always as good as random in the absence of manipulation and that density tests can fail for reasons other than individuals’ self-sorting. This can be the case when administrative procedures, beyond the control and knowledge of individuals, affect their position near the threshold non-randomly and, by doing so, threaten the continuity assumption of RD designs.

Sorting is not a new concept in the RD methods literature. However, the concept has been intrinsically tied to individuals’ manipulation [29], optimizing behavior in response to rules [28], and self-selection. In other words, manipulation and sorting have been used as synonyms for deliberate action by individuals to locate themselves on one side of the threshold. In this sense, administrative sorting is different from the threat of manipulation.[3]

I then define administrative sorting by three features. The first is that it results from administrative procedures that affect the position of individuals or units near the threshold non-randomly. The second is that these procedures are beyond the control and knowledge of individuals and do not deliberately intend to locate individuals on either side of the threshold. The third is that they threaten the continuity assumption on which RD designs rely. Therefore, the most relevant conceptual contributions of this paper are as follows: i) administrative sorting is an overlooked threat to internal validity in RD designs; ii) a lack of individual manipulation does not automatically translate into variation in treatment assignment being as good as random. In this sense, my paper specifically contributes to the RD literature explaining problems associated with the running variable’s distribution, such as heaping [2], that applied RD practitioners encounter and that are not necessarily related to manipulation.

Administrative sorting may lead to regular data heaps or to distributions with no evident patterns. How administrative sorting unfolds differs in each of my two examples. In the first case, one type of adolescent ends up on one side of the threshold while a different type ends up on the other. A better design of the index would have given both types of adolescents equal probabilities of being on either side of the threshold. In the second case, adolescents on one side of the threshold have been artificially compressed, inducing a discontinuity in the density of the running variable. Here, what is affected is the distance of individuals from the threshold rather than the side on which they end up or their relative order, and a better design of the index would have translated into a smooth density near the threshold. The severity of the implications for RD designs differs across the two cases I present.

From an applied perspective, my paper brings attention to a relevant and generally undiscussed challenge that RD practitioners can encounter, and to potential solutions. My findings highlight the importance of fully understanding the data generation process of a running variable. If administrative rules, not individual manipulation, explain the shape of an index density near the threshold, then useful solutions for identification may be available. If treatment probabilities have been affected, the causes behind administrative sorting could be fixed in future versions of the index, facilitating a later implementation of an RD design. When treatment probabilities have not been affected, administrative sorting may not invalidate a retrospective RD design, and the running variable could be transformed promptly into another useful running variable. If administrative sorting is not recognized correctly, it can be mistaken for manipulation, potentially leading to viable RD research designs being discarded.

The structure of the paper is as follows. The second and third sections present the cases studied. Both sections have a similar aim, which is to introduce an example of administrative sorting when implementing an RD design. The second section focuses on a conditional cash transfer called Beca de Apoyo a la Retención Escolar, which used the IVSE index for targeting. The third section builds on a conditional cash transfer named Subsidio Familiar, which used the SPF index to allocate the transfers. The last section of the paper integrates the discussion of the previous three sections, provides recommendations for applied RD practitioners and concludes.

2 Administrative Sorting Invalidates an RD Design

This section presents how administrative sorting invalidates an RD design. Features of the index used to target the conditional cash transfer diminish the likelihood that the continuity assumption holds at the threshold. However, minor amendments to the index would have prevented administrative sorting, providing a path to implementing RD assessments in the future.

In 2014, the Beca de Apoyo a la Retención Escolar (BARE) consisted of an annual total monetary contribution of $178,000 CLP (approximately $280 USD on June 30th, 2015).[4] The goal of the conditional cash transfer is to encourage secondary school students to stay in school. Accordingly, the target population of BARE is students at risk of dropping out of school. This concept is operationalized through the IVSE index. Each year, the enrolled secondary school students with the highest IVSE scores become eligible to be new BARE entrants. Soon after the academic year starts, the people in charge of the cash transfer in the field have to actively locate these students in their schools and encourage them to apply. There is no perfect match between eligible students and conditional cash transfer recipients, as not all eligible students are encouraged to apply, and a small fraction of non-eligible students end up accessing BARE. In 2014 and 2015, BARE’s targeting index, the IVSE, was derived from six variables that are highly correlated with future school dropout. I present these variables in Table 1. Each variable of the index was transformed into a score using a function fk(∙). After applying fk(∙), IVSE scores for each student i were calculated using the following formula:

Table 1

IVSE Variables

Variables          Description
Attendance         Attendance rate (%) in the previous academic year
Overage            Difference between age and expected age for the grade (months)
Welfare Recipient  Student’s household in Chile Solidario or Ingreso Ético Familiar
Paternity          Student is a father or a mother
Pregnancy          Student is pregnant
Mother Schooling   Mother’s years of schooling
(1) IVSE_i = 0.165·f1(Attendance_i) + 0.135·f2(Overage_i) + 0.21·f3(WelfareRecipient_i) + 0.21·f3(Paternity_i) + 0.21·f3(Pregnancy_i) + 0.07·f4(MotherSchooling_i)

Overage measures how much older a student is relative to a student who has progressed through the Chilean educational system traditionally. An important feature of the Chilean primary and secondary systems is that students are not automatically promoted to the following grade. Therefore, the traditional pattern of progress through primary and secondary education may be disrupted in at least two ways. First, a student might not have progressed to the following grade due to failure to meet the attendance or academic requirements, or because of withdrawal. Second, irrespective of whether a student progressed, failed or withdrew in a given academic year, he or she might not have enrolled in any educational institution in the next academic year.

Table 2 describes the functions f1(∙), f2(∙), f3(∙), and f4(∙) that transform the variables. As a result, the maximum possible IVSE score was 182.650 and the minimum score was 30.775. In 2014, those with a score equal to or above 84.625 were eligible to be a new entrant for BARE, while in 2015 only those students with a score no lower than 84.150 became eligible for BARE.

Table 2

Functions fk(∙) Transforming IVSE Variables into Scores

fk(∙)   Variables           Formula                                          Score
f1(∙)   Attendance          if Attendance is between 0% and 50%              165
                            if Attendance is between 51% and 74%             125
                            if Attendance is between 75% and 84%             85
                            if Attendance is between 85% and 94%             45
                            if Attendance is between 95% and 100%            5
f2(∙)   Overage             if Overage is lower than or equal to 6 months    30
                            if Overage is between 7 and 12 months            65
                            if Overage is between 13 and 24 months           100
                            if Overage is equal to or higher than 25 months  135
f3(∙)   Welfare Recipient,  if No                                            40
        Paternity,          if Yes                                           210
        Pregnancy
f4(∙)   Mother Schooling    if Mother Schooling is 0                         70
        (Years)             if Mother Schooling is between 1 and 7           60
                            if Mother Schooling is 8                         50
                            if Mother Schooling is between 9 and 11          40
                            if Mother Schooling is 12                        30
                            if Mother Schooling is between 13 and 17         20
                            if Mother Schooling is equal to or higher than 18  10
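Equation (1) and the mappings in Table 2 translate directly into code. The sketch below (function and argument names are mine, and attendance and overage are assumed to be recorded as integers, as the categories in Table 2 suggest) reproduces the score range of 30.775 to 182.650 reported in the text.

```python
def f1(attendance):
    """Score for attendance rate (%) in the previous academic year (Table 2)."""
    if attendance <= 50: return 165
    if attendance <= 74: return 125
    if attendance <= 84: return 85
    if attendance <= 94: return 45
    return 5

def f2(overage_months):
    """Score for overage: gap (months) between age and expected age for the grade."""
    if overage_months <= 6: return 30
    if overage_months <= 12: return 65
    if overage_months <= 24: return 100
    return 135

def f3(flag):
    """Score for the binary variables: welfare recipient, paternity, pregnancy."""
    return 210 if flag else 40

def f4(mother_schooling_years):
    """Score for mother's years of schooling."""
    y = mother_schooling_years
    if y == 0: return 70
    if y <= 7: return 60
    if y == 8: return 50
    if y <= 11: return 40
    if y == 12: return 30
    if y <= 17: return 20
    return 10

def ivse(attendance, overage, welfare, paternity, pregnancy, mother_schooling):
    """Equation (1): weighted sum of the transformed variables."""
    return (0.165 * f1(attendance) + 0.135 * f2(overage)
            + 0.21 * f3(welfare) + 0.21 * f3(paternity) + 0.21 * f3(pregnancy)
            + 0.07 * f4(mother_schooling))

# The two extreme risk profiles recover the range reported in the text:
# maximum 182.650 and minimum 30.775 (up to floating-point rounding).
highest = ivse(0, 30, True, True, True, 0)
lowest = ivse(100, 0, False, False, False, 18)
```

That the extremes match the published range (30.775 to 182.650) is a useful check that the transcription of Table 2 is faithful.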

My analysis focuses on the subgroup of potential BARE new entrants and relies on administrative data provided by Junta Nacional de Auxilio Escolar y Becas.[5] For each student from ninth to eleventh grade, the data contain: i) personal information, ii) each variable of the IVSE index, iii) the IVSE score, iv) whether the student became a recipient of the cash transfer during the year, and v) future academic performance and enrollment in secondary education.

Table 3 provides descriptive statistics by IVSE scores and BARE cohorts (year 2014 and/or 2015) for the entire sample. Panel A shows that adolescents scoring above the IVSE score used to determine eligibility were more likely to receive BARE afterward. In 2014, 49.2% of adolescents scoring no less than 84.625 in the IVSE became recipients of the conditional cash transfer later in the year. Conversely, less than 1% of adolescents scoring below this threshold ended up being beneficiaries. In 2015 these percentages are 40.1% and 1.8%, respectively.

Table 3

Descriptive Statistics (Mean Values) by BARE Cohorts and IVSE scores

BARE Cohorts
                                   2014                               2015                               2014 & 2015
Variables                      <84.625    >=84.625   Total       <84.150    >=84.150   Total       <Cutoff    >=Cutoff   Total
    Panel A: IVSE Score and BARE Recipient
IVSE                           48.92      94.29      51.60       50.55      96.02      53.55       49.72      95.19      52.55
BARE                           0.005      0.492      0.034       0.018      0.401      0.043       0.011      0.445      0.038
    Panel B: IVSE Variables
Attendance (%)                 89.5       68.4       88.3        90.6       70.1       89.2        90.0       69.3       88.7
Overage (Months)               0.39       15.31      1.27        –0.13      14.87      0.85        0.14       15.08      1.07
Welfare Recipient              0.236      0.935      0.278       0.301      0.927      0.342       0.268      0.931      0.309
Parent                         0.001      0.063      0.005       0.001      0.132      0.010       0.001      0.099      0.007
Pregnancy                      0.001      0.058      0.005       0.000      0.064      0.004       0.001      0.061      0.004
Mother Schooling (Years)[1]    9.54       7.92       9.45        10.07      8.28       9.95        9.81       8.11       9.70
    Panel C: Demographic Information
Male                           0.525      0.494      0.523       0.513      0.498      0.512       0.519      0.496      0.517
Age (Years)                    16.06      17.19      16.13       16.00      17.17      16.08       16.03      17.18      16.10
Metropolitan Region            0.287      0.278      0.287       0.261      0.263      0.261       0.275      0.270      0.274
    Panel D: Academic Information
Technical-Professional School  0.513      0.506      0.512       0.519      0.513      0.518       0.516      0.510      0.515
Public School                  0.694      0.788      0.700       0.708      0.775      0.712       0.701      0.781      0.706
Ninth Grade                    0.335      0.407      0.339       0.350      0.402      0.353       0.342      0.404      0.346
Tenth Grade                    0.347      0.316      0.345       0.336      0.314      0.334       0.341      0.315      0.340
Eleventh Grade                 0.319      0.277      0.316       0.314      0.284      0.312       0.317      0.280      0.314
Previous Average Grade         5.20       4.28       5.15        5.31       4.73       5.27        5.25       4.51       5.21
Progressed Previous Year       0.905      0.585      0.886       0.924      0.624      0.904       0.914      0.605      0.895
Number of Observations         245,307    15,372     260,679     233,442    16,445     249,887     478,749    31,817     510,566

Panel B shows that students scoring above the thresholds had lower levels of attendance, were more likely to be lagging in their pathway to graduation, were more likely to belong to a household whose members were welfare recipients, were more likely to be parents or pregnant, and had mothers with fewer years of schooling. Panels C and D show that adolescents above the relevant IVSE thresholds were, compared to those below, slightly less likely to be male, older on average, and more likely to be enrolled in a public school and in ninth grade. The group with higher IVSE scores also had a lower average grade (academic performance). Additionally, a smaller fraction had progressed to the following grade in the previous year.

The remainder of this section shows and explains how administrative sorting invalidates an RD design using the IVSE as a running variable. The design of the IVSE formula causes adolescents around the threshold to be dissimilar, so the continuity assumption is unlikely to hold when using this running variable. Without administrative sorting, however, a researcher could have found that variation near the threshold was as good as random and that the continuity assumption was likely to hold. This reveals that the index can be fixed for future evaluations.

Table 4 presents comparisons of means for pre-treatment variables for adolescents just above and just below the IVSE thresholds. These two groups of students differ substantially. For example, in 2014, adolescents scoring between 84.625 and 85.125 in the IVSE (and who therefore were eligible for BARE) had on average, relative to adolescents scoring at or above 84.125 but below 84.625: lower attendance in the previous year (7.2 percentage points), lower overage (10.51 months) and mothers with more years of schooling (4.11 years). All these differences are statistically significant at a 99% level of confidence. A similar phenomenon can be observed in 2015. Adolescents scoring at or above 84.150 but below 84.650 (who were thus eligible for BARE) had different pre-treatment variables relative to those scoring at or above 83.650 and below 84.150. The former group had on average, relative to the latter: higher attendance in the previous year (3.0 percentage points), higher overage (5.81 months) and mothers with fewer years of schooling (2.06 years).

Table 4

Mean Values by IVSE Scores and Differences in Means for Pre-Treatment Variables

BARE Cohorts
                               2014                                                   2015
Pre-Treatment Variables        >=84.625     <84.625      Difference   p-value         >=84.15      <84.15       Difference   p-value
                               & <85.125    & >=84.125   in Means                     & <84.65     & >=83.65    in Means
    Panel A: IVSE Variables
Attendance (%)                 88.9         96.1         –7.2         0.000           91.5         88.5         3.0          0.000
Overage (Months)               17.12        27.63        –10.51       0.000           20.89        15.08        5.81         0.000
Welfare Recipient              0.976        0.986        –0.010       0.279           0.982        0.984        –0.003       0.688
Parent                         0.012        0.011        0.001        0.919           0.018        0.016        0.003        0.688
Mother Schooling (Years)       9.74         5.63         4.11         0.000           8.80         10.86        –2.06        0.000
    Panel B: Demographic Information
Male                           0.608        0.609        –0.001       0.965           0.631        0.610        0.021        0.396
Age (Years)                    17.25        18.35        –1.09        0.000           17.49        17.09        0.40         0.000
Metropolitan Region            0.205        0.065        0.139        0.000           0.198        0.252        –0.054       0.010
    Panel C: Academic and School Information
Technical Professional School  0.507        0.552        –0.046       0.149           0.538        0.485        0.053        0.041
Public School                  0.724        0.683        0.042        0.147           0.728        0.739        –0.011       0.633
Ninth Grade                    0.381        0.411        –0.029       0.345           0.476        0.352        0.124        0.000
Tenth Grade                    0.414        0.218        0.196        0.000           0.304        0.418        –0.113       0.000
Eleventh Grade                 0.205        0.371        –0.167       0.000           0.220        0.230        –0.010       0.640
Previous Average Grade         4.85         5.07         –0.22        0.000           4.96         4.93         0.04         0.211
Progressed Previous Year       0.739        0.827        –0.088       0.001           0.759        0.765        –0.006       0.792
Number of Observations         831          353          1,184                        1,097        579          1,676

The first four columns in Table 5 provide difference in means estimates for pre-treatment variables using two different local windows (0.5 and 1.5 IVSE points around the threshold, respectively). Statistically significant differences between groups can be observed not only in variables that are part of the index but also in other demographic and academic features.
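The window comparisons behind these columns can be sketched as a simple difference-in-means routine. The following illustrative sketch runs on simulated data and uses a Welch-type standard error with a normal approximation for the p-value; it is not the exact procedure behind Tables 4 and 5, and all names in it are mine.

```python
import math
import random

def window_diff_in_means(z, x, cutoff, w):
    """Compare the mean of a pre-treatment variable x just above vs. just
    below the cutoff, using observations whose running variable z lies in
    [cutoff - w, cutoff + w). Returns (difference, standard error, p-value)."""
    above = [xi for zi, xi in zip(z, x) if cutoff <= zi < cutoff + w]
    below = [xi for zi, xi in zip(z, x) if cutoff - w <= zi < cutoff]
    mean_a, mean_b = sum(above) / len(above), sum(below) / len(below)
    var_a = sum((v - mean_a) ** 2 for v in above) / (len(above) - 1)
    var_b = sum((v - mean_b) ** 2 for v in below) / (len(below) - 1)
    diff = mean_a - mean_b
    se = math.sqrt(var_a / len(above) + var_b / len(below))
    # Two-sided p-value from a normal approximation to the Welch statistic.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(diff / se) / math.sqrt(2))))
    return diff, se, p

# Illustration on simulated data: a covariate that jumps by 5 at the cutoff
# should produce a window difference of about 5 with a tiny p-value.
random.seed(1)
z = [random.uniform(80, 90) for _ in range(4000)]
x = [10 + 5 * (zi >= 84.625) + random.gauss(0, 1) for zi in z]
diff, se, p = window_diff_in_means(z, x, cutoff=84.625, w=0.5)
```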

Table 5

RD Estimates for Pre-Treatment Variables

                               BARE Cohorts (Columns Additionally Vary by Local Window/Bandwidth Size)
Pre-Treatment Variables        2014                2014                2015                2015                2014                2014                 2015                2015
    Panel A: IVSE Variables
Attendance (%)                 –7.207*** (0.258)   2.563*** (0.156)    3.020*** (0.232)    5.680*** (0.179)    –5.244*** (0.683)   –10.393*** (0.172)   8.784*** (0.258)    4.013*** (0.150)
Overage (Months)               –10.514*** (0.362)  8.259*** (0.266)    5.814*** (0.377)    13.115*** (0.322)   –9.718*** (0.986)   –18.859*** (0.354)   12.568*** (0.628)   7.286*** (0.288)
Mother Schooling (Years)       4.110*** (0.200)    0.463*** (0.089)    –2.059*** (0.226)   1.057*** (0.120)    8.678*** (0.229)    1.049*** (0.075)     –10.373*** (0.152)  –1.481*** (0.096)
    Panel B: Non IVSE Variables
Male                           –0.001 (0.031)      0.066*** (0.014)    0.021 (0.025)       0.125*** (0.015)    0.069 (0.120)       –0.094*** (0.033)    –0.016 (0.053)      0.041 (0.027)
Metropolitan Region            0.139*** (0.023)    –0.065*** (0.011)   –0.054** (0.021)    –0.077*** (0.013)   0.164* (0.100)      0.231*** (0.023)     –0.208*** (0.039)   –0.044** (0.022)
Ninth Grade                    –0.029 (0.031)      0.040*** (0.014)    0.124*** (0.025)    0.098*** (0.015)    –0.101 (0.108)      –0.024 (0.032)       0.167*** (0.053)    0.078*** (0.027)
Previous Average Grade         –0.225*** (0.038)   –0.125*** (0.015)   0.037 (0.030)       –0.080*** (0.017)   –0.501*** (0.130)   –0.107*** (0.038)    0.286*** (0.065)    0.088*** (0.032)
Progressed Previous Year       –0.088*** (0.027)   –0.091*** (0.011)   –0.006 (0.022)      –0.108*** (0.012)   –0.234** (0.105)    0.012 (0.027)        0.133*** (0.045)    0.007 (0.023)
Local Window Size              0.5                 1.5                 0.5                 1.5
Bandwidth Size                                                                                                1.5                 4.5                  1.5                 4.5
Number of Observations         1,184               5,318               1,676               4,145               5,318               14,112               4,145               16,087
Note: Standard errors in parentheses.

A potential explanation for the lack of balance in average pre-treatment variables is that the neighborhood I use is too large. To avoid this problem, I compare these variables only at the threshold, where variation should be as good as random. The continuity-based framework in RD designs is suitable for this purpose. In this approach, if there are no discontinuities in pre-treatment variables at the threshold, then it is more likely that variation in eligibility is as good as random at the IVSE threshold. I use local regressions to this end:

(2) X_i = α + β·I_i + γ·Z_i + θ·Z_i·I_i + ω·Z_i² + δ·Z_i²·I_i + ε_i,

where Xi is a pre-treatment variable for adolescent i, Zi is the difference between the IVSE score for adolescent i and the IVSE threshold. Ii is an indicator function, which takes the value of one if Zi ≥ 0 and zero otherwise. ϵi represents the error term of the regression.
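Equation (2) can be estimated by ordinary least squares within a bandwidth around the cutoff. The sketch below does exactly that on simulated data; it is only the plain OLS analogue of the falsification regression and does not implement the robust bias-corrected inference of [4]. All names are mine.

```python
import numpy as np

def rd_falsification_beta(x, z, bandwidth):
    """OLS fit of Equation (2):
        X_i = a + b*I_i + c*Z_i + d*Z_i*I_i + e*Z_i**2 + f*Z_i**2*I_i + eps_i,
    restricted to |Z_i| <= bandwidth, where Z is the running variable centered
    at the cutoff. Returns b, the estimated discontinuity at Z = 0."""
    x, z = np.asarray(x, float), np.asarray(z, float)
    keep = np.abs(z) <= bandwidth
    x, z = x[keep], z[keep]
    ind = (z >= 0).astype(float)  # I_i: at or above the cutoff
    design = np.column_stack([np.ones_like(z), ind, z, z * ind, z ** 2, z ** 2 * ind])
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    return coef[1]

# Illustration: a pre-treatment variable with a jump of 2 at the cutoff
# should yield an estimate near 2 (a correctly specified simulation).
rng = np.random.default_rng(0)
z_sim = rng.uniform(-3, 3, 20_000)
x_sim = 1 + 0.4 * z_sim + 0.1 * z_sim ** 2 + 2 * (z_sim >= 0) + rng.normal(0, 0.1, 20_000)
beta = rd_falsification_beta(x_sim, z_sim, bandwidth=1.5)
```

In a real falsification test, a nonzero β for a pre-treatment variable is evidence against the continuity assumption, which is exactly what the last four columns of Table 5 document.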

The last four columns of Table 5 present β, the results of the RD falsification tests for the continuity-based framework. I obtain robust bias-corrected standard errors [4]. The fifth and sixth columns show the results for the 2014 BARE cohort, and the last two columns focus on the 2015 group. Each pair of estimates varies by the size of the bandwidth I use in the regression. Most of the estimates are statistically significant at a 99% level of confidence. Therefore, I cannot assume that a continuous conditional distribution exists for each pre-treatment variable at the IVSE threshold used by the conditional cash transfer. Table 5 thus suggests that the variation in BARE eligibility cannot be considered as good as random near each IVSE threshold. There are systematic differences across adolescents on either side of the thresholds each year.

In 2015 adolescents just above the threshold (compared to those just below) had higher overage and attendance. These differences will affect impact estimates in ways that are not straightforward to anticipate as these variables are negatively and positively correlated with future school enrollment, respectively. The results are not explained by adolescents’ sorting near the threshold, making themselves eligible for BARE.

IVSE scores are hard for adolescents to manipulate because the variables from which the index is derived are themselves difficult to manipulate. For example, mother’s schooling and parenthood decisions are unlikely to be affected by BARE. Overage partly depends on the date of birth. Being a welfare recipient depends on the characteristics of the adolescent’s household and is determined after complex and long assessments. The variable over which adolescents could exercise more control is attendance. Additionally, students could misreport information such as their mother’s schooling. However, manipulation of attendance or misreporting does not necessarily imply precise manipulation of IVSE scores. Even if some adolescents could manipulate their attendance or misreport information, they would need precise knowledge of the structure of the IVSE formula, the weights assigned to each variable (which are not public), and the threshold to be used in the future by BARE (which changed between 2014 and 2015) to precisely manipulate their eligibility. For all these reasons, the lack of suitability of the IVSE as a running variable for an RD design seems, at first glance, surprising.

Features of the design of the IVSE formula cause adolescents just above and just below the threshold to be dissimilar. For example, in 2014 adolescents with 84.625 points ended up eligible for BARE while adolescents with 84.150 points (one of the closest scores in terms of the IVSE) ended up not eligible. The two types of adolescents differ notably in their values of attendance, overage and their mothers’ schooling. The former group only has mothers with 9 to 11 years of schooling, while the latter only has mothers with 1 to 7 years of schooling. The former group only has attendance levels between 85% and 94% and overage between 13 and 24 months, while the latter group exclusively has attendance equal to or higher than 95% and overage equal to or higher than 25 months. These systematic differences across individuals near the threshold explain the results in Table 4.

In practice, the IVSE formula generates: i) mass points[6], IVSE values shared by many adolescents, and ii) clusters of observations. Adolescents who have the same IVSE score also share the same category of values for attendance, overage and mother schooling.[7] Despite all students near the threshold being at risk of dropping out, the way the IVSE index is computed makes students just above and just below the threshold different. One type of adolescent, relative to the other, has a different probability of being on one side of the cutoff.

The design of the IVSE leads to an irregular distribution of scores (see the left panel of Figure 1). The IVSE distribution could have returned comparable students near the threshold and could have been smoother. The latter argument can be understood by looking at the right-hand panel of Figure 1, in which I show a hypothetical distribution of IVSE scores obtained with a few amendments.

Figure 1: Observed IVSE Density and Hypothetical IVSE Density (Year 2014)

The only difference between this hypothetical distribution and the original is that I replace f1(∙) and f2(∙) by two new functions, f1′(∙) and f2′(∙), respectively. Table 6 presents f1′(∙) and f2′(∙). The logic I use to design f1′(∙) and f2′(∙) follows two criteria. Firstly, to preserve the original scale of the IVSE, f1′(∙) and f2′(∙) return the same minimum and maximum scores as f1(∙) and f2(∙), respectively. Secondly, f1′(∙) and f2′(∙) intend to break with the categorization of attendance and overage imposed by f1(∙) and f2(∙). Instead, these new functions use almost the entire range of values available for each variable to return a unique score for each value. Thus, Figure 1 shows that the irregular distribution of the IVSE scores is mainly explained by the design of f1(∙) and f2(∙), the functions that transform attendance and overage into scores, respectively.

Table 6

Functions fk′(∙) Transforming IVSE Variables into Scores

fk′(∙)   Variables   Formula                                     Min. Score   Max. Score
f1′(∙)   Attendance  5 + 1.6·(100 – Attendance)                  5            165
f2′(∙)   Overage     30 if Overage is lower than –13 months      30           135
                     135 if Overage is higher than 37 months
                     30 + 2.1·(Overage + 13) otherwise
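Table 6 also translates directly into code. A minimal transcription (function names are mine) confirms that the new functions preserve the original 5 to 165 and 30 to 135 score ranges:

```python
def f1_prime(attendance):
    """Hypothetical replacement for f1: a linear map from attendance (%)
    onto [5, 165], so each attendance value gets a unique score."""
    return 5 + 1.6 * (100 - attendance)

def f2_prime(overage_months):
    """Hypothetical replacement for f2: linear in overage (months),
    capped so that scores stay within [30, 135]."""
    if overage_months < -13:
        return 30
    if overage_months > 37:
        return 135
    return 30 + 2.1 * (overage_months + 13)
```

Because each value of attendance and overage now maps to its own score, the index no longer lumps heterogeneous students onto the same mass points, which is what smooths the hypothetical density in Figure 1.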

The left panel of Figure 2 shows the hypothetical distribution of the IVSE (from the right-hand panel of Figure 1) close to the BARE threshold. The right-hand panel of Figure 2 shows that a McCrary test [29] does not reject continuity for this new distribution: no discontinuity can be observed in the density of the hypothetical IVSE index at the threshold used by BARE. These results show that minor adjustments to the IVSE formula could have prevented administrative sorting, increasing the index’s suitability as a running variable in a future RD design.
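The logic of the density check can be illustrated crudely by comparing the mass in the bins adjacent to the cutoff. The sketch below is not the McCrary estimator, which smooths a finer histogram with local linear regressions on each side of the cutoff; it only conveys the idea on simulated, smoothly distributed scores, and all names in it are mine.

```python
import random

def density_gap_at_cutoff(scores, cutoff, bin_width):
    """Compare the share of observations in the bin just below the cutoff
    with the share in the bin just above it. For a smooth density the two
    shares should be nearly equal (ratio close to one)."""
    n = len(scores)
    below = sum(1 for s in scores if cutoff - bin_width <= s < cutoff) / n
    above = sum(1 for s in scores if cutoff <= s < cutoff + bin_width) / n
    return above - below, above / below

# Smoothly (uniformly) distributed scores: no discontinuity at the cutoff.
random.seed(0)
smooth_scores = [random.uniform(60, 110) for _ in range(100_000)]
gap, ratio = density_gap_at_cutoff(smooth_scores, cutoff=84.625, bin_width=1.0)
```

Under administrative sorting of the kind described above, heaps or compressed regions near the cutoff would push this ratio away from one even though no individual manipulated their score.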

Figure 2: Hypothetical IVSE Density & McCrary Test (Year 2014)

The IVSE index is an inappropriate running variable for an RD design due to the design of f1(∙) and f2(∙). These functions transform discrete variables into categorical ones to build the index. The index formula prevents a smooth distribution of scores and generates clusters of students. As a result, adolescents just above and below the threshold differ in key features despite all being at risk of dropping out of school. The design of the IVSE, specifically this categorization of discrete variables, non-randomly affects the position of adolescents near the threshold. Without administrative sorting, the continuity assumption would have been more likely to hold. For this conditional cash transfer, administrative sorting, not manipulation, undermines the RD design. This finding demonstrates that the inability to manipulate a running variable does not automatically translate into randomized variation in treatment assignment near the relevant threshold.

3 Administrative Sorting Threatens an RD Design

This section presents how administrative sorting threatens, though does not invalidate, an RD design. As in the previous section, features of the index used to target the conditional cash transfer induce changes in the threshold vicinity, casting doubts on the RD design. In this case the index can be adjusted, enabling an RD design to be implemented for an impact evaluation.

Subsidio Familiar (SUF) is a cash benefit that operates in practice as a conditional cash transfer, seeking to increase households’ present income and intending to promote human capital accumulation among children and adolescents.[8] From July 2014 until June 2015, the monthly cash transfer per child or adolescent was $9,242 CLP ($14.6 USD at June 30th, 2015). SUF’s target population is households belonging to the poorest 40%. This status was measured using the Social Protection File (SPF) index, operationalized as having a score equal to or lower than 11,734 points in the index. SPF scores estimate household income. Specifically, the formula estimates household members’ income using variables correlated with their income, which are collected during a household interview.[9] The general formula for SPF scores is as follows:

(3) SPFScore_h = G( Σ_{i∈h} [ CGI_i·0.9 + YD_i·0.1 + YP_i ] / IN_h ),

where CGIi is the proxy means test prediction of the labor income of individual i. A proxy means test refers to a system where information correlated with income is used in a formula to proxy income. The formula (parameters and weights) is obtained through statistical analysis and tends to use data that is easily observable by public officials [18, 23]. YDi is the self-reported income (mostly from labor) of individual i. YPi is the permanent income for individual i. INh is a needs index for household h. Finally, G(∙) is a monotonic function that transforms household h income prediction into an SPF score.
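The proxy means test idea (fit a formula for income on easily observable variables, then use the fitted formula to score households) can be sketched as follows. All variable names and coefficients below are invented for illustration; the actual CGI formula is not public and is not reproduced here.

```python
import numpy as np

# Simulated interview data of the kind a proxy means test might use.
rng = np.random.default_rng(42)
n = 5_000
schooling = rng.integers(0, 18, n).astype(float)   # years of schooling (observable)
formal_job = rng.integers(0, 2, n).astype(float)   # contributes to social security

# "True" log labor income in the simulation (coefficients are made up).
log_income = 10.0 + 0.08 * schooling + 0.50 * formal_job + rng.normal(0, 0.3, n)

# Step 1: statistical analysis on survey data recovers the weights
# linking observables to income.
design = np.column_stack([np.ones(n), schooling, formal_job])
weights, *_ = np.linalg.lstsq(design, log_income, rcond=None)

# Step 2: the fitted formula predicts income for any individual with the
# same observables; this prediction plays the role of CGI in Equation (3).
cgi_prediction = design @ weights
```

The key design feature, relevant for the rest of this section, is that the resulting score is a deterministic function of the collected variables passed through an estimated formula, so any kink or compression in that formula propagates into the density of the running variable.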

Given its formula, in theory, the SPF index ranks households from the poorest to the richest in a similar way to a ranking by income per capita.[10] The SPF scale ranges from 2,072 points (the poorest households) to infinity. However, by 2014, only two households had a score higher than 16,000 points.

Every administrative dataset I use in this section was provided by the Chilean Ministry of Social Development. For privacy purposes, individuals’ ID numbers were changed by the Ministry using an algorithm that is unknown to me but that enabled me to link the datasets. The Social Protection File Dataset provides information on: i) SPF scores, which are crucial for the identification of Subsidio Familiar recipients, ii) household structure, and iii) variables such as years of schooling, employment status, and self-reported income. The Chilean Ministry of Education Performance Dataset contains information on enrollment, attendance, performance and end-of-year academic classification for most primary and secondary school students.

Table 7 provides descriptive statistics relevant to SUF for two cohorts, corresponding to adolescents in 2013 and 2014. Panel A shows that adolescents scoring no more than 11,734 in the SPF were more likely to be entitled to the cash transfer. Overall, 41.2% of eligible adolescents received it. Conversely, less than 1% of adolescents scoring above this threshold received it. Panel B presents variables that are used in the SPF formula. Adolescents scoring below 11,734 are more vulnerable than those scoring above this threshold. On average, the former group has a head of household with fewer years of schooling and a lower chance of working formally (defined as working and contributing to social security). Their households also have a lower income and are larger. Concerning their academic information, Panel D shows that adolescents scoring below the threshold are more disadvantaged in terms of academic features relative to adolescents with higher SPF scores.

Table 7: Descriptive Statistics (Mean Values) by SUF Threshold

| Variables | SPF Score ≤ 11,734 | SPF Score > 11,734 | Total |
| --- | --- | --- | --- |
| *Panel A: SPF Score and SUF Recipient* | | | |
| SPF Score | 5,907.2 | 13,222.8 | 6,976.0 |
| SUF Recipient | 0.412 | 0.008 | 0.353 |
| *Panel B: SPF Relevant Variables (HH: Head of Household)* | | | |
| HH Years of Schooling | 9.17 | 12.10 | 9.60 |
| HH Working | 0.741 | 0.838 | 0.755 |
| HH Working Formally | 0.351 | 0.730 | 0.406 |
| Household Monthly Income ($CLP) | 154,970 | 423,502 | 194,201 |
| Household Size | 4.30 | 4.00 | 4.26 |
| Female HH | 0.510 | 0.291 | 0.478 |
| *Panel C: Demographic Information* | | | |
| Male | 0.513 | 0.515 | 0.513 |
| Age (Years) | 14.93 | 14.95 | 14.93 |
| Metropolitan Region | 0.335 | 0.429 | 0.349 |
| *Panel D: Academic and School Information* | | | |
| Enrollment Previous Year | 0.913 | 0.954 | 0.919 |
| Seventh or Eighth Grade | 0.377 | 0.354 | 0.373 |
| Ninth or Tenth Grade | 0.424 | 0.446 | 0.427 |
| Eleventh or Twelfth Grade | 0.144 | 0.178 | 0.150 |
| Attendance Previous Year (%) | 90.20 | 91.74 | 90.43 |
| Average Grade Previous Year | 5.32 | 5.48 | 5.35 |
| Progressed Previous Year | 0.914 | 0.939 | 0.918 |
| Number of Observations | 1,627,331 | 278,419 | 1,905,750 |

Note: grades, attendance, average grade and progression are measured only among those who were enrolled in the previous year.

The remainder of this section shows how administrative sorting threatens an RD design that uses the SPF as a running variable. A monotonic function induces abrupt changes in the SPF density, leading to a failed density test when the SPF is used as the running variable. These problems, however, can be fixed, enabling the use of an RD design for the assessment.

Observing a smooth distribution in the running variable in the neighborhood of the threshold is essential to believe that treatment assignment is as good as random. Any discontinuity near the cutoff is generally interpreted as a sign of manipulation. Therefore, it becomes essential to assess whether any discontinuities emerge in the distribution of SPF scores. The left panel of Figure 3 presents this distribution in a small neighborhood close to the 11,734 SPF threshold (the dashed line). A clear discontinuity can be seen in the density of SPF scores to the right-hand side of the threshold, though this is not observed immediately after crossing it.

Figure 3: SPF Scores Distribution and Density Test near Subsidio Familiar Threshold

I implement the manipulation test of [13] without imposing any restriction, allowing the bandwidth to be chosen in a data-driven manner. The test estimates the density of the running variable at each side of the threshold. If the estimate of the discontinuity in the density is not statistically significant, there is no evidence that such a discontinuity exists. Graphically, this is usually reflected in the confidence intervals of the two estimates overlapping at the threshold. The right-hand panel of Figure 3 presents the results of this test. The figure shows the estimated density of SPF scores (the wider central line) with its 95% bias-corrected confidence interval at each side of the threshold. No overlap between the estimated densities can be observed at the 11,734 threshold.
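The intuition of a density test can be illustrated with a deliberately simplified check, far cruder than the local-polynomial estimator of [13]: under a continuous density, observations in a narrow symmetric window around the cutoff should fall on either side with roughly equal probability. The bandwidth and simulated data below are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

cutoff = 11734.0
# Simulated running variable: a smooth uniform background plus an artificial
# pile-up just above the cutoff, mimicking a density discontinuity.
scores = np.concatenate([
    rng.uniform(10000, 13500, size=5000),
    rng.uniform(cutoff, cutoff + 100, size=800),
])

# Window-based check: count observations in a symmetric window around the
# cutoff and test whether the left/right split is consistent with 50/50.
h = 100.0  # illustrative bandwidth, not the data-driven choice of [13]
window = scores[np.abs(scores - cutoff) <= h]
n, n_right = len(window), int((window > cutoff).sum())
z = (n_right - n / 2) / (n / 4) ** 0.5  # normal approximation to the binomial
print(f"share right of cutoff: {n_right / n:.2f}, z = {z:.1f}")
```

A large |z| signals a discontinuity; the actual test used in the paper replaces the naive 50/50 comparison with local-polynomial density estimation and bias-corrected inference.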

The density test result calls into question the suitability of SPF scores as a running variable for evaluating the conditional cash transfer. At first glance, the discontinuity in the density of SPF scores near 11,734 suggests that the variation in SUF eligibility is not as good as random around this threshold. This discontinuity in the distribution of SPF scores is not an isolated case: the distribution shows clear discontinuities in at least four parts of its range. Moreover, these discontinuities follow a sequence, as can be seen in Figure 4.

Figure 4: Social Protection File Scores Distribution (Year 2011)

The dashed lines divide the SPF scores distribution into ten sections. Each section has a size of 1,385 or 1,386 SPF points. The first discontinuity in the distribution is observed at the SPF score of 3,458 (where the first dashed line is), 1,386 points away from the lowest SPF score of 2,072. On the right-hand side of the figure, larger discontinuities can be observed at the SPF scores of 11,772, 13,157 and 14,543 (the seventh, eighth and ninth lines, respectively). Figure 4 suggests the existence of a pattern in the SPF scores. Every 1,385 or 1,386 SPF points there is a discontinuity in the distribution. Public policies in Chile used none of the points where these discontinuities are observed. Hence, these discontinuities are unlikely to be explained by individuals’ manipulation of SPF scores. That argument seems more suitable for explaining the accumulation of observations on the very left side of the distribution.

A better explanation is that the discontinuities in the SPF density are administratively produced. The function G(∙), which transforms SPF households' income per capita predictions into SPF scores, is responsible for these discontinuities. G(∙) transforms a continuous density (the SPF predicted income per capita distribution, which is based on the Chilean income distribution) into an irregular one. This hypothesis is supported by the fact that a density with a shape similar to the SPF scores distribution can be obtained from the Chilean income per capita distribution in a small number of steps. This sequence, or function g(∙), is described as follows:

  1. Obtain the independent income per capita for each household (using CASEN 2011).[11][12]

  2. Obtain the nine percentiles of income that divide the sample into ten decile groups.

    • The ten decile groups have different sizes in terms of the income range. For example, income per capita for the first group ranges from $0 to $43,960 CLP, while for the ninth group it ranges from $329,507.75 to $567,581.5 CLP.

  3. Divide each decile group into ten additional subgroups with an equal income range.

    • Each subgroup has the same income band within each decile group (for example each subgroup in the first decile has an income band of $4,396 CLP).

    • Income ranges differ for two subgroups from different income decile groups. Thus, the income band for the tenth subgroup is different from the eleventh.

  4. Assign each household in CASEN to 1 of the 100 independent income per capita subgroups by observing their independent income per capita.
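The steps above can be sketched in code. This is a minimal illustration assuming only a vector of household incomes per capita; the function name and defaults are mine, not part of the SPF documentation:

```python
import numpy as np

def g_transform(income, n_deciles=10, n_sub=10):
    """Sketch of the g(.) sequence: split households into decile groups by
    income per capita, then cut each decile group into equal-width income
    bands, yielding 100 ordered subgroups."""
    income = np.asarray(income, dtype=float)
    # Step 2: the nine percentiles bounding the ten decile groups
    # (plus the minimum and maximum as outer edges).
    edges = np.percentile(income, np.linspace(0, 100, n_deciles + 1))
    groups = np.empty(len(income), dtype=int)
    for d in range(n_deciles):
        lo, hi = edges[d], edges[d + 1]
        if d == n_deciles - 1:
            mask = (income >= lo) & (income <= hi)  # top group is closed
        else:
            mask = (income >= lo) & (income < hi)
        # Step 3: ten equal-width income bands within this decile group.
        sub_edges = np.linspace(lo, hi, n_sub + 1)
        sub = np.searchsorted(sub_edges, income[mask], side="right") - 1
        # Step 4: final subgroup index (0..99); clip the band's top edge.
        groups[mask] = d * n_sub + np.clip(sub, 0, n_sub - 1)
    return groups
```

Applied to CASEN incomes, this assignment reproduces the histogram logic compared with the SPF distribution below: subgroup counts are flat within a decile group measured by population, but bunch unevenly when measured by income bands.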

I present the resulting distribution in the left panel of Figure 5.[13] The dashed lines in this panel separate income decile groups. This is compared with the right-hand panel of Figure 5, which shows the density of SPF scores when 100 bins of 138.6 points are used to construct the histogram. The dashed lines in this panel separate SPF scores by 1,386 points. There are many similarities between these two distributions, including: i) an increasing density from the sixth to the tenth bin, ii) a sharp decline at the eleventh bin, iii) a flat density thereafter (albeit with more variance in CASEN), iv) a moderate discontinuity at the 71st bin, v) a steady decline in the density until the 80th bin, vi) a repetition of the pattern noticed in points iv) and v) between the 81st to the 90th bin, and finally, vii) a very sharp increase in the 91st bin that leads to the highest concentration of observations in both distributions.

Figure 5: One Hundred Bins of Income per Capita in CASEN and the SPF

SPF scores do not precisely predict independent income per capita, but a different measure of income (which includes some public pensions and subsidies and excludes income related to the property of assets) divided by a needs index. Many of these factors cannot be easily accounted for in CASEN. For this reason, I select independent income per capita as the standard for comparison.

The SPF formula predicts household income per capita. If the SPF formula, before applying G(∙), is a good predictor and if the income per capita distribution is smooth, the SPF predicted income per capita distribution would likely be smooth. However, as seen, this is not the case with SPF scores, which are returned after applying G(∙) to income per capita predictions.

The monotonic function G(∙) that produces SPF scores from income predictions seems to affect the position of units near some SPF thresholds non-randomly. Thus, administrative sorting in the running variable is the primary explanation for the density test result in the case of SUF. The 11,734 SPF threshold used by SUF for targeting is too close to 11,772, where one of the discontinuities in the density of SPF scores produced by G(∙) emerges.

As SPF scores are derived from a prediction of income per capita and a monotonic function G(∙), applying the inverse of G(∙) to SPF scores recovers the original SPF income prediction. Accordingly, if the national household income per capita distribution is continuous, then the SPF predicted income per capita distribution is likely to be continuous. I present the result of inverting the SPF scores, using similar ideas as in function g(∙), in the left panel of Figure 6. This returns an SPF prediction of income per capita for each household. The density of SPF predicted income per capita does not show the discontinuities of the SPF scores. This holds throughout the range displayed; only the tails, which are not shown, are excluded. Moreover, the density is similar to CASEN's (available in the right-hand panel of Figure 6).
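Because G(∙) is monotonic with kinks at known score boundaries, its inversion can be sketched as a piecewise-linear interpolation from score bands back to income bands. The score edges below follow the 1,385-1,386-point spacing; the income edges are illustrative placeholders (only the first- and ninth-group bounds come from the text), not the actual SPF parameters:

```python
import numpy as np

# Illustrative band edges (NOT the actual SPF parameters): eleven score
# boundaries spaced ~1,385.8 points apart, and the income-per-capita
# boundaries each score band is assumed to represent.
score_edges = np.linspace(2072, 15930, 11)
income_edges = np.array([0, 43960, 70000, 95000, 125000, 160000,
                         205000, 260000, 329508, 567582, 1500000], dtype=float)

def invert_g(scores):
    """Map SPF scores back to predicted income per capita by linear
    interpolation within each score band (a sketch of inverting G)."""
    return np.interp(scores, score_edges, income_edges)
```

Because the map is monotone and piecewise linear, equal-width score bands stretch or compress into the unequal income bands they came from, which removes the artificial discontinuities from the density.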

Figure 6: Income per Capita Distribution (SPF Prediction and CASEN)

SPF predicted income per capita is a more suitable running variable than SPF scores for an RD design for SUF because it is smoother around the threshold used by the cash transfer. This can be better appreciated in Figure 7. The left panel shows the density of SPF predicted income per capita around the threshold equivalent to 11,734 SPF points. The density is smooth in the relevant neighborhood, and the discontinuity in the estimated density is narrower than the one presented in Figure 3. The new running variable passes the density test of [13] (right-hand panel of Figure 7). Again, I impose no restrictions and allow the bandwidth to be chosen in a data-driven way.

Figure 7: SPF Predicted Income per Capita Distribution and Density Test

In this example, administrative sorting is caused by the monotonic function used to transform the predicted values of an income-proxy means test into ordinal scores. The function generates noticeable discontinuities in the distribution of index scores. For the cash transfer evaluated, one of these discontinuities lies close to the threshold used for targeting. As a result, the running variable fails the density test. The paper shows that the predicted income of the targeting instrument is a more suitable running variable for an RD design: it does not fail the density test and is likely to support a more credible impact assessment.

All these findings confirm that administrative sorting, not manipulation, is the primary driver of the discontinuities in the distribution of SPF scores. This situation highlights the relevance of fully understanding the running variable's generating process when applying an RD design.

4 Conclusion

This paper focuses on a threat to internal validity in RD designs that has received little attention in the RD literature. The paper contributes by highlighting a relevant issue, usually not discussed nor addressed in detail, that applied RD practitioners can encounter in their work.

I analyze administrative sorting in the context of the evaluation of two programs using RD designs. Administrative sorting is the result of administrative procedures that are beyond the control and knowledge of individuals. These procedures affect non-randomly the position of individuals in the running variable near the threshold. As a result, the continuity assumption becomes less plausible. Administrative sorting differs from the well-known and addressed threat of manipulation, which is characterized by individuals’ deliberate action for their benefit.

In my two cases, lack of manipulation of the running variable does not automatically translate into variation in treatment assignment being as good as random near the threshold. This finding contradicts one of the most highlighted points in the influential paper on RD designs by [28]. Here administrative sorting undermines the RD design, not manipulation. Accordingly, lack of manipulation is a necessary but not sufficient condition for a valid RD design. My paper shows that a density test can fail for reasons other than manipulation of the running variable, which broadens our understanding of how to interpret the test.

My findings highlight the importance of fully understanding the data generation process. If administrative rules, not individual manipulation, explain the shape of the running variable density near the threshold, then applied practitioners may still find useful variation for identification, enabling them to carry out an RD design immediately or, depending on how severe the sorting is near the threshold, in the near future after these administrative rules are adjusted.

For example, to estimate the impact of BARE, I could exploit the categorization of attendance and overage in the IVSE formula if adolescents were unaware of the effect of these variables on selection into the conditional cash transfer. Some adolescents who differ by a few decimal places in their attendance, or by a few days in their date of birth, have a different likelihood of being eligible for BARE. I also show that by introducing a few changes to the index formula, administrative sorting near the threshold can disappear. Addressing this issue directly can lead to improved versions of the index, allowing researchers to implement RD designs in the future.

If the probability of being treated has not been affected by administrative sorting, then applied practitioners may still be able to implement a retrospective RD design without a long wait. In the paper I show that for SUF, future research could estimate the impact of the cash transfer using SPF predicted income instead of SPF scores. In another example, [20] uses students' average grade (annual academic performance) in an RD design instead of their relative ranking within their academic cohort, although the latter variable was the one used for allocating the program Bono por Logro Escolar. The academic relative ranking is not a useful running variable in this case because it administratively groups students at the relevant threshold, or very close to it, who have different features from the students marginally at the other side of the cutoff. The original variable, average grade, from which the academic relative ranking is derived, does not have this problem and is a more suitable running variable.

The threat of administrative sorting discussed in this paper is closely related to the use of indexes such as income-proxy means tests as running variables. At least 20 developing countries in Latin America, Africa and Asia, such as Peru, Nicaragua, Bangladesh, Cambodia, Cameroon and Rwanda, have used this mechanism for targeting [1, 3, 18]. In Ecuador, [31] do not consider whether administrative sorting explains why more individuals are found above the threshold used by the conditional cash transfer Bono de Desarrollo Humano; the authors only claim that this difference could not be attributed to manipulation. [17] faces a related situation in Chile. Hence, my paper could prove useful for public officers who are designing new social policies using indexes, such as proxy means tests, that are expected to be evaluated with RD designs. Careful analysis of the properties of these indexes before implementation can facilitate future impact assessments with RD designs.

The recent breakthrough of RD designs in causal inference has been a major contribution, particularly where experimental approaches are less feasible. In this spirit, this paper contributes clarifications regarding RD applications by highlighting administrative sorting, a relatively overlooked threat to validity that can invalidate an RD design or be misinterpreted as manipulation, leading to the discarding of valuable and feasible RD approaches.

Ethical approval: The conducted research is not related to either human or animal use.

Conflict of interest: The author states no conflict of interest.

References

[1] Australian Aid. (2011). Targeting the Poorest: An Assessment of the Proxy Means Test Methodology. https://www.unicef.org/socialpolicy/files/targeting-poorest.pdf

[2] Barreca, A., Lindo, J., & Waddell, G. (2016). Heaping-Induced Bias in Regression Discontinuity Designs. Economic Inquiry, 54(1), 268-293.

[3] Brown, C., Ravallion, M., & Van de Walle, D. (2018). A Poor Means Test? Econometric Targeting in Africa. Journal of Development Economics, 134, 109-124.

[4] Calonico, S., Cattaneo, M., Farrell, M., & Titiunik, R. (2017). rdrobust: Software for Regression Discontinuity Designs. Stata Journal, 17(2), 372-404.

[5] Calonico, S., Cattaneo, M., Farrell, M., & Titiunik, R. (2019). Regression Discontinuity Designs Using Covariates. Review of Economics and Statistics, 101(3), 442-451.

[6] Calonico, S., Cattaneo, M., & Titiunik, R. (2014). Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs. Econometrica, 82(6), 2295-2326.

[7] Calonico, S., Cattaneo, M., & Titiunik, R. (2015). Optimal Data-Driven Regression Discontinuity Plots. Journal of the American Statistical Association, 110(512), 1753-1769.

[8] Camacho, A., & Conover, E. (2011). Manipulation of Social Program Eligibility. American Economic Journal: Economic Policy, 3(2), 41-65.

[9] Cattaneo, M., & Escanciano, J. C. (2017). Introduction: Regression Discontinuity Designs. In M. Cattaneo & J. C. Escanciano (Eds.), Advances in Econometrics (Vol. 38, pp. i-xxv). Bingley, UK: Emerald Publishing Limited.

[10] Cattaneo, M., Frandsen, B., & Titiunik, R. (2015). Randomization Inference in the Regression Discontinuity Design: An Application to Party Advantages in the U.S. Senate. Journal of Causal Inference, 3(1), 1-24.

[11] Cattaneo, M., Idrobo, N., & Titiunik, R. (2020a). A Practical Introduction to Regression Discontinuity Designs: Foundations. Cambridge Elements: Quantitative and Computational Methods for Social Science. Cambridge University Press.

[12] Cattaneo, M., Idrobo, N., & Titiunik, R. (2020b). A Practical Introduction to Regression Discontinuity Designs: Extensions. Cambridge Elements: Quantitative and Computational Methods for Social Science. Cambridge University Press.

[13] Cattaneo, M., Jansson, M., & Ma, X. (2018). Manipulation Testing Based on Density Discontinuity. Stata Journal, 18(1), 234-261.

[14] Cattaneo, M., Keele, L., Titiunik, R., & Vazquez-Bare, G. (2016). Interpreting Regression Discontinuity Designs with Multiple Cutoffs. Journal of Politics, 78(4), 1229-1248.

[15] Cattaneo, M., Titiunik, R., & Vazquez-Bare, G. (2017). Comparing Inference Approaches for RD Designs: A Reexamination of the Effect of Head Start on Child Mortality. Journal of Policy Analysis and Management, 36(3), 643-681.

[16] Cattaneo, M., Titiunik, R., & Vazquez-Bare, G. (2020). The Regression Discontinuity Design. In Handbook of Research Methods in Political Science and International Relations (Ch. 44, pp. 835-857). Sage Publications.

[17] Centro de Microdatos. (2012). Evaluación de Impacto del Programa Subsidio al Empleo Joven. http://www.dipres.gob.cl/594/articles-119350_doc_pdf.pdf

[18] Coady, D., Grosh, M., & Hoddinott, J. (2004). Targeting of Transfers in Developing Countries: Review of Lessons and Experience. http://siteresources.worldbank.org/SAFETYNETSANDTRANSFERS/Resources/281945-1138140795625/Targeting_En.pdf

[19] Comité de Expertos Ficha de Protección Social. (2010). Informe Final Comité de Expertos Ficha de Protección Social. http://www.ministeriodesarrollosocial.gob.cl/btca/txtcompleto/mideplan/c.e-fps-infinal.pdf

[20] Crespo, C. (2019). Cash for Grades or Money for Nothing? Evidence from Regression Discontinuity Designs. Social Policy Working Paper Series, The London School of Economics and Political Science, Department of Social Policy, London, UK. http://www.lse.ac.uk/social-policy/Assets/Documents/PDF/working-paper-series/04-19-Cristian-Crespo.pdf

[21] Focus Consultorías y Estudios. (2016). Evaluación de Impacto Subsidio Familiar y Asignación Familiar. http://www.dipres.gob.cl/595/articles-146449_informe_final.pdf

[22] Frandsen, B. (2017). Party Bias in Union Representation Elections: Testing for Manipulation in the Regression Discontinuity Design When the Running Variable Is Discrete. In M. D. Cattaneo & J. C. Escanciano (Eds.), Advances in Econometrics (Vol. 38, pp. 281-315). Bingley, UK: Emerald Publishing Limited.

[23] Grosh, M., & Baker, J. (1995). Proxy Means Tests for Targeting of Social Programs: Simulations and Speculation. http://documents.worldbank.org/curated/en/750401468776352539/pdf/multi-page.pdf

[24] Hahn, J., Todd, P., & Van der Klaauw, W. (2001). Identification and Estimation of Treatment Effects with a Regression-Discontinuity Design. Econometrica, 69(1), 201-209.

[25] Imbens, G. W., & Kalyanaraman, K. (2012). Optimal Bandwidth Choice for the Regression Discontinuity Estimator. Review of Economic Studies, 79(3), 933-959.

[26] Imbens, G. W., & Lemieux, T. (2008). Regression Discontinuity Designs: A Guide to Practice. Journal of Econometrics, 142(2), 615-635.

[27] Lee, D. (2008). Randomized Experiments from Non-Random Selection in U.S. House Elections. Journal of Econometrics, 142(2), 675-697.

[28] Lee, D., & Lemieux, T. (2010). Regression Discontinuity Designs in Economics. Journal of Economic Literature, 48(2), 281-355.

[29] McCrary, J. (2008). Manipulation of the Running Variable in the Regression Discontinuity Design: A Density Test. Journal of Econometrics, 142(2), 698-714.

[30] Opazo, V., Ormazabal, C., & Crespo, C. (2015). Informe Final de Evaluación: Beca de Apoyo a la Retención Escolar. http://www.dipres.gob.cl/597/articles-141242_informe_final.pdf

[31] Ponce, J., & Bedi, A. (2010). The Impact of a Cash Transfer Program on Cognitive Achievement: The Bono de Desarrollo Humano of Ecuador. Economics of Education Review, 29(1), 116-125.

[32] Sekhon, J., & Titiunik, R. (2017). On Interpreting the Regression Discontinuity Design as a Local Experiment. In M. D. Cattaneo & J. C. Escanciano (Eds.), Advances in Econometrics (Vol. 38, pp. 1-28). Bingley, UK: Emerald Publishing Limited.

Received: 2019-04-07
Accepted: 2020-07-18
Published Online: 2020-12-23

© 2020 C. Crespo, published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.
