Testing Firm Conduct with Cereal Data
[1]:
import numpy as np
import pandas as pd
import pyblp
import pyRVtest
pyblp.options.digits = 2
pyblp.options.verbose = False
pyRVtest.options.digits = 2
pyRVtest.__version__
[1]:
'0.2.0'
Overview
In this tutorial, we are going to use the Nevo (2000) fake cereal data which is provided in the PyBLP package. PyBLP has excellent documentation including a thorough tutorial for estimating demand on this dataset which can be found here.
Note that the data are originally designed to illustrate demand estimation. Thus, the following caveats should be kept in mind:
The application in this tutorial serves only to illustrate how pyRVtest works. The results we show should not be used to infer conduct or any other economic feature of the cereal industry.
To test conduct, a researcher generally needs data on cost shifters and strong excluded instruments. Because the data were designed for demand estimation, they do not necessarily have the features that are desirable for testing conduct in applications.
Hence, the specifications that we use below, including the specification of firm cost and the candidate conduct models, are shown for illustrative purposes only and may not be appropriate for the economic context of the cereal industry.
The tutorial proceeds in the following steps: loading the main dataset, estimating demand with PyBLP, and testing firm conduct with pyRVtest. We then turn to richer examples: testing with more than two models and more than one instrument set, testing with taxes, and storing results.
Load the main dataset
First we will use pandas to load the necessary datasets from PyBLP. We call the main data containing information on markets and product characteristics `product_data`. The important product characteristics for demand estimation in this data set are `prices`, `sugar`, and `mushy`.
[2]:
product_data = pd.read_csv(pyblp.data.NEVO_PRODUCTS_LOCATION)
product_data.head()
[2]:
market_ids | city_ids | quarter | product_ids | firm_ids | brand_ids | shares | prices | sugar | mushy | ... | demand_instruments10 | demand_instruments11 | demand_instruments12 | demand_instruments13 | demand_instruments14 | demand_instruments15 | demand_instruments16 | demand_instruments17 | demand_instruments18 | demand_instruments19 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | C01Q1 | 1 | 1 | F1B04 | 1 | 4 | 0.012417 | 0.072088 | 2 | 1 | ... | 2.116358 | -0.154708 | -0.005796 | 0.014538 | 0.126244 | 0.067345 | 0.068423 | 0.034800 | 0.126346 | 0.035484 |
1 | C01Q1 | 1 | 1 | F1B06 | 1 | 6 | 0.007809 | 0.114178 | 18 | 1 | ... | -7.374091 | -0.576412 | 0.012991 | 0.076143 | 0.029736 | 0.087867 | 0.110501 | 0.087784 | 0.049872 | 0.072579 |
2 | C01Q1 | 1 | 1 | F1B07 | 1 | 7 | 0.012995 | 0.132391 | 4 | 1 | ... | 2.187872 | -0.207346 | 0.003509 | 0.091781 | 0.163773 | 0.111881 | 0.108226 | 0.086439 | 0.122347 | 0.101842 |
3 | C01Q1 | 1 | 1 | F1B09 | 1 | 9 | 0.005770 | 0.130344 | 3 | 0 | ... | 2.704576 | 0.040748 | -0.003724 | 0.094732 | 0.135274 | 0.088090 | 0.101767 | 0.101777 | 0.110741 | 0.104332 |
4 | C01Q1 | 1 | 1 | F1B11 | 1 | 11 | 0.017934 | 0.154823 | 12 | 0 | ... | 1.261242 | 0.034836 | -0.000568 | 0.102451 | 0.130640 | 0.084818 | 0.101075 | 0.125169 | 0.133464 | 0.121111 |
5 rows × 30 columns
It is possible to estimate demand and test conduct with other data, provided that they have the `product_data` structure described here.
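As a quick sanity check before building a problem with your own data, you could verify that the basic columns the formulations rely on are present. This check is only illustrative, not part of the pyRVtest API, and the toy dataset below is made up:

```python
import pandas as pd

# Toy stand-in for a researcher's own dataset (values are made up)
custom_data = pd.DataFrame({
    'market_ids': ['m1', 'm1'],
    'firm_ids': [1, 2],
    'shares': [0.2, 0.3],
    'prices': [1.5, 2.0],
})

# Columns that the demand and cost formulations in this tutorial rely on
required = {'market_ids', 'firm_ids', 'shares', 'prices'}
missing = required - set(custom_data.columns)
print(sorted(missing))  # an empty list means the basic structure is present
```

Any cost shifters, instruments, or fixed-effect identifiers named in your formulations would need to be columns of the same data frame.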
We will also load market demographics data from PyBLP and call this `agent_data`. This data contains draws of `income`, `income_squared`, `age`, and `child`.
[3]:
agent_data = pd.read_csv(pyblp.data.NEVO_AGENTS_LOCATION)
agent_data.head()
[3]:
market_ids | city_ids | quarter | weights | nodes0 | nodes1 | nodes2 | nodes3 | income | income_squared | age | child | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | C01Q1 | 1 | 1 | 0.05 | 0.434101 | -1.500838 | -1.151079 | 0.161017 | 0.495123 | 8.331304 | -0.230109 | -0.230851 |
1 | C01Q1 | 1 | 1 | 0.05 | -0.726649 | 0.133182 | -0.500750 | 0.129732 | 0.378762 | 6.121865 | -2.532694 | 0.769149 |
2 | C01Q1 | 1 | 1 | 0.05 | -0.623061 | -0.138241 | 0.797441 | -0.795549 | 0.105015 | 1.030803 | -0.006965 | -0.230851 |
3 | C01Q1 | 1 | 1 | 0.05 | -0.041317 | 1.257136 | -0.683054 | 0.259044 | -1.485481 | -25.583605 | -0.827946 | 0.769149 |
4 | C01Q1 | 1 | 1 | 0.05 | -0.466691 | 0.226968 | 1.044424 | 0.092019 | -0.316597 | -6.517009 | -0.230109 | -0.230851 |
Estimate demand with PyBLP
Next, we set up the demand estimation problem using Conlon and Gortmaker (2020).
In this example the linear characteristics in consumer preferences are price and product fixed effects. Nonlinear characteristics with random coefficients include a constant, `prices`, `sugar`, and `mushy`. For these variables, we are going to estimate both the variance of the random coefficient as well as demographic interactions for `income`, `income_squared`, `age`, and `child`. More info can be found here.
Running the problem setup yields output which summarizes the dimensions of the problem (see here for a description of each variable). Also reported are the Formulations, i.e., the linear and nonlinear characteristics for which we are estimating either variances or demographic interactions, and a list of the demographics being used.
[4]:
pyblp_problem = pyblp.Problem(
product_formulations=(
pyblp.Formulation('0 + prices ', absorb='C(product_ids)'),
pyblp.Formulation('1 + prices + sugar + mushy'),
),
agent_formulation=pyblp.Formulation('0 + income + income_squared + age + child'),
product_data=product_data,
agent_data=agent_data
)
pyblp_problem
[4]:
Dimensions:
=================================================
T N F I K1 K2 D MD ED
--- ---- --- ---- ---- ---- --- ---- ----
94 2256 5 1880 1 4 4 20 1
=================================================
Formulations:
===================================================================
Column Indices: 0 1 2 3
----------------------------- ------ -------------- ----- -----
X1: Linear Characteristics prices
X2: Nonlinear Characteristics 1 prices sugar mushy
d: Demographics income income_squared age child
===================================================================
Next, we solve the demand estimation and store the results for use with the testing module.
The output includes information on computation progress, as well as tables with the parameter estimates. A full list of the post-estimation output which can be queried is found here.
[5]:
pyblp_results = pyblp_problem.solve(
sigma=np.diag([0.3302, 2.4526, 0.0163, 0.2441]),
pi=[
[5.4819, 0.0000, 0.2037, 0.0000],
[15.8935, -1.2000, 0.0000, 2.6342],
[-0.2506, 0.0000, 0.0511, 0.0000],
[1.2650, 0.0000, -0.8091, 0.0000]
],
method='1s',
optimization=pyblp.Optimization('bfgs', {'gtol': 1e-5})
)
pyblp_results
[5]:
Problem Results Summary:
=======================================================================================================
GMM Objective Gradient Hessian Hessian Clipped Weighting Matrix Covariance Matrix
Step Value Norm Min Eigenvalue Max Eigenvalue Shares Condition Number Condition Number
---- --------- -------- -------------- -------------- ------- ---------------- -----------------
1 +4.6E+00 +6.9E-06 +2.4E-05 +1.6E+04 0 +6.9E+07 +8.4E+08
=======================================================================================================
Cumulative Statistics:
===========================================================================
Computation Optimizer Optimization Objective Fixed Point Contraction
Time Converged Iterations Evaluations Iterations Evaluations
----------- --------- ------------ ----------- ----------- -----------
00:00:25 Yes 51 57 46384 143967
===========================================================================
Nonlinear Coefficient Estimates (Robust SEs in Parentheses):
=====================================================================================================================
Sigma: 1 prices sugar mushy | Pi: income income_squared age child
------ ---------- ---------- ---------- ---------- | ------ ---------- -------------- ---------- ----------
1 +5.6E-01 | 1 +2.3E+00 +0.0E+00 +1.3E+00 +0.0E+00
(+1.6E-01) | (+1.2E+00) (+6.3E-01)
|
prices +0.0E+00 +3.3E+00 | prices +5.9E+02 -3.0E+01 +0.0E+00 +1.1E+01
(+1.3E+00) | (+2.7E+02) (+1.4E+01) (+4.1E+00)
|
sugar +0.0E+00 +0.0E+00 -5.8E-03 | sugar -3.8E-01 +0.0E+00 +5.2E-02 +0.0E+00
(+1.4E-02) | (+1.2E-01) (+2.6E-02)
|
mushy +0.0E+00 +0.0E+00 +0.0E+00 +9.3E-02 | mushy +7.5E-01 +0.0E+00 -1.4E+00 +0.0E+00
(+1.9E-01) | (+8.0E-01) (+6.7E-01)
=====================================================================================================================
Beta Estimates (Robust SEs in Parentheses):
==========
prices
----------
-6.3E+01
(+1.5E+01)
==========
Test firm conduct with pyRVtest
pyRVtest follows a similar structure to PyBLP. First, you set up the testing problem, then you run the test.
Setting Up Testing Problem
Here is a simple example of the code to set up the testing problem for testing between two models:
1. manufacturers set retail prices according to Bertrand, vs.
2. manufacturers set retail prices according to monopoly (i.e., perfect collusion).
We set up the testing problem with `pyRVtest.Problem` and store it as the variable `testing_problem`:
[6]:
testing_problem = pyRVtest.Problem(
cost_formulation = (
pyRVtest.Formulation('0 + sugar', absorb = 'C(firm_ids)' )
),
instrument_formulation = (
pyRVtest.Formulation('0 + demand_instruments0 + demand_instruments1')
),
model_formulations = (
pyRVtest.ModelFormulation(model_downstream='bertrand', ownership_downstream='firm_ids'),
pyRVtest.ModelFormulation(model_downstream='monopoly', ownership_downstream='firm_ids')
),
product_data = product_data,
demand_results = pyblp_results
)
Detailed information on the inputs accepted by Problem and how to specify them can be found in the API documentation. Below we clarify the inputs (observed exogenous cost shifters, instruments for testing conduct, and models to be tested) in this particular example:
`cost_formulation`: Here, the researcher specifies the observable linear shifters of marginal cost (in the current version of the package, these must be exogenous variables). In this example, we have defined the cost formulation as `pyRVtest.Formulation('0 + sugar', absorb='C(firm_ids)')`. Here, `0` means no constant. We are also including the variable `sugar` as an observed cost shifter, and this variable must be in the `product_data`. Finally, `absorb='C(firm_ids)'` specifies that we are including firm fixed effects, which will be absorbed. The variable `firm_ids` must also be in the `product_data`.
`instrument_formulation`: Here, the researcher specifies the set of instruments she wants to use to test conduct. In this example, we will use one set of instruments to test conduct, which contains two variables, `demand_instruments0` and `demand_instruments1`. Thus, we have defined the instrument formulation as `pyRVtest.Formulation('0 + demand_instruments0 + demand_instruments1')`. Here, `0` means no constant, and this should always be specified as 0. Both `demand_instruments0` and `demand_instruments1` must also be in the `product_data`. It is possible to test conduct separately for more than one set of instruments, as shown in the example below.
`model_formulations`: Here, we have specified two ModelFormulations and therefore two models to test. The first model is specified as `pyRVtest.ModelFormulation(model_downstream='bertrand', ownership_downstream='firm_ids')`. `model_downstream='bertrand'` indicates that retail prices are set according to Bertrand. `ownership_downstream='firm_ids'` specifies that the ownership matrix in each market should be built from the variable `firm_ids` in the `product_data`. For testing more than two models, see the example below. To find the full list of models supported by pyRVtest and their associated ModelFormulation, see the Library of Models page.
[7]:
testing_problem
[7]:
Dimensions:
=========================
T N M L d_z0
--- ---- --- --- ----
94 2256 2 1 2
=========================
Formulations:
==========================================================
Column Indices: 0 1
---------------- ------------------- -------------------
w: Marginal Cost sugar
z0: Instruments demand_instruments0 demand_instruments1
==========================================================
Models:
=========================================
0 1
--------------------- -------- --------
Model - Downstream bertrand monopoly
Model - Upstream None None
Firm IDs - Downstream firm_ids monopoly
Firm IDs - Upstream None None
VI Index None None
Cost Scaling Column None None
Unit Tax None None
Advalorem Tax None None
Advalorem Payer None None
User Supplied Markups None None
=========================================
The first table `Dimensions` reports the following statistics:
\(T\) = number of markets
\(N\) = number of observations
\(M\) = number of models (each specified by a model formulation)
\(L\) = number of instrument sets (each specified by an instrument formulation)
\(d\_z_0\) = number of instruments in the first instrument set (with more than one instrument formulation, additional columns \(d\_z_1\), \(d\_z_2\), … \(d\_z_{(L-1)}\) would be reported)
The second table `Formulations` reports the variables specified as observed cost shifters and excluded instruments. The first row indicates that `sugar` is the only included observed cost shifter (ignoring the fixed effects). The second row indicates that `demand_instruments0` and `demand_instruments1` are the excluded instruments for testing each model.
The third table `Models` specifies the models being tested, where each model is a column in the table:
\(\textbf{Model-Downstream}\) reports the name of the model of firm conduct. For vertical models of wholesaler and retailer behavior, this reports the nature of retailer conduct.
\(\textbf{Model-Upstream}\) reports, for vertical models with wholesalers and retailers, the nature of wholesaler conduct. In this example, we are ignoring upstream behavior and assuming manufacturers set retail prices directly as in Nevo (2001).
\(\textbf{Firm IDs - Downstream}\) is the variable in `product_data` used to make the ownership matrix for setting retail conduct (prices or quantities). If monopoly is specified as Model-Downstream, then Firm IDs - Downstream will default to monopoly and the ownership matrix in each market will be a matrix of ones.
\(\textbf{Firm IDs - Upstream}\) is the same as \(\textbf{Firm IDs - Downstream}\), but for wholesale price or quantity behavior.
\(\textbf{VI Index}\) is the name of the dummy variable indicating whether the retailer and wholesaler are vertically integrated.
\(\textbf{User Supplied Markups}\) indicates whether the user has chosen to input their own markup computation instead of choosing a prespecified model for which the package will compute markups.
Additionally, the table contains outputs that are relevant when taxes are levied in the markets being studied (\(\textbf{Unit Tax}\), \(\textbf{Advalorem Tax}\), \(\textbf{Advalorem Payer}\)). In this example, there are no taxes in the market for cereal. These will be discussed in the testing with taxes example below.
Running the Testing Procedure
Now that the problem is set up, we can run the test, which we do with the following code.
Given that we defined the variable `testing_problem` as a `pyRVtest.Problem`, we write `testing_problem.solve` in the first line. There are two user-specified options in running the test:
`demand_adjustment`: `False` indicates that the user does not want to adjust standard errors to account for two-step estimation with demand. `True` indicates that standard errors should be adjusted to account for demand estimation.
`clustering_adjustment`: `False` means no clustering. `True` indicates that all standard errors should be clustered. In this case, a variable called `clustering_ids`, which indicates the cluster to which each group belongs, needs to appear in the `product_data`. See the example below.
Both of these adjustments are implemented according to the formulas in Appendix C of Duarte, Magnolfi, Sølvsten, and Sullivan (2023).
[8]:
testing_results = testing_problem.solve(
demand_adjustment=False,
clustering_adjustment=False
)
testing_results
Computing Markups ...
Total Time is ... 0.1552119255065918
[8]:
Testing Results - Instruments z0:
=============================================================================
TRV: | F-stats: | MCS:
-------- --- ------ | ---------- --- ------- | -------- ------------
models 0 1 | models 0 1 | models MCS p-values
-------- --- ------ | ---------- --- ------- | -------- ------------
0 nan -1.144 | 0 nan 13.3 | 0 1.0
| *** ^^^ |
1 nan nan | 1 nan nan | 1 0.252
| |
=============================================================================
Significance of size and power diagnostic reported below each F-stat
*, **, or *** indicate that F > cv for a worst-case size of 0.125, 0.10, and 0.075 given d_z and rho
^, ^^, or ^^^ indicate that F > cv for a best-case power of 0.50, 0.75, and 0.95 given d_z and rho
appropriate critical values for size are stored in the variable F_cv_size_list of the pyRVtest results class
appropriate critical values for power are stored in the variable F_cv_power_list of the pyRVtest results class
===========================================================================
The result table is split into three parts: the pairwise RV test statistics, the pairwise F-statistics, and the model confidence set p-values. Details on these objects and how they are computed can be found in Duarte, Magnolfi, Sølvsten, and Sullivan (2023).
The first part of the table reports the pairwise RV test statistic given the specified adjustments to the standard errors. In this example, there is one RV test statistic as we are only testing two models. Elements on and below the main diagonal in the RV and F-statistic block are set to “nan” since both RV tests and F-statistics are symmetric for a pair of models. A negative test statistic suggests better fit of the row model, in this case, Bertrand. The critical values for each \(T^{RV}\) are \(\pm\) 1.96, so the RV test cannot reject the null of equal fit of the two models.
The second part reports the pairwise F-statistics, again with the specified adjustments to the standard errors. The symbols underneath each F-statistic indicate whether the corresponding F-statistic is weak for size or for power. The appropriate critical values for each F-statistic depend on the number of instruments the researcher is using, as well as the value of the scaling parameter \(\rho\). In the above table, we see that the F-statistic is associated with the symbol "***" for size. This means that the probability of incorrectly rejecting the null when it is true (Type-I error) is at most 7.5%. The same F-statistic is associated with "^^^" for power, meaning that the probability of rejecting the null when it is false is at least 95%. Therefore the instruments in this example are weak for neither size nor power.
Finally, the p-values associated with the model confidence set are reported. To interpret these \(p\)-values, a researcher chooses a significance level \(\alpha\). Any model whose \(p\)-value is below \(\alpha\) is rejected. The models with \(p\)-values above \(\alpha\) cannot be rejected by the instruments and the researcher should conclude that they have equal fit. Thus, in this example, for an \(\alpha = 0.05\), the researcher is left with a model confidence set containing both models.
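This selection rule can be sketched in a few lines. The model labels and p-values below simply mirror the table above; they are not pulled from the package:

```python
# MCS p-values as reported in the table above, keyed by illustrative labels
mcs_pvalues = {'bertrand': 1.0, 'monopoly': 0.252}
alpha = 0.05  # significance level chosen by the researcher

# Models with p-values below alpha are rejected; the rest form the MCS
model_confidence_set = sorted(m for m, p in mcs_pvalues.items() if p >= alpha)
print(model_confidence_set)
```

With \(\alpha = 0.05\), neither p-value falls below the threshold, so both models remain in the set.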
The testing procedure also stores additional output which the user can access after running the testing code. A full list of available output can be found in ProblemResults.
As an example, the procedure stores the markups for each model. In the above code, we stored the output of `testing_problem.solve()` as the variable `testing_results`. Thus, to access the markups, you type:
[9]:
testing_results.markups
[9]:
[array([[0.03616274],
[0.02752501],
[0.04300875],
...,
[0.03948227],
[0.02837644],
[0.04313683]]),
array([[0.07839009],
[0.04966324],
[0.08338118],
...,
[0.08728693],
[0.05866822],
[0.09141367]])]
where the first array, `testing_results.markups[0]`, stores the markups for model 0 and the second array, `testing_results.markups[1]`, stores the markups for model 1.
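For instance, one could compare the average markup implied by each model with NumPy. The arrays below are made-up stand-ins with the same column-vector shape as the entries of `testing_results.markups`:

```python
import numpy as np

# Stand-ins for testing_results.markups: one column vector per model
markups = [
    np.array([[0.036], [0.028], [0.043]]),  # model 0 (Bertrand in this example)
    np.array([[0.078], [0.050], [0.083]]),  # model 1 (monopoly in this example)
]

# Mean markup implied by each candidate model of conduct
mean_markups = [float(m.mean()) for m in markups]
```

Comparing such summaries across models is a useful first look at how different the conduct assumptions are in economic terms; here the monopoly model implies larger markups than Bertrand, as expected.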
Example with more than two models and more than one instrument set
Here we consider a more complicated example to illustrate more of the features of pyRVtest. We are going to test five models of vertical conduct:
1. manufacturers set monopoly retail prices
2. manufacturers set Bertrand retail prices
3. manufacturers set Cournot retail quantities
4. manufacturers set Bertrand wholesale prices and retailers set monopoly retail prices
5. manufacturers set monopoly wholesale prices and retailers set monopoly retail prices
To accumulate evidence, we are going to separately use two different sets of instruments to test these models.
We are also going to adjust all standard errors to account for two-step estimation coming from demand, as well as cluster standard errors at the market level. To implement these adjustments, we need to add a variable to the `product_data` called `clustering_ids`:
[10]:
product_data['clustering_ids'] = product_data.market_ids
Next, we can initialize the testing problem.
Notice that to add more models or more instrument sets, we simply add model formulations and instrument formulations. Further notice that by specifying `demand_adjustment=True` and `clustering_adjustment=True` we turn on the two-step adjustments to the standard errors as well as clustering at the level indicated by `product_data.clustering_ids`.
[11]:
testing_problem = pyRVtest.Problem(
cost_formulation=(
pyRVtest.Formulation('1 + sugar', absorb='C(firm_ids)')
),
instrument_formulation=(
pyRVtest.Formulation('0 + demand_instruments0 + demand_instruments1'),
pyRVtest.Formulation('0 + demand_instruments2 + demand_instruments3 + demand_instruments4')
),
model_formulations=(
pyRVtest.ModelFormulation(model_downstream='monopoly', ownership_downstream='firm_ids'),
pyRVtest.ModelFormulation(model_downstream='bertrand', ownership_downstream='firm_ids'),
pyRVtest.ModelFormulation(model_downstream='cournot', ownership_downstream='firm_ids'),
pyRVtest.ModelFormulation(
model_downstream='monopoly',
ownership_downstream='firm_ids',
model_upstream='bertrand',
ownership_upstream='firm_ids'
),
pyRVtest.ModelFormulation(
model_downstream='monopoly',
ownership_downstream='firm_ids',
model_upstream='monopoly',
ownership_upstream='firm_ids'
)
),
product_data=product_data,
demand_results=pyblp_results
)
testing_problem
[11]:
Dimensions:
===============================
T N M L d_z0 d_z1
--- ---- --- --- ---- ----
94 2256 5 2 2 3
===============================
Formulations:
===============================================================================
Column Indices: 0 1 2
---------------- ------------------- ------------------- -------------------
w: Marginal Cost sugar
z0: Instruments demand_instruments0 demand_instruments1
z1: Instruments demand_instruments2 demand_instruments3 demand_instruments4
===============================================================================
Models:
=======================================================================
0 1 2 3 4
--------------------- -------- -------- -------- -------- --------
Model - Downstream monopoly bertrand cournot monopoly monopoly
Model - Upstream None None None bertrand monopoly
Firm IDs - Downstream monopoly firm_ids firm_ids monopoly monopoly
Firm IDs - Upstream None None None firm_ids monopoly
VI Index None None None None None
Cost Scaling Column None None None None None
Unit Tax None None None None None
Advalorem Tax None None None None None
Advalorem Payer None None None None None
User Supplied Markups None None None None None
=======================================================================
Now, we can run `testing_problem.solve` to output the testing results. Each table below shows the results for one instrument set, and the tables appear in the order of the instrument sets specified by the user in `instrument_formulations`.
[12]:
testing_results = testing_problem.solve(
demand_adjustment=True,
clustering_adjustment=True
)
testing_results
Computing Markups ...
Total Time is ... 19.744781017303467
[12]:
Testing Results - Instruments z0:
=====================================================================================================================
TRV: | F-stats: | MCS:
-------- --- ---- ----- ------ ------ | ---------- --- ----- ----- ----- ----- | -------- ------------
models 0 1 2 3 4 | models 0 1 2 3 4 | models MCS p-values
-------- --- ---- ----- ------ ------ | ---------- --- ----- ----- ----- ----- | -------- ------------
0 nan 0.49 0.618 -0.006 -1.339 | 0 nan 2.8 2.4 0.0 0.6 | 0 0.801
| *** *** *** *** |
1 nan nan 0.294 -0.028 -1.402 | 1 nan nan 4.6 0.0 0.3 | 1 0.932
| *** *** *** |
2 nan nan nan -0.03 -1.406 | 2 nan nan nan 0.0 0.3 | 2 1.0
| *** *** |
3 nan nan nan nan -0.444 | 3 nan nan nan nan 0.1 | 3 0.976
| *** |
4 nan nan nan nan nan | 4 nan nan nan nan nan | 4 0.418
| |
=====================================================================================================================
Testing Results - Instruments z1:
=======================================================================================================================
TRV: | F-stats: | MCS:
-------- --- ----- ----- ------ ------ | ---------- --- ----- ----- ----- ------ | -------- ------------
models 0 1 2 3 4 | models 0 1 2 3 4 | models MCS p-values
-------- --- ----- ----- ------ ------ | ---------- --- ----- ----- ----- ------ | -------- ------------
0 nan 0.047 0.243 -0.248 -1.601 | 0 nan 2.6 2.4 0.0 0.8 | 0 0.808
| *** *** *** *** |
1 nan nan 1.663 -0.219 -1.523 | 1 nan nan 3.3 0.1 0.6 | 1 0.183
| *** *** *** ^^ |
2 nan nan nan -0.25 -1.537 | 2 nan nan nan 0.0 0.6 | 2 1.0
| *** *** ^ |
3 nan nan nan nan -1.971 | 3 nan nan nan nan 2.5 | 3 0.913
| *** |
4 nan nan nan nan nan | 4 nan nan nan nan nan | 4 0.128
| |
=======================================================================================================================
Significance of size and power diagnostic reported below each F-stat
*, **, or *** indicate that F > cv for a worst-case size of 0.125, 0.10, and 0.075 given d_z and rho
^, ^^, or ^^^ indicate that F > cv for a best-case power of 0.50, 0.75, and 0.95 given d_z and rho
appropriate critical values for size are stored in the variable F_cv_size_list of the pyRVtest results class
appropriate critical values for power are stored in the variable F_cv_power_list of the pyRVtest results class
=====================================================================================================================
This output has a similar format to Table 5 in Duarte, Magnolfi, Sølvsten, and Sullivan (2023). Each testing results panel, corresponding to a set of instruments, reports in three separate blocks the RV test statistics for each pair of models, effective F-statistic for each pair of models, and MCS p-value for the row model. See the above description for an explanation of each.
In interpreting these results, note that instrument sets z0 and z1 contain 2 and 3 instruments respectively. Since the number of instruments is between 2 and 9, there are no size distortions above 0.025 for any model pair, and the instruments are strong for size. However, the instruments are weak for power, as each pairwise F-statistic is below the critical value corresponding to a best-case power of 0.95. Unsurprisingly then, no model is ever rejected from the MCS. These results reflect the fact that the data used in this tutorial do not provide the appropriate variation to test conduct. As such, the results should not be interpreted as drawing any conclusion about the nature of conduct in this empirical environment. We plan to include a more economically interesting example in future releases.
Testing with Taxes
Suppose now that we want to run the testing procedure in a setting with taxes (specifically ad valorem or unit taxes). Having taxes in the testing setup changes the first order conditions faced by the firms. To incorporate them using pyRVtest, we can specify additional options in the ModelFormulation:
`unit_tax`: a vector of unit taxes, measured in the same units as price, which should correspond to a data column in the `product_data`.
`advalorem_tax`: a vector of ad valorem tax rates, measured between 0 and 1, which should correspond to a data column in the `product_data`.
`advalorem_payer`: the party responsible for paying the ad valorem tax. In our example this is the consumer (other options are 'firm' and 'None').
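As back-of-the-envelope intuition for how these taxes drive a wedge between the price the firm receives and the price the consumer pays, consider the arithmetic below. It is only illustrative and is not the package's internal first order conditions:

```python
# Made-up numbers matching the illustrative tax data added below
firm_price = 2.0      # price received by the firm
advalorem_tax = 0.5   # ad valorem rate between 0 and 1, paid by the consumer
unit_tax = 0.5        # per-unit tax, in the same units as price

# When the consumer pays the ad valorem tax, the price they face is
consumer_price = firm_price * (1 + advalorem_tax) + unit_tax
print(consumer_price)  # 3.5
```

It is this wedge that alters the firms' first order conditions, and hence the markups implied by each candidate model of conduct.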
Here, we add ad valorem and unit tax data to the `product_data` for illustrative purposes.
[13]:
# additional variables for tax testing
product_data['unit_tax'] = .5
product_data['advalorem_tax'] = .5
product_data['lambda'] = 1
Next, initialize the testing problem. Note here that the first model formulation now specifies our additional variables.
[14]:
testing_problem_taxes = pyRVtest.Problem(
cost_formulation=(
pyRVtest.Formulation('0 + sugar', absorb='C(firm_ids)')
),
instrument_formulation=(
pyRVtest.Formulation('0 + demand_instruments0 + demand_instruments1')
),
model_formulations=(
pyRVtest.ModelFormulation(
model_downstream='bertrand',
ownership_downstream='firm_ids',
cost_scaling='lambda',
unit_tax='unit_tax',
advalorem_tax='advalorem_tax',
advalorem_payer='consumer'),
pyRVtest.ModelFormulation(
model_downstream='bertrand',
ownership_downstream='firm_ids'
),
pyRVtest.ModelFormulation(
model_downstream='perfect_competition',
ownership_downstream='firm_ids'
)
),
product_data=product_data,
demand_results=pyblp_results
)
testing_problem_taxes
[14]:
Dimensions:
=========================
T N M L d_z0
--- ---- --- --- ----
94 2256 3 1 2
=========================
Formulations:
==========================================================
Column Indices: 0 1
---------------- ------------------- -------------------
w: Marginal Cost sugar
z0: Instruments demand_instruments0 demand_instruments1
==========================================================
Models:
===================================================================
0 1 2
--------------------- ------------- -------- -------------------
Model - Downstream bertrand bertrand perfect_competition
Model - Upstream None None None
Firm IDs - Downstream firm_ids firm_ids firm_ids
Firm IDs - Upstream None None None
VI Index None None None
Cost Scaling Column lambda None None
Unit Tax unit_tax None None
Advalorem Tax advalorem_tax None None
Advalorem Payer consumer None None
User Supplied Markups None None None
===================================================================
As mentioned above, the Models table includes additional details related to testing with taxes:
Unit Tax: the column in `product_data` corresponding to the unit tax
Advalorem Tax: the column in `product_data` corresponding to the ad valorem tax
Advalorem Payer: who is responsible for paying the tax
Finally, output the testing results.
[15]:
testing_results = testing_problem_taxes.solve(
demand_adjustment=False,
clustering_adjustment=False
)
testing_results
Computing Markups ...
Total Time is ... 0.17354202270507812
[15]:
Testing Results - Instruments z0:
=========================================================================================
TRV: | F-stats: | MCS:
-------- --- ------ ------ | ---------- --- ---- ----- | -------- ------------
models 0 1 2 | models 0 1 2 | models MCS p-values
-------- --- ------ ------ | ---------- --- ---- ----- | -------- ------------
0 nan -1.936 -2.059 | 0 nan -0.0 0.3 | 0 1.0
| *** |
1 nan nan -1.53 | 1 nan nan 4.4 | 1 0.053
| *** |
2 nan nan nan | 2 nan nan nan | 2 0.078
| |
=========================================================================================
Significance of size and power diagnostic reported below each F-stat
*, **, or *** indicate that F > cv for a worst-case size of 0.125, 0.10, and 0.075 given d_z and rho
^, ^^, or ^^^ indicate that F > cv for a best-case power of 0.50, 0.75, and 0.95 given d_z and rho
appropriate critical values for size are stored in the variable F_cv_size_list of the pyRVtest results class
appropriate critical values for power are stored in the variable F_cv_power_list of the pyRVtest results class
=======================================================================================
Storing Results
If you are working on a problem that runs over the course of several days, PyBLP and pyRVtest make it easy for you to store your results.
For example, suppose that you want to estimate one demand system and then run multiple testing scenarios. In this case, you can simply use the PyBLP pickling method.
[16]:
pyblp_results.to_pickle('my_demand_estimates')
And read them in similarly:
[17]:
old_pyblp_results = pyblp.read_pickle('my_demand_estimates')
You can now use these demand estimates to run additional iterations of testing without having to re-estimate demand each time.
Additionally, if you want to save your testing results so that you can access them in the future, you can do so with the same method in pyRVtest.
[18]:
testing_results.to_pickle('my_testing_results')
And read them back in to analyze:
[19]:
old_testing_results = pyRVtest.read_pickle('my_testing_results')
old_testing_results.markups
[19]:
[array([[0.03616274],
[0.02752501],
[0.04300875],
...,
[0.03948227],
[0.02837644],
[0.04313683]]),
array([[0.03616274],
[0.02752501],
[0.04300875],
...,
[0.03948227],
[0.02837644],
[0.04313683]]),
array([[0.],
[0.],
[0.],
...,
[0.],
[0.],
[0.]])]