Testing Firm Conduct with Cereal Data

[1]:
import numpy as np
import pandas as pd
import pyblp
import pyRVtest

pyblp.options.digits = 2
pyblp.options.verbose = False
pyRVtest.options.digits = 2
pyRVtest.__version__
[1]:
'0.2.0'

Overview

In this tutorial, we are going to use the Nevo (2000) fake cereal data which is provided in the PyBLP package. PyBLP has excellent documentation including a thorough tutorial for estimating demand on this dataset which can be found here.

Note that the data are originally designed to illustrate demand estimation. Thus, the following caveats should be kept in mind:

  • The application in this tutorial serves only to illustrate how pyRVtest works. The results we show should not be used to infer conduct or any other economic feature of the cereal industry.

  • To test conduct, a researcher generally needs both data on cost shifters and strong excluded instruments. Because the data were designed for demand estimation, they do not necessarily have the features that are desirable for testing conduct in applications.

  • Hence, the specifications we use below, including the specification of firm costs and the candidate conduct models, are shown for illustrative purposes only and may not be appropriate for the economic context of the cereal industry.

The tutorial proceeds in the following steps: loading the main dataset, estimating demand with PyBLP, and testing firm conduct with pyRVtest.

Load the main dataset

First we will use pandas to load the necessary datasets from PyBLP. We call the main data containing information on markets and product characteristics product_data. The important product characteristics for demand estimation in this data set are price, sugar, and mushy.

[2]:
product_data = pd.read_csv(pyblp.data.NEVO_PRODUCTS_LOCATION)
product_data.head()
[2]:
market_ids city_ids quarter product_ids firm_ids brand_ids shares prices sugar mushy ... demand_instruments10 demand_instruments11 demand_instruments12 demand_instruments13 demand_instruments14 demand_instruments15 demand_instruments16 demand_instruments17 demand_instruments18 demand_instruments19
0 C01Q1 1 1 F1B04 1 4 0.012417 0.072088 2 1 ... 2.116358 -0.154708 -0.005796 0.014538 0.126244 0.067345 0.068423 0.034800 0.126346 0.035484
1 C01Q1 1 1 F1B06 1 6 0.007809 0.114178 18 1 ... -7.374091 -0.576412 0.012991 0.076143 0.029736 0.087867 0.110501 0.087784 0.049872 0.072579
2 C01Q1 1 1 F1B07 1 7 0.012995 0.132391 4 1 ... 2.187872 -0.207346 0.003509 0.091781 0.163773 0.111881 0.108226 0.086439 0.122347 0.101842
3 C01Q1 1 1 F1B09 1 9 0.005770 0.130344 3 0 ... 2.704576 0.040748 -0.003724 0.094732 0.135274 0.088090 0.101767 0.101777 0.110741 0.104332
4 C01Q1 1 1 F1B11 1 11 0.017934 0.154823 12 0 ... 1.261242 0.034836 -0.000568 0.102451 0.130640 0.084818 0.101075 0.125169 0.133464 0.121111

5 rows × 30 columns

It is possible to estimate demand and test conduct with other data, provided that they have the product_data structure described here.
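For reference, here is a minimal sketch of what such a dataset might look like. The values and the name my_product_data below are made up for illustration; the column names mirror the Nevo data shown above (market and firm identifiers, shares, prices, a cost shifter, and excluded instruments):

import pandas as pd

# Hypothetical example of the product_data structure (values are made up).
my_product_data = pd.DataFrame({
    'market_ids': ['m1', 'm1', 'm2', 'm2'],        # one identifier per market
    'firm_ids': [1, 2, 1, 2],                      # used to build ownership matrices
    'shares': [0.12, 0.08, 0.10, 0.09],            # observed market shares
    'prices': [1.5, 1.8, 1.4, 1.9],                # prices
    'sugar': [2, 18, 4, 3],                        # characteristic / cost shifter
    'demand_instruments0': [0.3, -0.2, 0.1, 0.5],  # excluded instruments
    'demand_instruments1': [1.1, 0.4, -0.6, 0.2],
})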

We will also load market demographics data from PyBLP and call this agent_data. This data contains draws of income, income_squared, age, and child.

[3]:
agent_data = pd.read_csv(pyblp.data.NEVO_AGENTS_LOCATION)
agent_data.head()
[3]:
market_ids city_ids quarter weights nodes0 nodes1 nodes2 nodes3 income income_squared age child
0 C01Q1 1 1 0.05 0.434101 -1.500838 -1.151079 0.161017 0.495123 8.331304 -0.230109 -0.230851
1 C01Q1 1 1 0.05 -0.726649 0.133182 -0.500750 0.129732 0.378762 6.121865 -2.532694 0.769149
2 C01Q1 1 1 0.05 -0.623061 -0.138241 0.797441 -0.795549 0.105015 1.030803 -0.006965 -0.230851
3 C01Q1 1 1 0.05 -0.041317 1.257136 -0.683054 0.259044 -1.485481 -25.583605 -0.827946 0.769149
4 C01Q1 1 1 0.05 -0.466691 0.226968 1.044424 0.092019 -0.316597 -6.517009 -0.230109 -0.230851

Estimate demand with PyBLP

Next, we set up the demand estimation problem using PyBLP (Conlon and Gortmaker, 2020).

In this example the linear characteristics in consumer preferences are price and product fixed effects. Nonlinear characteristics with random coefficients include a constant, price, sugar, and mushy. For these variables, we are going to estimate both the variance of the random coefficient as well as demographic interactions for income, income_squared, age, and child. More info can be found here.

Running the problem setup yields output that summarizes the dimensions of the problem (see here for a description of each variable). Also reported are the Formulations, i.e., the linear and nonlinear characteristics for which we are estimating either variances or demographic interactions, and a list of the demographics being used.

[4]:
pyblp_problem = pyblp.Problem(
    product_formulations=(
        pyblp.Formulation('0 + prices ', absorb='C(product_ids)'),
        pyblp.Formulation('1 + prices + sugar + mushy'),
    ),
    agent_formulation=pyblp.Formulation('0 + income + income_squared + age + child'),
    product_data=product_data,
    agent_data=agent_data
)
pyblp_problem
[4]:
Dimensions:
=================================================
 T    N     F    I     K1    K2    D    MD    ED
---  ----  ---  ----  ----  ----  ---  ----  ----
94   2256   5   1880   1     4     4    20    1
=================================================

Formulations:
===================================================================
       Column Indices:           0           1           2      3
-----------------------------  ------  --------------  -----  -----
 X1: Linear Characteristics    prices
X2: Nonlinear Characteristics    1         prices      sugar  mushy
       d: Demographics         income  income_squared   age   child
===================================================================

Next, we solve the demand estimation and store the results for use with the testing module.

The output includes information on computation progress, as well as tables with the parameter estimates. A full list of the post-estimation output which can be queried is found here.

[5]:
pyblp_results = pyblp_problem.solve(
  sigma=np.diag([0.3302, 2.4526, 0.0163, 0.2441]),
  pi=[
      [5.4819,   0.0000,  0.2037, 0.0000],
      [15.8935, -1.2000,  0.0000, 2.6342],
      [-0.2506,  0.0000,  0.0511, 0.0000],
      [1.2650,   0.0000, -0.8091, 0.0000]
  ],
  method='1s',
  optimization=pyblp.Optimization('bfgs', {'gtol': 1e-5})
)
pyblp_results
[5]:
Problem Results Summary:
=======================================================================================================
GMM   Objective  Gradient      Hessian         Hessian     Clipped  Weighting Matrix  Covariance Matrix
Step    Value      Norm    Min Eigenvalue  Max Eigenvalue  Shares   Condition Number  Condition Number
----  ---------  --------  --------------  --------------  -------  ----------------  -----------------
 1    +4.6E+00   +6.9E-06     +2.4E-05        +1.6E+04        0         +6.9E+07          +8.4E+08
=======================================================================================================

Cumulative Statistics:
===========================================================================
Computation  Optimizer  Optimization   Objective   Fixed Point  Contraction
   Time      Converged   Iterations   Evaluations  Iterations   Evaluations
-----------  ---------  ------------  -----------  -----------  -----------
 00:00:25       Yes          51           57          46384       143967
===========================================================================

Nonlinear Coefficient Estimates (Robust SEs in Parentheses):
=====================================================================================================================
Sigma:      1         prices      sugar       mushy     |   Pi:      income    income_squared     age        child
------  ----------  ----------  ----------  ----------  |  ------  ----------  --------------  ----------  ----------
  1      +5.6E-01                                       |    1      +2.3E+00      +0.0E+00      +1.3E+00    +0.0E+00
        (+1.6E-01)                                      |          (+1.2E+00)                  (+6.3E-01)
                                                        |
prices   +0.0E+00    +3.3E+00                           |  prices   +5.9E+02      -3.0E+01      +0.0E+00    +1.1E+01
                    (+1.3E+00)                          |          (+2.7E+02)    (+1.4E+01)                (+4.1E+00)
                                                        |
sugar    +0.0E+00    +0.0E+00    -5.8E-03               |  sugar    -3.8E-01      +0.0E+00      +5.2E-02    +0.0E+00
                                (+1.4E-02)              |          (+1.2E-01)                  (+2.6E-02)
                                                        |
mushy    +0.0E+00    +0.0E+00    +0.0E+00    +9.3E-02   |  mushy    +7.5E-01      +0.0E+00      -1.4E+00    +0.0E+00
                                            (+1.9E-01)  |          (+8.0E-01)                  (+6.7E-01)
=====================================================================================================================

Beta Estimates (Robust SEs in Parentheses):
==========
  prices
----------
 -6.3E+01
(+1.5E+01)
==========

Test firm conduct with pyRVtest

pyRVtest follows a similar structure to PyBLP. First, you set up the testing problem, then you run the test.

Setting Up Testing Problem

Here is a simple example of the code to set up the testing problem for testing between two models:

  1. manufacturers set retail prices according to Bertrand
  2. manufacturers set retail prices according to monopoly (i.e., perfect collusion)

We set up the testing problem with pyRVtest.Problem and store it as the variable testing_problem:

[6]:
testing_problem = pyRVtest.Problem(
    cost_formulation = (
        pyRVtest.Formulation('0 + sugar', absorb = 'C(firm_ids)' )
    ),
    instrument_formulation = (
        pyRVtest.Formulation('0 + demand_instruments0 + demand_instruments1')
    ),
    model_formulations = (
        pyRVtest.ModelFormulation(model_downstream='bertrand', ownership_downstream='firm_ids'),
        pyRVtest.ModelFormulation(model_downstream='monopoly', ownership_downstream='firm_ids')
    ),
    product_data = product_data,
    demand_results = pyblp_results
)

Detailed information on the inputs accepted by Problem and how to specify them can be found in the API documentation. Below we clarify the inputs (observed exogenous cost shifters, instruments for testing conduct, and models to be tested) in this particular example:

  • cost_formulation: Here, the researcher specifies the observable linear shifters of marginal cost (in the current version of the package, these must be exogenous variables). In this example, we have defined the cost formulation as pyRVtest.Formulation('0 + sugar', absorb='C(firm_ids)'). Here, 0 means no constant. We also include the variable sugar as an observed cost shifter; this variable must be in the product_data. Finally, absorb='C(firm_ids)' specifies that we are including firm fixed effects, which will be absorbed. The variable firm_ids must also be in the product_data.

  • instrument_formulation: Here, the researcher specifies the set of instruments used to test conduct. In this example, we use one set of instruments containing two variables, demand_instruments0 and demand_instruments1. Thus, we have defined the instrument formulation as pyRVtest.Formulation('0 + demand_instruments0 + demand_instruments1'). Here, 0 means no constant, and it should always be specified as 0. Both demand_instruments0 and demand_instruments1 must also be in the product_data. It is possible to test conduct separately with more than one set of instruments, as shown in the example below.

  • model_formulations: Here, we have specified two ModelFormulations and therefore two models to test. The first model is specified as pyRVtest.ModelFormulation(model_downstream='bertrand', ownership_downstream='firm_ids'). Here, model_downstream='bertrand' indicates that retail prices are set according to Bertrand, and ownership_downstream='firm_ids' specifies that the ownership matrix in each market should be built from the variable firm_ids in the product_data. For testing more than two models, see the example below. To find the full list of models supported by pyRVtest and their associated ModelFormulation, see the Library of Models page.

[7]:
testing_problem
[7]:
Dimensions:
=========================
 T    N     M    L   d_z0
---  ----  ---  ---  ----
94   2256   2    1    2
=========================

Formulations:
==========================================================
Column Indices:            0                    1
----------------  -------------------  -------------------
w: Marginal Cost         sugar
z0: Instruments   demand_instruments0  demand_instruments1
==========================================================

Models:
=========================================
                          0         1
---------------------  --------  --------
 Model - Downstream    bertrand  monopoly
  Model - Upstream       None      None
Firm IDs - Downstream  firm_ids  monopoly
 Firm IDs - Upstream     None      None
      VI Index           None      None
 Cost Scaling Column     None      None
      Unit Tax           None      None
    Advalorem Tax        None      None
   Advalorem Payer       None      None
User Supplied Markups    None      None
=========================================

The first table Dimensions reports the following statistics:

  • \(T\) = number of markets

  • \(N\) = number of observations

  • \(M\) = number of models (each specified by a model formulation)

  • \(L\) = number of instrument sets (each specified by an instrument formulation)

  • \(d\_z_0\) = number of instruments in the first instrument set (with more than one instrument formulation, additional columns \(d\_z_1\), \(d\_z_2\), … \(d\_z_{(L-1)}\) would be reported)

The second table Formulations reports the variables specified as observed cost shifters and excluded instruments. The first row indicates that sugar is the only included observed cost shifter (ignoring the fixed effects). The second row indicates that demand_instruments0 and demand_instruments1 are the excluded instruments for testing each model.

The third table Models specifies the models being tested, where each model is a column in the table:

  • \(\textbf{Model-Downstream}\) reports the name of the model of firm conduct. For vertical models of wholesaler and retailer behavior, this reports the nature of retailer conduct.

  • \(\textbf{Model-Upstream}\) reports, for vertical models with wholesalers and retailers, the nature of wholesaler conduct. In this example, we are ignoring upstream behavior and assuming manufacturers set retail prices directly as in Nevo (2001).

  • \(\textbf{Firm IDs - Downstream}\) is the variable in product_data used to build the ownership matrix for downstream (retail) conduct in prices or quantities. If monopoly is specified as Model-Downstream, then Firm IDs - Downstream will default to monopoly and the ownership matrix in each market will be a matrix of ones.

  • \(\textbf{Firm IDs - Upstream}\) is the same as \(\textbf{Firm IDs - Downstream}\), but for wholesale price or quantity behavior.

  • \(\textbf{VI Index}\) is the name of the dummy variable indicating whether retailer and wholesaler are vertically integrated.

  • \(\textbf{User Supplied Markups}\) indicates if the user has chosen to input their own markup computation instead of choosing a prespecified model for which the package will compute markups.

Additionally, the table contains outputs that are relevant when taxes are levied in the markets being studied (\(\textbf{Unit Tax}\), \(\textbf{Advalorem Tax}\), \(\textbf{Advalorem Payer}\)). In this example, there are no taxes in the market for cereal. These will be discussed in the testing with taxes example below.

Running the Testing Procedure

Now that the problem is set up, we can run the test, which we do with the following code.

Since we defined the variable testing_problem as a pyRVtest.Problem, we run the test by calling testing_problem.solve. There are two user-specified options when running the test:

  • demand_adjustment: False indicates that the user does not want to adjust standard errors to account for two-step estimation with demand. True indicates standard errors should be adjusted to account for demand estimation.

  • clustering_adjustment: False means no clustering. True indicates that all standard errors should be clustered. In this case, a variable called clustering_ids, which indicates the cluster to which each observation belongs, needs to appear in the product_data. See the example below.

Both of these adjustments are implemented according to the formulas in Appendix C of Duarte, Magnolfi, Sølvsten, and Sullivan (2023).

[8]:
testing_results = testing_problem.solve(
    demand_adjustment=False,
    clustering_adjustment=False
)
testing_results
Computing Markups ...
Total Time is ... 0.1552119255065918
[8]:


Testing Results - Instruments z0:
=============================================================================
  TRV:                 |   F-stats:                 |    MCS:
--------  ---  ------  |  ----------  ---  -------  |  --------  ------------
 models    0     1     |    models     0      1     |   models   MCS p-values
--------  ---  ------  |  ----------  ---  -------  |  --------  ------------
   0      nan  -1.144  |      0       nan   13.3    |     0          1.0
                       |                   *** ^^^  |
   1      nan   nan    |      1       nan    nan    |     1         0.252
                       |                            |
=============================================================================
Significance of size and power diagnostic reported below each F-stat
*, **, or *** indicate that F > cv for a worst-case size of 0.125, 0.10, and 0.075 given d_z and rho
^, ^^, or ^^^ indicate that F > cv for a best-case power of 0.50, 0.75, and 0.95 given d_z and rho
appropriate critical values for size are stored in the variable F_cv_size_list of the pyRVtest results class
appropriate critical values for power are stored in the variable F_cv_power_list of the pyRVtest results class
===========================================================================

The result table is split into three parts: the pairwise RV test statistics, the pairwise F-statistics, and the model confidence set p-values. Details on these objects and how they are computed can be found in Duarte, Magnolfi, Sølvsten, and Sullivan (2023).

  • The first part of the table reports the pairwise RV test statistic given the specified adjustments to the standard errors. In this example, there is one RV test statistic as we are only testing two models. Elements on and below the main diagonal in the RV and F-statistic block are set to “nan” since both RV tests and F-statistics are symmetric for a pair of models. A negative test statistic suggests better fit of the row model, in this case, Bertrand. The critical values for each \(T^{RV}\) are \(\pm\) 1.96, so the RV test cannot reject the null of equal fit of the two models.

  • The second part reports the pairwise F-statistics, again with the specified adjustments to the standard errors. The symbols underneath each F-statistic indicate whether the corresponding F-statistic is weak for size or for power. The appropriate critical values for each F-statistic depend on the number of instruments the researcher is using, as well as the value of the scaling parameter \(\rho\). In the above table, the F-statistic is associated with the symbol “***” for size, meaning that the probability of incorrectly rejecting the null when it is true (Type-I error) is at most 7.5%. The same F-statistic is associated with “^^^” for power, meaning that the probability of rejecting the null when it is false is at least 95%. Therefore, the instruments in this example are neither weak for size nor weak for power. The critical values themselves can be inspected on the results object, as shown in the sketch after this list.

  • Finally, the p-values associated with the model confidence set are reported. To interpret these \(p\)-values, a researcher chooses a significance level \(\alpha\). Any model whose \(p\)-value is below \(\alpha\) is rejected. The models with \(p\)-values above \(\alpha\) cannot be rejected by the instruments and the researcher should conclude that they have equal fit. Thus, in this example, for an \(\alpha = 0.05\), the researcher is left with a model confidence set containing both models.
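The legend beneath the results table notes that the critical values used for these diagnostics are stored on the results object. As a quick, hedged sketch (the attribute names are taken from that legend; their exact structure may vary across pyRVtest versions), you can inspect them directly:

# Critical values referenced in the legend printed above.
# Attribute names come from that legend; exact contents may vary by version.
print(testing_results.F_cv_size_list)   # critical values for worst-case size
print(testing_results.F_cv_power_list)  # critical values for best-case power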

The testing procedure also stores additional output which the user can access after running the testing code. A full list of available output can be found in ProblemResults.

As an example, the procedure stores the markups for each model. In the above code, we stored the output of testing_problem.solve() as the variable testing_results. Thus, to access the markups, you type:

[9]:
testing_results.markups
[9]:
[array([[0.03616274],
        [0.02752501],
        [0.04300875],
        ...,
        [0.03948227],
        [0.02837644],
        [0.04313683]]),
 array([[0.07839009],
        [0.04966324],
        [0.08338118],
        ...,
        [0.08728693],
        [0.05866822],
        [0.09141367]])]

where the first array, testing_results.markups[0], stores the markups for model 0 (Bertrand) and the second array, testing_results.markups[1], stores the markups for model 1 (monopoly).
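As a small illustrative sketch (not part of the package output), you can collect these arrays into a pandas DataFrame to compare the markups implied by each candidate model; the labels below simply follow the order of the model_formulations above:

import pandas as pd

# Compare markups implied by each candidate model (model 0 = Bertrand,
# model 1 = monopoly, following the order of model_formulations above).
markup_summary = pd.DataFrame({
    'bertrand': testing_results.markups[0].flatten(),
    'monopoly': testing_results.markups[1].flatten(),
})
print(markup_summary.describe())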

Example with more than two models and more than one instrument set

Here we consider a more complicated example to illustrate more of the features of pyRVtest. We are going to test five models of vertical conduct:

  1. manufacturers set monopoly retail prices
  2. manufacturers set Bertrand retail prices
  3. manufacturers set Cournot retail quantities
  4. manufacturers set Bertrand wholesale prices and retailers set monopoly retail prices
  5. manufacturers set monopoly wholesale prices and retailers set monopoly retail prices

To accumulate evidence, we are going to separately use two different sets of instruments to test these models.

We are also going to adjust all standard errors to account for two-step estimation coming from demand, as well as cluster standard errors at the market level. To implement these adjustments, we need to add a variable to the product_data called clustering_ids:

[10]:
product_data['clustering_ids'] = product_data.market_ids

Next, we can initialize the testing problem.

Notice that to add more models or more instrument sets, we simply add more model formulations and instrument formulations. Further notice that by specifying demand_adjustment=True and clustering_adjustment=True in the solve step below, we turn on two-step adjustments to the standard errors as well as clustering at the level indicated by product_data.clustering_ids.

[11]:
testing_problem = pyRVtest.Problem(
    cost_formulation=(
        pyRVtest.Formulation('1 + sugar', absorb='C(firm_ids)')
    ),
    instrument_formulation=(
        pyRVtest.Formulation('0 + demand_instruments0 + demand_instruments1'),
        pyRVtest.Formulation('0 + demand_instruments2 + demand_instruments3 + demand_instruments4')
    ),
    model_formulations=(
        pyRVtest.ModelFormulation(model_downstream='monopoly', ownership_downstream='firm_ids'),
        pyRVtest.ModelFormulation(model_downstream='bertrand', ownership_downstream='firm_ids'),
        pyRVtest.ModelFormulation(model_downstream='cournot', ownership_downstream='firm_ids'),
        pyRVtest.ModelFormulation(
            model_downstream='monopoly',
            ownership_downstream='firm_ids',
            model_upstream='bertrand',
            ownership_upstream='firm_ids'
        ),
        pyRVtest.ModelFormulation(
            model_downstream='monopoly',
            ownership_downstream='firm_ids',
            model_upstream='monopoly',
            ownership_upstream='firm_ids'
        )
    ),
    product_data=product_data,
    demand_results=pyblp_results
)
testing_problem
[11]:
Dimensions:
===============================
 T    N     M    L   d_z0  d_z1
---  ----  ---  ---  ----  ----
94   2256   5    2    2     3
===============================

Formulations:
===============================================================================
Column Indices:            0                    1                    2
----------------  -------------------  -------------------  -------------------
w: Marginal Cost         sugar
z0: Instruments   demand_instruments0  demand_instruments1
z1: Instruments   demand_instruments2  demand_instruments3  demand_instruments4
===============================================================================

Models:
=======================================================================
                          0         1         2         3         4
---------------------  --------  --------  --------  --------  --------
 Model - Downstream    monopoly  bertrand  cournot   monopoly  monopoly
  Model - Upstream       None      None      None    bertrand  monopoly
Firm IDs - Downstream  monopoly  firm_ids  firm_ids  monopoly  monopoly
 Firm IDs - Upstream     None      None      None    firm_ids  monopoly
      VI Index           None      None      None      None      None
 Cost Scaling Column     None      None      None      None      None
      Unit Tax           None      None      None      None      None
    Advalorem Tax        None      None      None      None      None
   Advalorem Payer       None      None      None      None      None
User Supplied Markups    None      None      None      None      None
=======================================================================

Now we can run testing_problem.solve to output the testing results. Each table below shows the results for one instrument set, and the tables appear in the order in which the instrument sets were specified in instrument_formulation.

[12]:
testing_results = testing_problem.solve(
    demand_adjustment=True,
    clustering_adjustment=True
)
testing_results
Computing Markups ...
Total Time is ... 19.744781017303467
[12]:


Testing Results - Instruments z0:
=====================================================================================================================
  TRV:                                      |   F-stats:                                    |    MCS:
--------  ---  ----  -----  ------  ------  |  ----------  ---  -----  -----  -----  -----  |  --------  ------------
 models    0    1      2      3       4     |    models     0     1      2      3      4    |   models   MCS p-values
--------  ---  ----  -----  ------  ------  |  ----------  ---  -----  -----  -----  -----  |  --------  ------------
   0      nan  0.49  0.618  -0.006  -1.339  |      0       nan   2.8    2.4    0.0    0.6   |     0         0.801
                                            |                   ***    ***    ***    ***    |
   1      nan  nan   0.294  -0.028  -1.402  |      1       nan   nan    4.6    0.0    0.3   |     1         0.932
                                            |                          ***    ***    ***    |
   2      nan  nan    nan   -0.03   -1.406  |      2       nan   nan    nan    0.0    0.3   |     2          1.0
                                            |                                 ***    ***    |
   3      nan  nan    nan    nan    -0.444  |      3       nan   nan    nan    nan    0.1   |     3         0.976
                                            |                                        ***    |
   4      nan  nan    nan    nan     nan    |      4       nan   nan    nan    nan    nan   |     4         0.418
                                            |                                               |
=====================================================================================================================

Testing Results - Instruments z1:
=======================================================================================================================
  TRV:                                       |   F-stats:                                     |    MCS:
--------  ---  -----  -----  ------  ------  |  ----------  ---  -----  -----  -----  ------  |  --------  ------------
 models    0     1      2      3       4     |    models     0     1      2      3      4     |   models   MCS p-values
--------  ---  -----  -----  ------  ------  |  ----------  ---  -----  -----  -----  ------  |  --------  ------------
   0      nan  0.047  0.243  -0.248  -1.601  |      0       nan   2.6    2.4    0.0    0.8    |     0         0.808
                                             |                   ***    ***    ***    ***     |
   1      nan   nan   1.663  -0.219  -1.523  |      1       nan   nan    3.3    0.1    0.6    |     1         0.183
                                             |                          ***    ***    *** ^^  |
   2      nan   nan    nan   -0.25   -1.537  |      2       nan   nan    nan    0.0    0.6    |     2          1.0
                                             |                                 ***    *** ^   |
   3      nan   nan    nan    nan    -1.971  |      3       nan   nan    nan    nan    2.5    |     3         0.913
                                             |                                        ***     |
   4      nan   nan    nan    nan     nan    |      4       nan   nan    nan    nan    nan    |     4         0.128
                                             |                                                |
=======================================================================================================================
Significance of size and power diagnostic reported below each F-stat
*, **, or *** indicate that F > cv for a worst-case size of 0.125, 0.10, and 0.075 given d_z and rho
^, ^^, or ^^^ indicate that F > cv for a best-case power of 0.50, 0.75, and 0.95 given d_z and rho
appropriate critical values for size are stored in the variable F_cv_size_list of the pyRVtest results class
appropriate critical values for power are stored in the variable F_cv_power_list of the pyRVtest results class
=====================================================================================================================

This output has a similar format to Table 5 in Duarte, Magnolfi, Sølvsten, and Sullivan (2023). Each testing results panel, corresponding to a set of instruments, reports in three separate blocks the RV test statistics for each pair of models, the effective F-statistics for each pair of models, and the MCS p-value for each row model. See the description above for an explanation of each.

In interpreting these results, note that instrument sets z_0 and z_1 contain 2 and 3 instruments, respectively. Since the number of instruments is between 2 and 9, there are no size distortions above 0.025 for any model pair, and the instruments are strong for size. However, the instruments are weak for power, as each pairwise F-statistic is below the critical value corresponding to a best-case power of 0.95. Unsurprisingly, then, no model is ever rejected from the MCS. These results reflect the fact that the data used in this tutorial do not provide the appropriate variation to test conduct. As such, they should not be used to draw any conclusions about the nature of conduct in this empirical environment. We plan to include a more economically interesting example in future releases.

Testing with Taxes

Suppose now that we want to run the testing procedure in a setting with taxes (specifically, ad valorem or unit taxes). Having taxes in the testing setup changes the first-order conditions faced by the firms. To incorporate them using pyRVtest, we can specify additional options in the ModelFormulation:

  • unit_tax: a vector of unit taxes, measured in the same units as price, which should correspond to a data column in the product_data.

  • advalorem_tax: a vector of ad valorem tax rates, measured between 0 and 1, which should correspond to a data column in the product_data.

  • advalorem_payer: the party responsible for paying the ad valorem tax. In our example this is the consumer (other options are 'firm' and 'None').

Here, we add ad valorem and unit tax data to the product_data for illustrative purposes.

[13]:
# additional variables for tax testing
product_data['unit_tax'] = .5
product_data['advalorem_tax'] = .5
product_data['lambda'] = 1

Next, initialize the testing problem. Note here that the first model formulation now specifies our additional variables.

[14]:
testing_problem_taxes = pyRVtest.Problem(
    cost_formulation=(
        pyRVtest.Formulation('0 + sugar', absorb='C(firm_ids)')
    ),
    instrument_formulation=(
        pyRVtest.Formulation('0 + demand_instruments0 + demand_instruments1')
    ),
    model_formulations=(
        pyRVtest.ModelFormulation(
            model_downstream='bertrand',
            ownership_downstream='firm_ids',
            cost_scaling='lambda',
            unit_tax='unit_tax',
            advalorem_tax='advalorem_tax',
            advalorem_payer='consumer'),
        pyRVtest.ModelFormulation(
            model_downstream='bertrand',
            ownership_downstream='firm_ids'
        ),
        pyRVtest.ModelFormulation(
            model_downstream='perfect_competition',
            ownership_downstream='firm_ids'
        )
    ),
    product_data=product_data,
    demand_results=pyblp_results
)
testing_problem_taxes
[14]:
Dimensions:
=========================
 T    N     M    L   d_z0
---  ----  ---  ---  ----
94   2256   3    1    2
=========================

Formulations:
==========================================================
Column Indices:            0                    1
----------------  -------------------  -------------------
w: Marginal Cost         sugar
z0: Instruments   demand_instruments0  demand_instruments1
==========================================================

Models:
===================================================================
                             0           1               2
---------------------  -------------  --------  -------------------
 Model - Downstream      bertrand     bertrand  perfect_competition
  Model - Upstream         None         None           None
Firm IDs - Downstream    firm_ids     firm_ids       firm_ids
 Firm IDs - Upstream       None         None           None
      VI Index             None         None           None
 Cost Scaling Column      lambda        None           None
      Unit Tax           unit_tax       None           None
    Advalorem Tax      advalorem_tax    None           None
   Advalorem Payer       consumer       None           None
User Supplied Markups      None         None           None
===================================================================

As mentioned above, the Models table includes additional details related to testing with taxes:

  • Unit Tax: the column in product_data corresponding to the unit tax

  • Advalorem Tax : the column in product_data corresponding to the ad valorem tax

  • Advalorem Payer: who is responsible for paying the tax

Finally, output the testing results.

[15]:
testing_results = testing_problem_taxes.solve(
    demand_adjustment=False,
    clustering_adjustment=False
)
testing_results
Computing Markups ...
Total Time is ... 0.17354202270507812
[15]:


Testing Results - Instruments z0:
=========================================================================================
  TRV:                         |   F-stats:                     |    MCS:
--------  ---  ------  ------  |  ----------  ---  ----  -----  |  --------  ------------
 models    0     1       2     |    models     0    1      2    |   models   MCS p-values
--------  ---  ------  ------  |  ----------  ---  ----  -----  |  --------  ------------
   0      nan  -1.936  -2.059  |      0       nan  -0.0   0.3   |     0          1.0
                               |                         ***    |
   1      nan   nan    -1.53   |      1       nan  nan    4.4   |     1         0.053
                               |                         ***    |
   2      nan   nan     nan    |      2       nan  nan    nan   |     2         0.078
                               |                                |
=========================================================================================
Significance of size and power diagnostic reported below each F-stat
*, **, or *** indicate that F > cv for a worst-case size of 0.125, 0.10, and 0.075 given d_z and rho
^, ^^, or ^^^ indicate that F > cv for a best-case power of 0.50, 0.75, and 0.95 given d_z and rho
appropriate critical values for size are stored in the variable F_cv_size_list of the pyRVtest results class
appropriate critical values for power are stored in the variable F_cv_power_list of the pyRVtest results class
=======================================================================================

Storing Results

If you are working on a problem that runs over the course of several days, PyBLP and pyRVtest make it easy for you to store your results.

For example, suppose that you want to estimate one demand system and then run multiple testing scenarios. In this case, you can simply use the PyBLP pickling method.

[16]:
pyblp_results.to_pickle('my_demand_estimates')

And read them in similarly:

[17]:
old_pyblp_results = pyblp.read_pickle('my_demand_estimates')

You can now use these demand estimates to run additional iterations of testing without having to re-estimate demand each time.
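For instance, here is a hedged sketch of what this might look like; the formulations simply repeat the first testing example above, now passing the un-pickled demand estimates, and new_testing_problem is a hypothetical name:

# Re-use the stored demand estimates in a new testing problem without
# re-solving the PyBLP problem; formulations repeat the first example.
new_testing_problem = pyRVtest.Problem(
    cost_formulation=pyRVtest.Formulation('0 + sugar', absorb='C(firm_ids)'),
    instrument_formulation=pyRVtest.Formulation('0 + demand_instruments0 + demand_instruments1'),
    model_formulations=(
        pyRVtest.ModelFormulation(model_downstream='bertrand', ownership_downstream='firm_ids'),
        pyRVtest.ModelFormulation(model_downstream='monopoly', ownership_downstream='firm_ids'),
    ),
    product_data=product_data,
    demand_results=old_pyblp_results,  # demand estimates read back from the pickle
)
new_testing_results = new_testing_problem.solve(
    demand_adjustment=False,
    clustering_adjustment=False,
)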

Additionally, if you want to save your testing results so that you can access them in the future, you can do so with the same method in pyRVtest.

[18]:
testing_results.to_pickle('my_testing_results')

And read them back in to analyze:

[19]:
old_testing_results = pyRVtest.read_pickle('my_testing_results')
old_testing_results.markups
[19]:
[array([[0.03616274],
        [0.02752501],
        [0.04300875],
        ...,
        [0.03948227],
        [0.02837644],
        [0.04313683]]),
 array([[0.03616274],
        [0.02752501],
        [0.04300875],
        ...,
        [0.03948227],
        [0.02837644],
        [0.04313683]]),
 array([[0.],
        [0.],
        [0.],
        ...,
        [0.],
        [0.],
        [0.]])]