Created at 10am, Apr 16
Splinter
Health & Lifestyle
Pseudo P-values for Assessing Covariate Balance in a Finite Study Population with Application to the California Sugar Sweetened Beverage Tax Study
xZ_-jeq1O_bFt-CZ5tr4L9xPTGATfVs1vpnD_4sYYNM
File Type
PDF
Entry Count
75
Embed. Model
jina_embeddings_v2_base_en
Index Type
hnsw

Assessing covariate balance (CB) is a common practice in various types of evaluation studies. Two-sample descriptive statistics, such as the standardized mean difference, have been widely applied in the scientific literature to assess the goodness of CB. Studies in health policy, health services research, built and social environment research, and many other fields often involve a finite number of units that may be subject to different treatment levels. Our case study, the California Sugar Sweetened Beverage (SSB) Tax Study, includes 332 study cities in the state of California, among which individual cities may elect to levy a city-wide excise tax on SSB sales. Evaluating the balance of covariates between study cities with and without the tax policy is essential for assessing the effects of the policy on health outcomes of interest. In this paper, we introduce the novel concepts of the pseudo p-value and the standardized pseudo p-value, which are descriptive statistics for assessing the overall goodness of CB between study arms in a finite study population. While not meant as a hypothesis test, the pseudo p-values bear a superficial similarity to the classic p-value, which makes them easy to apply and interpret in applications. We discuss some theoretical properties of the pseudo p-values and present an algorithm to calculate them. We report a numerical simulation study to demonstrate their performance. We apply the pseudo p-values to the California SSB Tax Study to assess the balance of city-level characteristics between the two study arms.

We consider six study designs, i.e., sampling strategies, to draw M and N, abbreviated as Randomized, Segregated, Partial, Matched, R Partial, and Natural. All study designs sample without replacement, so M and N do not overlap. Randomized is simple random sampling (SRS) and the ideal strategy for calculating all pseudo p-values. Segregated draws a random sample N from the first half of U and a random sample M from the second half of U, which mimics a systematically biased observational study design with varying levels of bias.
id: bd88784e4717d3c1b909a044b0f542c5 - page: 12
Partial draws N and a fixed part of M from the first half of U, and then draws the remainder of M from the second half of U. Specifically, scenarios 1 to 12 under the enumeration method have 1 EU of M drawn from the first half of U; scenarios 13 to 16 under the enumeration method have 2 EU of M drawn from the first half of U; and all scenarios under the Monte Carlo method have 8 EU of M drawn from the first half of U. This design mimics an imperfect matching operation to remove imbalances in observed characteristics, or a restricted natural experiment that does not have full freedom to choose control EU. Matched draws both N and M from the first half of U. This design plays the role of a good matching operation to remove imbalances in observed characteristics. R Partial is a variant of Partial, where the number of EU of M drawn from the first half is a random number centered at 0.5|M|.
id: d00aaa932a4ae010753b412ff7fcab3d - page: 12
Natural draws N from the first half of U, and then draws M by SRS. This design mimics a natural experiment where the treated arm is self-selected and the control arm is representative of the study population.
id: e812cb0210aea292ae7e63b3e00a7737 - page: 13
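The six designs described above can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `draw_arms`, its parameters, and the representation of U as integer indices are all hypothetical. Two details are not specified in the excerpt and are assumed here: R Partial draws its first-half count from a Binomial(|M|, 0.5) distribution (one possible "random number centered at 0.5|M|"), and Natural draws M by SRS from the units of U not already in N.

```python
import random


def draw_arms(design, pop_size, m_size, n_size, k_fixed=1, rng=None):
    """Return disjoint index lists (M, N) drawn from U = range(pop_size).

    Hypothetical helper: names and defaults are illustrative only.
    """
    if rng is None:
        rng = random.Random()
    half = pop_size // 2
    first = list(range(half))              # first half of U
    second = list(range(half, pop_size))   # second half of U
    everyone = first + second

    if design == "Randomized":
        # SRS of |M| + |N| units, split at random into the two arms.
        picked = rng.sample(everyone, m_size + n_size)
        M, N = picked[:m_size], picked[m_size:]
    elif design == "Segregated":
        N = rng.sample(first, n_size)
        M = rng.sample(second, m_size)
    elif design in ("Partial", "R Partial", "Matched"):
        if design == "Partial":
            k = k_fixed                    # fixed part of M from the first half
        elif design == "R Partial":
            # Assumption: Binomial(|M|, 0.5), which is centered at 0.5|M|.
            k = sum(rng.random() < 0.5 for _ in range(m_size))
        else:
            k = m_size                     # Matched: all of M from the first half
        picked = rng.sample(first, n_size + k)
        N = picked[:n_size]
        M = picked[n_size:] + rng.sample(second, m_size - k)
    elif design == "Natural":
        N = rng.sample(first, n_size)
        taken = set(N)
        # Assumption: SRS for M over the units of U not already in N.
        M = rng.sample([u for u in everyone if u not in taken], m_size)
    else:
        raise ValueError(f"unknown design: {design}")
    return sorted(M), sorted(N)
```

For example, `draw_arms("Partial", 100, 16, 8, k_fixed=8)` returns an M with exactly 8 units below index 50, mirroring the Monte Carlo setting of 8 EU of M drawn from the first half of U.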
3.2 Simulation results
Due to the large volume of results, we present only the results of the 8 scenarios using the Monte Carlo method. The results of the 16 scenarios using the enumeration method are included in the supplemental materials. The empirical distributions of the 1,000 instances of the two pseudo p-values are summarized by box plots in Figure 1, with one statistic in the left frame and the other in the right frame. In the left frame of Figure 1, the six study designs do not differ when the bias term is zero (scenarios 1 and 6); all of them show a distribution skewed to the right. When the bias term is non-zero, the ideal strategy Randomized and Matched show no notable changes. By contrast, the less-than-ideal study designs (Segregated, Partial, R Partial, Natural) all become more skewed to the right, and the skewness is extreme when the bias term is large (scenarios 5 and 8). Although rigorously speaking these box plots do not represent the random pseudo
id: 12e2d42d416a5a99198c91317ef53d70 - page: 13
How to Retrieve?
# Search

curl -X POST "https://search.dria.co/hnsw/search" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"rerank": true, "top_n": 10, "contract_id": "xZ_-jeq1O_bFt-CZ5tr4L9xPTGATfVs1vpnD_4sYYNM", "query": "What is alexanDRIA library?"}'
        
# Query

curl -X POST "https://search.dria.co/hnsw/query" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"vector": [0.123, 0.5236], "top_n": 10, "contract_id": "xZ_-jeq1O_bFt-CZ5tr4L9xPTGATfVs1vpnD_4sYYNM", "level": 2}'
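The search call above can also be prepared from Python. The sketch below only builds the request object and does not send it; the endpoint, header names, and body fields are copied from the curl example, while the function name `build_search_request` and its defaults are hypothetical. The response format is not shown on this page, so no parsing is attempted.

```python
import json
import urllib.request

SEARCH_URL = "https://search.dria.co/hnsw/search"


def build_search_request(api_key, contract_id, query, top_n=10, rerank=True):
    """Build (but do not send) the POST request from the curl search example."""
    body = {
        "rerank": rerank,
        "top_n": top_n,
        "contract_id": contract_id,
        "query": query,
    }
    return urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request; a valid API key is needed in place of `<YOUR_API_KEY>`.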