layout: true <div class="my-footer"><span> douglasrm.azevedo@gmail.com </span></div> --- class: title-page <div> <h3 style="text-align: justify; padding-top:50px">FLAMES: Flexible Link function with AsyMptotES</h3> <h4 style="text-align: justify; padding-top:0px">Estimating the SUS population in Brazil</h4> <div> <div style="padding-top:220px; color:#858585"> <div style="float:left"> <h5> Douglas R. Mesquita Azevedo </br> Marcos Oliveira Prates </br> Renato Martins Assunção </h5> </div> <div style="float:right; padding-top: 30px"> <img src="data:image/png;base64,#img/logo_ufmg.png" alt="ufmg" height="50"/> </div> </div> <div style="align:center; padding-top:135px; color:#858585"> <h5 style="text-align:center"> June 2022 </h5> </div> --- class: toc-page <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.5.0/css/font-awesome.min.css"> ##
Agenda <br><br><br> + Motivation + Anomaly detection in SUS production + SUS population + Link functions in GLM models + Introducing the link functions with asymptotes + Simulation study + Estimating the SUS population in Brazil + FLAMES + Conclusion --- class: slide-page <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.5.0/css/font-awesome.min.css"> ##
Motivation Brazil manages the largest public health system in the world: - more than .enfase[3.5 billion outpatient procedures] per year - more than .enfase[12 million hospitalizations] per year - nearly .enfase[5.18 billion dollars] per year on these services - more than .enfase[220,000 establishments] provide services to the SUS, including hospitals, clinics, and laboratories <center><img src="data:image/png;base64,#img/sus.png" alt="graph" height="150" style="padding-top:30px"/></center> --- class: slide-page ##
Motivation The private health insurance market in Brazil is also very large due to the inefficiencies of the public health sector - over .enfase[50 million people] use these services - it is the .enfase[second largest] in the world (the United States ranks first) The .enfase[ANS] (Agência Nacional de Saúde Suplementar) regulates health insurance in Brazil and also provides general statistics by municipality <center><img src="data:image/png;base64,#img/planos.jpg" alt="graph" height="200"/></center> --- class: slide-page ##
Motivation .enfase[InfoSAS]: - Automatic detection of anomalies in the payment system for service providers - It is very .enfase[difficult to audit] the SUS system due to its proportions - More than .enfase[30 million time series] to be analyzed - We need the number of people using the service in each neighborhood (.enfase[SUS population]) which is not provided by ANS <center> <img src="data:image/png;base64,#img/infosas1.png" alt="graph" height="180" style = "padding-left: 10px"/> <img src="data:image/png;base64,#img/infosas2.png" alt="graph" height="180" style = "padding-left: 10px"/> <img src="data:image/png;base64,#img/infosas3.png" alt="graph" height="180" style = "padding-left: 10px"/> </center> --- class: slide-page <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.5.0/css/font-awesome.min.css"> ##
Anomaly detection <br><br> <center> <img src="data:image/png;base64,#img/anomaly.png" alt="graph" width="75%"/> </center> --- class: slide-page ##
Anomaly detection .center[ .mathbox[ How to find anomalies in a system as large as SUS? ] ] --- class: slide-page ##
Anomaly detection .center[ .mathbox[ How to find anomalies in a system as large as SUS? ] ] <br> .enfase[Using simple tools!] <br><br> Compare the number of procedures for a specific \{hospital, city, period\} with the expected number of procedures (based on country or state). -- <br> .center[ .mathbox[ <h3 class='enfase'>DON'T FORGET TO USE THE SUS POPULATION!</h3> ] ] --- class: slide-page ##
Anomaly detection .center[ .mathbox[ How to find anomalies in a system as large as SUS? ] ] <br> .enfase[Using simple tools!] <br><br> <center> <img src="data:image/png;base64,#img/ff1.png" alt="graph" width="100%"/> </center> --- class: slide-page ##
Anomaly detection .center[ .mathbox[ How to find anomalies in a system as large as SUS? ] ] <br> .enfase[Using simple tools!] <br><br> <center> <img src="data:image/png;base64,#img/ff3.png" alt="graph" width="100%"/> </center> --- class: slide-page ##
Anomaly detection .center[ .mathbox[ How to find anomalies in a system as large as SUS? ] ] <br> .enfase[Using simple tools!] <br><br> <center> <img src="data:image/png;base64,#img/ff5.png" alt="graph" width="100%"/> </center> --- class: slide-page ##
Anomaly detection <br><br> .enfase[Algorithm]: + For each city, find the .enfase[SUS population] using data from ANS + For a given city, procedure, and period, find the .enfase[total number of procedures performed] + Find the .enfase[Brazilian expected rate] as well as the rates for each state + Compare these reference rates with the rate of the specific city (conservative approach) + Ring a bell if the city shows a higher rate + Last but not least: .enfase[find out which hospitals/labs are serving these cities] --- class: slide-page <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.5.0/css/font-awesome.min.css"> ##
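Anomaly detection

The steps above can be sketched in a few lines of base R. This is only an illustration: the data are made up, the reference rate is pooled over the toy cities, and the one-sided Poisson rule with a 0.001 threshold is an assumption, not the rule used in InfoSAS.

```r
# Toy data: every number below is made up for illustration.
cities <- data.frame(
  city       = c("A", "B", "C", "D"),
  total_pop  = c(100000, 50000, 20000, 80000),
  insured    = c(30000, 5000, 2000, 40000),  # people with private plans (ANS)
  procedures = c(700, 450, 180, 1200)        # one procedure, one period
)

# SUS population by difference.
cities$sus_pop <- cities$total_pop - cities$insured

# City rates and the pooled reference rate.
cities$rate <- cities$procedures / cities$sus_pop
ref_rate <- sum(cities$procedures) / sum(cities$sus_pop)

# Expected counts under the reference rate, and the upper-tail Poisson
# probability of a count at least as large as the observed one (assumed rule).
cities$expected <- ref_rate * cities$sus_pop
cities$p_value <- ppois(cities$procedures - 1, cities$expected,
                        lower.tail = FALSE)

# Ring the bell only for unusually high counts.
cities$flag <- cities$p_value < 0.001
cities[, c("city", "rate", "expected", "flag")]
```

In this toy example only city D is flagged: its rate is three times that of the other cities once the SUS population is used as the denominator.

--- class: slide-page ##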
SUS population + ANS provides, for each city, the .enfase[number of private health insurance plans] + By difference, we can estimate the .enfase[number of people who depend on SUS] + It is .enfase[difficult to find anomalies in big populations] (São Paulo, Rio de Janeiro, Belo Horizonte, ...) + What do we need in order to obtain SUS populations at a finer areal level (neighborhood)? -- .mathbox[ We need to .enfase[learn] which .enfase[characteristics] are .enfase[associated] with our outcome: .enfase[the person is covered by a private health insurance]! ] -- .mathbox[ These characteristics should be .enfase[available for all areal levels desired]! ] -- .mathbox[ The .enfase[outcome should be available] for at least .enfase[one level], so that performance can be checked! ] --- class: slide-page ##
SUS population - PNAD 2008 <center><img src="data:image/png;base64,#img/pnad.png" alt="graph" height="400" style="padding-top:30px"/></center> --- class: slide-page ##
SUS population - PNAD 2008 <br> .enfase[Individual level]: + Around 40,000 Brazilian households + Covariates: + Gender + Income + `\(Y\)`: + 0: The person is covered by a private health insurance + 1: The person is not covered by a private health insurance (he/she depends on SUS) --- class: slide-page ##
SUS population + .enfase[Individual level]: Learn the association between covariates and outcome (.enfase[PNAD]) + We would like a very accurate model at this level + Our main interest is prediction + Models are stratified by state + .enfase[City level]: Compare predictions: proposed model and observed populations (.enfase[ANS]) + Gender: proportion of males in each city (.enfase[IBGE]) + Income: average income in each city (.enfase[IBGE]) + .enfase[Neighborhood level]: Predict SUS population + Gender: proportion of males in each neighborhood (.enfase[IBGE]) + Income: average income in each neighborhood (.enfase[IBGE]) --- class: slide-page ##
SUS population - Individual level <center><img src="data:image/png;base64,#img/renda_vs_sus.png" alt="graph" height="400" style="padding-top:30px"/></center> --- class: slide-page <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.5.0/css/font-awesome.min.css"> ##
GLM <br> The Generalized Linear Model (GLM) is given by: .mathbox[ `\(Y|\mu, \theta \sim D(Y; \mu, \theta)\)` `\(g(\mu) = X\beta\)` ] where: - `\(D\)` is a probability distribution such as: Normal, .enfase[Bernoulli], Poisson, ... - `\(\mu \in (a, b)\)` is a parameter (mean) - `\(\theta\)` represents any other parameters - `\(g(.)\)` is the .enfase[link function] that maps `\(g(\mu):(a, b) \rightarrow \mathbb{R}\)` --- class: slide-page ##
Binary regression <br><br> As `\(\mu \in (0, 1)\)`, any `\(g(\mu): (0, 1) \rightarrow \mathbb{R}\)` is a valid link function. <br> - .enfase[logit]: `\(g(\mu) = \log\left(\frac{\mu}{1-\mu}\right)\)` - .enfase[probit]: `\(g(\mu) = \Phi^{-1}(\mu)\)` - .enfase[cloglog]: `\(g(\mu) = \log(-\log(1-\mu))\)` - .enfase[loglog]: `\(g(\mu) = -\log(-\log(\mu))\)` - .enfase[cauchit]: `\(g(\mu) = F_{t_1}^{-1}(\mu)\)` - .enfase[robit]: `\(g(\mu) = F_{t_{\nu}}^{-1}(\mu)\)` --- class: slide-page ##
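Binary regression

A minimal sketch of the corresponding inverse links `\(g^{-1}\)` in base R. The loglog inverse is written in its increasing form, `\(\exp(-\exp(-\eta))\)`, so that all six functions go from 0 to 1. The value `nu = 4` for the robit is an arbitrary choice for illustration.

```r
# Inverse links g^{-1}: R -> (0, 1), using base R only.
inv_links <- list(
  logit   = plogis,                                  # 1 / (1 + exp(-eta))
  probit  = pnorm,                                   # standard normal CDF
  cloglog = function(eta) 1 - exp(-exp(eta)),
  loglog  = function(eta) exp(-exp(-eta)),           # increasing form (assumed convention)
  cauchit = pcauchy,                                 # Student-t CDF with 1 degree of freedom
  robit   = function(eta, nu = 4) pt(eta, df = nu)   # nu = 4 is arbitrary
)

# Evaluate every inverse link on the same linear predictor values.
eta <- c(-3, 0, 3)
round(sapply(inv_links, function(f) f(eta)), 3)
```

The symmetric links (logit, probit, cauchit, robit) return 0.5 at `\(\eta = 0\)`; cloglog and loglog do not, which is what makes them asymmetric.

--- class: slide-page ##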
Binary regression <br> .enfase[Some link functions]: <img src="data:image/png;base64,#index_files/figure-html/links-1.png" style="display: block; margin: auto;" /> --- class: slide-page <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.5.0/css/font-awesome.min.css"> ##
Link functions with asymptotes .center[ .mathbox[ Sometimes `\(P(Y = 1|X)\)` will not reach `\(0\)` (or `\(1\)`) ] ] `\(Y\)`: .enfase[death due to breast cancer] - the individual can recover `\(P(Y = 1|X) \leq d \leq 1\)` - even in less dangerous scenarios we can observe deaths `\(P(Y = 1|X) \geq c \geq 0\)` `\(Y\)`: .enfase[conversion in e-commerce] - the majority will not buy anything `\(P(Y = 1|X) \leq d \leq 1\)` `\(Y\)`: .enfase[churn in a streaming service] - we can observe `\(P(Y = 1|X) \geq c \geq 0\)` `\(Y\)`: .enfase[the person has a private insurance plan] --- class: slide-page ##
Link functions with asymptotes <img src="data:image/png;base64,#img/artigo.png" alt="graph" width="95%"/> --- class: slide-page ##
Link functions with asymptotes Conventional modelling for binary outcomes: .mathbox[ `\(Y_i|\mu_i \sim \text{Bernoulli}(Y_i; \mu_i)\)` `\(g(\mu_i|\mathbf{\beta}) = \beta_0 + \beta_1\times X_{1i} + \ldots + \beta_k\times X_{ki} = X\beta\)` ] For the aforementioned `\(g(.)\)` functions we have that: .mathbox[ `\(g^{-1}(\infty) = 1\)` and `\(g^{-1}(-\infty) = 0\)` ] E.g.: logit .mathbox[ `\(g^{-1}(x) = \frac{\exp(x)}{1 + \exp(x)} = \frac{1}{\exp(-x)+1} \implies g^{-1}(\infty) = 1\)` and `\(g^{-1}(-\infty) = 0\)` ] --- class: slide-page ##
Link functions with asymptotes .enfase[FLAMES]: .mathbox[ `\(Y_i|\mu_i \sim \text{Bernoulli}(Y_i; \mu_i)\)` `\(g^{-1}_{cd}(X\beta|c, d) = c + (d - c)\times g^{-1}(X\beta)\)` ] For the aforementioned `\(g(.)\)` functions we have that: .mathbox[ `\(g_{cd}^{-1}(\infty) = d\)` and `\(g_{cd}^{-1}(-\infty) = c\)` ] E.g.: logit .mathbox[ `\(g_{cd}^{-1}(x) = c + (d-c)\frac{1}{\exp(-x)+1} \implies g_{cd}^{-1}(\infty) = d\)` and `\(g_{cd}^{-1}(-\infty) = c\)` ] --- class: slide-page ##
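Link functions with asymptotes

The definition of `\(g_{cd}^{-1}\)` translates directly into code. The sketch below is a stand-alone illustration, not the FLAMES implementation: the function name `inv_link_cd` and the argument names `c_low` and `d_up` are made up for this slide.

```r
# Inverse link with asymptotes: c + (d - c) * g^{-1}(eta).
# Any inverse link with limits 0 and 1 can be passed as `inv_link`.
inv_link_cd <- function(eta, c_low = 0, d_up = 1, inv_link = plogis) {
  stopifnot(c_low >= 0, d_up <= 1, c_low < d_up)
  c_low + (d_up - c_low) * inv_link(eta)
}

eta <- c(-Inf, 0, Inf)
inv_link_cd(eta)                             # conventional logit: 0.0, 0.5, 1.0
inv_link_cd(eta, c_low = 0.20, d_up = 0.95)  # asymptotes: 0.200, 0.575, 0.950
```

With `c_low = 0` and `d_up = 1` the conventional model is recovered, so the usual links are a special case.

--- class: slide-page ##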
Link functions with asymptotes <br> E.g. .enfase[cauchit] and .enfase[logit]: <img src="data:image/png;base64,#index_files/figure-html/links_flexible-1.png" style="display: block; margin: auto;" /> --- class: slide-page ##
Inference The model is given by: .mathbox[ `\(Y_i|\mu_i \sim \text{Bernoulli}(Y_i; \mu_i)\)` `\(\mu_i | \beta, c, d, \nu = g_{cd}^{-1}(X\beta) = c + (d-c)g^{-1}(X\beta)\)` ] Priors: .mathbox[ `\(\beta \sim N_p(0,\sigma^2_{\beta} I)\)` `\(c \sim \text{Beta}(a_c, b_c)\)` `\(d|c \sim \text{Beta}(a_d, b_d)\mathbf{I}(d > c)\)` `\(\nu \sim \text{Exp}(\lambda)\)` `\(\lambda \sim \text{Uniform}(a_{\lambda}, b_{\lambda})\)` ] --- class: slide-page ##
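Inference

To make the model and priors concrete, here is a sketch of the unnormalised log-posterior for `\((\beta, c, d)\)` with a logit-based link. It is an illustration only, not the FLAMES sampler: the function name and the default hyperparameters are placeholders, and `\(\nu\)` and `\(\lambda\)` are left out because only the robit link needs them.

```r
# Unnormalised log-posterior for (beta, c, d) under a logit-based link.
# Hyperparameter defaults are placeholders, not values from the talk.
log_posterior <- function(beta, c_low, d_up, y, X,
                          sigma_beta = 10,
                          a_c = 1, b_c = 1, a_d = 1, b_d = 1) {
  # The posterior is zero outside 0 <= c < d <= 1.
  if (c_low < 0 || d_up > 1 || c_low >= d_up) return(-Inf)

  # Bernoulli likelihood with mu = c + (d - c) * g^{-1}(X beta).
  mu <- c_low + (d_up - c_low) * plogis(drop(X %*% beta))
  log_lik <- sum(dbinom(y, size = 1, prob = mu, log = TRUE))

  log_prior_beta <- sum(dnorm(beta, mean = 0, sd = sigma_beta, log = TRUE))
  log_prior_c <- dbeta(c_low, a_c, b_c, log = TRUE)
  # Beta density truncated to d > c, renormalised by P(D > c).
  log_prior_d <- dbeta(d_up, a_d, b_d, log = TRUE) -
    pbeta(c_low, a_d, b_d, lower.tail = FALSE, log.p = TRUE)

  log_lik + log_prior_beta + log_prior_c + log_prior_d
}
```

Any generic sampler that only needs a log-density (for example a random-walk Metropolis step) could be run on this function.

--- class: slide-page ##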
Inference <br> List of available link functions in .enfase[https://github.com/DouglasMesquita/FLAMES]: <br><br> <img src="data:image/png;base64,#img/functions_flames.png" alt="graph" width="95%"/> --- class: slide-page <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.5.0/css/font-awesome.min.css"> ##
Simulation <br> .enfase[Four scenarios]: 1) the conventional model, with .enfase[c = 0] and .enfase[d = 1] 2) a model with a fixed minimum proportion of cases, implying .enfase[c = 0.20] and .enfase[d = 1] 3) a model with a fixed maximum proportion of cases, implying .enfase[c = 0] and .enfase[d = 0.95] 4) a model with both a minimum and a maximum proportion of cases, implying .enfase[c = 0.20] and .enfase[d = 0.95] In all scenarios, we set `\(\beta_0 = 0, \beta_1 = -1, \beta_2 = 0.5\)` For each scenario, we generated .enfase[100 data sets] --- class: slide-page ##
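Simulation

A sketch of how one data set for the scenario with `\(c = 0.20\)` and `\(d = 0.95\)` could be generated. The sample size, the standard normal covariates, and the logit-based link are assumptions made for this illustration; the slide above does not state them.

```r
set.seed(2022)

# One data set with c = 0.20 and d = 0.95.
# Assumptions: n = 1000, standard normal covariates, logit-based link.
n <- 1000
beta <- c(0, -1, 0.5)
X <- cbind(1, rnorm(n), rnorm(n))

# Success probabilities are squeezed into the interval (c, d).
mu <- 0.20 + (0.95 - 0.20) * plogis(drop(X %*% beta))
y <- rbinom(n, size = 1, prob = mu)

range(mu)  # stays strictly inside (0.20, 0.95)
mean(y)    # observed proportion of cases
```

Repeating this 100 times with different seeds would give the replicated data sets for one scenario.

--- class: slide-page ##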
Simulation <br> <center> <img src="data:image/png;base64,#img/sim.png" alt="graph" width="99%"/> </center> --- class: slide-page <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.5.0/css/font-awesome.min.css"> ##
SUS population - Model fit Because of the very large demographic, economic and cultural differences in Brazil, .enfase[a model was fitted for each one of the 27 states], indexed by `\(j = 1, \ldots, 27\)`. Using the 2008 PNAD survey, let `\(Y_{ji} = 1\)` if the `\(i\)`th person living in state `\(j\)` .enfase[is not covered by a private health insurance], and 0 otherwise. .mathbox[ `\(Y_{ji}|\mu_{ji} \sim \text{Bernoulli}(\mu_{ji})\)` `\(\mu_{ji} | \beta, c, d = c_j + (d_j - c_j)g^{-1}(\beta_{0j} + \beta_{1j}\times \text{Income}_{ji} + \beta_{2j}\times \text{Gender}_{ji})\)` ] where + `\(g(.)\)` is a link function + `\(\text{Income}_{ji}\)` is the income of that household (scaled) + `\(\text{Gender}_{ji}\)` is the gender of the household respondent --- class: slide-page ##
SUS population - Model fit <center> <img src="data:image/png;base64,#img/renda_vs_sus_model.png" alt="graph" width="99%"/> </center> --- class: slide-page ##
SUS population - Model fit <p style="font-size: 0.8em; margin-bottom: 5px">Model parameters estimates for the 10 most populous states in Brazil</p> <center> <img src="data:image/png;base64,#img/fit_states.png" alt="graph" width="99%"/> </center> <p style="font-size: 0.8em; margin-bottom: 5px">Fit measures for MG under several models</p> <center> <img src="data:image/png;base64,#img/fit_states_mg.png" alt="graph" width="99%"/> </center> --- class: slide-page ##
SUS population - Model fit <br><br> With the estimates `\(\hat{c}_j\)`, `\(\hat{d}_j\)`, `\(\hat{\beta}_{0j}\)`, `\(\hat{\beta}_{1j}\)` and `\(\hat{\beta}_{2j}\)` at hand, one can proceed to estimate the .enfase[SUS population] of any region `\(k\)` within state `\(j\)` as .mathbox[ `\(\widehat{\text{Pop}}_{jk} = [\hat{c}_j + (\hat{d}_j - \hat{c}_j)\times g^{-1}(\hat{\beta}_{0j} + \hat{\beta}_{1j}\times \text{Income}_{jk} + \hat{\beta}_{2j}\times \text{Gender}_{jk})]\times n_{jk}\)` ] where + `\(\text{Income}_{jk}\)` is the average income (scaled) of region `\(jk\)` + `\(\text{Gender}_{jk}\)` is the proportion of males of region `\(jk\)` + `\(n_{jk}\)` is the total population of region `\(jk\)` --- class: slide-page ##
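SUS population - Model fit

The plug-in formula above can be written as a small function. This is a sketch under stated assumptions: the function name `estimate_sus_pop`, the logit-based default link, and every number in the example are made up for illustration and are not estimates from the talk.

```r
# Plug-in estimate of the SUS population for regions within one state.
# beta = c(beta0, beta1, beta2) for (intercept, income, gender).
estimate_sus_pop <- function(income, prop_male, n, beta, c_hat, d_hat,
                             inv_link = plogis) {
  eta <- beta[1] + beta[2] * income + beta[3] * prop_male
  (c_hat + (d_hat - c_hat) * inv_link(eta)) * n
}

# Made-up inputs for three regions and made-up estimates for one state.
pop_hat <- estimate_sus_pop(
  income    = c(-1.0, 0.0, 2.0),      # scaled average income
  prop_male = c(0.48, 0.49, 0.47),    # proportion of males
  n         = c(12000, 8000, 5000),   # total population
  beta      = c(2.0, -1.5, 0.1),
  c_hat     = 0.30,
  d_hat     = 0.93
)
round(pop_hat)
```

Whatever the covariates, the estimated share of SUS users in a region stays between `c_hat` and `d_hat`.

--- class: slide-page ##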
SUS population - Model check Estimated SUS population in comparison to the ANS data in the logarithm scale for MG and SP <center> <img src="data:image/png;base64,#img/model_check.png" alt="graph" width="99%"/> </center> --- class: slide-page ##
SUS population - Model prediction <br> Neighborhoods in Belo Horizonte and their respective average income, estimated SUS population, and estimated proportion of SUS users <br> <center> <img src="data:image/png;base64,#img/model_predict.png" alt="graph" width="99%"/> </center> --- class: slide-page <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.5.0/css/font-awesome.min.css"> ##
FLAMES <br> Available at: https://github.com/DouglasMesquita/FLAMES <br>

```r
# Install the package from GitHub and load it
devtools::install_github("DouglasMesquita/FLAMES")
library(FLAMES)

# Fit a binary regression with estimated asymptotes c and d
out_arms <- mcmc_bin(
  data = df,
  formula = y ~ x1 + x2,
  nsim = 2000,
  burnin = 5000,
  lag = 10,
  type = "cloglog",  # alternatives: logit, probit, robit, cauchit
  sample_c = TRUE,
  sample_d = TRUE,
  method = "ARMS"    # alternative: metropolis
)
```

--- class: slide-page ##
FLAMES <br>

```r
summary(out_arms)
```

```
## $Call
## mcmc_bin(data = bd, formula = f, nsim = nsim, burnin = burnin,
##     lag = lag, type = type, sample_c = TRUE, sample_d = TRUE,
##     method = "ARMS")
##
## $Coeficients
##                   mean std_error   lower_95   upper_95
## (Intercept)  0.0239149 0.1957455 -0.3387205  0.4002764
## X           -1.6048333 0.5181484 -2.6397816 -0.7724575
##
## $`Other parameters`
##                  mean  std_error  lower_95  upper_95
## c parameter 0.3174518 0.08798908 0.1299549 0.4642634
## d parameter 0.9320752 0.02038448 0.8968980 0.9750762
##
## $`Fit measures`
##        DIC  -2*LPML     WAIC
## 1 1058.437 1058.637 1058.437
```

--- class: slide-page ##
FLAMES

```r
# Show the four diagnostic plots on a 2 x 2 grid
par(mfrow = c(2, 2))
plot(out_arms, ask = FALSE)
```

<center> <img src="data:image/png;base64,#img/flames_plot.png" alt="graph" width="90%"/> </center> --- class: slide-page <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.5.0/css/font-awesome.min.css"> ##
Conclusions <br><br> - Demographic information is useful to .enfase[guide public health decision making] but is often .enfase[not available at all desired geographic levels]. - As a workaround, a .enfase[predictive model] can be built based on relevant .enfase[covariates available at the desired levels]. - We propose a .enfase[flexible link function framework] for GLMs, capable of accommodating .enfase[asymmetry, heavy tails, and asymptotes]. - The proposed methodology is used to estimate the .enfase[Brazilian SUS population] for the neighborhoods of the main cities in Brazil. - This local information is vital for the .enfase[InfoSAS project], and it is important for public agencies since it allows for local policies aimed at the population that actually uses the system. --- class: slide-page ##
Conclusions <br> - The PNAD survey was used to .enfase[link] .enfase[income] and .enfase[gender] with the .enfase[health insurance status] of individuals. - The model was .enfase[fitted separately] for each Brazilian state given the cultural and socioeconomic diversity of Brazil. - It was possible to verify that the model is capable of estimating the SUS population in an .enfase[unbiased and precise] way (based on ANS data). - Finally, the .enfase[link function with asymptotes] was capable of .enfase[adapting] to the .enfase[asymmetry] and characteristics observed in the data. - We were able to .enfase[transfer the acquired knowledge] from the individual and municipality levels to .enfase[estimate the SUS population] of each Brazilian .enfase[neighborhood]. - The .enfase[FLAMES R package] is available to fit the proposed class of .enfase[flexible link functions using Bayesian inference]. --- class: slide-page <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.5.0/css/font-awesome.min.css"> ##
References <span class = "reference"> .enfase[Carvalho O., Meira W. Jr., Prates M. (2016)]. **Infosas: um sistema de mineração de dados para controle da produção do sus**. _Revista do TCU, 137, 52-59_. </span> <span class = "reference"> .enfase[Nagler J. (1994)]. **An alternative estimator to logit and probit**. _American Journal of Political Science, 38, 230–255_. </span> <span class = "reference"> .enfase[Chen MH., Dey D. and Shao QM. (1999)]. **A new skewed link model for dichotomous quantal response data**. _Journal of the American Statistical Association 99, 1172–1186_. </span> <span class = "reference"> .enfase[Bazán JL., Bolfarine H. and Branco MD. (2010)]. **A framework for skew-probit links in binary regression**. _Communications in Statistics 39, 678–697_. </span> <span class = "reference"> .enfase[Jiang X., Dey DK., Prunier R., et al. (2013)]. **A new class of flexible link functions with application to species co-occurrence in cape floristic region**. _Annals of Applied Statistics, 7, 2180–2204_. </span> <span class = "reference"> .enfase[Bazán JL., Romeo JS. and Rodrigues J. (2014)]. **Bayesian skew-probit regression for binary response data**. _Brazilian Journal of Probability and Statistics 28, 467–482_. </span> <span class = "reference"> .enfase[Li D., Wang X., Lin L., et al. (2016)]. **Flexible link functions in nonparametric binary regression with Gaussian process priors**. _Biometrics 72, 707–719_. </span> --- class: center, inverse <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.5.0/css/font-awesome.min.css"> <h1 style="color:#05598a; font-size:80px">
THANK YOU! </h1> <br> .mathbox[ <img src="data:image/png;base64,#img/leste.jpg" alt="graph" height="210"/> ] <br><br> <h5> <span style="color:#05598a;">
</span> <span style="color:#00000;">require-r.com</span> <span style="color:#05598a;">
</span> <span style="color:#00000;">douglas-mesquita</span> <span style="color:#05598a;">
</span> <span style="color:#00000;">DouglasMesquita</span> <span style="color:#05598a;">
</span> <span style="color:#00000;">douglas-mesquita</span> </h5>