Implementing hierarchical bayesian model to fertility data: the case of Ethiopia

Gebremeskel, Haftu Gebrehiwot

Background: Ethiopia is a country with 9 ethnically based administrative regions and 2 city administrations, and it is often cited for, among other things, its high fertility rates and rapid population growth. Despite the country’s efforts to reduce them, both remain high, especially at the regional level. The study of fertility in Ethiopia, particularly in its regions, where fertility variation and its repercussions are most acute, is therefore of paramount importance. An easy way to find the different characteristics of a fertility distribution is to build a suitable model of the fertility pattern using mathematical curves, and the age-specific fertility rate (ASFR) is well suited to this purpose. In general, the age-specific fertility pattern is said to have a typical shape common to all human populations over the years, although many countries, some of them in Africa, have already started to deviate from this classical bell-shaped curve. Some existing models are therefore inadequate for describing the patterns of many African countries, including Ethiopia. A number of parametric and non-parametric functions have been used to describe the shape of the age-specific fertility (ASF) curve in the developed world, but these models have not yet been fitted to African data in general or to Ethiopian data in particular. To model fertility patterns in Ethiopia accurately, a new mathematical model is required that is both easy to use and fits the data well.

Objective: The principal goals of this thesis are therefore fourfold: (1) to examine the pattern of ASFRs at the country and regional levels in Ethiopia; (2) to propose a model that best captures the various shapes of ASFRs at both levels, and to compare its performance with that of some existing models; (3) to fit the proposed model using Hierarchical Bayesian techniques and show that this method is flexible enough for local estimates vis-à-vis the traditional formula, whose estimates may be very imprecise because of small sample sizes; and (4) to compare the resulting estimates with those obtained from non-hierarchical procedures, namely their Bayesian and maximum likelihood counterparts.

Methodology: In this study, we proposed a four-parameter model, the Skew Normal model, to fit the fertility schedules, and showed that it is flexible enough to capture the fertility patterns observed at the country level and in most regions of Ethiopia. To assess the performance of the proposed model, we conducted a preliminary analysis alongside ten other parametric and non-parametric models commonly used in the demographic literature, namely the Quadratic Spline function, Cubic Splines, the Coale-Trussell function, the Beta, Gamma, and Hadwiger distributions, Polynomial models, the Adjusted Error Model, the Gompertz curve, and the Peristera & Kostaki model. The models were fitted by nonlinear regression with nonlinear least squares (nls) estimation, and the Akaike Information Criterion (AIC) was used as the model selection criterion. For many demographers, however, estimating a region-specific ASFR model and its associated uncertainty can be difficult, especially when sample sizes vary widely among regions. Recently, it has been proposed that hierarchical procedures might provide more reliable parameter estimates for local or regional-level analyses than non-hierarchical procedures such as complete pooling and independence. A Hierarchical Bayesian procedure was therefore formulated to explore the posterior distribution of the model parameters, in order to generate region-specific ASFR point estimates and uncertainty bounds.
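The preliminary comparison described above (a skew-normal curve fitted to ASFRs by least squares and scored with AIC) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's code: the parameter names (`level`, `xi`, `omega`, `alpha`) and the parameterization are assumed, the rates are synthetic, a simple compass search stands in for a proper nonlinear least squares routine, and the AIC is the Gaussian-error form up to an additive constant. A symmetric normal curve (`alpha` fixed at 0) serves as the competing model.

```python
import math

def skew_normal_asfr(age, level, xi, omega, alpha):
    """Assumed form: level * (2/omega) * phi(z) * Phi(alpha*z), z = (age - xi)/omega."""
    z = (age - xi) / omega
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(alpha * z / math.sqrt(2.0)))
    return level * (2.0 / omega) * pdf * cdf

def sse(params, ages, rates):
    """Sum of squared errors; invalid parameter values get an infinite loss."""
    level, xi, omega, alpha = params
    if omega <= 0.0 or level <= 0.0:
        return float("inf")
    return sum((r - skew_normal_asfr(a, level, xi, omega, alpha)) ** 2
               for a, r in zip(ages, rates))

def compass_search(loss, start, steps, free, tol=1e-7, max_iter=5000):
    """Derivative-free minimizer: accept only improving axis moves, else halve steps."""
    x, step, best = list(start), list(steps), loss(start)
    for _ in range(max_iter):
        improved = False
        for i in free:                      # only the listed parameters are varied
            for sign in (1.0, -1.0):
                trial = list(x)
                trial[i] += sign * step[i]
                value = loss(trial)
                if value < best:
                    x, best, improved = trial, value, True
                    break
        if not improved:
            step = [s / 2.0 for s in step]
            if max(step) < tol:
                break
    return x, best

def aic(sse_value, n_obs, n_params):
    """Gaussian-error AIC up to a constant; +1 counts the error variance."""
    return n_obs * math.log(sse_value / n_obs) + 2 * (n_params + 1)

# Synthetic ASFRs at the midpoints of the seven five-year age groups (15-49).
# These are NOT Ethiopian data: a known curve plus small fixed perturbations.
ages = [17.5, 22.5, 27.5, 32.5, 37.5, 42.5, 47.5]
true_params = (4.5, 20.0, 10.0, 3.0)
perturbation = [0.004, -0.006, 0.005, -0.003, 0.004, -0.002, 0.001]
rates = [skew_normal_asfr(a, *true_params) + e for a, e in zip(ages, perturbation)]

def loss(params):
    return sse(params, ages, rates)

steps = [0.5, 1.0, 1.0, 0.5]
# Symmetric normal curve: alpha stays at 0, three free parameters.
normal_fit, sse_normal = compass_search(loss, [4.0, 28.0, 8.0, 0.0], steps, free=[0, 1, 2])
# Skew-normal curve: start from the normal fit and also free alpha.
skew_fit, sse_skew = compass_search(loss, normal_fit, steps, free=[0, 1, 2, 3])

aic_normal = aic(sse_normal, len(ages), 3)
aic_skew = aic(sse_skew, len(ages), 4)
print("normal:      SSE =", sse_normal, "AIC =", aic_normal)
print("skew normal: SSE =", sse_skew, "AIC =", aic_skew)
```

The model with the lower AIC would be preferred. Because the skew-normal search starts from the fitted normal curve and only accepts improving moves, its SSE cannot exceed that of the nested normal model; AIC then decides whether the extra parameter is worth it.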
In addition, other non-hierarchical approaches, namely the Bayesian and maximum likelihood methods, were used to estimate the parameters, and their results were compared with those of the Hierarchical Bayesian approach. Gibbs sampling together with the Metropolis-Hastings algorithm, implemented in R (R Development Core Team, 2005), was applied to draw posterior samples for each parameter. A data augmentation method was also implemented to ease the sampling process. Sensitivity analysis, convergence diagnosis, and model checking were conducted thoroughly to verify the robustness of the results. In all cases, non-informative prior distributions were used for all regional parameter vectors in order to reflect the lack of prior knowledge about these random variables.

Result: The preliminary analysis showed that the AIC of the proposed Skew Normal (SN) model was the lowest for the capital, Addis Ababa, for Dire Dawa, Harari, Affar, Gambela, and Benshangul-Gumuz, and for the country-level data. In the remaining regions, namely Tigray, Oromiya, Amhara, Somali, and SNNP, its AIC was higher than that of some models and lower than that of the rest. The proposed model therefore captured the empirical fertility patterns better than the other models considered in 6 of the 11 regions, as well as at the country level. The Hierarchical Bayesian (HBA) results indicate that most of the posterior means were much closer to the true fixed fertility values. They were also more precise, with lower uncertainty and narrower credible intervals, than their maximum likelihood (ML) and Bayesian analogues.

Conclusion: From the preliminary analysis, it can be concluded that the proposed model captured the ASFR pattern at the national level and in the regions better than the other common models considered.
Following this result, we conducted inference and prediction on the model parameters using three approaches: the Hierarchical Bayesian (HBA), Bayesian (BA), and maximum likelihood (ML) methods. The overall results suggest several points. One is that HBA was the best approach for such data, as it gave more consistent and more precise estimates, with lower uncertainty, than the other approaches. In general, both the ML and Bayesian methods can be used to analyze the model, but they suit different conditions. The ML method can be applied when precise values of the model parameters are known and a large sample size is available; the Bayesian method can be applied when there is uncertainty about the model parameters, prior knowledge about them is available, and few data are available.
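The precision gain attributed to the hierarchical approach can be illustrated on a much simpler model than the one used in the thesis. The sketch below runs a plain Gibbs sampler on a toy normal-normal hierarchy: each regional estimate `y[j]` has a known standard error `s[j]`, the regional means share a common mean `mu` with a flat prior, and the between-region standard deviation `tau` is fixed. All numbers are made up, `tau` being fixed is a simplifying assumption, and no Metropolis-Hastings step or data augmentation is involved.

```python
import math
import random

random.seed(1)

# Hypothetical regional estimates and their standard errors (illustrative only).
# The last region mimics a small sample: a noisy estimate with a large error.
y = [0.20, 0.15, 0.25, 0.10]
s = [0.02, 0.05, 0.08, 0.10]
tau = 0.05          # assumed between-region standard deviation, held fixed

J = len(y)
mu = sum(y) / J     # starting value for the common mean
n_iter, burn_in = 6000, 1000
draws = [[] for _ in range(J)]

for it in range(n_iter):
    # Step 1: draw each regional mean given mu (precision-weighted normal).
    theta = []
    for j in range(J):
        precision = 1.0 / s[j] ** 2 + 1.0 / tau ** 2
        mean = (y[j] / s[j] ** 2 + mu / tau ** 2) / precision
        theta.append(random.gauss(mean, math.sqrt(1.0 / precision)))
    # Step 2: draw mu given the regional means (flat prior on mu).
    mu = random.gauss(sum(theta) / J, tau / math.sqrt(J))
    if it >= burn_in:
        for j in range(J):
            draws[j].append(theta[j])

def mean_of(values):
    return sum(values) / len(values)

def sd_of(values):
    m = mean_of(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

post_mean = [mean_of(d) for d in draws]
post_sd = [sd_of(d) for d in draws]

for j in range(J):
    print("region", j, "raw:", y[j], "+/-", s[j],
          "| pooled:", round(post_mean[j], 4), "+/-", round(post_sd[j], 4))
```

In this toy setting the noisiest region borrows the most strength from the others: its posterior mean is pulled toward the common mean and its posterior standard deviation is well below its raw standard error, which is the qualitative behaviour the abstract reports for the hierarchical estimates.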

Background: Ethiopia is a nation divided into 9 administrative regions (defined on an ethnic basis) and two cities. It is often cited as an example of high fertility and rapid population growth. Despite the government’s efforts, fertility and population growth remain high, especially at the regional level. The study of fertility in Ethiopia and in its regions, which are characterized by high variability, is therefore of vital importance. A simple way to identify the different characteristics of the fertility distribution is to build a suitable model by specifying different mathematical functions. In this respect, it is worth focusing on age-specific fertility rates, which show a precise shape common to all populations. However, many countries show a “symmetrization” that many models fail to capture adequately. Some parametric models have therefore been used to capture the shape of the age-specific rates, but the use of such models is still very limited in Africa, and in Ethiopia in particular.

Objective: This work uses a new model for fertility in Ethiopia, with four specific objectives: (1) to examine the shape of the age-specific rates of Ethiopia at the national and regional levels; (2) to propose a model that best captures the various shapes of the age-specific rates at both the national and regional levels, and to compare the performance of the proposed model with that of other existing models; (3) to fit the proposed fertility function through a hierarchical Bayesian model and show that this model is flexible enough to estimate the fertility of individual regions, where estimates can be imprecise because of small sample sizes; (4) to compare the resulting estimates with those provided by non-hierarchical methods (maximum likelihood or simple Bayesian).

Methodology: In this study, we propose a four-parameter model, the Skew Normal, to model age-specific fertility rates. We show that this model is flexible enough to capture adequately the shapes of the age-specific rates at both the national and regional levels. To assess the performance of the model, a preliminary analysis compared it with ten other parametric and non-parametric models used in the demographic literature: the quadratic spline function, the cubic spline, the Coale-Trussell, Beta, Gamma, Hadwiger, polynomial, Gompertz, and Peristera-Kostaki models, and the Adjusted Error Model. The models were estimated by nonlinear least squares (nls), and the Akaike Information Criterion was used to assess their performance. However, estimation for individual regions can be difficult when sample sizes vary widely. We therefore propose hierarchical procedures, which give more reliable estimates than non-hierarchical models (complete “pooling” or “unpooling”) for regional-level analysis. In this study, a hierarchical Bayesian model is formulated to obtain the posterior distribution of the parameters of the region-specific age-specific fertility rates, together with an estimate of their uncertainty. Other non-hierarchical methods (simple Bayesian and maximum likelihood) are also used for comparison. The Gibbs sampling and Metropolis-Hastings algorithms are used to sample from the posterior distribution of each parameter. The “data augmentation” method is also used to obtain the estimates. The robustness of the results is checked through a sensitivity analysis, and the appropriate convergence diagnostics for the algorithms are reported in the text. In all cases, non-informative prior distributions were used.

Results: The results of the preliminary analysis show that the Skew Normal model has the lowest AIC in Addis Ababa, Dire Dawa, Harari, Affar, Gambela, and Benshangul-Gumuz, and also for the national estimates. In the other regions (Tigray, Oromiya, Amhara, Somali, and SNNP), the Skew Normal model is not the best, but it still fits the data well. The Skew Normal model is therefore the best in 6 of the 11 regions and for the age-specific rates of the country as a whole.

Conclusions: The Skew Normal model is thus the best overall. This initial result was the starting point for building the hierarchical Bayesian, simple Bayesian, and maximum likelihood models. The comparison of these three approaches shows that the hierarchical model gives more precise estimates than the others.

Implementing hierarchical bayesian model to fertility data: the case of Ethiopia / Gebremeskel, Haftu Gebrehiwot. - (2016 Jan 31).