¿Cómo puede contribuir el Machine Learning a la focalización de programas sociales?. Modelo XGBoost para la determinación de pobreza monetaria interpretado mediante Shap Values: Caso Colombia 2019-2020
| dc.contributor.advisor | Castro Aristizábal, Geovanny | |
| dc.contributor.author | Galvis Caballero, Ángel | |
| dc.contributor.cvlac | Castro Aristizábal, Geovanny [0000530735] | spa |
| dc.contributor.googlescholar | Castro Aristizábal, Geovanny [uWUdeZ8AAAAJ] | spa |
| dc.contributor.linkedin | Castro Aristizábal, Geovanny [geovanny-castro-aristizabal-21589968] | |
| dc.contributor.orcid | Castro Aristizábal, Geovanny [0000-0002-3567-983X] | |
| dc.contributor.researchgate | Castro Aristizábal, Geovanny [Geovanny-Castro-Aristizabal-2] | |
| dc.coverage.campus | UNAB Campus Bucaramanga | spa |
| dc.coverage.spatial | Colombia | spa |
| dc.coverage.temporal | 2019-2020 | spa |
| dc.date.accessioned | 2022-05-17T19:57:44Z | |
| dc.date.available | 2022-05-17T19:57:44Z | |
| dc.date.issued | 2021 | |
| dc.degree.name | Magíster en Análisis Económico | spa |
| dc.description.abstract | La implementación de programas de ayuda social ha sido la principal estrategia llevada a cabo por los gobiernos latinoamericanos para mitigar el impacto de la pobreza y el desempleo. Estos programas incluyen subsidios y transferencias de recursos condicionadas que buscan mejorar la situación económica de los hogares fomentando la permanencia educativa, el acceso a la salud, la obtención de vivienda, la adquisición de una canasta básica de alimentos, etc. Debido a que estos programas tienen presupuestos limitados, se han diseñado métodos para enfocar la inversión pública en poblaciones específicas, por ejemplo, los hogares cuyos ingresos son menores a la línea de pobreza monetaria. En el presente trabajo se utiliza el modelo de machine learning conocido como XGBoost (Chen, T., 2016) para predecir diferentes condiciones económicas en individuos, entre ellas el nivel de ingresos, la condición de pobreza y la situación de desempleo, a partir de características como el género, el número de personas en el hogar, los años de educación, las características de la vivienda, los bienes y posesiones y el estrato socioeconómico, entre otras. Esto permite establecer un proxy que determine si los individuos cumplen con las condiciones de acceso para ser beneficiarios potenciales de programas sociales. El desempeño del modelo es satisfactorio en la estimación de ingresos, presentando errores de inclusión del 23% al 27%, inferiores a los del agregado de ayudas institucionales a nivel Colombia, cuyo error se estimó entre el 51.8% y el 58%. Por último, se aplicó la técnica Shap (SHapley Additive exPlanations) (Lundberg, S. 2017) para explicar la forma en la que se correlacionan las características que se utilizaron en los modelos predictivos y el índice de ingresos. Esto facilita proponer una aplicación de este tipo de técnica como soporte para la operación de programas sociales focalizados, pues permite que la toma de decisiones basada en algoritmos sea más transparente y auditable. | spa |
| dc.description.abstractenglish | The implementation of social assistance programs has been the main strategy used by Latin American governments to mitigate the impact of poverty and unemployment. These programs include subsidies and conditional resource transfers that seek to improve the economic situation of households by fostering school retention, access to health care, access to housing, the purchase of a basic food basket, etc. Because these programs have limited budgets, methods have been designed to focus public investment on specific populations, for example households whose income is below the monetary poverty line. In the present work, the machine learning model known as XGBoost (Chen, T., 2016) is used to predict different economic conditions of individuals, among them income level, poverty status and unemployment status, based on characteristics such as gender, number of people in the household, years of education, housing characteristics, assets and possessions, and socioeconomic stratum, among others. This makes it possible to establish a proxy that determines whether individuals meet the access conditions to be potential beneficiaries of social programs. The performance of the model is satisfactory in the estimation of income, with inclusion errors of 23% to 27%, lower than those of the aggregate of institutional aid in Colombia, whose error was estimated at between 51.8% and 58%. Finally, the Shap technique (SHapley Additive exPlanations) (Lundberg, S. 2017) was applied to explain how the characteristics used in the predictive models correlate with the income index. This makes it easier to propose applying this type of technique to support the operation of targeted social programs, since it allows algorithm-based decision-making to be more transparent and auditable. | spa |
| dc.description.degreelevel | Maestría | spa |
| dc.description.learningmodality | Modalidad Presencial | spa |
| dc.description.tableofcontents | Tabla de contenidos; Lista de figuras; Lista de tablas; Lista de Abreviaturas; Introducción; Capítulo 1: Planteamiento del problema; Capítulo 2: Objetivos; 2.1. Objetivo general; 2.2. Objetivos específicos; Capítulo 3: Marco Teórico; 3.1. Problema general que buscan resolver los proxy means tests; 3.2. ¿Correlación o causalidad?; 3.3. Tipos de modelos utilizados en los proxy means tests; 3.3.1. Modelo de regresión lineal; 3.3.2. Modelo LASSO; 3.3.3. Proceso general de ajuste de datos; 3.3.4. Modelos no paramétricos; 3.4. Métricas utilizadas para la evaluación y comparación de modelos; 3.5. Conexión entre los modelos de predicción utilizados como proxy means tests y el análisis económico; 3.5.1. Modelos de predicción aplicados a la determinación de los ingresos y de la condición de pobreza; 3.5.2. Modelos de predicción aplicados a la determinación de la condición laboral; Capítulo 4: Estado del Arte; Capítulo 5: Metodología; 5.1. Fuentes de datos; 5.2. Elaboración de las bases de datos; 5.3. Selección de variables; 5.3.1. One Hot Encoding; 5.3.2. Creación de variables cardinales; 5.3.3. Creación de variables para imputar características de interés; 5.3.4. Eliminación de variables espurias y variables altamente correlacionadas; 5.3.5. Descripción de las variables originales de la base de datos conformada; 5.3.6. Descripción de las variables dependientes; 5.3.7. Selección de las variables regresoras para modelos de ingresos y pobreza; 5.3.8. Selección de las variables independientes para el modelo de desempleo; 5.4. Modelamiento; 5.4.1. Modelo I: Regresor del índice de ingresos; 5.4.2. Modelo II: Clasificador de situación pobreza; 5.4.3. Modelo III: Clasificador situación de desempleo; 5.4.4. Métodos computacionales; 5.4.5. Aclaración estadística; Capítulo 6: Resultados y análisis; 6.1. Modelo I: Regresor del índice de ingresos; 6.2. Modelo II: Clasificador de condición de pobreza; 6.3. Modelo III: Clasificador de situación de desempleo; Conclusiones y recomendaciones; Bibliografía | spa |
| dc.format.mimetype | application/pdf | spa |
| dc.identifier.instname | instname:Universidad Autónoma de Bucaramanga - UNAB | spa |
| dc.identifier.reponame | reponame:Repositorio Institucional UNAB | spa |
| dc.identifier.repourl | repourl:https://repository.unab.edu.co | spa |
| dc.identifier.uri | http://hdl.handle.net/20.500.12749/16448 | |
| dc.language.iso | spa | spa |
| dc.publisher.faculty | Facultad Economía y Negocios | spa |
| dc.publisher.grantor | Universidad Autónoma de Bucaramanga UNAB | spa |
| dc.publisher.program | Maestría en Análisis Económico | spa |
| dc.relation.references | A. M. Nalla Gounden. “Investment in Education in India.” The Journal of Human Resources 2, no. 3 (1967): 347–58. https://doi.org/10.2307/144839. | spa |
| dc.relation.references | Banco Mundial. (14 de Octubre 2021). Pobreza: panorama general. https://www.bancomundial.org/es/topic/poverty/overview#1 | spa |
| dc.relation.references | Banco Mundial. (s.f., accedido el 1 de diciembre de 2021). Desempleo a nivel global. Organización Internacional del Trabajo, base de datos sobre estadísticas de la Organización Internacional del Trabajo (OIT). https://datos.bancomundial.org/indicator/SL.UEM.TOTL.ZS | spa |
| dc.relation.references | Blofield, M. & Filgueira, F. (2020). COVID-19 and Latin America: Social Impact, Policies and a Fiscal Case for an Emergency Social Protection Floor. CIPPEC Policy Brief. | spa |
| dc.relation.references | Brown, C., Ravallion, M., & van de Walle, D. (2018). A poor means test? Econometric targeting in Africa. Journal of Development Economics, 134, 109-124. https://doi.org/10.1016/j.jdeveco.2018.05.004. | spa |
| dc.relation.references | Cecchini, S., & Madariaga, A. (2011). Conditional cash transfer programmes: The recent experience in Latin America and the Caribbean (First edition). United Nations, ECLAC. | spa |
| dc.relation.references | Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785-794. https://doi.org/10.1145/2939672.2939785. | spa |
| dc.relation.references | Del Boca, D., Pronzato, C., & Sorrenti, G. (2021). Conditional cash transfer programs and household labor supply. European Economic Review, 136, 103755. https://doi.org/10.1016/j.euroecorev.2021.103755. | spa |
| dc.relation.references | Del Ninno, C., Mills, B. (2015). Safety Nets in Africa : Effective Mechanisms to Reach the Poor and Most Vulnerable. Africa Development Forum. Washington, DC: World Bank; and Agence Française de Développement. World Bank. https://openknowledge.worldbank.org/handle/10986/21369. | spa |
| dc.relation.references | Departamento de Economía y Asuntos Sociales Naciones Unidas. (1 de noviembre de 2021). World Economic Situation And Prospects: November 2021 Briefing, No. 155. https://www.un.org/development/desa/dpad/publication/world-economic-situation-and-prospects-november-2021-briefing-no-155/. | spa |
| dc.relation.references | Departamento Administrativo Nacional de Estadística (DANE). (s.f., accedido 1 de diciembre de 2021). Estadísticas mercado laboral. https://www.dane.gov.co/index.php/estadisticas-por-tema/mercado-laboral. | spa |
| dc.relation.references | Departamento de Prosperidad Social del Gobierno Nacional, Colombia. (s.f., accedido 1 de diciembre de 2021). https://www.monteria.gov.co/publicaciones/82/programa-familias-en-accion/. | spa |
| dc.relation.references | Dershem, Larry. (2013). Using a Proxy Means Test for Targeting in a Conditional Cash Transfer Program. | spa |
| dc.relation.references | Embarec, R. (2020). Aprendizaje Automático aplicado al sector hotelero, Machine Learning applied to Hotel Industry, La Laguna, 11 de septiembre de 2020, trabajo de fin de grado. https://riull.ull.es/xmlui/bitstream/handle/915/21338/Aprendizaje%20Automatico%20aplicado%20al%20sector%20hotelero.pdf?sequence=1. | spa |
| dc.relation.references | García, S., & Saavedra, J. E. (2017). Educational Impacts and Cost-Effectiveness of Conditional Cash Transfer Programs in Developing Countries: A Meta-Analysis. Review of Educational Research, 87(5), 921-965. https://doi.org/10.3102/0034654317723008. | spa |
| dc.relation.references | Graham, C. (1995). Margaret E. Grosh, Administering Targeted Social Programs in Latin America: From Platitudes to Practice (Washington, D.C.: The World Bank, Regional and Sectorial Studies, 1994), pp. ix + 174, $10.95. Journal of Latin American Studies, 27(1), 280-281. https://doi.org/10.1017/S0022216X00010713 | spa |
| dc.relation.references | Gerszon-Mahler, D. Banco Mundial. (24 de Junio 2021). Updated estimates of the impact of COVID-19 on global poverty: Turning the corner on the pandemic in 2021? https://blogs.worldbank.org/opendata/updated-estimates-impact-covid-19-global-poverty-turning-corner-pandemic-2021. | spa |
| dc.relation.references | Grimes, M., & Wängnerud, L. (2010). Curbing Corruption Through Social Welfare Reform? The Effects of Mexico’s Conditional Cash Transfer Program on Good Government. The American Review of Public Administration, 40(6), 671-690. https://doi.org/10.1177/0275074009359025. | spa |
| dc.relation.references | Grisales R, Hugo, & Arbeláez M, María P. (2008). Metodología para el diseño de un índice de condiciones de vida para los adolescentes jóvenes. Revista Facultad Nacional de Salud Pública, 26(2), 178-195. Retrieved January 23, 2022, http://www.scielo.org.co/scielo.php?script=sci_arttext&pid=S0120-386X2008000200009&lng=en&tlng=es. | spa |
| dc.relation.references | Grosh, M. E., & Baker, J. L. (1995). Proxy means tests for targeting social programs: Simulations and speculation. The World Bank. https://doi.org/10.1596/0-8213-3313-5 | spa |
| dc.relation.references | Houssou, N. & Zeller, M. (2007). Proxy Means Tests for Targeting the Poorest Households -- Applications to Uganda, RePEc. | spa |
| dc.relation.references | Jacob Mincer. (1970). The Distribution of Labor Incomes: A Survey With Special Reference to the Human Capital Approach. Journal of Economic Literature, 8(1), 1–26. http://www.jstor.org/stable/2720384 | spa |
| dc.relation.references | Kidd, S., Gelders, B., & Bailey-Athias, D. (2017) Organización internacional del trabajo. Decent work for sustainable development (DW4SD) Resource Platform. https://www.ilo.org/global/topics/dw4sd/WCMS_568678/lang--en/index.htm. | spa |
| dc.relation.references | Kidd, S. (2013). Rethinking "Targeting" in International Development - Pathways Perspectives. Issue 11. | spa |
| dc.relation.references | Kidd, S., Gelders, B., & Diloá Bailey-Athias. (2017). Exclusion by design: An assessment of the effectiveness of the proxy means test poverty targeting mechanism. https://doi.org/10.13140/RG.2.2.36802.68805. | spa |
| dc.relation.references | Kidd, S. & Wylde, E. (2011). Targeting the Poorest: An assessment of the proxy means test methodology AusAID. | spa |
| dc.relation.references | Londoño-Vélez, J., & Querubín, P. (2021). The Impact of Emergency Cash Assistance in a Pandemic: Experimental Evidence from Colombia. The Review of Economics and Statistics, 1-27. https://doi.org/10.1162/rest_a_01043 | spa |
| dc.relation.references | Lundberg, S., & Lee, S.-I. (2017). A Unified Approach to Interpreting Model Predictions. arXiv:1705.07874 [cs, stat]. http://arxiv.org/abs/1705.07874. | spa |
| dc.relation.references | Machado, D. B., Rodrigues, L. C., Rasella, D., Lima Barreto, M., & Araya, R. (2018). Conditional cash transfer programme: Impact on homicide rates and hospitalisations from violence in Brazil. PLOS ONE, 13(12), e0208925. https://doi.org/10.1371/journal.pone.0208925. | spa |
| dc.relation.references | Microsoft. (s.f., accedido el 1 de diciembre de 2021). SMOTE. https://docs.microsoft.com/es-es/azure/machine-learning/component-reference/smote | spa |
| dc.relation.references | McBride, L., & Nichols, A. R. (2015). Improved poverty targeting through machine learning: An application to the USAID Poverty Assessment Tools. http://www.econthatmatters.com/wp-content/uploads/2015/01/improvedtargeting_21jan2015.pdf | spa |
| dc.relation.references | Mincer, J. (1958). Investment in Human Capital and Personal Income Distribution. Journal of Political Economy, 66(4), 281-302. https://doi.org/10.1086/258055. | spa |
| dc.relation.references | Mincer, J. (1962). On-the-Job Training: Costs, Returns, and Some Implications. Journal of Political Economy, 70(5, Part 2), 50-79. https://doi.org/10.1086/258725. | spa |
| dc.relation.references | Mincer, J. (1965). [Review of The Economic Value of Education; Economic Aspects of Education: Three Essays; External Benefits of Public Education: An Economic Analysis, by T. W. Schultz, W. G. Bowen, & B. A. Weisbrod]. The American Economic Review, 55(3), 637–640. http://www.jstor.org/stable/1814619. | spa |
| dc.relation.references | Nicola, M., Alsafi, Z., Sohrabi, C., Kerwan, A., Al-Jabir, A., Iosifidis, C., Agha, M., & Agha, R. (2020). The socio-economic implications of the coronavirus pandemic (COVID-19): A review. International Journal of Surgery, 78, 185-193. https://doi.org/10.1016/j.ijsu.2020.04.018. | spa |
| dc.relation.references | Noriega-Campero, A., Garcia-Bulle, B., Cantu, L. F., Bakker, M. A., Tejerina, L., & Pentland, A. (2020). Algorithmic targeting of social policies: Fairness, accuracy, and distributed governance. Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 241-251. https://doi.org/10.1145/3351095.3375784. | spa |
| dc.relation.references | O’Neill, K. (2021). Cash or Conditions? An Analysis of Conditional Cash Transfer Programs (Doctor of Philosophy, Carleton University). https://doi.org/10.22215/etd/2021-14644. | spa |
| dc.relation.references | Parker, S. W., & Todd, P. E. (2017). Conditional Cash Transfers: The Case of Progresa/Oportunidades. Journal of Economic Literature, 55(3), 866-915. https://doi.org/10.1257/jel.20151233. | spa |
| dc.relation.references | Psacharopoulos, G. (1972). Rates of Return to Investment in Education around the World. Comparative Education Review, 16(1), 54-67. https://doi.org/10.1086/445569 | spa |
| dc.relation.references | Rawlings, L. B. (2005). Evaluating the Impact of Conditional Cash Transfer Programs. The World Bank Research Observer, 20(1), 29-55. https://doi.org/10.1093/wbro/lki001. | spa |
| dc.relation.references | Schultz, T. W. (1967). The Rate of Return in Allocating Investment Resources to Education. The Journal of Human Resources, 2(3), 293–309. https://doi.org/10.2307/144836. | spa |
| dc.relation.references | Sen, A. (1976). Poverty: An Ordinal Approach to Measurement. Econometrica, 44(2), 219–231. https://doi.org/10.2307/1912718 | spa |
| dc.relation.references | Sen, A. (1980). “Equality of What?”. In The Tanner Lecture on Human Values, I, 197-220. Cambridge: Cambridge University Press | spa |
| dc.relation.references | Sohnesen, T., & Stender, N., (2016). "Is random forest a superior methodology for predicting poverty ? an empirical assessment," Policy Research Working Paper Series 7612, The World Bank. https://ideas.repec.org/p/wbk/wbrwps/7612.html | spa |
| dc.relation.references | Sosa-Rubi, S. G., Walker, D., Servan, E., & Bautista-Arredondo, S. (2011). Learning effect of a conditional cash transfer programme on poor rural women’s selection of delivery care in Mexico. Health Policy and Planning, 26(6), 496-507. https://doi.org/10.1093/heapol/czq085. | spa |
| dc.relation.references | Uribe G., J. I. (Ed.). (2006). Ensayos de economía aplicada al mercado laboral (1. ed). Programa Editorial, Universidad del Valle. | spa |
| dc.relation.references | Uribe, J. I., Ortiz, C. H., & Correa, J. B. (2009). ¿Cómo deciden los individuos en el mercado laboral? Modelos y estimaciones para Colombia. Lecturas De Economía, 64(64), 59-90. https://doi.org/10.17533/udea.le.n64a2650. | spa |
| dc.relation.references | Varian, H. R. (2014). Big data: New tricks for econometrics. Journal of Economic Perspectives 28(2), 3–28. | spa |
| dc.relation.references | Verme, Paolo, (2020). "Which Model for Poverty Predictions?," GLO Discussion Paper Series 468, Global Labor Organization (GLO). | spa |
| dc.rights.accessrights | info:eu-repo/semantics/openAccess | spa |
| dc.rights.creativecommons | Atribución-NoComercial-SinDerivadas 2.5 Colombia | * |
| dc.rights.local | Abierto (Texto Completo) | spa |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/2.5/co/ | * |
| dc.subject.keywords | Economic development | spa |
| dc.subject.keywords | Economy | spa |
| dc.subject.keywords | Economic analysis | spa |
| dc.subject.keywords | Proxy means tests | spa |
| dc.subject.keywords | Machine learning | spa |
| dc.subject.keywords | Interpretable machine learning | spa |
| dc.subject.keywords | Ensemble methods and Shap values | spa |
| dc.subject.keywords | Artificial intelligence | spa |
| dc.subject.keywords | Machine theory | spa |
| dc.subject.keywords | Explanation-Based Learning | spa |
| dc.subject.lemb | Análisis económico | spa |
| dc.subject.lemb | Economía | spa |
| dc.subject.lemb | Desarrollo económico | spa |
| dc.subject.lemb | Inteligencia artificial | spa |
| dc.subject.lemb | Teoría de las máquinas | spa |
| dc.subject.lemb | Aprendizaje basado en explicaciones | spa |
| dc.subject.proposal | Pruebas de medios proxy | spa |
| dc.subject.proposal | Aprendizaje automático | spa |
| dc.subject.proposal | Aprendizaje automático interpretable | spa |
| dc.subject.proposal | Métodos de ensamble y valores Shap | spa |
| dc.title | ¿Cómo puede contribuir el Machine Learning a la focalización de programas sociales?. Modelo XGBoost para la determinación de pobreza monetaria interpretado mediante Shap Values: Caso Colombia 2019-2020 | spa |
| dc.title.translated | How can Machine Learning contribute to the targeting of social programs? XGBoost model for determining monetary poverty interpreted through Shap Values: Colombia case 2019-2020 | spa |
| dc.type | Thesis | eng |
| dc.type.coar | http://purl.org/coar/resource_type/c_bdcc | |
| dc.type.driver | info:eu-repo/semantics/masterThesis | spa |
| dc.type.hasversion | info:eu-repo/semantics/acceptedVersion | spa |
| dc.type.local | Tesis | spa |
| dc.type.redcol | http://purl.org/redcol/resource_type/TM | spa |
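
The abstract in this record reports inclusion errors of 23% to 27% for the income model. As a rough illustration of what such a figure measures, the following is a minimal sketch in Python. It assumes one common convention (inclusion error = non-poor individuals selected / all individuals selected; exclusion error = poor individuals not selected / all poor individuals), which may differ from the definition used in the thesis. The function name `targeting_errors`, the poverty line, and the sample figures are hypothetical and are not taken from the thesis.

```python
def targeting_errors(incomes, predicted_incomes, poverty_line):
    """Return (inclusion_error, exclusion_error) for a proxy means test.

    Assumed definitions (a common convention, not confirmed by the thesis):
      inclusion error = non-poor people selected / all people selected
      exclusion error = poor people not selected / all poor people
    """
    # A person is selected when the *predicted* income falls below the line.
    selected = [p < poverty_line for p in predicted_incomes]
    # A person is actually poor when the *observed* income falls below the line.
    poor = [y < poverty_line for y in incomes]

    n_selected = sum(selected)
    n_poor = sum(poor)
    wrongly_included = sum(s and not p for s, p in zip(selected, poor))
    wrongly_excluded = sum(p and not s for s, p in zip(selected, poor))

    # Guard against division by zero when nobody is selected or nobody is poor.
    inclusion = wrongly_included / n_selected if n_selected else 0.0
    exclusion = wrongly_excluded / n_poor if n_poor else 0.0
    return inclusion, exclusion


# Made-up illustrative figures, not data from the thesis.
true_income = [80, 120, 300, 90, 500, 150]
predicted = [100, 90, 140, 310, 480, 200]
inclusion, exclusion = targeting_errors(true_income, predicted, poverty_line=145)
print(f"inclusion error: {inclusion:.2f}, exclusion error: {exclusion:.2f}")
```

In this sketch the predicted incomes could come from any regressor, for instance an XGBoost model as described in the abstract; the error computation itself does not depend on the model.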
Files
Original bundle
- Name: 2021_Tesis_Angel_Galvis_Caballero.pdf
- Size: 3.55 MB
- Format: Adobe Portable Document Format
- Description: Thesis

- Name: 2022_Licencia_Angel_Galvis_Caballero.pdf
- Size: 73.8 KB
- Format: Adobe Portable Document Format
- Description: License

License bundle
- Name: license.txt
- Size: 829 B
- Format: Item-specific license agreed upon to submission
- Description: