Extracción de datos semiestructurados en la web

Correa Trocha, Mayra Alejandra; Peñuela Morales, Sarith Mayerly

dc.contributor.advisor	Pérez Alcázar, José de Jesús
dc.contributor.author	Correa Trocha, Mayra Alejandra
dc.contributor.author	Peñuela Morales, Sarith Mayerly
dc.contributor.googlescholar	Pérez Alcázar, José de Jesús [es&oi=ao]	spa
dc.contributor.orcid	Pérez Alcázar, José de Jesús [0000-0003-3389-0401]	spa
dc.coverage.campus	UNAB Campus Bucaramanga	spa
dc.coverage.spatial	Colombia	spa
dc.date.accessioned	2024-10-22T16:24:52Z
dc.date.available	2024-10-22T16:24:52Z
dc.date.issued	2001-07-31
dc.degree.name	Ingeniero de Sistemas	spa
dc.description.abstract	La gran proliferación de textos, sobre todo en el formato electrónico, hace muy difícil que cualquier persona sea capaz de leer, comprender y sintetizar tal cantidad de información. Es esto lo que ha llevado a un gran número de investigadores a desarrollar una serie de estrategias para el manejo de esta información. Entre éstas se encuentra la Extracción de Información (IE). La IE pretende, a partir de estos textos, obtener información relevante que pueda ser utilizada electrónicamente. De esta manera, para hacer uso eficiente de la información contenida en un texto, es útil que la información sea almacenada en alguna clase de formato estructurado; por ejemplo, una base de datos relacional. Generalmente, el proceso de extracción de la información requerida de un documento hacia una Base de Datos es un proceso manual. Debido al enorme volumen de los textos que se encuentran en la Web, se crea la necesidad de tener métodos de procesamiento automático para extraer la información.	spa
dc.description.abstractenglish	The great proliferation of texts, especially in electronic format, makes it very difficult for anyone to read, understand, and synthesize such a quantity of information. This has led a large number of researchers to develop a series of strategies for managing this information. Among these is Information Extraction (IE). IE aims to obtain, from these texts, relevant information that can be used electronically. Thus, to make efficient use of the information contained in a text, it is useful to store that information in some kind of structured format, for example a relational database. Extracting the required information from a document into a database is usually a manual process. The enormous volume of text found on the Web creates the need for automatic processing methods to extract this information.	spa
dc.description.degreelevel	Pregrado	spa
dc.description.learningmodality	Modalidad Presencial	spa
dc.description.tableofcontents	INTRODUCCIÓN 1. FUNDAMENTACIÓN TEÓRICA 1.1 EXTRACCIÓN DE DATOS SEMI-ESTRUCTURADOS 1.1.1 Visión de un dato semi-estructurado 1.1.1.1 Estructura de los datos 1.1.2 La web 1.1.3 Wrappers 1.1.4 Desarrollo del software de extracción 1.2 DATA EXTRACTION BY EXAMPLE (DEByE) 1.2.1 Propuesta DEByE 1.2.2 Conceptos básicos y notación 1.2.3 Herramienta DEByE 1.2.4 Interfaz Gráfica de Usuarios (GUI) 1.2.5 Parámetros de extracción de objetos (OE) 1.2.6 Extractor de DEByE 1.2.6.1 Técnica de extracción bottom-up 2. CUADRO COMPARATIVO DE LAS TRES TÉCNICAS DE EXTRACCIÓN DE DATOS SEMIESTRUCTURADOS EN LA WEB 3. METODOLOGÍA DE DESARROLLO 3.1 VISIÓN GENÉRICA DE LA INGENIERÍA DEL SOFTWARE 3.2 CICLO DE VIDA 3.3 VISIÓN GENERAL DE LA METODOLOGÍA 3.3.1 Metodología 3.3.1.1 Análisis de requerimientos 3.3.1.2 Diseño del sistema 3.3.1.2.1 Diseño detallado 3.3.1.3 Implementación 4. ANÁLISIS DE REQUERIMIENTOS 4.1 IDENTIFICACIÓN DE LOS CASOS DE USO DEL SISTEMA 5. DISEÑO DEL SISTEMA 5.1 DESCRIPCIÓN DE OBJETOS 5.2 ARQUITECTURA DEL SISTEMA 5.3 PLATAFORMA DEL SISTEMA 5.4 DEFINICIÓN INICIAL DE LA INTERFAZ DEL SISTEMA 5.4.1 Descripción textual del funcionamiento 5.5 DISEÑO DE LA INTERFAZ GRÁFICA DE USUARIO 5.5.1 Descripción pantalla menú 5.5.2 Descripción pantalla solicitud de página 6. IMPLEMENTACIÓN 6.1 CARACTERÍSTICAS DEL EQUIPO 6.2 HERRAMIENTAS UTILIZADAS 6.3 DIFICULTADES EN LA IMPLEMENTACIÓN 6.4 EXPERIENCIAS EN LA REALIZACIÓN DE PRUEBA 7. DIRECTRICES PARA TRABAJOS FUTUROS 8. CONCLUSIONES BIBLIOGRAFÍA ANEXOS	spa
dc.format.mimetype	application/pdf	spa
dc.identifier.instname	instname:Universidad Autónoma de Bucaramanga - UNAB	spa
dc.identifier.reponame	reponame:Repositorio Institucional UNAB	spa
dc.identifier.repourl	repourl:https://repository.unab.edu.co	spa
dc.identifier.uri	http://hdl.handle.net/20.500.12749/27074
dc.language.iso	spa	spa
dc.publisher.faculty	Facultad Ingeniería	spa
dc.publisher.grantor	Universidad Autónoma de Bucaramanga UNAB	spa
dc.publisher.program	Pregrado Ingeniería de Sistemas	spa
dc.publisher.programid	ISI-1791
dc.relation.references	ABITEBOUL, Serge; BUNEMAN, Peter and SUCIU, Dan. Data on the Web: A Syntax for Data. San Francisco, California: Morgan Kaufmann, 2000. 254 p. ISBN 1-55860-622-X.	spa
dc.relation.references	AHO, A. V. and CORASICK, M. J. Efficient string matching: An aid to bibliographic search. Communications of the ACM, 18 (6): 333-340, 1975.	spa
dc.relation.references	ATZENI, P.; MECCA, G. and MERIALDO, P. Semistructured and structured data in the Web : Going back and forth. En : Università di Roma Tre and Università della Basilicata.	spa
dc.relation.references	BOOCH, Grady; RUMBAUGH, James y JACOBSON, Ivar. The Unified Modeling Language User Guide. s.l. : Addison Wesley, s.f. 431 p.	spa
dc.relation.references	CATALÁ, N. y CASTELL, N. Construcción automática de diccionario de patrones de extracción de información.	spa
dc.relation.references	COWIE, J. and LEHNERT, W. Information extraction. En : Communications of ACM, (2000).	spa
dc.relation.references	CRESCENZI, V. and MECCA, G. Grammars have exceptions. En : Dipartimento di Informatica e Automazione, Università di Roma Tre.	spa
dc.relation.references	CROFT, W. B. NSF center for intelligent information retrieval. En : Communications of ACM. (1985); 740p.	spa
dc.relation.references	EMBLEY, D. W. et al. A conceptual-modeling approach to extracting data from the Web. En : Brigham Young University.	spa
dc.rights.accessrights	info:eu-repo/semantics/openAccess	spa
dc.rights.creativecommons	Atribución-NoComercial-SinDerivadas 2.5 Colombia	*
dc.rights.local	Abierto (Texto Completo)	spa
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/2.5/co/	*
dc.subject.keywords	Systems engineer	spa
dc.subject.keywords	Technological innovations	spa
dc.subject.keywords	Information systems	spa
dc.subject.keywords	Data extraction	spa
dc.subject.keywords	Database	spa
dc.subject.keywords	Information retrieval	spa
dc.subject.keywords	Information storage and retrieval systems	spa
dc.subject.keywords	Software architecture	spa
dc.subject.lemb	Ingeniería de sistemas	spa
dc.subject.lemb	Innovaciones tecnológicas	spa
dc.subject.lemb	Recuperación de información	spa
dc.subject.lemb	Sistemas de almacenamiento y recuperación de información	spa
dc.subject.lemb	Arquitectura de software	spa
dc.subject.proposal	Sistemas de información	spa
dc.subject.proposal	Extracción de datos	spa
dc.subject.proposal	Base de datos	spa
dc.title	Extracción de datos semiestructurados en la web	spa
dc.title.translated	Semi-structured data extraction on the web	spa
dc.type.coar	http://purl.org/coar/resource_type/c_7a1f
dc.type.coarversion	http://purl.org/coar/version/c_ab4af688f83e57aa	spa
dc.type.driver	info:eu-repo/semantics/bachelorThesis
dc.type.hasversion	info:eu-repo/semantics/acceptedVersion
dc.type.local	Trabajo de Grado	spa
dc.type.redcol	http://purl.org/redcol/resource_type/TP

Files

Original bundle

Name: 2001_Correa_Trocha_Mayra (1).pdf
Size: 37.42 MB
Format: Adobe Portable Document Format
Description: Thesis

Collections

Pregrado Ingeniería de Sistemas