Búsqueda flexible y eficiente en texto con paralelismo de Bits

Pinzón Ardila, Yoan José

dc.contributor.author	Pinzón Ardila, Yoan José
dc.coverage.campus	UNAB Campus Bucaramanga	spa
dc.coverage.spatial	Bucaramanga (Santander, Colombia)	spa
dc.date.accessioned	2024-03-08T12:57:07Z
dc.date.available	2024-03-08T12:57:07Z
dc.date.issued	2002
dc.description.abstract	El emparejamiento de secuencias se puede entender como el problema de encontrar un patrón con una cierta característica dentro de una secuencia dada de símbolos. El caso más simple es el de encontrar una secuencia dada dentro de otra secuencia más larga. Este es uno de los problemas más antiguos y más extendidos en informática. Los usos que requieren alguna forma de emparejamiento de secuencias se pueden encontrar virtualmente por todas partes. Sin embargo, los años recientes han atestiguado un aumento dramático en el interés por los problemas de emparejamiento de secuencias, especialmente dentro de las comunidades que han crecido más rápidamente, como la recuperación de datos y la Biocomputación. Estas comunidades están haciendo frente no solamente a un aumento drástico en los tamaños del texto que tienen que manejar, sino que también están exigiendo búsquedas más rápidas y sofisticadas. Los patrones de interés no son solo secuencias simples, sino que también incluyen comodines, boquetes y expresiones regulares. La definición de un calce puede también permitir diferencias leves entre el patrón y su ocurrencia en el texto. Esto se llama “emparejamiento aproximado” y es especialmente interesante en la recuperación del texto y la biología de cómputo. El objetivo de esta investigación es el diseño, análisis e implementación de nuevos algoritmos de búsqueda flexible y eficiente en texto mediante el uso de una nueva técnica que hace uso inherente de la capacidad que tienen las computadoras para hacer operaciones de bits en forma paralela. El objetivo de este trabajo de investigación es el desarrollo y análisis de nuevos algoritmos para resolver el problema de búsqueda aproximada en texto bajo distintas condiciones, así como una mejor comprensión del problema mismo y su comportamiento estadístico. 
Si bien nuestros resultados pueden ser válidos en diversas áreas, centramos nuestra atención en la búsqueda en texto típica de las aplicaciones de recuperación de información.	spa
dc.description.abstractenglish	Sequence matching can be understood as the problem of finding a pattern with a certain characteristic within a given sequence of symbols. The simplest case is finding a given sequence within another, longer sequence. This is one of the oldest and most pervasive problems in computing. Applications that require some form of sequence matching can be found virtually everywhere. However, recent years have witnessed a dramatic increase in interest in sequence matching problems, especially within the most rapidly growing communities, such as data retrieval and biocomputing. These communities are facing not only a drastic increase in the text sizes they have to handle, but are also demanding faster and more sophisticated searches. The patterns of interest are not only simple sequences, but also include wildcards, gaps, and regular expressions. The definition of a match can also allow for slight differences between the pattern and its occurrence in the text. This is called “approximate matching” and is especially interesting in text retrieval and computational biology. The objective of this research is the design, analysis, and implementation of new flexible and efficient text search algorithms using a new technique that exploits the inherent ability of computers to perform bit operations in parallel. The objective of this research work is the development and analysis of new algorithms to solve the problem of approximate search in text under different conditions, as well as a better understanding of the problem itself and its statistical behavior. Although our results may be valid in various areas, we focus our attention on the text search typical of information retrieval applications.	spa
dc.description.learningmodality	Modalidad Virtual	spa
dc.description.tableofcontents	DESCRIPCIÓN DEL PROYECTO SINOPSIS RESUMEN INFORME DE RESULTADOS IMPACTOS INFORME FINANCIERO	spa
dc.format.mimetype	application/pdf	spa
dc.identifier.instname	instname:Universidad Autónoma de Bucaramanga - UNAB	spa
dc.identifier.reponame	reponame:Repositorio Institucional UNAB	spa
dc.identifier.repourl	repourl:https://repository.unab.edu.co	spa
dc.identifier.uri	http://hdl.handle.net/20.500.12749/23884
dc.language.iso	spa	spa
dc.publisher.faculty	Facultad Ingeniería	spa
dc.publisher.grantor	Universidad Autónoma de Bucaramanga UNAB	spa
dc.relation.references	R. A. Baeza-Yates and G. H. Gonnet. A new approach to text searching. Commun. ACM, 35(10):74-82, 1992.	spa
dc.relation.references	P. A. V. Hall and G. R. Dowling. Approximate string matching. ACM Comp. Surv., 12(4):381-402, 1980.	spa
dc.relation.references	J. W. Hunt and T. G. Szymanski. A fast algorithm for computing longest common subsequences. Commun. ACM, 20(5):350-353, 1977.	spa
dc.relation.references	V. I. Levenshtein. Binary codes capable of correcting deletions, insertions and reversals. Sov. Phys. Dokl., 6:707-710, 1966.	spa
dc.relation.references	W. J. Masek and M. S. Paterson. A faster algorithm for computing string edit distances. J. Comput. Syst. Sci., 20(1):18-31, 1980.	spa
dc.relation.references	S. B. Needleman and C. D. Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol., 48:443-453, 1970.	spa
dc.relation.references	D. Sankoff and J. B. Kruskal. Time warps, string edits, and macromolecules: the theory and practice of sequence comparison. Addison-Wesley, Reading, MA, 1983.	spa
dc.relation.references	P. H. Sellers. The theory and computation of evolutionary distances: Pattern recognition. J. Algorithms, 1(4):359-373, 1980.	spa
dc.relation.references	T. F. Smith and M. S. Waterman. Identification of common molecular subsequences. J. Mol. Biol., 147:195-197, 1981.	spa
dc.relation.references	R. A. Wagner and M. Fischer. The string-to-string correction problem. J. Assoc. Comput. Mach., 21(1):168-173, 1974.	spa
dc.relation.references	S. Wu and U. Manber. Fast text searching allowing errors. Commun. ACM, 35(10):83-91, 1992.	spa
dc.rights.accessrights	info:eu-repo/semantics/openAccess	spa
dc.rights.creativecommons	Atribución-NoComercial-SinDerivadas 2.5 Colombia	*
dc.rights.local	Abierto (Texto Completo)	spa
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/2.5/co/	*
dc.subject.keywords	Algorithm design and analysis	spa
dc.subject.keywords	Bit parallelism	spa
dc.subject.keywords	Algorithms	spa
dc.subject.keywords	Mathematical models	spa
dc.subject.keywords	Programming languages (Electronic computers)	spa
dc.subject.keywords	Electronic data processing	spa
dc.subject.lemb	Algoritmos	spa
dc.subject.lemb	Modelos matemáticos	spa
dc.subject.lemb	Lenguajes de programación (Computadores electrónicos)	spa
dc.subject.lemb	Procesamiento electrónico de datos	spa
dc.subject.proposal	Diseño y análisis de algoritmos	spa
dc.subject.proposal	Paralelismo con bits	spa
dc.title	Búsqueda flexible y eficiente en texto con paralelismo de Bits	spa
dc.title.translated	Flexible and efficient text searching with bit parallelism	spa
dc.type	Research report	eng
dc.type.coar	http://purl.org/coar/resource_type/c_18ws
dc.type.coarversion	http://purl.org/coar/version/c_ab4af688f83e57aa	spa
dc.type.driver	info:eu-repo/semantics/workingPaper	spa
dc.type.hasversion	info:eu-repo/semantics/acceptedVersion	spa
dc.type.local	Informe de investigación	spa
dc.type.redcol	http://purl.org/redcol/resource_type/IFI