Molecular biology and genetics provide "natural" examples of objects that can be modeled as linear sequences of symbols over a finite alphabet.
Thus each chromosome, which carries the genetic heritage, is essentially composed of two strands of DNA. Each of these strands can be modeled (disregarding the three-dimensional helical structure) as a succession of nucleotides, each composed of a phosphate group, a sugar (deoxyribose) and a nitrogenous base. There are four different bases: two are purines (guanine G and adenine A) and the other two are pyrimidines (cytosine C and thymine T). They work "in pairs": thymine always binds to adenine, and cytosine always binds to guanine. Since the sequence of these bases encodes a large part of the genetic information, a useful model of a chromosome strand is simply the linear sequence of the bases that compose it, that is, a (very long) sequence over the four-letter alphabet {A, T, C, G}.
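As an illustration, the model above can be written down in a few lines of Python. This is only a minimal sketch: the names DNA_ALPHABET, PAIRING, is_dna and complement_strand are hypothetical and chosen for this example, a strand is represented as an ordinary string, and the reading direction of the two strands is ignored, in line with the simplifications made above.

```python
# Hypothetical names, chosen for illustration only.
DNA_ALPHABET = frozenset("ACGT")

# Base pairing as described in the text: A with T, C with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_dna(sequence: str) -> bool:
    """Return True if every symbol belongs to the four-letter alphabet."""
    return all(symbol in DNA_ALPHABET for symbol in sequence)


def complement_strand(sequence: str) -> str:
    """Return the base-by-base complement of a strand.

    Simplification: strand orientation is ignored, so the result is read
    in the same direction as the input.
    """
    if not is_dna(sequence):
        raise ValueError("sequence contains symbols outside {A, C, G, T}")
    return "".join(PAIRING[symbol] for symbol in sequence)
```

For instance, complement_strand("ATCG") yields "TAGC", and is_dna("ATXG") is False because X is not a letter of the alphabet.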
This model raises the question of how to search for particular nucleotide sequences in a chromosome, or how to detect similarities and dissimilarities between two (or more) DNA fragments. Such genetic similarities serve, for example, to quantify the evolutionary proximity of populations, to locate genes fulfilling the same function in two closely related species, or to carry out kinship tests between individuals. Searching for sequences and measuring similarities are therefore two basic problems of bioinformatics.
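Both problems can be sketched on strings. The code below is one possible illustration, not the method used in practice on real genomes: find_occurrences is a naive search built on Python's str.find, and edit_distance computes the Levenshtein distance (the minimal number of insertions, deletions and substitutions), which is only one of several possible similarity measures. Both function names are hypothetical.

```python
from typing import List


def find_occurrences(text: str, pattern: str) -> List[int]:
    """Return the start indices of all (possibly overlapping) occurrences."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        # Resume one position further so overlapping matches are found too.
        start = text.find(pattern, start + 1)
    return positions


def edit_distance(u: str, v: str) -> int:
    """Levenshtein distance between u and v, computed row by row."""
    # previous[j] is the distance between the current prefix of u and v[:j].
    previous = list(range(len(v) + 1))
    for i, a in enumerate(u, start=1):
        current = [i]
        for j, b in enumerate(v, start=1):
            current.append(min(
                previous[j] + 1,             # delete a
                current[j - 1] + 1,          # insert b
                previous[j - 1] + (a != b),  # substitute a by b, or match
            ))
        previous = current
    return previous[-1]
```

With these definitions, find_occurrences("ATATAT", "ATA") returns [0, 2], and edit_distance("ACGT", "AGT") is 1, since deleting the C is enough to turn the first fragment into the second.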
This type of computation is not limited to genes; it is also used for proteins. Indeed, the primary structure of a protein can be modeled as the linear sequence of the amino acids it contains, a sequence that determines some of the protein's properties. Since there are also only finitely many amino acids (20), proteins can likewise be modeled as finite sequences over a 20-letter alphabet.
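The same representation carries over directly, with only the alphabet changing. The sketch below assumes the standard one-letter codes for the 20 amino acids, which are not given in the text; the names AMINO_ACIDS, is_protein and composition, as well as the short peptide used in the example, are hypothetical.

```python
from collections import Counter

# Assumption: standard one-letter codes of the 20 amino acids.
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def is_protein(sequence: str) -> bool:
    """Return True if every symbol is one of the 20 amino-acid letters."""
    return all(symbol in AMINO_ACIDS for symbol in sequence)


def composition(sequence: str) -> Counter:
    """Count how many times each amino acid occurs in the sequence."""
    if not is_protein(sequence):
        raise ValueError("sequence contains symbols outside the 20-letter alphabet")
    return Counter(sequence)
```

For example, composition("MKVLAAGK") counts two occurrences each of K and A, while is_protein("MKB") is False because B is not one of the 20 letters assumed here.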