Textual information can be used in areas such as the classification of documents into categories, queries in web and library searches, and the record linkage of name and address lists. For text to be used effectively, it may need to be cleaned to remove typographical errors, and documents (records) must be given a mathematical representation in a probabilistic model. This talk describes an application of Bayesian networks to classifying a collection of Reuters newspaper articles (Lewis 1992) into categories (Nigam, McCallum, Thrun, and Mitchell 2000; Winkler 2000). The generalization involves a method for finding parsimonious interactions between words within classes, and it is related to the statistical mixture methods of Winkler (1993) and Larsen and Rubin (2001). The results are indirectly compared with those of the current best-performing methods, such as Support Vector Machines (Vapnik 1995) and Boosting (Schapire and Singer 2000; Friedman, Hastie, and Tibshirani 2000). The theoretical method is also compared to Probabilistic Latent Semantic Indexing (Hofmann 1999), the Information Bottleneck method (Slonim and Tishby 2001), and Hierarchical Mixtures of Experts (Jordan and Jacobs 1994).