Diet is one of the main sources of exposure to toxic chemicals with carcinogenic potential, some of which are generated during food processing, depending on the type of food (primarily meat, fish, bread and potatoes), the cooking method and the temperature. Although the carcinogenicity of these compounds has been demonstrated in animal models at high doses, an unequivocal link between dietary exposure to them and disease has not been proven in humans. A major difficulty in assessing the actual intake of these toxic compounds is the lack of standardised and harmonised protocols for collecting and analysing dietary information. The intestinal microbiota (IM) has a great influence on health and is altered in some diseases, such as colorectal cancer (CRC). Diet influences the composition and activity of the IM, and the net genotoxic exposure of the gut to potential dietary carcinogens depends on the interactions among these compounds, the IM and diet. This review critically analyses the difficulties and challenges in studying how the interactions among these three actors affect the onset of CRC. Machine learning (ML) applied to data obtained in subclinical and precancerous stages would help to establish risk thresholds for the intake of toxic compounds generated during food processing in relation to diet and IM profiles, whereas the Semantic Web could improve the accessibility and usability of data from different studies and help to elucidate novel interactions among those chemicals, the IM and diet.