How do LLMs generate grammatically correct sentences every time?
I get that an LLM predicts the next token using weights learned by training on a large amount of text scraped from the internet. But a lot of people make typos and grammar mistakes, and poorly written HTML can leave junk in the scraped content as well.
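Just so it's clear what I mean by "predicts the next token", here's a toy sketch of the loop as I picture it. The lookup table, tokens, and probabilities are made up for illustration; a real LLM computes the distribution from its learned weights rather than a hardcoded table.

```python
import random

# Toy stand-in for a trained model: maps a context to a probability
# distribution over possible next tokens. A real LLM derives this
# distribution from learned weights instead of a hardcoded table.
def next_token_distribution(context):
    table = {
        ("the",): {"cat": 0.6, "dog": 0.3, "teh": 0.1},
        ("the", "cat"): {"sat": 0.7, "sleeps": 0.3},
    }
    return table.get(tuple(context), {"<eos>": 1.0})

def generate(context, max_tokens=5):
    context = list(context)
    for _ in range(max_tokens):
        dist = next_token_distribution(context)
        tokens, weights = zip(*dist.items())
        token = random.choices(tokens, weights=weights)[0]
        if token == "<eos>":
            break
        context.append(token)
    return " ".join(context)

print(generate(["the"]))  # e.g. "the cat sat"
```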
So did they have to do some trickery to compensate for all this, or are the rules of grammar part of the text-generation pipeline itself?