Meta's GAIA paper describes multi-step question challenges for LLMs. Let's address these!
Let's consider the challenge of answering multi-step questions, like those introduced in Meta's GAIA paper.
One of the interesting challenges in improving LLMs is not just answering individual questions, but stringing together enough chain-of-thought steps to answer questions that depend on several facts at once.
For example, how would you train an LLM to answer a question like this?
> In 1886, a notable achievement in chemistry was made by a French scientist, who became the first person to isolate a certain halogen element. This element, the most electronegative and reactive of all, is commonly used in toothpaste and city water supplies. Identify this scientist and then research his familial background to find out his father's profession. Finally, determine the name of the company where his father worked, which played a significant role in the transportation sector of its time. This company's name is often abbreviated to a three-letter word. What is the full name of this company?
Not only would you need the facts behind each of the clauses above, but you would also need to show the LLM how to break down a question like this, solve each sub-problem, and then combine the results into an answer.
Here's how an LLM should correctly break down and answer the question above:
1. In 1886, French chemist Henri Moissan became the first person to isolate elemental fluorine.
2. Henri Moissan's father was a minor officer of the Eastern Railway Company.
3. The Eastern Railway Company's name is commonly shortened to "Est".
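To make the chaining concrete, here is a minimal Python sketch of the breakdown above. It is an illustration only, not how any particular LLM or pipeline works: the `Step` class, the `FACTS` lookup table, and the `solve` function are hypothetical names, and the lookup table stands in for whatever knowledge the model actually has. The point it shows is that each sub-question is built from the previous answer.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One sub-question; the template may reference the previous answer as {prev}."""
    template: str


# Toy stand-in for the model's knowledge: the three facts from the breakdown.
FACTS = {
    "Who first isolated fluorine in 1886?": "Henri Moissan",
    "Where did the father of Henri Moissan work?": "Eastern Railway Company",
    "What is the common short name of the Eastern Railway Company?": "Est",
}

# The decomposition: every step after the first depends on the one before it.
STEPS = [
    Step("Who first isolated fluorine in 1886?"),
    Step("Where did the father of {prev} work?"),
    Step("What is the common short name of the {prev}?"),
]


def solve(steps, facts):
    """Answer each sub-question in order, feeding each answer into the next."""
    trace = []
    prev = ""
    for step in steps:
        sub_question = step.template.format(prev=prev)
        prev = facts[sub_question]  # raises KeyError if a fact is missing
        trace.append((sub_question, prev))
    return trace


if __name__ == "__main__":
    for sub_question, answer in solve(STEPS, FACTS):
        print(f"{sub_question} -> {answer}")
```

If any single lookup fails, the whole chain fails, which is why the decomposition matters as much as the facts themselves.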
Teaching an LLM each of these facts is a somewhat solved problem. Teaching it to break down the question, however, is not.
To do the latter, we use Talc's Data Generation solution. We generate synthetic data like the above to teach LLMs a wide range of tasks, from multi-step tasks like the one you see here to NLP tasks such as Named Entity Recognition.
In fact, the scientific example you see above was synthetically generated by Talc! Notably, samples like these not only accurately reflect the truth, they also convey the steps an LLM needs to solve multi-step questions like this.
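As a rough illustration of what "conveying the steps" could look like in a dataset, the sketch below packs a question, its reasoning steps, and the final answer into one JSON line. The field names (`question`, `steps`, `answer`) and the `to_training_record` helper are assumptions made for this example; they are not Talc's actual output format.

```python
import json


def to_training_record(question, steps, answer):
    """Pack a question, its reasoning steps, and the final answer into one JSON line."""
    record = {
        "question": question,
        # Number the steps so the order of the reasoning is explicit in the data.
        "steps": [{"index": i, "text": s} for i, s in enumerate(steps, start=1)],
        "answer": answer,
    }
    return json.dumps(record, ensure_ascii=False)


# Hypothetical record built from the fluorine example in this post.
line = to_training_record(
    question=(
        "Which company employed the father of the French scientist "
        "who first isolated fluorine in 1886?"
    ),
    steps=[
        "In 1886, Henri Moissan became the first person to isolate elemental fluorine.",
        "Henri Moissan's father was a minor officer of the Eastern Railway Company.",
        'The Eastern Railway Company\'s name is commonly shortened to "Est".',
    ],
    answer="Eastern Railway Company (Est)",
)

if __name__ == "__main__":
    print(line)
```

Keeping the steps as a separate ordered list, rather than folding them into the answer text, makes it possible to train or evaluate on the decomposition independently of the final answer.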
We have a number of these pipelines in the wild today, generating everything from medical transcripts to human labeling replacements, and are excited to share more results of those bodies of work soon!
In the meantime, if you need a dataset, reach out!