Engineering
Aug 8, 2024

Building Datasets to power AI

Dive into how building your own custom datasets can power high-skill models in niche domains


Reinforcement Learning from Human Feedback (RLHF) is a slow and expensive process, particularly when the labels must come from experts like doctors or accountants. In these cases, the cost of collecting even a few thousand labeled rows can be prohibitive in model development.

At Talc, we're powering data-driven AI development with one key insight: you don't always need human feedback.

For example, we may always need a doctor to tell us which treatment pathway is best, or to clarify what they meant by "urosepsis." However, we don't need doctors to restate well-documented knowledge like the symptoms of chronic fatigue or the diagnostic criteria for clinical depression. In these latter cases we're not looking for human feedback; we simply need to teach our models knowledge that is already encoded in medical knowledge bases like the DSM, or in canonical ontologies.

In scenarios where we're not actually looking for human feedback, we can use that encoded knowledge to generate a dataset of positive and negative examples. For example, we can use LLMs to turn the DSM's diagnostic criteria for clinical depression into 100 example patient records, each pre-labeled with whether or not it qualifies under those criteria.
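To make the idea concrete, here is a minimal sketch of generating pre-labeled examples from encoded criteria. The symptom list, thresholds, and generator below are simplified stand-ins for the DSM's actual criteria (roughly: at least 5 of 9 symptoms for at least 2 weeks), not the real text, and in a production pipeline an LLM would draft the record prose rather than a random sampler:

```python
import random

# Illustrative stand-in for encoded diagnostic criteria.
SYMPTOMS = [
    "depressed mood", "anhedonia", "weight change", "sleep disturbance",
    "psychomotor change", "fatigue", "worthlessness",
    "poor concentration", "suicidal ideation",
]

def qualifies(record: dict) -> bool:
    """Label a record against the simplified criteria:
    >= 5 symptoms present for >= 2 weeks."""
    return len(record["symptoms"]) >= 5 and record["duration_weeks"] >= 2

def generate_dataset(n: int, seed: int = 0) -> list[dict]:
    """Sample n synthetic records, alternating between ones designed
    to qualify and ones designed not to, then label each one by
    applying the criteria directly -- no human annotator involved."""
    rng = random.Random(seed)
    records = []
    for i in range(n):
        positive = i % 2 == 0
        k = rng.randint(5, 9) if positive else rng.randint(0, 4)
        record = {
            "symptoms": rng.sample(SYMPTOMS, k),
            "duration_weeks": rng.randint(2, 12) if positive else rng.randint(0, 1),
        }
        record["label"] = qualifies(record)
        records.append(record)
    return records

dataset = generate_dataset(100)
```

Because the label is computed from the same criteria that generated the record, label accuracy is guaranteed by construction rather than by annotator agreement.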

We then pass the pre-labeled dataset through a stylistic transformation to match real-world data: we match the generated records to real ones in tone, style, and length. As a result, we end up with an accurately labeled dataset that encodes the knowledge the DSM already defines, but in a format that can train a powerful model for almost any use case.
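One simple way to seed that stylistic transformation is to pair each generated record with the real record it most resembles, then use the pair in an LLM rewrite prompt. The sketch below uses length as a crude, easily measurable proxy for style; this matching heuristic is our illustration, not a description of Talc's actual pipeline:

```python
def match_style_reference(
    generated: list[str], real: list[str]
) -> list[tuple[str, str]]:
    """Pair each generated record with the real record closest in
    character length. The paired real record would then serve as a
    style exemplar in a rewrite prompt ("rewrite the first record in
    the tone and structure of the second")."""
    pairs = []
    for gen in generated:
        ref = min(real, key=lambda r: abs(len(r) - len(gen)))
        pairs.append((gen, ref))
    return pairs
```

In practice you would match on richer features than length (vocabulary, section headers, abbreviation density), but the shape of the step is the same: every synthetic record leaves this stage anchored to a real-world exemplar.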

We've already used this approach to replace teams of accountants and doctors who used to do rote manual labeling; the resulting data not only matches human experts in accuracy but provides 100x the quantity.

If this sounds interesting to you, we'd love to chat! Reach out below.

Get a demo

Learn how you can use better data to power training and evaluation today
