• Survey on Issues During AI Training and Development

    The goal of this survey is to better understand the issues that solo developers, small teams, and researchers currently experience during AI training and development.
  • What best describes you?*
  • Where do you train most?*
  • What's your typical setup?*
  • Which GPUs do you typically rent or use?*
  • In the last 30 days, how many training runs were disrupted by infrastructure issues?
  • If you've had a failed run, what caused it? (Please check all that apply.)*
  • How long did it take you to determine that the infrastructure, not your code, caused the issue?*
  • What's the cost of a typical incident? (Average hours per incident)*
  • How often do you abandon a VM and start fresh because debugging feels too slow?*
  • Have you had any silent problems?*
  • Have you ever been in a situation where comparing models and versions has been a major bottleneck?*
  • Have you ever been in a situation where analyzing the data during or after training has been a major bottleneck?*
  • What do you use today to track/compare runs?*
  • What’s the biggest pain with your current approach?*
  • Which answers would you most want to have automatically?*