I’m the Best (I said so)

I probably wouldn't take Google's word that their model is the best, since the margins are small and the team at the top of the leaderboard is the same team that created the benchmark...

This is a very interesting benchmark. As its results get more accurate, they will become increasingly useful. Now I just have to hope this benchmark stays relevant beyond flexing their top score (I know their AI course didn't).

Source: https://deepmind.google/blog/facts-benchmark-suite-systematically-evaluating-the-factuality-of-large-language-models/
