The National Institute of Standards and Technology (NIST) is developing an artificial intelligence (AI) risk management framework (RMF) to promote the development of trustworthy and responsible AI systems. In March 2022, NIST released an initial draft of this RMF and opened it for public comment and feedback.

CalypsoAI has submitted a response to this draft, offering insights on many elements that are critical to the development and deployment of trustworthy AI and machine learning models.

The AI RMF contains the necessary elements to manage risk with flexibility and can serve as an enduring resource. However, one piece that cannot be overemphasized is the need for rigorous and independent testing, evaluation, verification, and validation (TEVV). This process is key to building trust in AI models, which will ultimately enable widespread AI adoption. It is also the best protection the U.S. has to mitigate risks associated with AI in areas such as resilience, explainability, and privacy.

Our comments focus on the following points:

  1. It is critical for models to be rigorously tested and evaluated before deployment

“If AI/ML models are not tested during the procurement process, it is possible that vulnerabilities or inaccuracies in the models may go undetected. Consequently, we recommend updating this category to include model developers, who should perform a separate TEVV process so that they can confidently advance robust models to the operators and evaluators. This will also enhance understanding of AI risks throughout the AI/ML lifecycle, which will enable better organizational decision-making.”

  2. Automated solutions and clear benchmarks can lead to more effective deployment

“As it currently stands, the testing and evaluation (T&E) process is labor intensive. As such, automating this process both pre- and post-deployment gives data scientists valuable time back and shifts dependence away from arbitrary model evaluation metrics, such as F1 scores, ROC, AUC, Precision, and Recall … Moreover, this section should focus on developing guidance for organizations to determine acceptable model thresholds for their specific conditions and risk factors. This will enable them to choose tests that are automated, adaptable, and scalable.”
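The idea of automated evaluation against organization-defined thresholds can be sketched as follows. This is a minimal, hypothetical illustration: the metric names, threshold values, and sample labels are assumptions for demonstration, not guidance from NIST or CalypsoAI.

```python
# Hypothetical sketch of an automated pre-deployment evaluation gate.
# Thresholds and data below are illustrative assumptions only.

def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

def evaluation_gate(y_true, y_pred, thresholds):
    """Return (passed, metrics): passed is True only if every
    organization-defined minimum threshold is met."""
    metrics = precision_recall_f1(y_true, y_pred)
    passed = all(metrics[name] >= minimum
                 for name, minimum in thresholds.items())
    return passed, metrics

# Thresholds an organization might set for its own risk profile.
thresholds = {"precision": 0.80, "recall": 0.70, "f1": 0.75}
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
ok, metrics = evaluation_gate(y_true, y_pred, thresholds)
```

Because the pass/fail decision is driven by thresholds the organization chooses for its own conditions and risk factors, the same gate can be re-run automatically on every model candidate rather than relying on a one-off, labor-intensive review.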

  3. Independent testing should be ongoing and repeatable

“In order to build trust in AI systems, safe deployment is essential. This requires users both to rigorously test and validate their models before deployment into production and to continue validating models once they are deployed. This will greatly reduce any ‘risks’ that may arise when determining whether to deploy these systems.”
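Continuing to validate a model after deployment can be sketched as re-running the same kind of check on a rolling window of production outcomes. The class name, window size, and accuracy threshold below are illustrative assumptions, not a specific CalypsoAI interface.

```python
# Hypothetical sketch of ongoing post-deployment validation:
# keep a rolling window of (label, prediction) pairs and repeatedly
# re-apply the same acceptance threshold used before deployment.
from collections import deque

class OngoingValidator:
    def __init__(self, minimum_accuracy, window=100):
        self.minimum_accuracy = minimum_accuracy
        self.window = deque(maxlen=window)  # drops oldest pairs automatically

    def record(self, label, prediction):
        """Log one production outcome once ground truth is known."""
        self.window.append((label, prediction))

    def still_valid(self):
        """True while in-production accuracy stays at or above the threshold."""
        if not self.window:
            return True  # no evidence yet, nothing to flag
        correct = sum(1 for t, p in self.window if t == p)
        return correct / len(self.window) >= self.minimum_accuracy

validator = OngoingValidator(minimum_accuracy=0.9, window=50)
for label, prediction in [(1, 1), (0, 0), (1, 1), (1, 0)]:
    validator.record(label, prediction)
print(validator.still_valid())  # prints False: 3/4 correct is below 0.9
```

Because the check is cheap and repeatable, it can run on a schedule or on every batch of labeled production data, turning post-deployment validation into a continuous process rather than a one-time sign-off.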

CalypsoAI agrees that standards will continue to evolve with the technology landscape. However, this should not necessitate a cumbersome validation process that requires sign-off from multiple stakeholders each time an organization seeks to deploy AI models, nor should it delay the creation of a standardized validation method. Given CalypsoAI’s expertise in third-party AI/ML model validation, we know that it is possible to institutionalize an automated TEVV process that mitigates risk and builds trust.

View our comments on the RMF