In August, the National Institute of Standards and Technology (NIST) released the second draft of its Artificial Intelligence Risk Management Framework, seeking feedback and further information as the institute moves toward a final version. CalypsoAI has provided feedback at each stage of the process with NIST, and last week, we submitted our official response to this latest draft, praising the inclusion of testing, evaluation, verification, and validation (TEVV) at each phase of the AI lifecycle.

Below, we’ve highlighted key portions of our response to the draft, as well as where we think there is an opportunity to improve the RMF and create a truly comprehensive framework for the deployment of responsible AI.

TEVV is critical throughout the AI/ML pipeline

The RMF draft identifies TEVV tasks at every stage of the AI lifecycle, including design and planning (validating capabilities relative to the intended context of application); development (pre-deployment model validation and assessment); deployment (system validation, with recalibration based on internal and external factors); and operations (ongoing monitoring and testing). 

CalypsoAI believes that TEVV is crucial pre- and post-deployment; however, pre-deployment TEVV is especially important. It allows for the evaluation of model performance, identification of risks associated with deployment, and the opportunity to improve a model before it is put into action, all of which reduce negative consequences. NIST’s inclusion of TEVV across the entirety of the AI lifecycle illustrates the positive impact a repeatable, trustworthy AI pipeline can have at all stages and for all stakeholders.

There is no one-size-fits-all approach to risk management

Every AI/ML system is different, every deployment environment is different, and every mission is different. For AI missions, context is key, and a standardized, repeatable TEVV framework aligned to NIST’s guidelines is critical to mitigating risk.

Often, organizations purchase pre-configured algorithms, which means users only have the vendor’s word that the model will perform as intended. Without knowing how the model was trained, explainability challenges arise, increasing the likelihood of the unintended consequences this framework seeks to address. Testing needs to take into account the conditions surrounding the AI/ML systems being deployed.

Independent TEVV can ensure consistency

While TEVV is emphasized throughout the framework, independent TEVV is not specifically called out as a critical element. It is important that an independent TEVV process is embraced to ensure organizations and developers are not “checking their own homework,” so to speak. Additionally, independent review ensures consistency and holds every AI/ML system to the same standard. Creating a standardized, repeatable approach ensures widespread AI mission success.

Specificity and standardization are critical

While the RMF mentions the need for deployment context and acknowledges the risks in operational environments that differ from lab environments, it does not specify which general types of operational conditions to test for. This specificity is important if the U.S. wishes to accelerate toward widespread AI adoption.

Additionally, we believe there needs to be more than a voluntary framework to provide guidance on responsible AI. While it is agreed that AI policy discussions are live and evolving, regulatory guidance would be more powerful, and a more specific, standardized framework will be key in promoting AI innovation across the U.S. government. Requiring accountability practices now, in the nascent stage of AI adoption, will reap greater benefits in the future in terms of safety, security, and transparency.

CalypsoAI firmly supports NIST’s effort to establish a risk management framework for responsible AI, and we welcome any opportunity to work with NIST, industry partners, and broader government agencies to assist in developing a responsible, trustworthy, and secure AI RMF for the benefit of all sectors.

You can read the entirety of CalypsoAI’s response to the draft RMF here.