Generation of harmful content such as false/misleading information, biased/discriminatory outputs, or material violating public order and legal/regulatory requirements.
OS/application vulnerabilities, insecure network communications, improper data storage practices, API exposure risks, Identity & Access Management gaps, etc.
Risks introduced by training data, such as sensitive data leakage, intellectual property infringement, privacy breaches and corpus contamination.
Batch-evaluate the model’s response to prohibited content via automated testing tools, rapidly identifying content security blind spots.
Simulate the behavior of real attackers and conduct “red team” attacks on the model using various jailbreaking techniques to uncover underlying security vulnerabilities.
Starting from the system architecture, identify vulnerabilities and configuration risks in the model’s operational environment.