Schema Validation
Schema validation with Marshmallow and Pydantic for Python applications
What is Schema Validation?
Schema validation is the process of verifying that data conforms to a predefined structure. In API development and testing, schemas define the expected format of request and response data.
An API schema plays the same role for an API that a database schema plays for a database: it gives developers a precise, shared definition of the data exchanged, which makes integration between platforms easier.
Why Use Schema Validation?
- Request validation: Verify data before sending API requests
- Response validation: Ensure API responses match expected formats
- Contract testing: Validate that APIs conform to their specifications
- Documentation: Schemas serve as living documentation
- Type safety: Catch data type errors early
Schema Characteristics
A schema defines:
- Structure: The fields a data object should contain
- Data types: The type of each field (string, number, boolean, etc.)
- Required fields: Which fields must be present
- Constraints: Minimum/maximum values, patterns, etc.
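As a minimal sketch of how these four characteristics map onto code, the Pydantic model below declares each of them. The `Product` model and its field names are hypothetical and chosen only for illustration.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class Product(BaseModel):  # hypothetical model, for illustration only
    # Structure and data types: each annotated attribute is a typed field.
    sku: str = Field(pattern=r"^[A-Z]{3}-\d{4}$")  # constraint: regex pattern
    price: float = Field(gt=0)                     # constraint: minimum value
    in_stock: bool                                 # required: no default given
    description: Optional[str] = None              # optional: has a default


ok = Product(sku="ABC-1234", price=9.99, in_stock=True)

try:
    # Bad pattern, non-positive price, and a missing required field.
    Product(sku="bad", price=-1)
except ValidationError as exc:
    failed_fields = sorted(err["loc"][0] for err in exc.errors())
    print(failed_fields)  # ['in_stock', 'price', 'sku']
```

Pydantic reports all three problems in a single `ValidationError` rather than stopping at the first one.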
Prerequisites
Installation
Pydantic:
uv pip install pydantic
# For OpenAPI schema generation
uv pip install datamodel-code-generator
Marshmallow:
uv pip install marshmallow
# For OpenAPI schema generation
uv pip install swagger-marshmallow-codegen
Verify Installation
Pydantic:
import pydantic
print(pydantic.__version__)
# Expected: 2.x.x
Marshmallow:
import marshmallow
print(marshmallow.__version__)
Basic Usage
Define a Model
from pydantic import BaseModel
from typing import Optional
class User(BaseModel):
    id: int
    name: str
    email: str
    is_active: bool = True
    age: Optional[int] = None
Validate Data
# Valid data
user = User(id=1, name="John Doe", email="john@example.com")
print(user.model_dump())
# {'id': 1, 'name': 'John Doe', 'email': 'john@example.com', 'is_active': True, 'age': None}
# Invalid data raises ValidationError
from pydantic import ValidationError
try:
    user = User(id="not_an_int", name="John", email="john@example.com")
except ValidationError as e:
    print(e.errors())
Define a Schema
from marshmallow import Schema, fields
class UserSchema(Schema):
    id = fields.Int(dump_only=True)
    name = fields.Str(required=True)
    email = fields.Email(required=True)
    created_at = fields.DateTime(dump_only=True)
Serialize (Dump) Data
user_data = {
    "id": 1,
    "name": "John Doe",
    "email": "john@example.com"
}
schema = UserSchema()
result = schema.dump(user_data)
print(result)
# {'id': 1, 'name': 'John Doe', 'email': 'john@example.com'}
Deserialize and Validate (Load) Data
input_data = {
    "name": "Jane Doe",
    "email": "jane@example.com"
}
schema = UserSchema()
result = schema.load(input_data)
# Returns validated data or raises ValidationError
Validation Approaches
Request Schema Validation
Validate data before making an API request:
Pydantic:
from pydantic import BaseModel, ValidationError

class CreateUserRequest(BaseModel):
    first_name: str
    last_name: str
    email: str

data = {
    "first_name": "Jane",
    "last_name": "Doe",
    "email": "jane@example.com"
}
try:
    validated = CreateUserRequest(**data)
    # Proceed with API request using validated.model_dump()
except ValidationError as e:
    print(f"Validation failed: {e}")
Marshmallow:
from marshmallow import Schema, fields, ValidationError
class UserCreateRequest(Schema):
    firstName = fields.String(required=True)
    lastName = fields.String(required=True)
    emailAddress = fields.Email(required=True)

data = {
    "firstName": "User FName",
    "lastName": "User LName",
    "emailAddress": "new_user@example.com"
}
schema = UserCreateRequest()
try:
    validated = schema.load(data)
except ValidationError as e:
    print(f"Request validation failed: {e.messages}")
Response Schema Validation
Validate data after receiving an API response:
Pydantic:
import requests
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    id: int
    name: str
    email: str

response = requests.get("https://api.example.com/users/1")
try:
    user = User(**response.json())
    print(f"Valid user: {user.name}")
except ValidationError as e:
    raise AssertionError(f"Response schema mismatch: {e}")
Marshmallow:
import requests
from marshmallow import Schema, fields, ValidationError, INCLUDE
class UserCreateResponse(Schema):
    emailAddress = fields.String(required=True)

    class Meta:
        unknown = INCLUDE

response = requests.post(
    url="https://api.example.com/users",
    json={"firstName": "User", "lastName": "Name", "emailAddress": "user@example.com"}
)
response_data = response.json()
schema = UserCreateResponse()
try:
    validated = schema.load(response_data)
except ValidationError as e:
    print(f"Response validation failed: {e.messages}")
Generate from OpenAPI
If your API has OpenAPI specifications (Swagger), you can export the schema as JSON or YAML and use code generators to create Python models.
For FastAPI applications:
- Navigate to /docs or /openapi.json
- Download the specification
For SwaggerHub:
- Open the API in SwaggerHub Editor
- Export as YAML or JSON
Install and Run datamodel-code-generator
uv pip install datamodel-code-generator
# From YAML
datamodel-codegen --input api_spec.yaml --input-file-type openapi --output models.py
# From JSON
datamodel-codegen --input api_spec.json --input-file-type openapi --output models.py
Generated Output Example
# generated by datamodel-codegen
from __future__ import annotations
from typing import Optional
from pydantic import BaseModel
class AuthenticationRequest(BaseModel):
    device_name: Optional[str] = None
    device_id: Optional[str] = None
    secret: Optional[str] = None
    client_version: Optional[str] = None
    client_id: Optional[int] = None

class LoginResponse(BaseModel):
    user_id: Optional[str] = None
    second_factor_required: Optional[bool] = None
Install and Run swagger-marshmallow-codegen
uv pip install swagger-marshmallow-codegen
# From YAML
swagger-marshmallow-codegen api_spec.yaml models.py
# From JSON
swagger-marshmallow-codegen api_spec.json models.py
Generated Output Example
# Auto-generated by swagger-marshmallow-codegen
from marshmallow import Schema, fields, INCLUDE
class UserCreateRequest(Schema):
    firstName = fields.String(required=True)
    lastName = fields.String(required=True)
    emailAddress = fields.String(required=True)
    birthDate = fields.Date()

    class Meta:
        unknown = INCLUDE

class UserCreateResponse(Schema):
    emailAddress = fields.String(required=True)

    class Meta:
        unknown = INCLUDE
Advanced Features
Field Validation
from pydantic import BaseModel, field_validator, Field
class User(BaseModel):
    name: str = Field(min_length=1, max_length=100)
    age: int = Field(ge=0, le=150)
    email: str

    @field_validator('email')
    @classmethod
    def validate_email(cls, v):
        if '@' not in v:
            raise ValueError('Invalid email format')
        return v.lower()
Nested Models
from pydantic import BaseModel

class Address(BaseModel):
    street: str
    city: str
    country: str

class User(BaseModel):
    name: str
    address: Address  # Validated recursively as an Address
    tags: list[str] = []
Model Configuration
from pydantic import BaseModel, ConfigDict
class StrictUser(BaseModel):
    model_config = ConfigDict(
        strict=True,     # No type coercion
        extra='forbid',  # Error on extra fields
        frozen=True,     # Immutable instances
    )
    name: str
    age: int
JSON Schema Export
from pydantic import BaseModel

class User(BaseModel):
    name: str
    email: str

# Export as JSON Schema (returns a dict)
print(User.model_json_schema())
Alias for API Field Names
from pydantic import BaseModel, ConfigDict, Field
class User(BaseModel):
    model_config = ConfigDict(populate_by_name=True)
    first_name: str = Field(alias="firstName")
    last_name: str = Field(alias="lastName")

# Can parse with either name
user = User(firstName="John", lastName="Doe")
# Or
user = User(first_name="John", last_name="Doe")
Nested Schemas
from marshmallow import Schema, fields
class AddressSchema(Schema):
    street = fields.Str()
    city = fields.Str()
    country = fields.Str()

class UserSchema(Schema):
    name = fields.Str()
    address = fields.Nested(AddressSchema)
Custom Validation
from marshmallow import Schema, fields, validates, ValidationError
class UserSchema(Schema):
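    age = fields.Int()

    @validates('age')
    def validate_age(self, value, **kwargs):
        # **kwargs absorbs the data_key argument that marshmallow 4 passes
        # to validator methods; marshmallow 3 passes no extra arguments.
        if value < 0:
            raise ValidationError('Age must be non-negative.')
        if value > 150:
            raise ValidationError('Age seems unrealistic.')
Handle Unknown Fields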
from marshmallow import Schema, EXCLUDE, INCLUDE, RAISE
class StrictSchema(Schema):
    class Meta:
        unknown = RAISE  # Raise error on unknown fields

class FlexibleSchema(Schema):
    class Meta:
        unknown = INCLUDE  # Include unknown fields

class FilteringSchema(Schema):
    class Meta:
        unknown = EXCLUDE  # Silently ignore unknown fields
Pydantic vs Marshmallow
| Feature | Pydantic | Marshmallow |
|---|---|---|
| Type hints | Native | Via plugins |
| Performance | Faster (Rust core) | Good |
| JSON Schema | Built-in export | Via extension |
| ORM Integration | SQLModel | SQLAlchemy |
| Approach | Type-annotation-first | Class-attribute-first |
| Default in FastAPI | Yes | No |
| Serialization | .model_dump() | .dump() |
| Deserialization | Constructor / model_validate() | .load() |
Best Practices
- Version your schemas: Keep schemas in sync with API versions
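- Be strict about input: Fail on unexpected fields (extra='forbid' in Pydantic, unknown = RAISE in Marshmallow), and enable Pydantic's strict mode where type coercion is not wanted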
- Test schema changes: Breaking changes should be caught in CI
- Generate from source: Prefer generating schemas from API specs over manual definition
- Validate at boundaries: Validate incoming data as early as possible
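One way to catch breaking schema changes in CI is a snapshot test that compares a model's exported JSON Schema against a reviewed baseline. The sketch below assumes a simple `User` model; the `EXPECTED` snapshot and the `schema_summary` helper are hypothetical names, and a real project would more likely keep the snapshot in a file under version control.

```python
from pydantic import BaseModel


class User(BaseModel):
    id: int
    name: str
    email: str


# Hypothetical reviewed snapshot of the parts of the schema we care about.
EXPECTED = {
    "required": ["id", "name", "email"],
    "types": {"id": "integer", "name": "string", "email": "string"},
}


def schema_summary(model: type[BaseModel]) -> dict:
    """Reduce a model's JSON Schema to its required fields and field types."""
    schema = model.model_json_schema()
    return {
        "required": schema.get("required", []),
        "types": {k: v.get("type") for k, v in schema["properties"].items()},
    }


def test_user_schema_has_not_drifted() -> None:
    # Fails when a field is added, removed, retyped, or made optional.
    assert schema_summary(User) == EXPECTED


test_user_schema_has_not_drifted()
```

Under pytest the final call is unnecessary; it is included here so the sketch runs as a plain script.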
Troubleshooting
Validation Errors with Nested Objects
Ensure nested schemas are properly defined:
Pydantic:
# Wrong - won't validate nested structure
class User(BaseModel):
    address: dict  # Too loose

# Right - properly typed
class Address(BaseModel):
    street: str
    city: str

class User(BaseModel):
    address: Address
Marshmallow:
# Wrong - won't validate nested structure
class UserSchema(Schema):
    address = fields.Dict()  # Too loose

# Right - properly typed
class AddressSchema(Schema):
    street = fields.Str()
    city = fields.Str()

class UserSchema(Schema):
    address = fields.Nested(AddressSchema)
Schema Drift Between Code and API
Keep schemas in sync with API specs:
- Regenerate schemas when API changes
- Run schema tests in CI to catch drift
- Use OpenAPI specs as the source of truth
Pydantic:
datamodel-codegen --input openapi.json --output models.py
Marshmallow:
swagger-marshmallow-codegen openapi.json models.py
Performance Issues with Large Payloads
For large JSON documents:
- Use streaming validation where possible
- Validate only required fields for quick checks
- Consider Pydantic v2 (Rust-based, much faster)
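A simple way to approximate streaming validation is to validate one record at a time instead of loading the whole payload. The sketch below assumes the data arrives as newline-delimited JSON, which may not match your API; the `Event` model and `iter_valid_events` helper are hypothetical. It is plain per-record validation, not a dedicated streaming feature of Pydantic.

```python
import io
from typing import Iterable, Iterator

from pydantic import BaseModel, ValidationError


class Event(BaseModel):  # hypothetical record type
    id: int
    kind: str


def iter_valid_events(lines: Iterable[str], errors: list) -> Iterator[Event]:
    """Yield validated events line by line; record failures in `errors`."""
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            yield Event.model_validate_json(line)
        except ValidationError as exc:
            errors.append((line_no, exc.error_count()))


# A file-like object stands in for a large response body here.
payload = io.StringIO(
    '{"id": 1, "kind": "click"}\n'
    '{"id": "x", "kind": "view"}\n'
    '{"id": 3, "kind": "scroll"}\n'
)
errors: list = []
events = list(iter_valid_events(payload, errors))
print([e.id for e in events])  # [1, 3]
print(errors)                  # [(2, 1)]
```

Because the function is a generator, only one record needs to be held in memory at a time, and a single bad record does not abort the rest of the batch.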
Type Coercion Issues (Pydantic)
Pydantic coerces types by default. For strict validation:
from pydantic import BaseModel, ConfigDict
class StrictModel(BaseModel):
    model_config = ConfigDict(strict=True)
    count: int  # Won't accept "123" string
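To make the difference concrete, the following sketch feeds the same input to a default (lax) model and a strict one. `LaxModel` is a hypothetical name introduced only for the comparison.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class LaxModel(BaseModel):  # default behaviour: coercion allowed
    count: int


class StrictModel(BaseModel):
    model_config = ConfigDict(strict=True)
    count: int


lax = LaxModel(count="123")
print(repr(lax.count))  # 123 (the string was coerced to an int)

try:
    StrictModel(count="123")
    strict_error_type = None
except ValidationError as exc:
    strict_error_type = exc.errors()[0]["type"]

print(strict_error_type)  # int_type
```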
Pydantic v1 vs v2
If you're migrating from v1:
| v1 | v2 |
|---|---|
| class Config | model_config = ConfigDict() |
| .dict() | .model_dump() |
| .json() | .model_dump_json() |
| @validator | @field_validator |
| parse_obj() | model_validate() |
Optional Fields Still Required (Pydantic v2)
In Pydantic v2, Optional only affects the type (allows None), not whether
the field is required. A field without a default value is always required,
regardless of Optional. This differs from Pydantic v1, where Optional alone
implied a default of None.
# Pydantic v2: still required (no default value)
field: Optional[str]
# Pydantic v2: truly optional (has a default)
field: Optional[str] = None
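The behaviour can be checked with a small model; `Profile` and its fields are hypothetical names used only for this sketch.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class Profile(BaseModel):  # hypothetical model
    nickname: Optional[str]    # required, but None is an accepted value
    bio: Optional[str] = None  # optional: may be omitted entirely


# Passing None explicitly satisfies the required field.
profile = Profile(nickname=None)

try:
    Profile()  # nickname omitted
    missing = []
except ValidationError as exc:
    missing = [(err["loc"], err["type"]) for err in exc.errors()]

print(missing)  # [(('nickname',), 'missing')]
```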
Extra Fields Causing Errors (Pydantic)
By default, extra fields are ignored. To change:
model_config = ConfigDict(extra='forbid') # Raise error
model_config = ConfigDict(extra='allow') # Include them
model_config = ConfigDict(extra='ignore') # Silently drop (default)
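The sketch below applies each of the three settings to the same payload so the differences are visible side by side. The model names are hypothetical.

```python
from pydantic import BaseModel, ConfigDict, ValidationError

payload = {"name": "Jane", "nickname": "jd"}  # "nickname" is not declared


class IgnoreExtra(BaseModel):
    model_config = ConfigDict(extra="ignore")
    name: str


class AllowExtra(BaseModel):
    model_config = ConfigDict(extra="allow")
    name: str


class ForbidExtra(BaseModel):
    model_config = ConfigDict(extra="forbid")
    name: str


ignored = IgnoreExtra(**payload).model_dump()
allowed = AllowExtra(**payload).model_dump()
print(ignored)  # {'name': 'Jane'}
print(allowed)  # {'name': 'Jane', 'nickname': 'jd'}

try:
    ForbidExtra(**payload)
    forbid_error_type = None
except ValidationError as exc:
    forbid_error_type = exc.errors()[0]["type"]

print(forbid_error_type)  # extra_forbidden
```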
ValidationError Not Raised (Marshmallow)
Make sure you're using load(), not dump():
- load(): deserializes AND validates
- dump(): serializes only, no validation
YAML Parsing Error with swagger-marshmallow-codegen
If you see:
INFO:dictknife.loading._lazyimport:yaml package is not found
TypeError: JSONDecoder.__init__() got an unexpected keyword argument 'loader'
Use JSON format instead:
swagger-marshmallow-codegen api_spec.json models.py
Field Name Mismatches (Marshmallow)
Use data_key for different API field names:
class UserSchema(Schema):
    first_name = fields.Str(data_key="firstName")
Resources
- Pydantic: Official documentation with comprehensive guides
- Marshmallow: Official documentation with guides and API reference
- datamodel-code-generator: Generate Pydantic models from various sources
- swagger-marshmallow-codegen: Generate Marshmallow schemas from Swagger/OpenAPI specs
- Pydantic migration guide: Guide for migrating from Pydantic v1 to v2
- JSON Schema: The standard for JSON data validation
- OpenAPI Specification: Standard for describing RESTful APIs