Schema Validation
Tools Quest #22 Intermediate

Schema validation with Marshmallow and Pydantic for Python applications


What is Schema Validation?

Schema validation is the process of verifying that data conforms to a predefined structure. In API development and testing, schemas define the expected format of request and response data.

An API schema does for an API what a database schema does for a database: it describes the shape of the data being exchanged, which makes integration between platforms easier for developers.

Why Use Schema Validation?

  • Request validation: Verify data before sending API requests
  • Response validation: Ensure API responses match expected formats
  • Contract testing: Validate that APIs conform to their specifications
  • Documentation: Schemas serve as living documentation
  • Type safety: Catch data type errors early

Schema Characteristics

A schema defines:

  • Structure: The fields a data object should contain
  • Data types: The type of each field (string, number, boolean, etc.)
  • Required fields: Which fields must be present
  • Constraints: Minimum/maximum values, patterns, etc.
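
To make these four characteristics concrete, here is a minimal Pydantic sketch in which each one maps onto part of a model definition. The Product model and its fields are hypothetical, chosen only for illustration.

```python
from pydantic import BaseModel, Field, ValidationError

class Product(BaseModel):
    # Structure: the set of fields the object contains
    # Data types: each annotation declares the field's type
    sku: str = Field(pattern=r"^[A-Z]{3}-\d{4}$")  # Constraint: regex pattern
    name: str                                       # Required: no default value
    price: float = Field(gt=0)                      # Constraint: must be > 0
    in_stock: bool = True                           # Not required: has a default

ok = Product(sku="ABC-1234", name="Widget", price=9.99)
print(ok.in_stock)  # True

try:
    Product(sku="bad", price=-1)
except ValidationError as e:
    # One error per failing field: bad pattern, missing name, price out of range
    print(sorted(err["loc"][0] for err in e.errors()))  # ['name', 'price', 'sku']
```

Pydantic checks every field and reports all failures together rather than stopping at the first one.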

Prerequisites

Installation

uv pip install pydantic

# For the Marshmallow examples
uv pip install marshmallow

# For generating Pydantic models from OpenAPI specs
uv pip install datamodel-code-generator

Verify Installation

import pydantic
print(pydantic.__version__)
# Expected: 2.x.x

Basic Usage

Define a Model

from pydantic import BaseModel
from typing import Optional

class User(BaseModel):
    id: int
    name: str
    email: str
    is_active: bool = True
    age: Optional[int] = None

Validate Data

# Valid data
user = User(id=1, name="John Doe", email="john@example.com")
print(user.model_dump())
# {'id': 1, 'name': 'John Doe', 'email': 'john@example.com', 'is_active': True, 'age': None}

# Invalid data raises ValidationError
from pydantic import ValidationError

try:
    user = User(id="not_an_int", name="John", email="john@example.com")
except ValidationError as e:
    print(e.errors())
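
Each entry returned by e.errors() is a dictionary. The keys most useful in tests are loc (where the error occurred), type (a machine-readable error code) and msg (a human-readable message). The sketch below repeats the User model so it runs on its own, and also shows that Pydantic's default lax mode coerces the numeric string "42" to an int.

```python
from typing import Optional
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    id: int
    name: str
    email: str
    is_active: bool = True
    age: Optional[int] = None

# Lax mode (the default) converts numeric strings to int
coerced = User(id="42", name="Ann", email="ann@example.com")
print(coerced.id)  # 42

try:
    User(id="not_an_int", name="John", email="john@example.com")
except ValidationError as e:
    first = e.errors()[0]
    print(first["loc"], first["type"])  # ('id',) int_parsing
    print(e.error_count())              # 1
```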

Validation Approaches

Request Schema Validation

Validate data before making an API request:

from pydantic import BaseModel, ValidationError

class CreateUserRequest(BaseModel):
    first_name: str
    last_name: str
    email: str

data = {
    "first_name": "Jane",
    "last_name": "Doe",
    "email": "jane@example.com"
}

try:
    validated = CreateUserRequest(**data)
    # Proceed with API request using validated.model_dump()
except ValidationError as e:
    print(f"Validation failed: {e}")
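
When the input is already a dict (for example, loaded from a test fixture), model_validate() is an alternative to unpacking with **, and model_dump_json() turns the validated model into a JSON request body. A minimal sketch; build_payload is a hypothetical helper, and the actual HTTP call is left out because the target endpoint is not specified here.

```python
from pydantic import BaseModel, ValidationError

class CreateUserRequest(BaseModel):
    first_name: str
    last_name: str
    email: str

def build_payload(data: dict) -> str:
    """Validate raw input and return a JSON body, or raise ValidationError."""
    return CreateUserRequest.model_validate(data).model_dump_json()

body = build_payload({"first_name": "Jane", "last_name": "Doe", "email": "jane@example.com"})
print(body)  # {"first_name":"Jane","last_name":"Doe","email":"jane@example.com"}

try:
    build_payload({"first_name": "Jane"})
except ValidationError as e:
    # Both missing fields are reported at once, in declaration order
    print([err["loc"][0] for err in e.errors()])  # ['last_name', 'email']
```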

Response Schema Validation

Validate data after receiving an API response:

import requests
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    id: int
    name: str
    email: str

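response = requests.get("https://api.example.com/users/1", timeout=10)
response.raise_for_status()  # fail fast on HTTP errors before checking the schema

try:
    user = User.model_validate(response.json())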
    print(f"Valid user: {user.name}")
except ValidationError as e:
    raise AssertionError(f"Response schema mismatch: {e}")
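
Collection endpoints usually return a JSON array rather than a single object. Pydantic's TypeAdapter can validate any type, including list[User], in one call. The sketch below uses a hard-coded payload in place of response.json() so it runs without network access; the payload values and the /users endpoint it stands in for are made up.

```python
from pydantic import BaseModel, TypeAdapter, ValidationError

class User(BaseModel):
    id: int
    name: str
    email: str

# Build the adapter once and reuse it for every response
users_adapter = TypeAdapter(list[User])

# Stand-in for response.json() from a hypothetical GET /users endpoint
payload = [
    {"id": 1, "name": "Ann", "email": "ann@example.com"},
    {"id": 2, "name": "Bob", "email": "bob@example.com"},
]

users = users_adapter.validate_python(payload)
print([u.name for u in users])  # ['Ann', 'Bob']

try:
    users_adapter.validate_python([{"id": 3, "name": "Cy"}])
except ValidationError as e:
    # loc starts with the list index, pointing at the failing item
    print(e.errors()[0]["loc"])  # (0, 'email')
```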

Generate from OpenAPI

If your API publishes an OpenAPI (formerly Swagger) specification, you can export it as JSON or YAML and use a code generator to create Python models from it.

For FastAPI applications:

  • Open /openapi.json (the interactive docs at /docs link to it)
  • Download the specification

For SwaggerHub:

  • Open the API in SwaggerHub Editor
  • Export as YAML or JSON

Install and Run datamodel-code-generator

uv pip install datamodel-code-generator

# From YAML
datamodel-codegen --input api_spec.yaml --input-file-type openapi --output models.py

# From JSON
datamodel-codegen --input api_spec.json --input-file-type openapi --output models.py

Generated Output Example

# generated by datamodel-codegen
from __future__ import annotations
from typing import Optional
from pydantic import BaseModel

class AuthenticationRequest(BaseModel):
    device_name: Optional[str] = None
    device_id: Optional[str] = None
    secret: Optional[str] = None
    client_version: Optional[str] = None
    client_id: Optional[int] = None

class LoginResponse(BaseModel):
    user_id: Optional[str] = None
    second_factor_required: Optional[bool] = None
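
Because every field in these generated models is Optional with a default of None (which is typically what datamodel-codegen emits for properties the spec does not list as required), they accept almost any object, including an empty one. A quick sketch using the LoginResponse model above; fields that are present are still type-checked.

```python
from typing import Optional
from pydantic import BaseModel, ValidationError

class LoginResponse(BaseModel):
    user_id: Optional[str] = None
    second_factor_required: Optional[bool] = None

# An empty object passes: every field falls back to None
empty = LoginResponse.model_validate({})
print(empty.model_dump())  # {'user_id': None, 'second_factor_required': None}

# Fields that are present are still validated
try:
    LoginResponse.model_validate({"second_factor_required": "maybe"})
except ValidationError as e:
    print(e.errors()[0]["type"])  # bool_parsing
```

If an empty response should count as a failure, the spec needs to mark those fields as required, or the tests need extra checks of their own.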

Advanced Features

Field Validation

from pydantic import BaseModel, field_validator, Field

class User(BaseModel):
    name: str = Field(min_length=1, max_length=100)
    age: int = Field(ge=0, le=150)
    email: str

    @field_validator('email')
    @classmethod
    def validate_email(cls, v):
        if '@' not in v:
            raise ValueError('Invalid email format')
        return v.lower()
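
Exercising that model shows both mechanisms at work: the Field constraints reject out-of-range values, and the custom validator normalizes the email to lowercase. The model is repeated, with type hints added to the validator, so the snippet runs on its own.

```python
from pydantic import BaseModel, Field, ValidationError, field_validator

class User(BaseModel):
    name: str = Field(min_length=1, max_length=100)
    age: int = Field(ge=0, le=150)
    email: str

    @field_validator("email")
    @classmethod
    def validate_email(cls, v: str) -> str:
        if "@" not in v:
            raise ValueError("Invalid email format")
        return v.lower()

user = User(name="Ann", age=30, email="Ann@Example.COM")
print(user.email)  # ann@example.com

try:
    User(name="", age=200, email="no-at-sign")
except ValidationError as e:
    for err in e.errors():
        print(err["loc"], err["type"])
    # ('name',) string_too_short
    # ('age',) less_than_equal
    # ('email',) value_error
```

The '@' check is only a placeholder. For real address checks, Pydantic provides an EmailStr type, which needs the optional email-validator package (installable via pydantic[email]).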

Nested Models

from pydantic import BaseModel

class Address(BaseModel):
    street: str
    city: str
    country: str

class User(BaseModel):
    name: str
    address: Address
    tags: list[str] = []
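
Nested models accept plain dictionaries, which is what response.json() returns, and an error inside a nested object reports its full path in loc. A short sketch with made-up data:

```python
from pydantic import BaseModel, ValidationError

class Address(BaseModel):
    street: str
    city: str
    country: str

class User(BaseModel):
    name: str
    address: Address
    tags: list[str] = []

user = User.model_validate(
    {"name": "Ann", "address": {"street": "1 Main St", "city": "Oslo", "country": "NO"}}
)
print(type(user.address).__name__, user.address.city)  # Address Oslo
print(user.tags)  # []

try:
    User.model_validate({"name": "Bob", "address": {"street": "2 High St", "city": "Leeds"}})
except ValidationError as e:
    print(e.errors()[0]["loc"])  # ('address', 'country')
```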

Model Configuration

from pydantic import BaseModel, ConfigDict

class StrictUser(BaseModel):
    model_config = ConfigDict(
        strict=True,           # No type coercion
        extra='forbid',        # Error on extra fields
        frozen=True,           # Immutable instances
    )

    name: str
    age: int
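
Each of the three settings fails in its own way. A small sketch that triggers all three against StrictUser (repeated so it runs standalone); error_type is a hypothetical helper, and the codes in the comments are the error types Pydantic v2 reports.

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class StrictUser(BaseModel):
    model_config = ConfigDict(strict=True, extra="forbid", frozen=True)
    name: str
    age: int

def error_type(**kwargs) -> str:
    """Return the first error type raised when building StrictUser."""
    try:
        StrictUser(**kwargs)
    except ValidationError as e:
        return e.errors()[0]["type"]
    return "no error"

print(error_type(name="Ann", age="30"))              # int_type (strict: no str -> int)
print(error_type(name="Ann", age=30, role="admin"))  # extra_forbidden

user = StrictUser(name="Ann", age=30)
try:
    user.age = 31  # frozen: assignment is rejected
except ValidationError as e:
    print(e.errors()[0]["type"])                      # frozen_instance
```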

JSON Schema Export

from pydantic import BaseModel

class User(BaseModel):
    name: str
    email: str

# Export as JSON Schema
print(User.model_json_schema())
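
model_json_schema() returns a plain dict, so the standard json module can serialize it for tools outside Python. A minimal sketch; what you do with schema_text (for example, committing it to a file next to the API tests) is up to you.

```python
import json
from pydantic import BaseModel

class User(BaseModel):
    name: str
    email: str

schema = User.model_json_schema()
print(schema["required"])                    # ['name', 'email']
print(schema["properties"]["name"]["type"])  # string

# Serialize for sharing, e.g. to write to a .json file
schema_text = json.dumps(schema, indent=2)
```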

Alias for API Field Names

from pydantic import BaseModel, ConfigDict, Field

class User(BaseModel):
    model_config = ConfigDict(populate_by_name=True)

    first_name: str = Field(alias="firstName")
    last_name: str = Field(alias="lastName")

# Can parse with either name
user = User(firstName="John", lastName="Doe")
# Or
user = User(first_name="John", last_name="Doe")
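
Aliases also apply on the way out. By default model_dump() uses the Python field names; passing by_alias=True produces the API's camelCase keys, which is usually what a request body needs.

```python
from pydantic import BaseModel, ConfigDict, Field

class User(BaseModel):
    model_config = ConfigDict(populate_by_name=True)

    first_name: str = Field(alias="firstName")
    last_name: str = Field(alias="lastName")

user = User.model_validate({"firstName": "John", "lastName": "Doe"})
print(user.model_dump())               # {'first_name': 'John', 'last_name': 'Doe'}
print(user.model_dump(by_alias=True))  # {'firstName': 'John', 'lastName': 'Doe'}
```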

Pydantic vs Marshmallow

| Feature | Pydantic | Marshmallow |
|---|---|---|
| Type hints | Native | Via plugins |
| Performance | Faster (Rust core) | Good |
| JSON Schema | Built-in export | Via extension |
| ORM integration | SQLModel | SQLAlchemy |
| Approach | Type-annotation-first | Class-attribute-first |
| Default in FastAPI | Yes | No |
| Serialization | .model_dump() | .dump() |
| Deserialization | Constructor / .model_validate() | .load() |

Best Practices

  1. Version your schemas: Keep schemas in sync with API versions
  2. Be strict about unexpected input: use extra='forbid' to fail on unknown fields, and strict=True where type coercion is not wanted
  3. Test schema changes: Breaking changes should be caught in CI
  4. Generate from source: Prefer generating schemas from API specs over manual definition
  5. Validate at boundaries: Validate incoming data as early as possible

Troubleshooting

Validation Errors with Nested Objects

Ensure nested schemas are properly defined:

# Wrong - won't validate nested structure
class User(BaseModel):
    address: dict  # Too loose

# Right - properly typed
class Address(BaseModel):
    street: str
    city: str

class User(BaseModel):
    address: Address

Schema Drift Between Code and API

Keep schemas in sync with API specs:

  1. Regenerate schemas when API changes
  2. Run schema tests in CI to catch drift
  3. Use OpenAPI specs as the source of truth

To regenerate the models from the current spec:

datamodel-codegen --input openapi.json --input-file-type openapi --output models.py
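
One way to catch drift in CI is a snapshot-style test: record the field names and JSON types the client code depends on, and fail when the model's generated JSON Schema no longer matches. A pytest-compatible sketch; EXPECTED_USER_FIELDS and schema_field_types are hypothetical names, and in a real project the snapshot would likely live in a committed file.

```python
from pydantic import BaseModel

class User(BaseModel):
    # In practice this would be imported from the generated models.py
    id: int
    name: str
    email: str

# Hypothetical snapshot of the contract the client relies on
EXPECTED_USER_FIELDS = {"id": "integer", "name": "string", "email": "string"}

def schema_field_types(model: type[BaseModel]) -> dict[str, str]:
    """Map each property in the model's JSON Schema to its JSON type."""
    props = model.model_json_schema()["properties"]
    # Fields without a single "type" (e.g. Optional, which uses anyOf) map to "?"
    return {name: spec.get("type", "?") for name, spec in props.items()}

def test_user_schema_has_not_drifted():
    assert schema_field_types(User) == EXPECTED_USER_FIELDS
```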

Performance Issues with Large Payloads

For large JSON documents:

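  • Where the payload is a large array, parse it incrementally and validate one item at a time instead of loading the whole document at once
  • For quick checks, validate against a slimmer model that declares only the fields you need
  • Use Pydantic v2, whose validation core (pydantic-core) is written in Rust and is substantially faster than v1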
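
Two of these ideas in code. model_validate_json() parses and validates raw JSON in one call, which the Pydantic docs recommend over model_validate(json.loads(...)) for performance, and a slim model keeps only the fields a check needs (undeclared keys are still read by the parser but are not validated or stored). The payload and the UserSummary model are illustrative.

```python
from pydantic import BaseModel

class UserSummary(BaseModel):
    # Only the fields this check cares about; extra keys are ignored by default
    id: int
    email: str

raw = b'{"id": 7, "email": "ann@example.com", "bio": "long text", "history": [1, 2, 3]}'

# Parse and validate in one call instead of json.loads() + model_validate()
summary = UserSummary.model_validate_json(raw)
print(summary)  # id=7 email='ann@example.com'
```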

Type Coercion Issues (Pydantic)

Pydantic coerces types by default. For strict validation:

from pydantic import BaseModel, ConfigDict

class StrictModel(BaseModel):
    model_config = ConfigDict(strict=True)
    count: int  # Won't accept "123" string
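
Strictness does not have to be all-or-nothing. Pydantic v2 also lets you mark a single field strict with Field(strict=True), or request strict validation for a single call with model_validate(..., strict=True). The sketch contrasts the default lax behaviour with both options; the Order model and the fails helper are hypothetical.

```python
from pydantic import BaseModel, Field, ValidationError

class Order(BaseModel):
    count: int                          # lax: "123" is coerced to 123
    price: float = Field(strict=True)   # strict for this field only

print(Order.model_validate({"count": "123", "price": 9.5}).count)  # 123

def fails(data: dict, **kwargs) -> bool:
    """True if validation raises, False otherwise."""
    try:
        Order.model_validate(data, **kwargs)
    except ValidationError:
        return True
    return False

print(fails({"count": 1, "price": "9.5"}))                 # True: price is strict
print(fails({"count": "123", "price": 9.5}, strict=True))  # True: whole call is strict
```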

Pydantic v1 vs v2

If you’re migrating from v1:

| Pydantic v1 | Pydantic v2 |
|---|---|
| class Config | model_config = ConfigDict() |
| .dict() | .model_dump() |
| .json() | .model_dump_json() |
| @validator | @field_validator |
| parse_obj() | model_validate() |

Optional Fields Still Required (Pydantic v2)

In Pydantic v2, Optional only affects the type (allows None), not whether the field is required. A field without a default value is always required, regardless of Optional. This differs from Pydantic v1, where Optional alone implied a default of None.

# Pydantic v2: still required (no default value)
field: Optional[str]

# Pydantic v2: truly optional (has a default)
field: Optional[str] = None
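
The difference is easy to check. In the sketch below (hypothetical model and field names), nickname is Optional but has no default, so omitting it fails while passing None explicitly succeeds.

```python
from typing import Optional
from pydantic import BaseModel, ValidationError

class Profile(BaseModel):
    nickname: Optional[str]        # required, but may be None
    bio: Optional[str] = None      # truly optional

print(Profile(nickname=None).model_dump())  # {'nickname': None, 'bio': None}

try:
    Profile()
except ValidationError as e:
    print(e.errors()[0]["loc"], e.errors()[0]["type"])  # ('nickname',) missing
```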

Extra Fields Causing Errors (Pydantic)

By default, Pydantic ignores extra fields. To change this, set extra in model_config:

model_config = ConfigDict(extra='forbid')  # Raise error
model_config = ConfigDict(extra='allow')   # Include them
model_config = ConfigDict(extra='ignore')  # Silently drop (default)
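
A compact way to compare the three settings is to validate the same payload against three otherwise identical models. The model names are made up for the example; with extra='allow' the unknown keys are kept and exposed through model_extra.

```python
from pydantic import BaseModel, ConfigDict, ValidationError

payload = {"name": "Ann", "role": "admin"}  # "role" is not declared

class Ignoring(BaseModel):
    model_config = ConfigDict(extra="ignore")
    name: str

class Allowing(BaseModel):
    model_config = ConfigDict(extra="allow")
    name: str

class Forbidding(BaseModel):
    model_config = ConfigDict(extra="forbid")
    name: str

print(Ignoring.model_validate(payload).model_dump())  # {'name': 'Ann'}

allowed = Allowing.model_validate(payload)
print(allowed.model_dump())  # {'name': 'Ann', 'role': 'admin'}
print(allowed.model_extra)   # {'role': 'admin'}

try:
    Forbidding.model_validate(payload)
except ValidationError as e:
    print(e.errors()[0]["type"])  # extra_forbidden
```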

ValidationError Not Raised (Marshmallow)

Make sure you’re calling load(), not dump():

  • load() - deserializes AND validates
  • dump() - serializes only, no validation

YAML Parsing Error with swagger-marshmallow-codegen

If you see:

INFO:dictknife.loading._lazyimport:yaml package is not found
TypeError: JSONDecoder.__init__() got an unexpected keyword argument 'loader'

Use JSON format instead:

swagger-marshmallow-codegen api_spec.json models.py

Field Name Mismatches (Marshmallow)

Use data_key for different API field names:

from marshmallow import Schema, fields

class UserSchema(Schema):
    first_name = fields.Str(data_key="firstName")

Resources

  • Pydantic Documentation (docs.pydantic.dev): official documentation with comprehensive guides
  • Marshmallow Documentation (marshmallow.readthedocs.io): official documentation with guides and API reference
  • datamodel-code-generator (github.com): generate Pydantic models from various sources
  • swagger-marshmallow-codegen (github.com): generate Marshmallow schemas from Swagger/OpenAPI specs
  • Pydantic v1 to v2 Migration (docs.pydantic.dev): guide for migrating from Pydantic v1 to v2
  • JSON Schema (json-schema.org): the standard for JSON data validation
  • OpenAPI Specification (swagger.io): standard for describing RESTful APIs