9/10/18

Using Python 3.7 dataclasses for JSON (de)serialization

python json dataclasses
    One of the most common tasks in building an API is serialization (marshalling) and deserialization (unmarshalling) of data. So, what is (de)serialization? According to Wikipedia, serialization is the process of translating data structures or object state into a format that can be stored (for example, in a file or memory buffer) or transmitted (for example, across a network connection) and reconstructed later (possibly in a different computer environment); deserialization is the reverse. You may say, "Hey, Python already has tons of packages for this." And this is true. There are many cool packages like marshmallow, but in this article we'll talk about (de)serialization using only standard Python 3.7 features.
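As a minimal illustration of that definition, the sketch below round-trips a plain dictionary through JSON text with the standard json module. The sample record is made up for the example.

```python
import json

# Hypothetical record; the field names are made up for this sketch.
record = {'id': 10, 'email': 'john.doe@testmail.com', 'age': 30}

encoded = json.dumps(record)   # serialization: Python dict -> JSON string
decoded = json.loads(encoded)  # deserialization: JSON string -> Python dict

print(encoded)            # {"id": 10, "email": "john.doe@testmail.com", "age": 30}
print(decoded == record)  # True
```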
    One of the best things (for me) that came with the Python 3.7 release is the dataclasses module. It "provides a decorator and functions for automatically adding generated special methods such as __init__() and __repr__() to user-defined classes". In other words, the module provides some syntactic sugar and frees you from writing __init__(), __repr__() and the other methods described in the docs by hand.
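Here is a small sketch of what the decorator generates. Point is a hypothetical class used only for illustration. Note that the annotations are not enforced at runtime, which is why some validation has to be added by hand.

```python
from dataclasses import dataclass


@dataclass
class Point:  # hypothetical example class
    x: int
    y: int = 0


p = Point(1, y=2)        # __init__() was generated from the annotations
print(p)                 # generated __repr__(): Point(x=1, y=2)
print(p == Point(1, 2))  # generated __eq__(): True

# The annotations are only hints: nothing stops a wrong type here.
loose = Point('not an int')
print(loose.x)           # not an int
```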
    Since dataclass fields are typed and a dataclass looks much like a marshmallow Schema, I supposed dataclasses could be useful for serializing responses in my API. After reading PEP 557 I created the following mixin:
from dataclasses import asdict, dataclass, field, fields


@dataclass
class DataClassMixin:
    def __post_init__(self):
        errors = {}
        for fld in fields(self):
            value = getattr(self, fld.name)
            value_type = type(value)
            # generic aliases such as List[int] expose the plain class (list) as __origin__
            has_origin_attr = hasattr(fld.type, '__origin__')
            field_type_original = fld.type.__origin__ if has_origin_attr else fld.type
            invalid_type_message = (
                f"Invalid data type. "
                f"Expected '{field_type_original.__name__}' got '{value_type.__name__}'"
            )
            # cheat the 'TypeError: Subscripted generics cannot be used with class and instance checks'
            # https://bugs.python.org/issue29262
            if fld.metadata.get('allow_empty'):
                type_matches = True  # the type check is skipped for this field
            elif has_origin_attr:
                type_matches = value_type is field_type_original
            else:
                type_matches = issubclass(value_type, field_type_original)
            if not type_matches:
                errors[fld.name] = [invalid_type_message]
            elif value is not None:
                # compare with None: falsy values such as 0 or '' must be validated too
                validation_fn = fld.metadata.get('validate')
                if callable(validation_fn):
                    is_valid, message = validation_fn(value)
                    if not is_valid:
                        errors[fld.name] = [message]
        if errors:
            raise ValidationError(errors)

    def as_dict(self):
        return asdict(self)
    This mixin has only two methods: as_dict is just a wrapper around the native asdict function, and __post_init__ is a special method that the dataclass-generated __init__() calls once all fields are set, which makes it a convenient place for data validation. In the example above it validates the attribute types, runs any validate function supplied through the field metadata, and raises a custom ValidationError (see below) with the collected errors. Note that for generic annotations such as List from the typing module it checks the value against the annotation's __origin__ (for example, list).
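The sketch below shows, outside of the mixin, why the __origin__ comparison is needed and what it does not cover. It only uses the typing module and assumes Python 3.7 or later.

```python
from typing import List

annotation = List[int]
origin = annotation.__origin__      # the plain class behind the generic alias
print(origin is list)               # True on Python 3.7+

try:
    isinstance([1, 2], annotation)  # subscripted generics reject instance checks
    raised = False
except TypeError:
    raised = True
print(raised)                       # True

# Comparing against __origin__ works, but only the container type is checked:
print(type([1, 2]) is origin)       # True
print(type(['a', 'b']) is origin)   # True as well, element types are ignored
```

So a field annotated as List[int] accepts any list. Checking the elements would need an extra validate function.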

Defining a ValidationError
class BaseError(Exception):
    code = 400

    def __init__(self, message=None):
        # pass the message on so that str(error) is informative too
        super().__init__(message)
        self.message = message


class ValidationError(BaseError):
    code = 422

    def __init__(self, errors: dict, message=None):
        message = message or 'Validation of incoming data failed'
        super().__init__(message=message)
        self.errors = errors
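If code is meant to be an HTTP status (an assumption; the article does not say so explicitly), the exception can be turned into an API error response roughly like this. error_to_response is a hypothetical helper, and the exception class is a trimmed-down copy of the one above so that the sketch runs on its own.

```python
import json


class ValidationError(Exception):
    """Trimmed-down copy of the exception defined above."""
    code = 422

    def __init__(self, errors, message='Validation of incoming data failed'):
        super().__init__(message)
        self.message = message
        self.errors = errors


def error_to_response(exc):
    """Hypothetical helper: build a (JSON body, status code) pair."""
    body = json.dumps({'message': exc.message, 'errors': exc.errors})
    return body, exc.code


try:
    raise ValidationError({'age': ['Age must be greater than 0']})
except ValidationError as exc:
    body, status = error_to_response(exc)

print(status)  # 422
print(body)
```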
Usage example

Let's define a User class and a custom validation function called validate_age:
def validate_age(age):
    return age > 0, 'Age must be greater than 0'

@dataclass
class User(DataClassMixin):
    id: int = field(default=None)
    email: str = field(default=None)
    age: int = field(default=None, metadata={'allow_empty': False, 'validate': validate_age})
    name: str = field(default=None)
To check that the validation works, run the following code:
In: try:
    attributes = dict(id=10, email='john.doe@testmail.com', name='John Doe', age=0)
    user = User(**attributes)
except ValidationError as e:
    print(e.message, e.errors)

Validation of incoming data failed {'age': ['Age must be greater than 0']}

In: try:
    attributes = dict(id=10, email='john.doe@testmail.com', name='John Doe', age=10)
    user = User(**attributes)
except ValidationError as e:
    print(e.message, e.errors)

In: user.as_dict()
Out: {'id': 10, 'email': 'john.doe@testmail.com', 'age': 10, 'name': 'John Doe'}
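To get from such a dictionary to actual JSON and back, the standard json module is enough. The sketch below uses a hypothetical stand-in dataclass, Account, instead of User so that it runs on its own. It assumes all fields hold JSON-native types (numbers, strings, lists, dicts); nested dataclasses would come back from json.loads as plain dicts.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Account:  # hypothetical stand-in for the User class
    id: int
    email: str


account = Account(id=10, email='john.doe@testmail.com')

payload = json.dumps(asdict(account))      # serialize: dataclass -> dict -> JSON
restored = Account(**json.loads(payload))  # deserialize: JSON -> dict -> dataclass

print(payload)              # {"id": 10, "email": "john.doe@testmail.com"}
print(restored == account)  # True
```

With the mixin in place, the same Account(**data) call would also run __post_init__, so incoming JSON is validated while it is deserialized.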

As you can see, it's a very lightweight and flexible solution built with the Python standard library. I hope it will be useful to someone else.
