Python Internals :: PyObject

Greetings, folks.

Before we dive deep into the implementation of the Python language, we need to get familiar with its central concept. It's quite simple - everything is an object. This is our first step in learning about Python internals and the entry point to our journey.

Today's main topic is how Python objects are handled at a low level. We'll be looking at the CPython implementation of Python 2.7.8.

I assume you have downloaded the Python sources and unpacked them, so all references to source code are given relative to the root folder.

PyObject & PyVarObject

Everything in Python is an object. Literally, anything you work with in Python is a C PyObject:

  • functions
  • slices
  • files
  • classes
  • iterators
  • descriptors
  • sequences
  • numeric types

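Specifically, you work with plain C structures. Internally, Python objects are represented via PyObject and PyVarObject, two structure types through which any Python object can be accessed. The latter is for variable-size objects, whose instances can differ in length (strings, tuples, lists); the former is for all other objects. Note that this split is about size, not mutability: a tuple is immutable but variable-size, while a dictionary is mutable yet starts with a plain PyObject_HEAD.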

Since every built-in and user-defined type is wrapped into an object, the runtime is free to attach auxiliary information to each one. Python does exactly that: every Python object carries a pointer to a type object and a reference counter. This is quite convenient, but it has a price, and that price is performance. However, some of the techniques and algorithms Python uses to speed things up (string interning, adaptive number multiplication, etc.) offset part of the overhead.

Here are a few quotes from the official Python 2.7.10 documentation:

PyObject

All object types are extensions of this type. This is a type which contains the information Python needs to treat a pointer to an object as an object. In a normal "release" build, it contains only the object's reference count and a pointer to the corresponding type object. It corresponds to the fields defined by the expansion of the PyObject_HEAD macro.

PyVarObject

This is an extension of PyObject that adds the ob_size field. This is only used for objects that have some notion of length. This type does not often appear in the Python/C API. It corresponds to the fields defined by the expansion of the PyObject_VAR_HEAD macro.

Don't worry about the macros or unfamiliar field names mentioned above - we'll get there.

What do the PyObject and PyVarObject structures look like? Here is an excerpt from the source code:

..\Include\object.h

...
typedef struct _object {  
    PyObject_HEAD
} PyObject;
...
typedef struct {  
    PyObject_VAR_HEAD
} PyVarObject;
...


This is not the whole picture. Let's trace further (some details are omitted for the sake of clarity):

..\Include\object.h

...
#define _PyObject_HEAD_EXTRA
#define _PyObject_EXTRA_INIT
...
#define PyObject_HEAD                   \
    _PyObject_HEAD_EXTRA                \
    Py_ssize_t ob_refcnt;               \
    struct _typeobject *ob_type;

#define PyObject_HEAD_INIT(type)        \
    _PyObject_EXTRA_INIT                \
    1, type,

#define PyVarObject_HEAD_INIT(type, size)       \
    PyObject_HEAD_INIT(type) size,
...
#define PyObject_VAR_HEAD               \
    PyObject_HEAD                       \
    Py_ssize_t ob_size;
...


Some macros are for defining fields, others for initialization.

Have you noticed that the _PyObject_HEAD_EXTRA and _PyObject_EXTRA_INIT macro definitions are empty? That is the default for a regular release build. They become non-empty only when Python is compiled with Py_TRACE_REFS defined, which a "Debug" build turns on; in that case they add two extra pointers that link all live objects into a doubly-linked list. However, this is another story for another day. Assume these macros are always empty; they are shown here for educational purposes.
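
To make the "empty unless debugging" idea tangible, here is a minimal sketch of the same preprocessor trick outside of CPython. Every name in it (DEMO_TRACE_REFS, DEMO_HEAD, DemoObject and the two helper functions) is made up for illustration; only the pattern mirrors object.h.

```c
#include <stddef.h>

/* DEMO_TRACE_REFS is a hypothetical switch standing in for Py_TRACE_REFS. */
#ifdef DEMO_TRACE_REFS
#define DEMO_HEAD_EXTRA              \
    struct demo_object *ob_next;     \
    struct demo_object *ob_prev;
#else
#define DEMO_HEAD_EXTRA              /* expands to nothing */
#endif

#define DEMO_HEAD                    \
    DEMO_HEAD_EXTRA                  \
    long ob_refcnt;                  \
    const char *ob_type_name;

/* The header macro is expanded inside the struct, as PyObject_HEAD is. */
typedef struct demo_object {
    DEMO_HEAD
} DemoObject;

/* Number of bytes the object header occupies in this build. */
size_t demo_header_size(void)
{
    return sizeof(DemoObject);
}

/* Returns 1 if the extra pointers were compiled in, 0 otherwise. */
int demo_has_extra_fields(void)
{
#ifdef DEMO_TRACE_REFS
    return 1;
#else
    return 0;
#endif
}
```

Compiled as is, the header holds only the counter and the type field. Compiled with -DDEMO_TRACE_REFS, the very same source yields a header that is two pointers larger, and no other line has to change.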

This is how PyObject looks once all macros are expanded:

typedef struct _object {  
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
} PyObject;


Don't worry about the Py_ssize_t type - assume it's just a signed integer as wide as size_t. The fields speak for themselves: a reference counter (ob_refcnt) and a pointer to a PyTypeObject (ob_type). We'll talk about the latter a bit later.

Now it's PyVarObject's turn:

typedef struct {  
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
    Py_ssize_t ob_size;
} PyVarObject;


It looks almost the same as PyObject, except for one additional field - ob_size - which holds the number of items the variable-size object contains (excuse the pun).
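
Here is a rough sketch of what "variable-size" can mean in practice, assuming a tuple-like layout where the items are stored right after the header. All names (DemoObject, DemoVarObject, DemoIntTuple, demo_tuple_new, demo_size) are hypothetical, and the headers are nested structs rather than macro expansions, which gives the same field order.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for PyObject. */
typedef struct {
    long ob_refcnt;
    const void *ob_type;
} DemoObject;

/* Hypothetical stand-in for PyVarObject: the same header plus a length. */
typedef struct {
    DemoObject base;
    long ob_size;              /* number of items in the object */
} DemoVarObject;

/* A tuple-like object: the fixed header followed by ob_size items. */
typedef struct {
    DemoVarObject head;
    int items[];               /* C99 flexible array member */
} DemoIntTuple;

/* Allocate an object with room for n zero-initialized items. */
DemoIntTuple *demo_tuple_new(long n)
{
    DemoIntTuple *t;

    if (n < 0)
        return NULL;
    t = malloc(sizeof(DemoIntTuple) + (size_t)n * sizeof(int));
    if (t == NULL)
        return NULL;
    t->head.base.ob_refcnt = 1;
    t->head.base.ob_type = NULL;   /* no type objects in this sketch */
    t->head.ob_size = n;
    for (long i = 0; i < n; i++)
        t->items[i] = 0;
    return t;
}

/* Read the length through the generic variable-size view of the object. */
long demo_size(const void *ob)
{
    return ((const DemoVarObject *)ob)->ob_size;
}
```

Two objects of the same type can therefore occupy different amounts of memory, and ob_size is the only place where the actual length is recorded. The caller releases the object with free() in this sketch.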

OOP in C

So why do PyObject and PyVarObject (and all other Python objects, as we'll see later) share some common traits - the fields ob_refcnt and ob_type?

This lets the interpreter abstract away the underlying type and work with objects in a uniform way, no matter which kind of object it is - a plain integer or a string, a class instance or a slice object.

Each type implementation in Python (PyIntObject, PyFloatObject or PyDictObject) has PyObject_HEAD as its first member (or the first member of its first member, and so on). This member sub-object is guaranteed to be located at the same address as the full object.

A PyObject* pointer refers to that member sub-object, but it can be cast to the full type once ob_type has been inspected to find out what the full type is.
This technique brings some OOP (specifically, lightweight inheritance) to the C language.
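
The idea can be sketched in a few lines of standalone C. This is not CPython code: DemoObject, DemoInt, DemoFloat and demo_as_double are hypothetical names, and a plain enum stands in for the ob_type pointer.

```c
#include <assert.h>

/* A plain enum stands in for the pointer to a type object. */
typedef enum { DEMO_INT, DEMO_FLOAT } DemoKind;

/* The common header shared by every object. */
typedef struct {
    long ob_refcnt;
    DemoKind ob_type;
} DemoObject;

/* "Derived" objects embed the header as their first member. */
typedef struct {
    DemoObject head;
    long ival;
} DemoInt;

typedef struct {
    DemoObject head;
    double fval;
} DemoFloat;

/* Works on any object: inspect the type first, then cast to the full type. */
double demo_as_double(const DemoObject *op)
{
    switch (op->ob_type) {
    case DEMO_INT:
        return (double)((const DemoInt *)op)->ival;
    case DEMO_FLOAT:
        return ((const DemoFloat *)op)->fval;
    }
    assert(!"unknown object kind");
    return 0.0;
}
```

A caller can pass &some_int.head or simply cast a DemoFloat* to DemoObject*. Both are valid because C guarantees that a pointer to a structure, suitably converted, points to its first member.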

PyIntObject & PyDictObject

Let's see how two concrete objects are laid out in Python: PyIntObject and PyDictObject.

..\Include\intobject.h

...
typedef struct {  
    PyObject_HEAD
    long ob_ival;
} PyIntObject;
...


See PyObject_HEAD again? That means we can treat PyIntObject as PyObject plus some extra data (in our case it is a long). Now check PyDictObject (it represents a dictionary {} in Python):

..\Include\dictobject.h

...
typedef struct _dictobject PyDictObject;  
struct _dictobject {  
    PyObject_HEAD
    Py_ssize_t ma_fill;
    Py_ssize_t ma_used;

    /* ... */
    Py_ssize_t ma_mask;

    /* ... */
    PyDictEntry *ma_table;
    PyDictEntry *(*ma_lookup)(PyDictObject *mp, PyObject *key, long hash);
    PyDictEntry ma_smalltable[PyDict_MINSIZE];
};
...


Although the dictionary representation is a bit more complicated, it's still valid to treat it as a PyObject, just as we do with PyIntObject. The only difference is that PyDictObject has a greater number of additional members. Most importantly, all of them are located strictly after the PyObject_HEAD section.

Putting together everything we know about the guts of the PyObject_HEAD section and how specific Py*Object structures can be treated, the following code snippet should be self-explanatory. It shows how Python determines which type it's currently working with:

...
// "op" is of PyObject* type 
if ((op)->ob_type == &PyInt_Type) {  
    // work with numeric type
}
...
if ((op)->ob_type == &PyDict_Type) {  
    // work with dictionary type
}
...


Python defines a lot of macros in order to increase code readability. For example, instead of using the ob_type member explicitly you can use the PyInt_Check or PyInt_CheckExact macros. The CheckExact variant is exactly the pointer comparison shown above, while the Check variant also accepts instances of subtypes. Similar macro definitions can be found in the header file of each object implementation:

  • PyDict_Check and PyDict_CheckExact for dictionary {} objects
  • PyFunction_Check for function objects
  • PyTuple_Check and PyTuple_CheckExact for tuple () objects

Thus, the code above can be rewritten as follows:

...
if (PyInt_CheckExact(op)) {  
    // work with numeric type
}
...
if (PyDict_CheckExact(op)) {  
    // work with dictionary type
}
...
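
The difference between an exact check and a subtype-aware check can be sketched as follows. This is a simplification with hypothetical names: it follows a single base pointer, whereas CPython's real PyInt_Check tests a flag bit stored in the type object. The bool-derives-from-int relationship, however, is real in Python 2.7.

```c
#include <stddef.h>

/* A hypothetical, heavily reduced type object. */
typedef struct demo_type {
    const char *name;
    const struct demo_type *base;    /* NULL for a root type */
} DemoType;

typedef struct {
    long ob_refcnt;
    const DemoType *ob_type;
} DemoObject;

const DemoType Demo_IntType  = { "int",  NULL };
const DemoType Demo_BoolType = { "bool", &Demo_IntType };

/* Exact check: the object's type must be this very type object. */
int demo_check_exact(const DemoObject *op, const DemoType *type)
{
    return op->ob_type == type;
}

/* Subtype-aware check: the type itself or any of its bases will do. */
int demo_check(const DemoObject *op, const DemoType *type)
{
    const DemoType *t;

    for (t = op->ob_type; t != NULL; t = t->base) {
        if (t == type)
            return 1;
    }
    return 0;
}
```

With these definitions a bool-like object passes demo_check against Demo_IntType but fails demo_check_exact, which is the reason to pick the exact variant only when subtypes must be excluded.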


Besides type-specific checks like these, there are common helpers that work for any object. They are located in the ..\Include\object.h file. For example:

...
#define Py_REFCNT(ob)           (((PyObject*)(ob))->ob_refcnt)
#define Py_TYPE(ob)             (((PyObject*)(ob))->ob_type)
#define Py_SIZE(ob)             (((PyVarObject*)(ob))->ob_size)
...
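
The ob_refcnt field behind Py_REFCNT is what drives an object's lifetime. Below is a minimal sketch of reference counting, assuming a deallocation callback stored in the type (CPython's counterpart is the tp_dealloc slot). All DEMO_ and demo_ names, including the demo_live_objects counter, are hypothetical and exist only for this illustration.

```c
#include <stdlib.h>

typedef struct demo_object DemoObject;

/* A reduced type object: a name and a deallocation callback. */
typedef struct {
    const char *name;
    void (*dealloc)(DemoObject *);
} DemoType;

struct demo_object {
    long ob_refcnt;
    const DemoType *ob_type;
};

/* Accessor macros in the spirit of Py_REFCNT and Py_TYPE. */
#define DEMO_REFCNT(ob) (((DemoObject *)(ob))->ob_refcnt)
#define DEMO_TYPE(ob)   (((DemoObject *)(ob))->ob_type)

int demo_live_objects = 0;      /* bookkeeping for this sketch only */

static void demo_dealloc(DemoObject *op)
{
    demo_live_objects--;
    free(op);
}

const DemoType Demo_PlainType = { "plain", demo_dealloc };

/* A new object starts with exactly one reference, owned by the caller. */
DemoObject *demo_new(void)
{
    DemoObject *op = malloc(sizeof(DemoObject));

    if (op == NULL)
        return NULL;
    op->ob_refcnt = 1;
    op->ob_type = &Demo_PlainType;
    demo_live_objects++;
    return op;
}

void demo_incref(DemoObject *op)
{
    op->ob_refcnt++;
}

/* Dropping the last reference hands the object to its type's dealloc. */
void demo_decref(DemoObject *op)
{
    if (--op->ob_refcnt == 0)
        DEMO_TYPE(op)->dealloc(op);
}
```

Note that the object must not be touched after the demo_decref call that brings the counter to zero, since its memory has already been released.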

PyTypeObject

The only thing left untold about PyObject is its type. A type in Python is not only a name ("int" or "tuple") and a storage layout, but also a lot of related things (functions, data members) that, linked together, define a set of common traits shared by all objects of that type.

Recall PyObject_HEAD section:

#define PyObject_HEAD                   \
    ...                                 \
    Py_ssize_t ob_refcnt;               \
    struct _typeobject *ob_type;


The ob_type pointer in the PyObject_HEAD section refers to the object's type instance. Let's take a closer look at it (I picked the most interesting sections):

..\Include\object.h

typedef struct _typeobject {  
    PyObject_VAR_HEAD
    const char *tp_name;
    ....

    /* Methods to implement standard operations */
    destructor tp_dealloc;
    printfunc tp_print;
    getattrfunc tp_getattr;
    setattrfunc tp_setattr;
    cmpfunc tp_compare;
    reprfunc tp_repr;

    /* Method suites for standard classes */
    PyNumberMethods *tp_as_number;
    PySequenceMethods *tp_as_sequence;
    PyMappingMethods *tp_as_mapping;

    /* More standard operations (here for binary compatibility) */
    hashfunc tp_hash;
    ternaryfunc tp_call;
    reprfunc tp_str;
    getattrofunc tp_getattro;
    setattrofunc tp_setattro;
    ...
} PyTypeObject;


It's not the whole structure, but the most interesting part. As you may guess, the odd-looking type names ending in func are just function pointer types, that is, callbacks. Each Python type fills them in its own way.

For example, the line cmpfunc tp_compare; obviously tells us that it's somehow related to object comparison. And PyIntObject's implementation of the compare function will differ from the one PyTupleObject provides.

Another line, hashfunc tp_hash; defines the hash function for a type. For example, a string will have such a function, but a dictionary will not. Guess why?
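
To see how such a callback is used, here is a small sketch of dispatching a hash request through the type object. The names and the reduced structures are hypothetical. What is borrowed from CPython is the convention that -1 is reserved as the error marker, so an int-like hash maps -1 to -2, and that a type without a real hash gets a function that only reports failure.

```c
#include <stddef.h>

typedef struct demo_object DemoObject;
typedef long (*demo_hashfunc)(const DemoObject *);

/* A reduced type object with a single callback slot. */
typedef struct {
    const char *tp_name;
    demo_hashfunc tp_hash;
} DemoType;

struct demo_object {
    long ob_refcnt;
    const DemoType *ob_type;
    long value;                 /* payload, used by the int-like type only */
};

/* Hash of an int-like object: its value, with -1 remapped to -2. */
static long demo_int_hash(const DemoObject *op)
{
    return op->value == -1 ? -2 : op->value;
}

/* Placeholder for types that cannot be hashed: always signals failure. */
static long demo_hash_not_implemented(const DemoObject *op)
{
    (void)op;
    return -1;
}

const DemoType Demo_IntType  = { "int",  demo_int_hash };
const DemoType Demo_DictType = { "dict", demo_hash_not_implemented };

/* Generic entry point: it knows nothing about concrete types. */
long demo_hash(const DemoObject *op)
{
    return op->ob_type->tp_hash(op);
}
```

The generic demo_hash never needs to change when a new type is added. A new type only has to put its own function into the tp_hash slot.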

If you want to read more about all this, refer to the Python/C API Reference Manual, "Object Implementation Support", "Type Objects" section.

We are about to compare how these fields:

/* Method suites for standard classes */
PyNumberMethods *tp_as_number;  
PySequenceMethods *tp_as_sequence;  
PyMappingMethods *tp_as_mapping;  
hashfunc tp_hash;  

are implemented in PyInt_Type, PyDict_Type and PyTuple_Type objects.

The first three fields are pointers to tables of functions, and their names recall the Abstract Objects Layer of the Python/C API.

The Abstract Objects Layer defines a number of protocols that a Python type may implement and by which it is later classified. A protocol is a sort of convention about which functions a type should implement in order to provide some well-defined behavior. Let's say a type is classified as sequence-based if and only if it implements a set of specific functions (length, item access, concat, etc.).

There are several protocols, but we are mainly interested in these:

  • Number protocol - all numeric types (int, float, complex, etc.)
  • Sequence protocol - sequence types (str, list, tuple, etc.)
  • Mapping protocol - mapping types (dict)
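
Classifying a type by the tables it fills in could be sketched like this. It is a deliberately reduced model: the table and field names only borrow CPython's naming style, the function signatures are invented, Demo_PairType is a made-up sequence type, and the real checks in CPython look at different slots.

```c
#include <stddef.h>

/* One reduced method table per protocol (a single slot each). */
typedef struct { long (*nb_add)(long, long); } DemoNumberMethods;
typedef struct { long (*sq_length)(const void *); } DemoSequenceMethods;
typedef struct { long (*mp_length)(const void *); } DemoMappingMethods;

typedef struct {
    const char *tp_name;
    const DemoNumberMethods *tp_as_number;
    const DemoSequenceMethods *tp_as_sequence;
    const DemoMappingMethods *tp_as_mapping;
} DemoType;

static long demo_add(long a, long b) { return a + b; }
static long demo_pair_length(const void *ob) { (void)ob; return 2; }

static const DemoNumberMethods demo_int_as_number = { demo_add };
static const DemoSequenceMethods demo_pair_as_sequence = { demo_pair_length };

/* An int-like type fills in the number table only. */
const DemoType Demo_IntType = { "int", &demo_int_as_number, NULL, NULL };
/* A hypothetical fixed-length sequence type fills in the sequence table. */
const DemoType Demo_PairType = { "pair", NULL, &demo_pair_as_sequence, NULL };

/* A protocol counts as implemented if its table and key slot are set. */
int demo_is_number(const DemoType *t)
{
    return t->tp_as_number != NULL && t->tp_as_number->nb_add != NULL;
}

int demo_is_sequence(const DemoType *t)
{
    return t->tp_as_sequence != NULL && t->tp_as_sequence->sq_length != NULL;
}

int demo_is_mapping(const DemoType *t)
{
    return t->tp_as_mapping != NULL && t->tp_as_mapping->mp_length != NULL;
}
```

A null table pointer simply means "this protocol is not supported", so a type pays nothing for the protocols it does not implement.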

Let's get back to our examples.

PyInt_Type

PyTypeObject PyInt_Type = {  
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "int",
    ...
    &int_as_number,                             /* tp_as_number */
    0,                                          /* tp_as_sequence */
    0,                                          /* tp_as_mapping */
    (hashfunc)int_hash,                         /* tp_hash */
    ...
};

  • PyInt_Type implements the Number Protocol only, which is why the tp_as_sequence and tp_as_mapping fields are null.
  • Built-in immutable types have their own hash function, and PyInt_Type is no exception (int_hash).

PyDict_Type

PyTypeObject PyDict_Type = {  
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "dict",
    ...
    0,                                          /* tp_as_number */
    &dict_as_sequence,                          /* tp_as_sequence */
    &dict_as_mapping,                           /* tp_as_mapping */
    (hashfunc)PyObject_HashNotImplemented,      /* tp_hash */
    ...
};

  • Dictionary type is a tricky one in Python. Although it's the only fully qualified representative of the Mapping Protocol, it does implement a part of the Sequence Protocol (actually one function: the sq_contains slot behind __contains__ - a sort of hack to implement "key in dict").
  • Since dictionary is a mutable type, it has no real hash function: the slot is filled with PyObject_HashNotImplemented, a function that simply raises a TypeError.

PyTuple_Type

PyTypeObject PyTuple_Type = {  
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "tuple",
    ...
    0,                                          /* tp_as_number */
    &tuple_as_sequence,                         /* tp_as_sequence */
    &tuple_as_mapping,                          /* tp_as_mapping */
    (hashfunc)tuplehash,                        /* tp_hash */
    ...
};

  • Tuple type is a tricky one too. Although tuple is a sequence-based type, it fills in both tables: Sequence and Mapping. The mapping table holds only the length and subscript functions, which is what makes extended slicing such as t[::2] work. That's why both fields (tp_as_sequence and tp_as_mapping) are not empty.
  • Tuple is an immutable object, so it has a hash function (tuplehash).

At this point I hope you've enjoyed travelling over CPython's valleys and hills. Although it's more of an overview than a deep dive, it will help you learn more sophisticated things about Python later.

References

  1. Python Standard Library
  2. Python/C API Reference Manual

Sergei Danielian

Vladivostok, Russia