Welcome to DataJoint’s API Documentation!

datajoint module

DataJoint for Python is a framework for building data pipelines using MySQL databases to represent pipeline structure and bulk storage systems for large objects. DataJoint is built on the foundation of the relational data model and prescribes a consistent method for organizing, populating, and querying data.

The DataJoint data model is described in https://arxiv.org/abs/1807.11104

DataJoint is free software under the LGPL License. In addition, we request that any use of DataJoint leading to a publication be acknowledged in the publication.

Please cite:

class datajoint.AndList(iterable=(), /)

Bases: list

A list of conditions to be applied to a query expression by logical conjunction: the conditions are AND-ed. All other collections (lists, sets, other entity sets, etc.) are applied by logical disjunction (OR).

Example:

>>> expr2 = expr & dj.AndList((cond1, cond2, cond3))

is equivalent to

>>> expr2 = expr & cond1 & cond2 & cond3
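The difference between AND-ed and OR-ed conditions can be illustrated without a database. The sketch below is a plain-Python stand-in, not DataJoint's implementation: rows are dicts, a condition is a dict of required values, and the names matches, restrict, and the local AndList class are hypothetical helpers introduced only for this illustration.

```python
class AndList(list):
    """Local stand-in for dj.AndList: a list whose conditions are AND-ed."""


def matches(row, cond):
    # A row satisfies a condition when every required value is present.
    return all(row.get(name) == value for name, value in cond.items())


def restrict(rows, restriction):
    # AndList must be checked first because it is itself a list.
    if isinstance(restriction, AndList):
        return [r for r in rows if all(matches(r, c) for c in restriction)]
    # Any other collection of conditions is applied by disjunction (OR).
    if isinstance(restriction, (list, tuple)):
        return [r for r in rows if any(matches(r, c) for c in restriction)]
    # A single condition.
    return [r for r in rows if matches(r, restriction)]


rows = [
    {"species": "mouse", "sex": "F"},
    {"species": "mouse", "sex": "M"},
    {"species": "rat", "sex": "F"},
]
conds = [{"species": "mouse"}, {"sex": "F"}]

and_result = restrict(rows, AndList(conds))               # both must hold
or_result = restrict(rows, conds)                         # either may hold
chained = restrict(restrict(rows, conds[0]), conds[1])    # cond1, then cond2
```

In this sketch, restricting by the AndList gives the same rows as applying the conditions one after another, which mirrors the equivalence stated in the example above.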

append(restriction)

Append object to the end of the list.

class datajoint.AttributeAdapter

Bases: object

Base class for adapter objects for user-defined attribute types.

property attribute_type
Returns

a supported DataJoint attribute type to use; e.g. “longblob”, “blob@store”

get(value)

convert the value retrieved from the attribute in a table into the adapted type

Parameters

value – value from the database

Returns

object of the adapted type

put(obj)

convert an object of the adapted type into a value that DataJoint can store in a table attribute

Parameters

obj – an object of the adapted type

Returns

value to store in the database
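The three members above form a round trip: put converts the adapted object into something storable, and get converts it back. The following is a minimal sketch of that contract, assuming a hypothetical adapter named FrozenSetAdapter that stores a frozenset as a sorted list. It is written as a plain class so that it runs without a database; in real use the class would derive from dj.AttributeAdapter.

```python
class FrozenSetAdapter:
    """Hypothetical adapter: keeps frozensets in a blob attribute."""

    @property
    def attribute_type(self):
        # The DataJoint type used for storage ("longblob" is one of the
        # examples named in the documentation above).
        return "longblob"

    def put(self, obj):
        # Adapted type -> value that can be stored in the table.
        return sorted(obj)

    def get(self, value):
        # Stored value -> adapted type.
        return frozenset(value)


adapter = FrozenSetAdapter()
original = frozenset({3, 1, 2})
stored = adapter.put(original)     # what would be written to the table
restored = adapter.get(stored)     # what a fetch would hand back
```

The design point is that get(put(obj)) should reproduce obj, so that users of the table only ever see the adapted type.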

class datajoint.Computed

Bases: datajoint.user_tables.UserTable, datajoint.autopopulate.AutoPopulate

Inherit from this class if the table’s values are computed from other relations in the schema. The inherited class must at least provide the function _make_tuples.

tier_regexp = '(?P<computed>__[a-z][a-z0-9]*(_[a-z][a-z0-9]*)*)'
class datajoint.Connection(host, user, password, port=None, init_fun=None, use_tls=None)

Bases: object

A dj.Connection object manages a connection to a database server. It also catalogues modules, schemas, tables, and their dependencies (foreign keys).

Most of the parameters below should be set in the local configuration file.

Parameters
  • host – host name, may include port number as hostname:port, in which case it overrides the value in port

  • user – user name

  • password – password

  • port – port number

  • init_fun – connection initialization function (SQL)

  • use_tls – TLS encryption option
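The rule that a port embedded in host overrides the port argument can be sketched as follows. The helper split_host_port is hypothetical and is not part of DataJoint; the fallback to 3306 (the standard MySQL port) is an assumption made for this illustration, since the documented default for port is None.

```python
def split_host_port(host, port=None, default_port=3306):
    """Return (host, port), letting 'hostname:port' override `port`."""
    if ":" in host:
        # The port embedded in the host string takes precedence.
        host, port_text = host.rsplit(":", 1)
        port = int(port_text)
    elif port is None:
        # Assumed fallback for this sketch only.
        port = default_port
    return host, port


embedded = split_host_port("db.example.org:3307", port=3306)
explicit = split_host_port("db.example.org", port=3307)
fallback = split_host_port("localhost")
```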

cancel_transaction()

Cancels the current transaction and rolls back all changes made during the transaction.

close()
commit_transaction()

Commit all changes made during the transaction and close it.

connect()

Connect to the database server.

get_user()
Returns

the user name and host name provided by the client to the server.

property in_transaction
Returns

True if there is an open transaction.

property is_connected

Return true if the object is connected to the database server.

ping()

Pings the connection. Raises an exception if the connection is closed.

purge_query_cache()

Purges the entire query cache.

query(query, args=(), *, as_dict=False, suppress_warnings=True, reconnect=None)

Execute the specified query and return the tuple generator (cursor).

Parameters
  • query – SQL query

  • args – additional arguments for the client.cursor

  • as_dict – if True, the returned cursor yields query results as dictionaries.

  • suppress_warnings – if True, suppress all warnings arising from the underlying query library.

  • reconnect – if None, use the value from config; if True, attempt to reconnect if disconnected.

register(schema)
set_query_cache(query_cache=None)

When query_cache is not None, the connection switches into the query caching mode, which entails:

  1. Only SELECT queries are allowed.

  2. The results of queries are cached under the path indicated by dj.config[‘query_cache’].

  3. query_cache is a string that differentiates different cache states.

Parameters

query_cache – a string to initialize the hash for query results

start_transaction()

Starts a transaction.

property transaction

Context manager for transactions. Opens a transaction and closes it after the with statement. If an error is caught during the transaction, the changes are automatically rolled back. All errors are raised again.

Example:

>>> import datajoint as dj
>>> with dj.conn().transaction as conn:
>>>     # transaction is open here
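The commit-on-success, roll-back-on-error behavior described for the transaction property can be sketched with a stand-in class. FakeConnection below is hypothetical and only records what would happen; it borrows the method names start_transaction, commit_transaction, and cancel_transaction from this page but is not DataJoint's implementation.

```python
from contextlib import contextmanager


class FakeConnection:
    """Stand-in that logs transaction events instead of talking to a server."""

    def __init__(self):
        self.in_transaction = False
        self.log = []

    def start_transaction(self):
        self.in_transaction = True
        self.log.append("start")

    def commit_transaction(self):
        self.in_transaction = False
        self.log.append("commit")

    def cancel_transaction(self):
        self.in_transaction = False
        self.log.append("rollback")

    @property
    @contextmanager
    def transaction(self):
        self.start_transaction()
        try:
            yield self
        except Exception:
            # Undo the changes, then let the error propagate to the caller.
            self.cancel_transaction()
            raise
        else:
            self.commit_transaction()


conn = FakeConnection()

with conn.transaction:
    pass  # work that succeeds is committed

try:
    with conn.transaction:
        raise ValueError("simulated failure")  # work that fails is rolled back
except ValueError:
    pass
```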

exception datajoint.DataJointError(*args)

Bases: Exception

Base class for errors specific to DataJoint internal operation.

suggest(*args)

regenerate the exception with additional arguments

Parameters

args – additional arguments

Returns

a new exception of the same type with the additional arguments
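The behavior described for suggest can be sketched with a stand-in exception class. SketchError and MissingTableSketch are hypothetical names used only here; the sketch assumes the additional arguments are appended after the original ones.

```python
class SketchError(Exception):
    """Stand-in for a DataJoint error class; not the library's own code."""

    def suggest(self, *args):
        # Build a new exception of the same type whose arguments are the
        # original arguments followed by the additional ones.
        return type(self)(*(self.args + args))


class MissingTableSketch(SketchError):
    pass


err = MissingTableSketch("Table not found.")
better = err.suggest("Check the schema name.")
```

Because the new exception keeps the subclass of the original, existing except clauses continue to match it.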

datajoint.Di

alias of datajoint.diagram.Diagram

class datajoint.Diagram(source, context=None)

Bases: networkx.classes.digraph.DiGraph

Entity relationship diagram.

Usage:

>>>  diag = Diagram(source)

source can be a base relation object, a base relation class, a schema, or a module that has a schema.

>>> diag.draw()

draws the diagram using pyplot

diag1 + diag2 - combines the two diagrams.

diag + n - expands n levels of successors.

diag - n - expands n levels of predecessors.

Thus dj.Diagram(schema.Table)+1-1 defines the diagram of immediate ancestors and descendants of schema.Table.

Note that diagram + 1 - 1 may differ from diagram - 1 + 1 and so forth. Only those tables that are loaded in the connection object are displayed.
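Why the order of + and - matters can be seen on a small dependency graph. The sketch below uses a plain dict rather than Diagram, and the helpers expand_down and expand_up are hypothetical stand-ins for the roles of "+ n" and "- n"; it is an illustration of the idea, not the library's algorithm.

```python
# Hypothetical dependency graph: each table maps to the tables that depend on it.
children = {"A": ["B"], "C": ["B"], "B": ["D"], "D": []}


def parents_of(node):
    return [parent for parent, kids in children.items() if node in kids]


def expand_down(nodes, n=1):
    """Add n levels of successors (the role of diag + n)."""
    nodes = set(nodes)
    for _ in range(n):
        nodes |= {kid for node in nodes for kid in children[node]}
    return nodes


def expand_up(nodes, n=1):
    """Add n levels of predecessors (the role of diag - n)."""
    nodes = set(nodes)
    for _ in range(n):
        nodes |= {parent for node in nodes for parent in parents_of(node)}
    return nodes


# Starting from A: going down first reaches B, whose other parent C is then
# picked up on the way back up. Going up first finds nothing above A.
down_then_up = expand_up(expand_down({"A"}))
up_then_down = expand_down(expand_up({"A"}))
```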

add_parts()

Adds to the diagram the part tables of tables already included in the diagram.

draw()
classmethod from_sequence(sequence)

The join Diagram for all objects in sequence

Parameters

sequence – a sequence (e.g. list, tuple)

Returns

Diagram(arg1) + … + Diagram(argn)

make_dot()
make_image()
make_png()
make_svg()
save(filename, format=None)
topological_sort()
Returns

list of nodes in topological order

datajoint.ERD

alias of datajoint.diagram.Diagram

class datajoint.FreeTable(conn, full_table_name)

Bases: datajoint.table.Table

A base relation without a dedicated class. Each instance is associated with a table specified by full_table_name.

Parameters
  • conn – a dj.Connection object

  • full_table_name – in format database.`table_name`

class datajoint.Imported

Bases: datajoint.user_tables.UserTable, datajoint.autopopulate.AutoPopulate

Inherit from this class if the table’s values are imported from external data sources. The inherited class must at least provide the function _make_tuples.

tier_regexp = '(?P<imported>_[a-z][a-z0-9]*(_[a-z][a-z0-9]*)*)'
class datajoint.Lookup

Bases: datajoint.user_tables.UserTable

Inherit from this class if the table’s values are for lookup. This is currently equivalent to defining the table as Manual and serves semantic purposes only.

tier_regexp = '(?P<lookup>#[a-z][a-z0-9]*(_[a-z][a-z0-9]*)*)'
class datajoint.Manual

Bases: datajoint.user_tables.UserTable

Inherit from this class if the table’s values are entered manually.

tier_regexp = '(?P<manual>[a-z][a-z0-9]*(_[a-z][a-z0-9]*)*)'
class datajoint.MatCell

Bases: numpy.ndarray

a numpy ndarray representing a Matlab cell array

class datajoint.MatStruct(shape, dtype=None, buf=None, offset=0, strides=None, formats=None, names=None, titles=None, byteorder=None, aligned=False, order='C')

Bases: numpy.recarray

numpy.recarray representing a Matlab struct array

class datajoint.Not(restriction)

Bases: object

invert restriction

class datajoint.Part

Bases: datajoint.user_tables.UserTable

Inherit from this class if the table’s values are details of an entry in another relation and if this table is populated by this relation. For example, the entries inheriting from dj.Part could be single entries of a matrix, while the parent table refers to the entire matrix. Part relations are implemented as classes inside classes.

connection = None
delete(force=False)

unless force is True, prohibits direct deletes from parts.

drop(force=False)

unless force is True, prohibits dropping part tables directly.

full_table_name = None
master = None
table_name = None
tier_regexp = '(?P<master>(?P<manual>[a-z][a-z0-9]*(_[a-z][a-z0-9]*)*)|(?P<lookup>#[a-z][a-z0-9]*(_[a-z][a-z0-9]*)*)|(?P<imported>_[a-z][a-z0-9]*(_[a-z][a-z0-9]*)*)|(?P<computed>__[a-z][a-z0-9]*(_[a-z][a-z0-9]*)*)){1,1}__(?P<part>[a-z][a-z0-9]*(_[a-z][a-z0-9]*)*)'
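The tier_regexp attributes listed for Manual, Lookup, Imported, Computed, and Part encode the table tier in the table name prefix. The sketch below applies those patterns with Python's re module to classify a few made-up table names. The classify helper is hypothetical, and matching against the whole name with re.fullmatch is an assumption of this sketch rather than a statement about how DataJoint applies the patterns.

```python
import re

NAME = r"[a-z][a-z0-9]*(_[a-z][a-z0-9]*)*"

# Patterns as documented for each tier class.
TIER_PATTERNS = {
    "manual": rf"(?P<manual>{NAME})",
    "lookup": rf"(?P<lookup>#{NAME})",
    "imported": rf"(?P<imported>_{NAME})",
    "computed": rf"(?P<computed>__{NAME})",
}

# The Part pattern: any master tier, a double underscore, then the part name.
MASTER = "|".join(TIER_PATTERNS.values())
TIER_PATTERNS["part"] = rf"(?P<master>{MASTER}){{1,1}}__(?P<part>{NAME})"


def classify(table_name):
    """Return the name of the first tier whose pattern matches the whole name."""
    for tier, pattern in TIER_PATTERNS.items():
        if re.fullmatch(pattern, table_name):
            return tier
    return None


tiers = {
    name: classify(name)
    for name in ["session", "#species", "_scan", "__spike_train", "session__trial", "~log"]
}

# Named groups recover the master and part names of a part table.
part_match = re.fullmatch(TIER_PATTERNS["part"], "_scan__roi")
```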
class datajoint.Schema(schema_name=None, context=None, *, connection=None, create_schema=True, create_tables=True, add_objects=None)

Bases: object

A schema object is a decorator for UserTable classes that binds them to their database. It also specifies the namespace context in which other UserTable classes are defined.

activate(schema_name=None, *, connection=None, create_schema=None, create_tables=None, add_objects=None)

Associate database schema schema_name. If the schema does not exist, attempt to create it on the server.

Parameters
  • schema_name – the database schema to associate. schema_name=None is used to assert that the schema has already been activated.

  • connection – Connection object. Defaults to datajoint.conn().

  • create_schema – If False, do not create the schema and raise an error if missing.

  • create_tables – If False, do not create tables and raise errors when attempting to access missing tables.

  • add_objects – a mapping with additional objects to make available to the context in which table classes are declared.

property code
drop(force=False)

Drop the associated schema if it exists

property exists
Returns

true if the associated schema exists on the server

is_activated()
property jobs

schema.jobs provides a view of the job reservation table for the schema

Returns

jobs table

list_tables()

Return a list of all tables in the schema, excluding tables whose names begin with ~, such as ~log and ~jobs.

Returns

A list of table names from the database schema.

property log
save(python_filename=None)

Generate the code for a module that recreates the schema. This method is in preparation for a future release and is not officially supported.

Returns

a string containing the body of a complete Python module defining this schema.

property size_on_disk
Returns

size of the entire schema in bytes

spawn_missing_classes(context=None)

Creates the appropriate python user relation classes from tables in the schema and places them in the context.

Parameters

context – alternative context to place the missing classes into, e.g. locals()

class datajoint.Table

Bases: datajoint.expression.QueryExpression

Table is an abstract class that represents a table in the schema. It implements insert and delete methods and inherits query functionality. To make it a concrete class, override the abstract properties specifying the connection, table name, database, and definition.

alter(prompt=True, context=None)

Alter the table definition from self.definition

ancestors(as_objects=False)
Parameters

as_objects – False - a list of table names; True - a list of table objects.

Returns

list of the table’s ancestors in topological order.

children(primary=None, as_objects=False, foreign_key_info=False)
Parameters
  • primary – if None, then all children are returned. If True, then only foreign keys composed of primary key attributes are considered. If False, return foreign keys including at least one secondary attribute.

  • as_objects – if False, return table names. If True, return table objects.

  • foreign_key_info – if True, each element in result also includes foreign key info.

Returns

list of children as table names or table objects with (optional) foreign key information.

database = None
declaration_context = None
declare(context=None)

Declare the table in the schema based on self.definition.

Parameters

context – the context for foreign key resolution. If None, foreign keys are not allowed.

property definition
delete(transaction=True, safemode=None, force_parts=False)

Deletes the contents of the table and its dependent tables, recursively.

Parameters
  • transaction – if True, the entire delete becomes an atomic transaction. This is the default and recommended behavior. Set to False if this delete is nested within another transaction.

  • safemode – If True, prohibit nested transactions and prompt to confirm. Default is dj.config[‘safemode’].

  • force_parts – Delete from parts even when not deleting from their masters.

Returns

number of deleted rows (excluding those from dependent tables)

delete_quick(get_count=False)

Deletes the table without cascading and without user prompt. If this table has populated dependent tables, this will fail.

descendants(as_objects=False)
Parameters

as_objects – False - a list of table names; True - a list of table objects.

Returns

list of the table’s descendants in topological order.

describe(context=None, printout=True)
Returns

the definition string for the relation using DataJoint DDL.

drop()

Drop the table and all tables that reference it, recursively. User is prompted for confirmation if config[‘safemode’] is set to True.

drop_quick()

Drops the table associated with this relation without cascading and without user prompt. If the table has any dependent table(s), this call will fail with an error.

property external
from_clause()
Returns

the FROM clause of SQL SELECT statements.

property full_table_name
Returns

full table name in the schema

get_select_fields(select_fields=None)
Returns

the selected attributes from the SQL SELECT statement.

insert(rows, replace=False, skip_duplicates=False, ignore_extra_fields=False, allow_direct_insert=None)

Insert a collection of rows.

Parameters
  • rows – An iterable where an element is a numpy record, a dict-like object, a pandas.DataFrame, a sequence, or a query expression with the same heading as self.

  • replace – If True, replaces the existing tuple.

  • skip_duplicates – If True, silently skip duplicate inserts.

  • ignore_extra_fields – If False, fields that are not in the heading raise error.

  • allow_direct_insert – applies only in auto-populated tables. If False (default), inserts are allowed only from inside the make callback.

Example:

>>> relation.insert([
>>>     dict(subject_id=7, species="mouse", date_of_birth="2014-09-01"),
>>>     dict(subject_id=8, species="mouse", date_of_birth="2014-09-02")])
insert1(row, **kwargs)

Insert one data record into the table. For kwargs, see insert().

Parameters

row – a numpy record, a dict-like object, or an ordered sequence to be inserted as one row.

property is_declared
Returns

True if the table is declared in the schema.

parents(primary=None, as_objects=False, foreign_key_info=False)
Parameters
  • primary – if None, then all parents are returned. If True, then only foreign keys composed of primary key attributes are considered. If False, return foreign keys including at least one secondary attribute.

  • as_objects – if False, return table names. If True, return table objects.

  • foreign_key_info – if True, each element in result also includes foreign key info.

Returns

list of parents as table names or table objects with (optional) foreign key information.

parts(as_objects=False)

return part tables either as entries in a dict with foreign key information or as a list of objects.

Parameters

as_objects – if False (default), the output is a dict describing the foreign keys. If True, return table objects.

show_definition()
property size_on_disk
Returns

size of data and indices in bytes on the storage device

property table_name
update1(row)

update1 updates one existing entry in the table. Caution: In DataJoint, the primary mode of data manipulation is to insert and delete entire records, since referential integrity works on the level of records, not fields. Therefore, updates are reserved for corrective operations outside of the main workflow. Use UPDATE methods sparingly, with full awareness of potential violations of assumptions.

Parameters

row – a dict containing the primary key values and the attributes to update. Setting an attribute value to None will reset it to the default value (if any).

The primary key attributes must always be provided.

Examples:

>>> table.update1({'id': 1, 'value': 3})  # update value in record with id=1
>>> table.update1({'id': 1, 'value': None})  # reset value to default
class datajoint.U(*primary_key)

Bases: object

dj.U objects are the universal sets representing all possible values of their attributes. dj.U objects cannot be queried on their own but are useful for forming some queries. dj.U(‘attr1’, …, ‘attrn’) represents the universal set with the primary key attributes attr1 … attrn. The universal set is the set of all possible combinations of values of the attributes. Without any attributes, dj.U() represents the set with one element that has no attributes.

Restriction:

dj.U can be used to enumerate unique combinations of values of attributes from other expressions.

The following expression yields all unique combinations of contrast and brightness found in the stimulus set:

>>> dj.U('contrast', 'brightness') & stimulus

Aggregation:

In aggregation, dj.U is used for summary calculation over an entire set:

The following expression yields one element with one attribute n containing the total number of elements in query expression expr:

>>> dj.U().aggr(expr, n='count(*)')

The following expressions both yield one element containing the number n of distinct values of attribute attr in query expression expr.

>>> dj.U().aggr(expr, n='count(distinct attr)')
>>> dj.U().aggr(dj.U('attr').aggr(expr), n='count(*)')

The following expression yields one element and one attribute s containing the sum of values of attribute attr over entire result set of expression expr:

>>> dj.U().aggr(expr, s='sum(attr)')

The following expression yields the set of all unique combinations of attributes attr1, attr2 and the number of their occurrences in the result set of query expression expr.

>>> dj.U('attr1', 'attr2').aggr(expr, n='count(*)')
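The restriction and aggregation expressions above each have a simple set-based meaning. The sketch below reproduces those meanings in plain Python over a made-up list of rows. It does not use DataJoint, and the sample data and variable names are assumptions chosen for illustration.

```python
from collections import Counter

# Made-up stand-in for the result set of a query expression.
rows = [
    {"contrast": 0.5, "brightness": 10},
    {"contrast": 0.5, "brightness": 10},
    {"contrast": 1.0, "brightness": 10},
    {"contrast": 1.0, "brightness": 20},
]

# dj.U('contrast', 'brightness') & stimulus: unique combinations of values.
unique_combinations = sorted({(r["contrast"], r["brightness"]) for r in rows})

# dj.U().aggr(expr, n='count(*)'): one element holding the total count.
n_total = len(rows)

# dj.U().aggr(expr, n='count(distinct attr)'): number of distinct values.
n_distinct_contrast = len({r["contrast"] for r in rows})

# dj.U().aggr(expr, s='sum(attr)'): one element holding the sum.
s_brightness = sum(r["brightness"] for r in rows)

# dj.U('attr1', 'attr2').aggr(expr, n='count(*)'): count per unique combination.
n_per_combination = Counter((r["contrast"], r["brightness"]) for r in rows)
```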

Joins:

If expression expr has attributes ‘attr1’ and ‘attr2’, then expr * dj.U(‘attr1’,’attr2’) yields the same result as expr but attr1 and attr2 are promoted to the primary key. This is useful for producing a join on non-primary key attributes. For example, if attr is in both expr1 and expr2 but not in their primary keys, then expr1 * expr2 will throw an error because, in most cases, it does not make sense to join on non-primary key attributes and users must first rename attr in one of the operands. The expression dj.U(‘attr’) * expr1 * expr2 overrides this constraint.

aggr(group, **named_attributes)

Aggregation of the type U(‘attr1’,’attr2’).aggr(group, computation=”QueryExpression”) has the primary key (‘attr1’,’attr2’) and performs aggregation computations for all matching elements of group.

Parameters
  • group – The query expression to be aggregated.

  • named_attributes – computations of the form new_attribute=”sql expression on attributes of group”

Returns

The derived query expression

aggregate(group, **named_attributes)

Aggregation of the type U(‘attr1’,’attr2’).aggr(group, computation=”QueryExpression”) has the primary key (‘attr1’,’attr2’) and performs aggregation computations for all matching elements of group.

Parameters
  • group – The query expression to be aggregated.

  • named_attributes – computations of the form new_attribute=”sql expression on attributes of group”

Returns

The derived query expression

join(other, left=False)

Joining U with a query expression has the effect of promoting the attributes of U to the primary key of the other query expression.

Parameters
  • other – the other query expression to join with.

  • left – ignored. dj.U always acts as if left=False

Returns

a copy of the other query expression with the primary key extended.

property primary_key
class datajoint.VirtualModule(module_name, schema_name, *, create_schema=False, create_tables=False, connection=None, add_objects=None)

Bases: module

A virtual module imitates a Python module representing a DataJoint schema from table definitions in the database. It declares the schema objects and a class for each table.

datajoint.conn(host=None, user=None, password=None, *, init_fun=None, reset=False, use_tls=None)

Returns a persistent connection object to be shared by multiple modules. If the connection is not yet established or reset=True, a new connection is set up. If connection information is not provided, it is taken from config, which takes the information from dj_local_conf.json. If the password is not specified in that file, DataJoint prompts for the password.

Parameters
datajoint.create_virtual_module

alias of datajoint.schemas.VirtualModule

class datajoint.key

Bases: object

Object that allows requesting the primary key as an argument in expression.fetch(). The string “KEY” can be used instead of the class key.

datajoint.key_hash(mapping)

32-byte hash of the mapping’s key values sorted by the key name. This is often used to convert a long primary key value into a shorter hash. For example, the JobTable in datajoint.jobs uses this function to hash the primary key of autopopulated tables.
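One way such a hash could be computed is sketched below. The function key_hash_sketch is hypothetical; the use of MD5 (rendered as 32 hexadecimal characters) and the conversion of each value with str are assumptions of this sketch, not a statement of the library's exact algorithm.

```python
import hashlib


def key_hash_sketch(mapping):
    """Hash the values of `mapping`, visited in order of key name."""
    hashed = hashlib.md5()
    for _, value in sorted(mapping.items()):
        hashed.update(str(value).encode())
    return hashed.hexdigest()


key = {"subject_id": 7, "session": 3, "scan": 12}
digest = key_hash_sketch(key)

# Sorting by key name makes the result independent of insertion order.
reordered = key_hash_sketch({"scan": 12, "subject_id": 7, "session": 3})
other = key_hash_sketch({"subject_id": 8, "session": 3, "scan": 12})
```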

datajoint.kill(restriction=None, connection=None, order_by=None)

view and kill database connections.

Parameters
  • restriction – restriction to be applied to processlist

  • connection – a datajoint.Connection object. Default calls datajoint.conn()

  • order_by – order by a single attribute or the list of attributes. defaults to ‘id’.

Restrictions are specified as strings and can involve any of the attributes of information_schema.processlist: ID, USER, HOST, DB, COMMAND, TIME, STATE, INFO.

Examples:

dj.kill(‘HOST LIKE “%compute%”’) lists only connections from hosts containing “compute”.

dj.kill(‘TIME > 600’) lists only connections in their current state for more than 10 minutes.

datajoint.list_schemas(connection=None)
Parameters

connection – a dj.Connection object

Returns

list of all accessible schemas on the server

datajoint.schema

alias of datajoint.schemas.Schema

datajoint.set_password(new_password=None, connection=None, update_config=None)
