Welcome to DataJoint’s API Documentation!
datajoint module
DataJoint for Python is a framework for building data pipelines using MySQL databases to represent pipeline structure and bulk storage systems for large objects. DataJoint is built on the foundation of the relational data model and prescribes a consistent method for organizing, populating, and querying data.
The DataJoint data model is described in https://arxiv.org/abs/1807.11104
DataJoint is free software under the LGPL License. In addition, we request that any use of DataJoint leading to a publication be acknowledged in the publication.
Please cite:
- class datajoint.AndList(iterable=(), /)
Bases:
list
A list of conditions to be applied to a query expression by logical conjunction: the conditions are AND-ed. All other collections (lists, sets, other entity sets, etc.) are applied by logical disjunction (OR).
Example: expr2 = expr & dj.AndList((cond1, cond2, cond3)) is equivalent to expr2 = expr & cond1 & cond2 & cond3
- append(restriction)
Append object to the end of the list.
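The difference between conjunction and disjunction can be illustrated without a database. The sketch below is a plain-Python model, not DataJoint code: rows are dicts, each condition is a dict of required values, and the names `AndListSketch`, `matches`, and `restrict` are hypothetical and exist only for this illustration.

```python
class AndListSketch(list):
    """Stand-in for dj.AndList: marks its conditions as AND-ed."""


def matches(row, condition):
    """True if the row has every attribute value named in the condition dict."""
    return all(row.get(k) == v for k, v in condition.items())


def restrict(rows, conditions):
    """Keep the rows that satisfy the conditions.

    An AndListSketch requires all conditions to hold (AND);
    any other collection requires at least one to hold (OR).
    """
    combine = all if isinstance(conditions, AndListSketch) else any
    return [r for r in rows if combine(matches(r, c) for c in conditions)]


rows = [
    {"species": "mouse", "sex": "F"},
    {"species": "mouse", "sex": "M"},
    {"species": "rat", "sex": "F"},
]
conds = [{"species": "mouse"}, {"sex": "F"}]

and_result = restrict(rows, AndListSketch(conds))  # mouse AND female
or_result = restrict(rows, conds)                  # mouse OR female
```

In this model the AND-ed restriction keeps only the female mouse, while the plain list keeps every row that satisfies at least one condition.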
- class datajoint.AttributeAdapter
Bases:
object
Base class for adapter objects for user-defined attribute types.
- property attribute_type
- Returns
a supported DataJoint attribute type to use; e.g. “longblob”, “blob@store”
- get(value)
convert value retrieved from the attribute in a table into the adapted type
- Parameters
value – value from the database
- Returns
object of the adapted type
- put(obj)
convert an object of the adapted type into a value that DataJoint can store in a table attribute
- Parameters
obj – an object of the adapted type
- Returns
value to store in the database
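As a minimal sketch of the put/get contract described above, the class below adapts Python sets to a storable list and back. It is an illustration under stated assumptions: in real use the class would derive from dj.AttributeAdapter, but the base class is omitted here so the example runs without DataJoint, and the name `SetAdapterSketch` is hypothetical.

```python
class SetAdapterSketch:
    """Converts Python sets to and from a storable list.

    In real use this class would derive from dj.AttributeAdapter;
    the base class is omitted so the sketch runs without DataJoint.
    """

    # Underlying DataJoint attribute type assumed for storage.
    attribute_type = "longblob"

    def put(self, obj):
        """Convert the adapted type (set) into a storable value (sorted list)."""
        return sorted(obj)

    def get(self, value):
        """Convert the stored value (list) back into the adapted type (set)."""
        return set(value)


adapter = SetAdapterSketch()
stored = adapter.put({3, 1, 2})   # value that would be written to the table
restored = adapter.get(stored)    # object handed back to the user on fetch
```

The two methods are inverses of each other: whatever put produces, get should be able to turn back into an equivalent object.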
- class datajoint.Computed
Bases:
UserTable, AutoPopulate
Inherit from this class if the table’s values are computed from other relations in the schema. The inherited class must at least provide the function _make_tuples.
- tier_regexp = '(?P<computed>__[a-z][a-z0-9]*(_[a-z][a-z0-9]*)*)'
- class datajoint.Connection(host, user, password, port=None, init_fun=None, use_tls=None)
Bases:
object
A dj.Connection object manages a connection to a database server. It also catalogues modules, schemas, tables, and their dependencies (foreign keys).
Most of the parameters below should be set in the local configuration file.
- Parameters
host – host name, may include port number as hostname:port, in which case it overrides the value in port
user – user name
password – password
port – port number
init_fun – connection initialization function (SQL)
use_tls – TLS encryption option
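The rule that a port embedded in host overrides the port argument can be sketched as a small helper. This is an illustration only, not the library's parsing code: `split_host_port` and the host name are hypothetical, and the sketch does not handle IPv6 literals.

```python
def split_host_port(host, port=None):
    """Return (hostname, port); a port embedded in host overrides the port argument."""
    if ":" in host:
        hostname, embedded = host.split(":", 1)
        return hostname, int(embedded)
    return host, port


# Hypothetical host names used only for illustration.
a = split_host_port("db.example.org:3307", port=3306)  # embedded port wins
b = split_host_port("db.example.org", port=3306)       # port argument is used
```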
- cancel_transaction()
Cancels the current transaction and rolls back all changes made during the transaction.
- close()
- commit_transaction()
Commit all changes made during the transaction and close it.
- connect()
Connect to the database server.
- get_user()
- Returns
the user name and host name provided by the client to the server.
- property in_transaction
- Returns
True if there is an open transaction.
- property is_connected
Return true if the object is connected to the database server.
- ping()
Pings the connection and raises an exception if the connection is closed.
- purge_query_cache()
Purges the entire query cache.
- query(query, args=(), *, as_dict=False, suppress_warnings=True, reconnect=None)
Execute the specified query and return the tuple generator (cursor).
- Parameters
query – SQL query
args – additional arguments for the client.cursor
as_dict – If as_dict is set to True, the returned cursor object returns query results as dictionaries.
suppress_warnings – If True, suppress all warnings arising from underlying query library
reconnect – when None, get from config, when True, attempt to reconnect if disconnected
- register(schema)
- set_query_cache(query_cache=None)
When query_cache is not None, the connection switches into the query caching mode, which entails:
1. Only SELECT queries are allowed.
2. The results of queries are cached under the path indicated by dj.config['query_cache'].
3. query_cache is a string that differentiates different cache states.
- Parameters
query_cache – a string to initialize the hash for query results
- start_transaction()
Starts a transaction.
- property transaction
Context manager for transactions. Opens a transaction and closes it after the with statement. If an error is caught during the transaction, all changes made within it are automatically rolled back. All errors are raised again.
Example:
>>> import datajoint as dj
>>> with dj.conn().transaction as conn:
>>>     # transaction is open here
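The commit-on-success, rollback-and-re-raise behavior described for the transaction property can be modeled with a stand-in class. The sketch below does not talk to a database: `ConnectionSketch` is a hypothetical class that records which transaction methods were called, so the control flow can be inspected.

```python
from contextlib import contextmanager


class ConnectionSketch:
    """Stand-in that records transaction calls instead of sending SQL."""

    def __init__(self):
        self.log = []
        self.in_transaction = False

    def start_transaction(self):
        self.in_transaction = True
        self.log.append("start")

    def commit_transaction(self):
        self.in_transaction = False
        self.log.append("commit")

    def cancel_transaction(self):
        self.in_transaction = False
        self.log.append("cancel")

    @property
    @contextmanager
    def transaction(self):
        """Open a transaction; roll back and re-raise on error, commit otherwise."""
        self.start_transaction()
        try:
            yield self
        except Exception:
            self.cancel_transaction()
            raise
        else:
            self.commit_transaction()


ok = ConnectionSketch()
with ok.transaction:
    pass  # work inside the transaction succeeds

failed = ConnectionSketch()
try:
    with failed.transaction:
        raise ValueError("simulated failure")
except ValueError:
    pass  # the original error is raised again after the rollback
```

After running, `ok.log` records a start followed by a commit, while `failed.log` records a start followed by a cancel.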
- exception datajoint.DataJointError(*args)
Bases:
Exception
Base class for errors specific to DataJoint internal operation.
- suggest(*args)
regenerate the exception with additional arguments
- Parameters
args – additional arguments
- Returns
a new exception of the same type with the additional arguments
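One way to read the suggest contract is that it builds a fresh exception of the same class whose arguments are the original ones followed by the new ones. The sketch below shows that reading with a hypothetical `SketchError` class; it is an assumption about the behavior, not a copy of the library code, and the messages are invented for illustration.

```python
class SketchError(Exception):
    """Stand-in for a DataJointError-like exception with a suggest() method."""

    def suggest(self, *args):
        """Return a new exception of the same type with extra arguments appended."""
        return self.__class__(*(self.args + args))


original = SketchError("Table is not declared.")
extended = original.suggest("Check the schema name.")
```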
- class datajoint.Diagram(source, context=None)
Bases:
DiGraph
Entity relationship diagram.
Usage:
>>> diag = Diagram(source)
source can be a base relation object, a base relation class, a schema, or a module that has a schema.
>>> diag.draw()
draws the diagram using pyplot
diag1 + diag2 - combines the two diagrams.
diag + n - expands n levels of successors.
diag - n - expands n levels of predecessors.
Thus dj.Diagram(schema.Table)+1-1 defines the diagram of immediate ancestors and descendants of schema.Table.
Note that diagram + 1 - 1 may differ from diagram - 1 + 1 and so forth. Only those tables that are loaded in the connection object are displayed.
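Why the order of expansion matters can be seen on a three-table graph. The sketch below is a simplified set-based model, not the Diagram implementation: the graph, the table names, and the helpers `expand_down` and `expand_up` are hypothetical.

```python
# Hypothetical dependency graph: each key is a parent, each value its children.
children = {"A": {"B"}, "C": {"B"}, "B": set()}


def expand_down(nodes):
    """Model of `diagram + 1`: add the immediate successors of every node."""
    return nodes | {c for n in nodes for c in children[n]}


def expand_up(nodes):
    """Model of `diagram - 1`: add the immediate predecessors of every node."""
    return nodes | {p for p, kids in children.items() if kids & nodes}


start = {"A"}
down_then_up = expand_up(expand_down(start))  # models Diagram(A) + 1 - 1
up_then_down = expand_down(expand_up(start))  # models Diagram(A) - 1 + 1
```

Expanding down first reaches B and then pulls in B's other parent C, whereas expanding up first finds nothing above A, so C never enters the second result.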
- add_parts()
Adds to the diagram the part tables of tables already included in the diagram.
- draw()
- classmethod from_sequence(sequence)
The join Diagram for all objects in sequence
- Parameters
sequence – a sequence (e.g. list, tuple)
- Returns
Diagram(arg1) + … + Diagram(argn)
- make_dot()
- make_image()
- make_png()
- make_svg()
- save(filename, format=None)
- topological_sort()
- Returns
list of nodes in topological order
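Topological order means that every table appears after the tables it depends on. The sketch below shows the idea with the standard-library graphlib module (Python 3.9 or later) rather than the graph library that Diagram itself builds on; the table names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical tables; each key maps to the set of tables it depends on.
depends_on = {
    "session": {"subject"},
    "scan": {"session"},
    "subject": set(),
}

# static_order() yields each table only after all of its dependencies.
order = list(TopologicalSorter(depends_on).static_order())
```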
- class datajoint.FreeTable(conn, full_table_name)
Bases:
Table
A base relation without a dedicated class. Each instance is associated with a table specified by full_table_name.
- Parameters
conn – a dj.Connection object
full_table_name – in format database.`table_name`
- class datajoint.Imported
Bases:
UserTable, AutoPopulate
Inherit from this class if the table’s values are imported from external data sources. The inherited class must at least provide the function _make_tuples.
- tier_regexp = '(?P<imported>_[a-z][a-z0-9]*(_[a-z][a-z0-9]*)*)'
- class datajoint.Lookup
Bases:
UserTable
Inherit from this class if the table’s values are for lookup. This is currently equivalent to defining the table as Manual and serves semantic purposes only.
- tier_regexp = '(?P<lookup>#[a-z][a-z0-9]*(_[a-z][a-z0-9]*)*)'
- class datajoint.Manual
Bases:
UserTable
Inherit from this class if the table’s values are entered manually.
- tier_regexp = '(?P<manual>[a-z][a-z0-9]*(_[a-z][a-z0-9]*)*)'
- class datajoint.MatCell
Bases:
ndarray
a numpy ndarray representing a Matlab cell array
- class datajoint.MatStruct(shape, dtype=None, buf=None, offset=0, strides=None, formats=None, names=None, titles=None, byteorder=None, aligned=False, order='C')
Bases:
recarray
numpy.recarray representing a Matlab struct array
- class datajoint.Not(restriction)
Bases:
object
invert restriction
- class datajoint.Part
Bases:
UserTable
Inherit from this class if the table’s values are details of an entry in another relation and if this table is populated by this relation. For example, the entries inheriting from dj.Part could be single entries of a matrix, while the parent table refers to the entire matrix. Part relations are implemented as classes inside classes.
- connection = None
- delete(force=False)
unless force is True, prohibits direct deletes from parts.
- drop(force=False)
unless force is True, prohibits dropping part tables directly.
- full_table_name = None
- master = None
- table_name = None
- tier_regexp = '(?P<master>(?P<manual>[a-z][a-z0-9]*(_[a-z][a-z0-9]*)*)|(?P<lookup>#[a-z][a-z0-9]*(_[a-z][a-z0-9]*)*)|(?P<imported>_[a-z][a-z0-9]*(_[a-z][a-z0-9]*)*)|(?P<computed>__[a-z][a-z0-9]*(_[a-z][a-z0-9]*)*)){1,1}__(?P<part>[a-z][a-z0-9]*(_[a-z][a-z0-9]*)*)'
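The tier_regexp attributes listed for Manual, Lookup, Imported, Computed, and Part encode the tier in the table name prefix. The sketch below rebuilds those patterns from a shared name fragment and classifies a few hypothetical table names. It assumes a pattern must match the entire table name, which this reference does not state, and `classify` is a hypothetical helper.

```python
import re

# Name fragment shared by all tier patterns in this reference.
NAME = r"[a-z][a-z0-9]*(_[a-z][a-z0-9]*)*"

# Patterns equivalent to the tier_regexp attributes listed above.
TIERS = {
    "manual": f"(?P<manual>{NAME})",
    "lookup": f"(?P<lookup>#{NAME})",
    "imported": f"(?P<imported>_{NAME})",
    "computed": f"(?P<computed>__{NAME})",
}
master = "|".join(TIERS.values())
TIERS["part"] = f"(?P<master>{master}){{1,1}}__(?P<part>{NAME})"


def classify(table_name):
    """Return the tier whose pattern matches the whole table name, or None."""
    for tier, pattern in TIERS.items():
        if re.fullmatch(pattern, table_name):
            return tier
    return None


# Hypothetical table names, one per tier.
tiers = {name: classify(name) for name in
         ["session", "#species", "_scan", "__spikes", "__spikes__unit"]}
```

Under the full-match assumption each name falls into exactly one tier: no prefix for manual, `#` for lookup, `_` for imported, `__` for computed, and a master name followed by `__` for part tables.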
- class datajoint.Schema(schema_name=None, context=None, *, connection=None, create_schema=True, create_tables=True, add_objects=None)
Bases:
object
A schema object is a decorator for UserTable classes that binds them to their database. It also specifies the namespace context in which other UserTable classes are defined.
- activate(schema_name=None, *, connection=None, create_schema=None, create_tables=None, add_objects=None)
Associate database schema schema_name. If the schema does not exist, attempt to create it on the server.
- Parameters
schema_name – the database schema to associate. schema_name=None is used to assert that the schema has already been activated.
connection – Connection object. Defaults to datajoint.conn().
create_schema – If False, do not create the schema and raise an error if missing.
create_tables – If False, do not create tables and raise errors when attempting to access missing tables.
add_objects – a mapping with additional objects to make available to the context in which table classes are declared.
- property code
- drop(force=False)
Drop the associated schema if it exists
- property exists
- Returns
true if the associated schema exists on the server
- is_activated()
- property jobs
schema.jobs provides a view of the job reservation table for the schema
- Returns
jobs table
- list_tables()
Return a list of all tables in the schema except those whose names begin with ~, such as ~logs and ~job.
- Returns
A list of table names from the database schema.
- property log
- save(python_filename=None)
Generate the code for a module that recreates the schema. This method is in preparation for a future release and is not officially supported.
- Returns
a string containing the body of a complete Python module defining this schema.
- property size_on_disk
- Returns
size of the entire schema in bytes
- spawn_missing_classes(context=None)
Creates the appropriate python user relation classes from tables in the schema and places them in the context.
- Parameters
context – alternative context to place the missing classes into, e.g. locals()
- class datajoint.Table
Bases:
QueryExpression
Table is an abstract class that represents a table in the schema. It implements insert and delete methods and inherits query functionality. To make it a concrete class, override the abstract properties specifying the connection, table name, database, and definition.
- alter(prompt=True, context=None)
Alter the table definition from self.definition
- ancestors(as_objects=False)
- Parameters
as_objects – False - a list of table names; True - a list of table objects.
- Returns
list of tables ancestors in topological order.
- children(primary=None, as_objects=False, foreign_key_info=False)
- Parameters
primary – if None, then all children are returned. If True, then only foreign keys composed of primary key attributes are considered. If False, return foreign keys including at least one secondary attribute.
as_objects – if False, return table names. If True, return table objects.
foreign_key_info – if True, each element in result also includes foreign key info.
- Returns
list of children as table names or table objects with (optional) foreign key information.
- database = None
- declaration_context = None
- declare(context=None)
Declare the table in the schema based on self.definition.
- Parameters
context – the context for foreign key resolution. If None, foreign keys are not allowed.
- property definition
- delete(transaction=True, safemode=None, force_parts=False)
Deletes the contents of the table and its dependent tables, recursively.
- Parameters
transaction – if True, the entire delete becomes an atomic transaction. This is the default and recommended behavior. Set to False if this delete is nested within another transaction.
safemode – If True, prohibit nested transactions and prompt to confirm. Default is dj.config['safemode'].
force_parts – Delete from parts even when not deleting from their masters.
- Returns
number of deleted rows (excluding those from dependent tables)
- delete_quick(get_count=False)
Deletes the table without cascading and without user prompt. If this table has populated dependent tables, this will fail.
- descendants(as_objects=False)
- Parameters
as_objects – False - a list of table names; True - a list of table objects.
- Returns
list of tables descendants in topological order.
- describe(context=None, printout=True)
- Returns
the definition string for the relation using DataJoint DDL.
- drop()
Drop the table and all tables that reference it, recursively. User is prompted for confirmation if config['safemode'] is set to True.
- drop_quick()
Drops the table associated with this relation without cascading and without user prompt. If the table has any dependent table(s), this call will fail with an error.
- property external
- from_clause()
- Returns
the FROM clause of SQL SELECT statements.
- property full_table_name
- Returns
full table name in the schema
- get_select_fields(select_fields=None)
- Returns
the selected attributes from the SQL SELECT statement.
- insert(rows, replace=False, skip_duplicates=False, ignore_extra_fields=False, allow_direct_insert=None)
Insert a collection of rows.
- Parameters
rows – An iterable where an element is a numpy record, a dict-like object, a pandas.DataFrame, a sequence, or a query expression with the same heading as self.
replace – If True, replaces the existing tuple.
skip_duplicates – If True, silently skip duplicate inserts.
ignore_extra_fields – If False, fields that are not in the heading raise error.
allow_direct_insert – applies only in auto-populated tables. If False (default), inserts are allowed only from inside the make callback.
Example:
>>> relation.insert([
>>>     dict(subject_id=7, species="mouse", date_of_birth="2014-09-01"),
>>>     dict(subject_id=8, species="mouse", date_of_birth="2014-09-02")])
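How the replace, skip_duplicates, and ignore_extra_fields flags interact can be modeled with an in-memory table. The sketch below is a toy under stated assumptions, not DataJoint's insert: `ToyTable` is hypothetical, it supports a single primary key attribute, it assumes that a duplicate key raises an error by default, and it ignores transactions and type checking.

```python
class ToyTable:
    """In-memory stand-in keyed by a single primary key attribute."""

    def __init__(self, primary_key, attributes):
        self.primary_key = primary_key
        self.attributes = set(attributes)
        self.rows = {}

    def insert(self, rows, replace=False, skip_duplicates=False,
               ignore_extra_fields=False):
        for row in rows:
            extra = set(row) - self.attributes
            if extra and not ignore_extra_fields:
                raise KeyError(f"fields not in heading: {sorted(extra)}")
            # Keep only the fields that belong to the heading.
            clean = {k: v for k, v in row.items() if k in self.attributes}
            key = clean[self.primary_key]
            if key in self.rows and not replace:
                if skip_duplicates:
                    continue  # silently keep the existing row
                raise ValueError(f"duplicate entry for key {key!r}")
            self.rows[key] = clean


table = ToyTable("subject_id", ["subject_id", "species"])
table.insert([dict(subject_id=7, species="mouse")])

table.insert([dict(subject_id=7, species="rat")], skip_duplicates=True)
species_after_skip = table.rows[7]["species"]      # existing row is kept

table.insert([dict(subject_id=7, species="rat")], replace=True)
species_after_replace = table.rows[7]["species"]   # existing row is replaced
```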
- insert1(row, **kwargs)
Insert one data record into the table. For kwargs, see insert().
- Parameters
row – a numpy record, a dict-like object, or an ordered sequence to be inserted as one row.
- property is_declared
- Returns
True if the table is declared in the schema.
- parents(primary=None, as_objects=False, foreign_key_info=False)
- Parameters
primary – if None, then all parents are returned. If True, then only foreign keys composed of primary key attributes are considered. If False, return foreign keys including at least one secondary attribute.
as_objects – if False, return table names. If True, return table objects.
foreign_key_info – if True, each element in result also includes foreign key info.
- Returns
list of parents as table names or table objects with (optional) foreign key information.
- parts(as_objects=False)
return part tables either as entries in a dict with foreign key information or as a list of objects
- Parameters
as_objects – if False (default), the output is a dict describing the foreign keys. If True, return table objects.
- show_definition()
- property size_on_disk
- Returns
size of data and indices in bytes on the storage device
- property table_name
- update1(row)
update1 updates one existing entry in the table. Caution: In DataJoint, the primary modes of data manipulation are to insert and delete entire records, since referential integrity works on the level of records, not fields. Therefore, updates are reserved for corrective operations outside of the main workflow. Use UPDATE methods sparingly, with full awareness of potential violations of assumptions.
- Parameters
row – a dict containing the primary key values and the attributes to update. Setting an attribute value to None will reset it to the default value (if any).
The primary key attributes must always be provided.
Examples:
>>> table.update1({'id': 1, 'value': 3})  # update value in record with id=1
>>> table.update1({'id': 1, 'value': None})  # reset value to default
- class datajoint.U(*primary_key)
Bases:
object
dj.U objects are the universal sets representing all possible values of their attributes. dj.U objects cannot be queried on their own but are useful for forming some queries. dj.U('attr1', …, 'attrn') represents the universal set with the primary key attributes attr1 … attrn. The universal set is the set of all possible combinations of values of the attributes. Without any attributes, dj.U() represents the set with one element that has no attributes.
Restriction:
dj.U can be used to enumerate unique combinations of values of attributes from other expressions.
The following expression yields all unique combinations of contrast and brightness found in the stimulus set:
>>> dj.U('contrast', 'brightness') & stimulus
Aggregation:
In aggregation, dj.U is used for summary calculation over an entire set:
The following expression yields one element with one attribute n containing the total number of elements in query expression expr:
>>> dj.U().aggr(expr, n='count(*)')
The following expressions both yield one element containing the number n of distinct values of attribute attr in query expression expr.
>>> dj.U().aggr(expr, n='count(distinct attr)')
>>> dj.U().aggr(dj.U('attr').aggr(expr), n='count(*)')
The following expression yields one element and one attribute s containing the sum of values of attribute attr over entire result set of expression expr:
>>> dj.U().aggr(expr, s='sum(attr)')
The following expression yields the set of all unique combinations of attributes attr1, attr2 and the number of their occurrences in the result set of query expression expr.
>>> dj.U('attr1', 'attr2').aggr(expr, n='count(*)')
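What the restriction and aggregation expressions compute can be mirrored in plain Python over a list of rows. The sketch below is only a model of the results, with a hypothetical `stimulus` data set; it does not use DataJoint and says nothing about how the queries are executed.

```python
from collections import Counter

# Hypothetical result set of a query expression, as a list of dict rows.
stimulus = [
    {"stim_id": 1, "contrast": 0.5, "brightness": 10},
    {"stim_id": 2, "contrast": 0.5, "brightness": 10},
    {"stim_id": 3, "contrast": 1.0, "brightness": 20},
]

# Models dj.U('contrast', 'brightness') & stimulus: unique value combinations.
unique_combos = {(r["contrast"], r["brightness"]) for r in stimulus}

# Models dj.U().aggr(stimulus, n='count(*)'): one summary over the whole set.
total = len(stimulus)

# Models dj.U().aggr(stimulus, n='count(distinct contrast)').
n_distinct = len({r["contrast"] for r in stimulus})

# Models dj.U('contrast', 'brightness').aggr(stimulus, n='count(*)'):
# one count per unique combination.
counts = Counter((r["contrast"], r["brightness"]) for r in stimulus)
```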
Joins:
If expression expr has attributes 'attr1' and 'attr2', then expr * dj.U('attr1', 'attr2') yields the same result as expr but attr1 and attr2 are promoted to the primary key. This is useful for producing a join on non-primary key attributes. For example, if attr is in both expr1 and expr2 but not in their primary keys, then expr1 * expr2 will throw an error because in most cases, it does not make sense to join on non-primary key attributes and users must first rename attr in one of the operands. The expression dj.U('attr') * expr1 * expr2 overrides this constraint.
- aggr(group, **named_attributes)
Aggregation of the type U('attr1','attr2').aggr(group, computation="QueryExpression") has the primary key ('attr1','attr2') and performs aggregation computations for all matching elements of group.
- Parameters
group – The query expression to be aggregated.
named_attributes – computations of the form new_attribute="sql expression on attributes of group"
- Returns
The derived query expression
- aggregate(group, **named_attributes)
Aggregation of the type U('attr1','attr2').aggr(group, computation="QueryExpression") has the primary key ('attr1','attr2') and performs aggregation computations for all matching elements of group.
- Parameters
group – The query expression to be aggregated.
named_attributes – computations of the form new_attribute="sql expression on attributes of group"
- Returns
The derived query expression
- join(other, left=False)
Joining U with a query expression has the effect of promoting the attributes of U to the primary key of the other query expression.
- Parameters
other – the other query expression to join with.
left – ignored. dj.U always acts as if left=False
- Returns
a copy of the other query expression with the primary key extended.
- property primary_key
- class datajoint.VirtualModule(module_name, schema_name, *, create_schema=False, create_tables=False, connection=None, add_objects=None)
Bases:
module
A virtual module imitates a Python module representing a DataJoint schema from table definitions in the database. It declares the schema objects and a class for each table.
- datajoint.conn(host=None, user=None, password=None, *, init_fun=None, reset=False, use_tls=None)
Returns a persistent connection object to be shared by multiple modules. If the connection is not yet established or reset=True, a new connection is set up. If connection information is not provided, it is taken from config which takes the information from dj_local_conf.json. If the password is not specified in that file datajoint prompts for the password.
- Parameters
host – hostname
user – mysql user
password – mysql password
init_fun – initialization function
reset – whether the connection should be reset or not
use_tls – TLS encryption option. Valid options are: True (TLS required), False (no TLS), None (TLS preferred, default), dict (manually specify values per https://dev.mysql.com/doc/refman/5.7/en/connection-options.html#encrypted-connection-options).
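The "persistent connection shared by multiple modules" behavior amounts to caching one connection object and rebuilding it only on request. The sketch below models that pattern with hypothetical names (`FakeConnection`, `conn_sketch`, `_shared`) and default values chosen for illustration; it opens no network connection and is not the library's implementation.

```python
# Module-level cache holding the single shared connection.
_shared = {}


class FakeConnection:
    """Stand-in for a connection object; stores its settings only."""

    def __init__(self, host, user):
        self.host = host
        self.user = user


def conn_sketch(host="localhost", user="root", *, reset=False):
    """Return the cached connection, creating it on first use or when reset=True."""
    if reset or "connection" not in _shared:
        _shared["connection"] = FakeConnection(host, user)
    return _shared["connection"]


first = conn_sketch()
second = conn_sketch()            # same object as first
third = conn_sketch(reset=True)   # a new object replaces the cached one
```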
- datajoint.create_virtual_module
alias of
VirtualModule
- class datajoint.key
Bases:
object
object that allows requesting the primary key as an argument in expression.fetch(). The string "KEY" can be used instead of the class key.
- datajoint.key_hash(mapping)
32-byte hash of the mapping’s key values sorted by the key name. This is often used to convert a long primary key value into a shorter hash. For example, the JobTable in datajoint.jobs uses this function to hash the primary key of autopopulated tables.
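A hash of "the mapping's key values sorted by the key name" can be sketched as follows. This section does not name the hash algorithm; the sketch assumes MD5 because its hexadecimal digest is 32 characters long, and `key_hash_sketch` and the example keys are hypothetical.

```python
import hashlib


def key_hash_sketch(mapping):
    """Hash the values of a mapping, visiting them in order of key name."""
    hashed = hashlib.md5()  # assumed algorithm, not confirmed by this reference
    for _, value in sorted(mapping.items()):
        hashed.update(str(value).encode())
    return hashed.hexdigest()


# The same key written in two different orders.
h1 = key_hash_sketch({"subject_id": 7, "session": 2})
h2 = key_hash_sketch({"session": 2, "subject_id": 7})
```

Because the values are visited in key-name order, the result does not depend on the order in which the mapping was built.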
- datajoint.kill(restriction=None, connection=None, order_by=None)
view and kill database connections.
- Parameters
restriction – restriction to be applied to processlist
connection – a datajoint.Connection object. Default calls datajoint.conn()
order_by – order by a single attribute or a list of attributes. Defaults to 'id'.
Restrictions are specified as strings and can involve any of the attributes of information_schema.processlist: ID, USER, HOST, DB, COMMAND, TIME, STATE, INFO.
- Examples:
dj.kill('HOST LIKE "%compute%"') lists only connections from hosts containing "compute".
dj.kill('TIME > 600') lists only connections in their current state for more than 10 minutes.
- datajoint.list_schemas(connection=None)
- Parameters
connection – a dj.Connection object
- Returns
list of all accessible schemas on the server
- datajoint.set_password(new_password=None, connection=None, update_config=None)
Submodules:
- datajoint.admin module
- datajoint.attribute_adapter module
- datajoint.autopopulate module
- datajoint.blob module
- datajoint.condition module
- datajoint.connection module
- datajoint.declare module
- datajoint.dependencies module
- datajoint.diagram module
- datajoint.errors module
- datajoint.expression module
- datajoint.external module
- datajoint.fetch module
- datajoint.hash module
- datajoint.heading module
- datajoint.jobs module
- datajoint.migrate module
- datajoint.plugin module
- datajoint.preview module
- datajoint.s3 module
- datajoint.schemas module
- datajoint.settings module
- datajoint.table module
- datajoint.user_tables module
- datajoint.utils module
- datajoint.version module