Architecture

This section describes the main components of Superdesk and how they interact. To run Superdesk we use honcho, which defines a process for each component:

rest: gunicorn -c gunicorn_config.py wsgi
wamp: python3 -u ws.py
work: celery -A worker worker
beat: celery -A worker beat --pid=

REST API Server

The entry point is the Superdesk REST API. This is a Python application built on top of the Eve and Flask frameworks. Clients communicate with this API to authenticate, fetch and modify data, upload new content, etc.

There is an app factory which you can use to create apps for production or testing.

It can run under different WSGI servers; we use Gunicorn.

Notifications

There is also a websocket server to which both the API server and the celery workers can push notifications. Clients use that information to refresh views or otherwise keep in sync. In the background it uses a celery queue, and from there it sends everything to clients. There is no communication from client to server over this channel; all changes are made via the API server.
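The one-way flow described above can be sketched with an in-memory queue. This is an illustration only: `NotificationHub`, `push` and `flush` are hypothetical names, the standard-library queue stands in for the celery queue, and plain callables stand in for websocket connections.

```python
import json
import queue


class NotificationHub:
    """Hypothetical stand-in for the notification server's fan-out logic."""

    def __init__(self):
        self.pending = queue.Queue()  # stands in for the celery queue
        self.clients = []  # each client is a callable receiving a text frame

    def connect(self, client):
        self.clients.append(client)

    def push(self, event, **extra):
        # Called by producers (API server, workers), never by clients.
        self.pending.put({"event": event, "extra": extra})

    def flush(self):
        # Drain the queue and send each message to every connected client.
        sent = 0
        while True:
            try:
                message = self.pending.get_nowait()
            except queue.Empty:
                return sent
            frame = json.dumps(message, sort_keys=True)
            for client in self.clients:
                client(frame)
            sent += 1


hub = NotificationHub()
inbox_a, inbox_b = [], []
hub.connect(inbox_a.append)
hub.connect(inbox_b.append)
hub.push("item:updated", item="abc")
delivered = hub.flush()
```

Clients only receive frames; there is no method through which a client could send a change back, mirroring the rule that all changes go through the API server.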

There is also a factory to create the notification server:

create_server(config)

Create the websocket server and run it until it receives Ctrl+C or SIGTERM.

Parameters:

config – config dictionary

Celery Workers

Tasks that involve communication with external services (ingest update, publishing), manipulate binary files (image cropping, file metadata extraction) or happen periodically (content expiry) are executed using celery.

It uses the same app factory as the API server.

Data Layer

In short, the main data storage is MongoDB, and content items are also indexed using Elasticsearch. This logic is implemented via a custom Eve data layer, the Superdesk service layer and a data backend.

class SuperdeskDataLayer(app)

Superdesk Data Layer.

Implements the Eve data layer interface and is used to make Eve work with the Superdesk service layer. It handles app initialization and later forwards Eve calls to the respective service.
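The layering can be pictured with a small in-memory sketch. None of these classes are the real ones: `DataLayer`, `Service` and `InMemoryBackend` are hypothetical stand-ins, and the `indexed` flag is an assumption used here to show that only some resources are also written to the search index.

```python
class InMemoryBackend:
    """Stand-in for a backend writing to a primary store and a search index."""

    def __init__(self):
        self.primary = {}  # plays the role of MongoDB
        self.index = {}  # plays the role of the search index

    def create(self, resource, docs, indexed=False):
        ids = []
        for doc in docs:
            self.primary.setdefault(resource, {})[doc["_id"]] = dict(doc)
            if indexed:
                self.index.setdefault(resource, {})[doc["_id"]] = dict(doc)
            ids.append(doc["_id"])
        return ids


class Service:
    """Stand-in for a per-resource service that delegates to the backend."""

    def __init__(self, resource, backend, indexed=False):
        self.resource = resource
        self.backend = backend
        self.indexed = indexed

    def create(self, docs):
        return self.backend.create(self.resource, docs, indexed=self.indexed)


class DataLayer:
    """Stand-in for the data layer: look up the service and forward the call."""

    def __init__(self, services):
        self.services = services

    def insert(self, resource, docs):
        return self.services[resource].create(docs)


backend = InMemoryBackend()
layer = DataLayer(
    {
        "archive": Service("archive", backend, indexed=True),
        "users": Service("users", backend),
    }
)
archive_ids = layer.insert("archive", [{"_id": "a1", "headline": "Hello"}])
user_ids = layer.insert("users", [{"_id": "u1", "name": "Ann"}])
```

The data layer holds no storage logic of its own in this sketch; it only routes by resource name, which is the forwarding role described above.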

find(resource, req, lookup, perform_count=True)

Retrieves a set of documents (rows), matching the current request. Consumed when a request hits a collection/document endpoint (/people/).

Parameters:
  • resource – resource being accessed. You should then use the datasource helper function to retrieve both the db collection/table and base query (filter), if any.

  • req – an instance of eve.utils.ParsedRequest. This contains all the constraints that must be fulfilled in order to satisfy the original request (where and sort parts, paging, etc). Be warned that where and sort expressions will need proper parsing, according to the syntax that you want to support with your driver. For example eve.io.Mongo supports both Python and Mongo-like query syntaxes.

  • lookup – sub-resource lookup from the endpoint url (named sub_resource_lookup in the Eve interface).

  • perform_count – whether a document count should be performed and returned to the client.

Changed in version 0.3: Support for sub-resources.

find_list_of_ids(resource, ids, client_projection=None)

Retrieves a list of documents based on a list of primary keys. The primary key is the field defined in ID_FIELD. This is a separate function to allow us to use per-database optimizations for this type of query.

Parameters:
  • resource – resource name.

  • ids – a list of ids corresponding to the documents to retrieve

  • client_projection – a specific projection to use

Returns:

a list of documents matching the ids in ids from the collection specified in resource

New in version 0.1.0.

find_one(resource, req, check_auth_value=True, force_auth_field_projection=False, **lookup)

Retrieves a single document/record. Consumed when a request hits an item endpoint (/people/id/).

Parameters:
  • resource – resource being accessed. You should then use the datasource helper function to retrieve both the db collection/table and base query (filter), if any.

  • req – an instance of eve.utils.ParsedRequest. This contains all the constraints that must be fulfilled in order to satisfy the original request (where and sort parts, paging, etc). As we are going to only look for one document here, the only req attribute that you want to process here is req.projection.

  • check_auth_value – a boolean flag indicating if the find operation should consider user-restricted resource access. Defaults to True.

  • force_auth_field_projection – a boolean flag indicating if the find operation should always include the user-restricted resource access field (if configured). Defaults to False.

  • mongo_options – options to pass to PyMongo. e.g. read_preferences of the initial get.

  • **lookup

    the lookup fields. This will most likely be a record id or, if alternate lookup is supported by the API, the corresponding query.

Changed in version 0.4: Added the ‘req’ argument.

find_one_raw(resource, _id)

Retrieves a single, raw document. No projections or datasource filters are being applied here. Just looking up the document using the same lookup.

Parameters:
  • resource – resource name.

  • lookup (**) – lookup query.

New in version 0.4.

get_elastic_resources()

Get set of available elastic resources.

init_app(app)

This is where you want to initialize the db driver so it will be alive through the whole instance lifespan.

init_elastic(app, raise_on_mapping_error=False)

Init elastic index.

It will create the index and put the mapping. It should run only once, so locks are in place; this means mongo must already be set up before running it.

insert(resource, docs, **kwargs)

Inserts a document into a resource collection/table.

Parameters:
  • resource – resource being accessed. You should then use the datasource helper function to retrieve the actual datasource name.

  • doc_or_docs – json document or list of json documents to be added to the database.

Changed in version 0.0.6: ‘document’ param renamed to ‘doc_or_docs’, making support for bulk inserts apparent.

is_empty(resource)

Returns True if the collection is empty; False otherwise. While a user could rely on self.find() method to achieve the same result, this method can probably take advantage of specific datastore features to provide better performance.

Don’t forget, a ‘resource’ could have a pre-defined filter. If that is the case, it will have to be taken into consideration when performing the is_empty() check (see eve.io.mongo.mongo.py implementation).

Parameters:

resource – resource being accessed. You should then use the datasource helper function to retrieve the actual datasource name.

remove(resource, lookup=None)

Removes a document/row or an entire set of documents/rows from a database collection/table.

Parameters:
  • resource – resource being accessed. You should then use the datasource helper function to retrieve the actual datasource name.

  • lookup – a dict with the query that documents must match in order to qualify for deletion. For single document deletes, this is usually the unique id of the document to be removed.

Changed in version 0.3: ‘_id’ arg removed; replaced with ‘lookup’.

replace(resource, id_, document, original)

Replaces a collection/table document/row.

Parameters:
  • resource – resource being accessed. You should then use the datasource helper function to retrieve the actual datasource name.

  • id – the unique id of the document.

  • document – the new json document

  • original – definition of the json document that should be updated.

Raises:

OriginalChangedError – raised if the database layer notices a change from the supplied original parameter.

New in version 0.1.0.

update(resource, id_, updates, original)

Updates a collection/table document/row.

Parameters:
  • resource – resource being accessed. You should then use the datasource helper function to retrieve the actual datasource name.

  • id – the unique id of the document.

  • updates – json updates to be performed on the database document (or row).

  • original – definition of the json document that should be updated.

Raises:

OriginalChangedError – raised if the database layer notices a change from the supplied original parameter.
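The role of the original parameter and of OriginalChangedError can be sketched as an optimistic concurrency check. This is an assumption-laden illustration, not the real implementation: the exception class below is a local stand-in, the comparison by `_etag` is assumed, and `compute_etag` is a hypothetical helper.

```python
import hashlib
import json


class OriginalChangedError(Exception):
    """Local stand-in for the error named in the documentation above."""


def compute_etag(doc):
    # Hypothetical helper: hash the document content, ignoring the old etag.
    payload = {key: value for key, value in doc.items() if key != "_etag"}
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha1(encoded).hexdigest()


def update(collection, id_, updates, original):
    stored = collection[id_]
    # Assumed check: the caller's copy must match what is stored right now.
    if stored.get("_etag") != original.get("_etag"):
        raise OriginalChangedError(id_)
    updated = {**stored, **updates}
    updated["_etag"] = compute_etag(updated)
    collection[id_] = updated
    return updated


collection = {}
doc = {"_id": 1, "headline": "first"}
doc["_etag"] = compute_etag(doc)
collection[1] = doc

original = dict(doc)
update(collection, 1, {"headline": "second"}, original)
try:
    # The same (now stale) original is supplied again.
    update(collection, 1, {"headline": "third"}, original)
    conflict = False
except OriginalChangedError:
    conflict = True
```

The second call fails because the stored document changed after `original` was read, so the later writer cannot silently overwrite the earlier change.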

class BaseService(datasource: Optional[str] = None, backend=None)

Base service for all endpoints, defines the basic implementation for CRUD datalayer functionality.

delete_from_mongo(lookup: Dict[str, Any])

Delete items from mongo only

New in version 2.4.0.

Warning

on_delete and on_deleted are NOT called with this action

Parameters:

lookup (dict) – User mongo query syntax

Raises:

SuperdeskApiError.forbiddenError – if search is enabled for this resource

find(where, **kwargs)

Find items in service collection using mongo query.

Parameters:

where (dict) –

get_all_batch(size=500, max_iterations=10000, lookup=None)

Gets all items using multiple queries.

When processing a big collection and doing something time consuming with each item, you might get a mongo cursor timeout. This method should avoid that by fetching size items into memory at a time and closing the cursor in between.
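A minimal sketch of the batching idea, assuming items can be paged by sorting on `_id`. The `find_page` helper and its `after_id` and `limit` parameters are hypothetical; the real method talks to mongo and may page differently.

```python
def get_all_batch(find_page, size=500, max_iterations=10000):
    """Yield every item using many short queries instead of one long cursor."""
    last_id = None
    for _ in range(max_iterations):
        page = find_page(after_id=last_id, limit=size)
        if not page:
            return
        yield from page
        # Remember where this page ended so the next query starts after it.
        last_id = page[-1]["_id"]


ITEMS = [{"_id": i} for i in range(1, 8)]
QUERIES = []


def find_page(after_id, limit):
    # Hypothetical query helper: items sorted by _id, strictly after after_id.
    QUERIES.append(after_id)
    matching = [item for item in ITEMS if after_id is None or item["_id"] > after_id]
    return matching[:limit]


fetched = [item["_id"] for item in get_all_batch(find_page, size=3)]
```

Each query is fully consumed before the caller does its slow work, so no cursor stays open while items are being processed. The `max_iterations` bound keeps the loop from running forever if pages never come back empty.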

is_authorized(**kwargs)

Subclass should override if the resource handled by the service has intrinsic privileges.

Parameters:

kwargs – should have properties which help in authorizing the request

Returns:

False if unauthorized and True if authorized
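A sketch of how a subclass might override this hook. The `BaseService` below is a local stand-in, not the real class; its permissive default is an assumption, and `DeskNotesService` with its `user` and `desk_id` keyword arguments is entirely hypothetical.

```python
class BaseService:
    """Local stand-in for the base service, reduced to the hook in question."""

    def is_authorized(self, **kwargs):
        # Assumed default for this sketch: no intrinsic privileges to check.
        return True


class DeskNotesService(BaseService):
    """Hypothetical service whose resource belongs to a desk."""

    def is_authorized(self, **kwargs):
        # Hypothetical rule: the user must be a member of the requested desk.
        user = kwargs.get("user") or {}
        desk_id = kwargs.get("desk_id")
        return desk_id in user.get("desks", [])


service = DeskNotesService()
member = {"_id": "u1", "desks": ["sports"]}
allowed = service.is_authorized(user=member, desk_id="sports")
denied = service.is_authorized(user=member, desk_id="politics")
```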

Remove item from search.

Parameters:

item (dict) – item

search(source)

Search using search backend.

Parameters:

source – query source param

class EveBackend

Superdesk data backend, handles mongodb/elastic data storage.

create(endpoint_name, docs, **kwargs)

Insert documents into given collection.

Parameters:
  • endpoint_name – api resource name

  • docs – list of docs to be inserted

create_in_mongo(endpoint_name, docs, **kwargs)

Create items in mongo.

Parameters:
  • endpoint_name – resource name

  • docs – list of docs to create

Create items in elastic.

Parameters:
  • endpoint_name – resource name

  • docs – list of docs

delete(endpoint_name, lookup)

Delete method to delete by using mongo query syntax.

Parameters:
  • endpoint_name – Name of the endpoint

  • lookup – User mongo query syntax. example 1. {'_id':123}, 2. {'item_id': {'$in': [123, 234]}}

Returns:

Returns list of ids which were removed.
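The two lookup examples above can be tried against a tiny in-memory matcher. This is only an illustration of the query shapes: `matches` supports plain equality and `$in`, nothing else, and the list-based collection is a stand-in for a mongo collection.

```python
def matches(doc, lookup):
    """Check a document against a lookup with equality and $in conditions."""
    for field, condition in lookup.items():
        value = doc.get(field)
        if isinstance(condition, dict) and "$in" in condition:
            if value not in condition["$in"]:
                return False
        elif value != condition:
            return False
    return True


def delete(collection, lookup):
    """Remove matching documents in place and return the removed ids."""
    removed = [doc["_id"] for doc in collection if matches(doc, lookup)]
    collection[:] = [doc for doc in collection if doc["_id"] not in removed]
    return removed


docs = [
    {"_id": 1, "item_id": 123},
    {"_id": 2, "item_id": 234},
    {"_id": 3, "item_id": 345},
]
removed_by_in = delete(docs, {"item_id": {"$in": [123, 234]}})
removed_by_id = delete(docs, {"_id": 3})
```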

delete_docs(endpoint_name, docs)

Delete using list of documents.

delete_from_mongo(endpoint_name: str, lookup: Dict[str, Any])

Delete from mongo using a lookup without searching or checking

New in version 2.4.0.

Parameters:
  • endpoint_name (str) – The name of the resource to delete documents for

  • lookup (dict) – The MongoDB query to use for deleting documents

Raises:

SuperdeskApiError.forbiddenError – if search is enabled for this resource

delete_ids_from_mongo(endpoint_name, ids)

Delete the passed ids from mongo without searching or checking

Parameters:

ids

Returns:

find(endpoint_name, where, max_results=0, sort=None)

Find items for given endpoint using mongo query in python dict object.

It handles request creation here so no need to do this in service.

Parameters:
  • endpoint_name (string) –

  • where (dict) –

  • max_results (int) –

find_and_modify(endpoint_name, **kwargs)

Find and modify in mongo.

Parameters:
  • endpoint_name – resource name

  • kwargs – kwargs for pymongo find_and_modify

find_one(endpoint_name, req, **lookup)

Find single item.

Parameters:
  • endpoint_name – resource name

  • req – parsed request

  • lookup – additional filter

get(endpoint_name, req, lookup, **kwargs)

Get list of items.

Parameters:
  • endpoint_name – resource name

  • req – parsed request

  • lookup – additional filter

get_from_mongo(endpoint_name, req, lookup, perform_count=False)

Get list of items from mongo.

This uses mongo regardless of whether elastic is configured.

Parameters:
  • endpoint_name – resource name

  • req – parsed request

  • lookup – additional filter

notify_on_change(endpoint_name)

Test if we should push notifications for given resource.

Remove document from search backend.

Parameters:
  • endpoint_name –

  • doc (dict) – Document to delete

replace(endpoint_name, id, document, original)

Replace an item.

Parameters:
  • endpoint_name – resource name

  • id – item id

  • document – next version of item

  • original – current version of document

replace_in_mongo(endpoint_name, id, document, original)

Replace item in mongo.

Parameters:
  • endpoint_name – resource name

  • id – item id

  • document – next version of item

  • original – current version of item

Replace item in elastic.

Parameters:
  • endpoint_name – resource name

  • id – item id

  • document – next version of item

  • original – current version of item

search(endpoint_name, source)

Search for items using search backend

Parameters:
  • endpoint_name (string) –

  • source (dict) –

set_default_dates(doc)

Helper to populate _created and _updated timestamps.
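A minimal sketch of what such a helper could look like, assuming it fills in the timestamps only when they are missing. The optional `now` argument is a hypothetical addition that makes the sketch easy to test; the real helper takes only the document.

```python
from datetime import datetime, timezone


def set_default_dates(doc, now=None):
    """Populate _created and _updated unless the document already has them."""
    now = now or datetime.now(timezone.utc)
    doc.setdefault("_created", now)
    doc.setdefault("_updated", now)
    return doc


stamp = datetime(2024, 1, 1, tzinfo=timezone.utc)
fresh = set_default_dates({"headline": "New"}, now=stamp)
existing = set_default_dates({"_created": "earlier"}, now=stamp)
```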

system_update(endpoint_name, id, updates, original, change_request=False, push_notification=True)

Only update what is provided, without affecting etag.

This is useful when you want to make some changes without affecting users.

Parameters:
  • endpoint_name – api resource name

  • id – document id

  • updates – changes made to document

  • original – original document

  • change_request – if True it will allow you to use other mongo operations than $set

  • push_notification – if False it won’t send resource: notifications for update
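The difference from a regular update can be sketched as follows. Both functions are simplified stand-ins operating on a plain dict: the regular update refreshes `_updated` and `_etag`, as described for update_in_mongo, while the system update leaves the etag alone. Dropping an `_etag` key from the updates is an assumption of this sketch, not documented behaviour.

```python
import uuid
from datetime import datetime, timezone


def update(doc, updates):
    """Regular update: apply changes, then refresh _updated and _etag."""
    doc.update(updates)
    doc["_updated"] = datetime.now(timezone.utc)
    doc["_etag"] = uuid.uuid4().hex
    return doc


def system_update(doc, updates):
    """System update: apply only the provided fields and keep the etag."""
    changes = {key: value for key, value in updates.items() if key != "_etag"}
    doc.update(changes)
    return doc


item = {"_id": 1, "headline": "Draft", "_etag": "abc"}
system_update(item, {"lock_user": None})
etag_after_system_update = item["_etag"]
update(item, {"headline": "Final"})
etag_after_update = item["_etag"]
```

A client holding the etag from before the system update can still save its own changes, which is why this kind of update does not affect users.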

update(endpoint_name, id, updates, original)

Update document with given id.

Parameters:
  • endpoint_name – api resource name

  • id – document id

  • updates – changes made to document

  • original – original document

update_in_mongo(endpoint_name, id, updates, original)

Update item in mongo.

Modifies _updated timestamp and _etag.

Parameters:
  • endpoint_name – resource name

  • id – item id

  • updates – updates to item to be saved

  • original – current version of the item

Media Storage

By default, uploaded/ingested files are stored in MongoDB GridFS.

There is also an Amazon S3 implementation, which is used when Amazon is configured via settings.
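The selection between the two storages can be sketched as a settings check. Everything here is illustrative: the class names, the `get_media_storage` function and the `AMAZON_ACCESS_KEY_ID` setting name are assumptions, and neither class talks to a real service.

```python
class GridFSMediaStorage:
    """Stand-in for the default storage backed by GridFS."""

    name = "gridfs"


class AmazonMediaStorage:
    """Stand-in for the storage backed by Amazon S3."""

    name = "s3"

    def __init__(self, settings):
        self.settings = settings


def get_media_storage(settings):
    # Hypothetical rule: Amazon counts as configured when a key id is set.
    if settings.get("AMAZON_ACCESS_KEY_ID"):
        return AmazonMediaStorage(settings)
    return GridFSMediaStorage()


default_storage = get_media_storage({})
amazon_storage = get_media_storage({"AMAZON_ACCESS_KEY_ID": "example-key"})
```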