Architecture¶
This section describes the main components of Superdesk and how they interact. To run Superdesk we use honcho, which defines a process for each component:
rest: gunicorn -c gunicorn_config.py wsgi
wamp: python3 -u ws.py
work: celery -A worker worker
beat: celery -A worker beat --pid=
REST API Server¶
The entry point is the Superdesk REST API, a Python application built on top of the Eve and Flask frameworks. Clients communicate with this API to authenticate, fetch and modify data, upload new content, etc.
There is an app factory which you can use to create apps for production/testing:
- get_app(config=None, media_storage=None, config_object=None, init_elastic=None)[source]¶
App factory.
- Parameters:
config – configuration that can override config from default_settings.py
media_storage – media storage class to use
config_object – config object to load (can be module name, module or an object)
init_elastic – obsolete config, kept for backward compatibility
- Returns:
a new SuperdeskEve app instance
The app can run under different WSGI servers; we use Gunicorn.
Notifications¶
There is also a websocket server to which both the API server and celery workers can push notifications. Clients use these notifications to refresh views or otherwise keep in sync. In the background the server uses a celery queue and forwards everything from there to clients. There is no communication from client to server; all changes are made via the API server.
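The one-way flow described above can be sketched with the standard library only. This is a minimal illustration, not Superdesk's implementation: the in-process `queue.Queue` stands in for the celery queue, and `FakeClient`, `broadcast_pending` and the event names are hypothetical.

```python
import json
import queue

# Hypothetical stand-in for the celery queue that sits between the
# producers (API server, workers) and the websocket server.
notifications = queue.Queue()


def push_notification(name, **extra):
    """Producer side: called by the API server or a worker after a change."""
    notifications.put(json.dumps({"event": name, "extra": extra}))


class FakeClient:
    """Stand-in for a connected websocket client; it only receives."""

    def __init__(self):
        self.received = []

    def send(self, message):
        self.received.append(json.loads(message))


def broadcast_pending(clients):
    """Server side: drain the queue and forward each message to every client."""
    sent = 0
    while True:
        try:
            message = notifications.get_nowait()
        except queue.Empty:
            return sent
        for client in clients:
            client.send(message)
        sent += 1


clients = [FakeClient(), FakeClient()]
push_notification("content:update", item="abc")  # hypothetical event name
push_notification("ingest:update")               # hypothetical event name
count = broadcast_pending(clients)
```

Note that the clients never write to the queue. In this sketch, as in the description above, a client that wants to change something would call the REST API instead.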
There is also a factory to create the notification server.
Celery Workers¶
Tasks that communicate with external services (ingest update, publishing), manipulate binary files (image cropping, file metadata extraction) or run periodically (content expiry) are executed using celery.
Celery workers use the same app factory as the API server.
Data Layer¶
In short, the main data storage is MongoDB, and content items are also indexed using Elastic. This logic is implemented via a custom eve data layer, the superdesk service layer and the data backend.
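As a rough mental model of that layering, the sketch below shows a service delegating to a backend that always writes to the primary store and additionally indexes resources that have search enabled. All class names here (`SketchBackend`, `SketchService`) and the in-memory dictionaries are hypothetical stand-ins; the real classes are documented below and behave in more detail than this.

```python
from collections import defaultdict


class SketchBackend:
    """Hypothetical backend: primary store plus optional search index."""

    def __init__(self, search_resources):
        self.mongo = defaultdict(dict)    # resource -> {_id: doc}
        self.elastic = defaultdict(dict)  # resource -> {_id: doc}
        self.search_resources = set(search_resources)

    def create(self, endpoint_name, docs):
        ids = []
        for doc in docs:
            # The primary store always receives the document.
            self.mongo[endpoint_name][doc["_id"]] = dict(doc)
            # Only resources with search enabled are indexed as well.
            if endpoint_name in self.search_resources:
                self.elastic[endpoint_name][doc["_id"]] = dict(doc)
            ids.append(doc["_id"])
        return ids


class SketchService:
    """Hypothetical service: knows its datasource, delegates to the backend."""

    def __init__(self, datasource, backend):
        self.datasource = datasource
        self.backend = backend

    def create(self, docs):
        return self.backend.create(self.datasource, docs)


backend = SketchBackend(search_resources=["archive"])
archive = SketchService("archive", backend)
users = SketchService("users", backend)

archive_ids = archive.create([{"_id": "a1", "headline": "Hello"}])
user_ids = users.create([{"_id": "u1", "username": "admin"}])
```

Here `archive` documents end up in both stores, while `users` documents exist only in the primary store.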
- class SuperdeskDataLayer(app)[source]¶
Superdesk Data Layer.
Implements the eve data layer interface and is used to make eve work with the superdesk service layer. It handles app initialization and later forwards eve calls to the respective service.
- find(resource, req, lookup, perform_count=True)¶
Retrieves a set of documents (rows), matching the current request. Consumed when a request hits a collection/document endpoint (/people/).
- Parameters:
resource – resource being accessed. You should then use the datasource helper function to retrieve both the db collection/table and base query (filter), if any.
req – an instance of eve.utils.ParsedRequest. This contains all the constraints that must be fulfilled in order to satisfy the original request (where and sort parts, paging, etc). Be warned that where and sort expressions will need proper parsing, according to the syntax that you want to support with your driver. For example eve.io.Mongo supports both Python and Mongo-like query syntaxes.
sub_resource_lookup – sub-resource lookup from the endpoint url.
perform_count – whether a document count should be performed and returned to the client.
Changed in version 0.3: Support for sub-resources.
- find_list_of_ids(resource, ids, client_projection=None)¶
Retrieves a list of documents based on a list of primary keys. The primary key is the field defined in ID_FIELD. This is a separate function to allow us to use per-database optimizations for this type of query.
- Parameters:
resource – resource name.
ids – a list of ids corresponding to the documents to retrieve
client_projection – a specific projection to use
- Returns:
a list of documents matching the ids in ids from the collection specified in resource
Added in version 0.1.0.
- find_one(resource, req, check_auth_value=True, force_auth_field_projection=False, **lookup)¶
Retrieves a single document/record. Consumed when a request hits an item endpoint (/people/id/).
- Parameters:
resource – resource being accessed. You should then use the datasource helper function to retrieve both the db collection/table and base query (filter), if any.
req – an instance of eve.utils.ParsedRequest. This contains all the constraints that must be fulfilled in order to satisfy the original request (where and sort parts, paging, etc). As we are going to only look for one document here, the only req attribute that you want to process here is req.projection.
check_auth_value – a boolean flag indicating if the find operation should consider user-restricted resource access. Defaults to True.
force_auth_field_projection – a boolean flag indicating if the find operation should always include the user-restricted resource access field (if configured). Defaults to False.
mongo_options – options to pass to PyMongo. e.g. read_preferences of the initial get.
**lookup – the lookup fields. This will most likely be a record id or, if alternate lookup is supported by the API, the corresponding query.
Changed in version 0.4: Added the ‘req’ argument.
- find_one_raw(resource, _id)¶
Retrieves a single, raw document. No projections or datasource filters are being applied here. Just looking up the document using the same lookup.
- Parameters:
resource – resource name.
lookup (**) – lookup query.
Added in version 0.4.
- get_elastic_resources()¶
Get set of available elastic resources.
- init_app(app)¶
This is where you want to initialize the db driver so it will be alive through the whole instance lifespan.
- async init_elastic(app, raise_on_mapping_error=False)¶
Init elastic index.
It creates the index and puts the mapping. It should run only once, so locks are in place; mongo must therefore already be set up before running this.
- insert(resource, docs, **kwargs)¶
Inserts a document into a resource collection/table.
- Parameters:
resource – resource being accessed. You should then use the datasource helper function to retrieve the actual datasource name.
doc_or_docs – json document or list of json documents to be added to the database.
Changed in version 0.0.6: ‘document’ param renamed to ‘doc_or_docs’, making support for bulk inserts apparent.
- is_empty(resource)¶
Returns True if the collection is empty; False otherwise. While a user could rely on self.find() method to achieve the same result, this method can probably take advantage of specific datastore features to provide better performance.
Don’t forget, a ‘resource’ could have a pre-defined filter. If that is the case, it will have to be taken into consideration when performing the is_empty() check (see eve.io.mongo.mongo.py implementation).
- Parameters:
resource – resource being accessed. You should then use the datasource helper function to retrieve the actual datasource name.
- remove(resource, lookup=None)¶
Removes a document/row or an entire set of documents/rows from a database collection/table.
- Parameters:
resource – resource being accessed. You should then use the datasource helper function to retrieve the actual datasource name.
lookup – a dict with the query that documents must match in order to qualify for deletion. For single document deletes, this is usually the unique id of the document to be removed.
Changed in version 0.3: ‘_id’ arg removed; replaced with ‘lookup’.
- replace(resource, id_, document, original)¶
Replaces a collection/table document/row.
- Parameters:
resource – resource being accessed. You should then use the datasource helper function to retrieve the actual datasource name.
id – the unique id of the document.
document – the new json document
original – definition of the json document that should be updated.
- Raises:
OriginalChangedError – raised if the database layer notices a change from the supplied original parameter.
Added in version 0.1.0.
- update(resource, id_, updates, original)¶
Updates a collection/table document/row.
- Parameters:
resource – resource being accessed. You should then use the datasource helper function to retrieve the actual datasource name.
id – the unique id of the document.
updates – json updates to be performed on the database document (or row).
original – definition of the json document that should be updated.
- Raises:
OriginalChangedError – raised if the database layer notices a change from the supplied original parameter.
- class BaseService(datasource: str | None = None, backend=None)[source]¶
Base service for all endpoints, defines the basic implementation for CRUD datalayer functionality.
- delete_from_mongo(lookup: Dict[str, Any])¶
Delete items from mongo only
Added in version 2.4.0.
Warning
on_delete and on_deleted are NOT called with this action
- Parameters:
lookup (dict) – User mongo query syntax
- Raises:
SuperdeskApiError.forbiddenError – if search is enabled for this resource
- find(where, **kwargs)¶
Find items in service collection using mongo query.
- Parameters:
where (dict)
- get_all_batch(size=500, max_iterations=10000, lookup=None)¶
Gets all items using multiple queries.
When processing a big collection and doing something time-consuming with each item, you might hit a mongo cursor timeout. This method avoids that by fetching size items into memory at a time and closing the cursor in between.
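The batching idea can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: it assumes items are paged by ascending `_id`, and `fake_find` is a hypothetical stand-in for a short-lived mongo query. A `lookup` that itself filters on `_id` is not handled by this sketch.

```python
def get_all_batch(find, size=500, max_iterations=10000, lookup=None):
    """Yield every matching item, issuing one short query per batch."""
    last_id = None
    for _ in range(max_iterations):
        query = dict(lookup or {})
        if last_id is not None:
            # Resume right after the last item of the previous batch.
            query["_id"] = {"$gt": last_id}
        # The whole batch is loaded into memory, so no cursor stays open
        # while the caller does slow work on each item.
        batch = list(find(query, size))
        if not batch:
            return
        yield from batch
        last_id = batch[-1]["_id"]


# Hypothetical in-memory collection, already sorted by _id.
ITEMS = [{"_id": i, "type": "text" if i % 2 else "picture"} for i in range(1, 8)]
queries = []


def fake_find(query, limit):
    """Stand-in for a sorted, limited mongo query."""
    queries.append(query)
    min_id = query.get("_id", {}).get("$gt", 0)
    matched = [
        doc for doc in ITEMS
        if doc["_id"] > min_id
        and all(doc[key] == value for key, value in query.items() if key != "_id")
    ]
    return matched[:limit]


all_ids = [doc["_id"] for doc in get_all_batch(fake_find, size=3)]
text_ids = [doc["_id"] for doc in get_all_batch(fake_find, size=2, lookup={"type": "text"})]
```

The `max_iterations` cap bounds the loop even if the collection keeps growing while it is being read.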
- async is_authorized(**kwargs)¶
Subclass should override if the resource handled by the service has intrinsic privileges.
- Parameters:
kwargs – should have properties which help in authorizing the request
- Returns:
False if unauthorized and True if authorized
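A subclass override might look like the sketch below. The base class here is a hypothetical stand-in (its permissive default is an assumption, not confirmed behaviour of BaseService), and the `user_id` and `owner_id` keyword arguments are invented for illustration.

```python
import asyncio


class BaseServiceSketch:
    """Hypothetical stand-in for the base service."""

    async def is_authorized(self, **kwargs):
        # Assumed default for this sketch: no intrinsic privileges to check.
        return True


class OwnDocumentsService(BaseServiceSketch):
    """Hypothetical resource where users may only touch their own items."""

    async def is_authorized(self, **kwargs):
        user_id = kwargs.get("user_id")
        return user_id is not None and user_id == kwargs.get("owner_id")


service = OwnDocumentsService()
allowed = asyncio.run(service.is_authorized(user_id="u1", owner_id="u1"))
denied = asyncio.run(service.is_authorized(user_id="u2", owner_id="u1"))
anonymous = asyncio.run(service.is_authorized(owner_id="u1"))
```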
- remove_from_search(item)¶
Remove item from search.
- Parameters:
item (dict) – item
- search(source)¶
Search using search backend.
- Parameters:
source – query source param
- class EveBackend[source]¶
Superdesk data backend, handles mongodb/elastic data storage.
- create(endpoint_name, docs, **kwargs)¶
Insert documents into given collection.
- Parameters:
endpoint_name – api resource name
docs – list of docs to be inserted
- async create_async(endpoint_name, docs, **kwargs)¶
Insert documents into given collection.
- Parameters:
endpoint_name – api resource name
docs – list of docs to be inserted
- create_in_mongo(endpoint_name, docs, **kwargs)¶
Create items in mongo.
- Parameters:
endpoint_name – resource name
docs – list of docs to create
- async create_in_mongo_async(endpoint_name, docs, **kwargs)¶
Create items in mongo.
- Parameters:
endpoint_name – resource name
docs – list of docs to create
- create_in_search(endpoint_name, docs, **kwargs)¶
Create items in elastic.
- Parameters:
endpoint_name – resource name
docs – list of docs
- async create_in_search_async(endpoint_name, docs, **kwargs)¶
Create items in elastic.
- Parameters:
endpoint_name – resource name
docs – list of docs
- delete(endpoint_name, lookup)¶
Delete method to delete by using mongo query syntax.
- Parameters:
endpoint_name – Name of the endpoint
lookup – User mongo query syntax. Examples: 1. {'_id': 123}, 2. {'item_id': {'$in': [123, 234]}}
- Returns:
Returns list of ids which were removed.
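To make the two lookup shapes concrete, here is a small in-memory sketch that supports only plain equality and `$in`, and returns the removed ids as described above. The `matches` helper and the list-based collection are hypothetical; the real method runs the lookup against MongoDB.

```python
def matches(doc, lookup):
    """Evaluate a tiny subset of mongo query syntax: equality and $in."""
    for field, condition in lookup.items():
        value = doc.get(field)
        if isinstance(condition, dict) and "$in" in condition:
            if value not in condition["$in"]:
                return False
        elif value != condition:
            return False
    return True


def delete(collection, lookup):
    """Remove matching docs in place and return the list of removed ids."""
    removed = [doc["_id"] for doc in collection if matches(doc, lookup)]
    collection[:] = [doc for doc in collection if doc["_id"] not in removed]
    return removed


docs = [
    {"_id": 123, "item_id": 123},
    {"_id": 124, "item_id": 234},
    {"_id": 125, "item_id": 999},
]

removed_by_id = delete(docs, {"_id": 123})
removed_by_in = delete(docs, {"item_id": {"$in": [123, 234]}})
remaining = [doc["_id"] for doc in docs]
```

The second call removes only one document, because the document with `item_id` 123 was already deleted by the first call.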
- async delete_async(endpoint_name, lookup)¶
Delete method to delete by using mongo query syntax.
- Parameters:
endpoint_name – Name of the endpoint
lookup – User mongo query syntax. Examples: 1. {'_id': 123}, 2. {'item_id': {'$in': [123, 234]}}
- Returns:
Returns list of ids which were removed.
- delete_docs(endpoint_name, docs)¶
Delete using list of documents.
- async delete_docs_async(endpoint_name, docs)¶
Delete using list of documents.
- delete_from_mongo(endpoint_name: str, lookup: Dict[str, Any])¶
Delete from mongo using a lookup without searching or checking
Added in version 2.4.0.
- Parameters:
endpoint_name (str) – The name of the resource to delete documents for
lookup (dict) – The MongoDB query to use for deleting documents
- Raises:
SuperdeskApiError.forbiddenError – if search is enabled for this resource
- async delete_from_mongo_async(endpoint_name: str, lookup: Dict[str, Any])¶
Delete from mongo using a lookup without searching or checking
Added in version 2.4.0.
- Parameters:
endpoint_name (str) – The name of the resource to delete documents for
lookup (dict) – The MongoDB query to use for deleting documents
- Raises:
SuperdeskApiError.forbiddenError – if search is enabled for this resource
- delete_ids_from_mongo(endpoint_name, ids)¶
Delete the passed ids from mongo without searching or checking
- Parameters:
ids
- Returns:
- async delete_ids_from_mongo_async(endpoint_name, ids)¶
Delete the passed ids from mongo without searching or checking
- Parameters:
ids
- Returns:
- find(endpoint_name, where, max_results=0, sort=None)¶
Find items for given endpoint using mongo query in python dict object.
It handles request creation here so no need to do this in service.
- Parameters:
endpoint_name (string)
where (dict)
max_results (int)
- find_and_modify(endpoint_name, **kwargs)¶
Find and modify in mongo.
- Parameters:
endpoint_name – resource name
kwargs – kwargs for pymongo find_and_modify
- async find_and_modify_async(endpoint_name, **kwargs)¶
Find and modify in mongo.
- Parameters:
endpoint_name – resource name
kwargs – kwargs for pymongo find_and_modify
- async find_async(endpoint_name, where, max_results=0, sort=None)¶
Find items for given endpoint using mongo query in python dict object.
It handles request creation here so no need to do this in service.
- Parameters:
endpoint_name (string)
where (dict)
max_results (int)
- find_one(endpoint_name, req, **lookup)¶
Find single item.
- Parameters:
endpoint_name – resource name
req – parsed request
lookup – additional filter
- async find_one_async(endpoint_name, req, **lookup)¶
Find single item.
- Parameters:
endpoint_name – resource name
req – parsed request
lookup – additional filter
- get(endpoint_name, req, lookup, **kwargs)¶
Get list of items.
- Parameters:
endpoint_name – resource name
req – parsed request
lookup – additional filter
- async get_async(endpoint_name, req, lookup, **kwargs)¶
Get list of items.
- Parameters:
endpoint_name – resource name
req – parsed request
lookup – additional filter
- get_from_mongo(endpoint_name, req, lookup, perform_count=False)¶
Get list of items from mongo.
No matter if there is elastic configured, this will use mongo.
- Parameters:
endpoint_name – resource name
req – parsed request
lookup – additional filter
- async get_from_mongo_async(endpoint_name, req, lookup, perform_count=False) AsyncIOMotorCursor¶
Get list of items from mongo.
No matter if there is elastic configured, this will use mongo.
- Parameters:
endpoint_name – resource name
req – parsed request
lookup – additional filter
- notify_on_change(endpoint_name)¶
Test if we should push notifications for given resource.
- remove_from_search(endpoint_name, doc)¶
Remove document from search backend.
- Parameters:
endpoint_name
doc (dict) – Document to delete
- async remove_from_search_async(endpoint_name, doc)¶
Remove document from search backend.
- Parameters:
endpoint_name
doc (dict) – Document to delete
- replace(endpoint_name, id, document, original)¶
Replace an item.
- Parameters:
endpoint_name – resource name
id – item id
document – next version of item
original – current version of document
- async replace_async(endpoint_name, id, document, original)¶
Replace an item.
- Parameters:
endpoint_name – resource name
id – item id
document – next version of item
original – current version of document
- replace_in_mongo(endpoint_name, id, document, original)¶
Replace item in mongo.
- Parameters:
endpoint_name – resource name
id – item id
document – next version of item
original – current version of item
- async replace_in_mongo_async(endpoint_name, id, document, original)¶
Replace item in mongo.
- Parameters:
endpoint_name – resource name
id – item id
document – next version of item
original – current version of item
- replace_in_search(endpoint_name, id, document, original)¶
Replace item in elastic.
- Parameters:
endpoint_name – resource name
id – item id
document – next version of item
original – current version of item
- async replace_in_search_async(endpoint_name, id, document, original)¶
Replace item in elastic.
- Parameters:
endpoint_name – resource name
id – item id
document – next version of item
original – current version of item
- search(endpoint_name, source)¶
Search for items using search backend
- Parameters:
endpoint_name (string)
source (dict)
- async search_async(endpoint_name, source)¶
Search for items using search backend
- Parameters:
endpoint_name (string)
source (dict)
- set_default_dates(doc)¶
Helper to populate _created and _updated timestamps.
- system_update(endpoint_name, id, updates, original, change_request=False, push_notification=True)¶
Only update what is provided, without affecting etag.
This is useful when you want to make some changes without affecting users.
- Parameters:
endpoint_name – api resource name
id – document id
updates – changes made to document
original – original document
change_request – if True it will allow you to use other mongo operations than $set
push_notification – if False it won’t send resource: notifications for update
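The difference from a regular update can be illustrated with plain dictionaries. This is a sketch under assumptions: `compute_etag` is a hypothetical hash over the non-underscore fields, not the algorithm eve actually uses, and notifications are left out entirely.

```python
import hashlib
import json


def compute_etag(doc):
    """Hypothetical etag: hash of all fields that do not start with '_'."""
    payload = {key: value for key, value in doc.items() if not key.startswith("_")}
    return hashlib.sha1(json.dumps(payload, sort_keys=True).encode()).hexdigest()


def update(doc, updates):
    """Regular update: apply the changes and refresh the etag."""
    doc.update(updates)
    doc["_etag"] = compute_etag(doc)


def system_update(doc, updates):
    """System update: apply only what is provided, leave _etag untouched."""
    doc.update(updates)


doc = {"_id": "a1", "headline": "Old"}
doc["_etag"] = compute_etag(doc)
etag_before = doc["_etag"]

system_update(doc, {"expiry": "2030-01-01"})
etag_after_system = doc["_etag"]

update(doc, {"headline": "New"})
etag_after_update = doc["_etag"]
```

Because the etag is unchanged after the system update, a client that fetched the item beforehand could still save it using the etag it already holds.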
- async system_update_async(endpoint_name, id, updates, original, change_request=False, push_notification=True)¶
Only update what is provided, without affecting etag.
This is useful when you want to make some changes without affecting users.
- Parameters:
endpoint_name – api resource name
id – document id
updates – changes made to document
original – original document
change_request – if True it will allow you to use other mongo operations than $set
push_notification – if False it won’t send resource: notifications for update
- update(endpoint_name, id, updates, original)¶
Update document with given id.
- Parameters:
endpoint_name – api resource name
id – document id
updates – changes made to document
original – original document
- async update_async(endpoint_name, id, updates, original)¶
Update document with given id.
- Parameters:
endpoint_name – api resource name
id – document id
updates – changes made to document
original – original document
- update_in_mongo(endpoint_name, id, updates, original)¶
Update item in mongo.
Modifies _updated timestamp and _etag.
- Parameters:
endpoint_name – resource name
id – item id
updates – updates to item to be saved
original – current version of the item
- async update_in_mongo_async(endpoint_name, id, updates, original)¶
Update item in mongo.
Modifies _updated timestamp and _etag.
- Parameters:
endpoint_name – resource name
id – item id
updates – updates to item to be saved
original – current version of the item
Media Storage¶
By default uploaded/ingested files are stored in mongoDB GridFS.
- class SuperdeskGridFSMediaStorage(app=None)[source]¶
- delete(_id, resource=None)¶
Deletes the file referenced by name or unique id. If deletion is not supported on the target storage system this will raise NotImplementedError instead
- exists(id_or_filename, resource=None)¶
Returns True if a file referenced by the given name or unique id already exists in the storage system, or False if the name is available for a new file.
- find(folder=None, upload_date=None, resource=None)¶
Search for files in the GridFS
Searches for files in the GridFS using a combination of folder name and/or upload date comparisons. The upload date comparisons use the same mongodb BSON comparison operators, i.e. $eq, $gt, $gte, $lt, $lte and $ne, and can be combined together.
- Parameters:
folder (str) – Folder name
upload_date (dict) – Upload date with comparison operator (i.e. $lt, $lte, $gt or $gte)
resource – The resource type to use
- Return list:
List of files that matched the provided parameters
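The way combined comparison operators narrow a search can be sketched in memory. This is only an illustration: the real method queries GridFS, and the assumption that a folder is encoded as a filename prefix, along with the `find_files` helper, is hypothetical.

```python
import operator
from datetime import datetime

# Mapping from the mongo comparison operators listed above to Python ones.
OPERATORS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches_upload_date(uploaded, conditions):
    """All given operators must hold, so they can be combined into a range."""
    return all(OPERATORS[op](uploaded, bound) for op, bound in conditions.items())


def find_files(files, folder=None, upload_date=None):
    """Filter by folder (assumed to be a filename prefix) and/or upload date."""
    result = []
    for entry in files:
        if folder is not None and not entry["filename"].startswith(folder + "/"):
            continue
        if upload_date and not matches_upload_date(entry["upload_date"], upload_date):
            continue
        result.append(entry["filename"])
    return result


files = [
    {"filename": "temp/a.jpg", "upload_date": datetime(2024, 1, 1)},
    {"filename": "temp/b.jpg", "upload_date": datetime(2024, 2, 1)},
    {"filename": "archive/c.jpg", "upload_date": datetime(2024, 1, 15)},
]

old_temp = find_files(files, folder="temp", upload_date={"$lt": datetime(2024, 1, 20)})
in_range = find_files(
    files,
    upload_date={"$gte": datetime(2024, 1, 10), "$lt": datetime(2024, 2, 1)},
)
```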
- async find_async(folder=None, upload_date=None, resource=None)¶
Search for files in the GridFS
Searches for files in the GridFS using a combination of folder name and/or upload date comparisons. The upload date comparisons use the same mongodb BSON comparison operators, i.e. $eq, $gt, $gte, $lt, $lte and $ne, and can be combined together.
- Parameters:
folder (str) – Folder name
upload_date (dict) – Upload date with comparison operator (i.e. $lt, $lte, $gt or $gte)
resource – The resource type to use
- Return list:
List of files that matched the provided parameters
- fs(resource=None)¶
Provides the instance-level GridFS instance, instantiating it if needed.
Changed in version 0.6: Support for multiple, cached, GridFS instances
- get(_id, resource=None)¶
Opens the file given by name or unique id. Note that although the returned file is guaranteed to be a File object, it might actually be some subclass. Returns None if no file was found.
- put(content, filename=None, content_type=None, metadata=None, resource=None, folder=None, **kwargs)¶
Store content in gridfs.
- Parameters:
content – binary stream
filename – unique filename
content_type – mime type
metadata – file metadata
resource – type of resource
folder (str) – Folder that the file will be stored in
- Return str:
The ID that was generated for this object
- async put_async(content, filename=None, content_type=None, metadata=None, resource=None, folder=None, **kwargs)¶
Store content in gridfs.
- Parameters:
content – binary stream
filename – unique filename
content_type – mime type
metadata – file metadata
resource – type of resource
folder (str) – Folder that the file will be stored in
- Return str:
The ID that was generated for this object
- remove_unreferenced_files(existing_files, resource=None)¶
Get the files from Grid FS and compare against existing files and delete the orphans.
- async remove_unreferenced_files_async(existing_files, resource=None)¶
Get the files from Grid FS and compare against existing files and delete the orphans.
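The orphan cleanup described for `remove_unreferenced_files` boils down to a set difference between what is stored and what is still referenced. A minimal sketch, using a plain dictionary as a hypothetical stand-in for the GridFS bucket:

```python
def remove_unreferenced_files(storage, existing_files):
    """Delete stored files whose ids are not referenced; return those ids."""
    referenced = set(existing_files)
    # Collect first, then delete, so the dict is not mutated while iterating.
    orphans = [file_id for file_id in storage if file_id not in referenced]
    for file_id in orphans:
        del storage[file_id]
    return orphans


# Hypothetical bucket: file id -> binary content.
storage = {"f1": b"...", "f2": b"...", "f3": b"..."}
removed = remove_unreferenced_files(storage, existing_files=["f1", "f3", "f9"])
```

Ids that are referenced but not stored (`f9` here) are simply ignored by this sketch.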
- url_for_download(media_id, content_type=None)¶
Return url for download.
- Parameters:
media_id – media id from media_id method
- url_for_external(media_id: str, resource: str | None = None) str¶
Returns a URL for external use
Returns a URL for use with the Content/Production API
- Parameters:
media_id (str) – The ID of the asset
resource (str) – The name of the resource type this Asset is attached to
- Return type:
str
- Returns:
The URL for external use
- url_for_media(media_id, content_type=None)¶
Return url for given media id.
- Parameters:
media_id – media id from media_id method
There is also an Amazon S3 implementation, which is used when Amazon is configured via settings.
- class AmazonMediaStorage(app=None)[source]¶
- delete(id_or_filename, resource=None)¶
Deletes the file referenced by name or unique id. If deletion is not supported on the target storage system this will raise NotImplementedError instead
- delete_objects(ids)¶
Delete the objects with given list of ids.
- async delete_objects_async(ids)¶
Delete the objects with given list of ids.
- exists(id_or_filename, resource=None)¶
Test if given name or unique id already exists in storage system.
- async exists_async(id_or_filename, resource=None)¶
Test if given name or unique id already exists in storage system.
- find(folder=None, upload_date=None, resource=None)¶
Search for files in the S3 bucket
Searches for files in the S3 bucket using a combination of folder name and/or upload date comparisons. Also uses the superdesk.utc.query_datetime method to compare the upload_date provided and the upload_date of the file.
- Parameters:
folder (str) – Folder name
upload_date (dict) – Upload date with comparison operator (i.e. $lt, $lte, $gt or $gte)
resource – The resource type to use
- Return list:
List of files that matched the provided parameters
- async find_async(folder=None, upload_date=None, resource=None)¶
Search for files in the S3 bucket
Searches for files in the S3 bucket using a combination of folder name and/or upload date comparisons. Also uses the superdesk.utc.query_datetime method to compare the upload_date provided and the upload_date of the file.
- Parameters:
folder (str) – Folder name
upload_date (dict) – Upload date with comparison operator (i.e. $lt, $lte, $gt or $gte)
resource – The resource type to use
- Return list:
List of files that matched the provided parameters
- get(id_or_filename, resource=None)¶
Open the file given by name or unique id.
Note that although the returned file is guaranteed to be a File object, it might actually be some subclass. Returns None if no file was found.
- get_all_keys()¶
Return the list of all keys from the bucket.
- async get_all_keys_async()¶
Return the list of all keys from the bucket.
- async get_async(id_or_filename, resource=None, begin: int = 0, end: int | None = None)¶
Open the file given by name or unique id.
Note that although the returned file is guaranteed to be a File object, it might actually be some subclass. Returns None if no file was found.
- media_id(filename, content_type=None, version=True)¶
Get the media_id path for the given filename.
If filename doesn’t have an extension, one is guessed. The additional version option controls whether a version is added automatically, omitted, or taken from a supplied string.
- put(content, filename=None, content_type=None, metadata=None, resource=None, _id=None, version=True, folder=None, **kwargs)¶
Save a new file using the storage system, preferably with the name specified.
If there already exists a file with this name, the storage system may modify the filename as necessary to get a unique name. Depending on the storage system, a unique id or the actual name of the stored file will be returned. The content type argument is used to appropriately identify the file when it is retrieved.
- Parameters:
content (ByteIO) – Data to store in the file object
filename (str) – Filename used to store the object
content_type (str) – Content type of the data to be stored
resource – Superdesk resource, i.e. ‘upload’ or ‘download’
metadata – Not currently used with Amazon S3 storage
_id (str) – ID to be used as the key in the bucket
version – If True the timestamp will be prepended to the key else a string can be used to prepend the key
folder (str) – The folder to store the object in
- Return str:
The ID that was generated for this object
- async put_async(content, filename=None, content_type=None, metadata=None, resource=None, _id=None, version=True, folder=None, **kwargs)¶
Save a new file using the storage system, preferably with the name specified.
If there already exists a file with this name, the storage system may modify the filename as necessary to get a unique name. Depending on the storage system, a unique id or the actual name of the stored file will be returned. The content type argument is used to appropriately identify the file when it is retrieved.
- Parameters:
content (ByteIO) – Data to store in the file object
filename (str) – Filename used to store the object
content_type (str) – Content type of the data to be stored
resource – Superdesk resource, i.e. ‘upload’ or ‘download’
metadata – Not currently used with Amazon S3 storage
_id (str) – ID to be used as the key in the bucket
version – If True the timestamp will be prepended to the key else a string can be used to prepend the key
folder (str) – The folder to store the object in
- Return str:
The ID that was generated for this object
- remove_unreferenced_files(existing_files, resource=None)¶
Get the files from S3 and compare against existing and delete the orphans.
- async remove_unreferenced_files_async(existing_files, resource=None)¶
Get the files from S3 and compare against existing and delete the orphans.
- url_for_external(media_id: str, resource: str | None = None) str¶
Returns a URL for external use
Returns a URL for use with the Content/Production API
- Parameters:
media_id (str) – The ID of the asset
resource (str) – The name of the resource type this Asset is attached to
- Return type:
str
- Returns:
The URL for external use