Architecture
This section describes the main components of Superdesk and how they interact. To run Superdesk we use honcho, with a process defined for each component:
rest: gunicorn -c gunicorn_config.py wsgi
wamp: python3 -u ws.py
work: celery -A worker worker
beat: celery -A worker beat --pid=
REST API Server
The entry point is the Superdesk REST API. This is a Python application built on top of the Eve and Flask frameworks. Clients communicate with this API to authenticate, fetch and modify data, upload new content, etc.
There is an app factory which you can use to create apps for production or testing:
- superdesk.factory.get_app(config=None, media_storage=None, config_object=None, init_elastic=True)
  App factory.
  Parameters:
  - config – configuration that can override config from default_settings.py
  - media_storage – media storage class to use
  - config_object – config object to load (can be module name, module or an object)
  - init_elastic – should it init elastic indexes or not
  Returns: a new SuperdeskEve app instance
The app can run under different WSGI servers; we use Gunicorn.
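To illustrate the factory idea, here is a minimal, self-contained sketch of an app factory whose config argument overrides the defaults. It does not import Superdesk: make_app, DEFAULT_SETTINGS and the setting values are hypothetical stand-ins, and the real get_app returns a SuperdeskEve instance rather than a dict.

```python
# Hypothetical stand-in for default_settings.py; names and values are illustrative only.
DEFAULT_SETTINGS = {
    "DEBUG": False,
    "MONGO_DBNAME": "superdesk",
}


def make_app(config=None, init_elastic=True):
    """Build a fresh app-like object; values in config override the defaults."""
    settings = dict(DEFAULT_SETTINGS)  # copy, so the defaults are never mutated
    settings.update(config or {})
    # A real factory would create indexes here; the sketch only records the choice.
    return {"settings": settings, "elastic_initialized": bool(init_elastic)}


production_app = make_app()
test_app = make_app(config={"MONGO_DBNAME": "superdesk_test"}, init_elastic=False)
```

Because every call builds a new object from a copy of the defaults, a test app and a production app can coexist in one process without sharing settings.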
Notifications
There is also a websocket server to which both the API server and Celery workers can push notifications for clients, which use that information to refresh views or otherwise stay in sync. In the background it uses a Celery queue, and from there it sends everything to clients. There is no communication from client to server; all changes are made via the API server.
There is also a factory to create the notification server:
- superdesk.ws.create_server(config)
  Create websocket server and run it until it gets Ctrl+C or SIGTERM.
  Parameters: config – config dictionary
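The one-way flow described in this section (producers push, a queue buffers, clients only receive) can be sketched in memory as follows. NotificationHub, its methods and the message shape are assumptions made for illustration; the real server uses websockets and a Celery queue.

```python
import json
from collections import deque


class NotificationHub:
    """In-memory stand-in for the websocket server: producers push, clients only receive."""

    def __init__(self):
        self.queue = deque()  # stands in for the Celery-backed queue
        self.clients = []     # each client is modelled as a list of received messages

    def connect(self):
        inbox = []
        self.clients.append(inbox)
        return inbox

    def push(self, event, **extra):
        # Called by the API server or a worker; clients never call this.
        self.queue.append(json.dumps({"event": event, "extra": extra}))

    def flush(self):
        # Deliver every queued message to every connected client.
        while self.queue:
            message = self.queue.popleft()
            for inbox in self.clients:
                inbox.append(message)


hub = NotificationHub()
first, second = hub.connect(), hub.connect()
hub.push("item:updated", item="abc")
hub.flush()
```

Clients have no method for sending data back, which mirrors the rule that all changes go through the API server.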
Celery Workers
Tasks that communicate with external services (ingest update, publishing), manipulate binary files (image cropping, file metadata extraction) or run periodically (content expiry) are executed using Celery.
The workers use the same app factory as the API server.
Data Layer
In short, the main data storage is MongoDB, and content items are also indexed using Elasticsearch. This logic is implemented via a custom Eve data layer, the Superdesk service layer and a data backend.
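As a rough picture of "primary store plus search index", the sketch below writes every document to a primary store and mirrors only selected resources into an index. DualBackend, the resource names and the substring search are assumptions for illustration, not the Superdesk implementation.

```python
class DualBackend:
    """Hypothetical backend: a primary store plus a search index for selected resources."""

    def __init__(self, indexed_resources):
        self.primary = {}  # stands in for MongoDB: {resource: {id: doc}}
        self.index = {}    # stands in for the search index
        self.indexed_resources = set(indexed_resources)

    def create(self, resource, docs):
        ids = []
        for doc in docs:
            # Always write to the primary store first.
            self.primary.setdefault(resource, {})[doc["_id"]] = dict(doc)
            if resource in self.indexed_resources:
                # Then mirror the document into the search index.
                self.index.setdefault(resource, {})[doc["_id"]] = dict(doc)
            ids.append(doc["_id"])
        return ids

    def search(self, resource, text):
        # Naive substring match on a headline field; a real search backend does far more.
        docs = self.index.get(resource, {}).values()
        return [d for d in docs if text in d.get("headline", "")]


backend = DualBackend(indexed_resources=["archive"])
backend.create("archive", [{"_id": 1, "headline": "Flood warning"}])
backend.create("users", [{"_id": 7, "username": "jane"}])
```

Resources that are not indexed remain readable from the primary store only.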
- class superdesk.datalayer.SuperdeskDataLayer(app)
  Superdesk Data Layer.
  Implements the Eve data layer interface and is used to make Eve work with the Superdesk service layer. It handles app initialization and later forwards Eve calls to the respective service.
  - init_elastic(app)
    Init elastic index.
    It will create the index and put the mapping. It should run only once, so locks are in place. Thus mongo must already be set up before running this.
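The "run only once, guarded by a lock" idea can be sketched as below. The text suggests the lock depends on mongo being available; this sketch replaces it with an in-memory LockStore, and LockStore, init_index and the lock name are all hypothetical.

```python
class LockStore:
    """Stand-in for a shared lock table; the sketch keeps it in process memory."""

    def __init__(self):
        self._held = {}

    def acquire(self, name, owner):
        # setdefault only stores the owner when nobody holds the lock yet.
        return self._held.setdefault(name, owner) == owner

    def release(self, name, owner):
        if self._held.get(name) == owner:
            del self._held[name]


def init_index(locks, created, owner):
    """Create the index and mapping unless another process holds the lock."""
    if not locks.acquire("init_index", owner):
        return False  # someone else is initializing; back off
    try:
        if "items" not in created:
            created["items"] = {"mapping": "put"}
        return True
    finally:
        locks.release("init_index", owner)


locks, created = LockStore(), {}
locks.acquire("init_index", "worker-1")           # worker-1 is mid-initialization
skipped = init_index(locks, created, "worker-2")  # worker-2 backs off
locks.release("init_index", "worker-1")
done = init_index(locks, created, "worker-2")     # lock is free, so this one runs
```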
- class superdesk.services.BaseService(datasource=None, backend=None)
  Base service for all endpoints, defines the basic implementation for CRUD datalayer functionality.
  - find(where, **kwargs)
    Find items in service collection using mongo query.
    Parameters: where (dict) –
  - Subclass should override if the resource handled by the service has intrinsic privileges.
    Parameters: kwargs – should have properties which help in authorizing the request
    Returns: False if unauthorized and True if authorized
  - remove_from_search(_id)
    Remove item from search by its id.
    Parameters: _id – item id
  - search(source)
    Search using search backend.
    Parameters: source – query source param
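The where argument of find is a mongo query expressed as a Python dict. To show what such a dict means, here is a small evaluator for two common forms, plain equality and $in, run against an in-memory list. The matches and find helpers and the sample items are hypothetical; MongoDB itself supports many more operators.

```python
def matches(doc, where):
    """Return True when doc satisfies a query made of equality and $in conditions."""
    for field, condition in where.items():
        value = doc.get(field)
        if isinstance(condition, dict) and "$in" in condition:
            if value not in condition["$in"]:
                return False
        elif value != condition:
            return False
    return True


def find(collection, where):
    # An empty query matches every document.
    return [doc for doc in collection if matches(doc, where)]


items = [
    {"_id": 1, "type": "text", "state": "draft"},
    {"_id": 2, "type": "picture", "state": "published"},
    {"_id": 3, "type": "text", "state": "published"},
]

published_text = find(items, {"type": "text", "state": "published"})
first_two = find(items, {"_id": {"$in": [1, 2]}})
```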
- class superdesk.eve_backend.EveBackend
  Superdesk data backend, handles mongodb/elastic data storage.
  - create(endpoint_name, docs, **kwargs)
    Insert documents into given collection.
    Parameters:
    - endpoint_name – api resource name
    - docs – list of docs to be inserted
  - create_in_mongo(endpoint_name, docs, **kwargs)
    Create items in mongo.
    Parameters:
    - endpoint_name – resource name
    - docs – list of docs to create
  - create_in_search(endpoint_name, docs, **kwargs)
    Create items in elastic.
    Parameters:
    - endpoint_name – resource name
    - docs – list of docs
  - delete(endpoint_name, lookup)
    Delete documents matching a lookup written in mongo query syntax.
    Parameters:
    - endpoint_name – name of the endpoint
    - lookup – lookup in mongo query syntax, for example {'_id': 123} or {'item_id': {'$in': [123, 234]}}
    Returns: the mongo remove command response, for example {'n': 12, 'ok': 1}
  - find(endpoint_name, where, max_results=0)
    Find items for given endpoint using a mongo query given as a Python dict.
    It handles request creation here, so there is no need to do this in the service.
    Parameters:
    - endpoint_name (string)
    - where (dict)
    - max_results (int)
  - find_and_modify(endpoint_name, **kwargs)
    Find and modify in mongo.
    Parameters:
    - endpoint_name – resource name
    - kwargs – kwargs for pymongo find_and_modify
  - find_one(endpoint_name, req, **lookup)
    Find single item.
    Parameters:
    - endpoint_name – resource name
    - req – parsed request
    - lookup – additional filter
  - get(endpoint_name, req, lookup)
    Get list of items.
    Parameters:
    - endpoint_name – resource name
    - req – parsed request
    - lookup – additional filter
  - get_from_mongo(endpoint_name, req, lookup)
    Get list of items from mongo.
    This uses mongo regardless of whether elastic is configured.
    Parameters:
    - endpoint_name – resource name
    - req – parsed request
    - lookup – additional filter
  - remove_from_search(endpoint_name, _id)
    Remove document from search backend.
    Parameters:
    - endpoint_name
    - _id
  - replace(endpoint_name, id, document, original)
    Replace an item.
    Parameters:
    - endpoint_name – resource name
    - id – item id
    - document – next version of item
    - original – current version of document
  - replace_in_mongo(endpoint_name, id, document, original)
    Replace item in mongo.
    Parameters:
    - endpoint_name – resource name
    - id – item id
    - document – next version of item
    - original – current version of item
  - replace_in_search(endpoint_name, id, document, original)
    Replace item in elastic.
    Parameters:
    - endpoint_name – resource name
    - id – item id
    - document – next version of item
    - original – current version of item
  - search(endpoint_name, source)
    Search for items using search backend.
    Parameters:
    - endpoint_name (string)
    - source (dict)
  - set_default_dates(doc)
    Helper to populate _created and _updated timestamps.
  - system_update(endpoint_name, id, updates, original)
    Only update what is provided, without affecting etag.
    This is useful when you want to make some changes without affecting users.
    Parameters:
    - endpoint_name – api resource name
    - id – document id
    - updates – changes made to document
    - original – original document
  - update(endpoint_name, id, updates, original)
    Update document with given id.
    Parameters:
    - endpoint_name – api resource name
    - id – document id
    - updates – changes made to document
    - original – original document
  - update_in_mongo(endpoint_name, id, updates, original)
    Update item in mongo.
    Modifies _updated timestamp and _etag.
    Parameters:
    - endpoint_name – resource name
    - id – item id
    - updates – updates to item to be saved
    - original – current version of the item
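The difference between a regular update (which modifies _updated and _etag) and system_update (which leaves the etag alone) can be sketched with plain dicts. The functions below are simplified stand-ins, and the SHA-1 etag is an assumption made for the example. Whether system_update also refreshes _updated is not stated here, so the sketch leaves that field untouched.

```python
import hashlib
import json
from datetime import datetime, timezone


def compute_etag(doc):
    # Hash of the document content, ignoring the etag field itself.
    body = {k: v for k, v in doc.items() if k != "_etag"}
    return hashlib.sha1(json.dumps(body, sort_keys=True, default=str).encode()).hexdigest()


def update(original, updates):
    """User-facing update: refresh _updated and recompute _etag."""
    doc = {**original, **updates, "_updated": datetime.now(timezone.utc)}
    doc["_etag"] = compute_etag(doc)
    return doc


def system_update(original, updates):
    """Apply only the given fields; _etag stays as it was."""
    return {**original, **updates}


now = datetime.now(timezone.utc)
item = {"_id": "a1", "headline": "Draft", "_created": now, "_updated": now}
item["_etag"] = compute_etag(item)

edited = update(item, {"headline": "Final"})
touched = system_update(item, {"expiry": "2030-01-01"})
```

A client holding the old etag can still save after the system_update, which is why that call is useful for changes that should not affect users.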
Media Storage
By default, uploaded and ingested files are stored in MongoDB GridFS.
- class superdesk.storage.desk_media_storage.SuperdeskGridFSMediaStorage(app=None)
  - media_id(filename, content_type=None, version=True)
    Get media id for given filename.
    It can be used by an async task to generate the id first and upload the file later.
    Parameters: filename – unique file name
  - put(content, filename=None, content_type=None, metadata=None, resource=None, **kwargs)
    Store content in gridfs.
    Parameters:
    - content – binary stream
    - filename – unique filename
    - content_type – mime type
    - metadata – file metadata
    - resource – type of resource
  - remove_unreferenced_files(existing_files)
    Get the files from GridFS, compare them against the existing files and delete the orphans.
  - url_for_media(media_id, content_type=None)
    Return url for given media id.
    Parameters: media_id – media id from media_id method
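Two ideas from this interface, reserving an id before the upload and deleting orphaned files, can be sketched with an in-memory stand-in. InMemoryMediaStorage is hypothetical: deriving the id from a hash of the filename and passing ids as existing_files are assumptions, not the GridFS implementation.

```python
import hashlib
import io


class InMemoryMediaStorage:
    """Hypothetical stand-in for a media storage class; files live in a dict."""

    def __init__(self):
        self.files = {}

    def media_id(self, filename):
        # Derive a stable id from the unique filename, before any content exists.
        return hashlib.sha1(filename.encode("utf-8")).hexdigest()

    def put(self, content, filename, content_type=None, metadata=None):
        _id = self.media_id(filename)
        self.files[_id] = {
            "data": content.read(),
            "content_type": content_type,
            "metadata": metadata or {},
        }
        return _id

    def remove_unreferenced_files(self, existing_files):
        # Everything stored but not referenced is an orphan.
        orphans = set(self.files) - set(existing_files)
        for _id in orphans:
            del self.files[_id]
        return orphans


storage = InMemoryMediaStorage()
reserved = storage.media_id("photo-123.jpg")  # id is known up front
stored = storage.put(io.BytesIO(b"..."), "photo-123.jpg", "image/jpeg")  # upload later
storage.put(io.BytesIO(b"old"), "unused.bin")
removed = storage.remove_unreferenced_files({reserved})
```

Because the id depends only on the filename, a task can hand the id to its caller right away and perform the upload in the background.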
There is also an Amazon S3 implementation, which is used when Amazon S3 is configured via settings.
- class superdesk.storage.amazon.amazon_media_storage.AmazonMediaStorage(app=None)
  - delete_objects(ids)
    Delete the objects with given list of ids.
  - exists(id_or_filename, resource=None)
    Test if given name or unique id already exists in storage system.
  - get(id_or_filename, resource=None)
    Open the file given by name or unique id.
    Note that although the returned file is guaranteed to be a File object, it might actually be some subclass. Returns None if no file was found.
  - get_all_keys()
    Return the list of all keys from the bucket.
  - media_id(filename, content_type=None, version=True)
    Get the media_id path for the given filename.
    If filename doesn't have an extension, one is guessed. The additional version option selects automatic versioning, no versioning, or a version string that you supply.
    The AMAZON_S3_SUBFOLDER configuration makes it easier to deploy multiple instances on the same bucket.
  - put(content, filename=None, content_type=None, resource=None, metadata=None, _id=None, version=True)
    Save a new file using the storage system, preferably with the name specified.
    If a file with this name already exists, the storage system may modify the filename as necessary to get a unique name. Depending on the storage system, a unique id or the actual name of the stored file will be returned. The content type argument is used to appropriately identify the file when it is retrieved.
  - remove_unreferenced_files(existing_files)
    Get the files from S3, compare them against the existing files and delete the orphans.
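The media_id behaviour described above (guess a missing extension, keep instances apart with a subfolder) might look roughly like this. media_path is a hypothetical helper, the subfolder argument stands in for the AMAZON_S3_SUBFOLDER setting, and versioning is left out because its path format is not described here.

```python
import mimetypes
import os


def media_path(filename, content_type=None, subfolder=None):
    """Build a storage key: guess a missing extension, then prefix the subfolder."""
    _, extension = os.path.splitext(filename)
    if not extension and content_type:
        # guess_extension returns None for unknown types.
        filename += mimetypes.guess_extension(content_type) or ""
    if subfolder:
        filename = "{}/{}".format(subfolder.strip("/"), filename)
    return filename


key_a = media_path("report", content_type="application/pdf", subfolder="instance-a")
key_b = media_path("report", content_type="application/pdf", subfolder="instance-b")
```

Two instances sharing one bucket then write under different prefixes, so their keys cannot collide.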