Ingest
With ingest you can import content into Superdesk. It supports multiple formats and ways of delivery.
Ingest runs as a Celery task; an update is triggered every 30 seconds.
It iterates over all providers, skips those that are closed, and
then checks each provider's last_updated time and schedule to decide whether it should
be updated now or later. If an update is due now, it starts a separate Celery task for that provider, so
multiple updates can run in parallel.
- update_provider(provider, rule_set=None, routing_scheme=None, sync=False)
Fetch items from ingest provider, ingest them into Superdesk and update the provider.
- Parameters:
provider – Ingest Provider data
rule_set – Translation Rule Set if one is associated with Ingest Provider.
routing_scheme – Routing Scheme if one is associated with Ingest Provider.
sync – Running in sync mode from cli.
Once a provider is updated, its last_updated time is set and the provider is
ignored until its schedule makes it due again.
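The scheduling check described above can be sketched roughly as follows. This is a minimal illustration, not the Superdesk implementation: the helper name `is_due`, the dictionary shape of `update_schedule`, the fallback schedule, and the treatment of a provider that has never been updated are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def is_due(provider, now):
    """Return True if a provider should be updated now (hypothetical helper)."""
    # Closed providers are never updated.
    if provider.get("is_closed"):
        return False
    last_updated = provider.get("last_updated")
    # Assumption: a provider that was never updated is due immediately.
    if last_updated is None:
        return True
    # Assumption: update_schedule is a dict of hours/minutes/seconds,
    # and 5 minutes is used here when no schedule is set.
    schedule = provider.get("update_schedule") or {"minutes": 5}
    return last_updated + timedelta(**schedule) <= now


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
recent = {"last_updated": now - timedelta(minutes=1), "update_schedule": {"minutes": 5}}
stale = {"last_updated": now - timedelta(minutes=10), "update_schedule": {"minutes": 5}}
print(is_due(recent, now), is_due(stale, now))
```

A provider for which the check returns True would then be handed to its own task, which is what allows several providers to update in parallel.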
Ingest Provider
An ingest provider specifies the configuration for a single ingest channel.
- class IngestProviderResource(endpoint_name, app, service, endpoint_schema=None)
Ingest provider model
- Parameters:
name – provider name
source – populates item source field
feeding_service – feeding service name
feed_parser – feed parser name
content_types – list of content types of items to ingest from provider
allow_remove_ingested – allow deleting of items from ingest
disable_item_updates – disables updating items from ingest
content_expiry – ttl for ingested items in minutes
config – provider specific config
private – can contain any data useful for provider (e.g. to manage feeds position)
ingested_count – number of items ingested so far
tokens – auth tokens used by provider
is_closed – provider closed status
update_schedule – update schedule, will run every x hours x minutes x seconds
idle_time – usual idle time for provider, if there is no item after that it will warn
last_updated – last update timestamp
rule_set – rule sets used when ingesting
routing_scheme – routing scheme used when ingesting
notifications – set when notification should be sent for this provider
last_closed – info when and by whom provider was closed last time
last_opened – info when and by whom provider was opened last time
critical_errors – error codes which are considered critical and should close provider
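To make the model more concrete, here is what a provider document might look like as a Python dictionary. Every value is made up for illustration; in particular the service and parser names, the `config` keys, and the dictionary form of `update_schedule` are assumptions, and `schedule_seconds` is a hypothetical helper rather than part of Superdesk.

```python
# Illustrative provider document; all values are invented for this example.
provider = {
    "name": "Example feed",
    "source": "example",
    "feeding_service": "rss",  # assumed name of a registered feeding service
    "feed_parser": "rss",      # assumed name of a registered feed parser
    "content_types": ["text"],
    "content_expiry": 2880,    # TTL in minutes (two days)
    "config": {"url": "https://example.com/feed.xml"},  # provider specific
    "is_closed": False,
    # Assumed shape: run every x hours, x minutes, x seconds.
    "update_schedule": {"hours": 0, "minutes": 5, "seconds": 0},
}


def schedule_seconds(provider):
    """Convert update_schedule to seconds (hypothetical helper)."""
    schedule = provider["update_schedule"]
    return (
        schedule.get("hours", 0) * 3600
        + schedule.get("minutes", 0) * 60
        + schedule.get("seconds", 0)
    )


print(schedule_seconds(provider))
```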
Feeding Services
Handle transport protocols when ingesting.
- class HTTPFeedingServiceBase
Base class for feeding services using HTTP.
This class contains helpers to make the creation of HTTP based feeding services easier.
There are a couple of class attributes you can use:
HTTP_URL – main URL of your service, used by default in get_url
HTTP_TIMEOUT – timeout of requests, in seconds
HTTP_DEFAULT_PARAMETERS – parameters used in every get request; they are updated with the params passed as arguments
HTTP_AUTH – indicates whether HTTP authentication is needed for your service. If None, authentication is determined by the existence of user and password. It is overridden by the auth_required config option if that exists.
In addition, you have some pre-filled fields:
AUTH_FIELDS – username and password fields
AUTH_REQ_FIELDS – username and password fields, plus an auth_required field to indicate whether they are needed
When ingest is updated, the provider is automatically saved to self.provider.
The config property gives easy access to the user configuration.
The auth_info property returns a dictionary with username and password.
The get_url method performs an HTTP GET request. The url argument can be omitted, in which case HTTP_URL is used. Authentication parameters are set automatically, and errors are caught appropriately. Extra arguments are passed directly to the requests call.
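The rule for deciding whether authentication is used (HTTP_AUTH, the presence of credentials, and the auth_required config option) can be sketched as a small standalone function. This is an illustration of the precedence described above, not code from HTTPFeedingServiceBase; the function name `needs_auth` is hypothetical, and reading "existence of user and password" as "both are non-empty" is an assumption.

```python
def needs_auth(http_auth, config):
    """Decide whether HTTP authentication should be used (hypothetical helper).

    http_auth mirrors the HTTP_AUTH class attribute (True, False or None);
    config mirrors the provider's user configuration.
    """
    # The auth_required config option, when present, overrides everything.
    if "auth_required" in config:
        return bool(config["auth_required"])
    # With HTTP_AUTH unset, fall back to whether credentials exist.
    # Assumption: both username and password must be non-empty.
    if http_auth is None:
        return bool(config.get("username") and config.get("password"))
    return bool(http_auth)


print(needs_auth(None, {"username": "user", "password": "secret"}))
print(needs_auth(True, {"auth_required": False}))
```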
- class EmailFeedingService
Feeding Service class which can read the article(s) from a configured mail box.
- class FileFeedingService
Feeding Service class which can read the configured local file system for article(s).
- class FTPFeedingService
Feeding Service class which can read article(s) which exist in a file system and are accessible using FTP.
- class RSSFeedingService
Feeding service for providing feeds received in RSS 2.0 format.
(NOTE: it should also work with other syndicated feed formats, since the underlying parser supports them, but for our needs RSS 2.0 is assumed)
Add new Service
- register_feeding_service(service_class)
Registers the Feeding Service with the application.
- Class:
superdesk.io.feeding_services.RegisterFeedingService uses this function to register the feeding service.
- Parameters:
service_class – Feeding Service class
- Raises:
AlreadyExistsError if a feeding service with the same name has already been registered
Feed Parsers
Parse items from services.
- class AFPNewsMLOneFeedParser
AFP specific NewsML parser.
Feed Parser which can parse the AFP feed. The feed is basically in NewsML 1.2 format, but the firstcreated and versioncreated times are localised.
- class AP_ANPAFeedParser
Feed parser for AP supplied ANPA; it maps category codes and maps the prefix on some sluglines to subject codes.
- class PAFeedParser
NITF Parser extension for Press Association; it maps the category meta tag to an anpa category.
Add new Parser
- register_feed_parser(parser_name, parser_class, override=False)
Registers the Feed Parser with the application.
- Class:
superdesk.io.feed_parsers.RegisterFeedParser uses this function to register the feed parser.
- Parameters:
parser_name – unique name to identify the Feed Parser class
parser_class – Feed Parser class
override – if True, allows overriding an existing parser with the same name
- Raises:
AlreadyExistsError if a feed parser with the same name has already been registered
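The registration behaviour documented here, including the override flag, follows a common registry pattern. The sketch below reimplements that pattern in a self-contained way so it can be run without Superdesk; the `AlreadyExistsError` class, the `registered_feed_parsers` dictionary and the parser classes are stand-ins defined for the example, not the real objects from superdesk.io.

```python
class AlreadyExistsError(Exception):
    """Stand-in for the error named in the documentation above."""


# Stand-in registry mapping parser names to parser classes.
registered_feed_parsers = {}


def register_feed_parser(parser_name, parser_class, override=False):
    """Register parser_class under parser_name (illustrative reimplementation)."""
    if parser_name in registered_feed_parsers and not override:
        raise AlreadyExistsError(
            "Feed parser {!r} is already registered".format(parser_name)
        )
    registered_feed_parsers[parser_name] = parser_class


class ExampleParser:  # hypothetical parser class
    pass


class ReplacementParser:  # hypothetical parser class
    pass


register_feed_parser("example", ExampleParser)
# Registering the same name again would raise AlreadyExistsError,
# unless override=True is passed:
register_feed_parser("example", ReplacementParser, override=True)
```

Keeping override opt-in means an accidental name clash fails loudly, while a deliberate replacement of a built-in parser remains possible.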
Add a Webhook
Webhooks are a way to trigger ingestion without polling an ingest provider: the service makes an HTTP POST request to a given URL to trigger the ingestion, which saves resources and speeds up ingestion.
Webhooks use the webhook endpoint. The service triggering the webhook must call this endpoint with 2 URL parameters:
provider_name, which is the name of the provider to trigger
auth, which is an authentication key. This key is set in a WEBHOOK_[PROVIDER_NAME]_AUTH environment variable, where [PROVIDER_NAME] is the name of the provider in uppercase.
To activate the webhook, the WEBHOOK_[PROVIDER_NAME]_AUTH environment variable must be set.
Note that because the auth parameter is sent as part of the request, HTTPS should be used to avoid the key being sent unencrypted.
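As a rough sketch of how the two parameters and the environment variable fit together, the following shows how a caller might build the webhook URL and how a key check could work. The base URL, the exact endpoint path, and all three helper names are assumptions for illustration; the sketch also assumes the provider name contains only characters that are valid in an environment variable name. It is not the Superdesk implementation.

```python
import hmac
import os
from urllib.parse import urlencode


def auth_env_name(provider_name):
    """Name of the environment variable holding the provider's key."""
    return "WEBHOOK_{}_AUTH".format(provider_name.upper())


def build_webhook_url(base_url, provider_name, auth):
    """Build the URL a service would POST to (the path is an assumption)."""
    query = urlencode({"provider_name": provider_name, "auth": auth})
    return "{}/webhook?{}".format(base_url.rstrip("/"), query)


def is_authorized(provider_name, auth, environ=os.environ):
    """Check the supplied key against the configured one (hypothetical helper)."""
    expected = environ.get(auth_env_name(provider_name))
    if not expected:
        # No key configured: the webhook is treated as not activated.
        return False
    # Constant-time comparison avoids leaking key contents through timing.
    return hmac.compare_digest(expected.encode(), auth.encode())


# Hypothetical base URL and key, for illustration only.
print(build_webhook_url("https://superdesk.example.com/api", "example", "s3cret"))
print(is_authorized("example", "s3cret", {"WEBHOOK_EXAMPLE_AUTH": "s3cret"}))
```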