Ingest

With ingest you can import content into Superdesk. It supports multiple formats and ways of delivery.

Ingest runs as Celery tasks; an update is triggered every 30 seconds.

update_ingest()

Check ingest providers and trigger an update when appropriate.

It iterates over all providers and skips any that are closed. For each open provider it checks the last_updated time and the schedule to decide whether the provider should be updated now or later. If the provider is due now, it starts a separate Celery task for it, so multiple updates can run in parallel.
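The check described above can be sketched in plain Python. This is a minimal illustration, not Superdesk's implementation: the helper names is_due and providers_to_update are hypothetical, update_schedule is assumed to be a mapping with optional hours, minutes and seconds keys, and a provider that has never been updated is assumed to be due immediately. In Superdesk each due provider would be handed to its own Celery task; the sketch only selects them.

```python
from datetime import datetime, timedelta, timezone


def is_due(provider, now):
    """Return True if an open provider should be updated at `now`.

    Hypothetical helper: the field shapes are assumptions, see the lead-in.
    """
    if provider.get("is_closed"):
        return False  # closed providers are never updated
    last_updated = provider.get("last_updated")
    if last_updated is None:
        return True  # assumption: never-updated providers are due at once
    schedule = provider.get("update_schedule") or {}
    interval = timedelta(
        hours=schedule.get("hours", 0),
        minutes=schedule.get("minutes", 0),
        seconds=schedule.get("seconds", 0),
    )
    return last_updated + interval <= now


def providers_to_update(providers, now):
    """Select the providers that would each get their own update task."""
    return [p for p in providers if is_due(p, now)]


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
providers = [
    {"name": "due", "update_schedule": {"minutes": 5},
     "last_updated": now - timedelta(minutes=6)},
    {"name": "too-soon", "update_schedule": {"minutes": 5},
     "last_updated": now - timedelta(minutes=1)},
    {"name": "closed", "is_closed": True, "last_updated": None},
]
due_names = [p["name"] for p in providers_to_update(providers, now)]
```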

update_provider(provider, rule_set=None, routing_scheme=None, sync=False)

Fetch items from ingest provider, ingest them into Superdesk and update the provider.

Parameters:
  • provider – Ingest Provider data

  • rule_set – Translation Rule Set if one is associated with Ingest Provider.

  • routing_scheme – Routing Scheme if one is associated with Ingest Provider.

  • sync – Running in sync mode from cli.

Once the provider has been updated, its last_updated time is set, and the provider is then ignored until its schedule makes it due again.

Ingest Provider

An ingest provider specifies the configuration for a single ingest channel.

class IngestProviderResource(endpoint_name, app, service, endpoint_schema=None)

Ingest provider model

Parameters:
  • name – provider name

  • source – populates item source field

  • feeding_service – feeding service name

  • feed_parser – feed parser name

  • content_types – list of content types of items to ingest from provider

  • allow_remove_ingested – allow deleting of items from ingest

  • disable_item_updates – disables updating items from ingest

  • content_expiry – ttl for ingested items in minutes

  • config – provider specific config

  • private – can contain any data useful for provider (e.g. to manage feeds position)

  • ingested_count – number of items ingested so far

  • tokens – auth tokens used by provider

  • is_closed – provider closed status

  • update_schedule – update schedule, will run every x hours x minutes x seconds

  • idle_time – usual idle time for provider; if no item arrives within that time, a warning is issued

  • last_updated – last update timestamp

  • rule_set – rule sets used when ingesting

  • routing_scheme – routing scheme used when ingesting

  • notifications – set when notification should be sent for this provider

  • last_closed – info when and by whom provider was closed last time

  • last_opened – info when and by whom provider was opened last time

  • critical_errors – error codes which are considered critical and should close provider
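For illustration, a provider document using some of the fields above might look like the following sketch. All values are hypothetical: the feeding service and feed parser names, the URL inside config, and the exact shapes of update_schedule and idle_time are assumptions, not taken from the Superdesk schema. The undocumented_fields helper is also hypothetical and only checks keys against the list above.

```python
# Field names copied from the parameter list above.
DOCUMENTED_FIELDS = {
    "name", "source", "feeding_service", "feed_parser", "content_types",
    "allow_remove_ingested", "disable_item_updates", "content_expiry",
    "config", "private", "ingested_count", "tokens", "is_closed",
    "update_schedule", "idle_time", "last_updated", "rule_set",
    "routing_scheme", "notifications", "last_closed", "last_opened",
    "critical_errors",
}

# Hypothetical provider; every value here is an assumption for illustration.
example_provider = {
    "name": "Example wire",
    "source": "example",
    "feeding_service": "http",        # assumed service name
    "feed_parser": "newsml2",         # assumed parser name
    "content_types": ["text"],
    "content_expiry": 2880,           # minutes, i.e. two days
    "config": {"url": "https://feed.example.com/items"},
    "is_closed": False,
    "update_schedule": {"hours": 0, "minutes": 5, "seconds": 0},
    "idle_time": {"hours": 1, "minutes": 0},  # assumed shape
}


def undocumented_fields(provider):
    """Return, sorted, any keys that are not in the documented field list."""
    return sorted(set(provider) - DOCUMENTED_FIELDS)
```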

Feeding Services

Handle transport protocols when ingesting.

class HTTPFeedingServiceBase

Base class for feeding services using HTTP.

This class contains helpers to make the creation of HTTP based feeding services easier.

There are a couple of class attributes you can use:
  • HTTP_URL – main URL of your service; used by default in get_url

  • HTTP_TIMEOUT – timeout of requests, in seconds

  • HTTP_DEFAULT_PARAMETERS – parameters used in every GET request; updated with the params passed as arguments

  • HTTP_AUTH – indicates whether HTTP authentication is needed for your service. If None, authentication is determined by the existence of user and password. Overridden by the auth_required config value if it exists.

In addition, you have some pre-filled fields:
  • AUTH_FIELDS – username and password fields

  • AUTH_REQ_FIELDS – username and password fields + auth_required field to indicate if they are needed

When ingest is updated, the provider is automatically saved to self.provider. The config property gives easy access to the user configuration. The auth_info property returns a dictionary with the username and password.

The get_url method performs an HTTP GET request. url can be omitted, in which case HTTP_URL is used. Authentication parameters are set automatically, and errors are caught and handled appropriately. Extra arguments are passed directly to the requests call.
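How the class attributes interact can be sketched with a small stand-in class. This is not HTTPFeedingServiceBase: HTTPServiceSketch, auth_needed and request_kwargs are hypothetical names, the default timeout of 30 is made up, "existence of user and password" is read as both being set, and sending credentials as a (username, password) tuple is an assumption. No request is made; the method only builds the arguments a GET call would receive.

```python
class HTTPServiceSketch:
    """Stand-in that mimics the attribute handling described above."""

    HTTP_URL = None
    HTTP_TIMEOUT = 30               # seconds; hypothetical default
    HTTP_DEFAULT_PARAMETERS = None
    HTTP_AUTH = None

    def __init__(self, provider):
        self.provider = provider

    @property
    def config(self):
        return self.provider.get("config", {})

    @property
    def auth_needed(self):
        # auth_required in the config overrides HTTP_AUTH when present.
        if "auth_required" in self.config:
            return bool(self.config["auth_required"])
        # HTTP_AUTH None: decide from the presence of the credentials.
        if self.HTTP_AUTH is None:
            return bool(self.config.get("username") and self.config.get("password"))
        return bool(self.HTTP_AUTH)

    def request_kwargs(self, url=None, params=None, **kwargs):
        # Defaults first, then the params given by the caller.
        merged = dict(self.HTTP_DEFAULT_PARAMETERS or {})
        merged.update(params or {})
        request = {
            "url": url or self.HTTP_URL,  # fall back to HTTP_URL
            "params": merged,
            "timeout": self.HTTP_TIMEOUT,
        }
        if self.auth_needed:
            request["auth"] = (self.config.get("username"), self.config.get("password"))
        request.update(kwargs)  # extra arguments are passed through as-is
        return request


class ExampleService(HTTPServiceSketch):
    HTTP_URL = "https://feed.example.com/items"  # hypothetical URL
    HTTP_DEFAULT_PARAMETERS = {"format": "json"}


service = ExampleService({"config": {"username": "u", "password": "p"}})
request = service.request_kwargs(params={"page": 2})
```

Here request carries the default URL, the merged parameters and the credentials; a provider whose config sets auth_required to False would get no auth entry even with credentials present.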

class EmailFeedingService

Feeding Service class which can read article(s) from a configured mailbox.

class FileFeedingService

Feeding Service class which can read article(s) from the configured local file system.

class FTPFeedingService

Feeding Service class which can read article(s) that exist in a file system accessible over FTP.

class HTTPFeedingService

Feeding Service class which can read article(s) using HTTP.

class RSSFeedingService

Feeding service for providing feeds received in RSS 2.0 format.

(NOTE: it should also work with other syndicated feed formats, since the underlying parser supports them, but for our needs RSS 2.0 is assumed.)

class WufooFeedingService

Feeding Service class which can read article(s) using the Wufoo API.

Add new Service

register_feeding_service(service_class)

Registers the Feeding Service with the application.

The class superdesk.io.feeding_services.RegisterFeedingService uses this function to register the feeding service.

Parameters:

service_class – Feeding Service class

Raises:

AlreadyExistsError if a feeding service with the same name has already been registered

Feed Parsers

Parse items from services.

class ANPAFeedParser

Feed Parser for feeds in ANPA 1312 format.

class IPTC7901FeedParser

Feed Parser for feeds in IPTC 7901 format.

class NewsMLOneFeedParser

Feed Parser for feeds in NewsML 1.2 format.

class NewsMLTwoFeedParser

Feed Parser for feeds in NewsML 2 format.

class NITFFeedParser

Feed Parser for feeds in NITF format.

class EMailRFC822FeedParser

Feed Parser for feeds in RFC 822 format.

class WENNFeedParser

Feed Parser for parsing the XML supplied by WENN.

class DPAIPTC7901FeedParser

class AFPNewsMLOneFeedParser

AFP specific NewsML parser.

Feed Parser for the AFP feed. The feed is essentially in NewsML 1.2 format, but the firstcreated and versioncreated times are localised.

class ScoopNewsMLTwoFeedParser

class AP_ANPAFeedParser

Feed Parser for AP supplied ANPA. It maps category codes, and maps the prefix on some sluglines to subject codes.

class PAFeedParser

NITF Parser extension for Press Association. It maps the category meta tag to an ANPA category.

Add new Parser

register_feed_parser(parser_name, parser_class, override=False)

Registers the Feed Parser with the application.

The class superdesk.io.feed_parsers.RegisterFeedParser uses this function to register the feed parser.

Parameters:
  • parser_name – unique name to identify the Feed Parser class

  • parser_class – Feed Parser class

  • override – if True, allows overriding an existing parser with the same name

Raises:

AlreadyExistsError if a feed parser with the same name has already been registered

Add a Webhook

Webhooks are a way to trigger ingestion without polling an ingest provider: the service sends an HTTP POST request to a given URL to trigger the ingestion, which saves resources and speeds up ingestion. Webhooks use the webhook endpoint. The service triggering the webhook must call this endpoint with two URL parameters:

  • provider_name which is the name of the provider to trigger

  • auth which is an authentication key. This key is set in a WEBHOOK_[PROVIDER_NAME]_AUTH environment variable, where [PROVIDER_NAME] is the name of the provider in uppercase.

To activate the webhook, the WEBHOOK_[PROVIDER_NAME]_AUTH environment variable must be set. Note that because the auth parameter is sent in the request URL, HTTPS should be used so that the key is not sent unencrypted.
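The naming rule and the two URL parameters can be sketched as follows. This is an illustration under stated assumptions: the helper names, the base URL https://superdesk.example.com/api/webhook, the provider name and the key are hypothetical, and the constant-time comparison is a design choice of this sketch rather than a description of how Superdesk checks the key. Only uppercasing is applied to the provider name, as stated above.

```python
import hmac
from urllib.parse import urlencode


def webhook_env_var(provider_name):
    """Name of the environment variable holding the provider's key."""
    return "WEBHOOK_{}_AUTH".format(provider_name.upper())


def webhook_url(base_url, provider_name, auth):
    """URL the external service would POST to (base_url is hypothetical)."""
    query = urlencode({"provider_name": provider_name, "auth": auth})
    return "{}?{}".format(base_url, query)


def is_authorised(provider_name, auth, environ):
    """Compare the received key with the configured one.

    An unset or empty variable means the webhook is not activated.
    """
    expected = environ.get(webhook_env_var(provider_name))
    if not expected:
        return False
    # Constant-time comparison to avoid leaking the key through timing.
    return hmac.compare_digest(expected.encode("utf-8"), auth.encode("utf-8"))


# Hypothetical values; in practice `environ` would be os.environ.
environ = {"WEBHOOK_REUTERS_AUTH": "s3cret"}
url = webhook_url("https://superdesk.example.com/api/webhook", "reuters", "s3cret")
```

The caller would then send a POST request to url over HTTPS.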