Ingest

With ingest you can import content into Superdesk. It supports multiple formats and ways of delivery.

Ingest runs as a Celery task; an update is triggered every 30 seconds.

superdesk.io.update_ingest()

Check ingest providers and trigger an update when appropriate.

It iterates over all providers, skipping any that are closed, and checks each provider's last_updated time and schedule to decide whether the provider should be updated now or later. For each provider that is due, it starts a separate Celery task, so multiple updates can run in parallel.

superdesk.io.update_provider(provider, rule_set=None, routing_scheme=None)

Fetch items from ingest provider, ingest them into Superdesk and update the provider.

Parameters:
  • provider – Ingest Provider data
  • rule_set – Translation Rule Set if one is associated with Ingest Provider.
  • routing_scheme – Routing Scheme if one is associated with Ingest Provider.

Once the provider has been updated, its last_updated time is set, and the provider is then ignored until its schedule says it is due again.
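The scheduling check described above can be sketched as follows. This is a minimal illustration, not the Superdesk implementation: the helper name is_due, the fallback schedule, and the assumption that update_schedule is a dict with hours/minutes/seconds keys are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_due(provider, now):
    """Return True if the provider should be updated at `now` (hypothetical helper)."""
    if provider.get("is_closed"):
        return False  # closed providers are never updated
    last_updated = provider.get("last_updated")
    if last_updated is None:
        return True  # never updated so far, so update right away
    # Assumed shape: {"hours": x, "minutes": x, "seconds": x}; the default is made up.
    schedule = provider.get("update_schedule") or {"minutes": 5}
    return last_updated + timedelta(**schedule) <= now


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
providers = [
    {"name": "a", "is_closed": True},
    {"name": "b", "last_updated": now - timedelta(minutes=10),
     "update_schedule": {"hours": 0, "minutes": 5, "seconds": 0}},
    {"name": "c", "last_updated": now - timedelta(minutes=1),
     "update_schedule": {"hours": 0, "minutes": 5, "seconds": 0}},
]
# Only the providers in `due` would get their own update task.
due = [p["name"] for p in providers if is_due(p, now)]
```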

Ingest Provider

An ingest provider specifies the configuration for a single ingest channel.

class superdesk.io.IngestProviderResource(endpoint_name, app, service, endpoint_schema=None)

Ingest provider model

Parameters:
  • name – provider name
  • source – populates item source field
  • feeding_service – feeding service name
  • feed_parser – feed parser name
  • content_types – list of content types of items to ingest from provider
  • allow_remove_ingested – allow deleting of items from ingest
  • content_expiry – ttl for ingested items in minutes
  • config – provider specific config
  • private – can contain any data useful for provider (e.g. to manage feeds position)
  • ingested_count – number of items ingested so far
  • tokens – auth tokens used by provider
  • is_closed – provider closed status
  • update_schedule – update schedule, will run every x hours x minutes x seconds
  • idle_time – usual idle time for provider; if no item arrives within that time, a warning is raised
  • last_updated – last update timestamp
  • rule_set – rule sets used when ingesting
  • routing_scheme – routing scheme used when ingesting
  • notifications – set when notification should be sent for this provider
  • last_closed – info when and by whom provider was closed last time
  • last_opened – info when and by whom provider was opened last time
  • critical_errors – error codes which are considered critical and should close provider
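As an illustration of how these fields fit together, here is a hypothetical provider document written as a Python dict. All values are placeholders chosen for the example (including the feeding service and parser names, the content type, and the shape of update_schedule); they are not taken from a real Superdesk installation. The helper expires_at is likewise made up and only shows how content_expiry, given in minutes, could translate into an expiry time.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical provider document; every value here is a placeholder.
provider = {
    "name": "demo-wire",
    "source": "Demo Wire",
    "feeding_service": "file",       # placeholder service name
    "feed_parser": "nitf",           # placeholder parser name
    "content_types": ["text"],       # placeholder content type
    "allow_remove_ingested": False,
    "content_expiry": 2880,          # ttl in minutes (here: 2 days)
    "config": {"path": "/tmp/ingest"},
    "is_closed": False,
    "update_schedule": {"hours": 0, "minutes": 5, "seconds": 0},  # assumed shape
}


def expires_at(provider, ingested_at):
    """Return when an item ingested at `ingested_at` would expire (hypothetical helper)."""
    return ingested_at + timedelta(minutes=provider["content_expiry"])


ingested = datetime(2024, 1, 1, tzinfo=timezone.utc)
expiry = expires_at(provider, ingested)
```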

Feeding Services

Handle transport protocols when ingesting.

class superdesk.io.feeding_services.EmailFeedingService

Feeding Service class which can read the article(s) from a configured mail box.

class superdesk.io.feeding_services.FileFeedingService

Feeding Service class which can read the configured local file system for article(s).

class superdesk.io.feeding_services.FTPFeedingService

Feeding Service class which can read article(s) that exist in a file system accessible via FTP.

class superdesk.io.feeding_services.HTTPFeedingService

Feeding Service class which can read article(s) using HTTP.

class superdesk.io.feeding_services.RSSFeedingService

Feeding service for providing feeds received in RSS 2.0 format.

(NOTE: it should also work with other syndicated feed formats, since the underlying parser supports them, but for our needs RSS 2.0 is assumed.)

class apps.io.feeding_services.wufoo.WufooFeedingService

Feeding Service class which can read article(s) using the Wufoo API.

Add new Service

superdesk.io.registry.register_feeding_service(service_class)

Registers the Feeding Service with the application.

Class: superdesk.io.feeding_services.RegisterFeedingService uses this function to register the feeding service.

Parameters:
  • service_class – Feeding Service class

Raises: AlreadyExistsError if a feeding service with the same name has already been registered.

Feed Parsers

Parse items from services.

class superdesk.io.feed_parsers.ANPAFeedParser

Feed Parser which can parse feeds in ANPA 1312 format.

class superdesk.io.feed_parsers.IPTC7901FeedParser

Feed Parser which can parse feeds in IPTC 7901 format.

class superdesk.io.feed_parsers.NewsMLOneFeedParser

Feed Parser which can parse feeds in NewsML 1.2 format.

class superdesk.io.feed_parsers.NewsMLTwoFeedParser

Feed Parser which can parse feeds in NewsML 2 format.

class superdesk.io.feed_parsers.NITFFeedParser

Feed Parser which can parse feeds in NITF format.

class superdesk.io.feed_parsers.EMailRFC822FeedParser

Feed Parser which can parse feeds in RFC 822 format.

class superdesk.io.feed_parsers.WENNFeedParser

Feed Parser for parsing the XML supplied by WENN.

class superdesk.io.feed_parsers.DPAIPTC7901FeedParser

class superdesk.io.feed_parsers.AFPNewsMLOneFeedParser

AFP-specific NewsML parser.

Feed Parser which can parse the AFP feed. The feed is basically in NewsML 1.2 format, but the firstcreated and versioncreated times are localised.

class superdesk.io.feed_parsers.ScoopNewsMLTwoFeedParser

class superdesk.io.feed_parsers.AP_ANPAFeedParser

Feed Parser for AP-supplied ANPA. It maps category codes, and maps the prefix on some sluglines to subject codes.

class superdesk.io.feed_parsers.PAFeedParser

NITF Parser extension for Press Association. It maps the category meta tag to an ANPA category.

Add new Parser

superdesk.io.registry.register_feed_parser(parser_name, parser_class)

Registers the Feed Parser with the application.

Class: superdesk.io.feed_parsers.RegisterFeedParser uses this function to register the feed parser.

Parameters:
  • parser_name – unique name to identify the Feed Parser class
  • parser_class – Feed Parser class

Raises: AlreadyExistsError if a feed parser with the same name has already been registered.
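To make the name-to-class mapping concrete, here is a self-contained sketch of a parser registry together with a toy XML parser. It does not use Superdesk itself: the registry dict, the local AlreadyExistsError class, the DemoStoryParser class with its can_parse/parse methods, and the <story> document format are all invented for this example.

```python
import xml.etree.ElementTree as ET


class AlreadyExistsError(Exception):
    """Local stand-in for the error named in the text above."""


_feed_parsers = {}  # hypothetical registry: parser name -> parser class


def register_feed_parser(parser_name, parser_class):
    """Register a parser class under a unique name."""
    if parser_name in _feed_parsers:
        raise AlreadyExistsError(
            "Feed parser '{}' is already registered".format(parser_name)
        )
    _feed_parsers[parser_name] = parser_class


class DemoStoryParser:
    """Toy parser for a made-up <story> XML format."""

    def can_parse(self, xml):
        return xml.tag == "story"

    def parse(self, xml):
        return {
            "headline": xml.findtext("headline"),
            "body_html": xml.findtext("body"),
        }


register_feed_parser("demo_story", DemoStoryParser)

# Look the parser up by name, as a provider's feed_parser field might do.
parser = _feed_parsers["demo_story"]()
doc = ET.fromstring("<story><headline>Hi</headline><body>Text</body></story>")
item = parser.parse(doc) if parser.can_parse(doc) else None
```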

Add a Webhook

Webhooks are a way to trigger ingestion without polling an ingest provider: the service sends an HTTP POST request to a given URL to trigger the ingestion, which saves resources and speeds up ingestion. Webhooks use the webhook endpoint. The service triggering the webhook must call this endpoint with two URL parameters:

  • provider_name which is the name of the provider to trigger
  • auth which is an authentication key. This key is set in a WEBHOOK_[PROVIDER_NAME]_AUTH environment variable, where [PROVIDER_NAME] is the name of the provider in uppercase.

To activate the webhook, the WEBHOOK_[PROVIDER_NAME]_AUTH environment variable must be set. Note that because the auth parameter is sent in the request, HTTPS should be used to avoid sending the key unencrypted.
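The webhook parameters described above can be sketched as follows. This is an illustration under assumptions, not Superdesk's own code: the helper names, the base URL, the exact /webhook path, and the plain upper() conversion of the provider name are hypothetical. The constant-time comparison via hmac.compare_digest is a design choice made for the example.

```python
import hmac
import os
from urllib.parse import urlencode


def auth_env_name(provider_name):
    """Build the environment variable name for a provider (assumes plain upper-casing)."""
    return "WEBHOOK_{}_AUTH".format(provider_name.upper())


def is_authorized(provider_name, auth, environ=os.environ):
    """Check the auth key against the environment; False if the webhook is not activated."""
    expected = environ.get(auth_env_name(provider_name))
    if not expected:
        return False  # variable not set, so the webhook is inactive
    return hmac.compare_digest(expected.encode(), auth.encode())


def webhook_url(base_url, provider_name, auth):
    """Build the URL a triggering service would POST to (path is an assumption)."""
    query = urlencode({"provider_name": provider_name, "auth": auth})
    return "{}/webhook?{}".format(base_url.rstrip("/"), query)


# Example with a fake environment and a hypothetical HTTPS base URL.
env = {"WEBHOOK_DEMO_AUTH": "s3cret"}
url = webhook_url("https://superdesk.example.com/api/", "demo", "s3cret")
ok = is_authorized("demo", "s3cret", env)
```

The triggering service would then send an HTTP POST request to the resulting URL.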