Google Cloud Pubsub Exporter
⚠️ This is a community-provided module. It has been developed and extensively tested at Collibra, but it is not officially supported by GCP.
This exporter sends OTLP messages to a Google Cloud Pubsub topic.
The following configuration options are supported:
project (Optional): The Google Cloud Project of the topics.
topic (Required): The topic name to send OTLP data over. The topic name should be a fully qualified resource
name (e.g. projects/otel-project/topics/otlp).
compression (Optional): Set the payload compression, only gzip is supported. Default is no compression.
watermark: Controls how the ce-time attribute is set (see the Watermark section for more info).
behavior (Optional): current sets the ce-time attribute to the system clock, earliest sets the attribute to
the smallest timestamp of all the messages.
allow_drift (Optional): The maximum amount by which the ce-time attribute may differ from the system clock. When the
drift is set to 0, the maximum drift from the clock is allowed (only applicable to earliest).
endpoint (Optional): Override the default Pubsub Endpoint, useful when connecting to the PubSub emulator instance
or switching between global and regional service endpoints.
insecure (Optional): Allows performing "insecure" SSL connections and transfers, useful when connecting to a local
emulator instance. Only takes effect if endpoint is not empty.
ordering: Configures the PubSub ordering feature, see
ordering section for more info.
enabled (default = false): Enables the ordering. Default is disabled.
from_resource_attribute (no default): resource attribute that will be used as the ordering key. Required when
ordering.enabled is true. If the resource attribute is missing or has an empty value, the messages will not be
ordered for this resource.
remove_resource_attribute (default = false): whether the ordering key resource attribute specified in
from_resource_attribute should be removed from the resource attributes.
traces, metrics and logs (Optional): Allows overriding the standard OTLP Protobuf
encoding and the message attributes.
encoding (Optional): The encoding extension to use. If not specified, the default Protobuf marshaller is used.
attributes (Optional): Attributes that will be added to the Pub/Sub message.
exporters:
  googlecloudpubsub:
    project: my-project
    topic: projects/my-project/topics/otlp-traces
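Because the topic option expects a fully qualified resource name, it can help to check the shape of the value before deploying a configuration. The following is a minimal Python sketch of such a check; the helper name parse_topic is hypothetical, and it only verifies the projects/<project>/topics/<topic> layout, not the naming rules that Pub/Sub itself enforces.

```python
import re

# Assumed shape of a fully qualified topic name: projects/<project>/topics/<topic>.
_TOPIC_PATTERN = re.compile(r"projects/([^/]+)/topics/([^/]+)")


def parse_topic(name):
    """Split a fully qualified topic name into (project, topic).

    Hypothetical helper for illustration; raises ValueError when the
    value does not have the expected layout.
    """
    match = _TOPIC_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not a fully qualified topic name: {name!r}")
    return match.group(1), match.group(2)


project, topic = parse_topic("projects/otel-project/topics/otlp")
print(project, topic)
```

A short name such as otlp is rejected by this check, which is the kind of mistake the fully qualified form is meant to avoid.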
Pubsub topic
The Google Cloud Pubsub exporter doesn't automatically create topics; it expects the topic
to be created upfront. Security-wise, it's best to give the collector its own service account and grant that
account the Pub/Sub Publisher permission on the topic.
Messages
The messages published on the topic are CloudEvents compliant and use the binary content mode
defined in the Google Cloud Pub/Sub Protocol Binding for CloudEvents.
The data field is either an ExportTraceServiceRequest, ExportMetricsServiceRequest or ExportLogsServiceRequest for
traces, metrics or logs respectively. Each message is accompanied by the following attributes:
| attribute | description |
| --- | --- |
| ce-specversion | Follows version 1.0 of the CloudEvents spec |
| ce-source | The source is this exporter: /opentelemetry/collector/googlecloudpubsub/<version> |
| ce-id | A random UUID that uniquely identifies the message |
| ce-time | A watermark indicating when the events encapsulated in the OTLP message were generated. The behavior depends on the watermark setting in the configuration |
| ce-type | Depending on the data: org.opentelemetry.otlp.traces.v1, org.opentelemetry.otlp.metrics.v1 or org.opentelemetry.otlp.logs.v1 |
| content-type | The content type is application/protobuf |
| content-encoding | Indicates that the payload is compressed. Only gzip compression is supported |
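A consumer can use these attributes to decide how to handle a message before parsing the payload. The following Python sketch maps the ce-type attribute to the signal it carries; the function name classify_message is hypothetical, and the attributes are modelled as a plain dictionary rather than a real Pub/Sub client object.

```python
# ce-type values listed in the attribute table, mapped to their signal.
CE_TYPES = {
    "org.opentelemetry.otlp.traces.v1": "traces",
    "org.opentelemetry.otlp.metrics.v1": "metrics",
    "org.opentelemetry.otlp.logs.v1": "logs",
}


def classify_message(attributes):
    """Return "traces", "metrics" or "logs" for a message's attributes.

    Hypothetical helper for illustration; `attributes` is a plain dict
    of message attribute names to string values.
    """
    if attributes.get("ce-specversion") != "1.0":
        raise ValueError("expected a CloudEvents 1.0 message")
    ce_type = attributes.get("ce-type")
    if ce_type not in CE_TYPES:
        raise ValueError(f"unexpected ce-type: {ce_type!r}")
    return CE_TYPES[ce_type]


signal = classify_message({
    "ce-specversion": "1.0",
    "ce-type": "org.opentelemetry.otlp.metrics.v1",
    "content-type": "application/protobuf",
})
print(signal)
```

The returned signal tells the consumer which of the three OTLP export request types to parse the data field as.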
Compression
By default, the messages are not compressed. Compressing the messages can reduce the cost of Pubsub to as little
as 20% of the uncompressed cost. This can be done by setting compression to gzip.
exporters:
  googlecloudpubsub:
    project: my-project
    topic: projects/my-project/topics/otlp-traces
    compression: gzip
The exporter will add the content-encoding attribute to the message. The receiver will look at this attribute
to detect the compression that is used on the payload.
Only gzip is supported.
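For a custom consumer, the same attribute check can be sketched in Python as follows. This is an illustration under assumptions, not the receiver's actual code: the function name decode_payload is hypothetical, the attributes are modelled as a plain dictionary, and a missing content-encoding attribute is treated as "not compressed".

```python
import gzip


def decode_payload(data, attributes):
    """Return the raw OTLP bytes of a message, decompressing if needed.

    Hypothetical helper for illustration; `data` is the message payload
    as bytes and `attributes` is a dict of message attributes.
    """
    encoding = attributes.get("content-encoding", "")
    if encoding == "":
        return data  # No attribute: the payload was published uncompressed.
    if encoding == "gzip":
        return gzip.decompress(data)
    raise ValueError(f"unsupported content-encoding: {encoding!r}")


original = b"serialized OTLP request"
compressed = gzip.compress(original)
restored = decode_payload(compressed, {"content-encoding": "gzip"})
print(restored == original)
```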
Watermark
A watermark is a threshold that indicates when a stream processing framework (like Apache Beam) expects all the
data in a window to have arrived. If new data arrives with a timestamp that's in the window but older than the
watermark, the data is considered late data. The watermark section changes the behaviour of the ce-time
attribute of the message. If you don't use such frameworks you can ignore this section and ce-time will
be set to the current time. For more reliable watermark behaviour in such streaming frameworks, it's better to set
the ce-time attribute to the earliest timestamp of the messages embedded in the Pubsub message.
Setting the behaviour to earliest scans all the embedded messages before sending the actual Pubsub message to
figure out what the earliest timestamp is. You have to set allow_drift, the allowed maximum drift of the ce-time
timestamp from the system clock, if you want the behaviour to have an effect, as the default is 0s.
exporters:
  googlecloudpubsub:
    project: my-project
    topic: projects/my-project/topics/otlp-traces
    watermark:
      behavior: earliest
      allow_drift: 1h
The default behavior is that the watermark is set to the current time of the processor. This timestamp will not differ
much from the timestamp that is attached to a Pubsub message. Most users who only use Pubsub
as a global distribution system will not need anything else.
If you use Google Cloud Dataflow and want to rely on its advanced streaming
features, you may want to change the behavior of the watermark and de-duplication. You can leverage the unique id (ce-id)
and timestamp (ce-time) attributes on the message. In Apache Beam (the framework used by Dataflow) you can set the
attribute names on the Pubsub connector
via the .withTimestampAttribute("ce-time") and .withIdAttribute("ce-id") methods. A good setting for this
scenario is behavior: earliest with a reasonable allow_drift of 1h.
Allowed behavior values are current or earliest. For allow_drift the default is 0s, so make sure to set the
value.
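The interaction between behavior and allow_drift described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the exporter's actual implementation: the function name compute_ce_time and its parameters are hypothetical, it assumes the result is never later than the system clock, and it deliberately models only a positive allow_drift.

```python
from datetime import datetime, timedelta, timezone


def compute_ce_time(behavior, now, timestamps, allow_drift):
    """Return the ce-time value for one Pub/Sub message (illustrative only).

    `timestamps` are the timestamps of the embedded telemetry items and
    `allow_drift` is a timedelta; both names are assumptions of this sketch.
    """
    if behavior not in ("current", "earliest"):
        raise ValueError(f"unsupported behavior: {behavior!r}")
    if behavior == "current" or not timestamps:
        return now
    if allow_drift <= timedelta(0):
        raise ValueError("this sketch only models a positive allow_drift")
    earliest = min(timestamps)
    oldest_allowed = now - allow_drift
    # Cap the watermark so it stays within allow_drift of the clock,
    # and never moves ahead of the clock.
    return min(now, max(earliest, oldest_allowed))


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
stamps = [now - timedelta(minutes=10), now - timedelta(hours=2)]
ce_time = compute_ce_time("earliest", now, stamps, timedelta(hours=1))
print(ce_time.isoformat())
```

In this example the oldest embedded timestamp is two hours old, so with an allow_drift of one hour the sketch caps ce-time at one hour before the clock.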
Ordering
When ordering is enabled (ordering.enabled), you are required to specify a resource attribute key that will be used as
the ordering key (ordering.from_resource_attribute). If this resource attribute is only meant to be used as an
ordering key, you may want to have it removed from the resource attributes
before publishing to PubSub by enabling the ordering.remove_resource_attribute configuration.
exporters:
  googlecloudpubsub:
    project: my-project
    topic: projects/my-project/topics/otlp-traces
    ordering:
      enabled: true
      from_resource_attribute: some.resource.attribute.key
      remove_resource_attribute: true
Notes
While the PubSub topic doesn't require any configuration for ordering, you will need to enable ordering on your
subscription(s) if you need it. Enabling ordering on a subscription is only possible at creation.
For composite ordering keys you'd need to compose the resource attribute value before exporting, e.g. by using a
transform processor.
Empty values in the ordering key are accepted but won't be ordered, see PubSub ordering documentation
for more details.
PubSub requires one publish request per ordering key value, so this exporter groups the signals per ordering key before
publishing.
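The grouping step can be pictured with a small Python sketch. It is an illustration under assumptions rather than the exporter's code: each resource is modelled as a plain dictionary of resource attributes, the function name group_by_ordering_key is hypothetical, and resources with a missing or empty attribute are collected under the empty key, which Pub/Sub treats as unordered.

```python
def group_by_ordering_key(resources, key_attribute, remove_attribute=False):
    """Group resource attribute dicts by the value of `key_attribute`.

    Hypothetical helper for illustration. Returns a dict that maps each
    ordering key to the resources that would share one publish request.
    """
    groups = {}
    for resource in resources:
        attributes = dict(resource)  # Copy so the caller's data is untouched.
        value = attributes.get(key_attribute)
        key = "" if value is None else str(value)
        if remove_attribute:
            attributes.pop(key_attribute, None)
        groups.setdefault(key, []).append(attributes)
    return groups


resources = [
    {"tenant": "a", "host": "h1"},
    {"tenant": "b"},
    {"host": "h2"},  # No ordering attribute: ends up under the empty key.
    {"tenant": "a", "host": "h3"},
]
groups = group_by_ordering_key(resources, "tenant", remove_attribute=True)
print(sorted(groups))
```

Each entry in the resulting dictionary corresponds to one publish request, which is why a high-cardinality ordering attribute leads to many small requests.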
Encoding and message attributes
The traces, metrics and logs sections allow you to specify Encoding Extensions for marshalling the messages on
the topic and the attributes on the Pub/Sub message. All the signals have the same config options.
It's important to note that when you use an extension, all the CloudEvent attributes are removed: because you use
your own encoder, the exporter can't know what values to set. You can set them manually instead.
extensions:
  otlp_encoding:
    protocol: otlp_json

exporters:
  googlecloudpubsub:
    project: my-project
    topic: projects/my-project/topics/otlp-traces
    traces:
      encoding: otlp_encoding
      attributes:
        "ce-type": "org.opentelemetry.otlp.traces.v1"
        "content-type": "application/json"
The encoding option allows you to specify Encoding Extensions for marshalling the messages on the topic. An
extension needs to be configured in the extensions section and enabled under service.extensions in the collector's
configuration file.
The attributes option allows you to set any attributes; the values are key/value pairs. You can avoid the removal of
CloudEvent attributes if you manually set ce-type and content-type to appropriate values for the chosen
encoding.