May 3, 2023 8 min read

Audit Events for Flipt

Yoofi Quansah
Events for Flipt

Image generated by MidJourney

Events and logging are vital for many organizations, either to keep track of history within the application or platform, or to take action once something happens. Two brief examples include:

  1. An organization keeping track of payment events that have happened on their platform
  2. A platform triggering a job to purge data somewhere if an event is of a particular nature

In Flipt's case, it may be essential for some organizations to know when feature flags are mutated, since those changes can affect the real-time behavior of the consumer’s application.

It's also important to know who made the change and when it was made, especially when it comes to trying to determine the root cause of an issue in the consuming application. This is where audit events come in.

We're happy to announce that support for these types of audit events is available as of v1.21 of Flipt! 🎉

Ideal Implementation

Before diving into how the feature was implemented, it is worth discussing what an ideal scenario looks like for Flipt and audit events. Many providers, whether SaaS or open source, leave it strictly up to the consumer to deal with events as they please. In most cases, this is done through a webhook URL that you provide to the platform which sends a request to that URL when an event occurs.

An application listening on the webhook endpoint can then do whatever it wants with the event. For instance, it can send the message to a Kafka broker, publish it to some other pub/sub system, or post the audit event to a dedicated Slack channel. This is the more laissez-faire approach.

What we decided on for an ideal implementation is first-class support for ‘native sinks’. In this model, Flipt houses the logic necessary to interact directly with the sinks that the user configures. Messages can go straight to a Kafka broker, pub/sub system, Slack channel, etc., as a direct form of communication. This spares users from writing their own backend to communicate with these systems and avoids the extra network hop.

The implementations for communicating with these native sinks would ideally share an abstraction point, so that users can easily understand it, contribute to Flipt, and write their own native sink implementations. One downside of this native approach, however, is that as more and more sinks are implemented, it can lead to version churn for the Flipt application. To combat this potential for churn and to make contributions more accessible, we are currently looking into a way to make the sink implementations pluggable in their own repositories, potentially using a plugin system like HashiCorp's go-plugin.

Implementation

Implementing any event-driven system comes with many considerations and trade-offs. On the producer side of the system, an ‘event’ usually represents something that has already happened in the system and that the consumer should be notified about. The benefit for the consumer is that once it receives the event, it knows the event is complete, and therefore does not have to poll the source and waste resources to get the complete event.

While the definition of an event-driven system seems simple enough, there are many considerations to keep in mind. In this case particularly, what happens to the producer or consumer during high-traffic scenarios? Or what if the consumer becomes unavailable for a portion of the time? These represent just a few of many considerations, and we sought to address those in our implementation.

OTEL Event Diagram
The two basic components of an event system. One has to consider where to implement the functionality to deal with the considerations described above.

Our first stab at an implementation was a basic homegrown publisher that can batch events and send those batches asynchronously to the configured sinks. This way the consumer is not burdened with a large number of small writes, and the publisher is free to do other work without worrying about communication failures with the consumer (addressing some of the considerations mentioned above).

In addition to a batch size, there should also be configuration for a flush period so that messages do not sit in memory indefinitely. Considering all of this, it was a pretty straightforward implementation, but halfway through we realized that this problem has already been solved several times for a lot of different use cases. That led us to pause the publisher implementation and seek out a more tried and true solution.
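To make the batch size and flush period idea concrete, here is a minimal sketch of such a publisher. It is not Flipt's code: the `Event`, `Sink`, `Publisher`, and `memorySink` names are hypothetical, and error handling is reduced to a print statement.

```go
package main

import (
	"fmt"
	"time"
)

// Event and Sink are hypothetical stand-ins, not Flipt's actual types.
type Event struct{ Type, Action string }

type Sink interface {
	Send(batch []Event) error
}

// Publisher buffers events and flushes them to the sink when the batch
// size is reached or the flush period elapses, whichever comes first.
type Publisher struct {
	events chan Event
	done   chan struct{}
}

func NewPublisher(sink Sink, batchSize int, flushPeriod time.Duration) *Publisher {
	p := &Publisher{events: make(chan Event, batchSize), done: make(chan struct{})}
	go p.run(sink, batchSize, flushPeriod)
	return p
}

func (p *Publisher) run(sink Sink, batchSize int, flushPeriod time.Duration) {
	defer close(p.done)
	ticker := time.NewTicker(flushPeriod)
	defer ticker.Stop()

	batch := make([]Event, 0, batchSize)
	flush := func() {
		if len(batch) == 0 {
			return
		}
		// A real publisher would retry or drop with a log line here.
		if err := sink.Send(batch); err != nil {
			fmt.Println("send failed:", err)
		}
		batch = make([]Event, 0, batchSize)
	}

	for {
		select {
		case e, ok := <-p.events:
			if !ok { // channel closed: flush what is left and stop
				flush()
				return
			}
			batch = append(batch, e)
			if len(batch) >= batchSize {
				flush()
			}
		case <-ticker.C:
			flush()
		}
	}
}

// Publish enqueues an event; it blocks only when the buffer is full.
func (p *Publisher) Publish(e Event) { p.events <- e }

// Close flushes any remaining events and waits for the worker to exit.
func (p *Publisher) Close() {
	close(p.events)
	<-p.done
}

// memorySink keeps batches in memory so the behaviour can be inspected.
type memorySink struct{ batches [][]Event }

func (s *memorySink) Send(batch []Event) error {
	s.batches = append(s.batches, batch)
	return nil
}

func main() {
	sink := &memorySink{}
	p := NewPublisher(sink, 2, time.Hour)
	for i := 0; i < 5; i++ {
		p.Publish(Event{Type: "flag", Action: "created"})
	}
	p.Close()
	for _, b := range sink.batches {
		fmt.Println("batch of", len(b))
	}
}
```

Even in this small form the publisher has to own a goroutine, a timer, and shutdown semantics, and it still lacks retries and backpressure policies. That is the kind of surface area that makes an existing, well-tested solution attractive.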

Enter OpenTelemetry


For the sake of brevity, this article does not cover OpenTelemetry itself, only how it was used in Flipt to achieve the desired functionality for audit events. For trace data, OpenTelemetry (OTEL) not only provides configurable batching and flushing, but also allows users to provide implementations of Exporters, which OTEL uses to send trace data to whatever destination the implementation specifies. In simpler words, OTEL already does everything the homegrown publisher was built to do, and it already exists as a dependency in the Flipt source code for exporting trace data to Jaeger, Zipkin, or OTLP.

Some context on what a Trace and a Span are, and how they tie into audit events for Flipt, may help here. Essentially, a Trace represents a unique transaction made up of a collection of operations throughout an application. A Span represents a single unit within that Trace, that is, a single operation. Spans allow for the addition of Events, which are viewable through any tracing provider client. The addition of Events is the bit that we leveraged for audit events.

Specifically, in the code we’ve implemented a middleware that intercepts the request and determines whether it is auditable. If it is, the code adds the details of that audit event to the Span via the Event mechanism described above. When the batch size or flush period is reached for the SpanProcessor, events are pulled off of the Span(s) and converted into a data structure housed in our code, which is then sent to the native sinks that the user has configured on their Flipt instance. In addition to viewing the audit events in their sink implementation, an added benefit of using OTEL is that the span events show up in the tracing provider client for users to view.
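The middleware step can be sketched roughly as follows. This deliberately uses stand-in types rather than the real OpenTelemetry or gRPC APIs, so `Span`, `Handler`, `AuditMiddleware`, the request types, and the `flipt.event.*` attribute keys are all assumptions made for illustration, not Flipt's actual identifiers.

```go
package main

import (
	"context"
	"fmt"
)

// Span is a stand-in for a tracing span; it is NOT the OpenTelemetry API.
type Span struct {
	Name   string
	Events []map[string]string
}

func (s *Span) AddEvent(attrs map[string]string) { s.Events = append(s.Events, attrs) }

type spanKey struct{}

func withSpan(ctx context.Context, s *Span) context.Context {
	return context.WithValue(ctx, spanKey{}, s)
}

func spanFrom(ctx context.Context) *Span {
	s, _ := ctx.Value(spanKey{}).(*Span)
	return s
}

// Hypothetical request types standing in for Flipt's RPC messages.
type CreateFlagRequest struct{ Key string }
type GetFlagRequest struct{ Key string }

// Handler is a simplified request handler signature.
type Handler func(ctx context.Context, req any) (any, error)

// auditable reports whether a request mutates state and, if so, which
// entity and action it maps to. Reads are not auditable.
func auditable(req any) (entity, action string, ok bool) {
	switch req.(type) {
	case *CreateFlagRequest:
		return "flag", "created", true
	default:
		return "", "", false
	}
}

// AuditMiddleware wraps a handler and, for successful auditable requests,
// attaches the audit details to the span found in the context.
func AuditMiddleware(next Handler) Handler {
	return func(ctx context.Context, req any) (any, error) {
		resp, err := next(ctx, req)
		if err != nil {
			return resp, err // this sketch records successful mutations only
		}
		if entity, action, ok := auditable(req); ok {
			if span := spanFrom(ctx); span != nil {
				span.AddEvent(map[string]string{
					"flipt.event.type":   entity, // hypothetical key names
					"flipt.event.action": action,
				})
			}
		}
		return resp, nil
	}
}

// demo sends one mutating and one read request through the middleware.
func demo() *Span {
	span := &Span{Name: "demo"}
	ctx := withSpan(context.Background(), span)
	h := AuditMiddleware(func(ctx context.Context, req any) (any, error) {
		return "ok", nil
	})
	_, _ = h(ctx, &CreateFlagRequest{Key: "scenter"})
	_, _ = h(ctx, &GetFlagRequest{Key: "scenter"})
	return span
}

func main() {
	for _, e := range demo().Events {
		fmt.Println(e["flipt.event.type"], e["flipt.event.action"])
	}
}
```

Only the create request leaves an event on the span; the read passes through untouched. In the real system, the span processor's batching and flushing then decide when those events are handed to the sinks.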

Jaeger UI with Event
The audit event added to the flipt.Flipt/CreateConstraint span. The flipt* properties represent the audit event.

What's Released

With this initial release, there is only functionality for audit event logging to a file on disk. Users can specify the name of that log file, and Flipt will write the audit events to it in a JSON-encoded format.

If you were to tail your audit event log file during typical Flipt usage, you might see something like the following:

{"version":"0.1","type":"flag","action":"created","metadata":{"actor":{"authentication":"none","ip":"127.0.0.1"}},"payload":{"description":"your favorite NBA team","enabled":true,"key":"scenter","name":"scenter","namespace_key":"default"},"timestamp":"2023-05-01T14:01:12-05:00"}
{"version":"0.1","type":"variant","action":"created","metadata":{"actor":{"authentication":"none","ip":"127.0.0.1"}},"payload":{"attachment":"","description":"","flag_key":"scenter","id":"41e84846-bfbd-426f-afa9-3358d0d40875","key":"lakers","name":"","namespace_key":"default"},"timestamp":"2023-05-01T14:01:12-05:00"}
{"version":"0.1","type":"segment","action":"created","metadata":{"actor":{"authentication":"none","ip":"127.0.0.1"}},"payload":{"constraints":[],"description":"Segments for evaluation NBA teams","key":"postclavicula","match_type":"ANY_MATCH_TYPE","name":"postclavicula","namespace_key":"default"},"timestamp":"2023-05-01T14:01:12-05:00"}
{"version":"0.1","type":"constraint","action":"created","metadata":{"actor":{"authentication":"none","ip":"127.0.0.1"}},"payload":{"id":"a7ccde21-870e-4e36-953c-4d5849d96a77","namespace_key":"default","operator":"eq","property":"championship","segment_key":"postclavicula","type":"STRING_COMPARISON_TYPE","value":"most"},"timestamp":"2023-05-01T14:01:12-05:00"}
{"version":"0.1","type":"flag","action":"updated","metadata":{"actor":{"authentication":"none","ip":"127.0.0.1"}},"payload":{"description":"your favorite NBA team","enabled":false,"key":"eyepoint","name":"eyepoint","namespace_key":"default"},"timestamp":"2023-05-01T14:01:37-05:00"}
{"version":"0.1","type":"token","action":"created","metadata":{"actor":{"authentication":"none","ip":"127.0.0.1"}},"payload":{"io.flipt.auth.token.description":"hello","io.flipt.auth.token.name":"worldh"},"timestamp":"2023-05-01T14:02:45-05:00"}

Currently, the auditable events are all CRUD operations (sans Read) upon the following entities:

Each logged event has the following fields:

  • type: the type of entity changing
  • action: the action taken upon the entity
  • metadata: extra information about the event, such as the identity of the subject who initiated the event
  • payload: the actual payload of the request on Flipt
  • timestamp: the timestamp of when the event was created

Any combination of actions on those entities will result in an event with the above fields being logged out to the file you specify via configuration:

audit:
  sinks:
    log:
      enabled: true
      file: /tmp/flipt/audit.log
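
Because each line in the log file is a standalone JSON object, consuming it from your own tooling is straightforward. The following is a minimal sketch of decoding one line in Go. The `AuditEvent` struct and `ParseAuditLine` function are hypothetical names, and the struct shape is inferred from the sample output above rather than taken from Flipt's source.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Actor and Metadata mirror the "metadata" object seen in the sample log lines.
type Actor struct {
	Authentication string `json:"authentication"`
	IP             string `json:"ip"`
}

type Metadata struct {
	Actor Actor `json:"actor"`
}

// AuditEvent is a hypothetical struct inferred from the sample output.
// The payload differs per entity type, so it is kept as a generic map.
type AuditEvent struct {
	Version   string         `json:"version"`
	Type      string         `json:"type"`
	Action    string         `json:"action"`
	Metadata  Metadata       `json:"metadata"`
	Payload   map[string]any `json:"payload"`
	Timestamp time.Time      `json:"timestamp"` // RFC 3339 in the samples
}

// ParseAuditLine decodes a single line of the audit log.
func ParseAuditLine(line string) (AuditEvent, error) {
	var ev AuditEvent
	err := json.Unmarshal([]byte(line), &ev)
	return ev, err
}

func main() {
	line := `{"version":"0.1","type":"flag","action":"updated","metadata":{"actor":{"authentication":"none","ip":"127.0.0.1"}},"payload":{"description":"your favorite NBA team","enabled":false,"key":"eyepoint","name":"eyepoint","namespace_key":"default"},"timestamp":"2023-05-01T14:01:37-05:00"}`

	ev, err := ParseAuditLine(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%s %s (key=%v) from %s at %s\n",
		ev.Type, ev.Action, ev.Payload["key"], ev.Metadata.Actor.IP,
		ev.Timestamp.UTC().Format(time.RFC3339))
}
```

Keeping `payload` as a map avoids committing to a schema per entity type. If you only care about one entity, a dedicated struct for its payload would give you stronger typing.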

We also have an example of how you could potentially collect, parse, visualize, and query these events using Grafana's Promtail + Loki in our examples on GitHub. This would allow you to query the audit events in a similar fashion to how you would query logs in a traditional logging system, even when Flipt is running in a distributed fashion.

Grafana Loki Query
Querying Flipt audit events using Grafana Loki

What's Next

We are excited to see how users interact with this feature now that it is released, and we look forward to community discussion on potential event sinks that have not been added yet.

Contributions are always appreciated! This document describes how you can implement your own audit event sink.

We hope you found this post useful. As always, you can find us on GitHub, Discord, Twitter, or Mastodon.
