Azure SDK Design Guidelines
Version 1.0.0
Adrian Hall
Azure SDK Team / PM
Brian Terlson
Azure SDK Team / JS & TS
Jeffrey Richter
Azure SDK Team / .NET
Johan Stenberg
Azure SDK Team / Python
Jonathan Giles
Azure SDK Team / Java
Krzysztof Cwalina
Azure SDK Team / .NET
Peter Marcu
Azure SDK Team / Mgmt


This document describes the architectural considerations and API design guidelines for the Azure SDK client libraries.

This document applies to all languages. There are language-specific guidelines for:

If you are developing a client library in one of the above languages, consult the design guidelines for that specific language instead of these guidelines.

The general guidelines contained herein are for the benefit of:

In the latter case, please work closely with the Azure Developer Platform Architecture Board to ensure that the client library is appropriately designed and the developer experience is exemplary.

1. Introduction

1.1. Design principles

The Azure SDK should be designed to enhance the productivity of developers connecting to Azure services. Other qualities (such as completeness, extensibility, and performance) are important but secondary. Productivity is achieved by adhering to the principles described below:

1.1.1. Idiomatic

1.1.2. Consistent

1.1.3. Approachable

1.1.4. Diagnosable

1.1.5. Compatible

1.2. Terminology

Throughout these documents, the following terms should be understood:

adparch
The Azure Developer Platform Architecture Board, which is a board comprised of language experts who advise and review client libraries used for accessing Azure services.
Azure SDK
The collection of client libraries for a single target language, used for accessing Azure services.
Azure Core
A dependency of many client libraries. The Azure Core library provides access to the HTTP pipeline, common credential types, and other types that are appropriate to the Azure SDK as a whole.
Client Library
A library (and associated tools, documentation, and samples) that consumers use to ease working with an Azure service. There is generally a client library per Azure service and per target language. Sometimes a single client library will contain the ability to connect to multiple services.
Consumer
Where appropriate to disambiguate between the various types of developers, we use the term consumer to indicate the developer who is using a client library in an app to connect to an Azure service.
Docstrings
The comments embedded within the code that describe the API surface being implemented. The docstrings are extracted and post-processed during the build to generate API reference documentation.
Library Developer
Where appropriate to disambiguate between the various types of developers, we use the term library developer to indicate the developer who is writing a client library.
Package
A client library after it has been packaged for distribution to consumers. Packages are generally installed using a package manager from a package repository.
Package Repository
Each client library is published separately to the appropriate language-specific package repository. For example, we distribute JavaScript libraries to npmjs.org (also known as the NPM Registry), and Python libraries to PyPI. These releases are performed exclusively by the Azure SDK engineering systems team. Consumers install packages using a package manager. For example, a JavaScript consumer might use yarn, npm, or similar, whereas a Python consumer will use pip to install packages into their project.
Progressive Concept Disclosure
The first interaction with the client library should not rely on advanced service concepts. As the consumer of the library becomes more adept, we expose the concepts necessary at the point at which the consumer needs those concepts for implementation. Progressive disclosure was first discussed by the Nielsen Norman Group as an approach to designing better user interfaces.

1.3. Definitions

Each requirement in this document is labelled and color-coded to show the relative importance.

DO adopt this requirement for the client library. Exception is by prior approval of adparch only.

⛔️ DO NOT adopt this requirement for the client library. Exception is by prior approval of adparch only.

☑️ YOU SHOULD strongly consider this requirement for the client library. If varying from this advice, note the fact during the client library board review.

⚠️ YOU SHOULD NOT adopt this requirement for the client library. If varying from this advice, note the fact during the client library board review.

☑️ YOU MAY consider this advice if appropriate. No notification to the architecture board is required.

1.4. Process

Building a new client library is a multi-month effort requiring careful design, review, and testing. When developing a set of client libraries for your service, note the following requirements:

DO provide SDKs for the following languages:

☑️ YOU MAY provide client libraries for other languages based on use case. Consult adparch if you need assistance in following design guidelines for other languages.

1.4.1. Starting the client library design process

You should start the client library design process at the same time that you start thinking about the public-facing interfaces for your Azure service. At this point, you can engage an API Design Architect in an initial design meeting. During this meeting, you will discuss the service offering and a proposed REST API design; the architect will offer advice on the REST API design and suggestions for the client library experience.

Once you have a solid REST API Design, you should build the appropriate Swagger file and get it approved through the Azure OpenAPI Hub. Alongside this effort, you can start building client libraries for each language, integrate the library into the appropriate monorepo for the language, and onboard to the Azure SDK Engineering Systems.

1.4.2. The review process

Your first step in building the client library should be designing the API surface that the consumer will use to access your service. Like all design processes, this involves identifying the champion scenarios - a set of key use cases that you expect the majority of your users to use to access your service. Once the champion scenarios are identified, you will write the same code that the consumer of your client library will write to fulfill those scenarios, complete with error handling, diagnostics, and integration with other client libraries. This exercise informs the design of the API surface.

Once you have an initial design for the API surface, engage adparch again to schedule a full architectural review of your API surface. You should schedule this before major development starts on your client library as changes can be requested during the adparch review.

Approval of your API design indicates that you can work with the Azure SDK Engineering Systems group on releasing your client libraries to the various language-specific repositories.

2. Open source

DO ensure that all library code is public and open-source on GitHub. Library code must be placed in the Azure SDK ‘mono-repo’ for its language:

☑️ YOU SHOULD develop in the open on GitHub. Seek feedback from the community on design choices and be active in conversations with the community.

DO remain active in GitHub. Your client library is your primary touchpoint with the developer community, so it's important to keep up with the activity there. Issues and pull requests on GitHub must have an authoritative comment within one week of filing.

DO review the Microsoft Open Source Guidelines' community section for more information on fostering a healthy open-source community.

DO use the Microsoft CLA. Microsoft makes significant contributions to cla-assistant. It is the easiest way to ensure the CLA is signed by all contributors.

DO include a copyright header at the top of every source file (including samples). See the Microsoft Open Source Guidelines for example headers in various languages.

2.1. CONTRIBUTING.md

DO include a CONTRIBUTING.md file in your GitHub repository, using it to describe the process by which contributors can make contributions to the project. An example CONTRIBUTING.md is provided by the Microsoft Open Source Guidelines:

# Contributing

This project welcomes contributions and suggestions. Most contributions require you to
agree to a Contributor License Agreement (CLA) declaring that you have the right to,
and actually do, grant us the rights to use your contribution. For details, visit
https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need
to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the
instructions provided by the bot. You will only need to do this once across all repositories using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)
or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

2.2. LICENSE

DO include a LICENSE file containing your license text (which by default should be the standard MIT license).

2.3. CODEOWNERS

CODEOWNERS is a GitHub standard to specify who is automatically assigned pull requests to review. This helps prevent pull requests from languishing without review. GitHub can also be configured to require review from code owners before a pull request can be merged. Further reading is available from the following two URLs:

DO edit the root-level CODEOWNERS file to ensure that it is updated to redirect all pull requests for the directory of the client library to point to the relevant engineers of this component. If the client library will exist within its own repository, then a CODEOWNERS file must be introduced and configured appropriately.

3. The Client library

3.1. The API surface

The API surface of your client library deserves the most design attention, as it is the primary interaction the consumer has with your service.

3.1.1. Namespaces

Some languages have a concept of namespaces to group related types. Grouping services within a cloud infrastructure is common since it aids discoverability and provides structure to the reference documentation.

In cases where namespaces are supported, the namespace should be named <AZURE>.<group>.<service>. All consumer-facing APIs that are commonly used should exist within this namespace. Here:

DO start the namespace with “Azure” or “com.azure” or similar to indicate an Azure client library.

DO pick a package name that allows the consumer to tie the namespace to the service being used. As a default, use the compressed service name at the end of the namespace. The namespace does NOT change when the branding of the product changes, so avoid the use of marketing names that may change. (See below for examples).

A compressed service name is the service name without spaces. It may further be shortened if the shortened version is well known in the community. For example, “Azure Media Analytics” would have a compressed service name of “MediaAnalytics”, whereas “Azure Service Bus” would become “ServiceBus”.

DO use the following list as the group of services (if the target language supports namespaces):

Namespace Group Functional Area
ai Artificial intelligence, including machine learning
analytics Gathering data for metrics or usage
data Dealing with structured data stores like databases
diagnostics Gathering data for diagnosing issues
identity Authentication and authorization
iot Internet of things
management Control Plane (ARM)
media Audio, video, or mixed reality
messaging Messaging services, like push notifications or pub-sub
search Search technologies
security Security and cryptography
storage Storage of unstructured data

DO place the management (ARM) API in the “management” group. Use the grouping <AZURE>.management.<group>.<service> for the namespace. Since more services require control plane APIs than data plane APIs, other namespaces may be used explicitly for control plane only. Data plane usage is by exception only. Additional namespaces that can be used for control plane SDKs include:

Namespace Group Functional Area
appmodel Application models, such as Functions or App Frameworks
compute Virtual machines, containers, and other compute services
integration Integration services (such as Logic Apps)
management Management services (such as Cost Analytics)
networking Services such as VPN, WAN, and Networking

Many management APIs do not have a data plane. For these, it's reasonable to place the client library in the <AZURE>.management namespace. For example, use azure.management.cost-analysis, not azure.management.management.cost-analysis.

⛔️ DO NOT choose similar names for clients that do different things.

DO register the chosen namespace with adparch.

Here are some examples of selections that meet the guidelines (from the .NET world):

Here are some examples that do not meet the guidelines:

If the client library does not seem to fit into the group list, contact adparch for advice on the most appropriate group. If you feel your service requires a new group, then open a “Design Guidelines Change” request.

3.1.2. Client interface

In general, your API surface will consist of one or more service clients that the consumer will instantiate to connect to your service, plus a set of supporting types.

DO name service client types with the Client suffix.

There are times when operations require the addition of optional data, provided in what is colloquially known as an “options bag”. Libraries should strive for consistent naming.

☑️ YOU SHOULD name the type for service client option bags with the ClientOptions suffix.

☑️ YOU SHOULD name operation option bag types with the Options suffix. For example, if the operation is GetSecret, then the type of the options bag would be called GetSecretOptions.

DO place service client types that the consumer is most likely to interact with in the root namespace of the client library (assuming namespaces are supported in the target language). Specialized service clients may be placed in sub-namespaces.

DO allow the consumer to construct a service client with the minimal information needed to connect and authenticate to the service.
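The client and options-bag naming guidance above can be sketched in Python. The types and field names below (SecretClient, GetSecretOptions, vault_url, and so on) are illustrative assumptions, not a prescribed Azure API:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical options bags following the *ClientOptions / *Options suffixes.
@dataclass
class SecretClientOptions:
    retry_total: int = 3
    logging_enable: bool = False

@dataclass
class GetSecretOptions:
    version: Optional[str] = None

class SecretClient:
    """Service client named with the 'Client' suffix; constructible from the
    minimal information needed to connect and authenticate."""
    def __init__(self, vault_url: str, credential: object,
                 options: Optional[SecretClientOptions] = None):
        self.vault_url = vault_url
        self._credential = credential
        self._options = options or SecretClientOptions()

    def get_secret(self, name: str,
                   options: Optional[GetSecretOptions] = None) -> str:
        # A real client would issue an authenticated HTTP request here;
        # this sketch only builds the request URL.
        opts = options or GetSecretOptions()
        url = f"{self.vault_url}/secrets/{name}"
        return f"{url}/{opts.version}" if opts.version else url
```

The operation-level options bag keeps rarely-used parameters out of the method's required signature.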

DO standardize verb prefixes within a set of client libraries for a service. The service must be able to speak about a specific operation in a cross-language manner within outbound materials (such as documentation, blogs, and public speaking). They cannot do this if the same operation is referred to by different verbs in different languages.

DO support 100% of the features provided by the Azure service the client library represents. Gaps in functionality cause confusion and frustration among developers.

3.1.3. Network requests

Since the client library generally wraps one or more HTTP requests, it is important to support standard network capabilities. Asynchronous programming techniques are not widely understood, although such techniques are essential in developing scalable web services and required in certain environments (such as mobile or Node environments). Many developers prefer synchronous method calls for their easy semantics when learning how to use a technology. In addition, consumers have come to expect certain capabilities in a network stack - capabilities such as call cancellation, automatic retry, and logging.

DO support both synchronous and asynchronous method calls, except where the language (or default runtime) does not support one or the other.

DO ensure that the consumer can identify which methods are async and which are synchronous.

When an application makes a network request, the network infrastructure (like routers) and the called service may take a long time to respond and, in fact, may never respond. A well-written application SHOULD NEVER give up its control to the network infrastructure or service. Here are some examples as to why this is so important:

The best way for consumers to work with cancellation is to think of cancellation objects as forming a tree. For example:

Here is an example of how an application would use the tree of cancellations:
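A minimal sketch of such a tree in Python (the CancellationToken class below is hypothetical for illustration; Python's standard library has no such type): cancelling a parent token cancels all of its children, while cancelling a child leaves the parent and sibling operations running.

```python
class CancellationToken:
    """Illustrative cancellation object that participates in a tree."""
    def __init__(self, parent: "CancellationToken" = None):
        self._cancelled = False
        self._children = []
        if parent is not None:
            parent._children.append(self)

    def cancel(self) -> None:
        # Cancellation propagates down the tree, never up.
        self._cancelled = True
        for child in self._children:
            child.cancel()

    @property
    def cancelled(self) -> bool:
        return self._cancelled

# The application creates one token per logical task, with per-request
# children; cancelling the task token cancels every outstanding request.
task = CancellationToken()
request_a = CancellationToken(parent=task)
request_b = CancellationToken(parent=task)
request_a.cancel()              # give up on one request only
task_still_running = not task.cancelled
task.cancel()                   # abandon the whole task; request_b cancels too
```
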

DO accept platform-native cancellation tokens (that implement a timeout) on all asynchronous calls.

☑️ YOU SHOULD only check cancellation tokens on I/O calls (such as network requests and file loads). Do not check cancellation tokens in between I/O calls within the client library (for example, when processing data between I/O calls).

⛔️ DO NOT leak the underlying protocol transport implementation details to the consumer. All types from the protocol transport implementation must be appropriately abstracted.

3.1.4. Authentication

Azure services use a variety of different authentication schemes to allow clients to access the service. Conceptually, there are two entities responsible in this process: a credential and an authentication policy. Credentials provide confidential authentication data. Authentication policies use the data provided by a credential to authenticate requests to the service.

DO support all authentication techniques that the service supports.

DO use credential and authentication policy implementations from the Azure Core library where available.

DO provide credential types that can be used to fetch all data needed to authenticate a request to the service in a non-blocking atomic manner for each authentication scheme that does not have an implementation in Azure Core.

DO provide service client constructors or factories that accept any supported authentication credentials.

Client libraries may support providing credential data via a connection string ONLY IF the service provides a connection string to users via the portal or other tooling. Connection strings are generally good for getting started as they are easily integrated into an application by copy/paste from the portal. However, connection strings are considered a lesser form of authentication because the credentials cannot be rotated within a running process.

⛔️ DO NOT support constructing a service client with a connection string unless such connection string is available within tooling (for copy/paste operations).
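Where a connection string is surfaced in tooling, a common shape is a named factory alongside the credential-based constructor. This sketch assumes a hypothetical QueueClient and a `Key=Value;...` connection-string format for illustration:

```python
class QueueClient:
    """Credential-based construction is the primary path; the
    connection-string factory exists only because the portal surfaces
    connection strings for copy/paste."""
    def __init__(self, endpoint: str, credential: object):
        self.endpoint = endpoint
        self._credential = credential

    @classmethod
    def from_connection_string(cls, conn_str: str) -> "QueueClient":
        # Assumed format: "Endpoint=...;SharedAccessKey=..." (illustrative).
        parts = dict(p.split("=", 1) for p in conn_str.split(";") if p)
        return cls(endpoint=parts["Endpoint"],
                   credential=parts.get("SharedAccessKey"))
```

Keeping the factory separate from the constructor makes the lesser form of authentication an explicit opt-in.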

3.1.5. Response formats

Requests to the service fall into two basic groups - methods that make a single logical request, or a deterministic sequence of requests. An example of a single logical request is a request that may be retried inside the operation. An example of a deterministic sequence of requests is a paged operation.

The logical entity is a protocol neutral representation of a response. For HTTP, the logical entity may combine data from headers, body and the status line. A common example is exposing an ETag header as a property on the logical entity in addition to any deserialized content from the body.

DO optimize for returning the logical entity for a given request. The logical entity MUST represent the information needed in the 99%+ case.

DO make it possible for a developer to get access to the complete response, including the status line, headers and body. The client library MUST follow the language specific guidance for accomplishing this.

DO document and provide examples on how to access the raw and streamed response for a given request, where exposed by the client library. We do not expect all methods to expose a streamed response.
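As a sketch of the logical-entity guidance in Python (the types and field names are assumptions for illustration), a logical entity can combine deserialized body fields with header data such as the ETag, while the complete transport response remains reachable:

```python
import json
from dataclasses import dataclass
from typing import Dict

@dataclass
class RawResponse:
    """Complete transport-level response: status line, headers, and body."""
    status_code: int
    headers: Dict[str, str]
    body: bytes

@dataclass
class ConfigurationSetting:
    """Logical entity: body fields plus the ETag response header."""
    key: str
    value: str
    etag: str

def deserialize_setting(raw: RawResponse) -> ConfigurationSetting:
    data = json.loads(raw.body)
    return ConfigurationSetting(key=data["key"], value=data["value"],
                                etag=raw.headers.get("ETag", ""))
```

A client method would return the ConfigurationSetting for the 99%+ case, while a separate language-idiomatic mechanism exposes the RawResponse.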

DO provide a language idiomatic way to enumerate all logical entities for a paged operation, automatically fetching new pages as needed. For example, in Python:

# Yes:
for instance in client.list_instances():
    print(instance)

# No - don't force the caller of the library to do paging:
next_page = None
done = False
while not done:
    list_instance_result = client.list_instances(page=next_page)
    for instance in list_instance_result.response():
        print(instance)
    next_page = list_instance_result.next_page
    done = next_page is None

For methods that combine multiple requests into a single call:

⛔️ DO NOT return headers and other per-request metadata unless it is obvious which specific HTTP request the method's return value corresponds to.

DO provide enough information in failure cases for an application to take appropriate corrective action.

⚠️ YOU SHOULD NOT use the following as a property name within the models returned within the logical entity.

Such usage can cause confusion and will inevitably have to be changed on a per-language basis, which can cause consistency problems.

3.1.6. Pagination

These guidelines eschew low-level pagination APIs in favor of high-level abstractions. High-level APIs are easy for developers to use for the majority of use cases, but can be harder to work with when finer-grained control is required (for example, over-quota/throttling) or when debugging things that go wrong. Other guidelines in this document work to mitigate this limitation, for example by providing robust logging, tracing, and pipeline customization options.

DO expose paginated collections using language-canonical iterators over items within pages. The APIs used to expose the async iterators are language-dependent but should align with any existing common practices in your ecosystem.

DO expose paginated collections using an iterator or cursor pattern if async iterators aren't a built-in feature of your language.

DO expose non-paginated list endpoints identically to paginated list endpoints. Users shouldn't need to be aware of the difference.

DO use distinct types for an entity in a list endpoint and an entity returned from a get endpoint if the service returns different types; otherwise, you must use the same type in both situations.

Note: ⚠️ Services should refrain from returning a different type for an entity as it appears in a list than for the result of a GET request for that individual item; avoiding the difference keeps the client library's surface area simpler.

⛔️ DO NOT expose an iterator over each individual item if getting each item requires a corresponding GET request to the service. One GET per item is often too expensive and so not an action we want to take on behalf of users.

⛔️ DO NOT expose an API to get a paginated collection into an array. This is a dangerous capability for services which may return many, many pages.

DO expose paging APIs when iterating over a collection. Paging APIs must accept a continuation token (from a prior run) and a maximum number of items to return, and must return a continuation token as part of the response so that the iterator may continue, potentially on a different machine.
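The item-level iterator and continuation-token requirements above can be sketched together in Python. The function names and the token format (a stringified offset) are illustrative assumptions, not a real service protocol:

```python
from typing import Iterator, List, Optional, Tuple

def fetch_page(continuation: Optional[str],
               max_items: int) -> Tuple[List[int], Optional[str]]:
    """Stand-in for one service request: returns a page of items plus the
    continuation token for the next page (None when exhausted)."""
    start = int(continuation or 0)
    data = list(range(10))          # pretend the service holds ten items
    page = data[start:start + max_items]
    next_token = (str(start + max_items)
                  if start + max_items < len(data) else None)
    return page, next_token

def list_items(continuation: Optional[str] = None,
               page_size: int = 3) -> Iterator[int]:
    """Item-level iterator that transparently follows continuation tokens.
    Accepting a prior token lets iteration resume on another machine."""
    token = continuation
    while True:
        page, token = fetch_page(token, page_size)
        yield from page
        if token is None:
            return
```

The consumer simply writes `for item in list_items():`; a caller needing checkpointing uses `fetch_page` and persists the token between runs.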

3.1.7. Long-Running Operations

Long-running operations are operations which consist of an initial request to start the operation followed by polling to determine when the operation has completed or failed. Long-running operations in Azure tend to follow the REST API guidelines for Long-running Operations, but there are exceptions.

DO represent long-running operations with some object that encapsulates the polling and the operation status. This object, called a “poller”, must provide APIs for:

  1. querying the current operation state (either asynchronously, which may consult the service, or synchronously which must not)
  2. requesting an asynchronous notification when the operation has completed
  3. cancelling the operation if cancellation is supported by the service
  4. registering disinterest in the operation so polling stops
  5. triggering a poll operation manually (automatic polling must be disabled)
  6. progress reporting (if supported by the service)

DO support the following polling configuration options. Polling configuration may be used only in the absence of relevant retry-after headers from service, and otherwise should be ignored.

DO prefix method names which return a poller with either begin or start.

DO provide a way to instantiate a poller with the serialized state of another poller to begin where it left off, for example by passing the state as a parameter to the same method which started the operation, or by directly instantiating a poller with that state.
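A poller meeting the naming and serialized-state requirements might look like the following sketch (the class, the operation states, and the two-poll completion are illustrative assumptions):

```python
import json

class OperationPoller:
    """Sketch of a poller: synchronous status access, manual polling, and
    serializable state so a new poller, possibly on another machine, can
    resume where this one left off."""
    def __init__(self, operation_id: str, state: str = "InProgress",
                 polls: int = 0):
        self._operation_id = operation_id
        self._state = state
        self._polls = polls

    def status(self) -> str:
        return self._state          # synchronous: must not hit the service

    def poll(self) -> str:
        # Stand-in for one GET to the operation-status endpoint; this toy
        # operation succeeds after two polls.
        self._polls += 1
        if self._polls >= 2:
            self._state = "Succeeded"
        return self._state

    def serialize(self) -> str:
        return json.dumps({"id": self._operation_id,
                           "state": self._state, "polls": self._polls})

    @classmethod
    def deserialize(cls, blob: str) -> "OperationPoller":
        d = json.loads(blob)
        return cls(d["id"], d["state"], d["polls"])

def begin_copy(operation_id: str) -> OperationPoller:
    """Methods returning a poller carry the 'begin' (or 'start') prefix."""
    return OperationPoller(operation_id)
```
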

⛔️ DO NOT cancel the long-running operation when cancellation is requested via a cancellation token. The cancellation token is cancelling the polling operation and should not have any effect on the service.

DO log polling status at the info level (including time-to-next-retry).

DO expose a progress reporting mechanism to the consumer if the service reports progress as part of the polling operation. Language-dependent guidelines will present additional guidance on how to expose progress reporting in this case.

3.1.8. Supporting non-HTTP protocols

Most Azure services expose a RESTful API over HTTPS. However, a few services use other protocols, such as AMQP, MQTT, or WebRTC. In these cases, the operation of the protocol can be split into two phases:

The policies that are added to a HTTP request/response (authentication, unique request ID, telemetry, distributed tracing, and logging) are still valid on both a per-connection and per-operation basis. However, the methods by which these policies are implemented are protocol dependent.

DO implement as many of the policies as possible on a per-connection and per-operation basis.

For example, MQTT over WebSockets provides the ability to add headers during the initiation of the WebSockets connection, so this is a good place to add authentication, telemetry, and distributed tracing policies. However, MQTT has no metadata (the equivalent of HTTP headers), so per-operation policies are not possible. AMQP, by contrast, does have per-operation metadata. Unique request IDs and distributed tracing headers can be provided on a per-operation basis with AMQP.

DO consult adparch on policy decisions for non-HTTP protocols. Implementation of all policies is the standard requirement. Exceptions must be discussed and approved by adparch.

Consumers will expect the client library to honor global configuration that they have established for the entire Azure SDK.

DO use the global configuration established in the Azure Core library to configure policies for non-HTTP protocols.

3.2. Implementing the API

Once you have worked through an acceptable API surface, you can start implementing the service clients.

3.2.1. Configuration

When configuring your client library, particular care must be taken to ensure that the consumer of your client library can properly configure the connectivity to your Azure service both globally (along with other client libraries the consumer is using) and specifically with your client library.

Client configuration

DO use relevant global configuration settings either by default or when explicitly requested to by the user, for example by passing in a configuration object to a client constructor.

DO allow different clients of the same type to use different configurations.

DO allow consumers of your service clients to opt out of all global configuration settings at once.

DO allow all global configuration settings to be overridden by client-provided options. The names of these options should align with any user-facing global configuration keys.

⛔️ DO NOT change behavior based on configuration changes that occur after the client is constructed. Hierarchies of clients inherit parent client configuration unless explicitly changed or overridden. Exceptions to this requirement are as follows:

  1. Log level, which must take effect immediately across the Azure SDK.
  2. Tracing on/off, which must take effect immediately across the Azure SDK.
Client library-specific environment variables

DO prefix Azure-specific environment variables with AZURE_.

☑️ YOU MAY use client library-specific environment variables for portal-configured settings which are provided as parameters to your client library. This generally includes credentials and connection details. For example, Service Bus could support the following environment variables:

Storage could support:

DO get approval from adparch for every new environment variable.

DO use this syntax for environment variables specific to a particular Azure service:

AZURE_<ServiceName>_<ConfigurationKey>

where ServiceName is the canonical shortname without spaces, and ConfigurationKey refers to an unnested configuration key for that client library.
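A small helper shows the naming convention in practice (the helper itself and the ServiceBus example variable are illustrative, not part of any SDK):

```python
import os
from typing import Optional

def service_setting(service_name: str, key: str,
                    default: Optional[str] = None) -> Optional[str]:
    """Reads AZURE_<ServiceName>_<ConfigurationKey>; names are uppercased
    and assumed to contain only alphanumerics and underscores."""
    var = f"AZURE_{service_name}_{key}".upper()
    return os.environ.get(var, default)

# Hypothetical portal-configured setting for Service Bus:
os.environ["AZURE_SERVICEBUS_NAMESPACE"] = "mybus"
value = service_setting("ServiceBus", "Namespace")
```
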

⛔️ DO NOT use non-alpha-numeric characters in your environment variable names with the exception of underscore. This ensures broad interoperability.

3.2.2. Parameter validation

The service client will have several methods that perform requests on the service. Service parameters are directly passed across the wire to an Azure service. Client parameters are not passed directly to the service, but used within the client library to fulfill the request. Examples of client parameters include values that are used to construct a URI, or a file that needs to be uploaded to storage.

DO validate client parameters.

⛔️ DO NOT validate service parameters. This includes null checks, empty strings, and other common validation conditions. Let the service validate any request parameters.

DO validate the developer experience when the service parameters are invalid to ensure appropriate error messages are generated by the service. If the developer experience is compromised due to service-side error messages, work with the service team to correct prior to release.
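The split between client and service parameters can be sketched as follows (the method, the container-name rules, and the endpoint are illustrative assumptions):

```python
def upload_blob(container_name: str, blob_name: str, data: bytes) -> str:
    """Client parameters (used here to build the request URI) are validated
    locally; service parameters (the payload itself) are passed through
    untouched and left for the service to validate."""
    if not container_name:
        raise ValueError("container_name must be a non-empty string")
    if "/" in container_name:
        raise ValueError("container_name must not contain '/'")
    # The blob content is a service-side concern: no client-side checks.
    return f"https://account.blob.example.net/{container_name}/{blob_name}"
```
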

3.2.3. Network requests

DO use the HTTP pipeline component for communicating to service REST endpoints.

The HTTP pipeline consists of an HTTP transport that is wrapped by multiple policies. Each policy is a control point at which the pipeline can modify either the request or the response (or both). To standardize the way client libraries interact with Azure services, we prescribe a default set of policies. The order in the list is the most sensible order for implementation.

DO implement the following policies in the HTTP pipeline:

☑️ YOU SHOULD use the policy implementations in Azure Core whenever possible. Do not try to “write your own” policy unless it is doing something unique to your service. If you need another option to an existing policy, engage the Azure SDK development team to add the option.
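The pipeline-of-policies structure can be sketched with plain functions (the policy names, the dict-based request/response shapes, and the retry count are illustrative assumptions, not the Azure Core API):

```python
from typing import Callable, Dict, List

Request = Dict[str, object]
Response = Dict[str, object]
Policy = Callable[[Request, Callable[[Request], Response]], Response]

def header_policy(request: Request, next_policy) -> Response:
    # Acts on the request before passing control down the pipeline.
    request.setdefault("headers", {})["x-ms-client-request-id"] = "req-1"
    return next_policy(request)

def retry_policy(request: Request, next_policy) -> Response:
    # Acts on the response: re-issues the request on server errors.
    for _ in range(3):
        response = next_policy(request)
        if response["status"] < 500:
            return response
    return response

def build_pipeline(policies: List[Policy], transport):
    """Wraps the transport with the given policies, outermost first."""
    def invoke(request: Request) -> Response:
        def run(i: int, req: Request) -> Response:
            if i == len(policies):
                return transport(req)
            return policies[i](req, lambda r: run(i + 1, r))
        return run(0, request)
    return invoke
```

A client library would construct one pipeline at client creation time and route every service request through it.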

3.2.4. Native code

Some languages support the development of platform-specific native code plugins. These cause compatibility issues and require additional scrutiny. Certain languages compile to a machine-native format (for example, C or C++), whereas most modern languages opt to compile to an intermediary format to aid in cross-platform support.

⛔️ DO NOT write platform-specific / native code unless the language compiles to a machine-native format.

3.2.5. Authentication

When implementing authentication, don't open up the consumer to security holes like PII (personally identifiable information) leakage or credential leakage. Credentials are generally issued with a time limit, and must be refreshed periodically to ensure that the service connection continues to function as expected. Ensure your client library follows all current security recommendations and consider an independent security review of the client library to ensure you're not introducing potential security problems for the consumer.

⛔️ DO NOT persist, cache, or reuse security credentials. Security credentials should be considered short-lived, to cover both security concerns and credential refresh situations.

If your service implements a non-standard credential system (one that is not supported by Azure Core), then you need to produce an authentication policy for the HTTP pipeline that can authenticate requests given the alternative credential types provided by the client library.

DO provide a suitable authentication policy that authenticates the HTTP request in the HTTP pipeline when using non-standard credentials. This includes custom connection strings, if supported.

3.2.6. Error handling

Error handling is an important aspect of implementing a client library. It is the primary method by which problems are communicated to the consumer. There are two methods by which errors are reported to the consumer: either the method throws an exception, or the method returns an error code (or value) that the consumer must then check. In this section, “producing an error” means returning an error value or throwing an exception, and “an error” means the error value or exception object.

☑️ YOU SHOULD prefer the use of exceptions over returning an error value when producing an error.

DO produce an error when any HTTP request fails with an HTTP status code that is not defined by the service/Swagger as a successful status code. These errors should also be logged as errors.

DO ensure that the error produced contains the HTTP response (including status code and headers) and originating request (including URL, query parameters, and headers).

In the case of a higher-level method that produces multiple HTTP requests, either the last exception or an aggregate exception of all failures should be produced.

DO ensure that if the service returns rich error information (via the response headers or body), the rich information must be available via the error produced in service-specific properties/fields.
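A minimal sketch of an error type satisfying these requirements might look like the following; the class name and attribute names are illustrative, not the actual Azure Core types:

```python
# Illustrative error type carrying the originating request, the HTTP
# response, and service-specific rich error information.
class ServiceRequestError(Exception):
    def __init__(self, message, request, response, error_code=None):
        super().__init__(message)
        self.request = request        # URL, query parameters, headers
        self.response = response      # status code, headers, body
        self.error_code = error_code  # rich, service-specific error detail
```

A consumer can then branch on the rich `error_code` field without parsing the response body themselves.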

⚠️ YOU SHOULD NOT create a new error type unless the developer can perform an alternate action to remediate the error. Specialized error types should be based on existing error types present in the Azure Core package.

⛔️ DO NOT create a new error type when a language-specific error type will suffice. Use system-provided error types for validation.

DO document the errors that are produced by each method (with the exception of commonly thrown errors that are generally not documented in the target language).

3.2.7. Logging

Client libraries must support robust logging mechanisms so that the consumer can adequately diagnose issues with the method calls and quickly determine whether the issue is in the consumer code, client library code, or service.

DO support pluggable log handlers.

DO make it easy for a consumer to enable logging output to the console. The specific steps required to enable logging to the console must be documented.

DO use one of the following log levels when emitting logs: Verbose (details), Informational (things happened), Warning (might be a problem or not), and Error.

DO use the Error logging level for failures that the application is unlikely to recover from (out of memory, etc.).

DO use the Warning logging level when a function fails to perform its intended task. This generally means that the function will raise an exception. It does not include occurrences of self-healing events (for example, when a request will be automatically retried).

DO use the Informational logging level when a function operates normally.

DO use the Verbose logging level for detailed troubleshooting scenarios. This is primarily intended for developers or system administrators to diagnose specific failures.

⛔️ DO NOT send sensitive information in log levels other than Verbose. For example, remove account keys when logging headers.

DO log the request line, response line, and headers as an Informational message.

DO log an Informational message if a service call is cancelled.

DO log exceptions thrown as a Warning level message. If the log level is set to Verbose, append stack trace information to the message.
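To make the redaction requirement concrete, here is a sketch using Python's standard logging module; the logger name and the set of sensitive headers are assumptions for illustration:

```python
import logging

logger = logging.getLogger("azure.contoso")  # hypothetical library logger

# Headers whose values must never appear outside Verbose-level logs.
SENSITIVE_HEADERS = {"authorization", "x-ms-account-key"}

def redact(headers):
    """Replace sensitive header values before logging."""
    return {k: ("<redacted>" if k.lower() in SENSITIVE_HEADERS else v)
            for k, v in headers.items()}

def log_request(method, url, headers):
    # The request line and headers are logged as an Informational message.
    logger.info("Request: %s %s %s", method, url, redact(headers))
```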

3.2.8. Distributed tracing

Distributed tracing mechanisms allow the consumer to trace their code from frontend to backend. To do this, the distributed tracing library creates spans: units of unique work. Spans are related to one another in a parent-child hierarchy; as you go deeper into the hierarchy of code, you create more spans. These spans can then be exported to a suitable receiver as needed. To keep track of the spans, a distributed tracing context (called a context in the remainder of this section) is passed into each successive layer. For more information on this topic, visit the OpenTelemetry topic on tracing.

DO support OpenTelemetry for distributed tracing.

DO accept a context from calling code to establish a parent span.

DO pass the context to the backend service through the appropriate headers (traceparent, tracestate, etc.) to support Azure Monitor. This is generally done with the HTTP pipeline.

DO create a new span for each method that user code calls. New spans must be children of the context that was passed in. If no context was passed in, a new root span must be created.

DO create a new span (which must be a child of the per-method span) for each REST call that the client library makes. This is generally done with the HTTP pipeline.

For most client library implementations, some of these requirements will be handled by the HTTP pipeline. However, as a client library writer, you must handle the incoming context appropriately.
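The parent-child structure described above can be illustrated with a toy span model (this is not the OpenTelemetry API; the class, method, and span names are hypothetical):

```python
# Toy model of the span hierarchy: one span per public method call,
# with a child span per physical REST call made by the HTTP pipeline.
class Span:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

def get_blob(caller_context):
    # The per-method span is a child of the context passed in by the
    # calling code (or a new root span if no context was passed in).
    method_span = Span("BlobClient.get_blob", parent=caller_context)
    # The HTTP pipeline creates one child span per REST call.
    Span("HTTP GET /mycontainer/myblob", parent=method_span)
    return method_span
```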

3.2.9. Dependencies

Dependencies bring in many considerations that are often easily avoided by avoiding the dependency.

DO depend on the Azure Core library for functionality that is common across all client libraries. This library includes APIs for HTTP connectivity, global configuration, and credential handling.

⛔️ DO NOT be dependent on any other packages within the client library distribution package. Dependencies are by-exception and need a thorough vetting through architecture review. This does not apply to build dependencies, which are acceptable and commonly used.

☑️ YOU SHOULD consider copying or linking required code into the client library in order to avoid taking a dependency on another package that could conflict with the ecosystem. Make sure that you are not violating any licensing agreements and consider the maintenance that will be required of the duplicated code. “A little copying is better than a little dependency” (YouTube).

⛔️ DO NOT depend on concrete logging, dependency injection, or configuration technologies (except as implemented in the Azure Core library). The client library will be used in applications that might be using the logging, DI, and configuration technologies of their choice.

Language-specific guidelines will maintain a list of approved dependencies.

3.2.10. Common library usage

There are occasions when common code needs to be shared between several client libraries. For example, a set of cooperating client libraries may wish to share a set of exceptions or models.

DO gain adparch approval prior to implementing a common library.

DO minimize the code within a common library. Code within the common library should be available to the consumer of the client library and shared by multiple client libraries within the same namespace.

DO store the common library in the same namespace as the associated client libraries.

A common library will only be approved if:

Let's take two examples:

  1. Implementing two Cognitive Services client libraries, we find a model that is produced by one Cognitive Services client library and consumed by another, or the same model is produced by two client libraries. The consumer must pass the model between the libraries in their own code, or may need to compare the model produced by one client library with that produced by another. This is a good candidate for choosing a common library.

  2. Two Cognitive Services client libraries throw an ObjectNotFound exception to indicate that an object was not detected in an image. The user might trap the exception, but otherwise will not operate on the exception. There is no linkage between the ObjectNotFound exception in each client library. This is not a good candidate for creation of a common library (although you may wish to place this exception in a common library if one exists for the namespace already). Instead, produce two different exceptions - one in each client library.

3.3. Testing Support

One of the key things we want to support is to allow consumers of the library to easily write repeatable unit tests for their applications without activating a service. This allows them to reliably and quickly test their code without worrying about the vagaries of the underlying service implementation (including, for example, network conditions or service outages). Mocking is also helpful for simulating failures, edge cases, and hard-to-reproduce situations (for example: does the code work on February 29th?).

DO support mocking of network operations.
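One common pattern that makes network mocking possible is a pluggable transport: the client accepts the transport as a constructor parameter, so tests can substitute one that returns canned responses. The class names below are illustrative, not a real Azure SDK API:

```python
# Sketch of an injectable transport. Tests substitute MockTransport for
# the real HTTP transport, so no network activity takes place.
class MockTransport:
    def __init__(self, canned_response):
        self._response = canned_response
        self.requests = []            # record requests for assertions

    def send(self, request):
        self.requests.append(request)
        return self._response

class SecretClient:
    def __init__(self, endpoint, transport):
        self._endpoint = endpoint
        self._transport = transport

    def get_secret(self, name):
        request = {"method": "GET", "url": f"{self._endpoint}/secrets/{name}"}
        return self._transport.send(request)
```

A unit test can then assert both on the returned value and on the exact request the client would have sent.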

3.4. Documentation

There are several documentation deliverables that must be included in or as a companion to your client library. Beyond complete and helpful API documentation within the code itself (docstrings), you need a great README and other supporting documentation.

3.4.1. General guidelines

DO include your service's content developer in the adparch review for your library. To find the content developer you should work with, check with your team's Program Manager.

DO follow the Azure SDK Contributors Guide. (MICROSOFT INTERNAL)

DO adhere to the specifications set forth in the Microsoft style guides when you write public-facing documentation. This applies to both long-form documentation like a README and the docstrings in your code. (MICROSOFT INTERNAL)

☑️ YOU SHOULD attempt to document your library into silence. Preempt developers' usage questions and minimize GitHub issues by clearly explaining your API in the docstrings. Include information on service limits and errors they might hit, and how to avoid and recover from those errors.

As you write your code, document it so that you never hear about it again. The fewer questions you have to answer about your client library, the more time you have to build new features for your service.

3.4.2. Code snippets

DO include example code snippets alongside your library's code within the repository. The snippets should clearly and succinctly demonstrate the operations most developers need to perform with your library. Include snippets for every common operation, and especially for those that are complex or might otherwise be difficult for new users of your library. At a bare minimum, include snippets for the champion scenarios you've identified for the library.

DO build and test your example code snippets using the repository's continuous integration (CI) to ensure they remain functional.

DO include the example code snippets in your library's docstrings so they appear in its API reference. If the language and its tools support it, ingest these snippets directly into the API reference from within the docstrings. For example, use the .. literalinclude:: directive in Python docstrings to instruct Sphinx to ingest the snippets automatically.

⛔️ DO NOT combine more than one operation in a code snippet unless it's required for demonstrating the type or member, or it's in addition to existing snippets that demonstrate atomic operations. For example, a Cosmos DB code snippet should not include both account and container creation operations; create two different snippets, one for account creation and one for container creation.

Combined operations cause unnecessary friction for a library consumer by requiring knowledge of additional operations which might be outside their current focus. It requires them to first understand the tangential code surrounding the operation they're working on, then carefully extract just the code they need for their task. The developer can no longer simply copy and paste the code snippet into their project.

4. Releasing your Client library

Releasing your client library requires coordination between several groups, including the Azure SDK engineering systems, service, support, and documentation teams.

4.1. Release requirements

Releasing a client library involves integrating with the appropriate mono-repo. Release is performed by the Azure SDK release engineering team and only happens on successful build and test processes, in conjunction with the service team (in the case of a release that is coordinated with a service release).

DO integrate with the Azure SDK Engineering Services build, test, and release processes.

DO use a language-specific linter to enforce the design guidelines and required coding style.

⛔️ DO NOT make a GA release of a client library until the underlying service is also in GA, with a stable protocol.

⛔️ DO NOT make a GA release of a client library for a service whose protocol is based on a REST API until there is a stable (and approved) GA Swagger specification available.

4.2. Versioning

Consistent versioning allows consumers to determine what to expect from a new version of the library. However, versioning rules tend to be very idiomatic to the language. The engineering system release guidelines require the use of MAJOR.MINOR.PATCH format for the version.

DO change the version number of the client library when ANYTHING changes in the client library.

DO increment the patch version when fixing a bug.

⛔️ DO NOT include new features in a patch release.

DO increment the major or minor version when adding support for a service API version, or when adding a backwards-compatible feature.

⛔️ DO NOT make breaking changes. If a breaking change is absolutely required, then you MUST engage with the ADP Architecture Board prior to making the change. If a breaking change is approved, increment the major version.

☑️ YOU SHOULD increment the major version when making large feature changes.

DO provide the ability to call a specific supported version of the service API.

A particular (major.minor) version of a library can choose what service APIs it supports. We recommend the support window be no less than two service versions (if available) and no less than what is specified in the Fixed Lifecycle Policy for Microsoft business, developer, and desktop systems.
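The rules above map directly onto MAJOR.MINOR.PATCH arithmetic. A minimal sketch (the change-category labels are chosen for this example, not part of any SDK API):

```python
# Minimal illustration of the versioning rules: breaking changes bump
# MAJOR, backwards-compatible features bump MINOR, bug fixes bump PATCH.
def bump(version, change):
    major, minor, patch = map(int, version.split("."))
    if change == "breaking":   # requires architecture board approval first
        return f"{major + 1}.0.0"
    if change == "feature":    # backwards-compatible feature or API version
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"  # bug fix
```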

5. Appendix: Azure Core

Azure Core is a library that provides common services to other client libraries. These services include:

The following sections define the requirements for the Azure Core library. If you are implementing a client library in a language that already has an Azure Core library, you do not need to read this section. It is primarily targeted at developers who work on the Azure Core library.

5.1. The HTTP pipeline

The HTTP pipeline consists of an HTTP transport that is wrapped by multiple policies. Each policy is a control point during which the pipeline can modify either the request or response (or both). To standardize the way that client libraries interact with Azure services, we prescribe a default set of policies.

In general, the client library will only need to configure these policies. However, if you are producing a new Azure Core library (for a new language), you will need to understand the requirements for each policy.

5.1.1. Telemetry policy

Client library usage telemetry is used by service teams (not consumers) to monitor what SDK language, client library version, and language/platform info a client is using to call into their service. Clients can prepend additional information indicating the name and version of the client application.

DO send telemetry information in the User-Agent header using the following format:

[<application_id> ]azsdk-<sdk_language>-<package_name>/<package_version> <platform_info>
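A sketch of composing that value; the function name and the platform_info content used in the example are illustrative:

```python
# Compose the telemetry User-Agent value in the prescribed format:
# [<application_id> ]azsdk-<sdk_language>-<package_name>/<package_version> <platform_info>
def build_user_agent(sdk_language, package_name, package_version,
                     platform_info, application_id=None):
    base = (f"azsdk-{sdk_language}-{package_name}/{package_version} "
            f"{platform_info}")
    return f"{application_id} {base}" if application_id else base
```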

🔍Example 1. (Example User-Agent header values)
For example, if we rewrote AzCopy in each language using the Azure Blob Storage client library, we might end up with the following user-agent strings:

☑️ YOU SHOULD send the telemetry information that is normally sent in the User-Agent header in the X-MS-UserAgent header when the platform does not support changing the User-Agent header. Note that services will need to configure log gathering to capture the X-MS-UserAgent header in such a way that it can be queried through normal analytics systems.

☑️ YOU SHOULD send additional (dynamic) telemetry information as a semicolon-separated set of key-value pairs in the X-MS-AZSDK-Telemetry header. For example:

```http
X-MS-AZSDK-Telemetry: class=BlobClient;method=DownloadFile;blobType=Block
```

The following keys have specific meanings:

Any other keys that are used should be common across all client libraries for a specific service. DO NOT include personally identifiable information (even encoded) in this header. Note that services will need to configure log gathering to capture the X-MS-AZSDK-Telemetry header in such a way that it can be queried through normal analytics systems.
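Composing that header value is straightforward; a sketch (the function name is illustrative):

```python
# Build the X-MS-AZSDK-Telemetry value as semicolon-separated key=value
# pairs, preserving insertion order.
def build_telemetry_header(pairs):
    return ";".join(f"{key}={value}" for key, value in pairs.items())
```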

5.1.2. Unique request ID policy

✏️Todo. Unique Request ID Policy requirements

5.1.3. Retry policy

There are many reasons why failure can occur when a client application attempts to send a network request to a service. Some examples are timeout, network infrastructure failures, service rejecting the request due to throttle/busy, service instance terminating due to service scale-down, service instance going down to be replaced with another version, service crashing due to an unhandled exception, etc. By offering a built-in retry mechanism (with a default configuration the consumer can override), our SDKs and the consumer's application become resilient to these kinds of failures. Note that some services charge real money for each try and so consumers should be able to disable retries entirely if they prefer to save money over resiliency.

For more information, see “Transient fault handling”.

The HTTP Pipeline provides this functionality.

DO offer the following configuration settings:

☑️ YOU MAY offer additional retry mechanisms if supported by the service. For example, the Azure Storage Blob service supports retrying read operations against a secondary datacenter, and recommends the use of a per-try timeout for resilience.

DO reset (or seek back to position 0) any request data stream before retrying a request.

DO honor any cancellation mechanism passed in by the caller that can terminate the request before retries have been attempted.

DO update any query parameter or request header that gets sent to the service telling it how much time the service has to process the individual request attempt.

DO retry in the case of a hardware network failure as it may self-correct.

DO retry in the case of a “service not found” error as the service may be coming back online or a load balancer may be reconfiguring itself.

DO retry if the service successfully responds indicating that it is throttling requests (for example, with an “x-ms-delay-until” header or similar metadata).

⛔️ DO NOT retry if the service responds with a 400-level response code unless a retry-after header is also returned.

⛔️ DO NOT change any client-side generated request-id as this represents the logical operation and should be the same across all physical retries of this operation. When looking at server logs, multiple entries with the same client request-id show each retry and this is useful information to help diagnose issues.

☑️ YOU SHOULD implement a default policy that starts at 3 retries with a 0.8s delay with exponential (plus jitter) backoff.
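The suggested default can be sketched as follows. “Full jitter” is one common strategy; the guideline does not mandate a particular jitter algorithm:

```python
import random

# Sketch of the suggested default retry schedule: 3 retries starting from
# a 0.8s base delay, with exponential backoff plus jitter.
def retry_delays(max_retries=3, base_delay=0.8):
    for attempt in range(max_retries):
        cap = base_delay * (2 ** attempt)   # 0.8s, 1.6s, 3.2s, ...
        yield random.uniform(0, cap)        # jitter: anywhere up to the cap
```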

5.1.4. Authentication policy

Services across Azure use a variety of different authentication schemes to authenticate clients. Conceptually there are two entities responsible for authenticating service client requests, a credential and an authentication policy. Credentials provide confidential authentication data needed to authenticate requests. Authentication policies use the data provided by a credential to authenticate requests to the service. It is essential that credential data can be updated as needed across the lifetime of a client, and authentication policies must always use the most current credential data.

⛔️ DO NOT persist, cache, or reuse tokens returned from the token credential. This is CRITICAL as credentials generally have a short validity period and the token credential is responsible for refreshing these.

DO implement Bearer authorization policy (which accepts a token credential and scope).
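A sketch of such a policy follows; note that the token is requested from the credential on every call rather than cached by the policy, in line with the rule above. Names are illustrative, not the real Azure Core API:

```python
# Sketch of a Bearer authorization policy: it asks the credential for a
# current token on each request (the credential, not the policy, handles
# refresh) and attaches it as a Bearer Authorization header.
class BearerTokenPolicy:
    def __init__(self, credential, scope):
        self._credential = credential
        self._scope = scope

    def on_request(self, request):
        token = self._credential.get_token(self._scope)
        request.headers["Authorization"] = f"Bearer {token}"
```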

5.1.5. Response downloader policy

The response downloader is required for most (but not all) operations to change whatever is returned by the service into a model that the consumer code can use. An example of a method that does not deserialize the response payload is a method that downloads a raw blob within the Blob Storage client library. In this case, the raw data bytes are required. For most operations, the body must be downloaded in totality before deserialization. This pipeline policy must implement the following requirements:

DO download the entire response body and pass the complete downloaded body up to the operation method for methods that deserialize the response payload. If a network connection fails while reading the body, the retry policy must automatically retry the operation.

5.1.6. Distributed tracing policy

As mentioned earlier, distributed tracing mechanisms allow the consumer to trace their code from frontend to backend. To do this, the distributed tracing library creates spans - units of unique work. Each span is in a parent-child relationship. As you go deeper into the hierarchy of code, you create more spans. These spans can then be exported to a suitable receiver as needed. To keep track of the spans, a distributed tracing context (called a context within the rest of this section) is passed into each successive layer. For more information on this topic, visit the OpenTelemetry topic on tracing.

The Distributed Tracing policy is responsible for:

See also Distributed Tracing with the client library implementation.

DO support OpenTelemetry for distributed tracing.

DO accept a context from calling code to establish a parent span.

DO pass the context to the backend service through the appropriate headers (traceparent, tracestate, etc.) to support Azure Monitor.

DO create a new span (which must be a child of the per-method span) for each REST call that the client library makes.

5.1.7. Logging policy

Many logging requirements within Azure Core mirror the same requirements for logging within the client library.

DO allow the client library to set the log handler and log settings.

DO use one of the following log levels when emitting logs: Verbose (details), Informational (things happened), Warning (might be a problem or not), and Error.

DO use the Error logging level for failures that the application is unlikely to recover from (out of memory, etc.).

DO use the Warning logging level when a function fails to perform its intended task. This generally means that the function will raise an exception. It does not include occurrences of self-healing events (for example, when a request will be automatically retried).

DO use the Informational logging level when a function operates normally.

DO use the Verbose logging level for detailed troubleshooting scenarios. This is primarily intended for developers or system administrators to diagnose specific failures.

⛔️ DO NOT send sensitive information in log levels other than Verbose. For example, remove account keys when logging headers.

DO log the request line, response line, and headers as an Informational message.

DO log an Informational message if a service call is cancelled.

DO log exceptions thrown as a Warning level message. If the log level is set to Verbose, append stack trace information to the message.

5.1.8. Proxy

Apps that integrate the Azure SDK need to operate in common enterprise environments. It is common practice to implement HTTP proxies for control and caching purposes. Proxies are generally configured at the machine level and, as such, are part of the environment. However, there are reasons to adjust proxies (for example, testing may use a proxy to rewrite URLs to a test environment instead of a production environment). The Azure SDK and all client libraries should operate in those environments.

There are a number of common methods for proxy configuration. However, they fall into four groups:

  1. Inline, no authentication (filtering only)
  2. Inline, with authentication
  3. Out-of-band, no authentication
  4. Out-of-band, with authentication

For inline/no-auth proxy, nothing needs to be done. The Azure SDK will work without any proxy configuration. For inline/auth proxy, the connection may receive a 407 Proxy Authentication Required status code. This will include a scheme, realm, and potentially other information (such as a nonce for digest authentication). The client library must resubmit the request with a Proxy-Authorization header that provides authentication information suitably encoded for the scheme. The most common schemes are Basic, Digest, and NTLM.

For an out-of-band/no-auth proxy, the client will send the entire request URL to the proxy instead of the service. For example, if the client is communicating with https://foo.blob.storage.azure.net/path/to/blob, it will connect to the HTTPS_PROXY and send a GET https://foo.blob.storage.azure.net/path/to/blob HTTP/1.1. For an out-of-band/auth proxy, the client sends the entire request URL just as in the out-of-band/no-auth case, but the proxy may respond with a 407 Proxy Authentication Required status code (as with the inline/auth proxy).

WebSockets can normally be tunneled through an HTTP proxy, in which case the proxy authentication happens during the CONNECT call. This is the preferred mechanism for tunneling non-HTTP traffic to the Azure service. However, there are other types of proxies. The most notable is the SOCKS proxy used for non-HTTP traffic (such as AMQP or MQTT). We make no recommendation (for or against) support of SOCKS. It is explicitly not a requirement to support SOCKS proxy within the client library.

Most proxy configuration will be done by adopting the HTTP pipeline that is common to all Azure service client libraries.

DO support proxy configuration via common global configuration directives configured on a platform or runtime basis.

DO support Azure SDK-wide configuration directives for proxy configuration, including disabling the proxy functionality.

DO support client library-specific configuration directives for proxy configuration, including disabling the proxy functionality.

DO log 407 Proxy Authentication Required requests and responses.

DO indicate in logging if the request is being sent to the service via a proxy, even if proxy authentication is not required.

DO support Basic and Digest authentication schemes.

☑️ YOU SHOULD support the NTLM authentication scheme.

There is no requirement to support SOCKS at this time. We recommend services adopt a WebSocket connectivity option (for example, AMQP or MQTT over WebSockets) to ensure compatibility with proxies.

5.2. Global configuration

The Azure SDK can be configured by a variety of sources, some of which are necessarily language-dependent. This will generally be codified in the Azure Core library. The configuration sources include:

  1. System settings
  2. Environment variables
  3. Global configuration store (code)
  4. Runtime parameters

DO apply configuration in the order above by default, such that subsequent items in the list override settings from previous items in the list.
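A sketch of that layered lookup, where sources later in the list win:

```python
# Resolve a configuration setting by precedence: runtime parameters
# override the global configuration store, which overrides environment
# variables, which override system settings.
def resolve_setting(key, system, environment, global_store, runtime):
    for source in (runtime, global_store, environment, system):
        if key in source:
            return source[key]
    return None
```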

☑️ YOU MAY support configuration systems that users opt in to that do not follow the above ordering.

DO be consistent with naming between environment variables and configuration keys.

DO log when a configuration setting is found somewhere in the environment or global configuration store.

☑️ YOU MAY ignore configuration settings that are irrelevant for your client library.

5.2.1. System settings

☑️ YOU SHOULD respect system settings for proxies.

5.2.2. Environment variables

Environment variables are a well-known method for IT administrators to configure basic settings when running code in the cloud.

Well-known environment variables

DO load relevant configuration settings from the environment variables listed in Table 1.

| Environment Variable | Purpose |
| --- | --- |
| Proxy Settings | |
| HTTP_PROXY | Proxy for HTTP connections |
| HTTPS_PROXY | Proxy for HTTPS connections |
| NO_PROXY | Hosts which must not use a proxy |
| Identity | |
| MSI_ENDPOINT | AAD MSI Credentials |
| MSI_SECRET | AAD MSI Credentials |
| AZURE_SUBSCRIPTION_ID | Azure subscription |
| AZURE_USERNAME | Azure username for U/P Auth |
| AZURE_PASSWORD | Azure password for U/P Auth |
| AZURE_CLIENT_ID | AAD |
| AZURE_CLIENT_SECRET | AAD |
| AZURE_TENANT_ID | AAD |
| AZURE_RESOURCE_GROUP | Azure RG |
| AZURE_CLOUD | mooncake, govcloud, etc. |
| Pipeline Configuration | |
| AZURE_TELEMETRY_DISABLED | Disables telemetry |
| AZURE_LOG_LEVEL | Enable logging by setting a log level. |
| AZURE_TRACING_DISABLED | Disables tracing |

Table 1. Well-known environment variables

DO prefix Azure-specific environment variables with AZURE_.

DO support CIDR notation for NO_PROXY.
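CIDR handling for IP-address hosts can be sketched with the standard library's ipaddress module; hostname suffix matching (e.g. ".example.com", localhost) is omitted here for brevity:

```python
import ipaddress

# Sketch of honoring CIDR entries in NO_PROXY for IP-address hosts.
def bypass_proxy(host_ip, no_proxy):
    addr = ipaddress.ip_address(host_ip)
    for entry in no_proxy.split(","):
        entry = entry.strip()
        try:
            if addr in ipaddress.ip_network(entry, strict=False):
                return True
        except ValueError:
            continue  # not a CIDR entry; a full version checks hostnames
    return False
```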

5.2.3. Global configuration

Global configuration refers to configuration settings that are applied to all applicable client constructors in some manner.

DO support global configuration of shared pipeline policies including:

DO provide configuration keys for setting or overriding every configuration setting inherited from the system or environment variables.

DO provide a method of opting out from importing system settings and environment variables into the configuration.

5.3. Authentication and credentials

OAuth token authentication, obtained via Managed Service Identity (MSI) or Azure Identity, is the preferred mechanism for authenticating service requests, and the only form of authentication credential supported by the Azure Core library.

DO provide a token credential type that can fetch an OAuth-compatible token needed to authenticate a request to the service in a non-blocking atomic manner.
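A sketch of such a credential type follows: a lock makes the fetch atomic, and a still-valid token is reused so callers are not blocked on redundant fetches, while an expired token triggers a refresh. All names are illustrative, not the real Azure Identity API:

```python
import threading
import time

# Illustrative token credential: fetches atomically, refreshes on expiry.
class AccessToken:
    def __init__(self, token, expires_on):
        self.token = token
        self.expires_on = expires_on  # POSIX timestamp

class RefreshingTokenCredential:
    def __init__(self, fetch_token):
        self._fetch_token = fetch_token   # callable returning AccessToken
        self._lock = threading.Lock()
        self._cached = None

    def get_token(self, *scopes):
        with self._lock:
            # Refresh only when no token exists or the token has expired;
            # the credential (not its consumers) owns the refresh logic.
            if self._cached is None or self._cached.expires_on <= time.time():
                self._cached = self._fetch_token(*scopes)
            return self._cached
```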