# Growing the .NET ecosystem
**Immo Landwerth**
*This document is an output of the planning process for .NET 6. It's written
from a Microsoft perspective and it's a collection of problem statements and
assertions based on how we perceive the .NET ecosystem. The sections at the end
contain next steps we'd like to take to validate or invalidate some of the
explicit (or implicit) hypotheses in this document. We're collaborating with the
.NET Foundation [working group on ecosystem
growth](https://github.com/dotnet-foundation/ecosystem-growth) and have presented
this document there as well.*
Today, we're often (even weekly) approached by both customers and internal
partners to build libraries that already exist in the .NET ecosystem because
people feel they can't trust 3rd parties, and sometimes even 1st parties, when
the technology isn't part of .NET. This creates a negative cycle: every new
1st-class library from Microsoft further underlines the notion that quality and
longevity are only provided by Microsoft, which reinforces the cycle.
In the .NET Core 3.1 timeframe, we started a maturity model that didn't land
well with the .NET community, for a variety of reasons, but mostly due to the
way we collaborated & communicated with the community. We believe we can learn
from this experience and pick the idea up again.
The overall goal is to enable .NET consumers to make trust decisions and to
encourage customers (external as well as internal) to trust libraries that
aren't built by Microsoft. This work will comprise a user experience for
consuming libraries (such as when adding packages using the NuGet UI), a defined
set of practices (such as making source code and symbols available for debugging
and allowing consumers to patch the code), a defined development process (how to
distribute releases and deal with breaking changes and security issues), as well
as initiatives to grow the set of non-Microsoft controlled libraries.
Getting critical adoption of this model in the .NET ecosystem will likely take
years, but the goal for .NET 6 is to create the vision, get buy-in from the .NET
Foundation and the open source community at large, and get some of the
fundamentals in place.
## Grow the set of trusted libraries that aren't controlled by Microsoft
### Why?
There is a perception that other ecosystems (specifically Java, JavaScript, and
Python) have more technological diversity and thus an overall stronger open
source ecosystem. Historically, we've taught customers to expect all features
to come from Microsoft. Since we can't build everything, especially not at the
pace at which other OSS ecosystems evolve, the set of trusted libraries for
.NET must grow beyond just Microsoft.
### Summary
We need to normalize the practice of application developers depending on
libraries that aren't controlled by Microsoft.
This requires that we make it easier for customers to evaluate quality and for
package authors to deliver a consistent quality. This is tackled by other
objectives in this document.
This also requires a culture shift at Microsoft. Today, we're usually reactive
when it comes to library & framework investments. By the time we know there is a
need for a library (say, for a new serialization protocol) we routinely research
existing options but usually end up rolling our own, because nothing fits the
bill as-is and we either don't have the time or we believe we wouldn't be able
to successfully influence the design of the existing library. This results in a
perception that Microsoft "sucks the air" out of the OSS ecosystem because our
solutions are usually promoted more heavily and often tightly integrated into
the platform, thus rendering existing solutions less attractive. Several
maintainers have cited this as a reason they gave up on, or avoid, building
libraries that seem foundational. See [this section](#feedback-critical-opinions-on-the-microsoft-oss-ecosystem) for
pointers to blog posts.
To avoid this, we need to start engaging with owners of existing libraries and
work with them to increase their quality (this implies documentation of
processes, patterns, and availability of tools which is tackled by another
objective in this document). We're doing this quite successfully with gRPC,
OpenTelemetry, and Apache Arrow/Spark already, but it should become a general
practice. We should even consider an ongoing work item to actively investigate
widely used libraries and help them raise their quality or tighten their
integration into the .NET developer experience. There is a separate objective in
this document that describes the need for product extension points that enable
that for 3rd parties.
We also need to change our approach when we create net-new technologies for
which there is no ecosystem yet, as is the case for GPIO-based IoT libraries for
.NET Core. Instead of building everything ourselves, we should create new
projects in such a way that we're not the sole maintainer and actively seek out
and invite external contributors to take part in long-term ownership and
maintenance. This also includes our naming patterns. For example, it has become
the norm that we prefix packages and namespaces with Microsoft, which isn't very
friendly for creating multi-vendor projects.
One challenge that exists in the .NET OSS ecosystem is that most successful
libraries are developed by a single person as a hobby. While this often results
in highly rated libraries (because they are labors of love), it creates a high
risk of maintainer burnout and thus uncertainty around support and longevity.
This is also reflected by the projects we see in the .NET Foundation as well as
by the members of the board -- virtually nobody has the luxury of being paid by
a company to do this as part of their job.
When you look at other foundations, such as the Linux Foundation or the Apache
Foundation, you see a very different picture: most maintainers aren't doing this
as a hobby, they are employees and work on this as part of their job. This is
also captured in the book [Working in Public: The Making and Maintenance of Open
Source Software](https://www.amazon.com/dp/0578675862).
While there are exceptions in the .NET ecosystem (such as Samsung or Unity),
this very much is not the norm. Microsoft should take a more active role in
working with the .NET Foundation to help found a program where company-sponsored
"maintainerships" are created, that is, companies sponsor one or more employees
who help maintain a given library as part of their job. The goal is to create
libraries that are co-owned by employees from various companies, akin to how the
Apache Foundation works. This would be an alternative to monetary sponsorships,
which don't provide a dependable income and hence do little to reduce
maintainers' stress because they still have to balance OSS commitments with
their jobs and personal lives. Perhaps Microsoft should also lobby for making
the .NET Foundation more similar to Apache, which apparently makes a concerted
effort to ensure that project committee membership is diverse, so that the
withdrawal of interest by a single party would not jeopardize the project.
Another challenge is around support. There seems to be a perception that
Microsoft-produced code is always supported and anybody else's code isn't. We
often hear the need for support as a reason why non-Microsoft libraries aren't
chosen, but it's unclear whether that means having an option for paid support or
whether it's simply a lack of confidence that the library will be around tomorrow.
We should do customer development in this space and use the results to inform
whether there is a need for paid support for non-Microsoft libraries. If there
is, we should work with the .NET Foundation to define how this model would work
so that companies can offer support for libraries that they don't control. This
would include Microsoft but also other companies (such as Red Hat, Samsung or
whoever else wants to offer paid support).
### End to End Scenarios
* We need to define joint-ownership mechanics for libraries. We should try this
in the context of the IoT libraries (and potentially other customer asks, such
as additional collections we'll likely never add to the BCL).
* We need to do customer development with existing OSS maintainers to understand
whether maintainerships are a good way to address maintainer burnout and ensure
project longevity.
* We need to do customer development to understand the consumer's concerns
around support.
### Engineering Needs
* We should use telemetry, customer, and partner input to select a set of
non-Microsoft owned libraries that we can help make better. This includes
offering API reviews, as well as contributing fixes and features. The goal is
to make this a continuous effort so that we spread our expertise outside of
Microsoft and normalize the practice that we help make .NET better, no matter
who owns the code.
### Customer Cohorts
* OSS library maintainers
* Internal partners at Microsoft, enterprise customers, and ISVs who consume
OSS libraries and have concerns around quality and support
## 3rd party experiences can be as good as 1st party experiences
### Why?
In order to level the playing field and make it possible for the .NET platform
to promote and bet on non-Microsoft controlled technologies, we need the
ability to provide a curated discovery & acquisition experience for optional
components that is used by 1st and 3rd parties alike.
### Summary
At Microsoft we strive to deliver end-to-end experiences, for example, building
a Windows Forms application requires a shared framework, a set of project and
item templates, tooling in Visual Studio, and documentation. We deliver all of
this as part of the core product as this makes the experience seamless.
With .NET 6 and the support for the mobile workloads we're moving to a model
where part of .NET is optional. This ensures the core product can be small and
snappy to install while still supporting the full breadth of .NET.
However, this work by itself doesn't guarantee that 3rd parties can also provide
optional components that customers can easily discover and install, thus
potentially leaving a clear boundary between "this is from Microsoft" and
"this is 3rd party".
At the same time, we don't want to provide an open registry like nuget.org as
the primary mechanism to discover and extend the core product because this will
result in a lot of noise. We still want a curated experience. The only
difference is that the curated set can comprise both Microsoft and
non-Microsoft provided functionality.
### End to End Scenarios
* .NET developers have a single entry point for discovering and acquiring
additional workloads. This entry point works for all 1st-party provided
components, such as ASP.NET Core, EF Core, and Xamarin.
* The experience presents Microsoft provided components as peers to
non-Microsoft provided components. This aids discovery of related technologies
and ensures customers can choose the best tool for the job.
* The set of available components is curated. This list needs to be owned by
some party, but it doesn't have to be just Microsoft (though we should have a
say, too). For example, there could be a .NET Foundation working group that
decides which technologies are important enough to be a part of the core
product. This requires defining a bar that components need to pass to be
considered for addition, which includes relevance, quality, and project
health.
### Engineering Needs
* Move built-in components such as ASP.NET Core and EF to the optional workload
model to level the playing field and ensure the expressiveness of extension
points.
* Provide extension points for all parts of the product that need to be extended
by optional components, such as templates, shared frameworks, and tooling.
### Customer Cohorts
* Providers of web frameworks such as Nancy and Giraffe
* Providers of OR mappers such as NHibernate, Dapper, LLBLGen
* Providers of unit test frameworks such as xUnit and NUnit
* Providers of application models such as Uno
* Providers of cloud integrations such as Pulumi and Farmer
## Developers can make trust decisions about packages and binaries they consume
### Why?
Our library ecosystem revolves entirely around NuGet packages, which contain
binaries. It is essential that customers can have confidence that using
pre-built code doesn't corner them. This includes the ability to reproduce
binaries from source if necessary, but also assurance that installing packages
will not result in build or runtime errors due to incompatibilities that NuGet
didn't surface early enough.
### Summary
.NET's ecosystem uses binaries as the primary exchange mechanism. This solves
various problems, especially shielding the consumer from having to replicate the
build environment. It's also what enables the multi-language ecosystem. All in
all, binaries have worked very well for .NET, especially because the underlying
format is rich and self-describing.
.NET uses Authenticode signing for binaries and packages. The .NET Foundation
has made it easier for member projects to get a code signing certificate and
actively pushes its projects to do so. On top of that, the official NuGet
gallery signs packages to indicate they came from nuget.org.
However, one largely unsolved problem is the link between source code and
binaries. While a significant portion of packages on nuget.org are open source
and often link to the GitHub repo, there is no way to ensure that the source
code being pointed to actually matches the binaries submitted to nuget.org. This
could either be by choice of the uploader or because the uploader used a
compromised tool chain that injects code that wasn't part of the source code.
In other words, just because you can trust the source code doesn't mean you can
trust the binaries. And while nuget.org scans packages for viruses and malware,
there is no way to detect all possible malware. For all you know, someone, or
something, [injected a bitcoin
miner](https://www.theregister.com/2018/11/26/npm_repo_bitcoin_stealer/).
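
Some building blocks for linking binaries back to sources exist today. As a
minimal sketch (assuming a GitHub-hosted project; the Source Link package is
real, the property combination shown is just one reasonable setup), a package
author can opt into Source Link, symbol packages, and deterministic builds with
a few project file settings, and a validation tool like the one proposed below
could then verify that rebuilding the referenced commit reproduces the
published binaries:

```xml
<PropertyGroup>
  <!-- Embed a link from the portable PDB to the exact source commit. -->
  <PublishRepositoryUrl>true</PublishRepositoryUrl>
  <EmbedUntrackedSources>true</EmbedUntrackedSources>
  <!-- Produce a symbol package that symbol servers can serve to debuggers. -->
  <IncludeSymbols>true</IncludeSymbols>
  <SymbolPackageFormat>snupkg</SymbolPackageFormat>
  <!-- Normalize build paths so CI builds are deterministic. -->
  <ContinuousIntegrationBuild>true</ContinuousIntegrationBuild>
</PropertyGroup>
<ItemGroup>
  <PackageReference Include="Microsoft.SourceLink.GitHub" Version="1.0.0" PrivateAssets="All" />
</ItemGroup>
```
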
There are also less malicious trust issues with consuming NuGet packages: the
package manager UI will offer packages that might not install or aren't
relevant for the consuming project. It's frustrating and erodes trust when
customers try to use them only to see NuGet fail to install the package or
(worse) see the app crash at runtime. In many cases this isn't a bug in the
package; it's a missing feature on the NuGet side to use the package information
to communicate to the customer what will and will not work.
A closely related issue is that finding a package that is compatible with a
project can be a challenge for developers in the .NET ecosystem.
### End to End Scenarios
* Package consumers know whether a given package offers sources for debugging.
* Package consumers know whether a given package can be reproduced from sources.
* Package consumers will understand whether a given package will work before
trying to install it. They should be able to clearly understand which platforms
are supported by viewing a NuGet package's details instead of having to guess
& check by installing the package or downloading it to verify there is an
available asset for their platform.
* Package consumers can evaluate packages by popularity, quality, and
maintenance.
### Engineering Needs
* We need to create a validation tool that can check whether a given binary can
be reproduced.
* The validation tool needs to be integrated into the NuGet.org ingestion
pipeline so that the results can be surfaced in search and on package pages.
* We need to improve the NuGet gallery and the NuGet package management UI to
surface information regarding compatibility, relevance to the consuming
project type, popularity, and quality metrics, such as how well a package is
maintained.
### Customer Cohorts
* Package consumers who wish to manually check results on their own.
* Package authors, as we need them to enable support for debugging and
reproducibility.
* NuGet.org, to validate packages on upload and present the status as part of
overall package/project health.
* Internal partner Terrapin, which is an effort to secure Azure Ring 0 and Ring
1 services.
## Developers have access to guidance & tools that support them in building cross-platform packages
### Why?
With .NET Core & Xamarin we have made cross-platform a mainstream requirement
for library authors. However, we lack comprehensive tooling and guidance to
ensure that library developers fall into the pit of success, which results in
packages that don't work well, which in turn hurts our ecosystem. This is
especially problematic for emerging platforms where adoption isn't high enough
to warrant special attention by library authors.
### Summary
We have pushed the envelope with what NuGet can do since we shipped .NET Core
1.0. However, we have solved many of the hard problems only for the platform
layer, rather than focusing on the problems library authors have day to day. In
the dotnet/runtime repo we use custom tooling to build packages, which other
repos -- and more importantly, our customers -- don't use. This makes it very
hard to build packages that, for example, contain native code or need to build
different binaries for different operating systems or CPU architectures.
Specifically, the ability to multi-target for a runtime identifier (RID) isn't
well supported at all.
Furthermore, the tooling we provide as part of the SDK has close to zero
validation that multi-targeted packages are well-formed. For example, a package
that multi-targets for .NET Core 3.1 and .NET Standard 2.0 needs to ensure that
code compiled against the .NET Standard 2.0 binary can run against the binary
that is produced for .NET Core 3.1. In the dotnet/runtime repo we have tooling
that ensures that this is the case, but this can't easily be used by our
customers. We have seen this issue in the wild, even with 1st parties, for
example, the Azure AD libraries.
And while there are several well-established patterns for building
cross-platform packages using the bait & switch approach, our guidance for
building cross-platform packages is still very rudimentary. Especially with .NET
5 and the advent of operating system-specific framework names, we need to
provide more prescriptive guidance.
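
To make the gap concrete: multi-targeting across frameworks is a single
property in the project file today, whereas RID-specific assets must be
hand-placed into the package layout. A minimal sketch (library and file names
are illustrative):

```xml
<!-- Framework multi-targeting is well supported: one property builds the
     library once per target framework. -->
<PropertyGroup>
  <TargetFrameworks>netstandard2.0;netcoreapp3.1</TargetFrameworks>
</PropertyGroup>

<!-- RID-specific assets have no equivalent experience; the author has to
     hand-author the runtimes/ folder layout of the package. -->
<ItemGroup>
  <None Include="native/win-x64/contoso.dll" Pack="true" PackagePath="runtimes/win-x64/native/" />
  <None Include="native/linux-x64/libcontoso.so" Pack="true" PackagePath="runtimes/linux-x64/native/" />
</ItemGroup>
```
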
### End to End Scenarios
* Multi-targeting for all operating systems and CPU architectures needs to be
supported in the same way that multi-targeting for different frameworks is.
* Packages are automatically validated to ensure that consumers don't
experience compatibility issues because a package's assets aren't compatible
across its own multi-targeted variants.
### Engineering Needs
* We need to integrate RID targeting into the multi-targeting experience.
* We need to extract and productize API compatibility tooling (such as our
APICompat) and make it part of the package multi-targeting experience.
* We need to have guidance for how to multi-target across frameworks, operating
systems, and architectures.
* We need to have guidance for how to deploy & wrap native code.
### Customer Cohorts
* NuGet package authors
* NuGet package consumers
## Package consumers can patch their dependencies
### Why?
Using binaries as the primary exchange vehicle in conjunction with
package/assembly identity makes it harder for people to experiment and patch
dependencies. Other ecosystems don't suffer from this to the same degree because
they are either source-based or have specific features to address this.
### Summary
From the very start, .NET had a notion of identity (via strong naming and the
GAC). In spirit, NuGet is a continuation of this idea. Identity is important to
enable sharing. However, it also has problems when only one party can produce
the artifact under that identity. With strong naming, this was originally by
design. In .NET Core, we have pretty much neutered this constraint, and for
.NET Framework we provide public signing to work around it (to a degree).
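
As a rough sketch of that workaround (the key file name is illustrative):
public signing stamps the assembly with just the public key, so anyone can
rebuild a strong-named assembly without holding the private key.

```xml
<PropertyGroup>
  <SignAssembly>true</SignAssembly>
  <!-- Public signing marks the assembly as strong-named using only the
       public key; no private key is required to build. -->
  <PublicSign>true</PublicSign>
  <AssemblyOriginatorKeyFile>contoso.public.snk</AssemblyOriginatorKeyFile>
</PropertyGroup>
```
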
In the case of NuGet packages this problem is harder to solve. The central
nuget.org registry requires that only the party owning a given package can
provide new versions. However, this is mightily inconvenient when you want to
modify packages locally, for example, to experiment with an OSS contribution or
to patch an issue you're having with a package.
In source-based ecosystems this problem isn't as big because sources are
inherently patchable.
We should make it easier for application developers to convert a package
dependency to a source dependency that they can control and patch. In some
cases, it might be desirable to make the patched version available (publicly or
internally).
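
Today that conversion is entirely manual. As a minimal sketch (package IDs and
paths are illustrative), an application author clones the library's sources and
swaps the binary reference for a project reference:

```xml
<ItemGroup>
  <!-- Before: binary dependency resolved from nuget.org.
  <PackageReference Include="Contoso.Widgets" Version="1.2.3" />
  -->

  <!-- After: source dependency on a locally cloned (and patched) fork. -->
  <ProjectReference Include="..\external\Contoso.Widgets\src\Contoso.Widgets.csproj" />
</ItemGroup>
```

This works for direct dependencies, but anything else in the graph that
references the package ID still resolves the original binary, which is part of
what motivates the redirect idea below.
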
However, publishing a patched version generally requires a new package ID,
which means that all consumers would need to rebuild all their dependencies to
use the new ID. A better option would be to allow application authors to
redirect a package ID (and version) to a different ID (and version). This also
solves the problem where a package author no longer maintains a package or
doesn't want to support the package for a specific operating system or
framework. Someone else can provide the package, and application authors can
replace the canonical one with the forked one.
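
To make the proposal concrete, here is a purely hypothetical sketch of what
such a redirect could look like in a project file. No `PackageRedirect` item
exists in NuGet today; the syntax and names are invented for illustration:

```xml
<!-- Hypothetical syntax; NuGet has no such feature today. -->
<ItemGroup>
  <!-- Resolve every reference to Contoso.Json in the graph (including
       transitive ones) against a community-maintained fork. -->
  <PackageRedirect Include="Contoso.Json"
                   RedirectTo="Community.Contoso.Json"
                   RedirectToVersion="12.0.1" />
</ItemGroup>
```
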
Other ecosystems (specifically Java and NPM) support private dependencies,
where a given package dependency is considered an implementation detail and
doesn't become a direct dependency of the consumer. This would allow, for
example, the Azure SDK to depend on JSON.NET without constraining whether, and
which version of, JSON.NET the application can use (Java calls this shading). To
be reliable, shading would need to enforce that shaded dependencies cannot leak
into public APIs.
### End to End Scenarios
* Package consumers can replace a package reference with a version that they
built themselves.
* Package consumers can replace a package with one that has a different ID and
version.
### Engineering Needs
* NuGet and the runtime need to support shaded dependencies.
* NuGet needs to support redirecting a package ID in the graph. Non-shaded
dependencies can only be replaced at the application level while shaded
dependencies could be replaced at the level that shades them.
### Customer Cohorts
* Package consumers
## Feedback: Critical opinions on the Microsoft OSS ecosystem
Here is a list of blog posts that have criticized Microsoft for not playing well
with the OSS ecosystem:
* [The New Rules for Playing in Microsoft's Open Source
Sandbox](https://aaronstannard.com/new-rules-dotnet-oss/), by Aaron Stannard
* [How to Build Sustainable Open Source Software
Projects](https://aaronstannard.com/sustainable-open-source-software/), by Aaron Stannard
* [The Next Decade of .NET Open
Source](https://aaronstannard.com/next-decade-dotnet/), by Aaron Stannard
* [The Day AppGet
Died](https://medium.com/@keivan/the-day-appget-died-e9a5c96c8b22), by Keivan
Beigi
* [Why we terminated our partnership with
Microsoft](https://paulstovell.com/re-next-decade-of-open-source/), by Paul
Stovell
## Action: Customer Development
In order to support these objectives, we need to collect more customer data.
Specifically, we should research the following:
* **Quantitative competitive analysis of OSS ecosystems**. We should consider
commissioning market research that compares the perceptions of various open
source ecosystems, such as .NET, JS/NPM, and Java. This would help us gain more
insight into how the market views .NET and how that compares to other
ecosystems. Right now, such insights are anecdotal since the sample sizes are
too small.
* **How businesses select which technologies and libraries to depend on**. There
is a lot of anecdotal evidence that Microsoft customers prefer to use
technologies built & supported by Microsoft. Sometimes, the same customers
also use other ecosystems, such as Java and NPM, where this isn't an option.
We need to understand what they are looking for in each of these ecosystems
and how they think about this double standard. This data will be used to inform
how we proceed from the maturity model.
* **What is important to library developers**. There is an inherent assumption
that library developers want their libraries to be used by as many consumers as
possible. First, this might not actually be true in all cases, but even if it
is, it doesn't necessarily mean that library authors greatly care about being
attractive to businesses. For instance, would library developers appreciate a
maturity model that prescribes specific practices and quality gates, or would
they ignore it? Are there any concerns that are specific to library authors
depending on other libraries (compared to app developers depending on
libraries)?
## Action: Technical Insights
* **Shading in Java.** We need to understand how shading in Java works (see the
sketch after this list). Specifically: what does it enable, what is the
resulting user experience for the library producer & consumer, is there
validation that prevents shaded dependencies from leaking, how does it handle
conflicts and how well, do package managers have a direct understanding of
shading, and are there any known issues with this feature?
* **Side-by-side loading in Java.** A related feature of Python and JS is the
ability to load multiple versions of a component if its major version has
changed; for example, LibA is used by B and C, B requires LibA v1.0, C requires
LibA v2.0, and there are breaking changes between v1 and v2 (using semver, it
is assumed there are breaking changes across major versions). A quote from a
dev working on the Azure SDK for JS: "JavaScript uses many dependencies that
are non-Microsoft owned as this is easy to do (libraries follow semantic
versioning, side-by-side versioning is possible and can be configured via
package.json and so it is typical and expected in our ecosystem)". Neither
Python nor JS is a compiled language, so doing this is relatively simple. It
would be interesting to see how side-by-side loading works in Java and whether
it is tied to shading (a sketch of both mechanisms follows this list).
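
For reference, the canonical Java implementation of shading is the Maven Shade
Plugin, which bundles a dependency into the consuming JAR and rewrites
("relocates") its package names so the shaded copy can't collide with any other
copy in the application. A rough sketch of a typical configuration (group and
package names are illustrative):

```xml
<!-- Sketch of Maven Shade Plugin usage; names are illustrative. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.4</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <!-- Bundle Gson into this JAR and rewrite its package names so
               the copy is private to this library. -->
          <relocation>
            <pattern>com.google.gson</pattern>
            <shadedPattern>com.contoso.internal.gson</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

For side-by-side loading in JS, npm resolves conflicting major versions by
nesting a private copy of the dependency under each consumer in node_modules,
roughly like this:

```
node_modules/
  lib-b/
    node_modules/
      lib-a/        <- v1.0.0, private to lib-b
  lib-c/
    node_modules/
      lib-a/        <- v2.0.0, private to lib-c
```
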