The problems with data sharing
The world needs a data sharing platform and that's why Bobsled was created.
Thanks to the amazing people who gave me feedback and helped this article come to life.
Bobsled is building a cross-cloud data sharing platform. The mission is to make data sharing frictionless and effortless, to any cloud or platform. By making data sharing easy and fast, Bobsled will help the world get to value and reach decisions faster.
Companies share data with other companies. Especially as companies grow, different types of data serve as the foundation to learn things and come to decisions.
A share is an exchange of data between a provider and a consumer.
A provider is the organization or business unit that shares data.
A consumer is the organization or business unit that receives data.
Companies store their data in many different places, and today that often means one of the popular cloud platforms.
Generally speaking, three types of data form the foundation for business operations and analytics within many enterprises:
First-party data is data a company can collect from its own sources: For example, from interactions with customers.
Second-party data is data you acquire from a trusted partner: For example, data from a SaaS vendor.
Third-party data is data collected from external sources that don’t have a relationship with your business. Examples: demographic, weather, and financial market data.
Unlike the other two types, first-party data comes from a company's direct relationship with its customers.
Explosive growth of third-party data
The demand for third-party data has exploded over the past few years. According to Explorium's 2022 State of External Data report:
The market for third-party data is roughly $100b a year, growing at 10-15% CAGR (Compound Annual Growth Rate).
52% of companies surveyed said they were purchasing data from 5 or more paid or public sources - this is up from 9% in 2021.
41% of companies spend more than $500,000 annually on external data, with many spending well into eight or nine figures.
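To make the growth figure concrete, here is a rough sketch of what a compound annual growth rate implies. The $100b starting point and the 10-15% range are the figures quoted above; the five-year horizon and the function name are assumptions chosen purely for illustration, not a forecast from the report.

```python
def project_market_size(start_billion: float, cagr: float, years: int) -> float:
    """Compound a starting value at a fixed annual growth rate."""
    return start_billion * (1 + cagr) ** years


HORIZON_YEARS = 5  # assumed horizon, for illustration only

# $100b compounded at the low and high ends of the quoted CAGR range.
low = project_market_size(100, 0.10, HORIZON_YEARS)
high = project_market_size(100, 0.15, HORIZON_YEARS)

print(f"After {HORIZON_YEARS} years: ${low:.0f}b to ${high:.0f}b")
```

Under those assumptions, the market would grow by roughly 60% to 100% over five years, which is why small differences in CAGR matter.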
The traditional problems with data sharing
Let's go over the traditional problems with data sharing before we dig into the problems of modern data sharing. They range from manual, one-off work to building and maintaining custom pipelines.
A data file is emailed from a provider to a consumer:
Emails aren't secure.
It is hard to edit and undo things with emails.
There can be limitations around sending very large files.
File Transfer Protocol (FTP), data files are shared and downloaded between two computers or via the Internet:
FTP is not secure.
FTP is unreliable. It can take a lot of time to develop, maintain, and troubleshoot scripts.
FTP is outdated. It can't grow with your organization as you need more features and security.
SSH File Transfer Protocol (SFTP), often called Secure FTP, works like FTP but runs over an encrypted SSH connection:
Requires opening special ports.
Not designed around collaboration.
The interaction is binary and cannot be logged as something human-readable.
An API (Application Programming Interface) is used to initiate and manage the data transfer:
Data is stored at least twice.
There is a need for API management. Constant development and fixing of broken things takes a lot of time.
It is hard to keep the copies of the data set in sync and correct.
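The sync problem in the last point can be sketched in a few lines. Once data lives on both sides, someone has to detect when the copies have drifted apart. This is a minimal sketch, assuming both copies are available as lists of rows; the function names and the sample rows are hypothetical.

```python
import hashlib
import json


def fingerprint(rows: list[dict]) -> str:
    """Order-independent SHA-256 fingerprint of a data set."""
    # Serialize each row with sorted keys, then sort the rows themselves,
    # so that row order and key order do not affect the result.
    canonical = sorted(json.dumps(row, sort_keys=True) for row in rows)
    return hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()


def copies_in_sync(provider_rows: list[dict], consumer_rows: list[dict]) -> bool:
    """Return True when both copies contain exactly the same rows."""
    return fingerprint(provider_rows) == fingerprint(consumer_rows)


# Hypothetical sample data: same rows in a different order, and a stale copy.
provider_copy = [{"id": 1, "price": 10.0}, {"id": 2, "price": 12.5}]
consumer_copy = [{"id": 2, "price": 12.5}, {"id": 1, "price": 10.0}]
stale_copy = [{"id": 1, "price": 10.0}, {"id": 2, "price": 11.0}]

print(copies_in_sync(provider_copy, consumer_copy))
print(copies_in_sync(provider_copy, stale_copy))
```

A check like this only tells you that the copies differ. Finding and repairing the difference is further work that someone has to build and maintain.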
ETL (Extract, Transform, Load) software extracts data from the provider’s database, transforms it into a format that fits for consumption, and then loads it into the consumer’s database:
Lack of ETL developers.
Takes time to develop (weeks to months), and then more time to maintain and fix when things break.
Data formats change over time.
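For readers who haven't seen one, the three ETL steps can be sketched as a tiny pipeline. This is a minimal sketch using in-memory SQLite databases to stand in for the provider's and consumer's systems; the table, the column names, and the transformation are all hypothetical.

```python
import sqlite3

# In-memory databases standing in for the provider and the consumer.
provider_db = sqlite3.connect(":memory:")
consumer_db = sqlite3.connect(":memory:")

provider_db.execute(
    "CREATE TABLE orders (id INTEGER, amount_cents INTEGER, country TEXT)"
)
provider_db.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1999, "se"), (2, 550, "us")],
)
consumer_db.execute("CREATE TABLE orders (id INTEGER, amount REAL, country TEXT)")


def extract(conn):
    """Extract: read the raw rows from the provider's database."""
    return conn.execute("SELECT id, amount_cents, country FROM orders").fetchall()


def transform(rows):
    """Transform: convert cents to currency units, upper-case country codes."""
    return [(order_id, cents / 100, country.upper()) for order_id, cents, country in rows]


def load(conn, rows):
    """Load: write the transformed rows into the consumer's database."""
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


load(consumer_db, transform(extract(provider_db)))
loaded = consumer_db.execute(
    "SELECT id, amount, country FROM orders ORDER BY id"
).fetchall()
print(loaded)
```

Note how tightly the pipeline depends on the provider's schema. If the provider renames or retypes a column, the extract query fails and the pipeline has to be fixed, which is the "data formats change over time" problem in practice.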
Copy and provide credentials
The provider stores a copy of the data and provides the consumer with credentials for accessing it:
Credentials can leak or be mishandled when they are shared.
If the provider needs to share data with multiple consumers, this process won't scale.
Those are just a few of the problems with the common traditional ways of sharing data. Each one brings its own pain, which is why modern data sharing solutions were built.
Modern data sharing
There are multiple modern ways of sharing data in today's age:
AWS Data Exchange (both S3 and Redshift)
Snowflake Data Sharing and Snowflake Marketplace
Azure Data Share
In particular, Snowflake is very famous for having removed the pain that came with sharing data traditionally.
Sharing data within one of those cloud platforms works like a charm, but sharing data between different cloud platforms gets quite tricky and leads to problems similar to the ones that existed before. This is unavoidable, because not all companies store their data on the same cloud platform.
So, what are the problems with the modern solutions mentioned? Hasn't the problem with data sharing been solved? Not quite!
Sharing data with external parties hasn't been fully solved yet. In fact, new problems have been created.
Cloud platforms are building walled gardens. Providers' data is trapped within the cloud providers.
To share data between different platforms, providers need to build their own integrations, for instance if they want to share data from Google to Amazon.
Building and managing custom integrations is expensive.
With pipelines constantly to develop and maintain, providers can't keep up with demand.
Every custom pipeline is built differently, and each one is another point of failure.
Image clarifying the current situation with modern data sharing solutions:
Just Bobsled It
What if you didn't have to move your data to another platform or build custom pipelines to share it with a consumer that is on a different platform than you?
What if you could share data to any cloud or platform with just a few clicks, fulfilling the mission described at the start?
That's right, Just Bobsled It. Just use Bobsled to share your data. In our language, sled your data!
Bobsled isn't a feature. It is THE data sharing product, the ultimate solution for data sharing: a cross-cloud data sharing platform.
Bobsled is cloud & platform agnostic. Share data to any cloud or platform without developing custom pipelines or having to move your data to different platforms.
Bobsled is quick & flexible. Share different formats of data in a single delivery, customize existing deliveries, automate deliveries to run on a schedule & much more.
Bobsled is designed to enable providers to deliver to any customer, regardless of originating sources or delivery requirements.
Image of sledding your data (sharing your data using Bobsled):
The outcome of sledding your data is magnificent:
Consumers get to value faster. There is no need for any custom pipelines to be built, and by delivering the data to the consumer's platform of choice, consumers can begin working with the data right away and avoid building ingestion pipelines (pipelines to get the data onto their platform).
The consumer experience improves.
Sharing data is cheaper. There is no need to have a whole team maintaining a pipeline, on either the consumer's or the provider's side.
Providers can meet requirements that were previously too technically costly, so they can reach new customer segments.
Improved reliability, governance, and security.
By removing the friction of having to maintain pipelines, technical teams can focus on building new and improved data products.
Just Bobsled It.
That is the conclusion. The ultimate solution for data sharing is to sled your data through Bobsled.
Bobsled will help companies exchange data faster, get to value faster, and reach decisions faster, helping the world move forward at a faster pace.