Gone are the days of having a single SQL database to manage all of your organization’s information. In today’s data-saturated age, more storage options have emerged to meet rapidly changing needs.
You may have heard the term NoSQL tossed around, but what does it mean? And what can it do for you? How can you tell when one option will be more effective than the other, and which should you choose for your business? Those are the topics we will cover in this article.
The old standby of data storage since the 1970s, SQL databases store information in a relational fashion. This means the data has a relationship to other data in the database. For example, a class directory for a school might have tables for classrooms, students, teachers, and more.
SQL (or Structured Query Language) is used to describe these objects and their relations to one another. SQL is versatile enough to express complex queries and is widely used and well tested, but things get harder if you need to add more fields or a different structure down the line.
Since SQL databases require a predefined schema, new types of information or ill-formatted data will be rejected until the schema is changed, and changing the schema of a large, busy database can be slow and disruptive.
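To make the school example concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample rows are made up for illustration. It shows a relation between two tables, a query that follows that relation, and what happens when data arrives that the schema does not expect.

```python
import sqlite3

# In-memory database; all table and column names here are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teachers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE students (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        teacher_id INTEGER NOT NULL REFERENCES teachers(id)
    );
""")
conn.execute("INSERT INTO teachers (id, name) VALUES (1, 'Ms. Rivera')")
conn.execute("INSERT INTO students (name, teacher_id) VALUES ('Sam', 1)")
conn.execute("INSERT INTO students (name, teacher_id) VALUES ('Alex', 1)")

# A join follows the relation from each student to their teacher.
rows = conn.execute("""
    SELECT students.name, teachers.name
    FROM students
    JOIN teachers ON teachers.id = students.teacher_id
    ORDER BY students.name
""").fetchall()

# A field the schema does not define is rejected...
new_row_sql = (
    "INSERT INTO students (name, teacher_id, allergies) "
    "VALUES ('Kim', 1, 'peanuts')"
)
try:
    conn.execute(new_row_sql)
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# ...until the schema itself is changed to include it.
conn.execute("ALTER TABLE students ADD COLUMN allergies TEXT")
conn.execute(new_row_sql)
```

On a small table the schema change is instant. On a large production database the same kind of change usually needs planning, which is the maintenance cost described above.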
When you start getting more and more data, as many companies these days are, you have to find a way to scale up. To scale a SQL database, you typically add more resources to the server. This is called vertical scaling.
To scale vertically, you must increase system resources such as RAM, storage space, or CPU. If you’re hosting your database on a cloud platform like AWS, this can get expensive very quickly.
These databases are best used with structured data such as that from our school example. Other good fits include weather information, inventory management data, or stock prices.
As the variety and type of data we produce changes, so too do the tools we use to contain that data.
NoSQL databases focus on storing collections of unstructured or semi-structured data. Many APIs return JSON documents that are essentially nested collections of key-value pairs. The structure changes over time and data is coming in rapidly, maybe even in real time. This type of data doesn’t fit easily into a traditional relational database, but you need somewhere to easily store and access the information. So what do you do?
Enter the NoSQL database.
True to their name, NoSQL databases eschew the SQL language and format in favor of more flexible storage. Data is stored in a more amorphous fashion that allows for greater scalability and real-time data ingestion.
There are four main types of data held by NoSQL databases:
- Key-value pairs
- Documents
- Wide columns
- Graphs
The benefit of these different types is that you don’t have to define a schema or format before starting to ingest data. This cuts down on the maintenance or upgrades needed down the road to add new types or structure.
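As a rough illustration of schema-free ingestion, the sketch below uses a plain Python dict as a stand-in for a key-value or document store. The `put` and `get` helpers and the sensor readings are hypothetical, not the API of any particular database. The point is only that two records with different fields can be stored side by side without declaring either shape up front.

```python
import json

# A plain dict stands in for the store: key -> serialized JSON document.
store = {}

def put(key, document):
    """Store any JSON-serializable document under a key; no schema is checked."""
    store[key] = json.dumps(document)

def get(key):
    """Return the document stored under a key as Python data."""
    return json.loads(store[key])

# Two readings with different fields; neither shape was declared in advance.
put("reading:1", {"sensor": "a1", "temp_c": 21.5})
put("reading:2", {"sensor": "b7", "temp_c": 19.0, "humidity": 0.41, "tags": ["roof"]})

# Each document keeps its own set of fields.
shapes = [sorted(get(key).keys()) for key in ("reading:1", "reading:2")]
```

The trade-off is that the application reading the data now has to cope with fields that may or may not be present.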
NoSQL databases typically scale horizontally instead of vertically. This is done by a process called “sharding”, where the database’s storage is split over multiple servers. Sharding is possible with SQL databases, but it takes a lot more work and maintenance, whereas many NoSQL stores support it out of the box.
NoSQL databases work well for lots of varied, unstructured data. If you need to hold incoming sensor data or API responses, to name two examples, NoSQL would likely be most effective. They can also be ideal for very, very large datasets (tens or hundreds of terabytes or more): there is a practical upper limit on how much you can increase one system’s resources, but you can keep adding machines.
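The idea behind sharding can be sketched in a few lines. In this simplified example, each "server" is just a Python dict, and a hash of the key decides which one holds a given record. The `ShardedStore` class and the key format are hypothetical. Real databases generally use more sophisticated schemes, such as consistent hashing or range-based partitioning, so that adding a server does not force most of the data to move.

```python
import hashlib

class ShardedStore:
    """Toy key-value store that spreads keys across several shards."""

    def __init__(self, shard_count):
        # Each dict stands in for a separate server.
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # Hash the key so the same key always maps to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, value):
        self.shards[self._shard_for(key)][key] = value

    def get(self, key):
        return self.shards[self._shard_for(key)].get(key)

sharded = ShardedStore(shard_count=4)
for i in range(1000):
    sharded.put(f"user:{i}", {"id": i})

# Number of records that landed on each shard.
sizes = [len(shard) for shard in sharded.shards]
```

Each shard ends up holding only part of the data, so storage and load are spread across machines instead of concentrated on one.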
The Big Question
So when should you stick with a relational database versus trying out a NoSQL solution? First, ask yourself how your data is structured. If it is in a fairly 2-dimensional (flat) format and has strong relations with other data in your dataset, consider a SQL database.
If you’re dealing with variable data that changes in format, or with semi-structured documents such as JSON or XML, then give a NoSQL solution a try.
Here are some basic criteria you want to look at when evaluating a new data storage solution:
- the structure of your data
- the volume of your data
- whether you anticipate this dataset to grow significantly in the future
There are many different options out there today, but you now know the major types of data stores and where to apply them.