The beauty of good storytelling is its applicability to the most unexpected situations. In 1871, Lewis Carroll wrote about the White Queen trying to convince Alice to work for her, with a promise of “jam to-morrow and jam yesterday – but never jam to-day.” Little did he know that this one statement would be used by economists, politicians, playwrights, and musicians long after he wrote it. It’s time to add data analysts to the list.
In the beginning, it was hype. Then it morphed into a nice-to-have, and now it’s mandatory. I am referring to the need for data-driven decision making. If necessity is the mother of invention, then the data deluge of the past decades birthed big data tools like Hadoop and S3. But what good is data if it’s under lock and key? Thus was born the art and science of predictive analytics and reporting.
Data teams were set up to engineer the infrastructure for getting and storing data. All the while, businesses were rubbing their hands in glee. Finally, they would have the magic bullet to kill their competition, drive up their stock price, and end hunger (if not the world’s, certainly their executives’), and it was all because of the unending and overflowing buckets of data they could use. For once in their lives, technology teams were more than the help desk and code monkeys; they were the custodians of data and the business’s new best friend.
If the SVP of Marketing wants a report to decide where to spend the next dollar, they need to know where effort was spent in the past, how it failed or succeeded, and what the ROI on those Super Bowl ads was. They know they have the data to help them, but they don’t know how to access it easily. Sure enough, what follows is a long chain of emails between the marketing team and the data custodians (yes, the same IT that was rarely acknowledged in the past). Marketing wants a report on customer segmentation, but the data support team doesn’t know the difference between a customer and a segment. There is back and forth, with explanations and meetings to work together. The report is finally generated and sent back to the SVP, who by this time has shorted their stock in the company (apparently that customer segmentation report was crucial in keeping the business alive) and moved to a small start-up/an island in the Bahamas/the competition.
You have the data, and you have the people who can analyze it effectively because they know the business, but you don’t have a self-serve platform to support them? It’s akin to admitting that there is a virus, that a mask can stop it, and that I have a mask, but I don’t know how to wear it.
At the highest level, inefficient data utilization typically falls into three categories:
The data technology teams are gatekeepers of the data assets. If the business wants a report or a data cut, they send a request to a data technologist and wait with bated breath for a quick turnaround. There is no collaboration or open communication between the requestor and the provider. The requestor is entirely dependent on the provider for what they need.
The requestor and the provider have reached a truce. The providers know what the requestors need, when, and how. The providers automate the request, make an ETL script to run resource-exhaustive and time-consuming jobs on the third business day of every month, generate exactly what was asked for, and right before shipping the reports to the requestor, they get asked if the latest round of reports could also include another computed column. With clenched fists and gritted teeth, the provider changes the script, runs the ETL again, and round and round the cycle goes.
The requestor is no longer willing to engage the provider. They don’t care for the wait (or the attitude?). Shadow IT is born. The requestor makes a “friend” on the provider's team. Data requests are sent through back channels and received under the radar. Suddenly, there are redundant models, inconsistent KPIs, fragmented analytics, and the worst form of siloes one can imagine.
DataOps is an operational function that controls the data’s journey from source to value. As a design and value-maximization principle, it allows organizations to overcome the data-sharing inefficiencies that have seeped into the end-to-end data utilization process.
Laying the foundation of Agile DataOps and then maturing it is not a simple task, but that’s the case with all good things. Instead of looking at the cost and effort, focus on the value the organization may be able to derive from the result.
You can judge whether your DataOps practice is maturing by the change in the WHO, WHAT, WHERE, and HOW of the data being managed.
WHO: The business replaces IT as the primary executor of actions on data. This becomes particularly evident when the business becomes responsible for data wrangling because it has a better understanding of the business domain.
WHAT: Instead of working only with transactional data, the business starts working with interactional and behavioral data. The ability to chain disparate events together gives deeper insight into customer activity and supports new ideas for keeping customer engagement high.
WHERE: Being exclusively on-prem is being relegated to the pages of infrastructure history. Hybrid and multi-cloud deployments are gaining dominance and will continue to do so as cost, flexibility, and reliability become paramount concerns for the new digital normal.
HOW: Siloes are replaced by iterative enhancements through collaborative efforts between the requestors and providers of data.
An expensive platform is not necessary to start thinking about DataOps. The foundation of DataOps is based on the principles of right people, right communication, right time, and right processes.
Rome wasn’t built in a day, and neither will your DataOps practice be. But as it matures, it will implicitly induce positive cultural changes in the organization and make collaboration and innovation table stakes.