In the rapidly growing field of data engineering, restructuring data pipelines has become fundamental to driving business growth and operational efficiency. Manohar Sai Jasti, Software Development Engineer at Workday, shares his journey of implementing innovative solutions and ensuring scalability in data pipelines. In this interview, we explore his experiences and insights into reshaping data pipelines to empower businesses with data-driven decision-making.
What are some key projects involving data pipeline restructuring, and what results did you achieve?
When I worked at Stord, a leading cloud supply chain and fulfillment platform, I was the sole data engineer. My responsibility was to lead several critical projects that reshaped our data infrastructure. One of the most significant initiatives was the Log-Based Replication (LBR) Migration project, which I spearheaded in collaboration with our Site Reliability Engineering (SRE) team.
Before this project, we faced substantial data discrepancies between our source system and BigQuery, which led to inefficiencies and slower data updates. The migration yielded remarkable results.
To be precise, we achieved annual cost savings of $72,000, equating to $6,000 per month. The data discrepancies were virtually eliminated, reduced by almost 100%. Data refresh rates also improved by at least 30%.
This project was a huge undertaking and affected all of the major datasets for both Stord One Commerce and Stord One Warehouse, which are cloud-based order management and warehouse management products. Because of these results, I was recognized with the “Efficiency Driver” award.
Another key project was the Vital Orders Dataflow Enhancement. I owned this critical data flow, where the goal was to consolidate information across Stord’s legacy and new systems. This project significantly improved our data aggregation and reporting capabilities. Its main benefit was providing logistics customers with detailed and accurate insights into their supply chain operations.
Additionally, I completed all data-end migrations from Veracore to Stord One Commerce, which was a big customer obsession win. This migration improved operational efficiency, grew revenue, and enhanced our services.
Currently, as an Analytics Engineer at Workday since May 2024, I’m involved in developing and maintaining robust data transformation pipelines. I’m part of the Performance, Resilience, and Scalability (PRS) Engineering Tools Team. My role involves creating a complete data pipeline, from data warehouse to data science applications, empowering Workmates with data-driven decisions at their fingertips.
Here, I’ve been extensively leveraging DBT, the data build tool, to enhance our FinOps practices and create models that ingest and transform billing data from various cloud providers. This work has improved our ability to analyze costs across our multi-cloud infrastructure, providing valuable insights for resource allocation and spend optimization.
Data product governance is crucial for preventing siloed development and ensuring consistent, high-quality data assets across an organization. In my current role at Workday, I’ve been addressing this challenge by implementing comprehensive data governance practices for the data products used by analysts, data scientists, and others. These practices include cross-functional collaboration, standardization, access management, and data pipeline life cycle management.
Scalability and flexibility are cornerstones of any robust data infrastructure. How do you ensure your systems can scale seamlessly while supporting business growth?
Scalability and flexibility are indeed critical in our work, and they were especially so at Stord. We were rapidly expanding our cloud supply chain services, and to support this growth and ensure that all new solutions were flexible, I focused on several key areas.
The first was query performance improvements. I reworked our data infrastructure by strategically separating fact tables. In fact, this restructuring dramatically enhanced query performance and optimized data retrieval processes for Stord’s complex logistics operations.
Another key area was the transition to DBT (Data Build Tool). I moved critical data processing logic that powers most of our dashboards from traditional stored procedures to DBT. This brought fruitful results: overall operational efficiency and our alerting systems both improved. As a result, it became easier to adapt to new requirements without reworking the entire system.
Comprehensive alerting and monitoring were also a priority. I implemented 100% alerting and monitoring coverage across all pipelines and critical processes. This minimized data downtime and improved our ability to respond quickly to issues.
In my current role at Workday, I continue to focus on scalability and flexibility. I use a range of tools, including DBT, Trino/Presto, Jupyter Notebooks, Python, Apache Airflow, AWS RDS, MySQL/PostgreSQL, and Git for data processing and analysis.
What steps have you taken to modernize data processing workflows, and how have these improvements impacted efficiency and accuracy?
At Stord, one of the most impactful changes I made in terms of modernizing data workflows was the Log-Based Replication Migration. It solved data accuracy issues, improved refresh rates, and cut costs, which helped us provide real-time insights into logistics operations.
I also introduced DBT to manage critical data processes. This allowed us to handle data more efficiently and made it easier for team members to work together on updates.
Another project involved improving how we handle master order data. These updates gave us a clearer picture of warehouse activities and made our reports more valuable for customers.
At Workday, I’ve focused on multi-cloud infrastructure, creating pipelines that ensure accurate and up-to-date data for cost analysis. These improvements have helped teams make decisions faster and with more confidence.
Let’s talk about innovation: how have automated monitoring and machine learning shaped your approach to managing data?
At Stord, innovation was all about staying ahead in how we managed data. One major improvement was introducing automated monitoring and alerting for all pipelines. With 100% coverage, we could catch and fix issues before customers were affected. This was especially useful in ensuring accurate logistics tracking and reporting.
I also worked on improving our alerting system to focus on problems like stale or duplicate data. These improvements helped us maintain high data quality and improved customer trust in our analytics.
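As a rough illustration of what stale and duplicate checks can look like, here is a minimal Python sketch. It is not Stord’s actual alerting code, which the interview does not describe; the function names, the field names (`order_id`, `updated_at`), the two-hour freshness threshold, and the sample rows are all assumptions made up for the example.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def find_duplicate_keys(rows, key):
    """Return the key values that appear more than once in rows."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def is_stale(rows, ts_field, max_age, now):
    """Return True when the newest row is older than max_age, or no rows exist."""
    if not rows:
        return True
    newest = max(row[ts_field] for row in rows)
    return now - newest > max_age


# Hypothetical sample rows; the field names are illustrative only.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
orders = [
    {"order_id": "A1", "updated_at": now - timedelta(hours=5)},
    {"order_id": "A2", "updated_at": now - timedelta(hours=3)},
    {"order_id": "A1", "updated_at": now - timedelta(hours=3)},
]

duplicates = find_duplicate_keys(orders, "order_id")
stale = is_stale(orders, "updated_at", timedelta(hours=2), now)
```

In a real pipeline, checks like these would typically run on a schedule and raise an alert when `duplicates` is non-empty or `stale` is true.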
At Workday, I’ve continued to prioritize innovation by developing tools and processes that make our data products better. For example, I’m working on improving alerting systems to identify issues faster and create smoother workflows for our teams.
Speaking of current trends, machine learning is now transforming almost every data-driven business. Can you share how you’ve integrated machine learning into data processing and its impact on analytics quality and timeliness?
During my time at Stord, I was involved in exploring the integration of machine learning technologies into our data processing. One of my key projects was building an AI-powered chatbot in collaboration with cross-functional teams. This chatbot used generative AI to handle analytical queries, allowing users to ask questions in plain language and get SQL-based answers quickly.
We also added error-handling mechanisms that helped the chatbot learn and improve over time. This not only reduced response times for ad-hoc queries but also gave our teams faster access to the data they needed.
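The interview does not say how the chatbot’s error handling worked. One common pattern for text-to-SQL tools, sketched below purely as an assumption, is to run the generated query and feed any database error back to the generator so that it can try again. The names `answer_with_retries` and `fake_generator` are hypothetical, SQLite stands in for the real warehouse, and the generator is a stub rather than a real generative model.

```python
import sqlite3


def answer_with_retries(question, generate_sql, conn, max_attempts=3):
    """Run generated SQL, feeding any database error back to the generator."""
    feedback = None
    for _ in range(max_attempts):
        sql = generate_sql(question, feedback)
        try:
            return conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            # Keep the failed query and error text for the next attempt.
            feedback = f"Query failed: {sql!r} ({exc})"
    raise RuntimeError(f"No valid query after {max_attempts} attempts")


def fake_generator(question, feedback):
    """Stub for a generative model: its first draft misspells the table name."""
    if feedback is None:
        return "SELECT COUNT(*) FROM ordrs"
    return "SELECT COUNT(*) FROM orders"


# In-memory database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,), (3,)])

result = answer_with_retries("How many orders are there?", fake_generator, conn)
```

Here the first query fails because the table `ordrs` does not exist, and the second attempt succeeds once the stub receives the error as feedback.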
At Workday, I’m applying this experience to build a knowledge bot that uses generative AI. The bot is designed to help users ask questions about how to use analytics tools, cutting down the need for documentation and providing real-time support. It’s an exciting project that’s making analytics easier and faster for everyone involved.
As we wrap up, what hurdles did you face during projects like log-based replication, and how did you overcome them?
The Log-Based Replication Migration at Stord had its share of challenges. The main technical hurdle was the complexity of supply chain data. It was also important to integrate the new system without disrupting ongoing logistics operations.
We sometimes ran into unexpected problems, which we called “black swan” issues, after making updates to master orders logic. These required deep troubleshooting and teamwork to resolve.
To address these challenges, I made sure to test thoroughly at every step. I worked closely with the SRE team to resolve technical problems and collaborated with stakeholders to keep everyone aligned on goals.
In my current role at Workday, I’ve faced different challenges related to multi-cloud infrastructure. For example, ensuring data accuracy across different cloud platforms is crucial. To solve this, I built checks to validate data and created a system to flag stale data before it affected customers. This proactive approach has helped ensure our analytics are always reliable and up-to-date.
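A validation check of the kind described above could, for instance, compare per-provider totals in the source billing data against the totals loaded into the warehouse. The sketch below is only an assumption about what such a check might look like, not Workday’s implementation; `reconcile_totals`, the provider keys, the 0.1% tolerance, and the dollar figures are all invented for illustration.

```python
import math


def reconcile_totals(source_totals, warehouse_totals, rel_tol=0.001):
    """Return providers whose totals are missing or differ beyond rel_tol."""
    mismatches = {}
    for provider in sorted(set(source_totals) | set(warehouse_totals)):
        expected = source_totals.get(provider)
        actual = warehouse_totals.get(provider)
        if (
            expected is None
            or actual is None
            or not math.isclose(expected, actual, rel_tol=rel_tol)
        ):
            mismatches[provider] = (expected, actual)
    return mismatches


# Made-up totals for illustration; not real billing figures.
source = {"aws": 1200.00, "gcp": 830.50, "azure": 410.25}
warehouse = {"aws": 1200.00, "gcp": 790.00}

mismatches = reconcile_totals(source, warehouse)
```

With these sample values, `gcp` is flagged because the totals disagree and `azure` is flagged because it never reached the warehouse, while `aws` passes.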