Every data scientist has an origin story, and almost none of them are tidy. Mine begins two decades before I’d have called myself anything close to a data scientist — writing Java front-ends and back-ends, hand-rolling SQL queries and stored procedures, and pushing rows onto a web page. I didn’t know it then, but those were my first quiet touchpoints with the “data” half of data science.

01 The accidental beginning

In the early 2000s, “analytics” meant a well-written GROUP BY and a report nobody read twice. I was an applications engineer: build the form, persist the record, render the result. The craft was in the query plan, not the inference. What that decade quietly taught me — and what I’d undervalue for years — was a feel for how data is shaped, stored, and moved. That instinct turns out to be 80% of the job long before any model enters the room.

The unglamorous years — schemas, joins, file formats — were the apprenticeship. The science came later; the data literacy came first.

02 Where the spark actually came from

The urge to understand the science behind the data arrived in 2015. I had just wrapped a deputation setting up Bosch’s first offshore development centre outside India, in Vietnam, and came back to a Program Manager role for the Safety division. If you’ve never thought about how a car decides to fire an airbag, here’s the surface area I inherited:

Active safety: Brake Assist, Electronic Stability Program, standstill & speed control
Passive safety: Airbag deployment logic and timing
Drive assistance: RADAR, adaptive cruise control, and the early ADAS stack

My mandate was a roadmap to simulate every one of these functions — and each breaks into dozens of smaller components. Picture the data exhaust: input files in a zoo of formats and extensions, wildly different sizes, scattered across storage locations, fed through a chain of tools we politely called a “toolchain.” It was, in hindsight, a textbook big-data problem. I just didn’t have the vocabulary for it yet.

Then a friend working in data storage at another multinational introduced me to the words “Big Data” and “Data Science.” Suddenly my simulation mess had a name. I worked through a stack of O’Reilly books and applied what little I understood to the component analysis. It was clumsy — but it was the first time I was modelling, not just querying.

03 Patents turn out to be big data in disguise

In May 2016 I took a cross-functional leap into Bosch India’s Intellectual Property division — building patent-analytics case studies and maintaining the IP portfolio. Within weeks the realisation landed: this was another big-data problem hiding in plain sight. Filings, examiners, technology domains, durations, attorneys — all of it structured, all of it predictive, none of it being treated that way.

That conviction sent me to a Postgraduate Program in Business Analytics & Business Intelligence at Great Lakes Institute, in collaboration with The University of Texas at Austin. Most of my cohort were junior, budding data scientists; I was the operations guy who’d arrived from the other direction. That difference in vantage point became my edge — I wasn’t looking for a model to admire, I was looking for a decision to change.

The mindset shift
  • Stop asking “what model should I use?” and start asking “what decision am I trying to change?”
  • Domain context isn’t a nice-to-have — it’s what tells you which correlations are real.
  • The best reinforcement of new knowledge is shipping it against a target someone cares about.

04 Proof of work: one patent every working day

Theory is cheap. The test came as a brutal target: grow the IP portfolio by more than 50% — roughly one patent per working day. I had about 8 GB of data spanning five years and ten Bosch divisions to work with. Here’s how I approached it:

First, a decision tree for the initial drill-down, to see where filings clustered and where they stalled. Then I layered Lasso regression on top to estimate the average duration of each phase of the filing pipeline. I built the analysis in Python on JupyterLab and cross-checked everything in R (RStudio), R being the friendlier home for someone strong in statistics but light on programming.

Why two tools? Python was my build environment; R was my second opinion. When a stats-heavy result survives a re-run in a different ecosystem, you trust it enough to put it in front of leadership.
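For readers who want to see the shape of that two-step approach, here is a minimal sketch. It is not the original analysis. The records are tiny and hand-made, the feature names (`is_emerging_tech`, `review_round`) are hypothetical, and both steps are written in plain Python so the mechanics are visible. A real project would more likely use a library implementation. Step one finds the first split of a regression tree (the drill-down), and step two fits a Lasso by coordinate descent to estimate how much each feature adds to a phase's duration.

```python
# Synthetic, hand-made records (NOT real patent data).
# Columns: is_emerging_tech (0/1), review_round (1-4), duration_days of one phase.
ROWS = [
    (0, 1, 30.0), (0, 2, 32.0), (0, 3, 31.0), (0, 4, 33.0),
    (1, 1, 90.0), (1, 2, 92.0), (1, 3, 91.0), (1, 4, 93.0),
]
FEATURES = ["is_emerging_tech", "review_round"]  # hypothetical feature names
X = [[float(r[0]), float(r[1])] for r in ROWS]
y = [r[2] for r in ROWS]


def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X, y, names):
    """First split of a regression tree: the (feature, threshold) with the lowest SSE."""
    best = None
    for j, name in enumerate(names):
        levels = sorted({row[j] for row in X})
        # Candidate thresholds are midpoints between adjacent observed values.
        for lo, hi in zip(levels, levels[1:]):
            threshold = (lo + hi) / 2
            left = [t for row, t in zip(X, y) if row[j] <= threshold]
            right = [t for row, t in zip(X, y) if row[j] > threshold]
            score = sse(left) + sse(right)
            if best is None or score < best[2]:
                best = (name, threshold, score)
    return best


def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; anything inside the band becomes 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso(X, y, alpha, sweeps=100):
    """Lasso by coordinate descent on centred data.

    Minimises (1 / 2n) * sum(residual^2) + alpha * sum(|coef|).
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    residual = [t - y_mean for t in y]  # all coefficients start at zero
    coef = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            scale = sum(row[j] ** 2 for row in Xc) / n
            if scale == 0.0:
                continue  # a constant column carries no information
            # Correlation of feature j with the residual, with j's own effect added back.
            rho = sum(row[j] * (r + row[j] * coef[j]) for row, r in zip(Xc, residual)) / n
            updated = soft_threshold(rho, alpha) / scale
            residual = [r - row[j] * (updated - coef[j]) for row, r in zip(Xc, residual)]
            coef[j] = updated
    intercept = y_mean - sum(c * m for c, m in zip(coef, x_mean))
    return coef, intercept


split = best_split(X, y, FEATURES)
coef, intercept = lasso(X, y, alpha=2.0)
print("first split:", split)
print("lasso coefficients:", dict(zip(FEATURES, coef)), "intercept:", intercept)
```

On this toy data the first split lands on `is_emerging_tech`, and the Lasso penalty shrinks the `review_round` coefficient to exactly zero while keeping a large positive one for `is_emerging_tech`. That is the general pattern the two steps are meant to expose: the tree shows where durations diverge, and the Lasso keeps only the drivers worth acting on. The penalty `alpha=2.0` is an arbitrary choice for illustration; in practice it would be tuned, for example by cross-validation.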

The insights were the kind you only get by combining the model with domain knowledge: emerging technologies took far longer to clear expert evaluation, while proven domains turned around quickly — and a handful of patent attorneys were measurably more productive than the rest. We stood up a task force, turned those findings into actions, and hit the target. Bosch India recognised the work with a Hall of Fame award under the Innovations category.

“Decision trees first, Lasso regression second — on 8 GB across five years and ten divisions. The model found the bottleneck; the task force removed it.”

05 Connected mobility, and data as a revenue line

Today I lead Engineering Services and Product Operations for Connected Mobility at Bosch India. From day one the brief has been the same: find use cases that translate into business value — new revenue for the customer, lower cost for the organisation. A few that shipped:

Predictive service maintenance: 99% accuracy (flag a service need before the customer feels it)
Component failure prediction: 90% accuracy (catch failures upstream of a field return)
Ticket auto-classification: Route and triage support load without a human first-pass
Infrastructure optimisation: Right-size cloud and compute against real demand curves

And none of it matters if no one can see it. Visualisation does as much persuading as the model does predicting — I’ve leaned on Tableau for years to turn dashboards into decisions. A model that can’t be read by the person who signs off the budget is a model that doesn’t ship.

Staying current without drowning

The field moves fast enough that standing still is moving backward. I keep a low-effort, high-signal habit: subscribe to a few data-science publications, read widely, and actually try new packages and IDEs around Python and R. I’m active in intra-org forums like the Tableau Workgroup and an AIoT Expert Group — and I keep it playful, too. I installed Juno on my iPad Pro to crunch small personal datasets, like the sleep patterns my watch’s Pillow app quietly collects. Curiosity is a muscle; small reps keep it warm.

06 If you’re standing where I stood

People ask how to “break into” data science as though there’s a door. There isn’t — there’s a loop, and you can step into it from wherever you already are:

The loop that worked for me
  • Find your initial motivation — a real problem in your current work, not a tutorial dataset.
  • Implement on a sample use case — small, scoped, yours.
  • Look for results — a number that moves, a decision that changes.
  • Realise the benefit — show it to someone who cares about the outcome.

Run that loop once and the subject stops being a syllabus and starts being a craft. You’ll fall in love with it — and that’s when the journey actually begins. Data, and the science in it, has become part and parcel of my life, and I genuinely feel it has only just started. There’s a long way to go, and I’m enjoying every bit of it.

Wishing you a very happy data science journey.