Databricks

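High-quality data is the core of AI strategy

TL;DR · AI summary

High-quality data is the key to a successful AI strategy.

Key points

  • Data quality directly affects AI model performance.
  • Enterprises should prioritize data governance and management.
  • The Databricks platform provides a unified solution for data, analytics, and AI.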

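Outline

  1. Introduces why data quality matters to an enterprise AI strategy.

  2. Data quality directly affects AI model performance.

  3. The importance of data governance

    Data governance helps improve data quality and the success rate of AI projects.

  4. Advantages of the Databricks platform

    The Databricks platform provides a unified solution for data, analytics, and AI.

  5. Shows through a real-world case how high-quality data raises the success rate of AI projects.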

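Mind map

  • High-quality data is the key to AI strategy
    • Impact of data quality
      • AI model performance
    • Importance of data governance
      • Improving data quality
    • Advantages of the Databricks platform
      • Unified data, analytics, and AI solution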

#DataQuality #AIStrategy #Databricks #DataGovernance

Data quality is the AI strategy | Databricks Blog

Data Leader · May 13, 2026

Data quality is the AI strategy

Why an academic health system fixed data at the source before betting on AI

by Aly McGue

Summary

  • The guiding principle for high-quality AI is high-quality data, and that means fixing the transactional systems first.
  • Real-time clinical decision support is already preventing misdiagnoses in the emergency room.
  • The tools and models will keep changing. The organizations that focus on value creation with unified data will be the ones that benefit most.

Healthcare may be one of the greatest beneficiaries of AI. Few industries generate as much data, and few have as much to gain from extracting insight from it. But the gap between generating data and actually using it to improve care, accelerate research, and run operations more efficiently remains enormous in most health systems. The ones closing that gap are starting with data, not models.

NYU Langone Health, a leading academic health system, serves the greater New York area through patient care, medical research, and medical education. NYU Langone uses Databricks as its unified data and AI platform; it recently retired its on-premises data lake and is now migrating its enterprise data warehouse. The institution has built a broad community of clinicians, analysts, scientists, and members of the corporate workforce using the platform across care delivery, operations, and research.

Nader Mherabi, the Chief Digital and Information Officer at NYU Langone Health, has led the institution's data strategy since well before the current wave of AI, building the foundations for a data-driven health system. In 2017, he recognized that the quality of NYU Langone's data collection created an opportunity to push further with emerging AI capabilities.

The metaphor Nader returned to: If you want clean water, fix the pipes. Don't try to filter it at the end.

Fix your data quality at the source

Aly McGue: NYU Langone is a metrics-driven organization with a mature data stack. When you already have a functional warehouse and data lake, what is the 'missing piece' that makes a move to a modern data platform necessary?

Nader Mherabi: Our path was a little different from some institutions. We've always been a highly data-driven, metrics-driven organization. We already had unified data in a data lake and an enterprise data warehouse, even in the traditional stack. So, the lift to a modern platform was easier for us than it might be for others.

But the imperative was clear. Back in 2017, we recognized that the potential of AI, even at that very early stage, meant we needed to modernize our data stack. It's one thing to build models. It's another thing to run them 24/7 in a safe, reliable way. We needed a platform that could help us realize our ambitions around patient quality, safety, efficiency, and medical research, and that could grow with us as the technology evolves.

One guiding principle we established over a decade ago is that if you really want high-quality data in your intelligence layer, you have to fix it at the transactional systems first. It's like water coming through the pipes. If you have clean water at the source, you don't have to keep filtering it at the end. Filtering dirty water is expensive. So, the goal should always be clean water first. Some things you'll still have to filter along the way, but the principle should be to get it right upstream.
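As an editorial illustration of "get it right upstream," here is a minimal Python sketch that rejects a bad record at write time instead of cleaning it in the analytics layer. The `Encounter` record, its fields, and the two checks are hypothetical and are not taken from NYU Langone's systems.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Encounter:
    """Hypothetical transactional record; field names are illustrative only."""
    patient_id: str
    admitted_on: date
    discharged_on: Optional[date] = None


def validate_at_source(record: Encounter) -> List[str]:
    """Return a list of problems; an empty list means the record may be saved."""
    problems = []
    if not record.patient_id.strip():
        problems.append("patient_id is required")
    if record.discharged_on is not None and record.discharged_on < record.admitted_on:
        problems.append("discharged_on precedes admitted_on")
    return problems


def save(record: Encounter, store: List[Encounter]) -> None:
    """Reject bad records at write time so downstream layers never see them."""
    problems = validate_at_source(record)
    if problems:
        raise ValueError("; ".join(problems))
    store.append(record)
```

Because `save` raises instead of storing a flawed record, the correction happens where the data is entered, which is the "clean water at the source" idea in miniature.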

Aly: How has the discipline of fixing data at the transactional level transformed the actual utility of your data layer?

Nader: Years ago, we had many systems with patient data scattered across multiple locations without unified identifiers. That's a huge challenge for data quality, and it limits what you can do with it. Part of our approach was to invest in common transactional platforms: One electronic health record and one ERP system. As we brought in new practices or hospitals, we invested in bringing everyone onto common platforms and then created guiding principles for data.

For example, we would never map data in the data warehouse layer. We always try to fix it at the source. We mastered the systems and the data so we know that this is the authoritative source for patient data, this is the source for financial data, this is the source for operational data. Once you do that, your data platform becomes much more meaningful. People can crosswalk data, which is critical in healthcare. Take a patient at the center: You need to connect their care data to what clinical trials are available, all the way through to the financial side, to specimens collected during surgery and where they physically sit. If you don't have that mapping, you're missing an enormous capability. The guiding principle that makes it possible is always the same: Fix it upstream.
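To make the crosswalk idea concrete, the sketch below assumes every authoritative source already keys its records by the same patient identifier, so connecting care, trials, finance, and specimens is a plain lookup with no mapping layer. All source names, fields, and values are invented for illustration.

```python
from typing import Any, Dict

# Made-up records keyed by one shared patient identifier. Each dict stands in
# for a separate authoritative source system.
CARE = {"P001": {"encounters": 3}}
TRIALS = {"P001": ["TRIAL-A"]}
FINANCE = {"P001": {"open_balance": 120.0}}
SPECIMENS = {"P001": [{"specimen_id": "S-17", "location": "Freezer 3, Rack B"}]}

SOURCES = {"care": CARE, "trials": TRIALS, "finance": FINANCE, "specimens": SPECIMENS}


def crosswalk(patient_id: str, sources: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Collect everything known about one patient across all sources."""
    view: Dict[str, Any] = {"patient_id": patient_id}
    for name, records in sources.items():
        view[name] = records.get(patient_id)  # None when a source has no match
    return view
```

Calling `crosswalk("P001", SOURCES)` returns one view that includes the specimen's physical location next to the patient's trial and financial records. If each system used its own identifier, the same lookup would first need a translation table, which is the downstream mapping this approach avoids.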

What unified data actually unlocks

Aly: In healthcare, the stakes for data accuracy are high. How does a unified data foundation prevent the 'conflicting metrics' debate between different departments, and why is that trust so critical when moving toward agentic AI systems?

Nader: It's huge. Even before AI, the gains from unified data were enormous. When your data is unified, you can create better metrics, and different sides of the business aren't coming in saying, "That number doesn't make sense." If your data isn't unified, your metrics will never line up.

With AI, of course, the stakes go up. If you don't have great data, you're not going to have great AI. Performance depends on data quality. And then there's the real-time dimension. Getting people insight at the right time and the right place is what matters.

Unified governance is a strategic AI imperative

Aly McGue: Once you have unified data, the next challenge is making it discoverable and trustworthy at scale. How does data governance fit into that?

Nader Mherabi: It's fundamental. You need a catalog to operate on data and AI models. We use Unity Catalog, and we're continuing to push it further.

But the investment is not just in the tool, it's the strategy around it. You need to define your master data sources, decide who owns each part of the catalog, and then carefully consider how you expose it to the broader community so people can find what they need without duplicating work. It's one thing to have an enormous data program. It's another for people to actually find the right data within it. If you're adopting a platform like this, I would always suggest getting the catalog right from the start. It underpins everything else.
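The strategy decisions described here (master sources, owners, discoverability) can be sketched as a small registry. This is not the Unity Catalog API; it only borrows the three-level `catalog.schema.table` naming that Unity Catalog uses, and every entry, domain, and owner below is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CatalogEntry:
    full_name: str   # "catalog.schema.table"; names here are hypothetical
    domain: str      # e.g. "patient", "financial", "operational"
    owner: str       # team accountable for this part of the catalog
    is_master: bool  # True only for the authoritative source of a domain


ENTRIES = [
    CatalogEntry("clinical.ehr.patients", "patient", "clinical-data-team", True),
    CatalogEntry("finance.erp.invoices", "financial", "finance-data-team", True),
    CatalogEntry("sandbox.analysis.patients_copy", "patient", "analyst-a", False),
]


def master_source(domain: str, entries: List[CatalogEntry]) -> CatalogEntry:
    """Return the single authoritative entry for a domain, or fail loudly."""
    masters = [e for e in entries if e.domain == domain and e.is_master]
    if len(masters) != 1:
        raise LookupError(
            f"expected exactly one master source for {domain!r}, found {len(masters)}"
        )
    return masters[0]
```

Requiring exactly one master per domain means a user searching for patient data is pointed at `clinical.ehr.patients` and its owner, not at a sandbox copy, which is one way to help people find the right data without duplicating work.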

Building a data-literate community

Aly McGue: A unified platform only delivers value if people across the institution actually use it. How have you approached building that community beyond the data engineering team?

Nader: When you invest in a platform like this, you have to optimize the investment. For us, that means evangelizing what it can do across the institution. The goal is to become a learning health system, one that learns from every patient interaction and feeds that insight back into practice. That only works if the community using the platform extends well beyond IT. We've built a broad user base of clinicians, analysts, and scientists, all working within proper access controls, and we've invested in literacy programs and training to make sure people across care delivery, operations, and research can take advantage of it. Getting IT on the platform is a given. The real measure of success is whether the rest of the institution can use it, too.

Real-time insight where it matters most

Aly: In a high-acuity environment like an Emergency Room, 'insight the day after' is effectively useless. What are the architectural requirements for a platform to move from retrospective reporting to real-time clinical decision support that can actually prevent a misdiagnosis?

Nader: In care delivery, the impact is direct. We have models running in the emergency room that look for certain critical conditions and provide decision support in front of clinicians. The goal is to make sure that if a patient is being discharged, the system can flag: did you identify this diagnosis? Did you look at this? Because what we don't want is a patient leaving the emergency room with a condition that could have severe consequences if it's missed.

We all hear about cases at other institutions where a misdiagnosis leads to a bad outcome. We want real-time models that continuously run and provide the best advice to clinicians. Not replacing their judgment, but saying, "Hey, you may have overlooked this. Please take another look." For that to work, the models need real-time data. And that requires the data platform to support real-time feeds so the models can operate on current information and provide just-in-time insight.
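The discharge check described above can be sketched as follows, assuming a real-time feed keeps each patient's latest model scores up to date. The condition names, the `0.8` threshold, and the data shapes are all hypothetical; this is an illustration of the pattern, not NYU Langone's implementation or a clinical rule.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class PatientState:
    patient_id: str
    # condition name -> most recent model score, updated as events arrive
    risk_scores: Dict[str, float] = field(default_factory=dict)
    # conditions the clinician has already documented or ruled out
    documented: Set[str] = field(default_factory=set)


def on_score_event(state: PatientState, condition: str, score: float) -> None:
    """Apply one event from the real-time feed; the latest score wins."""
    state.risk_scores[condition] = score


def discharge_flags(state: PatientState, threshold: float = 0.8) -> List[str]:
    """List high-scoring conditions that nobody has addressed yet."""
    return sorted(
        condition
        for condition, score in state.risk_scores.items()
        if score >= threshold and condition not in state.documented
    )
```

The function only returns prompts for a second look. It does not block the discharge or make the decision, which matches the stated goal of supporting clinicians' judgment without replacing it.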

Three layers of data analytics

Aly: How has AI transformed how your organization approaches analytics and BI strategy?

Nader: I believe analytics is three layers. First, you do have to provide some basic visualization. You can't just say, "What do you want to look at?" People need some structured starting points. Second, you add the conversational layer, tools like Genie, where people can get curious and ask deeper questions. And third, you need to be able to deliver the answer in different forms depending on the user: Sometimes it's a direct fact, sometimes it's a visualization, and sometimes it's a few numbers on a screen.

What's powerful about where we are now is that for the first time in human-machine history, we can actually talk to machines in human terms, the way you'd ask a colleague. That clearly has a place. But I'd advise everyone to think about where it makes sense and to what degree. Don't replace your visualization entirely. Add the conversational layer so people can get curious, ask more questions, and help themselves in a simple way.

Aly: The pace of AI development can be paralyzing for many leaders. How do you balance the need for a stable long-term strategy with the reality that the technology might look completely different six months from now?

Nader: First, accept the unpredictability of AI. You're going to wake up tomorrow, and something new will have arrived. The tools and technology will continue to change. Don't get hung up on that. Find good partners who can grow their platform as part of the change, and focus on value creation.

Whether you're delivering safe, high-quality care, improving operational efficiency, or making the patient experience better, that's the value. Go after it with the capabilities that exist today, and then continue to evolve. And the other piece is to educate yourself. Part of what makes people hesitant is that they don't feel like they understand what's happening. You have to stay in the know as best you can, because that helps you make better decisions as the market evolves, especially at the pace it's moving now.

Closing Thoughts

NYU Langone's early and intentional approach is the key takeaway from this discussion. The clean water metaphor captures something important. Organizations that invest in filtering dirty data downstream are always playing catch-up. The ones that fix it at the transactional layer, even though it takes longer and costs more upfront, build a foundation that every subsequent investment, from analytics to AI to real-time clinical decision support, can reliably build on. In a setting where the stakes are patient safety, that discipline isn't optional.

To hear from industry leaders and define your path to operationalizing AI, download the Economist Enterprise report, “Making AI Deliver.”
