Analyzing Data

At Emory University, responsible AI use in data analysis refers to the ethical, informed, and secure application of artificial intelligence tools in support of academic, research, business and operational objectives. When applied thoughtfully, AI can aid in identifying patterns, organizing information, and enhancing insights. However, AI must not replace human expertise in evaluating data, violate data privacy regulations, or bypass legal and compliance requirements.

This guidance explains how to use AI for data analysis while safeguarding sensitive data, upholding academic integrity, and aligning with Emory's values. The sections below outline recommended practices and activities to avoid when applying AI in data analysis contexts.

Best Practices

Use only the minimum data required for your AI tool to produce its results. Remove extra identifiers, redundant variables, or unnecessary records before processing. Data minimization includes:

  • Using aggregated or de-identified data when PHI / PII details are unnecessary
  • Restricting data to relevant time periods or cohorts
  • Excluding variables unrelated to analytic objectives

Consult Emory’s data minimization guidance or the Electronic Privacy Information Center to determine the appropriate data scope for analytical purposes.
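The minimization steps above can be sketched in code. The following is a minimal illustration using pandas; the dataset, column names, and cutoff date are hypothetical, and the actual variables and time windows to retain depend on your analytic objective.

```python
import pandas as pd

# Hypothetical record-level dataset; all column names and values are illustrative.
df = pd.DataFrame({
    "patient_name": ["John Smith", "Jane Doe", "Alex Lee"],
    "ssn": ["123-45-6789", "987-65-4321", "555-11-2222"],
    "visit_date": pd.to_datetime(["2021-03-01", "2023-07-15", "2024-01-10"]),
    "lab_value": [4.2, 5.1, 3.8],
})

# 1. Restrict data to the relevant time period or cohort.
cohort = df[df["visit_date"] >= "2023-01-01"]

# 2. Exclude variables unrelated to the analytic objective,
#    including direct identifiers the analysis does not need.
cohort = cohort.drop(columns=["patient_name", "ssn"])

# 3. Aggregate when record-level detail is unnecessary:
#    here, a yearly mean replaces individual lab values.
summary = cohort.groupby(cohort["visit_date"].dt.year)["lab_value"].mean()
```

Only the minimized `cohort` or the aggregated `summary`, not the original `df`, would then be provided to an AI tool.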

De-identification is a required best practice for restricted data use, including when using Emory-approved AI tools. Remove, mask, or substitute direct and indirect identifiers whenever possible, as shown below.

  • Replace names with pseudonyms or study codes

Example: "John Smith" → "Patient A1234" or "Subject_001"

  • Remove or generalize addresses

Example: "123 Maple Street, Peachtree City, GA 30269" → "303xx" (3-digit ZIP) or "Fayette County, GA"

  • Strip or generalize birthdates and other dates

Example: "1978-06-15" → "1978" or "Age 45–50"

Given AI’s ability to infer or cross-reference identities, minimizing identifiers is essential. Consult institutional resources such as the Winship Data and Technology Shared Resource, the Office of Information Technology, or the Institutional Review Board, or public resources such as the California Department of Health Care Services' data de-identification guidelines, for support in applying appropriate de-identification techniques.
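The three example transformations above can be expressed as simple functions. This is a minimal sketch only: the salted-hash pseudonymization, the function names, and the sample record are illustrative assumptions, not an Emory-prescribed method, and a real study would manage its code mapping and salt through approved, secured processes.

```python
import hashlib

def pseudonymize(name: str, salt: str = "study-salt") -> str:
    # Map a name to a stable study code via a salted hash.
    # The salt is illustrative; in practice it must be stored securely.
    digest = hashlib.sha256((salt + name).encode()).hexdigest()[:6].upper()
    return f"Subject_{digest}"

def generalize_zip(zip_code: str) -> str:
    # Keep only the first 3 digits of a 5-digit ZIP code.
    return zip_code[:3] + "xx"

def generalize_dob(dob: str) -> str:
    # Reduce a full birthdate (YYYY-MM-DD) to the year alone.
    return dob.split("-")[0]

# Hypothetical record, matching the examples above.
record = {"name": "John Smith", "zip": "30269", "dob": "1978-06-15"}
deidentified = {
    "subject_id": pseudonymize(record["name"]),
    "zip3": generalize_zip(record["zip"]),
    "birth_year": generalize_dob(record["dob"]),
}
```

Using the same salt for every record keeps pseudonyms consistent across files, so the same person always maps to the same study code without retaining the original name.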

Practices to Avoid

Do not rely solely on AI analysis for decision-making purposes

AI should support, not replace, human expertise in decision-making. This includes, but is not limited to, legal, clinical, academic, and personnel decisions. AI-generated analyses should not dictate legal policy, patient care, or employee actions. Because AI systems can generate results that are irrelevant or unexplainable, review, validate, and interpret AI results carefully before drawing conclusions or taking action.

Additional Contacts and Support

If you have questions about responsible AI use in data analysis or need assistance with legal, compliance, or data governance, the following Emory offices can provide guidance: