Analyzing Data
At Emory University, responsible AI use in data analysis refers to the ethical, informed, and secure application of artificial intelligence tools in support of academic, research, business and operational objectives. When applied thoughtfully, AI can aid in identifying patterns, organizing information, and enhancing insights. However, AI must not replace human expertise in evaluating data, violate data privacy regulations, or bypass legal and compliance requirements.
This guidance explains how to use AI for data analysis while safeguarding sensitive data, upholding academic integrity, and aligning with Emory's values. The following sections outline recommended practices and activities that should be avoided when applying AI in various data analytic contexts.
Best Practices
Use only the minimum data required for your AI tool to produce its results. Remove extra identifiers, redundant variables, or unnecessary records before processing. Data minimization includes:
- Using aggregated or de-identified data when PHI / PII details are unnecessary
- Restricting data to relevant time periods or cohorts
- Excluding variables unrelated to analytic objectives
Consult Emory’s data minimization guidance or the Electronic Privacy Information Center to determine the appropriate data scope for analytical purposes.
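As an illustration, the minimization steps above can be sketched in pandas. The column names, cutoff year, and cohort labels below are hypothetical, not an Emory standard:

```python
import pandas as pd

# Hypothetical records; all field names are illustrative assumptions.
records = pd.DataFrame({
    "name": ["John Smith", "Ann Lee", "Bo Chan"],
    "mrn": ["A1", "A2", "A3"],
    "visit_year": [2019, 2023, 2024],
    "cohort": ["control", "treatment", "treatment"],
    "outcome_score": [12.5, 14.1, 13.8],
})

# 1. Exclude variables unrelated to the analytic objective (drop identifiers).
minimized = records.drop(columns=["name", "mrn"])

# 2. Restrict data to the relevant time period.
minimized = minimized[minimized["visit_year"] >= 2023]

# 3. Aggregate so that only group-level statistics leave the secure environment.
summary = minimized.groupby("cohort")["outcome_score"].mean()
print(summary)
```

The point of the sketch is ordering: identifiers are dropped and rows are filtered *before* any data reaches an AI tool, so the tool only ever sees the minimum needed for the analysis.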
De-identification is a required best practice for restricted data use, including when using Emory-approved AI tools. Where possible, remove, mask, or substitute direct and indirect identifiers, as shown below.
- Replace names with pseudonyms or study codes
Example: "John Smith" → "Patient A1234" or "Subject_001"
- Remove or generalize addresses
Example: "123 Maple Street, Peachtree City, GA 30269" → "303xx" (3-digit ZIP) or "Fayette County, GA"
- Strip or generalize birthdates and other dates
Example: "1978-06-15" → "1978" or "Age 45–50"
Given AI’s ability to infer or cross-reference identities, minimizing identifiers is essential. For support in applying appropriate de-identification techniques, consult institutional resources such as the Winship Data and Technology Shared Resource, the Office of Information Technology, or the Institutional Review Board, or public resources such as the data de-identification guidelines published by the California Department of Health Care Services.
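The three substitution patterns above can be sketched as small Python helpers. The function names and the salted-hash approach to study codes are illustrative assumptions, not an Emory standard; a real project should manage the salt or linkage key in a secured, access-controlled file:

```python
import hashlib

def pseudonymize(name: str, salt: str = "project-salt") -> str:
    # Derive a stable study code from a salted hash so the same person
    # always maps to the same code without storing the name itself.
    digest = hashlib.sha256((salt + name).encode()).hexdigest()[:6].upper()
    return f"Subject_{digest}"

def generalize_zip(zip_code: str) -> str:
    # Keep only the 3-digit ZIP prefix, e.g. "30269" -> "302xx".
    return zip_code[:3] + "xx"

def generalize_dob(dob: str) -> str:
    # Keep only the birth year, e.g. "1978-06-15" -> "1978".
    return dob.split("-")[0]

print(pseudonymize("John Smith"), generalize_zip("30269"), generalize_dob("1978-06-15"))
```

Generalization (ZIP prefixes, years) trades precision for privacy, while pseudonymization preserves the ability to link a subject's records across a dataset without exposing identity.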
Practices to Avoid
Do not rely solely on AI analysis for decision-making purposes
AI should support, not replace, human expertise in decision making. This includes (but is not limited to) legal, clinical, academic, and personnel decisions: AI-generated analyses should not dictate legal policy, patient care, or employee actions. Because AI systems can produce results that are irrelevant or inexplicable, review, validate, and interpret AI output carefully before drawing conclusions or taking action.
Additional Contacts and Support
If you have questions about responsible AI use in data analysis or need assistance with legal, compliance, or data governance, the following Emory offices can provide guidance:
- Office of the Chief Data Analytics Officer for guidance on appropriate data use and minimization practices
- Office of Information Technology (OIT) for secure AI tool use and technical support
- Office of Compliance
- Office of General Counsel (OGC)
- Institutional Review Board (IRB) for research ethics and regulatory compliance