[Python] DICOM file de-identification example (basic)

by  반  2020. 5. 28.

Requirements: Python 3.7 or later

Install the required packages
  pip install pydicom

  pip install tqdm

Code

import os
import pydicom
#from tqdm import tqdm_notebook
from tqdm import tqdm


# Collect the absolute paths of all .dcm files under the current working directory
def get_file_list():
    list_full = []

    # '.' works on every OS; the original '.\\' only worked on Windows
    for path, _, files in os.walk('.'):
        for each_file in files:
            # compare the extension case-insensitively so '.DCM' is matched too
            if each_file.lower().endswith('.dcm'):
                list_full.append(os.path.abspath(os.path.join(path, each_file)))
    return list_full
        
# main de-identifier
def de_identifier(opt_each_file):

    for filename in tqdm(get_file_list()):
        try:
            Metadata = pydicom.dcmread(str(filename))
        except Exception as e:
            # skip unreadable files instead of aborting the whole run
            print(f'[read error] {filename}: {e}')
            continue

        try:
            # de-identify
            Metadata.PatientName = 'Anonymized'
            # Date (DA), code (CS), and age (AS) elements have restricted formats,
            # so they are blanked rather than set to free text
            Metadata.PatientBirthDate = ''
            Metadata.PatientSex = ''
            Metadata.PatientAge = ''
            Metadata.OtherPatientIDs = 'Anonymized'
            Metadata.RequestingPhysician = 'Anonymized'
            Metadata.InstitutionName = 'Anonymized'
            Metadata.InstitutionAddress = 'Anonymized'
            Metadata.ReferringPhysicianName = 'Anonymized'
            Metadata.StationName = 'Anonymized'
            # the keyword is PhysiciansOfRecord (capital 'O'); a misspelled
            # keyword does not change the DICOM element
            Metadata.PhysiciansOfRecord = 'Anonymized'

            # overwrites the original file in place
            Metadata.save_as(str(filename))

            # TODO - revive
            # sql_query(True)
            if opt_each_file == 1:
                print(f'[complete] {filename}')

        except Exception as e:
            # TODO - revive
            # sql_query(False)
            print(f'[de-identify error] {filename}: {e}')
            continue
    print('de_identified.')


# run (pass 1 instead of 0 to print each processed file)
de_identifier(0)

# Call flow: de_identifier() [run] -> get_file_list() [collect the .dcm file list] -> de_identifier() [replace each attribute and save the file]

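This code de-identifies the headers of all .dcm files in the folder it is run from and in every subfolder. The files are overwritten in place, and only the attributes listed in the code are replaced; PatientID, for example, is left untouched.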
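The recursive file search can also be sketched with pathlib instead of os.walk. This is only an illustrative variant, not part of the original script: the function name find_dicom_files and the temporary directory layout are made up for the demonstration.

```python
import tempfile
from pathlib import Path


def find_dicom_files(root="."):
    """Return sorted absolute paths of files under `root` with a .dcm suffix (any case)."""
    root_path = Path(root).resolve()
    return sorted(
        p for p in root_path.rglob("*")
        if p.is_file() and p.suffix.lower() == ".dcm"
    )


# Small demonstration on a throwaway directory tree (hypothetical file names)
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "study" / "series").mkdir(parents=True)
    (base / "a.dcm").touch()
    (base / "study" / "series" / "b.DCM").touch()
    (base / "notes.txt").touch()
    # show the results relative to the search root
    found = [p.relative_to(base.resolve()).as_posix() for p in find_dicom_files(base)]

print(found)
```

Taking the root folder as a parameter makes it possible to try the search on a copy of the data before running the in-place de-identification.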
