Year : 2022  |  Volume : 10  |  Issue : 1  |  Page : 112-117

Big data in clinical sciences-value, impact, and fallacies

1 Department of Anaesthesiology, Cosmopolitan Hospitals, Thiruvananthapuram, Kerala, India
2 Department of Neurosurgery, Sree Chitra Tirunal Institute for Medical Sciences and Technology, Thiruvananthapuram, Kerala, India

Date of Submission: 15-Dec-2021
Date of Decision: 03-Jan-2022
Date of Acceptance: 15-Jan-2022
Date of Web Publication: 23-Jun-2022

Correspondence Address:
Dr. George C Vilanilam
Department of Neurosurgery, Sree Chitra Tirunal Institute for Medical Sciences and Technology, Thiruvananthapuram - 695 011, Kerala

Source of Support: None, Conflict of Interest: None

DOI: 10.4103/amhs.amhs_296_21

The ever-burgeoning healthcare enigmata may find their answers in Big Data. When data cannot be collected, curated, managed, and processed by commonly used software tools within a requisite time frame, they are referred to as Big Data. We put forth a narrative review on the evolution and spectrum of the clinical applications of Big Data across medical and surgical sciences, evaluating their impact and cautioning about their potential fallibilities. There is an explosion of health care data generated as a byproduct of clinical care and research in the digital information era. The challenge lies in converting these unstructured datasets into clinical wisdom and practice-defining insights. Big data provides information on the quality of health care, resource utilization, public health deficiencies, research hypothesis creation, and overall holds the potential to revolutionize clinical sciences. Several fallacies of big data like data inaccuracies, privacy, confidentiality, proprietary concerns, and caveats in data analysis algorithms may misdirect the lessons from big data.

Keywords: Big data impact, clinical big data, fallacies big data

How to cite this article:
Abraham L, Vilanilam GC. Big data in clinical sciences-value, impact, and fallacies. Arch Med Health Sci 2022;10:112-7

How to cite this URL:
Abraham L, Vilanilam GC. Big data in clinical sciences-value, impact, and fallacies. Arch Med Health Sci [serial online] 2022 [cited 2023 Feb 9];10:112-7. Available from: https://www.amhsjournal.org/text.asp?2022/10/1/112/347965

Introduction

“The world is one big data problem.”

Andrew McAfee

Information and insights are key to human progress in any branch of scientific development. However, a deluge of data is not equivalent to having information. Massive volumes of clinical and healthcare research information obtained by digital technologies and beyond the capabilities of traditional data storage and analysis methods are collectively included in the term “Big Data” in healthcare.[1],[2] The challenge lies in converting this ocean of data into useful insights and actionable information.[3]

Just as oil fuels an engine, data fuels breakthroughs in clinical insights and better treatment outcomes. Big data has taken the world by storm and has revolutionized the effective use of resources in almost all sectors of human activity.[4] Routine clinical activity continuously generates big datasets as a byproduct, creating challenges in storage, retrieval, collation, and analysis. Big Data in healthcare lags behind other fields primarily due to unstructured datasets, privacy and data security concerns, siloed data, and financial constraints in data preservation.[4],[5],[6]

History and Evolution of Big Data

We live today in a world of 7.9 billion people and a digital universe where several zettabytes of data are already created. In 2005, Roger Mougalas from O'Reilly Media coined the term Big Data, and John Mashey is credited with popularising it.[6],[7]

Gross disparities in health care availability and access exist the world over. Not all clinical care is documented and digitalized, but in developed nations, the advent of Electronic Medical Records has created a plethora of unstructured datasets.[5] Big data holds the potential to plan resources better, reduce costs of treatment, predict outbreaks of epidemics, avoid preventable diseases, and improve the quality of life[1],[3] [Table 1].
Table 1: Terminology with reference to big data

Sources of Big Data and their Mammoth Volumes

The yottabyte (2^80 bytes or 1024 zettabytes), the largest existing unit of digital information, would soon need further expansion to encompass the sheer volumes of digital data. Health care data is currently growing at an exponential pace. One hundred and fifty-three exabytes of clinical data had been generated worldwide, a figure projected to exceed 2314 exabytes by 2020 (an annual growth of over 48%). The digital clinical recording needs have grown exponentially from bytes (e.g., intermittent clinical observations), to kilobytes (e.g., clinical notes), to megabytes (e.g., clinical photographs), to gigabytes (e.g., computed tomography images), to terabytes (e.g., genomic sequencing). Seventy-four zettabytes of data were estimated to be created on the Internet in 2021[5],[6] [Figure 1].
Figure 1: The “Clinical Big Data” Sources
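
The unit ladder quoted above (bytes up to the yottabyte at 2^80 bytes) can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the binary convention in which each unit is 1024 times the previous one, as the text's "1024 zettabytes" implies; the function names are hypothetical and chosen only for illustration.

```python
# Binary (power-of-two) unit ladder, matching "2^80 bytes or 1024 zettabytes".
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def unit_size_in_bytes(unit: str) -> int:
    """Size of one unit in bytes; each step up the ladder is 1024x the previous."""
    return 1024 ** UNITS.index(unit)


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a quantity between two units of the ladder."""
    return value * unit_size_in_bytes(from_unit) / unit_size_in_bytes(to_unit)


if __name__ == "__main__":
    # One yottabyte expressed in bytes and in zettabytes.
    print(unit_size_in_bytes("yottabyte") == 2 ** 80)
    print(convert(1, "yottabyte", "zettabyte"))
    # The 2314-exabyte figure quoted in the text, expressed in zettabytes.
    print(convert(2314, "exabyte", "zettabyte"))
```

Under this convention the 2314 exabytes quoted for healthcare data amount to a little over 2 zettabytes, which helps place it against the 74 zettabytes cited for the Internet as a whole.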

“Big data in health” is an umbrella term involving high diversity, large volume, biological, environmental, clinical, and lifestyle information collected from single individuals to large cohorts, in relation to their health and wellness status, at one or several points of time. Big data sources include clinical trials; electronic health records (EHR); patient registries and databases; multidimensional data from genomic, epigenomic, transcriptomic, proteomic, metabolomic, microbiomic measures, and medical imaging. In the current era, data are being integrated from social media, socioeconomic or behavioral indicators, occupational information, mobile applications, or environmental monitoring.[8],[9],[10],[11],[12] While the term “data lake” is often used to describe a collection of raw big data, several efforts promise to build “data oceans” brimming with research and analysis opportunities.[11],[12]

Large administrative datasets are an indispensable source of Big Data. Existing research registries specialized in improving perioperative outcomes include the National Surgical Quality Improvement Program and the Society of Thoracic Surgeons database. Currently, in developed countries, almost all clinical specialties have their own sources of big data from large administrative datasets. For example, registries specific to the field of anesthesiology are the Multicenter Perioperative Outcomes Group and the National Anesthesia Clinical Outcomes Registry. Clinical databases are often managed by clinicians and are better suited to evaluating disease characteristics and outcome scores (like the Glasgow coma score and the Rankin scale for disability outcomes). Administrative databases are oriented more toward classification and disease coding and are thus better suited for resource allocation and administrative planning.[10],[11]

The Clinical Big Data Analysis Algorithm-The 6 Vs of Big Data

The hallmarks of Big Data, as originally proposed by Laney in 2001, are the “3 Vs”: Volume, Velocity, and Variety.[12],[13] A further 3 Vs have since been added: Veracity, Variability, and Value.

Volume

Although volume is an important characteristic of big data, it is the inclusiveness and representativeness of big data that make it special. Colossal volumes of clinical digital data are generated each day. Anesthesiology data recorded during surgery, such as a 5-lead electrocardiogram for a 2-h case, would generate 37 MB of data. Capnography, arterial blood pressure, central venous pressure, pulse oximetry, electroencephalograph traces, airway pressure, and volume waveforms push the data volume higher still. Liu et al. recorded waveform data with 10 ms resolution (100 Hz) from 32 patients undergoing anesthesia, generating approximately 5.5 GB of data per case. The challenge lies in converting this volume to information and clinical practice insights.[5],[6]
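
The order of magnitude of such waveform volumes follows from simple arithmetic: channels multiplied by sampling rate, bytes per sample, and duration. The sketch below is illustrative only. The sampling rate (500 Hz) and sample width (2 bytes) are assumptions that do not come from the source, and the function names are hypothetical.

```python
def interval_ms_to_hz(interval_ms: float) -> float:
    """Convert a sampling interval in milliseconds to a sampling rate in hertz."""
    return 1000.0 / interval_ms


def waveform_bytes(channels: int, sample_rate_hz: float,
                   bytes_per_sample: int, duration_s: float) -> float:
    """Uncompressed size = channels x samples per second x bytes per sample x seconds."""
    return channels * sample_rate_hz * bytes_per_sample * duration_s


if __name__ == "__main__":
    # A 10 ms resolution corresponds to the 100 Hz quoted in the text.
    print(interval_ms_to_hz(10))

    # ASSUMED values (not from the source): 500 Hz sampling, 16-bit (2-byte) samples.
    ecg = waveform_bytes(channels=5, sample_rate_hz=500,
                         bytes_per_sample=2, duration_s=2 * 3600)
    print(f"5-lead ECG, 2 h: {ecg / 1e6:.0f} MB")
```

With these assumed acquisition settings the estimate comes to 36 MB, the same order as the 37 MB cited above; different monitors and file formats would shift the figure.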

Velocity

Data are generated at great speed, continuously inundating existing storage limits. More than 2.5 quintillion bytes of clinical digital data are generated every day. As the data are unstructured and continuously generated, they can reach colossal volumes. Data storage, analysis, and information output often cannot keep pace with data collection.[6],[7]

Variety

Heterogeneous data from multiple sources about a particular illness make big data a researcher's paradise. One can separate the wheat from the chaff to choose the data that suits one's research hypothesis.[12]

Veracity

Data reliability and truthfulness rely heavily on data quality, which in turn adds to veracity. Veracity is not intrinsic to big data and is difficult to regulate in large datasets. The weight of the research output from big data depends on it.[13]

Variability

The format and structure of big data are always dynamic. They involve structured, unstructured, and a mix of both types of datasets. Raw data may be integrated from multiple sources.[12]

Value

Value refers to the profitability of information obtained from big data. It is a cumulation of all other 'Vs' and therein lies the true worth of big data.[12],[13]

Mining Big Data-Information to Clinical Insights

Datasets are not equal to information. Finding the needle in the haystack of unstructured datasets is a daunting task. Advanced analytics applied to unstructured datasets with the use of natural language processing (NLP) and machine learning are crucial to big data analytics.[12] Sizes could range from terabytes to zettabytes. Computer hardware continues to follow “Moore's Law,” roughly doubling in processing power every 2 years. The four types of big data analytics include descriptive analytics (what happened?), diagnostic analytics (why did it happen?), predictive analytics (what is likely to happen?), and prescriptive analytics (what do we need to do?). Analytics can help identify patterns, correlations, and trends, and make predictions about the future. The steps involved are data collection, processing, cleansing, and analysis. The analysis includes data mining, predictive analysis, machine learning, deep learning, and others.[13] Key technologies used in big data analysis include Hadoop (an open-source framework), predictive analysis, and stream analysis tools [Figure 2].
Figure 2: The Big Data cycle in clinical sciences
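
The steps and analytics types described above can be illustrated on a toy scale. The following is a minimal sketch, not a real big data pipeline: the records, the capacity threshold, and all function names are hypothetical, and a simple least-squares trend line stands in for the predictive step. Diagnostic analytics (why did it happen?) is omitted because it needs domain context that a toy series cannot supply.

```python
from statistics import mean

# Hypothetical raw records: (day index, admissions); None marks a missing entry.
RAW = [(1, 10), (2, 12), (3, None), (4, 16), (5, 18)]


def cleanse(records):
    """Cleansing step: drop records with missing values."""
    return [(x, y) for x, y in records if y is not None]


def describe(records):
    """Descriptive analytics (what happened?): mean admissions per day."""
    return mean(y for _, y in records)


def fit_line(records):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    xs = [x for x, _ in records]
    ys = [y for _, y in records]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in records)
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar, slope


def predict(records, x_new):
    """Predictive analytics (what is likely to happen?): extrapolate the trend."""
    intercept, slope = fit_line(records)
    return intercept + slope * x_new


def prescribe(forecast, capacity):
    """Prescriptive analytics (what do we need to do?): a toy decision rule."""
    return "add staff" if forecast > capacity else "no change"


if __name__ == "__main__":
    clean = cleanse(RAW)
    forecast = predict(clean, 6)
    print(describe(clean), forecast, prescribe(forecast, capacity=18))
```

Real clinical pipelines replace each function with far heavier machinery (distributed storage, NLP extraction, validated models), but the ordering of collection, cleansing, and analysis is the same.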

Impact of Clinical Big Data in Surgical and Medical Care

Clinical sciences are a vast multi-dimensional system and a major repository of healthcare data.[10],[14] The broad benefits of big data have impacted almost all clinical specialties to a great extent[1],[8],[15],[16],[17],[18],[19],[20],[21] [Table 2].
Table 2: Overview of big data research in clinical sciences

Several areas stand to benefit immensely as follows.

Electronic Health records data mining

Murphy, Hanken, and Waters defined EHR as “computerized medical records for patients with information relating to the past, present, or future physical/mental health or condition of an individual which resides in electronic system(s) used to capture, transmit, receive, store, retrieve, link and manipulate multimedia data for the primary purpose of providing healthcare and health-related services.”[14] EHR remains the biggest source of big data and also its greatest beneficiary. Big data services ensure that the unstructured EHR data are curated, collated, and analyzed for practice-changing insights. According to Stanley Reiser, the clinical case records freeze the episode of illness as a story in which the patient, family, and the doctor are a part of the plot. Big data, perhaps, makes this plot more decipherable and reproducible.[14]

Intraoperative monitoring during surgery

When all the physiological parameters captured in Big Data are used for analytics, they hold the potential to improve intraoperative monitoring, thereby enhancing surgery and anesthesiology outcomes.[21]

Genomics and personalized medicine

Individualized and bespoke medical care tailored by genomics could help avoid drug allergies and complications such as malignant hyperthermia, and enable better therapy based on pharmacogenetics.[1],[2] The clinician anesthetist will then have a plethora of information on which complications to expect during therapy/surgery and can prepare accordingly to provide optimal individualized care and precision medicine.[2],[5],[14]

Machine learning and predictive analytics

Genomics and machine learning have the potential to radically evolve the current perioperative care and the prevention of adverse outcomes when combined with real-time data analytics. Big data analytics on preoperative, intraoperative, and postoperative data combined with knowledge helps to determine how a patient may potentially react based on their genetic code (Example: An anesthetist using Big data analytics to predict malignant hyperthermia in a patient and taking preventive/corrective action). Artificial Intelligence could help to accurately discover patterns, which would help predictions of adverse clinical outcomes before they occur.

Epidemiology and public health

Risk stratification across large populations has been impacted by big data. Researchers have efficiently used mobile phone information to record patient movements in pandemics like COVID-19 and other infectious diseases such as cholera, malaria, dengue, human immunodeficiency virus, Ebola, rubella, and schistosomiasis. Air travel data, Global Passenger Survey data-loggers, social media, etc., have helped track patient movement data for larger decision-making and public health measures.

Intervention evaluation

Newer therapeutic modalities such as minimally invasive surgery, robotic surgery, and newer procedures stand to gain immensely from the cumulation of datasets from multiple centers. Such an aggregation of worldwide surgical experience helps to compare these newer techniques with established standards of care.

Observational research

Hypothesis generation for clinical research based on trend analysis from big data is an evolving practice. Meta-analyses and systematic reviews, especially involving rare illnesses, benefit greatly from the vast volumes of data. Caution has to be exercised while evaluating heterogeneous datasets across varied populations.

Omics data research

Genome, proteome, transcriptome, and metabolome data constitute omics data. Each of these individual experiments generates a large amount of data with more depth of information and newer insights generated from these big volumes of data.[6]

Service and resource planning

Efficient staffing and optimum use of resources, both material and human, have been made possible by the lessons of big data. This helps to reduce the cost of health care services and prioritize care as per needs.[2],[4],[6]

Medicolegal research

Big data from medicolegal databases and ethical committee reviews can establish benchmarks of care. They could identify vulnerable areas for ethical and medicolegal deviations from standards and suggest corrective timely action.[11]

Surgical training

Surgical operative videos produce volumes of digital data that can be used for enhancing training and surgical standards. Analysis of a surgeon's movements for precision, economy of effort, and finesse can further help define surgical standards.

Internet of things and clinical support tools

Internet of things (IoT) devices create a continuous stream of data while monitoring the health of people, thereby contributing to big data in healthcare.[13] These include fitness or health-tracking wearable devices, biosensors, clinical devices for monitoring vital signs, etc. Such IoT devices generate a large amount of health-related data that can be integrated with EHR data. This could be used to predict a patient's health status and its progression from preclinical to pathological states.[6],[10],[14],[16]

Fallacies and Limitations

Data analysis caveats

Statistical analysis of big data has certain fallacies such as noise accumulation, spurious correlation, and measurement errors. “Noise accumulation” stands for the increasing amount of corrupt, missing, or spurious data. This can decrease the signal-to-noise ratio and make it tough to identify true positives. Spurious correlations can result from the high dimensionality of big data, wherein unrelated random variables appear to be highly and causally related, but are in fact not.[14],[16],[20],[21],[22]
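
The link between high dimensionality and spurious correlation can be demonstrated with purely random numbers. The following is a minimal sketch, assuming independent Gaussian noise for both the "outcome" and every candidate "feature"; the function names and parameter values are hypothetical. Because no variable is related to the outcome by construction, any correlation found is spurious, and the strongest one can only grow as more variables are screened.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def max_spurious_correlation(n_samples, feature_counts, seed=0):
    """Largest |r| between a random outcome and the first k random features.

    All variables are independent noise, so every correlation is spurious.
    Returns a dict mapping each k in feature_counts to the maximum |r| found.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    features = [[rng.gauss(0, 1) for _ in range(n_samples)]
                for _ in range(max(feature_counts))]
    abs_r = [abs(pearson(f, outcome)) for f in features]
    return {k: max(abs_r[:k]) for k in feature_counts}


if __name__ == "__main__":
    # 20 "patients", screening 5, 50, and 500 unrelated variables.
    for k, r in max_spurious_correlation(20, [5, 50, 500]).items():
        print(f"{k:>3} random variables: strongest |r| = {r:.2f}")
```

In a dataset with thousands of recorded variables per patient, some apparently strong associations are therefore expected by chance alone, which is why corrections for multiple comparisons and validation on independent data matter.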

Data quality concerns

Missing data and data of low quality are challenges to overcome. Data extraction from unstructured sources using NLP could have significant potential to overcome such limitations. Some studies selectively use data and statistical analysis techniques to emphasize findings consistent with the study hypothesis.

Privacy, proprietary, ethical, and security concerns

Wide availability and access on open-source databases and registries may create data privacy concerns. Less stringent regulations may also make big data more vulnerable to ethical violations.

Insight related pitfalls

Too much data may cause an “analysis paralysis,” leading to a delay or paralysis in decision making despite the plethora of data. The McNamara fallacy, in which an over-reliance on metrics obscures the “big picture,” may also result in fallacious lessons from big data.[22]

Future of Big Data-Bigger and Better

Newer metrics and further computational efficiency may be needed in future as big data gets bigger. Big data will further revolutionize solutions and drive value in healthcare organizations. The technological infrastructure to accommodate and collate the massive volume of healthcare data, which industry analysts estimated would grow beyond 2,314 exabytes by 2020, is a further challenge. Information technology experts, data architects, and big data engineers would usher in a new future in healthcare.[1],[2],[14]

Conclusion

Newer avenues for big data applications have revolutionized clinical care and health science research. Integrated with complex computational analytics, artificial intelligence, machine learning, deep learning, and NLP, big data is realizing its full potential. Personalized precision medicine, optimal resource allocation, efficient public health decision-making, deeper insights into rare illnesses, and better research hypothesis generation have all been aided immensely by big data. The challenge, however, lies in separating the signal from the noise, the insight from the data. Caution about the fallacies and misdirections of big data analysis is essential to avoid picking up the wrong lessons.

Financial support and sponsorship

Nil.

Conflicts of interest

There are no conflicts of interest.

References

1. Mathias B, Lipori G, Moldawer LL, Efron PA. Integrating “big data” into surgical practice. Surgery 2016;159:371-4.
2. Targarona EM, Balla A, Batista G. Big data and surgery: The digital revolution continues. Cir Esp (Engl Ed) 2018;96:247-9.
3. Gibson JA, Dobbs TD, Kouzaris L, Lacey A, Thompson S, Akbari A, et al. Making the most of big data in plastic surgery: Improving outcomes, protecting patients, informing service providers. Ann Plast Surg 2021;86:351-8.
4. Coleman AL. How big data informs us about cataract surgery: The LXXII Edward Jackson Memorial Lecture. Am J Ophthalmol 2015;160:1091-103.e3.
5. Knight SR, Ots R, Maimbo M, Drake TM, Fairfield CJ, Harrison EM. Systematic review of the use of big data to improve surgery in low- and middle-income countries. Br J Surg 2019;106:e62-72.
6. Murdoch TB, Detsky AS. The inevitable application of big data to health care. JAMA 2013;309:1351-2.
7. Cobb AN, Benjamin AJ, Huang ES, Kuo PC. Big data: More than big data sets. Surgery 2018;164:640-2.
8. Perry A, Kerezoudis P, Graffeo CS, Carlstrom LP, Peris-Celda M, Meyer FB, et al. Little insights from big data: Cerebrospinal fluid leak after skull base surgery and the limitations of database research. World Neurosurg 2019;127:e561-9.
9. Wall J, Krummel T. The digital surgeon: How big data, automation, and artificial intelligence will change surgical practice. J Pediatr Surg 2020;55S:47-50.
10. West JL, Fargen KM, Hsu W, Branch CL, Couture DE. A review of big data analytics and potential for implementation in the delivery of global neurosurgery. Neurosurg Focus 2018;45:16.
11. Kerezoudis P. Big data in neurosurgery: Harder, better, faster, stronger? World Neurosurg 2020;133:398-400.
12. Carlos RC, Kahn CE, Halabi S. Data science: Big data, machine learning, and artificial intelligence. J Am Coll Radiol 2018;15:497-8.
13. Shekhar S. Internet of things and creation of the fifth V of big data. Int J Sci Res 2017:2319-7064.
14. Dash S, Shakyawar S, Sharma M, Kaushik S. Big data in healthcare: Management, analysis and future prospects. J Big Data 2019;6:1-25. [doi: 10.1186/s40537-019-0217-0].
15. Craven M, Page CD. Big data in healthcare: Opportunities and challenges. Big Data 2015;3:209-10.
16. Baro E, Degoul S, Beuscart R, Chazard E. Toward a literature-driven definition of big data in healthcare. Biomed Res Int 2015;2015:639021.
17. Resteghini C, Trama A, Borgonovi E, Hosni H, Corrao G, Orlandi E, et al. Big data in head and neck cancer. Curr Treat Options Oncol 2018;19:62.
18. Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 2017;542:115-8.
19. Weintraub WS. Role of big data in cardiovascular research. J Am Heart Assoc 2019;8:e012791.
20. Goodin A, Delcher C, Valenzuela C, Wang X, Zhu Y, Roussos-Ross D, et al. The power and pitfalls of big data research in obstetrics and gynecology: A consumer's guide. Obstet Gynecol Surv 2017;72:669-82.
21. Levin MA, Wanderer JP, Ehrenfeld JM. Data, big data, and metadata in anesthesiology. Anesth Analg 2015;121:1661-7.
22. Practice B. Statistical Fallacies and how to Avoid them | Geckoboard. Geckoboard; 2021. Available from: https://www.geckoboard.com/best-practice/statistical-fallacies/. [Last accessed on 2021 Dec 01].

