A BI 2.0 Application Architecture for Healthcare Data Mining Services in the Cloud

  • josephmwoodside
  • Jul 1, 2010
  • 11 min read

As organizations seek to take part in the growing IT services sector, they must develop appropriate architectures and business solutions that provide meaningful information in real time. Service definition and improvements to existing systems are an important and growing area of research. An overall architecture and model are presented for enterprise integration of information and data mining components, along with a portfolio of healthcare data mining applications that can be configured for and consumed by healthcare service users. A sample healthcare data mining service application is presented for real-time information exchange and support in clinical decision making.

Introduction

The economy is being transformed into a service industry, with services accounting for 75% of the US GDP and 80% of private employment. Overall, IT services are projected to grow at a 6.4% annual rate to a value of $855.6 billion through 2010. Organizations that introduce a service-oriented architecture (SOA) reduce integration and maintenance costs by up to 30%. It is expected that 33% of business application spending will be on software as a service (SaaS) by 2012 and 40% of capital spending on infrastructure as a service (IaaS) by 2011, with spending to exceed $33.8 billion in 2010. A similar report found that BI-specific SaaS will grow at a 22.4% annual rate through 2013. SaaS BI is being utilized during peak demand periods, and by organizations that seek to reduce on-premise costs or that typically do not have dedicated technology staff. IBM, a key player in the services sector, continues to grow its service offerings, is seeking to transform itself into an on-demand service organization, and has extensively studied how service innovations can lead to sustainability and long-term growth. For business decision makers, the issue remains what level of commitment and investment should be made in these new services. SOA has been studied in research, but gaps remain, including how these service capabilities will be used within the business to create value and how these structures can be created (Demirkan 2008; Kanaracus 2010).

In 2010, the Department of Health and Human Services’ (HHS) National Coordinator for Health Information Technology announced $60 million of funding for Strategic Health IT Advanced Research Projects (SHARP). These special research projects are intended to develop solutions to existing barriers in the adoption and use of healthcare information technology (HIT). As a component of the American Recovery and Reinvestment Act (ARRA), the grants are intended to promote the use of HIT to improve the quality and efficiency of health care and to create collaboration between researchers, health care providers, and healthcare stakeholders. The key areas of study include patient-focused HIT that supports patient care in day-to-day practice, application architectures to improve health exchange, and secondary use of EHR data to improve quality through HIT (Szemraj 2009). The healthcare data mining services outlined here support these key areas through improved patient care and processing times, enhanced enterprise architectures for information transparency, and secondary use of data to improve quality of care and operational performance.

Business Intelligence 2.0

BI 2.0 describes the ways business information can be utilized in real time, and how BI can be applied to business events. BI 2.0 speaks to the ability to deliver self-service tools and mash-up capabilities to end users in real time. In addition, successive generations of technology users are demanding these capabilities. Real-time data from a widening range of sources is also generating demand, and often displaces the centralized data warehouse concept, giving way to context and real-time information from all operational systems, logs, databases, and a wide variety of other sources (Nicholls 2006; Raden 2007).

Organizations have initiated data warehousing web services; however, this data is often not in a real-time state for in-process decision support. BI 2.0 processes events in memory, in line with the business event. The main events consist of XML messages, which are embedded within business processes so that analytics run in real time rather than in batch and the response to a business event can start immediately. Most of these events are handled automatically, or alert the user to take a specific action. BI 2.0 utilizes middleware to analyze in-process events against historical data. Real-time demands require software applications that are event-driven and that consume real-time data through service-oriented architectures (SOA), which are loosely coupled and interoperable, enforcing standardized application integration (Nicholls 2006).
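The in-memory, event-driven processing described above can be illustrated with a minimal sketch. It assumes a hypothetical XML event layout and a hypothetical in-memory baseline of historical values; the element names, the threshold, and the handle_event function are illustrative and do not come from the cited works.

```python
import xml.etree.ElementTree as ET

# Hypothetical event message; element names are illustrative only.
EVENT_XML = """
<event type="lab_result">
  <patient_id>P001</patient_id>
  <measure>glucose</measure>
  <value>182</value>
</event>
"""

# Hypothetical historical baseline held in memory (measure -> upper bound).
HISTORICAL_UPPER_BOUND = {"glucose": 140.0}


def handle_event(xml_text):
    """Parse one XML event and compare it with the in-memory baseline."""
    root = ET.fromstring(xml_text.strip())
    measure = root.findtext("measure")
    value = float(root.findtext("value"))
    bound = HISTORICAL_UPPER_BOUND.get(measure)
    # Raise an alert only when a baseline exists and the value exceeds it.
    alert = bound is not None and value > bound
    return {
        "patient_id": root.findtext("patient_id"),
        "measure": measure,
        "value": value,
        "alert": alert,
    }


result = handle_event(EVENT_XML)
```

In this sketch the event is evaluated as it arrives, so an alert can be raised without waiting for a batch load into a warehouse.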

Cloud computing provides scalable and virtualized services to the end user via a simple web browser. A third party manages the computing infrastructure and provides the software as a service (SaaS). Salesforce.com, Google Apps, Amazon, and Facebook all have cloud computing offerings. Cloud computing allows organizations to reduce IT capital costs and to buy computing on an as-needed basis. There are economies of scale through shared use of systems and resources by multiple customers. Cloud computing reduces entry barriers by eliminating software distribution and site installation requirements. This also permits organizations to develop new business models and sources of revenue through on-demand services (Kambil 2009).

Enterprise Integration

Enterprise Information Integration (EII) describes the integration of various data sources into a unified form without requiring that all sources be contained within a data warehouse, which also reduces integration complexity (Saracco 2004; Taylor 2004; Halevy 2005). The enterprise unified view must consume data that is available in real time via direct system access, and semantic resolution must occur across systems. Semantic integration, which relies on an ontology, is a higher-level natural language approach to combining differing pieces of information in support of real-time events. A semantic information model can be constructed using the Web Ontology Language (OWL) developed by the W3C. The relationships and rules of the data are contained within the model, which the OWL inference engine can read in order to integrate the information intelligently (Taylor 2004; Raden 2005).

Real-time EII begins with SOA as the access point for all systems through web services, and XML as the data representation (Raden 2005; Nicholls 2006). SOA promises improved agility and flexibility for organizations to deliver value-based services to their customers. A service is the application of knowledge for co-creation of value between interacting entities. Service systems involve people, technology, and information. Service science is concerned with understanding service systems, and with improving and designing services for practical purposes. SOA includes Web services, technologies, and infrastructures, and is a process that adds reuse, information, and overall value to the business. SOA provides a commoditization of hardware and software, giving organizations improved architectures that support IT service flexibility. The SOA approaches are utilized to develop SaaS from IaaS (Demirkan 2008).

Most real-time architectures consist of the required data sources and a virtual or mediated data schema, which is then queried by the end user or application. The systems are typically built on an XML data model and query language. EII reduces data access time, while Enterprise Application Integration (EAI) allows system updates to occur as part of the business process. As a best practice, both technologies are utilized together and combined into the concept of Enterprise Integration (EI). The EI architecture supports heterogeneous data sources such as relational and non-relational databases, flat files, XML, transactional systems, and content management systems. Information transparency is provided through the virtual data access services layer, which permits real-time programming services. This architecture adheres to SOA, where business processes exist as distinct services that communicate through known interfaces. This also helps promote code reuse and a more flexible IT infrastructure by allowing a focus on business logic while leaving the data tasks to the EII layer (Saracco 2004; Taylor 2004; Halevy 2005).
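As a minimal sketch of the mediated schema idea, the following example resolves one unified record at query time from two heterogeneous sources: an in-memory SQLite table standing in for a relational claims system, and an XML document standing in for an eligibility feed. The table, the XML layout, and the member_view function are hypothetical and only illustrate the pattern.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Source 1: relational table (in-memory SQLite stands in for a claims system).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (member_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO claims VALUES (?, ?)",
    [("M1", 120.0), ("M2", 75.5), ("M1", 30.0)],
)

# Source 2: XML document (stands in for an eligibility feed).
ELIGIBILITY_XML = """
<members>
  <member id="M1"><plan>Gold</plan></member>
  <member id="M2"><plan>Silver</plan></member>
</members>
"""


def member_view(member_id):
    """Mediated view: one unified record per member, built at query time."""
    total = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM claims WHERE member_id = ?",
        (member_id,),
    ).fetchone()[0]
    root = ET.fromstring(ELIGIBILITY_XML.strip())
    plan = None
    for member in root.findall("member"):
        if member.get("id") == member_id:
            plan = member.findtext("plan")
    return {"member_id": member_id, "plan": plan, "total_claims": total}
```

The caller sees a single record shape and does not need to know which source supplied each field, which is the information transparency the virtual data access layer is meant to provide.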

Healthcare Applications

Healthcare organizations are increasingly investing in data mining services to improve quality, service, and cost (Fickenscher 2005). Several healthcare data mining applications are available for building a comprehensive BI 2.0 application portfolio, allowing end-user access without dedicated systems and at a reduced cost, improving value to stakeholders. Many of the healthcare service components currently suffer from lengthy delays and additional stakeholder requirements, which limit real-time information accessibility for decision making and improvements. Table 1 describes the set of healthcare service components, each with the specified healthcare requirements and improvement objectives, along with current information and decision lag time, common data mining methods utilized, and supporting works.

Quality improvement approaches have been adopted within the health care industry in an attempt to improve quality of care. Most of the activities to date have focused on manual activities without a direct link to the data within the healthcare information system. Support systems can provide patient outcome information and clinical pathways to support patient care, and can identify the factors that influence quality and treatment. Data mining, which allows for knowledge discovery from large sets of data, can be used to identify patterns or rules to improve healthcare quality. Patient characteristics including age, gender, department, disease class, and quality indicators were utilized as part of a decision tree analysis to determine inpatient mortality factors. An index score was developed to identify how inpatient mortality rates compare to overall proportions, and which segments to focus on (Chae 2003).

Risk assessment utilizes methods to assess the relative risk of individuals within the population, with the relative risk predicting costs. The assessment may be carried out utilizing various forms of data and typically includes claims, pharmacy, and self-reported survey information. This information has been utilized by the federal government to adjust payments to health plans, by employers in determining employee contributions to health coverage, by researchers in measuring outcomes of treatment methods, by policy makers for tracking access to care and quality of care, and by health plans for case management, disease management, quality improvement, payments to providers, and underwriting activities (Cumming 2002; Rector 2004). Growth in consumer-driven health plans is driving the need for improved risk assessment accuracy, as the employee has more options in selecting benefit plans, which increases variability among the plan populations. Risk adjustment then allows the health plan to project outcomes appropriately or make equal payments to promote quality improvements rather than population selection, and to ensure comparative pricing and consumer choice (Cumming 2002).

Healthcare fraud and abuse cost the public and private sectors billions of dollars, and in the US these costs are estimated to be as high as 10% of annual spending, or $100 billion per year. Many health systems rely on human experts for manual review, and manual monitoring is often expensive and ineffective. Data mining can reduce costs and identify previously unknown patterns and trends (Yang 2006; Liou 2008). Increasingly, healthcare entities are using data mining tools to identify fraudulent behaviors. Data mining methods including classification trees, neural networks, and regression have been applied to healthcare. The Utah Bureau of Medicaid Fraud, the Australian Health Insurance Commission, and Texas Medicaid Fraud and Abuse Detection mined data to identify fraud and abuse, saving and recovering millions of dollars. Most fraud and abuse cases are associated with diagnoses and services; some studies have utilized provider name, ID, and demographics, together with claim patient, procedure, charge, bill date, payment deductible, copayments, insurance, and payment dates, to detect fraud (Viaene 2005; Liou 2008).

Health plans are increasingly requiring prior authorizations for services, and 90% require fax or phone. Most medical policies and technology assessments are not standardized and are often out of date. These can be standardized for systematic communication, and centralized for more timely updates. Current turnaround times are 3-4 days, which is long after the patient has sought services. Web services offer the ability to check eligibility, care guidelines, and routine approvals. These checks may occur through a short series of questions that utilize decision trees to determine an answer in real time. Medical policies and guidelines are also updated dynamically, and can be linked to electronic health records to improve information transparency among stakeholders (Moeller 2009).
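The short series of questions described above can be sketched as a small decision tree that is walked at request time. The questions, their order, and the outcomes below are hypothetical placeholders rather than actual medical policy, and the tree is written by hand here rather than learned from data.

```python
# Each internal node asks a yes/no question; each leaf is an outcome.
# Questions and outcomes are hypothetical, not real medical policy.
TREE = {
    "question": "member_eligible",
    "no": "denied",
    "yes": {
        "question": "routine_service",
        "yes": "approved",
        "no": {
            "question": "meets_care_guideline",
            "yes": "approved",
            "no": "pended_for_review",
        },
    },
}


def decide(answers, node=TREE):
    """Walk the tree using boolean answers until a leaf outcome is reached."""
    while isinstance(node, dict):
        branch = "yes" if answers[node["question"]] else "no"
        node = node[branch]
    return node
```

For example, decide({"member_eligible": True, "routine_service": False, "meets_care_guideline": True}) follows three questions and returns "approved", while an ineligible member is denied after the first question.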

Model and Research Design

Following the approaches of Zhang and Demirkan, a BI 2.0 data mining services architecture is developed. The data layer contains the raw data from local or remote sources, along with the metadata. The data layer provides data transparency to the underlying source, and caches the most frequently accessed data to improve performance. The information layer has domain-specific components and connectors, and aggregates the raw data. Domain-specific tools are utilized, such as simulation, geospatial, or optimization models. The knowledge layer applies data mining, knowledge discovery, and simulation, and is responsible for generating domain-specific knowledge for use in decision making processes. The presentation layer is the web-based, user-friendly interface. It manages the lower layers, and provides data, information, and services to end users via the web. The horizontal layers are able to be vertically integrated, which allows reuse of services and resources. This provides a flexible firm architecture able to adapt rapidly to changing business conditions (Zhang 2007; Demirkan 2008).
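The layering described above can be sketched as four functions, each calling only the layer beneath it. This is a simplified illustration under stated assumptions: the raw source, the cost threshold, and all function names are hypothetical, and the knowledge layer uses a placeholder rule where a data mining model would sit.

```python
from functools import lru_cache

# Hypothetical raw source: member id -> claim amounts.
RAW_SOURCE = {"M1": (120.0, 30.0), "M2": (75.5,)}


@lru_cache(maxsize=128)
def data_layer(member_id):
    """Data layer: fetch raw records, caching frequently requested keys."""
    return RAW_SOURCE.get(member_id, ())


def information_layer(member_id):
    """Information layer: aggregate raw data into domain measures."""
    amounts = data_layer(member_id)
    return {"claim_count": len(amounts), "total": sum(amounts)}


def knowledge_layer(member_id, high_cost_threshold=100.0):
    """Knowledge layer: derive a decision-ready flag (placeholder rule)."""
    info = information_layer(member_id)
    return {**info, "high_cost": info["total"] > high_cost_threshold}


def presentation_layer(member_id):
    """Presentation layer: format the result for the end user."""
    k = knowledge_layer(member_id)
    label = "high cost" if k["high_cost"] else "standard"
    return f"{member_id}: {k['claim_count']} claims, total {k['total']:.2f} ({label})"
```

Because each layer exposes a narrow interface, a different knowledge-layer model or data source could be substituted without changing the presentation layer, which is the reuse the vertical integration is intended to allow.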

Consider the following example of a healthcare services review request and response delivered in real time via SaaS on a per-transaction cost basis. A 278 HIPAA EDI transaction (Workgroup for Electronic Data Interchange 2007) is sent in X12 format via a standard service, entered via a web browser through the application layer. For illustration purposes, the example assumes a single-use service with input and output within a standard web browser. The services may be performed and utilized by various individuals such as a patient, provider, or payer, or invoked programmatically for high-volume and report services. The 278 follows a per-patient event relationship, which is fitting for a SaaS model in which utilization management, payer, and provider entities are not required to maintain local system software and hardware; instead, all layers can be managed remotely and accessed through a web-based application layer.

The EDI transaction, which is sent in X12 format, is then converted to a standard XML message for usage within the integration and data layers. The specified prior authorization service is selected, along with the corresponding transactional cost and service selection credentials. The XML and service information is utilized within the integration layer. The integration layer auto-maps the ontology for standard HIPAA and HL7 messages, or allows users to custom-map elements through the application layer for single-use ability. Several services occur within the integration and data layers. First, historical data is selected and aggregated. Second, rule information is gathered and fed into the decision tree data mining algorithm, which determines the certification outcome from the historical knowledge base and the input data; within the data layer, the certification information is inserted into the patient record via an XQuery command. Third, claim information is updated to certify services for payment. Fourth, financial liability is recorded within a financial system. This information is utilized within the layers to present a final authorization through the application layer in real time. Future services can be utilized by other stakeholders relative to this transaction; for example, patients can view healthcare information on medical decisions.
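The conversion of an X12 message into XML can be sketched as follows. This is a generic, simplified illustration: the sample text is not a complete or valid 278 transaction, the "~" and "*" delimiters are assumed rather than read from an interchange header, and the output element names (transaction, segment, element) are hypothetical rather than a standard mapping.

```python
import xml.etree.ElementTree as ET

# Simplified, illustrative fragment; NOT a complete or valid 278 transaction.
X12_TEXT = "ST*278*0001~HL*1**20*1~UM*HS*I~SE*4*0001~"


def x12_to_xml(x12_text, segment_sep="~", element_sep="*"):
    """Convert delimited X12 segments into a generic XML tree."""
    root = ET.Element("transaction")
    for raw in x12_text.split(segment_sep):
        raw = raw.strip()
        if not raw:
            continue  # skip the empty piece after the final terminator
        parts = raw.split(element_sep)
        seg = ET.SubElement(root, "segment", id=parts[0])
        # Keep each element's position so empty elements stay addressable.
        for pos, value in enumerate(parts[1:], start=1):
            el = ET.SubElement(seg, "element", pos=str(pos))
            el.text = value
    return root


root = x12_to_xml(X12_TEXT)
xml_text = ET.tostring(root, encoding="unicode")
```

A production service would instead follow the published implementation guide for segment and loop definitions; the sketch only shows why a positional, delimiter-based format maps naturally onto nested XML elements that downstream layers can query.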

Conclusion and Future Directions

Healthcare entities are increasingly investing in information services; however, most on-premise solutions require staff experts to implement, maintain, and extract information for end users. This absorbs resources that might otherwise be used for patient care and quality. BI 2.0 healthcare data mining services can be implemented utilizing an enterprise integration framework to give stakeholders access to information and capabilities on demand, without dedicated IT staff. This also provides a unified data and information set across all services, allowing any stakeholder to utilize services transparently on a per-transaction or subscription basis. The service set can be extended and categorized as patient services, provider services, payer services, and employer services. Future directions include establishing standards around healthcare data mining services, production system setup, and development of value-added services and applications to support healthcare quality and cost improvements.

References

Chae, Y. M., Kim, Hye S., Tark, Kwan C., Park, Hyun J., Ho, Seung H. (2003). "Analysis of healthcare quality indicator using data mining and decision support system." Expert Systems with Applications 24: 167–172.

Cumming, R. B., Knutson, David, Cameron, Brian A., Derrick, Brian (2002). A Comparative Analysis of Claims-based Methods of Health Risk Assessment for Commercial Populations. M. USA. Minneapolis, Society of Actuaries.

Demirkan, H., Kauffman, Robert J., Vayghan, Jamshid A., Fill, Hans-Georg, Karagiannis, Dimitris, Maglio, Paul P. (2008). "Service-oriented technology and management: Perspectives on research and practice for the coming decade." Electronic Commerce Research and Applications 7: 356–376.

Fickenscher, K. M. (2005). "The New Frontier of Data Mining." Health Management Technology.

Halevy, A. Y., Ashish, Naveen, Bitton, Dina, Carey, Michael, Draper, Denise, Pollock, Jeff, Rosenthal, Arnon, Sikka, Vishal (2005). Enterprise Information Integration: Successes, Challenges and Controversies. ACM SIGMOD, Baltimore, Maryland.

Kambil, A. (2009). "A head in the clouds." Journal of Business Strategy 30(4): 58-59.

Kanaracus, C. (2010). SaaS BI Will Be Big in 2010. CIO.

Liou, F.-M., Tang, Ying-Chan, Chen, Jean-Yi (2008). "Detecting hospital fraud and claim abuse through diabetic outpatient services." Health Care Manage Sci 11: 353–358.

Moeller, D. (2009). "Manage medical advances with automated prior authorization." Managed Healthcare Executive: 26-27.

Nicholls, C. (2006). BI 2.0: The Next Generation. Information Management Magazine.

Raden, N. (2005). Start Making Sense: Get From Data To Semantic Integration. Intelligent Enterprise.

Raden, N. (2007). Business Intelligence 2.0: Simpler, More Accessible, Inevitable. Intelligent Enterprise.

Rector, T. S., Wickstrom, Steven L., Shah, Mona, Greenlee, N. Thomas, Rheault, Paula, Rogowski, Jeannette, Freedman, Vicki, Adams, John, Escarce, Jose J. (2004). "Specificity and Sensitivity of Claims-Based Algorithms for Identifying Members of Medicare Choice Health Plans That Have Chronic Medical Conditions." Health Services Research 39(6): 1839–1861.

Saracco, C. M., Labrie, Jacques, Brodsky, Stephen (2004). Using Service Data Objects with Enterprise Information Integration technology.

Szemraj, N. (2009). HHS announces $60M Program to Fund Strategic Health IT Advanced Research Projects.

Taylor, J. (2004). Enterprise Information Integration: A New Definition Thoughts from the Integration Consortium. Information Management Online.

Viaene, S., Dedene, G., Derrig, R.A. (2005). "Auto claim fraud detection using Bayesian learning neural networks." Expert Systems with Applications 29: 653–666.

Workgroup for Electronic Data Interchange (2007). "National Electronic Data Interchange Transaction Set Implementation Guide."

Yang, W.-S., Hwang, San-Yih (2006). "A process-mining framework for the detection of healthcare fraud and abuse." Expert Systems with Applications 31: 56-68.

Zhang, S., Goddard, Steve (2007). "A software architecture and framework for Web-based distributed Decision Support Systems." Decision Support Systems 43: 1133–1150.

Definitive Source and Citation:

Woodside, Joseph M. (2010). A BI 2.0 Application Architecture for Healthcare Data Mining Services in the Cloud. International Data Mining Conference.

© 2018 Joseph M. Woodside
