Quantcast
Channel: R-bloggers
Viewing all articles
Browse latest Browse all 12100

Calculate all the CVs of all the QC Levels of all the Methods of all the Instruments at all the Sites all at once … with Sunquest LIS and dplyr

$
0
0

[This article was first published on The Lab-R-torian, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Background

As part of our lab accreditation requirements, we have to provide measurement uncertianty estimates for all tests at all hospital sites. As you might imagine, with thousands of testcodes in Sunquest LIS, getting all the coefficients of variation (CVs) represents a daunting task for the quality technologist to accomplish. As it turns out, by capturing the ssh session in a .txt file, you can use R’s dplyr package to do this all in few lines of code.

Getting the Raw Data

You need to get the raw data from Sunquest. You can capture the telnet (yes… older versions of Sunquest use telnet and pass patient information and user passwords unencrypted across the hospital network o_O) or the ssh session to a file using the Esker SmarTerm which Sunquest packages in their product and refers to as “roll-n-scroll”. People disparriage SmarTerm as an old “dos tool”–whereas Sunquest is hosted on AIX operating system. SmarTerm access to Sunquest is a gagillion times faster than the GUI and permits us to capture the raw QC data we need. To capture the session select from the dropdown menu as shown here:

SQ Screenshot1

If you are using Mac OS or Linux OS, you can also capture the ssh session by connecting from the terminal and using tee to dump the session to a file.

ssh user@serverIPaddress | tee captured_session.txt

Once you have connected, use the QC function and select output printer 0 (meaning the screen) and make these selections, changing the dates as appropriate:

SQ Screenshot1

If you make no selections at all for any of:

  • TEST:
  • WORKSHEET:
  • METHOD:
  • CONTROL:
  • SHIFT #:
  • TECH:
  • TESTS REQUESTED:

then you will extract everything, which is what you want and which will make for a very big .txt file. There will be a delay and then thousands of QC results will dump to the screen and to your file. When this is complete, end your SmarTerm or ssh or telnet (cringe) session. I saved my text dump as raw_SQ8.txt.

Getting it intro R and parsing it

Your data will come out as a fixed with file with no delimiters. It will also have a bunch of junk at the bottom and top of the file detailing your commands from the start and end of the session. These need to be discarded. I just used grep() to find all the lines with the appropriate date pattern. After reading it in, because I am lazy, I wrote it back out and read it in again with read.fwf()

library(tidyverse)library(lubridate)library(knitr)# Note to my friend SK - yes... this is mostly in base-R... # create a connectioncon <- file(file.path("raw_SQ8.txt"))raw.qc.data <- readLines(con)close(con)#find good rowsgood.data <- grep("[0-9]{2}(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[2][0][0-9]{6}",raw.qc.data)raw.qc.data <- raw.qc.data[good.data]#remove a screwball encoding characterraw.qc.data[1] <- substr(raw.qc.data[1],6,nchar(raw.qc.data[1]))con <- file("temp.txt")#rewrite the file with no garbage in it.writeLines(raw.qc.data, con)close(con)raw.qc.data <- read.fwf("temp.txt",c(6,6,6,20,13,6,2,15,100))file.remove("temp.txt")names(raw.qc.data) <- c("test.code","instr.code","qc.name","qc.expire",                        "date.performed","tech.code","shift",                        "result","modifier")raw.qc.data <- data.frame(lapply(raw.qc.data, trimws))raw.qc.data$result <- as.numeric(as.character(raw.qc.data$result))raw.qc.data$date.performed <- dmy_hm(raw.qc.data$date.performed)raw.qc.data$tech.code <- as.numeric(raw.qc.data$tech.code) #anonymize tech codesraw.qc.data <- arrange(raw.qc.data, instr.code, test.code)

Now that all the data munging is done, we can examine the data:

test.codeinstr.codeqc.nameqc.expiredate.performedtech.codeshiftresultmodifier
BCLJBGASRAD1R0173 EXP MAR 20212019-11-15 09:17:00682122NA
BCLJBGASRAD1R0173 EXP MAR 20212019-11-15 20:51:00683122NA
BCLJBGASRAD1R0173 EXP MAR 20212019-11-15 21:47:00683122NA
BCLJBGASRAD1R0173 EXP MAR 20212019-11-15 21:50:00683122NA
BCLJBGASRAD1R0173 EXP MAR 20212019-11-17 07:10:00151122NA
BCLJBGASRAD1R0173 EXP MAR 20212019-11-17 07:11:00151122NA

And finally, we can make the dplyr magic happen and discard results for which the counts are too small, which I have chosen to be <20:

raw.qc.data %>% dplyr::filter(!is.na(result)) %>%  group_by(instr.code,test.code,qc.name,qc.expire) %>%  summarise(median = median(result),            IQR = IQR(result),            mean = mean(result),            SD = sd(result),            min = min(result),            max = max(result),            CV = round(sd(result, na.rm = TRUE)/mean(result, na.rm = TRUE)*100,2),            count = n()) %>%  filter(count ≥ 20) %>%  arrange(instr.code, test.code, median) -> summary.table

Which gives us output like this:

head(summary.table)

instr.codetest.codeqc.nameqc.expiremedianIQRmeanSDminmaxCVcount
JBGASBCLRAD3R0141 EXP SEP 201765.01.00065.1454540.650304363.066.01.0055
JBGASBCLRAD2R0175 EXP MAR 202197.00.00097.1282050.336482097.098.00.3578
JBGASBCLRAD1R0173 EXP MAR 2021122.00.000122.1228070.5691527121.0124.00.4757
JBGASBGLUCRAD1R0173 EXP MAR 20211.50.0001.5070170.02577131.51.61.7157
JBGASBGLUCRAD2R0175 EXP MAR 20215.60.0755.5858970.06390815.45.71.1478
JBGASBGLUCRAD3R0141 EXP SEP 201713.70.10013.7636360.131040913.414.10.9555

This permits us to toss out results with low counts. But what about handling outliers? Well, we can calculate the z-scores of the raw data by joining the the mean and SD results back to the raw data.

raw.qc.data %>%  left_join(select(summary.table,c(instr.code:qc.expire, mean, SD)),             by = c("test.code","instr.code", "qc.name", "qc.expire")) %>%  mutate(z.score = (result - mean)/SD) -> raw.qc.data

This will permit you to suppress results outside a certain z-score. So, let’s suppress all results with an undefined z-score and all results with a z-score >= 4:

raw.qc.data %>%  drop_na(z.score) %>%  filter(abs(z.score) < 4) -> raw.qc.data

Now , we can re-run the dplyr summary:

raw.qc.data %>% dplyr::filter(!is.na(result)) %>%  group_by(instr.code,test.code,qc.name,qc.expire) %>%  summarise(median = median(result),            IQR = IQR(result),            mean = mean(result),            SD = sd(result),            min = min(result),            max = max(result),            CV = round(sd(result, na.rm = TRUE)/mean(result, na.rm = TRUE)*100,2),            count = n()) %>%  filter(count ≥ 20) %>%  arrange(instr.code, test.code, median) -> summary.table.no.outliers

And now we have a summary of every QC CV in our Sunquest system with outliers suppressed:

head(summary.table.no.outliers)

instr.codetest.codeqc.nameqc.expiremedianIQRmeanSDminmaxCVcount
JBGASBCLRAD3R0141 EXP SEP 201765.01.00065.1454540.650304363.066.01.0055
JBGASBCLRAD2R0175 EXP MAR 202197.00.00097.1282050.336482097.098.00.3578
JBGASBCLRAD1R0173 EXP MAR 2021122.00.000122.1228070.5691527121.0124.00.4757
JBGASBGLUCRAD1R0173 EXP MAR 20211.50.0001.5070170.02577131.51.61.7157
JBGASBGLUCRAD2R0175 EXP MAR 20215.60.0755.5858970.06390815.45.71.1478
JBGASBGLUCRAD3R0141 EXP SEP 201713.70.10013.7636360.131040913.414.10.9555

And there we have it:

SQ Screenshot1

Now I can write the output file

write_csv(summary.table.no.outliers, "QC_summary.csv")

With dplyr, if you direct your energies to the right place, you reap much. Similarly:

“But seek ye first the kingdom of God, and his righteousness; and all these things shall be added unto you.”

Matthew 6:33

var vglnk = { key: '949efb41171ac6ec1bf7f206d57e90b8' }; (function(d, t) {var s = d.createElement(t); s.type = 'text/javascript'; s.async = true;s.src = '//cdn.viglink.com/api/vglnk.js';var r = d.getElementsByTagName(t)[0]; r.parentNode.insertBefore(s, r); }(document, 'script'));

To leave a comment for the author, please follow the link and comment on their blog: The Lab-R-torian.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.


Viewing all articles
Browse latest Browse all 12100

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>