Author: Haibin Shu

by Haibin Shu

How to Protect Data Integrity?

Just as integrity is an important characteristic of a human being, data integrity lays the foundation for valid analyses and reports. The integrity of data includes the following important components:

  • Accuracy: the data truly reflect source records and are free of transcription errors. This mirrors the presence of honesty and truthfulness.
  • Consistency: free of logical errors and consistent across domains, visits, devices, etc. (e.g., when the same measurement is collected multiple times during the course of a study). Data entries and data extracts should also agree in data types, formats, conventions, etc. This requirement corresponds to the personal qualities of dependability and accountability.
  • Security: defines roles, access levels, scopes, activities, etc. It’s important to allow the right role to perform the right tasks and to prevent anyone else from performing them. This mirrors the characteristics of self-control and self-discipline.
  • Traceability: who did what, at what time, and why. Just as in real life, keeping a track record is essential for data integrity.
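
As a rough illustration of the consistency and traceability components above, the following Python sketch pairs a simple logical edit check with an audit trail that records who changed what, when, and why. It is not tied to any particular clinical system: the class `AuditedRecord`, the function `check_consistency`, the field names, and the two rules are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditedRecord:
    """A data record that logs who changed what, when, and why (hypothetical)."""
    values: dict
    audit_trail: list = field(default_factory=list)

    def update(self, user, field_name, new_value, reason):
        # Keep the old value so the change can be traced back later.
        old_value = self.values.get(field_name)
        self.values[field_name] = new_value
        self.audit_trail.append({
            "user": user,
            "field": field_name,
            "old": old_value,
            "new": new_value,
            "reason": reason,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })


def check_consistency(record):
    """Return the logical discrepancies found in one record (illustrative rules)."""
    issues = []
    v = record.values
    # ISO 8601 date strings (YYYY-MM-DD) sort correctly as plain text.
    if v.get("visit_date") and v.get("consent_date"):
        if v["visit_date"] < v["consent_date"]:
            issues.append("visit_date is earlier than consent_date")
    if v.get("systolic_bp") is not None and v.get("diastolic_bp") is not None:
        if v["systolic_bp"] <= v["diastolic_bp"]:
            issues.append("systolic_bp is not greater than diastolic_bp")
    return issues


record = AuditedRecord({"consent_date": "2020-01-10", "visit_date": "2020-01-05",
                        "systolic_bp": 120, "diastolic_bp": 80})
issues_before = check_consistency(record)   # the date rule fires
record.update("data_manager_01", "visit_date", "2020-01-15",
              "corrected per source document")
issues_after = check_consistency(record)    # no discrepancies remain
```

In a real study the rules would come from the data validation plan, and the audit trail would be maintained by the platform itself rather than by user code.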

How to protect data integrity?

Protecting data integrity requires commitment to four Ps.

  • Platform: a secure system that is 21 CFR Part 11 compliant and provides a good front end for data entry, discrepancy management, and other end-user interfaces. It also provides a strong back end that allows programmers to design and implement a database efficiently, with the required edit checks, metrics reports, and other important or study-specific functions.
  • Process: a rigorous process specifies roles, functions, workflows, teamwork, responsibilities, etc. It should include change control management.
  • People: committed people are the key to protecting data integrity. That commitment shows in paying attention to details, being sensitive to any potential deviation that might compromise data integrity, and taking proactive steps before mistakes can occur. Timely and ongoing training helps more people become committed to data integrity.
  • Passion: protecting data integrity requires corporate-level consensus and collaborative effort.

Left-hand Programming vs. Right-hand Programming?

Double programming is the gold standard for a statistical programming and analysis team. Its purpose is to ensure that the data are processed correctly and that the analysis is conducted correctly, following pre-specified requirements such as the SAP document. The beauty of the practice is that the same results are reached by two or more different approaches. Often the approaches are independent of each other and may even begin with divergent understandings of a particular analysis method. Ultimately, accuracy is achieved when the differences are reconciled and the critical understandings converge, just like the teamwork between the left hand and the right hand!

Generally speaking, the initial programmer has certain advantages, such as choosing the naming conventions, setting up the output layout formats, and selecting the statistical procedures. The QC programmer generally focuses on checking the accuracy of the output content and analysis results, and follows the conventions that the initial programmer has set up.
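
The reconciliation step at the heart of double programming can be sketched as a comparison of the two independently produced result sets. The Python sketch below is only an illustration under stated assumptions: the function `compare_results`, the statistic names, and the values are hypothetical, and in a SAS environment this role is typically played by PROC COMPARE.

```python
import math


def compare_results(primary, qc, rel_tol=1e-9):
    """Compare two {statistic_name: value} dicts and list every discrepancy."""
    differences = []
    for key in sorted(set(primary) | set(qc)):
        if key not in primary:
            differences.append((key, "missing in primary output"))
        elif key not in qc:
            differences.append((key, "missing in QC output"))
        else:
            a, b = primary[key], qc[key]
            both_numeric = isinstance(a, (int, float)) and isinstance(b, (int, float))
            # Numbers are compared with a tolerance; everything else exactly.
            same = math.isclose(a, b, rel_tol=rel_tol) if both_numeric else a == b
            if not same:
                differences.append((key, f"primary={a!r}, qc={b!r}"))
    return differences


# Hypothetical results from the initial programmer and the QC programmer.
primary_output = {"n": 120, "mean_age": 54.25, "p_value": 0.0312}
qc_output = {"n": 120, "mean_age": 54.25, "p_value": 0.0321}

differences = compare_results(primary_output, qc_output)
```

Comparing content rather than formatted text keeps the reconciliation focused on the analysis results, not on cosmetic layout.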

Some factors to consider when facilitating a strong collaboration between left-hand programming and right-hand programming are:

  • A win-win culture: the approaches are independent but the goal is the same, namely to prevent mistakes in programming and analysis. The initial programmer should always self-check first, as an initial quality assurance step, before handing over to the QC programmer for independent review.
  • Avoid cosmetic overdoing: both parties should stick to simple, common, and effective conventions. Overly detailed formatting, e.g., concatenating variables with special characters and calculated spaces, costs extra time and makes non-essential content harder to reconcile.
  • Constant contact: both parties should talk to each other constantly in order to stay productive. Changes are usually inevitable. For example, analysis methods may change multiple times in the course of a study before the SAP is finalized. If both parties are informed of a change request at the same time, they can start revising their respective programs in parallel rather than sequentially.

Does One Clinical System Fit All Studies?

The answer is ‘No’, because it’s very hard to build or find a system that fits the needs of all clinical trials. Selecting a system that is both sufficient and effective therefore becomes very important before launching an upcoming clinical trial.

In theory, one might think it possible to account for all factors when creating such a system. In reality, however, the effort might be too great to justify, let alone keep up with the ever-evolving requirements of clinical trials. For example, the best paper-based system in history would fail to address the basic needs of a simple study today that requires electronic data capture.

Some factors to consider when selecting a clinical system:

  • Experience of site users: less experienced users may need a much more robust and less error-prone system for data entry and other necessary tasks.
  • Study design, including visit structures, key data points, etc.: the more complex a study is, the more programming it is likely to need, such as edit checks and metrics reports. It may therefore require a system with strong programming capabilities in the back end.
  • SAS extracts: many systems are equipped with on-demand data extracts, which make it very convenient to extract real-time data for reporting and analyses. With systems that lack this capability, getting SAS data sets out of the system can become a daunting task, and nothing can be done until the SAS data sets have been generated.
  • Balance of front end and back end: a system can easily be voted down if its front-end functions aren’t desirable; on the other hand, attention should be paid to the back-end functions as well. Selecting a system with friendly front pages but insufficient back-end functions is like driving a nice-looking car without a good engine!


How to Set Up Dropbox as an Effective Programming Environment?

Dropbox provides a cost-effective, secure, and shareable environment for SAS programmers. It’s quick to set up and easy to operate. Furthermore, it provides a server-like platform for programmers to develop, share, and execute code.

A Cost-Effective Environment

Dropbox is a cost-effective web-based application that provides file storage and file management functions integrated with local systems. It avoids heavy IT spending by reducing hardware and software procurement and maintenance needs.

Quick Set-up and Easy Operation

  1. Download and install Dropbox desktop application
  2. Share the Dropbox network path, e.g.

  3. Map the above network path to a common drive, e.g. Y:


Synchronization can be done selectively, so that each computer syncs only the folders it needs.

As long as the same drive letter is used to map Dropbox and the same study folder structure is used, both programs and data sets can be invoked from any computer with SAS.
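
The reason a common drive letter matters can be sketched in a few lines: when every path is built from the same mapped drive and the same folder layout, the resulting path is identical on every computer. The sketch below is in Python rather than SAS, and only the drive letter Y: comes from the steps above. The helper `study_path`, the study name, and the folder and file names are hypothetical.

```python
from pathlib import PureWindowsPath

MAPPED_DRIVE = "Y:"  # the same drive letter on every programmer's computer


def study_path(study, *parts, drive=MAPPED_DRIVE):
    """Build a path under the mapped drive using one shared folder layout."""
    return PureWindowsPath(drive + "\\", study, *parts)


# Hypothetical study folder structure, for illustration only.
program = study_path("STUDY001", "programs", "t_demog.sas")
data_dir = study_path("STUDY001", "data", "adam")

print(program)   # same absolute path on any computer that maps Y: to Dropbox
print(data_dir)
```

Because no path depends on where Dropbox is physically installed on a given computer, a program written on one computer can refer to its data sets in the same way on another.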