-
-
Notifications
You must be signed in to change notification settings - Fork 4k
feat 5340 addition of html and pdf report rendering of encapsulated SR reports. #5345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
feat 5340 addition of html and pdf report rendering of encapsulated SR reports. #5345
Conversation
❌ Deploy Preview for ohif-dev failed. Why did it fail? →
|
|
I see lots of Segmentation related errors in Playwright which does not look related to anything I have done so far. I am thinking the last set of commits I synced from upstream might be faulty or incomplete. Removing the [WIP] designation since I do not see anything else to do for this PR until further guidance. |
|
Same comment I had in other PR, which is seems like the diff includes other changes. This makes merging your PRs slower, if you can only include changes for SR for this we can review,test faster and merge while waiting for the others to get reivewed |
|
Can you provide an anonymized example of both pdf and html reports which can be added to the automated test data set? I can get those uploaded. |
| @@ -0,0 +1,64 @@ | |||
| import React, { useEffect } from 'react'; | |||
| import Markdown from 'marked-react'; | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you do lazy inclusion of the marked-react? This component isn't used elsewhere so loading it using a dynamice import statement, causing the separation into a separate load container will keep the overall OHIF size a bit smaller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you not using react-markdown which is far more utilized in other projects?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. I will read more about this and educate myself and then submit a commit.
| */ | ||
| export function getPayloadType(payload: string, suggested_mime: string = 'text/plain') { | ||
| if (!payload.indexOf('%PDF-')) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you want payload.indexOf('%PDF-') to test if it IS PDF?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am using the Magic number approach I would do in C specially if avoiding deep reads into data contents for performance reasons. Since we are talking JS here which is many abstraction layers above bare metal, I tend to be even less liberal with potentially costly checks. I am primarily a systems programmer so hopefully you can see a pattern here.
With that said, perhaps it is best I regex the input if PDF has a magic pattern at end of file so we ignore any potential malicious payloads at the boundaries and then verify here the payload starts with the magic number otherwise ignore it.
There's arguments in either direction, but I would rather detect the payload is a potentially well formed PDF (starts with the correct signature) and then let the browser object actually test if the PDF is valid. That way, I should have guarantees that the browser implementors have better ways to test this while being potentially more performant (the payload would be on the binary side of the JS engine at that point).
That's my thought process at least.
| * @param {string} suggested_mime Default MIME to use if we cannot identify the content's MIME | ||
| * @return string | ||
| */ | ||
| export function getPayloadType(payload: string, suggested_mime: string = 'text/plain') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add an options containing the suggestedMime (note lower camel case), and whether it should try scanning for html/pdf. It shouldn't automatically do it, since the mime type is supposed to be provided in the DICOM itself for encapsulated objects. You could then set a flag to only do the test if a certain customization was present. That customization could be a simple test function for the given type. That way other types can be recognized as needed, and no special types would need to be provided by default.
| * @param {string} mime MIME to add to Blob so other components can know how to handle contents. | ||
| * @return Blob | ||
| */ | ||
| export function stringToBlob(data: string, mime: string = payloadMIMEOptions.DEFAULT): Blob { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a shared external function. It gets repeated several times in the code so sharing it means any bugs in it could be fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, you are probably asking about a TypeScript specific thing. I am not understanding this request. The function is already exported from the payload module in utils. I thought I had all areas pointing to this function as an import. Is it that I left places accidentally duplicating code?
My brain cleans the code in passes and sometimes I forget to do one last pass.
| export function getCodeMeaningFromConceptNameCodeSequence( | ||
| conceptNameCodeSequence: ConceptNameCodeSequence | ||
| ): string { | ||
| let item: ConceptNameCodeSequenceItem = conceptNameCodeSequence[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const
| ): string { | ||
| let item: ConceptNameCodeSequenceItem = conceptNameCodeSequence[0]; | ||
| const { CodeMeaning } = item; | ||
| return CodeMeaning ?? ""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not
{
return conceptNameCodeSequence?.[0]?.CodeMeaning || ""
}
Really, all you are doing is extracting path values, and a single extractor is easier to manage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just assumed I was done with cleaning up the code after some debugging and before PR which clearly it is not true. I will correct that.
Question: would the "compiler" elide the extra lines? In other words, do TS compilers optimize code (not just minimize) like a C or Rust compiler? I know TS is mostly linters on top of JS, but I am curious how far the compiler goes through and if the extra lines actually have effects on final package. Thank you for educating me on this one.
Let me get back to you on this. I believe I have access to purely fictitious examples that have not patient involvement. How should the notice be provided? What's the expected format so I can ask for proper permission? Thank you! |
…ted report contents. Basically, the viewer can now display HTML, Markdown, PDF, and Text reports with the appropriate page object. Added sanitize-html, Markdown, and @types/sanitize-html as dependencies. TODO: Add proper HTML sanitization options so we can preserve the core portions of the document but remove anything that can cause security issues. TODO: Hook the html sanitization routine into the component logic. Signed-off-by: Luis M. Santos <luis.santos2@nih.gov>
Adjusted the html extraction regex to ensure we capture html contents only. Signed-off-by: Luis M. Santos <luis.santos2@nih.gov>
Signed-off-by: Luis M. Santos <luis.santos2@nih.gov>
…evels of abstraction better. Signed-off-by: Luis M. Santos <luis.santos2@nih.gov>
… it was not meeting the standard specification of looking into the ConceptNameCodeSequence array. It assumed that the CodeMeaning field was at the array object level. Refactored code for clarity and added type definitions. Signed-off-by: Luis M. Santos <luis.santos2@nih.gov>
…codesequence node. This is the same bug as at the container level. Minor API refactor for clarity. Restored logic from initial commit since it handles the continuity of content which I typically don't and I want ot avoid breaking someone else's expectations. If current solution turns out to not be complete for embedded reports, I shall revisit this file. Signed-off-by: Luis M. Santos <luis.santos2@nih.gov>
Signed-off-by: Luis M. Santos <luis.santos2@nih.gov>
…ction for enforcing utf-8 encoding in payload. Updated the sanitizeHTML function to go through all steps necessary to end up with a clean HTML document. Signed-off-by: Luis M. Santos <luis.santos2@nih.gov>
…s to track the specified encoding type so that we can convert to utf-8 accordingly Signed-off-by: Luis M. Santos <luis.santos2@nih.gov>
…o that we can guarantee that PACS or VNA payloads are safe to render. Signed-off-by: Luis M. Santos <luis.santos2@nih.gov>
… render encapsulated reports correctly. Signed-off-by: Luis M. Santos <luis.santos2@nih.gov>
Signed-off-by: Luis M. Santos <luis.santos2@nih.gov>
Signed-off-by: Luis M. Santos <luis.santos2@nih.gov>
fe2c5c7 to
473ff77
Compare

Context
Fixes #5340
Requires #5293, #5327, and #5337 before merging.
Commits start August 15th.
This PR brings a React component that can handle SR encapsulated contents.
Currently, the viewer assumes all data in the SR tree are pieces of formatted text.
However, in my environment, I come across SR series that are encapsulated reports.
The encapsulated report is an HTML report in my case.
As part of the component bundle, I included a component for handling PDF if the encountered data is PDF.
The new components default to text like before.
Also, because of the danger of rendering HTML, I added a dependency on
sanitize-htmland take steps to clean the input HTML as much as possible. I make sure no scripts are allowed to execute when viewing HTML reports.A follow up PR or two will be needed to introduce:
dcmjslibrary once my PR (455) gets merged.Changes & Results
html-sanitizeto help clean up document.CodeMeaningelements in SR nodes.Before
After
Testing
E2E tests and test dataset will need to be introduced in a later PR because I need to figure out how to propose test data.
Also, the PR is getting much bigger than I feel comfortable and I think it will create more burden to the reviewer, so I am breaking this into a few chunks.
TODO: Finalize before and after testing with current test suites in the morning.
Checklist
PR
semantic-release format and guidelines.
Code
etc.)
Public Documentation Updates
additions or removals.
Tested Environment