Conversation

@TheUnlocked
Contributor

@TheUnlocked TheUnlocked commented Sep 26, 2025

Summary

While Mihon has an existing option to detect when an image is wide to split it in half, it currently has no feature to go the other way, detecting when two adjacent images are connected to merge them into a single wide image. This is a desirable feature because some sources only provide spreads pre-split. This PR attempts to implement this "spread fusion" by using image analysis techniques like edge detection to identify when two adjacent pages were originally connected so it can glue them back together.

Note: #2315 also attempts to address this kind of issue by showing pages in a side-by-side format in the pager viewer, but that does not help for webtoon-style viewers, and it also requires that the pages are properly aligned to make sure the left page doesn't show up on the right. The sort of approach taken in that PR could be complementary to this feature, but would not supersede it.

Images

[Screenshots: without fusion; with fusion; with fusion + rotate]
[Screenshot: horizontal RTL pager with fusion]
[Screenshots: in-reader settings; main settings]

Technical details

Detecting Spreads

Spread detection follows a special algorithm designed for this purpose and implemented (with comments) here. The algorithm is not perfect, but in my real-world testing on manga I've been reading, it has a fairly high success rate. It's tuned to favor false negatives over false positives, so occasionally you will come across a spread that is not merged, but hopefully you will never come across a non-spread that is merged. Full disclosure: I am not a computer vision person, and while I feel pretty good about its current performance, I'm sure it could be made much more reliable if anyone wants to contribute their expertise in the field.

The detection functionality is implemented as a separate library, though not because it strictly has to be. In fact, I originally implemented it as part of this PR (see VisionUtil.kt in an intermediate commit). The problem is that this feature relies on OpenCV, and importing OpenCV as a package brings along a huge .so file containing the entire OpenCV library, blowing up the final APK size. The only way* to get around this is to compile native code against the OpenCV SDK directly, which would add significant overhead to the build process for Mihon. Additionally, to reduce the file size impact even more, manga-vision uses a custom build of the OpenCV SDK with certain hardware optimizations and other unnecessary features disabled. All of the setup, compilation, and deployment is automated using GitHub Actions at TheUnlocked/manga-vision.

Which direction it attempts to detect spreads in (i.e. first page on the left vs first page on the right) depends on the pager type in pager mode as well as the "invert split page placement" setting in all cases.
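As a rough illustration of the direction choice, here is a minimal sketch. The names `Placement` and `spreadPlacement` are hypothetical, and the mapping (left-to-right puts the first page on the left, right-to-left puts it on the right, the invert setting flips either) is an assumption rather than the exact logic in this PR.

```kotlin
// Which of two consecutive pages is placed on the left when testing for a spread.
enum class Placement { FirstOnLeft, FirstOnRight }

// Assumption: a left-to-right layout puts the first page on the left, a
// right-to-left layout puts it on the right, and the invert setting flips either.
fun spreadPlacement(rightToLeft: Boolean, invert: Boolean): Placement =
    if (rightToLeft != invert) Placement.FirstOnRight else Placement.FirstOnLeft
```

The detector would then compare the right edge of the left-hand page against the left edge of the right-hand page.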

*Technically you could also achieve a much smaller size by compiling a custom version of the .so file without unnecessary functions while still using the Java API, but that doesn't resolve any of the project maintenance issues, and the .so is still larger than it is when compiling against the native object files directly.

Merging Spreads

Rather than make more ad-hoc changes to the page holders to handle spread fusion, this PR introduces a high-level concept of PageLoaderInterceptors, which can intercept the page loading process and modify page data as appropriate before (or after) it gets initially rendered. Existing features like wide page rotation and splitting can be migrated to this model in order to simplify the page holders, though that work is not included in this PR (splitting would be slightly more involved than rotating since it would need a new type of page state to indicate that a page needs to be inserted).
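To make the idea concrete, here is a minimal sketch of an interceptor chain. `PageData`, `PageInterceptor`, `runInterceptors`, and `rotateWide` are hypothetical names for illustration only; the actual PageLoaderInterceptor API in this PR may look quite different.

```kotlin
// Hypothetical stand-in for the data carried by a loaded page.
data class PageData(val index: Int, val width: Int, val height: Int)

// Hypothetical interceptor shape: receives a loaded page and returns the
// (possibly modified) page that should be rendered.
fun interface PageInterceptor {
    fun intercept(page: PageData): PageData
}

// Applies interceptors in registration order, feeding each one's output to the next.
fun runInterceptors(page: PageData, interceptors: List<PageInterceptor>): PageData =
    interceptors.fold(page) { current, interceptor -> interceptor.intercept(current) }

// Example interceptor: "rotates" wide pages by swapping their dimensions.
val rotateWide = PageInterceptor { page ->
    if (page.width > page.height) page.copy(width = page.height, height = page.width) else page
}
```

With this shape, a feature like wide page rotation becomes one entry in a list instead of a branch inside each page holder.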

SpreadFusionInterceptor is the interceptor that detects and merges spreads. It merges a spread by modifying the image in the first page and setting the second page's status to a new state value, Skip. Handling the Skip state in the webtoon reader is simple: the page holder is hidden when its ReaderPage has that status. For the pager reader, the approach is similar to the one used when splitting wide pages, and actually removes the skipped page from the adapter. While that works for this feature, it does mean that, like with inserted pages, a skipped page cannot be un-skipped without reloading the reader, which may (or may not) be an issue for future features. However, that bridge can be crossed later.

In earlier iterations of this feature, I tried utilizing the capabilities of the interceptor more by only marking a page as Skip or Ready once it had already been checked against its neighboring pages. While this worked alright for local chapters, it added significant latency when pages had to be downloaded, and overall degraded the reading experience. The implementation in this PR eagerly renders each page as it loads, and retroactively tries to merge it later once its neighbors have also loaded.
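The eager-render, merge-later flow can be sketched roughly as follows. Everything here is a simplified assumption: `Status`, `Page`, `onPageLoaded`, and the `isSpread` callback are hypothetical, and strings stand in for real image data.

```kotlin
// Hypothetical page states; only Skip is named in the PR description.
enum class Status { Loading, Ready, Skip }

class Page(val index: Int) {
    var status: Status = Status.Loading
    var image: String? = null // stand-in for real image data
}

// Called whenever a page finishes loading. The page is shown right away, then
// merged with a neighbour if that neighbour has also loaded and the pair
// is detected as a spread.
fun onPageLoaded(
    pages: List<Page>,
    index: Int,
    image: String,
    isSpread: (first: String, second: String) -> Boolean, // hypothetical detector
) {
    val page = pages[index]
    page.image = image
    page.status = Status.Ready // render eagerly, before neighbours are known

    // Check the pair ending at this page, then the pair starting at it.
    for (firstIndex in listOf(index - 1, index)) {
        val first = pages.getOrNull(firstIndex) ?: continue
        val second = pages.getOrNull(firstIndex + 1) ?: continue
        if (first.status != Status.Ready || second.status != Status.Ready) continue
        val a = first.image ?: continue
        val b = second.image ?: continue
        if (isSpread(a, b)) {
            first.image = a + b          // stand-in for gluing the two images together
            second.status = Status.Skip  // hide the second half
        }
    }
}
```

Because a page is marked Ready as soon as it loads, the reader never waits on a neighbour; the cost is that a page may briefly appear on its own before being merged.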

Image Pipeline

Previously, image data streams and buffers were used to store intermediate image data between processing steps, for example when splitting or rotating wide pages. That meant that bitmap data had to be read from an image format, processed, and then re-compressed to an image format every time it had to be modified, adding overhead and potentially introducing re-compression artifacts. Because that happened in only a small number of cases, it was never really a big issue, but it could become one if several interceptors were used in the future. To avoid this, ReaderPage has been enhanced to allow Bitmaps to be used directly as a backing data store (though all of it is still lazily loaded to avoid holding large objects in memory).

Additionally, many of the image processing operations that occurred (and still occur) inside of the page holders now use an ImageSource interface, which likewise can either be backed by a bitmap or by a buffer (though unlike the ReaderPage, these are the actual image data, not thunks to obtain the image data).
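A minimal sketch of the two-backing idea, written so it runs without Android: `Pixels` stands in for `Bitmap`, a `ByteArray` stands in for the buffer, and `ImageSourceSketch` and `toPixels` are hypothetical names rather than the interface in this PR.

```kotlin
// Stand-in for android.graphics.Bitmap so the sketch runs without Android.
class Pixels(val width: Int, val height: Int)

// Hypothetical shape of an image source backed by encoded bytes or decoded pixels.
sealed interface ImageSourceSketch {
    class Encoded(val bytes: ByteArray) : ImageSourceSketch
    class Decoded(val pixels: Pixels) : ImageSourceSketch
}

// Decodes only when needed; an already-decoded source is passed through untouched.
fun ImageSourceSketch.toPixels(decode: (ByteArray) -> Pixels): Pixels = when (this) {
    is ImageSourceSketch.Encoded -> decode(bytes)
    is ImageSourceSketch.Decoded -> pixels
}
```

A processing step that receives a decoded source skips decoding entirely, which is what lets several steps run back to back without re-compression.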

Misc. Changes

There are a few small issues which I incidentally encountered when working on this feature which I also resolved in this PR:

  • Previously there was an inconsistency where the "rotate wide pages" and "split wide pages" settings were mutually exclusive and would each unset the other in the main settings page, but would not in the in-reader settings. This has been fixed so the in-reader settings follow the main settings logic.
  • Some existing image processing steps used BitmapFactory.decodeStream when they should've used the ImageDecoder API which supports more image formats. This has been adjusted so that image processing still tries to use BitmapFactory.decodeStream first because it is significantly faster, but now falls back to ImageDecoder when that fails.
  • The way settings are loaded into viewers has been adjusted slightly to reduce unnecessary refreshes/updates.
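The decoder fallback from the second item above can be sketched generically. `decodeWithFallback` is a hypothetical helper; on Android the first decoder would presumably wrap `BitmapFactory.decodeStream` (which returns null when it cannot decode) and the second would wrap `ImageDecoder` (which throws on failure), but plain functions are used here so the sketch runs anywhere.

```kotlin
// Tries each decoder in order and returns the first non-null result.
// A decoder signals failure by returning null or by throwing.
fun <T : Any> decodeWithFallback(bytes: ByteArray, decoders: List<(ByteArray) -> T?>): T? {
    for (decoder in decoders) {
        val result = runCatching { decoder(bytes) }.getOrNull()
        if (result != null) return result
    }
    return null
}
```

Ordering the list from fastest to most capable keeps the common case cheap while still covering formats the fast path rejects.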

In the main settings page, checking the setting to split wide pages or rotate wide pages unchecks the other. This behavior was not implemented in the settings sheet in the reader, meaning users could check both settings at the same time.
Currently the image post-processing pipeline involves lots of converting back and forth between bitmaps and streams. This change abstracts the data storage into an `ImageSource` type, which can be backed either by a `BufferedSource` or by a `Bitmap`. This allows us to avoid conversion to a bitmap when there is no post-processing, while also letting us keep images as a bitmap through the entire pipeline when there are multiple post-processing steps.
Adds a system for intercepting page loading to create a more consistent way of transforming content before it gets rendered. This commit only includes the infrastructure; no existing features are converted to use interceptors and no new features are added utilizing interceptors.
The `Preference<T>.register` function in `ViewerConfig` has the following changes:
* `valueAssignment` is fired immediately on registration.
* `onChanged` is only fired when there's a change, and not shortly after registration.

Consumers have also been updated to account for this change in cases where they relied on or circumvented the old behavior.
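The new callback timing can be illustrated with a small stand-in. `SimplePreference` is a hypothetical class, not Mihon's `Preference<T>`; only the timing of `valueAssignment` and `onChanged` follows the description above, and calling `valueAssignment` again on later changes is an assumption.

```kotlin
// Minimal stand-in for a preference that can be observed.
class SimplePreference<T>(initial: T) {
    private val listeners = mutableListOf<(T) -> Unit>()

    var value: T = initial
        set(newValue) {
            if (newValue == field) return // no notification when nothing changed
            field = newValue
            listeners.forEach { it(newValue) }
        }

    // valueAssignment runs once immediately with the current value and again on
    // every change. onChanged runs only on actual changes, never at registration.
    fun register(valueAssignment: (T) -> Unit, onChanged: (T) -> Unit = {}) {
        valueAssignment(value)
        val listener: (T) -> Unit = { newValue ->
            valueAssignment(newValue)
            onChanged(newValue)
        }
        listeners.add(listener)
    }
}
```

Separating the two callbacks means a viewer can pick up its initial configuration without also triggering a refresh it does not need yet.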
Adds the capability for a page's status to be set to `Skip`, hiding it in the viewer. This functionality is not used in this commit, but is part of the groundwork for future features.
Adds a new config option to detect when two adjacent pages form a spread and automatically merge them into a single wide page. This option follows the same direction as the "split wide pages" option when deciding which edges to detect spreads with, including considering the "invert split page placement" option. This option is supported in both the pager and webtoon viewer.

The detection algorithm is described and implemented in `VisionUtil` and adds a dependency on OpenCV to efficiently perform operations like edge detection and convolution.

Also included is a minor fix to WebtoonViewer.refreshAdapter(), which previously did not refresh properly before the viewer had loaded.
Including the entire OpenCV .so file significantly increases the final APK size. manga-vision uses OpenCV internally, but uses a custom build for minimal file size, and only includes the functions which are necessary for spread detection.

Simplify use of SpreadDetector

Update SpreadFusionInterceptor.kt
Because the proxy pages have their lifetime tied to the lifetime of the chapter (via PageLoaderInterceptorManager), bitmaps stored in them will likewise have their lifetime bound to the chapter. While those bitmaps should eventually get cleaned up when the chapter is unloaded, in the meantime they can put significant strain on memory if there are a large number of spreads involving large images, and if more image-altering interceptors are added in the future, it would only get worse. By creating bitmaps on demand, as is done with streams, we can avoid this leakage while still keeping the benefit of not needing to convert back and forth to/from streams during every processing step.
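The on-demand approach amounts to storing a recipe for the image instead of the image. A minimal sketch, with the hypothetical names `ThunkPage` and `mapImage`:

```kotlin
// A page that stores how to obtain its image rather than the image itself.
class ThunkPage<T>(val load: () -> T)

// Wraps a page so the transform is re-run on each request. Nothing is cached
// on the proxy, so the result can be reclaimed once the caller drops it.
fun <T> ThunkPage<T>.mapImage(transform: (T) -> T): ThunkPage<T> =
    ThunkPage { transform(load()) }
```

The trade-off is repeated work if the same page is requested several times, in exchange for not pinning large bitmaps for the life of the chapter.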
The earlier approach favors correctness by not displaying pages until it's sure of whether a page is a spread or not. However, this adds significant latency to page loading and relies on preloading to work correctly. Eagerly assuming that pages are not spreads and letting them render when adjacent pages have not yet loaded will provide a worse experience when there are a large number of split spreads, but a significantly better experience when there are only a few, which is by far the more typical case.
@TheUnlocked
Contributor Author

To address the issue of the large binary size, I spent the last week or so researching OpenCV's source code and developing a custom implementation of every behavior I used OpenCV for (with the exception of Gaussian blur, where I used a library that has an optimized implementation). The result is that with the latest version of manga-vision, instead of adding ~13MB to the universal APK size, it adds <1.5MB.

Additionally, as an unexpected but welcome benefit from this change, overall spread detection is ~3x faster in rough e2e testing. My guess is that this is due to reduced overhead from not needing to convert to and from OpenCV data structures. (It's not because my naive algorithm implementations are faster than their SIMD-optimized implementations; I've benchmarked them and they're not.)

@AntsyLich
Member

Now we're talking! But uh I would potentially like to have the repo under @mihon if you don't mind.

@TheUnlocked
Contributor Author

TheUnlocked commented Oct 9, 2025

That's completely fine, but you'll need to handle deployment to Maven Central (or whatever other repo you want) on your own.
