-
Notifications
You must be signed in to change notification settings - Fork 127
The performance of creating new datetime formats can be improved #585
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
The performance of creating new datetime formats can be improved #585
Conversation
|
@dkhalanskyjb Hello! What do you think about this idea? |
|
Hi! This may help with the first stage (building a parser before the normalisation), but normalisation has quadratic complexity, too, and it wouldn't benefit from the proposed approach, as the lists themselves will also need to be reconstructed. We could extract the happy fast path where there are no adjacent numeric parser operations and simplify normalisation there. That is a common case, so it would be nice to provide brilliant performance there. I'm not yet convinced the common case can't be drastically improved by a better algorithm. |
|
Now that I think about it, it doesn't even fix the quadratic complexity of the initial stage. Concatenating Most parsers we concatenate are going to be single-element, so |
|
Yep, the initial stage also doesn't benefit from this. A quick run of a benchmark shows this: Before the change: After the change: Here, less is better (the numbers 5.2 and 7.8 show how long in milliseconds an operation takes). The benchmark itself is creating the datetime format used in Python: @Benchmark
fun buildFormat(blackhole: Blackhole) {
val v = LocalDateTime.Format {
year()
char('-')
monthNumber()
char('-')
day()
char(' ')
hour()
char(':')
minute()
optional {
char(':')
second()
optional {
char('.')
secondFraction()
}
}
}
blackhole.consume(v)
} |
|
I have avoided open class ParserStructureConcatBenchmark {
@Param("1", "2", "4", "8", "16", "32", "64", "128", "256", "512", "1024")
var n = 0
@Benchmark
fun largeSerialFormat(blackhole: Blackhole) {
val format = LocalDateTime.Format {
repeat(n) {
char('^')
monthNumber()
char('&')
day()
char('!')
hour()
char('$')
minute()
char('#')
second()
char('@')
}
}
blackhole.consume(format)
}
}
|
|
|
|
It's cool that the quadratic complexity can be mitigated in some scenarios, and I still believe that it's currently contributing significantly to the runtime we essentially observe. That said, the PR as a whole looks very much like a Pyrrhic victory to me, as the (Less is better here) Performance improvements are pointless if they negatively impact the actual use cases our users will encounter. Yes, the quadratic complexity was eliminated, but the increase in the resulting constant just seems too high. I propose that we add all the common formats we already provide as benchmarks ( |
…ation - Implement SerialFormatBenchmark to test repeated datetime format sequences. - Implement PythonDateTimeFormatBenchmark to evaluate Python-compatible datetime formats. - Implement ParallelFormatBenchmark to test creation of formats using nested and alternative parsing logic.
…internal datetime parsing logic.
…ethod definition
…tions` for consistency and clarity in `simplify` method logic
…ist construction and improved readability.
…me format! Benchmark Mode Cnt Score Error Units PythonDateTimeFormatBenchmark.buildPythonDateTimeFormat avgt 5 4142.002 ± 374.247 ns/op
…aner handling of operation merging and reduce duplication. Benchmark Mode Cnt Score Error Units PythonDateTimeFormatBenchmark.buildPythonDateTimeFormat avgt 5 3708.643 ± 29.908 ns/op
… usage by passing `ParserStructure` directly instead of separating operations and followedBy.
9639740 to
b6e80f8
Compare
…Structure` logic.
…struction and simplify `NumberSpanParserOperation` handling.
… and avoid duplication.
… benchmarking parameters, and add ISO DateTime format benchmark.
Current statecurrent branch master |
|
@dkhalanskyjb Hi! PR is ready for review. |
dkhalanskyjb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it actually ready? A new set of changes arrived after I've started reviewing the code. Please let me know when you're no longer actively working on this.
| } | ||
| firstOperation is NumberSpanParserOperation -> { | ||
| add(NumberSpanParserOperation(numberSpan + firstOperation.consumers)) | ||
| addAll(operationsToMerge.drop(1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drop(1) creates an additional new list, so I'd expect a manual iteration over the indices here instead.
| } else { | ||
| if (currentNumberSpan != null) { | ||
| newOperations.add(NumberSpanParserOperation(currentNumberSpan)) | ||
| currentNumberSpan = null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idea: it's possible to also empty out unconditionalModifications at this point, it won't affect the correctness, but may save some time down the line.
|
Now it's ready |
|
SerialFormatBenchmark results after accumulatedOperations optimization. Before: After: |
ParserStructure.append()withConcatenatedListViewfor improved efficiency.ConcatenatedListViewimplementation to lazily combine two lists without creating a new collection.