Java duplication detection

It looks like Codacy uses PMD CPD for Java duplication checks. PMD CPD allows for CLI arguments to further configure how it checks for duplicate code.

How are these values handled by Codacy? Does it use the defaults of the CPD CLI? Are they configurable?

Hello @doug-papenthien-by,

We’re currently in the process of documenting how Codacy allows you to configure a few PMD CPD options through the Codacy configuration file.

Please check the following pull request, which includes these updates. Note that it’s still under review:

Thanks, Paulo. This documentation would definitely help in the future.

The reason I brought this up is that we recently had a pull request reporting duplication that only existed if literals and variable names were ignored. I saw in the PMD CPD documentation that ignore-literals and ignore-identifiers are set to false by default, so differences in these values should have resulted in no duplication on the aforementioned PR.
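To make the effect of these two options concrete, here is a minimal, hypothetical sketch (it is not the code from the PR in question; the class and method names are invented for illustration). The two methods below share the same token structure and differ only in identifiers and in one literal. With ignore-identifiers and ignore-literals enabled, a token-based detector such as CPD would be expected to treat them as the same token sequence; with both options off, the differing names and values break the match into short runs that would typically fall below the minimum token threshold. Whether a duplicate is actually reported also depends on the configured minimum token count.

```java
public class DuplicationExample {

    // Sums the order amounts that exceed a fixed threshold.
    static int sumLargeOrders(int[] orders) {
        int total = 0;
        for (int i = 0; i < orders.length; i++) {
            if (orders[i] > 100) {
                total += orders[i];
            }
        }
        return total;
    }

    // Same structure as above; only the identifiers and the literal differ.
    static int sumHighScores(int[] scores) {
        int sum = 0;
        for (int j = 0; j < scores.length; j++) {
            if (scores[j] > 50) {
                sum += scores[j];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumLargeOrders(new int[] {50, 150, 200})); // prints 350
        System.out.println(sumHighScores(new int[] {40, 60, 70}));    // prints 130
    }
}
```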

So I was trying to determine whether Codacy overrides these defaults. The PR you linked would seem to indicate that it does NOT modify them. The code I saw on GitHub for Codacy was a bit inconclusive: while this block of code appears to default them to true, it’s tough to follow it to the caller to see whether default options (of false) are being supplied elsewhere.

Can you confirm that Codacy uses the same defaults as documented by the PMD CPD CLI? Is there a way to view more detailed logs for this step of the analysis to confirm the results and aid in troubleshooting further?

Thanks for the feedback @doug-papenthien-by.

By default, Codacy actually sets a few PMD CPD flags to true, namely --ignore-literals and --ignore-identifiers. However, you can use the Codacy configuration file to force these flags back to false (the PMD CPD defaults that you mention).

The new documentation is already live, and includes a small change to try to clarify better that you only need to use the Codacy configuration file if you want to disable any of the options that Codacy enables by default:

https://docs.codacy.com/repositories-configure/codacy-configuration-file/#pmd-cpd-duplication

I tried this out; however, the results were not what I expected. Before reconfiguring Codacy, I downloaded the version of PMD we’re using and used the CPD GUI to confirm that, when ignoring identifiers, the tool reported the same duplication results as found in Codacy. When I toggled that flag, the duplicates went away. However, after applying the same change to the codacy.yml file, the duplicates are still showing up.

Is there some way to get more information from Codacy on what’s happening here?

Hello @doug-papenthien-by,

This is just to let you know that I’m checking this behavior internally and I’ll give you an update as soon as I have new info. :+1:

We were able to reproduce the behavior you’re reporting, @doug-papenthien-by. Codacy currently fails to read and apply the options specified in the Codacy configuration file when checking for duplication using PMD CPD.

One way you can have more detail about the analysis that Codacy performs is by running the Codacy Analysis CLI with the --verbose flag:

$ cat .codacy.yml 
---
engines:
  duplication:
    minTokenMatch: 10
    ignoreIdentifiers: false

$ codacy-analysis-cli analyze --verbose

[...]

I’m sorry that this solution isn’t working, and I thank you for the time you spent testing this and providing feedback. I have created a ticket for our Engineering team to investigate further (internal reference CY-5184) and I’m going to hide the new section from the documentation. I will share an update whenever I have more information on this.

Thanks for the follow-up. Please let us know when this has been corrected and we’ll be happy to test it out again.
