FilenameUtils.getName function analysis

1. Background
I recently used the org.apache.commons.io.FilenameUtils#getName method, which takes a file path and returns the file name.
After a brief look at the source code, I found that although it is not complicated, it differs slightly from what I would have written myself and is worth learning from. This article briefly analyzes it.

2. Source code analysis
org.apache.commons.io.FilenameUtils#getName

/**
 * Gets the name minus the path from a full fileName.
 *
 * This method will handle a file in either Unix or Windows format.
 * The text after the last forward or backslash is returned.
 *
 * a/b/c.txt --> c.txt
 * a.txt     --> a.txt
 * a/b/c     --> c
 * a/b/c/    --> ""
 *
 * The output will be the same irrespective of the machine that the code is running on.
 *
 * @param fileName the fileName to query, null returns null
 * @return the name of the file without the path, or an empty string if none exists.
 * Null bytes inside string will be removed
 */
public static String getName(final String fileName) {
    // Return null directly for null input
    if (fileName == null) {
        return null;
    }

    // Reject NUL characters
    requireNonNullChars(fileName);

    // Find the last separator
    final int index = indexOfLastSeparator(fileName);

    // Take everything after the last separator
    return fileName.substring(index + 1);
}
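The steps in getName can be reproduced with plain JDK calls. The following is a minimal sketch, not the library source: NameSketch, lastSeparator and nameOf are hypothetical names, and it assumes that indexOfLastSeparator returns the later of the last '/' and the last '\' (or -1 when neither occurs), which matches the behavior described in the Javadoc above.

```java
// Illustrative sketch only; the names here are hypothetical, not part of Commons IO.
final class NameSketch {

    // Position of the last '/' or '\', or -1 if the path contains neither.
    static int lastSeparator(String path) {
        return Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
    }

    static String nameOf(String path) {
        if (path == null) {
            return null;
        }
        if (path.indexOf(0) >= 0) {
            throw new IllegalArgumentException("NUL character in path");
        }
        // With no separator the index is -1, so substring(0) returns the whole input.
        return path.substring(lastSeparator(path) + 1);
    }

    public static void main(String[] args) {
        System.out.println(nameOf("a/b/c.txt"));        // c.txt
        System.out.println(nameOf("a\\b\\c.txt"));      // c.txt
        System.out.println(nameOf("a.txt"));            // a.txt
        System.out.println(nameOf("a/b/c/").isEmpty()); // true
    }
}
```

The detail worth noticing is index + 1: because a missing separator yields -1, the "no path" case needs no special branch.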
2.1 Question 1: Why is the NUL check required?
2.1.1 How is the check done?
org.apache.commons.io.FilenameUtils#requireNonNullChars

/**
 * Checks the input for null bytes, a sign of unsanitized data being passed to file level functions.
 *
 * This may be used for poison byte attacks.
 *
 * @param path the path to check
 */
private static void requireNonNullChars(final String path) {
    if (path.indexOf(0) >= 0) {
        throw new IllegalArgumentException("Null byte present in file/path name. There are no "
                + "known legitimate use cases for such data, but several injection attacks may use it");
    }
}
java.lang.String#indexOf(int) source code:

/**
 * Returns the index within this string of the first occurrence of
 * the specified character. If a character with value
 * {@code ch} occurs in the character sequence represented by
 * this {@code String} object, then the index (in Unicode
 * code units) of the first such occurrence is returned. For
 * values of {@code ch} in the range from 0 to 0xFFFF
 * (inclusive), this is the smallest value k such that:
 *
 *     this.charAt(k) == ch
 *
 * is true. For other values of {@code ch}, it is the
 * smallest value k such that:
 *
 *     this.codePointAt(k) == ch
 *
 * is true. In either case, if no such character occurs in this
 * string, then {@code -1} is returned.
 *
 * @param ch a character (Unicode code point).
 * @return the index of the first occurrence of the character in the
 * character sequence represented by this object, or
 * {@code -1} if the character does not occur.
 */
public int indexOf(int ch) {
    return indexOf(ch, 0);
}
As the source shows, indexOf(0) looks for the character whose code is 0; if one is found, an IllegalArgumentException is thrown.
According to the ASCII table, code 0 is the control character NUL, which should not appear in a regular file name.
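A small sketch of what indexOf(0) matches may help here, because it is easy to confuse with indexOf('0'), which searches for the digit zero. NulIndexSketch and the sample strings are assumptions made for illustration only.

```java
// Illustrative sketch; the class name and sample strings are assumptions.
final class NulIndexSketch {

    public static void main(String[] args) {
        String clean = "a/b/c0.txt";
        String tainted = "c.jsp\0.jpg"; // "\0" is the NUL character (code 0)

        System.out.println(clean.indexOf(0));     // -1: the digit '0' is not NUL
        System.out.println(clean.indexOf('0'));   // 5: position of the digit '0'
        System.out.println(tainted.indexOf(0));   // 5: position of the NUL character
        System.out.println((int) tainted.charAt(5)); // 0
    }
}
```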

2.1.2 Why is this check done?
A null byte is a byte with the value 0, written 0x00 in hexadecimal.
There is a class of security vulnerabilities related to null bytes.
C uses the null byte as a string terminator, but other languages (Java, PHP, etc.) do not treat it as one, so the same name can be read differently by different layers.
For example, a Java web project may only allow users to upload pictures in .jpg format, yet a name that hides a null byte before the .jpg suffix can pass that check and still end up stored as a .jsp file, because code that treats the null byte as a terminator stops reading there.
Some programming languages do not allow null bytes in file names; if the language you use does not handle this, you need to handle it yourself.
Therefore, this check is necessary.

Code example:

package org.example;

import org.apache.commons.io.FilenameUtils;

public class FilenameDemo {
    public static void main(String[] args) {
        String filename = "test.jsp\0.jpg"; // assumed example value: a NUL character hidden before ".jpg"
        System.out.println(FilenameUtils.getName(filename));
    }
}
Error message:

Exception in thread "main" java.lang.IllegalArgumentException: Null byte present in file/path name. There are no known legitimate use cases for such data, but several injection attacks may use it
at org.apache.commons.io.FilenameUtils.requireNonNullChars(FilenameUtils.java:998)
at org.apache.commons.io.FilenameUtils.getName(FilenameUtils.java:984)
at org.example.FilenameDemo.main(FilenameDemo.java:8)
If the validation is removed:

package org.example;

import org.apache.commons.io.FilenameUtils;

public class FilenameDemo {
    public static void main(String[] args) {
        String filename = "test.jsp\0.jpg"; // assumed example value: a NUL character hidden before ".jpg"

        // No NUL check this time
        String name = getName(filename);

        // Get the extension
        String extension = FilenameUtils.getExtension(name);
        System.out.println(extension);
    }

    // Same logic as FilenameUtils#getName, but without the NUL check
    public static String getName(final String fileName) {
        if (fileName == null) {
            return null;
        }
        final int index = FilenameUtils.indexOfLastSeparator(fileName);
        return fileName.substring(index + 1);
    }
}
In this case, Java does recognize the extension as jpg.
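To see why that result is a problem, the difference between the two views of the same name can be simulated in Java alone. This is only a sketch under stated assumptions: asCString is a hypothetical helper that imitates a consumer which stops at the first NUL, and the input string is an invented example. Whether a real runtime or operating system truncates the name this way depends on the platform.

```java
// Illustrative sketch; asCString is a hypothetical helper, not a library call.
final class TruncationSketch {

    // What a consumer that treats NUL as a terminator would read.
    static String asCString(String s) {
        int nul = s.indexOf(0);
        return nul < 0 ? s : s.substring(0, nul);
    }

    public static void main(String[] args) {
        String uploaded = "shell.jsp\0.jpg"; // assumed example input

        // A suffix check on the Java string sees a picture ...
        System.out.println(uploaded.endsWith(".jpg")); // true
        // ... while a NUL-terminated reader sees a JSP file.
        System.out.println(asCString(uploaded));       // shell.jsp
    }
}
```

Rejecting the input up front, as requireNonNullChars does, removes the ambiguity before either view is used.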

If you are interested, you can try using C to write the file named ...

    else if (delta > 0) {
        // We don't really know how many new threads are "needed".
        // As a heuristic, prestart enough new workers (up to new
        // core size) to handle the current number of tasks in
        // queue, but stop if queue becomes empty while doing so.
        int k = Math.min(delta, workQueue.size());
        while (k-- > 0 && addWorker(null, true)) {
            if (workQueue.isEmpty())
                break;
        }
    }
}
(3) In day-to-day business development, it is highly recommended to put links to relevant documents and configuration pages in comments, which greatly simplifies later maintenance.
For example:

/**
 * A certain function
 *
 * Related documents:
 * Design document
 * Third-party API address
 */
public void demo() {
    // omitted
}
(4) For utility classes, consider listing the outputs for common inputs in the Javadoc.
For example: org.apache.commons.lang3.StringUtils#center(java.lang.String, int, char)

/**
 * Centers a String in a larger String of size {@code size}.
 * Uses a supplied character as the value to pad the String with.
 *
 * If the size is less than the String length, the String is returned.
 * A {@code null} String returns {@code null}.
 * A negative size is treated as zero.
 *
 * StringUtils.center(null, *, *)     = null
 * StringUtils.center("", 4, ' ')     = "    "
 * StringUtils.center("ab", -1, ' ')  = "ab"
 * StringUtils.center("ab", 4, ' ')   = " ab "
 * StringUtils.center("abcd", 2, ' ') = "abcd"
 * StringUtils.center("a", 4, ' ')    = " a  "
 * StringUtils.center("a", 4, 'y')    = "yayy"
 *
 * @param str the String to center, may be null
 * @param size the int size of new String, negative treated as zero
 * @param padChar the character to pad the new String with
 * @return centered String, {@code null} if null String input
 * @since 2.0
 */
public static String center(String str, final int size, final char padChar) {
    if (str == null || size <= 0) {
        return str;
    }
    final int strLen = str.length();
    final int pads = size - strLen;
    if (pads <= 0) {
        return str;
    }
    str = leftPad(str, strLen + pads / 2, padChar);
    str = rightPad(str, size, padChar);
    return str;
}
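One practical benefit of listing inputs and outputs this way is that each row can be turned into a check. The following is a minimal sketch, assuming a self-contained reimplementation so that it runs without extra dependencies: CenterSketch is a hypothetical class, not the Commons Lang source, and it pads with a StringBuilder instead of calling leftPad and rightPad.

```java
// Illustrative sketch; CenterSketch is hypothetical and only mimics the documented behavior.
final class CenterSketch {

    static String center(String str, int size, char padChar) {
        if (str == null || size <= 0) {
            return str;
        }
        int pads = size - str.length();
        if (pads <= 0) {
            return str;
        }
        StringBuilder sb = new StringBuilder(size);
        // The left side gets pads / 2 characters; any odd leftover goes to the right.
        for (int i = 0; i < pads / 2; i++) {
            sb.append(padChar);
        }
        sb.append(str);
        while (sb.length() < size) {
            sb.append(padChar);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Brackets make the padding visible in the output.
        System.out.println("[" + center("ab", 4, ' ') + "]"); // [ ab ]
        System.out.println("[" + center("a", 4, ' ') + "]");  // [ a  ]
        System.out.println("[" + center("a", 4, 'y') + "]");  // [yayy]
    }
}
```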
(5) For a deprecated method, the reason for the deprecation must be stated and an alternative must be given.
For example: java.security.Signature#setParameter(java.lang.String, java.lang.Object)

/**
 * omitted part
 *
 * @see #getParameter
 *
 * @deprecated Use
 * {@link #setParameter(java.security.spec.AlgorithmParameterSpec)
 * setParameter}.
 */
@Deprecated
public final void setParameter(String param, Object value)
        throws InvalidParameterException {
    engineSetParameter(param, value);
}
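The same pattern can be applied in your own code. The sketch below is an invented example (TemperatureSketch and both overloads are hypothetical): the @deprecated tag states the reason and links to the replacement, and the old method delegates to the new one so existing callers keep working.

```java
// Illustrative sketch; the class and its methods are hypothetical.
final class TemperatureSketch {

    /**
     * Converts Celsius to Fahrenheit.
     *
     * @deprecated The int parameter cannot represent fractional degrees.
     *             Use {@link #toFahrenheit(double)} instead.
     */
    @Deprecated
    static double toFahrenheit(int celsius) {
        // Delegate so that both overloads always agree.
        return toFahrenheit((double) celsius);
    }

    /** Converts Celsius to Fahrenheit. */
    static double toFahrenheit(double celsius) {
        return celsius * 9.0 / 5.0 + 32.0;
    }

    public static void main(String[] args) {
        System.out.println(toFahrenheit(100.0)); // 212.0
    }
}
```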
4. Summary
The code in many excellent open source projects is designed very rigorously, and even simple code often reflects careful thought.
When we have time, we can read some excellent open source projects, starting with the simple ones. First think about how you would implement something yourself, then compare that with the author's implementation; you will gain more this way.
When reading source code, you need to understand not only what the code does but also why it was designed that way.
