JDFrame: A JVM‑Level DataFrame‑Like API for Simplified Java Stream Processing
This article introduces JDFrame/SDFrame, a Java library that provides a DataFrame‑style, semantic API for stream processing, covering quick start, dependency setup, extensive examples of filtering, aggregation, distinct, grouping, sorting, joining, and utility functions, along with Maven coordinates and source repository links.
The author presents JDFrame (and its counterpart SDFrame), a JVM‑level DataFrame‑style tool designed to make Java 8 stream operations more expressive and concise, especially for tasks that would otherwise require verbose stream code.
0. Introduction
Motivated by the difficulty of remembering long Stream APIs and the desire for a more semantic, DataFrame‑like approach (similar to Spark or Pandas), the author created a library that abstracts common stream operations into readable methods.
1. Quick Start
1.1 Add Dependency
<dependency>
    <groupId>io.github.burukeyou</groupId>
    <artifactId>jdframe</artifactId>
    <version>0.0.2</version>
</dependency>
1.2 Example
Calculate the total score of students aged 9‑16 for each school and retrieve the top‑2 schools.
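static List<Student> studentList = new ArrayList<>();
// ... populate list ...

// FI2<String, BigDecimal> is a two-field row: c1 = school, c2 = total score
SDFrame<FI2<String, BigDecimal>> sdf2 = SDFrame.read(studentList)
        .whereNotNull(Student::getAge)                      // drop students without an age
        .whereBetween(Student::getAge, 9, 16)               // keep ages 9 to 16, inclusive
        .groupBySum(Student::getSchool, Student::getScore)  // total score per school
        .sortDesc(FI2::getC2)                               // highest total first
        .cutFirst(2);                                       // keep the first two rows
sdf2.show();
Output: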
c1 c2
三中 10
二中 7
(三中 and 二中 are school names: No. 3 Middle School and No. 2 Middle School.)
2. API Cases
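For comparison, the quick-start query from 1.2 can be written with plain Java 8 streams. This is a minimal sketch, not the library's implementation: the Student class here is a hypothetical stand-in holding only school, age, and score, and the sample data is invented for illustration.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class QuickStartNative {

    // Hypothetical minimal Student: only the fields this query reads.
    static final class Student {
        private final String school;
        private final Integer age;
        private final BigDecimal score;

        Student(String school, Integer age, BigDecimal score) {
            this.school = school;
            this.age = age;
            this.score = score;
        }

        String getSchool() { return school; }
        Integer getAge() { return age; }
        BigDecimal getScore() { return score; }
    }

    // Total score per school for students aged 9..16 (inclusive), top n schools by total.
    static List<Map.Entry<String, BigDecimal>> topSchools(List<Student> students, int n) {
        Map<String, BigDecimal> totalBySchool = students.stream()
                .filter(s -> s.getAge() != null)                          // like whereNotNull
                .filter(s -> s.getAge() >= 9 && s.getAge() <= 16)         // like whereBetween, [9, 16]
                .collect(Collectors.groupingBy(Student::getSchool,        // like groupBySum
                        Collectors.reducing(BigDecimal.ZERO, Student::getScore, BigDecimal::add)));
        return totalBySchool.entrySet().stream()
                .sorted(Map.Entry.<String, BigDecimal>comparingByValue().reversed()) // like sortDesc
                .limit(n)                                                  // like cutFirst(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Student> students = Arrays.asList(
                new Student("A", 10, new BigDecimal("5")),
                new Student("A", 12, new BigDecimal("4")),
                new Student("B", 15, new BigDecimal("7")),
                new Student("C", 20, new BigDecimal("9")),    // outside [9, 16]
                new Student("C", null, new BigDecimal("3"))); // no age
        System.out.println(topSchools(students, 2)); // [A=9, B=7]
    }
}
```

The library version folds the groupingBy/reducing collector and the entry-sorting boilerplate into groupBySum and sortDesc.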
2.1 Matrix Information
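The calls below treat the frame as a list of rows. For orientation, here is a rough plain-Java analogue of col, head(n), and tail(n) over a java.util.List. The helper names, and returning whatever is available when n exceeds the size, are assumptions of this sketch rather than documented library behavior.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class MatrixInfoNative {

    // col(fn): project one "column" out of every row.
    static <T, R> List<R> col(List<T> rows, Function<T, R> fn) {
        return rows.stream().map(fn).collect(Collectors.toList());
    }

    // head(n): the first n rows, or all rows if there are fewer (n must be >= 0).
    static <T> List<T> head(List<T> rows, int n) {
        return rows.subList(0, Math.min(n, rows.size()));
    }

    // tail(n): the last n rows, or all rows if there are fewer (n must be >= 0).
    static <T> List<T> tail(List<T> rows, int n) {
        return rows.subList(Math.max(0, rows.size() - n), rows.size());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("amy", "bob", "cal", "dan");
        System.out.println(col(names, String::length)); // [3, 3, 3, 3]
        System.out.println(head(names, 2));             // [amy, bob]
        System.out.println(tail(names, 3));             // [bob, cal, dan]
    }
}
```

head and tail here return subList views of the source, so wrap them in new ArrayList<>(...) if the source list may change afterwards.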
void show(int n);                          // print the matrix, n rows
List<String> columns();                    // header names
<R> List<R> col(Function<T, R> function);  // one column
T head();                                  // first element
List<T> head(int n);                       // first n elements
T tail();                                  // last element
List<T> tail(int n);                       // last n elements
2.2 Filtering
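Each where... call below is a row predicate. As a plain-Java sketch with hypothetical helper names (not the library's code), the three between variants differ only in which bound is inclusive, and nulls are rejected up front. For the LIKE-style filters the sketch uses contains, startsWith, and endsWith; which of whereLikeLeft and whereLikeRight means "starts with" and which means "ends with" is not stated here, so check the library before relying on either mapping.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class FilterNative {

    // [lo, hi]: both bounds inclusive (cf. whereBetween)
    static Predicate<Integer> closed(int lo, int hi) {
        return x -> x != null && x >= lo && x <= hi;
    }

    // (lo, hi]: lower bound excluded (cf. whereBetweenR)
    static Predicate<Integer> openLow(int lo, int hi) {
        return x -> x != null && x > lo && x <= hi;
    }

    // [lo, hi): upper bound excluded (cf. whereBetweenL)
    static Predicate<Integer> openHigh(int lo, int hi) {
        return x -> x != null && x >= lo && x < hi;
    }

    // LIKE-style matches; how these map onto whereLike/whereLikeLeft/whereLikeRight is left open.
    static Predicate<String> contains(String s) { return x -> x != null && x.contains(s); }
    static Predicate<String> prefix(String s)   { return x -> x != null && x.startsWith(s); }
    static Predicate<String> suffix(String s)   { return x -> x != null && x.endsWith(s); }

    // Apply one predicate to a list of rows.
    static <T> List<T> where(List<T> rows, Predicate<T> p) {
        return rows.stream().filter(p).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> ages = Arrays.asList(2, 3, 4, null, 6, 7);
        System.out.println(where(ages, closed(3, 6)));   // [3, 4, 6]
        System.out.println(where(ages, openLow(3, 6)));  // [4, 6]
        System.out.println(where(ages, openHigh(3, 6))); // [3, 4]

        List<String> names = Arrays.asList("jay chou", "jayden", "big jay");
        System.out.println(where(names, prefix("jay"))); // [jay chou, jayden]
        System.out.println(where(names, suffix("jay"))); // [big jay]
    }
}
```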
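// Each call below illustrates one filter. Chained together exactly like this, several
// conditions contradict each other (e.g. age > 3 and age < 3), so the result would be empty.
SDFrame.read(studentList)
        .whereBetween(Student::getAge, 3, 6)                  // 3 <= age <= 6, i.e. [3,6]
        .whereBetweenR(Student::getAge, 3, 6)                 // 3 < age <= 6, i.e. (3,6]
        .whereBetweenL(Student::getAge, 3, 6)                 // 3 <= age < 6, i.e. [3,6)
        .whereNotNull(Student::getName)                       // name is not null
        .whereGt(Student::getAge, 3)                          // age > 3
        .whereGe(Student::getAge, 3)                          // age >= 3
        .whereLt(Student::getAge, 3)                          // age < 3
        .whereIn(Student::getAge, Arrays.asList(3, 7, 8))     // age is 3, 7 or 8
        .whereNotIn(Student::getAge, Arrays.asList(3, 7, 8))  // age is not 3, 7 or 8
        .whereEq(Student::getAge, 3)                          // age == 3
        .whereNotEq(Student::getAge, 3)                       // age != 3
        .whereLike(Student::getName, "jay")                   // LIKE-style fuzzy match on name
        .whereLikeLeft(Student::getName, "jay")
        .whereLikeRight(Student::getName, "jay");
2.3 Aggregation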
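The aggregations below return either the row holding the extreme value (max, min) or the value itself (maxValue, minValue), and sum/avg as BigDecimal. A plain Java 8 sketch of the same ideas follows. The Student class is hypothetical, and rounding the average to 2 decimal places with HALF_UP, as well as returning null for an empty list, are assumptions of this sketch rather than the library's documented rules.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Arrays;
import java.util.Comparator;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Optional;

class AggregationNative {

    // Hypothetical minimal Student.
    static final class Student {
        private final String name;
        private final int age;

        Student(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        int getAge() { return age; }
    }

    // Like max(...): the whole object with the largest age (empty Optional for an empty list).
    static Optional<Student> oldest(List<Student> xs) {
        return xs.stream().max(Comparator.comparingInt(Student::getAge));
    }

    // Like sum(...): total age as a BigDecimal.
    static BigDecimal sumAge(List<Student> xs) {
        return xs.stream()
                .map(s -> BigDecimal.valueOf(s.getAge()))
                .reduce(BigDecimal.ZERO, BigDecimal::add);
    }

    // Like avg(...): scale 2 with HALF_UP is an assumption, not the library's rule.
    static BigDecimal avgAge(List<Student> xs) {
        if (xs.isEmpty()) {
            return null; // hypothetical choice for an empty list
        }
        return sumAge(xs).divide(BigDecimal.valueOf(xs.size()), 2, RoundingMode.HALF_UP);
    }

    // Like maxValue / minValue / maxMinValue: min and max ages in a single pass.
    static IntSummaryStatistics ageStats(List<Student> xs) {
        return xs.stream().mapToInt(Student::getAge).summaryStatistics();
    }

    public static void main(String[] args) {
        List<Student> xs = Arrays.asList(
                new Student("amy", 9), new Student("bob", 12), new Student("cal", 10));
        System.out.println(oldest(xs).map(Student::getName).orElse("none")); // bob
        System.out.println(sumAge(xs));                                       // 31
        System.out.println(avgAge(xs));                                       // 10.33
        IntSummaryStatistics stats = ageStats(xs);
        System.out.println(stats.getMin() + ".." + stats.getMax());           // 9..12
    }
}
```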
JDFrame<Student> frame = JDFrame.read(studentList);
Student maxAgeStudent = frame.max(Student::getAge);                // student with the largest age
Integer maxAge = frame.maxValue(Student::getAge);                  // the largest age itself
Student minAgeStudent = frame.min(Student::getAge);                // student with the smallest age
Integer minAge = frame.minValue(Student::getAge);                  // the smallest age itself
BigDecimal avgAge = frame.avg(Student::getAge);                    // average age
BigDecimal sumAge = frame.sum(Student::getAge);                    // total of all ages
MaxMin<Student> maxMinStudent = frame.maxMin(Student::getAge);     // max and min students together
MaxMin<Integer> maxMinValue = frame.maxMinValue(Student::getAge);  // max and min ages together
2.4 Distinct
Native Java streams deduplicate only whole objects (Stream.distinct() relies on equals()); JDFrame also supports distinct by a field or by a computed key.
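Without the library, distinct by a key is usually done by keeping the first row seen for each key. The sketch below (hypothetical helper and Student class, not the library's code) collects into a LinkedHashMap via Collectors.toMap so that encounter order is preserved. It also shows why the composite key in the last example below is fragile when both fields are strings: concatenation maps ("A1", "") and ("A", "1") to the same key, whereas a List key compares field by field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class DistinctByNative {

    // Hypothetical minimal Student.
    static final class Student {
        private final String name;
        private final String school;
        private final String level;

        Student(String name, String school, String level) {
            this.name = name;
            this.school = school;
            this.level = level;
        }

        String getName() { return name; }
        String getSchool() { return school; }
        String getLevel() { return level; }
    }

    // Keeps the first element seen for each key, in encounter order.
    static <T, K> List<T> distinctBy(List<T> xs, Function<T, K> key) {
        Map<K, T> firstByKey = xs.stream().collect(Collectors.toMap(
                key, Function.identity(), (first, duplicate) -> first, LinkedHashMap::new));
        return new ArrayList<>(firstByKey.values());
    }

    static List<String> names(List<Student> xs) {
        return xs.stream().map(Student::getName).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Student> xs = Arrays.asList(
                new Student("amy", "A", "1"), new Student("bob", "A", "2"),
                new Student("cal", "B", "1"), new Student("dan", "A", "1"));
        System.out.println(names(distinctBy(xs, Student::getSchool)));                                 // [amy, cal]
        System.out.println(names(distinctBy(xs, s -> Arrays.asList(s.getSchool(), s.getLevel()))));  // [amy, bob, cal]

        // String concatenation collides here; the List key does not.
        List<Student> tricky = Arrays.asList(new Student("x", "A1", ""), new Student("y", "A", "1"));
        System.out.println(names(distinctBy(tricky, s -> s.getSchool() + s.getLevel())));               // [x]
        System.out.println(names(distinctBy(tricky, s -> Arrays.asList(s.getSchool(), s.getLevel())))); // [x, y]
    }
}
```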
List<Student> distinct = SDFrame.read(studentList).distinct().toLists();                             // whole objects
List<Student> distinctBySchool = SDFrame.read(studentList).distinct(Student::getSchool).toLists();   // by school
List<Student> distinctByComposite = SDFrame.read(studentList).distinct(e -> e.getSchool() + e.getLevel()).toLists(); // by school + level
2.5 Simple Group‑by Aggregation
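Each groupBy... call below corresponds to a Collectors-based grouping in plain Java 8. In this sketch (hypothetical Student class and method names, not the library's code), a TreeMap is used only to make the printed order predictable. Note that Collectors.averagingInt yields a Double, whereas groupByAvg returns a BigDecimal.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

class GroupByNative {

    // Hypothetical minimal Student.
    static final class Student {
        private final String name;
        private final String school;
        private final int age;

        Student(String name, String school, int age) {
            this.name = name;
            this.school = school;
            this.age = age;
        }

        String getName() { return name; }
        String getSchool() { return school; }
        int getAge() { return age; }
    }

    // Like groupBySum: school -> SUM(age) as BigDecimal.
    static Map<String, BigDecimal> sumBySchool(List<Student> xs) {
        return xs.stream().collect(Collectors.groupingBy(Student::getSchool, TreeMap::new,
                Collectors.reducing(BigDecimal.ZERO, (Student s) -> BigDecimal.valueOf(s.getAge()), BigDecimal::add)));
    }

    // Like groupByCount: school -> COUNT(*).
    static Map<String, Long> countBySchool(List<Student> xs) {
        return xs.stream().collect(Collectors.groupingBy(Student::getSchool, TreeMap::new, Collectors.counting()));
    }

    // Like groupByMaxValue: school -> MAX(age).
    static Map<String, Integer> maxAgeBySchool(List<Student> xs) {
        return xs.stream().collect(Collectors.toMap(Student::getSchool, Student::getAge, Integer::max, TreeMap::new));
    }

    // Like groupByMax: school -> the student with the largest age (the first one wins on ties).
    static Map<String, Student> oldestBySchool(List<Student> xs) {
        return xs.stream().collect(Collectors.toMap(Student::getSchool, Function.identity(),
                BinaryOperator.maxBy(Comparator.comparingInt(Student::getAge)), TreeMap::new));
    }

    // Like groupByAvg: school -> AVG(age), here as a Double.
    static Map<String, Double> avgBySchool(List<Student> xs) {
        return xs.stream().collect(Collectors.groupingBy(Student::getSchool, TreeMap::new,
                Collectors.averagingInt(Student::getAge)));
    }

    public static void main(String[] args) {
        List<Student> xs = Arrays.asList(
                new Student("amy", "A", 9), new Student("bob", "A", 12), new Student("cal", "B", 10));
        System.out.println(sumBySchool(xs));                       // {A=21, B=10}
        System.out.println(countBySchool(xs));                     // {A=2, B=1}
        System.out.println(maxAgeBySchool(xs));                    // {A=12, B=10}
        System.out.println(oldestBySchool(xs).get("A").getName()); // bob
        System.out.println(avgBySchool(xs));                       // {A=10.5, B=10.0}
    }
}
```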
JDFrame<Student> frame = JDFrame.from(studentList);
// SUM(age) per school
List<FI2<String, BigDecimal>> sumBySchool = frame.groupBySum(Student::getSchool, Student::getAge).toLists();
// MAX(age) per school
List<FI2<String, Integer>> maxBySchool = frame.groupByMaxValue(Student::getSchool, Student::getAge).toLists();
// the student with the largest age in each school
List<FI2<String, Student>> maxObjBySchool = frame.groupByMax(Student::getSchool, Student::getAge).toLists();
// COUNT(*) per school
List<FI2<String, Long>> countBySchool = frame.groupByCount(Student::getSchool).toLists();
// AVG(age) per school
List<FI2<String, BigDecimal>> avgBySchool = frame.groupByAvg(Student::getSchool, Student::getAge).toLists();
// multi-level grouping examples omitted for brevity
2.6 Sorting
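With native streams, a multi-key order such as "age descending, then level ascending" is best expressed as one comparator. Chaining two sorted() calls behaves differently: because sorted() is stable on ordered streams, the last sort becomes the primary key and the earlier one only breaks ties. The sketch below (hypothetical Student class) shows both; it describes java.util.stream, not how SDFrame composes chained sortDesc/sortAsc calls. Relatedly, if getLevel() returns a String, the + key in the last example below compares as text, so "10" sorts before "9"; a thenComparing chain avoids that.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class SortNative {

    // Hypothetical minimal Student.
    static final class Student {
        private final int id;
        private final int age;
        private final String level;

        Student(int id, int age, String level) {
            this.id = id;
            this.age = age;
            this.level = level;
        }

        int getId() { return id; }
        int getAge() { return age; }
        String getLevel() { return level; }
    }

    // ORDER BY age DESC, level ASC as a single comparator.
    static final Comparator<Student> AGE_DESC_THEN_LEVEL =
            Comparator.comparingInt(Student::getAge).reversed()
                      .thenComparing(Student::getLevel);

    static List<Integer> ids(List<Student> xs, Comparator<Student> order) {
        return xs.stream().sorted(order).map(Student::getId).collect(Collectors.toList());
    }

    // Two consecutive sorted() calls: the LAST sort (level) ends up as the primary key.
    static List<Integer> idsTwoPasses(List<Student> xs) {
        return xs.stream()
                .sorted(Comparator.comparingInt(Student::getAge).reversed())
                .sorted(Comparator.comparing(Student::getLevel))
                .map(Student::getId)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Student> xs = Arrays.asList(
                new Student(1, 10, "A"), new Student(2, 12, "B"), new Student(3, 10, "B"));
        System.out.println(ids(xs, AGE_DESC_THEN_LEVEL)); // [2, 1, 3]
        System.out.println(idsTwoPasses(xs));            // [1, 2, 3]
    }
}
```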
SDFrame.read(studentList).sortDesc(Student::getAge);                                     // by age, descending
SDFrame.read(studentList).sortDesc(Student::getAge).sortAsc(Student::getLevel);
SDFrame.read(studentList).sortAsc(Student::getAge);                                      // by age, ascending
SDFrame.read(studentList).sortAsc(Comparator.comparing(e -> e.getLevel() + e.getId()));  // by a computed key, ascending
2.7 Joining Matrices
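Conceptually, a join pairs each row of the left frame with each row of the right frame, keeps the pairs that satisfy the condition, and builds one output row per kept pair. A nested-loop sketch in plain Java follows; the helper names are hypothetical and this is not the library's code. The left-join variant passes null for the missing right-hand row, which is an assumption of this sketch.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.BiPredicate;

class JoinNative {

    // Inner join: one output row per (a, b) pair that satisfies `on`.
    static <A, B, R> List<R> innerJoin(List<A> left, List<B> right,
                                       BiPredicate<A, B> on, BiFunction<A, B, R> combine) {
        List<R> out = new ArrayList<>();
        for (A a : left) {
            for (B b : right) {
                if (on.test(a, b)) {
                    out.add(combine.apply(a, b));
                }
            }
        }
        return out;
    }

    // Left join: like innerJoin, but a left row with no match is combined with null.
    static <A, B, R> List<R> leftJoin(List<A> left, List<B> right,
                                      BiPredicate<A, B> on, BiFunction<A, B, R> combine) {
        List<R> out = new ArrayList<>();
        for (A a : left) {
            boolean matched = false;
            for (B b : right) {
                if (on.test(a, b)) {
                    out.add(combine.apply(a, b));
                    matched = true;
                }
            }
            if (!matched) {
                out.add(combine.apply(a, null));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // student name -> school, and school -> total score (invented sample data)
        List<Map.Entry<String, String>> students = Arrays.asList(
                new SimpleEntry<>("amy", "A"), new SimpleEntry<>("bob", "B"), new SimpleEntry<>("cal", "C"));
        List<Map.Entry<String, Integer>> schools = Arrays.asList(
                new SimpleEntry<>("A", 10), new SimpleEntry<>("B", 7));

        BiPredicate<Map.Entry<String, String>, Map.Entry<String, Integer>> sameSchool =
                (s, t) -> s.getValue().equals(t.getKey());
        BiFunction<Map.Entry<String, String>, Map.Entry<String, Integer>, String> row =
                (s, t) -> s.getKey() + "@" + s.getValue() + ":" + (t == null ? "none" : t.getValue());

        System.out.println(innerJoin(students, schools, sameSchool, row)); // [amy@A:10, bob@B:7]
        System.out.println(leftJoin(students, schools, sameSchool, row));  // [amy@A:10, bob@B:7, cal@C:none]
    }
}
```

A nested loop makes |left| x |right| predicate calls. For pure equality conditions, indexing the right side in a HashMap first is the usual faster alternative.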
The join APIs include append, union, join, leftJoin, and rightJoin. An inner-join example:
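SDFrame<Student> sdf = SDFrame.read(studentList);
// the top-2 schools by total score, built with the same pipeline as in 1.2
SDFrame<FI2<String, BigDecimal>> topSchools = SDFrame.read(studentList)
        .whereNotNull(Student::getAge)
        .whereBetween(Student::getAge, 9, 16)
        .groupBySum(Student::getSchool, Student::getScore)
        .sortDesc(FI2::getC2)
        .cutFirst(2);
// join returns a new frame; UserInfo is an application-defined class with setKey1..setKey3
SDFrame<UserInfo> frame = sdf.join(topSchools,
        (a, b) -> a.getSchool().equals(b.getC1()),  // join condition: same school
        (a, b) -> {                                 // build one result row per matching pair
            UserInfo ui = new UserInfo();
            ui.setKey1(a.getSchool());
            ui.setKey2(b.getC2().intValue());
            ui.setKey3(String.valueOf(a.getId()));
            return ui;
        });
frame.show(5);
2.8 Other Utilities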
Percentage conversion: SDFrame.read(list).mapPercent(Student::getScore, Student::setScore, 2) reads each score through the getter, converts it to a percentage, and writes it back through the setter.
Partition: split a list into sub-lists of a given size.
Generate sequence numbers and rank numbers based on the sorted order.
Fill in missing entries for dimension values such as schools or grades.
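Two of these utilities, partition and ranking, are easy to sketch in plain Java. The helper names are hypothetical, and the tie rule used here (equal values share a rank and the next distinct value skips ahead, as in 1, 2, 2, 4) is an assumption; the library may number ties differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class UtilitiesNative {

    // Partition: consecutive sub-lists of at most `size` elements (views onto xs).
    static <T> List<List<T>> partition(List<T> xs, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> parts = new ArrayList<>();
        for (int i = 0; i < xs.size(); i += size) {
            parts.add(xs.subList(i, Math.min(i + size, xs.size())));
        }
        return parts;
    }

    // Ranks for values already sorted in descending order:
    // a value equal to its predecessor shares its rank, otherwise rank = position + 1.
    static List<Integer> rank(List<Integer> sortedDesc) {
        List<Integer> ranks = new ArrayList<>();
        for (int i = 0; i < sortedDesc.size(); i++) {
            boolean tie = i > 0 && sortedDesc.get(i).equals(sortedDesc.get(i - 1));
            ranks.add(tie ? ranks.get(i - 1) : i + 1);
        }
        return ranks;
    }

    public static void main(String[] args) {
        System.out.println(partition(Arrays.asList(1, 2, 3, 4, 5), 2)); // [[1, 2], [3, 4], [5]]
        System.out.println(rank(Arrays.asList(90, 85, 85, 70)));       // [1, 2, 2, 4]
    }
}
```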
Final Notes
The library provides two frames: SDFrame (lazy, similar to native streams) and JDFrame (eager, operations take effect immediately). Choose SDFrame for simple one‑pass stream processing; use JDFrame when intermediate results are needed.
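The lazy/eager split mirrors native streams. The sketch below uses plain java.util.stream rather than SDFrame or JDFrame: a pipeline does no work until a terminal operation runs and cannot be consumed twice, whereas a materialized List can be reused freely. Whether SDFrame enforces the same single-use rule is not stated here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyVsEager {

    // Returns {elements visited before the terminal op, elements visited after it}.
    static int[] visitsBeforeAndAfterTerminal(List<Integer> source) {
        List<Integer> visited = new ArrayList<>();
        Stream<Integer> pipeline = source.stream()
                .peek(visited::add)              // records each element as it flows through
                .filter(x -> x % 2 == 0);
        int before = visited.size();             // nothing has run yet: the pipeline is lazy
        pipeline.collect(Collectors.toList());   // the terminal operation drives the pipeline
        return new int[] {before, visited.size()};
    }

    // A stream can be consumed only once; a second terminal operation throws.
    static boolean secondTerminalOpThrows(List<Integer> source) {
        Stream<Integer> s = source.stream().filter(x -> x > 0);
        s.count();
        try {
            s.count();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Integer> source = Arrays.asList(1, 2, 3, 4);
        System.out.println(Arrays.toString(visitsBeforeAndAfterTerminal(source))); // [0, 4]
        System.out.println(secondTerminalOpThrows(source));                        // true

        // Eager alternative: materialize once, then reuse the list as often as needed.
        List<Integer> evens = source.stream().filter(x -> x % 2 == 0).collect(Collectors.toList());
        System.out.println(evens.size() + " " + evens); // 2 [2, 4]
    }
}
```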
Source code: https://github.com/burukeYou/JDFrame
Maven coordinates: https://central.sonatype.com/artifact/io.github.burukeyou/jdframe
Java Architect Essentials