Human motion capture (MoCap) involves tracking and recording human movements. Image-based MoCap systems have attracted significant research interest, particularly because they simplify deployment by eliminating the need for reflective markers and wearable sensors. However, deploying such systems faces two major challenges: temporal misalignment caused by unsynchronized cameras and the sparsity of detectable features on the human body. To address these challenges, we propose a marker-free MoCap system built on unsynchronized cameras. Our system introduces two key components, multi-view temporal post-processing and temporal augmentation training, which together enable automatic temporal alignment across multiple cameras. To overcome feature sparsity, we define virtual markers as 3D surface points on a human mesh model. Experimental results demonstrate the high performance of our system in global marker localization. The proposed approach offers a scalable and flexible solution for MoCap deployment in unconstrained environments and shows promise for applications in mobile health monitoring and sports training.